[Swift-devel] Swift-issues (PBS+NFS Cluster)
Michael Wilde
wilde at mcs.anl.gov
Wed May 6 19:51:54 CDT 2009
Yi, I assume you are testing from communicado to the tp-x001 virtual
machine?
Does that VM have GT2 GRAM running?
If so, I would first verify that you can do a basic globus-job-run and
globus-url-copy to the VM, and then use a GT2 setting in sites.xml to
talk to the VM from Swift.
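A rough sketch of those two checks, assuming a valid grid proxy already
exists and that the VM exposes a GT2 PBS jobmanager under the usual
contact string HOST/jobmanager-pbs. The contact string, the probe file
names, and the DRY_RUN switch are assumptions for illustration, not
something confirmed in this thread. By default the script only prints
the commands it would run:

```shell
#!/bin/sh
# Pre-flight check for a GT2 site before pointing Swift at it.
# Assumptions (not confirmed in this thread): the GRAM contact string,
# the probe file names, and the DRY_RUN switch are illustrative only.

GRAM_HOST="${GRAM_HOST:-tp-x001.ci.uchicago.edu}"
GRAM_CONTACT="${GRAM_CONTACT:-$GRAM_HOST/jobmanager-pbs}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

gt2_preflight() {
    # 1. Can GRAM run a trivial job through PBS?
    run globus-job-run "$GRAM_CONTACT" /bin/hostname || return 1
    # 2. Can GridFTP write into the Swift work directory?
    #    (For a real run, create /tmp/gt2-probe.txt beforehand.)
    run globus-url-copy "file:///tmp/gt2-probe.txt" \
        "gsiftp://$GRAM_HOST/home/jobrun/gt2-probe.txt" || return 1
}

gt2_preflight
```

Run it with DRY_RUN=0 to actually execute the commands. If both succeed,
the <execution> element in sites.xml would be swapped for a GT2 entry;
the Swift user guide of that era shows the form
<jobmanager universe="vanilla" url="HOST/jobmanager-pbs" major="2" />,
though the right contact string for this VM is a guess here.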
- Mike
ps. I have not been following the list for the past week, so I need to
review it and see what your plan was, and what Ben and others
recommended. I'll try to catch up on that soon, and then we should meet
at the CI and discuss.
On 5/6/09 7:24 PM, yizhu wrote:
> Hi all
>
> I tried running swift-0.8 on a Nimbus cloud cluster (PBS+NFS), and
> configured sites.xml and tc.data accordingly [1].
>
> When I ran "swift first.swift", it got stuck in the "Submitting:1"
> phase [2]: it kept printing "Progress: Submitting:1" and never
> returned.
>
> Then I logged in to the pbs_server host over ssh to check the server
> log [3] and found that the job had been enqueued, run, and successfully
> dequeued. I also checked the queue status [4] while the job was running
> and found that Output_Path is "/dev/null", which I did not expect. (The
> Swift working directory is "/home/jobrun".)
>
> I think the problem might be that pbs_server failed to return the
> output to the correct path. (By the way, what is the output path
> supposed to be: the same as Swift's work directory?) Or does anyone
> have a better idea?
>
>
> Many Thanks.
>
>
> -Yi
>
>
>
> --------------------------------------------------
> [1]----sites.xml
>
> <pool handle="nb_basecluster">
>   <gridftp url="gsiftp://tp-x001.ci.uchicago.edu" />
>   <execution
>     url="https://tp-x001.ci.uchicago.edu:8443/wsrf/services/ManagedJobFactoryService"
>     jobManager="PBS" provider="gt4" />
>   <workdirectory>/home/jobrun</workdirectory>
> </pool>
>
> ----tc.data
>
> nb_basecluster  echo   /bin/echo   INSTALLED  INTEL32::LINUX  null
> nb_basecluster  cat    /bin/cat    INSTALLED  INTEL32::LINUX  null
> nb_basecluster  ls     /bin/ls     INSTALLED  INTEL32::LINUX  null
> nb_basecluster  grep   /bin/grep   INSTALLED  INTEL32::LINUX  null
> nb_basecluster  sort   /bin/sort   INSTALLED  INTEL32::LINUX  null
> nb_basecluster  paste  /bin/paste  INSTALLED  INTEL32::LINUX  null
>
> ---------------------------------------------------------
> [2]
> yizhu@ubuntu:~/swift-0.8/examples/swift$ swift -d first.swift
> Recompilation suppressed.
> Using sites file: /home/yizhu/swift-0.8/bin/../etc/sites.xml
> Using tc.data: /home/yizhu/swift-0.8/bin/../etc/tc.data
> Setting resources to: {nb_basecluster=nb_basecluster}
> Swift 0.8 swift-r2448 cog-r2261
>
> Swift 0.8 swift-r2448 cog-r2261
>
> RUNID id=tag:benc@ci.uchicago.edu,2007:swift:run:20090506-1912-zqd8t5hg
> RunID: 20090506-1912-zqd8t5hg
> closed org.griphyn.vdl.mapping.RootDataNode identifier
> tag:benc@ci.uchicago.edu,2008:swift:dataset:20090506-1912-lxp69uu2:720000000001
> type string value=Hello, world! dataset=unnamed SwiftScript value (closed)
> ROOTPATH
> dataset=tag:benc@ci.uchicago.edu,2008:swift:dataset:20090506-1912-lxp69uu2:720000000001
> path=$
> VALUE
> dataset=tag:benc@ci.uchicago.edu,2008:swift:dataset:20090506-1912-lxp69uu2:720000000001
> VALUE=Hello, world!
> closed org.griphyn.vdl.mapping.RootDataNode identifier
> tag:benc@ci.uchicago.edu,2008:swift:dataset:20090506-1912-lxp69uu2:720000000001
> type string value=Hello, world! dataset=unnamed SwiftScript value (closed)
> ROOTPATH
> dataset=tag:benc@ci.uchicago.edu,2008:swift:dataset:20090506-1912-lxp69uu2:720000000001
> path=$
> VALUE
> dataset=tag:benc@ci.uchicago.edu,2008:swift:dataset:20090506-1912-lxp69uu2:720000000001
> VALUE=Hello, world!
> NEW
> id=tag:benc@ci.uchicago.edu,2008:swift:dataset:20090506-1912-lxp69uu2:720000000001
>
> Found mapped data org.griphyn.vdl.mapping.RootDataNode identifier
> tag:benc@ci.uchicago.edu,2008:swift:dataset:20090506-1912-lxp69uu2:720000000002
> type messagefile with no value at dataset=outfile (not closed).$
> NEW
> id=tag:benc@ci.uchicago.edu,2008:swift:dataset:20090506-1912-lxp69uu2:720000000002
>
> Progress:
> PROCEDURE thread=0 name=greeting
> PARAM thread=0 direction=output variable=t
> provenanceid=tag:benc@ci.uchicago.edu,2008:swift:dataset:20090506-1912-lxp69uu2:720000000002
>
> closed org.griphyn.vdl.mapping.RootDataNode identifier
> tag:benc@ci.uchicago.edu,2008:swift:dataset:20090506-1912-lxp69uu2:720000000003
> type string value=hello.txt dataset=unnamed SwiftScript value (closed)
> ROOTPATH
> dataset=tag:benc@ci.uchicago.edu,2008:swift:dataset:20090506-1912-lxp69uu2:720000000003
> path=$
> VALUE
> dataset=tag:benc@ci.uchicago.edu,2008:swift:dataset:20090506-1912-lxp69uu2:720000000003
> VALUE=hello.txt
> closed org.griphyn.vdl.mapping.RootDataNode identifier
> tag:benc@ci.uchicago.edu,2008:swift:dataset:20090506-1912-lxp69uu2:720000000003
> type string value=hello.txt dataset=unnamed SwiftScript value (closed)
> ROOTPATH
> dataset=tag:benc@ci.uchicago.edu,2008:swift:dataset:20090506-1912-lxp69uu2:720000000003
> path=$
> VALUE
> dataset=tag:benc@ci.uchicago.edu,2008:swift:dataset:20090506-1912-lxp69uu2:720000000003
> VALUE=hello.txt
> START thread=0 tr=echo
> Sorted: [nb_basecluster:0.000(1.000):0/1 overload: 0]
> Rand: 0.6583597597672994, sum: 1.0
> Next contact: nb_basecluster:0.000(1.000):0/1 overload: 0
> Progress: Initializing site shared directory:1
> START host=nb_basecluster - Initializing shared directory
> multiplyScore(nb_basecluster:0.000(1.000):1/1 overload: 0, -0.01)
> Old score: 0.000, new score: -0.010
> No global submit throttle set. Using default (100)
> Task(type=FILE_OPERATION, identity=urn:0-1-1241655174935) setting status
> to Submitting
> Task(type=FILE_OPERATION, identity=urn:0-1-1241655174935) setting status
> to Submitted
> Task(type=FILE_OPERATION, identity=urn:0-1-1241655174935) setting status
> to Active
> Task(type=FILE_OPERATION, identity=urn:0-1-1241655174935) setting status
> to Completed
> multiplyScore(nb_basecluster:-0.010(0.994):1/1 overload: 0, 0.01)
> Old score: -0.010, new score: 0.000
> multiplyScore(nb_basecluster:0.000(1.000):1/1 overload: 0, 0.1)
> Old score: 0.000, new score: 0.100
> Task(type=FILE_OPERATION, identity=urn:0-1-1241655174935) Completed.
> Waiting: 0, Running: 0. Heap size: 75M, Heap free: 63M, Max heap: 720M
> multiplyScore(nb_basecluster:0.100(1.060):1/1 overload: 0, -0.2)
> Old score: 0.100, new score: -0.100
> Task(type=FILE_TRANSFER, identity=urn:0-1-1241655174938) setting status
> to Submitting
> Task(type=FILE_TRANSFER, identity=urn:0-1-1241655174938) setting status
> to Submitted
> Task(type=FILE_TRANSFER, identity=urn:0-1-1241655174938) setting status
> to Active
> Task(type=FILE_TRANSFER, identity=urn:0-1-1241655174938) setting status
> to Completed
> multiplyScore(nb_basecluster:-0.100(0.943):1/1 overload: 0, 0.2)
> Old score: -0.100, new score: 0.100
> multiplyScore(nb_basecluster:0.100(1.060):1/1 overload: 0, 0.1)
> Old score: 0.100, new score: 0.200
> Task(type=FILE_TRANSFER, identity=urn:0-1-1241655174938) Completed.
> Waiting: 0, Running: 0. Heap size: 75M, Heap free: 60M, Max heap: 720M
> multiplyScore(nb_basecluster:0.200(1.124):1/1 overload: 0, -0.2)
> Old score: 0.200, new score: 0.000
> Task(type=FILE_TRANSFER, identity=urn:0-1-1241655174940) setting status
> to Submitting
> Task(type=FILE_TRANSFER, identity=urn:0-1-1241655174940) setting status
> to Submitted
> Task(type=FILE_TRANSFER, identity=urn:0-1-1241655174940) setting status
> to Active
> Task(type=FILE_TRANSFER, identity=urn:0-1-1241655174940) setting status
> to Completed
> multiplyScore(nb_basecluster:0.000(1.000):1/1 overload: 0, 0.2)
> Old score: 0.000, new score: 0.200
> multiplyScore(nb_basecluster:0.200(1.124):1/1 overload: 0, 0.1)
> Old score: 0.200, new score: 0.300
> Task(type=FILE_TRANSFER, identity=urn:0-1-1241655174940) Completed.
> Waiting: 0, Running: 0. Heap size: 75M, Heap free: 59M, Max heap: 720M
> multiplyScore(nb_basecluster:0.300(1.192):1/1 overload: 0, -0.01)
> Old score: 0.300, new score: 0.290
> Task(type=FILE_OPERATION, identity=urn:0-1-1241655174942) setting status
> to Submitting
> Task(type=FILE_OPERATION, identity=urn:0-1-1241655174942) setting status
> to Submitted
> Task(type=FILE_OPERATION, identity=urn:0-1-1241655174942) setting status
> to Active
> Task(type=FILE_OPERATION, identity=urn:0-1-1241655174942) setting status
> to Completed
> multiplyScore(nb_basecluster:0.290(1.185):1/1 overload: 0, 0.01)
> Old score: 0.290, new score: 0.300
> multiplyScore(nb_basecluster:0.300(1.192):1/1 overload: 0, 0.1)
> Old score: 0.300, new score: 0.400
> Task(type=FILE_OPERATION, identity=urn:0-1-1241655174942) Completed.
> Waiting: 0, Running: 0. Heap size: 75M, Heap free: 59M, Max heap: 720M
> multiplyScore(nb_basecluster:0.400(1.264):1/1 overload: 0, -0.01)
> Old score: 0.400, new score: 0.390
> Task(type=FILE_OPERATION, identity=urn:0-1-1241655174944) setting status
> to Submitting
> Task(type=FILE_OPERATION, identity=urn:0-1-1241655174944) setting status
> to Submitted
> Task(type=FILE_OPERATION, identity=urn:0-1-1241655174944) setting status
> to Active
> Task(type=FILE_OPERATION, identity=urn:0-1-1241655174944) setting status
> to Completed
> multiplyScore(nb_basecluster:0.390(1.256):1/1 overload: 0, 0.01)
> Old score: 0.390, new score: 0.400
> multiplyScore(nb_basecluster:0.400(1.264):1/1 overload: 0, 0.1)
> Old score: 0.400, new score: 0.500
> Task(type=FILE_OPERATION, identity=urn:0-1-1241655174944) Completed.
> Waiting: 0, Running: 0. Heap size: 75M, Heap free: 59M, Max heap: 720M
> multiplyScore(nb_basecluster:0.500(1.339):1/1 overload: 0, -0.01)
> Old score: 0.500, new score: 0.490
> Task(type=FILE_OPERATION, identity=urn:0-1-1241655174946) setting status
> to Submitting
> Task(type=FILE_OPERATION, identity=urn:0-1-1241655174946) setting status
> to Submitted
> Task(type=FILE_OPERATION, identity=urn:0-1-1241655174946) setting status
> to Active
> Task(type=FILE_OPERATION, identity=urn:0-1-1241655174946) setting status
> to Completed
> multiplyScore(nb_basecluster:0.490(1.332):1/1 overload: 0, 0.01)
> Old score: 0.490, new score: 0.500
> multiplyScore(nb_basecluster:0.500(1.339):1/1 overload: 0, 0.1)
> Old score: 0.500, new score: 0.600
> Task(type=FILE_OPERATION, identity=urn:0-1-1241655174946) Completed.
> Waiting: 0, Running: 0. Heap size: 75M, Heap free: 59M, Max heap: 720M
> END host=nb_basecluster - Done initializing shared directory
> THREAD_ASSOCIATION jobid=echo-kfecpfaj thread=0-1 host=nb_basecluster
> replicationGroup=jfecpfaj
> Progress: Stage in:1
> START jobid=echo-kfecpfaj host=nb_basecluster - Initializing directory
> structure
> START path= dir=first-20090506-1912-zqd8t5hg/shared - Creating directory
> structure
> multiplyScore(nb_basecluster:0.600(1.419):1/1 overload: 0, -0.01)
> Old score: 0.600, new score: 0.590
> Task(type=FILE_OPERATION, identity=urn:0-1-1241655174948) setting status
> to Submitting
> Task(type=FILE_OPERATION, identity=urn:0-1-1241655174948) setting status
> to Submitted
> Task(type=FILE_OPERATION, identity=urn:0-1-1241655174948) setting status
> to Active
> Task(type=FILE_OPERATION, identity=urn:0-1-1241655174948) setting status
> to Completed
> multiplyScore(nb_basecluster:0.590(1.411):1/1 overload: 0, 0.01)
> Old score: 0.590, new score: 0.600
> multiplyScore(nb_basecluster:0.600(1.419):1/1 overload: 0, 0.1)
> Old score: 0.600, new score: 0.700
> Task(type=FILE_OPERATION, identity=urn:0-1-1241655174948) Completed.
> Waiting: 0, Running: 0. Heap size: 75M, Heap free: 59M, Max heap: 720M
> END jobid=echo-kfecpfaj - Done initializing directory structure
> START jobid=echo-kfecpfaj - Staging in files
> END jobid=echo-kfecpfaj - Staging in finished
> JOB_START jobid=echo-kfecpfaj tr=echo arguments=[Hello, world!]
> tmpdir=first-20090506-1912-zqd8t5hg/jobs/k/echo-kfecpfaj
> host=nb_basecluster
> jobid=echo-kfecpfaj task=Task(type=JOB_SUBMISSION,
> identity=urn:0-1-1241655174950)
> multiplyScore(nb_basecluster:0.700(1.503):1/1 overload: 0, -0.2)
> Old score: 0.700, new score: 0.500
> Task(type=JOB_SUBMISSION, identity=urn:0-1-1241655174950) setting status
> to Submitting
> Submitting task: Task(type=JOB_SUBMISSION, identity=urn:0-1-1241655174950)
> <startTime name="submission">1241655180260</startTime>
> <startTime name="createManagedJob">1241655180623</startTime>
> <endTime name="createManagedJob">1241655181975</endTime>
> Task submitted: Task(type=JOB_SUBMISSION, identity=urn:0-1-1241655174950)
> Progress: Submitting:1
>
> Progress: Submitting:1
>
> Progress: Submitting:1
> Progress: Submitting:1
> Progress: Submitting:1
> Progress: Submitting:1
> ^C
> yizhu@ubuntu:~/swift-0.8/examples/swift$
>
> --------------------------------------------------------------
> [3] see attachment
> -------------------------------------------------------------
> [4]
> tp-x001 torque # qstat -f
> Job Id: 3.tp-x001.ci.uchicago.edu
> Job_Name = STDIN
> Job_Owner = jobrun@tp-x001.ci.uchicago.edu
> job_state = R
> queue = batch
> server = tp-x001.ci.uchicago.edu
> Checkpoint = u
> ctime = Wed May 6 19:22:10 2009
> Error_Path = tp-x001.ci.uchicago.edu:/dev/null
> exec_host = tp-x002/0
> Hold_Types = n
> Join_Path = n
> Keep_Files = n
> Mail_Points = n
> mtime = Wed May 6 19:22:10 2009
> Output_Path = tp-x001.ci.uchicago.edu:/dev/null
> Priority = 0
> qtime = Wed May 6 19:22:10 2009
> Rerunable = True
> Resource_List.neednodes = 1
> Resource_List.nodect = 1
> Resource_List.nodes = 1
> Shell_Path_List = /bin/sh
> substate = 40
> Variable_List = PBS_O_HOME=/home/jobrun,PBS_O_LOGNAME=jobrun,
> PBS_O_PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sb
> in:/opt/bin,PBS_O_SHELL=/bin/bash,PBS_SERVER=tp-x001.ci.uchicago.edu,
> PBS_O_HOST=tp-x001.ci.uchicago.edu,
> PBS_O_WORKDIR=/home/jobrun/first-20090506-1922-xaandi54,
> PBS_O_QUEUE=batch
> euser = jobrun
> egroup = users
> hashname = 3.tp-x001.c
> queue_rank = 2
> queue_type = E
> comment = Job started on Wed May 06 at 19:22
> etime = Wed May 6 19:22:10 2009
>
> tp-x001 torque #
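
To see where PBS will deliver a job's stdout and stderr without reading
the whole listing in [4], the two path attributes can be filtered out of
the `qstat -f` output. A small sketch; the function name
pbs_output_paths is made up here, and it only handles values that fit on
one line (qstat wraps long values onto continuation lines):

```shell
#!/bin/sh
# Print Output_Path and Error_Path from `qstat -f` text read on stdin.
# pbs_output_paths is a hypothetical helper name, not a PBS/Torque tool.
pbs_output_paths() {
    awk '
        /^[ \t]*(Output_Path|Error_Path) = / {
            sub(/^[ \t]+/, "")      # drop the indentation qstat adds
            print
        }
    '
}

# Example on attribute lines copied from listing [4]:
printf '%s\n' \
    '    Error_Path = tp-x001.ci.uchicago.edu:/dev/null' \
    '    job_state = R' \
    '    Output_Path = tp-x001.ci.uchicago.edu:/dev/null' |
    pbs_output_paths
```

On the PBS server this would be used as
`qstat -f 3.tp-x001.ci.uchicago.edu | pbs_output_paths` while the job is
still queued or running.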
>
>