[Swift-devel] more on # of coasters workers vs actual requested on ranger
Allan Espinosa
aespinosa at cs.uchicago.edu
Tue Jul 21 11:49:21 CDT 2009
According to the GRAM logs, Swift sends requests for blocks of 1, 2, 3,
and 4 nodes, but SGE receives four 1-node jobs. This may be a
GRAM2-SGE interaction problem. Is there a way to get the Globus RSL
files from Swift so I can submit them manually and verify this?
-Allan
coasters.log:
...
...
2009-07-21 10:46:13,788-0500 INFO BlockQueueProcessor Required size:
28800 for 2 jobs
2009-07-21 10:46:13,788-0500 INFO BlockQueueProcessor h: 28800, jj:
14400, x-last: , r: 1
2009-07-21 10:46:13,788-0500 INFO BlockQueueProcessor h: 43200, w:
16, size: 28800, msz: 28800, w*h: 691200
2009-07-21 10:46:13,797-0500 INFO BlockQueueProcessor Added: 0 - 1
2009-07-21 10:46:13,797-0500 INFO Block Starting block: workers=16,
walltime=43200.000s
2009-07-21 10:46:13,859-0500 INFO BlockTaskSubmitter Queuing block
Block 0721-461009-000000 (16x43200.000s) for submission
2009-07-21 10:46:13,859-0500 INFO BlockQueueProcessor Added 2 jobs to
new blocks
2009-07-21 10:46:13,860-0500 INFO BlockQueueProcessor Plan time: 287
2009-07-21 10:46:13,863-0500 INFO BlockTaskSubmitter Submitting block
Block 0721-461009-000000 (16x43200.000s)
2009-07-21 10:46:13,887-0500 DEBUG TaskImpl Task(type=JOB_SUBMISSION,
identity=urn:cog-1248191171562) setting status to Submitting
2009-07-21 10:46:13,889-0500 INFO Block Block task status changed: Submitting
2009-07-21 10:46:15,339-0500 DEBUG TaskImpl Task(type=JOB_SUBMISSION,
identity=urn:cog-1248191171562) setting status to Submitted
2009-07-21 10:46:15,339-0500 INFO Block Block task status changed: Submitted
...
...
...
2009-07-21 10:46:31,545-0500 INFO BlockQueueProcessor Required size:
1152000 for 80 jobs
2009-07-21 10:46:31,545-0500 INFO BlockQueueProcessor h: 28800, jj:
14400, x-last: , r: 31
2009-07-21 10:46:31,545-0500 INFO BlockQueueProcessor h: 43200, w:
48, size: 1152000, msz: 1152000, w*h: 2073600
2009-07-21 10:46:31,545-0500 INFO BlockQueueProcessor Added: 0 - 79
2009-07-21 10:46:31,545-0500 INFO Block Starting block: workers=48,
walltime=43200.000s
2009-07-21 10:46:31,546-0500 INFO BlockTaskSubmitter Queuing block
Block 0721-461009-000001 (48x43200.000s) for submission
2009-07-21 10:46:31,546-0500 INFO BlockQueueProcessor Added 80 jobs
to new blocks
2009-07-21 10:46:31,546-0500 INFO BlockQueueProcessor Plan time: 3
2009-07-21 10:46:31,546-0500 INFO BlockTaskSubmitter Submitting block
Block 0721-461009-000001 (48x43200.000s)
2009-07-21 10:46:31,546-0500 DEBUG TaskImpl Task(type=JOB_SUBMISSION,
identity=urn:cog-1248191171941) setting status to Submitting
2009-07-21 10:46:31,547-0500 INFO Block Block task status changed: Submitting
...
...
2009-07-21 10:46:33,755-0500 INFO BlockQueueProcessor Requeued 133
non-fitting jobs
2009-07-21 10:46:33,755-0500 INFO BlockQueueProcessor Required size:
1915200 for 133 jobs
2009-07-21 10:46:33,755-0500 INFO BlockQueueProcessor h: 28800, jj:
14400, x-last: , r: 4
2009-07-21 10:46:33,755-0500 INFO BlockQueueProcessor h: 43200, w:
64, size: 1915200, msz: 1915200, w*h: 2764800
2009-07-21 10:46:33,756-0500 INFO BlockQueueProcessor Added: 0 - 132
2009-07-21 10:46:33,756-0500 INFO Block Starting block: workers=64,
walltime=43200.000s
2009-07-21 10:46:33,756-0500 INFO BlockTaskSubmitter Queuing block
Block 0721-461009-000002 (64x43200.000s) for submission
2009-07-21 10:46:33,757-0500 INFO BlockQueueProcessor Added 133 jobs
to new blocks
2009-07-21 10:46:33,757-0500 INFO BlockQueueProcessor Plan time: 4
...
...
2009-07-21 10:46:35,980-0500 INFO BlockQueueProcessor Required size:
705600 for 49 jobs
2009-07-21 10:46:35,980-0500 INFO BlockQueueProcessor h: 28800, jj:
14400, x-last: , r: 16
2009-07-21 10:46:35,980-0500 INFO BlockQueueProcessor h: 43200, w:
32, size: 705600, msz: 705600, w*h: 1382400
2009-07-21 10:46:35,980-0500 INFO BlockQueueProcessor Added: 0 - 48
2009-07-21 10:46:35,980-0500 INFO Block Starting block: workers=32,
walltime=43200.000s
2009-07-21 10:46:35,981-0500 INFO BlockTaskSubmitter Queuing block
Block 0721-461009-000003 (32x43200.000s) for submission
2009-07-21 10:46:35,981-0500 INFO BlockQueueProcessor Added 49 jobs
to new blocks
2009-07-21 10:46:35,981-0500 INFO BlockQueueProcessor Plan time: 4
2009-07-21 10:46:35,981-0500 INFO BlockTaskSubmitter Submitting block
Block 0721-461009-000003 (32x43200.000s)
2009-07-21 10:46:35,981-0500 DEBUG TaskImpl Task(type=JOB_SUBMISSION,
identity=urn:cog-1248191172858) setting status to Submitting
2009-07-21 10:46:35,982-0500 INFO Block Block task status changed: Submitting
...
...
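As a cross-check on the block sizes logged above, here is a minimal sketch of what each block should translate to on the GRAM side. It assumes 16 cores per node (the logs do not state this directly; it matches the "16" argument passed to the worker script and the CORE column in showq) and that the RSL maxwalltime attribute is in minutes. The names BLOCKS and expected_request are made up for illustration.

```python
# Assumption: 16 cores per node (not stated explicitly in the logs).
CORES_PER_NODE = 16

# (workers, walltime in seconds) for each block, copied from coasters.log
BLOCKS = {
    "0721-461009-000000": (16, 43200.0),
    "0721-461009-000001": (48, 43200.0),
    "0721-461009-000002": (64, 43200.0),
    "0721-461009-000003": (32, 43200.0),
}


def expected_request(workers, walltime_s):
    """Return the node count and walltime (minutes) implied by a block."""
    nodes, leftover = divmod(workers, CORES_PER_NODE)
    if leftover:
        raise ValueError("workers is not a whole number of nodes")
    return {"count": nodes, "maxwalltime": int(walltime_s // 60)}


for block_id, (workers, walltime_s) in BLOCKS.items():
    print(block_id, expected_request(workers, walltime_s))
```

Under those assumptions the four blocks correspond to 1, 3, 4 and 2 nodes, each with a 720 minute limit.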
gram log snippets:
log1: (16 cpus)
...
7/21 10:46:14 Pre-parsed RSL string: &( rsl_substitution =
(GLOBUSRUN_GASS_URL "https://129.114.50.163:52077") )( queue =
"normal" )( project = "TG-CCR080022N" )( stdout =
$(GLOBUSRUN_GASS_URL) # "/dev/stdout-urn:cog-1248191171562" )(
arguments =
"/share/home/01035/tg802895/.globus/coasters/cscript26994.pl"
"http://129.114.50.163:52072" "0721-461009-000000" "16" )( count =
"1" )( executable = "/usr/bin/perl" )( stderr =
$(GLOBUSRUN_GASS_URL) # "/dev/stderr-urn:cog-1248191171562" )(
maxwalltime = "720" )
7/21 10:46:14
...
log2: (48 cpus)
...
7/21 10:46:32 Pre-parsed RSL string: &( rsl_substitution =
(GLOBUSRUN_GASS_URL "https://129.114.50.163:52077") )( queue =
"normal" )( project = "TG-CCR080022N" )( stdout =
$(GLOBUSRUN_GASS_URL) # "/dev/stdout-urn:cog-1248191171941" )(
arguments =
"/share/home/01035/tg802895/.globus/coasters/cscript26994.pl"
"http://129.114.50.163:52072" "0721-461009-000001" "16" )( count =
"3" )( executable = "/usr/bin/perl" )( stderr =
$(GLOBUSRUN_GASS_URL) # "/dev/stderr-urn:cog-1248191171941" )(
maxwalltime = "720" )
7/21 10:46:32
...
log3: (64 cpus)
...
7/21 10:46:34 Pre-parsed RSL string: &( rsl_substitution =
(GLOBUSRUN_GASS_URL "https://129.114.50.163:52077") )( queue =
"normal" )( project = "TG-CCR080022N" )( stdout =
$(GLOBUSRUN_GASS_URL) # "/dev/stdout-urn:cog-1248191172533" )(
arguments =
"/share/home/01035/tg802895/.globus/coasters/cscript26994.pl"
"http://129.114.50.163:52072" "0721-461009-000002" "16" )( count =
"4" )( executable = "/usr/bin/perl" )( stderr =
$(GLOBUSRUN_GASS_URL) # "/dev/stderr-urn:cog-1248191172533" )(
maxwalltime = "720" )
7/21 10:46:34
...
log4: (32 cpus)
...
7/21 10:46:36 Pre-parsed RSL string: &( rsl_substitution =
(GLOBUSRUN_GASS_URL "https://129.114.50.163:52077") )( queue =
"normal" )( project = "TG-CCR080022N" )( stdout =
$(GLOBUSRUN_GASS_URL) # "/dev/stdout-urn:cog-1248191172858" )(
arguments =
"/share/home/01035/tg802895/.globus/coasters/cscript26994.pl"
"http://129.114.50.163:52072" "0721-461009-000003" "16" )( count =
"2" )( executable = "/usr/bin/perl" )( stderr =
$(GLOBUSRUN_GASS_URL) # "/dev/stderr-urn:cog-1248191172858" )(
maxwalltime = "720" )
7/21 10:46:36
...
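To compare the RSL attributes across the four submissions without reading them by eye, a rough sketch like the following can pull out the simple quoted attributes (count, maxwalltime, queue and so on). It is a regular-expression scrape, not a real RSL parser, and it deliberately skips multi-valued or substituted attributes such as arguments, stdout and stderr. The function name simple_attrs is hypothetical, and the sample string is the first RSL above with the wrapped line breaks joined back together.

```python
import re

# Matches only attributes of the form ( name = "value" ); anything with
# several values or a $(...) substitution is ignored on purpose.
SIMPLE_ATTR = re.compile(r'\(\s*(\w+)\s*=\s*"([^"]*)"\s*\)')


def simple_attrs(rsl):
    """Return a dict of the single-valued quoted attributes in an RSL string."""
    return dict(SIMPLE_ATTR.findall(rsl))


# First RSL string from the GRAM log, with mail line wraps removed.
rsl = (
    '&( rsl_substitution = (GLOBUSRUN_GASS_URL "https://129.114.50.163:52077") )'
    '( queue = "normal" )( project = "TG-CCR080022N" )'
    '( stdout = $(GLOBUSRUN_GASS_URL) # "/dev/stdout-urn:cog-1248191171562" )'
    '( arguments = "/share/home/01035/tg802895/.globus/coasters/cscript26994.pl" '
    '"http://129.114.50.163:52072" "0721-461009-000000" "16" )'
    '( count = "1" )( executable = "/usr/bin/perl" )'
    '( stderr = $(GLOBUSRUN_GASS_URL) # "/dev/stderr-urn:cog-1248191171562" )'
    '( maxwalltime = "720" )'
)

attrs = simple_attrs(rsl)
print(attrs["count"], attrs["maxwalltime"], attrs["queue"])
```

Running the same function over the other three strings gives a quick way to line up the count values against what the scheduler reports.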
what SGE actually received (four jobs of 16 cores each):
login4$ showq -u
ACTIVE JOBS--------------------------
JOBID JOBNAME USERNAME STATE CORE REMAINING STARTTIME
================================================================================
0 active jobs : 0 of 3828 hosts ( 0.00 %)
WAITING JOBS------------------------
JOBID JOBNAME USERNAME STATE CORE WCLIMIT QUEUETIME
================================================================================
873041 data tg802895 Waiting 16 12:00:00 Tue Jul 21 10:46:17
873043 data tg802895 Waiting 16 12:00:00 Tue Jul 21 10:46:33
873044 data tg802895 Waiting 16 12:00:00 Tue Jul 21 10:46:36
873045 data tg802895 Waiting 16 12:00:00 Tue Jul 21 10:46:38
WAITING JOBS WITH JOB DEPENDENCIES---
JOBID JOBNAME USERNAME STATE CORE WCLIMIT QUEUETIME
================================================================================
UNSCHEDULED JOBS---------------------
JOBID JOBNAME USERNAME STATE CORE WCLIMIT QUEUETIME
================================================================================
Total jobs: 4 Active Jobs: 0 Waiting Jobs: 4 Dep/Unsched Jobs: 0
login4$
--
Allan M. Espinosa <http://allan.88-mph.net/blog>
PhD student, Computer Science
University of Chicago <http://people.cs.uchicago.edu/~aespinosa>