[Swift-devel] more on # of coasters workers vs actual requested on ranger

Allan Espinosa aespinosa at cs.uchicago.edu
Tue Jul 21 11:49:21 CDT 2009


According to the GRAM logs, Swift sends requests for blocks of 1, 2, 3
and 4 nodes (RSL count = 1, 3, 4 and 2 for the 16-, 48-, 64- and
32-worker blocks), but SGE receives four 1-node (16-core) jobs. This may
be a GRAM2-SGE interaction problem. Is there a way to get the Globus RSL
files from Swift so I can submit them manually and verify this?

-Allan

coasters.log:
...
...
2009-07-21 10:46:13,788-0500 INFO  BlockQueueProcessor Required size: 28800 for 2 jobs
2009-07-21 10:46:13,788-0500 INFO  BlockQueueProcessor h: 28800, jj: 14400, x-last: , r: 1
2009-07-21 10:46:13,788-0500 INFO  BlockQueueProcessor h: 43200, w: 16, size: 28800, msz: 28800, w*h: 691200
2009-07-21 10:46:13,797-0500 INFO  BlockQueueProcessor Added: 0 - 1
2009-07-21 10:46:13,797-0500 INFO  Block Starting block: workers=16, walltime=43200.000s
2009-07-21 10:46:13,859-0500 INFO  BlockTaskSubmitter Queuing block Block 0721-461009-000000 (16x43200.000s) for submission
2009-07-21 10:46:13,859-0500 INFO  BlockQueueProcessor Added 2 jobs to new blocks
2009-07-21 10:46:13,860-0500 INFO  BlockQueueProcessor Plan time: 287
2009-07-21 10:46:13,863-0500 INFO  BlockTaskSubmitter Submitting block Block 0721-461009-000000 (16x43200.000s)
2009-07-21 10:46:13,887-0500 DEBUG TaskImpl Task(type=JOB_SUBMISSION, identity=urn:cog-1248191171562) setting status to Submitting
2009-07-21 10:46:13,889-0500 INFO  Block Block task status changed: Submitting
2009-07-21 10:46:15,339-0500 DEBUG TaskImpl Task(type=JOB_SUBMISSION, identity=urn:cog-1248191171562) setting status to Submitted
2009-07-21 10:46:15,339-0500 INFO  Block Block task status changed: Submitted
...
...
...
2009-07-21 10:46:31,545-0500 INFO  BlockQueueProcessor Required size: 1152000 for 80 jobs
2009-07-21 10:46:31,545-0500 INFO  BlockQueueProcessor h: 28800, jj: 14400, x-last: , r: 31
2009-07-21 10:46:31,545-0500 INFO  BlockQueueProcessor h: 43200, w: 48, size: 1152000, msz: 1152000, w*h: 2073600
2009-07-21 10:46:31,545-0500 INFO  BlockQueueProcessor Added: 0 - 79
2009-07-21 10:46:31,545-0500 INFO  Block Starting block: workers=48, walltime=43200.000s
2009-07-21 10:46:31,546-0500 INFO  BlockTaskSubmitter Queuing block Block 0721-461009-000001 (48x43200.000s) for submission
2009-07-21 10:46:31,546-0500 INFO  BlockQueueProcessor Added 80 jobs to new blocks
2009-07-21 10:46:31,546-0500 INFO  BlockQueueProcessor Plan time: 3
2009-07-21 10:46:31,546-0500 INFO  BlockTaskSubmitter Submitting block Block 0721-461009-000001 (48x43200.000s)
2009-07-21 10:46:31,546-0500 DEBUG TaskImpl Task(type=JOB_SUBMISSION, identity=urn:cog-1248191171941) setting status to Submitting
2009-07-21 10:46:31,547-0500 INFO  Block Block task status changed: Submitting
...
...
2009-07-21 10:46:33,755-0500 INFO  BlockQueueProcessor Requeued 133 non-fitting jobs
2009-07-21 10:46:33,755-0500 INFO  BlockQueueProcessor Required size: 1915200 for 133 jobs
2009-07-21 10:46:33,755-0500 INFO  BlockQueueProcessor h: 28800, jj: 14400, x-last: , r: 4
2009-07-21 10:46:33,755-0500 INFO  BlockQueueProcessor h: 43200, w: 64, size: 1915200, msz: 1915200, w*h: 2764800
2009-07-21 10:46:33,756-0500 INFO  BlockQueueProcessor Added: 0 - 132
2009-07-21 10:46:33,756-0500 INFO  Block Starting block: workers=64, walltime=43200.000s
2009-07-21 10:46:33,756-0500 INFO  BlockTaskSubmitter Queuing block Block 0721-461009-000002 (64x43200.000s) for submission
2009-07-21 10:46:33,757-0500 INFO  BlockQueueProcessor Added 133 jobs to new blocks
2009-07-21 10:46:33,757-0500 INFO  BlockQueueProcessor Plan time: 4
...
...
2009-07-21 10:46:35,980-0500 INFO  BlockQueueProcessor Required size: 705600 for 49 jobs
2009-07-21 10:46:35,980-0500 INFO  BlockQueueProcessor h: 28800, jj: 14400, x-last: , r: 16
2009-07-21 10:46:35,980-0500 INFO  BlockQueueProcessor h: 43200, w: 32, size: 705600, msz: 705600, w*h: 1382400
2009-07-21 10:46:35,980-0500 INFO  BlockQueueProcessor Added: 0 - 48
2009-07-21 10:46:35,980-0500 INFO  Block Starting block: workers=32, walltime=43200.000s
2009-07-21 10:46:35,981-0500 INFO  BlockTaskSubmitter Queuing block Block 0721-461009-000003 (32x43200.000s) for submission
2009-07-21 10:46:35,981-0500 INFO  BlockQueueProcessor Added 49 jobs to new blocks
2009-07-21 10:46:35,981-0500 INFO  BlockQueueProcessor Plan time: 4
2009-07-21 10:46:35,981-0500 INFO  BlockTaskSubmitter Submitting block Block 0721-461009-000003 (32x43200.000s)
2009-07-21 10:46:35,981-0500 DEBUG TaskImpl Task(type=JOB_SUBMISSION, identity=urn:cog-1248191172858) setting status to Submitting
2009-07-21 10:46:35,982-0500 INFO  Block Block task status changed: Submitting
...
...
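The BlockQueueProcessor lines above carry enough numbers to cross-check the block sizing. The Python sketch below recomputes them under stated assumptions: that "jj" is the per-job time in seconds (every "Required size" equals jobs * 14400), that "h" is the 43200 s block walltime, and that each worker occupies one core of a 16-core Ranger node. It only checks that the logged figures are self-consistent; it is not the coaster planner's own algorithm, and check_block is a hypothetical helper.

```python
# Figures copied from the BlockQueueProcessor lines in coasters.log.
JOB_WALLTIME = 14400    # "jj" in the log; assumed to be seconds per job
BLOCK_WALLTIME = 43200  # "h" in the log; block walltime in seconds
CORES_PER_NODE = 16     # assumption: one worker per core, 16 cores per node

# (block id, number of jobs, workers "w" chosen by the planner)
BLOCKS = [
    ("0721-461009-000000", 2, 16),
    ("0721-461009-000001", 80, 48),
    ("0721-461009-000002", 133, 64),
    ("0721-461009-000003", 49, 32),
]


def check_block(jobs, workers):
    """Recompute "Required size" and "w*h", and return the implied node count."""
    size = jobs * JOB_WALLTIME            # "Required size" / "size"
    capacity = workers * BLOCK_WALLTIME   # "w*h"
    # The logged blocks are all at least as large as the work put into them.
    assert capacity >= size, "block too small for its jobs"
    # Every logged w is a whole number of nodes.
    assert workers % CORES_PER_NODE == 0, "w is not a whole number of nodes"
    return size, capacity, workers // CORES_PER_NODE


for block_id, jobs, workers in BLOCKS:
    size, capacity, nodes = check_block(jobs, workers)
    print(f"{block_id}: size={size} w*h={capacity} nodes={nodes}")
```

All four blocks pass the check, and w / 16 gives 1, 3, 4 and 2 nodes, the same values that show up as count in the GRAM RSL strings.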

gram log snippets:
log1: (16 cpus)
...
7/21 10:46:14 Pre-parsed RSL string: &( rsl_substitution = (GLOBUSRUN_GASS_URL "https://129.114.50.163:52077") )
( queue = "normal" )
( project = "TG-CCR080022N" )
( stdout = $(GLOBUSRUN_GASS_URL) # "/dev/stdout-urn:cog-1248191171562" )
( arguments = "/share/home/01035/tg802895/.globus/coasters/cscript26994.pl" "http://129.114.50.163:52072" "0721-461009-000000" "16" )
( count = "1" )
( executable = "/usr/bin/perl" )
( stderr = $(GLOBUSRUN_GASS_URL) # "/dev/stderr-urn:cog-1248191171562" )
( maxwalltime = "720" )
7/21 10:46:14
...

log2: (48 cpus)
...
7/21 10:46:32 Pre-parsed RSL string: &( rsl_substitution = (GLOBUSRUN_GASS_URL "https://129.114.50.163:52077") )
( queue = "normal" )
( project = "TG-CCR080022N" )
( stdout = $(GLOBUSRUN_GASS_URL) # "/dev/stdout-urn:cog-1248191171941" )
( arguments = "/share/home/01035/tg802895/.globus/coasters/cscript26994.pl" "http://129.114.50.163:52072" "0721-461009-000001" "16" )
( count = "3" )
( executable = "/usr/bin/perl" )
( stderr = $(GLOBUSRUN_GASS_URL) # "/dev/stderr-urn:cog-1248191171941" )
( maxwalltime = "720" )
7/21 10:46:32
...

log3: (64 cpus)
...
7/21 10:46:34 Pre-parsed RSL string: &( rsl_substitution = (GLOBUSRUN_GASS_URL "https://129.114.50.163:52077") )
( queue = "normal" )
( project = "TG-CCR080022N" )
( stdout = $(GLOBUSRUN_GASS_URL) # "/dev/stdout-urn:cog-1248191172533" )
( arguments = "/share/home/01035/tg802895/.globus/coasters/cscript26994.pl" "http://129.114.50.163:52072" "0721-461009-000002" "16" )
( count = "4" )
( executable = "/usr/bin/perl" )
( stderr = $(GLOBUSRUN_GASS_URL) # "/dev/stderr-urn:cog-1248191172533" )
( maxwalltime = "720" )
7/21 10:46:34
...

log4: (32 cpus)
...
7/21 10:46:36 Pre-parsed RSL string: &( rsl_substitution = (GLOBUSRUN_GASS_URL "https://129.114.50.163:52077") )
( queue = "normal" )
( project = "TG-CCR080022N" )
( stdout = $(GLOBUSRUN_GASS_URL) # "/dev/stdout-urn:cog-1248191172858" )
( arguments = "/share/home/01035/tg802895/.globus/coasters/cscript26994.pl" "http://129.114.50.163:52072" "0721-461009-000003" "16" )
( count = "2" )
( executable = "/usr/bin/perl" )
( stderr = $(GLOBUSRUN_GASS_URL) # "/dev/stderr-urn:cog-1248191172858" )
( maxwalltime = "720" )
7/21 10:46:36
...
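To confirm that the node counts survive as far as GRAM, the count relation can be pulled out of each pre-parsed RSL string and compared with the workers= value coasters.log reports for the same block. The Python sketch below does this with plain regular expressions on the arguments and count relations copied from the four GRAM logs. parse_rsl is a hypothetical helper, not a Globus API, and taking the block id from the third argument is an assumption based only on where it appears in these logs.

```python
import re

# The arguments and count relations from the four pre-parsed RSL strings.
RSL_FRAGMENTS = [
    '( arguments = "/share/home/01035/tg802895/.globus/coasters/cscript26994.pl" '
    '"http://129.114.50.163:52072" "0721-461009-000000" "16" )( count = "1" )',
    '( arguments = "/share/home/01035/tg802895/.globus/coasters/cscript26994.pl" '
    '"http://129.114.50.163:52072" "0721-461009-000001" "16" )( count = "3" )',
    '( arguments = "/share/home/01035/tg802895/.globus/coasters/cscript26994.pl" '
    '"http://129.114.50.163:52072" "0721-461009-000002" "16" )( count = "4" )',
    '( arguments = "/share/home/01035/tg802895/.globus/coasters/cscript26994.pl" '
    '"http://129.114.50.163:52072" "0721-461009-000003" "16" )( count = "2" )',
]

# Worker counts from the "Starting block: workers=N" lines in coasters.log.
WORKERS = {
    "0721-461009-000000": 16,
    "0721-461009-000001": 48,
    "0721-461009-000002": 64,
    "0721-461009-000003": 32,
}

# ( arguments = "a" "b" ... ) -> capture the run of quoted values
ARGS_RE = re.compile(r'\(\s*arguments\s*=\s*((?:"[^"]*"\s*)+)\)')
# ( count = "N" ) -> capture N
COUNT_RE = re.compile(r'\(\s*count\s*=\s*"(\d+)"\s*\)')


def parse_rsl(rsl):
    """Return (block id, count) from one RSL string (hypothetical helper)."""
    args_match = ARGS_RE.search(rsl)
    count_match = COUNT_RE.search(rsl)
    if args_match is None or count_match is None:
        raise ValueError("RSL string lacks an arguments or count relation")
    args = re.findall(r'"([^"]*)"', args_match.group(1))
    # Assumption: the block id is the third argument, as in these logs.
    return args[2], int(count_match.group(1))


for rsl in RSL_FRAGMENTS:
    block_id, count = parse_rsl(rsl)
    workers = WORKERS[block_id]
    print(f"{block_id}: count={count} workers={workers} workers/count={workers // count}")
```

For every block workers/count comes out as 16, so Swift does set count = workers / 16 in the RSL it hands to GRAM; whatever collapses the requests happens after this point.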


what was actually requested:
login4$ showq -u
ACTIVE JOBS--------------------------
JOBID     JOBNAME    USERNAME      STATE   CORE  REMAINING  STARTTIME
================================================================================

     0 active jobs :    0 of 3828 hosts (  0.00 %)

WAITING JOBS------------------------
JOBID     JOBNAME    USERNAME      STATE   CORE  WCLIMIT    QUEUETIME
================================================================================
873041    data       tg802895      Waiting 16     12:00:00  Tue Jul 21 10:46:17
873043    data       tg802895      Waiting 16     12:00:00  Tue Jul 21 10:46:33
873044    data       tg802895      Waiting 16     12:00:00  Tue Jul 21 10:46:36
873045    data       tg802895      Waiting 16     12:00:00  Tue Jul 21 10:46:38

WAITING JOBS WITH JOB DEPENDENCIES---
JOBID     JOBNAME    USERNAME      STATE   CORE  WCLIMIT    QUEUETIME
================================================================================

UNSCHEDULED JOBS---------------------
JOBID     JOBNAME    USERNAME      STATE   CORE  WCLIMIT    QUEUETIME
================================================================================

Total jobs: 4     Active Jobs: 0     Waiting Jobs: 4     Dep/Unsched Jobs: 0
login4$
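The showq table can then be compared with what the RSL asked for. This Python sketch pairs the four blocks with the four SGE jobs by submission order (an assumption, based on the GRAM log times 10:46:14/32/34/36 against the queue times 10:46:17/33/36/38), treats count as a node count at an assumed 16 cores per node, and lists the jobs whose CORE column does not match. mismatches is a hypothetical helper.

```python
CORES_PER_NODE = 16  # assumption: 16 cores per Ranger compute node

# (block id, RSL count) from the GRAM logs, in submission order.
REQUESTED = [
    ("0721-461009-000000", 1),
    ("0721-461009-000001", 3),
    ("0721-461009-000002", 4),
    ("0721-461009-000003", 2),
]

# (JOBID, CORE) from the showq WAITING JOBS table, in queue-time order.
SHOWQ = [(873041, 16), (873043, 16), (873044, 16), (873045, 16)]


def mismatches(requested, queued, cores_per_node=CORES_PER_NODE):
    """Pair blocks with SGE jobs by order (an assumption) and return
    (block id, job id, expected cores, queued cores) for every pair
    whose queued cores differ from count * cores_per_node."""
    bad = []
    for (block_id, count), (job_id, cores) in zip(requested, queued):
        expected = count * cores_per_node
        if cores != expected:
            bad.append((block_id, job_id, expected, cores))
    return bad


for block_id, job_id, expected, cores in mismatches(REQUESTED, SHOWQ):
    print(f"{block_id} -> job {job_id}: expected {expected} cores, SGE has {cores}")
```

Only the one-node block comes out right; the 3-, 4- and 2-node requests each arrive as a single 16-core job, which is the discrepancy described at the top of the message.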


-- 
Allan M. Espinosa <http://allan.88-mph.net/blog>
PhD student, Computer Science
University of Chicago <http://people.cs.uchicago.edu/~aespinosa>
