[Swift-devel] misassignment of jobs

Mihael Hategan hategan at mcs.anl.gov
Thu Nov 18 16:03:09 CST 2010


I'm sure there is a reasonable explanation for this.

Can you post your entire tc.data? And to make sure we're talking about
the right one, can you look at the Swift log and use exactly the one
that Swift claims it is using?
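
For cross-checking the two files, something like the following rough Python
sketch might help. It is not part of Swift. It assumes each tc.data entry sits
on one line with the site in the first column and the transformation in the
second (as in the entry quoted below), that lines starting with "#" are
comments, that the log has THREAD_ASSOCIATION lines carrying jobid= and host=
fields, and that the job id has the form "<transformation>-<suffix>". The
function names parse_tc_data and find_misassigned are made up for
illustration.

```python
import re
from collections import defaultdict

# Assumed log shape, based on the excerpt quoted in this thread:
#   ... THREAD_ASSOCIATION jobid=<tr>-<suffix> ... host=<site> ...
ASSOC = re.compile(r"THREAD_ASSOCIATION\s+jobid=(\S+).*?\bhost=(\S+)")


def parse_tc_data(text):
    """Map each site name to the set of transformations listed for it.

    Assumes one entry per line: site, transformation, path, then the rest.
    """
    entries = defaultdict(set)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        fields = line.split()
        if len(fields) < 3:
            continue  # not a complete entry
        site, transformation = fields[0], fields[1]
        entries[site].add(transformation)
    return entries


def find_misassigned(tc_entries, log_text):
    """Return (jobid, transformation, host) for jobs sent to a host that
    has no tc.data entry for the job's transformation."""
    mismatches = []
    for line in log_text.splitlines():
        match = ASSOC.search(line)
        if match is None:
            continue
        jobid, host = match.group(1), match.group(2)
        # Assumption: everything before the last "-" is the transformation.
        transformation = jobid.rsplit("-", 1)[0]
        if transformation not in tc_entries.get(host, set()):
            mismatches.append((jobid, transformation, host))
    return mismatches
```

To use it, read the tc.data that the log names and the run's log file into
strings, pass the first through parse_tc_data, and hand the result plus the
log text to find_misassigned. An empty list would mean every associated job
went to a site that lists its transformation.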

Mihael

On Thu, 2010-11-18 at 14:39 -0600, Allan Espinosa wrote:
> tc.data for worker15:
> SPRACE_osg-ce.sprace.org.br  worker15 /osg/app/engage/scec/worker.pl
>    INSTALLED INTEL32::LINUX GLOBUS::maxwalltime="02:00:00"
> 
> But it was assigned to another site instead:
> $ grep 0erqqq1k worker-*.log
> 2010-11-17 15:38:58,804-0600 DEBUG vdl:execute2 THREAD_ASSOCIATION
> jobid=worker15-0erqqq1k thread
>  host=LIGO_UWM_NEMO_osg-nemo-ce.phys.uwm.edu replicationGroup=2pnqqq1k
> 2010-11-17 15:38:59,110-0600 INFO  vdl:createdirset START
> jobid=worker15-0erqqq1k host=LIGO_UWM_N
> ce.phys.uwm.edu - Initializing directory structure
> 2010-11-17 15:38:59,137-0600 INFO  vdl:createdirset END
> jobid=worker15-0erqqq1k - Done initializi
> structure
> 2010-11-17 15:38:59,172-0600 INFO  vdl:dostagein START
> jobid=worker15-0erqqq1k - Staging in files
> 2010-11-17 15:38:59,257-0600 INFO  vdl:dostagein END
> jobid=worker15-0erqqq1k - Staging in finishe
> 2010-11-17 15:38:59,323-0600 DEBUG vdl:execute2 JOB_START
> jobid=worker15-0erqqq1k tr=worker15 arg
> //128.135.125.17:61015, SPRACE_osg-ce.sprace.org.br, /tmp, 7200]
> tmpdir=worker-20101117-1538-fe9a
> orker15-0erqqq1k host=LIGO_UWM_NEMO_osg-nemo-ce.phys.uwm.edu
> 2010-11-17 15:39:01,394-0600 INFO  Execute Submit: in:
> worker-20101117-1538-fe9aq209 command: /bi
> /_swiftwrap worker15-0erqqq1k -jobdir 0 -scratch  -e worker15 -out
> stdout.txt -err stderr.txt -i
>  -k  -cdmfile  -status files -a http://128.135.125.17:61015
> SPRACE_osg-ce.sprace.org.br /tmp 7200
> 2010-11-17 15:39:01,394-0600 INFO  GridExec TASK_DEFINITION:
> Task(type=JOB_SUBMISSION, identity=u
> -1-1290029938030) is /bin/bash shared/_swiftwrap worker15-0erqqq1k
> -jobdir 0 -scratch  -e worker1
> .txt -err stderr.txt -i -d  -if  -of  -k  -cdmfile  -status files -a
> http://128.135.125.17:61015
> .sprace.org.br /tmp 7200
> 2010-11-17 16:49:33,106-0600 DEBUG vdl:checkjobstatus START
> jobid=worker15-0erqqq1k
> 2010-11-17 16:49:33,278-0600 INFO  vdl:checkjobstatus FAILURE
> jobid=worker15-0erqqq1k - Failure f
> 2010-11-17 16:49:38,180-0600 DEBUG vdl:execute2 APPLICATION_EXCEPTION
> jobid=worker15-0erqqq1k - A
> ception: Cannot find executable worker15 on site system path
> 
> There is no entry for worker15 for the site LIGO_UWM_NEMO in my tc.data.
> 
> -Allan
> 




