[Swift-devel] misassignment of jobs
Mihael Hategan
hategan at mcs.anl.gov
Thu Nov 18 16:03:09 CST 2010
I'm sure there is a reasonable explanation for this.
Can you post your entire tc.data? And to make sure we're talking about
the right one, can you look at the swift log and use exactly the one
that swift claims it is using?
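
A quick way to cross-check the catalog is to list which sites it defines
for a given transformation. The following is only a minimal sketch: it
assumes the whitespace-separated column order seen in the entry quoted
below (site, transformation, path, ...) and that lines starting with '#'
are comments. The function name sites_for and the second sample entry
(EXAMPLE_site) are made up for illustration.

```python
def sites_for(transformation, tc_text):
    """Return the sites that a tc.data text lists for a transformation.

    Assumption: columns are whitespace-separated in the order
    site, transformation, path, ... and '#' starts a comment line.
    """
    sites = []
    for line in tc_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) >= 3 and fields[1] == transformation:
            sites.append(fields[0])
    return sites


# First entry is the one quoted in this thread (joined onto one line);
# the second entry is hypothetical.
sample = (
    "SPRACE_osg-ce.sprace.org.br worker15 /osg/app/engage/scec/worker.pl "
    'INSTALLED INTEL32::LINUX GLOBUS::maxwalltime="02:00:00"\n'
    "EXAMPLE_site worker16 /hypothetical/path/worker.pl "
    "INSTALLED INTEL32::LINUX null\n"
)
print(sites_for("worker15", sample))
```

If the file swift actually loaded lists only SPRACE for worker15, the
catalog itself is probably not where the LIGO_UWM_NEMO assignment came
from.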
Mihael
On Thu, 2010-11-18 at 14:39 -0600, Allan Espinosa wrote:
> tc.data for worker15:
> SPRACE_osg-ce.sprace.org.br worker15 /osg/app/engage/scec/worker.pl
> INSTALLED INTEL32::LINUX GLOBUS::maxwalltime="02:00:00"
>
> But it was assigned to another site instead:
> $ grep 0erqqq1k worker-*.log
> 2010-11-17 15:38:58,804-0600 DEBUG vdl:execute2 THREAD_ASSOCIATION
> jobid=worker15-0erqqq1k thread
> host=LIGO_UWM_NEMO_osg-nemo-ce.phys.uwm.edu replicationGroup=2pnqqq1k
> 2010-11-17 15:38:59,110-0600 INFO vdl:createdirset START
> jobid=worker15-0erqqq1k host=LIGO_UWM_N
> ce.phys.uwm.edu - Initializing directory structure
> 2010-11-17 15:38:59,137-0600 INFO vdl:createdirset END
> jobid=worker15-0erqqq1k - Done initializi
> structure
> 2010-11-17 15:38:59,172-0600 INFO vdl:dostagein START
> jobid=worker15-0erqqq1k - Staging in files
> 2010-11-17 15:38:59,257-0600 INFO vdl:dostagein END
> jobid=worker15-0erqqq1k - Staging in finishe
> 2010-11-17 15:38:59,323-0600 DEBUG vdl:execute2 JOB_START
> jobid=worker15-0erqqq1k tr=worker15 arg
> //128.135.125.17:61015, SPRACE_osg-ce.sprace.org.br, /tmp, 7200]
> tmpdir=worker-20101117-1538-fe9a
> orker15-0erqqq1k host=LIGO_UWM_NEMO_osg-nemo-ce.phys.uwm.edu
> 2010-11-17 15:39:01,394-0600 INFO Execute Submit: in:
> worker-20101117-1538-fe9aq209 command: /bi
> /_swiftwrap worker15-0erqqq1k -jobdir 0 -scratch -e worker15 -out
> stdout.txt -err stderr.txt -i
> -k -cdmfile -status files -a http://128.135.125.17:61015
> SPRACE_osg-ce.sprace.org.br /tmp 7200
> 2010-11-17 15:39:01,394-0600 INFO GridExec TASK_DEFINITION:
> Task(type=JOB_SUBMISSION, identity=u
> -1-1290029938030) is /bin/bash shared/_swiftwrap worker15-0erqqq1k
> -jobdir 0 -scratch -e worker1
> .txt -err stderr.txt -i -d -if -of -k -cdmfile -status files -a
> http://128.135.125.17:61015
> .sprace.org.br /tmp 7200
> 2010-11-17 16:49:33,106-0600 DEBUG vdl:checkjobstatus START
> jobid=worker15-0erqqq1k
> 2010-11-17 16:49:33,278-0600 INFO vdl:checkjobstatus FAILURE
> jobid=worker15-0erqqq1k - Failure f
> 2010-11-17 16:49:38,180-0600 DEBUG vdl:execute2 APPLICATION_EXCEPTION
> jobid=worker15-0erqqq1k - A
> ception: Cannot find executable worker15 on site system path
>
> There is no entry for worker15 for the site LIGO_UWM_NEMO in my tc.data.
>
> -Allan
>
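
To see at a glance which host each job was bound to, the
THREAD_ASSOCIATION lines in the log can be reduced to a jobid-to-host
map. This is a rough sketch that assumes each log record is a single
line carrying jobid=... and host=... fields, as in the excerpt quoted
above. The function name assigned_hosts is hypothetical, and the sample
line is the quoted record rejoined exactly as it appears in this mail
(including its truncated "thread" field).

```python
import re

# Matches the job id and the host on a THREAD_ASSOCIATION record.
PATTERN = re.compile(r"THREAD_ASSOCIATION\s+jobid=(\S+).*?\bhost=(\S+)")


def assigned_hosts(log_text):
    """Map each jobid to the host named on its THREAD_ASSOCIATION line."""
    hosts = {}
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if match:
            hosts[match.group(1)] = match.group(2)
    return hosts


sample_log = (
    "2010-11-17 15:38:58,804-0600 DEBUG vdl:execute2 THREAD_ASSOCIATION "
    "jobid=worker15-0erqqq1k thread "
    "host=LIGO_UWM_NEMO_osg-nemo-ce.phys.uwm.edu replicationGroup=2pnqqq1k\n"
    "2010-11-17 15:38:59,172-0600 INFO vdl:dostagein START "
    "jobid=worker15-0erqqq1k - Staging in files\n"
)
for jobid, host in assigned_hosts(sample_log).items():
    print(jobid, "->", host)
```

Comparing that map against the sites tc.data lists for each
transformation would show every job that landed on a site without a
matching entry, not just this one.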