[Swift-devel] misassignment of jobs

Allan Espinosa aespinosa at cs.uchicago.edu
Thu Nov 18 14:39:04 CST 2010


My tc.data entry for worker15:
SPRACE_osg-ce.sprace.org.br  worker15  /osg/app/engage/scec/worker.pl  INSTALLED  INTEL32::LINUX  GLOBUS::maxwalltime="02:00:00"

But the job was assigned to a different site (LIGO_UWM_NEMO) instead:
$ grep 0erqqq1k worker-*.log
2010-11-17 15:38:58,804-0600 DEBUG vdl:execute2 THREAD_ASSOCIATION
jobid=worker15-0erqqq1k thread
 host=LIGO_UWM_NEMO_osg-nemo-ce.phys.uwm.edu replicationGroup=2pnqqq1k
2010-11-17 15:38:59,110-0600 INFO  vdl:createdirset START
jobid=worker15-0erqqq1k host=LIGO_UWM_N
ce.phys.uwm.edu - Initializing directory structure
2010-11-17 15:38:59,137-0600 INFO  vdl:createdirset END
jobid=worker15-0erqqq1k - Done initializi
structure
2010-11-17 15:38:59,172-0600 INFO  vdl:dostagein START
jobid=worker15-0erqqq1k - Staging in files
2010-11-17 15:38:59,257-0600 INFO  vdl:dostagein END
jobid=worker15-0erqqq1k - Staging in finishe
2010-11-17 15:38:59,323-0600 DEBUG vdl:execute2 JOB_START
jobid=worker15-0erqqq1k tr=worker15 arg
//128.135.125.17:61015, SPRACE_osg-ce.sprace.org.br, /tmp, 7200]
tmpdir=worker-20101117-1538-fe9a
orker15-0erqqq1k host=LIGO_UWM_NEMO_osg-nemo-ce.phys.uwm.edu
2010-11-17 15:39:01,394-0600 INFO  Execute Submit: in:
worker-20101117-1538-fe9aq209 command: /bi
/_swiftwrap worker15-0erqqq1k -jobdir 0 -scratch  -e worker15 -out
stdout.txt -err stderr.txt -i
 -k  -cdmfile  -status files -a http://128.135.125.17:61015
SPRACE_osg-ce.sprace.org.br /tmp 7200
2010-11-17 15:39:01,394-0600 INFO  GridExec TASK_DEFINITION:
Task(type=JOB_SUBMISSION, identity=u
-1-1290029938030) is /bin/bash shared/_swiftwrap worker15-0erqqq1k
-jobdir 0 -scratch  -e worker1
.txt -err stderr.txt -i -d  -if  -of  -k  -cdmfile  -status files -a
http://128.135.125.17:61015
.sprace.org.br /tmp 7200
2010-11-17 16:49:33,106-0600 DEBUG vdl:checkjobstatus START
jobid=worker15-0erqqq1k
2010-11-17 16:49:33,278-0600 INFO  vdl:checkjobstatus FAILURE
jobid=worker15-0erqqq1k - Failure f
2010-11-17 16:49:38,180-0600 DEBUG vdl:execute2 APPLICATION_EXCEPTION
jobid=worker15-0erqqq1k - A
ception: Cannot find executable worker15 on site system path

There is no entry for worker15 at the site LIGO_UWM_NEMO in my tc.data.
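
This kind of mismatch can be checked mechanically. The following is a minimal Python sketch, not part of Swift, that cross-checks THREAD_ASSOCIATION records in a log against the (site, transformation) pairs in tc.data. It rests on several assumptions: each log record sits on one unwrapped line, the transformation name is the jobid with its final "-<id>" suffix removed, tc.data comment lines start with "#", and the host= value is compared verbatim with the first tc.data column. The function names and the thread=ELIDED value are hypothetical; the thread value is cut off in the excerpt above.

```python
import re

# Sample inputs taken from the post. The thread value is truncated in the
# original excerpt, so "ELIDED" is a placeholder, not a real value.
TC_DATA = (
    'SPRACE_osg-ce.sprace.org.br  worker15  /osg/app/engage/scec/worker.pl  '
    'INSTALLED  INTEL32::LINUX  GLOBUS::maxwalltime="02:00:00"\n'
)
LOG = (
    "2010-11-17 15:38:58,804-0600 DEBUG vdl:execute2 THREAD_ASSOCIATION "
    "jobid=worker15-0erqqq1k thread=ELIDED "
    "host=LIGO_UWM_NEMO_osg-nemo-ce.phys.uwm.edu replicationGroup=2pnqqq1k\n"
)

# Assumption: jobid= precedes host= on the same line of the record.
ASSOC = re.compile(r"THREAD_ASSOCIATION\s+jobid=(\S+).*?\bhost=(\S+)")


def parse_tc_data(text):
    """Return the set of (site, transformation) pairs found in tc.data text."""
    entries = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and comments
            continue
        fields = line.split()
        if len(fields) >= 2:
            entries.add((fields[0], fields[1]))
    return entries


def find_misassigned(log_text, entries):
    """Return (jobid, transformation, host) for jobs whose host has no entry."""
    bad = []
    for match in ASSOC.finditer(log_text):
        jobid, host = match.groups()
        # Assumption: jobid looks like "<transformation>-<unique id>".
        transformation = jobid.rsplit("-", 1)[0]
        if (host, transformation) not in entries:
            bad.append((jobid, transformation, host))
    return bad


if __name__ == "__main__":
    for jobid, tr, host in find_misassigned(LOG, parse_tc_data(TC_DATA)):
        print(f"{jobid}: no tc.data entry for {tr} on {host}")
```

Run against the full worker-*.log files, a check like this would list every job sent to a site that lacks a matching tc.data entry, provided the assumptions above hold for the log format.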

-Allan

-- 
Allan M. Espinosa <http://amespinosa.wordpress.com>
PhD student, Computer Science
University of Chicago <http://people.cs.uchicago.edu/~aespinosa>


