[Swift-devel] Re: swift-falkon problem

Ioan Raicu iraicu at cs.uchicago.edu
Mon Mar 17 08:03:52 CDT 2008



Michael Wilde wrote:
> Ioan,
>
> I'm stuck at:
>
> RunID: 20080316-1643-g4n8t252
> Progress:
> runam3 started
> error: Notification(int timeout): socket = new ServerSocket(recvPort); 
> Address already in use
> error: Notification(int timeout): socket = new ServerSocket(recvPort); 
> Address already in use
This is just a warning; it's not causing any trouble.
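
For context, "Address already in use" is what Java typically reports when 
a ServerSocket is created on a port that another socket is already bound 
to. A minimal sketch of that situation follows; it is not the actual 
Notification.java code, and the class and method names are hypothetical:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.BindException;
import java.net.ServerSocket;

class PortInUseDemo {
    // Returns true if binding a second ServerSocket to an occupied port fails.
    static boolean secondBindFails() {
        // Port 0 asks the OS for any free port, so the demo does not
        // depend on a particular port number being available.
        try (ServerSocket first = new ServerSocket(0)) {
            int port = first.getLocalPort();
            try (ServerSocket second = new ServerSocket(port)) {
                return false; // unexpected: both sockets bound to the same port
            } catch (BindException e) {
                // Same condition as the "Address already in use" lines above.
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("second bind failed: " + secondBindFails());
    }
}
```
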
> Waiting for notification for 0 ms
> Received notification with 1 messages
This means that the Falkon service sent back a notification, so the 
round trip worked: it received a task, attempted to execute it, and 
returned a result... but apparently a failed result.
> Failed to transfer wrapper log from 
> amps1-20080316-1643-g4n8t252/info/0/sico
I don't understand this error. How is this error text being generated?  
Falkon only returns a numeric exit code.  Could this be a 
post-processing error, where Swift couldn't manipulate the local file 
system or couldn't find some expected files?  What exit code does 
Falkon return for this task: 0, or something else?
> runam3 failed
> Execution failed:
>         Exception in runam3:
> Arguments: [0000, 0.1899, 0.1858]
> Host: sico
>
> Does this look familiar?
>
> -- 
>
> What I'm confused about is:
>
> - the deef-provider code that I get with a swift checkout seems to 
> have out of date falkon stubs (I get a runtime error on a missing xml 
> element)
I just updated them in SVN.
>
> - if I grab a FalkonStubs jar from Zhao's bgp swift tree and use it in 
> a newly compiled swift tree, should that work? It seems to get further.
>
> It *seems* like swift is reaching Falkon (I can see something in a 
> falkon logfile that looks like swift-generated job ids) but then I'm 
> getting the errors above.
We need to figure out whether the failure is in executing the tasks in 
Falkon or, if that part is OK, in Swift not finding some files 
afterwards.
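
One way to start narrowing this down from the Swift side is to list the 
task status transitions recorded in the log. The sketch below assumes the 
"setting status to" line format seen in the log excerpt at the end of 
this message; the class and method names are hypothetical. It only shows 
what status the provider reported, not where the failure originated:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TaskStatusScan {
    // Matches fragments such as
    // "identity=urn:0-1-1-1205707420808) setting status to Failed"
    private static final Pattern STATUS =
        Pattern.compile("identity=(\\S+?)\\) setting status to (\\w+)");

    // Returns one "identity -> status" entry per status change, in log order.
    static List<String> transitions(String log) {
        List<String> out = new ArrayList<>();
        Matcher m = STATUS.matcher(log);
        while (m.find()) {
            out.add(m.group(1) + " -> " + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        String log =
            "identity=urn:0-1-1-1205707420808) setting status to Submitted\n"
          + "identity=urn:0-1-1-1205707420808) setting status to Active\n"
          + "identity=urn:0-1-1-1205707420808) setting status to Failed\n";
        for (String t : transitions(log)) {
            System.out.println(t);
        }
    }
}
```
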
>
> The log file doesn't contain any details, just what's below.
>
> I'll double-check all my steps and package up the full log file, but 
> wanted to get this out to you before I spend too much more time 
> debugging, hoping someone recognizes the problem.
>
> I note that I haven't yet found the strings above, like "Waiting for 
> notification", in the swift source tree.
That string comes from FalkonStubs.jar 
(falkon/service/org/globus/GenericPortal/common/Notification.java), so 
you won't find it in the Swift source tree.  I should probably disable 
all logging from the FalkonStubs.jar code by default.

Once you enable the Falkon provider debug logging, more per-task logs 
get printed. For example, the file 
cog/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/NotificationThread.java 
prints "Falkon: waiting for notifications..." and then prints the 
contents of each notification when it is received.

Ioan

>
> Thanks,
>
> Mike
>
>
>
>
> 2008-03-16 16:43:42,807-0600 INFO  vdl:createdirset END 
> jobid=runam3-0cu5avpi - Done initializing directory structure
> 2008-03-16 16:43:42,809-0600 INFO  vdl:dostagein START 
> jobid=runam3-0cu5avpi - Staging in files
> 2008-03-16 16:43:42,810-0600 INFO  vdl:dostagein END 
> jobid=runam3-0cu5avpi - Staging in finished
> 2008-03-16 16:43:42,812-0600 DEBUG vdl:execute2 JOB_START 
> jobid=runam3-0cu5avpi tr=runam3 arguments=[0000, 0.1899, 0.1858] 
> tmpdir=amps1-20080316-1643-g4n8t252/jobs/0/runam3-0cu5avpi host=sico
> 2008-03-16 16:43:42,829-0600 DEBUG WeightedHostScoreScheduler 
> multiplyScore(sico:0.000(1.000):1/1000002, -0.2)
> 2008-03-16 16:43:42,829-0600 DEBUG WeightedHostScoreScheduler Old 
> score: 0.000, new score: -0.200
> 2008-03-16 16:43:43,693-0600 DEBUG TaskImpl Task(type=JOB_SUBMISSION, 
> identity=urn:0-1-1-1205707420808) setting status to Submitting
> 2008-03-16 16:43:43,693-0600 DEBUG TaskImpl Task(type=JOB_SUBMISSION, 
> identity=urn:0-1-1-1205707420808) setting status to Submitted
> 2008-03-16 16:43:43,693-0600 DEBUG WeightedHostScoreScheduler 
> Submission time for Task(type=JOB_SUBMISSION, 
> identity=urn:0-1-1-1205707420808): 0ms. Score delta: 0.002564102564102564
> 2008-03-16 16:43:43,694-0600 DEBUG WeightedHostScoreScheduler 
> multiplyScore(sico:-0.200(0.889):1/889402, 0.002564102564102564)
> 2008-03-16 16:43:43,694-0600 DEBUG WeightedHostScoreScheduler Old 
> score: -0.200, new score: -0.197
> 2008-03-16 16:43:43,694-0600 INFO  JobSubmissionTaskHandler Job submitted
> 2008-03-16 16:43:44,213-0600 DEBUG TaskImpl Task(type=JOB_SUBMISSION, 
> identity=urn:0-1-1-1205707420808) setting status to Active
> 2008-03-16 16:43:44,213-0600 DEBUG TaskImpl Task(type=JOB_SUBMISSION, 
> identity=urn:0-1-1-1205707420808) setting status to Failed
> 2008-03-16 16:43:44,218-0600 DEBUG vdl:execute2 APPLICATION_EXCEPTION 
> jobid=runam3-0cu5avpi - Application exception: Task failed
>         task:execute @ vdl-int.k, line: 386
>         sys:sequential @ vdl-int.k, line: 378
>         sys:try @ vdl-int.k, line: 377
>         task:allocatehost @ vdl-int.k, line: 356
>         vdl:execute2 @ execute-default.k, line: 23
>         sys:restartonerror @ execute-default.k, line: 21
>         sys:sequential @ execute-default.k, line: 19
>         sys:try @ execute-default.k, line: 18
>         sys:if @ execute-default.k, line: 17
>         sys:then @ execute-default.k, line: 16
>         sys:if @ execute-default.k, line: 15
>         vdl:execute @ amps1.kml, line: 52
>         runam3 @ amps1.kml, line: 92
>         sys:sequential @ amps1.kml, line: 91
>         sys:parallelfor @ amps1.kml, line: 73
>         sys:sequential @ amps1.kml, line: 72
>         doall @ amps1.kml, line: 142
>         sys:sequential @ amps1.kml, line: 141
>         sys:parallel @ amps1.kml, line: 131
>         vdl:mainp @ amps1.kml, line: 130
>         mainp @ vdl.k, line: 150
>         vdl:mains @ amps1.kml, line: 128
>         vdl:mains @ amps1.kml, line: 128
>         rlog:restartlog @ amps1.kml, line: 126
>         kernel:project @ amps1.kml, line: 2
>         amps1-20080316-1643-g4n8t252
>
>

-- 
===================================================
Ioan Raicu
Ph.D. Candidate
===================================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
===================================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
http://dev.globus.org/wiki/Incubator/Falkon
http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page
===================================================




