[Swift-devel] Re: coaster on EC2 (error log)

Yi Zhu yizhu at cs.uchicago.edu
Tue Apr 27 14:54:49 CDT 2010


Hi Mihael,

Thanks! You are absolutely right. In the meantime, I found an
"EC <number>" line at the end of the coaster log, which I assume indicates
an error code. I then tried running with different job provider settings,
with and without coasters, and got the following results:


1) running without coaster --- gt2 + ssh + pbs (successful)

<filesystem provider="ssh" url="ec2-204-236-204-71.compute-1.amazonaws.co\
m"/>
<jobmanager universe="vanilla" url="ec2-204-236-204-71.compute-1.amazonaw\
s.com/jobmanager-pbs" major="2" />

2) running without coaster --- gt2 + gridftp + pbs (successful)

<gridftp url="gsiftp://ec2-204-236-204-71.compute-1.amazonaws.com"/>
<jobmanager universe="vanilla" url="ec2-204-236-204-71.compute-1.amazonaw\
s.com/jobmanager-pbs" major="2" />

3) coaster + gt2 + gridftp + pbs (failed)

sites.xml

<pool handle="ec2">
<execution provider="coaster" url="ec2-204-236-204-71.compute-1.amazonaws\
.com" jobmanager="gt2:gt2:pbs"/>

<profile namespace="globus" key="workersPerNode">1</profile>
<profile namespace="globus" key="slots">1</profile>
<profile namespace="globus" key="nodeGranularity">5</profile>
<profile namespace="globus" key="maxNodes">2</profile>
<profile namespace="karajan" key="jobThrottle">1</profile>
<profile namespace="karajan" key="initialScore">10000</profile>

<gridftp url="gsiftp://ec2-204-236-204-71.compute-1.amazonaws.com"/>
<workdirectory>/home/torqueuser/swiftwork</workdirectory>
</pool>

screen output:

-bash-3.2$ swift -tc.file tc.test.data -sites.file sshpbscoast.xml 
first.swift
Swift svn swift-r3276 (swift modified locally) cog-r2739 (cog modified 
locally)

RunID: 20100427-1449-gfryv636
Progress:
Progress:  Stage in:1
Progress:  Submitted:1
Progress:  Active:1
Failed to transfer wrapper log from first-20100427-1449-gfryv636/info/h 
on ec2
Progress:  Failed:1
Execution failed:
         Exception in echo:
Arguments: [Hello, world!]
Host: ec2
Directory: first-20100427-1449-gfryv636/jobs/h/echo-hveu16rj
stderr.txt:

stdout.txt:

----

Caused by:
         Task failed: Error submitting block task
org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: 
Cannot submit job
         at 
org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.submitSingleJob(JobSubmissionTaskHandler.java:146)
         at 
org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.submit(JobSubmissionTaskHandler.java:100)
         at 
org.globus.cog.abstraction.impl.common.AbstractTaskHandler.submit(AbstractTaskHandler.java:46)
         at 
org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.submit(ExecutionTaskHandler.java:50)
         at 
org.globus.cog.abstraction.coaster.service.job.manager.BlockTaskSubmitter.run(BlockTaskSubmitter.java:66)
Caused by: org.globus.gram.GramException: Data transfer to the server 
failed [Caused by: Authentication failed [Caused by: Failure unspecified 
at GSS-API level [Caused by: Unknown CA]]]
         at org.globus.gram.Gram.request(Gram.java:334)
         at org.globus.gram.GramJob.request(GramJob.java:262)
         at 
org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.submitSingleJob(JobSubmissionTaskHandler.java:134)
         ... 4 more

Cleaning up...
Shutting down service at https://10.251.214.179:50260
Got channel MetaChannel: 499668036 -> GSSSChannel-01006506816(1)
+ Done
-bash-3.2$

coaster log:

[...]
EC 13
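
For anyone who wants to pull that trailing line out of a coaster log automatically, here is a minimal Python sketch. It only assumes the "EC <number>" format seen above; whether the number really is an error or exit code is my assumption, and the function name last_ec_code is made up for illustration.

```python
import re

# Matches a line that consists only of "EC <number>", the format
# seen at the end of the coaster log excerpt above.
EC_LINE = re.compile(r"^EC\s+(\d+)$")


def last_ec_code(log_text):
    """Return the number from the last 'EC <n>' line, or None if absent."""
    # Walk the log backwards so the final EC line wins.
    for line in reversed(log_text.splitlines()):
        match = EC_LINE.match(line.strip())
        if match:
            return int(match.group(1))
    return None


if __name__ == "__main__":
    # The only lines of the coaster log quoted in this message.
    print(last_ec_code("[...]\nEC 13\n"))
```

Scanning from the end keeps the sketch cheap on long logs and ignores any earlier "EC" lines from previous blocks.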


Since I can submit jobs to gt2 through Swift successfully, there should not
be any authentication problem with gt2 itself. But when I changed the
provider to coaster, I got the [Unknown CA] error. Could this be caused by
an authentication problem between the coaster server node and gt2?
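
To make the nested exception message easier to read, here is a small Python sketch that splits the "[Caused by: ...]" chain from the GramException text above into a list, outermost cause first. The function name cause_chain is hypothetical, and the sketch only reformats the message; it does not tell us which node is missing the CA certificate.

```python
def cause_chain(message):
    """Split a nested '[Caused by: ...]' message into a list, outermost first."""
    # The trace is wrapped across lines in the terminal, so collapse
    # all runs of whitespace into single spaces before splitting.
    normalized = " ".join(message.split())
    parts = normalized.split("[Caused by: ")
    # The closing brackets pile up after the innermost cause; strip them.
    return [part.strip().rstrip("]").strip() for part in parts]


if __name__ == "__main__":
    # Message text copied from the stack trace above, including the line wraps.
    gram_message = """Data transfer to the server
    failed [Caused by: Authentication failed [Caused by: Failure unspecified
    at GSS-API level [Caused by: Unknown CA]]]"""
    for depth, cause in enumerate(cause_chain(gram_message)):
        print("  " * depth + cause)
```

The last element of the list is the innermost cause, which for the trace above is "Unknown CA". Since the trace passes through BlockTaskSubmitter before reaching the gt2 handler, the failing GRAM request appears to be the one made by the coaster service rather than by the Swift client.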

-Yi Zhu




On 4/27/2010 1:59 PM, Mihael Hategan wrote:
> On Tue, 2010-04-27 at 13:18 -0500, Yi Zhu wrote:
>
>    
>> I also checked the coaster log in server node, it shows it need a
>> binary file called gmd5sum,
>>      
> It shows it looked for gmd5sum and didn't find it. So it tries md5sum
> instead. I believe gmd5sum is the default on OS X and the reason why it
> looks for it. In any event, if the coaster process doesn't fail saying
> "didn't find gmd5sum or md5sum", then that's likely not your problem,
> and judging from the log below which says "Computed checksum...", md5sum
> was found.
>
>    
>> [...]
>>      
>    
>> coaster-bootstrap-11894108087.log
>> using plain mode
>> BS: http://tp-login2.ci.uchicago.edu:37470
>> which: no gmd5sum in
>> (/opt/vdt-1.10.1/gums/scripts:/opt/vdt-1.10.1/prima/bin:/opt/vdt-1.10.1/cert-scripts/bin:/opt/vdt-1.10.1/glite/sbin:/opt/vdt-1.10.1/glite/bin:/opt/vdt-1.10.1/jdk1.5/bin:/opt/vdt-1.10.1/edg/sbin:/opt/vdt-1.10.1/gip/bin:/opt/vdt-1.10.1/gpt/sbin:/opt/vdt-1.10.1/globus/bin:/opt/vdt-1.10.1/globus/sbin:/opt/vdt-1.10.1/wget/bin:/opt/vdt-1.10.1/logrotate/sbin:/opt/vdt-1.10.1/perl/bin:/opt/pacman-3.26/bin:/opt/vdt-1.10.1/vdt/sbin:/opt/vdt-1.10.1/vdt/bin:/opt/pacman-3.26/bin:/usr/local/bin:/bin:/usr/bin)
>> Expected checksum: acab90e149a0188fbc963803a42156c5
>> Computed checksum: acab90e149a0188fbc963803a42156c5
>>      
> [...]
>
>
>    
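
The fallback Mihael describes in the quoted reply (look for gmd5sum, fall back to md5sum, then compare the expected and computed checksums) can be sketched in Python as below. This is not the actual coaster bootstrap script, only an illustration of the behaviour described; the names pick_md5_tool and checksum_matches are made up, and the comparison uses Python's hashlib rather than calling the external tool.

```python
import hashlib
import shutil


def pick_md5_tool(candidates=("gmd5sum", "md5sum"), which=shutil.which):
    """Return the first candidate found on PATH, or None if none is found.

    The 'which' lookup is injectable so the fallback can be tried
    without depending on what is installed locally.
    """
    for name in candidates:
        if which(name):
            return name
    return None


def checksum_matches(data, expected_hex):
    """Compare the MD5 of 'data' (bytes) against an expected hex digest."""
    return hashlib.md5(data).hexdigest() == expected_hex.strip().lower()


if __name__ == "__main__":
    tool = pick_md5_tool()
    if tool is None:
        print("didn't find gmd5sum or md5sum")
    else:
        print("using", tool)
```

A missing gmd5sum on its own is harmless under this logic; only the case where neither tool is found would be a failure, which matches the log above where the expected and computed checksums agree.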
