[Swift-devel] Re: coaster on EC2 (error log)
Yi Zhu
yizhu at cs.uchicago.edu
Tue Apr 27 14:54:49 CDT 2010
Hi Mihael,
Thanks! You are absolutely right. In the meantime, I found that there is
an "EC <number>" line at the end of the coaster log. I assume it indicates
an error code, so I tried running coaster with different job provider
parameters and got the following results:
1) running without coaster --- gt2 + ssh + pbs (successful)
<filesystem provider="ssh" url="ec2-204-236-204-71.compute-1.amazonaws.com"/>
<jobmanager universe="vanilla" url="ec2-204-236-204-71.compute-1.amazonaws.com/jobmanager-pbs" major="2" />
2) running without coaster --- gt2 + gridftp + pbs (successful)
<gridftp url="gsiftp://ec2-204-236-204-71.compute-1.amazonaws.com"/>
<jobmanager universe="vanilla" url="ec2-204-236-204-71.compute-1.amazonaws.com/jobmanager-pbs" major="2" />
3) coaster + gt2 + gridftp + pbs (failed)
sites.xml:
<pool handle="ec2">
<execution provider="coaster" url="ec2-204-236-204-71.compute-1.amazonaws.com" jobmanager="gt2:gt2:pbs"/>
<profile namespace="globus" key="workersPerNode">1</profile>
<profile namespace="globus" key="slots">1</profile>
<profile namespace="globus" key="nodeGranularity">5</profile>
<profile namespace="globus" key="maxNodes">2</profile>
<profile namespace="karajan" key="jobThrottle">1</profile>
<profile namespace="karajan" key="initialScore">10000</profile>
<gridftp url="gsiftp://ec2-204-236-204-71.compute-1.amazonaws.com"/>
<workdirectory>/home/torqueuser/swiftwork</workdirectory>
</pool>
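For a quick sanity check of what this pool actually configures, here is a minimal Python sketch that reads the entry and lists the execution provider, job manager, and profile settings. The pool text is copied from the sites.xml above; the function name summarize_pool is only an illustration and is not part of Swift or CoG.

```python
import xml.etree.ElementTree as ET

# Pool definition copied from the sites.xml above (URL joined onto one line).
SITES_XML = """
<pool handle="ec2">
  <execution provider="coaster" url="ec2-204-236-204-71.compute-1.amazonaws.com" jobmanager="gt2:gt2:pbs"/>
  <profile namespace="globus" key="workersPerNode">1</profile>
  <profile namespace="globus" key="slots">1</profile>
  <profile namespace="globus" key="nodeGranularity">5</profile>
  <profile namespace="globus" key="maxNodes">2</profile>
  <profile namespace="karajan" key="jobThrottle">1</profile>
  <profile namespace="karajan" key="initialScore">10000</profile>
  <gridftp url="gsiftp://ec2-204-236-204-71.compute-1.amazonaws.com"/>
  <workdirectory>/home/torqueuser/swiftwork</workdirectory>
</pool>
"""


def summarize_pool(xml_text):
    """Return the execution attributes and profile settings of one <pool>.

    Hypothetical helper for illustration only.
    """
    pool = ET.fromstring(xml_text.strip())
    execution = pool.find("execution")
    # Map (namespace, key) -> value for every <profile> element.
    profiles = {
        (p.get("namespace"), p.get("key")): p.text
        for p in pool.findall("profile")
    }
    return {
        "handle": pool.get("handle"),
        "provider": execution.get("provider"),
        "jobmanager": execution.get("jobmanager"),
        "profiles": profiles,
    }


if __name__ == "__main__":
    summary = summarize_pool(SITES_XML)
    print(summary["handle"], summary["provider"], summary["jobmanager"])
    for (namespace, key), value in sorted(summary["profiles"].items()):
        print(f"  {namespace}.{key} = {value}")
```

Printing the settings side by side makes it easier to compare this pool with the two non-coaster configurations that worked.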
screen output:
-bash-3.2$ swift -tc.file tc.test.data -sites.file sshpbscoast.xml first.swift
Swift svn swift-r3276 (swift modified locally) cog-r2739 (cog modified locally)
RunID: 20100427-1449-gfryv636
Progress:
Progress: Stage in:1
Progress: Submitted:1
Progress: Active:1
Failed to transfer wrapper log from first-20100427-1449-gfryv636/info/h on ec2
Progress: Failed:1
Execution failed:
Exception in echo:
Arguments: [Hello, world!]
Host: ec2
Directory: first-20100427-1449-gfryv636/jobs/h/echo-hveu16rj
stderr.txt:
stdout.txt:
----
Caused by:
Task failed: Error submitting block task
org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Cannot submit job
    at org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.submitSingleJob(JobSubmissionTaskHandler.java:146)
    at org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.submit(JobSubmissionTaskHandler.java:100)
    at org.globus.cog.abstraction.impl.common.AbstractTaskHandler.submit(AbstractTaskHandler.java:46)
    at org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.submit(ExecutionTaskHandler.java:50)
    at org.globus.cog.abstraction.coaster.service.job.manager.BlockTaskSubmitter.run(BlockTaskSubmitter.java:66)
Caused by: org.globus.gram.GramException: Data transfer to the server failed [Caused by: Authentication failed [Caused by: Failure unspecified at GSS-API level [Caused by: Unknown CA]]]
    at org.globus.gram.Gram.request(Gram.java:334)
    at org.globus.gram.GramJob.request(GramJob.java:262)
    at org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.submitSingleJob(JobSubmissionTaskHandler.java:134)
    ... 4 more
Cleaning up...
Shutting down service at https://10.251.214.179:50260
Got channel MetaChannel: 499668036 -> GSSSChannel-01006506816(1)
+ Done
-bash-3.2$
coaster log:
[...]
EC 13
Since I can submit jobs to gt2 via Swift successfully, there should not be
any authentication issue with gt2 itself. But when I changed the provider
to coaster, I got the [Unknown CA] error. Could this be caused by an
authentication issue between the coaster server node and gt2?
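To make the innermost cause easier to spot in such a nested message, here is a minimal Python sketch that splits the "[Caused by: ...]" chain into its parts. The message text is copied from the GramException in the trace; the helper name cause_chain is hypothetical, and the sketch assumes the bracketed nesting always has this shape.

```python
import re

# Message copied from the GramException line in the screen output.
GRAM_MESSAGE = (
    "Data transfer to the server failed [Caused by: Authentication failed "
    "[Caused by: Failure unspecified at GSS-API level [Caused by: Unknown CA]]]"
)


def cause_chain(message):
    """Split a nested '[Caused by: ...]' message into a list, outermost first.

    Hypothetical helper; assumes each cause is wrapped as '[Caused by: ...]'.
    """
    parts = re.split(r"\s*\[Caused by:\s*", message)
    # The closing brackets all pile up at the end of the last part.
    return [part.rstrip("]").strip() for part in parts]


if __name__ == "__main__":
    chain = cause_chain(GRAM_MESSAGE)
    for depth, cause in enumerate(chain):
        print("  " * depth + cause)
    print("innermost cause:", chain[-1])
```

For the message above, the last element of the chain is the "Unknown CA" text, which is the part the question is about.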
-Yi Zhu
On 4/27/2010 1:59 PM, Mihael Hategan wrote:
> On Tue, 2010-04-27 at 13:18 -0500, Yi Zhu wrote:
>
>
>> I also checked the coaster log on the server node; it shows that it
>> needs a binary file called gmd5sum,
>>
> It shows it looked for gmd5sum and didn't find it. So it tries md5sum
> instead. I believe gmd5sum is the default on OS X and the reason why it
> looks for it. In any event, if the coaster process doesn't fail saying
> "didn't find gmd5sum or md5sum", then that's likely not your problem,
> and judging from the log below which says "Computed checksum...", md5sum
> was found.
>
>
>> [...]
>>
>
>> coaster-bootstrap-11894108087.log
>> using plain mode
>> BS: http://tp-login2.ci.uchicago.edu:37470
>> which: no gmd5sum in
>> (/opt/vdt-1.10.1/gums/scripts:/opt/vdt-1.10.1/prima/bin:/opt/vdt-1.10.1/cert-scripts/bin:/opt/vdt-1.10.1/glite/sbin:/opt/vdt-1.10.1/glite/bin:/opt/vdt-1.10.1/jdk1.5/bin:/opt/vdt-1.10.1/edg/sbin:/opt/vdt-1.10.1/gip/bin:/opt/vdt-1.10.1/gpt/sbin:/opt/vdt-1.10.1/globus/bin:/opt/vdt-1.10.1/globus/sbin:/opt/vdt-1.10.1/wget/bin:/opt/vdt-1.10.1/logrotate/sbin:/opt/vdt-1.10.1/perl/bin:/opt/pacman-3.26/bin:/opt/vdt-1.10.1/vdt/sbin:/opt/vdt-1.10.1/vdt/bin:/opt/pacman-3.26/bin:/usr/local/bin:/bin:/usr/bin)
>> Expected checksum: acab90e149a0188fbc963803a42156c5
>> Computed checksum: acab90e149a0188fbc963803a42156c5
>>
> [...]
>
>
>