[Swift-devel] RE: [Swift-user] Execution error

Yue, Chen - BMD yuechen at bsd.uchicago.edu
Thu Apr 30 16:08:58 CDT 2009


Hi Michael,
 
Thank you for the advice. I tested Ranger with 1 job and the new maxwalltime settings, and it shows the following error message. I don't know if there is another problem with my setup. Thank you!
 
/////////////////////////////////////////////////
[yuechen at communicado PTMap2]$ swift PTMap2-unmod.swift -sites.file sites.xml -tc.file tc.data
Swift 0.9rc2 swift-r2860 cog-r2388
RunID: 20090430-1559-2vi6x811
Progress:
Progress:  Stage in:1
Progress:  Submitting:1
Progress:  Submitting:1
Progress:  Submitted:1
Progress:  Active:1
Failed to transfer wrapper log from PTMap2-unmod-20090430-1559-2vi6x811/info/i on ranger
Progress:  Active:1
Failed to transfer wrapper log from PTMap2-unmod-20090430-1559-2vi6x811/info/k on ranger
Progress:  Stage in:1
Progress:  Active:1
Failed to transfer wrapper log from PTMap2-unmod-20090430-1559-2vi6x811/info/m on ranger
Progress:  Failed:1
Execution failed:
        Exception in PTMap2:
Arguments: [e04.mzXML, ./seqs-ecolik12/fasta01, inputs-unmod.txt, parameters.txt]
Host: ranger
Directory: PTMap2-unmod-20090430-1559-2vi6x811/jobs/m/PTMap2-mbe6m5aj
stderr.txt:
stdout.txt:
----
Caused by:
        Failed to start worker:
null
null
org.globus.gram.GramException: The job manager detected an invalid script response
        at org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.statusChanged(JobSubmissionTaskHandler.java:530)
        at org.globus.gram.GramJob.setStatus(GramJob.java:184)
        at org.globus.gram.GramCallbackHandler.run(CallbackHandler.java:176)
        at java.lang.Thread.run(Thread.java:619)
Cleaning up...
Shutting down service at https://129.114.50.163:45562
Got channel MetaChannel: 20903429 -> GSSSChannel-null(1)
- Done
[yuechen at communicado PTMap2]$

///////////////////////////////////////////////////////////
 
Chen, Yue
 

________________________________

From: Michael Wilde [mailto:wilde at mcs.anl.gov]
Sent: Thu 4/30/2009 3:02 PM
To: Yue, Chen - BMD; swift-devel
Subject: Re: [Swift-user] Execution error



Back on list here (I only went off-list to discuss accounts, etc.).

The problem in the run below is this:

2009-04-30 14:29:41,265-0500 DEBUG vdl:execute2 APPLICATION_EXCEPTION
jobid=PTMap2-abeii5aj - Application exception: Job cannot be run with
the given max walltime worker constraint (task: 3000, \
maxwalltime: 2400s)

You have this on the ptmap app in your tc.data:

globus::maxwalltime=50

But you only gave coasters 40 mins per coaster worker. So it's
complaining that it can't run a 50-minute job in a 40-minute (max)
coaster worker. ;)

I mentioned in a prior mail that you need to set the two time values in
your sites.xml entry; that's what you need to do next.

Change the coaster time in your sites.xml to:
     <profile namespace="globus"
              key="coasterWorkerMaxwalltime">00:51:00</profile>
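
As a rough illustration of the arithmetic behind that exception (a Python sketch, not Swift's actual code): it assumes the per-job maxwalltime is given in minutes and coasterWorkerMaxwalltime in HH:MM:SS, as in the entries in this thread. The function names are made up for the example, and the exact comparison the coaster code uses is an assumption.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string such as "00:40:00" to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def job_fits_worker(job_maxwalltime_min: int, worker_maxwalltime_hms: str) -> bool:
    """Return True if the job's walltime fits inside the worker's walltime.

    Assumption: a job fits when its walltime does not exceed the worker's;
    the real coaster check may require extra headroom.
    """
    return job_maxwalltime_min * 60 <= hms_to_seconds(worker_maxwalltime_hms)


if __name__ == "__main__":
    # Values from this thread: tc.data asks for 50 minutes per job.
    print(job_fits_worker(50, "00:40:00"))  # False: 3000 s > 2400 s
    print(job_fits_worker(50, "00:51:00"))  # True: 3000 s <= 3060 s
```

The first call reproduces the "task: 3000, maxwalltime: 2400s" mismatch from the log; the second shows that the 00:51:00 setting leaves a minute of slack.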

If you have more info on the variability of your ptmap run times, send
that to the list, and we can discuss how to handle it.


(NOTE: running grep -i on the log for "except", or searching for "except"
with an editor, will often locate the first "exception" that your job
encountered. That's how I found the error above.)
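
If you would rather script that search, here is a minimal Python sketch of the same case-insensitive scan. The function name is hypothetical, and it assumes the Swift log is a plain text file.

```python
import re
from typing import Optional, Tuple


def first_exception_line(log_path: str) -> Optional[Tuple[int, str]]:
    """Return (line_number, line) for the first line containing "except",
    ignoring case, or None if no line matches."""
    pattern = re.compile("except", re.IGNORECASE)
    # errors="replace" keeps the scan going if the log has odd bytes in it.
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for number, line in enumerate(log, start=1):
            if pattern.search(line):
                return number, line.rstrip("\n")
    return None
```

For example, calling first_exception_line("PTMap2-unmod-20090430-1428-v0c5di5c.log") should point at the first line that mentions an exception, which is usually the one worth reading first.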

Also, Yue, for testing new sites, or for validating that old sites still
work, you should create the smallest possible ptmap workflow - 1 job if
that is possible - and verify that this works.  Then try, say, 10 jobs to
make sure scheduling etc. is sane.  Then send in your huge jobs.

With only 1 job, it's easier to spot the errors in the log file.

- Mike


On 4/30/09 2:34 PM, Yue, Chen - BMD wrote:
> Hi Michael,
> 
> I ran into the same messages again when I used Ranger:
> 
> Progress:  Selecting site:146  Stage in:25  Submitting:15  Submitted:821
> Failed but can retry:16
> Failed to transfer wrapper log from
> PTMap2-unmod-20090430-1428-v0c5di5c/info/q on ranger
> Failed to transfer wrapper log from
> PTMap2-unmod-20090430-1428-v0c5di5c/info/l on ranger
> Progress:  Selecting site:146  Stage in:3  Submitting:1  Submitted:857
> Failed but can retry:16
> Failed to transfer wrapper log from
> PTMap2-unmod-20090430-1428-v0c5di5c/info/v on ranger
> Failed to transfer wrapper log from
> PTMap2-unmod-20090430-1428-v0c5di5c/info/b on ranger
> Failed to transfer wrapper log from
> PTMap2-unmod-20090430-1428-v0c5di5c/info/0 on ranger
> Failed to transfer wrapper log from
> PTMap2-unmod-20090430-1428-v0c5di5c/info/a on ranger
> Failed to transfer wrapper log from
> PTMap2-unmod-20090430-1428-v0c5di5c/info/8 on ranger
> Failed to transfer wrapper log from
> PTMap2-unmod-20090430-1428-v0c5di5c/info/7 on ranger
> Failed to transfer wrapper log from
> PTMap2-unmod-20090430-1428-v0c5di5c/info/4 on ranger
> Failed to transfer wrapper log from
> PTMap2-unmod-20090430-1428-v0c5di5c/info/a on ranger
> Failed to transfer wrapper log from
> PTMap2-unmod-20090430-1428-v0c5di5c/info/4 on ranger
> Failed to transfer wrapper log from
> PTMap2-unmod-20090430-1428-v0c5di5c/info/8 on ranger
> Failed to transfer wrapper log from
> PTMap2-unmod-20090430-1428-v0c5di5c/info/7 on ranger
> Failed to transfer wrapper log from
> PTMap2-unmod-20090430-1428-v0c5di5c/info/x on ranger
> Failed to transfer wrapper log from
> PTMap2-unmod-20090430-1428-v0c5di5c/info/3 on ranger
> Failed to transfer wrapper log from
> PTMap2-unmod-20090430-1428-v0c5di5c/info/q on ranger
> The log for the search is at : 
> /home/yuechen/PTMap2/PTMap2-unmod-20090430-1428-v0c5di5c.log
> 
> The sites.xml I have is:
> 
>  <pool handle="ranger">
>      <execution provider="coaster"
>                 url="gatekeeper.ranger.tacc.teragrid.org"
>                 jobManager="gt2:gt2:SGE"/>
>      <gridftp url="gsiftp://gridftp.ranger.tacc.teragrid.org:2811/" />
>      <profile namespace="env"
>               key="SWIFT_JOBDIR_PATH">/tmp/yuechen/jobdir</profile>
>      <profile namespace="globus" key="project">TG-CCR080022N</profile>
>      <profile namespace="globus" key="coastersPerNode">16</profile>
>      <profile namespace="globus" key="queue">development</profile>
>      <profile namespace="globus"
>               key="coasterWorkerMaxwalltime">00:40:00</profile>
>      <profile namespace="globus" key="maxwalltime">31</profile>
>      <profile namespace="karajan" key="initialScore">50</profile>
>      <profile namespace="karajan" key="jobThrottle">10</profile>
>      <workdirectory>/work/01164/yuechen/swiftwork</workdirectory>
>  </pool>
> The tc.data I have is:
> 
> ranger          PTMap2         
> /share/home/01164/yuechen/PTMap2/PTMap2         INSTALLED      
> INTEL32::LINUX  globus::maxwalltime=50
>
> I'm using swift 0.9 rc2
>
> Thank you very much for help!
>
> Chen, Yue
>
> 
>
> ------------------------------------------------------------------------
> *From:* Michael Wilde [mailto:wilde at mcs.anl.gov]
> *Sent:* Thu 4/30/2009 2:05 PM
> *To:* Yue, Chen - BMD
> *Subject:* Re: [Swift-user] Execution error
>
>
>
> On 4/30/09 1:51 PM, Yue, Chen - BMD wrote:
>  > Hi Michael,
>  >
>  > When I tried to activate my account, I encountered the following error:
>  >
>  > "Sorry, this account is in an invalid state. You may not activate your
>  > at this time."
>  >
>  > I used the username and password from TG-CDA070002T. Should I use a
>  > different password?
>
> If you can already login to Ranger, then you are all set - you must have
> done this previously.
>
> I thought you had *not*, because when I looked up your login on ranger
> ("finger yuechen") it said "never logged in". But seems like that info
> is incorrect.
>
> If you have ptmap compiled, seems like you are almost all set.
>
> Let me know if it works.
>
> - Mike
>
>  > Thanks!
>  >
>  > Chen, Yue
>  >
>  >
>  > ------------------------------------------------------------------------
>  > *From:* Michael Wilde [mailto:wilde at mcs.anl.gov]
>  > *Sent:* Thu 4/30/2009 1:07 PM
>  > *To:* Yue, Chen - BMD
>  > *Cc:* swift user
>  > *Subject:* Re: [Swift-user] Execution error
>  >
>  > Yue, use this XML pool element to access ranger:
>  >
>  >   <pool handle="ranger">
>  >      <execution provider="coaster"
>  >                 url="gatekeeper.ranger.tacc.teragrid.org"
>  >                 jobManager="gt2:gt2:SGE"/>
>  >      <gridftp url="gsiftp://gridftp.ranger.tacc.teragrid.org:2811/" />
>  >      <profile namespace="env"
>  >               key="SWIFT_JOBDIR_PATH">/tmp/yuechen/jobdir</profile>
>  >      <profile namespace="globus" key="project">TG-CCR080022N</profile>
>  >      <profile namespace="globus" key="coastersPerNode">16</profile>
>  >      <profile namespace="globus" key="queue">development</profile>
>  >      <profile namespace="globus"
>  >               key="coasterWorkerMaxwalltime">00:40:00</profile>
>  >      <profile namespace="globus" key="maxwalltime">31</profile>
>  >      <profile namespace="karajan" key="initialScore">50</profile>
>  >      <profile namespace="karajan" key="jobThrottle">10</profile>
>  >      <workdirectory>/work/00306/tg455797/swiftwork</workdirectory>
>  >    </pool>
>  >
>  >
>  > You will need to also do these steps:
>  >
>  > Go to this web page to enable your Ranger account:
>  >
>  > https://tas.tacc.utexas.edu/TASMigration/AccountActivation.aspx
>  >
>  > Then login to Ranger via the TeraGrid portal and put your ssh keys in
>  > place (assuming you use ssh keys, which you should)
>  >
>  > While on Ranger, do this:
>  >
>  > echo $WORK
>  > mkdir $WORK/swiftwork
>  >
>  > and put the full path of your $WORK/swiftwork directory in the
>  > <workdirectory> element above. (My login is tg455etc, yours is yuechen)
>  >
>  > Then scp your code to Ranger and compile it.
>  >
>  > Then create a tc.data entry for your ptmap app
>  >
>  > Next, set your time values in the sites.xml entry above to suitable
>  > values for Ranger. You'll need to measure times, but I think you will
>  > find Ranger about twice as fast as Mercury for CPU-bound jobs.
>  >
>  > The values above were set for one app job per coaster. I think you can
>  > probably do more.
>  >
>  > If you estimate a run time of 5 minutes, use:
>  >
>  >      <profile namespace="globus"
>  >               key="coasterWorkerMaxwalltime">00:30:00</profile>
>  >      <profile namespace="globus" key="maxwalltime">5</profile>
>  >
>  > Other people on the list - please sanity check what I suggest here.
>  >
>  > - Mike
>  >
>  >
>  > On 4/30/09 12:40 PM, Michael Wilde wrote:
>  >  > I just checked - TG-CDA070002T has indeed expired.
>  >  >
>  >  > The best for now is to move to use (only) Ranger, under this account:
>  >  > TG-CCR080022N
>  >  >
>  >  > I will locate and send you a sites.xml entry in a moment.
>  >  >
>  >  > You need to go to a web page to activate your Ranger login.
>  >  >
>  >  > Best to contact me in IM and we can work this out.
>  >  >
>  >  > - Mike
>  >  >
>  >  >
>  >  >
>  >  > On 4/30/09 12:23 PM, Michael Wilde wrote:
>  >  >> Also, what account are you running under? We may need to change you
>  >  >> to a new account - as the OSG Training account expires today.
>  >  >> If that happened at noon, it *might* be the problem.
>  >  >>
>  >  >> - Mike
>  >  >>
>  >  >>
>  >  >> On 4/30/09 12:08 PM, Yue, Chen - BMD wrote:
>  >  >>> Hi,
>  >  >>>
>  >  >>> I came back to re-run my application on NCSA Mercury, which was tested
>  >  >>> successfully last week after I set up coasters with Swift 0.9,
>  >  >>> but I got many messages like the following:
>  >  >>>
>  >  >>> Progress:  Stage in:219  Submitting:803  Submitted:1
>  >  >>> Progress:  Stage in:129  Submitting:703  Submitted:190 Failed but can retry:1
>  >  >>> Progress:  Stage in:38  Submitting:425  Submitted:556 Failed but can retry:4
>  >  >>> Failed to transfer wrapper log from
>  >  >>> PTMap2-unmod-20090430-1203-r19dxq10/info/h on NCSA_MERCURY
>  >  >>> Failed to transfer wrapper log from
>  >  >>> PTMap2-unmod-20090430-1203-r19dxq10/info/j on NCSA_MERCURY
>  >  >>> Failed to transfer wrapper log from
>  >  >>> PTMap2-unmod-20090430-1203-r19dxq10/info/p on NCSA_MERCURY
>  >  >>> Failed to transfer wrapper log from
>  >  >>> PTMap2-unmod-20090430-1203-r19dxq10/info/1 on NCSA_MERCURY
>  >  >>> Failed to transfer wrapper log from
>  >  >>> PTMap2-unmod-20090430-1203-r19dxq10/info/b on NCSA_MERCURY
>  >  >>> Failed to transfer wrapper log from
>  >  >>> PTMap2-unmod-20090430-1203-r19dxq10/info/c on NCSA_MERCURY
>  >  >>> Progress:  Stage in:1  Submitted:1013  Active:1 Failed but can retry:8
>  >  >>> Progress:  Submitted:1011  Active:1 Failed but can retry:11
>  >  >>> The log file for the successful run last week is:
>  >  >>> /home/yuechen/PTMap2/PTMap2-unmod-20090422-1216-4s3037gf.log
>  >  >>>
>  >  >>> The log file for the failed run is :
>  >  >>> /home/yuechen/PTMap2/PTMap2-unmod-20090430-1151-rf2uuhb7.log
>  >  >>>
>  >  >>> I don't think I did anything different, so I don't know why they
>  >  >>> failed this time. The sites.xml for Mercury is:
>  >  >>>
>  >  >>>  <pool handle="NCSA_MERCURY">
>  >  >>>     <gridftp url="gsiftp://gridftp-hg.ncsa.teragrid.org"/>
>  >  >>>     <execution provider="coaster" url="grid-hg.ncsa.teragrid.org"
>  >  >>> jobManager="gt2:PBS"/>
>  >  >>>     <workdirectory>/gpfs_scratch1/yuechen/swiftwork</workdirectory>
>  >  >>>     <profile namespace="globus" key="queue">debug</profile>
>  >  >>>  </pool>
>  >  >>>
>  >  >>> Thank you for help!
>  >  >>>
>  >  >>> Chen, Yue
>  >  >>>
>  >  >>>
>  >  >>> This email is intended only for the use of the individual or entity
>  >  >>> to which it is addressed and may contain information that is
>  >  >>> privileged and confidential. If the reader of this email message is
>  >  >>> not the intended recipient, you are hereby notified that any
>  >  >>> dissemination, distribution, or copying of this communication is
>  >  >>> prohibited. If you have received this email in error, please notify
>  >  >>> the sender and destroy/delete all copies of the transmittal.
>  >  >>> Thank you.
>  >  >>>
>  >  >>>
>  >  >>>
>  > ------------------------------------------------------------------------
>  >  >>>
>  >  >>> _______________________________________________
>  >  >>> Swift-user mailing list
>  >  >>> Swift-user at ci.uchicago.edu
>  >  >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
>  >
>  >
>  >
>  >
>
> 
>
>




