<HTML dir=ltr><HEAD><TITLE>Re: [Swift-user] Execution error</TITLE>
<META http-equiv=Content-Type content="text/html; charset=unicode">
<META content="MSHTML 6.00.6000.16825" name=GENERATOR></HEAD>
<BODY>
<DIV id=idOWAReplyText3786 dir=ltr>
<DIV dir=ltr><FONT face=Arial color=#000000 size=2>Hi Michael,</FONT></DIV>
<DIV dir=ltr><FONT face=Arial size=2></FONT> </DIV>
<DIV dir=ltr><FONT face=Arial size=2>Thank you for the advice. I tested Ranger with 1 job and the new maxwalltime settings. It shows the following error message. I don't know if there is another problem with my setup. Thank you!</FONT></DIV>
<DIV dir=ltr> </DIV>
<DIV dir=ltr>/////////////////////////////////////////////////</DIV>
<DIV dir=ltr><FONT face=Arial size=2>[yuechen@communicado PTMap2]$ swift PTMap2-unmod.swift -sites.file sites.xml -tc.file tc.data<BR>Swift 0.9rc2 swift-r2860 cog-r2388</FONT></DIV>
<DIV dir=ltr><FONT face=Arial size=2>RunID: 20090430-1559-2vi6x811<BR>Progress:<BR>Progress: Stage in:1<BR>Progress: Submitting:1<BR>Progress: Submitting:1<BR>Progress: Submitted:1<BR>Progress: Active:1<BR>Failed to transfer wrapper log from PTMap2-unmod-20090430-1559-2vi6x811/info/i on ranger<BR>Progress: Active:1<BR>Failed to transfer wrapper log from PTMap2-unmod-20090430-1559-2vi6x811/info/k on ranger<BR>Progress: Stage in:1<BR>Progress: Active:1<BR>Failed to transfer wrapper log from PTMap2-unmod-20090430-1559-2vi6x811/info/m on ranger<BR>Progress: Failed:1<BR>Execution failed:<BR> Exception in PTMap2:<BR>Arguments: [e04.mzXML, ./seqs-ecolik12/fasta01, inputs-unmod.txt, parameters.txt]<BR>Host: ranger<BR>Directory: PTMap2-unmod-20090430-1559-2vi6x811/jobs/m/PTMap2-mbe6m5aj<BR>stderr.txt:</FONT></DIV>
<DIV dir=ltr><FONT face=Arial size=2>stdout.txt:</FONT></DIV>
<DIV dir=ltr><FONT face=Arial size=2>----</FONT></DIV>
<DIV dir=ltr><FONT face=Arial size=2>Caused by:<BR> Failed to start worker:<BR>null<BR>null<BR>org.globus.gram.GramException: The job manager detected an invalid script response<BR> at org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.statusChanged(JobSubmissionTaskHandler.java:530)<BR> at org.globus.gram.GramJob.setStatus(GramJob.java:184)<BR> at org.globus.gram.GramCallbackHandler.run(CallbackHandler.java:176)<BR> at java.lang.Thread.run(Thread.java:619)</FONT></DIV>
<DIV dir=ltr><FONT face=Arial size=2>Cleaning up...<BR>Shutting down service at </FONT><A href="https://129.114.50.163:45562"><FONT face=Arial size=2>https://129.114.50.163:45562</FONT></A><BR><FONT face=Arial size=2>Got channel MetaChannel: 20903429 -> GSSSChannel-null(1)<BR>- Done<BR>[yuechen@communicado PTMap2]$<BR></FONT></DIV></DIV>
<DIV dir=ltr>///////////////////////////////////////////////////////////</DIV>
<DIV dir=ltr> </DIV>
<DIV dir=ltr>Chen, Yue</DIV>
<DIV dir=ltr> </DIV>
<DIV dir=ltr><BR>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B> Michael Wilde [mailto:wilde@mcs.anl.gov]<BR><B>Sent:</B> Thu 4/30/2009 3:02 PM<BR><B>To:</B> Yue, Chen - BMD; swift-devel<BR><B>Subject:</B> Re: [Swift-user] Execution error<BR></FONT><BR></DIV>
<DIV>
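<DIV dir=ltr><FONT face=Arial size=2>The walltime mismatch discussed in this message can be checked by hand before submitting. What follows is a minimal Python sketch, not Swift's actual implementation. The helper names hms_to_seconds and task_fits_worker are hypothetical, and the sketch assumes the rule is simply that the app's maxwalltime (in minutes, from tc.data) must not exceed coasterWorkerMaxwalltime (HH:MM:SS, from sites.xml). The real coaster check may reserve extra overhead time.</FONT></DIV>
<PRE>
```python
def hms_to_seconds(hms):
    """Convert an HH:MM:SS string such as '00:40:00' to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def task_fits_worker(task_minutes, worker_hms):
    """Return True if the task walltime (minutes) fits in the worker walltime.

    Assumption: a plain comparison of the two limits. The real Swift/coaster
    check may be stricter (for example, by reserving overhead time).
    """
    return hms_to_seconds(worker_hms) >= task_minutes * 60


# Values from this thread: tc.data requests 50 minutes per PTMap2 job.
print(task_fits_worker(50, "00:40:00"))  # original sites.xml setting
print(task_fits_worker(50, "00:51:00"))  # suggested sites.xml setting
```
</PRE>
<DIV dir=ltr><FONT face=Arial size=2>With the values from this thread, the first call reports that a 50 minute job (3000 s) does not fit in a 40 minute worker (2400 s), and the second that it fits in the suggested 51 minute worker.</FONT></DIV>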
<P><FONT size=2>Back on list here (I only went off-list to discuss accounts, etc.)<BR><BR>The problem in the run below is this:<BR><BR>2009-04-30 14:29:41,265-0500 DEBUG vdl:execute2 APPLICATION_EXCEPTION<BR>jobid=PTMap2-abeii5aj - Application exception: Job cannot be run with<BR>the given max walltime worker constraint (task: 3000, \<BR>maxwalltime: 2400s)<BR><BR>You have this on the ptmap app in your tc.data:<BR><BR>globus::maxwalltime=50<BR><BR>But you only gave coasters 40 mins per coaster worker. So it's<BR>complaining that it can't run a 50 minute job (3000 s) in a 40 minute<BR>(2400 s max) coaster worker. ;)<BR><BR>I mentioned in a prior mail that you need to set the two time vals in<BR>your sites.xml entry; that's what you need to do next.<BR><BR>Change the coaster time in your sites.xml to:<BR> key="coasterWorkerMaxwalltime">00:51:00</profile><BR><BR>If you have more info on the variability of your ptmap run times, send<BR>that to the list, and we can discuss how to handle it.<BR><BR><BR>(NOTE: doing grep -i of the log for "except" or scanning for "except"<BR>with an editor will often locate the first "exception" that your job<BR>encountered. That's how I found the error above).<BR><BR>Also, Yue, for testing new sites, or for validating that old sites still<BR>work, you should create the smallest possible ptmap workflow - 1 job if<BR>that is possible - and verify that this works. Then say 10 jobs to make<BR>sure scheduling etc. is sane. 
Then, send in your huge jobs.<BR><BR>With only 1 job, it's easier to spot the errors in the log file.<BR><BR>- Mike<BR><BR><BR>On 4/30/09 2:34 PM, Yue, Chen - BMD wrote:<BR>> Hi Michael,<BR>> <BR>> I ran into the same messages again when I use Ranger:<BR>> <BR>> Progress: Selecting site:146 Stage in:25 Submitting:15 Submitted:821<BR>> Failed but can retry:16<BR>> Failed to transfer wrapper log from<BR>> PTMap2-unmod-20090430-1428-v0c5di5c/info/q on ranger<BR>> Failed to transfer wrapper log from<BR>> PTMap2-unmod-20090430-1428-v0c5di5c/info/l on ranger<BR>> Progress: Selecting site:146 Stage in:3 Submitting:1 Submitted:857<BR>> Failed but can retry:16<BR>> Failed to transfer wrapper log from<BR>> PTMap2-unmod-20090430-1428-v0c5di5c/info/v on ranger<BR>> Failed to transfer wrapper log from<BR>> PTMap2-unmod-20090430-1428-v0c5di5c/info/b on ranger<BR>> Failed to transfer wrapper log from<BR>> PTMap2-unmod-20090430-1428-v0c5di5c/info/0 on ranger<BR>> Failed to transfer wrapper log from<BR>> PTMap2-unmod-20090430-1428-v0c5di5c/info/a on ranger<BR>> Failed to transfer wrapper log from<BR>> PTMap2-unmod-20090430-1428-v0c5di5c/info/8 on ranger<BR>> Failed to transfer wrapper log from<BR>> PTMap2-unmod-20090430-1428-v0c5di5c/info/7 on ranger<BR>> Failed to transfer wrapper log from<BR>> PTMap2-unmod-20090430-1428-v0c5di5c/info/4 on ranger<BR>> Failed to transfer wrapper log from<BR>> PTMap2-unmod-20090430-1428-v0c5di5c/info/a on ranger<BR>> Failed to transfer wrapper log from<BR>> PTMap2-unmod-20090430-1428-v0c5di5c/info/4 on ranger<BR>> Failed to transfer wrapper log from<BR>> PTMap2-unmod-20090430-1428-v0c5di5c/info/8 on ranger<BR>> Failed to transfer wrapper log from<BR>> PTMap2-unmod-20090430-1428-v0c5di5c/info/7 on ranger<BR>> Failed to transfer wrapper log from<BR>> PTMap2-unmod-20090430-1428-v0c5di5c/info/x on ranger<BR>> Failed to transfer wrapper log from<BR>> PTMap2-unmod-20090430-1428-v0c5di5c/info/3 on ranger<BR>> Failed to transfer wrapper log from<BR>> 
PTMap2-unmod-20090430-1428-v0c5di5c/info/q on ranger<BR>> The log for the search is at : <BR>> /home/yuechen/PTMap2/PTMap2-unmod-20090430-1428-v0c5di5c.log<BR>> <BR>> The sites.xml I have is:<BR>> <BR>> <pool handle="ranger"><BR>> <execution provider="coaster"<BR>> url="gatekeeper.ranger.tacc.teragrid.org"<BR>> jobManager="gt2:gt2:SGE"/><BR>> <gridftp url="gsiftp://gridftp.ranger.tacc.teragrid.org:2811/" /><BR>> <profile namespace="env"<BR>> key="SWIFT_JOBDIR_PATH">/tmp/yuechen/jobdir</profile><BR>> <profile namespace="globus" key="project">TG-CCR080022N</profile><BR>> <profile namespace="globus" key="coastersPerNode">16</profile><BR>> <profile namespace="globus" key="queue">development</profile><BR>> <profile namespace="globus"<BR>> key="coasterWorkerMaxwalltime">00:40:00</profile><BR>> <profile namespace="globus" key="maxwalltime">31</profile><BR>> <profile namespace="karajan" key="initialScore">50</profile><BR>> <profile namespace="karajan" key="jobThrottle">10</profile><BR>> <workdirectory>/work/01164/yuechen/swiftwork</workdirectory><BR>> </pool><BR>> The tc.data I have is:<BR>> <BR>> ranger PTMap2 <BR>> /share/home/01164/yuechen/PTMap2/PTMap2 INSTALLED <BR>> INTEL32::LINUX globus::maxwalltime=50<BR>><BR>> I'm using swift 0.9 rc2<BR>><BR>> Thank you very much for help!<BR>><BR>> Chen, Yue<BR>><BR>> <BR>><BR>> ------------------------------------------------------------------------<BR>> *From:* Michael Wilde [<A href="mailto:wilde@mcs.anl.gov">mailto:wilde@mcs.anl.gov</A>]<BR>> *Sent:* Thu 4/30/2009 2:05 PM<BR>> *To:* Yue, Chen - BMD<BR>> *Subject:* Re: [Swift-user] Execution error<BR>><BR>><BR>><BR>> On 4/30/09 1:51 PM, Yue, Chen - BMD wrote:<BR>> > Hi Michael,<BR>> ><BR>> > When I tried to activate my account, I encountered the following error:<BR>> ><BR>> > "Sorry, this account is in an invalid state. You may not activate your<BR>> > at this time."<BR>> ><BR>> > I used the username and password from TG-CDA070002T. 
Should I use a<BR>> > different password?<BR>><BR>> If you can already login to Ranger, then you are all set - you must have<BR>> done this previously.<BR>><BR>> I thought you had *not*, because when I looked up your login on ranger<BR>> ("finger yuechen") it said "never logged in". But seems like that info<BR>> is incorrect.<BR>><BR>> If you have ptmap compiled, seems like you are almost all set.<BR>><BR>> Let me know if it works.<BR>><BR>> - Mike<BR>><BR>> > Thanks!<BR>> ><BR>> > Chen, Yue<BR>> ><BR>> ><BR>> > ------------------------------------------------------------------------<BR>> > *From:* Michael Wilde [<A href="mailto:wilde@mcs.anl.gov">mailto:wilde@mcs.anl.gov</A>]<BR>> > *Sent:* Thu 4/30/2009 1:07 PM<BR>> > *To:* Yue, Chen - BMD<BR>> > *Cc:* swift user<BR>> > *Subject:* Re: [Swift-user] Execution error<BR>> ><BR>> > Yue, use this XML pool element to access ranger:<BR>> ><BR>> > <pool handle="ranger"><BR>> > <execution provider="coaster"<BR>> > url="gatekeeper.ranger.tacc.teragrid.org"<BR>> > jobManager="gt2:gt2:SGE"/><BR>> > <gridftp url="gsiftp://gridftp.ranger.tacc.teragrid.org:2811/" /><BR>> > <profile namespace="env"<BR>> > key="SWIFT_JOBDIR_PATH">/tmp/yuechen/jobdir</profile><BR>> > <profile namespace="globus" key="project">TG-CCR080022N</profile><BR>> > <profile namespace="globus" key="coastersPerNode">16</profile><BR>> > <profile namespace="globus" key="queue">development</profile><BR>> > <profile namespace="globus"<BR>> > key="coasterWorkerMaxwalltime">00:40:00</profile><BR>> > <profile namespace="globus" key="maxwalltime">31</profile><BR>> > <profile namespace="karajan" key="initialScore">50</profile><BR>> > <profile namespace="karajan" key="jobThrottle">10</profile><BR>> > <workdirectory>/work/00306/tg455797/swiftwork</workdirectory><BR>> > </pool><BR>> ><BR>> ><BR>> > You will need to also do these steps:<BR>> ><BR>> > Go to this web page to enable your Ranger account:<BR>> ><BR>> > <A 
href="https://tas.tacc.utexas.edu/TASMigration/AccountActivation.aspx">https://tas.tacc.utexas.edu/TASMigration/AccountActivation.aspx</A><BR>> ><BR>> > Then log in to Ranger via the TeraGrid portal and put your ssh keys in<BR>> > place (assuming you use ssh keys, which you should)<BR>> ><BR>> > While on Ranger, do this:<BR>> ><BR>> > echo $WORK<BR>> > mkdir $WORK/swiftwork<BR>> ><BR>> > and put the full path of your $WORK/swiftwork directory in the<BR>> > <workdirectory> element above. (My login is tg455etc, yours is yuechen)<BR>> ><BR>> > Then scp your code to Ranger and compile it.<BR>> ><BR>> > Then create a tc.data entry for your ptmap app<BR>> ><BR>> > Next, set your time values in the sites.xml entry above to suitable<BR>> > values for Ranger. You'll need to measure times, but I think you will<BR>> > find Ranger about twice as fast as Mercury for CPU-bound jobs.<BR>> ><BR>> > The values above were set for one app job per coaster. I think you can<BR>> > probably do more.<BR>> ><BR>> > If you estimate a run time of 5 minutes, use:<BR>> ><BR>> > <profile namespace="globus"<BR>> > key="coasterWorkerMaxwalltime">00:30:00</profile><BR>> > <profile namespace="globus" key="maxwalltime">5</profile><BR>> ><BR>> > Other people on the list - please sanity check what I suggest here.<BR>> ><BR>> > - Mike<BR>> ><BR>> ><BR>> > On 4/30/09 12:40 PM, Michael Wilde wrote:<BR>> > > I just checked - TG-CDA070002T has indeed expired.<BR>> > ><BR>> > > The best option for now is to move to using (only) Ranger, under this account:<BR>> > > TG-CCR080022N<BR>> > ><BR>> > > I will locate and send you a sites.xml entry in a moment.<BR>> > ><BR>> > > You need to go to a web page to activate your Ranger login.<BR>> > ><BR>> > > Best to contact me in IM and we can work this out.<BR>> > ><BR>> > > - Mike<BR>> > ><BR>> > ><BR>> > ><BR>> > > On 4/30/09 12:23 PM, Michael Wilde wrote:<BR>> > >> Also, what account are you running under? 
We may need to change you to<BR>> > >> a new account - as the OSG Training account expires today.<BR>> > >> If that happened at noon, it *might* be the problem.<BR>> > >><BR>> > >> - Mike<BR>> > >><BR>> > >><BR>> > >> On 4/30/09 12:08 PM, Yue, Chen - BMD wrote:<BR>> > >>> Hi,<BR>> > >>><BR>> > >>> I came back to re-run my application on NCSA Mercury which was tested<BR>> > >>> successfully last week after I just set up coasters with swift 0.9,<BR>> > >>> but I got many messages like the following:<BR>> > >>><BR>> > >>> Progress: Stage in:219 Submitting:803 Submitted:1<BR>> > >>> Progress: Stage in:129 Submitting:703 Submitted:190 Failed but can<BR>> > >>> retry:1<BR>> > >>> Progress: Stage in:38 Submitting:425 Submitted:556 Failed but can<BR>> > >>> retry:4<BR>> > >>> Failed to transfer wrapper log from<BR>> > >>> PTMap2-unmod-20090430-1203-r19dxq10/info/h on NCSA_MERCURY<BR>> > >>> Failed to transfer wrapper log from<BR>> > >>> PTMap2-unmod-20090430-1203-r19dxq10/info/j on NCSA_MERCURY<BR>> > >>> Failed to transfer wrapper log from<BR>> > >>> PTMap2-unmod-20090430-1203-r19dxq10/info/p on NCSA_MERCURY<BR>> > >>> Failed to transfer wrapper log from<BR>> > >>> PTMap2-unmod-20090430-1203-r19dxq10/info/1 on NCSA_MERCURY<BR>> > >>> Failed to transfer wrapper log from<BR>> > >>> PTMap2-unmod-20090430-1203-r19dxq10/info/b on NCSA_MERCURY<BR>> > >>> Failed to transfer wrapper log from<BR>> > >>> PTMap2-unmod-20090430-1203-r19dxq10/info/c on NCSA_MERCURY<BR>> > >>> Progress: Stage in:1 Submitted:1013 Active:1 Failed but can retry:8<BR>> > >>> Progress: Submitted:1011 Active:1 Failed but can retry:11<BR>> > >>> The log file for the successful run last week is:<BR>> > >>> /home/yuechen/PTMap2/PTMap2-unmod-20090422-1216-4s3037gf.log<BR>> > >>><BR>> > >>> The log file for the failed run is:<BR>> > >>> /home/yuechen/PTMap2/PTMap2-unmod-20090430-1151-rf2uuhb7.log<BR>> > >>><BR>> > >>> I don't think I did anything different, so I don't know why this<BR>> 
time<BR>> > >>> they failed. The sites.xml for Mercury is:<BR>> > >>><BR>> > >>> <pool handle="NCSA_MERCURY"><BR>> > >>> <gridftp url="gsiftp://gridftp-hg.ncsa.teragrid.org"/><BR>> > >>> <execution provider="coaster" url="grid-hg.ncsa.teragrid.org"<BR>> > >>> jobManager="gt2:PBS"/><BR>> > >>> <workdirectory>/gpfs_scratch1/yuechen/swiftwork</workdirectory><BR>> > >>> <profile namespace="globus" key="queue">debug</profile><BR>> > >>> </pool><BR>> > >>><BR>> > >>> Thank you for help!<BR>> > >>><BR>> > >>> Chen, Yue<BR>> > >>><BR>> > >>><BR>> > >>><BR>> > >>><BR>> > >>><BR>> > >>><BR>> > >>><BR>> > >>><BR>> > >>><BR>> > >>><BR>> > >>><BR>> > >>> This email is intended only for the use of the individual or entity<BR>> > >>> to which it is addressed and may contain information that is<BR>> > >>> privileged and confidential. If the reader of this email message is<BR>> > >>> not the intended recipient, you are hereby notified that any<BR>> > >>> dissemination, distribution, or copying of this communication is<BR>> > >>> prohibited. 
If you have received this email in error, please notify<BR>> > >>> the sender and destroy/delete all copies of the transmittal.<BR>> Thank you.<BR>> > >>><BR>> > >>><BR>> > >>><BR>> > ------------------------------------------------------------------------<BR>> > >>><BR>> > >>> _______________________________________________<BR>> > >>> Swift-user mailing list<BR>> > >>> Swift-user@ci.uchicago.edu<BR>> > >>> <A href="http://mail.ci.uchicago.edu/mailman/listinfo/swift-user">http://mail.ci.uchicago.edu/mailman/listinfo/swift-user</A><BR>> > >> _______________________________________________<BR>> > >> Swift-user mailing list<BR>> > >> Swift-user@ci.uchicago.edu<BR>> > >> <A href="http://mail.ci.uchicago.edu/mailman/listinfo/swift-user">http://mail.ci.uchicago.edu/mailman/listinfo/swift-user</A><BR>> > > _______________________________________________<BR>> > > Swift-user mailing list<BR>> > > Swift-user@ci.uchicago.edu<BR>> > > <A href="http://mail.ci.uchicago.edu/mailman/listinfo/swift-user">http://mail.ci.uchicago.edu/mailman/listinfo/swift-user</A><BR>> ><BR>> ><BR>> ><BR>> ><BR>> > This email is intended only for the use of the individual or entity to<BR>> > which it is addressed and may contain information that is privileged and<BR>> > confidential. If the reader of this email message is not the intended<BR>> > recipient, you are hereby notified that any dissemination, distribution,<BR>> > or copying of this communication is prohibited. If you have received<BR>> > this email in error, please notify the sender and destroy/delete all<BR>> > copies of the transmittal. Thank you.<BR>><BR>> <BR>><BR>><BR>> This email is intended only for the use of the individual or entity to<BR>> which it is addressed and may contain information that is privileged and<BR>> confidential. If the reader of this email message is not the intended<BR>> recipient, you are hereby notified that any dissemination, distribution,<BR>> or copying of this communication is prohibited. 
If you have received<BR>> this email in error, please notify the sender and destroy/delete all<BR>> copies of the transmittal. Thank you.<BR></FONT></P></DIV><DIV> </DIV></BODY></HTML>