[Swift-devel] RE: [Swift-user] Execution error
Zhao Zhang
zhaozhang at uchicago.edu
Thu Apr 30 16:52:43 CDT 2009
Hi, Glen
Can you point me to the working Swift setup on Ranger?
zhao
Glen Hocky wrote:
> I get the same response on Ranger. It started yesterday evening.
> Possibly a problem that the TACC folks need to fix?
>
> Glen
>
> Yue, Chen - BMD wrote:
>> Hi Michael,
>>
>> Thank you for the advice. I tested Ranger with 1 job and the new
>> maxwalltime settings. It shows the following error message.
>> I don't know if there is another problem with my setup. Thank you!
>>
>> /////////////////////////////////////////////////
>> [yuechen at communicado PTMap2]$ swift PTMap2-unmod.swift -sites.file sites.xml -tc.file tc.data
>> Swift 0.9rc2 swift-r2860 cog-r2388
>> RunID: 20090430-1559-2vi6x811
>> Progress:
>> Progress: Stage in:1
>> Progress: Submitting:1
>> Progress: Submitting:1
>> Progress: Submitted:1
>> Progress: Active:1
>> Failed to transfer wrapper log from
>> PTMap2-unmod-20090430-1559-2vi6x811/info/i on ranger
>> Progress: Active:1
>> Failed to transfer wrapper log from
>> PTMap2-unmod-20090430-1559-2vi6x811/info/k on ranger
>> Progress: Stage in:1
>> Progress: Active:1
>> Failed to transfer wrapper log from
>> PTMap2-unmod-20090430-1559-2vi6x811/info/m on ranger
>> Progress: Failed:1
>> Execution failed:
>> Exception in PTMap2:
>> Arguments: [e04.mzXML, ./seqs-ecolik12/fasta01, inputs-unmod.txt, parameters.txt]
>> Host: ranger
>> Directory: PTMap2-unmod-20090430-1559-2vi6x811/jobs/m/PTMap2-mbe6m5aj
>> stderr.txt:
>> stdout.txt:
>> ----
>> Caused by:
>> Failed to start worker:
>> null
>> null
>> org.globus.gram.GramException: The job manager detected an invalid script response
>> at org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.statusChanged(JobSubmissionTaskHandler.java:530)
>> at org.globus.gram.GramJob.setStatus(GramJob.java:184)
>> at org.globus.gram.GramCallbackHandler.run(CallbackHandler.java:176)
>> at java.lang.Thread.run(Thread.java:619)
>> Cleaning up...
>> Shutting down service at https://129.114.50.163:45562
>> Got channel MetaChannel: 20903429 -> GSSSChannel-null(1)
>> - Done
>> [yuechen at communicado PTMap2]$
>> ///////////////////////////////////////////////////////////
>>
>> Chen, Yue
>>
>>
>> *From:* Michael Wilde [mailto:wilde at mcs.anl.gov]
>> *Sent:* Thu 4/30/2009 3:02 PM
>> *To:* Yue, Chen - BMD; swift-devel
>> *Subject:* Re: [Swift-user] Execution error
>>
>> Back on list here (I only went off-list to discuss accounts, etc)
>>
>> The problem in the run below is this:
>>
>> 2009-04-30 14:29:41,265-0500 DEBUG vdl:execute2 APPLICATION_EXCEPTION
>> jobid=PTMap2-abeii5aj - Application exception: Job cannot be run with
>> the given max walltime worker constraint (task: 3000, \
>> maxwalltime: 2400s)
>>
>> You have this on the ptmap app in your tc.data:
>>
>> globus::maxwalltime=50
>>
>> But you only gave coasters 40 minutes per coaster worker. So it's
>> complaining that it can't run a 50-minute job in a 40-minute (max)
>> coaster worker. ;)
>>
>> I mentioned in a prior mail that you need to set the two time values in
>> your sites.xml entry; that's what you need to do next.
>>
>> Change the coaster time in your sites.xml to:
>> <profile namespace="globus" key="coasterWorkerMaxwalltime">00:51:00</profile>
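The failure above comes down to comparing two durations: the app's globus::maxwalltime from tc.data (in minutes) and coasterWorkerMaxwalltime from sites.xml (HH:MM:SS). The Python sketch below reproduces the arithmetic behind the log line (task: 3000 vs maxwalltime: 2400s). It assumes the check is a plain "task time <= worker time" comparison with no allowance for worker startup overhead; hms_to_seconds and job_fits_worker are hypothetical helpers, not part of Swift.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string (the coasterWorkerMaxwalltime format) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def job_fits_worker(task_maxwalltime_min: int, worker_maxwalltime_hms: str) -> bool:
    """Return True if a job with this maxwalltime (minutes) fits in one coaster worker.

    Hypothetical helper: assumes a simple task <= worker comparison,
    ignoring any worker startup overhead.
    """
    task_s = task_maxwalltime_min * 60
    worker_s = hms_to_seconds(worker_maxwalltime_hms)
    return task_s <= worker_s


if __name__ == "__main__":
    # Failing run: 50-minute task (3000 s) vs a 00:40:00 worker (2400 s).
    print(job_fits_worker(50, "00:40:00"))
    # Suggested fix: a 00:51:00 worker (3060 s).
    print(job_fits_worker(50, "00:51:00"))
```

With 00:51:00 the worker has 3060 s, one minute more than the 3000 s the job is allowed to use.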
>>
>> If you have more info on the variability of your ptmap run times, send
>> that to the list, and we can discuss how to handle.
>>
>>
>> (NOTE: running grep -i on the log for "except", or searching for "except"
>> with an editor, will often locate the first exception that your job
>> encountered. That's how I found the error above.)
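On the command line, grep -i -n except <logfile> | head -n 1 prints the first matching line together with its line number. The same search as a small Python sketch, handy if you want to post-process the result; first_exception is a hypothetical helper name, and the log path is taken from the first command-line argument.

```python
import sys


def first_exception(lines):
    """Return (line_number, text) of the first line containing "except"
    (case-insensitive), or None if no line matches."""
    for number, line in enumerate(lines, start=1):
        if "except" in line.lower():
            return number, line.rstrip("\n")
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python first_exception.py PTMap2-unmod-<runid>.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        hit = first_exception(log)
    print(hit if hit else "no line mentioning 'except' found")
```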
>>
>> Also, Yue, for testing new sites, or for validating that old sites still
>> work, you should create the smallest possible ptmap workflow - 1 job if
>> that is possible - and verify that this works. Then say 10 jobs to make
>> sure scheduling etc is sane. Then, send in your huge jobs.
>>
>> With only 1 job, it's easier to spot the errors in the log file.
>>
>> - Mike
>>
>>
>> On 4/30/09 2:34 PM, Yue, Chen - BMD wrote:
>> > Hi Michael,
>> > I ran into the same messages again when I used Ranger:
>> >
>> > Progress: Selecting site:146 Stage in:25 Submitting:15 Submitted:821 Failed but can retry:16
>> > Failed to transfer wrapper log from
>> > PTMap2-unmod-20090430-1428-v0c5di5c/info/q on ranger
>> > Failed to transfer wrapper log from
>> > PTMap2-unmod-20090430-1428-v0c5di5c/info/l on ranger
>> > Progress: Selecting site:146 Stage in:3 Submitting:1 Submitted:857 Failed but can retry:16
>> > Failed to transfer wrapper log from
>> > PTMap2-unmod-20090430-1428-v0c5di5c/info/v on ranger
>> > Failed to transfer wrapper log from
>> > PTMap2-unmod-20090430-1428-v0c5di5c/info/b on ranger
>> > Failed to transfer wrapper log from
>> > PTMap2-unmod-20090430-1428-v0c5di5c/info/0 on ranger
>> > Failed to transfer wrapper log from
>> > PTMap2-unmod-20090430-1428-v0c5di5c/info/a on ranger
>> > Failed to transfer wrapper log from
>> > PTMap2-unmod-20090430-1428-v0c5di5c/info/8 on ranger
>> > Failed to transfer wrapper log from
>> > PTMap2-unmod-20090430-1428-v0c5di5c/info/7 on ranger
>> > Failed to transfer wrapper log from
>> > PTMap2-unmod-20090430-1428-v0c5di5c/info/4 on ranger
>> > Failed to transfer wrapper log from
>> > PTMap2-unmod-20090430-1428-v0c5di5c/info/a on ranger
>> > Failed to transfer wrapper log from
>> > PTMap2-unmod-20090430-1428-v0c5di5c/info/4 on ranger
>> > Failed to transfer wrapper log from
>> > PTMap2-unmod-20090430-1428-v0c5di5c/info/8 on ranger
>> > Failed to transfer wrapper log from
>> > PTMap2-unmod-20090430-1428-v0c5di5c/info/7 on ranger
>> > Failed to transfer wrapper log from
>> > PTMap2-unmod-20090430-1428-v0c5di5c/info/x on ranger
>> > Failed to transfer wrapper log from
>> > PTMap2-unmod-20090430-1428-v0c5di5c/info/3 on ranger
>> > Failed to transfer wrapper log from
>> > PTMap2-unmod-20090430-1428-v0c5di5c/info/q on ranger
>> > The log for the search is at:
>> > /home/yuechen/PTMap2/PTMap2-unmod-20090430-1428-v0c5di5c.log
>> >
>> > The sites.xml I have is:
>> >
>> > <pool handle="ranger">
>> > <execution provider="coaster"
>> > url="gatekeeper.ranger.tacc.teragrid.org"
>> > jobManager="gt2:gt2:SGE"/>
>> > <gridftp url="gsiftp://gridftp.ranger.tacc.teragrid.org:2811/" />
>> > <profile namespace="env"
>> > key="SWIFT_JOBDIR_PATH">/tmp/yuechen/jobdir</profile>
>> > <profile namespace="globus" key="project">TG-CCR080022N</profile>
>> > <profile namespace="globus" key="coastersPerNode">16</profile>
>> > <profile namespace="globus" key="queue">development</profile>
>> > <profile namespace="globus"
>> > key="coasterWorkerMaxwalltime">00:40:00</profile>
>> > <profile namespace="globus" key="maxwalltime">31</profile>
>> > <profile namespace="karajan" key="initialScore">50</profile>
>> > <profile namespace="karajan" key="jobThrottle">10</profile>
>> > <workdirectory>/work/01164/yuechen/swiftwork</workdirectory>
>> > </pool>
>> > The tc.data I have is:
>> >
>> > ranger PTMap2 /share/home/01164/yuechen/PTMap2/PTMap2 INSTALLED INTEL32::LINUX globus::maxwalltime=50
>> >
>> > I'm using Swift 0.9rc2.
>> >
>> > Thank you very much for help!
>> >
>> > Chen, Yue
>> >
>> > >
>> >
>> ------------------------------------------------------------------------
>> > *From:* Michael Wilde [mailto:wilde at mcs.anl.gov]
>> > *Sent:* Thu 4/30/2009 2:05 PM
>> > *To:* Yue, Chen - BMD
>> > *Subject:* Re: [Swift-user] Execution error
>> >
>> >
>> >
>> > On 4/30/09 1:51 PM, Yue, Chen - BMD wrote:
>> > > Hi Michael,
>> > >
>> > > When I tried to activate my account, I encountered the following error:
>> > >
>> > > "Sorry, this account is in an invalid state. You may not activate your
>> > > at this time."
>> > >
>> > > I used the username and password from TG-CDA070002T. Should I use a
>> > > different password?
>> >
>> > If you can already log in to Ranger, then you are all set - you must have
>> > done this previously.
>> >
>> > I thought you had *not*, because when I looked up your login on ranger
>> > ("finger yuechen") it said "never logged in". But seems like that info
>> > is incorrect.
>> >
>> > If you have ptmap compiled, seems like you are almost all set.
>> >
>> > Let me know if it works.
>> >
>> > - Mike
>> >
>> > > Thanks!
>> > >
>> > > Chen, Yue
>> > >
>> > >
>> > >
>> ------------------------------------------------------------------------
>> > > *From:* Michael Wilde [mailto:wilde at mcs.anl.gov]
>> > > *Sent:* Thu 4/30/2009 1:07 PM
>> > > *To:* Yue, Chen - BMD
>> > > *Cc:* swift user
>> > > *Subject:* Re: [Swift-user] Execution error
>> > >
>> > > Yue, use this XML pool element to access ranger:
>> > >
>> > > <pool handle="ranger">
>> > > <execution provider="coaster"
>> > > url="gatekeeper.ranger.tacc.teragrid.org"
>> > > jobManager="gt2:gt2:SGE"/>
>> > > <gridftp url="gsiftp://gridftp.ranger.tacc.teragrid.org:2811/" />
>> > > <profile namespace="env"
>> > > key="SWIFT_JOBDIR_PATH">/tmp/yuechen/jobdir</profile>
>> > > <profile namespace="globus" key="project">TG-CCR080022N</profile>
>> > > <profile namespace="globus" key="coastersPerNode">16</profile>
>> > > <profile namespace="globus" key="queue">development</profile>
>> > > <profile namespace="globus"
>> > > key="coasterWorkerMaxwalltime">00:40:00</profile>
>> > > <profile namespace="globus" key="maxwalltime">31</profile>
>> > > <profile namespace="karajan" key="initialScore">50</profile>
>> > > <profile namespace="karajan" key="jobThrottle">10</profile>
>> > > <workdirectory>/work/00306/tg455797/swiftwork</workdirectory>
>> > > </pool>
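Before submitting, it can help to confirm that the pool entry is well-formed XML and to see which time values it actually sets. A minimal sketch using Python's standard xml.etree.ElementTree; POOL_XML is an abbreviated copy of the entry above (only the elements this check reads), and read_profiles is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the ranger pool entry above.
POOL_XML = """
<pool handle="ranger">
  <execution provider="coaster" url="gatekeeper.ranger.tacc.teragrid.org"
             jobManager="gt2:gt2:SGE"/>
  <profile namespace="globus" key="coasterWorkerMaxwalltime">00:40:00</profile>
  <profile namespace="globus" key="maxwalltime">31</profile>
  <workdirectory>/work/00306/tg455797/swiftwork</workdirectory>
</pool>
"""


def read_profiles(pool_xml: str) -> dict:
    """Parse one <pool> element (raises ParseError if malformed) and return
    {(namespace, key): value} for its <profile> children."""
    pool = ET.fromstring(pool_xml.strip())
    return {
        (p.get("namespace"), p.get("key")): (p.text or "").strip()
        for p in pool.findall("profile")
    }


if __name__ == "__main__":
    profiles = read_profiles(POOL_XML)
    print(profiles[("globus", "coasterWorkerMaxwalltime")])
    print(profiles[("globus", "maxwalltime")])
```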
>> > >
>> > >
>> > > You will need to also do these steps:
>> > >
>> > > Go to this web page to enable your Ranger account:
>> > >
>> > > https://tas.tacc.utexas.edu/TASMigration/AccountActivation.aspx
>> > >
>> > > Then log in to Ranger via the TeraGrid portal and put your ssh keys in
>> > > place (assuming you use ssh keys, which you should).
>> > >
>> > > While on Ranger, do this:
>> > >
>> > > echo $WORK
>> > > mkdir $WORK/swiftwork
>> > >
>> > > and put the full path of your $WORK/swiftwork directory in the
>> > > <workdirectory> element above. (My login is tg455etc, yours is yuechen.)
>> > >
>> > > Then scp your code to Ranger and compile it.
>> > >
>> > > Then create a tc.data entry for your ptmap app
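For reference, a tc.data entry is one whitespace-separated line; the column order below follows the Ranger entry quoted earlier in this thread (site, transformation name, executable path, install status, platform, profiles). This Python sketch assembles such a line; tc_data_line is a hypothetical helper, it separates columns with tabs (the entry quoted above uses spaces), and both the ";" separator for multiple profiles and "null" for an empty profile column are assumptions to check against the Swift documentation for your version.

```python
def tc_data_line(site, transformation, path, profiles=(),
                 status="INSTALLED", platform="INTEL32::LINUX"):
    """Build one tab-separated tc.data entry.

    profiles is a sequence of (namespace, key, value) tuples, written as
    namespace::key=value and joined with ';' (assumed separator).
    An empty profile column is written as "null" (assumption).
    """
    if profiles:
        profile_field = ";".join(f"{ns}::{key}={value}" for ns, key, value in profiles)
    else:
        profile_field = "null"
    return "\t".join([site, transformation, path, status, platform, profile_field])


if __name__ == "__main__":
    print(tc_data_line("ranger", "PTMap2",
                       "/share/home/01164/yuechen/PTMap2/PTMap2",
                       [("globus", "maxwalltime", 50)]))
```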
>> > >
>> > > Next, set your time values in the sites.xml entry above to suitable
>> > > values for Ranger. You'll need to measure times, but I think you will
>> > > find Ranger about twice as fast as Mercury for CPU-bound jobs.
>> > >
>> > > The values above were set for one app job per coaster. I think you can
>> > > probably do more.
>> > >
>> > > If you estimate a run time of 5 minutes, use:
>> > >
>> > > <profile namespace="globus"
>> > > key="coasterWorkerMaxwalltime">00:30:00</profile>
>> > > <profile namespace="globus" key="maxwalltime">5</profile>
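A rough way to see how these two values interact: if each job may run up to maxwalltime minutes, a worker with a given coasterWorkerMaxwalltime can hold at most floor(worker / job) such jobs back to back. The sketch below assumes jobs run sequentially on a worker with no startup or per-job overhead, so it gives an upper bound rather than a description of how the coaster scheduler actually packs jobs; jobs_per_worker is a hypothetical helper.

```python
def jobs_per_worker(job_maxwalltime_min: int, worker_maxwalltime_hms: str) -> int:
    """Upper bound on how many jobs of the given maxwalltime (minutes) fit
    back to back in one coaster worker of the given HH:MM:SS walltime.

    Hypothetical helper: ignores worker startup and per-job overhead.
    """
    hours, minutes, seconds = (int(part) for part in worker_maxwalltime_hms.split(":"))
    worker_s = hours * 3600 + minutes * 60 + seconds
    return worker_s // (job_maxwalltime_min * 60)


if __name__ == "__main__":
    print(jobs_per_worker(31, "00:40:00"))  # original values in the entry above
    print(jobs_per_worker(5, "00:30:00"))   # the 5-minute example
```

With the original values (31 minutes in a 40-minute worker) this gives 1, matching the "one app job per coaster" remark; with 5 minutes in 30 minutes it gives 6.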
>> > >
>> > > Other people on the list - please sanity check what I suggest here.
>> > >
>> > > - Mike
>> > >
>> > >
>> > > On 4/30/09 12:40 PM, Michael Wilde wrote:
>> > > > I just checked - TG-CDA070002T has indeed expired.
>> > > >
>> > > > The best option for now is to move to (only) Ranger, under this account:
>> > > > TG-CCR080022N
>> > > >
>> > > > I will locate and send you a sites.xml entry in a moment.
>> > > >
>> > > > You need to go to a web page to activate your Ranger login.
>> > > >
>> > > > Best to contact me in IM and we can work this out.
>> > > >
>> > > > - Mike
>> > > >
>> > > >
>> > > >
>> > > > On 4/30/09 12:23 PM, Michael Wilde wrote:
>> > > >> Also, what account are you running under? We may need to change you to
>> > > >> a new account, as the OSG Training account expires today.
>> > > >> If that happened at noon, it *might* be the problem.
>> > > >>
>> > > >> - Mike
>> > > >>
>> > > >>
>> > > >> On 4/30/09 12:08 PM, Yue, Chen - BMD wrote:
>> > > >>> Hi,
>> > > >>>
>> > > >>> I came back to re-run my application on NCSA Mercury, which ran
>> > > >>> successfully last week right after I set up coasters with Swift 0.9,
>> > > >>> but I got many messages like the following:
>> > > >>>
>> > > >>> Progress: Stage in:219 Submitting:803 Submitted:1
>> > > >>> Progress: Stage in:129 Submitting:703 Submitted:190 Failed but can retry:1
>> > > >>> Progress: Stage in:38 Submitting:425 Submitted:556 Failed but can retry:4
>> > > >>> Failed to transfer wrapper log from
>> > > >>> PTMap2-unmod-20090430-1203-r19dxq10/info/h on NCSA_MERCURY
>> > > >>> Failed to transfer wrapper log from
>> > > >>> PTMap2-unmod-20090430-1203-r19dxq10/info/j on NCSA_MERCURY
>> > > >>> Failed to transfer wrapper log from
>> > > >>> PTMap2-unmod-20090430-1203-r19dxq10/info/p on NCSA_MERCURY
>> > > >>> Failed to transfer wrapper log from
>> > > >>> PTMap2-unmod-20090430-1203-r19dxq10/info/1 on NCSA_MERCURY
>> > > >>> Failed to transfer wrapper log from
>> > > >>> PTMap2-unmod-20090430-1203-r19dxq10/info/b on NCSA_MERCURY
>> > > >>> Failed to transfer wrapper log from
>> > > >>> PTMap2-unmod-20090430-1203-r19dxq10/info/c on NCSA_MERCURY
>> > > >>> Progress: Stage in:1 Submitted:1013 Active:1 Failed but can retry:8
>> > > >>> Progress: Submitted:1011 Active:1 Failed but can retry:11
>> > > >>> The log file for the successful run last week is:
>> > > >>> /home/yuechen/PTMap2/PTMap2-unmod-20090422-1216-4s3037gf.log
>> > > >>>
>> > > >>> The log file for the failed run is:
>> > > >>> /home/yuechen/PTMap2/PTMap2-unmod-20090430-1151-rf2uuhb7.log
>> > > >>>
>> > > >>> I don't think I did anything different, so I don't know why they
>> > > >>> failed this time. The sites.xml for Mercury is:
>> > > >>>
>> > > >>> <pool handle="NCSA_MERCURY">
>> > > >>> <gridftp url="gsiftp://gridftp-hg.ncsa.teragrid.org"/>
>> > > >>> <execution provider="coaster" url="grid-hg.ncsa.teragrid.org"
>> > > >>> jobManager="gt2:PBS"/>
>> > > >>> <workdirectory>/gpfs_scratch1/yuechen/swiftwork</workdirectory>
>> > > >>> <profile namespace="globus" key="queue">debug</profile>
>> > > >>> </pool>
>> > > >>>
>> > > >>> Thank you for help!
>> > > >>>
>> > > >>> Chen, Yue
>> > > >>>
>> > > >>>
>> > > >>>
>> > > >>> This email is intended only for the use of the individual
>> or entity
>> > > >>> to which it is addressed and may contain information that is
>> > > >>> privileged and confidential. If the reader of this email
>> message is
>> > > >>> not the intended recipient, you are hereby notified that any
>> > > >>> dissemination, distribution, or copying of this
>> communication is
>> > > >>> prohibited. If you have received this email in error,
>> please notify
>> > > >>> the sender and destroy/delete all copies of the transmittal.
>> > Thank you.
>> > > >>>
>> > > >>>
>> > > >>>
>> > >
>> ------------------------------------------------------------------------
>> > > >>>
>> > > >>> _______________________________________________
>> > > >>> Swift-user mailing list
>> > > >>> Swift-user at ci.uchicago.edu
>> > > >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
>> > >
>> > >
>> > >
>> > >
>> >
>> > >
>> >
>>
>>
>>
>>
>> ------------------------------------------------------------------------
>>
>>
>
>