[Swift-devel] Re: [Swift-user] Execution error
Michael Wilde
wilde at mcs.anl.gov
Thu Apr 30 15:02:50 CDT 2009
Back on the list here (I only went off-list to discuss accounts, etc.).
The problem in the run below is this:
2009-04-30 14:29:41,265-0500 DEBUG vdl:execute2 APPLICATION_EXCEPTION
jobid=PTMap2-abeii5aj - Application exception: Job cannot be run with
the given max walltime worker constraint (task: 3000, maxwalltime: 2400s)
You have this on the ptmap app in your tc.data:
globus::maxwalltime=50
But you only gave coasters 40 minutes per coaster worker. So it's
complaining that it can't run a 50-minute job in a 40-minute (max)
coaster worker. ;)
I mentioned in a prior mail that you need to set the two time values in
your sites.xml entry; that's what you need to do now.
Change the coaster worker time in your sites.xml to:
<profile namespace="globus" key="coasterWorkerMaxwalltime">00:51:00</profile>
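As a rough illustration of the constraint in that log message, here is a minimal Python sketch. It is not Swift's own code: the function names are hypothetical, and it assumes (as the log's 50 min to 3000 s conversion suggests) that globus::maxwalltime in tc.data is given in minutes while coasterWorkerMaxwalltime is HH:MM:SS. Swift may also reserve some slack per worker, which this sketch ignores.

```python
def job_seconds(maxwalltime_minutes: int) -> int:
    """Convert a globus::maxwalltime value (assumed to be minutes) to seconds."""
    return maxwalltime_minutes * 60


def worker_seconds(hhmmss: str) -> int:
    """Convert a coasterWorkerMaxwalltime value such as "00:40:00" to seconds."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def fits_in_worker(maxwalltime_minutes: int, worker_walltime: str) -> bool:
    """Hypothetical check: a job fits only if its max walltime <= the worker's."""
    return job_seconds(maxwalltime_minutes) <= worker_seconds(worker_walltime)


if __name__ == "__main__":
    # tc.data in this thread asks for 50 minutes per ptmap job.
    for worker in ("00:40:00", "00:51:00"):
        print(worker, job_seconds(50), worker_seconds(worker), fits_in_worker(50, worker))
```

With the old 00:40:00 setting the job needs 3000 s but the worker only has 2400 s, matching the log; 00:51:00 gives 3060 s, just enough.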
If you have more info on the variability of your ptmap run times, send
that to the list, and we can discuss how to handle.
(NOTE: running grep -i on the log for "except", or searching for "except"
in an editor, will often locate the first exception that your job
encountered. That's how I found the error above.)
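For anyone without grep handy, the same search can be sketched in a few lines of Python. This is a hypothetical helper, not part of Swift; it simply returns the first log line containing "except", case-insensitively, like the grep -i approach described above.

```python
import sys
from typing import Iterable, Optional


def first_exception(lines: Iterable[str], needle: str = "except") -> Optional[str]:
    """Return the first line containing `needle` (case-insensitive), or None."""
    needle = needle.lower()
    for line in lines:
        if needle in line.lower():
            return line.rstrip("\n")
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python first_exception.py <swift run log>
    with open(sys.argv[1], errors="replace") as log:
        print(first_exception(log) or "no line mentioning 'except' found")
```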
Also, Yue, when testing new sites, or validating that old sites still
work, you should create the smallest possible ptmap workflow (1 job, if
that is possible) and verify that it works. Then run, say, 10 jobs to make
sure scheduling etc. is sane. Then send in your huge jobs.
With only 1 job, it's easier to spot errors in the log file.
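The scale-up procedure above can be sketched as a small driver loop. This is only an illustration under assumptions: `run` is a hypothetical callback you would write to launch an n-job ptmap workflow and report success; nothing here is a Swift API.

```python
from typing import Callable, Sequence


def staged_run(run: Callable[[int], bool], full_size: int,
               stages: Sequence[int] = (1, 10)) -> int:
    """Run progressively larger workflows, stopping at the first failure.

    `run(n)` is a hypothetical callback that launches an n-job workflow and
    returns True if it succeeded. Returns the largest size that succeeded
    (0 if even the smallest stage failed).
    """
    last_ok = 0
    for size in [s for s in stages if s < full_size] + [full_size]:
        if not run(size):
            print(f"stage with {size} job(s) failed; check that run's log first")
            return last_ok
        last_ok = size
    return last_ok
```

Stopping at the first failing stage keeps the log you have to read as small as possible, which is the point of starting with a single job.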
- Mike
On 4/30/09 2:34 PM, Yue, Chen - BMD wrote:
> Hi Michael,
>
> I ran into the same messages again when I used Ranger:
>
> Progress: Selecting site:146 Stage in:25 Submitting:15 Submitted:821
> Failed but can retry:16
> Failed to transfer wrapper log from
> PTMap2-unmod-20090430-1428-v0c5di5c/info/q on ranger
> Failed to transfer wrapper log from
> PTMap2-unmod-20090430-1428-v0c5di5c/info/l on ranger
> Progress: Selecting site:146 Stage in:3 Submitting:1 Submitted:857
> Failed but can retry:16
> Failed to transfer wrapper log from
> PTMap2-unmod-20090430-1428-v0c5di5c/info/v on ranger
> Failed to transfer wrapper log from
> PTMap2-unmod-20090430-1428-v0c5di5c/info/b on ranger
> Failed to transfer wrapper log from
> PTMap2-unmod-20090430-1428-v0c5di5c/info/0 on ranger
> Failed to transfer wrapper log from
> PTMap2-unmod-20090430-1428-v0c5di5c/info/a on ranger
> Failed to transfer wrapper log from
> PTMap2-unmod-20090430-1428-v0c5di5c/info/8 on ranger
> Failed to transfer wrapper log from
> PTMap2-unmod-20090430-1428-v0c5di5c/info/7 on ranger
> Failed to transfer wrapper log from
> PTMap2-unmod-20090430-1428-v0c5di5c/info/4 on ranger
> Failed to transfer wrapper log from
> PTMap2-unmod-20090430-1428-v0c5di5c/info/a on ranger
> Failed to transfer wrapper log from
> PTMap2-unmod-20090430-1428-v0c5di5c/info/4 on ranger
> Failed to transfer wrapper log from
> PTMap2-unmod-20090430-1428-v0c5di5c/info/8 on ranger
> Failed to transfer wrapper log from
> PTMap2-unmod-20090430-1428-v0c5di5c/info/7 on ranger
> Failed to transfer wrapper log from
> PTMap2-unmod-20090430-1428-v0c5di5c/info/x on ranger
> Failed to transfer wrapper log from
> PTMap2-unmod-20090430-1428-v0c5di5c/info/3 on ranger
> Failed to transfer wrapper log from
> PTMap2-unmod-20090430-1428-v0c5di5c/info/q on ranger
> The log for the search is at :
> /home/yuechen/PTMap2/PTMap2-unmod-20090430-1428-v0c5di5c.log
>
> The sites.xml I have is:
>
> <pool handle="ranger">
> <execution provider="coaster"
> url="gatekeeper.ranger.tacc.teragrid.org"
> jobManager="gt2:gt2:SGE"/>
> <gridftp url="gsiftp://gridftp.ranger.tacc.teragrid.org:2811/" />
> <profile namespace="env"
> key="SWIFT_JOBDIR_PATH">/tmp/yuechen/jobdir</profile>
> <profile namespace="globus" key="project">TG-CCR080022N</profile>
> <profile namespace="globus" key="coastersPerNode">16</profile>
> <profile namespace="globus" key="queue">development</profile>
> <profile namespace="globus"
> key="coasterWorkerMaxwalltime">00:40:00</profile>
> <profile namespace="globus" key="maxwalltime">31</profile>
> <profile namespace="karajan" key="initialScore">50</profile>
> <profile namespace="karajan" key="jobThrottle">10</profile>
> <workdirectory>/work/01164/yuechen/swiftwork</workdirectory>
> </pool>
> The tc.data I have is:
>
> ranger PTMap2 /share/home/01164/yuechen/PTMap2/PTMap2 INSTALLED INTEL32::LINUX globus::maxwalltime=50
>
> I'm using swift 0.9 rc2
>
> Thank you very much for help!
>
> Chen, Yue
>
>
>
> ------------------------------------------------------------------------
> *From:* Michael Wilde [mailto:wilde at mcs.anl.gov]
> *Sent:* Thu 4/30/2009 2:05 PM
> *To:* Yue, Chen - BMD
> *Subject:* Re: [Swift-user] Execution error
>
>
>
> On 4/30/09 1:51 PM, Yue, Chen - BMD wrote:
> > Hi Michael,
> >
> > When I tried to activate my account, I encountered the following error:
> >
> > "Sorry, this account is in an invalid state. You may not activate your
> > at this time."
> >
> > I used the username and password from TG-CDA070002T. Should I use a
> > different password?
>
> If you can already log in to Ranger, then you are all set - you must have
> done this previously.
>
> I thought you had *not*, because when I looked up your login on Ranger
> ("finger yuechen") it said "never logged in". But it seems that info
> is incorrect.
>
> If you have ptmap compiled, it seems you are almost all set.
>
> Let me know if it works.
>
> - Mike
>
> > Thanks!
> >
> > Chen, Yue
> >
> >
> > ------------------------------------------------------------------------
> > *From:* Michael Wilde [mailto:wilde at mcs.anl.gov]
> > *Sent:* Thu 4/30/2009 1:07 PM
> > *To:* Yue, Chen - BMD
> > *Cc:* swift user
> > *Subject:* Re: [Swift-user] Execution error
> >
> > Yue, use this XML pool element to access ranger:
> >
> > <pool handle="ranger">
> > <execution provider="coaster"
> > url="gatekeeper.ranger.tacc.teragrid.org"
> > jobManager="gt2:gt2:SGE"/>
> > <gridftp url="gsiftp://gridftp.ranger.tacc.teragrid.org:2811/" />
> > <profile namespace="env"
> > key="SWIFT_JOBDIR_PATH">/tmp/yuechen/jobdir</profile>
> > <profile namespace="globus" key="project">TG-CCR080022N</profile>
> > <profile namespace="globus" key="coastersPerNode">16</profile>
> > <profile namespace="globus" key="queue">development</profile>
> > <profile namespace="globus"
> > key="coasterWorkerMaxwalltime">00:40:00</profile>
> > <profile namespace="globus" key="maxwalltime">31</profile>
> > <profile namespace="karajan" key="initialScore">50</profile>
> > <profile namespace="karajan" key="jobThrottle">10</profile>
> > <workdirectory>/work/00306/tg455797/swiftwork</workdirectory>
> > </pool>
> >
> >
> > You will need to also do these steps:
> >
> > Go to this web page to enable your Ranger account:
> >
> > https://tas.tacc.utexas.edu/TASMigration/AccountActivation.aspx
> >
> > Then log in to Ranger via the TeraGrid portal and put your ssh keys in
> > place (assuming you use ssh keys, which you should).
> >
> > While on Ranger, do this:
> >
> > echo $WORK
> > mkdir $WORK/swiftwork
> >
> > and put the full path of your $WORK/swiftwork directory in the
> > <workdirectory> element above. (My login is tg455etc, yours is yuechen)
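A small Python sketch of the workdirectory step, assuming it runs on the Ranger login node where $WORK is set (the `echo $WORK` step above shows its value). It only prints the element to paste into sites.xml; it does not create the directory, so the mkdir step is still needed. The helper name is hypothetical.

```python
import os
import posixpath
from xml.sax.saxutils import escape


def workdirectory_element(work_dir: str) -> str:
    """Return a sites.xml <workdirectory> element for <work_dir>/swiftwork."""
    path = posixpath.join(work_dir, "swiftwork")
    return f"<workdirectory>{escape(path)}</workdirectory>"


if __name__ == "__main__":
    work = os.environ.get("WORK")
    if work:
        print(workdirectory_element(work))
    else:
        print("WORK is not set; run this on Ranger, where `echo $WORK` shows it")
```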
> >
> > Then scp your code to Ranger and compile it.
> >
> > Then create a tc.data entry for your ptmap app.
> >
> > Next, set your time values in the sites.xml entry above to suitable
> > values for Ranger. You'll need to measure times, but I think you will
> > find Ranger about twice as fast as Mercury for CPU-bound jobs.
> >
> > The values above were set for one app job per coaster. I think you can
> > probably do more.
> >
> > If you estimate a run time of 5 minutes, use:
> >
> > <profile namespace="globus"
> > key="coasterWorkerMaxwalltime">00:30:00</profile>
> > <profile namespace="globus" key="maxwalltime">5</profile>
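To make the "one app job per coaster" remark concrete, here is a back-of-the-envelope Python sketch. It assumes jobs run back to back on a single coaster worker and ignores worker startup and staging overhead, so treat the result as an upper bound; Swift's actual scheduling may differ, and the function names are hypothetical.

```python
def worker_seconds(hhmmss: str) -> int:
    """Convert a coasterWorkerMaxwalltime value such as "00:30:00" to seconds."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def jobs_per_worker(worker_walltime: str, job_minutes: int) -> int:
    """Upper bound on sequential jobs one worker can run (overhead ignored)."""
    return worker_seconds(worker_walltime) // (job_minutes * 60)


if __name__ == "__main__":
    print(jobs_per_worker("00:40:00", 31))  # the sites.xml values above: prints 1
    print(jobs_per_worker("00:30:00", 5))   # the 5-minute example: prints 6
```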
> >
> > Other people on the list - please sanity check what I suggest here.
> >
> > - Mike
> >
> >
> > On 4/30/09 12:40 PM, Michael Wilde wrote:
> > > I just checked - TG-CDA070002T has indeed expired.
> > >
> > > The best for now is to move to use (only) Ranger, under this account:
> > > TG-CCR080022N
> > >
> > > I will locate and send you a sites.xml entry in a moment.
> > >
> > > You need to go to a web page to activate your Ranger login.
> > >
> > > Best to contact me in IM and we can work this out.
> > >
> > > - Mike
> > >
> > >
> > >
> > > On 4/30/09 12:23 PM, Michael Wilde wrote:
> > >> Also, what account are you running under? We may need to move you to
> > >> a new account, as the OSG Training account expires today.
> > >> If that happened at noon, it *might* be the problem.
> > >>
> > >> - Mike
> > >>
> > >>
> > >> On 4/30/09 12:08 PM, Yue, Chen - BMD wrote:
> > >>> Hi,
> > >>>
> > >>> I came back to re-run my application on NCSA Mercury, which I had
> > >>> tested successfully last week right after setting up coasters with
> > >>> swift 0.9, but I got many messages like the following:
> > >>>
> > >>> Progress: Stage in:219 Submitting:803 Submitted:1
> > >>> Progress: Stage in:129 Submitting:703 Submitted:190 Failed but can retry:1
> > >>> Progress: Stage in:38 Submitting:425 Submitted:556 Failed but can retry:4
> > >>> Failed to transfer wrapper log from
> > >>> PTMap2-unmod-20090430-1203-r19dxq10/info/h on NCSA_MERCURY
> > >>> Failed to transfer wrapper log from
> > >>> PTMap2-unmod-20090430-1203-r19dxq10/info/j on NCSA_MERCURY
> > >>> Failed to transfer wrapper log from
> > >>> PTMap2-unmod-20090430-1203-r19dxq10/info/p on NCSA_MERCURY
> > >>> Failed to transfer wrapper log from
> > >>> PTMap2-unmod-20090430-1203-r19dxq10/info/1 on NCSA_MERCURY
> > >>> Failed to transfer wrapper log from
> > >>> PTMap2-unmod-20090430-1203-r19dxq10/info/b on NCSA_MERCURY
> > >>> Failed to transfer wrapper log from
> > >>> PTMap2-unmod-20090430-1203-r19dxq10/info/c on NCSA_MERCURY
> > >>> Progress: Stage in:1 Submitted:1013 Active:1 Failed but can retry:8
> > >>> Progress: Submitted:1011 Active:1 Failed but can retry:11
> > >>> The log file for the successful run last week is:
> > >>> /home/yuechen/PTMap2/PTMap2-unmod-20090422-1216-4s3037gf.log
> > >>>
> > >>> The log file for the failed run is :
> > >>> /home/yuechen/PTMap2/PTMap2-unmod-20090430-1151-rf2uuhb7.log
> > >>>
> > >>> I don't think I did anything different, so I don't know why they
> > >>> failed this time. The sites.xml for Mercury is:
> > >>>
> > >>> <pool handle="NCSA_MERCURY">
> > >>> <gridftp url="gsiftp://gridftp-hg.ncsa.teragrid.org"/>
> > >>> <execution provider="coaster" url="grid-hg.ncsa.teragrid.org"
> > >>> jobManager="gt2:PBS"/>
> > >>> <workdirectory>/gpfs_scratch1/yuechen/swiftwork</workdirectory>
> > >>> <profile namespace="globus" key="queue">debug</profile>
> > >>> </pool>
> > >>>
> > >>> Thank you for help!
> > >>>
> > >>> Chen, Yue
> > >>>
> > >>> This email is intended only for the use of the individual or entity
> > >>> to which it is addressed and may contain information that is
> > >>> privileged and confidential. If the reader of this email message is
> > >>> not the intended recipient, you are hereby notified that any
> > >>> dissemination, distribution, or copying of this communication is
> > >>> prohibited. If you have received this email in error, please notify
> > >>> the sender and destroy/delete all copies of the transmittal. Thank you.
> > >>>
> > >>>
> > >>>
> > >>> ------------------------------------------------------------------------
> > >>>
> > >>> _______________________________________________
> > >>> Swift-user mailing list
> > >>> Swift-user at ci.uchicago.edu
> > >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
> >
>