[Swift-devel] RE: [Swift-user] Execution error

Mihael Hategan hategan at mcs.anl.gov
Thu Apr 30 16:31:13 CDT 2009


Can you guys try to run first.swift on ranger with the settings you have
(you'll need to add "echo" to tc.data)?
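
A minimal sketch of that test, assuming the tc.data column layout of the PTMap2 entry quoted further down. The /bin/echo path and the trailing "null" profile column are assumptions, not confirmed values for ranger:

```shell
# Append a hypothetical "echo" entry for the ranger site to tc.data.
# Columns: site, transformation, path, INSTALLED, platform, profile.
# /bin/echo and the trailing "null" profile are assumptions.
printf 'ranger\techo\t/bin/echo\tINSTALLED\tINTEL32::LINUX\tnull\n' >> tc.data

# Then run the small test script with the same site and catalog files
# (left commented out, since it needs a working Swift installation):
# swift first.swift -sites.file sites.xml -tc.file tc.data
```

If first.swift fails in the same way as PTMap2, the application itself is unlikely to be the cause.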


On Thu, 2009-04-30 at 16:13 -0500, Glen Hocky wrote:
> I have the identical response on ranger. It started yesterday evening. 
> Possibly a problem that the TACC folks need to fix?
> 
> Glen
> 
> Yue, Chen - BMD wrote:
> > Hi Michael,
> >  
> > Thank you for the advice. I tested ranger with 1 job and the new
> > maxwalltime settings. It shows the following error message. I
> > don't know if there is another problem with my setup. Thank you!
> >  
> > /////////////////////////////////////////////////
> > [yuechen at communicado PTMap2]$ swift PTMap2-unmod.swift -sites.file 
> > sites.xml -tc.file tc.data
> > Swift 0.9rc2 swift-r2860 cog-r2388
> > RunID: 20090430-1559-2vi6x811
> > Progress:
> > Progress:  Stage in:1
> > Progress:  Submitting:1
> > Progress:  Submitting:1
> > Progress:  Submitted:1
> > Progress:  Active:1
> > Failed to transfer wrapper log from 
> > PTMap2-unmod-20090430-1559-2vi6x811/info/i on ranger
> > Progress:  Active:1
> > Failed to transfer wrapper log from 
> > PTMap2-unmod-20090430-1559-2vi6x811/info/k on ranger
> > Progress:  Stage in:1
> > Progress:  Active:1
> > Failed to transfer wrapper log from 
> > PTMap2-unmod-20090430-1559-2vi6x811/info/m on ranger
> > Progress:  Failed:1
> > Execution failed:
> >         Exception in PTMap2:
> > Arguments: [e04.mzXML, ./seqs-ecolik12/fasta01, inputs-unmod.txt, 
> > parameters.txt]
> > Host: ranger
> > Directory: PTMap2-unmod-20090430-1559-2vi6x811/jobs/m/PTMap2-mbe6m5aj
> > stderr.txt:
> > stdout.txt:
> > ----
> > Caused by:
> >         Failed to start worker:
> > null
> > null
> > org.globus.gram.GramException: The job manager detected an invalid 
> > script response
> >         at 
> > org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.statusChanged(JobSubmissionTaskHandler.java:530)
> >         at org.globus.gram.GramJob.setStatus(GramJob.java:184)
> >         at 
> > org.globus.gram.GramCallbackHandler.run(CallbackHandler.java:176)
> >         at java.lang.Thread.run(Thread.java:619)
> > Cleaning up...
> > Shutting down service at https://129.114.50.163:45562
> > Got channel MetaChannel: 20903429 -> GSSSChannel-null(1)
> > - Done
> > [yuechen at communicado PTMap2]$
> > ///////////////////////////////////////////////////////////
> >  
> > Chen, Yue
> >  
> >
> > *From:* Michael Wilde [mailto:wilde at mcs.anl.gov]
> > *Sent:* Thu 4/30/2009 3:02 PM
> > *To:* Yue, Chen - BMD; swift-devel
> > *Subject:* Re: [Swift-user] Execution error
> >
> > Back on list here (I only went off-list to discuss accounts, etc)
> >
> > The problem in the run below is this:
> >
> > 2009-04-30 14:29:41,265-0500 DEBUG vdl:execute2 APPLICATION_EXCEPTION
> > jobid=PTMap2-abeii5aj - Application exception: Job cannot be run with
> > the given max walltime worker constraint (task: 3000, \
> > maxwalltime: 2400s)
> >
> > You have this on the ptmap app in your tc.data:
> >
> > globus::maxwalltime=50
> >
> > But you only gave coasters 40 mins per coaster worker. So it's
> > complaining that it can't run a 50 minute job in a 40 minute (max)
> > coaster worker. ;)
> >
> > I mentioned in a prior mail that you need to set the two time vals in
> > your sites.xml entry; that's what you need to do next.
> >
> > change the coaster time in your sites.xml to:
> >      key="coasterWorkerMaxwalltime">00:51:00</profile>
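
The arithmetic behind that log message can be sketched as below. This is only an illustration of the comparison (task minutes converted to seconds against the worker's HH:MM:SS limit), not Swift's actual code; the function names are hypothetical, and whether Swift uses "less than" or "less than or equal" is an assumption:

```shell
# Convert HH:MM:SS to seconds.
hms_to_seconds() {
  h=${1%%:*}; rest=${1#*:}; m=${rest%%:*}; s=${rest#*:}
  # Strip one leading zero so values such as "08" are not parsed as octal.
  echo $(( ${h#0} * 3600 + ${m#0} * 60 + ${s#0} ))
}

# fits_in_worker TASK_MINUTES WORKER_HHMMSS
# Succeeds when the task's maxwalltime fits inside the coaster worker limit.
fits_in_worker() {
  [ $(( $1 * 60 )) -le "$(hms_to_seconds "$2")" ]
}

# The failing combination from the log: 50 min task, 00:40:00 worker.
if fits_in_worker 50 00:40:00; then echo "fits"; else echo "task too long for worker"; fi
# The suggested fix: 50 min task, 00:51:00 worker.
if fits_in_worker 50 00:51:00; then echo "fits"; else echo "task too long for worker"; fi
```

With the values from the log, 50 minutes is 3000 s against a 2400 s worker, which is the mismatch reported.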
> >
> > If you have more info on the variability of your ptmap run times, send
> > that to the list, and we can discuss how to handle.
> >
> >
> > (NOTE: doing grep -i of the log for "except" or scanning for "except"
> > with an editor will often locate the first "exception" that your job
> > encountered. That's how I found the error above).
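
The log scan described in that note could be wrapped in a small helper like the one below. The function name is hypothetical; the log path is whatever .log file Swift wrote for the run:

```shell
# Print the first line (with its line number) that mentions "except",
# ignoring case, in the given log file.
first_exception() {
  grep -n -i 'except' "$1" | head -n 1
}

# Usage (substitute the log file of your own run):
# first_exception PTMap2-unmod-20090430-1428-v0c5di5c.log
```

Only the first match is shown because later exceptions are often consequences of the first one.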
> >
> > Also, Yue, for testing new sites, or for validating that old sites still
> > work, you should create the smallest possible ptmap workflow - 1 job if
> > that is possible - and verify that this works.  Then say 10 jobs to make
> > sure scheduling etc is sane.  Then, send in your huge jobs.
> >
> > With only 1 job, it's easier to spot the errors in the log file.
> >
> > - Mike
> >
> >
> > On 4/30/09 2:34 PM, Yue, Chen - BMD wrote:
> > > Hi Michael,
> > > 
> > > I ran into the same messages again when I use Ranger:
> > > 
> > > Progress:  Selecting site:146  Stage in:25  Submitting:15  Submitted:821
> > > Failed but can retry:16
> > > Failed to transfer wrapper log from
> > > PTMap2-unmod-20090430-1428-v0c5di5c/info/q on ranger
> > > Failed to transfer wrapper log from
> > > PTMap2-unmod-20090430-1428-v0c5di5c/info/l on ranger
> > > Progress:  Selecting site:146  Stage in:3  Submitting:1  Submitted:857
> > > Failed but can retry:16
> > > Failed to transfer wrapper log from
> > > PTMap2-unmod-20090430-1428-v0c5di5c/info/v on ranger
> > > Failed to transfer wrapper log from
> > > PTMap2-unmod-20090430-1428-v0c5di5c/info/b on ranger
> > > Failed to transfer wrapper log from
> > > PTMap2-unmod-20090430-1428-v0c5di5c/info/0 on ranger
> > > Failed to transfer wrapper log from
> > > PTMap2-unmod-20090430-1428-v0c5di5c/info/a on ranger
> > > Failed to transfer wrapper log from
> > > PTMap2-unmod-20090430-1428-v0c5di5c/info/8 on ranger
> > > Failed to transfer wrapper log from
> > > PTMap2-unmod-20090430-1428-v0c5di5c/info/7 on ranger
> > > Failed to transfer wrapper log from
> > > PTMap2-unmod-20090430-1428-v0c5di5c/info/4 on ranger
> > > Failed to transfer wrapper log from
> > > PTMap2-unmod-20090430-1428-v0c5di5c/info/a on ranger
> > > Failed to transfer wrapper log from
> > > PTMap2-unmod-20090430-1428-v0c5di5c/info/4 on ranger
> > > Failed to transfer wrapper log from
> > > PTMap2-unmod-20090430-1428-v0c5di5c/info/8 on ranger
> > > Failed to transfer wrapper log from
> > > PTMap2-unmod-20090430-1428-v0c5di5c/info/7 on ranger
> > > Failed to transfer wrapper log from
> > > PTMap2-unmod-20090430-1428-v0c5di5c/info/x on ranger
> > > Failed to transfer wrapper log from
> > > PTMap2-unmod-20090430-1428-v0c5di5c/info/3 on ranger
> > > Failed to transfer wrapper log from
> > > PTMap2-unmod-20090430-1428-v0c5di5c/info/q on ranger
> > > The log for the search is at : 
> > > /home/yuechen/PTMap2/PTMap2-unmod-20090430-1428-v0c5di5c.log
> > > 
> > > The sites.xml I have is:
> > > 
> > >  <pool handle="ranger">
> > >      <execution provider="coaster"
> > >                 url="gatekeeper.ranger.tacc.teragrid.org"
> > >                 jobManager="gt2:gt2:SGE"/>
> > >      <gridftp url="gsiftp://gridftp.ranger.tacc.teragrid.org:2811/" />
> > >      <profile namespace="env"
> > >               key="SWIFT_JOBDIR_PATH">/tmp/yuechen/jobdir</profile>
> > >      <profile namespace="globus" key="project">TG-CCR080022N</profile>
> > >      <profile namespace="globus" key="coastersPerNode">16</profile>
> > >      <profile namespace="globus" key="queue">development</profile>
> > >      <profile namespace="globus"
> > >               key="coasterWorkerMaxwalltime">00:40:00</profile>
> > >      <profile namespace="globus" key="maxwalltime">31</profile>
> > >      <profile namespace="karajan" key="initialScore">50</profile>
> > >      <profile namespace="karajan" key="jobThrottle">10</profile>
> > >      <workdirectory>/work/01164/yuechen/swiftwork</workdirectory>
> > >  </pool>
> > > The tc.data I have is:
> > > 
> > > ranger          PTMap2         
> > > /share/home/01164/yuechen/PTMap2/PTMap2         INSTALLED      
> > > INTEL32::LINUX  globus::maxwalltime=50
> > >
> > > I'm using swift 0.9 rc2
> > >
> > > Thank you very much for help!
> > >
> > > Chen, Yue
> > >
> > > 
> > >
> > > ------------------------------------------------------------------------
> > > *From:* Michael Wilde [mailto:wilde at mcs.anl.gov]
> > > *Sent:* Thu 4/30/2009 2:05 PM
> > > *To:* Yue, Chen - BMD
> > > *Subject:* Re: [Swift-user] Execution error
> > >
> > >
> > >
> > > On 4/30/09 1:51 PM, Yue, Chen - BMD wrote:
> > >  > Hi Michael,
> > >  >
> > >  > When I tried to activate my account, I encountered the following error:
> > >  >
> > >  > "Sorry, this account is in an invalid state. You may not activate your
> > >  > at this time."
> > >  >
> > >  > I used the username and password from TG-CDA070002T. Should I use a
> > >  > different password?
> > >
> > > If you can already login to Ranger, then you are all set - you must have
> > > done this previously.
> > >
> > > I thought you had *not*, because when I looked up your login on ranger
> > > ("finger yuechen") it said "never logged in". But seems like that info
> > > is incorrect.
> > >
> > > If you have ptmap compiled, seems like you are almost all set.
> > >
> > > Let me know if it works.
> > >
> > > - Mike
> > >
> > >  > Thanks!
> > >  >
> > >  > Chen, Yue
> > >  >
> > >  >
> > >  > 
> > ------------------------------------------------------------------------
> > >  > *From:* Michael Wilde [mailto:wilde at mcs.anl.gov]
> > >  > *Sent:* Thu 4/30/2009 1:07 PM
> > >  > *To:* Yue, Chen - BMD
> > >  > *Cc:* swift user
> > >  > *Subject:* Re: [Swift-user] Execution error
> > >  >
> > >  > Yue, use this XML pool element to access ranger:
> > >  >
> > >  >   <pool handle="ranger">
> > >  >      <execution provider="coaster"
> > >  >                 url="gatekeeper.ranger.tacc.teragrid.org"
> > >  >                 jobManager="gt2:gt2:SGE"/>
> > >  >      <gridftp url="gsiftp://gridftp.ranger.tacc.teragrid.org:2811/" />
> > >  >      <profile namespace="env"
> > >  >               key="SWIFT_JOBDIR_PATH">/tmp/yuechen/jobdir</profile>
> > >  >      <profile namespace="globus" key="project">TG-CCR080022N</profile>
> > >  >      <profile namespace="globus" key="coastersPerNode">16</profile>
> > >  >      <profile namespace="globus" key="queue">development</profile>
> > >  >      <profile namespace="globus"
> > >  >               key="coasterWorkerMaxwalltime">00:40:00</profile>
> > >  >      <profile namespace="globus" key="maxwalltime">31</profile>
> > >  >      <profile namespace="karajan" key="initialScore">50</profile>
> > >  >      <profile namespace="karajan" key="jobThrottle">10</profile>
> > >  >      <workdirectory>/work/00306/tg455797/swiftwork</workdirectory>
> > >  >    </pool>
> > >  >
> > >  >
> > >  > You will need to also do these steps:
> > >  >
> > >  > Go to this web page to enable your Ranger account:
> > >  >
> > >  > https://tas.tacc.utexas.edu/TASMigration/AccountActivation.aspx
> > >  >
> > >  > Then login to Ranger via the TeraGrid portal and put your ssh keys in
> > >  > place (assuming you use ssh keys, which you should)
> > >  >
> > >  > While on Ranger, do this:
> > >  >
> > >  > echo $WORK
> > >  > mkdir $WORK/swiftwork
> > >  >
> > >  > and put the full path of your $WORK/swiftwork directory in the
> > >  > <workdirectory> element above. (My login is tg455etc, yours is yuechen)
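
A small sketch of those two steps that also prints the element to paste into sites.xml. It assumes $WORK is set by the login environment on Ranger; the fallback path is a hypothetical value used only so the snippet runs elsewhere:

```shell
# WORK is normally set by the site's login environment.
# The fallback below is a hypothetical scratch location for testing off-site.
: "${WORK:=${TMPDIR:-/tmp}/work-demo}"

# Create the Swift work directory (no error if it already exists).
mkdir -p "$WORK/swiftwork"

# Print the element with the full path filled in.
echo "<workdirectory>$WORK/swiftwork</workdirectory>"
```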
> > >  >
> > >  > Then scp your code to Ranger and compile it.
> > >  >
> > >  > Then create a tc.data entry for your ptmap app
> > >  >
> > >  > Next, set your time values in the sites.xml entry above to suitable
> > >  > values for Ranger. You'll need to measure times, but I think you will
> > >  > find Ranger about twice as fast as Mercury for CPU-bound jobs.
> > >  >
> > >  > The values above were set for one app job per coaster. I think you can
> > >  > probably do more.
> > >  >
> > >  > If you estimate a run time of 5 minutes, use:
> > >  >
> > >  >      <profile namespace="globus"
> > >  >               key="coasterWorkerMaxwalltime">00:30:00</profile>
> > >  >      <profile namespace="globus" key="maxwalltime">5</profile>
> > >  >
> > >  > Other people on the list - please sanity check what I suggest here.
> > >  >
> > >  > - Mike
> > >  >
> > >  >
> > >  > On 4/30/09 12:40 PM, Michael Wilde wrote:
> > >  >  > I just checked - TG-CDA070002T has indeed expired.
> > >  >  >
> > >  >  > The best for now is to move to use (only) Ranger, under this account:
> > >  >  > TG-CCR080022N
> > >  >  >
> > >  >  > I will locate and send you a sites.xml entry in a moment.
> > >  >  >
> > >  >  > You need to go to a web page to activate your Ranger login.
> > >  >  >
> > >  >  > Best to contact me in IM and we can work this out.
> > >  >  >
> > >  >  > - Mike
> > >  >  >
> > >  >  >
> > >  >  >
> > >  >  > On 4/30/09 12:23 PM, Michael Wilde wrote:
> > >  >  >> Also, what account are you running under? We may need to change you to
> > >  >  >> a new account - as the OSG Training account expires today.
> > >  >  >> If that happened at Noon, it *might* be the problem.
> > >  >  >>
> > >  >  >> - Mike
> > >  >  >>
> > >  >  >>
> > >  >  >> On 4/30/09 12:08 PM, Yue, Chen - BMD wrote:
> > >  >  >>> Hi,
> > >  >  >>>
> > >  >  >>> I came back to re-run my application on NCSA Mercury which was tested
> > >  >  >>> successfully last week after I just set up coasters with swift 0.9,
> > >  >  >>> but I got many messages like the following:
> > >  >  >>>
> > >  >  >>> Progress:  Stage in:219  Submitting:803  Submitted:1
> > >  >  >>> Progress:  Stage in:129  Submitting:703  Submitted:190 Failed but can retry:1
> > >  >  >>> Progress:  Stage in:38  Submitting:425  Submitted:556 Failed but can retry:4
> > >  >  >>> Failed to transfer wrapper log from
> > >  >  >>> PTMap2-unmod-20090430-1203-r19dxq10/info/h on NCSA_MERCURY
> > >  >  >>> Failed to transfer wrapper log from
> > >  >  >>> PTMap2-unmod-20090430-1203-r19dxq10/info/j on NCSA_MERCURY
> > >  >  >>> Failed to transfer wrapper log from
> > >  >  >>> PTMap2-unmod-20090430-1203-r19dxq10/info/p on NCSA_MERCURY
> > >  >  >>> Failed to transfer wrapper log from
> > >  >  >>> PTMap2-unmod-20090430-1203-r19dxq10/info/1 on NCSA_MERCURY
> > >  >  >>> Failed to transfer wrapper log from
> > >  >  >>> PTMap2-unmod-20090430-1203-r19dxq10/info/b on NCSA_MERCURY
> > >  >  >>> Failed to transfer wrapper log from
> > >  >  >>> PTMap2-unmod-20090430-1203-r19dxq10/info/c on NCSA_MERCURY
> > >  >  >>> Progress:  Stage in:1  Submitted:1013  Active:1 Failed but can retry:8
> > >  >  >>> Progress:  Submitted:1011  Active:1 Failed but can retry:11
> > >  >  >>> The log file for the successful run last week is:
> > >  >  >>> /home/yuechen/PTMap2/PTMap2-unmod-20090422-1216-4s3037gf.log
> > >  >  >>>
> > >  >  >>> The log file for the failed run is :
> > >  >  >>> /home/yuechen/PTMap2/PTMap2-unmod-20090430-1151-rf2uuhb7.log
> > >  >  >>>
> > >  >  >>> I don't think I did anything different, so I don't know why this time
> > >  >  >>> they failed. The sites.xml for Mercury is:
> > >  >  >>>
> > >  >  >>>  <pool handle="NCSA_MERCURY">
> > >  >  >>>     <gridftp url="gsiftp://gridftp-hg.ncsa.teragrid.org"/>
> > >  >  >>>     <execution provider="coaster" url="grid-hg.ncsa.teragrid.org"
> > >  >  >>>                jobManager="gt2:PBS"/>
> > >  >  >>>     <workdirectory>/gpfs_scratch1/yuechen/swiftwork</workdirectory>
> > >  >  >>>     <profile namespace="globus" key="queue">debug</profile>
> > >  >  >>>  </pool>
> > >  >  >>>
> > >  >  >>> Thank you for help!
> > >  >  >>>
> > >  >  >>> Chen, Yue
> > >  >  >>>
> > >  >  >>> This email is intended only for the use of the individual or entity
> > >  >  >>> to which it is addressed and may contain information that is
> > >  >  >>> privileged and confidential. If the reader of this email message is
> > >  >  >>> not the intended recipient, you are hereby notified that any
> > >  >  >>> dissemination, distribution, or copying of this communication is
> > >  >  >>> prohibited. If you have received this email in error, please notify
> > >  >  >>> the sender and destroy/delete all copies of the transmittal.
> > >  >  >>> Thank you.
> > >  >  >>>
> > >  >  >>>
> > >  >  >>>
> > >  > 
> > ------------------------------------------------------------------------
> > >  >  >>>
> > >  >  >>> _______________________________________________
> > >  >  >>> Swift-user mailing list
> > >  >  >>> Swift-user at ci.uchicago.edu
> > >  >  >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
> > >  >
> > >
> >
> > ------------------------------------------------------------------------
> >
> > _______________________________________________
> > Swift-devel mailing list
> > Swift-devel at ci.uchicago.edu
> > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> >   



