<HTML dir=ltr><HEAD><TITLE>Re: [Swift-devel] RE: [Swift-user] Execution error</TITLE>
<META http-equiv=Content-Type content="text/html; charset=unicode">
<META content="MSHTML 6.00.6000.16825" name=GENERATOR></HEAD>
<BODY>
<DIV id=idOWAReplyText45939 dir=ltr>
<DIV dir=ltr><FONT face=Arial color=#000000 size=2>Hi Michael,</FONT></DIV>
<DIV dir=ltr><FONT face=Arial size=2></FONT> </DIV>
<DIV dir=ltr><FONT face=Arial size=2>I already had +osg-client-1.0.0-r1 in my .soft file, but I changed it to +osg-client and tried again. Ranger gave me the same error message. In the meantime, I tested one job on both Abe and Lonestar, and both gave me a qsub error. The output follows:</FONT></DIV>
<DIV dir=ltr><FONT face=Arial size=2></FONT> </DIV>
<DIV dir=ltr><FONT face=Arial size=2>////////////////////////////////////</FONT></DIV>
<DIV dir=ltr><FONT face=Arial size=2>[yuechen@communicado PTMap2]$ swift PTMap2-unmod.swift -sites.file sites.xml -tc.file tc.data<BR>Swift 0.9rc2 swift-r2860 cog-r2388</FONT></DIV>
<DIV dir=ltr><FONT face=Arial size=2>RunID: 20090430-1722-oncfdolb<BR>Progress:<BR>Progress: Stage in:1<BR>Progress: Stage in:1<BR>Progress: Stage in:1<BR>Progress: Submitting:1<BR>Progress: Submitted:1<BR>Failed to transfer wrapper log from PTMap2-unmod-20090430-1722-oncfdolb/info/3 on TACC_LoneStar<BR>Progress: Active:1<BR>Failed to transfer wrapper log from PTMap2-unmod-20090430-1722-oncfdolb/info/5 on TACC_LoneStar<BR>Progress: Stage in:1<BR>Progress: Active:1<BR>Failed to transfer wrapper log from PTMap2-unmod-20090430-1722-oncfdolb/info/7 on TACC_LoneStar<BR>Progress: Failed:1<BR>Execution failed:<BR> Exception in PTMap2:<BR>Arguments: [e04.mzXML, ./seqs-ecolik12/fasta02, inputs-unmod.txt, parameters.txt]<BR>Host: TACC_LoneStar<BR>Directory: PTMap2-unmod-20090430-1722-oncfdolb/jobs/7/PTMap2-7uagp5aj<BR>stderr.txt:</FONT></DIV>
<DIV dir=ltr><FONT face=Arial size=2>stdout.txt:</FONT></DIV>
<DIV dir=ltr><FONT face=Arial size=2>----</FONT></DIV>
<DIV dir=ltr><FONT face=Arial size=2>Caused by:<BR> Cannot submit job: Could not submit job (qsub reported an exit code of -1). no error output<BR>org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Cannot submit job: Could not submit job (qsub reported an exit code of -1). no error output<BR> at org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:63)<BR> at org.globus.cog.abstraction.impl.common.AbstractTaskHandler.submit(AbstractTaskHandler.java:46)<BR> at org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.submit(ExecutionTaskHandler.java:43)<BR> at org.globus.cog.abstraction.coaster.service.job.manager.WorkerManager.startWorker(WorkerManager.java:221)<BR> at org.globus.cog.abstraction.coaster.service.job.manager.WorkerManager.run(WorkerManager.java:145)<BR>Caused by: org.globus.cog.abstraction.impl.scheduler.common.ProcessException: Could not submit job (qsub reported an exit code of -1). no error output<BR> at org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor.start(AbstractExecutor.java:94)<BR> at org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:53)<BR> ... 4 more</FONT></DIV>
<DIV dir=ltr><FONT face=Arial size=2>Cleaning up...<BR>Shutting down service at </FONT><A href="https://129.114.50.32:34704"><FONT face=Arial size=2>https://129.114.50.32:34704</FONT></A><BR><FONT face=Arial size=2>Got channel MetaChannel: 2013263 -> GSSSChannel-null(1)<BR>- Done<BR>/////////////////////////////////////////</FONT></DIV>
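<DIV dir=ltr><FONT face=Arial size=2>For output like the above, the tip given earlier in this thread (scan the log for "except" to find the first exception) can be sketched in Python. This is only an illustration: first_exception_line and sample_log are hypothetical names, and the sample lines are abridged from the run above rather than read from a real Swift log file.</FONT></DIV>
<PRE>
```python
def first_exception_line(lines, needle="except"):
    """Return (line_number, text) for the first line containing `needle`,
    ignoring case, or None if no line matches."""
    needle = needle.lower()
    for number, text in enumerate(lines, start=1):
        if needle in text.lower():
            return number, text.strip()
    return None


# Abridged lines from the failed run shown above (not a complete log).
sample_log = [
    "Progress: Failed:1",
    "Execution failed:",
    " Exception in PTMap2:",
    "Caused by:",
    " Cannot submit job: Could not submit job (qsub reported an exit code of -1).",
]

print(first_exception_line(sample_log))  # (3, 'Exception in PTMap2:')
```
</PRE>
<DIV dir=ltr><FONT face=Arial size=2>To scan a real log, the same function could be given an open file object instead of the list, since it only iterates over lines.</FONT></DIV>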
<DIV dir=ltr><FONT face=Arial size=2></FONT> </DIV>
<DIV dir=ltr><FONT face=Arial size=2>My sites.xml is at /home/yuechen/PTMap2/sites.xml. I'm wondering whether this is still related to my setup. Thanks!</FONT></DIV>
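<DIV dir=ltr><FONT face=Arial size=2>One setup detail worth checking, judging from the globus-job-run transcript quoted below: a plain X509_CERT_DIR=... assignment did not help, while export X509_CERT_DIR=... did. A likely reading is that only exported variables reach child processes. The Python sketch below illustrates that general behaviour; child_sees is a hypothetical helper, and the child here is just another Python process, not globus-job-run or Swift.</FONT></DIV>
<PRE>
```python
import os
import subprocess
import sys


def child_sees(name, env):
    """Start a child Python process with environment `env` and return
    the value it sees for variable `name` (empty string if unset)."""
    code = "import os; print(os.environ.get(%r, ''))" % name
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


# Path taken from the transcript quoted below.
cert_dir = "/soft/osg-client-1.0.0-r1/globus/TRUSTED_CA"

# Environment without the variable: comparable to an unexported shell variable.
without_var = dict(os.environ)
without_var.pop("X509_CERT_DIR", None)

# Environment with the variable: comparable to the state after `export`.
with_var = dict(without_var, X509_CERT_DIR=cert_dir)

print(repr(child_sees("X509_CERT_DIR", without_var)))  # ''
print(child_sees("X509_CERT_DIR", with_var))
```
</PRE>
<DIV dir=ltr><FONT face=Arial size=2>Whether Swift itself honors X509_CERT_DIR is the open question raised in the quoted thread; this sketch does not answer it.</FONT></DIV>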
<DIV dir=ltr><FONT face=Arial size=2></FONT> </DIV>
<DIV dir=ltr><FONT face=Arial size=2>Chen, Yue</FONT></DIV>
<DIV dir=ltr><FONT face=Arial size=2></FONT> </DIV>
<DIV dir=ltr> </DIV></DIV>
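<DIV dir=ltr><FONT face=Arial size=2>The walltime mismatch explained in the quoted thread below (a 50-minute task, i.e. 3000 s, against a 00:40:00 coaster worker, i.e. 2400 s) can be checked before submitting. This is a minimal sketch: hms_to_seconds and fits_in_worker are hypothetical helpers, and the assumption that a task fits when its time is no greater than the worker time is mine, since the thread does not show the exact comparison Swift makes.</FONT></DIV>
<PRE>
```python
def hms_to_seconds(hms):
    """Convert an HH:MM:SS string (the coasterWorkerMaxwalltime format
    used in the sites.xml below) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def fits_in_worker(task_minutes, worker_hms):
    """True if a task with maxwalltime `task_minutes` (minutes, as in
    tc.data) fits inside a worker with walltime `worker_hms`.
    Assumption: equal times count as fitting."""
    return hms_to_seconds(worker_hms) >= task_minutes * 60


# Values from the thread: tc.data maxwalltime=50, worker 00:40:00.
print(fits_in_worker(50, "00:40:00"))  # False: 2400 s worker, 3000 s task
# The suggested fix from the thread: worker 00:51:00.
print(fits_in_worker(50, "00:51:00"))  # True: 3060 s worker, 3000 s task
```
</PRE>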
<DIV dir=ltr><BR>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B> Michael Wilde [mailto:wilde@mcs.anl.gov]<BR><B>Sent:</B> Thu 4/30/2009 5:23 PM<BR><B>To:</B> Mihael Hategan<BR><B>Cc:</B> swift-devel; Yue, Chen - BMD<BR><B>Subject:</B> Re: [Swift-devel] RE: [Swift-user] Execution error<BR></FONT><BR></DIV>
<DIV><BR><BR>
<P><FONT size=2>On 4/30/09 5:13 PM, Mihael Hategan wrote:<BR> >> GRAM Job submission failed because the job manager failed to open<BR>stderr<BR> >> (error code 74)<BR> ><BR> > That seems like an IP address problem. Make sure you set GLOBUS_HOSTNAME<BR> > properly.<BR><BR>OK, I will try that. But in the test below, I caused the error by<BR>unsetting X509_CERT_DIR and fixed the error by resetting it - no other<BR>changes.<BR><BR>I *think* that as recently as a few weeks ago globus-job-run to ranger<BR>worked with just @globus in my .soft file.<BR><BR>Adding +osg-client seemed to make it work by setting X509_CERT_DIR.<BR><BR>So as far as I can tell, at least at the level of globus-job-run, these<BR>seems to be related to certs.<BR><BR>Given what Im seeing, do you still think GLOBUS_HOSTNAME is a factor?<BR><BR>- Mike<BR><BR><BR>> On Thu, 2009-04-30 at 17:01 -0500, Michael Wilde wrote:<BR>>> A bit more info on this: it *seems* like a cert issue.<BR>>><BR>>> I last accessed Ranger via globus-job-run perhaps 2 weeks ago, no problem.<BR>>><BR>>> Yesterday, while debugging with Glen, globus-job-run was giving me GRAM<BR>>> err 74. (and GRM err 12 to all other sites)<BR>>><BR>>> So I added +osg-client to my .soft file, and then globus-job-run worked.<BR>>><BR>>> But I noticed that my globus-job-run was still coming from the GT4 dir,<BR>>> not from an OSG dir.<BR>>><BR>>> Just now I traced this back to X509_CERT_DIR:<BR>>><BR>>> <works here> then I did:<BR>>><BR>>> com$ unset X509_CERT_DIR<BR>>> com$ globus-job-run gatekeeper.ranger.tacc.teragrid.org /usr/bin/id<BR>>> GRAM Job submission failed because the job manager failed to open stderr<BR>>> (error code 74)<BR>><BR>> That seems like an IP address problem. 
Make sure you set GLOBUS_HOSTNAME<BR>> properly.<BR>><BR>>> com$<BR>>> com$<BR>>> com$ X509_CERT_DIR=/soft/osg-client-1.0.0-r1/globus/TRUSTED_CA<BR>>> com$ globus-job-run gatekeeper.ranger.tacc.teragrid.org /usr/bin/id<BR>>> GRAM Job submission failed because the job manager failed to open stderr<BR>>> (error code 74)<BR>>> com$ export X509_CERT_DIR=/soft/osg-client-1.0.0-r1/globus/TRUSTED_CA<BR>>> com$ globus-job-run gatekeeper.ranger.tacc.teragrid.org /usr/bin/id<BR>>> uid=455797(tg455797) gid=80243(G-80243)<BR>>> groups=80243(G-80243),81031(G-81031),81411(G-81411),81611(G-81611),81613(G-81613),81621(G-81621),81747(G-81747),81792(G-81792),800744(G-800744),800745(G-800745),800889(G-800889),800981(G-800981),800983(G-800983),801271(G-801271),801364(G-801364)<BR>>> com$<BR>>><BR>>> Mihael, does swift honor X509_CERT_DIR? If so, Glen, Yue, that is<BR>>> something to try.<BR>>><BR>>> You may need to put +osg-client this in your .soft file and re-login:<BR>>><BR>>> @python-2.5<BR>>> +java-sun<BR>>><BR>>> +apache-ant<BR>>> +gx-map<BR>>> +condor<BR>>> +gx-map<BR>>> @globus-4<BR>>> @default<BR>>> +R<BR>>> +torque<BR>>> +maui<BR>>> +matlab-7.7<BR>>> +osg-client<BR>>><BR>>> - Mike<BR>>><BR>>><BR>>><BR>>><BR>>><BR>>> On 4/30/09 4:39 PM, Michael Wilde wrote:<BR>>>> And we should also drill back down to why (at least yesterday) the GT4<BR>>>> softev package failed, but the OSG client worked, for globus-job-run.<BR>>>><BR>>>> I guess its possible there is a host or CA cert issue here.<BR>>>><BR>>>> - Mike<BR>>>><BR>>>><BR>>>> On 4/30/09 4:31 PM, Mihael Hategan wrote:<BR>>>>> Can you guys try to run first.swift on ranger with the settings you have<BR>>>>> (you'll need to add "echo" to tc.data)?<BR>>>>><BR>>>>><BR>>>>> On Thu, 2009-04-30 at 16:13 -0500, Glen Hocky wrote:<BR>>>>>> I have the identical response on ranger. It started yesterday<BR>>>>>> evening. 
Possibly a problem that the TACC folks need to fix?<BR>>>>>><BR>>>>>> Glen<BR>>>>>><BR>>>>>> Yue, Chen - BMD wrote:<BR>>>>>>> Hi Michael,<BR>>>>>>> <BR>>>>>>> Thank you for the advices. I tested ranger with 1 job and new<BR>>>>>>> specifications of maxwalltime. It shows the following error message.<BR>>>>>>> I don't know if there is other problem with my setup. Thank you!<BR>>>>>>> <BR>>>>>>> /////////////////////////////////////////////////<BR>>>>>>> [yuechen@communicado PTMap2]$ swift PTMap2-unmod.swift -sites.file<BR>>>>>>> sites.xml -tc.file tc.data<BR>>>>>>> Swift 0.9rc2 swift-r2860 cog-r2388<BR>>>>>>> RunID: 20090430-1559-2vi6x811<BR>>>>>>> Progress:<BR>>>>>>> Progress: Stage in:1<BR>>>>>>> Progress: Submitting:1<BR>>>>>>> Progress: Submitting:1<BR>>>>>>> Progress: Submitted:1<BR>>>>>>> Progress: Active:1<BR>>>>>>> Failed to transfer wrapper log from<BR>>>>>>> PTMap2-unmod-20090430-1559-2vi6x811/info/i on ranger<BR>>>>>>> Progress: Active:1<BR>>>>>>> Failed to transfer wrapper log from<BR>>>>>>> PTMap2-unmod-20090430-1559-2vi6x811/info/k on ranger<BR>>>>>>> Progress: Stage in:1<BR>>>>>>> Progress: Active:1<BR>>>>>>> Failed to transfer wrapper log from<BR>>>>>>> PTMap2-unmod-20090430-1559-2vi6x811/info/m on ranger<BR>>>>>>> Progress: Failed:1<BR>>>>>>> Execution failed:<BR>>>>>>> Exception in PTMap2:<BR>>>>>>> Arguments: [e04.mzXML, ./seqs-ecolik12/fasta01, inputs-unmod.txt,<BR>>>>>>> parameters.txt]<BR>>>>>>> Host: ranger<BR>>>>>>> Directory: PTMap2-unmod-20090430-1559-2vi6x811/jobs/m/PTMap2-mbe6m5aj<BR>>>>>>> stderr.txt:<BR>>>>>>> stdout.txt:<BR>>>>>>> ----<BR>>>>>>> Caused by:<BR>>>>>>> Failed to start worker:<BR>>>>>>> null<BR>>>>>>> null<BR>>>>>>> org.globus.gram.GramException: The job manager detected an invalid<BR>>>>>>> script response<BR>>>>>>> at<BR>>>>>>> org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.statusChanged(JobSubmissionTaskHandler.java:530)<BR>>>>>>><BR>>>>>>> at 
org.globus.gram.GramJob.setStatus(GramJob.java:184)<BR>>>>>>> at<BR>>>>>>> org.globus.gram.GramCallbackHandler.run(CallbackHandler.java:176)<BR>>>>>>> at java.lang.Thread.run(Thread.java:619)<BR>>>>>>> Cleaning up...<BR>>>>>>> Shutting down service at <A href="https://129.114.50.163:45562/">https://129.114.50.163:45562</A><BR>>>>>>> <<A href="https://129.114.50.163:45562/">https://129.114.50.163:45562</A>><BR>>>>>>> Got channel MetaChannel: 20903429 -> GSSSChannel-null(1)<BR>>>>>>> - Done<BR>>>>>>> [yuechen@communicado PTMap2]$<BR>>>>>>> ///////////////////////////////////////////////////////////<BR>>>>>>> <BR>>>>>>> Chen, Yue<BR>>>>>>> <BR>>>>>>><BR>>>>>>> *From:* Michael Wilde [<A href="mailto:wilde@mcs.anl.gov">mailto:wilde@mcs.anl.gov</A>]<BR>>>>>>> *Sent:* Thu 4/30/2009 3:02 PM<BR>>>>>>> *To:* Yue, Chen - BMD; swift-devel<BR>>>>>>> *Subject:* Re: [Swift-user] Execution error<BR>>>>>>><BR>>>>>>> Back on list here (I only went off-list to discuss accounts, etc)<BR>>>>>>><BR>>>>>>> The problem in the run below is this:<BR>>>>>>><BR>>>>>>> 2009-04-30 14:29:41,265-0500 DEBUG vdl:execute2 APPLICATION_EXCEPTION<BR>>>>>>> jobid=PTMap2-abeii5aj - Application exception: Job cannot be run with<BR>>>>>>> the given max walltime worker constraint (task: 3000, \<BR>>>>>>> maxwalltime: 2400s)<BR>>>>>>><BR>>>>>>> You have this on the ptmap app in your tc.data:<BR>>>>>>><BR>>>>>>> globus::maxwalltime=50<BR>>>>>>><BR>>>>>>> But you only gave coasters 40 mins per coaster worker. So its<BR>>>>>>> complaining that it cant run a 50 minute job in a 40 minute (max)<BR>>>>>>> coaster worker. 
;)<BR>>>>>>><BR>>>>>>> I mentioned in a prior mail that you need to set the two time vals in<BR>>>>>>> your sites.xml entry; thats what you need to do next, now.<BR>>>>>>><BR>>>>>>> change the coaster time in your sites.xml to:<BR>>>>>>> key="coasterWorkerMaxwalltime">00:51:00</profile><BR>>>>>>><BR>>>>>>> If you have more info on the variability of your ptmap run times, send<BR>>>>>>> that to the list, and we can discuss how to handle.<BR>>>>>>><BR>>>>>>><BR>>>>>>> (NOTE: doing grp -i of the log for "except" or scanning for "except"<BR>>>>>>> with an editor will often locate the first "exception" that your job<BR>>>>>>> encountered. Thats how I found the error above).<BR>>>>>>><BR>>>>>>> Also, Yue, for testing new sites, or for validating that old sites<BR>>>>>>> still<BR>>>>>>> work, you should create the smallest possible ptmap workflow - 1 job if<BR>>>>>>> that is possible - and verify that this works. Then say 10 jobs to<BR>>>>>>> make<BR>>>>>>> sure scheduling etc is sane. Then, send in your huge jobs.<BR>>>>>>><BR>>>>>>> With only 1 job, its easier to spot the errors in the log file.<BR>>>>>>><BR>>>>>>> - Mike<BR>>>>>>><BR>>>>>>><BR>>>>>>> On 4/30/09 2:34 PM, Yue, Chen - BMD wrote:<BR>>>>>>>> Hi Michael,<BR>>>>>>>><BR>>>>>>>> I run into the same messages again when I use Ranger:<BR>>>>>>>><BR>>>>>>>> Progress: Selecting site:146 Stage in:25 Submitting:15 <BR>>>>>>>> Submitted:821<BR>>>>>>>> Failed but can retry:16<BR>>>>>>>> Failed to transfer wrapper log from<BR>>>>>>>> PTMap2-unmod-20090430-1428-v0c5di5c/info/q on ranger<BR>>>>>>>> Failed to transfer wrapper log from<BR>>>>>>>> PTMap2-unmod-20090430-1428-v0c5di5c/info/l on ranger<BR>>>>>>>> Progress: Selecting site:146 Stage in:3 Submitting:1 Submitted:857<BR>>>>>>>> Failed but can retry:16<BR>>>>>>>> Failed to transfer wrapper log from<BR>>>>>>>> PTMap2-unmod-20090430-1428-v0c5di5c/info/v on ranger<BR>>>>>>>> Failed to transfer wrapper log from<BR>>>>>>>> PTMap2-unmod-20090430-1428-v0c5di5c/info/b on 
ranger<BR>>>>>>>> Failed to transfer wrapper log from<BR>>>>>>>> PTMap2-unmod-20090430-1428-v0c5di5c/info/0 on ranger<BR>>>>>>>> Failed to transfer wrapper log from<BR>>>>>>>> PTMap2-unmod-20090430-1428-v0c5di5c/info/a on ranger<BR>>>>>>>> Failed to transfer wrapper log from<BR>>>>>>>> PTMap2-unmod-20090430-1428-v0c5di5c/info/8 on ranger<BR>>>>>>>> Failed to transfer wrapper log from<BR>>>>>>>> PTMap2-unmod-20090430-1428-v0c5di5c/info/7 on ranger<BR>>>>>>>> Failed to transfer wrapper log from<BR>>>>>>>> PTMap2-unmod-20090430-1428-v0c5di5c/info/4 on ranger<BR>>>>>>>> Failed to transfer wrapper log from<BR>>>>>>>> PTMap2-unmod-20090430-1428-v0c5di5c/info/a on ranger<BR>>>>>>>> Failed to transfer wrapper log from<BR>>>>>>>> PTMap2-unmod-20090430-1428-v0c5di5c/info/4 on ranger<BR>>>>>>>> Failed to transfer wrapper log from<BR>>>>>>>> PTMap2-unmod-20090430-1428-v0c5di5c/info/8 on ranger<BR>>>>>>>> Failed to transfer wrapper log from<BR>>>>>>>> PTMap2-unmod-20090430-1428-v0c5di5c/info/7 on ranger<BR>>>>>>>> Failed to transfer wrapper log from<BR>>>>>>>> PTMap2-unmod-20090430-1428-v0c5di5c/info/x on ranger<BR>>>>>>>> Failed to transfer wrapper log from<BR>>>>>>>> PTMap2-unmod-20090430-1428-v0c5di5c/info/3 on ranger<BR>>>>>>>> Failed to transfer wrapper log from<BR>>>>>>>> PTMap2-unmod-20090430-1428-v0c5di5c/info/q on ranger<BR>>>>>>>> The log for the search is at :<BR>>>>>>>> /home/yuechen/PTMap2/PTMap2-unmod-20090430-1428-v0c5di5c.log<BR>>>>>>>><BR>>>>>>>> The sites.xml I have is:<BR>>>>>>>><BR>>>>>>>> <pool handle="ranger"><BR>>>>>>>> <execution provider="coaster"<BR>>>>>>>> url="gatekeeper.ranger.tacc.teragrid.org"<BR>>>>>>>> jobManager="gt2:gt2:SGE"/><BR>>>>>>>> <gridftp url="gsiftp://gridftp.ranger.tacc.teragrid.org:2811/" /><BR>>>>>>>> <profile namespace="env"<BR>>>>>>>> key="SWIFT_JOBDIR_PATH">/tmp/yuechen/jobdir</profile><BR>>>>>>>> <profile namespace="globus" key="project">TG-CCR080022N</profile><BR>>>>>>>> <profile namespace="globus" 
key="coastersPerNode">16</profile><BR>>>>>>>> <profile namespace="globus" key="queue">development</profile><BR>>>>>>>> <profile namespace="globus"<BR>>>>>>>> key="coasterWorkerMaxwalltime">00:40:00</profile><BR>>>>>>>> <profile namespace="globus" key="maxwalltime">31</profile><BR>>>>>>>> <profile namespace="karajan" key="initialScore">50</profile><BR>>>>>>>> <profile namespace="karajan" key="jobThrottle">10</profile><BR>>>>>>>> <workdirectory>/work/01164/yuechen/swiftwork</workdirectory><BR>>>>>>>> </pool><BR>>>>>>>> The tc.data I have is:<BR>>>>>>>><BR>>>>>>>> ranger PTMap2 <BR>>>>>>>> /share/home/01164/yuechen/PTMap2/PTMap2 INSTALLED <BR>>>>>>>> INTEL32::LINUX globus::maxwalltime=50<BR>>>>>>>><BR>>>>>>>> I'm using swift 0.9 rc2<BR>>>>>>>><BR>>>>>>>> Thank you very much for help!<BR>>>>>>>><BR>>>>>>>> Chen, Yue<BR>>>>>>>><BR>>>>>>>><BR>>>>>>>><BR>>>>>>>> ------------------------------------------------------------------------<BR>>>>>>>><BR>>>>>>>> *From:* Michael Wilde [<A href="mailto:wilde@mcs.anl.gov">mailto:wilde@mcs.anl.gov</A>]<BR>>>>>>>> *Sent:* Thu 4/30/2009 2:05 PM<BR>>>>>>>> *To:* Yue, Chen - BMD<BR>>>>>>>> *Subject:* Re: [Swift-user] Execution error<BR>>>>>>>><BR>>>>>>>><BR>>>>>>>><BR>>>>>>>> On 4/30/09 1:51 PM, Yue, Chen - BMD wrote:<BR>>>>>>>> > Hi Michael,<BR>>>>>>>> ><BR>>>>>>>> > When I tried to activate my account, I encountered the following<BR>>>>>>> error:<BR>>>>>>>> ><BR>>>>>>>> > "Sorry, this account is in an invalid state. You may not activate<BR>>>>>>> your<BR>>>>>>>> > at this time."<BR>>>>>>>> ><BR>>>>>>>> > I used the username and password from TG-CDA070002T. Should I use a<BR>>>>>>>> > different password?<BR>>>>>>>><BR>>>>>>>> If you can already login to Ranger, then you are all set - you must<BR>>>>>>>> have<BR>>>>>>>> done this previously.<BR>>>>>>>><BR>>>>>>>> I thought you had *not*, because when I looked up your login on ranger<BR>>>>>>>> ("finger yuechen") it said "never logged in". 
But seems like that info<BR>>>>>>>> is incorrect.<BR>>>>>>>><BR>>>>>>>> If you have ptmap compiled, seems like you are almost all set.<BR>>>>>>>><BR>>>>>>>> Let me know if it works.<BR>>>>>>>><BR>>>>>>>> - Mike<BR>>>>>>>><BR>>>>>>>> > Thanks!<BR>>>>>>>> ><BR>>>>>>>> > Chen, Yue<BR>>>>>>>> ><BR>>>>>>>> ><BR>>>>>>>> ><BR>>>>>>> ------------------------------------------------------------------------<BR>>>>>>><BR>>>>>>>> > *From:* Michael Wilde [<A href="mailto:wilde@mcs.anl.gov">mailto:wilde@mcs.anl.gov</A>]<BR>>>>>>>> > *Sent:* Thu 4/30/2009 1:07 PM<BR>>>>>>>> > *To:* Yue, Chen - BMD<BR>>>>>>>> > *Cc:* swift user<BR>>>>>>>> > *Subject:* Re: [Swift-user] Execution error<BR>>>>>>>> ><BR>>>>>>>> > Yue, use this XML pool element to access ranger:<BR>>>>>>>> ><BR>>>>>>>> > <pool handle="ranger"><BR>>>>>>>> > <execution provider="coaster"<BR>>>>>>>> > url="gatekeeper.ranger.tacc.teragrid.org"<BR>>>>>>>> > jobManager="gt2:gt2:SGE"/><BR>>>>>>>> > <gridftp<BR>>>>>>> url="gsiftp://gridftp.ranger.tacc.teragrid.org:2811/" /><BR>>>>>>>> > <profile namespace="env"<BR>>>>>>>> > key="SWIFT_JOBDIR_PATH">/tmp/yuechen/jobdir</profile><BR>>>>>>>> > <profile namespace="globus"<BR>>>>>>> key="project">TG-CCR080022N</profile><BR>>>>>>>> > <profile namespace="globus" key="coastersPerNode">16</profile><BR>>>>>>>> > <profile namespace="globus" key="queue">development</profile><BR>>>>>>>> > <profile namespace="globus"<BR>>>>>>>> > key="coasterWorkerMaxwalltime">00:40:00</profile><BR>>>>>>>> > <profile namespace="globus" key="maxwalltime">31</profile><BR>>>>>>>> > <profile namespace="karajan" key="initialScore">50</profile><BR>>>>>>>> > <profile namespace="karajan" key="jobThrottle">10</profile><BR>>>>>>>> > <workdirectory>/work/00306/tg455797/swiftwork</workdirectory><BR>>>>>>>> > </pool><BR>>>>>>>> ><BR>>>>>>>> ><BR>>>>>>>> > You will need to also do these steps:<BR>>>>>>>> ><BR>>>>>>>> > Go to this web page to enable your Ranger account:<BR>>>>>>>> ><BR>>>>>>>> > <A 
href="https://tas.tacc.utexas.edu/TASMigration/AccountActivation.aspx">https://tas.tacc.utexas.edu/TASMigration/AccountActivation.aspx</A><BR>>>>>>>> ><BR>>>>>>>> > Then login to Ranger via the TeraGrid portal and put your ssh<BR>>>>>>>> keys in<BR>>>>>>>> > place (assuming you use ssh keys, which you should)<BR>>>>>>>> ><BR>>>>>>>> > While on Ranger, do this:<BR>>>>>>>> ><BR>>>>>>>> > echo $WORK<BR>>>>>>>> > mkdir $work/swiftwork<BR>>>>>>>> ><BR>>>>>>>> > and put the full path of your $WORK/swiftwork directory in the<BR>>>>>>>> > <workdirectory> element above. (My login is tg455etc, yours is<BR>>>>>>> yuechen)<BR>>>>>>>> ><BR>>>>>>>> > Then scp your code to Ranger and compile it.<BR>>>>>>>> ><BR>>>>>>>> > Then create a tc.data entry for your ptmap app<BR>>>>>>>> ><BR>>>>>>>> > Next, set your time values in the sites.xml entry above to suitable<BR>>>>>>>> > values for Ranger. You'll need to measure times, but I think you<BR>>>>>>>> will<BR>>>>>>>> > find Ranger about twice as fast as Mercury for CPU-bound jobs.<BR>>>>>>>> ><BR>>>>>>>> > The values above were set for one app job per coaster. 
I think<BR>>>>>>> you can<BR>>>>>>>> > probably do more.<BR>>>>>>>> ><BR>>>>>>>> > If you estimate a run time of 5 minutes, use:<BR>>>>>>>> ><BR>>>>>>>> > <profile namespace="globus"<BR>>>>>>>> > key="coasterWorkerMaxwalltime">00:30:00</profile><BR>>>>>>>> > <profile namespace="globus" key="maxwalltime">5</profile><BR>>>>>>>> ><BR>>>>>>>> > Other people on the list - please sanity check what I suggest here.<BR>>>>>>>> ><BR>>>>>>>> > - Mike<BR>>>>>>>> ><BR>>>>>>>> ><BR>>>>>>>> > On 4/30/09 12:40 PM, Michael Wilde wrote:<BR>>>>>>>> > > I just checked - TG-CDA070002T has indeed expired.<BR>>>>>>>> > ><BR>>>>>>>> > > The best for now is to move to use (only) Ranger, under this<BR>>>>>>> account:<BR>>>>>>>> > > TG-CCR080022N<BR>>>>>>>> > ><BR>>>>>>>> > > I will locate and send you a sites.xml entry in a moment.<BR>>>>>>>> > ><BR>>>>>>>> > > You need to go to a web page to activate your Ranger login.<BR>>>>>>>> > ><BR>>>>>>>> > > Best to contact me in IM and we can work this out.<BR>>>>>>>> > ><BR>>>>>>>> > > - Mike<BR>>>>>>>> > ><BR>>>>>>>> > ><BR>>>>>>>> > ><BR>>>>>>>> > > On 4/30/09 12:23 PM, Michael Wilde wrote:<BR>>>>>>>> > >> Also, what account are you running under? 
We may need to change<BR>>>>>>>> you to<BR>>>>>>>> > >> a new account - as the OSG Training account expires today.<BR>>>>>>>> > >> If that happend at Noon, it *might* be the problem.<BR>>>>>>>> > >><BR>>>>>>>> > >> - Mike<BR>>>>>>>> > >><BR>>>>>>>> > >><BR>>>>>>>> > >> On 4/30/09 12:08 PM, Yue, Chen - BMD wrote:<BR>>>>>>>> > >>> Hi,<BR>>>>>>>> > >>><BR>>>>>>>> > >>> I came back to re-run my application on NCSA Mercury which was<BR>>>>>>>> tested<BR>>>>>>>> > >>> successfully last week after I just set up coasters with<BR>>>>>>> swift 0.9,<BR>>>>>>>> > >>> but I got many messages like the following:<BR>>>>>>>> > >>><BR>>>>>>>> > >>> Progress: Stage in:219 Submitting:803 Submitted:1<BR>>>>>>>> > >>> Progress: Stage in:129 Submitting:703 Submitted:190 Failed<BR>>>>>>>> but can<BR>>>>>>>> > >>> retry:1<BR>>>>>>>> > >>> Progress: Stage in:38 Submitting:425 Submitted:556 Failed<BR>>>>>>> but can<BR>>>>>>>> > >>> retry:4<BR>>>>>>>> > >>> Failed to transfer wrapper log from<BR>>>>>>>> > >>> PTMap2-unmod-20090430-1203-r19dxq10/info/h on NCSA_MERCURY<BR>>>>>>>> > >>> Failed to transfer wrapper log from<BR>>>>>>>> > >>> PTMap2-unmod-20090430-1203-r19dxq10/info/j on NCSA_MERCURY<BR>>>>>>>> > >>> Failed to transfer wrapper log from<BR>>>>>>>> > >>> PTMap2-unmod-20090430-1203-r19dxq10/info/p on NCSA_MERCURY<BR>>>>>>>> > >>> Failed to transfer wrapper log from<BR>>>>>>>> > >>> PTMap2-unmod-20090430-1203-r19dxq10/info/1 on NCSA_MERCURY<BR>>>>>>>> > >>> Failed to transfer wrapper log from<BR>>>>>>>> > >>> PTMap2-unmod-20090430-1203-r19dxq10/info/b on NCSA_MERCURY<BR>>>>>>>> > >>> Failed to transfer wrapper log from<BR>>>>>>>> > >>> PTMap2-unmod-20090430-1203-r19dxq10/info/c on NCSA_MERCURY<BR>>>>>>>> > >>> Progress: Stage in:1 Submitted:1013 Active:1 Failed but can<BR>>>>>>>> retry:8<BR>>>>>>>> > >>> Progress: Submitted:1011 Active:1 Failed but can retry:11<BR>>>>>>>> > >>> The log file for the successful run last week is ;<BR>>>>>>>> > >>> 
/home/yuechen/PTMap2/PTMap2-unmod-20090422-1216-4s3037gf.log<BR>>>>>>>> > >>><BR>>>>>>>> > >>> The log file for the failed run is :<BR>>>>>>>> > >>> /home/yuechen/PTMap2/PTMap2-unmod-20090430-1151-rf2uuhb7.log<BR>>>>>>>> > >>><BR>>>>>>>> > >>> I don't think I did anything different, so I don't know why<BR>>>>>>>> this<BR>>>>>>>> time<BR>>>>>>>> > >>> they failed. The sites.xml for Mercury is:<BR>>>>>>>> > >>><BR>>>>>>>> > >>> <pool handle="NCSA_MERCURY"><BR>>>>>>>> > >>> <gridftp url="gsiftp://gridftp-hg.ncsa.teragrid.org"/><BR>>>>>>>> > >>> <execution provider="coaster"<BR>>>>>>> url="grid-hg.ncsa.teragrid.org"<BR>>>>>>>> > >>> jobManager="gt2:PBS"/><BR>>>>>>>> > >>> <BR>>>>>>> <workdirectory>/gpfs_scratch1/yuechen/swiftwork</workdirectory><BR>>>>>>>> > >>> <profile namespace="globus" key="queue">debug</profile><BR>>>>>>>> > >>> </pool><BR>>>>>>>> > >>><BR>>>>>>>> > >>> Thank you for help!<BR>>>>>>>> > >>><BR>>>>>>>> > >>> Chen, Yue<BR>>>>>>>> > >>><BR>>>>>>>> > >>><BR>>>>>>>> > >>><BR>>>>>>>> > >>><BR>>>>>>>> > >>><BR>>>>>>>> > >>><BR>>>>>>>> > >>><BR>>>>>>>> > >>><BR>>>>>>>> > >>><BR>>>>>>>> > >>><BR>>>>>>>> > >>><BR>>>>>>>> > >>> This email is intended only for the use of the individual or<BR>>>>>>> entity<BR>>>>>>>> > >>> to which it is addressed and may contain information that is<BR>>>>>>>> > >>> privileged and confidential. If the reader of this email<BR>>>>>>> message is<BR>>>>>>>> > >>> not the intended recipient, you are hereby notified that any<BR>>>>>>>> > >>> dissemination, distribution, or copying of this<BR>>>>>>>> communication is<BR>>>>>>>> > >>> prohibited. 
If you have received this email in error, please<BR>>>>>>> notify<BR>>>>>>>> > >>> the sender and destroy/delete all copies of the transmittal.<BR>>>>>>>> Thank you.<BR>>>>>>>> > >>><BR>>>>>>>> > >>><BR>>>>>>>> > >>><BR>>>>>>>> ><BR>>>>>>> ------------------------------------------------------------------------<BR>>>>>>><BR>>>>>>>> > >>><BR>>>>>>>> > >>> _______________________________________________<BR>>>>>>>> > >>> Swift-user mailing list<BR>>>>>>>> > >>> Swift-user@ci.uchicago.edu<BR>>>>>>>> > >>> <A href="http://mail.ci.uchicago.edu/mailman/listinfo/swift-user">http://mail.ci.uchicago.edu/mailman/listinfo/swift-user</A><BR>>>>>>>> > >> _______________________________________________<BR>>>>>>>> > >> Swift-user mailing list<BR>>>>>>>> > >> Swift-user@ci.uchicago.edu<BR>>>>>>>> > >> <A href="http://mail.ci.uchicago.edu/mailman/listinfo/swift-user">http://mail.ci.uchicago.edu/mailman/listinfo/swift-user</A><BR>>>>>>>> > > _______________________________________________<BR>>>>>>>> > > Swift-user mailing list<BR>>>>>>>> > > Swift-user@ci.uchicago.edu<BR>>>>>>>> > > <A href="http://mail.ci.uchicago.edu/mailman/listinfo/swift-user">http://mail.ci.uchicago.edu/mailman/listinfo/swift-user</A><BR>>>>>>>> ><BR>>>>>>>> ><BR>>>>>>>> ><BR>>>>>>>> ><BR>>>>>>>> > This email is intended only for the use of the individual or<BR>>>>>>> entity to<BR>>>>>>>> > which it is addressed and may contain information that is<BR>>>>>>> privileged and<BR>>>>>>>> > confidential. If the reader of this email message is not the<BR>>>>>>>> intended<BR>>>>>>>> > recipient, you are hereby notified that any dissemination,<BR>>>>>>> distribution,<BR>>>>>>>> > or copying of this communication is prohibited. If you have<BR>>>>>>>> received<BR>>>>>>>> > this email in error, please notify the sender and destroy/delete<BR>>>>>>>> all<BR>>>>>>>> > copies of the transmittal. 
Thank you.<BR>>>>>>> ------------------------------------------------------------------------<BR>>>>>>><BR>>>>>>><BR>>>>>>> _______________________________________________<BR>>>>>>> Swift-devel mailing list<BR>>>>>>> Swift-devel@ci.uchicago.edu<BR>>>>>>> <A href="http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel">http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel</A><BR>>>>>>> <BR>>>>>> _______________________________________________<BR>>>>>> Swift-devel mailing list<BR>>>>>> Swift-devel@ci.uchicago.edu<BR>>>>>> <A href="http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel">http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel</A><BR>>>>> _______________________________________________<BR>>>>> Swift-devel mailing list<BR>>>>> Swift-devel@ci.uchicago.edu<BR>>>>> <A href="http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel">http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel</A><BR>>>> _______________________________________________<BR>>>> Swift-devel mailing list<BR>>>> Swift-devel@ci.uchicago.edu<BR>>>> <A href="http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel">http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel</A><BR>><BR></FONT></P></DIV><DIV> </DIV><br><br>This email is intended only for the use of the individual or entity to which it is addressed and may contain information that is privileged and confidential. If the reader of this email message is not the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this communication is prohibited. If you have received this email in error, please notify the sender and destroy/delete all copies of the transmittal. Thank you.<br></BODY></HTML>