[Swift-devel] RE: [Swift-user] Execution error

Mihael Hategan hategan at mcs.anl.gov
Thu Apr 30 17:37:53 CDT 2009


On Thu, 2009-04-30 at 17:32 -0500, Yue, Chen - BMD wrote:
> Hi Michael,
>  
> I already have +osg-client-1.0.0-r1 in my .soft file. But I changed it
> to +osg-client and tried again. "ranger" gave me the same error
> message. In the meantime, I tested one job on both Abe and Lonestar
> and they both gave me a qsub error.

Can you try "gt2:gt2:PBS" instead of "gt2:PBS" on all sites?
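
For reference, a rough sketch of that change as a one-off rewrite of
sites.xml. It assumes the execution element carries the literal text
jobManager="gt2:PBS", as in the Mercury entry quoted further down. The
function name is made up for illustration. My understanding is that the
three-part form makes coasters submit workers through GRAM instead of
calling qsub directly from the coaster service, which is where the qsub
error in the trace comes from.

```shell
# Hypothetical helper: rewrite the two-part jobManager value to the
# three-part form, reading stdin and writing stdout. Only the exact
# string jobManager="gt2:PBS" is touched; everything else passes through.
use_gt2_workers() {
    sed 's/jobManager="gt2:PBS"/jobManager="gt2:gt2:PBS"/g'
}

# Example (keeps a backup of the original file):
#   cp sites.xml sites.xml.bak
#   use_gt2_workers < sites.xml.bak > sites.xml
```

Entries that already use a three-part value (such as gt2:gt2:SGE for
ranger) are left alone.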

> I attached the output below:
>  
> ////////////////////////////////////
> [yuechen@communicado PTMap2]$ swift PTMap2-unmod.swift -sites.file sites.xml -tc.file tc.data
> Swift 0.9rc2 swift-r2860 cog-r2388
> RunID: 20090430-1722-oncfdolb
> Progress:
> Progress:  Stage in:1
> Progress:  Stage in:1
> Progress:  Stage in:1
> Progress:  Submitting:1
> Progress:  Submitted:1
> Failed to transfer wrapper log from
> PTMap2-unmod-20090430-1722-oncfdolb/info/3 on TACC_LoneStar
> Progress:  Active:1
> Failed to transfer wrapper log from
> PTMap2-unmod-20090430-1722-oncfdolb/info/5 on TACC_LoneStar
> Progress:  Stage in:1
> Progress:  Active:1
> Failed to transfer wrapper log from
> PTMap2-unmod-20090430-1722-oncfdolb/info/7 on TACC_LoneStar
> Progress:  Failed:1
> Execution failed:
>         Exception in PTMap2:
> Arguments: [e04.mzXML, ./seqs-ecolik12/fasta02, inputs-unmod.txt,
> parameters.txt]
> Host: TACC_LoneStar
> Directory: PTMap2-unmod-20090430-1722-oncfdolb/jobs/7/PTMap2-7uagp5aj
> stderr.txt:
> stdout.txt:
> ----
> Caused by:
>         Cannot submit job: Could not submit job (qsub reported an exit
> code of -1). no error output
> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException:
> Cannot submit job: Could not submit job (qsub reported an exit code of
> -1). no error output
>         at
> org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:63)
>         at
> org.globus.cog.abstraction.impl.common.AbstractTaskHandler.submit(AbstractTaskHandler.java:46)
>         at
> org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.submit(ExecutionTaskHandler.java:43)
>         at
> org.globus.cog.abstraction.coaster.service.job.manager.WorkerManager.startWorker(WorkerManager.java:221)
>         at
> org.globus.cog.abstraction.coaster.service.job.manager.WorkerManager.run(WorkerManager.java:145)
> Caused by:
> org.globus.cog.abstraction.impl.scheduler.common.ProcessException:
> Could not submit job (qsub reported an exit code of -1). no error
> output
>         at
> org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor.start(AbstractExecutor.java:94)
>         at
> org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:53)
>         ... 4 more
> Cleaning up...
> Shutting down service at https://129.114.50.32:34704
> Got channel MetaChannel: 2013263 -> GSSSChannel-null(1)
> - Done
> /////////////////////////////////////////
>  
> My sites.xml is at : /home/yuechen/PTMap2/sites.xml. I'm wondering if
> this still relates to my setup. Thanks!
>  
> Chen, Yue
>  
>  
> 
> 
> ______________________________________________________________________
> From: Michael Wilde [mailto:wilde at mcs.anl.gov]
> Sent: Thu 4/30/2009 5:23 PM
> To: Mihael Hategan
> Cc: swift-devel; Yue, Chen - BMD
> Subject: Re: [Swift-devel] RE: [Swift-user] Execution error
> 
> 
> 
> 
> On 4/30/09 5:13 PM, Mihael Hategan wrote:
>  >> GRAM Job submission failed because the job manager failed to open
> stderr
>  >> (error code 74)
>  >
>  > That seems like an IP address problem. Make sure you set
> GLOBUS_HOSTNAME
>  > properly.
> 
> OK, I will try that. But in the test below, I caused the error by
> unsetting X509_CERT_DIR and fixed the error by resetting it - no other
> changes.
> 
> I *think* that as recently as a few weeks ago globus-job-run to ranger
> worked with just @globus in my .soft file.
> 
> Adding +osg-client seemed to make it work by setting X509_CERT_DIR.
> 
> So as far as I can tell, at least at the level of globus-job-run,
> this seems to be related to certs.
> 
> Given what I'm seeing, do you still think GLOBUS_HOSTNAME is a factor?
> 
> - Mike
> 
> 
> > On Thu, 2009-04-30 at 17:01 -0500, Michael Wilde wrote:
> >> A bit more info on this: it *seems* like a cert issue.
> >>
> >> I last accessed Ranger via globus-job-run perhaps 2 weeks ago, no
> problem.
> >>
> >> Yesterday, while debugging with Glen, globus-job-run was giving me
> >> GRAM err 74 (and GRAM err 12 to all other sites).
> >>
> >> So I added +osg-client to my .soft file, and then globus-job-run
> worked.
> >>
> >> But I noticed that my globus-job-run was still coming from the GT4
> dir,
> >> not from an OSG dir.
> >>
> >> Just now I traced this back to X509_CERT_DIR:
> >>
> >> <works here>  then I did:
> >>
> >> com$ unset X509_CERT_DIR
> >> com$ globus-job-run gatekeeper.ranger.tacc.teragrid.org /usr/bin/id
> >> GRAM Job submission failed because the job manager failed to open
> stderr
> >> (error code 74)
> >
> > That seems like an IP address problem. Make sure you set
> GLOBUS_HOSTNAME
> > properly.
> >
> >> com$
> >> com$
> >> com$ X509_CERT_DIR=/soft/osg-client-1.0.0-r1/globus/TRUSTED_CA
> >> com$ globus-job-run gatekeeper.ranger.tacc.teragrid.org /usr/bin/id
> >> GRAM Job submission failed because the job manager failed to open
> stderr
> >> (error code 74)
> >> com$ export
> X509_CERT_DIR=/soft/osg-client-1.0.0-r1/globus/TRUSTED_CA
> >> com$ globus-job-run gatekeeper.ranger.tacc.teragrid.org /usr/bin/id
> >> uid=455797(tg455797) gid=80243(G-80243)
> >>
> groups=80243(G-80243),81031(G-81031),81411(G-81411),81611(G-81611),81613(G-81613),81621(G-81621),81747(G-81747),81792(G-81792),800744(G-800744),800745(G-800745),800889(G-800889),800981(G-800981),800983(G-800983),801271(G-801271),801364(G-801364)
> >> com$
> >>
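
Note that the only difference between the last two attempts above is
the "export". A plain assignment sets a shell variable that child
processes such as globus-job-run do not inherit. A small sketch of that
behaviour follows; DEMO_CERT_DIR and child_view are made-up names used
only for this illustration, not anything Globus reads.

```shell
# DEMO_CERT_DIR is a made-up name used only for this illustration.
unset DEMO_CERT_DIR

# Print what a child process sees for DEMO_CERT_DIR ("unset" if nothing).
child_view() {
    sh -c 'echo "${DEMO_CERT_DIR:-unset}"'
}

DEMO_CERT_DIR=/some/ca/dir          # shell variable only, not exported
before=$(child_view)                # the child does not see it

export DEMO_CERT_DIR                # now part of the environment
after=$(child_view)                 # the child sees the value

echo "before export: $before"
echo "after export:  $after"
```

So the second attempt would have failed whether or not the cert
directory was the right one.
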
> >> Mihael, does swift honor X509_CERT_DIR? If so, Glen, Yue, that is
> >> something to try.
> >>
> >> You may need to put +osg-client in your .soft file, as below, and re-login:
> >>
> >> @python-2.5
> >> +java-sun
> >>
> >> +apache-ant
> >> +gx-map
> >> +condor
> >> +gx-map
> >> @globus-4
> >> @default
> >> +R
> >> +torque
> >> +maui
> >> +matlab-7.7
> >> +osg-client
> >>
> >> - Mike
> >>
> >>
> >>
> >>
> >>
> >> On 4/30/09 4:39 PM, Michael Wilde wrote:
> >>> And we should also drill back down to why (at least yesterday) the
> GT4
> >>> softev package failed, but the OSG client worked, for
> globus-job-run.
> >>>
> >>> I guess it's possible there is a host or CA cert issue here.
> >>>
> >>> - Mike
> >>>
> >>>
> >>> On 4/30/09 4:31 PM, Mihael Hategan wrote:
> >>>> Can you guys try to run first.swift on ranger with the settings
> you have
> >>>> (you'll need to add "echo" to tc.data)?
> >>>>
> >>>>
> >>>> On Thu, 2009-04-30 at 16:13 -0500, Glen Hocky wrote:
> >>>>> I have the identical response on ranger. It started yesterday
> >>>>> evening. Possibly a problem that the TACC folks need to fix?
> >>>>>
> >>>>> Glen
> >>>>>
> >>>>> Yue, Chen - BMD wrote:
> >>>>>> Hi Michael,
> >>>>>> 
> >>>>>> Thank you for the advice. I tested ranger with 1 job and new
> >>>>>> specifications of maxwalltime. It shows the following error message.
> >>>>>> I don't know if there is another problem with my setup. Thank you!
> >>>>>> 
> >>>>>> /////////////////////////////////////////////////
> >>>>>> [yuechen@communicado PTMap2]$ swift PTMap2-unmod.swift -sites.file sites.xml -tc.file tc.data
> >>>>>> Swift 0.9rc2 swift-r2860 cog-r2388
> >>>>>> RunID: 20090430-1559-2vi6x811
> >>>>>> Progress:
> >>>>>> Progress:  Stage in:1
> >>>>>> Progress:  Submitting:1
> >>>>>> Progress:  Submitting:1
> >>>>>> Progress:  Submitted:1
> >>>>>> Progress:  Active:1
> >>>>>> Failed to transfer wrapper log from
> >>>>>> PTMap2-unmod-20090430-1559-2vi6x811/info/i on ranger
> >>>>>> Progress:  Active:1
> >>>>>> Failed to transfer wrapper log from
> >>>>>> PTMap2-unmod-20090430-1559-2vi6x811/info/k on ranger
> >>>>>> Progress:  Stage in:1
> >>>>>> Progress:  Active:1
> >>>>>> Failed to transfer wrapper log from
> >>>>>> PTMap2-unmod-20090430-1559-2vi6x811/info/m on ranger
> >>>>>> Progress:  Failed:1
> >>>>>> Execution failed:
> >>>>>>         Exception in PTMap2:
> >>>>>> Arguments: [e04.mzXML, ./seqs-ecolik12/fasta01,
> inputs-unmod.txt,
> >>>>>> parameters.txt]
> >>>>>> Host: ranger
> >>>>>> Directory:
> PTMap2-unmod-20090430-1559-2vi6x811/jobs/m/PTMap2-mbe6m5aj
> >>>>>> stderr.txt:
> >>>>>> stdout.txt:
> >>>>>> ----
> >>>>>> Caused by:
> >>>>>>         Failed to start worker:
> >>>>>> null
> >>>>>> null
> >>>>>> org.globus.gram.GramException: The job manager detected an
> invalid
> >>>>>> script response
> >>>>>>         at
> >>>>>>
> org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.statusChanged(JobSubmissionTaskHandler.java:530)
> >>>>>>
> >>>>>>         at org.globus.gram.GramJob.setStatus(GramJob.java:184)
> >>>>>>         at
> >>>>>>
> org.globus.gram.GramCallbackHandler.run(CallbackHandler.java:176)
> >>>>>>         at java.lang.Thread.run(Thread.java:619)
> >>>>>> Cleaning up...
> >>>>>> Shutting down service at https://129.114.50.163:45562
> >>>>>> Got channel MetaChannel: 20903429 -> GSSSChannel-null(1)
> >>>>>> - Done
> >>>>>> [yuechen@communicado PTMap2]$
> >>>>>> ///////////////////////////////////////////////////////////
> >>>>>> 
> >>>>>> Chen, Yue
> >>>>>> 
> >>>>>>
> >>>>>> *From:* Michael Wilde [mailto:wilde at mcs.anl.gov]
> >>>>>> *Sent:* Thu 4/30/2009 3:02 PM
> >>>>>> *To:* Yue, Chen - BMD; swift-devel
> >>>>>> *Subject:* Re: [Swift-user] Execution error
> >>>>>>
> >>>>>> Back on list here (I only went off-list to discuss accounts,
> etc)
> >>>>>>
> >>>>>> The problem in the run below is this:
> >>>>>>
> >>>>>> 2009-04-30 14:29:41,265-0500 DEBUG vdl:execute2
> APPLICATION_EXCEPTION
> >>>>>> jobid=PTMap2-abeii5aj - Application exception: Job cannot be
> run with
> >>>>>> the given max walltime worker constraint (task: 3000, \
> >>>>>> maxwalltime: 2400s)
> >>>>>>
> >>>>>> You have this on the ptmap app in your tc.data:
> >>>>>>
> >>>>>> globus::maxwalltime=50
> >>>>>>
> >>>>>> But you only gave coasters 40 mins per coaster worker. So it's
> >>>>>> complaining that it can't run a 50 minute job in a 40 minute (max)
> >>>>>> coaster worker. ;)
> >>>>>>
> >>>>>> I mentioned in a prior mail that you need to set the two time vals in
> >>>>>> your sites.xml entry; that's what you need to do next.
> >>>>>>
> >>>>>> Change the coaster time in your sites.xml to:
> >>>>>>      <profile namespace="globus"
> >>>>>>               key="coasterWorkerMaxwalltime">00:51:00</profile>
> >>>>>>
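
To spell out the arithmetic behind that log line: maxwalltime is given
in minutes (50 min = 3000 s), while coasterWorkerMaxwalltime is
HH:MM:SS (00:40:00 = 2400 s). Below is a rough sketch of the
comparison. The function names are made up here, and this is only an
illustration of the numbers, not how Swift itself performs the check.

```shell
# Convert HH:MM:SS to seconds (fields are assumed to be at most two digits).
hms_to_seconds() (
    IFS=: read -r h m s <<EOF
$1
EOF
    # Drop one leading zero so values like "08" are not parsed as octal.
    h=${h#0}; m=${m#0}; s=${s#0}
    echo $(( ${h:-0} * 3600 + ${m:-0} * 60 + ${s:-0} ))
)

# Succeeds if a task of $1 minutes fits in a worker walltime of $2 (HH:MM:SS).
task_fits_worker() {
    [ $(( $1 * 60 )) -le "$(hms_to_seconds "$2")" ]
}

task_fits_worker 50 00:40:00 || echo "50 min task does not fit in 00:40:00"
task_fits_worker 50 00:51:00 && echo "50 min task fits in 00:51:00"
```
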
> >>>>>> If you have more info on the variability of your ptmap run
> times, send
> >>>>>> that to the list, and we can discuss how to handle.
> >>>>>>
> >>>>>>
> >>>>>> (NOTE: doing grep -i of the log for "except" or scanning for "except"
> >>>>>> with an editor will often locate the first "exception" that your job
> >>>>>> encountered. That's how I found the error above.)
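
The search described in that note, as a small sketch. The function name
is made up; the log path is whatever your run produced.

```shell
# Print the first line of a log that mentions "except" (case-insensitive),
# prefixed with its line number. Hypothetical helper name.
first_exception() {
    grep -i -n -m 1 'except' "$1"
}

# Example:
#   first_exception PTMap2-unmod-20090430-1428-v0c5di5c.log
```

Later exceptions are often just fallout from the first one, so it makes
sense to start there.
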
> >>>>>>
> >>>>>> Also, Yue, for testing new sites, or for validating that old
> sites
> >>>>>> still
> >>>>>> work, you should create the smallest possible ptmap workflow -
> 1 job if
> >>>>>> that is possible - and verify that this works.  Then say 10
> jobs to
> >>>>>> make
> >>>>>> sure scheduling etc is sane.  Then, send in your huge jobs.
> >>>>>>
> >>>>>> With only 1 job, it's easier to spot the errors in the log file.
> >>>>>>
> >>>>>> - Mike
> >>>>>>
> >>>>>>
> >>>>>> On 4/30/09 2:34 PM, Yue, Chen - BMD wrote:
> >>>>>>> Hi Michael,
> >>>>>>>
> >>>>>>> I ran into the same messages again when I used Ranger:
> >>>>>>>
> >>>>>>> Progress:  Selecting site:146  Stage in:25  Submitting:15 
> >>>>>>> Submitted:821
> >>>>>>> Failed but can retry:16
> >>>>>>> Failed to transfer wrapper log from
> >>>>>>> PTMap2-unmod-20090430-1428-v0c5di5c/info/q on ranger
> >>>>>>> Failed to transfer wrapper log from
> >>>>>>> PTMap2-unmod-20090430-1428-v0c5di5c/info/l on ranger
> >>>>>>> Progress:  Selecting site:146  Stage in:3  Submitting:1
> Submitted:857
> >>>>>>> Failed but can retry:16
> >>>>>>> Failed to transfer wrapper log from
> >>>>>>> PTMap2-unmod-20090430-1428-v0c5di5c/info/v on ranger
> >>>>>>> Failed to transfer wrapper log from
> >>>>>>> PTMap2-unmod-20090430-1428-v0c5di5c/info/b on ranger
> >>>>>>> Failed to transfer wrapper log from
> >>>>>>> PTMap2-unmod-20090430-1428-v0c5di5c/info/0 on ranger
> >>>>>>> Failed to transfer wrapper log from
> >>>>>>> PTMap2-unmod-20090430-1428-v0c5di5c/info/a on ranger
> >>>>>>> Failed to transfer wrapper log from
> >>>>>>> PTMap2-unmod-20090430-1428-v0c5di5c/info/8 on ranger
> >>>>>>> Failed to transfer wrapper log from
> >>>>>>> PTMap2-unmod-20090430-1428-v0c5di5c/info/7 on ranger
> >>>>>>> Failed to transfer wrapper log from
> >>>>>>> PTMap2-unmod-20090430-1428-v0c5di5c/info/4 on ranger
> >>>>>>> Failed to transfer wrapper log from
> >>>>>>> PTMap2-unmod-20090430-1428-v0c5di5c/info/a on ranger
> >>>>>>> Failed to transfer wrapper log from
> >>>>>>> PTMap2-unmod-20090430-1428-v0c5di5c/info/4 on ranger
> >>>>>>> Failed to transfer wrapper log from
> >>>>>>> PTMap2-unmod-20090430-1428-v0c5di5c/info/8 on ranger
> >>>>>>> Failed to transfer wrapper log from
> >>>>>>> PTMap2-unmod-20090430-1428-v0c5di5c/info/7 on ranger
> >>>>>>> Failed to transfer wrapper log from
> >>>>>>> PTMap2-unmod-20090430-1428-v0c5di5c/info/x on ranger
> >>>>>>> Failed to transfer wrapper log from
> >>>>>>> PTMap2-unmod-20090430-1428-v0c5di5c/info/3 on ranger
> >>>>>>> Failed to transfer wrapper log from
> >>>>>>> PTMap2-unmod-20090430-1428-v0c5di5c/info/q on ranger
> >>>>>>> The log for the search is at :
> >>>>>>> /home/yuechen/PTMap2/PTMap2-unmod-20090430-1428-v0c5di5c.log
> >>>>>>>
> >>>>>>> The sites.xml I have is:
> >>>>>>>
> >>>>>>>  <pool handle="ranger">
> >>>>>>>      <execution provider="coaster"
> >>>>>>>                 url="gatekeeper.ranger.tacc.teragrid.org"
> >>>>>>>                 jobManager="gt2:gt2:SGE"/>
> >>>>>>>      <gridftp
> url="gsiftp://gridftp.ranger.tacc.teragrid.org:2811/" />
> >>>>>>>      <profile namespace="env"
> >>>>>>>
> key="SWIFT_JOBDIR_PATH">/tmp/yuechen/jobdir</profile>
> >>>>>>>      <profile namespace="globus"
> key="project">TG-CCR080022N</profile>
> >>>>>>>      <profile namespace="globus"
> key="coastersPerNode">16</profile>
> >>>>>>>      <profile namespace="globus"
> key="queue">development</profile>
> >>>>>>>      <profile namespace="globus"
> >>>>>>>
> key="coasterWorkerMaxwalltime">00:40:00</profile>
> >>>>>>>      <profile namespace="globus"
> key="maxwalltime">31</profile>
> >>>>>>>      <profile namespace="karajan"
> key="initialScore">50</profile>
> >>>>>>>      <profile namespace="karajan"
> key="jobThrottle">10</profile>
> >>>>>>>
> <workdirectory>/work/01164/yuechen/swiftwork</workdirectory>
> >>>>>>>  </pool>
> >>>>>>> The tc.data I have is:
> >>>>>>>
> >>>>>>> ranger          PTMap2        
> >>>>>>> /share/home/01164/yuechen/PTMap2/PTMap2         INSTALLED     
> >>>>>>> INTEL32::LINUX  globus::maxwalltime=50
> >>>>>>>
> >>>>>>> I'm using swift 0.9 rc2
> >>>>>>>
> >>>>>>> Thank you very much for help!
> >>>>>>>
> >>>>>>> Chen, Yue
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> ------------------------------------------------------------------------
> >>>>>>>
> >>>>>>> *From:* Michael Wilde [mailto:wilde at mcs.anl.gov]
> >>>>>>> *Sent:* Thu 4/30/2009 2:05 PM
> >>>>>>> *To:* Yue, Chen - BMD
> >>>>>>> *Subject:* Re: [Swift-user] Execution error
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On 4/30/09 1:51 PM, Yue, Chen - BMD wrote:
> >>>>>>>  > Hi Michael,
> >>>>>>>  >
> >>>>>>>  > When I tried to activate my account, I encountered the
> following
> >>>>>> error:
> >>>>>>>  >
> >>>>>>>  > "Sorry, this account is in an invalid state. You may not
> activate
> >>>>>> your
> >>>>>>>  > at this time."
> >>>>>>>  >
> >>>>>>>  > I used the username and password from TG-CDA070002T. Should
> I use a
> >>>>>>>  > different password?
> >>>>>>>
> >>>>>>> If you can already login to Ranger, then you are all set - you
> must
> >>>>>>> have
> >>>>>>> done this previously.
> >>>>>>>
> >>>>>>> I thought you had *not*, because when I looked up your login
> on ranger
> >>>>>>> ("finger yuechen") it said "never logged in". But seems like
> that info
> >>>>>>> is incorrect.
> >>>>>>>
> >>>>>>> If you have ptmap compiled, seems like you are almost all set.
> >>>>>>>
> >>>>>>> Let me know if it works.
> >>>>>>>
> >>>>>>> - Mike
> >>>>>>>
> >>>>>>>  > Thanks!
> >>>>>>>  >
> >>>>>>>  > Chen, Yue
> >>>>>>>  >
> >>>>>>>  >
> >>>>>>>  >
> >>>>>>
> ------------------------------------------------------------------------
> >>>>>>
> >>>>>>>  > *From:* Michael Wilde [mailto:wilde at mcs.anl.gov]
> >>>>>>>  > *Sent:* Thu 4/30/2009 1:07 PM
> >>>>>>>  > *To:* Yue, Chen - BMD
> >>>>>>>  > *Cc:* swift user
> >>>>>>>  > *Subject:* Re: [Swift-user] Execution error
> >>>>>>>  >
> >>>>>>>  > Yue, use this XML pool element to access ranger:
> >>>>>>>  >
> >>>>>>>  >   <pool handle="ranger">
> >>>>>>>  >      <execution provider="coaster"
> >>>>>>>  >                 url="gatekeeper.ranger.tacc.teragrid.org"
> >>>>>>>  >                 jobManager="gt2:gt2:SGE"/>
> >>>>>>>  >      <gridftp
> >>>>>> url="gsiftp://gridftp.ranger.tacc.teragrid.org:2811/" />
> >>>>>>>  >      <profile namespace="env"
> >>>>>>>  >
> key="SWIFT_JOBDIR_PATH">/tmp/yuechen/jobdir</profile>
> >>>>>>>  >      <profile namespace="globus"
> >>>>>> key="project">TG-CCR080022N</profile>
> >>>>>>>  >      <profile namespace="globus"
> key="coastersPerNode">16</profile>
> >>>>>>>  >      <profile namespace="globus"
> key="queue">development</profile>
> >>>>>>>  >      <profile namespace="globus"
> >>>>>>>  >
> key="coasterWorkerMaxwalltime">00:40:00</profile>
> >>>>>>>  >      <profile namespace="globus"
> key="maxwalltime">31</profile>
> >>>>>>>  >      <profile namespace="karajan"
> key="initialScore">50</profile>
> >>>>>>>  >      <profile namespace="karajan"
> key="jobThrottle">10</profile>
> >>>>>>>  >
> <workdirectory>/work/00306/tg455797/swiftwork</workdirectory>
> >>>>>>>  >    </pool>
> >>>>>>>  >
> >>>>>>>  >
> >>>>>>>  > You will need to also do these steps:
> >>>>>>>  >
> >>>>>>>  > Go to this web page to enable your Ranger account:
> >>>>>>>  >
> >>>>>>>  >
> https://tas.tacc.utexas.edu/TASMigration/AccountActivation.aspx
> >>>>>>>  >
> >>>>>>>  > Then login to Ranger via the TeraGrid portal and put your
> ssh
> >>>>>>> keys in
> >>>>>>>  > place (assuming you use ssh keys, which you should)
> >>>>>>>  >
> >>>>>>>  > While on Ranger, do this:
> >>>>>>>  >
> >>>>>>>  > echo $WORK
> >>>>>>>  > mkdir $WORK/swiftwork
> >>>>>>>  >
> >>>>>>>  > and put the full path of your $WORK/swiftwork directory in
> the
> >>>>>>>  > <workdirectory> element above. (My login is tg455etc, yours
> is
> >>>>>> yuechen)
> >>>>>>>  >
> >>>>>>>  > Then scp your code to Ranger and compile it.
> >>>>>>>  >
> >>>>>>>  > Then create a tc.data entry for your ptmap app
> >>>>>>>  >
> >>>>>>>  > Next, set your time values in the sites.xml entry above to
> suitable
> >>>>>>>  > values for Ranger. You'll need to measure times, but I
> think you
> >>>>>>> will
> >>>>>>>  > find Ranger about twice as fast as Mercury for CPU-bound
> jobs.
> >>>>>>>  >
> >>>>>>>  > The values above were set for one app job per coaster. I
> think
> >>>>>> you can
> >>>>>>>  > probably do more.
> >>>>>>>  >
> >>>>>>>  > If you estimate a run time of 5 minutes, use:
> >>>>>>>  >
> >>>>>>>  >      <profile namespace="globus"
> >>>>>>>  >
> key="coasterWorkerMaxwalltime">00:30:00</profile>
> >>>>>>>  >      <profile namespace="globus"
> key="maxwalltime">5</profile>
> >>>>>>>  >
> >>>>>>>  > Other people on the list - please sanity check what I
> suggest here.
> >>>>>>>  >
> >>>>>>>  > - Mike
> >>>>>>>  >
> >>>>>>>  >
> >>>>>>>  > On 4/30/09 12:40 PM, Michael Wilde wrote:
> >>>>>>>  >  > I just checked - TG-CDA070002T has indeed expired.
> >>>>>>>  >  >
> >>>>>>>  >  > The best for now is to move to use (only) Ranger, under
> this
> >>>>>> account:
> >>>>>>>  >  > TG-CCR080022N
> >>>>>>>  >  >
> >>>>>>>  >  > I will locate and send you a sites.xml entry in a
> moment.
> >>>>>>>  >  >
> >>>>>>>  >  > You need to go to a web page to activate your Ranger
> login.
> >>>>>>>  >  >
> >>>>>>>  >  > Best to contact me in IM and we can work this out.
> >>>>>>>  >  >
> >>>>>>>  >  > - Mike
> >>>>>>>  >  >
> >>>>>>>  >  >
> >>>>>>>  >  >
> >>>>>>>  >  > On 4/30/09 12:23 PM, Michael Wilde wrote:
> >>>>>>>  >  >> Also, what account are you running under? We may need
> to change
> >>>>>>> you to
> >>>>>>>  >  >> a new account - as the OSG Training account expires
> today.
> >>>>>>>  >  >> If that happened at Noon, it *might* be the problem.
> >>>>>>>  >  >>
> >>>>>>>  >  >> - Mike
> >>>>>>>  >  >>
> >>>>>>>  >  >>
> >>>>>>>  >  >> On 4/30/09 12:08 PM, Yue, Chen - BMD wrote:
> >>>>>>>  >  >>> Hi,
> >>>>>>>  >  >>>
> >>>>>>>  >  >>> I came back to re-run my application on NCSA Mercury
> which was
> >>>>>>> tested
> >>>>>>>  >  >>> successfully last week after I just set up coasters
> with
> >>>>>> swift 0.9,
> >>>>>>>  >  >>> but I got many messages like the following:
> >>>>>>>  >  >>>
> >>>>>>>  >  >>> Progress:  Stage in:219  Submitting:803  Submitted:1
> >>>>>>>  >  >>> Progress:  Stage in:129  Submitting:703  Submitted:190
> Failed
> >>>>>>> but can
> >>>>>>>  >  >>> retry:1
> >>>>>>>  >  >>> Progress:  Stage in:38  Submitting:425  Submitted:556
> Failed
> >>>>>> but can
> >>>>>>>  >  >>> retry:4
> >>>>>>>  >  >>> Failed to transfer wrapper log from
> >>>>>>>  >  >>> PTMap2-unmod-20090430-1203-r19dxq10/info/h on
> NCSA_MERCURY
> >>>>>>>  >  >>> Failed to transfer wrapper log from
> >>>>>>>  >  >>> PTMap2-unmod-20090430-1203-r19dxq10/info/j on
> NCSA_MERCURY
> >>>>>>>  >  >>> Failed to transfer wrapper log from
> >>>>>>>  >  >>> PTMap2-unmod-20090430-1203-r19dxq10/info/p on
> NCSA_MERCURY
> >>>>>>>  >  >>> Failed to transfer wrapper log from
> >>>>>>>  >  >>> PTMap2-unmod-20090430-1203-r19dxq10/info/1 on
> NCSA_MERCURY
> >>>>>>>  >  >>> Failed to transfer wrapper log from
> >>>>>>>  >  >>> PTMap2-unmod-20090430-1203-r19dxq10/info/b on
> NCSA_MERCURY
> >>>>>>>  >  >>> Failed to transfer wrapper log from
> >>>>>>>  >  >>> PTMap2-unmod-20090430-1203-r19dxq10/info/c on
> NCSA_MERCURY
> >>>>>>>  >  >>> Progress:  Stage in:1  Submitted:1013  Active:1 Failed
> but can
> >>>>>>> retry:8
> >>>>>>>  >  >>> Progress:  Submitted:1011  Active:1 Failed but can
> retry:11
> >>>>>>>  >  >>> The log file for the successful run last week is:
> >>>>>>>  >
> >>> /home/yuechen/PTMap2/PTMap2-unmod-20090422-1216-4s3037gf.log
> >>>>>>>  >  >>>
> >>>>>>>  >  >>> The log file for the failed run is :
> >>>>>>>  >
> >>> /home/yuechen/PTMap2/PTMap2-unmod-20090430-1151-rf2uuhb7.log
> >>>>>>>  >  >>>
> >>>>>>>  >  >>> I don't think I did anything different, so I don't
> know why
> >>>>>>> this
> >>>>>>> time
> >>>>>>>  >  >>> they failed. The sites.xml for Mercury is:
> >>>>>>>  >  >>>
> >>>>>>>  >  >>>  <pool handle="NCSA_MERCURY">
> >>>>>>>  >  >>>     <gridftp
> url="gsiftp://gridftp-hg.ncsa.teragrid.org"/>
> >>>>>>>  >  >>>     <execution provider="coaster"
> >>>>>> url="grid-hg.ncsa.teragrid.org"
> >>>>>>>  >  >>> jobManager="gt2:PBS"/>
> >>>>>>>  >  >>>    
> >>>>>> <workdirectory>/gpfs_scratch1/yuechen/swiftwork</workdirectory>
> >>>>>>>  >  >>>     <profile namespace="globus"
> key="queue">debug</profile>
> >>>>>>>  >  >>>  </pool>
> >>>>>>>  >  >>>
> >>>>>>>  >  >>> Thank you for help!
> >>>>>>>  >  >>>
> >>>>>>>  >  >>> Chen, Yue
> >>>>>>>  >  >>>
> >>>>>>>  >  >>>
> >>>>>>>  >  >>>
> >>>>>>>  >  >>>
> >>>>>>>  >  >>>
> >>>>>>>  >  >>>
> >>>>>>>  >  >>>
> >>>>>>>  >  >>>
> >>>>>>>  >  >>>
> >>>>>>>  >  >>>
> >>>>>>>  >  >>>
> >>>>>>>  >  >>> This email is intended only for the use of the
> individual or
> >>>>>> entity
> >>>>>>>  >  >>> to which it is addressed and may contain information
> that is
> >>>>>>>  >  >>> privileged and confidential. If the reader of this
> email
> >>>>>> message is
> >>>>>>>  >  >>> not the intended recipient, you are hereby notified
> that any
> >>>>>>>  >  >>> dissemination, distribution, or copying of this
> >>>>>>> communication is
> >>>>>>>  >  >>> prohibited. If you have received this email in error,
> please
> >>>>>> notify
> >>>>>>>  >  >>> the sender and destroy/delete all copies of the
> transmittal.
> >>>>>>> Thank you.
> >>>>>>>  >  >>>
> >>>>>>>  >  >>>
> >>>>>>>  >  >>>
> >>>>>>>  >
> >>>>>>
> ------------------------------------------------------------------------
> >>>>>>
> >>>>>>>  >  >>>
> >>>>>>>  >  >>> _______________________________________________
> >>>>>>>  >  >>> Swift-user mailing list
> >>>>>>>  >  >>> Swift-user at ci.uchicago.edu
> >>>>>>>  >  >>>
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
> >>>>>>>  >
> >>>>>>>  >
> >>>>>>>  >
> >>>>>>>  >
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>> 
> >>>>>>
> >>>>>>
> >>>>>>
> ------------------------------------------------------------------------
> >>>>>>
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> Swift-devel mailing list
> >>>>>> Swift-devel at ci.uchicago.edu
> >>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> >>>>>>  
> >
> 
> 
> 
>  
> 
> 



