[Swift-devel] RE: [Swift-user] Execution error

Yue, Chen - BMD yuechen at bsd.uchicago.edu
Thu Apr 30 17:32:56 CDT 2009


Hi Michael,
 
I already had +osg-client-1.0.0-r1 in my .soft file, but I changed it to +osg-client and tried again. Ranger gave me the same error message. In the meantime, I tested one job on both Abe and Lonestar, and both gave me a qsub error. The output from the Lonestar run is below:
 
////////////////////////////////////
[yuechen at communicado PTMap2]$ swift PTMap2-unmod.swift -sites.file sites.xml -tc.file tc.data
Swift 0.9rc2 swift-r2860 cog-r2388
RunID: 20090430-1722-oncfdolb
Progress:
Progress:  Stage in:1
Progress:  Stage in:1
Progress:  Stage in:1
Progress:  Submitting:1
Progress:  Submitted:1
Failed to transfer wrapper log from PTMap2-unmod-20090430-1722-oncfdolb/info/3 on TACC_LoneStar
Progress:  Active:1
Failed to transfer wrapper log from PTMap2-unmod-20090430-1722-oncfdolb/info/5 on TACC_LoneStar
Progress:  Stage in:1
Progress:  Active:1
Failed to transfer wrapper log from PTMap2-unmod-20090430-1722-oncfdolb/info/7 on TACC_LoneStar
Progress:  Failed:1
Execution failed:
        Exception in PTMap2:
Arguments: [e04.mzXML, ./seqs-ecolik12/fasta02, inputs-unmod.txt, parameters.txt]
Host: TACC_LoneStar
Directory: PTMap2-unmod-20090430-1722-oncfdolb/jobs/7/PTMap2-7uagp5aj
stderr.txt:
stdout.txt:
----
Caused by:
        Cannot submit job: Could not submit job (qsub reported an exit code of -1). no error output
org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Cannot submit job: Could not submit job (qsub reported an exit code of -1). no error output
        at org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:63)
        at org.globus.cog.abstraction.impl.common.AbstractTaskHandler.submit(AbstractTaskHandler.java:46)
        at org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.submit(ExecutionTaskHandler.java:43)
        at org.globus.cog.abstraction.coaster.service.job.manager.WorkerManager.startWorker(WorkerManager.java:221)
        at org.globus.cog.abstraction.coaster.service.job.manager.WorkerManager.run(WorkerManager.java:145)
Caused by: org.globus.cog.abstraction.impl.scheduler.common.ProcessException: Could not submit job (qsub reported an exit code of -1). no error output
        at org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor.start(AbstractExecutor.java:94)
        at org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:53)
        ... 4 more
Cleaning up...
Shutting down service at https://129.114.50.32:34704
Got channel MetaChannel: 2013263 -> GSSSChannel-null(1)
- Done
/////////////////////////////////////////
 
My sites.xml is at /home/yuechen/PTMap2/sites.xml. I'm wondering whether this is still related to my setup. Thanks!
 
Chen, Yue
 
 

________________________________

From: Michael Wilde [mailto:wilde at mcs.anl.gov]
Sent: Thu 4/30/2009 5:23 PM
To: Mihael Hategan
Cc: swift-devel; Yue, Chen - BMD
Subject: Re: [Swift-devel] RE: [Swift-user] Execution error





On 4/30/09 5:13 PM, Mihael Hategan wrote:
 >> GRAM Job submission failed because the job manager failed to open stderr
 >> (error code 74)
 >
 > That seems like an IP address problem. Make sure you set GLOBUS_HOSTNAME
 > properly.

OK, I will try that. But in the test below, I caused the error by
unsetting X509_CERT_DIR and fixed the error by resetting it - no other
changes.

I *think* that as recently as a few weeks ago globus-job-run to ranger
worked with just @globus in my .soft file.

Adding +osg-client seemed to make it work by setting X509_CERT_DIR.

So as far as I can tell, at least at the level of globus-job-run, this
seems to be related to certs.

Given what I'm seeing, do you still think GLOBUS_HOSTNAME is a factor?

- Mike
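
A note on the transcript quoted below: the plain assignment
X509_CERT_DIR=... still gave error 74, and only the export form worked.
That is consistent with ordinary shell behaviour, where an unexported
variable is not passed on to child processes such as globus-job-run.
Here is a minimal Python sketch of that behaviour, assuming a POSIX sh is
on the PATH. child_value is a hypothetical helper made up for this note;
it only echoes the variable and does not contact any grid service.

```python
import os
import subprocess

# Path copied from the transcript; it is only used as a string here.
CERT_DIR = "/soft/osg-client-1.0.0-r1/globus/TRUSTED_CA"


def child_value(setup: str) -> str:
    """Run `setup` in a fresh sh, then report what a child process sees."""
    # Start without X509_CERT_DIR, like the `unset` in the transcript.
    env = {k: v for k, v in os.environ.items() if k != "X509_CERT_DIR"}
    # The inner sh stands in for any child program (e.g. globus-job-run).
    script = setup + "; sh -c 'printf %s \"$X509_CERT_DIR\"'"
    done = subprocess.run(["sh", "-c", script], env=env,
                          capture_output=True, text=True, check=True)
    return done.stdout


# Plain assignment: the value stays local to the shell that set it.
plain = child_value("X509_CERT_DIR=" + CERT_DIR)
# export: the value is placed in the environment that children inherit.
exported = child_value("export X509_CERT_DIR=" + CERT_DIR)

print("plain assignment ->", repr(plain))
print("export           ->", repr(exported))
```

If the same applies to Swift, the variable would need to be exported in
the shell that launches swift; whether Swift honors it is the open
question asked below.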


> On Thu, 2009-04-30 at 17:01 -0500, Michael Wilde wrote:
>> A bit more info on this: it *seems* like a cert issue.
>>
>> I last accessed Ranger via globus-job-run perhaps 2 weeks ago, no problem.
>>
>> Yesterday, while debugging with Glen, globus-job-run was giving me GRAM
>> err 74 (and GRAM err 12 to all other sites).
>>
>> So I added +osg-client to my .soft file, and then globus-job-run worked.
>>
>> But I noticed that my globus-job-run was still coming from the GT4 dir,
>> not from an OSG dir.
>>
>> Just now I traced this back to X509_CERT_DIR:
>>
>> <works here>  then I did:
>>
>> com$ unset X509_CERT_DIR
>> com$ globus-job-run gatekeeper.ranger.tacc.teragrid.org /usr/bin/id
>> GRAM Job submission failed because the job manager failed to open stderr
>> (error code 74)
>
> That seems like an IP address problem. Make sure you set GLOBUS_HOSTNAME
> properly.
>
>> com$
>> com$
>> com$ X509_CERT_DIR=/soft/osg-client-1.0.0-r1/globus/TRUSTED_CA
>> com$ globus-job-run gatekeeper.ranger.tacc.teragrid.org /usr/bin/id
>> GRAM Job submission failed because the job manager failed to open stderr
>> (error code 74)
>> com$ export X509_CERT_DIR=/soft/osg-client-1.0.0-r1/globus/TRUSTED_CA
>> com$ globus-job-run gatekeeper.ranger.tacc.teragrid.org /usr/bin/id
>> uid=455797(tg455797) gid=80243(G-80243)
>> groups=80243(G-80243),81031(G-81031),81411(G-81411),81611(G-81611),81613(G-81613),81621(G-81621),81747(G-81747),81792(G-81792),800744(G-800744),800745(G-800745),800889(G-800889),800981(G-800981),800983(G-800983),801271(G-801271),801364(G-801364)
>> com$
>>
>> Mihael, does swift honor X509_CERT_DIR? If so, Glen, Yue, that is
>> something to try.
>>
>> You may need to put +osg-client in your .soft file, as below, and re-login:
>>
>> @python-2.5
>> +java-sun
>>
>> +apache-ant
>> +gx-map
>> +condor
>> +gx-map
>> @globus-4
>> @default
>> +R
>> +torque
>> +maui
>> +matlab-7.7
>> +osg-client
>>
>> - Mike
>>
>>
>>
>>
>>
>> On 4/30/09 4:39 PM, Michael Wilde wrote:
>>> And we should also drill back down to why (at least yesterday) the GT4
>>> softev package failed, but the OSG client worked, for globus-job-run.
>>>
>>> I guess it's possible there is a host or CA cert issue here.
>>>
>>> - Mike
>>>
>>>
>>> On 4/30/09 4:31 PM, Mihael Hategan wrote:
>>>> Can you guys try to run first.swift on ranger with the settings you have
>>>> (you'll need to add "echo" to tc.data)?
>>>>
>>>>
>>>> On Thu, 2009-04-30 at 16:13 -0500, Glen Hocky wrote:
>>>>> I have the identical response on ranger. It started yesterday
>>>>> evening. Possibly a problem that the TACC folks need to fix?
>>>>>
>>>>> Glen
>>>>>
>>>>> Yue, Chen - BMD wrote:
>>>>>> Hi Michael,
>>>>>> 
>>>>>> Thank you for the advice. I tested ranger with 1 job and the new
>>>>>> maxwalltime settings. It shows the following error message.
>>>>>> I don't know if there is another problem with my setup. Thank you!
>>>>>> 
>>>>>> /////////////////////////////////////////////////
>>>>>> [yuechen at communicado PTMap2]$ swift PTMap2-unmod.swift -sites.file
>>>>>> sites.xml -tc.file tc.data
>>>>>> Swift 0.9rc2 swift-r2860 cog-r2388
>>>>>> RunID: 20090430-1559-2vi6x811
>>>>>> Progress:
>>>>>> Progress:  Stage in:1
>>>>>> Progress:  Submitting:1
>>>>>> Progress:  Submitting:1
>>>>>> Progress:  Submitted:1
>>>>>> Progress:  Active:1
>>>>>> Failed to transfer wrapper log from
>>>>>> PTMap2-unmod-20090430-1559-2vi6x811/info/i on ranger
>>>>>> Progress:  Active:1
>>>>>> Failed to transfer wrapper log from
>>>>>> PTMap2-unmod-20090430-1559-2vi6x811/info/k on ranger
>>>>>> Progress:  Stage in:1
>>>>>> Progress:  Active:1
>>>>>> Failed to transfer wrapper log from
>>>>>> PTMap2-unmod-20090430-1559-2vi6x811/info/m on ranger
>>>>>> Progress:  Failed:1
>>>>>> Execution failed:
>>>>>>         Exception in PTMap2:
>>>>>> Arguments: [e04.mzXML, ./seqs-ecolik12/fasta01, inputs-unmod.txt,
>>>>>> parameters.txt]
>>>>>> Host: ranger
>>>>>> Directory: PTMap2-unmod-20090430-1559-2vi6x811/jobs/m/PTMap2-mbe6m5aj
>>>>>> stderr.txt:
>>>>>> stdout.txt:
>>>>>> ----
>>>>>> Caused by:
>>>>>>         Failed to start worker:
>>>>>> null
>>>>>> null
>>>>>> org.globus.gram.GramException: The job manager detected an invalid
>>>>>> script response
>>>>>>         at
>>>>>> org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.statusChanged(JobSubmissionTaskHandler.java:530)
>>>>>>
>>>>>>         at org.globus.gram.GramJob.setStatus(GramJob.java:184)
>>>>>>         at
>>>>>> org.globus.gram.GramCallbackHandler.run(CallbackHandler.java:176)
>>>>>>         at java.lang.Thread.run(Thread.java:619)
>>>>>> Cleaning up...
>>>>>> Shutting down service at https://129.114.50.163:45562
>>>>>> Got channel MetaChannel: 20903429 -> GSSSChannel-null(1)
>>>>>> - Done
>>>>>> [yuechen at communicado PTMap2]$
>>>>>> ///////////////////////////////////////////////////////////
>>>>>> 
>>>>>> Chen, Yue
>>>>>> 
>>>>>>
>>>>>> *From:* Michael Wilde [mailto:wilde at mcs.anl.gov]
>>>>>> *Sent:* Thu 4/30/2009 3:02 PM
>>>>>> *To:* Yue, Chen - BMD; swift-devel
>>>>>> *Subject:* Re: [Swift-user] Execution error
>>>>>>
>>>>>> Back on list here (I only went off-list to discuss accounts, etc)
>>>>>>
>>>>>> The problem in the run below is this:
>>>>>>
>>>>>> 2009-04-30 14:29:41,265-0500 DEBUG vdl:execute2 APPLICATION_EXCEPTION
>>>>>> jobid=PTMap2-abeii5aj - Application exception: Job cannot be run with
>>>>>> the given max walltime worker constraint (task: 3000, \
>>>>>> maxwalltime: 2400s)
>>>>>>
>>>>>> You have this on the ptmap app in your tc.data:
>>>>>>
>>>>>> globus::maxwalltime=50
>>>>>>
>>>>>> But you only gave coasters 40 mins per coaster worker. So it's
>>>>>> complaining that it can't run a 50 minute job in a 40 minute (max)
>>>>>> coaster worker. ;)
>>>>>>
>>>>>> I mentioned in a prior mail that you need to set the two time vals in
>>>>>> your sites.xml entry; that's what you need to do next.
>>>>>>
>>>>>> Change the coaster time in your sites.xml to:
>>>>>>      key="coasterWorkerMaxwalltime">00:51:00</profile>
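
The numbers in that exception can be checked by hand: 50 minutes is
3000 seconds, and 00:40:00 is 2400 seconds. A minimal Python sketch of
the comparison follows, assuming the tc.data maxwalltime is in minutes
and coasterWorkerMaxwalltime is HH:MM:SS (which is what the 3000 vs
2400s figures suggest). The function names are made up for illustration,
and the exact rule Swift applies internally may differ.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string such as '00:40:00' to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def fits_in_worker(app_maxwalltime_min: int, worker_maxwalltime: str) -> bool:
    """True if a job of the given length (minutes) fits in one worker lifetime.

    Assumption: a job fits when its walltime does not exceed the worker's.
    """
    return app_maxwalltime_min * 60 <= hms_to_seconds(worker_maxwalltime)


# Values from this thread: the job asks for 50 minutes.
print(fits_in_worker(50, "00:40:00"))  # original sites.xml setting
print(fits_in_worker(50, "00:51:00"))  # suggested setting
```

With the original 00:40:00 the check fails; with 00:51:00 it passes.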
>>>>>>
>>>>>> If you have more info on the variability of your ptmap run times, send
>>>>>> that to the list, and we can discuss how to handle.
>>>>>>
>>>>>>
>>>>>> (NOTE: doing grep -i of the log for "except" or scanning for "except"
>>>>>> with an editor will often locate the first "exception" that your job
>>>>>> encountered. That's how I found the error above.)
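
The same scan can be sketched in Python for anyone who prefers a script
to grep -i. first_exception_line is a hypothetical helper, not part of
Swift; it returns the first line containing "except" in any letter case,
together with its line number. The two sample lines are taken from this
thread.

```python
import re
from typing import Iterable, Optional, Tuple


def first_exception_line(lines: Iterable[str]) -> Optional[Tuple[int, str]]:
    """Return (line_number, text) of the first line mentioning 'except'."""
    pattern = re.compile("except", re.IGNORECASE)  # case-insensitive, like grep -i
    for number, line in enumerate(lines, start=1):
        if pattern.search(line):
            return number, line.rstrip("\n")
    return None  # no exception text anywhere in the input


sample = [
    "Progress:  Stage in:1",
    "2009-04-30 14:29:41,265-0500 DEBUG vdl:execute2 APPLICATION_EXCEPTION "
    "jobid=PTMap2-abeii5aj - Application exception: Job cannot be run with "
    "the given max walltime worker constraint",
]
print(first_exception_line(sample))
```

For a real log, pass an open file object instead of the sample list; the
file is then read line by line and scanning stops at the first match.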
>>>>>>
>>>>>> Also, Yue, for testing new sites, or for validating that old sites still
>>>>>> work, you should create the smallest possible ptmap workflow - 1 job if
>>>>>> that is possible - and verify that this works.  Then say 10 jobs to make
>>>>>> sure scheduling etc is sane.  Then, send in your huge jobs.
>>>>>>
>>>>>> With only 1 job, it's easier to spot the errors in the log file.
>>>>>>
>>>>>> - Mike
>>>>>>
>>>>>>
>>>>>> On 4/30/09 2:34 PM, Yue, Chen - BMD wrote:
>>>>>>> Hi Michael,
>>>>>>>
>>>>>>> I ran into the same messages again when I used Ranger:
>>>>>>>
>>>>>>> Progress:  Selecting site:146  Stage in:25  Submitting:15 
>>>>>>> Submitted:821
>>>>>>> Failed but can retry:16
>>>>>>> Failed to transfer wrapper log from
>>>>>>> PTMap2-unmod-20090430-1428-v0c5di5c/info/q on ranger
>>>>>>> Failed to transfer wrapper log from
>>>>>>> PTMap2-unmod-20090430-1428-v0c5di5c/info/l on ranger
>>>>>>> Progress:  Selecting site:146  Stage in:3  Submitting:1  Submitted:857
>>>>>>> Failed but can retry:16
>>>>>>> Failed to transfer wrapper log from
>>>>>>> PTMap2-unmod-20090430-1428-v0c5di5c/info/v on ranger
>>>>>>> Failed to transfer wrapper log from
>>>>>>> PTMap2-unmod-20090430-1428-v0c5di5c/info/b on ranger
>>>>>>> Failed to transfer wrapper log from
>>>>>>> PTMap2-unmod-20090430-1428-v0c5di5c/info/0 on ranger
>>>>>>> Failed to transfer wrapper log from
>>>>>>> PTMap2-unmod-20090430-1428-v0c5di5c/info/a on ranger
>>>>>>> Failed to transfer wrapper log from
>>>>>>> PTMap2-unmod-20090430-1428-v0c5di5c/info/8 on ranger
>>>>>>> Failed to transfer wrapper log from
>>>>>>> PTMap2-unmod-20090430-1428-v0c5di5c/info/7 on ranger
>>>>>>> Failed to transfer wrapper log from
>>>>>>> PTMap2-unmod-20090430-1428-v0c5di5c/info/4 on ranger
>>>>>>> Failed to transfer wrapper log from
>>>>>>> PTMap2-unmod-20090430-1428-v0c5di5c/info/a on ranger
>>>>>>> Failed to transfer wrapper log from
>>>>>>> PTMap2-unmod-20090430-1428-v0c5di5c/info/4 on ranger
>>>>>>> Failed to transfer wrapper log from
>>>>>>> PTMap2-unmod-20090430-1428-v0c5di5c/info/8 on ranger
>>>>>>> Failed to transfer wrapper log from
>>>>>>> PTMap2-unmod-20090430-1428-v0c5di5c/info/7 on ranger
>>>>>>> Failed to transfer wrapper log from
>>>>>>> PTMap2-unmod-20090430-1428-v0c5di5c/info/x on ranger
>>>>>>> Failed to transfer wrapper log from
>>>>>>> PTMap2-unmod-20090430-1428-v0c5di5c/info/3 on ranger
>>>>>>> Failed to transfer wrapper log from
>>>>>>> PTMap2-unmod-20090430-1428-v0c5di5c/info/q on ranger
>>>>>>> The log for the search is at :
>>>>>>> /home/yuechen/PTMap2/PTMap2-unmod-20090430-1428-v0c5di5c.log
>>>>>>>
>>>>>>> The sites.xml I have is:
>>>>>>>
>>>>>>>  <pool handle="ranger">
>>>>>>>      <execution provider="coaster"
>>>>>>>                 url="gatekeeper.ranger.tacc.teragrid.org"
>>>>>>>                 jobManager="gt2:gt2:SGE"/>
>>>>>>>      <gridftp url="gsiftp://gridftp.ranger.tacc.teragrid.org:2811/" />
>>>>>>>      <profile namespace="env"
>>>>>>>               key="SWIFT_JOBDIR_PATH">/tmp/yuechen/jobdir</profile>
>>>>>>>      <profile namespace="globus" key="project">TG-CCR080022N</profile>
>>>>>>>      <profile namespace="globus" key="coastersPerNode">16</profile>
>>>>>>>      <profile namespace="globus" key="queue">development</profile>
>>>>>>>      <profile namespace="globus"
>>>>>>>               key="coasterWorkerMaxwalltime">00:40:00</profile>
>>>>>>>      <profile namespace="globus" key="maxwalltime">31</profile>
>>>>>>>      <profile namespace="karajan" key="initialScore">50</profile>
>>>>>>>      <profile namespace="karajan" key="jobThrottle">10</profile>
>>>>>>>      <workdirectory>/work/01164/yuechen/swiftwork</workdirectory>
>>>>>>>  </pool>
>>>>>>> The tc.data I have is:
>>>>>>>
>>>>>>> ranger  PTMap2  /share/home/01164/yuechen/PTMap2/PTMap2  INSTALLED  INTEL32::LINUX  globus::maxwalltime=50
>>>>>>>
>>>>>>> I'm using swift 0.9 rc2
>>>>>>>
>>>>>>> Thank you very much for help!
>>>>>>>
>>>>>>> Chen, Yue
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ------------------------------------------------------------------------
>>>>>>>
>>>>>>> *From:* Michael Wilde [mailto:wilde at mcs.anl.gov]
>>>>>>> *Sent:* Thu 4/30/2009 2:05 PM
>>>>>>> *To:* Yue, Chen - BMD
>>>>>>> *Subject:* Re: [Swift-user] Execution error
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 4/30/09 1:51 PM, Yue, Chen - BMD wrote:
>>>>>>>  > Hi Michael,
>>>>>>>  >
>>>>>>>  > When I tried to activate my account, I encountered the following error:
>>>>>>>  >
>>>>>>>  > "Sorry, this account is in an invalid state. You may not activate your
>>>>>>>  > at this time."
>>>>>>>  >
>>>>>>>  > I used the username and password from TG-CDA070002T. Should I use a
>>>>>>>  > different password?
>>>>>>>
>>>>>>> If you can already log in to Ranger, then you are all set - you must have
>>>>>>> done this previously.
>>>>>>>
>>>>>>> I thought you had *not*, because when I looked up your login on ranger
>>>>>>> ("finger yuechen") it said "never logged in". But seems like that info
>>>>>>> is incorrect.
>>>>>>>
>>>>>>> If you have ptmap compiled, seems like you are almost all set.
>>>>>>>
>>>>>>> Let me know if it works.
>>>>>>>
>>>>>>> - Mike
>>>>>>>
>>>>>>>  > Thanks!
>>>>>>>  >
>>>>>>>  > Chen, Yue
>>>>>>>  >
>>>>>>>  >
>>>>>>>  >
>>>>>> ------------------------------------------------------------------------
>>>>>>
>>>>>>>  > *From:* Michael Wilde [mailto:wilde at mcs.anl.gov]
>>>>>>>  > *Sent:* Thu 4/30/2009 1:07 PM
>>>>>>>  > *To:* Yue, Chen - BMD
>>>>>>>  > *Cc:* swift user
>>>>>>>  > *Subject:* Re: [Swift-user] Execution error
>>>>>>>  >
>>>>>>>  > Yue, use this XML pool element to access ranger:
>>>>>>>  >
>>>>>>>  >   <pool handle="ranger">
>>>>>>>  >      <execution provider="coaster"
>>>>>>>  >                 url="gatekeeper.ranger.tacc.teragrid.org"
>>>>>>>  >                 jobManager="gt2:gt2:SGE"/>
>>>>>>>  >      <gridftp url="gsiftp://gridftp.ranger.tacc.teragrid.org:2811/" />
>>>>>>>  >      <profile namespace="env"
>>>>>>>  >               key="SWIFT_JOBDIR_PATH">/tmp/yuechen/jobdir</profile>
>>>>>>>  >      <profile namespace="globus" key="project">TG-CCR080022N</profile>
>>>>>>>  >      <profile namespace="globus" key="coastersPerNode">16</profile>
>>>>>>>  >      <profile namespace="globus" key="queue">development</profile>
>>>>>>>  >      <profile namespace="globus"
>>>>>>>  >               key="coasterWorkerMaxwalltime">00:40:00</profile>
>>>>>>>  >      <profile namespace="globus" key="maxwalltime">31</profile>
>>>>>>>  >      <profile namespace="karajan" key="initialScore">50</profile>
>>>>>>>  >      <profile namespace="karajan" key="jobThrottle">10</profile>
>>>>>>>  >      <workdirectory>/work/00306/tg455797/swiftwork</workdirectory>
>>>>>>>  >    </pool>
>>>>>>>  >
>>>>>>>  >
>>>>>>>  > You will need to also do these steps:
>>>>>>>  >
>>>>>>>  > Go to this web page to enable your Ranger account:
>>>>>>>  >
>>>>>>>  > https://tas.tacc.utexas.edu/TASMigration/AccountActivation.aspx
>>>>>>>  >
>>>>>>>  > Then log in to Ranger via the TeraGrid portal and put your ssh keys in
>>>>>>>  > place (assuming you use ssh keys, which you should).
>>>>>>>  >
>>>>>>>  > While on Ranger, do this:
>>>>>>>  >
>>>>>>>  > echo $WORK
>>>>>>>  > mkdir $WORK/swiftwork
>>>>>>>  >
>>>>>>>  > and put the full path of your $WORK/swiftwork directory in the
>>>>>>>  > <workdirectory> element above. (My login is tg455etc, yours is yuechen)
>>>>>>>  >
>>>>>>>  > Then scp your code to Ranger and compile it.
>>>>>>>  >
>>>>>>>  > Then create a tc.data entry for your ptmap app
>>>>>>>  >
>>>>>>>  > Next, set your time values in the sites.xml entry above to suitable
>>>>>>>  > values for Ranger. You'll need to measure times, but I think you will
>>>>>>>  > find Ranger about twice as fast as Mercury for CPU-bound jobs.
>>>>>>>  >
>>>>>>>  > The values above were set for one app job per coaster. I think you can
>>>>>>>  > probably do more.
>>>>>>>  >
>>>>>>>  > If you estimate a run time of 5 minutes, use:
>>>>>>>  >
>>>>>>>  >      <profile namespace="globus"
>>>>>>>  >               key="coasterWorkerMaxwalltime">00:30:00</profile>
>>>>>>>  >      <profile namespace="globus" key="maxwalltime">5</profile>
>>>>>>>  >
>>>>>>>  > Other people on the list - please sanity check what I suggest here.
>>>>>>>  >
>>>>>>>  > - Mike
>>>>>>>  >
>>>>>>>  >
>>>>>>>  > On 4/30/09 12:40 PM, Michael Wilde wrote:
>>>>>>>  >  > I just checked - TG-CDA070002T has indeed expired.
>>>>>>>  >  >
>>>>>>>  >  > The best option for now is to move to using (only) Ranger, under this
>>>>>>>  >  > account: TG-CCR080022N
>>>>>>>  >  >
>>>>>>>  >  > I will locate and send you a sites.xml entry in a moment.
>>>>>>>  >  >
>>>>>>>  >  > You need to go to a web page to activate your Ranger login.
>>>>>>>  >  >
>>>>>>>  >  > Best to contact me in IM and we can work this out.
>>>>>>>  >  >
>>>>>>>  >  > - Mike
>>>>>>>  >  >
>>>>>>>  >  >
>>>>>>>  >  >
>>>>>>>  >  > On 4/30/09 12:23 PM, Michael Wilde wrote:
>>>>>>>  >  >> Also, what account are you running under? We may need to change you to
>>>>>>>  >  >> a new account - as the OSG Training account expires today.
>>>>>>>  >  >> If that happened at noon, it *might* be the problem.
>>>>>>>  >  >>
>>>>>>>  >  >> - Mike
>>>>>>>  >  >>
>>>>>>>  >  >>
>>>>>>>  >  >> On 4/30/09 12:08 PM, Yue, Chen - BMD wrote:
>>>>>>>  >  >>> Hi,
>>>>>>>  >  >>>
>>>>>>>  >  >>> I came back to re-run my application on NCSA Mercury which was tested
>>>>>>>  >  >>> successfully last week after I just set up coasters with swift 0.9,
>>>>>>>  >  >>> but I got many messages like the following:
>>>>>>>  >  >>>
>>>>>>>  >  >>> Progress:  Stage in:219  Submitting:803  Submitted:1
>>>>>>>  >  >>> Progress:  Stage in:129  Submitting:703  Submitted:190 Failed but can retry:1
>>>>>>>  >  >>> Progress:  Stage in:38  Submitting:425  Submitted:556 Failed but can retry:4
>>>>>>>  >  >>> Failed to transfer wrapper log from
>>>>>>>  >  >>> PTMap2-unmod-20090430-1203-r19dxq10/info/h on NCSA_MERCURY
>>>>>>>  >  >>> Failed to transfer wrapper log from
>>>>>>>  >  >>> PTMap2-unmod-20090430-1203-r19dxq10/info/j on NCSA_MERCURY
>>>>>>>  >  >>> Failed to transfer wrapper log from
>>>>>>>  >  >>> PTMap2-unmod-20090430-1203-r19dxq10/info/p on NCSA_MERCURY
>>>>>>>  >  >>> Failed to transfer wrapper log from
>>>>>>>  >  >>> PTMap2-unmod-20090430-1203-r19dxq10/info/1 on NCSA_MERCURY
>>>>>>>  >  >>> Failed to transfer wrapper log from
>>>>>>>  >  >>> PTMap2-unmod-20090430-1203-r19dxq10/info/b on NCSA_MERCURY
>>>>>>>  >  >>> Failed to transfer wrapper log from
>>>>>>>  >  >>> PTMap2-unmod-20090430-1203-r19dxq10/info/c on NCSA_MERCURY
>>>>>>>  >  >>> Progress:  Stage in:1  Submitted:1013  Active:1 Failed but can retry:8
>>>>>>>  >  >>> Progress:  Submitted:1011  Active:1 Failed but can retry:11
>>>>>>>  >  >>> The log file for the successful run last week is:
>>>>>>>  >  >>> /home/yuechen/PTMap2/PTMap2-unmod-20090422-1216-4s3037gf.log
>>>>>>>  >  >>>
>>>>>>>  >  >>> The log file for the failed run is :
>>>>>>>  >  >>> /home/yuechen/PTMap2/PTMap2-unmod-20090430-1151-rf2uuhb7.log
>>>>>>>  >  >>>
>>>>>>>  >  >>> I don't think I did anything different, so I don't know why this time
>>>>>>>  >  >>> they failed. The sites.xml for Mercury is:
>>>>>>>  >  >>>
>>>>>>>  >  >>>  <pool handle="NCSA_MERCURY">
>>>>>>>  >  >>>     <gridftp url="gsiftp://gridftp-hg.ncsa.teragrid.org"/>
>>>>>>>  >  >>>     <execution provider="coaster" url="grid-hg.ncsa.teragrid.org"
>>>>>>>  >  >>>                jobManager="gt2:PBS"/>
>>>>>>>  >  >>>     <workdirectory>/gpfs_scratch1/yuechen/swiftwork</workdirectory>
>>>>>>>  >  >>>     <profile namespace="globus" key="queue">debug</profile>
>>>>>>>  >  >>>  </pool>
>>>>>>>  >  >>>
>>>>>>>  >  >>> Thank you for help!
>>>>>>>  >  >>>
>>>>>>>  >  >>> Chen, Yue
>>>>>>>  >  >>>
>>>>>>>  >  >>>
>>>>>>>  >  >>>
>>>>>>>  >  >>>
>>>>>>>  >  >>>
>>>>>>>  >  >>>
>>>>>>>  >  >>>
>>>>>>>  >  >>>
>>>>>>>  >  >>>
>>>>>>>  >  >>>
>>>>>>>  >  >>>
>>>>>>>  >  >>> This email is intended only for the use of the individual or entity
>>>>>>>  >  >>> to which it is addressed and may contain information that is
>>>>>>>  >  >>> privileged and confidential. If the reader of this email message is
>>>>>>>  >  >>> not the intended recipient, you are hereby notified that any
>>>>>>>  >  >>> dissemination, distribution, or copying of this communication is
>>>>>>>  >  >>> prohibited. If you have received this email in error, please notify
>>>>>>>  >  >>> the sender and destroy/delete all copies of the transmittal. Thank you.
>>>>>>>  >  >>>
>>>>>>>  >  >>>
>>>>>>>  >  >>>
>>>>>>>  >
>>>>>> ------------------------------------------------------------------------
>>>>>>
>>>>>>>  >  >>>
>>>>>>>  >  >>> _______________________________________________
>>>>>>>  >  >>> Swift-user mailing list
>>>>>>>  >  >>> Swift-user at ci.uchicago.edu
>>>>>>>  >  >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
>>>>>>>  >
>>>>>>>  >
>>>>>>>  >
>>>>>>>  >
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> 
>>>>>>
>>>>>>
>>>>>> ------------------------------------------------------------------------
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Swift-devel mailing list
>>>>>> Swift-devel at ci.uchicago.edu
>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>>>>>  
>




