[Swift-devel] Re: [Swift-user] Execution error

Zhao Zhang zhaozhang at uchicago.edu
Thu May 21 13:18:12 CDT 2009



Michael Wilde wrote:
>
>
> On 5/21/09 12:39 PM, Zhao Zhang wrote:
>> Hi, Mike
>>
>> I reproduced this bug on Ranger with 100 jobs. The log is at
>> /home/zzhang/scip/scip_100_fail_log
>>
>> Then I set
>> <profile namespace="globus" key="coasterMaxJobs">5</profile>
>> and the same workflow ran perfectly.
>
> OK, that's good to hear.
> I thought the limit on Ranger was 50, though.
> Can you repeat the test at coasterMaxJobs 50 (or whatever the
> documented limit is)?
Yes, the limit is 50. I am trying it now.
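
For reference, a minimal Python sketch of setting that limit in a pool entry. It assumes the pool is available as a standalone XML fragment shaped like the sites.xml entries quoted later in this thread; the helper name set_coaster_max_jobs and the sample POOL string are illustrative only and are not part of Swift.

```python
import xml.etree.ElementTree as ET


def set_coaster_max_jobs(pool_xml, max_jobs):
    """Return pool_xml with the globus:coasterMaxJobs profile set to max_jobs.

    Hypothetical helper: updates the profile if it exists, otherwise adds it.
    """
    pool = ET.fromstring(pool_xml)
    for profile in pool.findall("profile"):
        if (profile.get("namespace") == "globus"
                and profile.get("key") == "coasterMaxJobs"):
            profile.text = str(max_jobs)
            break
    else:
        # No existing coasterMaxJobs profile was found, so append one.
        profile = ET.SubElement(
            pool, "profile", {"namespace": "globus", "key": "coasterMaxJobs"}
        )
        profile.text = str(max_jobs)
    return ET.tostring(pool, encoding="unicode")


# Illustrative fragment only; a real sites.xml pool has more entries.
POOL = """<pool handle="ranger">
  <profile namespace="globus" key="queue">development</profile>
</pool>"""

if __name__ == "__main__":
    print(set_coaster_max_jobs(POOL, 50))
```

The value 50 matches the per-user submitted-job limit that Ranger reports in the error message quoted further down.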
>
>> I ran the scip workflow with 16, 64, 128, 256, 512, and 1024 jobs.
>> One thing I am concerned about is that there might be noise in the tests,
>> so the end-to-end time may not reflect the real running time
>> for the workflow.
>
> I don't understand what you mean. That the numbers below are too fast
> to be believable?
No. What I mean is that even if we expect running time to increase
linearly from 512 jobs to 1024 jobs, there is noise in the system, so we
will not necessarily see 1024 jobs take 200 seconds given that 512 jobs
took 100 seconds.
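
To make the comparison concrete, here is a small Python sketch that compares how the job count and the end-to-end time grow between consecutive runs. The timings are copied from the summary quoted further down in this message; the function name scaling_ratios is just an illustrative choice.

```python
# End-to-end times in seconds, keyed by number of jobs, copied from
# the running-time summary in this thread.
TIMINGS = {16: 173, 64: 132, 128: 166, 256: 202, 512: 221, 1024: 400}


def scaling_ratios(timings):
    """For each consecutive pair of job counts, return a tuple
    (smaller_count, larger_count, job_ratio, time_ratio)."""
    counts = sorted(timings)
    rows = []
    for small, large in zip(counts, counts[1:]):
        rows.append(
            (small, large, large / small, timings[large] / timings[small])
        )
    return rows


if __name__ == "__main__":
    for small, large, job_ratio, time_ratio in scaling_ratios(TIMINGS):
        print(f"{small:>4} -> {large:>4} jobs: "
              f"jobs x{job_ratio:.1f}, time x{time_ratio:.2f}")
```

With these numbers, going from 512 to 1024 jobs takes roughly 1.8 times as long rather than exactly twice, and going from 16 to 64 jobs is actually faster, which is the kind of noise described above.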

zhao
>
> - Mike
>
>
>> Here is a summary of the running times:
>> Jobs   Time (seconds)
>> 16     173
>> 64     132
>> 128    166
>> 256    202
>> 512    221
>> 1024   400
>>
>> The logs are at
>> /home/zzhang/scip/scip_16_log
>> /home/zzhang/scip/scip_64_log
>> /home/zzhang/scip/scip_128_log
>> /home/zzhang/scip/scip_256_log
>> /home/zzhang/scip/scip_512_log
>> /home/zzhang/scip/scip_1024_log
>>
>> zhao
>>
>> Michael Wilde wrote:
>>> Zhao, here is the latest Coaster bug fix I am aware of, to test.
>>>
>>> This was a showstopper bug for Glen: once he exceeded the job
>>> limit, it *seemed* (as I understand it) that Ranger killed all his jobs.
>>>
>>> - Mike
>>>
>>>
>>> -------- Original Message --------
>>> Subject: [Fwd: Re: [Swift-devel] RE: [Swift-user] Execution error]
>>> Date: Tue, 12 May 2009 15:38:52 -0500
>>> From: Michael Wilde <wilde at mcs.anl.gov>
>>> To: Glen Hocky <hockyg at uchicago.edu>,  Mihael Hategan 
>>> <hategan at mcs.anl.gov>
>>>
>>> Glen, the bug you were asking about yesterday was fixed on May 1 -
>>> here's the email from Mihael.
>>>
>>> Did you try this fix, or were you unaware of this particular message in
>>> the thread on that problem?
>>>
>>> - Mike
>>>
>>>
>>> -------- Original Message --------
>>> Subject: Re: [Swift-devel] RE: [Swift-user] Execution error
>>> Date: Fri, 01 May 2009 15:22:02 -0500
>>> From: Mihael Hategan <hategan at mcs.anl.gov>
>>> To: Glen Hocky <hockyg at uchicago.edu>
>>> CC: swift-devel <swift-devel at ci.uchicago.edu>,    "Yue,  Chen - BMD"
>>> <yuechen at bsd.uchicago.edu>
>>>
>>> Fix in cog 2394.
>>>
>>> Use globus:coasterMaxJobs profile.
>>>
>>> On Thu, 2009-04-30 at 18:54 -0500, Mihael Hategan wrote:
>>>> Mystery solved:
>>>>
>>>> Thu Apr 30 18:19:13 2009 JM_SCRIPT:   ERROR: job submission failed:
>>>> Thu Apr 30 18:19:13 2009 JM_SCRIPT:
>>>> ------------------------------------------------------------------------ 
>>>>
>>>>       Welcome to TACC's Ranger System, an NSF TeraGrid Resource
>>>> ------------------------------------------------------------------------ 
>>>>
>>>>
>>>>   --> Submitting 16 tasks...
>>>>   --> Submitting 16 tasks/host...
>>>>   --> Submitting exclusive job to 1 hosts...
>>>>   --> Verifying HOME file-system availability...
>>>>   --> Verifying WORK file-system availability...
>>>>   --> Verifying SCRATCH file-system availability...
>>>>   --> Ensuring absence of dubious h_vmem,h_data,s_vmem,s_data limits...
>>>>   --> Requesting valid memory configuration (mt=31.3G)...
>>>>   --> Checking ssh keys...
>>>>   --> Checking file existence and permissions for passwordless ssh...
>>>>   --> Verifying accounting...
>>>> ----------------------------------------------------------------
>>>>    ERROR: You have exceeded the max submitted job count.
>>>>    Maximum allowed is 50 jobs.
>>>>
>>>>    Please contact TACC Consulting if you believe you have
>>>>    received this message in error.
>>>> ----------------------------------------------------------------
>>>> Job aborted by esub.
>>>>
>>>> I'll add a limit for the number of jobs allowed to the current coaster
>>>> code.
>>>>
>>>>
>>>> On Thu, 2009-04-30 at 16:13 -0500, Glen Hocky wrote:
>>>> > I have the identical response on ranger. It started yesterday evening.
>>>> > Possibly a problem that the TACC folks need to fix?
>>>> >
>>>> > Glen
>>>> >
>>>> > Yue, Chen - BMD wrote:
>>>> > > Hi Michael,
>>>> > >
>>>> > > Thank you for the advice. I tested ranger with 1 job and the new
>>>> > > specifications of maxwalltime. It shows the following error message.
>>>> > > I don't know if there is another problem with my setup. Thank you!
>>>> > >
>>>> > > /////////////////////////////////////////////////
>>>> > > [yuechen at communicado PTMap2]$ swift PTMap2-unmod.swift -sites.file sites.xml -tc.file tc.data
>>>> > > Swift 0.9rc2 swift-r2860 cog-r2388
>>>> > > RunID: 20090430-1559-2vi6x811
>>>> > > Progress:
>>>> > > Progress:  Stage in:1
>>>> > > Progress:  Submitting:1
>>>> > > Progress:  Submitting:1
>>>> > > Progress:  Submitted:1
>>>> > > Progress:  Active:1
>>>> > > Failed to transfer wrapper log from
>>>> > > PTMap2-unmod-20090430-1559-2vi6x811/info/i on ranger
>>>> > > Progress:  Active:1
>>>> > > Failed to transfer wrapper log from
>>>> > > PTMap2-unmod-20090430-1559-2vi6x811/info/k on ranger
>>>> > > Progress:  Stage in:1
>>>> > > Progress:  Active:1
>>>> > > Failed to transfer wrapper log from
>>>> > > PTMap2-unmod-20090430-1559-2vi6x811/info/m on ranger
>>>> > > Progress:  Failed:1
>>>> > > Execution failed:
>>>> > >         Exception in PTMap2:
>>>> > > Arguments: [e04.mzXML, ./seqs-ecolik12/fasta01, inputs-unmod.txt, parameters.txt]
>>>> > > Host: ranger
>>>> > > Directory: PTMap2-unmod-20090430-1559-2vi6x811/jobs/m/PTMap2-mbe6m5aj
>>>> > > stderr.txt:
>>>> > > stdout.txt:
>>>> > > ----
>>>> > > Caused by:
>>>> > >         Failed to start worker:
>>>> > > null
>>>> > > null
>>>> > > org.globus.gram.GramException: The job manager detected an invalid script response
>>>> > >         at org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.statusChanged(JobSubmissionTaskHandler.java:530)
>>>> > >         at org.globus.gram.GramJob.setStatus(GramJob.java:184)
>>>> > >         at org.globus.gram.GramCallbackHandler.run(CallbackHandler.java:176)
>>>> > >         at java.lang.Thread.run(Thread.java:619)
>>>> > > Cleaning up...
>>>> > > Shutting down service at https://129.114.50.163:45562
>>>> > > Got channel MetaChannel: 20903429 -> GSSSChannel-null(1)
>>>> > > - Done
>>>> > > [yuechen at communicado PTMap2]$
>>>> > > ///////////////////////////////////////////////////////////
>>>> > > Chen, Yue
>>>> > >
>>>> > > *From:* Michael Wilde [mailto:wilde at mcs.anl.gov]
>>>> > > *Sent:* Thu 4/30/2009 3:02 PM
>>>> > > *To:* Yue, Chen - BMD; swift-devel
>>>> > > *Subject:* Re: [Swift-user] Execution error
>>>> > >
>>>> > > Back on list here (I only went off-list to discuss accounts, etc)
>>>> > >
>>>> > > The problem in the run below is this:
>>>> > >
>>>> > > 2009-04-30 14:29:41,265-0500 DEBUG vdl:execute2 APPLICATION_EXCEPTION
>>>> > > jobid=PTMap2-abeii5aj - Application exception: Job cannot be run with
>>>> > > the given max walltime worker constraint (task: 3000, \
>>>> > > maxwalltime: 2400s)
>>>> > >
>>>> > > You have this on the ptmap app in your tc.data:
>>>> > >
>>>> > > globus::maxwalltime=50
>>>> > >
>>>> > > But you only gave coasters 40 mins per coaster worker. So it's
>>>> > > complaining that it can't run a 50 minute job in a 40 minute (max)
>>>> > > coaster worker. ;)
>>>> > >
>>>> > > I mentioned in a prior mail that you need to set the two time vals in
>>>> > > your sites.xml entry; that's what you need to do next.
>>>> > >
>>>> > > change the coaster time in your sites.xml to:
>>>> > >      key="coasterWorkerMaxwalltime">00:51:00</profile>
>>>> > >
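
The constraint described here can be sketched in Python as below. This is only an illustration: it assumes the per-job maxwalltime is given in minutes and the coaster worker walltime as HH:MM:SS, as they appear in this thread, and the function names and the use of "<=" for the comparison are assumptions rather than Swift's actual implementation.

```python
def hms_to_seconds(hms):
    """Convert an 'HH:MM:SS' string such as '00:40:00' to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def job_fits_worker(job_maxwalltime_minutes, worker_maxwalltime_hms):
    """True if a job with the given maxwalltime (minutes) fits inside a
    coaster worker with the given coasterWorkerMaxwalltime (HH:MM:SS).

    Assumption: the job must need no more time than the worker offers.
    """
    task_seconds = job_maxwalltime_minutes * 60
    return task_seconds <= hms_to_seconds(worker_maxwalltime_hms)


if __name__ == "__main__":
    # Values from the thread: a 50 minute job (3000 s) does not fit a
    # 40 minute worker (2400 s), but does fit a 51 minute worker.
    print(job_fits_worker(50, "00:40:00"))
    print(job_fits_worker(50, "00:51:00"))
```

The 3000 and 2400 second figures match the "task: 3000, maxwalltime: 2400s" values in the exception message above.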
>>>> > > If you have more info on the variability of your ptmap run times, send
>>>> > > that to the list, and we can discuss how to handle it.
>>>> > >
>>>> > >
>>>> > > (NOTE: doing grep -i of the log for "except" or scanning for "except"
>>>> > > with an editor will often locate the first "exception" that your job
>>>> > > encountered. That's how I found the error above).
>>>> > >
>>>> > > Also, Yue, for testing new sites, or for validating that old sites still
>>>> > > work, you should create the smallest possible ptmap workflow - 1 job if
>>>> > > that is possible - and verify that this works.  Then say 10 jobs to make
>>>> > > sure scheduling etc is sane.  Then, send in your huge jobs.
>>>> > >
>>>> > > With only 1 job, it's easier to spot the errors in the log file.
>>>> > >
>>>> > > - Mike
>>>> > >
>>>> > >
>>>> > > On 4/30/09 2:34 PM, Yue, Chen - BMD wrote:
>>>> > > > Hi Michael,
>>>> > > >
>>>> > > > I ran into the same messages again when I use Ranger:
>>>> > > >
>>>> > > > Progress:  Selecting site:146  Stage in:25  Submitting:15  Submitted:821
>>>> > > > Failed but can retry:16
>>>> > > > Failed to transfer wrapper log from
>>>> > > > PTMap2-unmod-20090430-1428-v0c5di5c/info/q on ranger
>>>> > > > Failed to transfer wrapper log from
>>>> > > > PTMap2-unmod-20090430-1428-v0c5di5c/info/l on ranger
>>>> > > > Progress:  Selecting site:146  Stage in:3  Submitting:1  Submitted:857
>>>> > > > Failed but can retry:16
>>>> > > > Failed to transfer wrapper log from
>>>> > > > PTMap2-unmod-20090430-1428-v0c5di5c/info/v on ranger
>>>> > > > Failed to transfer wrapper log from
>>>> > > > PTMap2-unmod-20090430-1428-v0c5di5c/info/b on ranger
>>>> > > > Failed to transfer wrapper log from
>>>> > > > PTMap2-unmod-20090430-1428-v0c5di5c/info/0 on ranger
>>>> > > > Failed to transfer wrapper log from
>>>> > > > PTMap2-unmod-20090430-1428-v0c5di5c/info/a on ranger
>>>> > > > Failed to transfer wrapper log from
>>>> > > > PTMap2-unmod-20090430-1428-v0c5di5c/info/8 on ranger
>>>> > > > Failed to transfer wrapper log from
>>>> > > > PTMap2-unmod-20090430-1428-v0c5di5c/info/7 on ranger
>>>> > > > Failed to transfer wrapper log from
>>>> > > > PTMap2-unmod-20090430-1428-v0c5di5c/info/4 on ranger
>>>> > > > Failed to transfer wrapper log from
>>>> > > > PTMap2-unmod-20090430-1428-v0c5di5c/info/a on ranger
>>>> > > > Failed to transfer wrapper log from
>>>> > > > PTMap2-unmod-20090430-1428-v0c5di5c/info/4 on ranger
>>>> > > > Failed to transfer wrapper log from
>>>> > > > PTMap2-unmod-20090430-1428-v0c5di5c/info/8 on ranger
>>>> > > > Failed to transfer wrapper log from
>>>> > > > PTMap2-unmod-20090430-1428-v0c5di5c/info/7 on ranger
>>>> > > > Failed to transfer wrapper log from
>>>> > > > PTMap2-unmod-20090430-1428-v0c5di5c/info/x on ranger
>>>> > > > Failed to transfer wrapper log from
>>>> > > > PTMap2-unmod-20090430-1428-v0c5di5c/info/3 on ranger
>>>> > > > Failed to transfer wrapper log from
>>>> > > > PTMap2-unmod-20090430-1428-v0c5di5c/info/q on ranger
>>>> > > > The log for the search is at:
>>>> > > > /home/yuechen/PTMap2/PTMap2-unmod-20090430-1428-v0c5di5c.log
>>>> > > >
>>>> > > > The sites.xml I have is:
>>>> > > >
>>>> > > >  <pool handle="ranger">
>>>> > > >      <execution provider="coaster"
>>>> > > >                 url="gatekeeper.ranger.tacc.teragrid.org"
>>>> > > >                 jobManager="gt2:gt2:SGE"/>
>>>> > > >      <gridftp url="gsiftp://gridftp.ranger.tacc.teragrid.org:2811/" />
>>>> > > >      <profile namespace="env"
>>>> > > >               key="SWIFT_JOBDIR_PATH">/tmp/yuechen/jobdir</profile>
>>>> > > >      <profile namespace="globus" key="project">TG-CCR080022N</profile>
>>>> > > >      <profile namespace="globus" key="coastersPerNode">16</profile>
>>>> > > >      <profile namespace="globus" key="queue">development</profile>
>>>> > > >      <profile namespace="globus"
>>>> > > >               key="coasterWorkerMaxwalltime">00:40:00</profile>
>>>> > > >      <profile namespace="globus" key="maxwalltime">31</profile>
>>>> > > >      <profile namespace="karajan" key="initialScore">50</profile>
>>>> > > >      <profile namespace="karajan" key="jobThrottle">10</profile>
>>>> > > >      <workdirectory>/work/01164/yuechen/swiftwork</workdirectory>
>>>> > > >  </pool>
>>>> > > > The tc.data I have is:
>>>> > > >
>>>> > > > ranger  PTMap2  /share/home/01164/yuechen/PTMap2/PTMap2  INSTALLED  INTEL32::LINUX  globus::maxwalltime=50
>>>> > > >
>>>> > > > I'm using swift 0.9 rc2
>>>> > > >
>>>> > > > Thank you very much for help!
>>>> > > >
>>>> > > > Chen, Yue
>>>> > > >
>>>> > > >
>>>> > > > ------------------------------------------------------------------------
>>>> > > > *From:* Michael Wilde [mailto:wilde at mcs.anl.gov]
>>>> > > > *Sent:* Thu 4/30/2009 2:05 PM
>>>> > > > *To:* Yue, Chen - BMD
>>>> > > > *Subject:* Re: [Swift-user] Execution error
>>>> > > >
>>>> > > >
>>>> > > >
>>>> > > > On 4/30/09 1:51 PM, Yue, Chen - BMD wrote:
>>>> > > >  > Hi Michael,
>>>> > > >  >
>>>> > > >  > When I tried to activate my account, I encountered the following error:
>>>> > > >  >
>>>> > > >  > "Sorry, this account is in an invalid state. You may not activate your
>>>> > > >  > at this time."
>>>> > > >  >
>>>> > > >  > I used the username and password from TG-CDA070002T. Should I use a
>>>> > > >  > different password?
>>>> > > >
>>>> > > > If you can already login to Ranger, then you are all set - you must have
>>>> > > > done this previously.
>>>> > > >
>>>> > > > I thought you had *not*, because when I looked up your login on ranger
>>>> > > > ("finger yuechen") it said "never logged in". But seems like that info
>>>> > > > is incorrect.
>>>> > > >
>>>> > > > If you have ptmap compiled, seems like you are almost all set.
>>>> > > >
>>>> > > > Let me know if it works.
>>>> > > >
>>>> > > > - Mike
>>>> > > >
>>>> > > >  > Thanks!
>>>> > > >  >
>>>> > > >  > Chen, Yue
>>>> > > >  >
>>>> > > >  >
>>>> > > >  > ------------------------------------------------------------------------
>>>> > > >  > *From:* Michael Wilde [mailto:wilde at mcs.anl.gov]
>>>> > > >  > *Sent:* Thu 4/30/2009 1:07 PM
>>>> > > >  > *To:* Yue, Chen - BMD
>>>> > > >  > *Cc:* swift user
>>>> > > >  > *Subject:* Re: [Swift-user] Execution error
>>>> > > >  >
>>>> > > >  > Yue, use this XML pool element to access ranger:
>>>> > > >  >
>>>> > > >  >   <pool handle="ranger">
>>>> > > >  >      <execution provider="coaster"
>>>> > > >  >                 url="gatekeeper.ranger.tacc.teragrid.org"
>>>> > > >  >                 jobManager="gt2:gt2:SGE"/>
>>>> > > >  >      <gridftp url="gsiftp://gridftp.ranger.tacc.teragrid.org:2811/" />
>>>> > > >  >      <profile namespace="env"
>>>> > > >  >               key="SWIFT_JOBDIR_PATH">/tmp/yuechen/jobdir</profile>
>>>> > > >  >      <profile namespace="globus" key="project">TG-CCR080022N</profile>
>>>> > > >  >      <profile namespace="globus" key="coastersPerNode">16</profile>
>>>> > > >  >      <profile namespace="globus" key="queue">development</profile>
>>>> > > >  >      <profile namespace="globus"
>>>> > > >  >               key="coasterWorkerMaxwalltime">00:40:00</profile>
>>>> > > >  >      <profile namespace="globus" key="maxwalltime">31</profile>
>>>> > > >  >      <profile namespace="karajan" key="initialScore">50</profile>
>>>> > > >  >      <profile namespace="karajan" key="jobThrottle">10</profile>
>>>> > > >  >      <workdirectory>/work/00306/tg455797/swiftwork</workdirectory>
>>>> > > >  >    </pool>
>>>> > > >  >
>>>> > > >  >
>>>> > > >  > You will need to also do these steps:
>>>> > > >  >
>>>> > > >  > Go to this web page to enable your Ranger account:
>>>> > > >  >
>>>> > > >  > https://tas.tacc.utexas.edu/TASMigration/AccountActivation.aspx
>>>> > > >  >
>>>> > > >  > Then login to Ranger via the TeraGrid portal and put your ssh keys in
>>>> > > >  > place (assuming you use ssh keys, which you should)
>>>> > > >  >
>>>> > > >  > While on Ranger, do this:
>>>> > > >  >
>>>> > > >  > echo $WORK
>>>> > > >  > mkdir $WORK/swiftwork
>>>> > > >  >
>>>> > > >  > and put the full path of your $WORK/swiftwork directory in the
>>>> > > >  > <workdirectory> element above. (My login is tg455etc, yours is yuechen)
>>>> > > >  >
>>>> > > >  > Then scp your code to Ranger and compile it.
>>>> > > >  >
>>>> > > >  > Then create a tc.data entry for your ptmap app
>>>> > > >  >
>>>> > > >  > Next, set your time values in the sites.xml entry above to suitable
>>>> > > >  > values for Ranger. You'll need to measure times, but I think you will
>>>> > > >  > find Ranger about twice as fast as Mercury for CPU-bound jobs.
>>>> > > >  >
>>>> > > >  > The values above were set for one app job per coaster. I think you can
>>>> > > >  > probably do more.
>>>> > > >  >
>>>> > > >  > If you estimate a run time of 5 minutes, use:
>>>> > > >  >
>>>> > > >  >      <profile namespace="globus"
>>>> > > >  >               key="coasterWorkerMaxwalltime">00:30:00</profile>
>>>> > > >  >      <profile namespace="globus" key="maxwalltime">5</profile>
>>>> > > >  >
>>>> > > >  > Other people on the list - please sanity check what I suggest here.
>>>> > > >  >
>>>> > > >  > - Mike
>>>> > > >  >
>>>> > > >  >
>>>> > > >  > On 4/30/09 12:40 PM, Michael Wilde wrote:
>>>> > > >  >  > I just checked - TG-CDA070002T has indeed expired.
>>>> > > >  >  >
>>>> > > >  >  > The best for now is to move to use (only) Ranger, under this account:
>>>> > > >  >  > TG-CCR080022N
>>>> > > >  >  >
>>>> > > >  >  > I will locate and send you a sites.xml entry in a moment.
>>>> > > >  >  >
>>>> > > >  >  > You need to go to a web page to activate your Ranger login.
>>>> > > >  >  >
>>>> > > >  >  > Best to contact me in IM and we can work this out.
>>>> > > >  >  >
>>>> > > >  >  > - Mike
>>>> > > >  >  >
>>>> > > >  >  >
>>>> > > >  >  >
>>>> > > >  >  > On 4/30/09 12:23 PM, Michael Wilde wrote:
>>>> > > >  >  >> Also, what account are you running under? We may need to change you to
>>>> > > >  >  >> a new account - as the OSG Training account expires today.
>>>> > > >  >  >> If that happened at Noon, it *might* be the problem.
>>>> > > >  >  >>
>>>> > > >  >  >> - Mike
>>>> > > >  >  >>
>>>> > > >  >  >>
>>>> > > >  >  >> On 4/30/09 12:08 PM, Yue, Chen - BMD wrote:
>>>> > > >  >  >>> Hi,
>>>> > > >  >  >>>
>>>> > > >  >  >>> I came back to re-run my application on NCSA Mercury which was tested
>>>> > > >  >  >>> successfully last week after I just set up coasters with swift 0.9,
>>>> > > >  >  >>> but I got many messages like the following:
>>>> > > >  >  >>>
>>>> > > >  >  >>> Progress:  Stage in:219  Submitting:803  Submitted:1
>>>> > > >  >  >>> Progress:  Stage in:129  Submitting:703  Submitted:190 Failed but can retry:1
>>>> > > >  >  >>> Progress:  Stage in:38  Submitting:425  Submitted:556 Failed but can retry:4
>>>> > > >  >  >>> Failed to transfer wrapper log from
>>>> > > >  >  >>> PTMap2-unmod-20090430-1203-r19dxq10/info/h on NCSA_MERCURY
>>>> > > >  >  >>> Failed to transfer wrapper log from
>>>> > > >  >  >>> PTMap2-unmod-20090430-1203-r19dxq10/info/j on NCSA_MERCURY
>>>> > > >  >  >>> Failed to transfer wrapper log from
>>>> > > >  >  >>> PTMap2-unmod-20090430-1203-r19dxq10/info/p on NCSA_MERCURY
>>>> > > >  >  >>> Failed to transfer wrapper log from
>>>> > > >  >  >>> PTMap2-unmod-20090430-1203-r19dxq10/info/1 on NCSA_MERCURY
>>>> > > >  >  >>> Failed to transfer wrapper log from
>>>> > > >  >  >>> PTMap2-unmod-20090430-1203-r19dxq10/info/b on NCSA_MERCURY
>>>> > > >  >  >>> Failed to transfer wrapper log from
>>>> > > >  >  >>> PTMap2-unmod-20090430-1203-r19dxq10/info/c on NCSA_MERCURY
>>>> > > >  >  >>> Progress:  Stage in:1  Submitted:1013  Active:1 Failed but can retry:8
>>>> > > >  >  >>> Progress:  Submitted:1011  Active:1 Failed but can retry:11
>>>> > > >  >  >>> The log file for the successful run last week is:
>>>> > > >  >  >>> /home/yuechen/PTMap2/PTMap2-unmod-20090422-1216-4s3037gf.log
>>>> > > >  >  >>>
>>>> > > >  >  >>> The log file for the failed run is:
>>>> > > >  >  >>> /home/yuechen/PTMap2/PTMap2-unmod-20090430-1151-rf2uuhb7.log
>>>> > > >  >  >>>
>>>> > > >  >  >>> I don't think I did anything different, so I don't know why
>>>> > > >  >  >>> they failed this time. The sites.xml for Mercury is:
>>>> > > >  >  >>>
>>>> > > >  >  >>>  <pool handle="NCSA_MERCURY">
>>>> > > >  >  >>>     <gridftp url="gsiftp://gridftp-hg.ncsa.teragrid.org"/>
>>>> > > >  >  >>>     <execution provider="coaster" url="grid-hg.ncsa.teragrid.org"
>>>> > > >  >  >>> jobManager="gt2:PBS"/>
>>>> > > >  >  >>>     <workdirectory>/gpfs_scratch1/yuechen/swiftwork</workdirectory>
>>>> > > >  >  >>>     <profile namespace="globus" key="queue">debug</profile>
>>>> > > >  >  >>>  </pool>
>>>> > > >  >  >>>
>>>> > > >  >  >>> Thank you for help!
>>>> > > >  >  >>>
>>>> > > >  >  >>> Chen, Yue
>>>> > > >  >  >>>
>>>> > > >  >  >>> This email is intended only for the use of the individual or entity
>>>> > > >  >  >>> to which it is addressed and may contain information that is
>>>> > > >  >  >>> privileged and confidential. If the reader of this email message is
>>>> > > >  >  >>> not the intended recipient, you are hereby notified that any
>>>> > > >  >  >>> dissemination, distribution, or copying of this communication is
>>>> > > >  >  >>> prohibited. If you have received this email in error, please notify
>>>> > > >  >  >>> the sender and destroy/delete all copies of the transmittal.
>>>> > > >  >  >>> Thank you.
>>>> > > >  >  >>>
>>>> > > >  >  >>> _______________________________________________
>>>> > > >  >  >>> Swift-user mailing list
>>>> > > >  >  >>> Swift-user at ci.uchicago.edu
>>>> > > >  >  >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
>>>> > >
>>>> > > _______________________________________________
>>>> > > Swift-devel mailing list
>>>> > > Swift-devel at ci.uchicago.edu
>>>> > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>>
>>>
>


