[Swift-devel] Re: [Swift-user] Execution error
Zhao Zhang
zhaozhang at uchicago.edu
Thu May 21 13:28:34 CDT 2009
Hi, Mike
Zhao Zhang wrote:
>
>
> Michael Wilde wrote:
>>
>>
>> On 5/21/09 12:39 PM, Zhao Zhang wrote:
>>> Hi, Mike
>>>
>>> I did reproduce this bug on ranger with 100 jobs. The log is at
>>> /home/zzhang/scip/scip_100_fail_log
>>>
>>> Then I set
>>> <profile namespace="globus" key="coasterMaxJobs">5</profile>
>>> The same workflow ran perfectly.
>>
>> OK, that's good to hear.
>> I thought the limit on Ranger was 50, though.
>> Can you repeat the test at coasterMaxJobs 50 (or whatever the
>> documented limit is)?
> yes, the limit is 50, I am trying it now.
I ran the 1024-job workflow again with the limit set to 50; it took 393
seconds to finish. No big difference from the "5" case.
zhao
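
For reference, the timings in this thread can be compared with a short Python sketch. The table values come from the running-time summary quoted further down. Treating the 400-second, 1024-job run as the coasterMaxJobs=5 case is an assumption, and the function name is only illustrative.

```python
# Running times (seconds) per job count, from the summary quoted below.
RUN_TIMES = {16: 173, 64: 132, 128: 166, 256: 202, 512: 221, 1024: 400}


def relative_change(baseline, measured):
    """Return the fractional change of measured relative to baseline."""
    return (measured - baseline) / baseline


if __name__ == "__main__":
    # Assumption: the 400 s figure is the 1024-job run with coasterMaxJobs=5.
    change = relative_change(RUN_TIMES[1024], 393)
    print(f"limit 50 vs limit 5: {change:+.2%}")
    # Seconds per job at each workflow size.
    for jobs, seconds in sorted(RUN_TIMES.items()):
        print(f"{jobs:5d} jobs: {seconds / jobs:6.2f} s/job")
```

The per-job column makes the fixed startup cost visible: the small runs are dominated by overhead rather than by the jobs themselves.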
>>
>>> I ran the scip workflow with 16, 64, 128, 256, 512, and 1024 jobs.
>>> One thing I am concerned about is that there might be noise in the
>>> tests, so the end-to-end time may not be the real running time
>>> for the workflow.
>>
>> I don't understand what you mean. That the numbers below are too fast
>> to be believable?
> No. What I mean is that even if we expect a linear increase in running
> time from 512 jobs to 1024 jobs, there is noise in the system, so we
> may not see the running time for 1024 jobs be 200 seconds given that
> 512 jobs took 100 seconds.
>
> zhao
>>
>> - Mike
>>
>>
>>> Here is the summary of running times:
>>> Jobs   Time (seconds)
>>>   16   173
>>>   64   132
>>>  128   166
>>>  256   202
>>>  512   221
>>> 1024   400
>>>
>>> The logs are at
>>> /home/zzhang/scip/scip_16_log
>>> /home/zzhang/scip/scip_64_log
>>> /home/zzhang/scip/scip_128_log
>>> /home/zzhang/scip/scip_256_log
>>> /home/zzhang/scip/scip_512_log
>>> /home/zzhang/scip/scip_1024_log
>>>
>>> zhao
>>>
>>> Michael Wilde wrote:
>>>> Zhao, here is the latest Coaster bug fix I am aware of, to test.
>>>>
>>>> This was a showstopper bug for Glen, as once he exceeded the job
>>>> limit, it *seemed*, as I understand it, that Ranger killed all his jobs.
>>>>
>>>> - Mike
>>>>
>>>>
>>>> -------- Original Message --------
>>>> Subject: [Fwd: Re: [Swift-devel] RE: [Swift-user] Execution error]
>>>> Date: Tue, 12 May 2009 15:38:52 -0500
>>>> From: Michael Wilde <wilde at mcs.anl.gov>
>>>> To: Glen Hocky <hockyg at uchicago.edu>, Mihael Hategan
>>>> <hategan at mcs.anl.gov>
>>>>
>>>> Glen, the bug you were asking about yesterday was fixed on May 1 -
>>>> here's the email from Mihael.
>>>>
>>>> Did you try this fix, or were you unaware of this particular
>>>> message in the thread on that problem?
>>>>
>>>> - Mike
>>>>
>>>>
>>>> -------- Original Message --------
>>>> Subject: Re: [Swift-devel] RE: [Swift-user] Execution error
>>>> Date: Fri, 01 May 2009 15:22:02 -0500
>>>> From: Mihael Hategan <hategan at mcs.anl.gov>
>>>> To: Glen Hocky <hockyg at uchicago.edu>
>>>> CC: swift-devel <swift-devel at ci.uchicago.edu>, "Yue, Chen - BMD"
>>>> <yuechen at bsd.uchicago.edu>
>>>>
>>>> Fix in cog 2394.
>>>>
>>>> Use globus:coasterMaxJobs profile.
>>>>
>>>> On Thu, 2009-04-30 at 18:54 -0500, Mihael Hategan wrote:
>>>>> Mystery solved:
>>>>>
>>>>> Thu Apr 30 18:19:13 2009 JM_SCRIPT: ERROR: job submission failed:
>>>>> Thu Apr 30 18:19:13 2009 JM_SCRIPT:
>>>>> ------------------------------------------------------------------------
>>>>>
>>>>> Welcome to TACC's Ranger System, an NSF TeraGrid Resource
>>>>> ------------------------------------------------------------------------
>>>>>
>>>>>
>>>>> --> Submitting 16 tasks...
>>>>> --> Submitting 16 tasks/host...
>>>>> --> Submitting exclusive job to 1 hosts...
>>>>> --> Verifying HOME file-system availability...
>>>>> --> Verifying WORK file-system availability...
>>>>> --> Verifying SCRATCH file-system availability...
>>>>> --> Ensuring absence of dubious h_vmem,h_data,s_vmem,s_data
>>>>> limits...
>>>>> --> Requesting valid memory configuration (mt=31.3G)...
>>>>> --> Checking ssh keys...
>>>>> --> Checking file existence and permissions for passwordless ssh...
>>>>> --> Verifying accounting...
>>>>> ----------------------------------------------------------------
>>>>> ERROR: You have exceeded the max submitted job count.
>>>>> Maximum allowed is 50 jobs.
>>>>>
>>>>> Please contact TACC Consulting if you believe you have
>>>>> received this message in error.
>>>>> ----------------------------------------------------------------
>>>>> Job aborted by esub.
>>>>>
>>>>> I'll add a limit for the number of jobs allowed to the current
>>>>> coaster code.
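
The kind of limit described here can be illustrated with a small Python sketch. This is not the coaster implementation; `submit_with_limit`, `max_jobs`, and the stand-in scheduler callback are hypothetical names used only to show the idea of never holding more than a fixed number of submitted jobs at once.

```python
from collections import deque


def submit_with_limit(pending, max_jobs, finish_one):
    """Submit jobs so that at most max_jobs are outstanding at any time.

    pending    -- iterable of unique job names waiting to be submitted
    max_jobs   -- site limit on concurrently submitted jobs (50 on Ranger)
    finish_one -- callback that takes the set of submitted jobs and returns
                  one job that has completed (a stand-in for the scheduler)
    Returns the peak number of simultaneously submitted jobs.
    """
    queue = deque(pending)
    submitted = set()
    peak = 0
    while queue or submitted:
        # Fill free slots, but never exceed the site limit.
        while queue and len(submitted) < max_jobs:
            submitted.add(queue.popleft())
        peak = max(peak, len(submitted))
        # Wait for one job to finish, which frees a slot.
        submitted.remove(finish_one(submitted))
    return peak


if __name__ == "__main__":
    jobs = [f"job{i}" for i in range(100)]
    # Hypothetical scheduler: the alphabetically first job finishes next.
    print(submit_with_limit(jobs, 50, min))
```

With 100 pending jobs and a limit of 50, the peak never goes above 50, which is the property the site's esub check enforces.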
>>>>>
>>>>>
>>>>> On Thu, 2009-04-30 at 16:13 -0500, Glen Hocky wrote:
>>>>> > I have the identical response on ranger. It started yesterday
>>>>> > evening. Possibly a problem that the TACC folks need to fix?
>>>>> >
>>>>> > Glen
>>>>> >
>>>>> > Yue, Chen - BMD wrote:
>>>>> > > Hi Michael,
>>>>> > >
>>>>> > > Thank you for the advice. I tested ranger with 1 job and new
>>>>> > > specifications of maxwalltime. It shows the following error
>>>>> > > message. I don't know if there is another problem with my setup.
>>>>> > > Thank you!
>>>>> > >
>>>>> > > /////////////////////////////////////////////////
>>>>> > > [yuechen at communicado PTMap2]$ swift PTMap2-unmod.swift -sites.file sites.xml -tc.file tc.data
>>>>> > > Swift 0.9rc2 swift-r2860 cog-r2388
>>>>> > > RunID: 20090430-1559-2vi6x811
>>>>> > > Progress:
>>>>> > > Progress: Stage in:1
>>>>> > > Progress: Submitting:1
>>>>> > > Progress: Submitting:1
>>>>> > > Progress: Submitted:1
>>>>> > > Progress: Active:1
>>>>> > > Failed to transfer wrapper log from PTMap2-unmod-20090430-1559-2vi6x811/info/i on ranger
>>>>> > > Progress: Active:1
>>>>> > > Failed to transfer wrapper log from PTMap2-unmod-20090430-1559-2vi6x811/info/k on ranger
>>>>> > > Progress: Stage in:1
>>>>> > > Progress: Active:1
>>>>> > > Failed to transfer wrapper log from PTMap2-unmod-20090430-1559-2vi6x811/info/m on ranger
>>>>> > > Progress: Failed:1
>>>>> > > Execution failed:
>>>>> > > Exception in PTMap2:
>>>>> > > Arguments: [e04.mzXML, ./seqs-ecolik12/fasta01, inputs-unmod.txt, parameters.txt]
>>>>> > > Host: ranger
>>>>> > > Directory: PTMap2-unmod-20090430-1559-2vi6x811/jobs/m/PTMap2-mbe6m5aj
>>>>> > > stderr.txt:
>>>>> > > stdout.txt:
>>>>> > > ----
>>>>> > > Caused by:
>>>>> > > Failed to start worker:
>>>>> > > null
>>>>> > > null
>>>>> > > org.globus.gram.GramException: The job manager detected an invalid script response
>>>>> > > at org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.statusChanged(JobSubmissionTaskHandler.java:530)
>>>>> > > at org.globus.gram.GramJob.setStatus(GramJob.java:184)
>>>>> > > at org.globus.gram.GramCallbackHandler.run(CallbackHandler.java:176)
>>>>> > > at java.lang.Thread.run(Thread.java:619)
>>>>> > > Cleaning up...
>>>>> > > Shutting down service at https://129.114.50.163:45562
>>>>> > > Got channel MetaChannel: 20903429 -> GSSSChannel-null(1)
>>>>> > > - Done
>>>>> > > [yuechen at communicado PTMap2]$
>>>>> > > ///////////////////////////////////////////////////////////
>>>>> > >
>>>>> > > Chen, Yue
>>>>> > >
>>>>> > > *From:* Michael Wilde [mailto:wilde at mcs.anl.gov]
>>>>> > > *Sent:* Thu 4/30/2009 3:02 PM
>>>>> > > *To:* Yue, Chen - BMD; swift-devel
>>>>> > > *Subject:* Re: [Swift-user] Execution error
>>>>> > >
>>>>> > > Back on list here (I only went off-list to discuss accounts, etc)
>>>>> > >
>>>>> > > The problem in the run below is this:
>>>>> > >
>>>>> > > 2009-04-30 14:29:41,265-0500 DEBUG vdl:execute2
>>>>> APPLICATION_EXCEPTION
>>>>> > > jobid=PTMap2-abeii5aj - Application exception: Job cannot be
>>>>> run with
>>>>> > > the given max walltime worker constraint (task: 3000, \
>>>>> > > maxwalltime: 2400s)
>>>>> > >
>>>>> > > You have this on the ptmap app in your tc.data:
>>>>> > >
>>>>> > > globus::maxwalltime=50
>>>>> > >
>>>>> > > But you only gave coasters 40 mins per coaster worker. So it's
>>>>> > > complaining that it can't run a 50 minute job in a 40 minute (max)
>>>>> > > coaster worker. ;)
>>>>> > >
>>>>> > > I mentioned in a prior mail that you need to set the two time
>>>>> > > vals in your sites.xml entry; that's what you need to do next.
>>>>> > >
>>>>> > > change the coaster time in your sites.xml to:
>>>>> > > key="coasterWorkerMaxwalltime">00:51:00</profile>
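
The mismatch described above can be checked mechanically. The following Python sketch assumes, as the messages above suggest, that `globus::maxwalltime` in tc.data is given in minutes and that `coasterWorkerMaxwalltime` is an HH:MM:SS string. The function names are illustrative and are not part of Swift, and the real check inside the coaster code may differ in detail.

```python
def hms_to_seconds(hms):
    """Convert an HH:MM:SS string such as "00:40:00" to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def job_fits_worker(task_minutes, worker_walltime):
    """Return True if a task of task_minutes fits in the worker walltime."""
    return task_minutes * 60 <= hms_to_seconds(worker_walltime)


if __name__ == "__main__":
    # Values from this thread: a 50 minute task against a 40 minute worker.
    print(job_fits_worker(50, "00:40:00"))  # False: 3000 s > 2400 s
    # The suggested fix raises the worker walltime to 51 minutes.
    print(job_fits_worker(50, "00:51:00"))  # True: 3000 s <= 3060 s
```

The two numbers in the log message (task: 3000, maxwalltime: 2400s) are exactly these two values converted to seconds.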
>>>>> > >
>>>>> > > If you have more info on the variability of your ptmap run
>>>>> times, send
>>>>> > > that to the list, and we can discuss how to handle.
>>>>> > >
>>>>> > >
>>>>> > > (NOTE: doing grep -i of the log for "except" or scanning for
>>>>> > > "except" with an editor will often locate the first "exception"
>>>>> > > that your job encountered. That's how I found the error above).
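
The same search can be scripted. This is a minimal Python sketch of the tip above, assuming the log is plain text read line by line; `first_exception_line` is a hypothetical helper name, and the two sample lines are a made-up stand-in for a real Swift log.

```python
import io


def first_exception_line(lines):
    """Return (line_number, text) of the first line containing "except".

    The match is case-insensitive, like `grep -i except`.
    Returns None if no line matches.
    """
    for number, line in enumerate(lines, start=1):
        if "except" in line.lower():
            return number, line.rstrip("\n")
    return None


if __name__ == "__main__":
    # A tiny in-memory stand-in for a log file (illustrative content only).
    sample = io.StringIO(
        "2009-04-30 14:29:40 INFO Progress: Submitted:1\n"
        "2009-04-30 14:29:41 DEBUG vdl:execute2 APPLICATION_EXCEPTION\n"
    )
    print(first_exception_line(sample))
```

For a real run, pass an open file object for the .log file instead of the in-memory sample.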
>>>>> > >
>>>>> > > Also, Yue, for testing new sites, or for validating that old
>>>>> > > sites still work, you should create the smallest possible ptmap
>>>>> > > workflow - 1 job if that is possible - and verify that this works.
>>>>> > > Then say 10 jobs to make sure scheduling etc is sane. Then, send
>>>>> > > in your huge jobs.
>>>>> > >
>>>>> > > With only 1 job, it's easier to spot the errors in the log file.
>>>>> > >
>>>>> > > - Mike
>>>>> > >
>>>>> > >
>>>>> > > On 4/30/09 2:34 PM, Yue, Chen - BMD wrote:
>>>>> > > > Hi Michael,
>>>>> > > >
>>>>> > > > I run into the same messages again when I use Ranger:
>>>>> > > >
>>>>> > > > Progress: Selecting site:146 Stage in:25 Submitting:15 Submitted:821
>>>>> > > > Failed but can retry:16
>>>>> > > > Failed to transfer wrapper log from
>>>>> > > > PTMap2-unmod-20090430-1428-v0c5di5c/info/q on ranger
>>>>> > > > Failed to transfer wrapper log from
>>>>> > > > PTMap2-unmod-20090430-1428-v0c5di5c/info/l on ranger
>>>>> > > > Progress: Selecting site:146 Stage in:3 Submitting:1 Submitted:857
>>>>> > > > Failed but can retry:16
>>>>> > > > Failed to transfer wrapper log from
>>>>> > > > PTMap2-unmod-20090430-1428-v0c5di5c/info/v on ranger
>>>>> > > > Failed to transfer wrapper log from
>>>>> > > > PTMap2-unmod-20090430-1428-v0c5di5c/info/b on ranger
>>>>> > > > Failed to transfer wrapper log from
>>>>> > > > PTMap2-unmod-20090430-1428-v0c5di5c/info/0 on ranger
>>>>> > > > Failed to transfer wrapper log from
>>>>> > > > PTMap2-unmod-20090430-1428-v0c5di5c/info/a on ranger
>>>>> > > > Failed to transfer wrapper log from
>>>>> > > > PTMap2-unmod-20090430-1428-v0c5di5c/info/8 on ranger
>>>>> > > > Failed to transfer wrapper log from
>>>>> > > > PTMap2-unmod-20090430-1428-v0c5di5c/info/7 on ranger
>>>>> > > > Failed to transfer wrapper log from
>>>>> > > > PTMap2-unmod-20090430-1428-v0c5di5c/info/4 on ranger
>>>>> > > > Failed to transfer wrapper log from
>>>>> > > > PTMap2-unmod-20090430-1428-v0c5di5c/info/a on ranger
>>>>> > > > Failed to transfer wrapper log from
>>>>> > > > PTMap2-unmod-20090430-1428-v0c5di5c/info/4 on ranger
>>>>> > > > Failed to transfer wrapper log from
>>>>> > > > PTMap2-unmod-20090430-1428-v0c5di5c/info/8 on ranger
>>>>> > > > Failed to transfer wrapper log from
>>>>> > > > PTMap2-unmod-20090430-1428-v0c5di5c/info/7 on ranger
>>>>> > > > Failed to transfer wrapper log from
>>>>> > > > PTMap2-unmod-20090430-1428-v0c5di5c/info/x on ranger
>>>>> > > > Failed to transfer wrapper log from
>>>>> > > > PTMap2-unmod-20090430-1428-v0c5di5c/info/3 on ranger
>>>>> > > > Failed to transfer wrapper log from
>>>>> > > > PTMap2-unmod-20090430-1428-v0c5di5c/info/q on ranger
>>>>> > > > The log for the search is at:
>>>>> > > > /home/yuechen/PTMap2/PTMap2-unmod-20090430-1428-v0c5di5c.log
>>>>> > > >
>>>>> > > > The sites.xml I have is:
>>>>> > > >
>>>>> > > > <pool handle="ranger">
>>>>> > > >   <execution provider="coaster"
>>>>> > > >     url="gatekeeper.ranger.tacc.teragrid.org"
>>>>> > > >     jobManager="gt2:gt2:SGE"/>
>>>>> > > >   <gridftp url="gsiftp://gridftp.ranger.tacc.teragrid.org:2811/" />
>>>>> > > >   <profile namespace="env" key="SWIFT_JOBDIR_PATH">/tmp/yuechen/jobdir</profile>
>>>>> > > >   <profile namespace="globus" key="project">TG-CCR080022N</profile>
>>>>> > > >   <profile namespace="globus" key="coastersPerNode">16</profile>
>>>>> > > >   <profile namespace="globus" key="queue">development</profile>
>>>>> > > >   <profile namespace="globus" key="coasterWorkerMaxwalltime">00:40:00</profile>
>>>>> > > >   <profile namespace="globus" key="maxwalltime">31</profile>
>>>>> > > >   <profile namespace="karajan" key="initialScore">50</profile>
>>>>> > > >   <profile namespace="karajan" key="jobThrottle">10</profile>
>>>>> > > >   <workdirectory>/work/01164/yuechen/swiftwork</workdirectory>
>>>>> > > > </pool>
>>>>> > > >
>>>>> > > > The tc.data I have is:
>>>>> > > >
>>>>> > > > ranger PTMap2 /share/home/01164/yuechen/PTMap2/PTMap2 INSTALLED INTEL32::LINUX globus::maxwalltime=50
>>>>> > > >
>>>>> > > > I'm using swift 0.9 rc2
>>>>> > > >
>>>>> > > > Thank you very much for help!
>>>>> > > >
>>>>> > > > Chen, Yue
>>>>> > > >
>>>>> > > > > > >
>>>>> > > >
>>>>> ------------------------------------------------------------------------
>>>>>
>>>>> > > > *From:* Michael Wilde [mailto:wilde at mcs.anl.gov]
>>>>> > > > *Sent:* Thu 4/30/2009 2:05 PM
>>>>> > > > *To:* Yue, Chen - BMD
>>>>> > > > *Subject:* Re: [Swift-user] Execution error
>>>>> > > >
>>>>> > > >
>>>>> > > >
>>>>> > > > On 4/30/09 1:51 PM, Yue, Chen - BMD wrote:
>>>>> > > > > Hi Michael,
>>>>> > > > >
>>>>> > > > > When I tried to activate my account, I encountered the
>>>>> > > > > following error:
>>>>> > > > >
>>>>> > > > > "Sorry, this account is in an invalid state. You may not
>>>>> > > > > activate your at this time."
>>>>> > > > >
>>>>> > > > > I used the username and password from TG-CDA070002T. Should I
>>>>> > > > > use a different password?
>>>>> > > >
>>>>> > > > If you can already login to Ranger, then you are all set -
>>>>> you must have
>>>>> > > > done this previously.
>>>>> > > >
>>>>> > > > I thought you had *not*, because when I looked up your login
>>>>> on ranger
>>>>> > > > ("finger yuechen") it said "never logged in". But seems like
>>>>> that info
>>>>> > > > is incorrect.
>>>>> > > >
>>>>> > > > If you have ptmap compiled, seems like you are almost all set.
>>>>> > > >
>>>>> > > > Let me know if it works.
>>>>> > > >
>>>>> > > > - Mike
>>>>> > > >
>>>>> > > > > Thanks!
>>>>> > > > >
>>>>> > > > > Chen, Yue
>>>>> > > > >
>>>>> > > > >
>>>>> > > > > > >
>>>>> ------------------------------------------------------------------------
>>>>>
>>>>> > > > > *From:* Michael Wilde [mailto:wilde at mcs.anl.gov]
>>>>> > > > > *Sent:* Thu 4/30/2009 1:07 PM
>>>>> > > > > *To:* Yue, Chen - BMD
>>>>> > > > > *Cc:* swift user
>>>>> > > > > *Subject:* Re: [Swift-user] Execution error
>>>>> > > > >
>>>>> > > > > Yue, use this XML pool element to access ranger:
>>>>> > > > >
>>>>> > > > > <pool handle="ranger">
>>>>> > > > >   <execution provider="coaster"
>>>>> > > > >     url="gatekeeper.ranger.tacc.teragrid.org"
>>>>> > > > >     jobManager="gt2:gt2:SGE"/>
>>>>> > > > >   <gridftp url="gsiftp://gridftp.ranger.tacc.teragrid.org:2811/" />
>>>>> > > > >   <profile namespace="env" key="SWIFT_JOBDIR_PATH">/tmp/yuechen/jobdir</profile>
>>>>> > > > >   <profile namespace="globus" key="project">TG-CCR080022N</profile>
>>>>> > > > >   <profile namespace="globus" key="coastersPerNode">16</profile>
>>>>> > > > >   <profile namespace="globus" key="queue">development</profile>
>>>>> > > > >   <profile namespace="globus" key="coasterWorkerMaxwalltime">00:40:00</profile>
>>>>> > > > >   <profile namespace="globus" key="maxwalltime">31</profile>
>>>>> > > > >   <profile namespace="karajan" key="initialScore">50</profile>
>>>>> > > > >   <profile namespace="karajan" key="jobThrottle">10</profile>
>>>>> > > > >   <workdirectory>/work/00306/tg455797/swiftwork</workdirectory>
>>>>> > > > > </pool>
>>>>> > > > >
>>>>> > > > >
>>>>> > > > > You will need to also do these steps:
>>>>> > > > >
>>>>> > > > > Go to this web page to enable your Ranger account:
>>>>> > > > >
>>>>> > > > >
>>>>> https://tas.tacc.utexas.edu/TASMigration/AccountActivation.aspx
>>>>> > > > >
>>>>> > > > > Then login to Ranger via the TeraGrid portal and put your
>>>>> ssh keys in
>>>>> > > > > place (assuming you use ssh keys, which you should)
>>>>> > > > >
>>>>> > > > > While on Ranger, do this:
>>>>> > > > >
>>>>> > > > > echo $WORK
>>>>> > > > > mkdir $WORK/swiftwork
>>>>> > > > >
>>>>> > > > > and put the full path of your $WORK/swiftwork directory in the
>>>>> > > > > <workdirectory> element above. (My login is tg455etc, yours is
>>>>> > > > > yuechen)
>>>>> > > > >
>>>>> > > > > Then scp your code to Ranger and compile it.
>>>>> > > > >
>>>>> > > > > Then create a tc.data entry for your ptmap app
>>>>> > > > >
>>>>> > > > > Next, set your time values in the sites.xml entry above
>>>>> to suitable
>>>>> > > > > values for Ranger. You'll need to measure times, but I
>>>>> think you will
>>>>> > > > > find Ranger about twice as fast as Mercury for CPU-bound
>>>>> jobs.
>>>>> > > > >
>>>>> > > > > The values above were set for one app job per coaster. I
>>>>> > > > > think you can probably do more.
>>>>> > > > >
>>>>> > > > > If you estimate a run time of 5 minutes, use:
>>>>> > > > >
>>>>> > > > > <profile namespace="globus" key="coasterWorkerMaxwalltime">00:30:00</profile>
>>>>> > > > > <profile namespace="globus" key="maxwalltime">5</profile>
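
As a rough check of these two values, the sketch below estimates how many back-to-back jobs fit in one worker's walltime. It assumes that jobs run one after another in a worker and ignores startup overhead; `jobs_per_worker` is an illustrative name and not a Swift setting.

```python
def jobs_per_worker(worker_walltime, job_minutes):
    """Return how many back-to-back jobs fit in one worker walltime.

    worker_walltime -- HH:MM:SS string (the coasterWorkerMaxwalltime value)
    job_minutes     -- per-job estimate in minutes (the maxwalltime value)
    """
    hours, minutes, seconds = (int(p) for p in worker_walltime.split(":"))
    worker_seconds = hours * 3600 + minutes * 60 + seconds
    return worker_seconds // (job_minutes * 60)


if __name__ == "__main__":
    print(jobs_per_worker("00:30:00", 5))   # 6
    print(jobs_per_worker("00:40:00", 31))  # 1
```

The second call matches the remark above that the earlier values (00:40:00 and 31) allow one app job per coaster.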
>>>>> > > > >
>>>>> > > > > Other people on the list - please sanity check what I
>>>>> suggest here.
>>>>> > > > >
>>>>> > > > > - Mike
>>>>> > > > >
>>>>> > > > >
>>>>> > > > > On 4/30/09 12:40 PM, Michael Wilde wrote:
>>>>> > > > > > I just checked - TG-CDA070002T has indeed expired.
>>>>> > > > > >
>>>>> > > > > > The best for now is to move to use (only) Ranger, under this
>>>>> > > > > > account: TG-CCR080022N
>>>>> > > > > >
>>>>> > > > > > I will locate and send you a sites.xml entry in a moment.
>>>>> > > > > >
>>>>> > > > > > You need to go to a web page to activate your Ranger
>>>>> login.
>>>>> > > > > >
>>>>> > > > > > Best to contact me in IM and we can work this out.
>>>>> > > > > >
>>>>> > > > > > - Mike
>>>>> > > > > >
>>>>> > > > > >
>>>>> > > > > >
>>>>> > > > > > On 4/30/09 12:23 PM, Michael Wilde wrote:
>>>>> > > > > >> Also, what account are you running under? We may need to
>>>>> > > > > >> change you to a new account - as the OSG Training account
>>>>> > > > > >> expires today. If that happened at Noon, it *might* be the
>>>>> > > > > >> problem.
>>>>> > > > > >>
>>>>> > > > > >> - Mike
>>>>> > > > > >>
>>>>> > > > > >>
>>>>> > > > > >> On 4/30/09 12:08 PM, Yue, Chen - BMD wrote:
>>>>> > > > > >>> Hi,
>>>>> > > > > >>>
>>>>> > > > > >>> I came back to re-run my application on NCSA Mercury, which
>>>>> > > > > >>> was tested successfully last week after I just set up
>>>>> > > > > >>> coasters with swift 0.9, but I got many messages like the
>>>>> > > > > >>> following:
>>>>> > > > > >>>
>>>>> > > > > >>> Progress: Stage in:219 Submitting:803 Submitted:1
>>>>> > > > > >>> Progress: Stage in:129 Submitting:703 Submitted:190 Failed but can retry:1
>>>>> > > > > >>> Progress: Stage in:38 Submitting:425 Submitted:556 Failed but can retry:4
>>>>> > > > > >>> Failed to transfer wrapper log from PTMap2-unmod-20090430-1203-r19dxq10/info/h on NCSA_MERCURY
>>>>> > > > > >>> Failed to transfer wrapper log from PTMap2-unmod-20090430-1203-r19dxq10/info/j on NCSA_MERCURY
>>>>> > > > > >>> Failed to transfer wrapper log from PTMap2-unmod-20090430-1203-r19dxq10/info/p on NCSA_MERCURY
>>>>> > > > > >>> Failed to transfer wrapper log from PTMap2-unmod-20090430-1203-r19dxq10/info/1 on NCSA_MERCURY
>>>>> > > > > >>> Failed to transfer wrapper log from PTMap2-unmod-20090430-1203-r19dxq10/info/b on NCSA_MERCURY
>>>>> > > > > >>> Failed to transfer wrapper log from PTMap2-unmod-20090430-1203-r19dxq10/info/c on NCSA_MERCURY
>>>>> > > > > >>> Progress: Stage in:1 Submitted:1013 Active:1 Failed but can retry:8
>>>>> > > > > >>> Progress: Submitted:1011 Active:1 Failed but can retry:11
>>>>> > > > > >>>
>>>>> > > > > >>> The log file for the successful run last week is:
>>>>> > > > > >>> /home/yuechen/PTMap2/PTMap2-unmod-20090422-1216-4s3037gf.log
>>>>> > > > > >>>
>>>>> > > > > >>> The log file for the failed run is:
>>>>> > > > > >>> /home/yuechen/PTMap2/PTMap2-unmod-20090430-1151-rf2uuhb7.log
>>>>> > > > > >>>
>>>>> > > > > >>> I don't think I did anything different, so I don't know why
>>>>> > > > > >>> this time they failed. The sites.xml for Mercury is:
>>>>> > > > > >>>
>>>>> > > > > >>> <pool handle="NCSA_MERCURY">
>>>>> > > > > >>>   <gridftp url="gsiftp://gridftp-hg.ncsa.teragrid.org"/>
>>>>> > > > > >>>   <execution provider="coaster" url="grid-hg.ncsa.teragrid.org"
>>>>> > > > > >>>     jobManager="gt2:PBS"/>
>>>>> > > > > >>>   <workdirectory>/gpfs_scratch1/yuechen/swiftwork</workdirectory>
>>>>> > > > > >>>   <profile namespace="globus" key="queue">debug</profile>
>>>>> > > > > >>> </pool>
>>>>> > > > > >>>
>>>>> > > > > >>> Thank you for help!
>>>>> > > > > >>>
>>>>> > > > > >>> Chen, Yue
>>>>> > > > > >>>
>>>>> > > > > >>> This email is intended only for the use of the individual or
>>>>> > > > > >>> entity to which it is addressed and may contain information
>>>>> > > > > >>> that is privileged and confidential. If the reader of this
>>>>> > > > > >>> email message is not the intended recipient, you are hereby
>>>>> > > > > >>> notified that any dissemination, distribution, or copying of
>>>>> > > > > >>> this communication is prohibited. If you have received this
>>>>> > > > > >>> email in error, please notify the sender and destroy/delete
>>>>> > > > > >>> all copies of the transmittal. Thank you.
>>>>> > > > > >>>
>>>>> > > > > >>>
>>>>> > > > > >>>
>>>>> > > > > > >
>>>>> ------------------------------------------------------------------------
>>>>>
>>>>> > > > > >>>
>>>>> > > > > >>> _______________________________________________
>>>>> > > > > >>> Swift-user mailing list
>>>>> > > > > >>> Swift-user at ci.uchicago.edu
>>>>> > > > > >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
>>>>> > > _______________________________________________
>>>>> > > Swift-devel mailing list
>>>>> > > Swift-devel at ci.uchicago.edu
>>>>> > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel