[Swift-user] Execution error

Michael Wilde wilde at mcs.anl.gov
Thu Apr 30 13:07:55 CDT 2009


Yue, use this XML pool element to access Ranger:

  <pool handle="ranger">
     <execution provider="coaster"
                url="gatekeeper.ranger.tacc.teragrid.org"
                jobManager="gt2:gt2:SGE"/>
     <gridftp url="gsiftp://gridftp.ranger.tacc.teragrid.org:2811/" />
     <profile namespace="env"
              key="SWIFT_JOBDIR_PATH">/tmp/yuechen/jobdir</profile>
     <profile namespace="globus" key="project">TG-CCR080022N</profile>
     <profile namespace="globus" key="coastersPerNode">16</profile>
     <profile namespace="globus" key="queue">development</profile>
     <profile namespace="globus"
              key="coasterWorkerMaxwalltime">00:40:00</profile>
     <profile namespace="globus" key="maxwalltime">31</profile>
     <profile namespace="karajan" key="initialScore">50</profile>
     <profile namespace="karajan" key="jobThrottle">10</profile>
     <workdirectory>/work/00306/tg455797/swiftwork</workdirectory>
   </pool>


You will also need to do these steps:

Go to this web page to enable your Ranger account:

https://tas.tacc.utexas.edu/TASMigration/AccountActivation.aspx

Then log in to Ranger via the TeraGrid portal and put your ssh keys in 
place (assuming you use ssh keys, which you should).

While on Ranger, do this:

echo $WORK
mkdir $WORK/swiftwork

and put the full path of your $WORK/swiftwork directory in the 
<workdirectory> element above. (The path shown there uses my login, 
tg455797; yours is yuechen.)
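
As a rough sketch of the step above (assuming $WORK is set by your Ranger login environment, which the echo is meant to confirm), a small helper could create the directory and print the full path to paste into <workdirectory>. The function name make_swiftwork is made up for illustration:

```shell
# Sketch only: create the Swift work directory under $WORK and print its path.
# Assumes $WORK is provided by the login environment; the name is hypothetical.
make_swiftwork() {
    if [ -z "${WORK:-}" ]; then
        echo "WORK is not set" >&2
        return 1
    fi
    # -p makes this safe to re-run if the directory already exists.
    mkdir -p "$WORK/swiftwork" || return 1
    # Print the absolute path to copy into the <workdirectory> element.
    (cd "$WORK/swiftwork" && pwd)
}
```

Run it as `make_swiftwork` once you are logged in; it fails with a message if $WORK is empty rather than creating /swiftwork by mistake.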

Then scp your code to Ranger and compile it.

Then create a tc.data entry for your ptmap app.
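
For illustration only, here is a sketch that prints one tc.data line. It assumes the usual six-column layout (site, app name, executable path, install type, platform, profile) and the placeholder platform string used in the stock examples; check both against the tc.data shipped with your Swift release. The helper name tc_entry and any path you pass in are hypothetical:

```shell
# Sketch only: emit a single tab-separated tc.data line.
# Column layout and the INTEL32::LINUX platform string are assumptions
# taken from the stock examples, not verified for Ranger.
tc_entry() {
    # $1 = site handle, $2 = app name, $3 = absolute path of the executable
    printf '%s\t%s\t%s\tINSTALLED\tINTEL32::LINUX\tnull\n' "$1" "$2" "$3"
}
```

For example, `tc_entry ranger ptmap /path/to/your/ptmap >> tc.data`, where the site handle matches the pool handle above and the path is wherever you compiled the app.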

Next, set your time values in the sites.xml entry above to suitable 
values for Ranger. You'll need to measure times, but I think you will 
find Ranger about twice as fast as Mercury for CPU-bound jobs.

The values above were set for one app job per coaster. I think you can 
probably do more.

If you estimate a run time of 5 minutes, use:

     <profile namespace="globus"
              key="coasterWorkerMaxwalltime">00:30:00</profile>
     <profile namespace="globus" key="maxwalltime">5</profile>

Other people on the list - please sanity check what I suggest here.

- Mike


On 4/30/09 12:40 PM, Michael Wilde wrote:
> I just checked - TG-CDA070002T has indeed expired.
> 
> The best for now is to move to use (only) Ranger, under this account:
> TG-CCR080022N
> 
> I will locate and send you a sites.xml entry in a moment.
> 
> You need to go to a web page to activate your Ranger login.
> 
> Best to contact me in IM and we can work this out.
> 
> - Mike
> 
> 
> 
> On 4/30/09 12:23 PM, Michael Wilde wrote:
>> Also, what account are you running under? We may need to change you to 
>> a new account - as the OSG Training account expires today.
>> If that happened at noon, it *might* be the problem.
>>
>> - Mike
>>
>>
>> On 4/30/09 12:08 PM, Yue, Chen - BMD wrote:
>>> Hi,
>>>  
>>> I came back to re-run my application on NCSA Mercury which was tested 
>>> successfully last week after I just set up coasters with swift 0.9, 
>>> but I got many messages like the following:
>>>  
>>> Progress:  Stage in:219  Submitting:803  Submitted:1
>>> Progress:  Stage in:129  Submitting:703  Submitted:190 Failed but can 
>>> retry:1
>>> Progress:  Stage in:38  Submitting:425  Submitted:556 Failed but can 
>>> retry:4
>>> Failed to transfer wrapper log from 
>>> PTMap2-unmod-20090430-1203-r19dxq10/info/h on NCSA_MERCURY
>>> Failed to transfer wrapper log from 
>>> PTMap2-unmod-20090430-1203-r19dxq10/info/j on NCSA_MERCURY
>>> Failed to transfer wrapper log from 
>>> PTMap2-unmod-20090430-1203-r19dxq10/info/p on NCSA_MERCURY
>>> Failed to transfer wrapper log from 
>>> PTMap2-unmod-20090430-1203-r19dxq10/info/1 on NCSA_MERCURY
>>> Failed to transfer wrapper log from 
>>> PTMap2-unmod-20090430-1203-r19dxq10/info/b on NCSA_MERCURY
>>> Failed to transfer wrapper log from 
>>> PTMap2-unmod-20090430-1203-r19dxq10/info/c on NCSA_MERCURY
>>> Progress:  Stage in:1  Submitted:1013  Active:1 Failed but can retry:8
>>> Progress:  Submitted:1011  Active:1 Failed but can retry:11
>>> The log file for the successful run last week is:
>>> /home/yuechen/PTMap2/PTMap2-unmod-20090422-1216-4s3037gf.log
>>>  
>>> The log file for the failed run is :
>>> /home/yuechen/PTMap2/PTMap2-unmod-20090430-1151-rf2uuhb7.log
>>>  
>>> I don't think I did anything different, so I don't know why this time 
>>> they failed. The sites.xml for Mercury is:
>>>  
>>>  <pool handle="NCSA_MERCURY">
>>>     <gridftp url="gsiftp://gridftp-hg.ncsa.teragrid.org"/>
>>>     <execution provider="coaster" url="grid-hg.ncsa.teragrid.org" 
>>> jobManager="gt2:PBS"/>
>>>     <workdirectory>/gpfs_scratch1/yuechen/swiftwork</workdirectory>
>>>     <profile namespace="globus" key="queue">debug</profile>
>>>  </pool>
>>>  
>>> Thank you for your help!
>>>  
>>> Chen, Yue
>>>
>>> _______________________________________________
>>> Swift-user mailing list
>>> Swift-user at ci.uchicago.edu
>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user


