[Swift-devel] Persistent coasters on OSG: Swift not using the started cores

Ketan Maheshwari ketancmaheshwari at gmail.com
Fri Sep 9 11:52:17 CDT 2011


Hi Mihael, All,

I am trying to run the DSSAT workflow, a simple single-process, catsn-like loop.

The setup on OSG is based on persistent coasters, with the following elements:

1. A coaster service is started on the head node
2. Workers are started on OSG sites. I am using 11 OSG sites.
3. The workers are submitted as Condor jobs, which connect back
to the service running on the head node.
4. In the current run, 500 workers have been submitted, of which 280
are currently running.

My throttles, jobthrottle and the foreach throttle, are both set to allow
500 tasks at a time.

However, the number of active tasks follows a see-saw pattern with a very
low peak: it rises gradually from 0 to about 30, falls back to 0, and then
climbs to 30 again.
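As a rough sanity check, the number of concurrently active tasks should be bounded by the tightest of the limits above. The sketch below is hypothetical and not part of Swift: it assumes each running coaster worker executes one task at a time, and that a jobThrottle value maps to a per-site limit of jobThrottle * 100 + 1 (the rule described in the Swift user guide). The value 4.99 and the helper names are illustrative only.

```python
# Hypothetical back-of-the-envelope check; not part of Swift or coasters.
# Assumption: each running coaster worker runs one task at a time.


def job_throttle_slots(job_throttle: float) -> int:
    """Concurrent-job limit implied by a Swift jobThrottle value.

    Assumes the rule limit = jobThrottle * 100 + 1.
    round() guards against float error (e.g. 2.55 * 100 == 254.999...).
    """
    return round(job_throttle * 100) + 1


def expected_peak(running_workers: int, tasks_per_worker: int,
                  throttle_slots: int, foreach_limit: int) -> int:
    """Upper bound on simultaneously active tasks: the tightest limit wins."""
    return min(running_workers * tasks_per_worker, throttle_slots, foreach_limit)


if __name__ == "__main__":
    slots = job_throttle_slots(4.99)  # hypothetical value giving 500 slots
    peak = expected_peak(running_workers=280, tasks_per_worker=1,
                         throttle_slots=slots, foreach_limit=500)
    observed_peak = 30  # from the see-saw pattern described above
    print(f"expected peak ~{peak}, observed peak ~{observed_peak}")
```

With 280 workers running, the bound is about 280 active tasks, so an observed peak of about 30 points to something other than the configured throttles limiting concurrency.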

The logs and sources are at: http://ci.uchicago.edu/~ketan/DSSAT-logs.tgz

This tarball contains the following.

Sources and configuration:

DSSAT-logs/sites.grid-ps.xml
DSSAT-logs/tc-provider-staging
DSSAT-logs/cf.ps
DSSAT-logs/RunDSSAT.swift

Condor and Swift logs:

DSSAT-logs/condor.log
DSSAT-logs/swift.log

Service and worker stdout:

DSSAT-logs/service-0.out
DSSAT-logs/swift-workers.out

Three run logs, since the run was resumed twice:

DSSAT-logs/RunDSSAT-20110909-1025-hjcelum9.log
DSSAT-logs/RunDSSAT-20110909-1030-jjefp0sb.log
DSSAT-logs/RunDSSAT-20110909-0918-0hk7ign5.log

Any insights would be helpful.

Regards,
-- 
Ketan

