[Swift-devel] Swift did not make progress with high throttling rate

Emalayan Vairavanathan svemalayan at yahoo.com
Tue Apr 10 22:22:05 CDT 2012


Hi Justin,

Thank you for looking at the issue. 


The benchmark works with JOB_THROTTLE = 0.05 at different scales (nodes = 64, 128, 256), but at high throttle rates it made no progress for a long time.

I think this is due to a storage slowdown (I was using GPFS both for the worker directory and for staging out the files).

I have now changed my setup to stage out to PVFS, and the benchmark runs successfully at different scales (nodes = 64, 128, 256) with a high job throttle rate (JOB_THROTTLE = 1000).
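
For reference, here is a rough sketch of what the two throttle values mean for concurrency. It assumes that JOB_THROTTLE is passed straight through to Swift's jobThrottle site setting, and that Swift allows roughly jobThrottle * 100 + 1 concurrent jobs per site, as I understand the user guide. The function names are only illustrative, and JOBS_PER_NODE = 4 is taken from my original run.

```python
def max_concurrent_jobs(job_throttle):
    """Approximate per-site job limit, ASSUMING the jobThrottle * 100 + 1 rule."""
    return int(round(job_throttle * 100)) + 1


def worker_slots(nodes, jobs_per_node):
    """Number of jobs the allocation can actually run at once."""
    return nodes * jobs_per_node


if __name__ == "__main__":
    jobs_per_node = 4  # value from the original run; assumed for all scales
    for throttle in (0.05, 1000):
        for nodes in (64, 128, 256):
            limit = max_concurrent_jobs(throttle)
            slots = worker_slots(nodes, jobs_per_node)
            # The effective concurrency is bounded by whichever is smaller.
            print(f"throttle={throttle} nodes={nodes} "
                  f"limit={limit} slots={slots} effective={min(limit, slots)}")
```

Under these assumptions the low throttle keeps only a handful of jobs in flight, while the high throttle lets every worker slot run (and stage out) at the same time, which would be consistent with the storage being the bottleneck.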

Thank you again.

Regards
Emalayan



________________________________
 From: Justin M Wozniak <wozniak at mcs.anl.gov>
To: Emalayan Vairavanathan <svemalayan at yahoo.com> 
Cc: "swift-devel at ci.uchicago.edu" <swift-devel at ci.uchicago.edu> 
Sent: Tuesday, 10 April 2012 12:22 PM
Subject: Re: [Swift-devel] Swift did not make progress with high throttling rate
 
Hi Emalayan
    Are you saying that this case does run with the default throttles and fails with jobThrottle=1000?
    I just took a look at the log file.  It looks like the jobs do get scheduled.  Are there any -info files to look at?
    Justin
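
In case it helps while reading the log: a small sketch that tallies the open futures listed by the HangChecker, grouped by variable name. It assumes each entry has the form seen in the quoted log below (`file <name> - F/<name>[<index>]:file - Open`). The function name and the regular expression are my own and are not part of Swift.

```python
import re
from collections import Counter

# Matches entries such as: file stage_2_output - F/stage_2_output[95]:file - Open
FUTURE_RE = re.compile(r"file (\w+) - F/\w+\[(\d+)\]:file - (\w+)")


def tally_open_futures(log_text):
    """Count futures still in the Open state, keyed by variable name."""
    counts = Counter()
    for line in log_text.splitlines():
        match = FUTURE_RE.search(line)
        if match and match.group(3) == "Open":
            counts[match.group(1)] += 1
    return counts


# The entries from the quoted HangChecker message.
sample = """\
file stage_2_output - F/stage_2_output[95]:file - Open
file stage_1_output - F/stage_1_output[85]:file - Open
file stage_3_output - F/stage_3_output[62]:file - Open
file stage_3_output - F/stage_3_output[44]:file - Open
file stage_1_output - F/stage_1_output[4]:file - Open
file stage_2_output - F/stage_2_output[3]:file - Open
file input_data - F/input_data[121]:file - Open
file stage_1_output - F/stage_1_output[113]:file - Open
file stage_1_output - F/stage_1_output[98]:file - Open
"""

if __name__ == "__main__":
    for name, count in sorted(tally_open_futures(sample).items()):
        print(name, count)
```

Comparing these tallies across successive HangChecker messages would show whether any stage is still completing work or whether everything is stalled.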

On Tue, 10 Apr 2012, Emalayan Vairavanathan wrote:

> Hi All,
> 
> I tried to run my pipeline-swift benchmark on GPFS+PVFS with 128 compute nodes (Surveyor), JOB_THROTTLE = 1000 and JOBS_PER_NODE = 4.
> 
> I used GPFS as the central storage and PVFS as the intermediate storage. The benchmark did not make any progress and I found the following messages in the log file. (This happened even with MosaStore)
> 
> 
> 2012-04-10 18:18:36,710+0000 WARN  HangChecke No events in 10s.
> 2012-04-10 18:18:36,717+0000 WARN  HangChecker
> Registered futures:
> file stage_2_output - F/stage_2_output[95]:file - Open
> file stage_1_output - F/stage_1_output[85]:file - Open
> file stage_3_output - F/stage_3_output[62]:file - Open
> file stage_3_output - F/stage_3_output[44]:file - Open
> file stage_1_output - F/stage_1_output[4]:file - Open
> file stage_2_output - F/stage_2_output[3]:file - Open
> file input_data - F/input_data[121]:file - Open
> file stage_1_output - F/stage_1_output[113]:file - Open
> file stage_1_output - F/stage_1_output[98]:file - Open
> 
> I am using the Swift version that I took from Justin's home directory three weeks ago.
> 
> Do you have any idea? Does Swift have a problem with a high throttling rate or a high jobs-per-node setting? I have attached the Swift log file and the benchmark to this mail. I would highly appreciate your suggestions.
> 
> 
> Thank you
> Emalayan

-- Justin M Wozniak