[Swift-user] Re: [falkon-user] 1) disable retry mechanism and 2) continue on failure?

Ben Clifford benc at hawaga.org.uk
Thu Apr 3 05:06:26 CDT 2008


I just asked Zhao for the log files (both the Swift log and the -info logs) for
the patched run, but I think I'd like to see the unpatched run's logs too.

On Wed, 2 Apr 2008, Ioan Raicu wrote:

> Hi Ben,
> Thanks again for the patches; they made a huge difference, increasing
> efficiency from 21% to 81%!
> 
> Here are the numbers:
> 
> Per-task time (s)   1 Node Perf   Falkon     Swift+Falkon   Swift+Falkon (patched)
> Min                 63.618        53.782     169.139        58.538
> Average             64.76         65.47253   309.1945       80.21246
> Median              64.74072      64.774     313.5535       76.5245
> Max                 65.863        94.447     605.654        115.237
> Standard Deviation  0.488984      3.863944   52.13821       10.95652
> Efficiency          100%          99%        21%            81%
> 
> 
> The first column shows the per-task statistics when running on 1 node (4 CPUs)
> through Falkon. The second column shows the statistics for running the
> application through Falkon at large scale, on 2048 CPUs. The third column is
> Swift+Falkon (both from SVN) on 256 CPUs. The fourth column is Swift+Falkon
> with the 3 patches applied to Swift. Essentially, the per-task execution time
> was reduced from 309 seconds to 80 seconds, where the ideal would have been
> 64 seconds. That brought the efficiency from 21% to 81% for this particular
> workload. This looks fantastic!
> We'll have to verify that we can maintain this 81% efficiency at higher
> numbers of CPUs. In the meantime, if you can think of anything else that we
> could do to keep pushing the 81% efficiency number higher, let us know.
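The Efficiency row looks consistent with dividing the ideal mean per-task time (64.76 s, the single-node Falkon run) by each configuration's observed mean. That formula is an assumption inferred from the numbers, not something the message states. A minimal Python sketch of the calculation, using the Average row:

```python
# Assumption: efficiency = ideal mean task time / observed mean task time.
# Mean per-task times in seconds, copied from the "Average" row of the table.
IDEAL_MEAN = 64.76  # 1 node (4 CPUs), Falkon only

observed = {
    "Falkon (2048 CPUs)": 65.47253,
    "Swift+Falkon (256 CPUs)": 309.1945,
    "Swift+Falkon patched (256 CPUs)": 80.21246,
}


def efficiency(ideal_mean, observed_mean):
    """Fraction of ideal per-task throughput actually achieved."""
    return ideal_mean / observed_mean


for name, mean in observed.items():
    # ".0%" multiplies by 100 and rounds to a whole percentage.
    print(f"{name}: {efficiency(IDEAL_MEAN, mean):.0%}")
```

Under that assumption the printed values round to 99%, 21% and 81%, matching the table.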
> 
> Thanks again,
> Ioan
> 
> Ben Clifford wrote:
> > On Mon, 31 Mar 2008, Ben Clifford wrote:
> > 
> >   
> > > This temporary directory handling is pretty ugly; it should be a couple
> > > of lines' change to wrapper.sh to get similar functionality using the
> > > existing Swift temporary directory handling: change the path to /tmp and
> > > use cp instead of ln -s. That way you can take advantage of Swift's
> > > existing unique job IDs and error handling too.
> > >     
> > 
> > Attached are three patches that will apply against svn r1775:
> > 
> > The first puts temporary directories in /tmp rather than on the shared filesystem.
> > http://www.ci.uchicago.edu/~benc/tmp/wrapper-tmp-dirs-on-tmp
> > 
> > The second copies the application file to the worker in each job execution
> > (though it doesn't do any worker-node caching of application files between jobs)
> > http://www.ci.uchicago.edu/~benc/tmp/wrapper-tmp-dirs-mv-executable
> > 
> > The third creates the worker node log on /tmp and copies it at the end.
> > http://www.ci.uchicago.edu/~benc/tmp/wrapper-tmp-log-locally
> > 
> > All three modify wrapper.sh and should be applied in the above order.
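A rough Python sketch of the pattern the three patches describe: a unique per-job scratch directory on node-local /tmp, a copy (not a symlink) of the executable, and a log written locally and copied to shared storage at the end. This is not the actual wrapper.sh change; the function `run_job_locally`, its parameters, and the `<job_id>.log` naming are hypothetical.

```python
import os
import shutil
import subprocess
import tempfile


def run_job_locally(job_id, executable, args, shared_dir, tmp_root="/tmp"):
    """Hypothetical sketch: run one job in a node-local scratch directory.

    Not the real wrapper.sh; names and log layout are illustrative only.
    """
    # Like patch 1: a unique per-job working directory on local /tmp
    # instead of on the shared filesystem.
    work_dir = tempfile.mkdtemp(prefix=f"{job_id}-", dir=tmp_root)
    log_path = os.path.join(work_dir, "wrapper.log")
    try:
        # Like patch 2: copy the application into the job directory
        # (no symlink back to shared storage, no caching between jobs).
        local_exe = os.path.join(work_dir, os.path.basename(executable))
        shutil.copy2(executable, local_exe)  # copy2 keeps the exec bit

        # Like patch 3: the log is written locally during the run.
        with open(log_path, "a") as log:
            log.write(f"copied {executable} -> {local_exe}\n")
            log.flush()  # keep our line ahead of the child's output
            result = subprocess.run(
                [local_exe, *args], cwd=work_dir,
                stdout=log, stderr=subprocess.STDOUT,
            )
            log.write(f"exit status {result.returncode}\n")
        return result.returncode
    finally:
        # Copy the local log to shared storage once at the end (even on
        # failure), then remove the scratch directory.
        if os.path.exists(log_path):
            shutil.copy2(log_path, os.path.join(shared_dir, f"{job_id}.log"))
        shutil.rmtree(work_dir, ignore_errors=True)
```

In this sketch, per-job I/O stays on local disk; the shared filesystem sees only the executable copy in and the log copy out.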
> > 
> > With the first two patches, the timestamps in the usual info logs will
> > provide information about how long the copies take, in the same way that
> > they usually indicate times for other execution stages.
> > 
> >   
> 
> 


