[Swift-user] Re: [falkon-user] 1) disable retry mechanism and 2) continue on failure?

Ben Clifford benc at hawaga.org.uk
Thu Apr 3 14:45:22 CDT 2008


It's fine for now.

There's a convention for storing log files - put the .log file and the 
whole .d directory somewhere in ~benc/swift-logs/ in CI NFS space.

Most simply, put files directly in there; for a more structured layout see 
how Mike has organised his stuff under ~benc/swift-logs/wilde/
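
A rough sketch of that convention in Python, for anyone who wants to script
it. It assumes the .d directory sits next to the .log file and shares its
base name (an assumption, not something stated above); the function name and
the example file names are hypothetical.

```python
import shutil
from pathlib import Path


def archive_run_logs(log_file, archive_root):
    """Copy a run's .log file and its sibling .d directory into archive_root.

    Assumption: the .d directory shares the log's base name,
    e.g. myrun.log next to myrun.d/.
    """
    log_file = Path(log_file)
    data_dir = log_file.with_suffix(".d")  # myrun.log -> myrun.d
    archive_root = Path(archive_root)
    archive_root.mkdir(parents=True, exist_ok=True)

    # Copy the log file itself, preserving timestamps.
    archived_log = archive_root / log_file.name
    shutil.copy2(log_file, archived_log)

    # Copy the whole .d directory if the run produced one.
    if data_dir.is_dir():
        shutil.copytree(data_dir, archive_root / data_dir.name, dirs_exist_ok=True)

    return archived_log
```

Something like archive_run_logs("myrun.log", Path.home() / "swift-logs")
would then put both pieces in one place; point the second argument at
whichever directory the convention calls for.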

On Thu, 3 Apr 2008, Zhao Zhang wrote:

> Sorry, Ben.
> 
> I didn't save the swift log file. If you really need the old -info files, I
> could redo the test and try to send them to you.
> But for now, I have several urgent issues.
> 
> zhao
> 
> Ben Clifford wrote:
> > I just asked zhao for the log files (both swift and -info) for the patched
> > run; but I think I'd like to see the unpatched run logs too.
> > 
> > On Wed, 2 Apr 2008, Ioan Raicu wrote:
> > 
> >   
> > > Hi Ben,
> > > Thanks again for the patches, they made a huge difference, increasing
> > > efficiency from 21% to 81%!
> > > 
> > > Here are the numbers:
> > > 
> > >                     1 Node Perf  Falkon    Swift+Falkon  Swift+Falkon (patched)
> > > Min                 63.618       53.782    169.139       58.538
> > > Average             64.76        65.47253  309.1945      80.21246
> > > Median              64.74072     64.774    313.5535      76.5245
> > > Max                 65.863       94.447    605.654       115.237
> > > Standard Deviation  0.488984     3.863944  52.13821      10.95652
> > > Efficiency          100%         99%       21%           81%
> > > 
> > > 
> > > The first column shows the per-task statistics when running on 1 node
> > > (4 CPUs) through Falkon.  The second column shows the statistics for
> > > running the application at large scale, on 2048 CPUs.  The third column
> > > is running Swift+Falkon (both from SVN) on 256 CPUs.  The fourth column
> > > is Swift+Falkon, but Swift has the 3 patches applied.  Essentially, the
> > > average per-task execution time was reduced from 309 seconds to 80
> > > seconds, where the ideal would have been 64 seconds.  It brought the
> > > efficiency from 21% to 81% for this particular workload.  This looks
> > > fantastic! We'll have to verify that we can maintain this 81% efficiency
> > > at higher numbers of CPUs.  In the meantime, if you can think of anything
> > > else that we could do to keep pushing the 81% efficiency number higher,
> > > let us know.
> > > 
> > > Thanks again,
> > > Ioan
> > > 
> > > Ben Clifford wrote:
> > >     
> > > > On Mon, 31 Mar 2008, Ben Clifford wrote:
> > > > 
> > > >         
> > > > > > This temporary directory handling is pretty ugly - it should be a
> > > > > > couple of lines' change to wrapper.sh to get similar functionality
> > > > > > using the existing swift temporary directory handling - change the
> > > > > > path to /tmp and use cp instead of ln -s. That way you can take
> > > > > > advantage of Swift's existing unique job IDs and error handling too.
> > > > >             
> > > > Attached are three patches that will apply against svn r1775:
> > > > 
> > > > The first puts temporary directories in /tmp rather than on shared fs.
> > > > http://www.ci.uchicago.edu/~benc/tmp/wrapper-tmp-dirs-on-tmp
> > > > 
> > > > > The second copies the application file to the worker in each job
> > > > > execution (though doesn't do any worker-node caching of such between
> > > > > jobs)
> > > > http://www.ci.uchicago.edu/~benc/tmp/wrapper-tmp-dirs-mv-executable
> > > > 
> > > > The third creates the worker node log on /tmp and copies it at the end.
> > > > http://www.ci.uchicago.edu/~benc/tmp/wrapper-tmp-log-locally
> > > > 
> > > > > All three modify wrapper.sh and should be applied in the above
> > > > > order.
> > > > 
> > > > With the first two patches, the timestamps in the usual info logs will
> > > > provide information about how long the copies take, in the same way that
> > > > they usually indicate times for other execution stages.
> > > > 
> > > >         
> > >     
> > 
> >   
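
For reference, the Efficiency row in Ioan's table is consistent with
dividing the 1-node average per-task time by each run's average per-task
time. That formula is inferred from the numbers, not stated anywhere in the
thread, so treat it as an assumption. A quick check in Python:

```python
def efficiency(ideal_avg, observed_avg):
    """Ratio of the ideal per-task time to the observed per-task time."""
    return ideal_avg / observed_avg


# Average per-task times in seconds, copied from the table in the thread.
IDEAL = 64.76  # 1 node (4 CPUs) through Falkon
RUNS = {
    "Falkon": 65.47253,
    "Swift+Falkon": 309.1945,
    "Swift+Falkon (patched)": 80.21246,
}

for name, observed in RUNS.items():
    print(f"{name}: {efficiency(IDEAL, observed):.0%}")
```

This prints 99%, 21% and 81%, matching the reported row.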



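
The three patches described in the thread follow one pattern: work in a
per-job directory on local disk, copy the application there instead of
linking to it, and write the job log locally before copying it back once at
the end. The sketch below illustrates that pattern in Python; it is not the
wrapper.sh code from the patches, and every name in it (the function, its
parameters, the optional runner prefix) is hypothetical.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Sequence


def run_job_on_local_scratch(app_file, args: Sequence[str], shared_dir,
                             job_id: str, runner: Sequence[str] = ()) -> int:
    """Run one job from node-local scratch and copy its log back afterwards.

    runner is an optional interpreter prefix, e.g. ["sh"] for a shell script;
    leave it empty to execute the copied file directly.
    """
    app_file = Path(app_file)
    shared_dir = Path(shared_dir)
    shared_dir.mkdir(parents=True, exist_ok=True)

    # Step 1: per-job scratch directory in the system temp dir (typically
    # /tmp on Linux) rather than on the shared file system.
    with tempfile.TemporaryDirectory(prefix=f"{job_id}-") as scratch:
        scratch_dir = Path(scratch)

        # Step 2: copy the application file instead of symlinking it, so the
        # job reads it from local disk.
        local_app = scratch_dir / app_file.name
        shutil.copy2(app_file, local_app)

        # Step 3: write the job log locally while the job runs...
        local_log = scratch_dir / f"{job_id}.log"
        with local_log.open("w") as log:
            result = subprocess.run(
                [*runner, str(local_app), *args],
                cwd=scratch_dir,
                stdout=log,
                stderr=subprocess.STDOUT,
            )

        # ...and copy it to the shared directory once, at the end.
        shutil.copy2(local_log, shared_dir / local_log.name)

    return result.returncode
```

The scratch directory is removed when the job finishes, so only the log
ends up on the shared file system. Like the second patch, this does no
caching of the application between jobs.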