[Swift-user] Re: [falkon-user] 1) disable retry mechanism and 2) continue on failure?
Ben Clifford
benc at hawaga.org.uk
Wed Apr 2 15:36:18 CDT 2008
any chance you can test the patches separately to see how they each
contribute to this change?
On Wed, 2 Apr 2008, Ioan Raicu wrote:
> Hi Ben,
> Thanks again for the patches; they made a huge difference, increasing
> efficiency from 21% to 81%!
>
> Here are the numbers:
>
> Per-task time (s)   1 Node Perf  Falkon    Swift+Falkon  Swift+Falkon (patched)
> Min                 63.618       53.782    169.139       58.538
> Average             64.76        65.47253  309.1945      80.21246
> Median              64.74072     64.774    313.5535      76.5245
> Max                 65.863       94.447    605.654       115.237
> Standard Deviation  0.488984     3.863944  52.13821      10.95652
> Efficiency          100%         99%       21%           81%
>
>
> The first column shows the per-task statistics when running on 1 node (4 CPUs)
> through Falkon. The second column shows the statistics for running the
> application at large scale, on 2048 CPUs. The third column is Swift+Falkon
> (both from SVN) on 256 CPUs. The fourth column is Swift+Falkon with the
> 3 patches applied to Swift. Essentially, the per-task execution time was
> reduced from 309 seconds to 80 seconds, where the ideal would have been 64
> seconds. That brought the efficiency from 21% to 81% for this particular
> workload. This looks fantastic!
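The efficiency row is consistent with dividing the 1-node average (64.76 s, the ideal) by each column's average per-task time. The message does not state that definition, so the Python sketch below assumes it; the names IDEAL_AVG_S, OBSERVED_AVG_S and efficiency are illustrative only.

```python
# Per-task average execution times in seconds, copied from the table above.
IDEAL_AVG_S = 64.76  # "1 Node Perf" column, treated here as the ideal
OBSERVED_AVG_S = {
    "Falkon": 65.47253,
    "Swift+Falkon": 309.1945,
    "Swift+Falkon (patched)": 80.21246,
}


def efficiency(ideal_s: float, observed_s: float) -> float:
    """Assumed definition: ideal per-task time / observed per-task time."""
    return ideal_s / observed_s


for name, observed in OBSERVED_AVG_S.items():
    overhead_s = observed - IDEAL_AVG_S  # extra seconds per task vs. the ideal
    print(f"{name}: {efficiency(IDEAL_AVG_S, observed):.0%} efficiency, "
          f"{overhead_s:.1f} s overhead per task")
```

Under this reading the sketch reproduces the 99%, 21% and 81% figures, and it shows that the patched run still spends roughly 15 seconds per task on overhead.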
> We'll have to verify that we can maintain this 81% efficiency at higher numbers
> of CPUs. In the meantime, if you can think of anything else we could do
> to push the 81% efficiency higher, let us know.
>
> Thanks again,
> Ioan
>
> Ben Clifford wrote:
> > On Mon, 31 Mar 2008, Ben Clifford wrote:
> >
> >
> > > This temporary directory handling is pretty ugly. It should take only a
> > > couple of lines of change to wrapper.sh to get similar functionality using
> > > the temporary directory handling Swift already has: change the path to /tmp
> > > and use cp instead of ln -s. That way you can take advantage of Swift's
> > > existing unique job IDs and error handling too.
> > >
> >
> > Attached are three patches that will apply against svn r1775:
> >
> > The first puts temporary directories in /tmp rather than on the shared filesystem.
> > http://www.ci.uchicago.edu/~benc/tmp/wrapper-tmp-dirs-on-tmp
> >
> > The second copies the application file to the worker node on every job
> > execution (though it does not cache the file on the worker node between jobs).
> > http://www.ci.uchicago.edu/~benc/tmp/wrapper-tmp-dirs-mv-executable
> >
> > The third creates the worker node log on /tmp and copies it at the end.
> > http://www.ci.uchicago.edu/~benc/tmp/wrapper-tmp-log-locally
> >
> > All three modify wrapper.sh and should be applied in the order above.
> >
> > With the first two patches, the timestamps in the usual info logs will
> > provide information about how long the copies take, in the same way that
> > they usually indicate times for other execution stages.
> >
> >
>
>
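For readers who have not seen wrapper.sh, the pattern the three patches describe can be sketched in Python. This is a hypothetical analogue, not Swift's wrapper.sh and not the patches themselves. The function run_job, its parameters, the log file name and the stage labels (COPYING_APP, EXECUTE) are all made up for illustration. The sketch assumes node-local storage is the system temporary directory (usually /tmp) and that the application is an executable file on the shared filesystem.

```python
import datetime
import os
import shutil
import subprocess
import tempfile


def _log(log, message):
    # Timestamp each stage so the log shows how long each copy takes.
    stamp = datetime.datetime.now().isoformat(timespec="milliseconds")
    log.write(f"{stamp} {message}\n")
    log.flush()  # flush before the child process writes to the same file


def run_job(job_id, app_path, args, shared_dir, local_root=None):
    """Hypothetical analogue of the three wrapper.sh patches (not Swift code).

    1. The working directory is created on node-local storage (local_root,
       default: the system temp dir) instead of the shared filesystem.
    2. The application file is copied (not symlinked) into that directory.
    3. The log is written locally and copied to shared_dir after the job.
    Returns the application's exit status.
    """
    # mkdtemp adds a random suffix, so directory names are unique per job.
    workdir = tempfile.mkdtemp(prefix=f"{job_id}-", dir=local_root)
    log_path = os.path.join(workdir, f"{job_id}.log")
    try:
        # Append mode: our stage lines and the child's output both go to EOF.
        with open(log_path, "a") as log:
            _log(log, "COPYING_APP start")
            local_app = shutil.copy2(app_path, workdir)  # keeps the execute bit
            _log(log, "COPYING_APP end")
            _log(log, "EXECUTE start")
            result = subprocess.run([local_app, *args], cwd=workdir,
                                    stdout=log, stderr=subprocess.STDOUT)
            _log(log, f"EXECUTE end status={result.returncode}")
        # One copy of the finished log to the shared filesystem.
        shutil.copy2(log_path, os.path.join(shared_dir, f"{job_id}.log"))
        return result.returncode
    finally:
        shutil.rmtree(workdir, ignore_errors=True)  # clean up node-local dir
```

In this sketch the shared filesystem sees one copy of the application and one copy of the log per job, rather than a write for every log line. The timestamped stage lines play the role the info-log timestamps play in the message, showing how long the copies take.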