[Swift-user] Re: [falkon-user] 1) disable retry mechanism and 2) continue on failure?
Ben Clifford
benc at hawaga.org.uk
Thu Apr 3 14:45:22 CDT 2008
It's fine for now.
There's a convention for storing log files - put the .log file and the
whole .d directory somewhere in ~benc/swift-logs/ in CI NFS space.
Most simply, put files directly in there; for a more structured layout see
how Mike has organised his stuff under ~benc/swift-logs/wilde/
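
The convention above could be scripted roughly as follows. This is only a sketch: the helper name archive_swift_run and the run name in the usage note are made up for illustration, and the destination defaults to the ~benc/swift-logs/ directory mentioned above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Swift): copy a run's .log file and its
# whole .d directory into the shared log area.
#   $1  base name of the run, so that "$1.log" and "$1.d" exist (assumption)
#   $2  destination directory; defaults to ~benc/swift-logs
archive_swift_run() {
    run="$1"
    dest="$2"
    # Tilde expansion happens here because the assignment is unquoted.
    [ -n "$dest" ] || dest=~benc/swift-logs

    # Refuse to copy half a run: both pieces must be present.
    [ -f "$run.log" ] || { echo "missing $run.log" >&2; return 1; }
    [ -d "$run.d" ]   || { echo "missing $run.d" >&2; return 1; }

    mkdir -p "$dest" || return 1
    cp "$run.log" "$dest/" && cp -r "$run.d" "$dest/"
}
```

For example, `archive_swift_run myrun-20080403` would copy myrun-20080403.log and myrun-20080403.d into the default location; pass a second argument such as a subdirectory of ~benc/swift-logs/ for a more structured layout.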
On Thu, 3 Apr 2008, Zhao Zhang wrote:
> Sorry, Ben.
>
> I didn't save the swift log file. If you really need the old -info file, I
> could redo the test, and try to send them to you.
> But for now, I have several urgent issues.
>
> zhao
>
> Ben Clifford wrote:
> > I just asked zhao for the log files (both swift and -info) for the patched
> > run; but I think I'd like to see the unpatched run logs too.
> >
> > On Wed, 2 Apr 2008, Ioan Raicu wrote:
> >
> >
> > > Hi Ben,
> > > Thanks again for the patches, they made a huge difference, increased
> > > efficiency from 21% to 81%!
> > >
> > > Here are the numbers:
> > >
> > > 1 Node Perf Falkon Swift+Falkon Swift+Falkon (patched)
> > > Min 63.618 53.782 169.139 58.538
> > > Average 64.76 65.47253 309.1945 80.21246
> > > Median 64.74072 64.774 313.5535 76.5245
> > > Max 65.863 94.447 605.654 115.237
> > > Standard Deviation 0.488984 3.863944 52.13821 10.95652
> > > Efficiency 100% 99% 21% 81%
> > >
> > >
> > > The first column shows the per-task statistics when running on 1 node (4
> > > CPUs) through Falkon. The second column shows the statistics for running
> > > the application at large scale, on 2048 CPUs. The 3rd column is running
> > > Swift+Falkon (both from SVN) on 256 CPUs. The 4th column is Swift+Falkon,
> > > but Swift has the 3 patches applied. Essentially, the per-task execution
> > > time was reduced from 309 seconds to 80 seconds, where the ideal would
> > > have been 64 seconds. It brought the efficiency from 21% to 81% for this
> > > particular workload. This looks fantastic! We'll have to verify that we
> > > can maintain this 81% efficiency at higher numbers of CPUs. In the
> > > meantime, if you can think of anything else that we could do to keep
> > > pushing the 81% efficiency number higher, let us know.
> > >
> > > Thanks again,
> > > Ioan
> > >
> > > Ben Clifford wrote:
> > >
> > > > On Mon, 31 Mar 2008, Ben Clifford wrote:
> > > >
> > > >
> > > > > > This temporary directory handling is pretty ugly - it should be a
> > > > > > couple of lines' change to wrapper.sh to get similar functionality
> > > > > > using the existing swift temporary directory handling - change the
> > > > > > path to /tmp and use cp instead of ln -s. That way you can take
> > > > > > advantage of Swift's existing unique job IDs and error handling too.
> > > > >
> > > > Attached are three patches that will apply against svn r1775:
> > > >
> > > > The first puts temporary directories in /tmp rather than on shared fs.
> > > > http://www.ci.uchicago.edu/~benc/tmp/wrapper-tmp-dirs-on-tmp
> > > >
> > > > > The second copies the application file to the worker in each job
> > > > > execution (though doesn't do any worker-node caching of such between
> > > > > jobs)
> > > > http://www.ci.uchicago.edu/~benc/tmp/wrapper-tmp-dirs-mv-executable
> > > >
> > > > The third creates the worker node log on /tmp and copies it at the end.
> > > > http://www.ci.uchicago.edu/~benc/tmp/wrapper-tmp-log-locally
> > > >
> > > > > All three modify wrapper.sh and should be applied in the above order.
> > > >
> > > > With the first two patches, the timestamps in the usual info logs will
> > > > provide information about how long the copies take, in the same way that
> > > > they usually indicate times for other execution stages.
> > > >
> > > >
> > >
> >
> >
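
For reference, the efficiency row in the table quoted above is consistent with dividing the 1-node average task time (64.76 s) by each configuration's average task time. The thread does not state the formula, so treat this as an assumption; the function name efficiency is made up for this sketch.

```shell
#!/bin/sh
# efficiency IDEAL ACTUAL
# Print IDEAL/ACTUAL as a whole-number percentage.
# Assumed definition: the thread gives the percentages but not the formula.
efficiency() {
    awk -v ideal="$1" -v actual="$2" \
        'BEGIN { printf "%.0f%%\n", 100 * ideal / actual }'
}

# Average per-task times in seconds, taken from the quoted table.
efficiency 64.76 65.47253    # Falkon at 2048 CPUs        -> 99%
efficiency 64.76 309.1945    # Swift+Falkon, unpatched    -> 21%
efficiency 64.76 80.21246    # Swift+Falkon, patched      -> 81%
```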