[Swift-devel] IO overheads of swift wrapper scripts on BlueGene/P
Allan Espinosa
aespinosa at cs.uchicago.edu
Fri Oct 16 21:07:16 CDT 2009
The corresponding swift-plot-log is in
http://www.ci.uchicago.edu/~aespinosa/swift/report-blastp-test3.4.7_3cpn.32ifs.192cpu/.
I used the same runid for two runs so the statistics were merged.
Just focus on the second lump of events that happened in the log.
Here's a snippet of an info file.
Progress 2009-10-16 17:30:44.576141000-0500 LOG_START
_____________________________________________________________________________
Wrapper
_____________________________________________________________________________
Job directory mode is: link on shared filesystem
DIR=jobs/5/blastall-59j5q3ij
EXEC=/home/espinosa/workflows/jgi_blastp/blastall_wrapper
STDIN=
STDOUT=home/espinosa/workflows/jgi_blastp/test3.4.7_3cpn.32ifs.192cpu/output/D00
STDERR=home/espinosa/workflows/jgi_blastp/test3.4.7_3cpn.32ifs.192cpu/output/D00
DIRS=home/espinosa/workflows/jgi_blastp/test3.4.7_3cpn.32ifs.192cpu/output/D0000
INF=
OUTF=home/espinosa/workflows/jgi_blastp/test3.4.7_3cpn.32ifs.192cpu/output/D0000
KICKSTART=
ARGS=-p blastp -m 8 -e 1.0e-5 -FF -d /dataifs/nr -i /intrepid-fs0/users/espinosa
ARGC=13
Progress 2009-10-16 17:30:46.378238000-0500 CREATE_JOBDIR
Created job directory: jobs/5/blastall-59j5q3ij
Progress 2009-10-16 17:30:59.498509000-0500 CREATE_INPUTDIR
Created output directory: jobs/5/blastall-59j5q3ij/home/espinosa/workflows/jgi_b
Progress 2009-10-16 17:33:25.777819000-0500 LINK_INPUTS
Progress 2009-10-16 17:41:28.979051000-0500 EXECUTE
Moving back to workflow directory
/intrepid-fs0/users/espinosa/scratch/jgi-blast
Progress 2009-10-16 17:58:11.732629000-0500 EXECUTE_DONE
Job ran successfully
Progress 2009-10-16 18:00:33.756364000-0500 COPYING_OUTPUTS
Progress 2009-10-16 18:08:19.970449000-0500 RM_JOBDIR
Progress 2009-10-16 18:08:57.155065000-0500 END
From LOG_START to END, the elapsed time is about 2292.578924 seconds.
The time between EXECUTE and EXECUTE_DONE is 1002.753578 seconds, which
is fairly close to 891 seconds. That leaves about 1290 seconds spent in
the wrapper's other states. This confirms that the pre-execution states
in _swiftwrap do add some overhead, especially at 256 CPUs per PSET,
where we are most likely starting to saturate the tree network and the
forwarding daemon.
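
For anyone who wants to redo this arithmetic on other info files, here
is a minimal Python sketch. It assumes the Progress lines look exactly
like the ones in the snippet above (date, time with nanoseconds, UTC
offset, state name). The function names parse_progress, elapsed and
stage_durations are my own for illustration and are not part of Swift
or _swiftwrap.

```python
import re
from datetime import datetime

# Progress lines copied from the info file snippet above.
SAMPLE = """\
Progress 2009-10-16 17:30:44.576141000-0500 LOG_START
Progress 2009-10-16 17:30:46.378238000-0500 CREATE_JOBDIR
Progress 2009-10-16 17:30:59.498509000-0500 CREATE_INPUTDIR
Progress 2009-10-16 17:33:25.777819000-0500 LINK_INPUTS
Progress 2009-10-16 17:41:28.979051000-0500 EXECUTE
Progress 2009-10-16 17:58:11.732629000-0500 EXECUTE_DONE
Progress 2009-10-16 18:00:33.756364000-0500 COPYING_OUTPUTS
Progress 2009-10-16 18:08:19.970449000-0500 RM_JOBDIR
Progress 2009-10-16 18:08:57.155065000-0500 END
"""

# Assumed line format: Progress <date> <time>.<fraction><offset> <STATE>
PROGRESS_RE = re.compile(
    r"^Progress\s+(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2})\.(\d+)([+-]\d{4})\s+(\w+)$"
)


def parse_progress(lines):
    """Return (state, datetime) pairs for every Progress line, in file order."""
    events = []
    for line in lines:
        m = PROGRESS_RE.match(line.strip())
        if not m:
            continue  # skip wrapper output that is not a Progress record
        date, clock, fraction, offset, state = m.groups()
        # strptime's %f accepts at most 6 digits, so keep microseconds only.
        stamp = datetime.strptime(
            f"{date} {clock}.{fraction[:6]}{offset}", "%Y-%m-%d %H:%M:%S.%f%z"
        )
        events.append((state, stamp))
    return events


def elapsed(events, start, end):
    """Seconds between two named states (the last occurrence of each wins)."""
    times = dict(events)
    return (times[end] - times[start]).total_seconds()


def stage_durations(events):
    """Seconds spent in each state, measured until the next Progress record."""
    return [
        (state, (nxt - stamp).total_seconds())
        for (state, stamp), (_, nxt) in zip(events, events[1:])
    ]


if __name__ == "__main__":
    events = parse_progress(SAMPLE.splitlines())
    for state, seconds in stage_durations(events):
        print(f"{state:16s} {seconds:12.6f} s")
    print(f"LOG_START -> END:        {elapsed(events, 'LOG_START', 'END'):.6f} s")
    print(f"EXECUTE -> EXECUTE_DONE: {elapsed(events, 'EXECUTE', 'EXECUTE_DONE'):.6f} s")
```

On the lines above this gives the same 2292.578924 and 1002.753578
second figures, and the per-state list shows where the rest of the
time goes.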
-Allan
2009/10/16 Mihael Hategan <hategan at mcs.anl.gov>:
> On Fri, 2009-10-16 at 20:02 -0500, Allan Espinosa wrote:
>> The attached graph looks interesting. The green on the left shows the
>> 192 job workload using the straight falkon client. The right one is
>> through swift. Using pure falkon the total time for the workload is
>> at around 814 seconds while the swift workflow took 1520 seconds. In
>> some repeats of the same run, Swift took 3078 seconds.
>
> It would be useful to plot the swift stuff with swift-plot-log.
>
> It would also be useful to make use of the info files which record
> timing data for what the wrapper does.
>
>>
>> Just to confirm, in the deef-provider, the actual jobs that are
>> dispatched to the Falkon workers include the ones in the _swiftwrap
>> script correct?
>
> _swiftwrap is the job which forks blast (in your case).
>
>> This must be the one inducing the overheads. Off
>> the top of my head, create jobdir and link input dir can be the
>> culprits.
>
> Sounds plausible.
>
> To that add move-output-files-to-shared-dir. 3 per job as I see it.
> Again, info files should provide some details.
>
>>
>> My workflow below does not do any input staging to the work directory.
>> Only output staging is done which I believe is separate from the
>> _swiftwrap script and will therefore not register time in the falkon
>> graphs (or in EXECUTE state in swift).
>
> Right. Once _swiftwrap is done, the executor should be released, unless
> there's some other part in there that takes time (like, say,
> notification). Ioan would know this.
>
>
>
--
Allan M. Espinosa <http://allan.88-mph.net/blog>
PhD student, Computer Science
University of Chicago <http://people.cs.uchicago.edu/~aespinosa>