[Swift-devel] IO overheads of swift wrapper scripts on BlueGene/P
Ioan Raicu
iraicu at cs.uchicago.edu
Mon Oct 19 10:25:15 CDT 2009
OK, I see now. In theory the move should be lightweight, right? Since it's
just metadata that changes (i.e., moving within the same filesystem, not
copying), right? Or is the job directory really in the compute node's RAM,
and the move is actually doing a copy from CN RAM to GPFS?
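The distinction being asked about can be sketched as follows (a minimal illustration, not Swift's actual wrapper code; the `move` helper and the paths are hypothetical): a move within one filesystem is a metadata-only rename(2), while a move across filesystems degenerates into a full data copy plus unlink.

```python
# Sketch of the two cases: a same-filesystem move is a cheap rename,
# a cross-filesystem move (e.g. CN RAM/tmpfs -> GPFS) copies every byte.
import os
import shutil
import tempfile

def move(src, dst):
    """Try a metadata-only rename; fall back to copy+delete across filesystems."""
    try:
        os.rename(src, dst)   # metadata update only, cost independent of file size
        return "rename"
    except OSError:           # EXDEV: src and dst are on different filesystems
        shutil.move(src, dst) # full data copy, cost proportional to bytes, then unlink
        return "copy"

if __name__ == "__main__":
    d = tempfile.mkdtemp()
    src = os.path.join(d, "out.dat")
    with open(src, "w") as f:
        f.write("result")
    # Same directory, hence same filesystem: the cheap path is taken.
    print(move(src, os.path.join(d, "moved.dat")))
```

If the job directory lives in a tmpfs on the compute node, the "move" to GPFS necessarily takes the copy path, which would explain the cost.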
Ioan
Allan Espinosa wrote:
> Swift does extra moves from the job directory to the work directory,
> which take a long time in this case.
>
> -Allan
>
> 2009/10/18 Ioan Raicu <iraicu at cs.uchicago.edu>:
>
>> Hi Allan,
>> I don't remember, but your Falkon-only run seemed to run OK, right? Didn't
>> that also produce the output files Swift is producing? Or is Swift doing an
>> extra step, copying/moving files from one place to another after the
>> computation terminates, and that is what takes so long? I'm just trying
>> to understand the difference between the Falkon-only run and the Swift run.
>>
>> Ioan
>>
>>
>>
>>
>> Allan Espinosa wrote:
>>
>> Here I tried one directory per job (Q0000130). Three output files are
>> expected per directory, all produced by a single job:
>>
>> Progress 2009-10-17 20:53:56.943503000-0500 LOG_START
>>
>> _____________________________________________________________________________
>>
>> Wrapper
>> _____________________________________________________________________________
>>
>> Job directory mode is: link on shared filesystem
>> DIR=jobs/7/blastall-715ul5ij
>> EXEC=/home/espinosa/workflows/jgi_blastp/blastall_wrapper
>> STDIN=
>> STDOUT=home/espinosa/workflows/jgi_blastp/oldtests/test3.2.7_3cpn.64ifs.192cpu/output/D0000000/Q0000130/out_Q0000130.sout
>> STDERR=home/espinosa/workflows/jgi_blastp/oldtests/test3.2.7_3cpn.64ifs.192cpu/output/D0000000/Q0000130/out_Q0000130.serr
>> DIRS=home/espinosa/workflows/jgi_blastp/oldtests/test3.2.7_3cpn.64ifs.192cpu/output/D0000000/Q0000130
>> INF=
>> OUTF=home/espinosa/workflows/jgi_blastp/oldtests/test3.2.7_3cpn.64ifs.192cpu/output/D0000000/Q0000130/out_Q0000130.out^home/espinosa/workflows/jgi_blastp/oldtests/test3.2.7_3cpn.64ifs.192cpu/output/D0000000/Q0000130/out_Q0000130.serr^home/espinosa/workflows/jgi_blastp/oldtests/test3.2.7_3cpn.64ifs.192cpu/output/D0000000/Q0000130/out_Q0000130.sout
>> KICKSTART=
>> ARGS=-p blastp -m 8 -e 1.0e-5 -FF -d /dataifs/nr -i
>> /intrepid-fs0/users/espinosa/persistent/datasets/nr_bob/queries/mock_2seq/D0000000/SEQ0000130.fasta
>> -o
>> home/espinosa/workflows/jgi_blastp/oldtests/test3.2.7_3cpn.64ifs.192cpu/output/D0000000/Q0000130/out_Q0000130.out
>> ARGC=13
>> Progress 2009-10-17 20:53:58.656335000-0500 CREATE_JOBDIR
>> Created job directory: jobs/7/blastall-715ul5ij
>> Progress 2009-10-17 20:54:05.204962000-0500 CREATE_INPUTDIR
>> Created output directory:
>> jobs/7/blastall-715ul5ij/home/espinosa/workflows/jgi_blastp/oldtests/test3.2.7_3cpn.64ifs.192cpu/output/D0000000/Q0000130
>> Progress 2009-10-17 20:54:15.498666000-0500 LINK_INPUTS
>> Progress 2009-10-17 20:54:19.900786000-0500 EXECUTE
>> Moving back to workflow directory
>> /fuse/intrepid-fs0/users/espinosa/scratch/jgi-blastp_runs/blastp-test3.2.7_3cpn.64ifs.192cpu
>> Progress 2009-10-17 21:20:23.390800000-0500 EXECUTE_DONE
>> Job ran successfully
>> Progress 2009-10-17 21:31:11.179664000-0500 COPYING_OUTPUTS
>> Progress 2009-10-17 21:37:14.539569000-0500 RM_JOBDIR
>> Progress 2009-10-17 21:38:24.220130000-0500 END
>>
>>
>> COPYING_OUTPUTS still takes time.
>>
>> 2009/10/17 Michael Wilde <wilde at mcs.anl.gov>:
>>
>>
>> Remember that any situation in which multiple IONs modify the same file or
>> directory (i.e., by creating files or directories in the same parent
>> directory) will cause severe contention and performance degradation on any
>> GPFS filesystem.
>>
>> In addition to creating many directories, you need to ensure that no single
>> file or directory is ever likely to be written to from multiple client
>> nodes (e.g., IONs on the BG/P) concurrently.
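The rule of thumb above suggests a layout like the following (a hypothetical sketch, not Swift's own mechanism; the function name and fanout are made up): hash each job id into one of many bucket directories, so that directory-entry creation is spread over many parents instead of contending on a single shared one.

```python
# Spread job output directories across `fanout` buckets so no single
# parent directory receives creates from many client nodes at once.
import hashlib
import os

def job_output_dir(root, job_id, fanout=256):
    """Map a job id deterministically to one of `fanout` buckets under `root`."""
    h = hashlib.md5(job_id.encode()).hexdigest()
    bucket = int(h, 16) % fanout
    return os.path.join(root, "%04x" % bucket, job_id)

# Each job then writes only inside its own directory, e.g.
#   out/00a3/Q0000130/out_Q0000130.out
# so concurrent directory inserts land on ~fanout different parents.
```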
>>
>>
>> This workload runs over just one PSET, so there are no other IONs
>> contending for the directories.
>>
>>
>>
>> Have you done that in this workload, Allan?
>>
>> - Mike
>>
>>
>> On 10/17/09 2:59 AM, Allan Espinosa wrote:
>>
>>
>> I was using 1000 files (or was it 3000?) per directory. It looks like
>> I need to lower my ratio...
>>
>> -Allan
>>
>> 2009/10/16 Mihael Hategan <hategan at mcs.anl.gov>:
>>
>>
>> On Fri, 2009-10-16 at 21:07 -0500, Allan Espinosa wrote:
>>
>>
>> Progress 2009-10-16 18:00:33.756364000-0500 COPYING_OUTPUTS
>> Progress 2009-10-16 18:08:19.970449000-0500 RM_JOBDIR
>>
>>
>> Grr. 8 minutes spent COPYING_OUTPUTS.
>>
>> What would be useful is to aggregate all the accesses that happened on
>> that FS across all the relevant jobs, to see exactly what causes
>> contention. I strongly suspect it's
>> home/espinosa/workflows/jgi_blastp/test3.4.7_3cpn.32ifs.192cpu/output/
>>
>> Pretty much all the outputs seem to go to that directory.
>>
>> I'm afraid, however, that the information in the logs is insufficient.
>> strace with the relevant options (tracing filesystem calls only) may be
>> useful if you want to try it.
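The strace suggestion could look something like this (the command line is illustrative; the wrapper path is taken from the log above, and the output path is hypothetical):

```shell
# Illustrative only: trace file-related syscalls (open/stat/rename/unlink/...)
# with microsecond timestamps (-tt) and per-call durations (-T), writing one
# trace file per process (-f -ff -o gives wrapper.strace.<pid>).
strace -f -ff -tt -T -e trace=file \
       -o /tmp/wrapper.strace \
       /home/espinosa/workflows/jgi_blastp/blastall_wrapper
```

Summing the `-T` durations for calls touching the shared output directory would show where the COPYING_OUTPUTS time goes.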
>>
>> Alternatively, you could try to spread your output over multiple
>> directories and see what the difference is.
>>
>> Also, it may be interesting to see how the delay depends on the
>> number of contending processes, so that we know how many processes
>> we can allow to compete for a shared resource without causing too
>> much trouble.
>>
>> Mihael
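The measurement proposed above could be prototyped roughly as follows (a sketch with hypothetical names; on a local filesystem the shared-vs-private gap is small, and the interesting effect would only appear on GPFS, where concurrent creates in one directory contend as described earlier in the thread):

```python
# Time N concurrent processes creating files in one shared directory
# versus one private directory each, to see how delay grows with N.
import multiprocessing
import os
import tempfile
import time

def worker(directory, worker_id, n_files):
    # Each worker creates n_files small files; names embed the worker id
    # so there are no collisions even in the shared-directory case.
    for i in range(n_files):
        path = os.path.join(directory, "w%d_f%d" % (worker_id, i))
        with open(path, "w") as f:
            f.write("x")

def run(n_workers, n_files, private_dirs):
    """Return elapsed wall-clock seconds for the whole batch of creates."""
    root = tempfile.mkdtemp()
    procs = []
    start = time.time()
    for w in range(n_workers):
        d = os.path.join(root, str(w)) if private_dirs else root
        os.makedirs(d, exist_ok=True)
        p = multiprocessing.Process(target=worker, args=(d, w, n_files))
        p.start()
        procs.append(p)
    for p in procs:
        p.join()
    return time.time() - start

if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, run(n, 100, private_dirs=False), run(n, 100, private_dirs=True))
```

Run on the target filesystem, the shared-directory column should grow much faster with N than the private-directory column if directory contention is the bottleneck.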
--
=================================================================
Ioan Raicu, Ph.D.
NSF/CRA Computing Innovation Fellow
=================================================================
Center for Ultra-scale Computing and Information Security (CUCIS)
Department of Electrical Engineering and Computer Science
Northwestern University
2145 Sheridan Rd, Tech M384
Evanston, IL 60208-3118
=================================================================
Cel: 1-847-722-0876
Tel: 1-847-491-8163
Email: iraicu at eecs.northwestern.edu
Web: http://www.eecs.northwestern.edu/~iraicu/
https://wiki.cucis.eecs.northwestern.edu/
=================================================================
=================================================================