[Swift-devel] IO overheads of swift wrapper scripts on BlueGene/P
Ioan Raicu
iraicu at cs.uchicago.edu
Mon Oct 19 10:25:15 CDT 2009
OK, I see now. In theory the move should be lightweight, right? Since it's
just metadata that changes (i.e., moving within the same filesystem, not
copying), right? Or is the job directory really in the compute node's RAM,
and the move is actually doing a copy from CN RAM to GPFS?
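The distinction being asked about can be sketched as follows (a minimal illustration, not Swift's actual wrapper code; the `move` helper and the paths are hypothetical): a move within one filesystem is a metadata-only rename(2), while a move across filesystems degenerates into a full data copy plus unlink.

```python
# Sketch of the two cases: a same-filesystem move is a cheap rename,
# a cross-filesystem move (e.g. CN RAM/tmpfs -> GPFS) copies every byte.
import os
import shutil
import tempfile

def move(src, dst):
    """Try a metadata-only rename; fall back to copy+delete across filesystems."""
    try:
        os.rename(src, dst)   # metadata update only, cost independent of file size
        return "rename"
    except OSError:           # EXDEV: src and dst are on different filesystems
        shutil.move(src, dst) # full data copy, cost proportional to bytes, then unlink
        return "copy"

if __name__ == "__main__":
    d = tempfile.mkdtemp()
    src = os.path.join(d, "out.dat")
    with open(src, "w") as f:
        f.write("result")
    # Same directory, hence same filesystem: the cheap path is taken.
    print(move(src, os.path.join(d, "moved.dat")))
```

If the job directory lives in a tmpfs on the compute node, the "move" to GPFS necessarily takes the copy path, which would explain the cost.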
Ioan
Allan Espinosa wrote:
> Swift does extra moves from the job directory to the work directory,
> which take a long time in this case.
>
> -Allan
>
> 2009/10/18 Ioan Raicu <iraicu at cs.uchicago.edu>:
>
>> Hi Allan,
>> I don't remember, but your Falkon-only run seemed to run OK, right? Didn't
>> that also produce the output files Swift is producing? Or is Swift doing an
>> extra step, copying/moving files from one place to another after the
>> computation terminates, and that is what takes so long? I'm just trying
>> to understand the difference between the Falkon-only run and the Swift run.
>>
>> Ioan
>>
>>
>>
>>
>> Allan Espinosa wrote:
>>
>> Here I tried one directory per job (Q0000130). Three output files are
>> expected per directory, all produced by a single job:
>>
>> Progress 2009-10-17 20:53:56.943503000-0500 LOG_START
>>
>> _____________________________________________________________________________
>>
>> Wrapper
>> _____________________________________________________________________________
>>
>> Job directory mode is: link on shared filesystem
>> DIR=jobs/7/blastall-715ul5ij
>> EXEC=/home/espinosa/workflows/jgi_blastp/blastall_wrapper
>> STDIN=
>> STDOUT=home/espinosa/workflows/jgi_blastp/oldtests/test3.2.7_3cpn.64ifs.192cpu/output/D0000000/Q0000130/out_Q0000130.sout
>> STDERR=home/espinosa/workflows/jgi_blastp/oldtests/test3.2.7_3cpn.64ifs.192cpu/output/D0000000/Q0000130/out_Q0000130.serr
>> DIRS=home/espinosa/workflows/jgi_blastp/oldtests/test3.2.7_3cpn.64ifs.192cpu/output/D0000000/Q0000130
>> INF=
>> OUTF=home/espinosa/workflows/jgi_blastp/oldtests/test3.2.7_3cpn.64ifs.192cpu/output/D0000000/Q0000130/out_Q0000130.out^home/espinosa/workflows/jgi_blastp/oldtests/test3.2.7_3cpn.64ifs.192cpu/output/D0000000/Q0000130/out_Q0000130.serr^home/espinosa/workflows/jgi_blastp/oldtests/test3.2.7_3cpn.64ifs.192cpu/output/D0000000/Q0000130/out_Q0000130.sout
>> KICKSTART=
>> ARGS=-p blastp -m 8 -e 1.0e-5 -FF -d /dataifs/nr -i
>> /intrepid-fs0/users/espinosa/persistent/datasets/nr_bob/queries/mock_2seq/D0000000/SEQ0000130.fasta
>> -o
>> home/espinosa/workflows/jgi_blastp/oldtests/test3.2.7_3cpn.64ifs.192cpu/output/D0000000/Q0000130/out_Q0000130.out
>> ARGC=13
>> Progress 2009-10-17 20:53:58.656335000-0500 CREATE_JOBDIR
>> Created job directory: jobs/7/blastall-715ul5ij
>> Progress 2009-10-17 20:54:05.204962000-0500 CREATE_INPUTDIR
>> Created output directory:
>> jobs/7/blastall-715ul5ij/home/espinosa/workflows/jgi_blastp/oldtests/test3.2.7_3cpn.64ifs.192cpu/output/D0000000/Q0000130
>> Progress 2009-10-17 20:54:15.498666000-0500 LINK_INPUTS
>> Progress 2009-10-17 20:54:19.900786000-0500 EXECUTE
>> Moving back to workflow directory
>> /fuse/intrepid-fs0/users/espinosa/scratch/jgi-blastp_runs/blastp-test3.2.7_3cpn.64ifs.192cpu
>> Progress 2009-10-17 21:20:23.390800000-0500 EXECUTE_DONE
>> Job ran successfully
>> Progress 2009-10-17 21:31:11.179664000-0500 COPYING_OUTPUTS
>> Progress 2009-10-17 21:37:14.539569000-0500 RM_JOBDIR
>> Progress 2009-10-17 21:38:24.220130000-0500 END
>>
>>
>> COPYING_OUTPUTS still takes time.
>>
>> 2009/10/17 Michael Wilde <wilde at mcs.anl.gov>:
>>
>>
>> Remember that any situation in which multiple IONs modify the same file or
>> directory (i.e., by creating files or directories in the same parent
>> directory) will cause severe contention and performance degradation on any
>> GPFS filesystem.
>>
>> In addition to creating many directories, you need to ensure that no single
>> file or directory is ever likely to be written to from multiple client
>> nodes (e.g., IONs on the BG/P) concurrently.
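The rule of thumb above suggests a layout like the following (a hypothetical sketch, not Swift's own mechanism; the function name and fanout are made up): hash each job id into one of many bucket directories, so that directory-entry creation is spread over many parents instead of contending on a single shared one.

```python
# Spread job output directories across `fanout` buckets so no single
# parent directory receives creates from many client nodes at once.
import hashlib
import os

def job_output_dir(root, job_id, fanout=256):
    """Map a job id deterministically to one of `fanout` buckets under `root`."""
    h = hashlib.md5(job_id.encode()).hexdigest()
    bucket = int(h, 16) % fanout
    return os.path.join(root, "%04x" % bucket, job_id)

# Each job then writes only inside its own directory, e.g.
#   out/00a3/Q0000130/out_Q0000130.out
# so concurrent directory inserts land on ~fanout different parents.
```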
>>
>>
>> This workload runs over just one PSET, so there are no other IONs
>> contending for the directories.
>>
>>
>>
>> Have you done that in this workload, Allan?
>>
>> - Mike
>>
>>
>> On 10/17/09 2:59 AM, Allan Espinosa wrote:
>>
>>
>> I was using 1000 files (or was it 3000?) per directory. It looks like
>> I need to lower my ratio...
>>
>> -Allan
>>
>> 2009/10/16 Mihael Hategan <hategan at mcs.anl.gov>:
>>
>>
>> On Fri, 2009-10-16 at 21:07 -0500, Allan Espinosa wrote:
>>
>>
>> Progress 2009-10-16 18:00:33.756364000-0500 COPYING_OUTPUTS
>> Progress 2009-10-16 18:08:19.970449000-0500 RM_JOBDIR
>>
>>
>> Grr. 8 minutes spent COPYING_OUTPUTS.
>>
>> What would be useful is to aggregate all the accesses that happened on
>> that FS across all the relevant jobs, to see exactly what causes
>> contention. I strongly suspect it's
>> home/espinosa/workflows/jgi_blastp/test3.4.7_3cpn.32ifs.192cpu/output/
>>
>> Pretty much all the outputs seem to go to that directory.
>>
>> I'm afraid, however, that the information in the logs is insufficient.
>> strace with the relevant options (tracing filesystem calls only) may be
>> useful if you want to try it.
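The strace suggestion could look something like this (the command line is illustrative; the wrapper path is taken from the log above, and the output path is hypothetical):

```shell
# Illustrative only: trace file-related syscalls (open/stat/rename/unlink/...)
# with microsecond timestamps (-tt) and per-call durations (-T), writing one
# trace file per process (-f -ff -o gives wrapper.strace.<pid>).
strace -f -ff -tt -T -e trace=file \
       -o /tmp/wrapper.strace \
       /home/espinosa/workflows/jgi_blastp/blastall_wrapper
```

Summing the `-T` durations for calls touching the shared output directory would show where the COPYING_OUTPUTS time goes.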
>>
>> Alternatively, you could try to spread your output over multiple
>> directories and see what the difference is.
>>
>> Also, it may be interesting to see how the delay depends on the
>> number of contending processes, so that we know how many processes
>> we can allow to compete for a shared resource without causing too
>> much trouble.
>>
>> Mihael
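The measurement proposed above could be prototyped roughly as follows (a sketch with hypothetical names; on a local filesystem the shared-vs-private gap is small, and the interesting effect would only appear on GPFS, where concurrent creates in one directory contend as described earlier in the thread):

```python
# Time N concurrent processes creating files in one shared directory
# versus one private directory each, to see how delay grows with N.
import multiprocessing
import os
import tempfile
import time

def worker(directory, worker_id, n_files):
    # Each worker creates n_files small files; names embed the worker id
    # so there are no collisions even in the shared-directory case.
    for i in range(n_files):
        path = os.path.join(directory, "w%d_f%d" % (worker_id, i))
        with open(path, "w") as f:
            f.write("x")

def run(n_workers, n_files, private_dirs):
    """Return elapsed wall-clock seconds for the whole batch of creates."""
    root = tempfile.mkdtemp()
    procs = []
    start = time.time()
    for w in range(n_workers):
        d = os.path.join(root, str(w)) if private_dirs else root
        os.makedirs(d, exist_ok=True)
        p = multiprocessing.Process(target=worker, args=(d, w, n_files))
        p.start()
        procs.append(p)
    for p in procs:
        p.join()
    return time.time() - start

if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, run(n, 100, private_dirs=False), run(n, 100, private_dirs=True))
```

Run on the target filesystem, the shared-directory column should grow much faster with N than the private-directory column if directory contention is the bottleneck.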
--
=================================================================
Ioan Raicu, Ph.D.
NSF/CRA Computing Innovation Fellow
=================================================================
Center for Ultra-scale Computing and Information Security (CUCIS)
Department of Electrical Engineering and Computer Science
Northwestern University
2145 Sheridan Rd, Tech M384
Evanston, IL 60208-3118
=================================================================
Cel: 1-847-722-0876
Tel: 1-847-491-8163
Email: iraicu at eecs.northwestern.edu
Web: http://www.eecs.northwestern.edu/~iraicu/
https://wiki.cucis.eecs.northwestern.edu/
=================================================================
=================================================================