[Swift-devel] Re: Question of wrapper.sh

Ioan Raicu iraicu at cs.uchicago.edu
Fri Mar 7 00:03:04 CST 2008



Ben Clifford wrote:
> you should send questions like this to swift-devel or swift-user list 
> rather than attempting to compose your own list of likely candidates and 
> withholding the information from the public archives.
>   
I have redirected this reply to the Swift devel mailing list.
>   
>> I am trying to dig into the wrapper.sh, disable the log to enhance the
>> performance.
>>     
>
> Do you have numbers that suggest logging is causing a performance 
> degradation? 
By default, Swift manages about 5 jobs/sec running over Falkon on 
256 CPUs on the BG/P, where each job is a sleep 0.  The Falkon 
command-line client can do about 1700 jobs/sec on the same hardware.  
Nine months ago, I saw Swift go from a few jobs/sec to about 50 jobs/sec 
after stripping all logging (i.e. echo "..." >> LOG) from the wrapper 
script and removing the mkdir and symbolic linking.  Since the mkdir 
is much improved now, I assume it is no longer the bottleneck, but doing 
10 to 20 echoes to a log file on the shared file system from many nodes 
at the same time is expensive, and I think that is the main bottleneck in 
the current wrapper script.  Once Zhao has disabled all but the necessary 
logging, we'll have a better idea of how fast we can go, and whether it 
is necessary to eliminate the mkdir step as well.  I think about 
50 jobs/sec is within reach by streamlining the wrapper.sh script, but 
we'll have to think of ways to push those numbers even higher!
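
To illustrate the kind of change I have in mind for the logging, here is 
a minimal sketch that buffers log messages in a shell variable and 
appends them to the log file in a single write per job.  The names LOG, 
LOGBUF, log and flush_log are hypothetical and are not taken from the 
actual wrapper.sh; the mktemp file only stands in for the per-job log on 
the shared file system.

```shell
#!/bin/bash
# Sketch only: accumulate log lines in memory and append them to the
# shared log once, instead of one ">> LOG" append per message.
# LOG, LOGBUF, log and flush_log are hypothetical names.

LOG="${LOG:-$(mktemp)}"   # stand-in for the per-job log file
LOGBUF=""

log() {
    # Append the message to the in-memory buffer; no file system access.
    LOGBUF="${LOGBUF}$*"$'\n'
}

flush_log() {
    # A single append for the whole job.
    printf '%s' "$LOGBUF" >> "$LOG"
    LOGBUF=""
}

log "job start"
log "creating job directory"
log "executing sleep 0"
log "job done"
flush_log
```

This keeps the log content while reducing the number of appends per job 
to one.  Whether that is enough to matter would still need to be 
measured on the BG/P.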
> I notice you're using quite an old version of swift 
iraicu@login1.surveyor:/home/zzhang/cog/modules/vdsk> svn info
Path: .
URL: https://svn.ci.uchicago.edu/svn/vdl2/trunk
Repository Root: https://svn.ci.uchicago.edu/svn/vdl2
Repository UUID: e2bb083e-7f23-0410-b3a8-8253ac9ef6d8
Revision: 1673
Node Kind: directory
Schedule: normal
Last Changed Author: benc at CI.UCHICAGO.EDU
Last Changed Rev: 1670
Last Changed Date: 2008-02-09 12:42:56 -0600 (Sat, 09 Feb 2008)

It doesn't seem that old, but we'll update to the latest one before we 
do more experiments.
> (the last 
> release) - we made substantial log speed improvements subsequent to that. 
> If you're hitting log file problems here, there is a fair chance that 
> you'll encounter other scalability problems on the site filesystem that we 
> also fixed in SVN some months ago.
>   
Right, I know, and I thought we were using a late enough version that 
had those fixes.  Just to be sure, we'll upgrade!
>   
>> One thing I notice is that for each job, correct me if I am wrong, SWIFT 
>> will make a unique directory with the date and a random string, then 
>> copy wrapper.sh and other necessary files to that directory.
>>     
>
> It should do that once per workflow per site, not per job.
>   
Every job still has a scratch space sandbox, which results in a mkdir, 
symbolic linking, and finally a cleanup remove dir.  I think this is the 
dir he is referring to.  BTW, if there were an easy way to eliminate 
this entire mkdir part of the wrapper script without breaking anything 
in Swift, it would be nice.  The apps we are dealing with don't need the 
sandboxing: we know all input files and all output files, and we'll 
never have an input such as *.fits that might be ambiguous without a 
sandbox.
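
As a rough sketch of what an optional sandbox could look like, the 
function below performs the mkdir, symbolic link, run, and cleanup steps 
only when a switch is unset.  Everything here (run_job, NOSANDBOX, the 
jobs/ layout) is a hypothetical illustration under my own assumptions, 
not the way wrapper.sh is actually written.

```shell
#!/bin/bash
# Sketch only: per-job sandboxing with a hypothetical NOSANDBOX switch.
# run_job, NOSANDBOX and the jobs/ layout are illustrative names and are
# not taken from wrapper.sh.

run_job() {
    local jobid="$1" input="$2" rc
    shift 2
    if [ -n "${NOSANDBOX:-}" ]; then
        "$@"                                  # run in place: no mkdir, link, or cleanup
        return
    fi
    local dir="jobs/$jobid"
    mkdir -p "$dir"                           # 1. create the scratch directory
    ln -s "$PWD/$input" "$dir/${input##*/}"   # 2. link the input into it
    ( cd "$dir" && "$@" )                     # 3. run the application there
    rc=$?
    rm -rf "$dir"                             # 4. remove the sandbox
    return "$rc"
}

WORK="$(mktemp -d)"
cd "$WORK" || exit 1
echo data > input.txt

run_job sleep-test input.txt sleep 0          # sandboxed: extra mkdir, ln and rm
rc_sandbox=$?

NOSANDBOX=1
run_job sleep-test input.txt sleep 0          # direct: none of those steps
rc_plain=$?
```

With the switch set, the job touches the shared file system only for its 
own inputs and outputs.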

Ioan
>   
>>        echo -abc  "Hello, world!" stdout=@filename(t);
>>     
>
> Put the -abc in quotes:
>
> echo "-abc" "hello"
>
> to solve the immediate problem.
>
> However, note that the command:
>     echo -abc hello
> executes successfully on my Linux and OS X boxes.
>
> If you want a job that will fail, try the 'false' command.
>
>   
>> RunID: 20080306-1647-4nd1cymf
>> Execution failed:
>>        Variable not found: abc
>>     
>
> This is because you did not quote "-abc", so swift is trying to 
> evaluate it as the unary negation of a variable named abc (just as 
> -abc would be read in Java or C).
>
>   
>> But I still can not find the default working directory of this task. 
>> Also, I know there is a log file for this wrapper, so it is in the 
>> working directory, right?
>>     
>
> Swift never attempted to run the above, because of the error just 
> described.
>
>   
>> Another question is, could you give me a simple task description of 
>> wrapper.sh? So I could invoke wrapper.sh directly without falkon. I got 
>> a task description before,
>>
>> 140.221.82.10 : urn:0-195-1203621652641 : EXECUTABLE /bin/bash ARGUEMENTS
>> shared/wrapper.sh sleep-1j38kqoi -jobdir 1 -e /bin/sleep -out stdout.txt -err
>> stderr.txt -i -d  -if  -of  -k  -a 0
>>
>> but it is within the working directory, and I don't understand what
>> "sleep-1j38kqoi" means.
>>     
>
> sleep-1j38kqoi is a job identifier (in Swift internal language, an 
> execute2 identifier, perhaps) which identifies one attempt to run an 
> application. It is used to label the log files and working directories 
> for that attempt.
>
>   

-- 
===================================================
Ioan Raicu
Ph.D. Candidate
===================================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
===================================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
http://dev.globus.org/wiki/Incubator/Falkon
http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page
===================================================
===================================================

