[Swift-devel] Swift and BGP

Ian Foster foster at anl.gov
Mon Oct 26 15:07:34 CDT 2009


Hi Mihael:

This is very encouraging.

It will be helpful to understand how these numbers compare to Falkon,  
as Falkon is the one other data point we have on what can be achieved  
on BGP.

Ian.


On Oct 26, 2009, at 11:56 AM, Mihael Hategan wrote:

> I've been playing with Swift on the BGP the past few days.
>
> My observation is that with the current Swift and reasonable mapping
> (such as the one done by the concurrent mapper), filesystem slowness
> does not seem to be caused by contentious access to the same
> file/directory. Instead, it's the sheer number of network filesystem
> requests, which come from a few sources:
>
> - bash: every time bash forks a process, it closes the current script
> file; after the process is done, bash re-opens the script file, seeks
> to the position it left off at, and reads the next command. Typical
> scripts involve forking a few processes, and our wrapper script is
> invoked while on a networked FS.
> - info: about 30 requests per run
> - Swift-specific FS access (what the wrapper is meant to do)
> - application FS requests
>
> At 100 jobs/s, the wrapper alone causes about 10000 requests/s to the
> FS server.
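As a rough back-of-the-envelope check of that rate (the ~100 filesystem operations per wrapper invocation is an assumption inferred from the figures quoted above, not a measured value):

```python
# Sketch of the FS request rate implied by the numbers above.
# fs_ops_per_wrapper is an assumed figure: bash re-reading the script,
# info-file writes, and the wrapper's own FS access, taken together.
jobs_per_second = 100
fs_ops_per_wrapper = 100

requests_per_second = jobs_per_second * fs_ops_per_wrapper
print(requests_per_second)  # 10000
```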
>
> I suspect that what Allan observed with the moving of the output files
> being slow is a coincidence. I did a run which showed that for jobs
> towards the start, the operations towards the end of the wrapper
> execution are slow, while jobs towards the end have the first part of
> the wrapper process running slower. This is likely due to ramp-up and
> ramp-down. I wanted to plot that, but BGP is down today, so it will
> have to wait.
>
> The solution is having things on the node-local FS. Ben already added
> some code to do that. I changed it a bit and also moved the info file
> to the scratch FS (unless the user requests that the info be on NFS in
> order to get progressive results for debugging purposes). A scratch
> directory different from the work directory is used whenever the user
> specifies <scratch>dir</scratch> in sites.xml.
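For illustration, a sites.xml pool entry using that element might look like the following sketch. Only the <scratch> element is what's described above; the pool handle, paths, and the other elements shown are made-up placeholders:

```xml
<pool handle="example-site">
  <!-- shared work directory on the networked FS (hypothetical path) -->
  <workdirectory>/gpfs/home/user/swiftwork</workdirectory>
  <!-- node-local scratch directory, as described above -->
  <scratch>/tmp/swift-scratch</scratch>
</pool>
```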
>
> Another thing is using provider job status instead of files when using
> coasters or Falkon.
>
> With coasters, scratch FS, and provider status, I empirically
> determined that an average throughput of 100 jobs/s is something that
> the system (Swift + coasters) can sustain well, provided that Swift
> tries to keep the number of jobs submitted to the coaster service to
> about twice the number of workers. I tested this with 6000 workers and
> 60-second jobs. I will post the plots shortly.
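A quick steady-state check of those numbers. The formula throughput ≈ workers / average job duration is my own restatement (essentially Little's law), not something stated in the message:

```python
# Steady-state throughput implied by the test configuration above:
# with every worker busy, jobs finish at workers / job_duration per second.
workers = 6000
job_duration_s = 60

throughput = workers / job_duration_s
print(throughput)  # 100.0 jobs/s

# The message suggests keeping about twice as many jobs queued at the
# coaster service as there are workers:
target_queue = 2 * workers
print(target_queue)  # 12000
```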
>
> So here's how one would go about this on intrepid:
> - determine the maximum number of workers (avg-exec-time * 100)
> - set the nodeGranularity to 512 nodes, 4 workers per node. Also set
> maxWorkers to 512 so that only 512-node blocks are requested. For some
> reason 512-node partitions start almost instantly (even if you have 6
> of them), while 1024-node partitions you have to wait for.
> - set the total number of blocks (the "slots" parameter) to
> no-of-workers/2048.
> - set the jobThrottle to 2*no-of-workers/100
> - make sure you also have foreach.max.threads set to 2*no-of-workers
> (though that depends on the structure of the program).
> - run on login6. There is no point in using the normal login machines,
> since they have a limit of 1024 file descriptors per process.
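To make the arithmetic in those steps concrete, here is a small sketch that derives the parameter values from an average job duration. The helper function and its name are mine, not part of Swift, and rounding slots up is my assumption (the message just says no-of-workers/2048):

```python
import math

# 2048 workers per block = 512 nodes/block * 4 workers/node on intrepid.
WORKERS_PER_BLOCK = 512 * 4

def coaster_params(avg_exec_time_s):
    """Hypothetical helper applying the sizing rules listed above."""
    workers = avg_exec_time_s * 100              # sustain ~100 jobs/s
    return {
        "workers": workers,
        # rounded up so every worker has a block to land in (assumption)
        "slots": math.ceil(workers / WORKERS_PER_BLOCK),
        "jobThrottle": 2 * workers / 100,        # ~2 queued jobs per worker
        "foreach.max.threads": 2 * workers,
    }

print(coaster_params(60))  # e.g. for 60-second jobs
# {'workers': 6000, 'slots': 3, 'jobThrottle': 120.0, 'foreach.max.threads': 12000}
```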
>
> I will actually code an XML element for sites.xml to capture this
> without that much pain.
>
> There is eventually a hard limit of (a bit less than) 65536 workers, I
> think. This is because each TCP connection from the workers requires a
> local port on the coaster service side, and there's a limit of 2^16 on
> those. This could eventually be addressed by having proxies on the I/O
> nodes or something.
>
> On intrepid (as opposed to surveyor) the default queue won't accept
> 1-node jobs. So the cleanup job at the end of a run will fail with a
> nice display of a stack trace, and you will have to manually clean up
> the work directory.
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
