[Swift-devel] Performance problem with CDM direct processing

Justin M Wozniak wozniak at mcs.anl.gov
Mon Aug 22 21:22:16 CDT 2011


Ok, great.  Another idea that I think Mihael suggested a while ago would 
be to rewrite _swiftwrap in perl.  A lot of things might come out of that. 
For example, it would be pretty neat if the Coasters worker could be 
configured to only read the functions from that file and thus not require 
an external call to perl to start a Swift task.
 	Justin

On Mon, 22 Aug 2011, Jonathan Monette wrote:

> Using bash to do the wildcard matching was one of the ideas we came up with. 
>
> ----- Reply message -----
> From: "Justin M Wozniak" <wozniak at mcs.anl.gov>
> Date: Mon, Aug 22, 2011 12:46 pm
> Subject: [Swift-devel] Performance problem with CDM direct processing
> To: "Jonathan Monette" <jonmon at mcs.anl.gov>
> Cc: "Michael Wilde" <wilde at mcs.anl.gov>, "Jonathan Monette" <jon.monette at gmail.com>, "swift-devel Devel" <swift-devel at ci.uchicago.edu>
>
>
>
> This has to do with the way the _swiftwrap shell script looks up those 
> files.  To avoid the external use of perl, I will take a look at using 
> bash to do the wildcard matching and lookup.  Either that or I will batch 
> multiple lookups into one perl call.
> 	Justin
>
> On Mon, 22 Aug 2011, Jonathan Monette wrote:
>
>> Correct. I suspect if we can improve the performance of this section we 
>> can go from a 12-hour run to a 6-8 hour run.
>>
>> The number of files being processed by the CDM lookup is 320K. 
>> What was observed was that several processes were spawned for each file, 
>> and each took maybe a second to run (I think that was the time).
>>
>> Mike and I had a discussion on how we can replicate it with a simple 
>> test case to show the delay as well as some simple fixes to try out.
>>
>> ----- Reply message -----
>> From: "Michael Wilde" <wilde at mcs.anl.gov>
>> Date: Mon, Aug 22, 2011 10:41 am
>> Subject: [Swift-devel] Performance problem with CDM direct processing
>> To: "Jonathan Monette" <jon.monette at gmail.com>, "Justin M Wozniak" <wozniak at mcs.anl.gov>
>> Cc: "swift-devel Devel" <swift-devel at ci.uchicago.edu>
>>
>>
>> Justin,
>>
>> In testing Montage, Jon observed what looks like a performance bottleneck in the processing of CDM direct output passing.
>>
>> I *think* what was happening was that a large number of jobs (say 25,000 or more, but I don't recall the exact number; it may have been larger) each produced an output file, and all those files were being passed as input to a merge job.
>>
>> What we observed was that the scripts being called from _swiftwrap (and perhaps some processing at the vdl-int.k level??? as well) were running very slowly, and that a fairly large number of scripts were being invoked per file. I think (but am not sure) that the high overhead was being observed at the start of the merge job in CDM scripts called by _swiftwrap.
>>
>> Jon, can you explain what you know about this problem, and then let's see if we can enhance the performance?  This is now the main bottleneck in this application, which is otherwise now performing quite well.
>>
>> Thanks,
>>
>> - Mike
>> _______________________________________________
>> Swift-devel mailing list
>> Swift-devel at ci.uchicago.edu
>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
>
> -- 
> Justin M Wozniak

-- 
Justin M Wozniak
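
The bash-only wildcard lookup proposed in this thread might look roughly 
like the sketch below. It is a minimal illustration, not the actual 
_swiftwrap or CDM code: the function name lookup_policy, the variable 
LOOKUP_RESULT, the three patterns, and the policy names LOCAL and DEFAULT 
are all assumptions made up for the example. The point is only that a 
shell `case` statement matches glob patterns with builtins, so no perl 
process has to be spawned per file.

```bash
# Hypothetical rule table, hard-coded for illustration only.
# It does not reproduce the real CDM policy file format.
# The first matching pattern wins; the result is returned in
# LOOKUP_RESULT so the caller does not need $(...) and its subshell.
lookup_policy() {
    case $1 in
        *.fits) LOOKUP_RESULT=DIRECT ;;   # in case patterns, * also matches "/"
        tmp/*)  LOOKUP_RESULT=LOCAL ;;
        *)      LOOKUP_RESULT=DEFAULT ;;  # catch-all rule
    esac
}

# Look up several files in one shell process.
for f in proj/out_0001.fits tmp/scratch.dat notes.txt; do
    lookup_policy "$f"
    echo "$f -> $LOOKUP_RESULT"
done
```

Returning the result through a variable rather than through stdout is a 
deliberate choice here: capturing output with command substitution would 
itself start a subshell for every file, which is the kind of per-file 
overhead the thread is trying to remove.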


