[Swift-user] Deleting no longer necessary anonymous files in _concurrent

John Dennis dennis at ucar.edu
Wed Sep 1 09:59:35 CDT 2010


Justin,

	I am a little confused by your response that cleaning up temporary
files is not the responsibility of the Swift language. We did not
create the file
'wgt_files-935f5705-27ed-4a99-9420-441269bba3a0-36-4-0-array'; Swift
did. I certainly have no use for it. It was created as part of the
parallelization process. Consider the following bit of pseudo Swift
code:

  foreach year in years {
  	file wgt_files[];  // a new array for each year
  	foreach month, i in months {
  		wgt_files[i] = DoSomething(year, month);
  	}
  }

	The 'wgt_files' array is only in scope within the 'foreach years'
loop. Once all iterations of the 'foreach years' loop have completed,
it is out of scope, and I would expect its files to be deleted, as
with any variable/file that goes out of scope. Isn't this really a
garbage collection issue for the Swift language?
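A minimal sketch of the cleanup being asked for, written in Python rather than SwiftScript, and assuming each intermediate file knows how many procedures will read it. The file is removed as soon as its last consumer has run. `IntermediateFile` and `run_year` are hypothetical names for illustration only; they are not part of Swift.

```python
import os


class IntermediateFile:
    """Hypothetical handle that deletes its file once every consumer is done."""

    def __init__(self, path, consumers):
        self.path = path
        self.remaining = consumers  # procedures that still need to read the file

    def consumed(self):
        """Record that one consumer has finished; delete the file after the last."""
        self.remaining -= 1
        if self.remaining == 0 and os.path.exists(self.path):
            os.remove(self.path)


def run_year(workdir, months):
    """Produce one file per month, consume them all, then clean up (hypothetical)."""
    wgt_files = []  # local to this call, like wgt_files inside 'foreach years'
    for month in months:
        path = os.path.join(workdir, f"wgt_{month:02d}.dat")
        with open(path, "w") as f:
            f.write(f"weights for month {month}\n")  # stand-in for DoSomething()
        wgt_files.append(IntermediateFile(path, consumers=1))

    total_chars = 0
    for wf in wgt_files:  # the single downstream procedure reads every file
        with open(wf.path) as f:
            total_chars += len(f.read())
        wf.consumed()  # only consumer finished, so the file is deleted here
    return total_chars
```

Calling `run_year` on an empty scratch directory with months 1 to 12 leaves the directory empty afterwards. The sketch ties deletion to the number of consumers rather than to lexical scope, which also covers the original case of files that are read once and never needed again.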

	While I do see how we could use external variables to manage this
ourselves, that would significantly complicate the source code and
remove much of the simple and elegant solution that Swift provides.

	Matthew and I are concerned about this because of its impact on
disk usage. For example, our Swift script requires temporary space of
4x the size of the input data. Our generated data is tiny, while the
_concurrent directory is 2x the size of the input data. Now we want to
execute the Swift script on ~30 TB of data, so just to enable parallel
execution with Swift would require an extra 120 TB of disk space. I
realize that parallel execution will consume more disk space, but this
seems excessive.
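The disk estimate above is a linear scaling of the measured ratios. A short Python check, assuming temporary usage stays proportional to input size at 30 TB as it did in smaller runs (`scratch_tb` is a hypothetical helper):

```python
def scratch_tb(input_tb, factor):
    """Disk space (TB) for intermediates assumed to scale linearly with input."""
    return input_tb * factor


INPUT_TB = 30  # approximate input size quoted in the message

total_temp = scratch_tb(INPUT_TB, 4)  # all temporary space: 4x the input
concurrent = scratch_tb(INPUT_TB, 2)  # _concurrent directory alone: 2x the input
print(f"temporary: {total_temp} TB, of which _concurrent: {concurrent} TB")
```

This prints `temporary: 120 TB, of which _concurrent: 60 TB`. The 60 TB held in _concurrent is the portion that deleting consumed intermediates would target.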

Thanks,
John Dennis
	


On Aug 30, 2010, at 3:54 PM, Justin M Wozniak wrote:

> Hi Matthew
> 	Deleting files is out of the scope of the Swift language.  You can  
> of course remove them yourself in your scripts, and as long as Swift  
> does not try to stage them out you should be fine.
> 	You may want to look at external variables as another way to  
> approach this (manual 2.5).  Using external variables you can manage  
> the files in your scripts while maintaining the Swift progress model.
> 	Justin
>
> On Fri, 27 Aug 2010, Matthew Woitaszek wrote:
>> Good afternoon,
>>
>> I'm working with a script that creates arrays of intermediate files
>> using the anonymous concurrent mapper, such as:
>>
>> file wgt_file[];
>>
>> As I expect, all of these files get generated in the remote Swift
>> temporary directory and are then returned to the _concurrent
>> directory on the host executing Swift. However, in this particular
>> application, they're then immediately consumed by a subsequent
>> procedure and never needed again.
>>
>> Is there a way to configure Swift or the file mapper declaration to
>> delete these files after the remaining script "consumes" them? (That
>> is, after all procedures relying on them as inputs have been
>> executed?) Or can (should?) that be done manually?
>>
>> More speculatively, is there a way to keep files like these on the
>> execution host and not even bring them back to _concurrent? (With
>> loss of generality, I'm executing on a single site, and don't really
>> ever need the file locally, for restarts or staging to another site.)
>>
>> Any advice about managing copies of large intermediate data files in
>> the Swift execution context would be appreciated!
>>
>> Matthew
>> _______________________________________________
>> Swift-user mailing list
>> Swift-user at ci.uchicago.edu
>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
>>
>
> -- 
> Justin M Wozniak


