[Swift-user] Deleting no longer necessary anonymous files in _concurrent
John Dennis
dennis at ucar.edu
Wed Sep 1 09:59:35 CDT 2010
Justin,
I am a little confused by your response that cleaning up temporary
files is not the responsibility of the Swift language. We did not
create the file
'wgt_files-935f5705-27ed-4a99-9420-441269bba3a0-36-4-0-array'; Swift
did. I certainly have no use for it. It was created as part of the
parallelization process. Consider the following bit of pseudo-Swift
code:
foreach years {
    file wgt_files[];
    foreach month {
        wgt_files[] = DoSomething();
    }
}
The 'wgt_files' array is only in scope within the 'foreach years' loop.
Once all iterations of the 'foreach years' loop have completed, I would
expect the 'wgt_files' files to have been deleted, since a variable/file
should be cleaned up once it goes out of scope. Isn't this really an
issue of garbage collection for the Swift language?
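As an illustration of the scope-based cleanup described above, here is a
minimal sketch in Python rather than Swift. It does not reflect how Swift
manages files; do_something(), process_years(), and the file names are
hypothetical stand-ins, and the only point is that the intermediate files
disappear when the enclosing scope ends.

```python
import tempfile
from pathlib import Path


def do_something(workdir: Path, year: int, month: int) -> Path:
    """Hypothetical stand-in for DoSomething(): writes one intermediate file."""
    path = workdir / f"wgt_{year}_{month:02d}.dat"
    path.write_text(f"weights for {year}-{month:02d}\n")
    return path


def process_years(years):
    """Return (files created, scratch path) per year; scratch is gone at scope exit."""
    results = []
    for year in years:
        # The scratch directory lives only for this iteration of the outer loop.
        with tempfile.TemporaryDirectory(prefix="wgt_files-") as scratch:
            wgt_files = [do_something(Path(scratch), year, m) for m in range(1, 13)]
            count = sum(1 for f in wgt_files if f.exists())
        # Leaving the 'with' block removed the directory and everything in it.
        results.append((count, Path(scratch)))
    return results
```

Calling process_years([2009, 2010]) creates twelve files per year and leaves
nothing behind on disk once each year's iteration has finished.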
While I do see how we could use external variables to manage all of
this ourselves, that would significantly complicate the source code and
take away much of the simplicity and elegance that Swift provides.
Matthew and I are concerned about this because of the impact it has on
disk usage. For example, our Swift script requires temporary space of
4x the size of the input data. Our generated data is tiny, while the
_concurrent directory is 2x the size of the input data. Now we want to
execute the Swift script on ~30 TB of data, so just enabling parallel
execution with Swift would require an extra 120 TB of disk space. I
realize that parallel execution will consume more disk space, but this
seems excessive.
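The arithmetic behind those figures can be sketched in a few lines of
Python. The function name and its parameters are hypothetical; the 4x and
2x factors and the ~30 TB input size are the numbers quoted above, and how
the two factors overlap is not stated, so they are reported separately.

```python
def scratch_space_tb(input_tb: float,
                     temp_factor: float = 4.0,
                     concurrent_factor: float = 2.0):
    """Return (temporary space, _concurrent size) in TB for a given input size."""
    temporary = input_tb * temp_factor          # 4x the input data
    concurrent = input_tb * concurrent_factor   # 2x the input data
    return temporary, concurrent


temporary, concurrent = scratch_space_tb(30.0)
print(f"temporary space: {temporary:.0f} TB; _concurrent: {concurrent:.0f} TB")
# prints "temporary space: 120 TB; _concurrent: 60 TB"
```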
Thanks,
John Dennis
On Aug 30, 2010, at 3:54 PM, Justin M Wozniak wrote:
> Hi Matthew
> Deleting files is out of the scope of the Swift language. You can
> of course remove them yourself in your scripts, and as long as Swift
> does not try to stage them out you should be fine.
> You may want to look at external variables as another way to
> approach this (manual 2.5). Using external variables you can manage
> the files in your scripts while maintaining the Swift progress model.
> Justin
>
> On Fri, 27 Aug 2010, Matthew Woitaszek wrote:
>> Good afternoon,
>>
>> I'm working with a script that creates arrays of intermediate files
>> using the anonymous concurrent mapper, such as:
>>
>> file wgt_file[];
>>
>> As I expect, all of these files get generated in the remote swift
>> temporary directory and are then returned to the _concurrent directory
>> on the host executing Swift. However, in this particular application,
>> they're then immediately consumed by a subsequent procedure and never
>> needed again.
>>
>> Is there a way to configure Swift or the file mapper declaration to
>> delete these files after the remaining script "consumes" them? (That
>> is, after all procedures relying on them as inputs have been
>> executed?) Or can (should?) that be done manually?
>>
>> More speculatively, is there a way to keep files like these on the
>> execution host and not even bring them back to _concurrent? (With loss
>> of generality, I'm executing on a single site, and don't really ever
>> need the file locally, for restarts or staging to another site.)
>>
>> Any advice about managing copies of large intermediate data files in
>> the Swift execution context would be appreciated!
>>
>> Matthew
>> _______________________________________________
>> Swift-user mailing list
>> Swift-user at ci.uchicago.edu
>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
>>
>
> --
> Justin M Wozniak