[Swift-devel] user guide work

Lorenzo Pesce lpesce at uchicago.edu
Tue May 19 14:16:33 CDT 2015


> On May 19, 2015, at 1:34 PM, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> 
> On Tue, 2015-05-19 at 09:25 -0500, Lorenzo Pesce wrote:
>> I will throw in my useless 2 cents. I have been part of five or six Swift-based projects,
>> all involving what I would call large computations (at least a million core hours, usually tens of terabytes of data).
>> 
>> In my experience:
>> 1) The basic variables are all we ever needed.
>> 2) The basic control loops are all we ever needed, even if some more complex ones could have been handy (they just weren’t there when I started).
>> 3) All the file system tricks known to the devil and his friends are absolutely necessary. I never found a project that didn’t end up choking on its data.
>> 4) All the tricks for dealing intelligently with staging of data and code and “offloading” control are essential to avoid the hatred of the system administrators and other users.
>> 5) A very clear understanding of where Swift is robust and where it
>> isn’t is very important, because nobody likes to have a 300-node run
>> come crashing down on them, creating an infestation of zombies that
>> eventually might have forced Beagle’s admin to reboot. This is not
>> criticism; I participated heavily in writing that script, so I am on
>> the accused team.
> 
> Thanks! That is very useful. Do you have specific feedback on the
> documentation? Such as #1 and #2 most annoying things about it?

Do you want an honest answer? ;-)

My first thoughts were as follows. Bear with the humor; I mean it in a positive way (anything that sounds obnoxious is just my devious sense of humor, I promise):

- Awesome, now I know another language in which to write “Hello World”. Gee, I can even have a file called hello world or use tr to make it caps!!!! How do I run 300 genomes that are going to clog up half of my Cray, break my disks, and drive my users insane?

- Hmm… my scheduler choked and returned errors that are incomprehensible, and I am left with a 10 GB log file that requires tea leaves to sort out. If I wanted to write something that I can run on 10 nodes, I would have used bash. OK, how do I sort out the mess? Is there any procedure to try and figure out what I did wrong?

- Oh shit, I did not realize that I was making Swift stage 10 TB of data during each run. Luckily my admin missed me when he tried to shoot me. :-)

- Can anyone tell me how Swift controls the scheduler and keeps track of what is running and what should be run? How do I help it sort out what to stage and when to start doing it? Can someone help me sort that out without having to read the entire guide? It really looks like my scheduler is comatose and I am being blamed for anything that happens there. ;-)


>> I can try to be more helpful, if you want me to. 
>> 
>> I have worked on a number of parallel projects, where essentially the
>> same workflow was implemented with and without swift and both teams
>> played to win. That is one paper I plan to write some day…
> 
> Wait, who won? :)


Actually, the jury is still out, and my inclination would be to say that there is a lot to learn from it about how to make Swift easy for users who don’t want to learn how to use Swift.
I like the Swift implementation better (honestly, I am not trying to flatter you), but the other paper was published a year ago.

> 
> Mihael
> 



