[Swift-devel] multiple worker.sh in the same job

Tue Jul 1 16:27:58 CDT 2008

This whole thing, I think, applies not only to MPI jobs, but also to any
job requesting more than one node. So I think the solution is not to
swap mpirun and wrapper.sh, but, along the lines of what Andriy did,
perform all the relevant wrapper functions in only one instance and have
a barrier right before running the executable as well as right after.

How exactly this would be done is a little hazy in my head, but I guess
that's what makes it interesting.
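
One way the barrier idea could look, as a minimal sketch rather than a worked-out design: every instance learns its rank and the instance count from the environment, rank 0 alone does the staging, and a directory on the shared filesystem serves as the barrier. RANK, NPROCS, BARRIER_DIR, stage_in and stage_out are hypothetical names, not anything wrapper.sh or GRAM provides; a real MPI launcher exports its own rank variable under its own name.

```shell
#!/bin/sh
# Sketch only. RANK, NPROCS and BARRIER_DIR are hypothetical variables that
# each instance is assumed to have set before calling run_wrapped.

# Placeholders for the real wrapper work (linking inputs, collecting outputs).
stage_in()  { :; }
stage_out() { :; }

# barrier NAME: block until all NPROCS instances have reached barrier NAME.
barrier() {
    bdir="$BARRIER_DIR/$1"
    mkdir -p "$bdir"
    : > "$bdir/$RANK"                 # announce that this rank has arrived
    while [ "$(ls "$bdir" | wc -l)" -lt "$NPROCS" ]; do
        sleep 1                       # poll until every rank has announced
    done
}

# run_wrapped CMD [ARGS...]: only rank 0 stages files; all ranks run CMD.
run_wrapped() {
    if [ "$RANK" -eq 0 ]; then stage_in; fi
    barrier before                    # inputs are in place past this point
    "$@"
    rc=$?
    barrier after                     # every rank has finished the executable
    if [ "$RANK" -eq 0 ]; then stage_out; fi
    return "$rc"
}
```

Each instance would call something like `run_wrapped ./app args`. Polling a shared directory is crude, and it assumes a fresh BARRIER_DIR per job that all nodes can see.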

On Tue, 2008-07-01 at 21:11 +0000, Ben Clifford wrote:
> Here is one problem with Swift + MPI, with workaround, that Andriy Fedorov 
> <fedorov at cs.wm.edu> and I have uncovered. I'm interested in anyone's 
> commentary.
> 
> If you use GRAM with jobtype=mpi, then your job is run through mpirun, and 
> thus executes on each node in the job rather than once.
> 
> In the case of Swift submitting this way, 'your job' actually means the 
> Swift server side code, wrapper.sh, not 'your (the user's) job'.
> 
> This means there are multiple wrapper.sh jobs running, all trying to use 
> the same working directory, input files and output files.
> 
> Andriy tried making only one of the nodes create output files (e.g. the 
> rank 0 node), and that appears to work in his case, though I think the 
> following is happening:
> 
>   * Each worker will link the same input files into the same working 
>     directory. If this were a copy, it would be a potentially damaging 
>     race condition. As it's a link, I think there is still a race 
>     condition that would cause some of the workers to fail (so perhaps 
>     this won't work in the presence of any input files at all; I think 
>     Andriy's test case does not have any input files).
> 
>   * I think that all wrapper scripts except rank 0 indicate failure 
>     (because of missing output files), and the rank-0 wrapper script 
>     indicates success. The Swift submit side checks for the success flag 
>     before the failure flag, so it regards the job as successful. I think 
>     this only works if at least one job succeeds, which pretty much means 
>     one job must generate all the output files, rather than different 
>     jobs generating different output files.
> 
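The effect of that check order can be sketched as follows. This is an illustration in shell, not the actual Swift submit-side code, and the flag file names job-success and job-failure are hypothetical.

```shell
#!/bin/sh
# job_status DIR: report the outcome recorded in DIR.
# The flag file names are made up for this sketch.
job_status() {
    if [ -e "$1/job-success" ]; then
        echo success                  # checked first, so it wins
    elif [ -e "$1/job-failure" ]; then
        echo failure
    else
        echo unknown
    fi
}
```

With both flags present in the same directory, job_status reports success, which is why one succeeding rank would mask the failures of the others.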
> I haven't really tested the above in great depth, but I think that is 
> what is happening.
> 
> From a technical perspective, I think the way to address this is to swap 
> the mpirun and wrapper.sh, so that one wrapper.sh runs, and inside that it 
> runs mpirun which then spawns only the application executables.
> 
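A minimal sketch of the swapped arrangement, assuming an mpirun that accepts -np and is on the PATH of the single wrapper instance. stage_in, stage_out and the MPIRUN override are hypothetical placeholders, not part of wrapper.sh.

```shell
#!/bin/sh
# Placeholders for the real wrapper work (linking inputs, collecting outputs).
stage_in()  { :; }
stage_out() { :; }

# run_mpi_app NPROCS CMD [ARGS...]: the wrapper runs once and launches the
# application under mpirun itself, so only the application is replicated.
run_mpi_app() {
    nprocs=$1
    shift
    stage_in
    "${MPIRUN:-mpirun}" -np "$nprocs" "$@"
    rc=$?
    stage_out                         # runs once, after all ranks have exited
    return "$rc"
}
```

From GRAM's point of view this would be submitted as an ordinary single job, with the process count passed in by whoever builds the command line.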
> That way you lose the abstraction from GRAM of being able to specify 
> jobtype=mpi; instead you have to know how to do this yourself, and run the 
> job as a normal, not mpi, job from GRAM's perspective.
> 
> However, in the case of non-GRAM execution mechanisms, that abstraction 
> is not in place anyway.
>