[Swift-devel] Re: multiple worker.sh in the same job

Andriy Fedorov fedorov at cs.wm.edu
Thu Jul 3 09:55:52 CDT 2008


Ben Clifford wrote:
> * each worker will link the same input files into the same working
>   directory. if this was a copy, this would be a potentially damaging
>   race condition. as it's a link, I think there is still a race
>   condition there that would cause some of the workers to fail (so
>   perhaps in the presence of any input files at all this won't work - I
>   think Andriy's test case does not have any input files).

This is correct; I hadn't tried that. So the first thing I did was try
to confirm your conjecture. I updated the MPI example to take an input
file:

hello_mpi3.c ==>

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>  /* for atoi() */

int main(int argc, char **argv){
        int myrank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        fprintf(stderr, "Hello, world from cpu %i (total %i)\n",
            myrank, size);

        /* only the rank given in argv[1] does file I/O: it reads a
           message from argv[2] and writes it to argv[3] */
        if(myrank==atoi(argv[1])){
                FILE *fIn, *fOut;
                fIn = fopen(argv[2], "r");
                fOut = fopen(argv[3], "w");
                char inStr[255];
                fscanf(fIn, "%254s", inStr);
                fprintf(fOut,
                    "File IO: Hello, world from cpu %i (total %i). "
                    "The message is:\"%s\"\n",
                    myrank, size, inStr);
                fclose(fIn);
                fclose(fOut);
        }

        MPI_Finalize();

        return 0;
}

hello_mpi3.c <==

Here's the Swift script:

hello_mpi_swift3.swift ==>

type messagefile {}

(messagefile fOut) greeting(messagefile fIn) {
    app {
        hello_mpi "0" @fIn @fOut;
    }
}

messagefile outfile <"hello_mpi3.txt">;
messagefile infile <"test_input.txt">;

outfile = greeting(infile);

hello_mpi_swift3.swift <==

I didn't change anything in tc.data (kept jobType=mpi). My tc.data is:

UC-GT4  hello_mpi       /home/fedorov/local/bin/hello_mpi3_v INSTALLED
INTEL32::LINUX GLOBUS::hostCount="4",jobType=mpi,maxWallTime="10",count="4"
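
(For completeness: the hello_mpi3_v binary there is just hello_mpi3.c
above compiled with the site's MPI compiler wrapper, something like

  mpicc hello_mpi3.c -o /home/fedorov/local/bin/hello_mpi3_v

though the exact command depends on the MPI installation.)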

What I see happening is that PBS reports that the job starts, but it
never finishes (the status stays "R"). I don't know what is going on
there.

I guess this confirms what Ben suggested, though I am not the one to
explain what exactly is going on.


> Here's how I just ran a simple mpi-hello-world with one wrapper.sh that
> launches MPI inside the wrapper. I would be interested if Andriy could try
> his app in the style shown below.
>

I tried this with the test where I have file input, file output at
rank 0, and stderr output from all ranks.

Everything works great!!!!

Of course, the wrapper has to be updated each time to handle the
command line, which is not perfectly convenient, but the main thing is
that it works.
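
For reference, here is a minimal sketch of a reusable wrapper that just
forwards whatever arguments Swift passes on to the real binary, so it
would not need to be edited for every command line (the mpirun flags
follow Ben's example below; the application path is the one from my
tc.data):

#!/bin/bash
# generic MPI launcher: forward all arguments Swift passes (here "0",
# the mapped input path and the mapped output path) to the application
APP=/home/fedorov/local/bin/hello_mpi3_v
echo running launcher on $(hostname)
mpirun -np 4 -machinefile $PBS_NODEFILE $APP "$@"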

Thanks, Ben!!!

--
Andrey



> I think the behaviour is now correct. From a user configuration
> perspective it is somewhat unpleasant, though.
>
> On TG-UC:
>
>  /home/benc/mpi/a.out  is my mpi hello world program
>  /home/benc/mpi/mpi.sh contains:
>
> #!/bin/bash
>
> echo running launcher on $(hostname)
> mpirun -np 3 -machinefile $PBS_NODEFILE /home/benc/mpi/a.out
>
>
> On swift submit side (my laptop):
>
> tc.data maps mpi to /home/benc/mpi/mpi.sh
>
> sites.xml defines:
>
>  <pool handle="tguc" >
>    <gridftp  url="gsiftp://tg-gridftp.uc.teragrid.org" />
>    <jobmanager universe="vanilla"
> url="tg-grid.uc.teragrid.org/jobmanager-pbs"
> major="2" />
>    <profile namespace="globus" key="project">TG-CCR080002N</profile>
>    <profile namespace="globus" key="host_types">ia64-compute</profile>
>    <profile namespace="globus" key="host_xcount">4</profile>
>    <profile namespace="globus" key="jobtype">single</profile>
>    <workdirectory >/home/benc/mpi</workdirectory>
>  </pool>
>
> Note specifically, jobtype=single (which is what causes only a single
> wrapper.sh to be run, even though 4 nodes will be allocated).
>
> mpi.swift contains:
>
> $ cat mpi.swift
>
> type file;
>
> (file o, file e) p() {
>    app {
>        mpi stdout=@filename(o) stderr=@filename(e);
>    }
> }
>
> file mpiout <"mpi.out">;
> file mpierr <"mpi.err">;
>
> (mpiout, mpierr) = p();
>
>
>
> so now run the above, and the output of the hello world MPI app (different
> pieces output by all workers) appears in mpi.out, correctly staged back
> through mpirun and wrapper.sh.
>
> --
>


