[Swift-user] Passing hostType for MPI jobs
Andriy Fedorov
fedorov at cs.wm.edu
Tue Jul 1 08:39:28 CDT 2008
Hi,
I am having problems passing the host type for MPI jobs. This appears
to happen both when I am using globusrun-ws (XML job description) and
when I am using Swift (tc.data), although the errors are different.
I am trying to request nodes of type "compute" on the UC TeraGrid site.
This host type is recognized by PBS when I pass it to "qsub".
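(For example, a resource request along the lines of `qsub -l nodes=4:compute ...'
goes through fine and gives me the right node type.)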
Basically, when I use the XML job description, I specify hostType using
the job description extension support
(http://www.globus.org/toolkit/docs/4.0/execution/wsgram/WS_GRAM_Job_Desc_Extensions.html#r-wsgram-extensions-constructs-nodes).
What happens is that I get the correct type of nodes, but the count is
not what I requested.
When I specify the hostType parameter in tc.data, I either get an error
(when I have hostCount="4:compute"):
===>
RunID: 20080701-0829-xstp5l98
Progress:
hello_mpi started
Progress: Stage in:1
Failed to transfer wrapper log from
hello_mpi_swift-20080701-0829-xstp5l98/info/a/UC-GT4
Failed to transfer wrapper log from
hello_mpi_swift-20080701-0829-xstp5l98/info/b/UC-GT4
Failed to transfer wrapper log from
hello_mpi_swift-20080701-0829-xstp5l98/info/c/UC-GT4
hello_mpi failed
Execution failed:
Exception in hello_mpi:
Arguments: []
Host: UC-GT4
Directory: hello_mpi_swift-20080701-0829-xstp5l98/jobs/c/hello_mpi-cltmnvui
stderr.txt:
stdout.txt:
----
Caused by:
For input string: "4:compute"
<===
or I get nodes of the wrong type (when I use hostType="compute" --
it looks like the parameter is simply ignored). The "For input string:
"4:compute"" message above looks like a number-parsing error, so
hostCount apparently has to be a plain integer.
Does anyone know how to specify the host type correctly? Is this a GT4
bug? I suspect a GT4 bug is involved, because when I omit
<extensions>, I can correctly run an MPI job on 4 hosts. I also don't
know what Swift's support for the host type functionality is.
For reference, I attach my XML job description, tc.data, sites.xml,
Swift script, and the simple MPI "hello world" code.
hello_mpi.c (compile with `mpicc -o hello_mpi hello_mpi.c') ===>
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int myrank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    fprintf(stderr, "Hello, world from cpu %i (total %i)\n", myrank, size);
    MPI_Finalize();
    return 0;
}
<===
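(Run standalone, e.g. with `mpirun -np 4 ./hello_mpi', it just prints
one line per rank.)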
hello_mpi_xml.xml ===>
<job>
  <factoryEndpoint
      xmlns:gram="http://www.globus.org/namespaces/2004/10/gram/job"
      xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/03/addressing">
    <wsa:Address>
      https://tg-grid.uc.teragrid.org:8443/wsrf/services/ManagedJobFactoryService
    </wsa:Address>
    <wsa:ReferenceProperties>
      <gram:ResourceID>PBS</gram:ResourceID>
    </wsa:ReferenceProperties>
  </factoryEndpoint>
  <executable>/home/fedorov/local/bin/hello_mpi</executable>
  <stdout>/home/fedorov/scratch/hello_mpi_xml.stdout</stdout>
  <stderr>/home/fedorov/scratch/hello_mpi_xml.stderr</stderr>
  <count>4</count>
  <hostCount>4</hostCount>
  <maxWallTime>10</maxWallTime>
  <jobType>mpi</jobType>
  <extensions>
    <resourceAllocationGroup>
      <hostType>compute</hostType>
    </resourceAllocationGroup>
  </extensions>
</job>
<===
hello_mpi_swift.swift ===>
type messagefile {}

(messagefile t) greeting() {
    app {
        hello_mpi stderr=@filename(t);
    }
}

messagefile outfile <"hello_mpi.txt">;
outfile = greeting();
<===
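(The script is run against the tc.data and sites.xml below, e.g. with
`swift -sites.file sites.xml -tc.file tc.data hello_mpi_swift.swift'.)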
tc.data ===>
UC-GT4 hello_mpi /home/fedorov/local/bin/hello_mpi_v INSTALLED INTEL32::LINUX GLOBUS::hostCount="4",jobType=mpi,maxWallTime="10",count="4",hostType="compute"
<===
sites.xml ===>
<pool handle="UC-GT4">
  <gridftp url="gsiftp://tg-gridftp.uc.teragrid.org" />
  <execution provider="gt4" jobmanager="PBS"
      url="https://tg-grid.uc.teragrid.org:8443/wsrf/services/ManagedJobFactoryService" />
  <workdirectory>/home/fedorov/scratch</workdirectory>
</pool>
<===