[Swift-user] Running first.swift remotely on NCSA

Mihael Hategan hategan at mcs.anl.gov
Fri Jun 20 16:50:48 CDT 2008


On Fri, 2008-06-20 at 17:08 -0400, Andriy Fedorov wrote:
> >> Note that I can see my job start and complete with "qstat" on the UC
> >> site, but the result never comes back.
> >
> > By never, do you mean more than 1 minute or less?
> >
> 
> More than 5 minutes.

That's "never" enough.

> 
> >>  This is on
> >> tg-login.uc.teragrid.org, so there should be no problem with the firewall.
> >
> > Though there might still be a problem with GLOBUS_HOSTNAME. Is that set
> > properly? Can you do the telnet thingy?
> >
> 
> GLOBUS_HOSTNAME was not set. I set it, but nothing changed.
> GLOBUS_TCP_PORT_RANGE is set to "50000,51000", port 50000 is open,
> yes, I can telnet to that port.

Ok, can you send me a link to the swift log for the run that behaves
badly?
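
In case it helps with the port question above, here is a minimal Python
sketch of the same check that telnet does. It is only an illustration: the
names parse_port_range and port_is_open are made up for this example, and
it assumes GLOBUS_TCP_PORT_RANGE holds "min,max" as in the value quoted
above. A successful connect only shows that something was listening and
reachable on that port at that moment.

```python
import socket


def parse_port_range(value):
    """Parse a GLOBUS_TCP_PORT_RANGE-style string such as "50000,51000".

    Assumption: the value is two integers separated by a comma or a space.
    """
    parts = value.replace(",", " ").split()
    if len(parts) != 2:
        raise ValueError("expected two ports, got: %r" % (value,))
    low, high = int(parts[0]), int(parts[1])
    if low > high:
        raise ValueError("range is reversed: %r" % (value,))
    return low, high


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

For example, port_is_open("your.submit.host", 50000), run from the remote
side, where the host name is a placeholder.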

> 
> >>
> >> The only reason I would like to get this working for pre-WS is because
> >> my true goal is to run an MPI job and pass the node type to PBS.
> >> The only way to specify host type for GT4 GRAM is through
> >> Job description extensions (see
> >> http://www.globus.org/toolkit/docs/4.0/execution/wsgram/WS_GRAM_Job_Desc_Extensions.html),
> >> and I don't know how this can be translated into tc.data. With GT2, I
> >> can simply use "host_count=2:compute".
> >
> > I think you should be able to use host_types=compute with both pre-WS
> > GRAM and WS-GRAM.
> >
> 
> Ok, so I added this line to tc.data:
> 
> UC-GT4          echo_gt4_mpi    /bin/echo       INSTALLED       INTEL32::LINUX  GLOBUS::host_count="2:compute",jobType=mpi

I'm a bit unsure whether that would work. I think that
host_types="ia32-compute" has worked in the past on that site.
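
To make the suggestion concrete, here is a small Python sketch that
assembles such a tc.data line. It mirrors the layout of the entry quoted
above, with host_types swapped in for host_count. The helper name tc_entry
is hypothetical, and whether the site honors host_types passed this way is
untested.

```python
def tc_entry(site, name, path, platform, namespace, attrs, status="INSTALLED"):
    """Build one tab-separated tc.data line.

    Assumption: the profile column has the form NAMESPACE::key=value,key=value,
    as in the entry quoted in this thread. Values are written exactly as
    given, so the caller supplies any double quotes.
    """
    pairs = ",".join("%s=%s" % (key, value) for key, value in attrs.items())
    profile = "%s::%s" % (namespace, pairs)
    return "\t".join([site, name, path, status, platform, profile])


line = tc_entry(
    "UC-GT4",
    "echo_gt4_mpi",
    "/bin/echo",
    "INTEL32::LINUX",
    "GLOBUS",
    {"host_types": '"ia32-compute"', "jobType": "mpi"},
)
print(line)
```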

Mihael

> 
> and ran first.swift with echo_gt4_mpi. The result is very strange.
> 
> In qstat I see one job running on 1 node (the node type is NOT
> "compute"), then it finishes, then the Swift script reports "Final
> status: Finished successfully:1", and then I see a SECOND job with 1
> node running in qstat.
> 
> Then I tried adding ",count=2" at the end of the GLOBUS attributes.
> Then I saw the first job running on 2 nodes (again, the node types were
> not what I requested), then again Swift finished, and again a 1-node
> job started and finished.
> 
> So, I see two problems: 1) problems using jobmanager-pbs with pre-WS
> GRAM, and 2) problems passing arguments to Globus for running MPI-type
> jobs.
> 
> Note, I can request resources correctly and run MPI jobs when using
> RSL job description with globusrun, and using XML (with job
> description extensions) and globusrun-ws. I can post those, if you are
> interested. Therefore, I do not think the second problem I mentioned
> is a GT problem.
> 
> Andrey



