[Swift-user] Running first.swift remotely on NCSA
Mihael Hategan
hategan at mcs.anl.gov
Fri Jun 20 16:50:48 CDT 2008
On Fri, 2008-06-20 at 17:08 -0400, Andriy Fedorov wrote:
> >> Note, that I can see my job started and completed with "qstat" on UC
> >> site, but the result never gets back.
> >
> > By never, do you mean more than 1 minute or less?
> >
>
> More than 5 minutes.
That's "never" enough.
>
> >> This is on
> >> tg-login.uc.teragrid.org, so there should be no problem with firewall.
> >
> > Though there might still be a problem with GLOBUS_HOSTNAME. Is that set
> > properly? Can you do the telnet thingy?
> >
>
> GLOBUS_HOSTNAME was not set. I set it, but nothing changed.
> GLOBUS_TCP_PORT_RANGE is set to "50000,51000", port 50000 is open,
> yes, I can telnet to that port.
Ok, can you send me a link to the swift log for the run that behaves
badly?
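
For reference, the "telnet thingy" boils down to roughly the sketch below.
It is only an illustration: the function names are made up for this
example and are not part of Swift or Globus, and it assumes
GLOBUS_TCP_PORT_RANGE holds two ports separated by a comma or a space, as
in the "50000,51000" value quoted above.

```python
import socket


def parse_port_range(value):
    """Parse a GLOBUS_TCP_PORT_RANGE-style value such as "50000,51000".

    Hypothetical helper: returns the (low, high) ports as integers.
    """
    # Accept either a comma or whitespace between the two port numbers.
    low, high = (int(part) for part in value.replace(",", " ").split())
    if not (0 < low <= high <= 65535):
        raise ValueError("invalid port range: %r" % (value,))
    return low, high


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds in time.

    This is the same check as "telnet host port": it only shows that
    something is listening and reachable, nothing about GRAM itself.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, host unreachable, timeout, and so on.
        return False
```

Run it from the remote site against the submit host, for example
can_connect(<value of GLOBUS_HOSTNAME>, 50000), while something is
listening on that port on the submit side.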
>
> >>
> >> The only reason I would like to get this working for pre-WS is because
> >> my true goal is to be able to run MPI job, and be able to pass node
> >> type to PBS. The only way to specify host type for GT4 GRAM is through
> >> Job description extensions (see
> >> http://www.globus.org/toolkit/docs/4.0/execution/wsgram/WS_GRAM_Job_Desc_Extensions.html),
> >> and I don't know how this can be translated into tc.data. With GT2, I
> >> can simply use "host_count=2:compute".
> >
> > I think you should be able to use host_types=compute with both pre-WS
> > GRAM and WS-GRAM.
> >
>
> Ok, so I added this line to tc.data:
>
> UC-GT4 echo_gt4_mpi /bin/echo INSTALLED INTEL32::LINUX GLOBUS::host_count="2:compute",jobType=mpi
I'm a bit unsure whether that would work. I think that
host_types="ia32-compute" has worked in the past on that site.
Mihael
>
> and ran first.swift with echo_gt4_mpi. The result is very strange.
>
> In the qstat I see one job running on 1 node (the node type is NOT
> "compute"), then it finishes, then Swift script reports "Final status:
> Finished successfully:1", and then I see SECOND job with 1 node
> running in qstat.
>
> Then I tried adding ",count=2" at the end of the GLOBUS attributes.
> Then I saw first job running on 2 nodes (again, node types were not
> what I requested), then again Swift finished, and again 1-node job
> started and finished....
>
> So, I see two problems: 1) problems using jobmanager-pbs with pre-WS
> GRAM, and 2) problems passing arguments to Globus for running MPI-type
> jobs.
>
> Note, I can request resources correctly and run MPI jobs when using
> RSL job description with globusrun, and using XML (with job
> description extensions) and globusrun-ws. I can post those, if you are
> interested. Therefore, I do not think the second problem I mentioned
> is a GT problem.
>
> Andrey