[Swift-user] Running first.swift remotely on NCSA

Andriy Fedorov fedorov at cs.wm.edu
Fri Jun 20 16:08:57 CDT 2008


>> Note, that I can see my job started and completed with "qstat" on UC
>> site, but the result never gets back.
>
> By never, do you mean more than 1 minute or less?
>

More than 5 minutes.

>>  This is on
>> tg-login.uc.teragrid.org, so there should be no problem with firewall.
>
> Though there might still be a problem with GLOBUS_HOSTNAME. Is that set
> properly? Can you do the telnet thingy?
>

GLOBUS_HOSTNAME was not set. I set it, but nothing changed.
GLOBUS_TCP_PORT_RANGE is set to "50000,51000". Port 50000 is open,
and yes, I can telnet to that port.
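
For reference, the "telnet thingy" boils down to opening a TCP
connection to a port in the callback range. A minimal Python sketch of
that check follows. The function names are made up for illustration,
and I am assuming the range variable uses the simple "min,max" form
shown above.

```python
import socket


def parse_port_range(value):
    """Split a "min,max" string (the form used above) into two ints."""
    low, high = (int(part) for part in value.split(","))
    return low, high


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Calling port_is_open(host, 50000) from a machine at the remote site,
with host being whatever GLOBUS_HOSTNAME is set to, should return True
if the callback port is reachable.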

>>
>> The only reason I would like to get this working for pre-WS is because
>> my true goal is to be able to run MPI job, and be able to pass node
>> type to PBS. The only way to specify host type for GT4 GRAM is through
>> Job description extensions (see
>> http://www.globus.org/toolkit/docs/4.0/execution/wsgram/WS_GRAM_Job_Desc_Extensions.html),
>> and I don't know how this can be translated into tc.data. With GT2, I
>> can simply use "host_count=2:compute".
>
> I think you should be able to use host_types=compute with both pre-WS
> GRAM and WS-GRAM.
>

Ok, so I added this line to tc.data:

UC-GT4  echo_gt4_mpi  /bin/echo  INSTALLED  INTEL32::LINUX  GLOBUS::host_count="2:compute",jobType=mpi

and ran first.swift with echo_gt4_mpi. The result is very strange.

In qstat I see one job running on 1 node (the node type is NOT
"compute"). It finishes, the Swift script reports
"Final status: Finished successfully:1", and then I see a SECOND job
running on 1 node in qstat.

Then I tried adding ",count=2" at the end of the GLOBUS attributes.
This time the first job ran on 2 nodes (again, the node types were not
what I requested), then Swift finished, and again a 1-node job
started and finished.
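
To make explicit which attributes I expected to reach PBS, here is a
small Python sketch that splits the profile column of the tc.data entry
above into key/value pairs. It is only an illustration, not how Swift
itself parses tc.data, and parse_profile is a hypothetical helper.

```python
import re

# key=value pairs separated by commas; a value may be double-quoted.
_PAIR = re.compile(r'(\w+)=("[^"]*"|[^,]*)')


def parse_profile(column):
    """Split 'NAMESPACE::k1=v1,k2=v2' into (namespace, {k: v})."""
    namespace, _, body = column.partition("::")
    # Strip the surrounding quotes from quoted values such as "2:compute".
    attrs = {key: value.strip('"') for key, value in _PAIR.findall(body)}
    return namespace, attrs


first_try = parse_profile('GLOBUS::host_count="2:compute",jobType=mpi')
second_try = parse_profile('GLOBUS::host_count="2:compute",jobType=mpi,count=2')
```

Here first_try holds host_count and jobType, and second_try adds count.
Judging by qstat, only count seems to have had any effect on the
submitted job.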

So, I see two problems: 1) problems using jobmanager-pbs with pre-WS
GRAM, and 2) problems passing arguments to Globus for running MPI-type
jobs.

Note that I can request resources correctly and run MPI jobs both with
an RSL job description and globusrun, and with an XML job description
(with job description extensions) and globusrun-ws. I can post those if
you are interested. Therefore, I do not think the second problem I
mentioned is a GT problem.

Andrey


