[Swift-user] Need help debugging strange problem...

Andriy Fedorov fedorov at cs.wm.edu
Thu Aug 7 09:47:52 CDT 2008


Hi,

I have a Swift script that runs fine on the UC TG site, and now I am
trying to add NCSA to the set of execution sites. I am running into
some strange problems, and I am not sure how to debug them.

First, I submit a simple script (below) to NCSA Mercury with the GT4
Fork jobmanager, and it works. When I change the jobmanager from "fork"
to "PBS", the Swift execution does not finish after the PBS job
completes. I see the job submitted, queued in PBS, running, and
completing, and I see that the output file is produced in the scratch
directory, but on the submission site Swift still reports
"Progress: Executing:1". The submission site is the same as for the
example with the "fork" jobmanager, so I don't see how a firewall can
be the issue, and I can telnet to the submission site from NCSA.

Note that I was able to run the same simple test with both the fork and
PBS jobmanagers on the SDSC TG site.

How can I figure out what is wrong with NCSA Mercury?


sites.xml: (as in http://www.teragrid.org/userinfo/jobs/gram.php)

<pool handle="NCSA-GT4">
  <gridftp url="gsiftp://gridftp-hg.ncsa.teragrid.org:2811/" />
  <!-- jobmanager below is the value I switch between "PBS" and "fork" -->
  <execution provider="gt4" jobmanager="PBS"
    url="https://grid-hg.ncsa.teragrid.org:8443/wsrf/services/ManagedJobFactoryService"/>
  <workdirectory>/home/ac/fedorov/scratch</workdirectory>
</pool>


tc.data:

NCSA-GT4   NCSA_hostname /sbin/ifconfig INSTALLED INTEL32::LINUX null

hello.swift:

type messagefile{}

(messagefile uc_hostname) hostname2(){
  app{
    NCSA_hostname stdout=@filename(uc_hostname);
  }
}

messagefile uc_hostname<"uc_hostname.txt">;
messagefile ncsa_hostname<"ncsa_hostname.txt">;

ncsa_hostname = hostname2();

--
Andrey Fedorov

Center for Real-Time Computing
College of William and Mary
http://www.cs.wm.edu/~fedorov


