[Swift-user] Swift job environment on NCSA Abe

Andriy Fedorov fedorov at cs.wm.edu
Tue Sep 23 10:12:14 CDT 2008


Hi,

I had difficulties running MPI jobs on Abe using the wrapper, so I
tried to debug the problem.

It appears that Swift jobs submitted to Abe do not get their environment
initialized properly.

Here's the test shell script I am trying to run on Abe:

[fedorov at TG/Abe:honest1 etc] cat /u/ac/fedorov/local/env_wrapper.sh
#!/usr/local/bin/bash
which mpirun
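
A slightly more verbose variant of this test script may make it easier to see what the job's shell actually inherits. This is only a sketch: `report_env` is a hypothetical helper name, and the script assumes nothing about Abe beyond what is shown here. It prints the exported PATH (via `env`, which lists only exported variables) and then looks up mpirun with the shell builtin `command -v` rather than the external `which`.

```shell
#!/bin/bash
# Diagnostic sketch; report_env is a hypothetical helper name.
# On Abe the interpreter line would be /usr/local/bin/bash, as in the script above.
report_env() {
    echo "== exported PATH =="
    # env lists only exported variables, i.e. what child processes inherit
    env | grep '^PATH=' || echo "PATH is not exported"
    echo "== mpirun lookup =="
    # command -v is a shell builtin, so it does not depend on an external 'which'
    command -v mpirun || echo "mpirun not found"
}

report_env
```

Running the same script both from a PBS job and from Swift would allow the two environments to be compared line by line.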

When I run this using a PBS script directly on Abe, I get this output:

----------------------------------------
Begin Torque Prologue (Tue Sep 23 09:49:36 2008)
Job ID:           545989
Username:         fedorov
Group:            bri
Job Name:         env.pbs
Limits:           ncpus=1,neednodes=abe0726,nodes=1,walltime=00:01:00
Job Queue:        normal
Account:          bri
Nodes:            abe0726
End Torque Prologue
----------------------------------------
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
/opt/mpich-vmi-2.2.0-3-gcc-ofed-1.2/bin/mpirun
----------------------------------------
Begin Torque Epilogue (Tue Sep 23 09:49:39 2008)
Job ID:           545989
Username:         fedorov
Group:            bri
Job Name:         env.pbs
Session:          721
Limits:           ncpus=1,nodes=1,walltime=00:01:00
Resources:        cput=00:00:00,mem=2960kb,vmem=13044kb,walltime=00:00:03
Job Queue:        normal
Account:          bri
Nodes:            abe0726
Killing leftovers...

End Torque Epilogue
----------------------------------------

Now, when I try to run the same script from Swift, I get:

Execution failed:
        Exception in ABE_env_wrapped:
Arguments: []
Host: Abe-GT4
Directory: site_test-20080923-1107-or2r0ut7/jobs/9/ABE_env_wrapped-9doxytzi
stderr.txt: which: no mpirun in ((null))

stdout.txt:
----

Caused by:
        Exit code 1
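
The `((null))` in the stderr line most likely means that PATH was not set at all in the environment `which` inherited. As a possible stopgap, the wrapper could set up PATH itself before calling any MPI tools. The sketch below is an assumption-laden illustration, not a confirmed fix: `ensure_path` and `prepend_path` are hypothetical helper names, the fallback directory list is a guess, and the MPI directory is simply the one reported by the direct PBS run.

```shell
#!/bin/bash
# Workaround sketch; helper names and the fallback list are assumptions.

# Give PATH a minimal value if the job started without one.
ensure_path() {
    local fallback="/usr/local/bin:/usr/bin:/bin"   # assumed system directories
    if [ -z "${PATH:-}" ]; then
        PATH="$fallback"
    fi
    export PATH
}

# Put a directory at the front of PATH unless it is already listed.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

ensure_path
# Directory taken from the mpirun location printed by the direct PBS run
prepend_path /opt/mpich-vmi-2.2.0-3-gcc-ofed-1.2/bin
command -v mpirun || echo "mpirun still not found" >&2
```

Hard-coding the MPI directory ties the wrapper to one installation, so this would only be useful until the underlying environment problem is understood.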

Here's the relevant line from tc.data:

[fedorov at mistral runs.d] grep ABE_env ~/local/vdsk-0.6/etc/tc.data
Abe-GT4 ABE_env_wrapped /u/ac/fedorov/local/env_wrapper.sh INSTALLED INTEL32::LINUX null
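
For readers unfamiliar with the catalog layout, the entry is a single line of six whitespace-separated columns. The sketch below just labels them; `describe_tc_entry` is a hypothetical helper, and the column labels are informal descriptions rather than names taken from the Swift documentation.

```shell
#!/bin/bash
# Sketch: label the columns of one tc.data entry.
# describe_tc_entry and the labels are illustrative assumptions.
describe_tc_entry() {
    echo "$1" | awk '{
        printf "site: %s\n", $1
        printf "application: %s\n", $2
        printf "path: %s\n", $3
        printf "status: %s\n", $4
        printf "platform: %s\n", $5
        printf "profile: %s\n", $6
    }'
}

describe_tc_entry "Abe-GT4 ABE_env_wrapped /u/ac/fedorov/local/env_wrapper.sh INSTALLED INTEL32::LINUX null"
```

The last column is `null` in this entry, so nothing site-specific is being passed to the job through the catalog.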


I have this problem only on Abe.

--
Andrey Fedorov

Center for Real-Time Computing
College of William and Mary
http://www.cs.wm.edu/~fedorov


