[Swift-devel] Re: Coasters failing on Teraport - cant find Java?
Allan Espinosa
aespinosa at cs.uchicago.edu
Tue Jan 27 15:05:57 CST 2009
Hi Mike,
I emailed Teraport support directly to have my DOEgrids DN added to
the gridmap file, so my jobs are being executed under my own
username (aespinosa).
As of now, I can only submit to OSG sites supporting the OSGEDU VO.
If I remember correctly, we put OSG as my VO when applying for the
DOEgrids certificate. Then I just emailed Alina to include my DN in
the OSGEDU VO member list. I still need to email OSG operations to
follow up on the status of my VO application.
For the sites.xml, I think you need to specify the filesystem provider,
which sets up the environment for the coaster (based on what I
understood from the documentation). Below is my sites.xml, with the
filesystem line marked by *****:
<config>
<pool handle="Teraport" sysinfo="INTEL32::LINUX">
<profile namespace="globus" key="queue">fast</profile>
<profile namespace="globus" key="maxwalltime">00:01:00</profile>
<gridftp url="gsiftp://tp-grid1.ci.uchicago.edu/disks/tp-gpfs/scratch/aespinosa"
storage="/opt/osg/data/aespinosa" major="2" minor="2" patch="4">
</gridftp>
<execution provider="coaster" url="tp-grid1.uchicago.edu"
jobmanager="gt2:gt2:pbs" />
<filesystem provider="coaster" url="gt2://tp-grid1.uchicago.edu" /> *****
<workdirectory >/disks/tp-gpfs/scratch/aespinosa</workdirectory>
</pool>
</config>
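
To see which PATH entry (if any) provides java in a given environment,
a small shell helper can walk the PATH by hand. This is only a sketch:
find_java_dir is a hypothetical name, not part of Swift or Globus, and
it checks whatever PATH string you pass it, so it could be run under
both /bin/sh -l and plain /bin/sh to compare the two cases quoted below.

```shell
#!/bin/sh
# Hypothetical helper (not part of Swift or Globus): print the first
# directory in a colon-separated PATH string that holds an executable
# file named "java". Returns 1 if no such directory is found.
find_java_dir() {
    search_path=$1
    old_ifs=$IFS
    IFS=:
    for dir in $search_path; do
        # Require an executable regular entry, not a directory named java.
        if [ -x "$dir/java" ] && [ ! -d "$dir/java" ]; then
            IFS=$old_ifs
            printf '%s\n' "$dir"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Usage: check the PATH of the current shell.
if java_dir=$(find_java_dir "$PATH"); then
    echo "java found in $java_dir"
else
    echo "no java in PATH"
fi
```

If the helper reports no java under the login shell but finds one
without -l, that would match the difference in the quoted output.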
-Allan
On Tue, Jan 27, 2009 at 1:41 PM, Michael Wilde <wilde at mcs.anl.gov> wrote:
> When I run /bin/sh *without* the "-l" option, under globus, I do get a java
> in my path.
>
> Allan: what VO did you run on when you got a successful gt2:gt2:pbs coaster
> run on teraport, after you fixed the walltime issue?
>
>
> My sites.xml is:
>
> <config>
> <pool handle="teraport" >
> <profile namespace="globus" key="queue">fast</profile>
> <profile namespace="globus" key="maxwalltime">00:05:00</profile>
> <gridftp url="gsiftp://tp-grid1.ci.uchicago.edu" />
> <execution provider="coaster"
> url="tp-grid1.ci.uchicago.edu"
> jobmanager="gt2:gt2:pbs" />
> <workdirectory>/gpfs1/osg/data/oops/swiftwork</workdirectory>
> </pool>
> </config>
>
> I get this on stdout/err:
>
> ---------------------------------------------
> Swift 0.8rc1 swift-r2448 cog-r2261
>
> RunID: 20090127-1305-hcxdpor3
> Progress:
> Progress: Selecting site:2 Stage in:1 Initializing site shared directory:1
> Progress: Selecting site:2 Stage in:1 Submitting:1
> Progress: Selecting site:2 Submitting:1 Submitted:1
> Failed to transfer wrapper log from oops5-20090127-1305-hcxdpor3/info/a on
> teraport
> Execution failed:
> Exception in runoops:
> Arguments: [input/fasta/T1af7.fasta, input/secseq/T1af7.secseq,
> input/native/T1af7.pdb, output/T1af7.1.pdt, output/T1af7.1.rmsd, 1, [TEMP
> UPDATE INTERVAL = 10, SMOOTH DEVIATION COEFFICIENT = 0.80001]]
> Host: teraport
> Directory: oops5-20090127-1305-hcxdpor3/jobs/a/runoops-asq0ir5j
> stderr.txt:
>
> stdout.txt:
>
> ----
>
> Caused by:
> Could not submit job
> Caused by:
> Could not start coaster service
> Caused by:
> Task ended before registration was received.
> STDOUT: which: no java in
> (/usr/kerberos/bin:/bin:/usr/bin:/usr/X11R6/bin:/usr/local/bin:/software/common/softenv-1.6.0-r1/bin:/home/osgvo/osg/bin/linux-rhel4-x86_64:/home/osgvo/osg/bin:/soft/xcat-1.2.0-r1/bin:/soft/xcat-1.2.0-r1/sbin:/soft/xcat-1.2.0-r1/x86_64/bin:/soft/xcat-1.2.0-r1/x86_64/sbin:/soft/xcat-1.2.0-r1/contrib/bin:/soft/xcat-1.2.0-r1/contrib/sbin:/soft/xcat-1.2.0-r1/contrib/x86_64/bin:/soft/xcat-1.2.0-r1/contrib/x86_64/sbin)
> dirname: too few arguments
> Try `dirname --help' for more information.
> http://tp-login2.ci.uchicago.edu:50001: line 55: -Djava.home=/..: No such
> file or directory
>
> STDERR: null
> Cleaning up...
> Done
>
> ------------------------------------
>
> Checking out the environment with this cert I see:
>
> tp$ globus-job-run tp-grid1.ci.uchicago.edu /bin/sh -l -c 'java -version'
> /bin/sh: java: command not found
>
>
> tp$ globus-job-run tp-grid1.ci.uchicago.edu /bin/sh -c 'java -version'
> java version "1.5.0_14"
> Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_14-b03)
> Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_14-b03, mixed mode)
>
>
> tp$ globus-job-run tp-grid1.ci.uchicago.edu /bin/sh -l -c 'which java; echo
> JAVA_HOME IS: $JAVA_HOME;echo PATH IS: $PATH'
> JAVA_HOME IS:
> PATH IS:
> /usr/kerberos/bin:/bin:/usr/bin:/usr/X11R6/bin:/usr/local/bin:/software/common/softenv-1.6.0-r1/bin:/home/osgvo/osg/bin/linux-rhel4-x86_64:/home/osgvo/osg/bin:/soft/xcat-1.2.0-r1/bin:/soft/xcat-1.2.0-r1/sbin:/soft/xcat-1.2.0-r1/x86_64/bin:/soft/xcat-1.2.0-r1/x86_64/sbin:/soft/xcat-1.2.0-r1/contrib/bin:/soft/xcat-1.2.0-r1/contrib/sbin:/soft/xcat-1.2.0-r1/contrib/x86_64/bin:/soft/xcat-1.2.0-r1/contrib/x86_64/sbin
> /usr/bin/which: no java in
> (/usr/kerberos/bin:/bin:/usr/bin:/usr/X11R6/bin:/usr/local/bin:/software/common/softenv-1.6.0-r1/bin:/home/osgvo/osg/bin/linux-rhel4-x86_64:/home/osgvo/osg/bin:/soft/xcat-1.2.0-r1/bin:/soft/xcat-1.2.0-r1/sbin:/soft/xcat-1.2.0-r1/x86_64/bin:/soft/xcat-1.2.0-r1/x86_64/sbin:/soft/xcat-1.2.0-r1/contrib/bin:/soft/xcat-1.2.0-r1/contrib/sbin:/soft/xcat-1.2.0-r1/contrib/x86_64/bin:/soft/xcat-1.2.0-r1/contrib/x86_64/sbin)
> tp$
>
>
> tp$ globus-job-run tp-grid1.ci.uchicago.edu /bin/sh -c 'which java; echo
> JAVA_HOME IS: $JAVA_HOME;echo PATH IS: $PATH'
>
> /opt/osg-ce-0.8.0-r1/jdk1.5/bin/java
> JAVA_HOME IS:
> PATH IS:
> /opt/osg-ce-0.8.0-r1/condor/sbin:/opt/osg-ce-0.8.0-r1/condor/bin:/opt/osg-ce-0.8.0-r1/apache/bin:/opt/osg-ce-0.8.0-r1/srm-v2-client/bin:/opt/osg-ce-0.8.0-r1/srm-v1-client/sbin:/opt/osg-ce-0.8.0-r1/srm-v1-client/bin:/opt/osg-ce-0.8.0-r1/wget/bin:/opt/osg-ce-0.8.0-r1/gums/scripts:/opt/osg-ce-0.8.0-r1/cert-scripts/bin:/opt/osg-ce-0.8.0-r1/glite/sbin:/opt/osg-ce-0.8.0-r1/glite/bin:/opt/osg-ce-0.8.0-r1/edg/sbin:/opt/osg-ce-0.8.0-r1/prima/bin:/opt/osg-ce-0.8.0-r1/mysql/bin:/opt/osg-ce-0.8.0-r1/logrotate/sbin:/opt/osg-ce-0.8.0-r1/ant/bin:/opt/osg-ce-0.8.0-r1/jdk1.5/bin:/opt/osg-ce-0.8.0-r1/gpt/sbin:/opt/osg-ce-0.8.0-r1/globus/bin:/opt/osg-ce-0.8.0-r1/globus/sbin:/software/linux-rhel4-x86_64/pacman-3.21-r1/bin:/opt/osg-ce-0.8.0-r1/vdt/sbin:/opt/osg-ce-0.8.0-r1/vdt/bin:/opt/osg-ce-0.8.0-r1/condor/sbin:/opt/osg-ce-0.8.0-r1/condor/bin:/opt/osg-ce-0.8.0-r1/apache/bin:/opt/osg-ce-0.8.0-r1/srm-v2-client/bin:/opt/osg-ce-0.8.0-r1/srm-v1-client/sbin:/opt/osg-ce-0.8.0-r1/srm-v1-client/bin:/opt
> /osg-ce-0.8.0-r1/wget/bin:/opt/osg-ce-0.8.0-r1/gums/scripts:/opt/osg-ce-0.8.0-r1/cert-scripts/bin:/opt/osg-ce-0.8.0-r1/glite/sbin:/opt/osg-ce-0.8.0-r1/glite/bin:/opt/osg-ce-0.8.0-r1/edg/sbin:/opt/osg-ce-0.8.0-r1/prima/bin:/opt/osg-ce-0.8.0-r1/mysql/bin:/opt/osg-ce-0.8.0-r1/logrotate/sbin:/opt/osg-ce-0.8.0-r1/ant/bin:/opt/osg-ce-0.8.0-r1/jdk1.5/bin:/opt/osg-ce-0.8.0-r1/gpt/sbin:/software/linux-rhel4-x86_64/pacman-3.21-r1/bin:/opt/osg-ce-0.8.0-r1/vdt/sbin:/opt/osg-ce-0.8.0-r1/vdt/bin:/sbin:/usr/sbin:/bin:/usr/bin:/usr/X11R6/bin
> tp$ globus-job-run tp-grid1.ci.uchicago.edu /bin/sh -c 'java -version'
> java version "1.5.0_14"
> Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_14-b03)
> Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_14-b03, mixed mode)
>
>
> - Mike
>
>
>
>
>
> On 1/24/09 5:03 PM, Allan Espinosa wrote:
>>
>> Hi,
>>
>> I am using Swift 0.8rc1. The same also happens with v0.7.
>>
>> I tried submitting a job from communicado to tp-grid1 (teraport) using
>> coasters. The swift runtime does not give any error, but it does not
>> finish either. Looking through the files received by the teraport
>> head node, I observed that swift keeps submitting gram jobs. It looks
>> like the submitted pbs scripts kept finishing / failing.
>>
>> Digging through ~/.globus/jobs/tp-grid1.uchicago.edu/*/scheduler* we
>> see that maxwalltime became 101:00 from 00:10:00 (in sites.xml)
>>
>> /usr/bin/perl "/home/aespinosa/.globus/coasters/cscript63266.pl"
>> "http://128.135.125.118:50001" "1728236079"
>> #! /bin/sh
>> # PBS batch job script built by Globus job manager
>> #
>> #PBS -S /bin/sh
>> #PBS -m n
>> #PBS -q fast
>> #PBS -l walltime=101:00
>> #PBS -o /dev/null
>> #PBS -e /dev/null
>> #PBS -l nodes=1
>> HOME="/home/aespinosa";
>> export HOME;
>> OSG_DATA="/gpfs1/osg/data";
>> ...
>> ...
>> counter=0
>> exit_code=0
>> while test $counter -lt 1; do
>> /bin/touch
>> /home/aespinosa/.globus/job/tp-grid1.ci.uchicago.edu/7432.1232837576/exit.$counter;
>>