[Swift-devel] Re: Coasters failing on Teraport - cant find Java?

Michael Wilde wilde at mcs.anl.gov
Tue Jan 27 15:16:48 CST 2009


Thanks, Allan.

So it would be interesting to probe an OSGEDU site with /bin/sh both 
with and without "-l" to see how the PATH is set there.
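
A minimal sketch of that probe, in case it helps. RUN is a hypothetical prefix variable that I'm introducing only for illustration: leave it empty to try the comparison locally, or set it to "globus-job-run <gatekeeper-host>" (host left as a placeholder, since I don't have an OSGEDU gatekeeper name handy) to run it the same way as the tests quoted below.

```shell
#!/bin/sh
# Compare the PATH seen by /bin/sh with and without "-l".
# RUN is a hypothetical prefix (an assumption, not part of Swift or Globus):
# empty means "run locally"; for a remote probe, set it to
# "globus-job-run <gatekeeper-host>".
RUN=${RUN:-}

probe_path() {
    # $1 is "-l" for a login shell, or empty for a non-login shell.
    # $RUN and $1 are deliberately unquoted so that empty values vanish.
    $RUN /bin/sh $1 -c 'echo "PATH IS: $PATH"; command -v java || echo "no java in PATH"'
}

echo "== non-login shell =="
probe_path ""
echo "== login shell =="
probe_path "-l"
```

If java shows up only in the non-login case, the login profile on that site is replacing the PATH that the job manager set up.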

Also, question for Ben/Mihael: for coasters, are the filesystem and 
gridftp tags meant to be mutually exclusive?

I'll send you mail off-list about the certs.

- Mike


On 1/27/09 3:05 PM, Allan Espinosa wrote:
> Hi Mike,
> 
> I actually emailed Teraport support directly to add my DOEgrids DN to
> the gridmap file, so my jobs are actually being executed under my
> username (aespinosa).
> 
> As of now, I can only submit to OSG sites supporting the OSGEDU VO.
> If I remember correctly, we placed OSG as my VO when applying for the
> DOEgrids certificate.  Then I just emailed Alina to include my DN in
> the OSGEDU VO member list.  I need to email OSG operations to follow
> up on the status of my VO application.
> 
> For the sites.xml, I think you need to specify the filesystem provider
> which sets up the environment for the coaster (based on what I
> understood from the documentation).  Below is my sites.xml:
> 
> <config>
> 
>   <pool handle="Teraport" sysinfo="INTEL32::LINUX">
>     <profile namespace="globus" key="queue">fast</profile>
>     <profile namespace="globus" key="maxwalltime">00:01:00</profile>
>     <gridftp  url="gsiftp://tp-grid1.ci.uchicago.edu/disks/tp-gpfs/scratch/aespinosa"
> storage="/opt/osg/data/aespinosa" major="2" minor="2" patch="4">
>     </gridftp>
>     <execution provider="coaster" url="tp-grid1.uchicago.edu"
> jobmanager="gt2:gt2:pbs" />
>     <filesystem provider="coaster" url="gt2://tp-grid1.uchicago.edu" /> *****
>     <workdirectory >/disks/tp-gpfs/scratch/aespinosa</workdirectory>
>   </pool>
> 
> </config>
> 
> 
> 
> -Allan
> 
> 
> 
> On Tue, Jan 27, 2009 at 1:41 PM, Michael Wilde <wilde at mcs.anl.gov> wrote:
>> When I run /bin/sh *without* the "-l" option, under globus, I do get a java
>> in my path.
>>
>> Allan: what VO did you run on when you got a successful gt2:gt2:pbs coaster
>> run on teraport, after you fixed the walltime issue?
>>
>>
>> My sites.xml is:
>>
>> <config>
>> <pool handle="teraport" >
>>  <profile namespace="globus" key="queue">fast</profile>
>>  <profile namespace="globus" key="maxwalltime">00:05:00</profile>
>>  <gridftp url="gsiftp://tp-grid1.ci.uchicago.edu" />
>>  <execution provider="coaster"
>>     url="tp-grid1.ci.uchicago.edu"
>>     jobmanager="gt2:gt2:pbs" />
>>  <workdirectory>/gpfs1/osg/data/oops/swiftwork</workdirectory>
>> </pool>
>> </config>
>>
>> I get this on stdout/err:
>>
>> ---------------------------------------------
>> Swift 0.8rc1 swift-r2448 cog-r2261
>>
>> RunID: 20090127-1305-hcxdpor3
>> Progress:
>> Progress:  Selecting site:2 Stage in:1 Initializing site shared directory:1
>> Progress:  Selecting site:2 Stage in:1 Submitting:1
>> Progress:  Selecting site:2 Submitting:1 Submitted:1
>> Failed to transfer wrapper log from oops5-20090127-1305-hcxdpor3/info/a on
>> teraport
>> Execution failed:
>>        Exception in runoops:
>> Arguments: [input/fasta/T1af7.fasta, input/secseq/T1af7.secseq,
>> input/native/T1af7.pdb, output/T1af7.1.pdt, output/T1af7.1.rmsd, 1, [TEMP
>> UPDATE INTERVAL = 10, SMOOTH DEVIATION COEFFICIENT = 0.80001]]
>> Host: teraport
>> Directory: oops5-20090127-1305-hcxdpor3/jobs/a/runoops-asq0ir5j
>> stderr.txt:
>>
>> stdout.txt:
>>
>> ----
>>
>> Caused by:
>>        Could not submit job
>> Caused by:
>>        Could not start coaster service
>> Caused by:
>>        Task ended before registration was received.
>> STDOUT: which: no java in
>> (/usr/kerberos/bin:/bin:/usr/bin:/usr/X11R6/bin:/usr/local/bin:/software/common/softenv-1.6.0-r1/bin:/home/osgvo/osg/bin/linux-rhel4-x86_64:/home/osgvo/osg/bin:/soft/xcat-1.2.0-r1/bin:/soft/xcat-1.2.0-r1/sbin:/soft/xcat-1.2.0-r1/x86_64/bin:/soft/xcat-1.2.0-r1/x86_64/sbin:/soft/xcat-1.2.0-r1/contrib/bin:/soft/xcat-1.2.0-r1/contrib/sbin:/soft/xcat-1.2.0-r1/contrib/x86_64/bin:/soft/xcat-1.2.0-r1/contrib/x86_64/sbin)
>> dirname: too few arguments
>> Try `dirname --help' for more information.
>> http://tp-login2.ci.uchicago.edu:50001: line 55: -Djava.home=/..: No such
>> file or directory
>>
>> STDERR: null
>> Cleaning up...
>>  Done
>>
>> ------------------------------------
>>
>> Checking out the environment with this cert I see:
>>
>> tp$ globus-job-run tp-grid1.ci.uchicago.edu /bin/sh -l -c 'java -version'
>> /bin/sh: java: command not found
>>
>>
>> tp$ globus-job-run tp-grid1.ci.uchicago.edu /bin/sh -c 'java -version'
>> java version "1.5.0_14"
>> Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_14-b03)
>> Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_14-b03, mixed mode)
>>
>>
>> tp$ globus-job-run tp-grid1.ci.uchicago.edu /bin/sh -l -c 'which java; echo
>> JAVA_HOME IS: $JAVA_HOME;echo PATH IS: $PATH'
>> JAVA_HOME IS:
>> PATH IS:
>> /usr/kerberos/bin:/bin:/usr/bin:/usr/X11R6/bin:/usr/local/bin:/software/common/softenv-1.6.0-r1/bin:/home/osgvo/osg/bin/linux-rhel4-x86_64:/home/osgvo/osg/bin:/soft/xcat-1.2.0-r1/bin:/soft/xcat-1.2.0-r1/sbin:/soft/xcat-1.2.0-r1/x86_64/bin:/soft/xcat-1.2.0-r1/x86_64/sbin:/soft/xcat-1.2.0-r1/contrib/bin:/soft/xcat-1.2.0-r1/contrib/sbin:/soft/xcat-1.2.0-r1/contrib/x86_64/bin:/soft/xcat-1.2.0-r1/contrib/x86_64/sbin
>> /usr/bin/which: no java in
>> (/usr/kerberos/bin:/bin:/usr/bin:/usr/X11R6/bin:/usr/local/bin:/software/common/softenv-1.6.0-r1/bin:/home/osgvo/osg/bin/linux-rhel4-x86_64:/home/osgvo/osg/bin:/soft/xcat-1.2.0-r1/bin:/soft/xcat-1.2.0-r1/sbin:/soft/xcat-1.2.0-r1/x86_64/bin:/soft/xcat-1.2.0-r1/x86_64/sbin:/soft/xcat-1.2.0-r1/contrib/bin:/soft/xcat-1.2.0-r1/contrib/sbin:/soft/xcat-1.2.0-r1/contrib/x86_64/bin:/soft/xcat-1.2.0-r1/contrib/x86_64/sbin)
>> tp$
>>
>>
>> tp$ globus-job-run tp-grid1.ci.uchicago.edu /bin/sh -c 'which java; echo
>> JAVA_HOME IS: $JAVA_HOME;echo PATH IS: $PATH'
>>
>> /opt/osg-ce-0.8.0-r1/jdk1.5/bin/java
>> JAVA_HOME IS:
>> PATH IS:
>> /opt/osg-ce-0.8.0-r1/condor/sbin:/opt/osg-ce-0.8.0-r1/condor/bin:/opt/osg-ce-0.8.0-r1/apache/bin:/opt/osg-ce-0.8.0-r1/srm-v2-client/bin:/opt/osg-ce-0.8.0-r1/srm-v1-client/sbin:/opt/osg-ce-0.8.0-r1/srm-v1-client/bin:/opt/osg-ce-0.8.0-r1/wget/bin:/opt/osg-ce-0.8.0-r1/gums/scripts:/opt/osg-ce-0.8.0-r1/cert-scripts/bin:/opt/osg-ce-0.8.0-r1/glite/sbin:/opt/osg-ce-0.8.0-r1/glite/bin:/opt/osg-ce-0.8.0-r1/edg/sbin:/opt/osg-ce-0.8.0-r1/prima/bin:/opt/osg-ce-0.8.0-r1/mysql/bin:/opt/osg-ce-0.8.0-r1/logrotate/sbin:/opt/osg-ce-0.8.0-r1/ant/bin:/opt/osg-ce-0.8.0-r1/jdk1.5/bin:/opt/osg-ce-0.8.0-r1/gpt/sbin:/opt/osg-ce-0.8.0-r1/globus/bin:/opt/osg-ce-0.8.0-r1/globus/sbin:/software/linux-rhel4-x86_64/pacman-3.21-r1/bin:/opt/osg-ce-0.8.0-r1/vdt/sbin:/opt/osg-ce-0.8.0-r1/vdt/bin:/opt/osg-ce-0.8.0-r1/condor/sbin:/opt/osg-ce-0.8.0-r1/condor/bin:/opt/osg-ce-0.8.0-r1/apache/bin:/opt/osg-ce-0.8.0-r1/srm-v2-client/bin:/opt/osg-ce-0.8.0-r1/srm-v1-client/sbin:/opt/osg-ce-0.8.0-r1/srm-v1-client/bin:/opt/osg-ce-0.8.0-r1/wget/bin:/opt/osg-ce-0.8.0-r1/gums/scripts:/opt/osg-ce-0.8.0-r1/cert-scripts/bin:/opt/osg-ce-0.8.0-r1/glite/sbin:/opt/osg-ce-0.8.0-r1/glite/bin:/opt/osg-ce-0.8.0-r1/edg/sbin:/opt/osg-ce-0.8.0-r1/prima/bin:/opt/osg-ce-0.8.0-r1/mysql/bin:/opt/osg-ce-0.8.0-r1/logrotate/sbin:/opt/osg-ce-0.8.0-r1/ant/bin:/opt/osg-ce-0.8.0-r1/jdk1.5/bin:/opt/osg-ce-0.8.0-r1/gpt/sbin:/software/linux-rhel4-x86_64/pacman-3.21-r1/bin:/opt/osg-ce-0.8.0-r1/vdt/sbin:/opt/osg-ce-0.8.0-r1/vdt/bin:/sbin:/usr/sbin:/bin:/usr/bin:/usr/X11R6/bin
>> tp$ globus-job-run tp-grid1.ci.uchicago.edu /bin/sh -c 'java -version'
>> java version "1.5.0_14"
>> Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_14-b03)
>> Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_14-b03, mixed mode)
>>
>>
>> - Mike
>>
>>
>>
>>
>>
>> On 1/24/09 5:03 PM, Allan Espinosa wrote:
>>> Hi,
>>>
>>> I am using Swift 0.8rc1.  The same also happens with v0.7.
>>>
>>> I tried submitting a job from communicado to tp-grid1 (teraport) using
>>> coasters.  The Swift runtime does not report any error, but it does not
>>> finish either.  Looking through the files received by the teraport
>>> head node, I observed that Swift keeps submitting GRAM jobs.  It looks
>>> like the submitted PBS scripts kept finishing or failing.
>>>
>>> Digging through ~/.globus/jobs/tp-grid1.uchicago.edu/*/scheduler*, we
>>> see that maxwalltime became 101:00 instead of 00:10:00 (in sites.xml):
>>>
>>> /usr/bin/perl "/home/aespinosa/.globus/coasters/cscript63266.pl"
>>> "http://128.135.125.118:50001" "1728236079"
>>> #! /bin/sh
>>> # PBS batch job script built by Globus job manager
>>> #
>>> #PBS -S /bin/sh
>>> #PBS -m n
>>> #PBS -q fast
>>> #PBS -l walltime=101:00
>>> #PBS -o /dev/null
>>> #PBS -e /dev/null
>>> #PBS -l nodes=1
>>> HOME="/home/aespinosa";
>>> export HOME;
>>> OSG_DATA="/gpfs1/osg/data";
>>> ...
>>> ...
>>> counter=0
>>> exit_code=0
>>> while test $counter -lt 1; do
>>>    /bin/touch
>>> /home/aespinosa/.globus/job/tp-grid1.ci.uchicago.edu/7432.1232837576/exit.$counter;
>>>


