[Swift-devel] Re: catsn on beagle
ketan
ketancmaheshwari at gmail.com
Sun Jun 5 20:13:44 CDT 2011
I do not know the reason why you are getting this. The PBS submit throws
the following stderr:
aprun: Apid 327667: Caught signal Terminated, sending to application
Also, on another Swift submission of mine, I am getting this as the PBS submit stderr:
[NID 00065] 2011-06-05 19:34:26 distributeControlMsg: Apid 327688 write
failure to node 66, 10.128.0.67, port 607, Connection reset by peer
I suspect the last maintenance/upgrade of Beagle might have caused this.
Ketan
On 6/5/11 1:35 PM, Jonathan S Monette wrote:
> I still have problems running this script. It does not seem to ever
> execute. I get this on stdout:
>
> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified
> locally)
>
> RunID: 20110604-2338-5eik2hbb
> Progress:
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Canceling job 173155.sdb
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Canceling job 173182.sdb
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
> Progress: Submitted:1
>
> I had to C-c the run. I would check qstat and it would say the job
> was executing in the development queue but it seems like it never ran.
> Also it seems that the coasters job was cancelled twice during the
> Swift execution.
>
> On Sat, Jun 4, 2011 at 1:12 PM, Jonathan S Monette <jonmon at utexas.edu
> <mailto:jonmon at utexas.edu>> wrote:
>
> Ok. Thanks. The (none) also appeared when I was on
> login1.beagle.ci.uchicago.edu. Thanks Ketan.
>
>
> On Sat, Jun 4, 2011 at 1:09 PM, ketan <ketancmaheshwari at gmail.com
> <mailto:ketancmaheshwari at gmail.com>> wrote:
>
> Yes, you should bring this to the attention of beagle sysadmins.
>
>
> On 6/4/11 1:01 PM, Jonathan S Monette wrote:
>> Ok. I have my job submitted and it is currently waiting in
>> the queue. Besides manually resetting this variable
>> every time I am on beagle, what else can be done? Is this
>> something that I should bring to the attention of the
>> sysadmins for beagle?
>>
>> On Sat, Jun 4, 2011 at 12:51 PM, Ketan Maheshwari
>> <ketancmaheshwari at gmail.com
>> <mailto:ketancmaheshwari at gmail.com>> wrote:
>>
>> Alright, so this is the issue. GLOBUS_HOSTNAME is picked
>> up from HOSTNAME, and I do not know where this (none)
>> comes from in your case.
>>
>> In my case I get hostname: login2.beagle.ci.uchicago.edu
>>
>> See if you can manually change this env variable and try
>> to run again.
>>
>>
>> On Sat, Jun 4, 2011 at 12:49 PM, Jonathan S Monette
>> <jonmon at utexas.edu <mailto:jonmon at utexas.edu>> wrote:
>>
>> login2.beagle.ci.uchicago.edu.(none)
>>
>> When you ran your test, was it on login1 or login2?
>>
>>
>> On Sat, Jun 4, 2011 at 12:47 PM, Ketan Maheshwari
>> <ketancmaheshwari at gmail.com
>> <mailto:ketancmaheshwari at gmail.com>> wrote:
>>
>> What does
>>
>> echo $HOSTNAME give?
>>
>>
>> On Sat, Jun 4, 2011 at 12:44 PM, Jonathan S
>> Monette <jonmon at utexas.edu
>> <mailto:jonmon at utexas.edu>> wrote:
>>
>> I am not sure why that (none) appears. Is
>> this a variable that the system sets or do I
>> need to set it?
>>
>>
>> On Sat, Jun 4, 2011 at 12:39 PM, Ketan
>> Maheshwari <ketancmaheshwari at gmail.com
>> <mailto:ketancmaheshwari at gmail.com>> wrote:
>>
>> This surprises me because I do not see
>> that (none) in my command line.
>>
>>
>> On Sat, Jun 4, 2011 at 12:38 PM, Jonathan
>> S Monette <jonmon at utexas.edu
>> <mailto:jonmon at utexas.edu>> wrote:
>>
>> I added
>> echo ${OPTIONS}
>> echo ${COG_OPTS}
>> echo ${LOCALCLASSPATH}
>> echo ${EXEC}
>> echo ${CMDLINE}
>>
>> to the script. It seems in the
>> $OPTIONS variable this appears.
>>
>> -Xmx8192M
>> -Djava.endorsed.dirs=/home/jonmon/Library/Swift/swift-0.92/bin/../lib/endorsed
>> -DUID=1881
>> -DGLOBUS_HOSTNAME=login2.beagle.ci.uchicago.edu.(none)
>> -DCOG_INSTALL_PATH=/home/jonmon/Library/Swift/swift-0.92/bin/..
>> -Dswift.home=/home/jonmon/Library/Swift/swift-0.92/bin/..
>> -Djava.security.egd=file:///dev/urandom
>>
>> There is the line
>> -DGLOBUS_HOSTNAME=login2.beagle.ci.uchicago.edu.(none)
>> I believe the parentheses are what is
>> causing this to fail. Now, how do I
>> fix it?
>>
>> On Sat, Jun 4, 2011 at 12:36 PM,
>> Jonathan S Monette <jonmon at utexas.edu
>> <mailto:jonmon at utexas.edu>> wrote:
>>
>> I may have found the error. I
>> downloaded the 0.92 binaries and
>> changed the swift command script.
>> I added
>>
>>
>> On Sat, Jun 4, 2011 at 12:26 PM,
>> Jonathan S Monette
>> <jonmon at utexas.edu
>> <mailto:jonmon at utexas.edu>> wrote:
>>
>> Same result. Same error
>>
>>
>> On Sat, Jun 4, 2011 at 12:24
>> PM, ketan
>> <ketancmaheshwari at gmail.com
>> <mailto:ketancmaheshwari at gmail.com>>
>> wrote:
>>
>> Can you try running the
>> command line by copying it
>> from run.sh and pasting
>> it at the prompt?
>>
>>
>> On 6/4/11 12:18 PM,
>> Jonathan S Monette wrote:
>>> I do have that project.
>>> I did projects --avail
>>> and got back
>>>
>>> Project        PI              Title
>>> ------------------------------------------------------------------------------
>>> CI-CCR000013   Michael Wilde   The Swift Parallel Scripting System
>>>
>>> Here is my run.sh file.
>>> I execute with ./run.sh
>>> after I did chmod +x run.sh
>>>
>>> #!/bin/bash
>>> swift -config cf -tc.file tc -sites.file beagle-coaster.xml catsn.swift -n=1
>>>
>>> I do have sh and cat in
>>> my path. I can execute
>>> them. Here is what
>>> which sh and which cat
>>> produced.
>>>
>>> sh is /usr/bin/sh
>>> cat is /bin/cat
>>>
>>> On Sat, Jun 4, 2011 at
>>> 12:10 PM, ketan
>>> <ketancmaheshwari at gmail.com
>>> <mailto:ketancmaheshwari at gmail.com>>
>>> wrote:
>>>
>>> Can you try the
>>> "projects --avail"
>>> command on beagle to
>>> see if you are a
>>> member of a project?
>>>
>>> Else you will need
>>> to request
>>> membership. You can
>>> do this from this page:
>>>
>>> http://pads.ci.uchicago.edu/access/
>>>
>>> In any case, your error does not seem to be caused by the
>>> above. It looks like a '(' has sneaked in because of some
>>> unexpected typo in the command line. Can you double-check it?
>>>
>>> Also, can you make sure that all the files (tc,
>>> beagle-coasters.xml, cf) are in PATH, i.e. are you able
>>> to access them without ./ ?
>>>
>>>
>>> On 6/4/11 12:03 PM,
>>> Jonathan S Monette
>>> wrote:
>>>> I did change the
>>>> work directory. I
>>>> however did not
>>>> change the project
>>>> name. I do not
>>>> know my project
>>>> name so I kept the
>>>> same project that
>>>> was in there. Here
>>>> is my modified
>>>> beagle-coasters.xml
>>>>
>>>> <config>
>>>>   <pool handle="pbs">
>>>>     <execution provider="coaster" jobmanager="local:pbs"/>
>>>>     <profile namespace="globus" key="project">CI-CCR000013</profile>
>>>>
>>>>     <profile namespace="globus" key="ppn">24:cray:pack</profile>
>>>>
>>>>     <profile namespace="globus" key="workersPerNode">24</profile>
>>>>     <profile namespace="globus" key="maxTime">1000</profile>
>>>>     <profile namespace="globus" key="slots">1</profile>
>>>>     <profile namespace="globus" key="nodeGranularity">1</profile>
>>>>     <profile namespace="globus" key="maxNodes">1</profile>
>>>>
>>>>     <profile namespace="karajan" key="jobThrottle">.63</profile>
>>>>     <profile namespace="karajan" key="initialScore">10000</profile>
>>>>
>>>>     <filesystem provider="local"/>
>>>>     <workdirectory>/lustre/beagle/jonmon/Swift/work</workdirectory>
>>>>   </pool>
>>>> </config>
>>>>
>>>>
>>>> On Sat, Jun 4, 2011
>>>> at 12:00 PM, Ketan
>>>> Maheshwari
>>>> <ketancmaheshwari at gmail.com
>>>> <mailto:ketancmaheshwari at gmail.com>>
>>>> wrote:
>>>>
>>>> Jon,
>>>>
>>>> Thanks for
>>>> trying out
>>>> catsn on Beagle.
>>>>
>>>> I just tried it myself but could not reproduce the error you are
>>>> getting. Have you made the changes that are mentioned in the README:
>>>> ==
>>>> Change workdir location in beagle-coaster.xml
>>>> Change the project entry in the beagle-coaster.xml to your project name
>>>> ==
>>>>
>>>> Ketan
>>>>
>>>>
>>>> On Sat, Jun 4,
>>>> 2011 at 11:50
>>>> AM, Jonathan S
>>>> Monette
>>>> <jonmon at utexas.edu
>>>> <mailto:jonmon at utexas.edu>>
>>>> wrote:
>>>>
>>>> Hello,
>>>> I am trying to run the catsn test on beagle using the files in
>>>> ~ketan/catsn. I have copied this directory over to my home
>>>> directory and I believe I set it up correctly. I did module load
>>>> swift and then ran run.sh that was in this directory. I get this
>>>> error.
>>>>
>>>> /soft/swift/0.92/bin/swift:
>>>> eval: line
>>>> 152: syntax
>>>> error near
>>>> unexpected
>>>> token `('
>>>> /soft/swift/0.92/bin/swift:
>>>> eval: line
>>>> 152: `java
>>>> -Xmx8192M
>>>> -Djava.endorsed.dirs=/soft/swift/0.92/bin/../lib/endorsed
>>>> -DUID=1881
>>>> -DGLOBUS_HOSTNAME=login2.beagle.ci.uchicago.edu.(none)
>>>> -DCOG_INSTALL_PATH=/soft/swift/0.92/bin/..
>>>> -Dswift.home=/soft/swift/0.92/bin/..
>>>> -Djava.security.egd=file:///dev/urandom
>>>> -classpath
>>>> /soft/swift/0.92/bin/../etc:/soft/swift/0.92/bin/../libexec:/soft/swift/0.92/bin/../lib/addressing-1.0.jar:/soft/swift/0.92/bin/../lib/ant.jar:/soft/swift/0.92/bin/../lib/antlr-2.7.5.jar:/soft/swift/0.92/bin/../lib/axis.jar:/soft/swift/0.92/bin/../lib/axis-url.jar:/soft/swift/0.92/bin/../lib/backport-util-concurrent.jar:/soft/swift/0.92/bin/../lib/castor-0.9.6.jar:/soft/swift/0.92/bin/../lib/coaster-bootstrap.jar:/soft/swift/0.92/bin/../lib/cog-abstraction-common-2.4.jar:/soft/swift/0.92/bin/../lib/cog-axis.jar:/soft/swift/0.92/bin/../lib/cog-grapheditor-0.47.jar:/soft/swift/0.92/bin/../lib/cog-jglobus-1.7.0.jar:/soft/swift/0.92/bin/../lib/cog-karajan-0.36-dev.jar:/soft/swift/0.92/bin/../lib/cog-provider-clref-gt4_0_0.jar:/soft/swift/0.92/bin/../lib/cog-provider-coaster-0.3.jar:/soft/swift/0.92/bin/../lib/cog-provider-dcache-0.1.jar:/soft/swift/0.92/bin/../lib/cog-provider-gt2-2.4.jar:/soft/swift/0.92/bin/../lib/cog-provider-gt4_0_0-2.5.jar:/soft/swift/0.92/bin/../lib/cog-pro
>>>> vider-local-2.2.jar:/soft/swift/0.92/bin/../lib/cog-provider-localscheduler-0.4.jar:/soft/swift/0.92/bin/../lib/cog-provider-ssh-2.4.jar:/soft/swift/0.92/bin/../lib/cog-provider-webdav-2.1.jar:/soft/swift/0.92/bin/../lib/cog-resources-1.0.jar:/soft/swift/0.92/bin/../lib/cog-swift-svn.jar:/soft/swift/0.92/bin/../lib/cog-trap-1.0.jar:/soft/swift/0.92/bin/../lib/cog-url.jar:/soft/swift/0.92/bin/../lib/cog-util-0.92.jar:/soft/swift/0.92/bin/../lib/commonj.jar:/soft/swift/0.92/bin/../lib/commons-beanutils.jar:/soft/swift/0.92/bin/../lib/commons-collections-3.0.jar:/soft/swift/0.92/bin/../lib/commons-digester.jar:/soft/swift/0.92/bin/../lib/commons-discovery.jar:/soft/swift/0.92/bin/../lib/commons-httpclient.jar:/soft/swift/0.92/bin/../lib/commons-logging-1.1.jar:/soft/swift/0.92/bin/../lib/concurrent.jar:/soft/swift/0.92/bin/../lib/cryptix32.jar:/soft/swift/0.92/bin/../lib/cryptix-asn1.jar:/soft/swift/0.92/bin/../lib/cryptix.jar:/soft/swift/0.92/bin/../lib/globus_delegation_servic
>>>> e.jar:/soft/swift/0.92/bin/../lib/globus_delegation_stubs.jar:/soft/swift/0.92/bin/../lib/globus_wsrf_mds_aggregator_stubs.jar:/soft/swift/0.92/bin/../lib/globus_wsrf_rendezvous_service.jar:/soft/swift/0.92/bin/../lib/globus_wsrf_rendezvous_stubs.jar:/soft/swift/0.92/bin/../lib/globus_wsrf_rft_stubs.jar:/soft/swift/0.92/bin/../lib/gram-client.jar:/soft/swift/0.92/bin/../lib/gram-stubs.jar:/soft/swift/0.92/bin/../lib/gram-utils.jar:/soft/swift/0.92/bin/../lib/j2ssh-common-0.2.2.jar:/soft/swift/0.92/bin/../lib/j2ssh-core-0.2.2-patch-b.jar:/soft/swift/0.92/bin/../lib/jakarta-regexp-1.2.jar:/soft/swift/0.92/bin/../lib/jakarta-slide-webdavlib-2.0.jar:/soft/swift/0.92/bin/../lib/jaxrpc.jar:/soft/swift/0.92/bin/../lib/jce-jdk13-131.jar:/soft/swift/0.92/bin/../lib/jgss.jar:/soft/swift/0.92/bin/../lib/jline-0.9.94.jar:/soft/swift/0.92/bin/../lib/jsr173_1.0_api.jar:/soft/swift/0.92/bin/../lib/jug-lgpl-2.0.0.jar:/soft/swift/0.92/bin/../lib/junit.jar:/soft/swift/0.92/bin/../lib/log4j-1.2
>>>> .8.jar:/soft/swift/0.92/bin/../lib/naming-common.jar:/soft/swift/0.92/bin/../lib/naming-factory.jar:/soft/swift/0.92/bin/../lib/naming-java.jar:/soft/swift/0.92/bin/../lib/naming-resources.jar:/soft/swift/0.92/bin/../lib/opensaml.jar:/soft/swift/0.92/bin/../lib/puretls.jar:/soft/swift/0.92/bin/../lib/resolver.jar:/soft/swift/0.92/bin/../lib/saaj.jar:/soft/swift/0.92/bin/../lib/stringtemplate.jar:/soft/swift/0.92/bin/../lib/vdldefinitions.jar:/soft/swift/0.92/bin/../lib/wsdl4j.jar:/soft/swift/0.92/bin/../lib/wsrf_core.jar:/soft/swift/0.92/bin/../lib/wsrf_core_stubs.jar:/soft/swift/0.92/bin/../lib/wsrf_mds_index_stubs.jar:/soft/swift/0.92/bin/../lib/wsrf_mds_usefulrp_schema_stubs.jar:/soft/swift/0.92/bin/../lib/wsrf_provider_jce.jar:/soft/swift/0.92/bin/../lib/wsrf_tools.jar:/soft/swift/0.92/bin/../lib/wss4j.jar:/soft/swift/0.92/bin/../lib/xalan.jar:/soft/swift/0.92/bin/../lib/xbean.jar:/soft/swift/0.92/bin/../lib/xbean_xpath.jar:/soft/swift/0.92/bin/../lib/xercesImpl.jar:/soft
>>>> /swift/0.92/bin/../lib/xml-apis.jar:/soft/swift/0.92/bin/../lib/xmlsec.jar:/soft/swift/0.92/bin/../lib/xpp3-1.1.3.4d_b4_min.jar:/soft/swift/0.92/bin/../lib/xstream-1.1.1-patched.jar:
>>>> org.griphyn.vdl.karajan.Loader
>>>> '-config'
>>>> 'cf'
>>>> '-tc.file'
>>>> 'tc'
>>>> '-sites.file'
>>>> 'beagle-coaster.xml'
>>>> 'catsn.swift'
>>>> '-n=1''
>>>>
>>>> Is there something else I need to load on beagle to make swift
>>>> run correctly?
>>>>
>>>> --
>>>>
>>>> Any intelligent fool can make things bigger and more complex... It
>>>> takes a touch of genius - and a lot of courage to move in the
>>>> opposite direction.
>>>>
>>>> - Albert Einstein
>>>>
>>>
>>
>
>
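
The manual workaround discussed in the thread (resetting the host name variable before running swift) could look roughly like the sketch below. It is not part of Swift: sanitize_hostname is a hypothetical helper name, and the sketch assumes the swift wrapper script builds -DGLOBUS_HOSTNAME from HOSTNAME (as stated in the thread) or honors a GLOBUS_HOSTNAME that is already set. It only strips a trailing ".(none)" and does not address why the login node reports it.

```shell
#!/bin/bash
# Hypothetical helper (not part of Swift): strip a trailing ".(none)"
# from a host name, leaving any other name unchanged.
sanitize_hostname() {
    local name="$1"
    local suffix='.(none)'
    # Quoting "$suffix" makes the parentheses match literally.
    printf '%s\n' "${name%"$suffix"}"
}

# Assumption: the swift script reads HOSTNAME and/or GLOBUS_HOSTNAME
# from the environment, so both are overridden here before running it.
HOSTNAME="$(sanitize_hostname "${HOSTNAME:-$(hostname)}")"
export HOSTNAME
export GLOBUS_HOSTNAME="$HOSTNAME"
echo "Using GLOBUS_HOSTNAME=$GLOBUS_HOSTNAME"
```

Sourcing these lines in the login shell (or placing them at the top of run.sh) before the swift command should keep the unquoted parentheses out of the java command line that the swift script passes to eval.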