[Swift-devel] ssh-cl how to tell coaster bootstrap to run with limited java heap space

Mihael Hategan hategan at mcs.anl.gov
Thu Sep 19 15:56:32 CDT 2013


The coaster bootstrap starts the service with 512M of heap on 64-bit machines.

It can be changed in the code:
org.globus.cog.abstraction.impl.execution.coaster.bootstrap.Bootstrap.java, line 191.

The other solution is to run the service on a compute node, but I don't think we ever spent enough effort on nailing that issue down. Maybe it's time to do so.
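
To illustrate the kind of change that could be made near that line, here is a minimal sketch, not the actual Bootstrap.java code. It assumes the service heap is passed to the child JVM as an -Xmx flag and that honoring a site-imposed cap from JAVA_TOOL_OPTIONS is the desired behavior. The class HeapLimit and its methods are hypothetical names introduced only for this example, and whether a lower heap actually avoids the abort seen on Gordon is untested.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; none of these names exist in the coaster code base.
class HeapLimit {
    // Default mentioned in this message: 512M on 64-bit machines.
    static final long DEFAULT_MB = 512;

    // Matches -Xmx<number> with an optional k/m/g suffix.
    private static final Pattern XMX = Pattern.compile("-Xmx(\\d+)([kKmMgG]?)");

    /** Returns the -Xmx value found in opts, in megabytes, or -1 if there is none. */
    static long parseXmxMb(String opts) {
        if (opts == null) {
            return -1;
        }
        Matcher m = XMX.matcher(opts);
        long found = -1;
        // If several -Xmx flags appear, this sketch keeps the last one.
        while (m.find()) {
            long n = Long.parseLong(m.group(1));
            String unit = m.group(2).toLowerCase();
            switch (unit) {
                case "g": found = n * 1024; break;
                case "m": found = n; break;
                case "k": found = n / 1024; break;
                default:  found = n / (1024 * 1024); // no suffix: bytes
            }
        }
        return found;
    }

    /** Picks the smaller of the default and any cap found in toolOptions. */
    static String heapFlag(String toolOptions) {
        long cap = parseXmxMb(toolOptions);
        long mb = (cap > 0) ? Math.min(cap, DEFAULT_MB) : DEFAULT_MB;
        return "-Xmx" + mb + "M";
    }

    public static void main(String[] args) {
        // Prints the flag that would be handed to the service JVM.
        System.out.println(heapFlag(System.getenv("JAVA_TOOL_OPTIONS")));
    }
}
```

With JAVA_TOOL_OPTIONS=-Xmx256m, as on the Gordon login nodes, heapFlag would return a 256M flag instead of the 512M default. With no cap set, the default is kept.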

On Thu, 2013-09-19 at 15:53 -0500, Ketan Maheshwari wrote:
> Yes, 64 bit running CentOS release 6.4
> 
> 
> On Thu, Sep 19, 2013 at 3:48 PM, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> 
> > Are the login nodes 64 bit by any chance?
> >
> > On Thu, 2013-09-19 at 15:43 -0500, Ketan Maheshwari wrote:
> > > SDSC Gordon admins have limited the Java heap space to 256 MB on login nodes.
> > >
> > > This is enabled via the following environment variable:
> > >
> > > JAVA_TOOL_OPTIONS=-Xmx256m
> > >
> > > It seems coaster bootstrap does not like this:
> > >
> > > mdw$ swift -sites.file sites.gordon.xml -tc.file apps -config cf workflow.swift
> > > Swift trunk swift-r7089 cog-r3775
> > > RunID: 20130919-2038-jef0ns83
> > > Progress:  time: Thu, 19 Sep 2013 20:38:42 +0000
> > > Progress:  time: Thu, 19 Sep 2013 20:38:43 +0000  Submitting:2
> > >
> > > Execution failed:
> > > Exception in matrixgen:
> > >     Arguments: [2544, 3300, mA.dat]
> > >     Host: gordon
> > >     Directory: workflow-20130919-2038-jef0ns83/jobs/a/matrixgen-a1rnkhfl
> > >     exception @ swift-int-staging.k, line: 162
> > > Caused by: null
> > > Caused by:
> > > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not submit job
> > > Caused by:
> > > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not start coaster service
> > > Caused by:
> > > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Task ended before registration was received.
> > >
> > > Picked up JAVA_TOOL_OPTIONS: -Xmx256m
> > > /bin/bash: line 54: 33675 Aborted                 /usr/java/latest/bin/java
> > > -Djava=/usr/java/latest/bin/java -DGLOBUS_TCP_PORT_RANGE=50000,51000
> > > -DX509_USER_PROXY=/home/ketan/.globus/sshproxy-316831905-1379663604
> > > -DX509_CERT_DIR=/home/ketan/.globus/sshCAcert-316831905-1379663604.pem
> > > -DGLOBUS_HOSTNAME=gordon.sdsc.xsede.org -Duser.home=/home/ketan -jar
> > > /tmp/bootstrap.RWIqFu http://swift.rcc.uchicago.edu:50001
> > > https://128.135.112.73:50000 11836079986
> > >
> > >
> > > Do I understand correctly that this is indeed a Java heap space issue, or is
> > > it something else that I could work around? Thanks for any ideas.
> > >
> > > --
> > > Ketan
> > >
> > >
> > > _______________________________________________
> > > Swift-devel mailing list
> > > Swift-devel at ci.uchicago.edu
> > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
> >
> >
> >




