<div dir="ltr">Changing that line to 256M and rebuilding Swift had no effect on the run. I agree that the ability to run the coaster service on a compute node would help. XSEDE Stampede has the same constraint in place, along with others.<br>
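<br>
To check which heap limit the JVM actually ends up with on a login node, a small diagnostic along these lines might help. This is only a sketch and not part of Swift or CoG: the class name HeapCheck and the method maxHeapMiB are made up for illustration, and it uses only the standard Runtime.maxMemory() and System.getenv() calls. The reported value can be somewhat lower than the -Xmx setting, depending on the JVM.<br>

```java
// Diagnostic sketch (hypothetical, not part of Swift or CoG):
// report the heap ceiling the running JVM has settled on.
public class HeapCheck {
    // Maximum amount of memory the JVM will attempt to use, in MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // System.getenv returns null when the variable is not set.
        String toolOptions = System.getenv("JAVA_TOOL_OPTIONS");
        System.out.println("JAVA_TOOL_OPTIONS = " + toolOptions);
        System.out.println("max heap (MiB)    = " + maxHeapMiB());
    }
}
```

Running it once as "java HeapCheck" and once as "java -Xmx512M HeapCheck" should show whether an explicit -Xmx on the command line changes the value picked up from JAVA_TOOL_OPTIONS.<br>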
</div><div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, Sep 19, 2013 at 3:56 PM, Mihael Hategan <span dir="ltr">&lt;<a href="mailto:hategan@mcs.anl.gov" target="_blank">hategan@mcs.anl.gov</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">The coaster bootstrap starts the service with 512M on 64 bit machines.<br>
<br>
It can be changed in the code:<br>
org.globus.cog.abstraction.impl.execution.coaster.bootstrap.Bootstrap.java, line 191. The other solution is to run the service on a compute node, but I don't think we ever spent enough effort on nailing that issue down. Maybe it's time to do so.<br>
<div class="HOEnZb"><div class="h5"><br>
On Thu, 2013-09-19 at 15:53 -0500, Ketan Maheshwari wrote:<br>
> Yes, 64 bit running CentOS release 6.4<br>
><br>
><br>
> On Thu, Sep 19, 2013 at 3:48 PM, Mihael Hategan &lt;<a href="mailto:hategan@mcs.anl.gov">hategan@mcs.anl.gov</a>&gt; wrote:<br>
><br>
> > Are the login nodes 64 bit by any chance?<br>
> ><br>
> > On Thu, 2013-09-19 at 15:43 -0500, Ketan Maheshwari wrote:<br>
> > > SDSC Gordon admins have limited the Java heap space to 256 MB on login nodes.<br>
> > ><br>
> > > This is enabled via the following environment variable:<br>
> > ><br>
> > > JAVA_TOOL_OPTIONS=-Xmx256m<br>
> > ><br>
> > > It seems coaster bootstrap does not like this:<br>
> > ><br>
> > > mdw$ swift -sites.file sites.gordon.xml -tc.file apps -config cf<br>
> > > workflow.swift<br>
> > > Swift trunk swift-r7089 cog-r3775<br>
> > > RunID: 20130919-2038-jef0ns83<br>
> > > Progress: time: Thu, 19 Sep 2013 20:38:42 +0000<br>
> > > Progress: time: Thu, 19 Sep 2013 20:38:43 +0000 Submitting:2<br>
> > ><br>
> > > Execution failed:<br>
> > > Exception in matrixgen:<br>
> > > Arguments: [2544, 3300, mA.dat]<br>
> > > Host: gordon<br>
> > > Directory: workflow-20130919-2038-jef0ns83/jobs/a/matrixgen-a1rnkhfl<br>
> > > exception @ swift-int-staging.k, line: 162<br>
> > > Caused by: null<br>
> > > Caused by:<br>
> > > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException:<br>
> > > Could not submit job<br>
> > > Caused by:<br>
> > > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException:<br>
> > > Could not start coaster service<br>
> > > Caused by:<br>
> > > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException:<br>
> > > Task ended before registration was received.<br>
> > ><br>
> > > Picked up JAVA_TOOL_OPTIONS: -Xmx256m<br>
> > > /bin/bash: line 54: 33675 Aborted /usr/java/latest/bin/java<br>
> > > -Djava=/usr/java/latest/bin/java -DGLOBUS_TCP_PORT_RANGE=50000,51000<br>
> > > -DX509_USER_PROXY=/home/ketan/.globus/sshproxy-316831905-1379663604<br>
> > > -DX509_CERT_DIR=/home/ketan/.globus/sshCAcert-316831905-1379663604.pem<br>
> > > -DGLOBUS_HOSTNAME=<a href="http://gordon.sdsc.xsede.org" target="_blank">gordon.sdsc.xsede.org</a> -Duser.home=/home/ketan -jar<br>
> > > /tmp/bootstrap.RWIqFu <a href="http://swift.rcc.uchicago.edu:50001" target="_blank">http://swift.rcc.uchicago.edu:50001</a><br>
> > > <a href="https://128.135.112.73:50000" target="_blank">https://128.135.112.73:50000</a> 11836079986<br>
> > ><br>
> > ><br>
> > > Do I understand correctly that this is indeed a Java heap space issue, or is<br>
> > > it something else that I could work around? Thanks for any ideas.<br>
> > ><br>
> > ><br>
> > > --<br>
> > > Ketan<br>
> > ><br>
> > ><br>
> > > _______________________________________________<br>
> > > Swift-devel mailing list<br>
> > > <a href="mailto:Swift-devel@ci.uchicago.edu">Swift-devel@ci.uchicago.edu</a><br>
> > > <a href="https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel" target="_blank">https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel</a><br>
> ><br>
> ><br>
> ><br>
<br>
<br>
</div></div></blockquote></div><br><br clear="all"><br>-- <br><font face="'courier new', monospace">Ketan</font><br><br>
</div>