<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
Matthew,<br>
<br>
You should consider using Swift 0.96.0 and, to the extent possible,
using local filesystems instead of the shared filesystem, which is
often under excessive load.<br>
<br>
We can discuss how to do this in follow-up messages as needed.
Basically, try provider staging: put the input data on the
login node's local filesystem, and put the site workdirectory under
/dev/shm or /tmp. (You may need to probe the compute node to see
which of these is writable and has sufficient space.)<br>
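<br>
As a rough illustration of the probing step (a minimal sketch, not part of Swift): the Python snippet below checks candidate scratch directories in order and reports the first one that is writable and has enough free space. The function name pick_scratch_dir and the 1 GiB threshold are assumptions made for this example; adjust both to your workload.<br>
<pre wrap="">
```python
import os
import shutil
import tempfile


def pick_scratch_dir(candidates=("/dev/shm", "/tmp"), min_free_bytes=1024**3):
    """Return the first candidate directory that exists, is writable, and
    has at least min_free_bytes free; return None if none qualifies.

    The candidate list and the 1 GiB default are illustrative assumptions.
    """
    for path in candidates:
        if not os.path.isdir(path):
            continue
        try:
            # Creating (and automatically deleting) a temporary file is a
            # direct test of writability for the current user.
            with tempfile.NamedTemporaryFile(dir=path):
                pass
        except OSError:
            continue
        if shutil.disk_usage(path).free >= min_free_bytes:
            return path
    return None


if __name__ == "__main__":
    # Prints the chosen directory, or None if no candidate is usable.
    print(pick_scratch_dir())
```
</pre>
You would run this on a compute node (for example, inside a small test job) rather than on the login node, since the two may be configured differently.<br>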
<br>
- Mike<br>
<br>
<div class="moz-cite-prefix">On 5/29/15 10:39 AM, Matthew Shaxted
wrote:<br>
</div>
<blockquote
cite="mid:E606D0858F0AB941B6A9EB52730031F6CE0214E52F@CCRD007.mail.lan"
type="cite">
<style><!--
/* Font Definitions */
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Verdana;
panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri","sans-serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#0563C1;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:#954F72;
text-decoration:underline;}
span.EmailStyle17
{mso-style-type:personal;
font-family:"Calibri","sans-serif";
color:windowtext;}
span.EmailStyle18
{mso-style-type:personal;
font-family:"Calibri","sans-serif";
color:#1F497D;}
span.EmailStyle19
{mso-style-type:personal;
font-family:"Calibri","sans-serif";
color:#1F497D;}
span.EmailStyle20
{mso-style-type:personal-compose;
font-family:"Calibri","sans-serif";
color:windowtext;}
span.apple-converted-space
{mso-style-name:apple-converted-space;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style>
<div class="WordSection1">
<p class="MsoNormal"><span style="color:#1F497D">It looks like
the timeout problem is not actually solved. For some reason
I am having a lot of difficulty running on Beagle, and I have a
feeling it is due to slow read/write. <o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">For example, I
finished ~1,200 / 12,000 runs before failure (see the
paragraph below), and moving these results (the result files
are not very large) to public_html is taking an hour or so.
I’m hoping to scale up to 100-300k runs or so, so this
will become a significant bottleneck. I have just emailed
beagle-support about this issue.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">In all test
environments my Swift workflow is working well, but when
submitting jobs to the Beagle queue, it completes some number of
simulations before the timeout error occurs and all jobs
stop. I'm using Swift-0.95-RC7 (and am in the process of
updating to the latest 0.95), but I think these errors may also be
due to this slow read/write. <o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">Any
suggestions?<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">Below is the
error I see when the job stops completely:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal" style="background:white"><span
style="font-size:12.0pt;font-family:'Arial','sans-serif';color:#222222">Host:
cluster<o:p></o:p></span></p>
<p class="MsoNormal" style="background:white"><span
style="font-size:12.0pt;font-family:'Arial','sans-serif';color:#222222">Directory:
epsweep-run004/jobs/a/RunEP-ai2mic9m exception @
swift-int-staging.k, line: 181<o:p></o:p></span></p>
<p class="MsoNormal" style="background:white"><span
style="font-size:12.0pt;font-family:'Arial','sans-serif';color:#222222">Caused
by: exception @ swift-int-staging.k, line: 177<o:p></o:p></span></p>
<p class="MsoNormal" style="background:white"><span
style="font-size:12.0pt;font-family:'Arial','sans-serif';color:#222222">Caused
by: <span style="background:yellow">Block task failed:
Connection to worker lost</span><o:p></o:p></span></p>
<p class="MsoNormal" style="background:white"><span
style="font-size:12.0pt;font-family:'Arial','sans-serif';color:#222222">org.globus.cog.coaster.TimeoutException: <span
style="background:yellow">Channel timed out</span>.
lastTime=150526-142313.128,<o:p></o:p></span></p>
<p class="MsoNormal" style="background:white"><span
style="font-size:12.0pt;font-family:'Arial','sans-serif';color:#222222">50526-142514.107,
channel=TCPChannel [type: server, contact:
0526-0802460-000014-000456<o:p></o:p></span></p>
<p class="MsoNormal" style="background:white"><span
style="font-size:12.0pt;font-family:'Arial','sans-serif';color:#222222">at
org.globus.cog.coaster.channels.AbstractCoasterChannel.checkTimeouts(AbstractCoasterChannel.java:133)<o:p></o:p></span></p>
<p class="MsoNormal" style="background:white"><span
style="font-size:12.0pt;font-family:'Arial','sans-serif';color:#222222">
at
org.globus.cog.coaster.channels.AbstractCoasterChannel$1.run(AbstractCoasterChannel.java:124)<o:p></o:p></span></p>
<p class="MsoNormal" style="background:white"><span
style="font-size:12.0pt;font-family:'Arial','sans-serif';color:#222222">
at java.util.TimerThread.mainLoop(Timer.java:566)<o:p></o:p></span></p>
<p class="MsoNormal" style="background:white"><span
style="font-size:12.0pt;font-family:'Arial','sans-serif';color:#222222">
at java.util.TimerThread.run(Timer.java:516)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<div>
<p class="MsoNormal"
style="margin-bottom:14.0pt;line-height:13.0pt"><span
style="font-size:9.0pt;font-family:'Arial','sans-serif';color:#EF2B2D">MATTHEW
SHAXTED<o:p></o:p></span></p>
<p class="MsoNormal"
style="margin-bottom:14.0pt;line-height:13.0pt"><span
style="font-size:9.0pt;font-family:'Arial','sans-serif';color:gray">SKIDMORE,
OWINGS &amp; MERRILL LLP<br>
224 SOUTH MICHIGAN AVENUE<br>
CHICAGO, IL 60604<br>
T (312) 360-4368<br>
<a moz-do-not-send="true"
href="mailto:MATTHEW.SHAXTED@SOM.COM"><span
style="color:blue">MATTHEW.SHAXTED@SOM.COM</span></a><o:p></o:p></span></p>
<p class="MsoNormal"
style="margin-bottom:14.0pt;line-height:13.0pt"><span
style="font-family:'Arial','sans-serif';color:gray"><o:p> </o:p></span></p>
<p class="MsoNormal" style="margin-bottom:14.0pt"><a
moz-do-not-send="true" href="http://www.som.com/"><span
style="font-family:'Arial','sans-serif';color:black;text-decoration:none"><img
id="_x0000_i1028"
src="cid:part2.00040106.01010102@anl.gov"
alt="cid:image001.png@01CF9071.6FB46030" height="45"
width="123" border="0"></span></a><span
style="font-family:'Arial','sans-serif';color:black"><o:p></o:p></span></p>
<p class="MsoNormal" style="line-height:12.0pt"><span
style="font-size:8.0pt;font-family:'Arial','sans-serif';color:gray">The
information contained in this communication may be
confidential, is intended only for the use of the
recipient(s) named above, and may be legally privileged.
If the reader of this message is not the intended
recipient, you are hereby notified that any dissemination,
distribution, or copying of this communication, or any of
its contents, is strictly prohibited and may be unlawful.
If you have received this communication in error, please
return it to the sender immediately and delete the
original message and any copy of it from your computer
system. If you have any questions concerning this message,
please contact the sender.</span><span
style="font-family:'Arial','sans-serif';color:gray"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span
style="font-size:12.0pt;font-family:'Verdana','sans-serif';color:black"><img
id="_x0000_i1027"
src="cid:part4.00070807.08000209@anl.gov"
alt="http://intranet.som.com/common/admin/file.cfm?f=%2Fresources%2Fcontent%2F5%2F0%2F4%2F4%2F6%2F4%2F0%2F3%2Fdocuments%2Fimagea560bf%2Egif%406e10073b%2E30854c37"
height="19" width="393" border="0"></span><span
style="color:#1F497D"><o:p></o:p></span></p>
</div>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<div>
<div style="border:none;border-top:solid #E1E1E1
1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b>From:</b> Matthew Shaxted <br>
<b>Sent:</b> Wednesday, May 27, 2015 2:04 PM<br>
<b>To:</b> 'Swift User'<br>
<b>Subject:</b> RE: Channel Timeout on Beagle?<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><span style="color:#1F497D">Hi All: I was
able to get the runs working successfully by changing the
maxtime flag in the sites file.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">Thanks<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<div>
<div style="border:none;border-top:solid #E1E1E1
1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b>From:</b> Matthew Shaxted <br>
<b>Sent:</b> Wednesday, May 27, 2015 9:50 AM<br>
<b>To:</b> Swift User<br>
<b>Subject:</b> Channel Timeout on Beagle?<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Hi Swift Users:<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">I am running some studies on Beagle using
Swift, and experiencing a strange error. The Swift scripts run
great on the cloud and on the Beagle login node, but they seem to be
timing out for some reason.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Does anyone have insight into the cause of
this? Thanks for any help.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Below is the error I am getting:<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Host: cluster<o:p></o:p></p>
<p class="MsoNormal">Directory:
epsweep-run004/jobs/a/RunEP-ai2mic9m exception @
swift-int-staging.k, line: 181<o:p></o:p></p>
<p class="MsoNormal">Caused by: exception @ swift-int-staging.k,
line: 177<o:p></o:p></p>
<p class="MsoNormal">Caused by: <span
style="background:yellow;mso-highlight:yellow">Block task
failed: Connection to worker lost</span><o:p></o:p></p>
<p class="MsoNormal">org.globus.cog.coaster.TimeoutException: <span
style="background:yellow;mso-highlight:yellow">Channel timed
out</span>. lastTime=150526-142313.128,<o:p></o:p></p>
<p class="MsoNormal">50526-142514.107, channel=TCPChannel [type:
server, contact: 0526-0802460-000014-000456<o:p></o:p></p>
<p class="MsoNormal">at
org.globus.cog.coaster.channels.AbstractCoasterChannel.checkTimeouts(AbstractCoasterChannel.java:133)<o:p></o:p></p>
<p class="MsoNormal"> at
org.globus.cog.coaster.channels.AbstractCoasterChannel$1.run(AbstractCoasterChannel.java:124)<o:p></o:p></p>
<p class="MsoNormal"> at
java.util.TimerThread.mainLoop(Timer.java:566)<o:p></o:p></p>
<p class="MsoNormal"> at
java.util.TimerThread.run(Timer.java:516)<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Below is my sites.xml file:<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">&lt;pool handle="cluster"&gt;<o:p></o:p></p>
<p class="MsoNormal">&nbsp;&nbsp;&lt;execution provider="coaster"
jobmanager="local:pbs" /&gt;<o:p></o:p></p>
<p class="MsoNormal">&nbsp;&nbsp;&lt;profile namespace="globus"
key="project"&gt;CI-SES000178&lt;/profile&gt;<o:p></o:p></p>
<p class="MsoNormal">&nbsp;&nbsp;&lt;profile namespace="globus"
key="jobsPerNode"&gt;24&lt;/profile&gt;<o:p></o:p></p>
<p class="MsoNormal">&nbsp;&nbsp;&lt;profile namespace="globus"
key="lowOverAllocation"&gt;100&lt;/profile&gt;<o:p></o:p></p>
<p class="MsoNormal">&nbsp;&nbsp;&lt;profile namespace="globus"
key="highOverAllocation"&gt;100&lt;/profile&gt;<o:p></o:p></p>
<p class="MsoNormal">&nbsp;&nbsp;&lt;profile namespace="globus"
key="providerAttributes"&gt;pbs.aprun;pbs.mpp;depth=24&lt;/profile&gt;<o:p></o:p></p>
<p class="MsoNormal">&nbsp;&nbsp;&lt;profile namespace="globus"
key="maxtime"&gt;10800&lt;/profile&gt;<o:p></o:p></p>
<p class="MsoNormal">&nbsp;&nbsp;&lt;profile namespace="globus"
key="maxWalltime"&gt;01:25:00&lt;/profile&gt;<o:p></o:p></p>
<p class="MsoNormal">&nbsp;&nbsp;&lt;profile namespace="globus"
key="userHomeOverride"&gt;/lustre/beagle2/mattshax/epsweep/swifthome&lt;/profile&gt;<o:p></o:p></p>
<p class="MsoNormal">&nbsp;&nbsp;&lt;profile namespace="globus"
key="slots"&gt;20&lt;/profile&gt;<o:p></o:p></p>
<p class="MsoNormal">&nbsp;&nbsp;&lt;profile namespace="globus"
key="maxnodes"&gt;600&lt;/profile&gt;<o:p></o:p></p>
<p class="MsoNormal">&nbsp;&nbsp;&lt;profile namespace="globus"
key="nodeGranularity"&gt;1&lt;/profile&gt;<o:p></o:p></p>
<p class="MsoNormal">&nbsp;&nbsp;&lt;profile namespace="karajan"
key="jobThrottle"&gt;180&lt;/profile&gt;<o:p></o:p></p>
<p class="MsoNormal">&nbsp;&nbsp;&lt;profile namespace="karajan"
key="initialScore"&gt;10000&lt;/profile&gt;<o:p></o:p></p>
<p class="MsoNormal">&nbsp;&nbsp;&lt;!-- &lt;profile namespace="karajan"
key="workerLoggingLevel"&gt;trace&lt;/profile&gt; --&gt;<o:p></o:p></p>
<p class="MsoNormal">&nbsp;&nbsp;&lt;workdirectory&gt;/dev/shm/mattshax/swiftapp&lt;/workdirectory&gt;<o:p></o:p></p>
<p class="MsoNormal">&lt;/pool&gt;<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<br>
<br>
<pre wrap="">_______________________________________________
Swift-user mailing list
<a class="moz-txt-link-abbreviated" href="mailto:Swift-user@ci.uchicago.edu">Swift-user@ci.uchicago.edu</a>
<a class="moz-txt-link-freetext" href="https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user">https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user</a></pre>
</blockquote>
<br>
<pre class="moz-signature" cols="72">--
Michael Wilde
Mathematics and Computer Science          Computation Institute
Argonne National Laboratory               The University of Chicago
</pre>
</body>
</html>