<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-Type content="text/html; charset=us-ascii"><meta name=Generator content="Microsoft Word 15 (filtered medium)"><!--[if !mso]><style>v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style><![endif]--><style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Consolas;
panose-1:2 11 6 9 2 2 4 3 2 4;}
@font-face
{font-family:Verdana;
panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri","sans-serif";
color:black;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#0563C1;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:#954F72;
text-decoration:underline;}
pre
{mso-style-priority:99;
mso-style-link:"HTML Preformatted Char";
margin:0in;
margin-bottom:.0001pt;
font-size:10.0pt;
font-family:"Courier New";
color:black;}
span.EmailStyle17
{mso-style-type:personal;
font-family:"Calibri","sans-serif";
color:windowtext;}
span.EmailStyle18
{mso-style-type:personal;
font-family:"Calibri","sans-serif";
color:#1F497D;}
span.EmailStyle19
{mso-style-type:personal;
font-family:"Calibri","sans-serif";
color:#1F497D;}
span.EmailStyle20
{mso-style-type:personal;
font-family:"Calibri","sans-serif";
color:windowtext;}
span.apple-converted-space
{mso-style-name:apple-converted-space;}
span.HTMLPreformattedChar
{mso-style-name:"HTML Preformatted Char";
mso-style-priority:99;
mso-style-link:"HTML Preformatted";
font-family:Consolas;
color:black;}
span.EmailStyle24
{mso-style-type:personal-reply;
font-family:"Calibri","sans-serif";
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]--></head><body bgcolor=white lang=EN-US link="#0563C1" vlink="#954F72"><div class=WordSection1><p class=MsoNormal><span style='color:#1F497D'>Thanks Mike – running on local filesystem makes a lot of sense. Will give this a try as well.<o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'><o:p> </o:p></span></p><div><p class=MsoNormal style='margin-bottom:14.0pt;line-height:13.0pt'><span style='font-size:9.0pt;font-family:"Arial","sans-serif";color:#EF2B2D'>MATTHEW SHAXTED<o:p></o:p></span></p><p class=MsoNormal style='margin-bottom:14.0pt;line-height:13.0pt'><span style='font-size:9.0pt;font-family:"Arial","sans-serif";color:gray'>SKIDMORE, OWINGS & MERRILL LLP<br>224 SOUTH MICHIGAN AVENUE<br>CHICAGO, IL 60604<br>T (312) 360-4368<br><a href="mailto:MATTHEW.SHAXTED@SOM.COM"><span style='color:blue'>MATTHEW.SHAXTED@SOM.COM</span></a><o:p></o:p></span></p><p class=MsoNormal style='margin-bottom:14.0pt;line-height:13.0pt'><span style='font-family:"Arial","sans-serif";color:gray'><o:p> </o:p></span></p><p class=MsoNormal style='margin-bottom:14.0pt'><a href="http://www.som.com/"><span style='font-family:"Arial","sans-serif";color:black;text-decoration:none'><img border=0 width=123 height=45 id="_x0000_i1030" src="cid:image004.png@01D09A12.089ED460" alt="cid:image001.png@01CF9071.6FB46030"></span></a><span style='font-family:"Arial","sans-serif"'><o:p></o:p></span></p><p class=MsoNormal style='line-height:12.0pt'><span style='font-size:8.0pt;font-family:"Arial","sans-serif";color:gray'>The information contained in this communication may be confidential, is intended only for the use of the recipient(s) named above, and may be legally privileged. If the reader of this message is not the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this communication, or any of its contents, is strictly prohibited and may be unlawful. 
If you have received this communication in error, please return it to the sender immediately and delete the original message and any copy of it from your computer system. If you have any questions concerning this message, please contact the sender.</span><span style='font-family:"Arial","sans-serif";color:gray'><o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:12.0pt;font-family:"Verdana","sans-serif"'><img border=0 width=393 height=19 id="_x0000_i1029" src="cid:image002.gif@01D09A12.0897A870" alt="http://intranet.som.com/common/admin/file.cfm?f=%2Fresources%2Fcontent%2F5%2F0%2F4%2F4%2F6%2F4%2F0%2F3%2Fdocuments%2Fimagea560bf%2Egif%406e10073b%2E30854c37"></span><span style='color:#1F497D'><o:p></o:p></span></p></div><p class=MsoNormal><span style='color:#1F497D'><o:p> </o:p></span></p><div><div style='border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in'><p class=MsoNormal><b><span style='color:windowtext'>From:</span></b><span style='color:windowtext'> Michael Wilde [mailto:wilde@anl.gov] <br><b>Sent:</b> Friday, May 29, 2015 11:41 AM<br><b>To:</b> Matthew Shaxted; Swift User<br><b>Subject:</b> Re: [Swift-user] Channel Timeout on Beagle?<o:p></o:p></span></p></div></div><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal style='margin-bottom:12.0pt'>Matthew,<br><br>You should consider using Swift 0.96.0 and, to the extent possible, use local filesystems instead of the shared filesystem, which is often under excessive load.<br><br>We can discuss how to do this in a follow-up as needed. Basically, try provider-staging, and put the input data on the login node's local filesystem and the site workdirectory under /dev/shm or /tmp. (You may need to probe the compute node to see which of these is writable and has sufficient space.) 
<br><br>- Mike<span style='font-size:12.0pt'><o:p></o:p></span></p><div><p class=MsoNormal>On 5/29/15 10:39 AM, Matthew Shaxted wrote:<o:p></o:p></p></div><blockquote style='margin-top:5.0pt;margin-bottom:5.0pt'><p class=MsoNormal><span style='color:#1F497D'>It looks like the timeout problem is not actually solved. For some reason I am having a lot of difficulty running on Beagle, and I have a feeling it is due to slow read/write. </span><o:p></o:p></p><p class=MsoNormal><span style='color:#1F497D'> </span><o:p></o:p></p><p class=MsoNormal><span style='color:#1F497D'>For example, I finished ~1,200 of 12,000 runs before the failure (see the paragraph below), and moving these results (the result files are not very large) to public_html is taking an hour or so. I’m hoping to scale up to 100-300k runs or so, so this will become a significant bottleneck. I have emailed beagle-support about this issue just now.</span><o:p></o:p></p><p class=MsoNormal><span style='color:#1F497D'> </span><o:p></o:p></p><p class=MsoNormal><span style='color:#1F497D'>In all test environments my Swift workflow works well, but when I submit jobs to the Beagle queue, it completes some number of simulations before the timeout error occurs and all jobs stop. I'm using Swift-0.95-RC7 (and am in the process of updating to the latest 0.95), but I think these errors may also be due to this slow read/write. 
</span><o:p></o:p></p><p class=MsoNormal><span style='color:#1F497D'> </span><o:p></o:p></p><p class=MsoNormal><span style='color:#1F497D'>Any suggestions?</span><o:p></o:p></p><p class=MsoNormal><span style='color:#1F497D'> </span><o:p></o:p></p><p class=MsoNormal><span style='color:#1F497D'>Below is the error I see and the job completely stops:</span><o:p></o:p></p><p class=MsoNormal><span style='color:#1F497D'> </span><o:p></o:p></p><p class=MsoNormal style='background:white'><span style='font-size:12.0pt;font-family:"Arial","sans-serif";color:#222222'>Host: cluster</span><o:p></o:p></p><p class=MsoNormal style='background:white'><span style='font-size:12.0pt;font-family:"Arial","sans-serif";color:#222222'>Directory: epsweep-run004/jobs/a/RunEP-ai2mic9m exception @ swift-int-staging.k, line: 181</span><o:p></o:p></p><p class=MsoNormal style='background:white'><span style='font-size:12.0pt;font-family:"Arial","sans-serif";color:#222222'>Caused by: exception @ swift-int-staging.k, line: 177</span><o:p></o:p></p><p class=MsoNormal style='background:white'><span style='font-size:12.0pt;font-family:"Arial","sans-serif";color:#222222'>Caused by: <span style='background:yellow'>Block task failed: Connection to worker lost</span></span><o:p></o:p></p><p class=MsoNormal style='background:white'><span style='font-size:12.0pt;font-family:"Arial","sans-serif";color:#222222'>org.globus.cog.coaster.TimeoutException: <span style='background:yellow'>Channel timed out</span>. 
lastTime=150526-142313.128,</span><o:p></o:p></p><p class=MsoNormal style='background:white'><span style='font-size:12.0pt;font-family:"Arial","sans-serif";color:#222222'>50526-142514.107, channel=TCPChannel [type: server, contact: 0526-0802460-000014-000456</span><o:p></o:p></p><p class=MsoNormal style='background:white'><span style='font-size:12.0pt;font-family:"Arial","sans-serif";color:#222222'>at org.globus.cog.coaster.channels.AbstractCoasterChannel.checkTimeouts(AbstractCoasterChannel.java:133)</span><o:p></o:p></p><p class=MsoNormal style='background:white'><span style='font-size:12.0pt;font-family:"Arial","sans-serif";color:#222222'> at org.globus.cog.coaster.channels.AbstractCoasterChannel$1.run(AbstractCoasterChannel.java:124)</span><o:p></o:p></p><p class=MsoNormal style='background:white'><span style='font-size:12.0pt;font-family:"Arial","sans-serif";color:#222222'> at java.util.TimerThread.mainLoop(Timer.java:566)</span><o:p></o:p></p><p class=MsoNormal style='background:white'><span style='font-size:12.0pt;font-family:"Arial","sans-serif";color:#222222'> at java.util.TimerThread.run(Timer.java:516)</span><o:p></o:p></p><p class=MsoNormal><span style='color:#1F497D'> </span><o:p></o:p></p><p class=MsoNormal><span style='color:#1F497D'> </span><o:p></o:p></p><p class=MsoNormal><span style='color:#1F497D'> </span><o:p></o:p></p><div><div style='border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in'><p class=MsoNormal><b>From:</b> Matthew Shaxted <br><b>Sent:</b> Wednesday, May 27, 2015 2:04 PM<br><b>To:</b> 'Swift User'<br><b>Subject:</b> RE: Channel Timeout on Beagle?<o:p></o:p></p></div></div><p class=MsoNormal> <o:p></o:p></p><p class=MsoNormal><span style='color:#1F497D'>Hi All: I was able to get the 
runs working successfully by changing the maxtime flag in the sites file.</span><o:p></o:p></p><p class=MsoNormal><span style='color:#1F497D'> </span><o:p></o:p></p><p class=MsoNormal><span style='color:#1F497D'>Thanks</span><o:p></o:p></p><p class=MsoNormal><span style='color:#1F497D'> </span><o:p></o:p></p><p class=MsoNormal><span style='color:#1F497D'> </span><o:p></o:p></p><div><div style='border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in'><p class=MsoNormal><b>From:</b> Matthew Shaxted <br><b>Sent:</b> Wednesday, May 27, 2015 9:50 AM<br><b>To:</b> Swift User<br><b>Subject:</b> Channel Timeout on Beagle?<o:p></o:p></p></div></div><p class=MsoNormal> <o:p></o:p></p><p class=MsoNormal>Hi Swift Users:<o:p></o:p></p><p class=MsoNormal> <o:p></o:p></p><p class=MsoNormal>I am running some studies on Beagle using Swift and am experiencing a strange error. The Swift scripts run great on the cloud and on the Beagle login node, but jobs submitted to the Beagle queue seem to be timing out for some reason.<o:p></o:p></p><p class=MsoNormal> <o:p></o:p></p><p class=MsoNormal>Does anyone have insight into the cause of this? Thanks for any help.<o:p></o:p></p><p class=MsoNormal> <o:p></o:p></p><p class=MsoNormal>Below is the error I am getting:<o:p></o:p></p><p class=MsoNormal> <o:p></o:p></p><p class=MsoNormal>Host: cluster<o:p></o:p></p><p class=MsoNormal>Directory: epsweep-run004/jobs/a/RunEP-ai2mic9m exception @ swift-int-staging.k, line: 181<o:p></o:p></p><p class=MsoNormal>Caused by: exception @ swift-int-staging.k, line: 177<o:p></o:p></p><p class=MsoNormal>Caused by: <span style='background:yellow;mso-highlight:yellow'>Block task failed: Connection to worker lost</span><o:p></o:p></p><p class=MsoNormal>org.globus.cog.coaster.TimeoutException: <span style='background:yellow;mso-highlight:yellow'>Channel timed out</span>. 
lastTime=150526-142313.128,<o:p></o:p></p><p class=MsoNormal>50526-142514.107, channel=TCPChannel [type: server, contact: 0526-0802460-000014-000456<o:p></o:p></p><p class=MsoNormal>at org.globus.cog.coaster.channels.AbstractCoasterChannel.checkTimeouts(AbstractCoasterChannel.java:133)<o:p></o:p></p><p class=MsoNormal> at org.globus.cog.coaster.channels.AbstractCoasterChannel$1.run(AbstractCoasterChannel.java:124)<o:p></o:p></p><p class=MsoNormal> at java.util.TimerThread.mainLoop(Timer.java:566)<o:p></o:p></p><p class=MsoNormal> at java.util.TimerThread.run(Timer.java:516)<o:p></o:p></p><p class=MsoNormal> <o:p></o:p></p><p class=MsoNormal>Below is my sites.xml file:<o:p></o:p></p><p class=MsoNormal> <o:p></o:p></p><p class=MsoNormal><pool handle="cluster"><o:p></o:p></p><p class=MsoNormal> <execution provider="coaster" jobmanager="local:pbs" /><o:p></o:p></p><p class=MsoNormal> <profile namespace="globus" key="project">CI-SES000178</profile><o:p></o:p></p><p class=MsoNormal> <profile namespace="globus" key="jobsPerNode">24</profile><o:p></o:p></p><p class=MsoNormal> <profile namespace="globus" key="lowOverAllocation">100</profile><o:p></o:p></p><p class=MsoNormal> <profile namespace="globus" key="highOverAllocation">100</profile><o:p></o:p></p><p class=MsoNormal> <profile namespace="globus" key="providerAttributes">pbs.aprun;pbs.mpp;depth=24</profile><o:p></o:p></p><p class=MsoNormal> <profile namespace="globus" key="maxtime">10800</profile><o:p></o:p></p><p class=MsoNormal> <profile namespace="globus" key="maxWalltime">01:25:00</profile><o:p></o:p></p><p class=MsoNormal> <profile namespace="globus" key="userHomeOverride">/lustre/beagle2/mattshax/epsweep/swifthome</profile><o:p></o:p></p><p class=MsoNormal> <profile namespace="globus" key="slots">20</profile><o:p></o:p></p><p class=MsoNormal> <profile namespace="globus" key="maxnodes">600</profile><o:p></o:p></p><p class=MsoNormal> <profile namespace="globus" key="nodeGranularity">1</profile><o:p></o:p></p><p 
class=MsoNormal> <profile namespace="karajan" key="jobThrottle">180</profile><o:p></o:p></p><p class=MsoNormal> <profile namespace="karajan" key="initialScore">10000</profile><o:p></o:p></p><p class=MsoNormal> <!-- <profile namespace="karajan" key="workerLoggingLevel">trace</profile> --><o:p></o:p></p><p class=MsoNormal> <workdirectory>/dev/shm/mattshax/swiftapp</workdirectory><o:p></o:p></p><p class=MsoNormal> </pool><o:p></o:p></p><p class=MsoNormal> <o:p></o:p></p><p class=MsoNormal> <o:p></o:p></p><p class=MsoNormal> <o:p></o:p></p><p class=MsoNormal><span style='font-size:12.0pt;font-family:"Times New Roman","serif"'><br><br><br><o:p></o:p></span></p><pre>_______________________________________________<o:p></o:p></pre><pre>Swift-user mailing list<o:p></o:p></pre><pre><a href="mailto:Swift-user@ci.uchicago.edu">Swift-user@ci.uchicago.edu</a><o:p></o:p></pre><pre><a href="https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user">https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user</a><o:p></o:p></pre></blockquote><p class=MsoNormal><span style='font-size:12.0pt;font-family:"Times New Roman","serif"'><br><br><o:p></o:p></span></p><pre>-- <o:p></o:p></pre><pre>Michael Wilde<o:p></o:p></pre><pre>Mathematics and Computer Science Computation Institute<o:p></o:p></pre><pre>Argonne National Laboratory The University of Chicago<o:p></o:p></pre></div></body></html>