<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 14 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"\@Adobe Song Std L";
panose-1:0 0 0 0 0 0 0 0 0 0;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
{font-family:Consolas;
panose-1:2 11 6 9 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri","sans-serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p.MsoPlainText, li.MsoPlainText, div.MsoPlainText
{mso-style-priority:99;
mso-style-link:"Plain Text Char";
margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri","sans-serif";}
p.MsoAcetate, li.MsoAcetate, div.MsoAcetate
{mso-style-priority:99;
mso-style-link:"Balloon Text Char";
margin:0in;
margin-bottom:.0001pt;
font-size:8.0pt;
font-family:"Tahoma","sans-serif";}
span.PlainTextChar
{mso-style-name:"Plain Text Char";
mso-style-priority:99;
mso-style-link:"Plain Text";
font-family:"Calibri","sans-serif";}
span.BalloonTextChar
{mso-style-name:"Balloon Text Char";
mso-style-priority:99;
mso-style-link:"Balloon Text";
font-family:"Tahoma","sans-serif";}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri","sans-serif";}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoPlainText" style="margin-left:.5in">There are a number of places. I think this got improved a bit post 0.94, but that's another story.<o:p></o:p></p>
<p class="MsoPlainText" style="margin-left:.5in"><o:p> </o:p></p>
<p class="MsoPlainText" style="margin-left:.5in">Anyway, first place is the swift log (&lt;scriptName&gt;-&lt;runId&gt;.log in the directory where you ran swift).<o:p></o:p></p>
<p class="MsoPlainText"><span style="color:#4F81BD">I’m not seeing much here. There are periodic progress messages like:<o:p></o:p></span></p>
<p class="MsoPlainText" style="margin-left:.5in"><span style="color:#4F81BD">2014-05-22 08:53:50,494-0700 INFO RuntimeStats$ProgressTicker Selecting site:39311 Stage in:1 Submitting:1 Stage out:284 Finished successfully:3118 Failed but can retry:646<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:#4F81BD">And occasional errors like:<o:p></o:p></span></p>
<p class="MsoPlainText" style="margin-left:.5in"><span style="color:#4F81BD">Block Block task status changed: Failed Exitcode file (/g/g15/bronevet/.globus/scripts/PBS7360428055973706159.submit.exitcode) not found 5 queue polls after the job was reported done<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:#4F81BD">However, I can’t see the file in question at the reported path.<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:#4F81BD"><o:p> </o:p></span></p>
<p class="MsoPlainText"><span style="color:#4F81BD">Also, when the run finally fails (I ran with -lazy.errors true to keep it going as far as possible) I get the following:<o:p></o:p></span></p>
<p class="MsoPlainText" style="margin-left:.5in"><span style="color:#4F81BD">Exception in runModel:<o:p></o:p></span></p>
<p class="MsoPlainText" style="margin-left:.5in"><span style="color:#4F81BD"> Arguments: [--solver, bicg, --precond, diag, --matrix, nasa1824, --num_runs, 100, --modelType, contModel, --faultModel, n, --locModel, local, --ap, 1e-2, --am, 1e-4, --sp, 1e-2,
--sm, 1e-4, --dp, 1e-2, --dm, 1e-4, --mp, 1e-2, --mm, 1e-4, --psp, 1e-2, --psm, 1e-2, --ptsp, 1e-2, --ptsm, 1e-2, --cprob, 1e-5, --exec_time, 5.591000e-03, --stats, modelBlocks/stats.solver_bicg.precond_diag.mtx_nasa1824.mt_contModel.fm_n.lm_local.ap_1e-2.am_1e-4.sp_1e-2.sm_1e-4.dp_1e-2.dm_1e-4.mp_1e-2.mm_1e-4.psp_1e-2.psm_1e-2.ptsp_1e-2.ptsm_1e-2.cprob_1e-5.block_26]<o:p></o:p></span></p>
<p class="MsoPlainText" style="margin-left:.5in"><span style="color:#4F81BD"> Host: pbatch<o:p></o:p></span></p>
<p class="MsoPlainText" style="margin-left:.5in"><span style="color:#4F81BD"> Directory: experiments.new-20140522-0841-bhu0vze3/jobs/v/runModel-vvuy90rl<o:p></o:p></span></p>
<p class="MsoPlainText" style="margin-left:.5in"><span style="color:#4F81BD">Caused by: Block task failed: 0522-4108580-000002 Block task ended prematurely<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:#4F81BD"><o:p> </o:p></span></p>
<p class="MsoPlainText"><span style="color:#4F81BD">I’ve attached the log.<o:p></o:p></span></p>
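<p class="MsoPlainText">For skimming a long swift log like the one excerpted above, a minimal Python sketch along these lines might help. It is not part of Swift: the function names are made up for illustration, and it assumes that state names in ProgressTicker lines contain only letters and spaces, as in the sample line above.</p>
<pre>
```python
import re

# Matches "Some state name:123" pairs in a ProgressTicker line.
# Assumption: state names consist only of letters and spaces.
PAIR = re.compile(r"([A-Za-z][A-Za-z ]*?):(\d+)")


def parse_progress(line):
    """Return {state: count} for a ProgressTicker line, else None."""
    marker = "ProgressTicker"
    pos = line.find(marker)
    if pos == -1:
        return None
    tail = line[pos + len(marker):]
    return {name: int(count) for name, count in PAIR.findall(tail)}


def summarize(lines):
    """Return (last progress counts, lines that mention a block task)."""
    last = None
    block_lines = []
    for line in lines:
        counts = parse_progress(line)
        if counts is not None:
            last = counts
        elif "Block task" in line:
            block_lines.append(line.strip())
    return last, block_lines
```
</pre>
<p class="MsoPlainText">Passing an open log file to summarize() would return the most recent progress counts together with every line that mentions a block task, which is usually a smaller set to read through than the full log.</p>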
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText" style="text-indent:.5in">The second place, if the previous one fails, is ~/.globus/coasters/*.log.<o:p></o:p></p>
<p class="MsoPlainText"><span style="color:#4F81BD">~/.globus/coasters contains the following files. No logs in my install.<o:p></o:p></span></p>
<p class="MsoPlainText" style="margin-left:.5in"><span style="color:#4F81BD">cscript1601720472000314596.pl cscript7039282452425599503.pl cscript8162919165195912014.pl<o:p></o:p></span></p>
<p class="MsoPlainText" style="margin-left:.5in"><span style="color:#4F81BD">cscript3466700121560325070.pl cscript747960757439884021.pl cscript8876053012113700888.pl<o:p></o:p></span></p>
<p class="MsoPlainText" style="margin-left:.5in"><span style="color:#4F81BD">cscript6877638344534390867.pl cscript7841839259853776419.pl cscript95537409038396166.pl<o:p></o:p></span></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText" style="margin-left:.5in">There is yet another place that isn't enabled by default: the coaster worker logs. They can be enabled by adding &lt;profile namespace="globus" key="workerLoggingLevel"&gt;DEBUG&lt;/profile&gt; to sites.xml. This will produce some additional logs in ~/.globus/coasters/.<o:p></o:p></p>
<p class="MsoPlainText"><span style="color:#4F81BD">Done and attached. Please let me know if you see anything.<o:p></o:p></span></p>
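<p class="MsoPlainText">To check quickly whether the second and third places contain anything, a small Python sketch like the following could list the log files present. The helper name is hypothetical, and it only assumes what is stated above: that the logs end in .log and sit directly in ~/.globus/coasters.</p>
<pre>
```python
from pathlib import Path


def find_coaster_logs(base="~/.globus/coasters"):
    """Return sorted *.log paths directly under base.

    Returns an empty list when the directory is missing or holds no logs.
    """
    directory = Path(base).expanduser()
    if not directory.is_dir():
        return []
    return sorted(directory.glob("*.log"))


if __name__ == "__main__":
    logs = find_coaster_logs()
    if not logs:
        print("no *.log files found")
    for path in logs:
        print(path)
```
</pre>
<p class="MsoPlainText">An empty result before enabling workerLoggingLevel and a non-empty one afterwards would suggest the setting took effect.</p>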
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">Greg Bronevetsky<o:p></o:p></p>
<p class="MsoPlainText">Lawrence Livermore National Lab<o:p></o:p></p>
<p class="MsoPlainText">(925) 424-5756<o:p></o:p></p>
<p class="MsoPlainText">bronevetsky@llnl.gov<o:p></o:p></p>
<p class="MsoPlainText">http://greg.bronevetsky.com<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">-----Original Message-----<br>
From: Mihael Hategan [mailto:hategan@mcs.anl.gov] <br>
Sent: Thursday, May 22, 2014 12:23 AM<br>
To: Bronevetsky, Greg<br>
Cc: swift-user@ci.uchicago.edu<br>
Subject: Re: [Swift-user] Data transfer error</p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">There are a number of places. I think this got improved a bit post 0.94, but that's another story.<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">Anyway, first place is the swift log (&lt;scriptName&gt;-&lt;runId&gt;.log in the directory where you ran swift).<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">The second place, if the previous one fails, is ~/.globus/coasters/*.log.<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">There is yet another place that isn't enabled by default: the coaster worker logs. They can be enabled by adding &lt;profile namespace="globus" key="workerLoggingLevel"&gt;DEBUG&lt;/profile&gt; to sites.xml. This will produce some additional logs in ~/.globus/coasters/.<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">Please feel free to send any or all of these our way. We might be able to quickly spot some obvious problems.<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">Mihael<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">On Wed, 2014-05-21 at 23:59 +0000, Bronevetsky, Greg wrote:<o:p></o:p></p>
<p class="MsoPlainText">> Where should I look to debug the following error?<o:p></o:p></p>
<p class="MsoPlainText">> Caused by: Block task failed: 0521-5404270-000009 Block task ended prematurely<o:p></o:p></p>
<p class="MsoPlainText">> <o:p></o:p></p>
<p class="MsoPlainText">> Greg Bronevetsky<o:p></o:p></p>
<p class="MsoPlainText">> Lawrence Livermore National Lab<o:p></o:p></p>
<p class="MsoPlainText">> (925) 424-5756<o:p></o:p></p>
<p class="MsoPlainText">> <a href="mailto:bronevetsky@llnl.gov"><span style="color:windowtext;text-decoration:none">bronevetsky@llnl.gov</span></a><o:p></o:p></p>
<p class="MsoPlainText">> <a href="http://greg.bronevetsky.com"><span style="color:windowtext;text-decoration:none">http://greg.bronevetsky.com</span></a><o:p></o:p></p>
<p class="MsoPlainText">> <o:p></o:p></p>
<p class="MsoPlainText">> <o:p></o:p></p>
<p class="MsoPlainText">> -----Original Message-----<o:p></o:p></p>
<p class="MsoPlainText">> From: Mihael Hategan [<a href="mailto:hategan@mcs.anl.gov"><span style="color:windowtext;text-decoration:none">mailto:hategan@mcs.anl.gov</span></a>]<o:p></o:p></p>
<p class="MsoPlainText">> Sent: Wednesday, May 21, 2014 2:10 PM<o:p></o:p></p>
<p class="MsoPlainText">> To: Bronevetsky, Greg<o:p></o:p></p>
<p class="MsoPlainText">> Cc: <a href="mailto:swift-user@ci.uchicago.edu"><span style="color:windowtext;text-decoration:none">swift-user@ci.uchicago.edu</span></a><o:p></o:p></p>
<p class="MsoPlainText">> Subject: Re: [Swift-user] Data transfer error<o:p></o:p></p>
<p class="MsoPlainText">> <o:p></o:p></p>
<p class="MsoPlainText">> Hi,<o:p></o:p></p>
<p class="MsoPlainText">> <o:p></o:p></p>
<p class="MsoPlainText">> Sorry for the late reply (to your previous mail mentioning this).<o:p></o:p></p>
<p class="MsoPlainText">> <o:p></o:p></p>
<p class="MsoPlainText">> I don't know what the answer to your question is. It shouldn't be happening.<o:p></o:p></p>
<p class="MsoPlainText">> <o:p></o:p></p>
<p class="MsoPlainText">> However, a directory called &lt;scriptName&gt;-&lt;timestamp&gt;-&lt;runid&gt;.d should be created by swift. That directory should contain one or more *.info files, which may contain a few more details.<o:p></o:p></p>
<p class="MsoPlainText">> <o:p></o:p></p>
<p class="MsoPlainText">> Mihael<o:p></o:p></p>
<p class="MsoPlainText">> <o:p></o:p></p>
<p class="MsoPlainText">> On Wed, 2014-05-21 at 20:50 +0000, Bronevetsky, Greg wrote:<o:p></o:p></p>
<p class="MsoPlainText">> > Related question: what causes the following error?<o:p></o:p></p>
<p class="MsoPlainText">> > Caused by: Failed to move output file solver_bicg.precond_diag.mtx_nasa1824/FI/blocks/block_10 to shared directory<o:p></o:p></p>
<p class="MsoPlainText">> > I see the file in the swift work directory and the path solver_bicg.precond_diag.mtx_nasa1824/FI/blocks/ exists in the directory where I run the script.<o:p></o:p></p>
<p class="MsoPlainText">> > <o:p></o:p></p>
<p class="MsoPlainText">> > Greg Bronevetsky<o:p></o:p></p>
<p class="MsoPlainText">> > Lawrence Livermore National Lab<o:p></o:p></p>
<p class="MsoPlainText">> > (925) 424-5756<o:p></o:p></p>
<p class="MsoPlainText">> > <a href="mailto:bronevetsky@llnl.gov"><span style="color:windowtext;text-decoration:none">bronevetsky@llnl.gov</span></a><o:p></o:p></p>
<p class="MsoPlainText">> > <a href="http://greg.bronevetsky.com"><span style="color:windowtext;text-decoration:none">http://greg.bronevetsky.com</span></a><o:p></o:p></p>
<p class="MsoPlainText">> > <o:p></o:p></p>
<p class="MsoPlainText">> > From: Bronevetsky, Greg<o:p></o:p></p>
<p class="MsoPlainText">> > Sent: Tuesday, May 20, 2014 2:11 PM<o:p></o:p></p>
<p class="MsoPlainText">> > To: <a href="mailto:swift-user@ci.uchicago.edu"><span style="color:windowtext;text-decoration:none">swift-user@ci.uchicago.edu</span></a><o:p></o:p></p>
<p class="MsoPlainText">> > Subject: Data transfer error<o:p></o:p></p>
<p class="MsoPlainText">> > <o:p></o:p></p>
<p class="MsoPlainText">> > I sometimes get the following error in my Swift runs:<o:p></o:p></p>
<p class="MsoPlainText">> > Caused by: Failed to move output file solver_bicg.precond_diag.mtx_nasa1824/mt_fmodel/fm_n/lm_local/allStats to shared directory<o:p></o:p></p>
<p class="MsoPlainText">> > What causes it and how can I avoid it?<o:p></o:p></p>
<p class="MsoPlainText">> > <o:p></o:p></p>
<p class="MsoPlainText">> > Greg Bronevetsky<o:p></o:p></p>
<p class="MsoPlainText">> > Lawrence Livermore National Lab<o:p></o:p></p>
<p class="MsoPlainText">> > (925) 424-5756<o:p></o:p></p>
<p class="MsoPlainText">> > <a href="mailto:bronevetsky@llnl.gov"><span style="color:windowtext;text-decoration:none">bronevetsky@llnl.gov</span></a><o:p></o:p></p>
<p class="MsoPlainText">> > <a href="http://greg.bronevetsky.com"><span style="color:windowtext;text-decoration:none">http://greg.bronevetsky.com</span></a><o:p></o:p></p>
</div>
</body>
</html>