How are you starting the service? Are you starting the workers manually? If so, could you paste the command lines for both?<br><br><div class="gmail_quote">On Mon, Jan 23, 2012 at 1:50 PM, Emalayan Vairavanathan <span dir="ltr"><<a href="mailto:svemalayan@yahoo.com">svemalayan@yahoo.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div style="color:#000;background-color:#fff;font-family:times new roman,new york,times,serif;font-size:12pt"><div>
<span>Thanks Ketan and Jon. I tried that, but it is still giving an error. I have attached the log file.</span></div><div><br><span></span></div><div><span>Thank you</span></div><div><span>Emalayan<br></span></div><div><br></div> <div style="font-family:times new roman,new york,times,serif;font-size:12pt">
<div style="font-family:times new roman,new york,times,serif;font-size:12pt"> <div dir="ltr"> <font face="Arial"><div class="im"> <hr size="1"> <b><span style="font-weight:bold">From:</span></b> Ketan Maheshwari <<a href="mailto:ketancmaheshwari@gmail.com" target="_blank">ketancmaheshwari@gmail.com</a>><br>
<b><span style="font-weight:bold">To:</span></b> Emalayan Vairavanathan <<a href="mailto:svemalayan@yahoo.com" target="_blank">svemalayan@yahoo.com</a>> <br></div><b><span style="font-weight:bold">Cc:</span></b> Jonathan Monette <<a href="mailto:jonmon@mcs.anl.gov" target="_blank">jonmon@mcs.anl.gov</a>>; swift user <<a href="mailto:swift-user@ci.uchicago.edu" target="_blank">swift-user@ci.uchicago.edu</a>> <br>
<b><span style="font-weight:bold">Sent:</span></b> Monday, 23 January 2012 11:36 AM<br> <b><span style="font-weight:bold">Subject:</span></b> Re: [Swift-user] Montage+Swift+Coasters<br> </font> </div><div><div></div><div class="h5">
<br><div>Emalayan,<div><br></div><div>Likely, /tmp is not readable/writable across the machines. Could you try changing workdir to your /home?<br><br><div>On Mon, Jan 23, 2012 at 1:25 PM, Emalayan Vairavanathan <span dir="ltr"><<a rel="nofollow" href="mailto:svemalayan@yahoo.com" target="_blank">svemalayan@yahoo.com</a>></span> wrote:<br>
<blockquote style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div style="color:#000;background-color:#fff;font-family:times new roman,new york,times,serif;font-size:12pt"><div>
<span>Jon,</span></div><div><br><span></span></div><div><span>Please find the details below and let me know if you have any questions about my setup.<br></span></div><div><br><span></span></div><div><span>Thank you</span></div>
<div><span>Emalayan<br></span></div><div><br></div><div>==========================================================<br><span></span></div><div style="font-weight:bold"><span>site.xml</span></div><div style="font-weight:bold">
<br><span></span></div><div><span><config><br><pool handle="localhost"><br> <execution provider="coaster-persistent" url="<a rel="nofollow" href="http://localhost:1984" target="_blank">http://localhost:1984</a>" jobmanager="local:local"/><br>
<profile namespace="globus" key="workerManager">passive</profile><br><br> <profile namespace="globus"
key="workersPerNode">4</profile><br> <profile namespace="globus" key="maxTime">100000</profile><br> <profile namespace="globus" key="lowOverAllocation">100</profile><br>
<profile namespace="globus" key="highOverAllocation">100</profile><br> <profile namespace="globus" key="slots">100</profile><br> <profile namespace="globus" key="nodeGranularity">1</profile><br>
<profile namespace="globus" key="maxNodes">10</profile><br> <profile namespace="karajan" key="jobThrottle">25.00</profile><br> <profile namespace="karajan" key="initialScore">10000</profile><br>
<profile namespace="swift" key="stagingMethod">proxy</profile><br> <filesystem
provider="local"/><br> <workdirectory>/tmp/swift.workdir</workdirectory><br> </pool><br></config><br></span></div><div><br></div><div>=======================================================</div>
<div><br></div><div><span style="font-weight:bold">tc</span></div><div><br></div><div>localhost sh /bin/sh null null null<br>localhost cat /bin/cat null null null<br>localhost echo /bin/echo null null null<br>localhost do_merge /home/emalayan/App/forEmalayan/app/modmerge null null null<br>
localhost mProjExec /home/emalayan/App/Montage_v3.3/bin/mProjExec null null null<br>localhost mImgtbl /home/emalayan/App/Montage_v3.3/bin/mImgtbl null null null<br>localhost mAdd /home/emalayan/App/Montage_v3.3/bin/mAdd null null null<br>
localhost mOverlaps /home/emalayan/App/Montage_v3.3/bin/mOverlaps null null null<br>localhost mJPEG /home/emalayan/App/Montage_v3.3/bin/mJPEG null null null<br>localhost mDiffExec_wrap
/home/emalayan/App/Montage_v3.3/bin/mDiffExec null null null<br>localhost mFitExec /home/emalayan/App/Montage_v3.3/bin/mFitExec null null null<br>localhost mBgModel /home/emalayan/App/Montage_v3.3/bin/mBgModel null null null<br>
localhost mBgExec /home/emalayan/App/Montage_v3.3/bin/mBgExec null null null<br>localhost mConcatFit /home/emalayan/App/Montage_v3.3/bin/mConcatFit null null nul<br><br>localhost Background_list /home/emalayan/App/montage-swift/SwiftMontage/apps/Background_list.py null null null<br>
localhost create_status_table /home/emalayan/App/montage-swift/SwiftMontage/apps/create_status_table.py null null null<br>localhost mProjectPP_wrap /home/emalayan/App/montage-swift/SwiftMontage/apps/mProjectPP_wrap.py null null null<br>
localhost mProject_wrap /home/emalayan/App/montage-swift/SwiftMontage/apps/mProject_wrap.py null null null<br>localhost mBackground_wrap /home/emalayan/App/montage-swift/SwiftMontage/apps/mBackground_wrap.py null null
null<br>localhost mDiffFit_wrap /home/emalayan/App/montage-swift/SwiftMontage/apps/mDiffFit_wrap.py null null null</div><div><br></div><div>=================================================================</div><div><br>
</div><div><span style="font-weight:bold">cf</span><br></div><div><br>wrapperlog.always.transfer=true<br>sitedir.keep=true<br>execution.retries=1<br>lazy.errors=true<br>status.mode=provider<br>use.provider.staging=true<br>
provider.staging.pin.swiftfiles=false<br>foreach.max.threads=100<br>provenance.log=false</div><div><br></div><div>===================================================================<br></div><div><div><br></div>
</div><div style="font-family:times new roman,new york,times,serif;font-size:12pt"><div> </div><div style="font-family:times new roman,new york,times,serif;font-size:12pt"><div> <div dir="ltr"> <font face="Arial"> <hr size="1">
<b><span style="font-weight:bold">From:</span></b> Jonathan Monette
<<a rel="nofollow" href="mailto:jonmon@mcs.anl.gov" target="_blank">jonmon@mcs.anl.gov</a>><br> <b><span style="font-weight:bold">To:</span></b> Ketan Maheshwari <<a rel="nofollow" href="mailto:ketancmaheshwari@gmail.com" target="_blank">ketancmaheshwari@gmail.com</a>> <br>
<b><span style="font-weight:bold">Cc:</span></b> Emalayan Vairavanathan <<a rel="nofollow" href="mailto:svemalayan@yahoo.com" target="_blank">svemalayan@yahoo.com</a>>; swift user <<a rel="nofollow" href="mailto:swift-user@ci.uchicago.edu" target="_blank">swift-user@ci.uchicago.edu</a>> <br>
<b><span style="font-weight:bold">Sent:</span></b> Monday, 23 January 2012 11:08 AM<br> <b><span style="font-weight:bold">Subject:</span></b> Re: [Swift-user] Montage+Swift+Coasters<br> </font> </div></div><div><div></div>
<div> <br><div><div>Emalayan,<div> I have run the scripts with some of my own test cases and do not see them failing. Could you provide your config files? Please provide the tc, sites, and config file (if you use one).</div>
<div><br></div><div><div><div>On Jan 20, 2012, at 9:39 AM, Ketan Maheshwari wrote:</div><br><blockquote type="cite">Emalayan,<div><br></div><div>I
would check all the mappers and the resulting paths in the Swift source. </div><div><br></div><div>Also try running the failed job manually, something like this: </div><div><br></div><div>cd <swift.workdir>/<span style="font-family:times,serif;font-size:16px;font-style:italic;background-color:rgb(255,255,255)">SwiftMontage-20120119-1749-</span><span style="font-family:times,serif;font-size:16px;font-style:italic;background-color:rgb(255,255,255)">rjshh1r9/jobs/b/mConcatFit-</span><span style="font-family:times,serif;font-size:16px;font-style:italic;background-color:rgb(255,255,255)">b1sa4vlk</span></div>
<div><font size="3" face="'times new roman', 'new york', times, serif"><i><br></i></font></div><div><span style="font-family:times,serif;font-size:16px;font-style:italic;background-color:rgb(255,255,255)">mConcatFit </span><span style="background-color:rgb(255,255,255);font-family:times,serif;font-size:16px;font-style:italic">_concurrent/status_tbl-</span><span style="background-color:rgb(255,255,255);font-family:times,serif;font-size:16px;font-style:italic">7a8340c2-045d-4039-a77c-</span><span style="background-color:rgb(255,255,255);font-family:times,serif;font-size:16px;font-style:italic">00429b78d9c9-5 fits.tbl stat_dir</span></div>
<div><br></div><div>Error 520 indicates that the workers are not able to reach the data.</div><div><br></div><div>Also check whether swift.workdir is writable on the site by the worker nodes.</div><div><br><div>On Thu, Jan 19, 2012 at 7:55 PM, Emalayan Vairavanathan <span dir="ltr"><<a rel="nofollow" href="mailto:svemalayan@yahoo.com" target="_blank">svemalayan@yahoo.com</a>></span> wrote:<br>
<blockquote style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div style="color:#000;background-color:#fff;font-family:times new roman,new york,times,serif;font-size:12pt"><div>
<span>Hi Ketan,</span></div><div><br><span></span></div><div><span>This was with </span><span style="font-weight:bold">swift-0.92.1.</span><span><span> Now I have downloaded the latest Swift 0.93 and </span>am getting totally different error messages. I can ask Jon about these messages. (These scripts were working well with Swift alone.)<br>
</span></div><div><br><span></span></div><div><span>Please let me know if you have any ideas. <br></span></div><div><br><span></span></div><div><span>Regards</span></div><div><span>Emalayan<br></span></div><div><br><span></span></div>
<div><span>===============================================================================================<br></span></div><div><span><span style="font-style:italic">Swift 0.93 swift-r5501 cog-r3350</span><br style="font-style:italic">
<br style="font-style:italic"><span style="font-style:italic">RunID: 20120119-1749-rjshh1r9</span><br style="font-style:italic"><span style="font-style:italic"> (input): found 10 files</span><br style="font-style:italic">
<span style="font-style:italic">Progress: time: Thu, 19 Jan 2012 17:49:20 -0800</span><br style="font-style:italic"><span style="font-style:italic">Find: <a rel="nofollow" href="http://localhost:1984/" target="_blank">http://localhost:1984</a></span><br style="font-style:italic">
<span style="font-style:italic">Find: keepalive(120), reconnect - <a rel="nofollow" href="http://localhost:1984/" target="_blank">http://localhost:1984</a></span><br style="font-style:italic"><span style="font-style:italic">Progress: time: Thu, 19 Jan 2012 17:49:22 -0800 Stage in:1 Submitted:9</span><br style="font-style:italic">
<span style="font-style:italic">Progress: time: Thu, 19 Jan 2012 17:49:25 -0800 Active:9 Stage out:1</span><br style="font-style:italic"><span style="font-style:italic">Progress: time:
Thu, 19 Jan 2012 17:49:26 -0800 Stage out:3 Finished successfully:7</span><br style="font-style:italic"><span style="font-style:italic">Progress: time: Thu, 19 Jan 2012 17:49:28 -0800 Active:1 Finished successfully:10</span><br style="font-style:italic">
<span style="font-style:italic">Progress: time: Thu, 19 Jan 2012 17:49:29 -0800 Stage in:1 Submitting:11 Submitted:6 Finished successfully:12</span><br style="font-style:italic"><span style="font-style:italic">Progress: time: Thu, 19 Jan 2012 17:49:30 -0800 Stage in:4 Submitted:1 Active:6 Stage out:2 Finished successfully:17</span><br style="font-style:italic">
<span style="font-style:italic">Progress: time: Thu, 19 Jan 2012 17:49:31 -0800 Active:1 Finished successfully:30</span><br style="font-style:italic"><span style="font-style:italic">Exception in mConcatFit:</span><br style="font-style:italic">
<span style="font-style:italic">Arguments: [_concurrent/status_tbl-7a8340c2-045d-4039-a77c-00429b78d9c9-5, fits.tbl, stat_dir]</span><br style="font-style:italic"><span style="font-style:italic">Host: localhost</span><br style="font-style:italic">
<span style="font-style:italic">Directory: SwiftMontage-20120119-1749-rjshh1r9/jobs/b/mConcatFit-b1sa4vlk</span><br style="font-style:italic"><span style="font-style:italic">- - -</span><br style="font-style:italic"><br style="font-style:italic">
<span style="font-style:italic">Caused by: null</span><br style="font-style:italic"><span style="font-style:italic">Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with an exit code of 520</span><br style="font-style:italic">
<span style="font-style:italic">Execution failed:</span><br style="font-style:italic"><span style="font-style:italic"> back_list:Table
= org.griphyn.vdl.mapping.DataDependentException - Closed not derived due to errors in data dependencies</span><br></span></div><div><div><br></div> </div><div style="font-family:times new roman,new york,times,serif;font-size:12pt">
<div> </div><div style="font-family:times new roman,new york,times,serif;font-size:12pt"><div> <div dir="ltr"> <font face="Arial"> <hr size="1"> <b><span style="font-weight:bold">From:</span></b> Ketan Maheshwari <<a rel="nofollow" href="mailto:ketancmaheshwari@gmail.com" target="_blank">ketancmaheshwari@gmail.com</a>><br>
<b><span style="font-weight:bold">To:</span></b> Emalayan Vairavanathan <<a rel="nofollow" href="mailto:svemalayan@yahoo.com" target="_blank">svemalayan@yahoo.com</a>> <br><b><span style="font-weight:bold">Cc:</span></b> swift user <<a rel="nofollow" href="mailto:swift-user@ci.uchicago.edu" target="_blank">swift-user@ci.uchicago.edu</a>> <br>
<b><span style="font-weight:bold">Sent:</span></b> Thursday, 19 January 2012 4:49 PM<br> <b><span style="font-weight:bold">Subject:</span></b> Re: [Swift-user] Montage+Swift+Coasters<br> </font> </div></div><div><div></div>
<div> <br><div>Emalayan,<div><br></div><div>From your
symptoms, it seems you are facing the same issue I have been seeing. Could you tell me more about the amount of data that needs to be staged to run the Montage stages during which these warnings turn up? How much time elapses between the start of your workflow and the first of these messages?<br>
<br>Also, what version of Swift is this?</div><div><br></div><div>Regards,</div><div>Ketan</div><div><br><div>On Thu, Jan 19, 2012 at 5:51 PM, Emalayan Vairavanathan <span dir="ltr"><<a rel="nofollow" href="mailto:svemalayan@yahoo.com" target="_blank">svemalayan@yahoo.com</a>></span> wrote:<br>
<blockquote style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div style="color:#000;background-color:#fff;font-family:times new roman,new york,times,serif;font-size:12pt"><div>
<span>Dear All,</span></div>
<div><br>
<span></span></div>
<div><span>I have a problem running Montage with Coasters (<span style="font-style:italic">in our local cluster - no batch schedulers</span>). After a few stages the Swift runtime continuously prints the warnings below. Any ideas? Should I increase the heartbeat count?<br>
</span></div><div><span><br></span></div><div><span>Everything works fine when I try to run the same montage-scripts with swift on a single machine.<br></span></div><div><br><span></span></div><div><span>Thank you</span></div>
<div><span>Emalayan<br></span></div><div><span><br>
</span></div>
<div><br>
<span></span></div>
<div style="font-style:italic"><span>2012-01-19
15:38:09,207-0800 WARN Command Command(119, HEARTBEAT): handling reply
timeout; sendReqTime=120119-153609.206, sendTime=120119-153609.206,
now=120119-153809.207<br>
2012-01-19 15:38:09,207-0800 INFO Command Command(119, HEARTBEAT): re-sending<br>
2012-01-19 15:38:09,209-0800 WARN Command Command(119, HEARTBEAT)fault was: Reply timeout<br>
org.globus.cog.karajan.workflow.service.ReplyTimeoutException<br>
at org.globus.cog.karajan.workflow.service.commands.Command.handleReplyTimeout(Command.java:288)<br>
at org.globus.cog.karajan.workflow.service.commands.Command$Timeout.run(Command.java:293)<br>
at java.util.TimerThread.mainLoop(Timer.java:534)<br>
at java.util.TimerThread.run(Timer.java:484)</span></div>
</div></div><br>_______________________________________________<br>
Swift-user mailing list<br>
<a rel="nofollow" href="mailto:Swift-user@ci.uchicago.edu" target="_blank">Swift-user@ci.uchicago.edu</a><br>
<a rel="nofollow" href="https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user" target="_blank">https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user</a><br></blockquote></div><br><br clear="all"><div>
<br></div>-- <br>
Ketan<br><br><br>
</div>
</div><br><br> </div></div></div> </div> </div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br>Ketan<br><br><br>
</div>
</blockquote>
</div><br></div></div></div><br><br> </div></div></div> </div> </div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br>Ketan<br><br><br>
</div>
</div><br><br> </div></div></div> </div> </div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br>Ketan<br><br><br>