Mihael,<div><br></div><div>I've been updating the wiki page with results for the tests you listed. So far I have tested 7 OSG sites, of which 2 failed and 5 worked. I've uploaded a log for each test; you can reach it from the link alongside that test.<br>

<br>I'll carry on with further tests.</div><div><br></div><div>Regards,</div><div>Ketan</div><div><br><div class="gmail_quote">On Sun, Oct 16, 2011 at 3:06 PM, Mihael Hategan <span dir="ltr"><<a href="mailto:hategan@mcs.anl.gov">hategan@mcs.anl.gov</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">There are craploads of errors in there of all kinds and sorts, but very<br>

few of them are actual transfer problems. It looks more like<br>

gridftp/filesystem configuration issues.<br>

<br>

I attached a sorted list of exceptions.<br>

<br>

However, this is irrelevant. It looks like so far we keep running this<br>

test that clearly doesn't work and hope that it will work. That's silly.<br>

We need to figure out each problem individually and fix things one by<br>

one.<br>

<br>

So here's my proposal. We list all the problems that can be seen in that<br>

log and try to fix them in order. And we do not re-run the whole thing<br>

unless we actually solved at least one problem. Also, we sync<br>

periodically on what was done (i.e. we keep a list that we update<br>

immediately after something was done about an item). Also, before doing<br>

an integration test after a problem is fixed, we do a test for that<br>

specific problem/on a specific site only. There is way too much noise in<br>

these big runs and that makes it very hard to see what is happening.<br>

<br>

So here's a first list:<br>

<a href="http://www.ci.uchicago.edu/wiki/bin/view/SWFT/OSGTesting" target="_blank">http://www.ci.uchicago.edu/wiki/bin/view/SWFT/OSGTesting</a><br>

<div><div></div><div class="h5"><br>

<br>

<br>

On Sun, 2011-10-16 at 09:54 -0500, Ketan Maheshwari wrote:<br>

> Hello,<br>

><br>

><br>

> While running an Extenci workflow on OSG with persistent coasters<br>

> (multiple coaster services, 1 per OSG site) and gsiftp staging, I am<br>

> facing some gridftp-related issues. Following are some details of the<br>

> run:<br>

><br>

><br>

> A set of 15 OSG sites was selected after testing them for being<br>

> responsive ('greensites'). I performed a separate guc test on these<br>

> sites, which appeared to succeed for each site (200MB roundtrip<br>

> transfer in 7 mins for all sites).<br>

><br>

><br>

> However, while running my workflow from Swift, many of these transfers<br>

> fail showing a variety of errors, most pertaining to the data<br>

> transfers.<br>

><br>

><br>

> I noticed that these transfers fail irrespective of data size (250K<br>

> - 150M) and also seem to fail intermittently for different sites.<br>

><br>

><br>

> The log for this run is<br>

> here: <a href="http://www.mcs.anl.gov/~ketan/postproc-gridftp-20111013-2324-5qzebq16.log" target="_blank">http://www.mcs.anl.gov/~ketan/postproc-gridftp-20111013-2324-5qzebq16.log</a><br>

><br>

><br>

> I am providing 7G of heap space on the Swift command line and the host<br>

> has 50G of total memory.<br>

><br>

><br>

> Any ideas?<br>

><br>

><br>

><br>

><br>

> Regards,<br>

> --<br>

> Ketan<br>

><br>

><br>

><br>

</div></div>> _______________________________________________<br>

> Swift-devel mailing list<br>

> <a href="mailto:Swift-devel@ci.uchicago.edu">Swift-devel@ci.uchicago.edu</a><br>

> <a href="https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel" target="_blank">https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel</a><br>

<br>

</blockquote></div><br><br clear="all"><div><br></div>-- <br>Ketan<br><br><br>

</div>