[Swift-devel] persistent coasters on OSG

Ketan Maheshwari ketancmaheshwari at gmail.com
Mon Aug 22 13:45:32 CDT 2011


Hi Mihael, All,

I am testing the persistent coasters setup with OSG sites from
communicado and see intermittent exceptions and job failures; the failed
jobs eventually succeed on retry.

The exceptions I see in the log are mostly low-level network exceptions
(channel exceptions, broken-pipe SocketExceptions, reply timeouts, etc.).

The runs I tried were incremental catsn runs with n = 1, 10, 50 and 100,
and data.txt sizes of 100MB and 200MB.

The only runs that had the above-mentioned errors were the ones with n=100
and data.txt=200MB.

The other runs completed without any errors.

I used just one OSG site for these runs.

Attaching the sites, log files and a file that contains exception messages
grepped from log files.
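
In case it helps when comparing runs, the grepped file can be tallied with a
small script along the lines below. This is only a sketch: the function name
summarize and both regular expressions are my own assumptions, based on the
line formats visible in the excerpt, and are not part of Swift or CoG.

```python
import re
from collections import Counter

# Assumed formats, taken from the grepped excerpt:
#   bare exception lines such as "java.net.SocketException: Broken pipe"
#   job failure lines containing "APPLICATION_EXCEPTION jobid=<id>"
EXCEPTION_LINE = re.compile(r"^([\w.]+(?:Exception|Error))\b")
FAILED_JOB = re.compile(r"APPLICATION_EXCEPTION jobid=(\S+)")


def summarize(lines):
    """Return (Counter of exception class names, list of failed job ids)."""
    exceptions = Counter()
    failed_jobs = []
    for raw in lines:
        line = raw.strip()
        match = EXCEPTION_LINE.match(line)
        if match:
            # Line starts with a fully qualified exception class name.
            exceptions[match.group(1)] += 1
            continue
        match = FAILED_JOB.search(line)
        if match:
            # Timestamped line reporting a failed application job.
            failed_jobs.append(match.group(1))
    return exceptions, failed_jobs
```

Calling summarize(open(path)) on the grepped file (path being wherever that
file is saved) returns the per-class exception counts and the ids of the jobs
that hit an APPLICATION_EXCEPTION.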

Any clues on how to harden this? I had about 5 errors on today's run and
about 11 on a similar run last week.


Regards,
-- 
Ketan
-------------- next part --------------
2011-08-22 11:31:26,251-0500 DEBUG vdl:execute2 APPLICATION_EXCEPTION jobid=cat-6jxfpsek - Application exception: Task failed: Connection to worker lost
java.net.SocketException: Broken pipe
2011-08-22 11:31:50,808-0500 DEBUG vdl:execute2 APPLICATION_EXCEPTION jobid=cat-ajxfpsek - Application exception: Task failed: Connection to worker lost
java.net.SocketException: Broken pipe
2011-08-22 11:35:38,899-0500 DEBUG vdl:execute2 APPLICATION_EXCEPTION jobid=cat-ljxfpsek - Application exception: Task failed: Connection to worker lost
java.net.SocketException: Broken pipe
2011-08-22 11:35:57,531-0500 DEBUG vdl:execute2 APPLICATION_EXCEPTION jobid=cat-jjxfpsek - Application exception: Task failed: Connection to worker lost
java.net.SocketException: Broken pipe
2011-08-22 11:40:12,334-0500 DEBUG vdl:execute2 APPLICATION_EXCEPTION jobid=cat-yjxfpsek - Application exception: Task failed: Connection to worker lost
java.net.SocketException: Broken pipe
2011-08-22 11:42:16,427-0500 DEBUG vdl:execute2 APPLICATION_EXCEPTION jobid=cat-6kxfpsek - Application exception: Task failed: Connection to worker lost
java.net.SocketException: Broken pipe
2011-08-22 11:28:29,903-0500 INFO  AbstractStreamKarajanChannel Channel IOException
java.net.SocketException: Broken pipe
2011-08-22 11:28:29,913-0500 INFO  ChannelManager Handling channel exception
java.net.SocketException: Broken pipe
java.net.SocketException: Broken pipe
2011-08-22 11:28:29,914-0500 INFO  ChannelManager Channel exception handled
org.globus.cog.karajan.workflow.service.ReplyTimeoutException
org.globus.cog.karajan.workflow.service.ReplyTimeoutException
org.globus.cog.karajan.workflow.service.ReplyTimeoutException
org.globus.cog.karajan.workflow.service.ReplyTimeoutException
2011-08-22 11:30:40,430-0500 INFO  AbstractStreamKarajanChannel Channel IOException
java.net.SocketException: Broken pipe
2011-08-22 11:30:40,431-0500 INFO  ChannelManager Handling channel exception
java.net.SocketException: Broken pipe
java.net.SocketException: Broken pipe
2011-08-22 11:30:40,431-0500 INFO  ChannelManager Channel exception handled
2011-08-22 11:31:26,235-0500 INFO  AbstractStreamKarajanChannel Channel IOException
java.net.SocketException: Broken pipe
2011-08-22 11:31:26,235-0500 INFO  ChannelManager Handling channel exception
java.net.SocketException: Broken pipe
java.net.SocketException: Broken pipe
2011-08-22 11:31:26,237-0500 INFO  ChannelManager Channel exception handled
2011-08-22 11:31:50,780-0500 INFO  AbstractStreamKarajanChannel Channel IOException
java.net.SocketException: Broken pipe
2011-08-22 11:31:50,781-0500 INFO  ChannelManager Handling channel exception
java.net.SocketException: Broken pipe
java.net.SocketException: Broken pipe
2011-08-22 11:31:50,783-0500 INFO  ChannelManager Channel exception handled
org.globus.cog.karajan.workflow.service.ReplyTimeoutException
org.globus.cog.karajan.workflow.service.ReplyTimeoutException
org.globus.cog.karajan.workflow.service.ReplyTimeoutException
org.globus.cog.karajan.workflow.service.ReplyTimeoutException
org.globus.cog.karajan.workflow.service.ReplyTimeoutException
org.globus.cog.karajan.workflow.service.ReplyTimeoutException
org.globus.cog.karajan.workflow.service.ReplyTimeoutException
org.globus.cog.karajan.workflow.service.ReplyTimeoutException
org.globus.cog.karajan.workflow.service.ReplyTimeoutException
org.globus.cog.karajan.workflow.service.ReplyTimeoutException
org.globus.cog.karajan.workflow.service.ReplyTimeoutException
2011-08-22 11:35:38,887-0500 INFO  AbstractStreamKarajanChannel Channel IOException
java.net.SocketException: Broken pipe
2011-08-22 11:35:38,887-0500 INFO  ChannelManager Handling channel exception
java.net.SocketException: Broken pipe
java.net.SocketException: Broken pipe
2011-08-22 11:35:38,889-0500 INFO  ChannelManager Channel exception handled
2011-08-22 11:35:57,527-0500 INFO  AbstractStreamKarajanChannel Channel IOException
java.net.SocketException: Broken pipe
2011-08-22 11:35:57,527-0500 INFO  ChannelManager Handling channel exception
java.net.SocketException: Broken pipe
java.net.SocketException: Broken pipe
2011-08-22 11:35:57,528-0500 INFO  ChannelManager Channel exception handled
org.globus.cog.karajan.workflow.service.ReplyTimeoutException
org.globus.cog.karajan.workflow.service.ReplyTimeoutException
org.globus.cog.karajan.workflow.service.ReplyTimeoutException
org.globus.cog.karajan.workflow.service.ReplyTimeoutException
org.globus.cog.karajan.workflow.service.ReplyTimeoutException
org.globus.cog.karajan.workflow.service.ReplyTimeoutException
org.globus.cog.karajan.workflow.service.ReplyTimeoutException
org.globus.cog.karajan.workflow.service.ReplyTimeoutException
2011-08-22 11:40:12,320-0500 INFO  AbstractStreamKarajanChannel Channel IOException
java.net.SocketException: Broken pipe
2011-08-22 11:40:12,321-0500 INFO  ChannelManager Handling channel exception
java.net.SocketException: Broken pipe
java.net.SocketException: Broken pipe
2011-08-22 11:40:12,322-0500 INFO  ChannelManager Channel exception handled
org.globus.cog.karajan.workflow.service.ReplyTimeoutException
org.globus.cog.karajan.workflow.service.ReplyTimeoutException
org.globus.cog.karajan.workflow.service.ReplyTimeoutException
org.globus.cog.karajan.workflow.service.ReplyTimeoutException
2011-08-22 11:42:16,407-0500 INFO  AbstractStreamKarajanChannel Channel IOException
java.net.SocketException: Broken pipe
2011-08-22 11:42:16,407-0500 INFO  ChannelManager Handling channel exception
java.net.SocketException: Broken pipe
java.net.SocketException: Broken pipe
2011-08-22 11:42:16,410-0500 INFO  ChannelManager Channel exception handled
-------------- next part --------------
A non-text attachment was scrubbed...
Name: service-0.out
Type: application/octet-stream
Size: 277424 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/swift-devel/attachments/20110822/6391b509/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: catsn-catsn-osg-n50-d200mb.log
Type: application/octet-stream
Size: 117353 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/swift-devel/attachments/20110822/6391b509/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: swift.log
Type: application/octet-stream
Size: 819984 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/swift-devel/attachments/20110822/6391b509/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sites.grid-ps.xml
Type: text/xml
Size: 616 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/swift-devel/attachments/20110822/6391b509/attachment.xml>

