[Swift-devel] Coaster persistent service issues - logs

Michael Wilde wilde at mcs.anl.gov
Tue Aug 31 12:14:21 CDT 2010


----- Forwarded Message -----
From: "David Kelly" <dk0966 at cs.ship.edu>
To: "Michael Wilde" <wilde at mcs.anl.gov>
Cc: "Jonathan Monette" <jon.monette at gmail.com>, "Justin Wozniak" <wozniak at mcs.anl.gov>, "Mihael Hategan" <hategan at mcs.anl.gov>
Sent: Sunday, August 29, 2010 10:32:07 AM GMT -06:00 US/Canada Central
Subject: Re: change skype call time today - and some to-do notes

Hello all, 

A few things I've noticed while trying out various coaster configurations this weekend: 

I had similar problems with the -nosec option. Here is the output I got:

davidk at churn:~/cog/modules/swift/dist/swift-svn/bin$ coaster-service -nosec 
Error loading credential: [JGLOBUS-10] Expired credentials (DC=org,DC=doegrids,OU=People,CN=David Kelly 16830,CN=753950975). 
Error loading credential 
org.globus.gsi.GlobusCredentialException: [JGLOBUS-10] Expired credentials (DC=org,DC=doegrids,OU=People,CN=David Kelly 16830,CN=753950975). 
at org.globus.gsi.GlobusCredential.verify(GlobusCredential.java:321) 
at org.globus.gsi.GlobusCredential.reloadDefaultCredential(GlobusCredential.java:593) 
at org.globus.gsi.GlobusCredential.getDefaultCredential(GlobusCredential.java:575) 
at org.globus.cog.abstraction.coaster.service.CoasterPersistentService.main(CoasterPersistentService.java:73) 

I tested multiple connections in coaster-persistent + active mode. That seemed to work fine, with each new Swift connection waiting for the previous one to finish. I did notice some Java exceptions in the log files:

org.globus.cog.karajan.workflow.service.ReplyTimeoutException 
at org.globus.cog.karajan.workflow.service.commands.Command.handleReplyTimeout(Command.java:280) 
at org.globus.cog.karajan.workflow.service.commands.Command$Timeout.run(Command.java:285) 
at java.util.TimerThread.mainLoop(Timer.java:512) 
at java.util.TimerThread.run(Timer.java:462) 
Channel IOException 
java.net.SocketException: Broken pipe 
at java.net.SocketOutputStream.socketWrite0(Native Method) 
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92) 
at java.net.SocketOutputStream.write(SocketOutputStream.java:124) 
at org.globus.gsi.gssapi.net.impl.GSIGssOutputStream.writeToken(GSIGssOutputStream.java:61) 
at org.globus.gsi.gssapi.net.impl.GSIGssOutputStream.flush(GSIGssOutputStream.java:45) 
at org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel$Sender.send(AbstractStreamKarajanChannel.java:298) 
at org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel$Sender.run(AbstractStreamKarajanChannel.java:247) 

I am not sure whether this is important, but I will include the logs.
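
To judge how often each exception shows up in the logs, a small tally script can help. This is only a minimal sketch: the regular expression and the function name `count_exceptions` are illustrative assumptions, and the sample text is an excerpt of the traces quoted above, not the attached log files.

```python
import re
from collections import Counter

# Matches a fully qualified Java class name ending in "Exception" or "Error"
# at the start of a line, i.e. the head of a stack trace. Stack frames
# ("at ...") and free-text lines such as "Channel IOException" do not match.
EXCEPTION_RE = re.compile(
    r"^\s*((?:[A-Za-z_]\w*\.)+[A-Za-z_]\w*(?:Exception|Error))\b"
)

def count_exceptions(lines):
    """Count stack-trace heads per exception class (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = EXCEPTION_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Excerpt of the traces quoted in this message, used as sample input.
SAMPLE = """org.globus.cog.karajan.workflow.service.ReplyTimeoutException
at org.globus.cog.karajan.workflow.service.commands.Command.handleReplyTimeout(Command.java:280)
at java.util.TimerThread.run(Timer.java:462)
Channel IOException
java.net.SocketException: Broken pipe
at java.net.SocketOutputStream.socketWrite0(Native Method)
""".splitlines()

counts = count_exceptions(SAMPLE)
for name, n in counts.most_common():
    print(n, name)
```

To run it against a real log, pass an open file object instead of `SAMPLE`, since the function only iterates over lines.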

I can't quite get coaster-persistent working in passive mode. I am not sure whether this is a configuration issue, a Swift issue, or operator error. Here is what I am trying to do:

sites.xml: 
<config>
  <pool handle="churn">
    <execution provider="coaster-persistent" url="churn.mcs.anl.gov" jobmanager="local:local"/>
    <profile namespace="globus" key="workerManager">passive</profile>
    <profile namespace="globus" key="workersPerNode">1</profile>
    <profile namespace="globus" key="maxTime">3500</profile>
    <profile namespace="globus" key="slots">1</profile>
    <profile namespace="globus" key="nodeGranularity">1</profile>
    <profile namespace="globus" key="maxNodes">1</profile>
    <filesystem provider="local" url="none" />
    <profile key="jobThrottle" namespace="karajan">.31</profile>
    <profile namespace="karajan" key="initialScore">10000</profile>
    <workdirectory>/home/davidk/swiftwork/churn</workdirectory>
  </pool>
</config>
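
As a quick sanity check on what a sites.xml like this actually declares, the file can be parsed and summarized. This is a minimal sketch using Python's standard `xml.etree.ElementTree`. The helper name `summarize_pool` is an assumption for illustration, and the embedded XML is a shortened copy of the config above. It only reports what the file says and does not validate the settings against Swift.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sites.xml from this message.
SITES_XML = """<config>
  <pool handle="churn">
    <execution provider="coaster-persistent" url="churn.mcs.anl.gov" jobmanager="local:local"/>
    <profile namespace="globus" key="workerManager">passive</profile>
    <profile namespace="globus" key="workersPerNode">1</profile>
    <profile namespace="globus" key="slots">1</profile>
    <profile key="jobThrottle" namespace="karajan">.31</profile>
  </pool>
</config>"""

def summarize_pool(xml_text):
    """Return (handle, provider, url, profiles) for the first <pool>.

    Hypothetical helper: profiles is a dict keyed by (namespace, key).
    """
    root = ET.fromstring(xml_text)
    pool = root.find("pool")
    execution = pool.find("execution")
    profiles = {
        (p.get("namespace"), p.get("key")): (p.text or "").strip()
        for p in pool.findall("profile")
    }
    # strip() guards against stray whitespace around the host name.
    url = (execution.get("url") or "").strip()
    return pool.get("handle"), execution.get("provider"), url, profiles

handle, provider, url, profiles = summarize_pool(SITES_XML)
print(handle, provider, url)
for (namespace, key), value in sorted(profiles.items()):
    print(f"{namespace}.{key} = {value}")
```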

I run grid-proxy-init on the submit host (login*.mcs.anl.gov) and on the remote host (churn.mcs.anl.gov). From churn I run coaster-service. When I run the catsn.swift script on login, I see this kind of message repeated in the coaster-service output:

GSSSChannel-null(1)[7990655: {}] REQ: Handler(HEARTBEAT) 
GSSSChannel-null(1)[7990655: {}] REQ: Handler(HEARTBEAT) 
GSSSChannel-null(1)[7990655: {}] REQ: Handler(HEARTBEAT) 
GSSSChannel-null(1)[7990655: {}] REQ: Handler(HEARTBEAT) 
GSSSChannel-null(1)[7990655: {}] REQ: Handler(HEARTBEAT) 
GSSSChannel-null(1)[7990655: {}] REQ: Handler(HEARTBEAT) 
GSSSChannel-null(1)[7990655: {}] REQ: Handler(HEARTBEAT) 
GSSSChannel-null(1)[7990655: {}] REQ: Handler(HEARTBEAT) 
GSSSChannel-null(1)[7990655: {}] REQ: Handler(HEARTBEAT) 
GSSSChannel-null(1)[7990655: {}] REQ: Handler(HEARTBEAT) 

These messages seem to repeat several times per second and never stop. The script never finishes. The configurations and log files are attached.
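
To put a number on how many of these requests arrive, the service output can be tallied per channel and handler. This is a minimal sketch: the line pattern is inferred only from the sample lines above, and the name `count_requests` is an assumption. Because the output shown carries no timestamps, it gives counts rather than a rate.

```python
import re
from collections import Counter

# Pattern inferred from lines such as:
#   GSSSChannel-null(1)[7990655: {}] REQ: Handler(HEARTBEAT)
REQ_RE = re.compile(
    r"^(?P<channel>\S+)\[(?P<id>\d+): \{\}\] REQ: Handler\((?P<handler>\w+)\)"
)

def count_requests(lines):
    """Count REQ lines per (channel, handler) pair (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = REQ_RE.match(line.strip())
        if match:
            counts[(match.group("channel"), match.group("handler"))] += 1
    return counts

SAMPLE = [
    "GSSSChannel-null(1)[7990655: {}] REQ: Handler(HEARTBEAT) ",
    "GSSSChannel-null(1)[7990655: {}] REQ: Handler(HEARTBEAT) ",
    "GSSSChannel-null(1)[7990655: {}] REQ: Handler(HEARTBEAT) ",
    "some unrelated line",
]

counts = count_requests(SAMPLE)
for (channel, handler), n in counts.items():
    print(channel, handler, n)
```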

Once I can get this configuration working manually, I will start working on a script to automate this process for multiple hosts to make things a little easier. 

David 

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

-------------- next part --------------
A non-text attachment was scrubbed...
Name: coaster-logs.tar.gz
Type: application/x-gzip
Size: 48618 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/swift-devel/attachments/20100831/b9ddc1d4/attachment.bin>

