[Swift-devel] Swift unresponsive while using local provider.

Michael Wilde wilde at mcs.anl.gov
Fri Jun 17 15:03:25 CDT 2011


This would be a great thing to build into the test suite: for any test run that the suite is about to cancel for exceeding its max run time, capture the stack trace, so that if it's a deadlock, we can do some diagnosis right from the test results.
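
A minimal sketch of how a test harness could do this, written as a standalone shell function rather than as a patch to suite.sh. The function name run_with_deadline, the DUMP_CMD override, and the exit code 124 are assumptions made for illustration; only "jstack -l <pid>" comes from this thread. It also assumes that the PID of the launched process is the JVM's PID. If the swift launcher does not exec the JVM, the Java PID would have to be looked up separately (for example with jps).

```shell
#!/bin/sh
# Sketch: run a command under a time limit. If it is still running at the
# deadline, save a thread dump before cancelling it.
# run_with_deadline and DUMP_CMD are hypothetical names, not part of suite.sh.

# Usage: run_with_deadline SECONDS DUMPFILE COMMAND [ARGS...]
# Returns the command's exit status, or 124 if it was cancelled on timeout.
run_with_deadline() {
    limit=$1
    dumpfile=$2
    shift 2

    "$@" &
    pid=$!
    elapsed=0

    # Poll once per second while the process is still alive.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$limit" ]; then
            # Deadline reached: capture diagnostics first, then cancel.
            # DUMP_CMD defaults to "jstack -l"; the PID is appended to it.
            ${DUMP_CMD:-jstack -l} "$pid" > "$dumpfile" 2>&1 || true
            kill "$pid" 2>/dev/null || true
            wait "$pid" 2>/dev/null || true
            return 124
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done

    # Finished in time: pass its exit status through.
    wait "$pid"
}
```

A call might look like "run_with_deadline 300 q5.jstack swift q5.swift", where the file names are placeholders. The dump is written only on timeout, so a jstack file next to a test's results would itself indicate that the run was cancelled.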

Can we add some Swift commands to dump Swift's thread/future status as well?

- Mike


----- Original Message -----
> do "jstack -l <pid_of_swift_java_process>" whenever it happens and send
> the output.
> 
> 
> 
> On Fri, 2011-06-17 at 14:48 -0500, David Kelly wrote:
> > I saw similar things on my laptop (4 GB RAM) this weekend when I was
> > testing the galaxy demo scripts using the local provider. I was using
> > trunk. In the output I would see things like "no activity for 10s" and
> > it would just sit there and do nothing until I manually killed it. But
> > most of the time it would work fine. I wrote a little shell script
> > that would repeatedly run it until it hung. Then I was talking to Jon
> > about this and he saw something similar with his montage work. He
> > thought it might be related to a configuration issue: that either
> > wrapper.parameter.mode=files or status.mode=provider should be set.
> >
> > I can send my scripts as well if you need some help in tracking this
> > down.
> >
> > David
> >
> > On Fri, Jun 17, 2011 at 2:38 PM, Michael Wilde <wilde at mcs.anl.gov>
> > wrote:
> >         Alberto, how long are you letting it run for, and under what
> >         environment? If you are running on your laptop, how much RAM
> >         do you have? It's possible that you are seeing paging delays
> >         if you are running the Swift Java app with too little
> >         memory.
> >
> >
> >         Also, are you running trunk or 0.92.1? You should compare the
> >         two.
> >
> >
> >         It's *possible* that this simple test is hanging under recent
> >         trunk mods, but it's more likely that this is some kind of
> >         resource shortage.
> >
> >
> >         Can you run this on one of the Swift lab machines bridled or
> >         communcado, or better yet on the MCS compute servers, or a
> >         PADS worker node (which you can get with qsub -I on pads)?
> >
> >
> >         Look at Swift under the "top" command to see if Swift is
> >         running and slow, or is hung.
> >
> >
> >         Stop by and we can discuss in more detail.
> >
> >
> >         - Mike
> >
> >
> >
> >         ______________________________________________________________
> >
> >                 When I run the following SwiftScript using suite.sh,
> >                 the report shows odd behavior: most of the time it
> >                 times out, but once in a while it passes. The outcome
> >                 appears completely random; sometimes the test passes
> >                 3 times in a row, and then all of a sudden it fails.
> >                 This is my script:
> >
> >
> >                 type messagefile;
> >
> >
> >                 app (messagefile t) greeting (string s[]) {
> >                         echo s[0] s[1] s[2] stdout=@filename(t);
> >                 }
> >
> >
> >                 messagefile outfile <"q5out.txt">;
> >
> >
> >                 string words[] = ["how","are","you"];
> >
> >
> >                 outfile = greeting(words);
> >
> >
> >
> >
> >
> >
> >
> >                 Swift.properties contents:
> >
> >
> >                 $ cat swift.properties
> >                 wrapperlog.always.transfer=true
> >                 sitedir.keep=true
> >                 execution.retries=0
> >                 lazy.errors=false
> >                 status.mode=provider
> >                 use.provider.staging=false
> >                 provider.staging.pin.swiftfiles=false
> >
> >
> >                 Sites.template.xml contents:
> >
> >
> >                 $ cat sites.template.xml
> >                 <config>
> >                   <pool handle="localhost">
> >                     <filesystem provider="local" />
> >                     <execution provider="coaster" jobmanager="local:local"/>
> >                     <profile namespace="globus" key="internalHostname">127.0.0.1</profile>
> >                     <profile namespace="karajan" key="jobthrottle">1000</profile>
> >                     <profile namespace="karajan" key="initialScore">10000</profile>
> >                     <profile namespace="globus" key="jobsPerNode">4</profile>
> >                     <profile namespace="globus" key="slots">8</profile>
> >                     <profile namespace="globus" key="maxTime">1000</profile>
> >                     <profile namespace="globus" key="nodeGranularity">1</profile>
> >                     <profile namespace="globus" key="maxNodes">4</profile>
> >                     <workdirectory>/tmp</workdirectory>
> >                   </pool>
> >                 </config>
> >
> >
> >                 -Alberto
> >
> >
> >                 _______________________________________________
> >                 Swift-devel mailing list
> >                 Swift-devel at ci.uchicago.edu
> >                 http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> >
> >
> >
> >         --
> >         Michael Wilde
> >         Computation Institute, University of Chicago
> >         Mathematics and Computer Science Division
> >         Argonne National Laboratory
> >
> >
> >
> >
> >

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



