[Swift-devel] Swift unresponsive while using local provider.
Michael Wilde
wilde at mcs.anl.gov
Fri Jun 17 15:03:25 CDT 2011
This would be a great thing to build into the test suite: for any test run that the suite is about to cancel for exceeding its max run time, capture the stack trace, so that if it's a deadlock, we can do some diagnosis right from the test results.
Can we add some Swift commands to dump Swift's thread/future status as well?
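
As a rough sketch of the idea (not what suite.sh does today), the suite could wrap each run in something like the function below. The name run_with_dump, its arguments, and the DUMP_CMD override are all hypothetical. It also assumes the pid of the launched command is the pid of the JVM; if the swift launcher forks java as a child, the real pid would have to be looked up first (for example with jps).

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command with a time limit; if the limit is
# exceeded, save a thread dump before killing the process.
# Usage: run_with_dump <limit_seconds> <dump_file> <command> [args...]
run_with_dump() {
    local limit=$1 dumpfile=$2
    shift 2

    "$@" &                  # start the test run in the background
    local pid=$!
    local waited=0

    # Poll once per second while the process is still alive.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$limit" ]; then
            # Assumption: $pid is the JVM itself. DUMP_CMD can override
            # the dump command; it defaults to "jstack -l".
            ${DUMP_CMD:-jstack -l} "$pid" > "$dumpfile" 2>&1
            kill "$pid" 2>/dev/null
            wait "$pid" 2>/dev/null
            return 124      # same convention as timeout(1)
        fi
        sleep 1
        waited=$((waited + 1))
    done

    wait "$pid"             # finished in time: pass on its exit status
}
```

A call might look like "run_with_dump 300 hang.jstack swift myscript.swift" (script name made up). Running that in a loop until it returns 124 would leave the dump file behind for whoever looks at the test results.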
- Mike
----- Original Message -----
> do "jstack -l <pid_of_swift_java_process>" whenever it happens and
> send the output.
>
>
>
> On Fri, 2011-06-17 at 14:48 -0500, David Kelly wrote:
> > I saw similar things on my laptop (4 GB RAM) this weekend when I was
> > testing the galaxy demo scripts using the local provider. I was using
> > trunk. In the output I would see things like "no activity for 10s" and
> > it just would sit there and do nothing until I manually killed it. But
> > most of the time it would work fine. I wrote a little shell script
> > that would repeatedly run it until it hung. Then I was talking to Jon
> > about this and he saw something similar with his montage work. He
> > thought it might be related to a configuration issue - that either
> > wrapper.parameter.mode=files or status.mode=provider should be set.
> >
> > I can send my scripts as well if you need some help in tracking this
> > down.
> >
> > David
> >
> > On Fri, Jun 17, 2011 at 2:38 PM, Michael Wilde <wilde at mcs.anl.gov>
> > wrote:
> > Alberto, how long are you letting it run for, and under what
> > environment? If you are running on your laptop, how much RAM
> > do you have? It's possible that you are seeing paging delays
> > if you are running the Swift Java app with too little
> > memory.
> >
> >
> > Also, are you running trunk or 0.92.1? You should compare
> > the two.
> >
> >
> > It's *possible* that this simple test is hanging under recent
> > trunk mods, but it's more likely that this is some kind of
> > resource shortage.
> >
> >
> > Can you run this on one of the Swift lab machines (bridled or
> > communicado), or better yet on the MCS compute servers, or a
> > PADS worker node (which you can get with qsub -I on pads)?
> >
> >
> > Look at Swift under the "top" command to see if Swift is
> > running and slow, or is hung.
> >
> >
> > Stop by and we can discuss in more detail.
> >
> >
> > - Mike
> >
> >
> >
> > ______________________________________________________________
> >
> > When I run the following SwiftScript using suite.sh,
> > the report shows odd behavior: most of the time it
> > times out, but once in a while it passes. The
> > outcome appears completely random, since sometimes the
> > test has passed 3 times in a row, and then all of a
> > sudden it fails.
> > This is my script:
> >
> >
> > type messagefile;
> >
> >
> > app (messagefile t) greeting (string s[]) {
> > echo s[0] s[1] s[2] stdout=@filename(t);
> > }
> >
> >
> > messagefile outfile <"q5out.txt">;
> >
> >
> > string words[] = ["how","are","you"];
> >
> >
> > outfile = greeting(words);
> >
> >
> >
> >
> >
> >
> >
> > Swift.properties contents:
> >
> >
> > $ cat swift.properties
> > wrapperlog.always.transfer=true
> > sitedir.keep=true
> > execution.retries=0
> > lazy.errors=false
> > status.mode=provider
> > use.provider.staging=false
> > provider.staging.pin.swiftfiles=false
> >
> >
> > Sites.template.xml contents:
> >
> >
> > $ cat sites.template.xml
> > <config>
> >   <pool handle="localhost">
> >     <filesystem provider="local" />
> >     <execution provider="coaster" jobmanager="local:local"/>
> >     <profile namespace="globus" key="internalHostname">127.0.0.1</profile>
> >     <profile namespace="karajan" key="jobthrottle">1000</profile>
> >     <profile namespace="karajan" key="initialScore">10000</profile>
> >     <profile namespace="globus" key="jobsPerNode">4</profile>
> >     <profile namespace="globus" key="slots">8</profile>
> >     <profile namespace="globus" key="maxTime">1000</profile>
> >     <profile namespace="globus" key="nodeGranularity">1</profile>
> >     <profile namespace="globus" key="maxNodes">4</profile>
> >     <workdirectory>/tmp</workdirectory>
> >   </pool>
> > </config>
> >
> >
> > -Alberto
> >
> >
> > _______________________________________________
> > Swift-devel mailing list
> > Swift-devel at ci.uchicago.edu
> > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> >
> >
> >
> > --
> > Michael Wilde
> > Computation Institute, University of Chicago
> > Mathematics and Computer Science Division
> > Argonne National Laboratory
> >
> >
> >
--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory