[Swift-devel] hang checker updates

Mihael Hategan hategan at mcs.anl.gov
Sat Jul 14 13:38:17 CDT 2012


The waiting threads are as follows:

0-17-84-2-66 local_output.h5part = sphOutArr
0-17-84-2-73 foreach myfile in local_output.h5part
0-17-84-2-72 output = local_output
0-17-84-2-63 gpg(local_forward_dat, gpg_stdout, conca_dat, concb_dat,
             concc_dat, writeDataOut, h5part_files, iter, plot)
0-17-84-2-54 trace(writeDataOut)
0-17-84-2-55 writeDataOut = writeData(sphOutNameArr)
0-17-84-2-47-4-3-7 tarfiles[i] = tarfile

54 waits on writeDataOut which waits on sphOutNameArr
55 waits on sphOutNameArr
63 waits on writeDataOut which waits on sphOutNameArr
66 waits on sphOutArr
72 waits on local_output which waits on sphOutArr
73 waits on local_output.h5part which waits on sphOutArr
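
As a rough illustration (this is not how the hang checker itself works),
the chains above can be followed mechanically to find what each thread is
ultimately blocked on. Below is a minimal Python sketch, assuming the edges
listed above are complete. The names WAITS_ON, THREAD_WAITS, root and
threads_by_root are made up for this example, thread ids are shortened to
their last component, and the thread-to-variable pairs are read off the
statements in the thread listing.

```python
# Variable -> variable it waits on, copied from the analysis above.
WAITS_ON = {
    "writeDataOut": "sphOutNameArr",
    "local_output.h5part": "sphOutArr",
    "local_output": "sphOutArr",
}

# Short thread id -> variable that thread blocks on directly
# (assumption: read off the statement each thread is executing).
THREAD_WAITS = {
    54: "writeDataOut",
    55: "sphOutNameArr",
    63: "writeDataOut",
    66: "sphOutArr",
    72: "local_output",
    73: "local_output.h5part",
}


def root(var):
    """Follow wait-for edges until reaching a variable with no outgoing edge."""
    seen = set()
    while var in WAITS_ON and var not in seen:  # 'seen' guards against cycles
        seen.add(var)
        var = WAITS_ON[var]
    return var


def threads_by_root():
    """Group waiting threads by the variable they are ultimately blocked on."""
    groups = {}
    for tid, var in sorted(THREAD_WAITS.items()):
        groups.setdefault(root(var), []).append(tid)
    return groups


if __name__ == "__main__":
    for var, tids in threads_by_root().items():
        print(var, tids)
```

Every chain ends at either sphOutNameArr or sphOutArr, so those two
variables are the ones worth tracing further.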

sphOutNameArr and sphOutArr wait on two partial closes: 88043 and 88075.
Those are the if (n > NUM_SPH_RUNS) {} on line 250 and the iterate on
line 313.

The first one is the problem. In particular:
0-17-84-2-47-4-3-7 tarfiles[4] = tarfile

For some reason tarfile is still open. Since it should be closed by copySph
(and all the other return values of copySph are closed), I can only conclude
that it's a Swift bug.

Do you have a different run to confirm this? Just the log file with the
hang checker triggered will do.

Mihael

On Sat, 2012-07-14 at 11:33 -0500, Michael Wilde wrote:
> Sorry, should be readable now.
> 
> - Mike
> 
> ----- Original Message -----
> > From: "Mihael Hategan" <hategan at mcs.anl.gov>
> > To: "Michael Wilde" <wilde at mcs.anl.gov>
> > Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>
> > Sent: Saturday, July 14, 2012 11:04:28 AM
> > Subject: Re: [Swift-devel] hang checker updates
> > On Sat, 2012-07-14 at 06:29 -0500, Michael Wilde wrote:
> > > In the meantime, can you help diagnose the specific deadlock in the
> > > PNNL "SPH" script?
> > 
> > I can try.
> > 
> > > The files for this problem are on the CI net at:
> > >   /home/wilde/swift/support/PNNL.SPH.deadlock.2012.0712
> > 
> > scp: /home/wilde/swift/support/PNNL.SPH.deadlock.2012.0712: Permission
> > denied
> 




