Inconsistent results on bluegene (reproduce the same problem on ANL's BG/L)

Yu-Heng Tseng YHTseng at lbl.gov
Sat Jun 3 09:52:09 CDT 2006


Hi Rob,

Thanks for checking this out. However, could you provide more details? It 
is very strange that the inconsistency always occurs with nodes=2,8,16 
(when testing nodes=2,4,8,16,32,64,128). This is true for both ANL's 
BG/L and NCAR's BG/L, and it is also true across different file systems 
(I believe NCAR's BG/L uses a different file system as well). Does that 
imply that parallel I/O is still not stable on BG/L? Is there any way to 
fix this? Thanks a lot for your investigation.

Cheers
Yu-heng
---------------------------------------------------
Yu-Heng Tseng

Computational Research Division
Lawrence Berkeley National Laboratory
One Cyclotron Rd, MS: 50F-1650
Berkeley, CA 94720
YHTseng at lbl.gov
510.495.2904

----- Original Message -----
From: robl at mcs.anl.gov (Robert Latham)
Date: Friday, June 2, 2006 12:24 pm
Subject: Re: Inconsistent results on bluegene (reproduce the same 
problem on ANL's BG/L)

> On Thu, May 18, 2006 at 07:40:08AM -0700, Yu-Heng Tseng wrote:
> > The test case is run under the home directory. The inconsistent
> > problem exists for both ANL's and NCAR's BG/L systems. So I suspect
> > this may not be a single issue. Thank you so much if someone can
> > solve this problem.
> 
> Hi Yu-heng
> 
> Sorry for the delay in getting back to you, but I've had a chance to
> look at this a little bit now.  I too see these non-zero results for
> 16 processes on both NFS and PVFS2 (when I change the file name in the
> test program).
> 
> This is probably a filesystem issue.  NFS caching can cause problems
> in some cases.  Further, both PVFS2 and NFS ignore fcntl lock
> requests.  Additionally, PVFS2 on Argonne's BG/L is treated like a
> regular unix file system.
> 
> ROMIO's noncontiguous I/O optimizations on unix-like file systems
> perform a read-modify-write, and require file locking to eliminate the
> possibility of false sharing among different processes.  ROMIO can try
> to work around this for NFS, but I don't think IBM's ROMIO is
> configured for that (nfs:/path/to/file results in "unsupported file
> system").
> 
> ==rob
> 
> -- 
> Rob Latham
> Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
> Argonne National Labs, IL USA                B29D F333 664A 4280 315B
> 
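
The read-modify-write hazard described in the quoted message can be sketched with a small simulation. This is only an illustration under stated assumptions, not ROMIO's actual code: the "file" is an in-memory bytearray, the two "ranks" are plain dictionaries of offset-to-value updates, and all function names (read_block, patch, write_block, run_unlocked, run_locked) are hypothetical. Each rank owns interleaved bytes of the same extent, reads the whole extent, patches its own bytes, and writes the whole extent back. Without a lock, the second writer puts back a stale copy of the first writer's bytes.

```python
def read_block(shared, start, length):
    # "Read" phase: copy the whole contiguous extent into a private buffer.
    return bytearray(shared[start:start + length])


def patch(buf, start, updates):
    # "Modify" phase: overwrite only the bytes this rank owns.
    for offset, value in updates.items():
        buf[offset - start] = value


def write_block(shared, start, buf):
    # "Write" phase: write the whole extent back, including bytes
    # this rank never intended to change.
    shared[start:start + len(buf)] = buf


def run_unlocked(size, rank0, rank1):
    # Both ranks read before either writes, as can happen when
    # lock requests are ignored by the file system.
    shared = bytearray(size)
    buf0 = read_block(shared, 0, size)
    buf1 = read_block(shared, 0, size)
    patch(buf0, 0, rank0)
    patch(buf1, 0, rank1)
    write_block(shared, 0, buf0)
    write_block(shared, 0, buf1)  # stale zeros overwrite rank 0's bytes
    return shared


def run_locked(size, rank0, rank1):
    # A working lock serializes each read-modify-write as one unit.
    shared = bytearray(size)
    for updates in (rank0, rank1):
        buf = read_block(shared, 0, size)
        patch(buf, 0, updates)
        write_block(shared, 0, buf)
    return shared


# Hypothetical access pattern: rank 0 owns even offsets, rank 1 odd offsets.
RANK0 = {0: 1, 2: 1, 4: 1, 6: 1}
RANK1 = {1: 2, 3: 2, 5: 2, 7: 2}

if __name__ == "__main__":
    print("unlocked:", list(run_unlocked(8, RANK0, RANK1)))
    print("locked:  ", list(run_locked(8, RANK0, RANK1)))
```

In the unlocked run, rank 0's values are lost even though the two ranks never write the same byte, which is the false sharing mentioned above. ROMIO documents a `romio_ds_write` hint for disabling data sieving on writes; whether IBM's BG/L build honors it is not something this thread establishes.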
