Inconsistent results on bluegene (reproduce the same problem on ANL's BG/L)
Yu-Heng Tseng
YHTseng at lbl.gov
Sat Jun 3 09:52:09 CDT 2006
Hi Rob,
Thanks for checking this out. However, could you give more details? It
is very strange that the inconsistency always occurs with nodes=2,8,16
(when testing nodes=2,4,8,16,32,64,128). This is true on both ANL's
BG/L and NCAR's BG/L, and it is also true across different file
systems: I believe NCAR's BG/L uses a different file system. Does that
imply that parallel I/O is still not stable on BG/L? Is there any way
to fix this? Thanks a lot for your investigation.
Cheers
Yu-heng
---------------------------------------------------
Yu-Heng Tseng
Computational Research Division
Lawrence Berkeley National Laboratory
One Cyclotron Rd, MS: 50F-1650
Berkeley, CA 94720
YHTseng at lbl.gov
510.495.2904
----- Original Message -----
From: robl at mcs.anl.gov (Robert Latham)
Date: Friday, June 2, 2006 12:24 pm
Subject: Re: Inconsistent results on bluegene (reproduce the same
problem on ANL's BG/L)
> On Thu, May 18, 2006 at 07:40:08AM -0700, Yu-Heng Tseng wrote:
> > The test case is run under the home directory. The inconsistency
> > problem exists on both ANL's and NCAR's BG/L systems, so I suspect
> > this may not be a single issue. Thank you so much if someone can
> > solve this problem.
>
> Hi Yu-heng
>
> Sorry for the delay in getting back to you, but I've had a chance to
> look at this a little bit now. I too see these non-zero results for
> 16 processes on both NFS and PVFS2 (when I change the file name in
> the test program).
>
> This is probably a file system issue. NFS caching can cause problems
> in some cases. Further, both PVFS2 and NFS ignore fcntl lock
> requests. Additionally, PVFS2 on Argonne's BG/L is treated like a
> regular Unix file system.
>
> ROMIO's noncontiguous I/O optimizations on Unix-like file systems
> perform a read-modify-write, and require file locking to eliminate
> the possibility of false sharing among different processes. ROMIO
> can try to work around this for NFS, but I don't think IBM's ROMIO
> is configured for that (nfs:/path/to/file results in "unsupported
> file system").
>
> ==rob
>
> --
> Rob Latham
> Mathematics and Computer Science Division A215 0178 EA2D B059 8CDF
> Argonne National Labs, IL USA B29D F333 664A 4280 315B
>
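
The read-modify-write hazard described in the quoted message can be
illustrated with a small single-process simulation. This is only a
sketch under stated assumptions, not ROMIO's code: the shared "file" is
a bytearray, two simulated processes own interleaved bytes, and the
helper names (read_block, modify, write_block, run) are hypothetical.
Each writer reads the contiguous extent covering its bytes, updates its
own bytes in a private buffer, and writes the whole extent back. With
the steps serialized (as working file locks would force), both updates
survive; when both writers read before either writes, the later
write-back overwrites the other writer's bytes with stale data.

```python
def read_block(shared, lo, hi):
    """Copy the contiguous extent [lo, hi) into a private buffer."""
    return bytearray(shared[lo:hi])


def modify(buf, lo, offsets, value):
    """Update only the bytes this simulated process owns."""
    for off in offsets:
        buf[off - lo] = value


def write_block(shared, lo, buf):
    """Write the whole private buffer back, including bytes not owned."""
    shared[lo:lo + len(buf)] = buf


def run(serialized, nprocs=2, n=8):
    """Simulate interleaved writers; return the final file contents."""
    shared = bytearray(n)  # the shared "file", initially all zeros
    # Process p owns offsets p, p + nprocs, p + 2*nprocs, ...
    owned = {p: range(p, n, nprocs) for p in range(nprocs)}
    # Each process works on the extent spanning its first to last byte.
    extent = {p: (owned[p][0], owned[p][-1] + 1) for p in range(nprocs)}

    if serialized:
        # Lock held across read-modify-write: one process at a time.
        for p in range(nprocs):
            lo, hi = extent[p]
            buf = read_block(shared, lo, hi)
            modify(buf, lo, owned[p], p + 1)
            write_block(shared, lo, buf)
    else:
        # Locks ignored: every process reads before any process writes.
        bufs = {p: read_block(shared, *extent[p]) for p in range(nprocs)}
        for p in range(nprocs):
            modify(bufs[p], extent[p][0], owned[p], p + 1)
        for p in range(nprocs):
            write_block(shared, extent[p][0], bufs[p])
    return bytes(shared)


if __name__ == "__main__":
    print("serialized:", list(run(serialized=True)))
    print("unlocked:  ", list(run(serialized=False)))
```

In the unlocked run, the zeros left in place of process 0's values are
the kind of lost update that a file system ignoring lock requests would
permit. The actual interleaving on a real machine depends on timing,
which would be consistent with errors appearing only at some node
counts, though the simulation does not establish that.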