Inconsistent results on bluegene (reproduce the same problem on ANL's BG/L)

Robert Latham robl at mcs.anl.gov
Tue Jun 6 10:09:01 CDT 2006


On Sat, Jun 03, 2006 at 07:52:09AM -0700, Yu-Heng Tseng wrote:
> Thanks for checking this out. However, could you get more details? It
> is very strange that the inconsistency always occurs on nodes=2,8,16
> (when testing nodes=2,4,8,16,32,64,128). This is true for both ANL's
> BG/L and NCAR's BG/L. It is also true for different file systems; I
> believe NCAR's BG/L uses a different file system. Does that imply
> that parallel I/O is still not stable on BG/L? Any way to fix
> this? Thanks a lot for your investigation.

Well, I don't know how the Lustre or GPFS file systems are exported to
BGL compute nodes.  In Argonne's case, both the NFS-exported home
directories and PVFS2 are treated by the MPI-IO implementation as a
unix file system.  Because both file systems lack certain unix-like
characteristics (notably caching and locking behavior), treating them
as a unix file system works much of the time, but not always.
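If you want to experiment in the meantime, one thing you could try (no
guarantees, and I have not verified that IBM's build honors these
hints) is disabling ROMIO's data sieving, since sieving depends on
fcntl-style locking that NFS-exported directories often do not handle
correctly.  A minimal sketch, with a made-up helper name:

    #include <mpi.h>

    /* Sketch of a possible workaround: pass ROMIO hints that disable
     * data sieving, which relies on fcntl locking that NFS exports may
     * not honor.  Whether the BGL MPI-IO build respects these hints is
     * an assumption, not something I have verified. */
    int open_with_hints(MPI_Comm comm, const char *path, MPI_File *fh)
    {
        MPI_Info info;
        int err;

        MPI_Info_create(&info);
        MPI_Info_set(info, "romio_ds_read",  "disable");
        MPI_Info_set(info, "romio_ds_write", "disable");

        err = MPI_File_open(comm, (char *)path,
                            MPI_MODE_CREATE | MPI_MODE_RDWR, info, fh);
        MPI_Info_free(&info);
        return err;
    }

With parallel-netcdf you can pass the same MPI_Info object through the
info argument of ncmpi_create() or ncmpi_open().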

The fastest way to fix this is for IBM to rebuild their MPI-IO with
support for NFS.  As of V1R2M1_020_2006-060110, there is no NFS
support in the MPI-IO implementation.  I've asked our BGL guys about
this.  Native support for PVFS2 is a bit harder than a recompile, but
we're working on it.

In the meantime, do try with real applications.  There are many
workloads (as you have seen) that do not exhibit this failure.  If you
can provide additional applications and workloads that do fail, that
would be good motivation for an updated MPI-IO implementation.
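A self-contained reproducer is also much easier for us to chase down
than a full model run.  Even something as small as the sketch below
(file name and layout are arbitrary) would help: each rank writes its
own value, runs the sync/barrier/sync sequence the MPI-IO consistency
rules call for, and reads the value back.

    #include <mpi.h>
    #include <stdio.h>

    /* Toy consistency check: every rank writes its rank number at a
     * rank-sized offset, then reads it back and compares. */
    int main(int argc, char **argv)
    {
        int rank, wbuf, rbuf = -1;
        MPI_Offset off;
        MPI_File fh;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        wbuf = rank;
        off  = (MPI_Offset)rank * sizeof(int);

        MPI_File_open(MPI_COMM_WORLD, "consistency.dat",
                      MPI_MODE_CREATE | MPI_MODE_RDWR, MPI_INFO_NULL, &fh);
        MPI_File_write_at(fh, off, &wbuf, 1, MPI_INT, MPI_STATUS_IGNORE);
        MPI_File_sync(fh);
        MPI_Barrier(MPI_COMM_WORLD);
        MPI_File_sync(fh);
        MPI_File_read_at(fh, off, &rbuf, 1, MPI_INT, MPI_STATUS_IGNORE);

        if (rbuf != wbuf)
            printf("rank %d: wrote %d but read back %d\n",
                   rank, wbuf, rbuf);

        MPI_File_close(&fh);
        MPI_Finalize();
        return 0;
    }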

Thanks 
==rob

-- 
Rob Latham
Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
Argonne National Labs, IL USA                B29D F333 664A 4280 315B



