[Nek5000-users] IO problem
nek5000-users at lists.mcs.anl.gov
Mon Nov 24 21:44:13 CST 2014
Hello all,
I have occasionally run into an I/O problem when running on a relatively
large number of nodes on our BG/Q cluster with MPIIO enabled and
p65=0. The job status shows it running normally, but no output files,
not even the .log file, are written to the file system. Strangely, the
problem is intermittent and appears only for jobs using more than 512
nodes; I have not seen it for smaller problems or allocations. Cancelling
and resubmitting the job usually resolves it.
I contacted the technical staff of our cluster, and they ran several
tests but have not yet been able to reproduce the problem at small
scale. Because it does not happen every time, the problem is more
likely caused by a hardware issue than by the code. Still, I wanted to
ask whether anyone has any comments or a similar experience.
Thanks,
Mohsen