[mpich-discuss] File I/O causing collective abort of all ranks
Brian Harker
brian.harker at gmail.com
Tue Sep 23 13:39:36 CDT 2008
Hi-
Thanks for the replies. When I first encountered this error, I did
try it with only a single process, and it still aborts. :( It's
definitely not an I/O problem on my machine, as I am running other
serial code(s) right now with absolutely no problem. Strange. Any
idea what "signal 9" actually is? I tried some googling, but nothing
helpful has come up. I am currently running it under gdb to see if I
can further isolate where the problem is occurring...
To "The Source": the process shouldn't exit at the point where the file
is opened, and I was careful to make sure MPI_Finalize is in the correct
place... I have watched it seemingly *try* to open the file, with some
simple print-to-stdout debugging, and it seems to hang at the file
open statement. After about 1-2 minutes of hanging, it all crashes and
I get the error message from my first post.
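
For reference, here is a minimal sketch of how the open statement could be
tested on its own, outside the MPI program. It assumes nothing beyond the
file name from my first post; the program name, unit number 10, and the
variable ios are hypothetical choices for illustration. With status="new",
the open is an error if fubar.dat already exists, and without iostat= such
an error terminates the process, which would mean exiting before
MPI_Finalize is reached.

```fortran
program open_check
  implicit none
  integer :: ios

  ! Hypothetical standalone test, no MPI involved.
  ! status="new" requires that fubar.dat does not exist yet;
  ! iostat= returns an error code instead of aborting the run.
  open( unit = 10, file = "fubar.dat", status = "new", iostat = ios )
  if ( ios /= 0 ) then
     write(*,*) "open failed, iostat = ", ios
     stop
  end if

  write(10,*) "open succeeded"
  close(10)
end program open_check
```

This would not explain the hang by itself, but it separates a plain
Fortran I/O error from whatever is delivering the signal 9.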
On Tue, Sep 23, 2008 at 12:22 PM, The Source <thesourcehim at gmail.com> wrote:
> This error pops up when the process exits without calling MPI_Finalize().
> Check whether the process crashes, for example.
>
> Brian Harker wrote:
>>
>> Hello list-
>>
>> I have a problem with process 0 being able to open a file for writing
>> and subsequently write to it. The pertinent section of code looks as
>> follows:
>>
>> ========================================
>> if ( proc_id == 0 ) then
>>
>>    open( unit = 1, file = "fubar.dat", status="new" )
>>    do j = 1, ny
>>       write(1,*) ( array(i,j), i = 1, nx )
>>    end do
>>    close(1)
>>
>> end if
>> ========================================
>>
>> When this part of the code is reached, the program seems to hang for a
>> long time while trying to open the file, then spits out the following
>> error message:
>>
>> rank 0 in job 11 $HOSTNAME_##### caused collective abort of all ranks
>> exit status of rank 0: killed by signal 9
>>
>> I am confused about this error, because it is seemingly isolated to
>> this particular write-to-file by process 0. During execution, my
>> slave processes write out other files using this exact same syntax.
>> Has anyone run across this? I can't seem to find any useful
>> information on the interweb. I have run into this problem with both
>> MPICH2-1.0.6p1 and MPICH2-1.0.7. I am using the Intel Fortran
>> compiler, ifort 10.1.012.
>>
>> Thanks in advance for any input!
>>
--
Cheers,
Brian
brian.harker at gmail.com
"In science, there is only physics; all the rest is stamp-collecting."
-Ernest Rutherford