[mpich-discuss] File I/O causing collective abort of all ranks

Brian Harker brian.harker at gmail.com
Tue Sep 23 13:39:36 CDT 2008


Hi-
Thanks for the replies.  When I first encountered this error, I did
try it with only a single process, and it still aborts.  :(  It's
definitely not an I/O problem on my machine, as I am running other
serial codes right now with absolutely no problem.  Strange.  Any
idea what "signal 9" actually is?  I tried some googling, but nothing
helpful has come up.  I am currently running it under gdb to see if I
can further isolate where the problem is occurring...

To "The Source": the process shouldn't exit at the point where the file is
opened, and I was careful to make sure MPI_Finalize is in the correct
place.  With some simple print-to-stdout debugging, I have watched it
seemingly *try* to open the file, and it seems to hang at the file
open statement.  After about 1-2 minutes of hang time,
it all crashes and I get the error message from my first post.
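
A minimal sketch of how the open could be instrumented so that a failure
is reported instead of aborting silently.  It is not the original program:
the program name, the array sizes, the unit number 11 and the variable
`ios` are assumptions made for illustration, and status="replace" is
swapped in for status="new" so that an existing fubar.dat does not trigger
an error.

```fortran
program open_check
  implicit none
  integer, parameter :: nx = 4, ny = 3   ! hypothetical sizes, for illustration only
  real :: array(nx, ny)
  integer :: i, j, ios

  array = 0.0

  ! status="replace" overwrites an existing file; status="new" would
  ! raise an error if fubar.dat were already present.
  open( unit = 11, file = "fubar.dat", status = "replace", &
        action = "write", iostat = ios )
  if ( ios /= 0 ) then
    ! A nonzero iostat means the open failed; report it and stop.
    write(*,*) "open failed, iostat = ", ios
    stop
  end if

  ! One record per column index j, each holding nx values.
  do j = 1, ny
    write(11,*) ( array(i,j), i = 1, nx )
  end do
  close(11)
end program open_check
```

In an MPI program the same check would sit inside the rank-0 branch, and
calling MPI_Abort rather than a plain stop would probably be the cleaner
way to bring down the other ranks.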

On Tue, Sep 23, 2008 at 12:22 PM, The Source <thesourcehim at gmail.com> wrote:
> This error pops up when a process exits without calling MPI_Finalize().
> Check whether the process is crashing, for example.
>
> Brian Harker wrote:
>>
>> Hello list-
>>
>> I have a problem getting process 0 to open a file for writing
>> and subsequently write to it.  The pertinent section of code looks as
>> follows:
>>
>> ========================================
>> if ( proc_id == 0 ) then
>>
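>>  ! status="new" raises an error if fubar.dat already exists
>>  open( unit = 1, file = "fubar.dat", status="new" )
>>  ! outer loop runs over j; the implied do over i writes nx values per record
>>  do j = 1, ny
>>    write(1,*) ( array(i,j), i = 1, nx )
>>  end do
>>  close(1)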
>>
>> end if
>> ========================================
>>
>> When this part of the code is reached, the program seems to hang for a
>> long time while trying to open the file, then spits out the following
>> error message:
>>
>> rank 0 in job 11  $HOSTNAME_#####  caused collective abort of all ranks
>>   exit status of rank 0: killed by signal 9
>>
>> I am confused about this error, because it is seemingly isolated to
>> this particular write-to-file by process 0.  During execution, my
>> slave processes write out other files using this exact same syntax.
>> Has anyone run across this?  I can't seem to find any useful
>> information on the interweb.  I have run into this problem with both
>> MPICH2-1.0.6p1 and MPICH2-1.0.7.  I am using the Intel Fortran
>> compiler, ifort 10.1.012.
>>
>> Thanks in advance for any input!
>>
>>
>>
>>
>
>



-- 
Cheers,
Brian
brian.harker at gmail.com


"In science, there is only physics; all the rest is stamp-collecting."
 -Ernest Rutherford



