[petsc-users] MatAssemblyBegin and dapl MPI fabric

Barry Smith bsmith at mcs.anl.gov
Thu Jan 22 10:11:49 CST 2015


  Can you run with valgrind to determine if there is memory corruption? http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind

  Also check with Intel for any MPI updates.

  You can also try calling MatAssemblyBegin/End(mat,MAT_FLUSH_ASSEMBLY) several times while generating the matrix entries; each flush sends the off-process entries accumulated so far, so the individual messages are smaller.  Warning: all processes have to call MatAssemblyBegin/End(mat,MAT_FLUSH_ASSEMBLY) the same number of times. If this "solves" the problem then we know it is an issue with the MPI buffers.
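
  The "same number of times" requirement is the part that is easy to get wrong when ranks generate different amounts of data. Here is a rough sketch of the bookkeeping it implies, in plain C with no PETSc or MPI calls so that it compiles on its own. The function names and the batching scheme are made up for illustration; in a real code the maximum over ranks would come from an MPI_Allreduce with MPI_MAX, and the comments mark where the PETSc calls would go.

```c
/* Rounds needed to push n_entries through batches of at most `batch` entries. */
long rounds_needed(long n_entries, long batch)
{
    return (n_entries + batch - 1) / batch;
}

/* Rounds that EVERY rank must execute: the maximum over all ranks.
   Assumption: in an MPI code this loop is replaced by MPI_Allreduce(..., MPI_MAX, ...). */
long global_rounds(const long *entries_per_rank, int nranks, long batch)
{
    long max_rounds = 0;
    for (int r = 0; r < nranks; r++) {
        long mine = rounds_needed(entries_per_rank[r], batch);
        if (mine > max_rounds) max_rounds = mine;
    }
    return max_rounds;
}

/* Entries a rank inserts in a given round; 0 once it has run out of work. */
long entries_in_round(long n_entries, long batch, long round)
{
    long done = round * batch;
    if (done >= n_entries) return 0;
    long left = n_entries - done;
    return left < batch ? left : batch;
}

/* One rank's assembly loop. Returns the number of flush assemblies issued
   and stores the number of entries inserted in *inserted. */
long assemble_in_batches(long n_entries, long batch, long rounds, long *inserted)
{
    long flushes = 0;
    *inserted = 0;
    for (long k = 0; k < rounds; k++) {
        long n = entries_in_round(n_entries, batch, k);
        /* real code: the MatSetValuesBlocked(...) calls for these n entries */
        *inserted += n;
        /* real code, unconditionally, even when n == 0:
           MatAssemblyBegin(mat, MAT_FLUSH_ASSEMBLY);
           MatAssemblyEnd(mat, MAT_FLUSH_ASSEMBLY); */
        flushes++;
    }
    /* real code, once after the loop:
       MatAssemblyBegin(mat, MAT_FINAL_ASSEMBLY);
       MatAssemblyEnd(mat, MAT_FINAL_ASSEMBLY); */
    return flushes;
}
```

  For example, with 2500, 900 and 0 entries on three ranks and a batch of 1000, every rank runs three rounds. The rank with 900 entries has nothing to insert in the last two rounds and the rank with 0 entries never inserts anything, yet both still flush three times.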

 
  Barry


> On Jan 22, 2015, at 9:17 AM, Antoine De Blois <antoine.deblois at aero.bombardier.com> wrote:
> 
> Hi Everyone,
>  
> I get a strange error during a call to MatAssemblyBegin. The error message is triggered by Intel MPI, as shown below. The error does not always occur, which is even stranger.
> [333:node1179] unexpected disconnect completion event from [163:node1254]
> Assertion failed in file ../../dapl_conn_rc.c at line 1128: 0
>  
> All ranks output the same error message with their own node number. I did a bit of research and some say that MPICH2 solves this issue. Since our group is keen on using Intel MPI, I would like to solve this issue at the root.
>  
> A few important points:
> * At the moment, we are assembling the matrix with a single MatAssemblyBegin/End and MAT_FINAL_ASSEMBLY after doing MatSetValuesBlocked. Can it be due to memory overflow in the buffers?
> * We are using -genv I_MPI_FABRICS shm:dapl in the submission script
> * I tried using -malloc_log and -log_summary, but the crash prevents writing the log output
>  
> Has anyone already faced this issue?
> Any suggestion is welcome,
> Best regards,
> Antoine DeBlois
>  
> Antoine DeBlois
> Specialiste ingenierie, MDO lead / Engineering Specialist, MDO lead
> Aéronautique / Aerospace
> 514-855-5001, x 50862
> antoine.deblois at aero.bombardier.com
> 
> 2351 Blvd Alfred-Nobel
> Montreal, Qc
> H4S 1A9
> 


