[petsc-users] coupling with Matlab and parallel solution

Benjamin Sanderse B.Sanderse at cwi.nl
Tue Sep 7 10:53:15 CDT 2010


Hi Barry,

I am still not too happy with the execution in parallel. I am working under Linux (64-bit) and still using your approach with two command windows (since it gives the best debugging possibilities).
As I said, sometimes things work, but most of the time they do not. Here is the output of two successive runs:

-bash-4.0$ petscmpiexec -n 2 ./petsc_poisson_par_barry2 -info
[1] PetscInitialize(): PETSc successfully started: number of processors = 2
[1] PetscInitialize(): Running on machine: borr.mas.cwi.nl
[0] PetscInitialize(): PETSc successfully started: number of processors = 2
[0] PetscInitialize(): Running on machine: borr.mas.cwi.nl
[0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 2147483647
[1] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 2147483647
[1] PetscCommDuplicate():   returning tag 2147483647
[0] PetscCommDuplicate():   returning tag 2147483647
[0] PetscViewerSocketSetConnection(): Connecting to socket process on port 5005 machine borr.mas.cwi.nl
[0] PetscCommDuplicate():   returning tag 2147483646
[1] PetscCommDuplicate():   returning tag 2147483646
[1] PetscCommDuplicate():   returning tag 2147483641
[0] PetscCommDuplicate():   returning tag 2147483641
[0] VecAssemblyBegin_MPI(): Stash has 0 entries, uses 0 mallocs.
[0] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs.
[1] PetscFinalize(): PetscFinalize() called
[1] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user MPI_Comm 1140850688
[1] PetscCommDestroy(): Deleting PETSc MPI_Comm -2080374784
[1] Petsc_DelCounter(): Deleting counter data in an MPI_Comm -2080374784
[1] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user MPI_Comm -2080374784
[0] PetscFinalize(): PetscFinalize() called
[0] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user MPI_Comm 1140850688
[0] PetscCommDestroy(): Deleting PETSc MPI_Comm -2080374784
[0] Petsc_DelCounter(): Deleting counter data in an MPI_Comm -2080374784
[0] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user MPI_Comm -2080374784


-bash-4.0$ netstat | grep 5005


-bash-4.0$ petscmpiexec -n 2 ./petsc_poisson_par_barry2 -info
[1] PetscInitialize(): PETSc successfully started: number of processors = 2
[1] PetscInitialize(): Running on machine: borr.mas.cwi.nl
[0] PetscInitialize(): PETSc successfully started: number of processors = 2
[0] PetscInitialize(): Running on machine: borr.mas.cwi.nl
[0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 2147483647
[0] PetscCommDuplicate():   returning tag 2147483647
[0] PetscViewerSocketSetConnection(): Connecting to socket process on port 5005 machine borr.mas.cwi.nl
[1] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 2147483647
[1] PetscCommDuplicate():   returning tag 2147483647
[1] PetscCommDuplicate():   returning tag 2147483646
[0] PetscCommDuplicate():   returning tag 2147483646
[0] PetscCommDuplicate():   returning tag 2147483641
[1] PetscCommDuplicate():   returning tag 2147483641
[0] VecAssemblyBegin_MPI(): Stash has 0 entries, uses 0 mallocs.
[0] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs.
^C
-bash-4.0$ [0]0:Return code = 0, signaled with Interrupt
[0]1:Return code = 0, signaled with Interrupt


In both cases I started the Matlab program first. I am currently starting Matlab without a GUI, but with a GUI I have the same problems.
As you can see, in the first case everything works fine: PETSc finalizes and closes, and Matlab gives me the correct output. The second case, run just a couple of seconds later, never reaches PetscFinalize, and Matlab does not give the correct output. In between the two runs I checked whether port 5005 was in use, and it was not.
Do you have any more suggestions on how to get this to work properly?

Benjamin

On Sep 3, 2010, at 21:11, Barry Smith wrote:

> 
> On Sep 3, 2010, at 4:32 PM, Benjamin Sanderse wrote:
> 
>> Hi Barry,
>> 
>> Thanks for your help! However, there are still some issues left. In order to test things, I simplified the program even more, so that now I am just sending a vector back and forth: Matlab -> PETSc -> Matlab:
>> 
>> fd   = PETSC_VIEWER_SOCKET_WORLD;
>> 
>> // load rhs vector
>> ierr = VecLoad(fd,VECMPI,&b);CHKERRQ(ierr);
>> 
>> // send to matlab
>> ierr = VecView(b,fd);CHKERRQ(ierr);
>> ierr = VecDestroy(b);CHKERRQ(ierr);
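>> 
>> For reference, the complete test program is roughly the following; the PetscInitialize/PetscFinalize scaffolding around the three lines above is the standard boilerplate (a sketch, not copied verbatim from my source):
>> 
>> static char help[] = "Receives a vector from Matlab over a socket and sends it back.\n";
>> 
>> #include "petscvec.h"
>> 
>> int main(int argc,char **argv)
>> {
>>   PetscErrorCode ierr;
>>   PetscViewer    fd;
>>   Vec            b;
>> 
>>   ierr = PetscInitialize(&argc,&argv,PETSC_NULL,help);CHKERRQ(ierr);
>>   fd   = PETSC_VIEWER_SOCKET_WORLD;            /* socket to the Matlab side */
>> 
>>   ierr = VecLoad(fd,VECMPI,&b);CHKERRQ(ierr);  /* receive vector from Matlab */
>>   ierr = VecView(b,fd);CHKERRQ(ierr);          /* send it straight back */
>>   ierr = VecDestroy(b);CHKERRQ(ierr);
>> 
>>   ierr = PetscFinalize();CHKERRQ(ierr);
>>   return 0;
>> }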
>> 
>> 
>> - Your approach with two windows works *sometimes*. I removed the 'launch' statement and executed my program 10 times; the first 2 times worked, and in all other cases I got this:
>> 
>> petscmpiexec -n 2 ./petsc_poisson_par_barry2 -info
>> [1] PetscInitialize(): PETSc successfully started: number of processors = 2
>> [1] PetscInitialize(): Running on machine: borr.mas.cwi.nl
>> [0] PetscInitialize(): PETSc successfully started: number of processors = 2
>> [0] PetscInitialize(): Running on machine: borr.mas.cwi.nl
>> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 2147483647
>> [1] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 2147483647
>> [1] PetscCommDuplicate():   returning tag 2147483647
>> [0] PetscCommDuplicate():   returning tag 2147483647
>> [0] PetscViewerSocketSetConnection(): Connecting to socket process on port 5005 machine borr.mas.cwi.nl
>> [0] PetscOpenSocket(): Connection refused in attaching socket, trying again
>> [0] PetscOpenSocket(): Connection refused in attaching socket, trying again
>> [0] PetscOpenSocket(): Connection refused in attaching socket, trying again
>> [0] PetscOpenSocket(): Connection refused in attaching socket, trying again
>> ^C
>> -bash-4.0$ [0]0:Return code = 0, signaled with Interrupt
>> [0]1:Return code = 0, signaled with Interrupt
>> 
>> Every time I start the program I use close(socket) and clear all in Matlab, so the socket from the previous run should not be present anymore. Does the port somehow get corrupted after a couple of runs? Matlab does not respond and I have to kill and restart it manually.
> 
>   Sometimes when you close a socket connection it does not actually close for a very long time, so if you try to open it again it does not work. When it appears the socket cannot be used, try netstat | grep 5005 to see if the socket is still active.
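> 
>   If the old connection is stuck in the TIME_WAIT state, setting SO_REUSEADDR on the listening side lets the port be rebound immediately. A minimal sketch with plain BSD sockets (a generic illustration, not the actual PETSc or Matlab source):
> 
>   #include <string.h>
>   #include <sys/socket.h>
>   #include <netinet/in.h>
> 
>   /* Create a listening socket on the given port that can be rebound
>      even while an old connection lingers in TIME_WAIT. */
>   int listen_on_port(int port)
>   {
>     int                s,one = 1;
>     struct sockaddr_in addr;
> 
>     s = socket(AF_INET,SOCK_STREAM,0);
>     if (s < 0) return -1;
>     /* allow immediate reuse of the address after a previous close */
>     setsockopt(s,SOL_SOCKET,SO_REUSEADDR,&one,sizeof(one));
> 
>     memset(&addr,0,sizeof(addr));
>     addr.sin_family      = AF_INET;
>     addr.sin_addr.s_addr = htonl(INADDR_ANY);
>     addr.sin_port        = htons(port);
>     if (bind(s,(struct sockaddr*)&addr,sizeof(addr)) < 0) return -1;
>     if (listen(s,1) < 0) return -1;
>     return s;
>   }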
> 
>> 
>> - If I include the launch statement, or just type
>> system('mpiexec -n 2 ./petsc_poisson_par_barry2 &')
>> the program never works. 
> 
>   Are you sure mpiexec is in the PATH that system() sees, and that it is the right one? The problem is that we are kind of cheating with system(): we start a new job in the background and have no idea what its output is. Are you using Unix, and are you running Matlab on the command line or in a GUI?
> 
>  Barry
> 
> 
>> 
>> Hope you can figure out what is going wrong.
>> 
>> Ben
>> 
>> 
>> On Sep 3, 2010, at 13:25, Barry Smith wrote:
>> 
>>> 
>>> Ben
>>> 
>>> Ok, I figured out the problem. It is not fundamental and mostly comes from not having a good way to debug this.
>>> 
>>> The test vector you create is sequential, but then you try to view it back to Matlab with the parallel fd viewer. If you change it to
>>> ierr = VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,1,&test);CHKERRQ(ierr);
>>> then the code runs.
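>>> 
>>> In other words, a minimal sketch of the mismatch (assuming a one-entry test vector, as in your program):
>>> 
>>> /* fails: a sequential vector cannot be viewed through the parallel
>>>    socket viewer PETSC_VIEWER_SOCKET_WORLD */
>>> ierr = VecCreateSeq(PETSC_COMM_SELF,1,&test);CHKERRQ(ierr);
>>> 
>>> /* works: distribute the vector across PETSC_COMM_WORLD instead */
>>> ierr = VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,1,&test);CHKERRQ(ierr);
>>> ierr = VecView(test,PETSC_VIEWER_SOCKET_WORLD);CHKERRQ(ierr);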
>>> 
>>> I've found (just now) that when I use launch, all the output from the .c program gets lost, which makes it impossible to figure out what has gone wrong. You can instead debug by running the two parts of the computation in two different windows: comment out the launch from the Matlab script, run the script in Matlab (it will hang, waiting for the socket to work), and in a separate terminal window run the .c program, for example petscmpiexec -n 2 ./ex1 -info. Now you see exactly what is happening in the PETSc program. You can even use -start_in_debugger on the PETSc side to run the debugger on crashes.
>>> 
>>> I'll add this to the docs for launch.
>>> 
>>> Barry
>>> 
>>> 
>>> On Sep 2, 2010, at 3:28 PM, Benjamin Sanderse wrote:
>>> 
>>>> Hi Barry,
>>>> 
>>>> I attached my Matlab file, C file, and makefile. First I generate the executable with 'make petsc_poisson_par_barry' and then I run test_petsc_par_barry.m.
>>>> If you change MATMPIAIJ to MATAIJ and VECMPI to VECSEQ, the code works fine.
>>>> 
>>>> Thanks a lot,
>>>> 
>>>> Benjamin
>>>> 
>>>> <makefile><test_petsc_par_barry.m><petsc_poisson_par_barry.c>
>>>> 
>>>> On Sep 2, 2010, at 13:45, Barry Smith wrote:
>>>> 
>>>>> 
>>>>> 
>>>>> Matlab is never aware the vector is parallel. Please send me the code and I'll figure out what is going on.
>>>>> 
>>>>> Barry
>>>>> 
>>>>> On Sep 2, 2010, at 2:07 PM, Benjamin Sanderse wrote:
>>>>> 
>>>>>> That sounds great, but there is one issue I am encountering. I switched the vector type to VECMPI and the matrix type to MATMPIAIJ, but when running Matlab I get the following error:
>>>>>> 
>>>>>> Found unrecogonized header 0 in file. If your file contains complex numbers
>>>>>> then call PetscBinaryRead() with "complex" as the second argument
>>>>>> Error in ==> PetscBinaryRead at 27
>>>>>> if nargin < 2
>>>>>> 
>>>>>> ??? Output argument "varargout" (and maybe others) not assigned during call to 
>>>>>> "/ufs/sanderse/Software/petsc-3.1-p4/bin/matlab/PetscBinaryRead.m>PetscBinaryRead".
>>>>>> 
>>>>>> Error in ==> test_petsc_par at 57
>>>>>> 	x4 = PetscBinaryReady(PS);
>>>>>> 
>>>>>> Could it be that Matlab does not understand the "parallel" vector that is returned by PETSc? Currently I do this with VecView as follows:
>>>>>> 
>>>>>> fd = PETSC_VIEWER_SOCKET_WORLD;
>>>>>> ...
>>>>>> KSPSolve(ksp,b,x);
>>>>>> ...
>>>>>> VecView(x,fd);
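>>>>>> 
>>>>>> For completeness, the surrounding solve step looks roughly like this (the KSP setup lines are the usual boilerplate, sketched here rather than copied from my code):
>>>>>> 
>>>>>> fd = PETSC_VIEWER_SOCKET_WORLD;
>>>>>> 
>>>>>> /* receive the right-hand side from Matlab */
>>>>>> ierr = VecLoad(fd,VECMPI,&b);CHKERRQ(ierr);
>>>>>> ierr = VecDuplicate(b,&x);CHKERRQ(ierr);
>>>>>> 
>>>>>> /* solve A x = b with the previously loaded matrix A */
>>>>>> ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
>>>>>> ierr = KSPSetOperators(ksp,A,A,SAME_NONZERO_PATTERN);CHKERRQ(ierr);
>>>>>> ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
>>>>>> ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
>>>>>> 
>>>>>> /* send the solution back to Matlab */
>>>>>> ierr = VecView(x,fd);CHKERRQ(ierr);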
>>>>>> 
>>>>>> Thanks for the help!
>>>>>> 
>>>>>> Ben
>>>>>> 
>>>>>> On Sep 2, 2010, at 10:09, Barry Smith wrote:
>>>>>> 
>>>>>>> 
>>>>>>> On Sep 2, 2010, at 10:51 AM, Benjamin Sanderse wrote:
>>>>>>> 
>>>>>>>> Hello all,
>>>>>>>> 
>>>>>>>> I figured out the coupling with Matlab, and I can send matrices and vectors back and forth between PETSc and Matlab. Actually, I send a matrix from Matlab to PETSc only once, and then repeatedly send new right-hand sides from Matlab -> PETSc and the solution vector from PETSc -> Matlab. That works great.
>>>>>>>> I now want to see if the matrix that is sent from (serial) Matlab to PETSc can be stored as a parallel matrix in PETSc, so that subsequent computations with different right-hand sides can be performed in parallel by PETSc. Does this simply work by using MatLoad and setting the MatType to MPIAIJ? Or is something more fancy required?
>>>>>>> 
>>>>>>> In theory this can be done using the same code as in the sequential case, only with parallel vectors (VECMPI) and matrices (MATMPIAIJ).
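>>>>>>> 
>>>>>>> For example, a minimal sketch (assuming MatLoad takes the viewer, type, and matrix pointer the same way VecLoad does in this release):
>>>>>>> 
>>>>>>> fd = PETSC_VIEWER_SOCKET_WORLD;
>>>>>>> 
>>>>>>> /* load the matrix once, distributed across all processes */
>>>>>>> ierr = MatLoad(fd,MATMPIAIJ,&A);CHKERRQ(ierr);
>>>>>>> 
>>>>>>> /* then repeatedly: new right-hand side in, solution out */
>>>>>>> ierr = VecLoad(fd,VECMPI,&b);CHKERRQ(ierr);
>>>>>>> ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
>>>>>>> ierr = VecView(x,fd);CHKERRQ(ierr);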
>>>>>>> 
>>>>>>> Barry
>>>>>>> 
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> 
>>>>>>>> Ben
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 


