[mpich-discuss] mpich2 MPI_TEST errors

Samir Khanal skhanal at bgsu.edu
Sun Mar 15 15:40:05 CDT 2009


Hi

I found the culprit function.
It was indeed a problem with the MPI_Test call; I tracked it down, and the program works now.

But now I am having a hard time running the same program on mpich2 1.0.8/PBS on an x86_64 system.
It compiles and runs perfectly as a single process,
i.e., mpiexec -n 1 ./Ring
executes and generates output.

But as soon as I run mpiexec -n 2 or more, it just waits, and eventually the job is thrown out of the queue.

I am using the mpiexec that came with mpich2.

The previous system was a single core system. 

Does mpich2 need any special configuration for multi-core machines?
Any tips on job submission or compiling?
I just used

./configure --with-device=ch3:nemesis



Right now I submit the job this way:

#PBS -l walltime=3:00:00
#PBS -N my_job
#PBS -j oe
#PBS -l nodes=6
echo `hostname`
echo Directory is `pwd`
echo This job is running on following Processors
export LD_LIBRARY_PATH=/home/skhanal/bgtw/lib:$LD_LIBRARY_PATH
time /home/skhanal/mpich2/bin/mpiexec -n 4 ./Ring
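
A note on the script above: it requests nodes=6 but never tells mpiexec which hosts were allocated. By default, the mpiexec shipped with MPICH2 1.0.x talks to an MPD ring and only places ranks on hosts where an mpd is running. The following is a minimal sketch, not a diagnosis of the hang, of booting a ring over the hosts listed in $PBS_NODEFILE before launching. The helper name make_mpd_hosts and the file name mpd.hosts are made up for illustration. It assumes the MPD process manager, a working ~/.mpd.conf, passwordless ssh between the nodes, and that no mpd is already running on the first node.

```shell
#!/bin/sh
#PBS -l walltime=3:00:00
#PBS -N my_job
#PBS -j oe
#PBS -l nodes=6

# Install prefix taken from the original script.
MPICH2_BIN=/home/skhanal/mpich2/bin

# Hypothetical helper: write the distinct host names found in a PBS node
# file ($1) to an output file ($2) and print how many there are.
make_mpd_hosts() {
    sort -u "$1" > "$2"
    wc -l < "$2" | tr -d ' '
}

# PBS sets PBS_NODEFILE inside a job; skip the launch when it is absent.
if [ -n "${PBS_NODEFILE:-}" ]; then
    export LD_LIBRARY_PATH=/home/skhanal/bgtw/lib:$LD_LIBRARY_PATH
    NHOSTS=$(make_mpd_hosts "$PBS_NODEFILE" mpd.hosts)

    # Start one mpd per allocated host, then list the ring as a sanity check.
    "$MPICH2_BIN/mpdboot" -n "$NHOSTS" -f mpd.hosts
    "$MPICH2_BIN/mpdtrace"

    "$MPICH2_BIN/mpiexec" -n 4 ./Ring

    # Shut the ring down so no mpd daemons outlive the job.
    "$MPICH2_BIN/mpdallexit"
fi
```

If mpdtrace lists fewer hosts than PBS allocated, the ring did not come up everywhere, and that would be worth fixing before looking at the program itself.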

Your help is really appreciated.

Thanks
Samir



________________________________________
From: mpich-discuss-bounces at mcs.anl.gov [mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Pavan Balaji [balaji at mcs.anl.gov]
Sent: Sunday, March 15, 2009 1:58 PM
To: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] mpich2 MPI_TEST errors

Hi Samir,

Can you reproduce the problem with a smaller standalone program? It'll
be difficult to set up your library here and try it out.

  -- Pavan

Samir Khanal wrote:
> Hi Pavan
>
> Thanks for the quick response; I am glad that it reached the right person.
>
> I downloaded mpich2 version 1.0.8 from the official website.
>
> I am compiling a time-warp library with the mpich2 ch3:nemesis channel.
>
> I can run simple programs (like cpi), but the problem comes when I try to run a program that uses this library (which works perfectly with mpich 1.2.7): it gives an MPI_Test failure.
>
> If you would like to compile the library and test it, I can send the whole package (about 1 MB) to your mcs email.
>
> I am really stuck with this.
>
> Samir
>
>
> ________________________________________
> From: mpich-discuss-bounces at mcs.anl.gov [mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Pavan Balaji [balaji at mcs.anl.gov]
> Sent: Sunday, March 15, 2009 1:47 PM
> To: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] mpich2 MPI_TEST errors
>
> Samir,
>
> Can you send the test program? If all channels are failing, it is very
> likely either a problem with the test program or your configuration.
> Btw, which version of MPICH2 are you using?
>
> Regarding the new cluster, all you need to use is ch3:nemesis. In a lot
> of ways, nemesis is a superset of sock, ssm and shm. The few features
> that it is missing (which you'll only notice in special circumstances)
> are being added in the 1.1 release of MPICH2.
>
>   -- Pavan
>
> Samir Khanal wrote:
>> Hi all
>>
>> I was trying to get a library compiled using mpich2.
>> It works OK (compiles and executes) with mpich 1.2.7 ch3_p4 channel and 1.2.5, but I have problems running it with mpich2.
>> Isn't mpich2 backward compatible with mpich 1.2.7? I assume that no specific rewrites should be needed to work with mpich2...
>> I compiled mpich2  with
>> --with-device=ch3:nemesis
>>
>> and all other 3 options
>> sock
>> ssm
>> shm
>>
>> Which channel should I use to be compatible with ch3_p4 as in mpich-1.2.7?
>>
>> The error I receive is
>>
>> mpiexec -n 1 ./test
>> Fatal error in MPI_Test: Invalid MPI_Request, error stack:
>> MPI_Test(152): MPI_Test(request=0x869e1dc, flag=0xbfa134f8, status=0xbfa134d0) failed
>> MPI_Test(75).: Invalid MPI_Requestrank 0 in job 19  protos.cs.bgsu.edu_46623   caused collective abort of all ranks
>>   exit status of rank 0: killed by signal 9
>>
>> mpiexec -n 2 ./test
>> Fatal error in MPI_Test: Invalid MPI_Request, error stack:
>> MPI_Test(152): MPI_Test(request=0x92451dc, flag=0xbf8c4938, status=0xbf8c4910) failed
>> MPI_Test(75).: Invalid MPI_Request
>>
>> Fatal error in MPI_Test: Invalid MPI_Request, error stack:
>> MPI_Test(152): MPI_Test(request=0x8bdc1dc, flag=0xbfb77be8, status=0xbfb77bc0) failed
>> MPI_Test(75).: Invalid MPI_Requestrank 0 in job 20  protos.cs.bgsu.edu_46623   caused collective abort of all ranks
>>   exit status of rank 0: killed by signal 9
>>
>>
>>
>> I am using the correct mpicxx, mpicc, and mpi.h versions in the includes.
>> In fact I use
>> #include "mpi.h" so that the appropriate version of the header gets pulled in automatically.
>>
>> I also have another question.
>>
>> I want to compile mpich2 for another cluster of 6 PCs with Intel Core 2 Quad processors; which channel would be appropriate?
>>
>> Thanks
>> Samir
>>
>>
>>
>>
>>
>
> --
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji

--
Pavan Balaji
http://www.mcs.anl.gov/~balaji

