[mpich-discuss] Error in Checkpointing An MPI application

Manisha Chauhan manisha.chauhan at yahoo.co.in
Fri Sep 21 01:45:38 CDT 2012


Hi all

I want to checkpoint an MPI application using MPICH2 and the BLCR tool, but I am getting an error while checkpointing.

I installed BLCR with MPICH2 successfully. I configured BLCR using these commands:

./configure --prefix=/home/superusr/manisha/blcr-0.8.3/install
make
make install

vim .bash_profile

export LD_LIBRARY_PATH="/home/superusr/manisha/blcr-0.8.3/install/lib:$LD_LIBRARY_PATH"
export PATH="/home/superusr/manisha/blcr-0.8.3/install/bin:$PATH"

source .bash_profile

At the command prompt I ran:

1) /sbin/insmod /home/superusr/manisha/blcr-0.8.3/install/lib/blcr/2.6.32-279.2.1.el6.x86_64/blcr_imports.ko 
2) /sbin/insmod /home/superusr/manisha/blcr-0.8.3/install/lib/blcr/2.6.32-279.2.1.el6.x86_64/blcr.ko

I checked whether the BLCR modules were loaded using this command:

/sbin/lsmod | grep blcr

Output:

blcr                  114837  0 
blcr_imports            9988  1 blcr
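
The same check can be scripted so that it fails loudly when either module is missing. This is only a minimal sketch: the helper name blcr_modules_loaded is made up for illustration, and it assumes the module names appear in the first column exactly as in the output above.

```shell
# Reads lsmod-style text on stdin; succeeds only if both BLCR modules appear.
# (blcr_modules_loaded is a hypothetical helper name, not part of BLCR.)
blcr_modules_loaded() {
    awk '
        $1 == "blcr"         { core = 1 }
        $1 == "blcr_imports" { imports = 1 }
        END { exit !(core && imports) }
    '
}
```

On a live system it could be used as: /sbin/lsmod | blcr_modules_loaded && echo "BLCR modules loaded"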

Then I configured the Hydra module using these commands:


tar xzf hydra-1.4.1p1.tar.gz

cd hydra
mkdir hydra-install

In /home/superusr/manisha/hydra I ran the configure command:

a. ./configure --prefix=/home/superusr/manisha/hydra/hydra-install --with-hydra-ckpointlib=blcr --with-blcr=/home/superusr/manisha/blcr-0.8.3/install
b. make 
c. make install

vim .bash_profile

export LD_LIBRARY_PATH="/home/superusr/manisha/hydra/hydra-install/lib:$LD_LIBRARY_PATH"
export PATH="/home/superusr/manisha/hydra/hydra-install/bin:$PATH"

source .bash_profile

Finally, I configured MPICH2 using these commands:


tar xzf mpich2-1.4.1p1.tar.gz

cd mpich2-1.4.1p1
mkdir mpich2-install

In /home/superusr/manisha/mpich2-1.4.1p1 I ran the configure command:

a. ./configure --prefix=/home/superusr/manisha/mpich2-1.4.1p1/mpich2-install --enable-checkpointing --with-blcr=/home/superusr/manisha/blcr-0.8.3/install --with-pm=hydra
b. make 
c. make install

vim .bash_profile

export LD_LIBRARY_PATH="/home/superusr/manisha/mpich2-1.4.1p1/mpich2-install/lib:$LD_LIBRARY_PATH"
export PATH="/home/superusr/manisha/mpich2-1.4.1p1/mpich2-install/bin:$PATH"

source .bash_profile
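
Since both the standalone Hydra prefix and the MPICH2 prefix have now been prepended to PATH, it may be worth confirming which mpiexec the shell actually resolves. The sketch below assumes that each install provides its own mpiexec (not verified here); resolves_under is a hypothetical helper name.

```shell
# Succeeds if the first match for command $1 on PATH lives under directory $2.
# (resolves_under is a hypothetical helper, shown only for illustration.)
resolves_under() {
    resolved=$(command -v "$1") || return 1
    case "$resolved" in
        "$2"/*) return 0 ;;
        *)      return 1 ;;
    esac
}
```

For example: resolves_under mpiexec /home/superusr/manisha/mpich2-1.4.1p1/mpich2-install && echo "using the MPICH2 mpiexec"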


Then I set the Hydra checkpoint prefix:


vim .bash_profile

export HYDRA_CKPOINT_PREFIX=/home/superusr/Raghu/linpack_10.3.10/benchmarks/mp_linpack/bin_intel/intel64/tmp/app.ckpoint

source .bash_profile
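
Note that the mpiexec command later in this message passes a different prefix (/tmp/app.ckpoint) on the command line. Which of the two values takes precedence is not confirmed here, so a small consistency check may help rule that out as a factor. A minimal sketch, with check_ckpoint_prefix as a hypothetical helper name:

```shell
# Warns and fails when the environment and command-line prefixes differ.
# $1: value of HYDRA_CKPOINT_PREFIX, $2: value passed to -ckpoint-prefix
# (check_ckpoint_prefix is a hypothetical helper, not a Hydra tool.)
check_ckpoint_prefix() {
    if [ -n "$1" ] && [ "$1" != "$2" ]; then
        echo "prefix mismatch: env='$1' cli='$2'"
        return 1
    fi
    echo "prefix consistent: $2"
}
```

For example: check_ckpoint_prefix "$HYDRA_CKPOINT_PREFIX" /tmp/app.ckpoint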


But when I run the checkpointing command shown below, it gives the following error:

mpiexec -ckpointlib blcr -ckpoint-prefix /tmp/app.ckpoint -ckpoint-interval 3600 -n 4 ./xhpl_intel64

[cli_2]: Command cmd=put kvsname=kvs_23207_0 key=P0-hostname value=power1.cdacb.in
 failed, reason='duplicate_keyP0-hostname'
[cli_1]: Command cmd=put kvsname=kvs_23207_0 key=P0-hostname value=power1.cdacb.in
 failed, reason='duplicate_keyP0-hostname'
[cli_0]: Command cmd=put kvsname=kvs_23207_0 key=r2h0 value=thnum#1$h0#power1.cdacb.in$r0#0$nbc#1$
 failed, reason='duplicate_keyr2h0'
[cli_3]: Command cmd=put kvsname=kvs_23207_0 key=r2h0 value=thnum#1$h0#power1.cdacb.in$r0#0$nbc#1$
 failed, reason='duplicate_keyr2h0'
[cli_1]: Command cmd=put kvsname=kvs_23207_0 key=P0-businesscard value=fabrics_list#shm$
 failed, reason='duplicate_keyP0-businesscard'
[cli_2]: Command cmd=put kvsname=kvs_23207_0 key=P0-businesscard value=fabrics_list#shm$
 failed, reason='duplicate_keyP0-businesscard'
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(586)....: Initialization failed
MPID_Init(171)...........: channel initialization failed
MPIDI_CH3_Init(70).......: 
MPID_nem_init_ckpt(897)..: 
MPIDI_PG_SetConnInfo(668): PMI_KVS_Put returned -1
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(586)....: Initialization failed
MPID_Init(171)...........: channel initialization failed
MPIDI_CH3_Init(70).......: 
MPID_nem_init_ckpt(897)..: 
MPIDI_PG_SetConnInfo(668): PMI_KVS_Put returned -1
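
To see at a glance which keys were rejected, the output above can be filtered for the duplicate_key reasons. This is only a log-reading sketch, not a fix; it assumes the output was saved to a file, and duplicate_keys and the file name are hypothetical.

```shell
# Reads mpiexec output on stdin and prints each rejected key once.
# (duplicate_keys is a hypothetical helper name.)
duplicate_keys() {
    sed -n "s/.*reason='duplicate_key\([^']*\)'.*/\1/p" | LC_ALL=C sort -u
}
```

For example: duplicate_keys < mpiexec.log
For the output above this lists P0-businesscard, P0-hostname and r2h0, one per line.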

Please help me resolve this problem.

Thanks & Regards
Manisha Chauhan