[mpich-discuss] Error in Checkpointing An MPI application

Darius Buntinas buntinas at mcs.anl.gov
Mon Sep 24 10:44:51 CDT 2012


Can you try running cpi from the examples directory:
  mpiexec -n 4 /home/superusr/manisha/mpich2-1.4.1p1/examples/cpi

The error doesn't /seem/ to be checkpoint-related.  Let us know whether you get an error with cpi as well.
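
If it is easier to report back, the cpi test can be wrapped so that the exit status is printed along with the output. This is only a sketch: run_check is a made-up helper name, and the commented mpiexec line simply repeats the command above.

```shell
#!/bin/sh
# Hypothetical helper: echo a command, run it, and report its exit status.
run_check() {
    echo "running: $*"
    "$@"
    status=$?
    if [ "$status" -eq 0 ]; then
        echo "result: OK"
    else
        echo "result: FAILED (exit status $status)"
    fi
    return "$status"
}

# Usage (path taken from this thread; adjust it to your own build tree):
# run_check mpiexec -n 4 /home/superusr/manisha/mpich2-1.4.1p1/examples/cpi
```

A non-zero status from the plain cpi run would point at the installation rather than at checkpointing.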

BTW, hydra is included in the mpich2-1.4.1p1 distribution, so you don't need to install it separately.
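
Since the quoted message below prepends both hydra/hydra-install/bin and mpich2-1.4.1p1/mpich2-install/bin to PATH, it may be worth checking which mpiexec the shell actually picks up. A rough sketch, assuming a POSIX shell; list_on_path is a hypothetical helper, not something shipped with MPICH2:

```shell
#!/bin/sh
# Hypothetical helper: print every executable named $1 found on a
# colon-separated search path ($2, defaulting to $PATH), in lookup order.
# The first line printed is the one the shell would run.
list_on_path() {
    name=$1
    search=${2:-$PATH}
    old_ifs=$IFS
    IFS=:
    for dir in $search; do
        # An empty PATH entry means the current directory.
        [ -n "$dir" ] || dir=.
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            echo "$dir/$name"
        fi
    done
    IFS=$old_ifs
}

list_on_path mpiexec
```

If more than one line is printed, only the first mpiexec is used when the command is run without a full path.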

-d


On Sep 21, 2012, at 1:45 AM, Manisha Chauhan wrote:

> Hi all
> 
> I want to checkpoint an MPI application using MPICH2 and the BLCR tool, but I get an error when I try to run it with checkpointing enabled.
> 
> I installed BLCR and MPICH2 successfully. I configured BLCR with the following commands:
> 
> ./configure --prefix=/home/superusr/manisha/blcr-0.8.3/install
> make
> make install
> 
> vim .bash_profile
> 
> export LD_LIBRARY_PATH="/home/superusr/manisha/blcr-0.8.3/install/lib:$LD_LIBRARY_PATH"
> export PATH="/home/superusr/manisha/blcr-0.8.3/install/bin:$PATH"
> 
> source .bash_profile
> 
> At the command prompt I then ran:
> 
> 1) /sbin/insmod /home/superusr/manisha/blcr-0.8.3/install/lib/blcr/2.6.32-279.2.1.el6.x86_64/blcr_imports.ko 
> 2) /sbin/insmod /home/superusr/manisha/blcr-0.8.3/install/lib/blcr/2.6.32-279.2.1.el6.x86_64/blcr.ko
> 
> I checked whether the BLCR modules were loaded with:
> 
> /sbin/lsmod | grep blcr
> 
> Output:
> 
> blcr                  114837  0 
> blcr_imports            9988  1 blcr
> 
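
The same check can be scripted so that it fails loudly if either module is missing. A minimal sketch, assuming awk is available; modules_loaded is a hypothetical helper name, and the module names are the two shown in the output above:

```shell
#!/bin/sh
# Hypothetical helper: read lsmod-style text on stdin and succeed only if
# every module named in the arguments appears in the first column.
modules_loaded() {
    listing=$(cat)
    for mod in "$@"; do
        # Exact match on column 1, so "blcr" does not match "blcr_imports".
        if ! printf '%s\n' "$listing" |
            awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }'; then
            echo "missing module: $mod"
            return 1
        fi
    done
    echo "all modules loaded: $*"
}

# Usage on a live system:
# /sbin/lsmod | modules_loaded blcr_imports blcr
```
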
> Then I configured the hydra module with the following commands:
> 
> tar xzf hydra-1.4.1p1.tar.gz
> 
> cd hydra
> mkdir hydra-install
> 
> In /home/superusr/manisha/hydra I ran the following configure and build commands:
> 
> a. ./configure --prefix=/home/superusr/manisha/hydra/hydra-install --with-hydra-ckpointlib=blcr --with-blcr=/home/superusr/manisha/blcr-0.8.3/install
> b. make 
> c. make install
> 
> vim .bash_profile
> 
> export LD_LIBRARY_PATH="/home/superusr/manisha/hydra/hydra-install/lib:$LD_LIBRARY_PATH"
> export PATH="/home/superusr/manisha/hydra/hydra-install/bin:$PATH"
> 
> source .bash_profile
> 
> Finally, I configured MPICH2 with the following commands:
> 
> tar xzf mpich2-1.4.1p1.tar.gz
> 
> cd mpich2-1.4.1p1
> mkdir mpich2-install
> 
> In /home/superusr/manisha/mpich2-1.4.1p1 I ran the following configure and build commands:
> 
> a. ./configure --prefix=/home/superusr/manisha/mpich2-1.4.1p1/mpich2-install --enable-checkpointing --with-blcr=/home/superusr/manisha/blcr-0.8.3/install --with-pm=hydra
> b. make 
> c. make install
> 
> vim .bash_profile
> 
> export LD_LIBRARY_PATH="/home/superusr/manisha/mpich2-1.4.1p1/mpich2-install/lib:$LD_LIBRARY_PATH"
> export PATH="/home/superusr/manisha/mpich2-1.4.1p1/mpich2-install/bin:$PATH"
> 
> source .bash_profile
> 
> I then set the hydra checkpoint prefix:
> 
> vim .bash_profile
> 
> export HYDRA_CKPOINT_PREFIX=/home/superusr/Raghu/linpack_10.3.10/benchmarks/mp_linpack/bin_intel/intel64/tmp/app.ckpoint
> 
> source .bash_profile
> 
> But when I run the checkpointing command shown below, I get the error that follows it:
> 
> mpiexec -ckpointlib blcr -ckpoint-prefix /tmp/app.ckpoint -ckpoint-interval 3600 -n 4 ./xhpl_intel64
> 
> [cli_2]: Command cmd=put kvsname=kvs_23207_0 key=P0-hostname value=power1.cdacb.in
>  failed, reason='duplicate_keyP0-hostname'
> [cli_1]: Command cmd=put kvsname=kvs_23207_0 key=P0-hostname value=power1.cdacb.in
>  failed, reason='duplicate_keyP0-hostname'
> [cli_0]: Command cmd=put kvsname=kvs_23207_0 key=r2h0 value=thnum#1$h0#power1.cdacb.in$r0#0$nbc#1$
>  failed, reason='duplicate_keyr2h0'
> [cli_3]: Command cmd=put kvsname=kvs_23207_0 key=r2h0 value=thnum#1$h0#power1.cdacb.in$r0#0$nbc#1$
>  failed, reason='duplicate_keyr2h0'
> [cli_1]: Command cmd=put kvsname=kvs_23207_0 key=P0-businesscard value=fabrics_list#shm$
>  failed, reason='duplicate_keyP0-businesscard'
> [cli_2]: Command cmd=put kvsname=kvs_23207_0 key=P0-businesscard value=fabrics_list#shm$
>  failed, reason='duplicate_keyP0-businesscard'
> Fatal error in MPI_Init: Other MPI error, error stack:
> MPIR_Init_thread(586)....: Initialization failed
> MPID_Init(171)...........: channel initialization failed
> MPIDI_CH3_Init(70).......: 
> MPID_nem_init_ckpt(897)..: 
> MPIDI_PG_SetConnInfo(668): PMI_KVS_Put returned -1
> Fatal error in MPI_Init: Other MPI error, error stack:
> MPIR_Init_thread(586)....: Initialization failed
> MPID_Init(171)...........: channel initialization failed
> MPIDI_CH3_Init(70).......: 
> MPID_nem_init_ckpt(897)..: 
> MPIDI_PG_SetConnInfo(668): PMI_KVS_Put returned -1
> 
> Please help me resolve this problem.
> 
> Thanks & Regards
> Manisha Chauhan
> 
> 
> 
> 
> _______________________________________________
> mpich-discuss mailing list     mpich-discuss at mcs.anl.gov
> To manage subscription options or unsubscribe:
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss


