[mpich-discuss] Error in Checkpointing An MPI application
Darius Buntinas
buntinas at mcs.anl.gov
Mon Sep 24 10:44:51 CDT 2012
Can you try running cpi from the examples directory:
mpiexec -n 4 /home/superusr/manisha/mpich2-1.4.1p1/examples/cpi
The error doesn't /seem/ to be checkpoint related. Let us know if you get an error with that.
BTW, hydra is included in the mpich2-1.4.1p1 distribution, so you don't need to install it separately.
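For reference, here is a minimal sketch of that check. It assumes the install prefix and examples path given in the quoted message below; the MPICH_PREFIX and CPI variables and the classify_mpiexec helper are made-up names for illustration only. It reports which mpiexec comes first on PATH (the quoted setup puts both a separate hydra install and the MPICH2 install on PATH) and then runs cpi with no checkpoint options:

```shell
#!/bin/sh
# Assumed locations, copied from the quoted message; override as needed.
MPICH_PREFIX="${MPICH_PREFIX:-/home/superusr/manisha/mpich2-1.4.1p1/mpich2-install}"
CPI="${CPI:-/home/superusr/manisha/mpich2-1.4.1p1/examples/cpi}"

# Hypothetical helper: print "match" if the mpiexec path ($1) lives
# under PREFIX/bin ($2), otherwise print "mismatch".
classify_mpiexec() {
    case "$1" in
        "$2"/bin/*) echo "match" ;;
        *)          echo "mismatch" ;;
    esac
}

# Find the first mpiexec on PATH (empty if there is none).
found=$(command -v mpiexec 2>/dev/null || true)

if [ -z "$found" ]; then
    echo "mpiexec not found on PATH"
elif [ ! -x "$CPI" ]; then
    echo "mpiexec is $found, but cpi was not found at $CPI"
else
    echo "mpiexec is $found ($(classify_mpiexec "$found" "$MPICH_PREFIX") against $MPICH_PREFIX)"
    # Plain run of the cpi example, without any checkpointing flags.
    mpiexec -n 4 "$CPI"
fi
```

If this reports "mismatch", the mpiexec being used is not the one installed with MPICH2, which is worth mentioning when you report back.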
-d
On Sep 21, 2012, at 1:45 AM, Manisha Chauhan wrote:
> Hi all
>
> I want to checkpoint an MPI application using MPICH2 and the BLCR tool, but I am getting an error when I try to checkpoint.
>
> I installed BLCR with MPICH2 successfully. I configured BLCR using the commands:
>
> ./configure --prefix=/home/superusr/manisha/blcr-0.8.3/install
> make
> make install
>
> vim .bash_profile
>
> export LD_LIBRARY_PATH="/home/superusr/manisha/blcr-0.8.3/install/lib:$LD_LIBRARY_PATH"
> export PATH="/home/superusr/manisha/blcr-0.8.3/install/bin:$PATH"
>
> source .bash_profile
>
> Then, at the command prompt, I ran:
>
> 1) /sbin/insmod /home/superusr/manisha/blcr-0.8.3/install/lib/blcr/2.6.32-279.2.1.el6.x86_64/blcr_imports.ko
> 2) /sbin/insmod /home/superusr/manisha/blcr-0.8.3/install/lib/blcr/2.6.32-279.2.1.el6.x86_64/blcr.ko
>
> I checked whether BLCR is loaded using the command:
>
> /sbin/lsmod | grep blcr
>
> Output:
>
> blcr 114837 0
> blcr_imports 9988 1 blcr
>
> Then I configured the hydra module using the commands:
>
> tar xzf hydra-1.4.1p1.tar.gz
>
> cd hydra
> mkdir hydra-install
>
> In /home/superusr/manisha/hydra I ran the configure command:
>
> a. ./configure --prefix=/home/superusr/manisha/hydra/hydra-install --with-hydra-ckpointlib=blcr --with-blcr=/home/superusr/manisha/blcr-0.8.3/install
> b. make
> c. make install
>
> vim .bash_profile
>
> export LD_LIBRARY_PATH="/home/superusr/manisha/hydra/hydra-install/lib:$LD_LIBRARY_PATH"
> export PATH="/home/superusr/manisha/hydra/hydra-install/bin:$PATH"
>
> source .bash_profile
>
> Finally, I configured MPICH2 using the commands:
>
> tar xzf mpich2-1.4.1p1.tar.gz
>
> cd mpich2-1.4.1p1
> mkdir mpich2-install
>
> In /home/superusr/manisha/mpich2-1.4.1p1 I ran the configure command:
>
> a. ./configure --prefix=/home/superusr/manisha/mpich2-1.4.1p1/mpich2-install --enable-checkpointing --with-blcr=/home/superusr/manisha/blcr-0.8.3/install --with-pm=hydra
> b. make
> c. make install
>
> vim .bash_profile
>
> export LD_LIBRARY_PATH="/home/superusr/manisha/mpich2-1.4.1p1/mpich2-install/lib:$LD_LIBRARY_PATH"
> export PATH="/home/superusr/manisha/mpich2-1.4.1p1/mpich2-install/bin:$PATH"
>
> source .bash_profile
>
> I then set the hydra checkpoint prefix:
>
> vim .bash_profile
>
> export HYDRA_CKPOINT_PREFIX=/home/superusr/Raghu/linpack_10.3.10/benchmarks/mp_linpack/bin_intel/intel64/tmp/app.ckpoint
>
> source .bash_profile
>
> But when I run the checkpointing command shown below, it gives the following error:
>
> mpiexec -ckpointlib blcr -ckpoint-prefix /tmp/app.ckpoint -ckpoint-interval 3600 -n 4 ./xhpl_intel64
>
> [cli_2]: Command cmd=put kvsname=kvs_23207_0 key=P0-hostname value=power1.cdacb.in
> failed, reason='duplicate_keyP0-hostname'
> [cli_1]: Command cmd=put kvsname=kvs_23207_0 key=P0-hostname value=power1.cdacb.in
> failed, reason='duplicate_keyP0-hostname'
> [cli_0]: Command cmd=put kvsname=kvs_23207_0 key=r2h0 value=thnum#1$h0#power1.cdacb.in$r0#0$nbc#1$
> failed, reason='duplicate_keyr2h0'
> [cli_3]: Command cmd=put kvsname=kvs_23207_0 key=r2h0 value=thnum#1$h0#power1.cdacb.in$r0#0$nbc#1$
> failed, reason='duplicate_keyr2h0'
> [cli_1]: Command cmd=put kvsname=kvs_23207_0 key=P0-businesscard value=fabrics_list#shm$
> failed, reason='duplicate_keyP0-businesscard'
> [cli_2]: Command cmd=put kvsname=kvs_23207_0 key=P0-businesscard value=fabrics_list#shm$
> failed, reason='duplicate_keyP0-businesscard'
> Fatal error in MPI_Init: Other MPI error, error stack:
> MPIR_Init_thread(586)....: Initialization failed
> MPID_Init(171)...........: channel initialization failed
> MPIDI_CH3_Init(70).......:
> MPID_nem_init_ckpt(897)..:
> MPIDI_PG_SetConnInfo(668): PMI_KVS_Put returned -1
> Fatal error in MPI_Init: Other MPI error, error stack:
> MPIR_Init_thread(586)....: Initialization failed
> MPID_Init(171)...........: channel initialization failed
> MPIDI_CH3_Init(70).......:
> MPID_nem_init_ckpt(897)..:
> MPIDI_PG_SetConnInfo(668): PMI_KVS_Put returned -1
>
> Please help me resolve this problem.
>
> Thanks & Regards
> Manisha Chauhan