[mpich-discuss] Support for BLCR

Darius Buntinas buntinas at mcs.anl.gov
Wed Jul 21 11:14:09 CDT 2010


Hi Kishor,

It's hard to say what the problem might be.  This could be due to a process dying (then the other process gets an error when it tries to communicate), or due to an inability of the processes to connect to each other, possibly due to a firewall.

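To rule out a problem with your test program, try running cpi, which you can find in the examples directory.  If cpi fails the same way, the problem is more likely in the setup (hosts, network, firewall) than in your code.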
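
For the firewall possibility, here is a minimal sketch of a TCP reachability check in Python.  It is not part of MPICH2; the function name can_connect is made up for illustration, and the host and port you pass in are up to you.  Note that MPI processes listen on dynamically chosen ports, so a successful connection to one port does not prove that every port is open.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout.

    Illustrative helper only (an assumption, not an MPICH2 API).
    """
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

One way to use it: start any TCP listener on a high port on the first node (for example `python3 -m http.server 50000`, where 50000 is an arbitrary choice), then call can_connect("<first node's name from the host file>", 50000) on the second node.  A False result with the listener running points toward a firewall or name-resolution problem between the nodes.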

-d

On Jul 15, 2010, at 1:42 PM, kishor kharbas wrote:

> Thank you Pavan and Darius for your help.
> 
> I am in the process of running an MPI application with checkpointing, but I am having trouble running the application (without checkpointing) in the first place. I tried running it on 2 processing nodes with the default Hydra process manager.
> 
> Command: $ ../bin/mpiexec -np 2 ./mpiexample
> (The host file contains the domain names of 2 hosts.)
> 
> The error shown is:
>     Fatal error in MPI_Send: Other MPI error, error stack:
>    MPI_Send(174).....................: MPI_Send(buf=0x7fff379fabb8, count=1, MPI_INT, dest=1, tag=0, MPI_COMM_WORLD) failed
>    MPIDI_CH3I_Progress(165)..........: 
>    MPID_nem_mpich2_blocking_recv(895): 
>    MPID_nem_tcp_connpoll(1714).......: Communication error
> 
> Can you please suggest how I can find the cause of this error?
> 
> Thanks,
> Kishor
> On Wed, Jul 14, 2010 at 2:03 PM, Darius Buntinas <buntinas at mcs.anl.gov> wrote:
> Here's a wiki page that has some info on building it and running applications.  Let me know if you have trouble with this.
> 
> http://wiki.mcs.anl.gov/mpich2/index.php/Checkpointing
> 
> -d
> 
> On Jul 13, 2010, at 9:37 AM, kishor kharbas wrote:
> 
> > Hi,
> >
> > Does the beta version, mpich2-1.3a2, support BLCR?
> > If so, where can I find guidelines on using this functionality? I could not find any in the user guide included with that version.
> >
> >
> > Thanks,
> > Kishor
> > On Mon, Jul 12, 2010 at 11:14 AM, Darius Buntinas <buntinas at mcs.anl.gov> wrote:
> >
> > The next release of MPICH2 (1.3) will include checkpointing support using BLCR.  You can try the beta release that's available under 'downloads' on the MPICH2 website:
> >
> >    http://www.mcs.anl.gov/research/projects/mpich2/
> >
> > You'll need to install BLCR version 0.8.2 (which is currently the latest version).
> >
> > -d
> >
> > On Jul 12, 2010, at 9:05 AM, kishor kharbas wrote:
> >
> > > Hello,
> > >
> > > I would like to know whether there are any plans to include Berkeley Lab Checkpoint/Restart (BLCR) in the MPICH2 runtime environment.
> > >
> > > Thanks,
> > > Kishor Kharbas
> > > MS Student
> > > Department of Computer Science
> > > NC State University
> > > Raleigh, NC 27606
> > > _______________________________________________
> > > mpich-discuss mailing list
> > > mpich-discuss at mcs.anl.gov
> > > https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
> >
> 


