[mpich-discuss] Support for BLCR

kishor kharbas kishor.kharbas at gmail.com
Thu Jul 15 13:42:56 CDT 2010


Thank you Pavan and Darius for your help.

I am trying to run an MPI application with checkpointing, but I cannot
get the application to run even without checkpointing. I tried running
it on 2 nodes with the default Hydra process manager.

Command: $ ../bin/mpiexec -np 2 ./mpiexample
(the host file lists the domain names of the 2 hosts)

The error shown is:
    Fatal error in MPI_Send: Other MPI error, error stack:
    MPI_Send(174).....................: MPI_Send(buf=0x7fff379fabb8, count=1, MPI_INT, dest=1, tag=0, MPI_COMM_WORLD) failed
    MPIDI_CH3I_Progress(165)..........:
    MPID_nem_mpich2_blocking_recv(895):
    MPID_nem_tcp_connpoll(1714).......: Communication error

Can you please suggest how I can find the cause of this error?
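
One thing that can be checked independently of MPI is how each host name
in the host file resolves on each node, since a name that maps only to a
loopback address (for example 127.0.0.1) is a common reason for TCP
connections between nodes to fail. A minimal sketch in Python, assuming
the host names are passed on the command line; the function names are
illustrative and not part of MPICH2:

```python
import ipaddress
import socket
import sys


def resolve_hosts(hostnames):
    """Map each host name to the sorted IPv4 addresses it resolves to.

    A name that cannot be resolved maps to an empty list.
    """
    results = {}
    for name in hostnames:
        try:
            _, _, addrs = socket.gethostbyname_ex(name)
        except OSError:  # covers socket.gaierror and socket.herror
            addrs = []
        results[name] = sorted(set(addrs))
    return results


def loopback_only(addrs):
    """True if the list is non-empty and every address is a loopback address."""
    return bool(addrs) and all(ipaddress.ip_address(a).is_loopback for a in addrs)


if __name__ == "__main__":
    # Hypothetical usage: python check_hosts.py host1 host2
    for name, addrs in resolve_hosts(sys.argv[1:]).items():
        if not addrs:
            print(f"{name}: does not resolve")
        elif loopback_only(addrs):
            print(f"{name}: resolves only to loopback {addrs}")
        else:
            print(f"{name}: {addrs}")
```

Running this on every node with the same host names shows whether any
node sees a name as unresolvable or as loopback-only. It does not rule
out other causes, such as a firewall blocking the ports in use.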

Thanks,
Kishor
On Wed, Jul 14, 2010 at 2:03 PM, Darius Buntinas <buntinas at mcs.anl.gov> wrote:

> Here's a wiki page that has some info on building it and running
> applications.  Let me know if you have trouble with this.
>
> http://wiki.mcs.anl.gov/mpich2/index.php/Checkpointing
>
> -d
>
> On Jul 13, 2010, at 9:37 AM, kishor kharbas wrote:
>
> > Hi,
> >
> > Does the beta version, mpich2-1.3a2, have support for BLCR?
> > If so, where can I find guidelines on using this functionality?
> > I could not find it in the user guide included with the above
> > version.
> >
> >
> > Thanks,
> > Kishor
> > On Mon, Jul 12, 2010 at 11:14 AM, Darius Buntinas <buntinas at mcs.anl.gov>
> wrote:
> >
> > The next release of MPICH2 (1.3) will include checkpointing support
> > using BLCR.  You can try the beta release that's available under
> > 'downloads' on the MPICH2 website:
> >
> >    http://www.mcs.anl.gov/research/projects/mpich2/
> >
> > You'll need to install BLCR version 0.8.2 (which is currently the
> > latest version).
> >
> > -d
> >
> > On Jul 12, 2010, at 9:05 AM, kishor kharbas wrote:
> >
> > > Hello,
> > >
> > > I would like to know whether there are any plans to include
> > > Berkeley Lab Checkpoint/Restart (BLCR) support in the MPICH2
> > > runtime environment.
> > >
> > > Thanks,
> > > Kishor Kharbas
> > > MS Student
> > > Department of Computer Science
> > > NC State University
> > > Raleigh, NC 27606
> > > _______________________________________________
> > > mpich-discuss mailing list
> > > mpich-discuss at mcs.anl.gov
> > > https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
> >



-- 
MS Student
Department of Computer Science
NC State University
Raleigh, NC 27606