[MPICH] Crash While Spawning Process

Jason Surratt jsurratt at spadac.com
Tue Oct 18 06:17:51 CDT 2005


David,

Thanks for responding.

I ran your code and got the same results as before. I also tried setting 
the host list to a single remote host, with localhost removed, and that 
works fine too. (This appears to be what you did with 'hopper'.) If the 
host list contains more than one machine, I get the same crash as before.

All of my other settings are set to the default.

Thanks again,

-Jason


David Ashton wrote:

>Jason,
>
>After modifying your example to correct syntax errors I ran the job like
>this:
>
>C:\Temp>"\Program Files\MPICH2\bin\mpiexec.exe" -exitcodes -n 1 \\grass\c$\temp\myapp.exe
>Parent com: 4000000
>Error: This MPI Implementation doesn't support universe size.
>Spawning...
>\\grass\c$\temp\myapp.exe
>Parent com: 84000000
>Spawned.
>rank: node: exit code
>0: hopper: 0
>Parent com: 84000001
>Enter the number of intervals: (0 quits) 342
>pi is approximately 3.1415933660597268, Error is 0.0000007124699337
>wall clock time = 0.002582
>Enter the number of intervals: (0 quits) 0
>rank: node: exit code
>0: hopper: 0
>1: hopper: 0
>
>Everything went fine without any crashing.
>
>But you aren't going to get a valid value for universe size.  MPICH2 for
>Windows currently doesn't support that value.
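
Since the attribute may be missing, the choice of how many processes to spawn can be isolated in a small helper. The sketch below is only an illustration: `spawn_count` and its parameters are hypothetical names, not part of MPI or MPICH2. It assumes the MPI-2 description of MPI_UNIVERSE_SIZE as the total number of processes expected, so the processes already running are subtracted before spawning.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not an MPI call): decide how many processes to
 * pass to MPI_Comm_spawn.
 *
 *   flag           - the flag returned by MPI_Comm_get_attr
 *   universe_sizep - the attribute value; only read when flag is nonzero
 *   running        - processes already started (e.g. size of MPI_COMM_WORLD)
 *   fallback       - count to use when the attribute is not available
 */
int spawn_count(int flag, const int *universe_sizep, int running, int fallback)
{
    int remaining;

    assert(running >= 0);

    /* Attribute not set (as reported for MPICH2 on Windows): use fallback. */
    if (!flag || universe_sizep == NULL)
        return fallback;

    /* Universe size counts all processes, so subtract those already running. */
    remaining = *universe_sizep - running;

    /* Never return a negative count; 0 means the universe is already full. */
    return remaining > 0 ? remaining : 0;
}
```

In the program below this would replace the if/else that sets universeSize, for example `spawn_count(flag, flag ? universeSizep : NULL, numprocs, 2)`. Note that the quoted code spawns universeSize processes rather than universeSize minus the ones already running.
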
>
>-David Ashton
>
>Here's the code I ran:
>
>/* -*- Mode: C; c-basic-offset:4 ; -*- */
>/*
> *  (C) 2001 by Argonne National Laboratory.
> *      See COPYRIGHT in top-level directory.
> */
>
>/* This is an interactive version of cpi */
>#include "mpi.h"
>#include <stdio.h>
>#include <math.h>
>
>double f(double);
>
>double f(double a)
>{
>    return (4.0 / (1.0 + a*a));
>}
>
>int main(int argc,char *argv[])
>{
>    int done = 0, n, myid, numprocs, i;
>    double PI25DT = 3.141592653589793238462643;
>    double mypi, pi, h, sum, x;
>    double startwtime = 0.0, endwtime;
>    int  namelen;
>    char processor_name[MPI_MAX_PROCESSOR_NAME];
>    MPI_Comm parentComm;
>
>    MPI_Init(&argc,&argv);
>    MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
>    MPI_Comm_rank(MPI_COMM_WORLD,&myid);
>    MPI_Get_processor_name(processor_name,&namelen);
>
>    MPI_Comm_get_parent(&parentComm);
>    printf("Parent com: %X\n", parentComm); fflush(stdout);
>
>    if (myid == 0 && parentComm == MPI_COMM_NULL)
>    {
>        int universeSize, *universeSizep;
>        int flag;
>        int result;
>        MPI_Comm everyone;
>
>        MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE,
>                          &universeSizep, &flag);
>        if (!flag)
>        {
>            printf("Error: This MPI Implementation doesn't support "
>                   "universe size.\n");
>            fflush(stdout);
>            universeSize = 2;
>        }
>        else
>        {
>            universeSize = *universeSizep;
>        }
>
>        printf("Spawning...\n"); fflush(stdout);
>
>        printf("%s\n", argv[0]); fflush(stdout);
>        result = MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, universeSize,
>            MPI_INFO_NULL, 0, MPI_COMM_SELF, &everyone,
>            MPI_ERRCODES_IGNORE);
>
>        printf("Spawned.\n"); fflush(stdout);
>
>        MPI_Finalize();
>        return 0;
>    }
>
>    /*
>    fprintf(stdout,"Process %d of %d is on %s\n",
>	    myid, numprocs, processor_name);
>    fflush(stdout);
>    */
>
>    while (!done) {
>        if (myid == 0) {
>            fprintf(stdout, "Enter the number of intervals: (0 quits) ");
>            fflush(stdout);
>            if (scanf("%d",&n) != 1) {
>                fprintf( stdout, "No number entered; quitting\n" );
>                n = 0;
>            }
>            startwtime = MPI_Wtime();
>        }
>        MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
>        if (n == 0)
>            done = 1;
>        else {
>            h   = 1.0 / (double) n;
>            sum = 0.0;
>            for (i = myid + 1; i <= n; i += numprocs) {
>                x = h * ((double)i - 0.5);
>                sum += f(x);
>            }
>            mypi = h * sum;
>            MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0,
>                       MPI_COMM_WORLD);
>
>            if (myid == 0) {
>                printf("pi is approximately %.16f, Error is %.16f\n",
>                       pi, fabs(pi - PI25DT));
>                endwtime = MPI_Wtime();
>                printf("wall clock time = %f\n", endwtime-startwtime);
>
>                fflush( stdout );
>            }
>        }
>    }
>    MPI_Finalize();
>    return 0;
>}
>
>
>-----Original Message-----
>From: owner-mpich-discuss at mcs.anl.gov
>[mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Jason Surratt
>Sent: Monday, October 17, 2005 7:28 AM
>To: MPICH Mailing List
>Subject: [MPICH] Crash While Spawning Process
>
>
>Hello,
>
>I am new to MPI, but I am very happy with the initial success I've had 
>parallelizing my application with MPICH2.
>
>The problem I'm having is when I spawn a process with the MPI_Comm_spawn 
>function. To keep things simple I have modified the icpi.c example to 
>include the following code snippet after line 35:
>
>...
>  MPI_Comm parentComm;
>  MPI_Comm_get_parent(&parentComm);
>  printf("Parent com: %X\n", parentComm);
>
>  if (myid == 0 && parentComm == MPI_COMM_NULL)
>  {
>    int universeSize, *universeSizep;
>    int flag;
>    MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE,
>                      &universeSizep, &flag);
>    if (!flag)
>    {
>      printf("Error: This MPI Implementation doesn't support "
>             "universe size.\n");
>      fflush(stdout);
>      universeSize = 2;
>    }
>    else
>    {
>      universeSize = *universeSizep;
>    }
>
>    int result;
>    MPI_Comm everyone;
>
>    printf("Spawning...\n"); fflush(stdout);
>
>    printf("%s\n", argv[0]); fflush(stdout);
>    result = MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, universeSize,
>      MPI_INFO_NULL, 0, MPI_COMM_SELF, &everyone, MPI_ERRCODES_IGNORE); 
>
>    printf("Spawned.\n"); fflush(stdout);
>
>    MPI_Finalize();
>    return 0;
>  }
>...
>
>I then launch this application with:
>mpiexec -exitcodes -n 1 \\jsurratt-laptop\simple_mpi\simple_mpi
>
>My intent is to launch universeSize processes without requiring the user 
>to type the size at the command line.
>
>If I run this code with my local machine as the only host the universe 
>size is not returned (flag == 0), but the application spawns two 
>processes and runs fine.
>
>If I add another similarly configured machine to my host list the 
>universe size is not returned (flag == 0) and I get the standard Windows 
>crash prompt 'Process launcher for MPICH2 applications has encountered a 
>problem and needs to close.  We are sorry for the inconvenience.' when 
>the MPI_Comm_spawn function is called.
>
>I'm hoping that I'm just doing something silly, but if need be I can 
>bring it up in the debugger and post more information.
>
>I'm using the following configuration on all machines:
>Windows XP SP2
>MPICH2 v1.0.2-1 (ia32)
>
>Running the unmodified version of icpi.c with multiple hosts works fine.
>
>Thanks in advance,
>
>-Jason
>
>
>  
>



