[MPICH] Crash While Spawning Process
Jason Surratt
jsurratt at spadac.com
Tue Oct 18 06:17:51 CDT 2005
David,
Thanks for responding.
I used your code and got the same results. I also tried setting the host
list to a single remote host, with localhost removed, and this also works
fine. (This appears to be what you did with 'hopper'.) If I set the host
list to more than one machine, I get the same crash as before.
All of my other settings are set to the default.
Thanks again,
-Jason
David Ashton wrote:
>Jason,
>
>After modifying your example to correct syntax errors I ran the job like
>this:
>
>C:\Temp>"\Program Files\MPICH2\bin\mpiexec.exe" -exitcodes -n 1 \\grass\c$\temp\myapp.exe
>Parent com: 4000000
>Error: This MPI Implementation doesn't support universe size.
>Spawning...
>\\grass\c$\temp\myapp.exe
>Parent com: 84000000
>Spawned.
>rank: node: exit code
>0: hopper: 0
>Parent com: 84000001
>Enter the number of intervals: (0 quits) 342
>pi is approximately 3.1415933660597268, Error is 0.0000007124699337
>wall clock time = 0.002582
>Enter the number of intervals: (0 quits) 0
>rank: node: exit code
>0: hopper: 0
>1: hopper: 0
>
>Everything went fine without any crashing.
>
>But you aren't going to get a valid value for universe size. MPICH2 for
>Windows currently doesn't support that value.
>
>-David Ashton
>
>Here's the code I ran:
>
>/* -*- Mode: C; c-basic-offset:4 ; -*- */
>/*
> * (C) 2001 by Argonne National Laboratory.
> * See COPYRIGHT in top-level directory.
> */
>
>/* This is an interactive version of cpi */
>#include "mpi.h"
>#include <stdio.h>
>#include <math.h>
>
>double f(double);
>
>double f(double a)
>{
>    return (4.0 / (1.0 + a*a));
>}
>
>int main(int argc,char *argv[])
>{
>    int done = 0, n, myid, numprocs, i;
>    double PI25DT = 3.141592653589793238462643;
>    double mypi, pi, h, sum, x;
>    double startwtime = 0.0, endwtime;
>    int namelen;
>    char processor_name[MPI_MAX_PROCESSOR_NAME];
>    MPI_Comm parentComm;
>
>    MPI_Init(&argc,&argv);
>    MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
>    MPI_Comm_rank(MPI_COMM_WORLD,&myid);
>    MPI_Get_processor_name(processor_name,&namelen);
>
>    MPI_Comm_get_parent(&parentComm);
>    printf("Parent com: %X\n", parentComm); fflush(stdout);
>
>    if (myid == 0 && parentComm == MPI_COMM_NULL)
>    {
>        int universeSize, *universeSizep;
>        int flag;
>        int result;
>        MPI_Comm everyone;
>
>        MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE,
>                          &universeSizep, &flag);
>        if (!flag)
>        {
>            printf("Error: This MPI Implementation doesn't support "
>                   "universe size.\n"); fflush(stdout);
>            universeSize = 2;
>        }
>        else
>        {
>            universeSize = *universeSizep;
>        }
>
>        printf("Spawning...\n"); fflush(stdout);
>
>        printf("%s\n", argv[0]); fflush(stdout);
>        result = MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, universeSize,
>                                MPI_INFO_NULL, 0, MPI_COMM_SELF, &everyone,
>                                MPI_ERRCODES_IGNORE);
>
>        printf("Spawned.\n"); fflush(stdout);
>
>        MPI_Finalize();
>        return 0;
>    }
>
>    /*
>    fprintf(stdout,"Process %d of %d is on %s\n",
>            myid, numprocs, processor_name);
>    fflush(stdout);
>    */
>
>    while (!done) {
>        if (myid == 0) {
>            fprintf(stdout, "Enter the number of intervals: (0 quits) ");
>            fflush(stdout);
>            if (scanf("%d",&n) != 1) {
>                fprintf( stdout, "No number entered; quitting\n" );
>                n = 0;
>            }
>            startwtime = MPI_Wtime();
>        }
>        MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
>        if (n == 0)
>            done = 1;
>        else {
>            h = 1.0 / (double) n;
>            sum = 0.0;
>            for (i = myid + 1; i <= n; i += numprocs) {
>                x = h * ((double)i - 0.5);
>                sum += f(x);
>            }
>            mypi = h * sum;
>            MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0,
>                       MPI_COMM_WORLD);
>
>            if (myid == 0) {
>                printf("pi is approximately %.16f, Error is %.16f\n",
>                       pi, fabs(pi - PI25DT));
>                endwtime = MPI_Wtime();
>                printf("wall clock time = %f\n", endwtime-startwtime);
>
>                fflush( stdout );
>            }
>        }
>    }
>    MPI_Finalize();
>    return 0;
>}
>
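For readers following the numerical part of the program above, here is a minimal serial sketch of how the loop divides the midpoint-rule sum among ranks and why the reduction still yields one estimate of pi. It is not part of the original post and uses no MPI calls. The names partial_pi and reduce_pi are hypothetical helpers introduced only for illustration, and the plain loop over ranks is an assumed stand-in for MPI_Reduce with MPI_SUM.

```c
#include <assert.h>

/* Integrand from the quoted program: the integral of 4/(1+x^2)
 * over [0,1] equals pi. */
static double f(double a)
{
    return 4.0 / (1.0 + a * a);
}

/* Hypothetical helper: the share of the midpoint-rule sum that one
 * rank computes under the cyclic split used in the quoted loop.
 * Rank myid handles i = myid+1, myid+1+numprocs, ... up to n. */
static double partial_pi(int myid, int numprocs, int n)
{
    assert(n > 0 && numprocs > 0 && myid >= 0 && myid < numprocs);
    double h = 1.0 / (double) n;   /* width of each interval */
    double sum = 0.0;
    for (int i = myid + 1; i <= n; i += numprocs) {
        double x = h * ((double) i - 0.5);   /* interval midpoint */
        sum += f(x);
    }
    return h * sum;
}

/* Hypothetical helper: serial stand-in for the reduction step.
 * It simply adds every rank's partial result. */
static double reduce_pi(int numprocs, int n)
{
    double pi = 0.0;
    for (int rank = 0; rank < numprocs; rank++)
        pi += partial_pi(rank, numprocs, n);
    return pi;
}
```

Calling reduce_pi(2, 342) mirrors the two-process run with 342 intervals shown earlier and should land close to the value printed there. Because every index i belongs to exactly one rank, changing the number of ranks only changes the order of the additions, so the estimates should differ by rounding error at most.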
>
>-----Original Message-----
>From: owner-mpich-discuss at mcs.anl.gov
>[mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Jason Surratt
>Sent: Monday, October 17, 2005 7:28 AM
>To: MPICH Mailing List
>Subject: [MPICH] Crash While Spawning Process
>
>
>Hello,
>
>I am new to MPI, but I am very happy with the initial success I've had
>parallelizing my application with MPICH2.
>
>The problem I'm having is when I spawn a process with the MPI_Comm_spawn
>function. To keep things simple I have modified the icpi.c example to
>include the following code snippet after line 35:
>
>...
>    MPI_Comm parentComm;
>    MPI_Comm_get_parent(&parentComm);
>    printf("Parent com: %X\n", parentComm);
>
>    if (myid == 0 && parentComm == MPI_COMM_NULL)
>    {
>        int universeSize, *universeSizep;
>        int flag;
>        MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE,
>                          &universeSizep, &flag);
>        if (!flag)
>        {
>            printf("Error: This MPI Implementation doesn't support "
>                   "universe size.\n"); fflush(stdout);
>            universeSize = 2;
>        }
>        else
>        {
>            universeSize = *universeSizep;
>        }
>
>        int result;
>        MPI_Comm everyone;
>
>        printf("Spawning...\n"); fflush(stdout);
>
>        printf("%s\n", argv[0]); fflush(stdout);
>        result = MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, universeSize,
>                                MPI_INFO_NULL, 0, MPI_COMM_SELF, &everyone,
>                                MPI_ERRCODES_IGNORE);
>
>        printf("Spawned.\n"); fflush(stdout);
>
>        MPI_Finalize();
>        return 0;
>    }
>...
>
>I then launch this application with:
>mpiexec -exitcodes -n 1 \\jsurratt-laptop\simple_mpi\simple_mpi
>
>My intent is to launch universeSize processes without requiring the user
>to type the size at the command line.
>
>If I run this code with my local machine as the only host the universe
>size is not returned (flag == 0), but the application spawns two
>processes and runs fine.
>
>If I add another similarly configured machine to my host list the
>universe size is not returned (flag == 0) and I get the standard Windows
>crash prompt 'Process launcher for MPICH2 applications has encountered a
>problem and needs to close. We are sorry for the inconvenience.' when
>the MPI_Comm_spawn function is called.
>
>I'm hoping that I'm just doing something silly, but if need be I can
>bring it up in the debugger and post more information.
>
>I'm using the following configuration on all machines:
>Windows XP SP2
>MPICH2 v1.0.2-1 (ia32)
>
>Running the unmodified version of icpi.c with multiple hosts works fine.
>
>Thanks in advance,
>
>-Jason
>
>
>
>