[mpich-discuss] Error when calling mpiexec from within a process
Reuti
reuti at staff.uni-marburg.de
Wed Oct 12 15:04:10 CDT 2011
On 12.10.2011 at 21:31, Pramod wrote:
> Unsetting those environment variables in the system() call before calling
> mpiexec did not really help (I still see the same error). From your response I
> understand that calling mpiexec from within an MPI process is not a
> common usage model
Correct.
BTW: I got confused, probably because I'm subscribed to too many lists.
The variables I listed are for Open MPI, but this is MPICH2. So looking up the corresponding variables in MPICH2 and unsetting them might work.
Sorry for the confusion.
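
A rough sketch of the "look up the variables and unset them" idea, in case it helps. It forks, removes every environment variable whose name starts with one of the given prefixes in the child only, and then runs the command through /bin/sh, much like system() does. The helper names unset_by_prefix and run_scrubbed are made up for this example, and which prefixes Hydra actually exports in mpich2-1.4.1 is something I have not verified, so any list such as "PMI_", "HYDRA_" or "MPICH_" is an assumption to check against the environment of a running process. It may also turn out that the inherited environment is not the cause of the error at all.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Remove every environment variable whose name starts with `prefix`.
 * Returns the number of variables removed. */
static int unset_by_prefix(const char *prefix)
{
    assert(prefix != NULL && prefix[0] != '\0');
    size_t plen = strlen(prefix);
    int removed = 0;

    /* unsetenv() may rearrange environ, so rescan from the start each time. */
    for (;;) {
        char name[256];
        int found = 0;

        for (char **e = environ; e != NULL && *e != NULL; ++e) {
            const char *eq = strchr(*e, '=');
            if (eq == NULL || strncmp(*e, prefix, plen) != 0)
                continue;
            size_t nlen = (size_t)(eq - *e);
            if (nlen < plen || nlen >= sizeof name)
                continue;               /* not a real match, or name too long */
            memcpy(name, *e, nlen);
            name[nlen] = '\0';
            found = 1;
            break;
        }
        if (!found || unsetenv(name) != 0)
            break;
        removed++;
    }
    return removed;
}

/* Run `cmd` via /bin/sh in a child whose environment has been scrubbed of
 * all variables matching the given prefixes. The caller's own environment
 * is left untouched. Returns the waitpid() status, or -1 on failure. */
static int run_scrubbed(const char *cmd, const char *const prefixes[], size_t n)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        for (size_t i = 0; i < n; ++i)
            unset_by_prefix(prefixes[i]);
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);                     /* exec failed */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;
}
```

In place of system("mpiexec ...") the child scheduler would then call something like run_scrubbed("mpiexec ...", prefixes, 3) with the (assumed) prefix list above. Doing the cleanup after fork() rather than in the parent keeps the outer MPI process's own settings intact.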
-- Reuti
> and perhaps not expected to work (?). Maybe I
> should think of a different approach to solve my problem.
>
> Thank you,
> Pramod
>
> On Fri, Oct 7, 2011 at 9:14 AM, Reuti <reuti at staff.uni-marburg.de> wrote:
>> Hi,
>>
>> On 07.10.2011 at 02:46, Pramod wrote:
>>
>>> I am trying to write a simple job scheduler using MPI. It schedules
>>> a bunch of jobs to run in parallel on different hosts. Each job is an
>>> MPI application (uses mpiexec) that runs on multiple cores of each
>>> host. A child scheduling process runs on each host and executes the
>>> parallel job, given to it by the master process, using system(). The
>>> executable is given, it runs on multiple cores of the host with some
>>> affinity settings, and I cannot modify it. I am sure there are other
>>> ways to build a job scheduler, but what's wrong with this?
>>
>> Will all jobs on all nodes then have the same runtime as a consequence?
>>
>> I would assume that the second mpiexec inherits some of the already set
>> environment variables and uses this information by accident. If you unset
>> them, it could work.
>>
>> When you check /proc/12345/environ of a running MPI process, you will
>> find something like this:
>>
>> OMPI_MCA_orte_precondition_transports=510befec2b70bcee-945da280a132e0fb
>> OMPI_MCA_plm=rsh
>> OMPI_MCA_orte_hnp_uri=1104674816.0;tcp://192.168.151.101:52964
>> OMPI_MCA_ess=env
>> OMPI_MCA_orte_ess_jobid=1104674817
>> OMPI_MCA_orte_ess_vpid=2
>> OMPI_MCA_orte_ess_num_procs=4
>> OMPI_MCA_orte_local_daemon_uri=1104674816.1;tcp://192.168.151.70:50363
>> OMPI_MCA_mpi_yield_when_idle=1
>> OMPI_MCA_orte_app_num=0
>> OMPI_UNIVERSE_SIZE=4
>> OMPI_COMM_WORLD_SIZE=4
>> OMPI_COMM_WORLD_LOCAL_SIZE=2
>> OMPI_COMM_WORLD_RANK=2
>> OMPI_COMM_WORLD_LOCAL_RANK=0
>> OPAL_OUTPUT_STDERR_FD=17
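
For reference, a listing like the one above can also be produced from C. This is only a sketch: entries in /proc/<pid>/environ are separated by NUL bytes, so the hypothetical helper dump_env_file below reads the file byte by byte and prints each NAME=value entry that starts with a given prefix. The function name and the choice of prefix are assumptions for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Print every NAME=value entry of an environ-style file (entries are
 * terminated by NUL bytes) that starts with `prefix`.
 * Returns the number of matching entries, or -1 on error. */
static int dump_env_file(const char *path, const char *prefix)
{
    assert(path != NULL && prefix != NULL);

    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    size_t plen = strlen(prefix);
    char *entry = NULL;
    size_t cap = 0, len = 0;
    int c, matches = 0;

    while ((c = fgetc(f)) != EOF) {
        if (len + 1 >= cap) {           /* grow the buffer as needed */
            size_t ncap = cap ? cap * 2 : 256;
            char *tmp = realloc(entry, ncap);
            if (tmp == NULL) {
                free(entry);
                fclose(f);
                return -1;
            }
            entry = tmp;
            cap = ncap;
        }
        entry[len++] = (char)c;
        if (c == '\0') {                /* end of one NAME=value entry */
            if (strncmp(entry, prefix, plen) == 0) {
                puts(entry);
                matches++;
            }
            len = 0;
        }
    }
    free(entry);
    fclose(f);
    return matches;
}
```

Calling dump_env_file("/proc/self/environ", "OMPI_") from inside the process would show the variables of the kind listed above. Note that on Linux this file reflects the environment the process was started with, not later setenv() or unsetenv() calls.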
>>
>> The other point is the directory openmpi-sessions-reuti at foobar_0 where some
>> things are stored. Maybe you need another temporary directory to separate
>> the two and have two orted running (like it's done with queuing systems,
>> where this directory is stored in the job specific temporary directory
>> provided by the queuing system).
>>
>> -- Reuti
>>
>>
>>> -Pramod
>>>
>>> On Thu, Oct 6, 2011 at 4:28 PM, Reuti <reuti at staff.uni-marburg.de> wrote:
>>>>
>>>> Hi,
>>>>
>>>> On 06.10.2011 at 22:06, Pramod wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have an application where I need to call mpiexec from within a child
>>>>> process launched by mpiexec. I am using "system()" to call the mpiexec
>>>>> process from the child process. I am using mpich2-1.4.1 and the hydra
>>>>> process manager. The errors I see are below. I am attaching the source
>>>>> file main.c. Let me know what I am doing wrong here and if you need
>>>>> more information.
>>>>>
>>>>> To compile:
>>>>>
>>>>> /home/install/mpich/mpich2-1.4.1/linux_x86_64/bin/mpicc main.c -I/home/install/mpich/mpich2-1.4.1/linux_x86_64/include
>>>>>
>>>>> When I run the test on multiple nodes I get the following errors:
>>>>> mpiexec -n 3 -f hosts.list a.out
>>>>
>>>> What do you want to achieve in detail? Would you like to use another
>>>> host list for this call, so that each child decides on its own where to
>>>> start grandchild processes?
>>>>
>>>> Is spawning additional processes from within MPI not an option?
>>>>
>>>> -- Reuti
>>>>
>>>>
>>>>> [proxy:0:0 at machine3] HYDU_create_process (/home/install/mpich/src/mpich2-1.4.1/src/pm/hydra/utils/launch/launch.c:36): dup2 error (Bad file descriptor)
>>>>> [proxy:0:0 at machine3] launch_procs (/home/install/mpich/src/mpich2-1.4.1/src/pm/hydra/pm/pmiserv/pmip_cb.c:751): create process returned error
>>>>> [proxy:0:0 at machine3] HYD_pmcd_pmip_control_cmd_cb (/home/install/mpich/src/mpich2-1.4.1/src/pm/hydra/pm/pmiserv/pmip_cb.c:935): launch_procs returned error
>>>>> [proxy:0:0 at machine3] HYDT_dmxu_poll_wait_for_event (/home/install/mpich/src/mpich2-1.4.1/src/pm/hydra/tools/demux/demux_poll.c:77): callback returned error status
>>>>> [proxy:0:0 at machine3] main (/home/install/mpich/src/mpich2-1.4.1/src/pm/hydra/pm/pmiserv/pmip.c:226): demux engine error waiting for event
>>>>> [mpiexec at machine1.abc.com] control_cb (/home/install/mpich/src/mpich2-1.4.1/src/pm/hydra/pm/pmiserv/pmiserv_cb.c:215): assert (!closed) failed
>>>>> [mpiexec at machine1.abc.com] HYDT_dmxu_poll_wait_for_event (/home/install/mpich/src/mpich2-1.4.1/src/pm/hydra/tools/demux/demux_poll.c:77): callback returned error status
>>>>> [mpiexec at machine1.abc.com] HYD_pmci_wait_for_completion (/home/install/mpich/src/mpich2-1.4.1/src/pm/hydra/pm/pmiserv/pmiserv_pmci.c:181): error waiting for event
>>>>> [mpiexec at machine1.abc.com] main (/home/install/mpich/src/mpich2-1.4.1/src/pm/hydra/ui/mpich/mpiexec.c:405): process manager error waiting for completion
>>>>>
>>>>> ------
>>>>> On a single node I get the following.
>>>>> mpiexec -n 3 a.out
>>>>> [proxy:0:0 at machine1.abc.com] [proxy:0:0 at machine1.abc.com] Killed
>>>>> <main.c>
>>>>> _______________________________________________
>>>>> mpich-discuss mailing list mpich-discuss at mcs.anl.gov
>>>>> To manage subscription options or unsubscribe:
>>>>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss