[mpich-discuss] MPI application design

Mon Aug 17 17:05:09 CDT 2009

Hi Ronald,

    IMHO, you may be better off with a job scheduler (Torque or SGE) tying 
all that together than programming it with MPI. The job scheduler knows which 
machines in your cluster are available, and can queue the jobs and run them 
whenever machines are free. That way you don't reinvent the wheel.
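
    A rough sketch of what that could look like, in Python, assuming one 
scheduler job per problem. It only builds the `qsub` command lines and does 
not run them. The script name `solve_one.sh` and the variable `PROBLEM_ID` 
are hypothetical placeholders, not anything from your setup; `-N` (job name) 
and `-v` (pass an environment variable) are the only qsub options used.

```python
def build_submit_commands(num_problems, script_path="solve_one.sh"):
    """Return one qsub command line per problem; nothing is executed here.

    script_path and PROBLEM_ID are hypothetical: the script would wrap one
    run of the serial solver and read PROBLEM_ID to pick its input.
    """
    commands = []
    for problem_id in range(num_problems):
        commands.append([
            "qsub",
            "-N", f"problem_{problem_id}",       # job name
            "-v", f"PROBLEM_ID={problem_id}",    # exported to the job
            script_path,
        ])
    return commands


if __name__ == "__main__":
    # Print the commands instead of submitting them.
    for command in build_submit_commands(3):
        print(" ".join(command))
```

The scheduler then decides where and when each job runs, which is the part 
you would otherwise have to write yourself.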

Cheers,
 -- Diego.

On Monday 17 August 2009 10:22:51 ronald at audaces.com.br wrote:
> Hi all,
>
> This is my first message to this mailing list. I'm just entering the world
> of MPI, and MPICH seems to be a wonderful implementation from what I've seen
> so far.
>
> My subject is not really about MPICH itself but about more general MPI
> application design. So... I hope you experts at MPI/MPICH can help me.
>
> Well... let's get to the problem:
>
> In our company, we have a CPU-intensive algorithm that we already run in a
> multi-process, "manually controlled" (not MPI) environment. The core
> algorithm is a third-party library (not parallel), so we don't have access
> to the code in order to implement parallelism in the algorithm itself.
>
> But we can see it as a candidate for MPI if we look at the problem at a
> higher level: a set of problems that can be solved concurrently by this
> algorithm. The parallelism consists of solving each problem in parallel,
> with each "run" of the serial algorithm in its own process. And of course we
> get the whole message-passing mechanism (which is a big problem in parallel
> applications, and MPI handles it very well).
>
> We plan to use it like this: every machine on the network will provide its
> processors to the cluster (one machine gives 1, others 2, others 4
> processors), and on every processor we run one problem at a time. A
> central controller will manage the processes (using MPI) and collect the
> results.
>
> An example: my workload consists of 100 problems to be solved, and on my
> network I have 10 processing units (processors on slave machines). Let's say
> we want to run our algorithm (a heuristic procedure) for 10 minutes on
> each problem. For each problem, our controller will:
> - See if there is a free processor.
> - Send the problem to this processor so that it can compute its solution.
> - Receive back its solution and keep it.
> - Repeat till the 100 problems are solved.
> In this case it will take 100 min. to solve all 100 problems (10 rounds of
> 10 min. each, with 10 workers running in parallel; if it weren't in
> parallel it would take 1000 min.).
>
> This is the kind of parallelism that we need. I thought of a lot of ways to
> solve it, but I would like to hear what you guys think.
>
> At this point in my studies I'm leaning toward doing it this way:
> - The controlling unit is a server that accepts connections.
> - For every new process it needs, it launches one with mpiexec, passing
> the connection as a parameter to that executor.
> - The executor communicates its progress through MPI to the server.
> - Then it goes on until the end.
>
>
> What do you guys think? Any help will be appreciated... I'm really stuck
> because of my lack of experience in choosing the way to go.
>
> Thanks in advance.
>
> Ronald
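
For reference, the controller loop described in the quoted message (find a 
free processor, send it a problem, receive the solution, repeat) can be 
sketched in a few lines of Python. This is only an illustration under stated 
assumptions: it uses the standard library's thread pool as a stand-in for MPI 
worker processes, not MPI itself, and `solve_problem` is a hypothetical 
placeholder for the third-party solver. For a real CPU-bound solver you would 
use separate processes (or MPI ranks) rather than threads.

```python
import math
from concurrent.futures import ThreadPoolExecutor, as_completed

NUM_PROBLEMS = 100
NUM_WORKERS = 10
MINUTES_PER_PROBLEM = 10


def solve_problem(problem_id):
    """Hypothetical stand-in for one run of the serial third-party solver."""
    return problem_id * problem_id  # placeholder "solution"


def run_controller(problem_ids, num_workers):
    """Hand each problem to a free worker and keep solutions as they arrive."""
    solutions = {}
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # The pool queues problems and starts one whenever a worker is free.
        futures = {pool.submit(solve_problem, pid): pid for pid in problem_ids}
        for future in as_completed(futures):
            solutions[futures[future]] = future.result()
    return solutions


def estimated_wall_minutes(num_problems, num_workers, minutes_each):
    """Wall-clock estimate when every problem runs for the same fixed time."""
    rounds = math.ceil(num_problems / num_workers)
    return rounds * minutes_each


if __name__ == "__main__":
    solutions = run_controller(range(NUM_PROBLEMS), NUM_WORKERS)
    print(len(solutions), "problems solved")
    print(estimated_wall_minutes(NUM_PROBLEMS, NUM_WORKERS, MINUTES_PER_PROBLEM),
          "minutes estimated with", NUM_WORKERS, "workers")
```

The estimate function reproduces the arithmetic in the message: 100 problems 
over 10 workers is 10 rounds of 10 minutes, versus 100 rounds with a single 
worker.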