[mpich-discuss] MPICH and NWCHEM

Gus Correa gus at ldeo.columbia.edu
Thu May 26 16:48:49 CDT 2011


Hi Christopher

Is yours a Rocks cluster?
Node names such as compute-2-28 are typical of Rocks, although LSF is 
not a Rocks thing.

If it is Rocks, I'd suggest installing everything you need (MPICH2, 
NWCHEM, etc.) in subdirectories of /share/apps, if you have permission 
to write there, or in subdirectories of your home directory.
These directories exist physically on the head node (frontend node in 
Rocks parlance).
Both are exported from the head node and NFS-mounted on all compute
nodes in a Rocks cluster.
Therefore, libraries and other software installed in those locations
are reachable from the compute nodes.
If it is not a Rocks cluster, the home directories are probably still 
NFS-mounted on the compute nodes, and installing there may be the way to go.
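For example, a from-source MPICH2 build installed under /share/apps
might look like this (the prefix is just an illustration; point it at
any directory you can write to and that the compute nodes can see):

  $ tar xzf mpich2-x.y.z.tar.gz      # whatever version you have
  $ cd mpich2-x.y.z
  $ ./configure --prefix=/share/apps/mpich2
  $ make && make install
  $ export PATH=/share/apps/mpich2/bin:$PATH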

The error message seems to say that blaunch is missing on that 
particular compute node.
Can you launch serial processes (say a script with 'hostname;pwd')
via blaunch on any node?
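Something along these lines should do, if blaunch works at all
(blaunch normally has to run from inside an LSF job, and the node
name below is just a placeholder):

  $ cat > check.sh <<'EOF'
  #!/bin/sh
  hostname; pwd
  EOF
  $ chmod +x check.sh
  $ blaunch compute-2-28 ./check.sh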

To test MPICH2, try the
very simple cpi.c program in the MPICH2 'examples' source directory.
Compile it with mpicc from MPICH2.
This will tell you if your MPICH2 is functional.
This may clear the way before you try NWCHEM.
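For instance (the source path is illustrative; make sure mpicc and
mpiexec come from the MPICH2 you just installed):

  $ cd mpich2-x.y.z/examples
  $ mpicc -o cpi cpi.c
  $ mpiexec -n 2 ./cpi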

If the cluster is not being used by others, you can try to bypass
LSF and launch your MPICH2 jobs directly through mpiexec,
by providing a list of hosts (-hosts) or a config file (-configfile),
using ssh as the launcher (-launcher), and perhaps also setting the
working directory (-wdir).
See 'mpiexec.hydra -help' for details.
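A rough sketch of such a direct launch (host names, process count,
and working directory are placeholders):

  $ mpiexec.hydra -launcher ssh \
        -hosts compute-2-28,compute-2-29 \
        -wdir /home/you/nwchem-run \
        -n 4 ./cpi

This needs passwordless ssh between the nodes, which Rocks clusters
usually set up for regular users by default.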

Admittedly this is a long shot; it may or may not work,
depending on how many pieces are missing from your cluster.

I hope this helps,
Gus Correa

Christopher O'Brien wrote:
> James,
> I reinstalled mpich2 and used it to build NWCHEM. 
> I don't know if this solves my problem as I can't get NWCHEM
> to run in parallel with mpiexec. The error I receive is:
> [mpiexec@compute-2-28.local] HYDU_create_process (./utils/launch/launch.c:69):
> execvp error on file /opt/lava/6.1/linux2.6-glibc2.3-ia32e/bin/blaunch
> (No such file or directory)
> 
> It turns out that the directory does not exist. 
> The executable does not appear to exist anywhere else on my system either.
> 
> My cluster is ancient and it will never be updated. 
> Since there is no way around that,
> everything will have to be done at the user level.
> Is there any way to launch mpich2 jobs without blaunch?
> 
> Thanks,
> Chris
> 
> 
> ===================================================================
> Christopher J. O'Brien
> cjobrien at ncsu.edu
> https://sites.google.com/a/ncsu.edu/cjobrien/
> 
> Ph.D. Candidate
> Computational Materials Group
> Department of Materials Science & Engineering
> North Carolina State University
> __________________________________________________________________
> Please send all documents in PDF. 
> For Word documents: Please use the 'Save as PDF' option before sending.
> ===================================================================
> 
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss


