[mpich-discuss] MPICH and NWCHEM

Christopher O'Brien cjobrien at ncsu.edu
Sat May 21 18:27:39 CDT 2011


I am very concerned about the strange distribution of processes spawned while running NWCHEM. The NWCHEM developers think the problem is related to MPICH.

I am using a cluster with 2 Xeons per node with 2GB total memory. I used GCC 4.6 and GNU's MPICH 1.4.6 to compile NWCHEM-6.0. NWCHEM passes its tests and is compiled correctly. It even obtains reasonable speedup in CPU time, but quickly becomes bogged down in communication time as the number of processes increases. However, NWCHEM is unbelievably slow compared to benchmarks I have seen, even considering that my cluster uses standard Ethernet interconnects. In one case, I find:
PROCS   CPU Time (s)   Wall Time (s)
1       405.8          410.4
2       196.1          230.2
4       153.9          304.1
8        85.6          403.8
16       62.2          595.2
I'm not too surprised that I don't need many processors, since the job is really small. However, the large communication time made me suspicious, and further investigation showed a strange distribution of processes.
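To make the scaling problem concrete, here is a minimal sketch that derives wall-time speedup, parallel efficiency, and the gap between wall time and CPU time from the timings in the table above. Treating "wall minus CPU" as communication or wait time is only a rough assumption, and the function name `scaling_summary` is made up for illustration.

```python
# Timings copied from the table above: procs -> (CPU time in s, wall time in s)
timings = {
    1: (405.8, 410.4),
    2: (196.1, 230.2),
    4: (153.9, 304.1),
    8: (85.6, 403.8),
    16: (62.2, 595.2),
}


def scaling_summary(timings):
    """Return rows of (procs, wall speedup, efficiency, wall minus CPU)."""
    base_wall = timings[1][1]  # single-process wall time is the baseline
    rows = []
    for procs in sorted(timings):
        cpu, wall = timings[procs]
        speedup = base_wall / wall
        efficiency = speedup / procs
        # Assumption: time not spent on the CPU is a rough proxy for
        # communication and waiting, not an exact measurement.
        non_cpu = wall - cpu
        rows.append((procs, speedup, efficiency, non_cpu))
    return rows


if __name__ == "__main__":
    print("PROCS  SPEEDUP  EFFICIENCY  WALL-CPU (s)")
    for procs, speedup, eff, non_cpu in scaling_summary(timings):
        print(f"{procs:>5}  {speedup:7.2f}  {eff:10.2f}  {non_cpu:12.1f}")
```

With these numbers, the wall-time speedup is best at 2 processes and falls below 1 at 16 processes, where the run takes longer than the single-process run.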

I have posted two screen captures on my site (see below), one for each node. In short, when I use my LSF job submission system (bsub command), I find that the processes are not evenly distributed across nodes. For example, if I request 4 processors and I have two Xeons per node, I should be using 2 nodes. In fact, NWCHEM uses 2 nodes and reports:
> ARMCI configured for 2 cluster nodes. Network protocol is 'TCP/IP Sockets'.
However, inspecting the actual number of running processes on each of the two nodes:
master node--------2 processes
slave node---------7 processes
I contacted the NWCHEM developers, who replied: "As to the extra processes, these should be some extra thread processes internal to NWChem that indeed do not do much (they can be woken up in certain communication operations)." But why are there different numbers of auxiliary processes on each node?
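As a rough way to quantify the imbalance, the sketch below compares the slots allocated per host with the process counts observed on each node. The host names `node01` and `node02` are hypothetical, and the assumption is that the scheduler provides a whitespace-separated host list with one entry per slot (on LSF this might come from the `LSB_HOSTS` environment variable, depending on the site setup).

```python
from collections import Counter


def slots_per_host(host_list):
    """Count allocated slots per host from a whitespace-separated host list.

    Assumes one entry per slot, e.g. "node01 node01 node02 node02".
    """
    return Counter(host_list.split())


def compare(expected, observed):
    """Return {host: observed - expected} for hosts whose counts differ."""
    hosts = set(expected) | set(observed)
    return {
        host: observed.get(host, 0) - expected.get(host, 0)
        for host in sorted(hosts)
        if observed.get(host, 0) != expected.get(host, 0)
    }


if __name__ == "__main__":
    # Hypothetical host names; 4 slots requested, 2 per node.
    allocated = slots_per_host("node01 node01 node02 node02")
    # Process counts as seen on the master and slave nodes above.
    observed = {"node01": 2, "node02": 7}
    print(compare(allocated, observed))
```

A non-empty result lists the hosts running more or fewer processes than the allocation would suggest. It does not distinguish compute processes from the auxiliary ones the NWCHEM developers describe.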

To avoid filling up everyone's inbox, I posted screen shots of the running processes to: https://sites.google.com/a/ncsu.edu/cjobrien/file-exchange

Regards,
Chris O'Brien

===================================================================
Christopher J. O'Brien
cjobrien at ncsu.edu
https://sites.google.com/a/ncsu.edu/cjobrien/

Ph.D. Candidate
Computational Materials Group
Department of Materials Science & Engineering
North Carolina State University
__________________________________________________________________
Please send all documents in PDF. 
For Word documents: Please use the 'Save as PDF' option before sending.
===================================================================


