[mpich-discuss] General Scalability Question
Hiatt, Dave M
dave.m.hiatt at citi.com
Mon Oct 26 10:39:31 CDT 2009
So far my experience has been that the in-core message transfer rate is far better than a gigabit switch and backbone. Infiniband would be a dramatic improvement, but it's hard to believe that it could keep up with in-memory transfers. What has worked out best for our app is a single message thread, with the app then using shared memory directly to distribute the data. That dramatically lowers the number of open sockets and the communication overhead. It may not work best in every case, but for us it worked better regardless of whether we had a very high or a lower core/process count per node. So we ran only one MPI process per physical node. It also lowers the number of sockets you have to support on node 0 if you have point-to-point communication. Linux, at least, defaults to a limit of 1024 open file descriptors (sockets and files) per process, and it's nice for node 0 performance to stay under that. You can raise it with ulimit, but when you've got 15000 cores, it's pretty expensive to have one MPI process per core.
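To put rough numbers on the socket count, here is a minimal Python sketch. It assumes a simplified model in which rank 0 holds one socket per MPI process on every other node, and that processes sharing node 0 use shared memory instead. The function names and the figure of 16 cores per node are illustrative assumptions, not anything MPICH prescribes.

```python
import math
import resource


def rank0_sockets(total_cores, procs_per_node, cores_per_node):
    """Estimate the sockets rank 0 holds if it talks to every remote process.

    Simplified model (an assumption): one socket per process on other nodes;
    processes on rank 0's own node are reached through shared memory.
    """
    nodes = math.ceil(total_cores / cores_per_node)
    total_procs = nodes * procs_per_node
    # Exclude the processes that live on rank 0's own node.
    return total_procs - procs_per_node


def fd_soft_limit():
    """Return the soft limit on open file descriptors (what `ulimit -n` reports)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


if __name__ == "__main__":
    limit = fd_soft_limit()
    layouts = (("one process per core", 16), ("one process per node", 1))
    for label, procs_per_node in layouts:
        needed = rank0_sockets(15000, procs_per_node, 16)
        print(f"{label}: about {needed} sockets on rank 0 (soft limit {limit})")
```

Under these assumptions, 15000 cores at 16 per node gives 938 nodes, so one process per node leaves rank 0 with 937 remote peers, while one process per core leaves it with 14992. The first fits under the default descriptor limit; the second does not come close.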
-----Original Message-----
From: mpich-discuss-bounces at mcs.anl.gov [mailto:mpich-discuss-bounces at mcs.anl.gov]On Behalf Of Robertson, Andrew
Sent: Monday, October 26, 2009 10:30 AM
To: mpich-discuss at mcs.anl.gov
Subject: [mpich-discuss] General Scalability Question
Folks,
Our IT staff is not particularly knowledgeable about parallel computing. Their current upgrade plan centers around quad/quad or dual/hex boxes which would have 16 or 12 cores respectively. I have no doubt that such a machine would run a parallel job efficiently. My question is how well can I harness multiple boxes together?
The applications are all CFD (FLUENT, GASP, STAR, VULCAN). I am talking to the various software vendors about this but would like some info from the programming community.
Assuming the same memory per core, am I better off with:
High core count (12-16) boxes on a gigabit switch, or
Lower core count (2-4) boxes on an infiniband switch?
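One rough way to frame the two options is the off-node bandwidth each core gets when every core in a box communicates at once. The Python sketch below does only that arithmetic. The InfiniBand figure assumes a 4x DDR link (16 Gbit/s data rate), which is an assumption about the hardware, and the helper name is hypothetical. Latency is ignored, although it often matters as much as bandwidth for CFD codes.

```python
GIGABIT_ETHERNET_GBIT_S = 1.0    # nominal link rate
INFINIBAND_4X_DDR_GBIT_S = 16.0  # assumed: 4x DDR data rate after 8b/10b encoding


def per_core_share_mb_s(link_gbit_s, cores_per_box):
    """Off-node bandwidth per core, in MB/s, if all cores share one link evenly."""
    link_mb_s = link_gbit_s * 1000 / 8  # Gbit/s to MB/s, decimal units
    return link_mb_s / cores_per_box


if __name__ == "__main__":
    options = (
        ("16 cores on gigabit", GIGABIT_ETHERNET_GBIT_S, 16),
        ("12 cores on gigabit", GIGABIT_ETHERNET_GBIT_S, 12),
        ("4 cores on infiniband", INFINIBAND_4X_DDR_GBIT_S, 4),
        ("2 cores on infiniband", INFINIBAND_4X_DDR_GBIT_S, 2),
    )
    for label, rate, cores in options:
        share = per_core_share_mb_s(rate, cores)
        print(f"{label}: {share:.1f} MB/s per core")
```

This is only a peak-rate comparison under the stated assumptions; measured throughput depends on the switch, the MPI implementation, and the application's communication pattern.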
I understand that if I configure mpich correctly, it will use shared memory on the multi-core, multi-processor boxes. If I end up with the high core count boxes, should I spec the frontside bus (or whatever it is called now) as high as possible?
I also have concerns that a single power supply failure takes out more cores, though perhaps that is not such a problem
Any information is greatly appreciated
Thanks
Andy
--------------------
Andrew Robertson P.E.
CFD Analyst
GASL Operations
Tactical Propulsion and Controls
ATK
77 Raynor Avenue
Ronkokoma NY 11779
631-737-6100 Ext 120
Fax: 631-588-7023
www.atk.com
!! Knowledge and Thoroughness Baby !!