This does not directly answer your question, but it may help you take advantage of the extra processor. My approach has been to use pthreads to split the problem up within each node, so that only the main thread communicates with other nodes.
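To make that concrete, here is a minimal sketch of the intra-node half, assuming the per-node work is a simple sum over an array. The names `slice_t`, `sum_slice`, `node_local_sum`, and `MAX_THREADS` are made up for illustration, and the inter-node step is left as a comment because it depends on whatever message-passing layer you use.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 64  /* hypothetical cap, chosen for this sketch */

/* Work assigned to one worker thread: a contiguous slice of the input. */
typedef struct {
    const double *data;
    size_t begin;    /* first index, inclusive */
    size_t end;      /* last index, exclusive */
    double partial;  /* written by the worker, read after join */
} slice_t;

/* Worker: sums its own slice and touches no shared state. */
static void *sum_slice(void *arg)
{
    slice_t *s = arg;
    double acc = 0.0;
    for (size_t i = s->begin; i < s->end; ++i)
        acc += s->data[i];
    s->partial = acc;
    return NULL;
}

/*
 * Splits data[0..n) across nthreads pthreads and combines the partial sums
 * in the calling thread. Returns 0 on success, -1 on bad arguments or if a
 * thread could not be created.
 */
int node_local_sum(const double *data, size_t n, int nthreads, double *out)
{
    if (out == NULL || nthreads < 1 || nthreads > MAX_THREADS)
        return -1;

    pthread_t tid[MAX_THREADS];
    slice_t slice[MAX_THREADS];
    size_t chunk = n / (size_t)nthreads;
    size_t rem = n % (size_t)nthreads;
    size_t pos = 0;
    int started = 0;
    int failed = 0;

    for (int t = 0; t < nthreads; ++t) {
        /* The first `rem` threads take one extra element each. */
        size_t len = chunk + ((size_t)t < rem ? 1 : 0);
        slice[t].data = data;
        slice[t].begin = pos;
        slice[t].end = pos + len;
        slice[t].partial = 0.0;
        pos += len;
        if (pthread_create(&tid[t], NULL, sum_slice, &slice[t]) != 0) {
            failed = 1;
            break;
        }
        ++started;
    }

    /* Join every thread that was started, even on failure. */
    double total = 0.0;
    for (int t = 0; t < started; ++t) {
        pthread_join(tid[t], NULL);
        total += slice[t].partial;
    }
    if (failed)
        return -1;

    *out = total;
    /*
     * Only the calling (main) thread gets here. This is where it alone
     * would exchange `total` with the other nodes, for example through a
     * reduction call in your message-passing library (an assumption; the
     * workers never make such calls).
     */
    return 0;
}
```

Build with `-pthread`. Calling `node_local_sum(values, count, 2, &result)` from the main thread on a dual-processor node keeps both processors busy while leaving all inter-node traffic on a single thread.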