[mpich-discuss] ask for help

Thu Oct 25 03:39:00 CDT 2012

On 25.10.2012 at 04:37, Zhengqiang Ma wrote:

> Hi, I have a cluster comprising 12 Apple dual quad-core 2.26-GHz Mac Pros (each with 6GB of RAM) connected to a single quad-core 2.26-GHz Mac Pro as the head node (with 6GB of RAM). Recently, after I added another 2GB of memory to each of the member nodes and 10GB to the head node, I could no longer run MPI jobs. I keep getting an error like:
> 
> rank 0 in job 1  node00x.cluster.private_xxxxx   caused collective abort of all ranks

This sounds to me like something was set up on the machines that didn't survive the reboot. Were earlier reboots of the cluster successful, without any impact?

-- Reuti

> exit status of rank 0: return code 255
> 
> Job management is handled by the Sun Grid Engine (SGE) package from Sun Microsystems and the iNquiry Suite from BioTeam.
> 
> 
> Please help.
> 
> 
> Thank you very much.
> 
> zqm