[mpich-discuss] ask for help

Rayson Ho raysonlogin at gmail.com
Thu Oct 25 23:55:49 CDT 2012


Zhengqiang,

Try running an MPI job outside of Grid Engine; that way you remove
Grid Engine as a variable. Also, can you run just a small job (e.g., a 2-node
job) that includes only the head node and 1 slave node?

Rayson

==================================================
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/


On Thu, Oct 25, 2012 at 8:47 PM, Zhengqiang Ma <zqm2 at njau.edu.cn> wrote:
> Yes, the cluster has been running for about 3 years without this kind of problem.
>
>
> On Oct 25, 2012, at 4:39 PM, Reuti <reuti at staff.uni-marburg.de> wrote:
>
>> On 25.10.2012 at 04:37, Zhengqiang Ma wrote:
>>
>>> Hi, I have a cluster comprising 12 Apple dual quad-core 2.26 GHz Mac Pros (each with 6 GB of RAM) connected to a single quad-core 2.26 GHz Mac Pro as the head node (with 6 GB of RAM). Since I recently added another 2 GB of memory to each of the member nodes and 10 GB to the head node, I can no longer run MPI jobs. I keep getting errors like:
>>>
>>> rank 0 in job 1  node00x.cluster.private_xxxxx   caused collective abort of all ranks
>>
>> This sounds to me like something was set up on the machines that didn't survive the reboot. Did earlier reboots of the cluster complete without any impact?
>>
>> -- Reuti
>>
>>
>>> exit status of rank 0: return code 255
>>>
>>> Job management is handled by the Sun Grid Engine (SGE) package from Sun Microsystems and the iNquiry Suite from the BioTeam.
>>>
>>>
>>> Please help.
>>>
>>>
>>> Thank you very much.
>>>
>>> zqm
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> mpich-discuss mailing list     mpich-discuss at mcs.anl.gov
>>> To manage subscription options or unsubscribe:
>>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>>
>

