itaps-parallel Orphan processes beware

Tim Tautges tautges at mcs.anl.gov
Wed Oct 29 10:09:11 CDT 2008


Hi all,
   The load problem on the mesh machine was caused by orphan processes
that were not killed along with their job.  So, if you're running a job
and it dies or you have to kill it, make sure all of its processes get
killed too (check with: ps -ef | grep <username>).  Note that those
processes don't show up in the output of 'top'.
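
A minimal sketch of that check, assuming a ps that supports the POSIX
-u and -o options. The helper names list_user_procs and list_orphans
are made up for illustration, and the PID in the comments is a
placeholder. The PPID filter rests on the assumption that leftover
processes get re-parented to init (PID 1); on some systems a per-user
session manager adopts them instead, so the full listing is the safer
thing to read.

```shell
#!/bin/sh
# List every process owned by a user (defaults to the current user).
# Columns: process ID, parent process ID, elapsed time, command line.
list_user_procs() {
    ps -u "${1:-$(id -u)}" -o pid,ppid,etime,args
}

# Keep the header line plus any process whose parent is PID 1,
# which is where orphans usually end up (an assumption, see above).
list_orphans() {
    list_user_procs "$1" | awk 'NR == 1 || $2 == 1'
}

list_orphans

# After checking the list, stop a leftover by PID (12345 is a placeholder):
#   kill 12345       # sends SIGTERM, lets the process clean up
#   kill -9 12345    # SIGKILL, only if it ignores SIGTERM
```

Not everything with PPID 1 is a dead job's leftover (daemons you started
on purpose look the same), so check the command line and elapsed time
before killing anything.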

Also, please keep jobs to 4 processes most of the time, and check with
the others first if you need to run larger jobs.

Thanks.

- tim

-- 
================================================================
"You will keep in perfect peace him whose mind is
   steadfast, because he trusts in you."               Isaiah 26:3

              Tim Tautges            Argonne National Laboratory
          (tautges at mcs.anl.gov)      (telecommuting from UW-Madison)
          phone: (608) 263-8485      1500 Engineering Dr.
            fax: (608) 263-4499      Madison, WI 53706



