[mpich-discuss] failed to submit a MPICH-related job on a cluster

Rui Mei meir02ster at gmail.com
Mon Jan 30 23:34:44 CST 2012


Dear all,

I compiled our model and created the ccsm.exe executable on a Linux cluster,
but when I submitted a job to run it with mpirun.lsf, it failed.
I googled the error message (log below) and found many threads about it
online. It seems to be related to how the process manager is set up.

The administrator configured MPICH2 with all process managers available.
The log file mentions Hydra, but the mpich2_wrapper file tries to use MPD.
I am not sure whether this inconsistency is the cause. Any thoughts and
suggestions would be appreciated.

Thanks,
Rui


Here is the log file:

"[mpiexec@cn60] match_arg (./utils/args/args.c:122): unrecognized argument a
[mpiexec@cn60] HYDU_parse_array (./utils/args/args.c:140): argument matching returned error
[mpiexec@cn60] parse_args (./ui/mpich/utils.c:1387): error parsing input array
[mpiexec@cn60] HYD_uii_mpx_get_parameters (./ui/mpich/utils.c:1475): error parsing config args

Usage: ./mpiexec [global opts] [exec1 local opts] : [exec2 local opts] : ...

Global options (passed to all executables):

  Global environment options:
    -genv {name} {value}             environment variable name and value
    -genvlist {env1,env2,...}        environment variable list to pass
    -genvnone                        do not pass any environment variables
    -genvall                         pass all environment variables not managed
                                          by the launcher (default)

  Other global options:
    -f {name}                        file containing the host names
    -hosts {host list}               comma separated host list
    -wdir {dirname}                  working directory to use
    -configfile {name}               config file containing MPMD launch options


Local options (passed to individual executables):

  Local environment options:
    -env {name} {value}              environment variable name and value
    -envlist {env1,env2,...}         environment variable list to pass
    -envnone                         do not pass any environment variables
    -envall                          pass all environment variables (default)

  Other local options:
    -n/-np {value}                   number of processes
    {exec_name} {args}               executable name and arguments


Hydra specific options (treated as global):

  Launch options:
    -launcher                        launcher to use ( ssh rsh fork slurm ll lsf sge manual persist)
    -launcher-exec                   executable to use to launch processes
    -enable-x/-disable-x             enable or disable X forwarding

  Resource management kernel options:
    -rmk                             resource management kernel to use ( user slurm ll lsf sge pbs)

  Hybrid programming options:
    -ranks-per-proc                  assign so many ranks to each process

  Processor topology options:
    -binding                         process-to-core binding mode
    -topolib                         processor topology library ( hwloc plpa)

  Checkpoint/Restart options:
    -ckpoint-interval                checkpoint interval
    -ckpoint-prefix                  checkpoint file prefix
    -ckpoint-num                     checkpoint number to restart
    -ckpointlib                      checkpointing library (none)

  Demux engine options:
    -demux                           demux engine ( poll select)

  Other Hydra options:
    -verbose                         verbose mode
    -info                            build information
    -print-all-exitcodes             print exit codes of all processes
    -iface                           network interface to use
    -ppn                             processes per node
    -profile                         turn on internal profiling
    -prepend-rank                    prepend rank to output
    -prepend-pattern                 prepend pattern to output
    -outfile-pattern                 direct stdout to file
    -errfile-pattern                 direct stderr to file
    -nameserver                      name server information (host:port format)
    -disable-auto-cleanup            don't cleanup processes on error
    -disable-hostname-propagation    let MPICH2 auto-detect the hostname
    -order-nodes                     order nodes as ascending/descending cores

Please see the intructions provided at
http://wiki.mcs.anl.gov/mpich2/index.php/Using_the_Hydra_Process_Manager
for further details

Job  /usr/share/lsf/7.0/linux2.6-glibc2.3-x86_64/bin/mpich2_wrapper -a mpich2 -n 12 -f /etc/hosts -launcher ssh ./ccsm.exe

TID   HOST_NAME   COMMAND_LINE            STATUS            TERMINATION_TIME
===== ========== ================  =======================  ===================
00001 cn60                         Undefined
"
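The first error line reports "unrecognized argument a", and the Job line shows the exact arguments the LSF wrapper ran with. A quick way to see which of those arguments Hydra's mpiexec does not list in its usage text is to compare the two. The following is a minimal sketch under stated assumptions: HYDRA_OPTIONS is copied by hand from the usage output in the log (it is not queried from mpiexec), `unrecognized_options` is a hypothetical helper, and every token starting with "-" is treated as an mpiexec option. A match only suggests, and does not prove, that the wrapper forwards that argument to Hydra unchanged.

```python
import shlex

# Assumption: this set is transcribed by hand from the Hydra usage text in the
# log above; it is not read from mpiexec itself.
HYDRA_OPTIONS = {
    "-genv", "-genvlist", "-genvnone", "-genvall",
    "-f", "-hosts", "-wdir", "-configfile",
    "-env", "-envlist", "-envnone", "-envall",
    "-n", "-np",
    "-launcher", "-launcher-exec", "-enable-x", "-disable-x",
    "-rmk", "-ranks-per-proc", "-binding", "-topolib",
    "-ckpoint-interval", "-ckpoint-prefix", "-ckpoint-num", "-ckpointlib",
    "-demux", "-verbose", "-info", "-print-all-exitcodes", "-iface", "-ppn",
    "-profile", "-prepend-rank", "-prepend-pattern", "-outfile-pattern",
    "-errfile-pattern", "-nameserver", "-disable-auto-cleanup",
    "-disable-hostname-propagation", "-order-nodes",
}


def unrecognized_options(command_line: str) -> list[str]:
    """Return dash-prefixed tokens that are missing from HYDRA_OPTIONS.

    Hypothetical helper. Simplification: every token that starts with '-' is
    checked, so arguments meant for the executable itself would be checked too.
    """
    tokens = shlex.split(command_line)[1:]  # drop the wrapper path itself
    return [t for t in tokens if t.startswith("-") and t not in HYDRA_OPTIONS]


# Command line copied from the "Job" line of the log above.
WRAPPER_CMD = (
    "/usr/share/lsf/7.0/linux2.6-glibc2.3-x86_64/bin/mpich2_wrapper "
    "-a mpich2 -n 12 -f /etc/hosts -launcher ssh ./ccsm.exe"
)

print(unrecognized_options(WRAPPER_CMD))  # ['-a']
```

For this command line, only "-a" is reported, which matches the "unrecognized argument a" message and fits the suspicion that the wrapper was written for a different process manager than the one that actually ran.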