[mpich-discuss] failed to submit a MPICH-related job on a cluster
Rui Mei
meir02ster at gmail.com
Mon Jan 30 23:34:44 CST 2012
Dear all,
I compiled our model and built the ccsm.exe executable on a Linux cluster, but
the job failed after I submitted it with mpirun.lsf.
I searched for the error message and found many threads online. It seems to
be related to how the process manager is set up.
The administrator configured MPICH2 with all process managers available.
The log file mentions "Hydra", but the mpich2_wrapper file tries to use
MPD. I am not sure whether this inconsistency is the cause. Any thoughts
or suggestions would be appreciated.
Thanks,
Rui
Here is the log file:
"[mpiexec at cn60] match_arg (./utils/args/args.c:122): unrecognized argument a
[mpiexec at cn60] HYDU_parse_array (./utils/args/args.c:140): argument matching returned error
[mpiexec at cn60] parse_args (./ui/mpich/utils.c:1387): error parsing input array
[mpiexec at cn60] HYD_uii_mpx_get_parameters (./ui/mpich/utils.c:1475): error parsing config args
Usage: ./mpiexec [global opts] [exec1 local opts] : [exec2 local opts] : ...
Global options (passed to all executables):
  Global environment options:
    -genv {name} {value}             environment variable name and value
    -genvlist {env1,env2,...}        environment variable list to pass
    -genvnone                        do not pass any environment variables
    -genvall                         pass all environment variables not managed
                                     by the launcher (default)
  Other global options:
    -f {name}                        file containing the host names
    -hosts {host list}               comma separated host list
    -wdir {dirname}                  working directory to use
    -configfile {name}               config file containing MPMD launch options
Local options (passed to individual executables):
  Local environment options:
    -env {name} {value}              environment variable name and value
    -envlist {env1,env2,...}         environment variable list to pass
    -envnone                         do not pass any environment variables
    -envall                          pass all environment variables (default)
  Other local options:
    -n/-np {value}                   number of processes
    {exec_name} {args}               executable name and arguments
Hydra specific options (treated as global):
  Launch options:
    -launcher                        launcher to use ( ssh rsh fork slurm ll lsf sge manual persist)
    -launcher-exec                   executable to use to launch processes
    -enable-x/-disable-x             enable or disable X forwarding
  Resource management kernel options:
    -rmk                             resource management kernel to use ( user slurm ll lsf sge pbs)
  Hybrid programming options:
    -ranks-per-proc                  assign so many ranks to each process
  Processor topology options:
    -binding                         process-to-core binding mode
    -topolib                         processor topology library ( hwloc plpa)
  Checkpoint/Restart options:
    -ckpoint-interval                checkpoint interval
    -ckpoint-prefix                  checkpoint file prefix
    -ckpoint-num                     checkpoint number to restart
    -ckpointlib                      checkpointing library (none)
  Demux engine options:
    -demux                           demux engine ( poll select)
  Other Hydra options:
    -verbose                         verbose mode
    -info                            build information
    -print-all-exitcodes             print exit codes of all processes
    -iface                           network interface to use
    -ppn                             processes per node
    -profile                         turn on internal profiling
    -prepend-rank                    prepend rank to output
    -prepend-pattern                 prepend pattern to output
    -outfile-pattern                 direct stdout to file
    -errfile-pattern                 direct stderr to file
    -nameserver                      name server information (host:port format)
    -disable-auto-cleanup            don't cleanup processes on error
    -disable-hostname-propagation    let MPICH2 auto-detect the hostname
    -order-nodes                     order nodes as ascending/descending cores
Please see the intructions provided at
http://wiki.mcs.anl.gov/mpich2/index.php/Using_the_Hydra_Process_Manager
for further details
Job /usr/share/lsf/7.0/linux2.6-glibc2.3-x86_64/bin/mpich2_wrapper -a mpich2 -n 12 -f /etc/hosts -launcher ssh ./ccsm.exe
TID   HOST_NAME  COMMAND_LINE     STATUS                  TERMINATION_TIME
===== ========== ================ ======================= ===================
00001 cn60 Undefined
"
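The first error in the log reports an unrecognized argument "a", and the "Job" line shows "-a mpich2" at the start of the arguments given to mpich2_wrapper. As a rough cross-check, the sketch below compares the flags on that Job line against the options listed in the Hydra usage message. It assumes the wrapper forwards its arguments to mpiexec unchanged, which the log does not confirm. The names HYDRA_FLAGS and unlisted_flags are hypothetical, and the comparison is a plain set lookup, not Hydra's actual argument parser.

```python
# Options copied from the Hydra usage message in the log (nothing added).
HYDRA_FLAGS = {
    "-genv", "-genvlist", "-genvnone", "-genvall",
    "-f", "-hosts", "-wdir", "-configfile",
    "-env", "-envlist", "-envnone", "-envall",
    "-n", "-np",
    "-launcher", "-launcher-exec", "-enable-x", "-disable-x",
    "-rmk", "-ranks-per-proc", "-binding", "-topolib",
    "-ckpoint-interval", "-ckpoint-prefix", "-ckpoint-num", "-ckpointlib",
    "-demux", "-verbose", "-info", "-print-all-exitcodes", "-iface",
    "-ppn", "-profile", "-prepend-rank", "-prepend-pattern",
    "-outfile-pattern", "-errfile-pattern", "-nameserver",
    "-disable-auto-cleanup", "-disable-hostname-propagation", "-order-nodes",
}


def unlisted_flags(args):
    """Return dash-prefixed tokens that do not appear in the usage listing."""
    return [tok for tok in args if tok.startswith("-") and tok not in HYDRA_FLAGS]


# Arguments that follow mpich2_wrapper on the "Job" line of the log.
# Assumption: the wrapper passes these through to mpiexec as-is.
job_args = "-a mpich2 -n 12 -f /etc/hosts -launcher ssh ./ccsm.exe".split()

print(unlisted_flags(job_args))  # prints ['-a']
```

Under that assumption, "-a" is the only flag on the Job line that the usage message does not list, which is consistent with the "unrecognized argument a" error.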