[mpich-discuss] failed to submit a MPICH-related job on a cluster
Pavan Balaji
balaji at mcs.anl.gov
Fri Mar 16 16:35:08 CDT 2012
mpirun.lsf is not something we provide as part of MPICH2. It might have been
created by someone else on your system (e.g., your system administrator).
You might want to look at this thread for more information:
http://lists.mcs.anl.gov/pipermail/mpich-discuss/2012-March/012006.html
-- Pavan
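
For what it is worth, the first error line in the quoted log ("unrecognized
argument a") can be compared against the "Job ..." line near the end of the
same log, whose arguments begin with "-a mpich2". The following is a minimal
Python sketch, not Hydra's actual parser: the option set is copied by hand
from the usage text quoted below, and the helper name unknown_flags is made
up for illustration. It only reports tokens that begin with "-" and are not
in that list.

```python
import shlex

# Options copied by hand from the Hydra usage text in the quoted log.
# This is an assumption-laden list, not taken from Hydra's source code.
HYDRA_OPTIONS = {
    "-genv", "-genvlist", "-genvnone", "-genvall",
    "-f", "-hosts", "-wdir", "-configfile",
    "-env", "-envlist", "-envnone", "-envall",
    "-n", "-np",
    "-launcher", "-launcher-exec", "-enable-x", "-disable-x",
    "-rmk", "-ranks-per-proc", "-binding", "-topolib",
    "-ckpoint-interval", "-ckpoint-prefix", "-ckpoint-num", "-ckpointlib",
    "-demux", "-verbose", "-info", "-print-all-exitcodes", "-iface",
    "-ppn", "-profile", "-prepend-rank", "-prepend-pattern",
    "-outfile-pattern", "-errfile-pattern", "-nameserver",
    "-disable-auto-cleanup", "-disable-hostname-propagation",
    "-order-nodes",
}


def unknown_flags(command_line):
    """Return dash-prefixed tokens that are not in HYDRA_OPTIONS, in order.

    Hypothetical helper: it does not know which options take values, so it
    simply checks every token that starts with "-".
    """
    tokens = shlex.split(command_line)
    return [t for t in tokens if t.startswith("-") and t not in HYDRA_OPTIONS]


# Arguments from the "Job ..." line in the log, after the wrapper path.
args = "-a mpich2 -n 12 -f /etc/hosts -launcher ssh ./ccsm.exe"
print(unknown_flags(args))
```

Under these assumptions the only token reported is -a, which is consistent
with the error message. That suggests an LSF-style option is being passed
through to Hydra's mpiexec, but this is a guess from the log alone.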
On 01/30/2012 11:34 PM, Rui Mei wrote:
> Dear all,
>
> I compiled our model and built the ccsm.exe file on a Linux cluster,
> but the job failed after I submitted it with mpirun.lsf.
> I searched for the error message and found many threads online. It seems
> to be related to how the process manager is set up.
>
> The administrator configured MPICH2 with all process managers
> available. The log file mentions "Hydra", but the mpich2_wrapper
> file tries to use MPD. I am not sure whether this inconsistency is
> the cause. Any thoughts or suggestions would be appreciated.
>
> Thanks,
> Rui
>
> Here is the log file:
>
> "[mpiexec@cn60] match_arg (./utils/args/args.c:122): unrecognized argument a
> [mpiexec@cn60] HYDU_parse_array (./utils/args/args.c:140): argument matching returned error
> [mpiexec@cn60] parse_args (./ui/mpich/utils.c:1387): error parsing input array
> [mpiexec@cn60] HYD_uii_mpx_get_parameters (./ui/mpich/utils.c:1475): error parsing config args
>
> Usage: ./mpiexec [global opts] [exec1 local opts] : [exec2 local opts] : ...
>
> Global options (passed to all executables):
>
>   Global environment options:
>     -genv {name} {value}             environment variable name and value
>     -genvlist {env1,env2,...}        environment variable list to pass
>     -genvnone                        do not pass any environment variables
>     -genvall                         pass all environment variables not managed by the launcher (default)
>
>   Other global options:
>     -f {name}                        file containing the host names
>     -hosts {host list}               comma separated host list
>     -wdir {dirname}                  working directory to use
>     -configfile {name}               config file containing MPMD launch options
>
>
> Local options (passed to individual executables):
>
>   Local environment options:
>     -env {name} {value}              environment variable name and value
>     -envlist {env1,env2,...}         environment variable list to pass
>     -envnone                         do not pass any environment variables
>     -envall                          pass all environment variables (default)
>
>   Other local options:
>     -n/-np {value}                   number of processes
>     {exec_name} {args}               executable name and arguments
>
>
> Hydra specific options (treated as global):
>
>   Launch options:
>     -launcher                        launcher to use ( ssh rsh fork slurm ll lsf sge manual persist)
>     -launcher-exec                   executable to use to launch processes
>     -enable-x/-disable-x             enable or disable X forwarding
>
>   Resource management kernel options:
>     -rmk                             resource management kernel to use ( user slurm ll lsf sge pbs)
>
>   Hybrid programming options:
>     -ranks-per-proc                  assign so many ranks to each process
>
>   Processor topology options:
>     -binding                         process-to-core binding mode
>     -topolib                         processor topology library ( hwloc plpa)
>
>   Checkpoint/Restart options:
>     -ckpoint-interval                checkpoint interval
>     -ckpoint-prefix                  checkpoint file prefix
>     -ckpoint-num                     checkpoint number to restart
>     -ckpointlib                      checkpointing library (none)
>
>   Demux engine options:
>     -demux                           demux engine ( poll select)
>
>   Other Hydra options:
>     -verbose                         verbose mode
>     -info                            build information
>     -print-all-exitcodes             print exit codes of all processes
>     -iface                           network interface to use
>     -ppn                             processes per node
>     -profile                         turn on internal profiling
>     -prepend-rank                    prepend rank to output
>     -prepend-pattern                 prepend pattern to output
>     -outfile-pattern                 direct stdout to file
>     -errfile-pattern                 direct stderr to file
>     -nameserver                      name server information (host:port format)
>     -disable-auto-cleanup            don't cleanup processes on error
>     -disable-hostname-propagation    let MPICH2 auto-detect the hostname
>     -order-nodes                     order nodes as ascending/descending cores
>
> Please see the intructions provided at
> http://wiki.mcs.anl.gov/mpich2/index.php/Using_the_Hydra_Process_Manager
> for further details
>
> Job /usr/share/lsf/7.0/linux2.6-glibc2.3-x86_64/bin/mpich2_wrapper -a mpich2 -n 12 -f /etc/hosts -launcher ssh ./ccsm.exe
>
> TID   HOST_NAME  COMMAND_LINE     STATUS                  TERMINATION_TIME
> ===== ========== ================ ======================= ===================
> 00001 cn60 Undefined
> "
>
>
--
Pavan Balaji
http://www.mcs.anl.gov/~balaji