[mpich-discuss] Fix for Hydra/blaunch in MPICH2-1.3
Yauheni Zelenko
zelenko at cadence.com
Fri Jul 23 15:47:34 CDT 2010
Hi!
Just a clarification to my last e-mail. There is also the LSB_HOSTS environment variable (in addition to LSB_MCPU_HOSTS), which may be used as the resource list.
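For reference, a minimal sketch of reading the LSB_MCPU_HOSTS form of the list, assuming the usual LSF layout of alternating host names and slot counts ("hostA 4 hostB 2"). The struct host_slots type and parse_mcpu_hosts() are hypothetical names made up for this illustration; they are not Hydra or LSF functions.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record type for this sketch; not a Hydra structure. */
struct host_slots {
    char name[256];
    int slots;
};

/*
 * Parse a string in the assumed LSB_MCPU_HOSTS layout ("hostA 4 hostB 2")
 * into out[]. Returns the number of hosts parsed, or -1 if a host name
 * is not followed by a positive slot count. Parsing stops silently once
 * max_hosts entries have been filled.
 */
int parse_mcpu_hosts(const char *value, struct host_slots *out, int max_hosts)
{
    int count = 0;
    int consumed = 0;

    assert(value != NULL && out != NULL);

    while (count < max_hosts) {
        int n = sscanf(value, " %255s %d%n",
                       out[count].name, &out[count].slots, &consumed);

        if (n == EOF)
            break;              /* nothing left but whitespace */
        if (n != 2 || out[count].slots <= 0)
            return -1;          /* missing or invalid slot count */
        value += consumed;      /* advance past the pair just read */
        count++;
    }
    return count;
}
```

Inside a job, the input string would come from getenv("LSB_MCPU_HOSTS"), after checking that it is not NULL.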
Eugene.
________________________________________
From: mpich-discuss-bounces at mcs.anl.gov [mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Yauheni Zelenko [zelenko at cadence.com]
Sent: Friday, July 23, 2010 1:41 PM
To: mpich-discuss at mcs.anl.gov
Cc: Carl Sun
Subject: Re: [mpich-discuss] Fix for Hydra/blaunch in MPICH2-1.3
Hi, Pavan!
It works! Thank you for the help!
I just have a question about query_node_list(). Is its result a default that can be overridden with mpiexec command-line parameters, or not?
Our application may use multi-threading in each process launched by MPI, so sometimes we need to run one instance per host and use the host's CPUs allocated by LSF. It will be a problem if query_node_list() cannot be overridden.
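To illustrate the one-instance-per-host case, here is a minimal sketch that collapses a list with one entry per slot (the layout I assume for LSB_HOSTS, e.g. "n1 n1 n2 n2 n2") into unique host names. unique_hosts(), MAX_HOSTS and MAX_NAME are hypothetical names for this example only. It is not part of Hydra, and it says nothing about whether mpiexec accepts such an override.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_HOSTS 64
#define MAX_NAME 256

/*
 * Collapse a whitespace-separated host list with one entry per slot
 * into unique host names, preserving first-seen order.
 * Returns the number of unique hosts stored in out (at most max_hosts);
 * any further unique hosts are dropped.
 */
int unique_hosts(const char *list, char out[][MAX_NAME], int max_hosts)
{
    char name[MAX_NAME];
    int count = 0;
    int consumed = 0;
    int i;

    assert(list != NULL && out != NULL);

    while (sscanf(list, " %255s%n", name, &consumed) == 1) {
        list += consumed;

        /* Linear search is enough for the small lists expected here. */
        for (i = 0; i < count; i++)
            if (strcmp(out[i], name) == 0)
                break;

        if (i == count && count < max_hosts)
            strcpy(out[count++], name);
    }
    return count;
}
```

Each host returned would then get exactly one process, with the threads inside that process using the slots LSF reserved on that host.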
Eugene.
________________________________________
From: mpich-discuss-bounces at mcs.anl.gov [mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Pavan Balaji [balaji at mcs.anl.gov]
Sent: Thursday, July 22, 2010 11:56 PM
To: mpich-discuss at mcs.anl.gov
Cc: Carl Sun
Subject: Re: [mpich-discuss] Fix for Hydra/blaunch in MPICH2-1.3
Hi,
Thanks for the patch. I've committed a slightly different version of it
in r6883 (http://trac.mcs.anl.gov/projects/mpich2/changeset/6883). This
covers the issues you pointed out and also moves functionality from the
LSF RMK device to the bootstrap server.
Can you try out the latest nightly snapshot to check if things are
working correctly?
http://www.mcs.anl.gov/research/projects/mpich2/downloads/tarballs/nightly/hydra
-- Pavan
On 07/22/2010 07:03 PM, Yauheni Zelenko wrote:
> Hi!
>
> I'd like to suggest a fix for Hydra when it is used on a Platform LSF cluster. rsh/ssh are disabled by default on such clusters, and Platform LSF's blaunch should be used instead.
>
> However, the default blaunch behavior is not compatible with Hydra's handling of standard streams (the application just hangs, and its output is also lost). To fix this, blaunch should be run with the "-n" command-line option. Platform is aware of this issue and will probably fix blaunch in a future version.
>
> I introduced a blaunch bootstrap option, which is a subset of the ssh bootstrap.
>
> The fix also uses LSF_BINDIR (set by Platform LSF) to find the path to blaunch. My code is based on MPICH2-1.3a2.
>
> Please review my changes and include them in the main code base. I may not be aware of useful utility functions for working with paths.
>
> In bsci_init.c:
>
> from:
>
> if (!strcmp(bootstrap, "rsh") || !strcmp(bootstrap, "fork"))
>     bootstrap = "ssh";
>
> to:
>
> if (!strcmp(bootstrap, "rsh") || !strcmp(bootstrap, "fork") || !strcmp(bootstrap, "blaunch"))
>     bootstrap = "ssh";
>
> ssh_launch.c:
>
> from:
>
> if (!strcmp(HYDT_bsci_info.bootstrap, "ssh")) {
>     if (!path)
>         path = HYDU_find_full_path("ssh");
>     if (!path)
>         path = HYDU_strdup("/usr/bin/ssh");
> }
> else {
>     if (!path)
>         path = HYDU_find_full_path("rsh");
>     if (!path)
>         path = HYDU_strdup("/usr/bin/rsh");
> }
>
> idx = 0;
> targs[idx++] = HYDU_strdup(path);
>
> /* Allow X forwarding only if explicitly requested */
> if (!strcmp(HYDT_bsci_info.bootstrap, "ssh")) {
>     if (HYDT_bsci_info.enablex == 1)
>         targs[idx++] = HYDU_strdup("-X");
>     else if (HYDT_bsci_info.enablex == 0)
>         targs[idx++] = HYDU_strdup("-x");
>     else /* default mode is disable X */
>         targs[idx++] = HYDU_strdup("-x");
> }
>
> to:
>
> if (!strcmp(HYDT_bsci_info.bootstrap, "ssh")) {
>     if (!path)
>         path = HYDU_find_full_path("ssh");
>     if (!path)
>         path = HYDU_strdup("/usr/bin/ssh");
> }
> else if (!strcmp(HYDT_bsci_info.bootstrap, "blaunch")) {
>     /* Start from NULL so the check below is safe even if
>      * MPL_env2str() does not set the pointer */
>     char *BinDirPath = NULL;
>
>     MPL_env2str("LSF_BINDIR", (const char **) &BinDirPath);
>     if (BinDirPath) {
>         int BinDirLength = strlen(BinDirPath);
>
>         if (BinDirLength > 0) {
>             /* +1 for the terminating NUL */
>             int PathLength = BinDirLength + 1 + strlen("blaunch");
>
>             /* one more byte if a '/' separator has to be added */
>             if (BinDirPath[BinDirLength - 1] != '/')
>                 ++PathLength;
>             HYDU_MALLOC(path, char*, PathLength * sizeof(char), status);
>             strcpy(path, BinDirPath);
>             if (BinDirPath[BinDirLength - 1] != '/') {
>                 path[BinDirLength] = '/';
>                 strcpy(path + BinDirLength + 1, "blaunch");
>             }
>             else
>                 strcpy(path + BinDirLength, "blaunch");
>         }
>     }
>     if (!path)
>         path = HYDU_find_full_path("blaunch");
> }
> else {
>     if (!path)
>         path = HYDU_find_full_path("rsh");
>     if (!path)
>         path = HYDU_strdup("/usr/bin/rsh");
> }
>
> idx = 0;
> targs[idx++] = HYDU_strdup(path);
>
> /* Allow X forwarding only if explicitly requested */
> if (!strcmp(HYDT_bsci_info.bootstrap, "ssh")) {
>     if (HYDT_bsci_info.enablex == 1)
>         targs[idx++] = HYDU_strdup("-X");
>     else if (HYDT_bsci_info.enablex == 0)
>         targs[idx++] = HYDU_strdup("-x");
>     else /* default mode is disable X */
>         targs[idx++] = HYDU_strdup("-x");
> }
> else if (!strcmp(HYDT_bsci_info.bootstrap, "blaunch")) {
>     /* Required for blaunch to work with Hydra's stream handling */
>     targs[idx++] = HYDU_strdup("-n");
> }
>
> Eugene.
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
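The path construction in the quoted patch can also be written more compactly with snprintf(). The following is a minimal sketch in standard C only, assuming nothing about Hydra's own utilities: join_path() and blaunch_path_from_env() are hypothetical helper names, and plain malloc()/getenv() stand in for HYDU_MALLOC and MPL_env2str.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Join a directory and a file name into a newly allocated string,
 * inserting '/' only when dir does not already end with one.
 * Returns NULL if dir is NULL or empty, or if allocation fails.
 * The caller frees the result.
 */
char *join_path(const char *dir, const char *file)
{
    size_t dir_len, total;
    const char *sep;
    char *path;

    assert(file != NULL);
    if (dir == NULL || dir[0] == '\0')
        return NULL;

    dir_len = strlen(dir);
    sep = (dir[dir_len - 1] == '/') ? "" : "/";
    total = dir_len + strlen(sep) + strlen(file) + 1;   /* +1 for NUL */

    path = malloc(total);
    if (path != NULL)
        snprintf(path, total, "%s%s%s", dir, sep, file);
    return path;
}

/* Build the blaunch path from LSF_BINDIR; NULL if it is unset or empty. */
char *blaunch_path_from_env(void)
{
    return join_path(getenv("LSF_BINDIR"), "blaunch");
}
```

A caller would fall back to a PATH search when blaunch_path_from_env() returns NULL, as the patch does with HYDU_find_full_path("blaunch").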
--
Pavan Balaji
http://www.mcs.anl.gov/~balaji