[mpich-discuss] Fix for Hydra/blaunch in MPICH2-1.3

Yauheni Zelenko zelenko at cadence.com
Fri Jul 23 15:47:34 CDT 2010


Hi!

Just a clarification to my last e-mail: there is also the LSB_HOSTS environment variable (in addition to LSB_MCPU_HOSTS), which may be used as the resource list.
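As far as I know, the two variables describe the same allocation in different forms: LSB_MCPU_HOSTS lists each host once followed by its CPU count (for example "hostA 4 hostB 2"), while LSB_HOSTS repeats a host name once per allocated slot ("hostA hostA hostA hostA hostB hostB"). A minimal C sketch, assuming exactly those formats, that collapses the LSB_HOSTS form into host/count pairs; `struct host_slots` and `parse_lsb_hosts()` are hypothetical names, not part of Hydra or LSF:

```c
#include <stdio.h>
#include <string.h>

#define MAX_HOSTS 64
#define MAX_NAME  256

/* Hypothetical record: one host and the number of LSF slots on it. */
struct host_slots {
    char name[MAX_NAME];
    int slots;
};

/* Collapse an LSB_HOSTS-style list ("hostA hostA hostB") into host/count
 * pairs, keeping first-seen order. Returns the number of distinct hosts,
 * or -1 if there are more than MAX_HOSTS of them. Lists longer than the
 * local buffer are silently truncated in this sketch. */
int parse_lsb_hosts(const char *list, struct host_slots *out)
{
    char buf[4096];
    int n = 0;

    snprintf(buf, sizeof(buf), "%s", list);   /* strtok modifies its input */
    for (char *tok = strtok(buf, " \t\n"); tok; tok = strtok(NULL, " \t\n")) {
        int i;
        for (i = 0; i < n; i++)
            if (strcmp(out[i].name, tok) == 0)
                break;
        if (i == n) {                         /* first time we see this host */
            if (n == MAX_HOSTS)
                return -1;
            snprintf(out[n].name, MAX_NAME, "%s", tok);
            out[n].slots = 0;
            n++;
        }
        out[i].slots++;                       /* one more slot on this host */
    }
    return n;
}
```

Called on the value of getenv("LSB_HOSTS") (after checking it is not NULL), this should yield the same pairs one would read directly from LSB_MCPU_HOSTS.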

Eugene.
________________________________________
From: mpich-discuss-bounces at mcs.anl.gov [mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Yauheni Zelenko [zelenko at cadence.com]
Sent: Friday, July 23, 2010 1:41 PM
To: mpich-discuss at mcs.anl.gov
Cc: Carl Sun
Subject: Re: [mpich-discuss] Fix for Hydra/blaunch in MPICH2-1.3

Hi, Pavan!

It works! Thank you for your help!

I just have a question about query_node_list(): is the node list it returns only a default that can be overridden with mpiexec command-line parameters, or not?

Our application may use multi-threading in each process launched by MPI, so sometimes we need to run one instance per host and let it use all of the host's CPUs allocated by LSF. It would be a problem if the list from query_node_list() could not be overridden.
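One way to get one process per host is to build an explicit host list from the LSF allocation and hand it to mpiexec. A sketch, assuming the LSB_MCPU_HOSTS format "hostA 4 hostB 2" and Hydra's `host:count` host-file syntax; `one_process_per_host()` is a hypothetical helper, and whether such a host file takes precedence over the LSF-detected list is exactly the open question about query_node_list():

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: turn "hostA 4 hostB 2" into "hostA:1\nhostB:1\n".
 * The CPU counts are read but ignored, so each (multi-threaded) process
 * can use all CPUs LSF assigned to its host. Returns the number of hosts
 * written, or -1 on malformed input or if out is too small. */
int one_process_per_host(const char *mcpu_hosts, char *out, size_t outlen)
{
    char buf[4096];
    size_t used = 0;
    int hosts = 0;

    if (outlen == 0)
        return -1;
    snprintf(buf, sizeof(buf), "%s", mcpu_hosts);  /* strtok modifies input */
    out[0] = '\0';
    for (char *host = strtok(buf, " \t\n"); host; host = strtok(NULL, " \t\n")) {
        char *count = strtok(NULL, " \t\n");       /* CPU count for this host */
        if (!count)
            return -1;                             /* host without a count */
        int w = snprintf(out + used, outlen - used, "%s:1\n", host);
        if (w < 0 || (size_t) w >= outlen - used)
            return -1;                             /* output buffer too small */
        used += (size_t) w;
        hosts++;
    }
    return hosts;
}
```

The resulting text could be written to a file and passed to mpiexec; the per-process thread count would then be chosen by the application from the CPU counts it skipped.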

Eugene.
________________________________________
From: mpich-discuss-bounces at mcs.anl.gov [mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Pavan Balaji [balaji at mcs.anl.gov]
Sent: Thursday, July 22, 2010 11:56 PM
To: mpich-discuss at mcs.anl.gov
Cc: Carl Sun
Subject: Re: [mpich-discuss] Fix for Hydra/blaunch in MPICH2-1.3

Hi,

Thanks for the patch. I've committed a slightly different version of it
in r6883 (http://trac.mcs.anl.gov/projects/mpich2/changeset/6883). This
includes the issues you pointed out, plus moving functionality from the LSF
RMK device to the bootstrap server.

Can you try out the latest nightly snapshot to check if things are
working correctly?

http://www.mcs.anl.gov/research/projects/mpich2/downloads/tarballs/nightly/hydra

  -- Pavan

On 07/22/2010 07:03 PM, Yauheni Zelenko wrote:
> Hi!
>
> I'd like to suggest a fix for Hydra when it is used on a Platform LSF cluster. rsh/ssh are disabled by default on such clusters, and Platform LSF's blaunch should be used instead.
>
> However, blaunch's default behavior is not compatible with Hydra's standard-stream handling (the application just hangs, and output is also lost). To fix this, blaunch should be run with the "-n" command-line option. Platform is aware of this issue and will probably fix blaunch in a future version.
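For reference, blaunch's "-n" appears to play the same role as rsh's "-n": the remote command's standard input is taken from /dev/null instead of the caller's stdin, presumably so that it no longer competes with Hydra's own stream handling. A minimal POSIX sketch of that effect, not Hydra's or LSF's code; `spawn_with_null_stdin()` is a hypothetical name:

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] with argument vector argv, giving the child /dev/null as
 * standard input (the effect "-n" requests from rsh-like launchers).
 * Returns the child's exit status, or -1 if fork/wait fails or the child
 * did not exit normally. */
int spawn_with_null_stdin(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                           /* child */
        int fd = open("/dev/null", O_RDONLY);
        if (fd < 0 || dup2(fd, STDIN_FILENO) < 0)
            _exit(127);
        if (fd != STDIN_FILENO)
            close(fd);
        execvp(argv[0], argv);
        _exit(127);                           /* only reached if exec failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A child started this way sees end-of-file as soon as it reads stdin, so it cannot block waiting for input that the parent never forwards.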
>
> I introduced a blaunch bootstrap option, which is a variant of the ssh bootstrap.
>
> The fix also uses LSF_BINDIR (set by Platform LSF) to find the path to blaunch. My code is based on MPICH2-1.3a2.
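On the question of path utilities: the LSF_BINDIR lookup can also be written with snprintf(), which computes the required length and lets the '/' separator be added only when LSF_BINDIR lacks a trailing slash. A minimal standard-C sketch; `make_blaunch_path()` is hypothetical and not an existing MPICH2 helper, and a NULL result means the caller should fall back to HYDU_find_full_path("blaunch"), as the patch does:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Build "<bindir>/blaunch", adding '/' only if bindir has no trailing one.
 * Returns a malloc'ed string the caller must free, or NULL if bindir is
 * NULL/empty or allocation fails. */
char *make_blaunch_path(const char *bindir)
{
    if (!bindir || bindir[0] == '\0')
        return NULL;

    size_t len = strlen(bindir);
    const char *sep = (bindir[len - 1] == '/') ? "" : "/";
    int need = snprintf(NULL, 0, "%s%s%s", bindir, sep, "blaunch");
    if (need < 0)
        return NULL;

    char *path = malloc((size_t) need + 1);   /* +1 for the terminating NUL */
    if (path)
        snprintf(path, (size_t) need + 1, "%s%s%s", bindir, sep, "blaunch");
    return path;
}
```

Typical use would be `char *p = make_blaunch_path(getenv("LSF_BINDIR"));`, followed by free(p) once the launcher arguments have been copied.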
>
> Please review my changes and include them in the main code base. There may be existing utility functions for working with paths that I am not aware of.
>
> In bsci_init.c:
>
> from:
>
>     if (!strcmp(bootstrap, "rsh") || !strcmp(bootstrap, "fork"))
>         bootstrap = "ssh";
>
> to:
>
>     if (!strcmp(bootstrap, "rsh") || !strcmp(bootstrap, "fork") || !strcmp(bootstrap, "blaunch"))
>         bootstrap = "ssh";
>
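The effect of this change is that several user-visible bootstrap names share one implementation. The name is mapped to "ssh" only to choose the bootstrap module; judging from the ssh_launch.c changes, which test HYDT_bsci_info.bootstrap for "blaunch", the original name is still consulted later to pick the launcher binary and its options. A table-driven sketch of the same mapping, with hypothetical names (`struct bootstrap_alias`, `bootstrap_impl()`):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical alias table: user-visible bootstrap name -> implementation. */
struct bootstrap_alias {
    const char *name;
    const char *impl;
};

static const struct bootstrap_alias aliases[] = {
    { "ssh",     "ssh" },
    { "rsh",     "ssh" },
    { "fork",    "ssh" },
    { "blaunch", "ssh" },   /* the new LSF launcher reuses the ssh code */
};

/* Return the implementation that serves `name`; unknown names map to
 * themselves so other bootstrap servers are left untouched. */
const char *bootstrap_impl(const char *name)
{
    for (size_t i = 0; i < sizeof(aliases) / sizeof(aliases[0]); i++)
        if (strcmp(aliases[i].name, name) == 0)
            return aliases[i].impl;
    return name;
}
```

With a table, adding another launcher means adding one row rather than extending the `if` condition.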
> In ssh_launch.c:
>
> from:
>
>     if (!strcmp(HYDT_bsci_info.bootstrap, "ssh")) {
>         if (!path)
>             path = HYDU_find_full_path("ssh");
>         if (!path)
>             path = HYDU_strdup("/usr/bin/ssh");
>     }
>     else {
>         if (!path)
>             path = HYDU_find_full_path("rsh");
>         if (!path)
>             path = HYDU_strdup("/usr/bin/rsh");
>     }
>
>     idx = 0;
>     targs[idx++] = HYDU_strdup(path);
>
>     /* Allow X forwarding only if explicitly requested */
>     if (!strcmp(HYDT_bsci_info.bootstrap, "ssh")) {
>         if (HYDT_bsci_info.enablex == 1)
>             targs[idx++] = HYDU_strdup("-X");
>         else if (HYDT_bsci_info.enablex == 0)
>             targs[idx++] = HYDU_strdup("-x");
>         else    /* default mode is disable X */
>             targs[idx++] = HYDU_strdup("-x");
>     }
>
> to:
>
>     if (!strcmp(HYDT_bsci_info.bootstrap, "ssh")) {
>         if (!path)
>             path = HYDU_find_full_path("ssh");
>         if (!path)
>             path = HYDU_strdup("/usr/bin/ssh");
>     }
>     else if (!strcmp(HYDT_bsci_info.bootstrap, "blaunch")) {
>         const char *BinDirPath = NULL;
>
>         /* Prefer $LSF_BINDIR/blaunch; LSF_BINDIR is set by Platform LSF */
>         MPL_env2str("LSF_BINDIR", &BinDirPath);
>         if (!path && BinDirPath) {
>             int BinDirLength = strlen(BinDirPath);
>
>             if (BinDirLength > 0) {
>                 /* room for "blaunch" plus the terminating NUL */
>                 int PathLength = BinDirLength + 1 + strlen("blaunch");
>
>                 /* one more byte if a '/' separator must be added */
>                 if (BinDirPath[BinDirLength - 1] != '/')
>                     ++PathLength;
>                 HYDU_MALLOC(path, char *, PathLength * sizeof(char), status);
>                 strcpy(path, BinDirPath);
>                 if (BinDirPath[BinDirLength - 1] != '/') {
>                     path[BinDirLength] = '/';
>                     strcpy(path + BinDirLength + 1, "blaunch");
>                 }
>                 else
>                     strcpy(path + BinDirLength, "blaunch");
>             }
>         }
>         /* Fall back to searching for blaunch by name */
>         if (!path)
>             path = HYDU_find_full_path("blaunch");
>     }
>     else {
>         if (!path)
>             path = HYDU_find_full_path("rsh");
>         if (!path)
>             path = HYDU_strdup("/usr/bin/rsh");
>     }
>
>     idx = 0;
>     targs[idx++] = HYDU_strdup(path);
>
>     /* Allow X forwarding only if explicitly requested */
>     if (!strcmp(HYDT_bsci_info.bootstrap, "ssh")) {
>         if (HYDT_bsci_info.enablex == 1)
>             targs[idx++] = HYDU_strdup("-X");
>         else if (HYDT_bsci_info.enablex == 0)
>             targs[idx++] = HYDU_strdup("-x");
>         else    /* default mode is disable X */
>             targs[idx++] = HYDU_strdup("-x");
>     }
>     else if (!strcmp(HYDT_bsci_info.bootstrap, "blaunch")) {
>         targs[idx++] = HYDU_strdup("-n");
>     }
>
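Taken together, the launch command differs between the variants only in the program and the single option that follows it: ssh gets -x (or -X when X forwarding is explicitly requested), blaunch gets -n, and rsh gets neither. A standalone sketch of that selection; `launcher_option()` and `build_launch_argv()` are hypothetical, and the assumption that the host name and remote command come next follows the usual `blaunch [-n] host command` form rather than code shown in the patch:

```c
#include <stddef.h>
#include <string.h>

/* Option placed right after the launcher binary, mirroring the patched
 * ssh_launch.c: ssh disables X forwarding unless enablex == 1, blaunch
 * always gets -n, and any other launcher (e.g. rsh) gets none. */
const char *launcher_option(const char *bootstrap, int enablex)
{
    if (strcmp(bootstrap, "ssh") == 0)
        return (enablex == 1) ? "-X" : "-x";  /* 0 and "unset" both mean -x */
    if (strcmp(bootstrap, "blaunch") == 0)
        return "-n";
    return NULL;
}

/* Fill args with { path, [option], host, command, NULL } and return the
 * argument count. args must have room for at least 5 entries. */
int build_launch_argv(const char *args[], const char *path,
                      const char *bootstrap, int enablex,
                      const char *host, const char *command)
{
    int idx = 0;
    const char *opt = launcher_option(bootstrap, enablex);

    args[idx++] = path;
    if (opt)
        args[idx++] = opt;
    args[idx++] = host;
    args[idx++] = command;
    args[idx] = NULL;
    return idx;
}
```

Keeping the option choice in one function makes the blaunch case a one-line addition next to the existing X-forwarding logic.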
> Eugene.
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss

--
Pavan Balaji
http://www.mcs.anl.gov/~balaji