[mpich-discuss] Fix for Hydra/blaunch in MPICH2-1.3

Pavan Balaji balaji at mcs.anl.gov
Fri Jul 23 01:56:50 CDT 2010


Hi,

Thanks for the patch. I've committed a slightly different version of it 
in r6883 (http://trac.mcs.anl.gov/projects/mpich2/changeset/6883). It 
addresses the issues you pointed out and also moves functionality from the 
LSF RMK device to the bootstrap server.

Can you try out the latest nightly snapshot to check if things are 
working correctly?

http://www.mcs.anl.gov/research/projects/mpich2/downloads/tarballs/nightly/hydra

  -- Pavan

On 07/22/2010 07:03 PM, Yauheni Zelenko wrote:
> Hi!
> 
> I'd like to suggest a fix for Hydra when it is used on a Platform LSF cluster. rsh/ssh are disabled by default on such clusters, and Platform LSF's blaunch should be used instead.
> 
> However, the default blaunch behavior is not compatible with Hydra's handling of the standard streams (the application just hangs, and output is lost). To fix this, blaunch should be run with the "-n" command-line option. Platform is aware of this issue and will probably fix blaunch in a future version.
> 
> I introduced a blaunch bootstrap option, which is a subset of the ssh bootstrap.
> 
> The fix also uses LSF_BINDIR (set by Platform LSF) to find the path to blaunch. My code is based on MPICH2-1.3a2.
> 
> Please review my changes and include them in the main code base. I may not be aware of useful utility functions for working with paths.
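
On the question of path utilities: the manual length arithmetic in the blaunch branch can be replaced by asking standard C's `snprintf(NULL, 0, ...)` for the required size first. This is a minimal sketch using only the C standard library; `join_bin_path` and `blaunch_from_lsf_bindir` are hypothetical names, not existing Hydra or MPL helpers, and whether Hydra already ships an equivalent helper is not established here.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of Hydra or MPL): return a malloc'd
 * "<dir>/<exe>", adding '/' only when <dir> lacks a trailing one.
 * Returns NULL if dir is NULL or empty, or on failure; caller frees. */
char *join_bin_path(const char *dir, const char *exe)
{
    const char *sep;
    size_t dir_len;
    int needed;
    char *out;

    assert(exe != NULL);
    if (dir == NULL || dir[0] == '\0')
        return NULL;

    dir_len = strlen(dir);
    sep = (dir[dir_len - 1] == '/') ? "" : "/";

    /* snprintf with a NULL buffer and size 0 only reports the length. */
    needed = snprintf(NULL, 0, "%s%s%s", dir, sep, exe);
    if (needed < 0)
        return NULL;

    out = malloc((size_t) needed + 1);   /* +1 for the terminating NUL */
    if (out == NULL)
        return NULL;
    snprintf(out, (size_t) needed + 1, "%s%s%s", dir, sep, exe);
    return out;
}

/* Resolve blaunch from LSF_BINDIR (set by Platform LSF). NULL means the
 * variable is unset or empty, so the caller should fall back to a PATH
 * search, as the patch does. */
char *blaunch_from_lsf_bindir(void)
{
    return join_bin_path(getenv("LSF_BINDIR"), "blaunch");
}
```

Both "/opt/lsf/bin" and "/opt/lsf/bin/" yield "/opt/lsf/bin/blaunch", so the trailing-slash special case collapses into one format call.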
> 
> In bsci_init.c:
> 
> from:
> 
>     if (!strcmp(bootstrap, "rsh") || !strcmp(bootstrap, "fork"))
>         bootstrap = "ssh";
> 
> to:
> 
>     if (!strcmp(bootstrap, "rsh") || !strcmp(bootstrap, "fork") || !strcmp(bootstrap, "blaunch"))
>         bootstrap = "ssh";
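
The bsci_init.c change routes "blaunch" to the ssh launcher, which (as the ssh_launch.c changes below suggest) then branches on the original name in HYDT_bsci_info.bootstrap to pick the binary and options. As a sketch of the same dispatch written as a lookup table, so that adding another rsh-like launcher is a one-line change; `ssh_family` and `bootstrap_impl` are hypothetical names, not Hydra code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table: bootstrap names served by the ssh launcher code,
 * mirroring the patched condition in bsci_init.c. */
static const char *const ssh_family[] = { "ssh", "rsh", "fork", "blaunch" };

/* Return the implementation to load for a user-selected bootstrap name:
 * "ssh" for any member of ssh_family, otherwise the name unchanged. */
const char *bootstrap_impl(const char *bootstrap)
{
    size_t i;

    assert(bootstrap != NULL);
    for (i = 0; i < sizeof(ssh_family) / sizeof(ssh_family[0]); i++)
        if (strcmp(bootstrap, ssh_family[i]) == 0)
            return "ssh";
    return bootstrap;
}
```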
> 
> In ssh_launch.c:
> 
> from:
> 
>     if (!strcmp(HYDT_bsci_info.bootstrap, "ssh")) {
>         if (!path)
>             path = HYDU_find_full_path("ssh");
>         if (!path)
>             path = HYDU_strdup("/usr/bin/ssh");
>     }
>     else {
>         if (!path)
>             path = HYDU_find_full_path("rsh");
>         if (!path)
>             path = HYDU_strdup("/usr/bin/rsh");
>     }
> 
>     idx = 0;
>     targs[idx++] = HYDU_strdup(path);
> 
>     /* Allow X forwarding only if explicitly requested */
>     if (!strcmp(HYDT_bsci_info.bootstrap, "ssh")) {
>         if (HYDT_bsci_info.enablex == 1)
>             targs[idx++] = HYDU_strdup("-X");
>         else if (HYDT_bsci_info.enablex == 0)
>             targs[idx++] = HYDU_strdup("-x");
>         else    /* default mode is disable X */
>             targs[idx++] = HYDU_strdup("-x");
>     }
> 
> to:
> 
>     if (!strcmp(HYDT_bsci_info.bootstrap, "ssh")) {
>         if (!path)
>             path = HYDU_find_full_path("ssh");
>         if (!path)
>             path = HYDU_strdup("/usr/bin/ssh");
>     }
>     else if (!strcmp(HYDT_bsci_info.bootstrap, "blaunch")) {
>         /* Prefer $LSF_BINDIR/blaunch; fall back to a PATH search. */
>         const char *BinDirPath = NULL;  /* stays NULL if LSF_BINDIR is unset */
> 
>         MPL_env2str("LSF_BINDIR", &BinDirPath);
>         /* Keep a path that is already set, as the ssh/rsh branches do */
>         if (!path && BinDirPath) {
>             int BinDirLength = strlen(BinDirPath);
> 
>             if (BinDirLength > 0) {
>                 /* room for "blaunch", the terminating NUL, and '/' if needed */
>                 int PathLength = BinDirLength + 1 + strlen("blaunch");
> 
>                 if (BinDirPath[BinDirLength - 1] != '/')
>                     ++PathLength;
>                 HYDU_MALLOC(path, char *, PathLength * sizeof(char), status);
>                 strcpy(path, BinDirPath);
>                 if (BinDirPath[BinDirLength - 1] != '/') {
>                     path[BinDirLength] = '/';
>                     strcpy(path + BinDirLength + 1, "blaunch");
>                 }
>                 else
>                     strcpy(path + BinDirLength, "blaunch");
>             }
>         }
>         if (!path)
>             path = HYDU_find_full_path("blaunch");
>     }
>     else {
>         if (!path)
>             path = HYDU_find_full_path("rsh");
>         if (!path)
>             path = HYDU_strdup("/usr/bin/rsh");
>     }
> 
>     idx = 0;
>     targs[idx++] = HYDU_strdup(path);
> 
>     /* Allow X forwarding only if explicitly requested */
>     if (!strcmp(HYDT_bsci_info.bootstrap, "ssh")) {
>         if (HYDT_bsci_info.enablex == 1)
>             targs[idx++] = HYDU_strdup("-X");
>         else if (HYDT_bsci_info.enablex == 0)
>             targs[idx++] = HYDU_strdup("-x");
>         else    /* default mode is disable X */
>             targs[idx++] = HYDU_strdup("-x");
>     }
>     else if (!strcmp(HYDT_bsci_info.bootstrap, "blaunch")) {
>         /* blaunch needs -n to coexist with Hydra's stdio forwarding */
>         targs[idx++] = HYDU_strdup("-n");
>     }
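
Taken together, the ssh_launch.c changes pick a binary and then at most one bootstrap-specific option before the target host and command. The sketch isolates that option logic so it can be checked without a cluster. It assumes, as in the patch, that enablex == 1 means X forwarding was requested, and it assumes the remaining arguments are the host followed by the command; `build_launch_args` is a hypothetical name, not Hydra's API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of the option selection in the patched
 * ssh_launch.c (not Hydra's API). Fills argv with
 * { path, [option], host, cmd, NULL } and returns the number of
 * non-NULL entries. argv must have room for at least 5 pointers. */
size_t build_launch_args(const char *bootstrap, const char *path,
                         int enablex, const char *host, const char *cmd,
                         const char **argv, size_t max)
{
    size_t idx = 0;

    assert(max >= 5);
    argv[idx++] = path;

    if (strcmp(bootstrap, "ssh") == 0) {
        /* X forwarding only if explicitly requested (enablex == 1);
         * 0 and the unset default both disable it. */
        argv[idx++] = (enablex == 1) ? "-X" : "-x";
    }
    else if (strcmp(bootstrap, "blaunch") == 0) {
        /* Per the patch, -n avoids blaunch's default stdin handling,
         * which otherwise makes the application hang. */
        argv[idx++] = "-n";
    }
    /* rsh: no extra option. */

    argv[idx++] = host;
    argv[idx++] = cmd;
    argv[idx] = NULL;
    return idx;
}
```

For example, with bootstrap "blaunch", host "node01" and command "hydra_pmi_proxy", the vector is { path, "-n", "node01", "hydra_pmi_proxy", NULL }.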
> 
> Eugene.
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji

