[mpich-discuss] Hydra: LSF resource querying in exclusive mode

Yauheni Zelenko zelenko at cadence.com
Wed Nov 10 16:02:14 CST 2010


Hi!

We found that LSF (at least version 7.0.5) does not set LSB_MCPU_HOSTS properly when a job is submitted with bsub -x (exclusive access). In that case the program may use every CPU on the host, which for MPICH2 means it may launch as many processes as there are physical CPUs.
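LSB_MCPU_HOSTS lists each allocated host followed by the number of slots LSF granted on it, e.g. "hostA 2 hostB 4". The following is a minimal sketch of summing those counts, which is the value that ends up too small under bsub -x. The function name count_mcpu_slots is hypothetical and is not part of Hydra or LSF.

```c
#include <stdio.h>

/* Hypothetical helper: sums the slot counts in an LSB_MCPU_HOSTS-style
   string of the form "host1 n1 host2 n2 ...".
   Returns -1 on NULL or malformed input. */
long count_mcpu_slots(const char *mcpu_hosts)
{
    char host[256];
    long slots;
    long total = 0;
    int consumed = 0;
    const char *p = mcpu_hosts;

    if (p == NULL)
        return -1;
    /* Read one "host count" pair at a time; %n records how far we got. */
    while (sscanf(p, "%255s %ld%n", host, &slots, &consumed) == 2) {
        if (slots < 0)
            return -1;
        total += slots;
        p += consumed;
    }
    /* Anything left other than whitespace means the string was malformed. */
    while (*p == ' ' || *p == '\t' || *p == '\n')
        p++;
    return *p == '\0' ? total : -1;
}
```

For example, count_mcpu_slots(getenv("LSB_MCPU_HOSTS")) would give the total number of slots LSF reports for the job.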

My implementation of exclusive-mode detection, based on Platform's suggestions:

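  #define _POSIX_C_SOURCE 200809L /* for popen()/pclose() */

  #include <ctype.h>
  #include <stdbool.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  /* Returns true if the current LSF job was submitted with bsub -x.
     (The function name is only illustrative.) */
  bool IsLSFExclusiveJob(void)
  {
    bool Result = false;
    const char* LSFBinDirectory = getenv("LSF_BINDIR");
    const char* LSFJobID = getenv("LSB_JOBID");

    if (LSFBinDirectory && LSFJobID)
      {
        char Command[BUFSIZ];
        FILE* CommandOutputFile = NULL;

        /* Ask LSF for the long-format description of this job. */
        snprintf(Command, BUFSIZ, "%s/bjobs -l %s", LSFBinDirectory, LSFJobID);
        CommandOutputFile = popen(Command, "r");
        if (CommandOutputFile)
          {
            char Buffer[BUFSIZ];
            int Index = 0;
            int Symbol; /* int, not char, so that EOF is detected reliably */

            /* Copy the output with all whitespace removed, so the phrase
               "Exclusive Execution" is found even if bjobs wraps it across
               lines. Only the first BUFSIZ - 1 non-space characters are kept. */
            while (Index < BUFSIZ - 1
                   && (Symbol = fgetc(CommandOutputFile)) != EOF)
              {
                if (!isspace((unsigned char) Symbol))
                  Buffer[Index++] = (char) Symbol;
              }
            Buffer[Index] = '\0';
            if (strstr(Buffer, "ExclusiveExecution"))
              Result = true;
            pclose(CommandOutputFile);
          }
      }

    return Result;
  }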
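Once exclusive mode is detected, the launcher still needs a process count for the host. The following is a minimal single-host sketch, assuming the job runs only on the local machine and that the exclusive flag comes from the bjobs check above. It uses sysconf(_SC_NPROCESSORS_ONLN), which is widely available (e.g. Linux, macOS) but is not required by POSIX, and which counts online logical CPUs rather than physical ones. Both function names are hypothetical.

```c
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical helper: number of online logical CPUs on this host,
   or -1 if it cannot be determined. Hardware threads count as CPUs. */
long local_online_cpus(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}

/* Hypothetical helper: how many processes may run on the local host.
   lsf_slots:   slot count LSF reported in LSB_MCPU_HOSTS for this host.
   online_cpus: result of local_online_cpus().
   In exclusive mode the whole host belongs to the job, so a known, larger
   CPU count replaces LSF's figure; otherwise LSF's allocation is kept. */
long choose_local_slots(bool exclusive, long lsf_slots, long online_cpus)
{
    if (exclusive && online_cpus > lsf_slots)
        return online_cpus;
    return lsf_slots;
}
```

If sysconf fails (returns -1), the comparison is false and LSF's own slot count is used unchanged.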

However, things may be more complicated when LSF returns several hosts: determining the number of CPUs on each of them requires remote logins.
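As a rough illustration of the remote-login step, the sketch below asks one remote host for its online CPU count. It assumes password-less ssh to every allocated host and a getconf that accepts _NPROCESSORS_ONLN (true on Linux); neither is something LSF or Hydra guarantees, and both function names are hypothetical. Host names are passed to the shell unquoted, so they should come only from LSF itself (e.g. LSB_MCPU_HOSTS).

```c
#define _POSIX_C_SOURCE 200809L /* for popen()/pclose() */
#include <stdio.h>

/* Hypothetical helper: formats the remote CPU query for one host.
   Returns 0 on success, -1 if the command does not fit in out. */
int build_remote_cpu_query(char *out, size_t out_size, const char *host)
{
    int n = snprintf(out, out_size, "ssh %s getconf _NPROCESSORS_ONLN", host);
    return (n < 0 || (size_t) n >= out_size) ? -1 : 0;
}

/* Hypothetical helper: runs the query; returns the CPU count or -1. */
long query_remote_cpus(const char *host)
{
    char command[BUFSIZ];
    FILE *output;
    long cpus = -1;

    if (build_remote_cpu_query(command, sizeof command, host) != 0)
        return -1;
    output = popen(command, "r");
    if (output == NULL)
        return -1;
    /* getconf prints a single integer followed by a newline. */
    if (fscanf(output, "%ld", &cpus) != 1)
        cpus = -1;
    /* A non-zero exit status (e.g. ssh failure) invalidates the result. */
    if (pclose(output) != 0)
        cpus = -1;
    return cpus;
}
```

Calling query_remote_cpus for each host named in LSB_MCPU_HOSTS gives the per-host counts; the cost is one remote login per host, which is the complication mentioned above.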

It would be a good idea if other people with LSF access ran more experiments and tests.

Eugene.

