[petsc-dev] Configure hangs on Summit

Smith, Barry F. bsmith at mcs.anl.gov
Fri Sep 20 23:50:19 CDT 2019


  Then the hang is curious. 

> On Sep 20, 2019, at 11:28 PM, Mills, Richard Tran <rtmills at anl.gov> wrote:
> 
> Everything that Barry says about '--with-batch' is valid, but let me point out one thing about Summit: You don't need "--with-batch" at all, because the Summit login/compile nodes run the same hardware (minus the GPUs) and software stack as the back-end compute nodes. This makes configuring and building software far, far easier than we are used to on the big LCF machines. I was actually shocked when I found this out -- I'd gotten so used to struggling with cross-compilers, etc.
> 
> --Richard
> 
> On 9/20/19 9:28 PM, Smith, Barry F. wrote:
>>     --with-batch is still there and should be used in such circumstances. The difference is that --with-batch no longer generates a program that you must submit to the batch system before continuing configure. Instead, --with-batch guesses at and skips some of the tests (with clear warnings on how to adjust the guesses).
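>> 
>>      As a concrete illustration, on a batch-only system the configure script (in the style of the ones under config/examples/) might look like the sketch below; aside from --with-batch, the compiler options are placeholders you would adapt to your machine, and the script is run from the top of the PETSc tree:
>> 
>>        #!/usr/bin/env python
>>        if __name__ == '__main__':
>>          import sys, os
>>          sys.path.insert(0, os.path.abspath('config'))
>>          import configure
>>          configure_options = [
>>            '--with-batch',   # skip the run tests; guesses are reported with warnings
>>            '--with-cc=cc',   # placeholder compiler wrappers
>>            '--with-cxx=CC',
>>            '--with-fc=ftn',
>>          ]
>>          configure.petsc_configure(configure_options)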
>> 
>>      Regarding the hanging: this happens because the thread monitoring of configure-started executables was removed years ago, since it was slow and occasionally buggy (and its default wait of 10 minutes was absurdly long). Thus, when configure tried to test an mpiexec that hung, the test itself would hang. There is code in one of my branches, which I've been struggling to get into master for a long time, that restores the thread monitoring for this one call with a small timeout, so you should never see this hang again.
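>> 
>>      The general pattern (not the actual BuildSystem code, just a minimal sketch of the idea) is a watchdog thread that kills the child process if it runs too long:
>> 
>>        import subprocess, threading
>> 
>>        def run_with_timeout(cmd, timeout=10):
>>            # Launch the test executable; if it is still running after
>>            # `timeout` seconds the watchdog kills it instead of letting
>>            # configure wait forever on a hung mpiexec.
>>            proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
>>                                    stderr=subprocess.PIPE)
>>            watchdog = threading.Timer(timeout, proc.kill)
>>            watchdog.start()
>>            try:
>>                out, err = proc.communicate()
>>            finally:
>>                watchdog.cancel()
>>            return proc.returncode, out, err
>> 
>>      so a call like run_with_timeout(['mpiexec', '-n', '1', './conftest']) returns within the timeout even when the launcher hangs.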
>> 
>>   Barry
>> 
>>    We could be a little clever and have configure detect that it is on a Cray or other batch system and automatically add the batch option, as sketched below. That would be a nice little feature for someone to add; probably just a few lines of code.
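>> 
>>    A minimal sketch of such a check (the particular markers here -- /opt/cray and the scheduler environment variables -- are assumptions; a real patch might test different ones):
>> 
>>      import os
>> 
>>      def looks_like_batch_system():
>>          # Heuristic markers for Cray machines and common batch schedulers.
>>          if os.path.isdir('/opt/cray'):          # typical of Cray installs
>>              return True
>>          for var in ('LSF_ENVDIR', 'PBS_CONF_FILE', 'SLURM_CONF'):
>>              if var in os.environ:               # LSF / PBS / Slurm present
>>                  return True
>>          return False
>> 
>>    and configure would append '--with-batch' when it returns True.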
>>    
>> 
>> 
>>> On Sep 20, 2019, at 8:59 PM, Mills, Richard Tran via petsc-dev <petsc-dev at mcs.anl.gov> wrote:
>>> 
>>> Hi Junchao,
>>> 
>>> Glad you've found a workaround, but I don't know why you are hitting this problem. The last time I built PETSc on Summit (just a couple days ago), I didn't have this problem. I'm working from the example template that's in the PETSc repo at config/examples/arch-olcf-summit-opt.py.
>>> 
>>> Can you point me to your configure script on Summit so I can try to reproduce your problem?
>>> 
>>> --Richard
>>> 
>>> On 9/20/19 4:25 PM, Zhang, Junchao via petsc-dev wrote:
>>> 
>>>> Satish's trick --with-mpiexec=/bin/true solved the problem.  Thanks.
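>>>> (For the record: /bin/true exits immediately with status 0, so configure's mpiexec test returns success instead of hanging. The option just goes on the usual configure line, e.g.
>>>> 
>>>>   ./configure --with-mpiexec=/bin/true [your other options]
>>>> 
>>>> with everything else unchanged.)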
>>>> --Junchao Zhang
>>>> 
>>>> 
>>>> On Fri, Sep 20, 2019 at 3:50 PM Junchao Zhang <jczhang at mcs.anl.gov> wrote:
>>>> My configure hangs on Summit at
>>>>   TESTING: configureMPIEXEC from config.packages.MPI(config/BuildSystem/config/packages/MPI.py:170)
>>>> 
>>>> On this machine one has to use a script to submit jobs, so why do we need configureMPIEXEC? Do I need to use --with-batch? I remember we removed that.
>>>> 
>>>> --Junchao Zhang
>>>> 
> 


