[petsc-dev] BG hang still not fixed in petsc-maint!

Barry Smith bsmith at mcs.anl.gov
Wed Dec 18 14:59:10 CST 2013


  We’ve had this discussion before and wasted too much time on it. On the BG, just don’t allow the damn loading of options from files for large runs, say greater than 128 nodes: if the user asks for loading from a file, or node 0 finds [.]petscrc files, then generate a useful error message and stop. Yes, this check would be specific to one class of machines.
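
  For concreteness, a minimal sketch of such a guard follows. The function name, the PETSC_HAVE_BGQ macro, and the use of the communicator size as a stand-in for the node count are all illustrative assumptions, not existing PETSc code:

    /* Sketch of the proposed guard: on Blue Gene, refuse to load options
     * from files for large runs.  PETSC_HAVE_BGQ is a hypothetical
     * configure macro, and the communicator size stands in for the node
     * count; neither reflects actual PETSc source. */
    #include <petscsys.h>

    static PetscErrorCode PetscOptionsFileGuard(MPI_Comm comm,PetscBool file_requested)
    {
      PetscMPIInt    size;
      PetscErrorCode ierr;

      PetscFunctionBegin;
      ierr = MPI_Comm_size(comm,&size);CHKERRQ(ierr);
    #if defined(PETSC_HAVE_BGQ)
      if (file_requested && size > 128) {
        SETERRQ(comm,PETSC_ERR_SUP,"Loading options from files is disabled on Blue Gene for runs larger than 128 nodes; pass the options on the command line instead");
      }
    #endif
      PetscFunctionReturn(0);
    }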

  We can’t deliver a product that simply doesn’t work on IBM and then blame IBM for being buggy. The other choice is to have configure detect BG and then stop and refuse to configure at all for the “buggy machine”, but that is a silly emotional reaction.

   We sure as hell shouldn’t ship a product where every new user on BG has to try PETSc, watch it fail, and then debug the problem, which is how it stands now!

    So just fix it like this and let it go.

   Barry


On Dec 18, 2013, at 2:14 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:

> Satish Balay <balay at mcs.anl.gov> writes:
>> I had a chat with Derek this morning. The error case was with 512
>> nodes [same as above] with --ranks-per-node 4 or 8. And this was on
>> ceatus.
> 
> It is spelled "cetus".
> 
>> The hang was confirmed to be in PetscInitialize [via the debugger] and
>> -skip_petscrc went past the hang.
> 
> That is where the particular sequence of collectives (MPI_Bcast) gets
> called.  Getting past that part does not rule out the same problem
> occurring later, perhaps with lower probability.
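
To make that failure mode concrete: during PetscInitialize, rank 0 reads the options file and broadcasts it to the other ranks. Below is an illustrative sketch of that pattern; the function name and buffer handling are assumptions for this example, not PETSc's actual code. The point is that every rank must execute the same sequence of MPI_Bcast calls, so a broken collective in the MPI implementation hangs the whole job here, and -skip_petscrc merely avoids this particular sequence rather than the underlying bug:

    /* Illustrative broadcast pattern (not PETSc source): rank 0 reads an
     * options file and MPI_Bcast's the length, then the contents, to all
     * ranks.  Both broadcasts are collective; if the MPI implementation
     * mishandles either one, every rank blocks here. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    static void BcastOptionsFile(MPI_Comm comm,const char *fname)
    {
      int  rank,len = 0;
      char *buf = NULL;

      MPI_Comm_rank(comm,&rank);
      if (rank == 0) {                     /* only rank 0 touches the filesystem */
        FILE *fp = fopen(fname,"r");
        if (fp) {
          fseek(fp,0,SEEK_END);
          len = (int)ftell(fp);
          rewind(fp);
          buf = (char*)malloc(len+1);
          len = (int)fread(buf,1,len,fp);
          buf[len] = '\0';
          fclose(fp);
        }
      }
      MPI_Bcast(&len,1,MPI_INT,0,comm);    /* collective #1: length (0 if no file) */
      if (len > 0) {
        if (rank) buf = (char*)malloc(len+1);
        MPI_Bcast(buf,len+1,MPI_CHAR,0,comm); /* collective #2: file contents */
        /* ... every rank would now parse buf into the options database ... */
      }
      free(buf);
    }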
