NetBSD port

Satish Balay balay at
Wed Dec 16 18:16:56 CST 2009

On Thu, 17 Dec 2009, Kevin.Buckley at wrote:

> > Ok - the code runs locally fine - but not on  'SunGridEngine'
> >
> Not Ok.
> That summary misses the whole point of the errors I am seeing.
> The code runs fine locally AND under Sun Grid Engine, if you only
> spawn TWO processes but not FOUR or EIGHT.

Well, the 'np 2' runs could be scheduled entirely on your local node [or
a single remote SMP node]. So it could be that a different code path
within the MPI library gets used in the 2-proc vs 4-proc case [shared
memory vs tcp/some-other communication].

Perhaps you can get the nodefile list for each of these [2,4,8 proc]
runs and see how the 2-proc run differs. [petsc only]
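One way to capture that node list is from inside the job script itself. This is only a sketch: it assumes SGE exports $PE_HOSTFILE inside a parallel-environment job (the usual behaviour), and it falls back to a made-up sample file so the snippet is self-contained for illustration - the exact hostfile columns can vary with your SGE version.

```shell
# Sketch: dump the node list SGE hands to this run, and count distinct
# hosts. A 2-proc run landing on one host will use the shared-memory
# path in Open MPI; runs spanning hosts go over TCP instead.
HOSTFILE="${PE_HOSTFILE:-/tmp/sample_pe_hostfile}"
if [ ! -f "$HOSTFILE" ]; then
    # Fallback sample so the snippet runs outside SGE [illustration only]
    printf 'node01 2\nnode02 1\nnode03 1\n' > "$HOSTFILE"
fi
echo "--- nodes for this run ---"
cat "$HOSTFILE"
NHOSTS=$(awk '{print $1}' "$HOSTFILE" | sort -u | wc -l)
echo "distinct hosts: $NHOSTS"
```

Comparing this output between the 2-, 4-, and 8-proc jobs should show whether the failing runs are exactly the multi-host ones.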

And I suspect there is something wrong in your OpenMPI+SunGridEngine
config that's triggering this problem. I don't know exactly how,
though.. [the basic petsc examples are supposed to work in any valid
MPI environment].

> > Wrt SGE - what does it require from MPI. Is it MPI agnostic - or does
> > it need a particular MPI to be used?
> It is more the other way around.
> OpenMPI has been compiled so as to be aware of SGE.


> But anyroad, what are the error messages, from PETSc, telling you
> is possibly going wrong here?

[2]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,probably memory access out of range

[2]PETSC ERROR: [2] VecScatterCreateCommon_PtoS line 1699 src/vec/vec/utils/vpscat.c
[2]PETSC ERROR: [2] VecScatterCreate_PtoS line 1508 src/vec/vec/utils/vpscat.c

[2]PETSC ERROR: User provided function() line 0 in unknown directory unknown file

Well it says there was a SEGV - and it gives some approximate
location. It could be inside the MPI code in those routines listed
here. A run in a debugger will confirm the exact location. [assuming
this can be done on this SGE]
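If a debugger run is possible on the compute nodes, PETSc's standard runtime options can attach one for you - something along these lines [assumes gdb is installed on the nodes; the -display variant additionally needs X access, which SGE setups often disallow]:

```shell
# Attach gdb to every rank at startup [needs X access to the nodes]
mpirun -np 4 ./ex19 -start_in_debugger gdb -display :0

# Or attach only when a rank actually errors - usually easier under SGE
mpirun -np 4 ./ex19 -on_error_attach_debugger
```

Either should give the exact line of the SEGV instead of the approximate locations in the traceback above.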

[0]PETSC ERROR: Out of memory. This could be due to allocating
[0]PETSC ERROR: too large an object or bleeding by not properly
[0]PETSC ERROR: destroying unneeded objects.
[0]PETSC ERROR: Memory allocated 90628 Memory used by process 0
[0]PETSC ERROR: Try running with -malloc_dump or -malloc_log for info.
[0]PETSC ERROR: Memory requested 320!

Malloc failing at this low a memory allocation [a 320-byte request, with
only ~90KB already allocated]? Something else is going wrong here.
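As the error message itself suggests, rerunning with PETSc's malloc tracing may show what has [or hasn't] been allocated before the failure - e.g.:

```shell
# Rerun with PETSc malloc tracing [options named in the error above]
mpirun -np 4 ./ex19 -malloc_dump -malloc_log
```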

> > BTW: what do you have for 'ldd ex19'?
> $ldd ex19
> ex19:
>         -lc.12 => /usr/lib/
>         -lXau.6 => /usr/pkg/lib/
>         -lXdmcp.6 => /usr/pkg/lib/
>         -lX11.6 => /usr/pkg/lib/
>         -lltdl.3 => /usr/pkg/lib/
>         -lutil.7 => /usr/lib/
>         -lm.0 => /usr/lib/
>         -lpthread.0 => /usr/lib/
>         -lopen-pal.0 => /usr/pkg/lib/
>         -lopen-rte.0 => /usr/pkg/lib/
>         -lmpi.0 => /usr/pkg/lib/
>         -lmpi_f77.0 => /usr/pkg/lib/
>         -lstdc++.6 => /usr/lib/
>         -lgcc_s.1 => /usr/lib/
>         -lmpi_cxx.0 => /usr/pkg/lib/

ok - mpi is shared. Can you confirm that the exact same version of
openmpi is installed on all the nodes - and that there are no minor
version differences that could trigger this?
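A quick way to check, assuming passwordless ssh to the compute nodes [adjust 'hostfile' to your actual node list - 'ompi_info' ships with Open MPI and prints the build's exact version]:

```shell
# Print the Open MPI version installed on each distinct node
for h in $(awk '{print $1}' hostfile | sort -u); do
    printf '%s: ' "$h"
    ssh "$h" "ompi_info | grep 'Open MPI:'"
done
```

Any node reporting a different version [or a different install prefix in ldd output] would be a prime suspect for the multi-host failures.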

