[Nek5000-users] MPI_TAG_UB too small

nek5000-users at lists.mcs.anl.gov
Fri Apr 3 11:04:16 CDT 2015


Dear all,

indeed, we have experienced the same problem with the recent Cray MPI 
library; the maximum tag value was reduced to 2^21 (the MPI standard 
only guarantees 32767, but many libraries allow tags up to about 2^31). 
The problem in Nek is that during the initial distribution of the mesh 
and velocities onto the processors, the global element number is used 
as a message tag, which obviously fails if you have more than 2^21 
elements. The check you mention, nval.lt.(10000+max(lp,lelg)), is 
testing exactly that. (mpi_dummy.f is only used in serial runs; in 
parallel runs, mpi_attr_get is provided by the MPI library itself and 
returns the maximum tag value supported by that implementation.)
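
For reference, here is a minimal standalone sketch of how the limit can 
be queried and checked with the same mpi_attr_get call that Nek uses 
(this is not the Nek5000 source; the element count nelg_max is just an 
illustrative number):

      program check_tag_ub
      include 'mpif.h'
      integer ierr, tag_ub, nelg_max
      logical flag
c     hypothetical global element count, for illustration only
      parameter (nelg_max = 6090000)

      call mpi_init(ierr)
c     MPI_TAG_UB is a predefined attribute of MPI_COMM_WORLD holding
c     the largest message tag the library accepts
      call mpi_attr_get(MPI_COMM_WORLD,MPI_TAG_UB,tag_ub,flag,ierr)
      if (flag .and. tag_ub.lt.nelg_max) then
         write(6,*) 'MPI_TAG_UB too small:',tag_ub,' < ',nelg_max
      endif
      call mpi_finalize(ierr)
      end

On the Cray library mentioned above, the returned value is around 2^21, 
which is why a mesh with more than about two million elements trips the 
check.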

Anyway, we have recently fixed this problem by changing the way the tag 
is chosen (local element number instead of the global one). Maybe the 
easiest would be to merge these changes into the repo? Otherwise, I can 
also send the changed files, but then of course one has to be careful 
to use matching revisions.
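
To illustrate the idea only (a hypothetical sketch, not our actual 
patch; names such as nel_loc and e_loc are made up), tagging by the 
receiver's local element index keeps the tag bounded by the per-rank 
element count rather than by the global element number:

      program tag_by_local_index
      include 'mpif.h'
      integer ierr, nid, np, dest, e_loc, eg, ibuf
      integer status(MPI_STATUS_SIZE)
c     hypothetical number of elements per rank
      integer nel_loc
      parameter (nel_loc = 4)

      call mpi_init(ierr)
      call mpi_comm_rank(MPI_COMM_WORLD,nid,ierr)
      call mpi_comm_size(MPI_COMM_WORLD,np,ierr)

      if (nid.eq.0) then
c        distribute the elements: the tag is the local index on the
c        destination rank (at most nel_loc), not the global element
c        number eg, which grows with the mesh size and can exceed
c        MPI_TAG_UB
         do dest = 1,np-1
            do e_loc = 1,nel_loc
               eg   = dest*nel_loc + e_loc
               ibuf = eg
               call mpi_send(ibuf,1,MPI_INTEGER,dest,e_loc,
     $                       MPI_COMM_WORLD,ierr)
            enddo
         enddo
      else
         do e_loc = 1,nel_loc
            call mpi_recv(ibuf,1,MPI_INTEGER,0,e_loc,
     $                    MPI_COMM_WORLD,status,ierr)
         enddo
      endif
      call mpi_finalize(ierr)
      end

Because the local index never exceeds the per-rank limit, the tag stays 
far below MPI_TAG_UB no matter how large the global mesh becomes.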

Best regards,
Philipp

On 2015-04-03 14:04, nek5000-users at lists.mcs.anl.gov wrote:
> Dear Neks,
>
>
> I'm having a problem running my simulation with thousands of processors.
> In my case, I have 6090000 elements and I ran the simulation with 8520
> procs. However, the simulation aborted with the error message
> 'MPI_TAG_UB too small'.
>
>
> I checked the code and found that the simulation aborts if
> "nval.lt.(10000+max(lp,lelg))". But in the subroutine 'mpi_attr_get',
> ival is set to 9999999, so in that case nval should be larger than
> 10000+max(lp,lelg). So why did the simulation not run?
>
>
> Hope anyone can help me on this. Thank you very much.
>
>
> Best regards,
>
> Tony
>
>
> _______________________________________________
> Nek5000-users mailing list
> Nek5000-users at lists.mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/nek5000-users
>

