[Nek5000-users] MPI_TAG_UB too small

nek5000-users at lists.mcs.anl.gov
Tue Apr 7 10:34:15 CDT 2015


Hi Tony,

I am sending you the files from Philipp, since it seems we are not yet 100% ready to do the commit. These files have already been tested, so I do not expect any problems. If you want to understand what is going on, I recommend running a diff against the main code before you merge.

Oana
________________________________________
From: nek5000-users-bounces at lists.mcs.anl.gov [nek5000-users-bounces at lists.mcs.anl.gov] on behalf of nek5000-users at lists.mcs.anl.gov [nek5000-users at lists.mcs.anl.gov]
Sent: Saturday, April 04, 2015 12:48 PM
To: nek5000-users at lists.mcs.anl.gov
Subject: Re: [Nek5000-users] MPI_TAG_UB too small

Dear Philipp,

That would be great. Thanks again.

Best regards,
Tony

________________________________________
From: nek5000-users-bounces at lists.mcs.anl.gov <nek5000-users-bounces at lists.mcs.anl.gov> on behalf of nek5000-users-request at lists.mcs.anl.gov <nek5000-users-request at lists.mcs.anl.gov>
Sent: 04 April 2015 18:00
To: nek5000-users at lists.mcs.anl.gov
Subject: Nek5000-users Digest, Vol 74, Issue 3

Send Nek5000-users mailing list submissions to
        nek5000-users at lists.mcs.anl.gov

To subscribe or unsubscribe via the World Wide Web, visit
        https://lists.mcs.anl.gov/mailman/listinfo/nek5000-users
or, via email, send a message with subject or body 'help' to
        nek5000-users-request at lists.mcs.anl.gov

You can reach the person managing the list at
        nek5000-users-owner at lists.mcs.anl.gov

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Nek5000-users digest..."


Today's Topics:

   1. Re: MPI_TAG_UB too small (nek5000-users at lists.mcs.anl.gov)
   2. Re: MPI_TAG_UB too small (nek5000-users at lists.mcs.anl.gov)


----------------------------------------------------------------------

Message: 1
Date: Fri, 3 Apr 2015 18:22:24 +0000
From: nek5000-users at lists.mcs.anl.gov
To: "nek5000-users at lists.mcs.anl.gov"
        <nek5000-users at lists.mcs.anl.gov>
Subject: Re: [Nek5000-users] MPI_TAG_UB too small
Message-ID:
        <mailman.1306.1428085354.14701.nek5000-users at lists.mcs.anl.gov>
Content-Type: text/plain; charset="iso-8859-1"

Dear Philipp,

I see, thank you very much for the explanation. It would be great if you could help me make the relevant changes, or send me the files you mentioned, if at all possible.

Best regards,
Tony

________________________________________
From: nek5000-users-bounces at lists.mcs.anl.gov <nek5000-users-bounces at lists.mcs.anl.gov> on behalf of nek5000-users-request at lists.mcs.anl.gov <nek5000-users-request at lists.mcs.anl.gov>
Sent: 03 April 2015 18:00
To: nek5000-users at lists.mcs.anl.gov
Subject: Nek5000-users Digest, Vol 74, Issue 2


Today's Topics:

   1. MPI_TAG_UB too small (nek5000-users at lists.mcs.anl.gov)
   2. Re: MPI_TAG_UB too small (nek5000-users at lists.mcs.anl.gov)


----------------------------------------------------------------------

Message: 1
Date: Fri, 3 Apr 2015 12:04:14 +0000
From: nek5000-users at lists.mcs.anl.gov
To: "nek5000-users at lists.mcs.anl.gov"
        <nek5000-users at lists.mcs.anl.gov>
Subject: [Nek5000-users] MPI_TAG_UB too small
Message-ID:
        <mailman.1264.1428062661.14701.nek5000-users at lists.mcs.anl.gov>
Content-Type: text/plain; charset="iso-8859-1"

Dear Neks,


I'm having a problem running my simulation on thousands of processors. My mesh has 6090000 elements and I ran the simulation on 8520 processes, but it aborted with the error message 'MPI_TAG_UB too small'.


I checked the code and found that the simulation aborts if "nval.lt.(10000+max(lp,lelg))". However, in the subroutine 'mpi_attr_get', ival is set to 9999999, so nval should be larger than 10000+max(lp,lelg). Why, then, did the simulation not run?


I hope someone can help me with this. Thank you very much.


Best regards,

Tony

------------------------------

Message: 2
Date: Fri, 03 Apr 2015 18:04:16 +0200
From: nek5000-users at lists.mcs.anl.gov
To: nek5000-users at lists.mcs.anl.gov
Subject: Re: [Nek5000-users] MPI_TAG_UB too small
Message-ID:
        <mailman.1287.1428077071.14701.nek5000-users at lists.mcs.anl.gov>
Content-Type: text/plain; charset=windows-1252; format=flowed

Dear all,

indeed, we have experienced the same problem with the recent Cray MPI
library: the maximum tag value was reduced to 2^21 (the MPI standard
only guarantees a bound of at least 32767, i.e. 2^15 - 1, but many
libraries allow up to 2^31 - 1, the largest signed 32-bit integer). The
problem in Nek is that during the initial distribution of the mesh and
velocities onto the processors, the global element number is used as
the message tag, which fails once you have more than about 2^21
elements. The check you mention, nval.lt.(10000+max(lp,lelg)), tests
exactly that. Note that mpi_dummf.f, whose mpi_attr_get sets ival to
9999999, is only used in serial runs; in parallel runs mpi_attr_get is
provided by the MPI library itself and returns that implementation's
maximum tag value.

Anyway, we have recently fixed this problem by changing how the tag is
formed: the local element number is used instead of the global one.
Maybe the easiest option would be to merge these changes into the
repository? Otherwise, I can send the changed files, but then one has
to be careful to use matching revisions.

Best regards,
Philipp


------------------------------

_______________________________________________
Nek5000-users mailing list
Nek5000-users at lists.mcs.anl.gov
https://lists.mcs.anl.gov/mailman/listinfo/nek5000-users


End of Nek5000-users Digest, Vol 74, Issue 2
********************************************


------------------------------

Message: 2
Date: Fri, 03 Apr 2015 23:59:21 +0200
From: nek5000-users at lists.mcs.anl.gov
To: nek5000-users at lists.mcs.anl.gov
Subject: Re: [Nek5000-users] MPI_TAG_UB too small
Message-ID:
        <mailman.1340.1428098378.14701.nek5000-users at lists.mcs.anl.gov>
Content-Type: text/plain; charset=windows-1252

Dear Tony,
we aim to commit the necessary changes to the repository by Monday.
I hope that is fine.
Best regards,
Philipp


------------------------------


End of Nek5000-users Digest, Vol 74, Issue 3
********************************************
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tag.tgz
Type: application/x-compressed-tar
Size: 86522 bytes
Desc: tag.tgz
URL: <http://lists.mcs.anl.gov/pipermail/nek5000-users/attachments/20150407/38ca61e9/attachment-0001.bin>

