[Nek5000-users] startup time nek5000

nek5000-users at lists.mcs.anl.gov nek5000-users at lists.mcs.anl.gov
Tue Sep 11 07:56:43 CDT 2012


Hi again,
I forgot to include this earlier; here is the complete output around the
"critical" parts:

gs_setup: 559948 unique labels shared
   pairwise times (avg, min, max): 0.000216368 0.000171804 0.000280309
   crystal router                : 0.000162064 0.000158095 0.000166893
   used all_to_all method: crystal router
 done :: setup h1 coarse grid    728.853798866272       sec

So you can see it takes about 12 minutes (729 s). Again, it is not a big deal
at all; I am just wondering whether this is expected.
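
To put the number in perspective, here is a minimal sketch that extracts the
coarse grid setup time from the log line above and relates it to the restart
interval of roughly 24 hours mentioned in the original message quoted below.
The helper names (coarse_setup_seconds, setup_overhead) are made up for
illustration and are not part of Nek5000; the sketch also assumes that the
setup cost is paid exactly once per restart.

```python
import re

# Line copied verbatim from the startup log above.
LOG_LINE = " done :: setup h1 coarse grid    728.853798866272       sec"


def coarse_setup_seconds(line):
    """Return the coarse grid setup time in seconds, or None if no match."""
    match = re.search(
        r"done :: setup h1 coarse grid\s+([0-9.eE+-]+)\s+sec", line
    )
    return float(match.group(1)) if match else None


def setup_overhead(setup_s, restart_interval_h):
    """Fraction of one restart interval spent in the coarse grid setup.

    Assumption: the setup runs once per restart and the interval is
    measured in wall-clock hours.
    """
    return setup_s / (restart_interval_h * 3600.0)


setup_s = coarse_setup_seconds(LOG_LINE)
print(f"setup time: {setup_s / 60:.1f} min")
print(f"share of a 24 h run: {100 * setup_overhead(setup_s, 24):.2f} %")
```

Under these assumptions the setup accounts for well under one percent of a
24-hour run, which is why the startup cost is tolerable here.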

Best,
Philipp

-----Original Message-----
From: nek5000-users-bounces at lists.mcs.anl.gov
[mailto:nek5000-users-bounces at lists.mcs.anl.gov] On Behalf Of
nek5000-users at lists.mcs.anl.gov
Sent: den 10 september 2012 23:09
To: nek5000-users at lists.mcs.anl.gov
Subject: Re: [Nek5000-users] startup time nek5000


Dear Philipp,

This is generally expected for the direct, XX^T-based, coarse
grid solve.  How many elements are in your problem?

The only alternative is to switch to AMG, but that is less automatic than
XXT at this point.  (It is faster for some problems, but I don't think it's
faster for your class of problems.  By "faster" here I refer to the
execution phase rather than the setup costs.)

Best regards,

Paul



On Mon, 10 Sep 2012, nek5000-users at lists.mcs.anl.gov wrote:

> Dear all,
> We have just successfully compiled and run nek5000 on another cluster, 
> using Intel MPI and the corresponding wrappers mpiifort and mpiicc. 
> The code runs fine, without problem, but it stays for about 10 minutes 
> (using 4096 cores) during the startup with the following output:
> ....
> gs_setup: 559948 unique labels shared
>    pairwise times (avg, min, max): 0.000220039 0.000176096 0.000265098
>    crystal router                : 0.000166412 0.000162292 0.000180507
>    used all_to_all method: crystal router
>
>
> Attaching gdb tells me the following location:
>
> (gdb) where
> #0  0x00002adafd4b51db in MPIDI_CH3I_Progress () from
> /pdc/vol/intelmpi/4.0.3/lib64/libmpi.so.4
> #1  0x00002adafd625fe6 in PMPI_Recv () from
> /pdc/vol/intelmpi/4.0.3/lib64/libmpi.so.4
> #2  0x000000000083041c in orthogonalize ()
> #3  0x000000000082ed23 in jl_crs_setup ()
> #4  0x0000000000831d69 in crs_setup_ ()
> #5  0x0000000000632760 in set_up_h1_crs_ ()
> #6  0x000000000061feba in set_overlap_ ()
> #7  0x000000000040b7c1 in nek_init_ ()
> #8  0x000000000040a824 in MAIN__ ()
> #9  0x000000000040472c in main ()
>
> As I said, the code runs fine, and very fast, so no problem. I just
> wanted to ask whether these 10 minutes in the startup are to be
> expected, or whether we could try to bring that time down a bit. We
> restart every 24 hours or so, so it's not a big problem. I have to say
> that our problem size is very close to the memory available per core.
>
> Thanks,
> Philipp
>
> _______________________________________________
> Nek5000-users mailing list
> Nek5000-users at lists.mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/nek5000-users
>