[petsc-dev] Re: Petsc: Error code 1

Satish Balay balay at mcs.anl.gov
Wed Apr 7 09:40:29 CDT 2021


There are always diffs (between different compilers, versions, and hardware) - they don't necessarily mean failures.

Very large diffs [as Barry mentioned] might require a closer look.

Depending on your purpose, you might consider this a valid PETSc install and start using it.

Obviously - based on the problem that you are solving - you might need to revisit this issue [check how your application behaves with a debug build versus an optimized build, and experiment with different optimization flags]
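To illustrate why such diffs are normal: IEEE-754 double-precision addition is not associative, so a different evaluation order (from a different compiler, from optimization flags that permit reassociation such as GCC's -ffast-math, or from a different BLAS) can change the last digits of a result. A minimal Python sketch, not PETSc code:

```python
import math

# IEEE-754 double addition is not associative: the same three numbers
# summed in two mathematically equivalent orders give different doubles.
a, b, c = 0.1, 0.2, 0.3

left_to_right = (a + b) + c   # 0.1 + 0.2 rounds to 0.30000000000000004 first
right_to_left = a + (b + c)   # 0.2 + 0.3 is exactly 0.5 first

print(repr(left_to_right))    # 0.6000000000000001
print(repr(right_to_left))    # 0.6
print(left_to_right == right_to_left)  # False

# A tolerance-based comparison treats the two results as equal.
print(math.isclose(left_to_right, right_to_left, rel_tol=1e-12))  # True
```

This is why numerical output is better judged with a tolerance than character by character; the 1e-12 tolerance is an arbitrary choice for the example.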

Satish

On Wed, 7 Apr 2021, Chen Gang wrote:

> ARM: configure with the following: -ffp-contract=off
> 
> 
> ./configure -with-debugging=0 COPTFLAGS='-O3 -ffp-contract=off -march=armv8.2-a -mtune=tsv110' CXXOPTFLAGS='-O3 -ffp-contract=off -march=armv8.2-a -mtune=tsv110' FOPTFLAGS='-O3 -ffp-contract=off -march=armv8.2-a -mtune=tsv110' --with-x=1 -download-fblaslapack PETSC-KERNEL-USE-UNROLL-4
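For context on the -ffp-contract=off flag used above: GCC may contract an expression such as a*b + c into a single fused multiply-add (FMA) instruction, which ARMv8 provides, and -ffp-contract=off disables that contraction. An FMA rounds once instead of twice, so the two forms can give different answers. The sketch below emulates both in Python. The FMA is emulated exactly with fractions.Fraction, which is an assumption of this example and not how the hardware computes it. The inputs are chosen to make the difference visible:

```python
from fractions import Fraction

def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulated FMA: compute a*b + c exactly, then round once to double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

def separate_multiply_add(a: float, b: float, c: float) -> float:
    """Uncontracted evaluation: round a*b, then round the sum."""
    product = a * b       # first rounding: the 2**-60 term is lost here
    return product + c    # second rounding

a = b = 1.0 + 2.0**-30        # exact a*b = 1 + 2**-29 + 2**-60
c = -(1.0 + 2.0**-29)

print(separate_multiply_add(a, b, c))  # 0.0
print(fused_multiply_add(a, b, c))     # 2**-60, about 8.67e-19
```

Neither answer is "wrong"; they are two valid roundings of the same expression, which is why test output can shift when contraction is enabled or disabled.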
> 
> 
> There are 3 failures, all in Fortran (FC) tests
> 
> 
> 
> 
> # -------------
> #   Summary
> # -------------
> # FAILED diff-tao_bound_tutorials-plate2f_2 diff-tao_bound_tutorials-plate2f_1 diff-vec_is_is_tutorials-ex2f_1
> # success 7547/9528 tests (79.2%)
> # failed 3/9528 tests (0.0%)
> # todo 225/9528 tests (2.4%)
> # skip 1753/9528 tests (18.4%)
> #
> # Wall clock time for tests: 1371 sec
> # Approximate CPU time (not incl. build time): 106722.62999999999 sec
> #
> # To rerun failed tests:
> #     /opt/rh/devtoolset-9/root/usr/bin/gmake -f gmakefile test test-fail=1
> #
> # Timing summary (actual test time / total CPU time):
> #   dm_tests-ex36_3dp1: 467.68 sec / 477.71 sec
> #   dm_impls_stag_tests-ex1_multidof_3: 395.16 sec / 398.45 sec
> #   ts_tutorials-ex29_1: 236.54 sec / 238.30 sec
> #   dm_impls_stag_tests-ex1_basic_2: 205.41 sec / 207.34 sec
> #   dm_tests-ex34_1: 178.94 sec / 180.26 sec
> 
> 
> 
> 
> 
> 
> 
> ------------------ Original ------------------
> From: "Chen Gang" <569615491 at qq.com>;
> Sent: Wednesday, April 7, 2021 10:49
> To: "petsc-dev"<petsc-dev at mcs.anl.gov>;
> 
> Subject: Re: [petsc-dev] Petsc: Error code 1
> 
> 
> ------------------ Original ------------------
> From: "petsc-dev" <balay at mcs.anl.gov>;
> Sent: Wednesday, April 7, 2021 1:06
> To: "Barry Smith"<bsmith at petsc.dev>;
> Cc: "Chen Gang"<569615491 at qq.com>;"Alp Dener"<alp.dener at gmail.com>;"petsc-dev"<petsc-dev at mcs.anl.gov>;"cglwdm"<cglwdm at scu.edu.cn>;
> Subject: Re: [petsc-dev] Petsc: Error code 1
> 
> 
> 
> > See the attachments. alltest.log is from a machine with 96 cores, ARM, with FC, gcc9.3.5, mpich3.4.1, fblaslapack; 6 failures
> 
> Perhaps this is an issue with ARM, and such diffs are expected - we already have alternate output (alt) files for some of these tests
> 
> $ ls -lt src/tao/bound/tutorials/output/plate2f_*
> -rw-r--r--. 1 balay balay 1029 Mar 23 19:48 src/tao/bound/tutorials/output/plate2f_1_alt.out
> -rw-r--r--. 1 balay balay 1071 Mar 23 19:48 src/tao/bound/tutorials/output/plate2f_1.out
> -rw-r--r--. 1 balay balay 1029 Mar 23 19:48 src/tao/bound/tutorials/output/plate2f_2_alt.out
> -rw-r--r--. 1 balay balay 1071 Mar 23 19:48 src/tao/bound/tutorials/output/plate2f_2.out
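Satish's point about alt files can be modelled roughly like this: a test passes if its output matches the primary .out file or any of its alternates. This is a hypothetical, simplified sketch; outputs_match, passes, the number regex and the tolerances are assumptions for illustration and do not describe PETSc's actual diff tooling. Note that missing lines, as in the ex2f_1 diff, fail regardless of any numeric tolerance:

```python
import math
import re

# Hypothetical number pattern: integers, decimals, and exponent forms.
NUMBER = re.compile(r"[-+]?(?:\d+\.\d*|\.\d+|\d+)(?:[eE][-+]?\d+)?")

def outputs_match(actual, expected, rel_tol=1e-6, abs_tol=1e-12):
    """Same text apart from numbers; numbers equal within the tolerances."""
    actual_lines = actual.splitlines()
    expected_lines = expected.splitlines()
    if len(actual_lines) != len(expected_lines):
        return False  # missing or extra output lines always fail
    for got, want in zip(actual_lines, expected_lines):
        # Compare the non-numeric text exactly, numbers replaced by "#".
        if NUMBER.sub("#", got).split() != NUMBER.sub("#", want).split():
            return False
        got_nums = [float(x) for x in NUMBER.findall(got)]
        want_nums = [float(x) for x in NUMBER.findall(want)]
        for g, w in zip(got_nums, want_nums):
            if not math.isclose(g, w, rel_tol=rel_tol, abs_tol=abs_tol):
                return False
    return True

def passes(actual, candidates):
    """Pass if the output matches the primary file or any alternate."""
    return any(outputs_match(actual, ref) for ref in candidates)

expected = "f(x) = 1.00000000\nIterations 12"   # e.g. contents of a .out file
alt = "f(x) = 1.00000000\nIterations 13"        # e.g. contents of an _alt.out file
actual = "f(x) = 1.0000000004\nIterations 13"

print(passes(actual, [expected]))       # False
print(passes(actual, [expected, alt]))  # True
```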
> 
> >>>>>>
> not ok diff-vec_is_is_tutorials-ex2f_1 # Error code: 1
> #       16,24d15
> #       <   5
> #       <   7
> #       <   9
> #       <  11
> #       <  13
> #       <  15
> #       <  17
> #       <  19
> #       <  21
> <<<<<
> 
> This one is puzzling - missing Fortran stdout? Perhaps a compile issue on ARM? [It's a sequential example, so we can't blame MPI]
> 
> Or are they all related to the optimization flags used? What configure options were used for the build?
> 
> Satish
> 
> On Tue, 6 Apr 2021, Barry Smith wrote:
> 
> > 
> >     Alp,
> > 
> >    Except for the first test, these are all optimization problems (mostly in Fortran). The function values are very different, so I am sending this to our optimization expert to take a look. The differences could be related to the use of real(), and perhaps to floating-point literals that the compiler first treats as single precision and then converts to double, thus losing precision.
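Barry's second hypothesis can be made concrete. In Fortran, a real literal without a kind suffix, such as 0.1, is of default kind (single precision with gfortran's defaults), so assigning it to a double-precision variable keeps only single-precision accuracy, whereas 0.1d0 is a true double. The Python sketch below mimics that by rounding through a 32-bit float with struct; the helper name as_single is hypothetical:

```python
import struct

def as_single(x: float) -> float:
    """Round a double to the nearest IEEE-754 single, then widen back to double."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

exact = 0.1                  # analogous to the Fortran literal 0.1d0
via_single = as_single(0.1)  # analogous to the default-kind literal 0.1 stored in a double

print(exact)                             # 0.1
print(via_single)                        # 0.10000000149011612
print(abs(via_single - exact) / exact)   # relative error of roughly 1.5e-8
```

An error near 1e-8 in every such constant is far larger than double-precision round-off, so it can plausibly change iteration counts and printed function values in optimization examples.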
> > 
> >    Chen Gang, I assume you compiled with the default standard precision PETSc configure options?
> > 
> > 
> > 
> > On Apr 6, 2021, at 3:56 AM, Chen Gang <569615491 at qq.com> wrote:
> > 
> > 
> > See the attachments. alltest.log is from a machine with 96 cores, ARM, with FC, gcc9.3.5, mpich3.4.1, fblaslapack; 6 failures
> >                                  alltest2.log is from an Intel machine with 40 cores, x86, without FC; icc & MKL & Intel MPI; only 1 failure
> > 
> > ------------------ Original ------------------
> > From: "petsc-dev" <balay at mcs.anl.gov>;
> > Sent: Tuesday, April 6, 2021 12:38
> > To: "petsc-dev"<petsc-dev at mcs.anl.gov>;
> > Cc: "Chen Gang"<569615491 at qq.com>;"cglwdm"<cglwdm at scu.edu.cn>;
> > Subject: Re: [petsc-dev] Petsc: Error code 1
> > 
> > Note: do not use '-j' with alltests.
> > 
> > And run alltests on both machines [but *not* at the same time on the machines] and send us logs from both runs.
> > 
> > Satish
> > 
> > 
> > On Mon, 5 Apr 2021, Satish Balay wrote:
> > 
> > > Try:
> > >
> > > make alltests TIMEOUT=600
> > >
> > > And send us the complete log (alltests.log)
> > >
> > > Satish
> > >
> > > On Tue, 6 Apr 2021, Chen Gang wrote:
> > >
> > > > Dear sir,
> > > >
> > > >
> > > > The result of make check is OK, and I set the timeout to a larger value, which avoids the timeout errors. The thing is, I have two machines, and I get "Error code: 1" in different tests on different machines. I don't know what error code 1 means. What causes this? How can I fix the failing tests?
> > > >
> > > >
> > > > ------------------ Original ------------------
> > > > From: Satish Balay <balay at mcs.anl.gov>
> > > > Date: Tue,Apr 6,2021 0:18 PM
> > > > To: Chen Gang <569615491 at qq.com>
> > > > Cc: petsc-dev <petsc-dev at mcs.anl.gov>, cglwdm <cglwdm at scu.edu.cn>
> > > > Subject: Re: [petsc-dev] Petsc: Error code 1
> > >
> > 
> > <alltests2.log><alltests.log>
> > 
> >


