[petsc-users] UCX ERROR KNEM inline copy failed

Fande Kong fdkong.jd at gmail.com
Tue Oct 2 10:52:58 CDT 2018


The error messages may have nothing to do with PETSc and MOOSE.

It might be from a package used for MPI communication,
https://github.com/openucx/ucx. I have no experience with such things. It
may be helpful to contact your HPC administrator.
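
If the MPI on that cluster is built on UCX, one workaround that is sometimes
worth trying is to take the knem shared-memory transport out of UCX's
transport list so it falls back to another shared-memory path. This is only a
sketch, assuming Open MPI with UCX and using placeholder names for the
executable and input file; the right setup depends on your cluster:

    # Tell UCX to use every transport except knem (the one failing in the log).
    export UCX_TLS='^knem'
    # With Open MPI, forward the variable to all ranks at launch time:
    mpirun -np 100 -x UCX_TLS ./your-moose-app-opt -i your_input.i

Whether that is appropriate, or whether the KNEM kernel module itself needs
fixing, is really a question for the system administrators.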

Thanks,

Fande,

On Tue, Oct 2, 2018 at 9:24 AM Matthew Knepley <knepley at gmail.com> wrote:

> On Tue, Oct 2, 2018 at 11:16 AM Y. Yang <
> yangyiwei.yang at mfm.tu-darmstadt.de> wrote:
>
>> Dear PETSc team
>>
>> Recently I have been using MOOSE (http://www.mooseframework.org/), which is
>> built with PETSc, and unfortunately I encountered some problems with the
>> following PETSc options:
>>
>
> I do not know what problem you are reporting. I don't know what package
> knem_ep.c is part of, but it's not PETSc.
>
>   Thanks,
>
>      Matt
>
>
>> petsc_options_iname = '-pc_type -ksp_gmres_restart -sub_ksp_type
>> -sub_pc_type -pc_asm_overlap -pc_factor_mat_solver_package'
>>
>> petsc_options_value = 'asm 1201 preonly ilu 4 superlu_dist'
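
(For reference, pairing each name with its value, these correspond to the
command-line PETSc options

    -pc_type asm -ksp_gmres_restart 1201 -sub_ksp_type preonly \
    -sub_pc_type ilu -pc_asm_overlap 4 \
    -pc_factor_mat_solver_package superlu_dist

and the solver settings themselves look unrelated to the KNEM failure below.)
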
>>
>>
>> the error message is:
>>
>> Time Step 1, time = 1
>>                  dt = 1
>>
>>      |residual|_2 of individual variables:
>>                          c:   779.034
>>                          w:   0
>>                          T:   6.57948e+07
>>                          gr0: 211.617
>>                          gr1: 206.973
>>                          gr2: 209.382
>>                          gr3: 191.089
>>                          gr4: 185.242
>>                          gr5: 157.361
>>                          gr6: 128.473
>>                          gr7: 87.6029
>>
>>   0 Nonlinear |R| = 6.579482e+07
>> [1538482623.976180] [hpb0085:22501:0]        knem_ep.c:84   UCX ERROR
>> KNEM inline copy failed, err = -1 Invalid argument
>> [1538482605.111342] [hpb0085:22502:0]        knem_ep.c:84   UCX ERROR
>> KNEM inline copy failed, err = -1 Invalid argument
>> [1538482606.761138] [hpb0085:22502:0]        knem_ep.c:84   UCX ERROR
>> KNEM inline copy failed, err = -1 Invalid argument
>> [1538482607.107478] [hpb0085:22502:0]        knem_ep.c:84   UCX ERROR
>> KNEM inline copy failed, err = -1 Invalid argument
>> [1538482605.882817] [hpb0085:22503:0]        knem_ep.c:84   UCX ERROR
>> KNEM inline copy failed, err = -1 Invalid argument
>> [1538482607.133543] [hpb0085:22503:0]        knem_ep.c:84   UCX ERROR
>> KNEM inline copy failed, err = -1 Invalid argument
>> [1538482621.905475] [hpb0085:22510:0]        knem_ep.c:84   UCX ERROR
>> KNEM inline copy failed, err = -1 Invalid argument
>> [1538482626.531234] [hpb0085:22510:0]        knem_ep.c:84   UCX ERROR
>> KNEM inline copy failed, err = -1 Invalid argument
>> [1538482627.613343] [hpb0085:22515:0]        knem_ep.c:84   UCX ERROR
>> KNEM inline copy failed, err = -1 Invalid argument
>> [1538482627.830489] [hpb0085:22515:0]        knem_ep.c:84   UCX ERROR
>> KNEM inline copy failed, err = -1 Invalid argument
>> [1538482629.852351] [hpb0085:22515:0]        knem_ep.c:84   UCX ERROR
>> KNEM inline copy failed, err = -1 Invalid argument
>> [1538482630.194620] [hpb0085:22515:0]        knem_ep.c:84   UCX ERROR
>> KNEM inline copy failed, err = -1 Invalid argument
>> [1538482630.280636] [hpb0085:22515:0]        knem_ep.c:84   UCX ERROR
>> KNEM inline copy failed, err = -1 Invalid argument
>> [1538482600.219314] [hpb0085:22516:0]        knem_ep.c:84   UCX ERROR
>> KNEM inline copy failed, err = -1 Invalid argument
>> [1538482658.960350] [hpb0085:22516:0]        knem_ep.c:84   UCX ERROR
>> KNEM inline copy failed, err = -1 Invalid argument
>> [1538482622.949471] [hpb0085:22517:0]        knem_ep.c:84   UCX ERROR
>> KNEM inline copy failed, err = -1 Invalid argument
>> [1538482612.502017] [hpb0085:22500:0]        knem_ep.c:84   UCX ERROR
>> KNEM inline copy failed, err = -1 Invalid argument
>> [1538482613.231970] [hpb0085:22500:0]        knem_ep.c:84   UCX ERROR
>> KNEM inline copy failed, err = -1 Invalid argument
>> [1538482621.417530] [hpb0085:22520:0]        knem_ep.c:84   UCX ERROR
>> KNEM inline copy failed, err = -1 Invalid argument
>> [1538482622.020998] [hpb0085:22520:0]        knem_ep.c:84   UCX ERROR
>> KNEM inline copy failed, err = -1 Invalid argument
>> [1538482606.221292] [hpb0085:22521:0]        knem_ep.c:84   UCX ERROR
>> KNEM inline copy failed, err = -1 Invalid argument
>> [1538482606.676987] [hpb0085:22521:0]        knem_ep.c:84   UCX ERROR
>> KNEM inline copy failed, err = -1 Invalid argument
>> [1538482606.896865] [hpb0085:22521:0]        knem_ep.c:84   UCX ERROR
>> KNEM inline copy failed, err = -1 Invalid argument
>> [1538482639.611427] [hpb0085:22522:0]        knem_ep.c:84   UCX ERROR
>> KNEM inline copy failed, err = -1 Invalid argument
>> [1538482631.435277] [hpb0085:22523:0]        knem_ep.c:84   UCX ERROR
>> KNEM inline copy failed, err = -1 Invalid argument
>> [1538482658.278343] [hpb0085:22512:0]        knem_ep.c:84   UCX ERROR
>> KNEM inline copy failed, err = -1 Invalid argument
>> [1538482658.396945] [hpb0085:22512:0]        knem_ep.c:84   UCX ERROR
>> KNEM inline copy failed, err = -1 Invalid argument
>> [1538482659.917476] [hpb0085:22512:0]        knem_ep.c:84   UCX ERROR
>> KNEM inline copy failed, err = -1 Invalid argument
>> [1538482660.162064] [hpb0085:22512:0]        knem_ep.c:84   UCX ERROR
>> KNEM inline copy failed, err = -1 Invalid argument
>> 2 total processes killed (some possibly by mpirun during cleanup)
>>
>>
>> Here's the status of the simulation
>>
>> Parallelism:
>>    Num Processors:          100
>>    Num Threads:             1
>>
>> Mesh:
>>    Parallel Type:           distributed
>>    Mesh Dimension:          3
>>    Spatial Dimension:       3
>>    Nodes:
>>      Total:                 2065551
>>      Local:                 22774
>>    Elems:
>>      Total:                 2000000
>>      Local:                 20006
>>    Num Subdomains:          1
>>    Num Partitions:          100
>>    Partitioner:             parmetis
>>
>> Nonlinear System:
>>    Num DOFs:                18589959
>>    Num Local DOFs:          204966
>>    Variables:               { "c" "w" "T" "gr0" "gr1" "gr2" "gr3" "gr4"
>> "gr5" }
>>    Finite Element Types:    "LAGRANGE"
>>    Approximation Orders:    "FIRST"
>>
>> Auxiliary System:
>>    Num DOFs:                10065551
>>    Num Local DOFs:          102798
>>    Variables:               "bnds" { "var_indices" "unique_grains" } {
>> "M" "dM/dT" }
>>    Finite Element Types:    "LAGRANGE" "MONOMIAL" "MONOMIAL"
>>    Approximation Orders:    "FIRST" "CONSTANT" "CONSTANT"
>>
>> Relationship Managers:
>>    Geometric                : GrainTrackerHaloRM (2 layers)
>>
>> Execution Information:
>>    Executioner:             Transient
>>    TimeStepper:             IterationAdaptiveDT
>>    Solver Mode:             Preconditioned JFNK
>>
>>
>> I tried modifying the parameters and other preconditioning options, but the
>> problem stays much the same. So I don't know what I did wrong, or whether
>> there is actually a suitable PETSc option for dealing with such a problem on
>> a large mesh. I would like to hear your response.
>>
>> Sincerely,
>> Yang
>>
>> --
>> ______________________________________________________
>>
>> Yangyiwei Yang
>> Research Assistant (Wissenschaftliche Hilfskraft)
>>
>> TU Darmstadt
>> Department 11 - Materials and Earth Sciences
>> Mechanics of Functional Materials Group
>>
>> L1 | 08 402
>> Otto Berndt Straße 3
>> D-64287 Darmstadt
>>
>> Tel: +49 (0)6151-16-22923
>> Email: yangyiwei.yang at mfm.tu-darmstadt.de
>> Homepage: http://www.mawi.tu-darmstadt.de/mfm
>> ORCID: 0000-0001-5505-7117
>>
>> ______________________________________________________
>>
>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
>