[petsc-users] UCX ERROR KNEM inline copy failed
Y. Yang
yangyiwei.yang at mfm.tu-darmstadt.de
Tue Oct 2 10:10:15 CDT 2018
Dear PETSc team,
I have recently been using MOOSE (http://www.mooseframework.org/), which is
built on PETSc, and unfortunately I ran into a problem with the
following PETSc options:
petsc_options_iname = '-pc_type -ksp_gmres_restart -sub_ksp_type -sub_pc_type -pc_asm_overlap -pc_factor_mat_solver_package'
petsc_options_value = 'asm 1201 preonly ilu 4 superlu_dist'
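As a reading aid, the two lists pair up position by position, so each option name receives the value in the same slot. The short Python sketch below reproduces that pairing and prints the equivalent command-line form. The helper name `pair_petsc_options` is hypothetical and is not part of MOOSE or PETSc.

```python
def pair_petsc_options(inames, values):
    """Pair whitespace-separated option names with values, slot by slot."""
    names = inames.split()
    vals = values.split()
    if len(names) != len(vals):
        raise ValueError(f"{len(names)} names but {len(vals)} values")
    return list(zip(names, vals))


# Strings copied from the options quoted above.
petsc_options_iname = ("-pc_type -ksp_gmres_restart -sub_ksp_type "
                       "-sub_pc_type -pc_asm_overlap -pc_factor_mat_solver_package")
petsc_options_value = "asm 1201 preonly ilu 4 superlu_dist"

pairs = pair_petsc_options(petsc_options_iname, petsc_options_value)
command_line = " ".join(f"{name} {value}" for name, value in pairs)
print(command_line)
```

Both lists have six entries, so every name gets exactly one value; a mismatch in length would raise an error instead of silently shifting values.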
The error message is:
Time Step 1, time = 1
dt = 1
|residual|_2 of individual variables:
c: 779.034
w: 0
T: 6.57948e+07
gr0: 211.617
gr1: 206.973
gr2: 209.382
gr3: 191.089
gr4: 185.242
gr5: 157.361
gr6: 128.473
gr7: 87.6029
0 Nonlinear |R| = 6.579482e+07
[1538482623.976180] [hpb0085:22501:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
[1538482605.111342] [hpb0085:22502:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
[1538482606.761138] [hpb0085:22502:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
[1538482607.107478] [hpb0085:22502:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
[1538482605.882817] [hpb0085:22503:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
[1538482607.133543] [hpb0085:22503:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
[1538482621.905475] [hpb0085:22510:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
[1538482626.531234] [hpb0085:22510:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
[1538482627.613343] [hpb0085:22515:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
[1538482627.830489] [hpb0085:22515:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
[1538482629.852351] [hpb0085:22515:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
[1538482630.194620] [hpb0085:22515:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
[1538482630.280636] [hpb0085:22515:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
[1538482600.219314] [hpb0085:22516:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
[1538482658.960350] [hpb0085:22516:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
[1538482622.949471] [hpb0085:22517:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
[1538482612.502017] [hpb0085:22500:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
[1538482613.231970] [hpb0085:22500:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
[1538482621.417530] [hpb0085:22520:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
[1538482622.020998] [hpb0085:22520:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
[1538482606.221292] [hpb0085:22521:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
[1538482606.676987] [hpb0085:22521:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
[1538482606.896865] [hpb0085:22521:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
[1538482639.611427] [hpb0085:22522:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
[1538482631.435277] [hpb0085:22523:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
[1538482658.278343] [hpb0085:22512:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
[1538482658.396945] [hpb0085:22512:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
[1538482659.917476] [hpb0085:22512:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
[1538482660.162064] [hpb0085:22512:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
2 total processes killed (some possibly by mpirun during cleanup)
Here is the status of the simulation:
Parallelism:
  Num Processors:          100
  Num Threads:             1
Mesh:
  Parallel Type:           distributed
  Mesh Dimension:          3
  Spatial Dimension:       3
  Nodes:
    Total:                 2065551
    Local:                 22774
  Elems:
    Total:                 2000000
    Local:                 20006
  Num Subdomains:          1
  Num Partitions:          100
  Partitioner:             parmetis
Nonlinear System:
  Num DOFs:                18589959
  Num Local DOFs:          204966
  Variables:               { "c" "w" "T" "gr0" "gr1" "gr2" "gr3" "gr4" "gr5" }
  Finite Element Types:    "LAGRANGE"
  Approximation Orders:    "FIRST"
Auxiliary System:
  Num DOFs:                10065551
  Num Local DOFs:          102798
  Variables:               "bnds" { "var_indices" "unique_grains" } { "M" "dM/dT" }
  Finite Element Types:    "LAGRANGE" "MONOMIAL" "MONOMIAL"
  Approximation Orders:    "FIRST" "CONSTANT" "CONSTANT"
Relationship Managers:
  Geometric : GrainTrackerHaloRM (2 layers)
Execution Information:
  Executioner:             Transient
  TimeStepper:             IterationAdaptiveDT
  Solver Mode:             Preconditioned JFNK
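To put the problem size in numbers, the following Python sketch does simple arithmetic on the figures reported in the status above. It only restates those values: the variable names are chosen here for illustration, and the "local" figure is presumably the one seen by the process that printed the status.

```python
# Figures copied from the simulation status above.
num_processors = 100
total_nodes = 2065551
total_dofs = 18589959       # nonlinear system
local_dofs = 204966         # "Num Local DOFs" as reported

# Unknowns carried by each mesh node in the nonlinear system.
dofs_per_node = total_dofs / total_nodes

# Average share of the nonlinear system per MPI process.
avg_dofs_per_process = total_dofs / num_processors

# How the reported local count compares with that average.
local_vs_avg = local_dofs / avg_dofs_per_process

print(f"DOFs per node: {dofs_per_node:.2f}")
print(f"Average DOFs per process: {avg_dofs_per_process:.0f}")
print(f"Reported local DOFs / average: {local_vs_avg:.2f}")
```

The nine unknowns per node match the nine nonlinear variables listed, and each process holds on the order of 2e5 unknowns.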
I tried modifying these parameters and using other preconditioning options,
but the problem remains much the same. I do not know whether I did something
wrong or whether there is a more suitable PETSc option for a problem with
such a large mesh. I would appreciate your advice.
Sincerely,
Yang
--
______________________________________________________
Yangyiwei Yang
Wissenschaftliche Hilfskraft
TU Darmstadt
Fachbereich 11 - Material- und Geowissenschaften
Fachgebiet Mechanik funktionaler Materialien
L1 | 08 402
Otto Berndt Straße 3
D-64287 Darmstadt
Tel: +49 (0)6151-16-22923
Email: yangyiwei.yang at mfm.tu-darmstadt.de
Homepage: http://www.mawi.tu-darmstadt.de/mfm
ORCID: 0000-0001-5505-7117
______________________________________________________