<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div class="elementToProof" style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Hi, I'm trying to run a parallel matrix and vector build and linear solve with PETSc on 2 MPI processes plus one V100 GPU. I have verified that the matrix build and solve succeed on CPUs only. I'm using CUDA 11.5, a CUDA-enabled Open MPI, and GCC 9.3. When I run
the job with the GPU enabled, I get the following error:</div>
<div class="elementToProof" style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div class="elementToProof ContentPasted0" style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<span style="font-family:"Courier New",monospace">terminate called after throwing an instance of 'thrust::system::system_error'</span>
<div class="ContentPasted0"><span style="font-family:"Courier New",monospace"> <b>
what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered</b></span></div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0"><span style="font-family:"Courier New",monospace">Program received signal SIGABRT: Process abort signal.</span></div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0"><span style="font-family:"Courier New",monospace">Backtrace for this error:</span></div>
<div class="ContentPasted0"><span style="font-family:"Courier New",monospace">terminate called after throwing an instance of 'thrust::system::system_error'</span></div>
<div class="ContentPasted0"><span style="font-family:"Courier New",monospace"> what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered</span></div>
<div><br class="ContentPasted0">
</div>
<span style="font-family:"Courier New",monospace">Program received signal SIGABRT: Process abort signal.</span></div>
<div class="elementToProof ContentPasted0" style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<span style="font-family:"Courier New",monospace"><br>
</span></div>
<div class="elementToProof ContentPasted0" style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<span style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">I'm new to submitting Slurm jobs that also use GPU resources, so I might be doing something wrong in my submission script. Here it is:</span></div>
<div class="elementToProof ContentPasted0" style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<span style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);"><br>
</span></div>
<div class="elementToProof ContentPasted0" style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<span style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="ContentPasted1">#!/bin/bash
<div class="ContentPasted1">#SBATCH -J test</div>
<div class="ContentPasted1">#SBATCH -e /home/Issues/PETSc/test.err</div>
<div class="ContentPasted1">#SBATCH -o /home/Issues/PETSc/test.log</div>
<div class="ContentPasted1">#SBATCH --partition=batch</div>
<div class="ContentPasted1">#SBATCH --ntasks=2</div>
<div class="ContentPasted1">#SBATCH --nodes=1</div>
<div class="ContentPasted1">#SBATCH --cpus-per-task=1</div>
<div class="ContentPasted1">#SBATCH --ntasks-per-node=2</div>
<div class="ContentPasted1">#SBATCH --time=01:00:00</div>
<div class="ContentPasted1">#SBATCH --gres=gpu:1</div>
<div><br class="ContentPasted1">
</div>
<div class="ContentPasted1">export OMP_NUM_THREADS=1</div>
<div class="ContentPasted1">module load cuda/11.5</div>
<div class="ContentPasted1">module load openmpi/4.1.1</div>
<div><br class="ContentPasted1">
</div>
<div class="ContentPasted1">cd /home/Issues/PETSc</div>
<div class="ContentPasted1"><b>mpirun -n 2 </b>/home/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds
<b>-vec_type mpicuda -mat_type mpiaijcusparse -pc_type gamg</b></div>
<br>
</span></div>
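<div class="elementToProof ContentPasted0" style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
One check that may help narrow this down is whether the loaded Open MPI build actually reports CUDA support. The following is only a minimal sketch: it assumes ompi_info from the openmpi/4.1.1 module is on the PATH, and the helper name cuda_aware_from_ompi_info is hypothetical, not part of Open MPI or PETSc.</div>
<pre>
```shell
# Hypothetical helper: reads `ompi_info --parsable --all` output on stdin
# and prints "yes" if Open MPI reports it was built with CUDA support,
# "no" otherwise.
cuda_aware_from_ompi_info() {
    if grep -q 'mpi_built_with_cuda_support:value:true'; then
        echo yes
    else
        echo no
    fi
}

# Intended use inside the job script, after the module loads
# (assumes ompi_info is available on the compute node):
#   ompi_info --parsable --all | cuda_aware_from_ompi_info
```
</pre>
<div class="elementToProof ContentPasted0" style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
A "no" here would suggest the MPI library cannot handle device pointers directly, which might be related to the illegal memory access, though that is only a guess.</div>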
<div class="elementToProof ContentPasted0" style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<span style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="ContentPasted1">If anyone has suggestions on how to troubleshoot this, please let me know.</span></div>
<div class="elementToProof ContentPasted0" style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<span style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="ContentPasted1">Thanks!</span></div>
<div class="elementToProof ContentPasted0" style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<span style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="ContentPasted1">Marcos<br>
</span></div>
</body>
</html>