[petsc-users] Petsc cannot be initialized on vesta in some --mode options

Roc Wang pengxwang at hotmail.com
Tue Jan 21 13:14:04 CST 2014


Hi,

   I am trying to run a PETSc program with 1024 MPI ranks on vesta.alcf.anl.gov. The original program, which was debugged and runs successfully on other clusters and on vesta with a small number of ranks, makes many PETSc calls to use the KSP solver, but I have commented those out to test PETSc initialization alone. The program now contains only PetscInitialize(), PetscFinalize(), and some output functions. The command to run the job is:

qsub -n <number of nodes> -t 10 --mode <ranks per node> --env "F00=a:BAR=b" ./x.r 

The total number of ranks is always 1024, obtained from different combinations of <number of nodes> and <ranks per node>, such as -n 64 --mode c16 or -n 16 --mode c64.
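
For reference, the node/mode combinations that multiply out to 1024 ranks can be listed with a small sketch like the one below. It is only an illustration: the helper name rank_layouts is hypothetical, and it assumes the ranks-per-node modes are c1 through c64 in powers of two. The -t 10 and ./x.r parts are copied from the command above, and the --env option is left out for brevity.

```python
# Sketch: list the (nodes, mode) pairs whose product equals the requested
# total number of MPI ranks. Assumes modes c1, c2, ..., c64 (powers of two).

def rank_layouts(total_ranks, ranks_per_node=(1, 2, 4, 8, 16, 32, 64)):
    """Return a list of (number_of_nodes, mode_string) pairs."""
    layouts = []
    for rpn in ranks_per_node:
        # Keep only the modes that divide the total evenly.
        if total_ranks % rpn == 0:
            layouts.append((total_ranks // rpn, f"c{rpn}"))
    return layouts


if __name__ == "__main__":
    for nodes, mode in rank_layouts(1024):
        # Print the corresponding submission line for each layout.
        print(f"qsub -n {nodes} -t 10 --mode {mode} ./x.r")
```

The three layouts discussed in this message (-n 64 --mode c16, -n 16 --mode c64, and -n 1024 --mode c1) all appear in that list.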

With -n 64 --mode c16, PetscInitialize() does not appear to start: nothing is printed to stdout. The .cobaltlog file shows that the job started, but the .output file did not record any output. The .error file contains:

2014-01-21 16:31:50.414 (INFO ) [0x40000a3bc20] 32092:ibm.runjob.AbstractOptions: using properties file /bgsys/local/etc/bg.properties
2014-01-21 16:31:50.416 (INFO ) [0x40000a3bc20] 32092:ibm.runjob.AbstractOptions: max open file descriptors: 65536
2014-01-21 16:31:50.416 (INFO ) [0x40000a3bc20] 32092:ibm.runjob.AbstractOptions: core file limit: 18446744073709551615
2014-01-21 16:31:50.416 (INFO ) [0x40000a3bc20] 32092:tatu.runjob.client: scheduler job id is 154599
2014-01-21 16:31:50.419 (INFO ) [0x400004034e0] 32092:tatu.runjob.monitor: monitor started
2014-01-21 16:31:50.421 (INFO ) [0x40000a3bc20] VST-00420-11731-64:32092:ibm.runjob.client.options.Parser: set local socket to runjob_mux from properties file
2014-01-21 16:31:53.111 (INFO ) [0x40000a3bc20] VST-00420-11731-64:729041:ibm.runjob.client.Job: job 729041 started
2014-01-21 16:32:03.603 (WARN ) [0x400004034e0] 32092:tatu.runjob.monitor: tracklib terminated with exit code 1
2014-01-21 16:41:09.554 (WARN ) [0x40000a3bc20] VST-00420-11731-64:ibm.runjob.LogSignalInfo: received signal 15
2014-01-21 16:41:09.555 (WARN ) [0x40000a3bc20] VST-00420-11731-64:ibm.runjob.LogSignalInfo: signal sent from USER
2014-01-21 16:41:09.555 (WARN ) [0x40000a3bc20] VST-00420-11731-64:ibm.runjob.LogSignalInfo: sent from pid 5894
2014-01-21 16:41:09.555 (WARN ) [0x40000a3bc20] VST-00420-11731-64:ibm.runjob.LogSignalInfo: could not read /proc/5894/exe
2014-01-21 16:41:09.555 (WARN ) [0x40000a3bc20] VST-00420-11731-64:ibm.runjob.LogSignalInfo: Permission denied
2014-01-21 16:41:09.555 (WARN ) [0x40000a3bc20] VST-00420-11731-64:ibm.runjob.LogSignalInfo: sent from uid 0 (root)
2014-01-21 16:41:11.248 (WARN ) [0x40000a3bc20] VST-00420-11731-64:729041:ibm.runjob.client.Job: terminated by signal 9
2014-01-21 16:41:11.248 (WARN ) [0x40000a3bc20] VST-00420-11731-64:729041:ibm.runjob.client.Job: abnormal termination by signal 9 from rank 720
2014-01-21 16:41:11.248 (INFO ) [0x40000a3bc20] tatu.runjob.client: task terminated by signal 9
2014-01-21 16:41:11.248 (INFO ) [0x400004034e0] 32092:tatu.runjob.monitor: monitor terminating
2014-01-21 16:41:11.250 (INFO ) [0x40000a3bc20] tatu.runjob.client: monitor completed


PETSc does start with -n 16 --mode c64 and with -n 1024 --mode c1. I also replaced PetscInitialize() with MPI_Init(), and with that change the program starts correctly for all combinations of the options.

What could cause this strange behavior? Thanks.


   
 		 	   		  