[petsc-dev] BG hang still broken in petsc-maint!

Satish Balay balay at mcs.anl.gov
Wed Dec 18 16:42:07 CST 2013


On Wed, 18 Dec 2013, Jed Brown wrote:

> Satish Balay <balay at mcs.anl.gov> writes:
> 
> > ~/.petscrc does not get loaded for me on vesta [with maint] - as there
> > is no $HOME in the run env.
> 
> We still attempt it for the current directory (./.petscrc and ./petscrc).
> 
> I dug into the PAMID source, but the environment variable processing is
> not something we can mess with later without some support being added
> inside MPICH/PAMID.  If PETSc calls MPI_Init(), we could use setenv()
> before, though that could be surprising.
> 
> We could write our own reference MPI_Bcast over point-to-point.  Our
> messages are always pretty short so this is actually a decent algorithm
> and would be best for the users (most reliable, least confusing).

I forced ~/.petscrc to be read by using 'runjob --env-all' [which passes
$HOME through to the run env] and ran the following runs in a loop
[50*3 = 150 runs] with the basic PETSc example
src/sys/examples/tutorials/ex1.c. There was no hang in the whole
session.

Satish

-------


[balay at cetuslac1 tutorials]$ cat ~/reserve.sh 
#!/bin/bash
echo I am going to wait for an hour
sleep 3600
echo Good bye

[balay at cetuslac1 ~]$ qsub -A BGQtools_esp -n 512 -t 60 --mode script ./reserve.sh 

[balay at cetuslac1 tutorials]$ cat run.sh 
#!/bin/bash
# Run ex1 at three process counts, 50 iterations each, on the reserved block.
for i in $(seq 1 50)
do
    # Quoted so the shell does not glob-expand the asterisks into filenames.
    echo "******** $i ***************"
    runjob --env-all --np 2048 --ranks-per-node 16 --cwd "$PWD" --block CET-00040-33371-512 : "$PWD/ex1" -log_summary
    runjob --env-all --np 4096 --ranks-per-node 16 --cwd "$PWD" --block CET-00040-33371-512 : "$PWD/ex1" -log_summary
    runjob --env-all --np 8192 --ranks-per-node 16 --cwd "$PWD" --block CET-00040-33371-512 : "$PWD/ex1" -log_summary
done
        
[balay at cetuslac1 tutorials]$ ./run.sh 

<snip>
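
A loop like run.sh simply blocks if one of the runs really does hang, so
the hang has to be noticed by hand. Below is a minimal sketch of a wrapper
that bounds each run instead. It assumes GNU coreutils 'timeout' is
available where the script runs; run_with_limit and the limits used are
hypothetical, and how runjob reacts to the SIGTERM that 'timeout' sends
has not been checked here.

```bash
#!/bin/bash
# run_with_limit LIMIT_SECONDS CMD [ARGS...]
# Hypothetical helper: runs CMD under GNU coreutils `timeout` and reports
# a suspected hang when the limit is exceeded (timeout then exits with 124).
run_with_limit() {
    local limit=$1
    shift
    timeout "$limit" "$@"
    local status=$?
    if [ "$status" -eq 124 ]; then
        echo "suspected hang: '$*' exceeded ${limit}s" >&2
    fi
    return "$status"
}

# Example with a stand-in command. In run.sh each runjob line would be
# prefixed with something like "run_with_limit 600" (600 is arbitrary).
run_with_limit 5 true && echo "finished in time"
```

With this in place the loop keeps going after a stuck run, and the
"suspected hang" lines on stderr show which iteration and process count
to look at.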


