input/output errors from "make check"
Wei-keng Liao
wkliao at eecs.northwestern.edu
Fri May 29 12:59:35 CDT 2015
Rob is right. In my previous email, I was not trying to say PnetCDF performs poorly.
Rather, I tried to explain why the testing took a long time. Those tests check all sorts of scenarios by reading/writing small amounts of data, with file opens/closes, and sometimes with file syncs.
As they are designed to check for errors, not to measure performance, they can incur large overheads.
Thus, one should not take the timing of the tests seriously.
If you are interested in PnetCDF I/O performance, there are several benchmark programs under the benchmarks directory. They report execution time as well as read/write bandwidth.
More can be found at this URL:
http://cucis.ece.northwestern.edu/projects/PnetCDF/#Benchmarks
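For example, a benchmark run under MPI might look like the commands below. This is only a sketch: the program name, process count, and output path are placeholders, and I am assuming the benchmark programs build with a plain make in that directory; please see the benchmarks directory and its README for the actual executables and their command-line options.
cd benchmarks
make
mpiexec -n 8 ./<benchmark_program> <output_file_on_a_parallel_file_system>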
Wei-keng
On May 29, 2015, at 9:08 AM, Rob Latham wrote:
>
>
> On 05/28/2015 10:20 AM, Wei-keng Liao wrote:
>> Hi, Carl
>>
>> The error message "=>> PBS: job killed: walltime 3636 exceeded limit 3600"
>> means the walltime allocated to your PBS job is 3600 seconds, and the
>> job ran longer than that limit and was killed by the system.
>
> Still, an hour to run the test seems really high. On this system, do you have a fast storage system and a slower home file system?
>
> On my 3-year-old laptop (so plenty of caching), 'time make check' gives:
>
> make check 141.63s user 10.17s system 94% cpu 2:40.65 total
>
> If you are hitting paths in ROMIO that require Lustre or NFS locks, that's going to slow things down a lot... but 200x slower? I wonder what's going on here.
>
> Wei-keng is thinking specifically of the situation where one would run the pnetcdf tests out of the Blue Gene home directory, which has very low performance in addition to having all system calls relayed through i/o nodes.
>
> Building on the faster parallel file system (still GPFS, but with more servers) takes a long time -- close to 3 hours! Building on the home file system takes 13 minutes.
>
> GPFS, at least the one on our Blue Gene, takes a very long time to link. Perhaps all you will need to do, if you are using GPFS, is build the tests first before running them.
>
> That's all I can think of at the moment.
>
> ==rob
>
>
>>
>> Each of nc_test, nf_test, and nf90_test performs thousands of small writes to test PnetCDF.
>> On some systems, especially those with storage systems separated from the compute systems, e.g. IBM BG machines, these tests will take a long time. Please allocate more time for your job, say 2 hours.
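>>
>> For example, a PBS batch script along these lines would request a 2-hour walltime. This is only a sketch: the resource request and working directory are placeholders that you will need to adapt to your system.
>> #!/bin/bash
>> #PBS -l walltime=02:00:00
>> #PBS -l nodes=1:ppn=4
>> cd $PBS_O_WORKDIR      # the directory from which the job was submitted
>> make check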
>>
>> Also, you can compile all test programs before running the tests, by doing the following:
>> cd test
>> make
>> cd ../examples
>> make
>> cd ..
>>
>> This way your job will run with all test executables already built.
>>
>>
>> Wei-keng
>>
>> On May 28, 2015, at 2:00 AM, Carl Ponder wrote:
>>
>>> On 05/27/2015 06:41 PM, Wei-keng Liao wrote:
>>>> However, you should see the same error messages when testing C programs, i.e. nc_test. Is it not the case for you?
>>> Ok -- I did get the same errors in the C part of the testing.
>>>> As for the issue you encountered when using PGI compilers, can you send me the file config.log and show me the standard output on screen where the program hangs?
>>>> A successful run of "make check" should look like the output below. Are you saying it does not show this first line when entering the nf90_test directory?
>>>> *** TESTING F90 ./nf90_test for CDF-1 ------ pass
>>> Here's the point where it hung:
>>> /shared/apps/centos-6.6_SB/OpenMPI/1.8.5/PGI-15.5_CUDA-7.0_HWLoc-1.10.1_NUMACtl-2.0.9/bin/mpif90 -fPIC -m64 -tp=px -o nf90_test fortlib.o nf90_error.o nf90_test.o test_read.o test_write.o util.o test_get.o test_put.o test_iget.o test_iput.o /shared/apps/centos-6.6_SB/PNetCDF/1.6.0/OpenMPI-1.8.5_PGI-15.5_CUDA-7.0/distro/src/lib/libpnetcdf.a
>>> rm -f ./scratch.nc ./test.nc
>>> ./nf90_test -c -d .
>>> ./nf90_test -d .
>>> =>> PBS: job killed: walltime 3636 exceeded limit 3600
>>> So it finished the F77 tests and had just built the F90 tests.
>>> This looks like an issue with the PGI 15.5 compiler; I'd like to be able to reproduce the hang if I can.
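>>> One thing I might try, assuming I can get an interactive session on the same kind of node: rerun the two commands above by hand from the directory where nf90_test was built, and if it hangs again, attach a debugger to see where it is stuck.
>>> ./nf90_test -c -d .
>>> ./nf90_test -d .                # the step that appeared to hang
>>> # from another shell on the same node:
>>> gdb -p $(pgrep -f nf90_test)    # then 'bt' for a backtrace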
>>> The config.log is attached here.
>>> Thanks,
>>>
>>> Carl
>>> <config.log>
>>
>
> --
> Rob Latham
> Mathematics and Computer Science Division
> Argonne National Lab, IL USA
More information about the parallel-netcdf mailing list