[opa-nightly-tests] jam OPA_Daily_Tests_0404Mon_FAILED!!!
Dave Goodell
goodell at mcs.anl.gov
Mon Apr 4 16:08:15 CDT 2011
Those sound like reasonable options, although if there is substantial concurrent load on the system when we run our tests, a new threshold probably won't help that much. We can also let it run a few more nights and see if it happens again before messing with it.
Thanks for taking a look.
-Dave
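
P.S. For anyone reading the archive, the dequeuer "timeout" Neil describes in the quoted message can be sketched roughly as follows. This is only an illustration under stated assumptions: toy_queue_t, toy_try_dequeue, drain_result_t and drain_with_budget are hypothetical names, not code from test_queue.c and not the OpenPA API, and the toy queue is a plain counter rather than a concurrent queue. It shows an attempt budget of (expected elements) x (factor), with a shortfall reported as a warning instead of a hard failure. Cleanup after a timeout is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the queue under test: just a count of the
 * elements still available. NOT the OpenPA queue or its API. */
typedef struct {
    long long remaining;
} toy_queue_t;

/* Returns 1 if an element was removed, 0 if the queue was empty. */
static int toy_try_dequeue(toy_queue_t *q)
{
    if (q->remaining <= 0)
        return 0;
    q->remaining--;
    return 1;
}

typedef struct {
    long long dequeued;  /* elements actually removed */
    long long attempts;  /* dequeue attempts made, successful or not */
    int timed_out;       /* 1 if the attempt budget ran out first */
} drain_result_t;

/* Try to remove `expected` elements, giving up once
 * expected * attempt_factor attempts have been made (the "timeout").
 * attempt_factor would be 1000 today and 10000 under the proposal. */
static drain_result_t drain_with_budget(toy_queue_t *q, long long expected,
                                        long long attempt_factor)
{
    drain_result_t r = {0, 0, 0};
    long long budget;

    assert(q != NULL && expected >= 0 && attempt_factor > 0);
    budget = expected * attempt_factor;

    while (r.dequeued < expected && r.attempts < budget) {
        r.attempts++;
        if (toy_try_dequeue(q))
            r.dequeued++;
    }

    if (r.dequeued < expected) {
        /* Budget exhausted: warn and let the caller decide what to do,
         * rather than failing the whole test outright. */
        r.timed_out = 1;
        fprintf(stderr,
                "WARNING: dequeued %lld of %lld expected elements "
                "after %lld attempts\n",
                r.dequeued, expected, r.attempts);
    }
    return r;
}
```

In this sketch a larger factor only makes the dequeuer wait longer before giving up. If the enqueuers are starved by other load for longer than that, the budget still runs out, which is why a new threshold may not help much.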
On Apr 4, 2011, at 3:58 PM CDT, Neil Fortner wrote:
> Dave,
>
> amani is a dual-core Opteron running 64-bit CentOS. It has been tested since 3/4. It looks like the test hit the "timeout" for the dequeuer: it made 1000 times as many dequeue attempts as there were expected elements in the queue. I'd guess that this is a result of whatever the system load was at that time, possibly combined with different scheduling behaviour on Linux when paired with an Opteron. For what it's worth, "smirom" also ran on an Opteron back when it was tested (the exact same model, in fact: 1216), though I believe it ran 64-bit SuSE then. I could raise the threshold to 10,000, change the failure to a warning, and add cleanup code for the case where it times out. What do you think?
>
> Thanks,
> -Neil
>
>
> On 04/04/2011 02:25 PM, Dave Goodell wrote:
>> Neil, any idea what's up here? These look like real failures in the queue test, which could be caused by one or more of the following things:
>>
>> 1) bad queue code
>>
>> 2) bad queue test code
>>
>> 3) bad build system / #ifdef code that ends up selecting the wrong primitives implementation
>>
>> 4) an actual bug in the x86_64 primitives that hasn't been flushed out yet by other tests
>>
>> 5) hardware issues
>>
>> This looks like a new machine in the testing lineup. How much confidence do you have in amani?
>>
>> -Dave
>>
>> On Apr 4, 2011, at 12:22 AM CDT, HDF Tester wrote:
>>
>>> *** OPA Tests on 0404Mon ***
>>> =============================
>>> Tests Summary
>>> =============================
>>> ****FAILED amani: standard****
>>>
>>> PASSED duty: standard
>>> PASSED heiwa: standard
>>> PASSED jam: standard
>>> PASSED linew: standard
>>>
>>> =============================
>>> Tests Time Summary
>>> =============================
>>> jam: Ran 1(1/0/0) tests, Grand total test time = 1m 22s
>>> amani: Ran 1(0/1/0) tests, Grand total test time = 2m 16s
>>> heiwa: Ran 1(1/0/0) tests, Grand total test time = 5m 37s
>>> duty: Ran 1(1/0/0) tests, Grand total test time = 5m 51s
>>> linew: Ran 1(1/0/0) tests, Grand total test time = 11m 40s
>>> jam: Ran 6(0/0/0) hosts, Grand total test time = 12m 37s
>>>
>>>
>>> =============================
>>> Timekeeper log
>>> =============================
>>> Timekeeper started at Mon Apr 4 00:10:29 CDT 2011
>>> Timekeeper sleeping for 1800 seconds
>>>
>>>
>>> =============================
>>> Tests Failures
>>> =============================
>>> =========================
>>> Dumping logfile of amani: standard
>>> Last 50 lines of /mnt/scr1/SnapTest/snapshots-opa/log/amani_0404Mon_0010
>>> =========================
>>> LL/SC not available
>>> Testing pointer LL/SC stack -SKIP-
>>> LL/SC not available
>>> All primitives tests passed.
>>> PASS: test_primitives
>>> Testing memory barrier sanity PASSED
>>> Testing memory barriers with linear array with 2 threads PASSED
>>> Testing memory barriers with local variables with 2 threads PASSED
>>> Testing memory barriers with scattered array with 2 threads PASSED
>>> Testing memory barriers with linear array with 4 threads PASSED
>>> Testing memory barriers with local variables with 4 threads PASSED
>>> Testing memory barriers with scattered array with 4 threads PASSED
>>> Testing memory barriers with linear array with 10 threads PASSED
>>> Testing memory barriers with local variables with 10 threads PASSED
>>> Testing memory barriers with scattered array with 10 threads PASSED
>>> Testing memory barriers with linear array with 100 threads PASSED
>>> Testing memory barriers with local variables with 100 threads PASSED
>>> Testing memory barriers with scattered array with 100 threads PASSED
>>> All barriers tests passed.
>>> PASS: test_barriers
>>> Testing queue sanity PASSED
>>> Testing multithreaded queue with 2 threads PASSED
>>> Testing multithreaded queue (empty queue) with 2 threads PASSED
>>> Testing multithreaded queue (full queue) with 2 threads PASSED
>>> Testing multithreaded queue with 4 threads PASSED
>>> Testing multithreaded queue (empty queue) with 4 threads PASSED
>>> Testing multithreaded queue (full queue) with 4 threads PASSED
>>> Testing multithreaded queue with 10 threads PASSED
>>> Testing multithreaded queue (empty queue) with 10 threads Incorrect number of elements dequeued: 4031909 Expected: 4500000
>>> *FAILED*
>>> at /home/hdftest/snapshots-opa/current/test/test_queue.c:399 in test_queue_threaded()...
>>> Unexpected return from 1 thread
>>> Testing multithreaded queue (full queue) with 10 threads PASSED
>>> Testing multithreaded queue with 100 threads PASSED
>>> Testing multithreaded queue (empty queue) with 100 threads PASSED
>>> Testing multithreaded queue (full queue) with 100 threads PASSED
>>> ***** 1 QUEUE TEST FAILED! *****
>>> FAIL: test_queue
>>> ===================================================================
>>> 1 of 4 tests failed
>>> Please report to https://trac.mcs.anl.gov/projects/openpa/newticket
>>> ===================================================================
>>> gmake[2]: *** [check-TESTS] Error 1
>>> gmake[2]: Leaving directory `/scr/hdftest/snapshots-opa/TestDir/amani/test'
>>> gmake[1]: *** [check-am] Error 2
>>> gmake[1]: Leaving directory `/scr/hdftest/snapshots-opa/TestDir/amani/test'
>>> gmake: *** [check-recursive] Error 1
>>> Failed running make check
>>> ===== Exit bin/snapshot with status=2: Mon Apr 4 00:13:06 CDT 2011 =====
>>> Mon Apr 4 00:13:06 CDT 2011
>>> =========================
>>> Dumping done
>>> =========================
>>>
>>> Runtest did not exit normally.
>>>
>>> =============================
>>> Watchers List
>>> =============================
>>> OPA Daily test features/platforms watchers and procedure
>>> ---------------------------------------------------------
>>>
>>> Procedure:
>>> The watcher will investigate and report the cause of failure by 11am.
>>> The developer who checked in the error code may report so by then too.
>>> The watcher or the developer should get the failure fixed and report it
>>> by 3pm.
>>>
>>>
>>> Watcher for OPA: Neil
>>>
>>>
>>> ---
>>> updated: 2009/05/05
>>>
>>> =============================
>>> Tests Details
>>> =============================
>>> 00:10:09 up 59 days, 14:42, 74 users, load average: 1.90, 1.98, 2.20
>>> Filesystem 1K-blocks Used Available Use% Mounted on
>>> /dev/sda3 31738420 4065196 26034996 14% /
>>> /dev/sda1 101086 22885 72982 24% /boot
>>> /dev/sda2 31738420 209024 29891168 1% /tmp
>>> /dev/sda6 31738392 5959784 24140384 20% /var
>>> /dev/sda7 31738392 30271700 0 100% /usr
>>> /dev/sda8 124991068 4814828 113724540 5% /var/tmp
>>> /dev/mapper/VolGroup00-home
>>> 198351840 15132624 172980856 9% /home
>>> /dev/sde1 565688764 380484688 156005140 71% /scr
>>> /dev/sdc1 961432072 818852580 93741492 90% /mnt/scr1
>>> /dev/sdd1 961432072 885174524 27419548 97% /mnt/hdf
>>> tmpfs 8313848 16 8313832 1% /dev/shm
>>> gumund:/data/ftp 480719072 285632320 170667552 63% /mnt/ftp
>>> gumund:/data/web 480719072 285632320 170667552 63% /mnt/web
>>> amani:/mnt/rw-src 288451232 179573408 94225344 66% /mnt/ro-src
>>> amani:/mnt/rw-src 288451232 179573408 94225344 66% /mnt/rw-src
>>> STANDARD_OPT=op-configure --prefix=${PWD}/opainstall
>>> TEST_TYPES=standard
>>>
>>> Running source repository checkout with output saved in
>>> /mnt/scr1/SnapTest/snapshots-opa/log/REPO_LOG_0404Mon
>>> Checking MANIFEST file ...
>>> cat: /mnt/scr1/SnapTest/snapshots-opa/log/#runtest.0404Mon.8656: No such file or directory
>>> rm: cannot remove `/mnt/scr1/SnapTest/snapshots-opa/log/#runtest.0404Mon.8656': No such file or directory
>>>
>>> Mon Apr 4 00:10:29 CDT 2011
>>> *** launching tests from jam ***
>>>
>>> TESTHOST is linew
>>> jam
>>> amani
>>> heiwa
>>> duty
>>> liberty
>>> Fork off timekeeper 30
>>> cannot remote command with liberty
>>> ==============
>>> Testing linew
>>> ==============
>>> ssh linew -n cd /home/hdftest/snapshots-opa;/mnt/scr1/SnapTest/snapshots-opa/bin/runtest -nodiff -norepo -configname linew
>>> 12:10am up 23 day(s), 6:07, 4 users, load average: 4.88, 4.15, 3.59
>>> / (/dev/dsk/c1t0d0s0 ):71487500 blocks 8273143 files
>>> /devices (/devices ): 0 blocks 0 files
>>> /system/contract (ctfs ): 0 blocks 2147483605 files
>>> /proc (proc ): 0 blocks 29893 files
>>> /etc/mnttab (mnttab ): 0 blocks 0 files
>>> /etc/svc/volatile (swap ):20458480 blocks 1182196 files
>>> /system/object (objfs ): 0 blocks 2147483493 files
>>> /etc/dfs/sharetab (sharefs ): 0 blocks 2147483646 files
>>> /platform/sun4u-us3/lib/libc_psr.so.1(/platform/sun4u-us3/lib/libc_psr/libc_psr_hwcap1.so.1):71487500 blocks 8273143 files
>>> /platform/sun4u-us3/lib/sparcv9/libc_psr.so.1(/platform/sun4u-us3/lib/sparcv9/libc_psr/libc_psr_hwcap1.so.1):71487500 blocks 8273143 files
>>> /dev/fd (fd ): 0 blocks 0 files
>>> /tmp (swap ):20458480 blocks 1182196 files
>>> /var/run (swap ):20458480 blocks 1182196 files
>>> /scr (/dev/dsk/c1t1d0s0 ):896681508 blocks 226811321 files
>>> /home (jam:/home ):366438432 blocks 50913130 files
>>> /mnt/hdf (jam:/mnt/hdf ):152515096 blocks 114002233 files
>>> /mnt/scr1 (jam:/mnt/scr1 ):285157600 blocks 116085930 files
>>> /mnt/web (gumund:/mnt/web ):390173488 blocks 60893585 files
>>> /mnt/ftp (gumund:/mnt/ftp ):390173488 blocks 60893585 files
>>> STANDARD_OPT=op-configure --prefix=${PWD}/opainstall
>>> TEST_TYPES=standard
>>>
>>> Mon Apr 4 00:11:01 CDT 2011
>>> *** starting standard tests in linew ***
>>> Uname -a: SunOS linew 5.10 Generic_144488-07 sun4u sparc SUNW,A70
>>> Running snapshot with output saved in
>>> /mnt/scr1/SnapTest/snapshots-opa/log/linew_0404Mon_0011
>>> PASSED linew: standard
>>> *** finished standard tests for linew ***
>>> Mon Apr 4 00:22:23 CDT 2011
>>> Total time = 11m 24s
>>>
>>> *** finished tests in linew ***
>>> Mon Apr 4 00:22:25 CDT 2011
>>> linew: Ran 1(1/0/0) tests, Grand total test time = 11m 40s
>>>
>>>
>>> ==============
>>> Testing jam
>>> ==============
>>> ssh jam -n cd /home/hdftest/snapshots-opa;/mnt/scr1/SnapTest/snapshots-opa/bin/runtest -nodiff -norepo -configname jam
>>> 00:10:37 up 59 days, 14:42, 74 users, load average: 2.20, 2.04, 2.21
>>> Filesystem 1K-blocks Used Available Use% Mounted on
>>> /dev/sda3 31738420 4065196 26034996 14% /
>>> /dev/sda1 101086 22885 72982 24% /boot
>>> /dev/sda2 31738420 209144 29891048 1% /tmp
>>> /dev/sda6 31738392 5959792 24140376 20% /var
>>> /dev/sda7 31738392 30271700 0 100% /usr
>>> /dev/sda8 124991068 4814828 113724540 5% /var/tmp
>>> /dev/mapper/VolGroup00-home
>>> 198351840 15132624 172980856 9% /home
>>> /dev/sde1 565688764 380484688 156005140 71% /scr
>>> /dev/sdc1 961432072 818853280 93740792 90% /mnt/scr1
>>> /dev/sdd1 961432072 885174524 27419548 97% /mnt/hdf
>>> tmpfs 8313848 16 8313832 1% /dev/shm
>>> gumund:/data/ftp 480719072 285632320 170667552 63% /mnt/ftp
>>> gumund:/data/web 480719072 285632320 170667552 63% /mnt/web
>>> amani:/mnt/rw-src 288451232 179573408 94225344 66% /mnt/ro-src
>>> amani:/mnt/rw-src 288451232 179573408 94225344 66% /mnt/rw-src
>>> STANDARD_OPT=op-configure --prefix=${PWD}/opainstall
>>> TEST_TYPES=standard
>>>
>>> Mon Apr 4 00:10:47 CDT 2011
>>> *** starting standard tests in jam ***
>>> Uname -a: Linux jam 2.6.18-194.3.1.el5PAE #1 SMP Thu May 13 13:48:44 EDT 2010 i686 i686 i386 GNU/Linux
>>> Running snapshot with output saved in
>>> /mnt/scr1/SnapTest/snapshots-opa/log/jam_0404Mon_0010
>>> PASSED jam: standard
>>> *** finished standard tests for jam ***
>>> Mon Apr 4 00:12:09 CDT 2011
>>> Total time = 1m 22s
>>>
>>> *** finished tests in jam ***
>>> Mon Apr 4 00:12:09 CDT 2011
>>> jam: Ran 1(1/0/0) tests, Grand total test time = 1m 22s
>>>
>>>
>>> ==============
>>> Testing amani
>>> ==============
>>> ssh amani -n cd /home/hdftest/snapshots-opa;/mnt/scr1/SnapTest/snapshots-opa/bin/runtest -nodiff -norepo -configname amani
>>> 00:10:40 up 48 days, 10:40, 3 users, load average: 1.02, 1.15, 1.92
>>> Filesystem 1K-blocks Used Available Use% Mounted on
>>> /dev/hda6 39021196 29788760 7218292 81% /
>>> /dev/hda3 12192636 4142712 7420580 36% /var
>>> /dev/hda2 12192636 162848 11400444 2% /tmp
>>> /dev/hda1 101086 23784 72083 25% /boot
>>> tmpfs 2057868 0 2057868 0% /dev/shm
>>> jam:/home 198351840 15132608 172980864 9% /home
>>> jam:/mnt/hdf 961432096 885174528 27419552 97% /mnt/hdf
>>> jam:/mnt/scr1 961432096 818853408 93740672 90% /mnt/scr1
>>> smirom:/scr 267601440 37497024 216291712 15% /mnt/tmp
>>> gumund:/data/ftp 480719072 285632320 170667552 63% /mnt/ftp
>>> gumund:/data/web 480719072 285632320 170667552 63% /mnt/web
>>> /dev/sdb1 144221592 75309328 61586224 56% /scr
>>> /dev/sda1 288451232 179573436 94225316 66% /mnt/rw-src
>>> STANDARD_OPT=op-configure --prefix=${PWD}/opainstall
>>> TEST_TYPES=standard
>>>
>>> Mon Apr 4 00:10:50 CDT 2011
>>> *** starting standard tests in amani ***
>>> Uname -a: Linux amani 2.6.18-194.32.1.el5 #1 SMP Wed Jan 5 17:52:25 EST 2011 x86_64 x86_64 x86_64 GNU/Linux
>>> Running snapshot with output saved in
>>> /mnt/scr1/SnapTest/snapshots-opa/log/amani_0404Mon_0010
>>> *************************************
>>> Mon Apr 4 00:13:06 CDT 2011
>>> ****FAILED amani: standard****
>>> *************************************
>>> *** finished standard tests for amani ***
>>> Mon Apr 4 00:13:06 CDT 2011
>>> Total time = 2m 16s
>>>
>>> *** finished tests in amani ***
>>> Mon Apr 4 00:13:06 CDT 2011
>>> amani: Ran 1(0/1/0) tests, Grand total test time = 2m 16s
>>>
>>> ****SYSTEM ERROR amani: Abnormal exit from runtest ****
>>>
>>> *************************************
>>> Mon Apr 4 00:22:56 CDT 2011
>>> ****SYSTEM ERROR amani: runtest command failed ****
>>> *************************************
>>>
>>> ==============
>>> Testing heiwa
>>> ==============
>>> ssh heiwa -n cd /home/hdftest/snapshots-opa;/mnt/scr1/SnapTest/snapshots-opa/bin/runtest -nodiff -norepo -configname heiwa
>>> 00:10:36 up 30 days, 11:41, 5 users, load average: 0.97, 1.03, 1.06
>>> Filesystem 1K-blocks Used Available Use% Mounted on
>>> /dev/sdb8 35435232 2727740 30907452 9% /
>>> tmpfs 2008144 0 2008144 0% /dev/shm
>>> /dev/sdb4 198337 106660 81437 57% /boot
>>> /dev/sdb6 12385456 2122628 9633684 19% /tmp
>>> /dev/sdb7 12385456 3940944 7815368 34% /var
>>> /dev/sda6 82573108 686812 77691728 1% /scr
>>> jam:/mnt/hdf 961432096 885174528 27419552 97% /mnt/hdf
>>> jam:/mnt/scr1 961432096 818853600 93740480 90% /mnt/scr1
>>> jam:/home 198351840 15132608 172980864 9% /home
>>> STANDARD_OPT=op-configure --prefix=${PWD}/opainstall
>>> TEST_TYPES=standard
>>>
>>> Mon Apr 4 00:10:46 CDT 2011
>>> *** starting standard tests in heiwa ***
>>> Uname -a: Linux heiwa 2.6.32.16-150.fc12.ppc64 #1 SMP Sat Jul 24 05:19:27 UTC 2010 ppc64 ppc64 ppc64 GNU/Linux
>>> Running snapshot with output saved in
>>> /mnt/scr1/SnapTest/snapshots-opa/log/heiwa_0404Mon_0010
>>> PASSED heiwa: standard
>>> *** finished standard tests for heiwa ***
>>> Mon Apr 4 00:16:23 CDT 2011
>>> Total time = 5m 37s
>>>
>>> *** finished tests in heiwa ***
>>> Mon Apr 4 00:16:23 CDT 2011
>>> heiwa: Ran 1(1/0/0) tests, Grand total test time = 5m 37s
>>>
>>>
>>> ==============
>>> Testing duty
>>> ==============
>>> ssh duty -n cd /home/hdftest/snapshots-opa;/mnt/scr1/SnapTest/snapshots-opa/bin/runtest -nodiff -norepo -configname duty
>>> 12:10AM up 95 days, 8:25, 21 users, load averages: 0.08, 0.03, 0.08
>>> Filesystem 1K-blocks Used Avail Capacity Mounted on
>>> /dev/aacd0s1a 507630 60986 406034 13% /
>>> devfs 1 1 0 100% /dev
>>> /dev/aacd0s1h 188433016 123604522 49753854 71% /data
>>> /dev/aacd0s1g 32494668 24598546 5296550 82% /local_home
>>> /dev/aacd0s1f 12186190 4125638 7085658 37% /usr
>>> /dev/aacd0s1d 507630 50350 416670 11% /var
>>> /dev/aacd0s1e 253678 178 233206 0% /var/tmp
>>> procfs 4 4 0 100% /proc
>>> jam:/home 198351840 15132624 172980856 8% /home
>>> jam:/mnt/hdf 961432072 885174524 27419548 97% /mnt/hdf
>>> jam:/mnt/scr1 961432072 818853640 93740432 90% /mnt/scr1
>>> gumund:/data/web 480719056 285632312 170667544 63% /mnt/web
>>> gumund:/data/ftp 480719056 285632312 170667544 63% /mnt/ftp
>>> devfs 1 1 0 100% /var/named/dev
>>> /dev/md0 126702 98 116468 0% /tmp
>>> STANDARD_OPT=op-configure --prefix=${PWD}/opainstall
>>> TEST_TYPES=standard
>>>
>>> Mon Apr 4 00:10:54 CDT 2011
>>> *** starting standard tests in duty ***
>>> Uname -a: FreeBSD duty.hdfgroup.uiuc.edu 6.3-STABLE FreeBSD 6.3-STABLE #1: Fri Jul 25 17:10:59 CDT 2008 sukoziol at duty.hdfgroup.uiuc.edu:/usr/obj/usr/src/sys/DUTY i386
>>> Running snapshot with output saved in
>>> /mnt/scr1/SnapTest/snapshots-opa/log/duty_0404Mon_0010
>>> PASSED duty: standard
>>> *** finished standard tests for duty ***
>>> Mon Apr 4 00:16:45 CDT 2011
>>> Total time = 5m 51s
>>>
>>> *** finished tests in duty ***
>>> Mon Apr 4 00:16:45 CDT 2011
>>> duty: Ran 1(1/0/0) tests, Grand total test time = 5m 51s
>>>
>>>
>>> ==============
>>> Testing liberty
>>> ==============
>>> liberty does not accept Remote Command (Mon Apr 4 00:11:18 CDT 2011)
>>> *************************************
>>> Mon Apr 4 00:11:18 CDT 2011
>>> ****SYSTEM ERROR: liberty does not accept Remote Command (Mon Apr 4 00:11:18 CDT 2011)
>>> *************************************
>>> *************************************
>>> Mon Apr 4 00:22:56 CDT 2011
>>> ****INCOMPLETE liberty: snaptest did not complete****
>>> *************************************
>>>
>>> *** finished tests in jam ***
>>> Mon Apr 4 00:22:56 CDT 2011
>>> jam: Ran 6(0/0/0) hosts, Grand total test time = 12m 37s
>>>
>>> _______________________________________________
>>> opa-nightly-tests mailing list
>>> opa-nightly-tests at lists.mcs.anl.gov
>>> https://lists.mcs.anl.gov/mailman/listinfo/opa-nightly-tests
>