[opa-nightly-tests] jam OPA_Daily_Tests_0404Mon_FAILED!!!

Neil Fortner nfortne2 at hdfgroup.org
Mon Apr 4 15:58:43 CDT 2011


Dave,

amani is a dual-core Opteron running 64-bit CentOS.  It has been tested
since 3/4.  It looks like the test hit the "timeout" for the dequeuer:
it made 1000 times as many dequeue attempts as there were expected
elements in the queue.  I'd guess that this is a result of whatever the
system load was at that time, possibly combined with different
scheduling behaviour on Linux when paired with an Opteron.  For what
it's worth, "smirom" also ran on an Opteron back when it was tested (the
exact same model, in fact: 1216), though I believe it ran 64-bit SuSE
then.  I could raise the threshold to 10,000, change the failure to a
warning, and add cleanup code for the case where it times out.  What do
you think?
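
For illustration only, here is a minimal sketch of the kind of bounded
dequeue loop described above.  It is not the actual test_queue.c code:
toy_queue_t, toy_try_dequeue(), bounded_dequeue() and the DEQ_* status
values are hypothetical names, and a plain single-threaded counter
stands in for the real OPA queue.  The attempt limit is the expected
element count multiplied by a factor (1000 today, 10,000 as proposed),
and hitting the limit produces a warning status instead of a hard
failure.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the real queue: just a count of elements. */
typedef struct {
    long available; /* elements currently in the queue */
} toy_queue_t;

/* Status returned by the bounded loop (names are assumptions). */
typedef enum { DEQ_OK = 0, DEQ_TIMED_OUT = 1 } deq_status_t;

/* Try to remove one element; returns 1 on success, 0 if the queue is empty. */
static int toy_try_dequeue(toy_queue_t *q)
{
    if (q->available > 0) {
        q->available--;
        return 1;
    }
    return 0;
}

/*
 * Attempt to dequeue `expected` elements, giving up after
 * expected * attempt_factor attempts.  On timeout, print a warning and
 * report how many elements were actually dequeued via *ndequeued so the
 * caller can clean up instead of aborting.
 */
static deq_status_t bounded_dequeue(toy_queue_t *q, long expected,
                                    long attempt_factor, long *ndequeued)
{
    long max_attempts;
    long attempts = 0;

    assert(q != NULL && ndequeued != NULL);
    assert(expected >= 0 && attempt_factor > 0);

    max_attempts = expected * attempt_factor;
    *ndequeued = 0;

    while (*ndequeued < expected) {
        if (attempts >= max_attempts) {
            fprintf(stderr,
                    "Warning: dequeuer timed out after %ld attempts "
                    "(%ld of %ld elements dequeued)\n",
                    attempts, *ndequeued, expected);
            return DEQ_TIMED_OUT;
        }
        attempts++;
        if (toy_try_dequeue(q))
            (*ndequeued)++;
    }
    return DEQ_OK;
}
```

A caller would pass 10000 as attempt_factor and, on DEQ_TIMED_OUT, free
whatever elements are still outstanding before reporting the warning.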

Thanks,
-Neil


On 04/04/2011 02:25 PM, Dave Goodell wrote:
> Neil, any idea what's up here?  These look like real failures in the queue test, which could be caused by one or more of the following things:
>
> 1) bad queue code
>
> 2) bad queue test code
>
> 3) bad build system / #ifdef code that ends up selecting the wrong primitives implementation
>
> 4) an actual bug in the x86_64 primitives that hasn't been flushed out yet by other tests
>
> 5) hardware issues
>
> This looks like a new machine in the testing lineup.  How much confidence do you have in amani?
>
> -Dave
>
> On Apr 4, 2011, at 12:22 AM CDT, HDF Tester wrote:
>
>> *** OPA Tests on 0404Mon ***
>> =============================
>>    Tests Summary
>> =============================
>> ****FAILED amani: standard****
>>
>> PASSED duty: standard
>> PASSED heiwa: standard
>> PASSED jam: standard
>> PASSED linew: standard
>>
>> =============================
>>    Tests Time Summary
>> =============================
>> jam: Ran 1(1/0/0) tests, Grand total test time =  1m 22s
>> amani: Ran 1(0/1/0) tests, Grand total test time =  2m 16s
>> heiwa: Ran 1(1/0/0) tests, Grand total test time =  5m 37s
>> duty: Ran 1(1/0/0) tests, Grand total test time =  5m 51s
>> linew: Ran 1(1/0/0) tests, Grand total test time =  11m 40s
>> jam: Ran 6(0/0/0) hosts, Grand total test time =  12m 37s
>>
>>
>> =============================
>>    Timekeeper log
>> =============================
>> Timekeeper started at Mon Apr  4 00:10:29 CDT 2011
>> Timekeeper sleeping for 1800 seconds
>>
>>
>> =============================
>>    Tests Failures
>> =============================
>> =========================
>> Dumping logfile of amani: standard
>> Last 50 lines of /mnt/scr1/SnapTest/snapshots-opa/log/amani_0404Mon_0010
>> =========================
>>     LL/SC not available
>> Testing pointer LL/SC stack                                            -SKIP-
>>     LL/SC not available
>> All primitives tests passed.
>> PASS: test_primitives
>> Testing memory barrier sanity                                          PASSED
>> Testing memory barriers with linear array with 2 threads               PASSED
>> Testing memory barriers with local variables with 2 threads            PASSED
>> Testing memory barriers with scattered array with 2 threads            PASSED
>> Testing memory barriers with linear array with 4 threads               PASSED
>> Testing memory barriers with local variables with 4 threads            PASSED
>> Testing memory barriers with scattered array with 4 threads            PASSED
>> Testing memory barriers with linear array with 10 threads              PASSED
>> Testing memory barriers with local variables with 10 threads           PASSED
>> Testing memory barriers with scattered array with 10 threads           PASSED
>> Testing memory barriers with linear array with 100 threads             PASSED
>> Testing memory barriers with local variables with 100 threads          PASSED
>> Testing memory barriers with scattered array with 100 threads          PASSED
>> All barriers tests passed.
>> PASS: test_barriers
>> Testing queue sanity                                                   PASSED
>> Testing multithreaded queue with 2 threads                             PASSED
>> Testing multithreaded queue (empty queue) with 2 threads               PASSED
>> Testing multithreaded queue (full queue) with 2 threads                PASSED
>> Testing multithreaded queue with 4 threads                             PASSED
>> Testing multithreaded queue (empty queue) with 4 threads               PASSED
>> Testing multithreaded queue (full queue) with 4 threads                PASSED
>> Testing multithreaded queue with 10 threads                            PASSED
>> Testing multithreaded queue (empty queue) with 10 threads                 Incorrect number of elements dequeued: 4031909 Expected: 4500000
>> *FAILED*
>>         at /home/hdftest/snapshots-opa/current/test/test_queue.c:399 in test_queue_threaded()...
>>     Unexpected return from 1 thread
>> Testing multithreaded queue (full queue) with 10 threads               PASSED
>> Testing multithreaded queue with 100 threads                           PASSED
>> Testing multithreaded queue (empty queue) with 100 threads             PASSED
>> Testing multithreaded queue (full queue) with 100 threads              PASSED
>> ***** 1 QUEUE TEST FAILED! *****
>> FAIL: test_queue
>> ===================================================================
>> 1 of 4 tests failed
>> Please report to https://trac.mcs.anl.gov/projects/openpa/newticket
>> ===================================================================
>> gmake[2]: *** [check-TESTS] Error 1
>> gmake[2]: Leaving directory `/scr/hdftest/snapshots-opa/TestDir/amani/test'
>> gmake[1]: *** [check-am] Error 2
>> gmake[1]: Leaving directory `/scr/hdftest/snapshots-opa/TestDir/amani/test'
>> gmake: *** [check-recursive] Error 1
>> Failed running make check
>> ===== Exit bin/snapshot with status=2: Mon Apr  4 00:13:06 CDT 2011 =====
>> Mon Apr  4 00:13:06 CDT 2011
>> =========================
>> Dumping done
>> =========================
>>
>> Runtest did not exit normally.
>>
>> =============================
>>    Watchers List
>> =============================
>> OPA Daily test features/platforms watchers and procedure
>> ---------------------------------------------------------
>>
>> Procedure:
>> The watcher will investigate and report the cause of failure by 11am.
>> The developer who checked in the error code may report so by then too.
>> The watcher or the developer should get the failure fixed and report it
>> by 3pm.
>>
>>
>> Watcher for OPA:	 	Neil
>>
>>
>> ---
>> updated: 2009/05/05
>>
>> =============================
>>    Tests Details
>> =============================
>> 00:10:09 up 59 days, 14:42, 74 users,  load average: 1.90, 1.98, 2.20
>> Filesystem           1K-blocks      Used Available Use% Mounted on
>> /dev/sda3             31738420   4065196  26034996  14% /
>> /dev/sda1               101086     22885     72982  24% /boot
>> /dev/sda2             31738420    209024  29891168   1% /tmp
>> /dev/sda6             31738392   5959784  24140384  20% /var
>> /dev/sda7             31738392  30271700         0 100% /usr
>> /dev/sda8            124991068   4814828 113724540   5% /var/tmp
>> /dev/mapper/VolGroup00-home
>>                      198351840  15132624 172980856   9% /home
>> /dev/sde1            565688764 380484688 156005140  71% /scr
>> /dev/sdc1            961432072 818852580  93741492  90% /mnt/scr1
>> /dev/sdd1            961432072 885174524  27419548  97% /mnt/hdf
>> tmpfs                  8313848        16   8313832   1% /dev/shm
>> gumund:/data/ftp     480719072 285632320 170667552  63% /mnt/ftp
>> gumund:/data/web     480719072 285632320 170667552  63% /mnt/web
>> amani:/mnt/rw-src    288451232 179573408  94225344  66% /mnt/ro-src
>> amani:/mnt/rw-src    288451232 179573408  94225344  66% /mnt/rw-src
>> STANDARD_OPT=op-configure --prefix=${PWD}/opainstall
>> TEST_TYPES=standard
>>
>> Running source repository checkout with output saved in
>>    /mnt/scr1/SnapTest/snapshots-opa/log/REPO_LOG_0404Mon
>> Checking MANIFEST file ...
>> cat: /mnt/scr1/SnapTest/snapshots-opa/log/#runtest.0404Mon.8656: No such file or directory
>> rm: cannot remove `/mnt/scr1/SnapTest/snapshots-opa/log/#runtest.0404Mon.8656': No such file or directory
>>
>> Mon Apr  4 00:10:29 CDT 2011
>> *** launching tests from jam ***
>>
>> TESTHOST is linew
>> jam
>> amani
>> heiwa
>> duty
>> liberty
>>     Fork off timekeeper 30
>> cannot remote command with liberty
>> ==============
>> Testing linew
>> ==============
>> ssh linew -n cd /home/hdftest/snapshots-opa;/mnt/scr1/SnapTest/snapshots-opa/bin/runtest -nodiff -norepo -configname linew
>> 12:10am  up 23 day(s),  6:07,  4 users,  load average: 4.88, 4.15, 3.59
>> /                  (/dev/dsk/c1t0d0s0 ):71487500 blocks  8273143 files
>> /devices           (/devices          ):       0 blocks        0 files
>> /system/contract   (ctfs              ):       0 blocks 2147483605 files
>> /proc              (proc              ):       0 blocks    29893 files
>> /etc/mnttab        (mnttab            ):       0 blocks        0 files
>> /etc/svc/volatile  (swap              ):20458480 blocks  1182196 files
>> /system/object     (objfs             ):       0 blocks 2147483493 files
>> /etc/dfs/sharetab  (sharefs           ):       0 blocks 2147483646 files
>> /platform/sun4u-us3/lib/libc_psr.so.1(/platform/sun4u-us3/lib/libc_psr/libc_psr_hwcap1.so.1):71487500 blocks  8273143 files
>> /platform/sun4u-us3/lib/sparcv9/libc_psr.so.1(/platform/sun4u-us3/lib/sparcv9/libc_psr/libc_psr_hwcap1.so.1):71487500 blocks  8273143 files
>> /dev/fd            (fd                ):       0 blocks        0 files
>> /tmp               (swap              ):20458480 blocks  1182196 files
>> /var/run           (swap              ):20458480 blocks  1182196 files
>> /scr               (/dev/dsk/c1t1d0s0 ):896681508 blocks 226811321 files
>> /home              (jam:/home         ):366438432 blocks 50913130 files
>> /mnt/hdf           (jam:/mnt/hdf      ):152515096 blocks 114002233 files
>> /mnt/scr1          (jam:/mnt/scr1     ):285157600 blocks 116085930 files
>> /mnt/web           (gumund:/mnt/web   ):390173488 blocks 60893585 files
>> /mnt/ftp           (gumund:/mnt/ftp   ):390173488 blocks 60893585 files
>> STANDARD_OPT=op-configure --prefix=${PWD}/opainstall
>> TEST_TYPES=standard
>>
>> Mon Apr  4 00:11:01 CDT 2011
>> *** starting standard tests in linew ***
>> Uname -a: SunOS linew 5.10 Generic_144488-07 sun4u sparc SUNW,A70
>> Running snapshot with output saved in
>>    /mnt/scr1/SnapTest/snapshots-opa/log/linew_0404Mon_0011
>> PASSED linew: standard
>> *** finished standard tests for linew ***
>> Mon Apr  4 00:22:23 CDT 2011
>> Total time = 11m 24s
>>
>> *** finished tests in linew ***
>> Mon Apr  4 00:22:25 CDT 2011
>> linew: Ran 1(1/0/0) tests, Grand total test time =  11m 40s
>>
>>
>> ==============
>> Testing jam
>> ==============
>> ssh jam -n cd /home/hdftest/snapshots-opa;/mnt/scr1/SnapTest/snapshots-opa/bin/runtest -nodiff -norepo -configname jam
>> 00:10:37 up 59 days, 14:42, 74 users,  load average: 2.20, 2.04, 2.21
>> Filesystem           1K-blocks      Used Available Use% Mounted on
>> /dev/sda3             31738420   4065196  26034996  14% /
>> /dev/sda1               101086     22885     72982  24% /boot
>> /dev/sda2             31738420    209144  29891048   1% /tmp
>> /dev/sda6             31738392   5959792  24140376  20% /var
>> /dev/sda7             31738392  30271700         0 100% /usr
>> /dev/sda8            124991068   4814828 113724540   5% /var/tmp
>> /dev/mapper/VolGroup00-home
>>                      198351840  15132624 172980856   9% /home
>> /dev/sde1            565688764 380484688 156005140  71% /scr
>> /dev/sdc1            961432072 818853280  93740792  90% /mnt/scr1
>> /dev/sdd1            961432072 885174524  27419548  97% /mnt/hdf
>> tmpfs                  8313848        16   8313832   1% /dev/shm
>> gumund:/data/ftp     480719072 285632320 170667552  63% /mnt/ftp
>> gumund:/data/web     480719072 285632320 170667552  63% /mnt/web
>> amani:/mnt/rw-src    288451232 179573408  94225344  66% /mnt/ro-src
>> amani:/mnt/rw-src    288451232 179573408  94225344  66% /mnt/rw-src
>> STANDARD_OPT=op-configure --prefix=${PWD}/opainstall
>> TEST_TYPES=standard
>>
>> Mon Apr  4 00:10:47 CDT 2011
>> *** starting standard tests in jam ***
>> Uname -a: Linux jam 2.6.18-194.3.1.el5PAE #1 SMP Thu May 13 13:48:44 EDT 2010 i686 i686 i386 GNU/Linux
>> Running snapshot with output saved in
>>    /mnt/scr1/SnapTest/snapshots-opa/log/jam_0404Mon_0010
>> PASSED jam: standard
>> *** finished standard tests for jam ***
>> Mon Apr  4 00:12:09 CDT 2011
>> Total time = 1m 22s
>>
>> *** finished tests in jam ***
>> Mon Apr  4 00:12:09 CDT 2011
>> jam: Ran 1(1/0/0) tests, Grand total test time =  1m 22s
>>
>>
>> ==============
>> Testing amani
>> ==============
>> ssh amani -n cd /home/hdftest/snapshots-opa;/mnt/scr1/SnapTest/snapshots-opa/bin/runtest -nodiff -norepo -configname amani
>> 00:10:40 up 48 days, 10:40,  3 users,  load average: 1.02, 1.15, 1.92
>> Filesystem           1K-blocks      Used Available Use% Mounted on
>> /dev/hda6             39021196  29788760   7218292  81% /
>> /dev/hda3             12192636   4142712   7420580  36% /var
>> /dev/hda2             12192636    162848  11400444   2% /tmp
>> /dev/hda1               101086     23784     72083  25% /boot
>> tmpfs                  2057868         0   2057868   0% /dev/shm
>> jam:/home            198351840  15132608 172980864   9% /home
>> jam:/mnt/hdf         961432096 885174528  27419552  97% /mnt/hdf
>> jam:/mnt/scr1        961432096 818853408  93740672  90% /mnt/scr1
>> smirom:/scr          267601440  37497024 216291712  15% /mnt/tmp
>> gumund:/data/ftp     480719072 285632320 170667552  63% /mnt/ftp
>> gumund:/data/web     480719072 285632320 170667552  63% /mnt/web
>> /dev/sdb1            144221592  75309328  61586224  56% /scr
>> /dev/sda1            288451232 179573436  94225316  66% /mnt/rw-src
>> STANDARD_OPT=op-configure --prefix=${PWD}/opainstall
>> TEST_TYPES=standard
>>
>> Mon Apr  4 00:10:50 CDT 2011
>> *** starting standard tests in amani ***
>> Uname -a: Linux amani 2.6.18-194.32.1.el5 #1 SMP Wed Jan 5 17:52:25 EST 2011 x86_64 x86_64 x86_64 GNU/Linux
>> Running snapshot with output saved in
>>    /mnt/scr1/SnapTest/snapshots-opa/log/amani_0404Mon_0010
>> 	*************************************
>> 	Mon Apr  4 00:13:06 CDT 2011
>> 	****FAILED amani: standard****
>> 	*************************************
>> *** finished standard tests for amani ***
>> Mon Apr  4 00:13:06 CDT 2011
>> Total time = 2m 16s
>>
>> *** finished tests in amani ***
>> Mon Apr  4 00:13:06 CDT 2011
>> amani: Ran 1(0/1/0) tests, Grand total test time =  2m 16s
>>
>> ****SYSTEM ERROR amani: Abnormal exit from runtest ****
>>
>> 	*************************************
>> 	Mon Apr  4 00:22:56 CDT 2011
>> 	****SYSTEM ERROR amani: runtest command failed ****
>> 	*************************************
>>
>> ==============
>> Testing heiwa
>> ==============
>> ssh heiwa -n cd /home/hdftest/snapshots-opa;/mnt/scr1/SnapTest/snapshots-opa/bin/runtest -nodiff -norepo -configname heiwa
>> 00:10:36 up 30 days, 11:41,  5 users,  load average: 0.97, 1.03, 1.06
>> Filesystem           1K-blocks      Used Available Use% Mounted on
>> /dev/sdb8             35435232   2727740  30907452   9% /
>> tmpfs                  2008144         0   2008144   0% /dev/shm
>> /dev/sdb4               198337    106660     81437  57% /boot
>> /dev/sdb6             12385456   2122628   9633684  19% /tmp
>> /dev/sdb7             12385456   3940944   7815368  34% /var
>> /dev/sda6             82573108    686812  77691728   1% /scr
>> jam:/mnt/hdf         961432096 885174528  27419552  97% /mnt/hdf
>> jam:/mnt/scr1        961432096 818853600  93740480  90% /mnt/scr1
>> jam:/home            198351840  15132608 172980864   9% /home
>> STANDARD_OPT=op-configure --prefix=${PWD}/opainstall
>> TEST_TYPES=standard
>>
>> Mon Apr  4 00:10:46 CDT 2011
>> *** starting standard tests in heiwa ***
>> Uname -a: Linux heiwa 2.6.32.16-150.fc12.ppc64 #1 SMP Sat Jul 24 05:19:27 UTC 2010 ppc64 ppc64 ppc64 GNU/Linux
>> Running snapshot with output saved in
>>    /mnt/scr1/SnapTest/snapshots-opa/log/heiwa_0404Mon_0010
>> PASSED heiwa: standard
>> *** finished standard tests for heiwa ***
>> Mon Apr  4 00:16:23 CDT 2011
>> Total time = 5m 37s
>>
>> *** finished tests in heiwa ***
>> Mon Apr  4 00:16:23 CDT 2011
>> heiwa: Ran 1(1/0/0) tests, Grand total test time =  5m 37s
>>
>>
>> ==============
>> Testing duty
>> ==============
>> ssh duty -n cd /home/hdftest/snapshots-opa;/mnt/scr1/SnapTest/snapshots-opa/bin/runtest -nodiff -norepo -configname duty
>> 12:10AM  up 95 days,  8:25, 21 users, load averages: 0.08, 0.03, 0.08
>> Filesystem       1K-blocks      Used     Avail Capacity  Mounted on
>> /dev/aacd0s1a       507630     60986    406034    13%    /
>> devfs                    1         1         0   100%    /dev
>> /dev/aacd0s1h    188433016 123604522  49753854    71%    /data
>> /dev/aacd0s1g     32494668  24598546   5296550    82%    /local_home
>> /dev/aacd0s1f     12186190   4125638   7085658    37%    /usr
>> /dev/aacd0s1d       507630     50350    416670    11%    /var
>> /dev/aacd0s1e       253678       178    233206     0%    /var/tmp
>> procfs                   4         4         0   100%    /proc
>> jam:/home        198351840  15132624 172980856     8%    /home
>> jam:/mnt/hdf     961432072 885174524  27419548    97%    /mnt/hdf
>> jam:/mnt/scr1    961432072 818853640  93740432    90%    /mnt/scr1
>> gumund:/data/web 480719056 285632312 170667544    63%    /mnt/web
>> gumund:/data/ftp 480719056 285632312 170667544    63%    /mnt/ftp
>> devfs                    1         1         0   100%    /var/named/dev
>> /dev/md0            126702        98    116468     0%    /tmp
>> STANDARD_OPT=op-configure --prefix=${PWD}/opainstall
>> TEST_TYPES=standard
>>
>> Mon Apr  4 00:10:54 CDT 2011
>> *** starting standard tests in duty ***
>> Uname -a: FreeBSD duty.hdfgroup.uiuc.edu 6.3-STABLE FreeBSD 6.3-STABLE #1: Fri Jul 25 17:10:59 CDT 2008     sukoziol at duty.hdfgroup.uiuc.edu:/usr/obj/usr/src/sys/DUTY  i386
>> Running snapshot with output saved in
>>    /mnt/scr1/SnapTest/snapshots-opa/log/duty_0404Mon_0010
>> PASSED duty: standard
>> *** finished standard tests for duty ***
>> Mon Apr  4 00:16:45 CDT 2011
>> Total time = 5m 51s
>>
>> *** finished tests in duty ***
>> Mon Apr  4 00:16:45 CDT 2011
>> duty: Ran 1(1/0/0) tests, Grand total test time =  5m 51s
>>
>>
>> ==============
>> Testing liberty
>> ==============
>> liberty does not accept Remote Command (Mon Apr  4 00:11:18 CDT 2011)
>> 	*************************************
>> 	Mon Apr  4 00:11:18 CDT 2011
>> 	****SYSTEM ERROR: liberty does not accept Remote Command (Mon Apr  4 00:11:18 CDT 2011)
>> 	*************************************
>> 	*************************************
>> 	Mon Apr  4 00:22:56 CDT 2011
>> 	****INCOMPLETE liberty: snaptest did not complete****
>> 	*************************************
>>
>> *** finished tests in jam ***
>> Mon Apr  4 00:22:56 CDT 2011
>> jam: Ran 6(0/0/0) hosts, Grand total test time =  12m 37s
>>
>> _______________________________________________
>> opa-nightly-tests mailing list
>> opa-nightly-tests at lists.mcs.anl.gov
>> https://lists.mcs.anl.gov/mailman/listinfo/opa-nightly-tests


