FW: [AVT] Acoustic Echo cancellation memo

Thu Oct 14 06:11:06 CDT 2004

Interesting...

--Ivan 

-----Original Message-----
From: avt-bounces at ietf.org [mailto:avt-bounces at ietf.org] On Behalf Of
Andre.Adrian at dfs.de
Sent: Thursday, October 14, 2004 2:26 AM
To: avt at ietf.org
Subject: [AVT] Acoustic Echo cancellation memo

Dear Members of Audio/Video Transport group,

attached you will find a memo about "Acoustic Echo Cancellation". This memo
was created while developing a Voice-over-IP prototype for intercom
communication between air traffic controllers at the German air traffic
control agency DFS.

Mr. Colin Perkins wrote me:
>We're grateful that you considered the IETF AVT working group as a 
>venue for this work.  Unfortunately, we don't have sufficient expertise 
>to effectively review it, and so cannot accept it as an AVT work item.
>If you have a paper on this subject, you're welcome to post a pointer 
>to the AVT mailing list to encourage uptake, though.

>I'm not sure what an appropriate venue for publication might be, 
>although the ITU-T has done related work in the past.

>Regards,
>Colin

As you can read in the memo, the algorithm and the implementation are
royalty free and should not be monopolized as intellectual property by DFS
or by others.
The software is currently implemented in kphone - a SIP softphone running on
Linux.

You can find the memo and the Kphone source file patches on
http://home.arcor.de/andreadrian/

With best regards,
  Andre Adrian
  Senior engineer

email work: <Andre.Adrian at dfs.de>
email home: <adrianandre at compuserve.de>

snail-mail:
DFS
Flughafen Frankfurt
Gebaeude 501
60549 Frankfurt
Germany

Tel: (++49) 69 69766 176
FAX: (++49) 69 69766 175

##############################################################################
Attachment:

   Draft                                                   Andre Adrian
   Document: draft-avt-aec-01.txt            DFS Deutsche Flugsicherung
   Category: Experimental
   October 11th, 2004
   Expires: ?

                 Voice over Internet Acoustic Echo Cancellation

Status of this Memo

  This document specifies an Acoustic Echo Cancellation implementation for
  hands-free Voice over Internet telephony and requests discussion and
  suggestions for improvements.
  Distribution of this memo is unlimited.

Copyright Notice

  You are allowed to use this source code in any open source or closed
  source software you want. You are allowed to use the algorithms for a
  hardware solution. You are allowed to modify the source code.
  You are not allowed to remove the name of the author from this memo or
  from the source code files. You are not allowed to monopolize the source
  code or the algorithms behind the source code as your intellectual
  property.
  This source code is free of royalty and comes with no warranty.

Abstract

  This document specifies an acoustic echo cancellation (AEC) implementation
  for voice over IP. Because of the large latency in VoIP communication
  (tens to hundreds of milliseconds), AEC is necessary. The presented
  implementation is based on the well-known Normalized Least Mean Square
  (NLMS) and Geigel Double Talk Detector (DTD) algorithms. To improve
  performance, a pre-whitening filter is used. The presented algorithm
  therefore belongs to the NLMS-pw family.
  The NLMS-pw family is known to give good echo cancellation for moderate
  processing resources. This algorithm is of complexity O(3*L), where L is
  the number of taps in the NLMS filter.

Table of Contents

  1.  INTRODUCTION
  2.  AEC PRINCIPLES
  3.  AEC algorithms
  3.1.  Infinite Impulse Response (IIR) Highpass Filter
  3.2.  Geigel Double Talk Detector
  3.3.  Normalized Least Mean Square - Pre-Whitening Filter
  4.  References

  A.  The C++ Source Code
  A.1 aec.h
  A.2 aec.cxx
  A.3 aec_test.cxx

1.  INTRODUCTION

  A hands-free telephone or full-duplex intercom system has a feedback or
  echo problem because the output from the loudspeaker feeds into the
  microphone. Several methods can be used to reduce or eliminate the
  problem:

  1.) Reduce the overall amplification. If the system amplification is less
  than 1, feedback dies away. This solution leads to poor volume.

  2.) Use Acoustic Echo Suppression. Echo suppression is realized with
  speech-activated switches. Suppression reduces the full-duplex telephone
  to half-duplex. The switches can even "switch away" the beginnings of
  words.

  3.) Use Acoustic Echo Cancellation. This is realized with an adaptive or
  learning filter. First the filter learns the acoustics from the given
  microphone and loudspeaker signals. After learning, the filter can
  calculate an estimated microphone signal from the loudspeaker signal.
  This estimated mic signal is subtracted from the real mic signal. The
  difference signal no longer contains the loudspeaker signal - the
  feedback loop is broken.

  The Least Mean Square (LMS) algorithm from Widrow and Hoff has been known
  since 1960. Unfortunately the LMS is a slow learner. The learning speed
  or convergence rate is controlled by a constant value. This value in the
  LMS can only be optimized for loud signals or for weak signals.
  Optimizing for loud signals produces slow convergence with weak signals.
  Optimizing for weak signals gives divergence with loud signals.
  Divergence can be defined as "the filter does not reduce the echo but
  increases it" and sounds very unpleasant.
  The Normalized LMS (NLMS) has a constant convergence rate for loud and
  weak signals, because the parameter controlling the convergence rate is
  derived from the signal energy.
  For a white noise signal, where all frequencies have the same energy, the
  NLMS performs well. But human speech has more energy in low frequencies
  than in high frequencies. Therefore, an NLMS gives good echo cancellation
  for low frequencies and poor echo cancellation for high frequencies.
  A pre-whitening filter in front of the echo cancellation filter
  transforms human speech into something more like white noise - the energy
  of high frequency signals is similar to the energy of low frequency
  signals.
  The presented algorithm uses the simplest pre-whitening filter possible,
  a first order or one pole highpass filter with transfer frequency equal
  to half of the sample frequency (4kHz for the narrowband sample frequency
  of 8kHz).
  Because the pre-whitening filter is fixed, the complexity of this NLMS-pw
  filter is still the same as for the NLMS filter.
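
  The statement that the NLMS adapts equally fast for loud and weak signals
  can be checked with a small sketch. It is not part of the memo's source
  code; the function names lmsDelta and nlmsDelta and the test values are
  illustrative assumptions. The sketch computes the weight change of a
  single update step for a fixed-step LMS and for an NLMS whose step is
  divided by the signal energy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Weight change of one plain LMS step with fixed step size mu:
// dW = 2 * mu * e * X
inline std::vector<double> lmsDelta(const std::vector<double>& x, double e,
                                    double mu) {
  std::vector<double> dw(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) dw[i] = 2.0 * mu * e * x[i];
  return dw;
}

// Weight change of one NLMS step: the step size is divided by the
// signal energy X' * X, so the result does not depend on signal level.
inline std::vector<double> nlmsDelta(const std::vector<double>& x, double e,
                                     double mu) {
  double energy = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) energy += x[i] * x[i];
  assert(energy > 0.0);  // sketch only: caller must avoid all-zero input

  std::vector<double> dw(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) dw[i] = mu * e * x[i] / energy;
  return dw;
}
```

  If the loudspeaker vector and the error are both made four times louder,
  nlmsDelta returns the same weight change as before, while lmsDelta
  returns a change sixteen times larger. This is why a fixed LMS step
  cannot suit loud and weak signals at the same time.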

  One important point should be remembered: The AEC in your telephony device
  helps your telephony partner to hear no echo. Therefore AEC is an
  altruistic algorithm.

2.  AEC PRINCIPLES

  The core of the acoustic echo cancellation is described in the
  introduction. In addition to the NLMS-pw, three more blocks are used:

  1.) A highpass filter for the microphone signal. Telephone users are used
  to a frequency range between 300Hz and 3400Hz. Narrowband VoIP can give
  0Hz to 4000Hz. After hearing a VoIP signal with frequencies below 300Hz,
  testers complained about the bad quality. With a 300Hz cut-off filter the
  sound is band-limited as in a telephone.
  The highpass filter in use is a 6th order infinite impulse response (IIR)
  filter. An IIR filter was used because of its simplicity and low
  processing demand.

  2.) A double talk detector. The AEC filter should only learn if the signal
  from the microphone is determined by the loudspeaker signal only. If the
  local or near-end user is talking, the filter can no longer learn
  successfully. Detection of user talking is done by comparing the volume
  levels of loudspeaker and microphone.
  This implementation uses the well-known Geigel DTD.

  3.) An Acoustic Echo Suppressor (AES) or Non Linear Processor (NLP). If
  the double talk detector (DTD) detects "no talking", the microphone signal
  gets attenuated by 6dB. This is done to suppress echo artefacts.

  AEC block diagram. Sin is the microphone signal, Rout and Rin are the
  loudspeaker signal. Sout is the echo-cancelled microphone signal:

            +--+         +            +---+
  Sin -->---|HP|--+------->(+)----+-->|NLP|--->-- Sout
            +--+  |        /|\    |   +---+
                  |        -|     |
                 \|/        |     |
                +---+     +----+  |
                |DTD|---->|NLMS|<-+
                +---+     +----+
                 /|\       /|\
                  |         |
                  |         |
  Rout -<---------+---------+-----------------<-- Rin

  Figure 1.) AEC block diagram

3.  AEC algorithms

  This chapter gives the mathematical background to the source code. This
  document will not give derivations of the algorithms or proofs. See
  references for more information.

3.1.  Infinite Impulse Response (IIR) Highpass Filter

  IIR lowpass filters are also known as "exponential smoothing". The
  traditional form of exponential smoothing is:

    y[n+1] = (1-alpha) * y[n] + alpha * x[n+1]

  where x[n+1] is the current measurement value,
  y[n] is the previous smoothed or lowpass-filtered value,
  y[n+1] is the current smoothed value,
  alpha is the smoothing constant or slowly changing variable, determining
  the transfer frequency.

  After a little algebra the exponential smoothing formula looks like this:

    y[n+1] = y[n] + alpha*(x[n+1] - y[n])
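
  As a minimal sketch (not part of the memo's source code), the incremental
  form of exponential smoothing can be written as a small C++ helper. The
  names ExpSmoother and step are illustrative assumptions; the single state
  variable y holds the previous smoothed value and is updated in place.

```cpp
#include <cassert>

// Hypothetical helper illustrating exponential smoothing (one-pole lowpass).
// alpha must lie in (0, 1]; larger values follow the input faster.
struct ExpSmoother {
  float alpha;
  float y;  // previous smoothed value y[n]

  explicit ExpSmoother(float alpha_) : alpha(alpha_), y(0.0f) {
    assert(alpha_ > 0.0f && alpha_ <= 1.0f);
  }

  // Incremental form: y[n+1] = y[n] + alpha * (x[n+1] - y[n])
  float step(float x) {
    y += alpha * (x - y);
    return y;
  }
};
```

  With alpha = 0.5 and a constant input of 1.0, successive calls to step()
  return 0.5, 0.75, 0.875 and so on, approaching the input value.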

  To move from lowpass to highpass we use the following assumption:

    highpass = signal - lowpass

  In this formula "highpass", "signal" and "lowpass" are rather abstract
  things. The implementation uses the following formulas:

    lowpassf[i+1] += AlphaHp*(highpassf[i] - lowpassf[i+1])
    highpassf[i+1] = highpassf[i] - lowpassf[i+1]

  where highpassf[i] is the "highpassed" value from the previous filter
  stage,
  highpassf[i+1] is the "highpassed" value of this filter stage,
  lowpassf[i+1] is the "lowpassed" value of this filter stage,
  AlphaHp is a constant that determines the transfer frequency.
  Attention: The index i refers to filter stage and should not be confused
  with the index n above which refers to time.

  The above two formulas give an attenuation of 3dB below the transfer
  frequency. To get steeper filters, we use 12 stages. The signal
  to be "highpassed" is fed in as highpassf[0]. The result is in
  highpassf[12]. Because the filter attenuates the signal above the
  transfer frequency, an amplification constant of 1.45 or 3.2dB is used.
  The value of AlphaHp for a 300Hz highpass filter was found empirically.
  A nice feature of this approach is that a single AlphaHp constant serves
  all stages of the 6th order filter.
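
  The gain figure quoted above (1.45 or 3.2dB) follows from the usual
  conversion between an amplitude ratio and decibels. The helper names
  below are illustrative assumptions and do not appear in the memo's
  source code.

```cpp
#include <cassert>
#include <cmath>

// Convert a linear amplitude ratio to decibels: dB = 20 * log10(ratio).
inline double amplitudeToDb(double ratio) {
  assert(ratio > 0.0);
  return 20.0 * std::log10(ratio);
}

// Convert decibels back to a linear amplitude ratio: ratio = 10^(dB / 20).
inline double dbToAmplitude(double db) {
  return std::pow(10.0, db / 20.0);
}
```

  amplitudeToDb(1.45) gives about 3.2dB, and dbToAmplitude(-3.0) and
  dbToAmplitude(-6.0) give about 0.71 and 0.50, the values used for the M3dB
  and M6dB constants in the appendix.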

3.2.  Geigel Double Talk Detector

  Talk detection can be done with a threshold for the microphone signal
  only. This approach is very sensitive to the threshold level. A more
  robust approach is to compare the microphone level with the loudspeaker
  level. The threshold in this solution is a relative one. Because we deal
  with echo, it is not sufficient to compare only the current levels; we
  have to consider previous levels, too.
  The Geigel DTD combines these ideas in one simple formula: The last L
  levels (index 0 for now and index L-1 for L-1 samples ago) of the
  loudspeaker signal are compared to the current microphone signal. To
  avoid problems with phase, the absolute values are used.
  Double talk is declared if:

    |d| >= c * max(|x[0]|, |x[1]|, .., |x[L-1]|)

  where |d| is the absolute level of the current microphone signal,
  c is a threshold value (typical value 0.5 for -6dB or 0.71 for -3dB),
  |x[0]| is the absolute level of the current loudspeaker signal,
  |x[L-1]| is the absolute level of the loudspeaker signal L-1 samples ago.
  See references 3, 7, 9.
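
  The Geigel test above can be sketched directly, without the block-wise
  maximum optimization and the hangover timer used in the appendix. The
  class name GeigelSketch and its interface are illustrative assumptions,
  not the memo's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <deque>

// Illustrative, unoptimized Geigel detector. Keeps the last L absolute
// loudspeaker levels and compares the microphone level to their maximum.
class GeigelSketch {
  std::deque<float> levels_;  // |x[0]| .. |x[L-1]|, newest at the front
  std::size_t len_;           // L, number of loudspeaker samples considered
  float c_;                   // threshold, e.g. 0.5 (-6dB) or 0.71 (-3dB)

public:
  GeigelSketch(std::size_t len, float c) : len_(len), c_(c) {
    assert(len > 0);
  }

  // d: current microphone sample, x: current loudspeaker sample.
  // Returns true if |d| >= c * max(|x[0]|, .., |x[L-1]|).
  bool doubleTalk(float d, float x) {
    levels_.push_front(std::fabs(x));
    if (levels_.size() > len_) levels_.pop_back();
    float maxX = *std::max_element(levels_.begin(), levels_.end());
    return std::fabs(d) >= c_ * maxX;
  }
};
```

  Note that with a silent loudspeaker the maximum is zero, so this bare
  formula declares double talk for any microphone level. The appendix code
  adds a separate update threshold for that situation.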

3.3.  Normalized Least Mean Square - Pre-Whitening Filter

  The NLMS-pw, NLMS and LMS belong to the family of gradient-descent based
  algorithms. The good features of gradient-descent based algorithms are
  simplicity and robustness.
  First we look at the "echo cancelling" formula, the convolution. This
  formula is used to subtract the microphone signal estimated from the
  loudspeaker signal from the real microphone signal.

    e = d - X' * W

  where e is the linear error signal or echo-cancelled microphone signal,
  d is the desired signal or the microphone signal with echo,
  X' is the transpose of the loudspeaker signals vector,
  W is the adaptive weights vector.

  With a matching vector W the echo cancellation can be perfect.
  Unfortunately, learning the vector W has limitations. The loudspeaker
  is not the only audio source during filter learning. Ambient sounds and
  noises, system internal amplifier and converter noises and
  non-linearities of loudspeaker and microphone have a negative impact on
  learning.
  Due to the simplicity of the LMS, all elements of W are updated with the
  same "mikro * e" term. This simple approach makes the LMS robust and
  demands only moderate processing resources, but this "one term fits all"
  approach prevents "perfect" learning, too.

  The LMS algorithm has the update formula:

    W[n+1] = W[n] + 2*mikro*e*X[n]

  where W[n+1] is the new adaptive weights vector,
  W[n] is the previous adaptive weights vector,
  mikro is the step size constant or variable,
  e is the error signal,
  X[n] is the loudspeaker signals vector.

  The constant scalar mikro becomes a variable in NLMS. This variable is
  calculated from the loudspeaker signals vector with:

              1
    mikro = ------
            X' * X

  where X' is the transpose of the loudspeaker signals vector,
  X is the loudspeaker signals vector.
  Note: The vector dot product is a scalar. It is the sum of the
  element-wise multiplication of both vectors.

  The constant value 2 in the LMS formula changes into a stability "tuning"
  constant. For stable adaptation this constant should be between 0 and 2;
  this NLMS-pw uses a value of 0.5.
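
  The following toy example sketches a plain NLMS (without pre-whitening,
  double talk detection or the optimizations of the appendix) that learns a
  known, simulated echo path from pseudo-random noise. The function name
  nlmsIdentify, the small random generator and the noise-free setup are
  assumptions chosen for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy NLMS system identification: learn a known echo path from noise.
// Returns the learned weights W after the given number of samples.
inline std::vector<double> nlmsIdentify(const std::vector<double>& echoPath,
                                        std::size_t samples, double mu) {
  assert(!echoPath.empty() && mu > 0.0 && mu < 2.0);
  const std::size_t L = echoPath.size();
  std::vector<double> w(L, 0.0);  // adaptive weights W
  std::vector<double> x(L, 0.0);  // loudspeaker history X, newest first
  std::uint32_t state = 12345u;   // small LCG as reproducible noise source

  for (std::size_t n = 0; n < samples; ++n) {
    // Shift the history and insert a new pseudo-random sample in [-1, 1).
    for (std::size_t i = L - 1; i > 0; --i) x[i] = x[i - 1];
    state = state * 1664525u + 1013904223u;
    x[0] = (state >> 8) / 8388608.0 - 1.0;

    // Simulated microphone signal d, echo estimate y and energy X' * X.
    double d = 0.0, y = 0.0, energy = 0.0;
    for (std::size_t i = 0; i < L; ++i) {
      d += echoPath[i] * x[i];
      y += w[i] * x[i];
      energy += x[i] * x[i];
    }
    double e = d - y;  // e = d - X' * W

    // NLMS update: W[n+1] = W[n] + (mu / (X' * X)) * e * X[n]
    if (energy > 1e-12) {
      double step = mu * e / energy;
      for (std::size_t i = 0; i < L; ++i) w[i] += step * x[i];
    }
  }
  return w;
}
```

  In this idealized setting, with no ambient noise and a white input, the
  weights approach the simulated echo path closely. Real rooms add the
  disturbances listed above, so the result there is less exact.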

  For the weights vector update and the calculation of mikro, the NLMS-pw
  uses highpass-filtered values of e and X. The filtered values are used
  because the NLMS converges best with white noise signals, and human voice
  is not white noise. The fixed highpass filter approach used in this
  NLMS-pw does not increase the overall complexity.

  With

    ef = highpass(e)
    Xf = highpass(X)

  we get our NLMS-pw weights vector update formulas:

              0.5
    mikro = --------
            Xf' * Xf

    W[n+1] = W[n] + mikro*ef*Xf[n]

  where ef is the highpass-filtered value of e,
  Xf is the highpass-filtered value of X,
  and the other values are as above.
  Both filters are first-order FIR with a transfer frequency of 4000Hz.
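
  The behaviour of the pre-whitening highpass can be examined with a small
  sketch that uses the same coefficient formulas as FIR1::init() in
  Appendix A.2. The struct name PreWhitenSketch and the use of double
  precision are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cmath>

// Sketch of the first-order pre-whitening highpass. alpha is the transfer
// frequency divided by the sample frequency (4000/8000 = 0.5 in the memo).
struct PreWhitenSketch {
  double a0, a1, b1;
  double lastIn, lastOut;

  explicit PreWhitenSketch(double alpha) : lastIn(0.0), lastOut(0.0) {
    assert(alpha > 0.0);
    const double pi = 3.14159265358979323846;
    double x = std::exp(-2.0 * pi * alpha);
    a0 = (1.0 + x) / 2.0;
    a1 = -(1.0 + x) / 2.0;
    b1 = x;
  }

  // out[n] = a0 * in[n] + a1 * in[n-1] + b1 * out[n-1]
  double highpass(double in) {
    double out = a0 * in + a1 * lastIn + b1 * lastOut;
    lastIn = in;
    lastOut = out;
    return out;
  }
};
```

  Because a0 + a1 = 0, a constant (DC) input decays to zero. A signal that
  alternates between +1 and -1, i.e. at half the sample frequency, settles
  to an output amplitude of 1. Low frequencies are therefore attenuated
  relative to high frequencies, which is the desired whitening effect for
  speech.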

  For other pre-whitening algorithms see references 6, 8, 9. For non-LMS
  echo cancellation algorithms see references 6 and 9.

4.  References

  [1] B. Widrow, M. E. Hoff Jr., "Adaptive switching circuits", Western
      Electric Show and Convention Record, Part 4, pages 96-104, Aug. 1960

  [2] B. Widrow, et al, "Stationary and Nonstationary Learning
      Characteristics of the LMS Adaptive Filter", Proc. of the IEEE,
      vol. 64, No. 8, pp. 1151-1162, Aug. 1976

  [3] D.L. Duttweiler, "A twelve-channel digital echo canceller", IEEE
      Trans. Commun., Vol. 26, pp. 647-653, May 1978

  [4] B. Widrow, S.D. Stearns, Adaptive Signal Processing, Prentice-Hall,
      1985

  [5] D. Messerschmitt, D. Hedberg, C. Cole, A. Haoui, P. Winship, "Digital
      Voice Echo Canceller with a TMS32020", Application report SPRA129,
      Texas Instruments, 1989

  [6] R. Storn, "Echo Cancellation Techniques for Multimedia Applications
      - a Survey", TR-96-046, International Computer Science Institute,
      Berkeley, Nov. 1996

  [7] J. Nikolic, "Implementing a Line Echo Canceller using the block update
      and NLMS algorithms on the TMS320C54x DSP", Application report
      SPRA188, Texas Instruments, Apr. 1997

  [8] M. G. Siqueira, "Adaptive Filtering Algorithms in Acoustic Echo
      Cancellation and Feedback Reduction", Ph.D. thesis, University of
      California, Los Angeles, 1998

  [9] T. Gaensler, S. L. Gay, M. M. Sondhi, J. Benesty, "Double-Talk robust
      fast converging algorithms for network echo cancellation", IEEE trans.
      on speech and audio processing, vol. 8, No. 6, Nov. 2000

  [10] M. Hutson, "Acoustic Echo Cancellation using Digital Signal
      Processing", Bachelor of Engineering (Honours) thesis, The School of
      Information Technology and Electrical Engineering, The University of
      Queensland, Nov 2003

  [11] A. Adrian, "Audio Echo Cancellation", Free Software/Open Source
      Telephony Summit 2004, German Unix User Group, Geilenkirchen, Germany,
      Jan. 16-20, 2004

Appendix A. The C++ Source Code

/***************************************************************
A.1 aec.h
***************************************************************/

#ifndef _AEC_H    /* include only once */

/* aec.h
 * Acoustic Echo Cancellation NLMS-pw algorithm
 * Author: Andre Adrian, DFS Deutsche Flugsicherung
 * <Andre.Adrian at dfs.de>
 *
 * Version 1.1
 */

/* dB Values */
const float M0dB = 1.0f;
const float M3dB = 0.71f;
const float M6dB = 0.50f;

/* dB values for 16bit PCM */
const float M10dB_PCM = 10362.0f;
const float M20dB_PCM = 3277.0f;
const float M25dB_PCM = 1843.0f;
const float M30dB_PCM = 1036.0f;
const float M35dB_PCM = 583.0f;
const float M40dB_PCM = 328.0f;
const float M45dB_PCM = 184.0f;
const float M50dB_PCM = 104.0f;
const float M55dB_PCM = 58.0f;
const float M60dB_PCM = 33.0f;

const float MAXPCM = 32767.0f;

/* Design constants (change to fine-tune the algorithms) */

/* For Normalized Least Mean Square - Pre-whitening */
#define NLMS_LEN  (240*8)             /* maximum NLMS filter length in taps */
const float PreWhiteAlphaTF = (4000.0f/8000.0f);   /* FIR controls Transfer Frequency */

/* for Geigel Double Talk Detector */
const float GeigelThreshold = M3dB;
const int Thold = 30*8;                           /* DTD hangover in taps */
const float UpdateThreshold = M30dB_PCM;

/* for Non Linear Processor */
const float NLPAttenuation = M0dB;

/* Below this line there are no more design constants */

/* Exponential Smoothing or IIR Infinite Impulse Response Filter */
class IIR_HP {
  float lowpassf;
  float alphaTF;  /* controls Transfer Frequency */

public:
  IIR_HP() {
    lowpassf = 0.0f;
    alphaTF = 0.0f;
  }

  void init(float alphaTF_) {
    alphaTF = alphaTF_;
  }

  float highpass(float in) {
    /* Highpass = Signal - Lowpass. Lowpass = Exponential Smoothing */
    lowpassf += alphaTF*(in - lowpassf);
    return in - lowpassf;
  }
};

#define POL       6           /* -6dB attenuation per octave per Pol */

class IIR_HP6 {
  float lowpassf[2*POL+1];
  float highpassf[2*POL+1];

public:
  IIR_HP6();
  float highpass(float in) {
    const float AlphaHp6 = 0.075; /* controls Transfer Frequency */
    const float Gain6   = 1.45f;  /* gain to undo filter attenuation */

    highpassf[0] = in;
    int i;
    for (i = 0; i < 2*POL; ++i) {
      /* Highpass = Signal - Lowpass. Lowpass = Exponential Smoothing */
      lowpassf[i+1] += AlphaHp6*(highpassf[i] - lowpassf[i+1]);
      highpassf[i+1] = highpassf[i] - lowpassf[i+1];
    }
    return Gain6*highpassf[2*POL];
  }
};

/* Recursive single pole FIR Finite Impulse response filter */
class FIR1 {
  float a0, a1, b1;
  float last_in, last_out;

public:
  FIR1();
  void init(float preWhiteTransferAlpha);
  float highpass(float in)  {
    float out = a0 * in + a1 * last_in + b1 * last_out;
    last_in = in;
    last_out = out;

    return out;
  }
};

#define NLMS_EXT  (10*8)    // Extension in taps to reduce mem copies
#define DTD_LEN 16          // block size in taps to optimize DTD calculation

class AEC {
  // Time domain Filters
  IIR_HP6 hp0;              // 300Hz cut-off Highpass
  IIR_HP hp1;               // DC-level remove Highpass
  FIR1 Fx, Fe;              // pre-whitening Highpass for x, e

  // Geigel DTD (Double Talk Detector)
  float max_max_x;                // max(|x[0]|, .. |x[L-1]|)
  int hangover;
  float max_x[NLMS_LEN/DTD_LEN];  // optimize: less calculations for max()
  int dtdCnt;
  int dtdNdx;

  // NLMS-pw
  float x[NLMS_LEN+NLMS_EXT];         // tap delayed loudspeaker signal
  float xf[NLMS_LEN+NLMS_EXT];    // pre-whitening tap delayed signal
  float w[NLMS_LEN];                    // tap weights
  int j;                                    // optimize: less memory copies
  int lastupdate;                     // optimize: iterative dotp(x,x)
  double dotp_xf_xf;              // double to avoid loss of precision

public:
  AEC();

/* Geigel Double-Talk Detector
 *
 * in d: microphone sample (PCM as floating point value)
 * in x: loudspeaker sample (PCM as floating point value)
 * return: 0 for no talking, 1 for talking  */
  int dtd(float d, float x);

/* Normalized Least Mean Square Algorithm pre-whitening (NLMS-pw)
 * The LMS algorithm was developed by Bernard Widrow
 * book: Widrow/Stearns, Adaptive Signal Processing, Prentice-Hall, 1985
 *
 * in mic: microphone sample (PCM as floating point value)
 * in spk: loudspeaker sample (PCM as floating point value)
 * in update: 0 for convolve only, 1 for convolve and update
 * return: echo cancelled microphone sample  */
  float nlms_pw(float mic, float spk, int update);

/* Acoustic Echo Cancellation and Suppression of one sample
 * in   d:  microphone signal with echo
 * in   x:  loudspeaker signal
 * return:  echo cancelled microphone signal  */
  int doAEC(int d, int x);
};

#define _AEC_H
#endif

/***************************************************************
A.2 aec.cxx
***************************************************************/
/* aec.cxx
 * Acoustic Echo Cancellation NLMS-pw algorithm
 * Author: Andre Adrian, DFS Deutsche Flugsicherung
 * <Andre.Adrian at dfs.de>
 *
 * Version 1.1
 */

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <string.h>
#include "aec.h"

IIR_HP6::IIR_HP6()
{
  memset(this, 0, sizeof(IIR_HP6));
}

/* Vector Dot Product */
float dotp(float a[], float b[]) {
  float sum0 = 0.0, sum1 = 0.0;
  int j;

  for (j = 0; j < NLMS_LEN; j+= 2) {
    // optimize: partial loop unrolling
    sum0 += a[j] * b[j];
    sum1 += a[j+1] * b[j+1];
  }
  return sum0+sum1;
}

/*
 * Algorithm:  Recursive single pole FIR high-pass filter
 *
 * Reference: The Scientist and Engineer's Guide to Digital Signal Processing */

FIR1::FIR1()
{
}

void FIR1::init(float preWhiteTransferAlpha) {
  float x = exp(-2.0 * M_PI * preWhiteTransferAlpha);

  a0 = (1.0f + x) / 2.0f;
  a1 = -(1.0f + x) / 2.0f;
  b1 = x;
  last_in = 0.0f;
  last_out = 0.0f;
}

AEC::AEC()
{
  hp1.init(0.01f);  /* 10Hz */
  Fx.init(PreWhiteAlphaTF);
  Fe.init(PreWhiteAlphaTF);

  max_max_x = 0.0f;
  hangover = 0;
  memset(max_x, 0, sizeof(max_x));
  dtdCnt = dtdNdx = 0;

  memset(x, 0, sizeof(x));
  memset(xf, 0, sizeof(xf));
  memset(w, 0, sizeof(w));
  j = NLMS_EXT;
  lastupdate = 0;
  dotp_xf_xf = 0.0f;
}

float AEC::nlms_pw(float mic, float spk, int update) {
  float d = mic;                // desired signal
  x[j] = spk;
  xf[j] = Fx.highpass(spk);     // pre-whitening of x

  // calculate error value (mic signal - estimated mic signal from spk signal)
  float e = d - dotp(w, x + j);
  float ef = Fe.highpass(e);    // pre-whitening of e
  if (update) {
    if (lastupdate) {
      // optimize: iterative dotp(xf, xf)
      dotp_xf_xf += (xf[j]*xf[j] - xf[j+NLMS_LEN-1]*xf[j+NLMS_LEN-1]);
    } else {
      dotp_xf_xf = dotp(xf+j, xf+j);
    }

    // calculate variable step size
    float mikro_ef = 0.5f * ef / dotp_xf_xf;

    // update tap weights (filter learning)
    int i;
    for (i = 0; i < NLMS_LEN; i += 2) {
      // optimize: partial loop unrolling
      w[i] += mikro_ef*xf[i+j];
      w[i+1] += mikro_ef*xf[i+j+1];
    }
  }
  lastupdate = update;

  if (--j < 0) {
    // optimize: decrease number of memory copies
    j = NLMS_EXT;
    memmove(x+j+1, x, (NLMS_LEN-1)*sizeof(float));
    memmove(xf+j+1, xf, (NLMS_LEN-1)*sizeof(float));
  }

  return e;
}

int AEC::dtd(float d, float x)
{
  // optimized implementation of max(|x[0]|, |x[1]|, .., |x[L-1]|):
  // calculate max of block (DTD_LEN values)
  x = fabsf(x);
  if (x > max_x[dtdNdx]) {
    max_x[dtdNdx] = x;
    if (x > max_max_x) {
      max_max_x = x;
    }
  }
  if (++dtdCnt >= DTD_LEN) {
    dtdCnt = 0;
    // calculate max of max
    max_max_x = 0.0f;
    for (int i = 0; i < NLMS_LEN/DTD_LEN; ++i) {
      if (max_x[i] > max_max_x) {
        max_max_x = max_x[i];
      }
    }
    // rotate Ndx
    if (++dtdNdx >= NLMS_LEN/DTD_LEN) dtdNdx = 0;
    max_x[dtdNdx] = 0.0f;
  }

  // The Geigel DTD algorithm with Hangover timer Thold
  if (fabsf(d) >= GeigelThreshold * max_max_x) {
    hangover = Thold;
  }

  if (hangover) --hangover;

  if (max_max_x < UpdateThreshold) {
    // avoid update with silence or noise
    return 1;
  } else {
    return (hangover > 0);
  }
}

int AEC::doAEC(int d, int x)
{
  float s0 = (float)d;
  float s1 = (float)x;

  // Mic Highpass Filter - telephone users are used to 300Hz cut-off
  s0 = hp0.highpass(s0);

  // Spk Highpass Filter - to remove DC
  s1 = hp1.highpass(s1);

  // Double Talk Detector
  int update = !dtd(s0, s1);

  // Acoustic Echo Cancellation
  s0 = nlms_pw(s0, s1, update);

  // Acoustic Echo Suppression
  if (update) {
    // Non Linear Processor (NLP): attenuate low volumes
    s0 *= NLPAttenuation;
  }

  // Saturation
  if (s0 > MAXPCM) {
    return (int)MAXPCM;
  } else if (s0 < -MAXPCM) {
    return (int)-MAXPCM;
  } else {
    return (int)roundf(s0);
  }
}

/***************************************************************
A.3 aec_test.cxx
***************************************************************/

/* aec_test.cxx
 * Test stub for Acoustic Echo Cancellation NLMS-pw algorithm
 * Author: Andre Adrian, DFS Deutsche Flugsicherung
 * <Andre.Adrian at dfs.de>
 *
 * Version 1.1
 */

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <string.h>

#include "aec.h"

#define TAPS      (80*8)

typedef signed short MONO;

typedef struct {
  signed short    l;
  signed short    r;
} STEREO;

/* Read a raw audio file (8KHz sample frequency, 16bit PCM, stereo)
 * from stdin, echo cancel it and write it to stdout  */
int main(int argc, char *argv[]) {
  STEREO inbuf[TAPS], outbuf[TAPS];

  fprintf(stderr, "usage: aec_test <in.raw >out.raw\n");

  AEC aec;

  int taps;
  while ((taps = (int)fread(inbuf, sizeof(STEREO), TAPS, stdin)) > 0) {
    int i;
    for (i = 0; i < taps; ++i) {
      int s0 = inbuf[i].l;    /* left channel microphone */
      int s1 = inbuf[i].r;    /* right channel speaker */

      /* and do NLMS*/
      s0 = aec.doAEC(s0, s1);

      /* copy back */
      outbuf[i].l = 0;            /* left channel silence */
      outbuf[i].r = (MONO)(s0);   /* right channel echo cancelled mic */
    }

    fwrite(outbuf, sizeof(STEREO), taps, stdout);
  }
  fflush(NULL);
  return 0;
}

/***************************************************************
A.4 Compile source code
***************************************************************/

On a Linux system with GNU C++ compiler enter:
  g++ aec_test.cxx aec.cxx -o aec_test -lm

/***************************************************************
A.5 Test source code
***************************************************************/

The microphone and loudspeaker signals have to be synchronized on a
sample-to-sample basis for acoustic echo cancellation to work.
An AC97-compliant on-board soundcard in a Personal Computer can be set to a
special stereo mode: The left channel records the microphone signal and the
right channel records the loudspeaker signal.

To set-up a Linux PC with ALSA sound system, microphone connected to Mic in
and loudspeaker connected to right Line out enter:

  amixer -q set 'Master',0 50% unmute
  amixer -q set 'PCM',0 80% unmute
  amixer -q set 'Line',0 0% mute
  amixer -q set 'CD',0 0% mute
  amixer -q set 'Mic',0 0% mute
  amixer -q set 'Video',0 0% mute
  amixer -q set 'Phone',0 0% mute
  amixer -q set 'PC Speaker',0 0% mute
  amixer -q set 'Aux',0 0% mute
  amixer -q set 'Capture',0 50%,0%
  amixer -q set 'Mic Boost (+20dB)',0 1
  amixer -q cset iface=MIXER,name='Capture Source' 0,5
  amixer -q cset iface=MIXER,name='Capture Switch' 1

To test the acoustic echo cancellation we simulate a real telephone
conversation in 5 steps:
  (1) record far-end speaker,
  (2) perform acoustic echo cancellation (this should change nothing)
  (3) playback far-end speaker and at the same time record near-end speaker
  (4) perform acoustic echo cancellation
  (5) playback near-end speaker (far-end speech should be cancelled)

To record 10 seconds of speech into the file b.raw enter:
  arecord -D plug:hw:0 -c 2 -t raw -f S16_LE -r 8000 -d 10 >b.raw

To perform AEC at the far-end enter:
  ./aec_test <b.raw >b1.raw

To playback file b1.raw and simultaneously record b2.raw enter both commands
in one go:
  aplay -D plug:hw:0 -c 2 -t raw -f S16_LE -r 8000 b1.raw &
  arecord -D plug:hw:0 -c 2 -t raw -f S16_LE -r 8000 -d 10 >b2.raw

To perform AEC at the near-end enter:
  ./aec_test <b2.raw >b3.raw

To playback the echo-cancelled near-end enter:
  aplay -D plug:hw:0 -c 2 -t raw -f S16_LE -r 8000 b3.raw

DFS Deutsche Flugsicherung GmbH
TWR-Süd, Gebäude 501
Frankfurt - Flughafen
D - 60549 Frankfurt

Tel.: +49-(0)69-69766-101
Fax: +49-(0)69-69766-105
Home Page: http://www.dfs.de

