FW: [AVT] Acoustic Echo cancellation memo
Ivan R. Judson
judson at mcs.anl.gov
Thu Oct 14 06:11:06 CDT 2004
Interesting...
--Ivan
-----Original Message-----
From: avt-bounces at ietf.org [mailto:avt-bounces at ietf.org] On Behalf Of
Andre.Adrian at dfs.de
Sent: Thursday, October 14, 2004 2:26 AM
To: avt at ietf.org
Subject: [AVT] Acoustic Echo cancellation memo
Dear Members of the Audio/Video Transport group,
attached you will find a memo about "Acoustic Echo Cancellation". This memo
was created while developing a Voice-over-IP prototype for intercom
communication between air traffic controllers by the German air traffic
control agency DFS.
Mr. Colin Perkins wrote me:
>We're grateful that you considered the IETF AVT working group as a
>venue for this work. Unfortunately, we don't have sufficient expertise
>to effectively review it, and so cannot accept it as an AVT work item.
>If you have a paper on this subject, you're welcome to post a pointer
>to the AVT mailing list to encourage uptake, though.
>I'm not sure what an appropriate venue for publication might be,
>although the ITU-T has done related work in the past.
>Regards,
>Colin
As you can read in the memo, the algorithm and the implementation are
royalty free and should not be monopolized as intellectual property by DFS
or by others.
The software is currently implemented in kphone - a SIP softphone running on
Linux.
You can find the memo and the Kphone source file patches on
http://home.arcor.de/andreadrian/
With best regards,
Andre Adrian
Senior engineer
email work: <Andre.Adrian at dfs.de>
email home: <adrianandre at compuserve.de>
snail-mail:
DFS
Flughafen Frankfurt
Gebaeude 501
60549 Frankfurt
Germany
Tel: (++49) 69 69766 176
FAX: (++49) 69 69766 175
############################################################################
Attachment:
Draft                                                       Andre Adrian
Document: draft-avt-aec-01.txt                DFS Deutsche Flugsicherung
Category: Experimental                                October 11th, 2004
Expires: ?
Voice over Internet Acoustic Echo Cancellation
Status of this Memo
This document specifies an Acoustic Echo Cancellation implementation for
hands-free Voice over Internet telephony and requests discussion and
suggestions for improvements.
Distribution of this memo is unlimited.
Copyright Notice
Copyright (C) DFS Deutsche Flugsicherung (2004). All Rights Reserved.
You are allowed to use this source code in any open source or closed source
software you want. You are allowed to use the algorithms for a hardware
solution. You are allowed to modify the source code.
You are not allowed to remove the name of the author from this memo or from
the source code files. You are not allowed to monopolize the source code or
the algorithms behind the source code as your intellectual property.
This source code is free of royalty and comes with no warranty.
Abstract
This document specifies an acoustic echo cancellation (AEC) implementation
for voice over IP. Because of the large latency in VoIP communication (tens
to hundreds of milliseconds), AEC is necessary. The presented implementation
is based on the well-known Normalized Least Mean Squares (NLMS) and Geigel
Double Talk Detector (DTD) algorithms. To improve performance, a
pre-whitening filter is used. The presented algorithm therefore belongs to
the NLMS-pw family.
The NLMS-pw family is known to give good echo cancellation for moderate
processing resources. This algorithm is of complexity O(3*L), where L is the
number of taps in the NLMS filter.
Table of Contents
1. INTRODUCTION
2. AEC PRINCIPLES
3. AEC algorithms
3.1. Infinite Impulse Response (IIR) Highpass Filter
3.2. Geigel Double Talk Detector
3.3. Normalized Least Mean Squares - Pre-Whitening Filter
4. References
A. The C++ Source Code
A.1 aec.h
A.2 aec.cxx
A.3 aec_test.cxx
1. INTRODUCTION
A hands-free telephone or full-duplex intercom system has a feedback or
echo problem because the output from the loudspeaker feeds into the
microphone. Several methods can be used to reduce or eliminate the
problem:
1.) Reduce the overall amplification. If the system amplification is less
than 1, feedback dies away. This solution leads to poor volume.
2.) Use Acoustic Echo Suppression. Echo suppression is realized with
speech-activated switches. Suppression reduces the full-duplex telephone to
half-duplex. The switches can even "switch away" the beginnings of words.
3.) Use Acoustic Echo Cancellation. This is realized with an adaptive or
learning filter. First the filter learns the acoustics from the given
microphone and loudspeaker signals. After learning, the filter can calculate
an estimated microphone signal from the loudspeaker signal. This estimated
microphone signal is subtracted from the real microphone signal. The
difference signal no longer contains the loudspeaker signal - the feedback
loop is broken.
The Least Mean Squares (LMS) algorithm by Widrow and Hoff has been known
since 1960. Unfortunately, the LMS is a slow learner. The learning speed or
convergence rate is controlled by a constant value. In the LMS this value
can be optimized either for loud signals or for weak signals, but not for
both. Optimizing for loud signals produces slow convergence with weak
signals. Optimizing for weak signals gives divergence with loud signals.
Divergence means that the filter does not reduce the echo but increases it,
which sounds very unpleasant.
The Normalized LMS (NLMS) has a constant convergence rate for loud and weak
signals because the parameter that controls the convergence rate is derived
from the signal energy.
For a white noise signal, where all frequencies have the same energy, the
NLMS performs well. Human speech, however, has more energy at low
frequencies than at high frequencies. Therefore, an NLMS gives good echo
cancellation for low frequencies and poor echo cancellation for high
frequencies.
A pre-whitening filter in front of the echo cancellation filter transforms
human speech into something more like white noise - the energy of high
frequency signals becomes similar to the energy of low frequency signals.
The presented algorithm uses the simplest pre-whitening filter possible, a
first-order (one-pole) highpass filter with a transfer frequency equal to
half of the sample frequency (4kHz for the narrowband sample frequency of
8kHz).
Because the pre-whitening filter is fixed, the complexity of this NLMS-pw
filter is still the same as for the NLMS filter.
One important point should be remembered: The AEC in your telephony device
keeps your telephony partner from hearing an echo. Therefore AEC is an
altruistic algorithm.
2. AEC PRINCIPLES
The core of the acoustic echo cancellation is described in the
introduction. In addition to the NLMS-pw filter, three more blocks are used:
1.) A highpass filter for the microphone signal. Telephone users are used
to a frequency range between 300Hz and 3400Hz. Narrowband VoIP can carry 0Hz
to 4000Hz. After hearing a VoIP signal with frequencies below 300Hz, testers
complained about the bad quality. With a 300Hz cut-off filter the sound is
band-limited as in a telephone.
The highpass filter in use is a 6th order infinite impulse response (IIR)
filter. An IIR filter was chosen because of its simplicity and low
processing demand.
2.) A double talk detector. The AEC filter should only learn while the
microphone signal is determined by the loudspeaker signal alone. If the
local or near-end user is talking, the filter can no longer learn
successfully. Near-end talking is detected by comparing the volume levels
of loudspeaker and microphone.
This implementation uses the well-known Geigel DTD.
3.) An Acoustic Echo Suppressor (AES) or Non Linear Processor (NLP). If the
double talk detector (DTD) detects "no talking", the microphone signal is
attenuated by 6dB. This is done to suppress echo artefacts. (The source code
in Appendix A sets NLPAttenuation to M0dB, which leaves the signal
unattenuated.)
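The 6dB figure corresponds to a linear amplitude factor of roughly 0.5. The following is a minimal sketch of that conversion and of the attenuation step. The names db_to_gain and apply_nlp are hypothetical and do not appear in the source code of Appendix A, which uses precomputed constants such as M6dB instead.

```cpp
#include <cassert>
#include <cmath>

// Convert a level in dB to a linear amplitude factor: gain = 10^(dB/20).
// Hypothetical helper, not part of aec.h.
inline double db_to_gain(double db) {
    return std::pow(10.0, db / 20.0);
}

// Attenuate the echo-cancelled sample while the DTD reports "no talking".
// near_end_talking is the DTD decision; attenuation_db must be <= 0.
inline double apply_nlp(double sample, bool near_end_talking,
                        double attenuation_db = -6.0) {
    assert(attenuation_db <= 0.0);
    return near_end_talking ? sample
                            : sample * db_to_gain(attenuation_db);
}
```

Calling apply_nlp(sample, false) scales the sample by about one half, while apply_nlp(sample, true) passes it through unchanged so that near-end speech is not attenuated.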
AEC block diagram. Sin is the microphone signal, Rin and Rout carry the
loudspeaker signal. Sout is the echo-cancelled microphone signal:
          +--+            +         +---+
Sin -->---|HP|--+------->(+)----+-->|NLP|--->-- Sout
          +--+  |        /|\    |   +---+
                |        -|     |
               \|/        |     |
              +---+     +----+  |
              |DTD|---->|NLMS|<-+
              +---+     +----+
               /|\       /|\
                |         |
                |         |
Rout -<---------+---------+-----------------<-- Rin
Figure 1.) AEC block diagram
3. AEC algorithms
This chapter gives the mathematical background to the source code. This
document will not give derivations of the algorithms or proofs. See
references for more information.
3.1. Infinite Impulse Response (IIR) Highpass Filter
IIR lowpass filters are also known as "exponential smoothing". The
traditional form of exponential smoothing is:
y[n+1] = (1-alpha) * y[n] + alpha * x[n+1]
where x[n+1] is the current measurement value,
y[n] is the previous smoothed or lowpass-filtered value,
y[n+1] is the current smoothed value,
alpha is the smoothing constant or slowly changing variable that determines
the transfer frequency.
After a little algebra the exponential smoothing formula looks like this:
y[n+1] = y[n] + alpha*(x[n+1] - y[n])
With a single state variable y that is updated in place, this becomes
y += alpha*(x - y).
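As an illustration of the smoothing formula, here is a minimal sketch of a one-pole lowpass with a single state variable. The class name ExpSmoother is hypothetical; the source code in Appendix A embeds the same update inside its highpass classes.

```cpp
#include <cassert>

// One-pole lowpass (exponential smoothing) with in-place state update:
//   y += alpha * (x - y)
// Hypothetical illustration; alpha is assumed to lie in (0, 1].
class ExpSmoother {
    double y_ = 0.0;   // previous smoothed value y[n]
    double alpha_;     // smoothing constant
public:
    explicit ExpSmoother(double alpha) : alpha_(alpha) {
        assert(alpha > 0.0 && alpha <= 1.0);
    }
    // Feed one measurement x[n+1] and return the smoothed value y[n+1].
    double update(double x) {
        y_ += alpha_ * (x - y_);
        return y_;
    }
};
```

With alpha = 0.5 and a constant input of 1.0, successive calls to update return 0.5, 0.75 and 0.875: the output approaches the input at a rate set by alpha.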
To move from lowpass to highpass we use the following assumption:
highpass = signal - lowpass
In this formula "highpass", "signal" and "lowpass" are rather abstract
things. The implementation uses the following formulas:
lowpassf[i+1] += AlphaHp*(highpassf[i] - lowpassf[i+1])
highpassf[i+1] = highpassf[i] - lowpassf[i+1]
where highpassf[i] is the "highpassed" value from the previous filter stage,
highpassf[i+1] is the "highpassed" value of this filter stage,
lowpassf[i+1] is the "lowpassed" value of this filter stage,
AlphaHp is a constant that determines the transfer frequency.
Attention: The index i refers to the filter stage and should not be confused
with the index n above, which refers to time.
The above two formulas give an attenuation of 3dB below the transfer
frequency. To get a steeper filter, we use 12 cascaded stages (2*POL with
POL = 6 in the source code). The signal to be "highpassed" is fed in as
highpassf[0]. The result is in highpassf[12]. Because the filter also
attenuates the signal above the transfer frequency, an amplification
constant of 1.45 or 3.2dB is used.
The value of AlphaHp for a 300Hz highpass filter was found empirically.
A nice feature of this approach is that a single AlphaHp constant serves all
stages of the 6th order filter.
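One way to check an empirically chosen AlphaHp is to feed sine waves through the cascade and compare output and input levels. The sketch below does this for the values used in Appendix A (12 stages, AlphaHp 0.075, gain 1.45). The names HighpassCascade and measure_gain are hypothetical, and the two-second test signal is an assumption of this sketch, not part of the memo.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Cascade of identical first-order highpass stages, each computing
// highpass = signal - lowpass as in the two formulas above.
class HighpassCascade {
    std::vector<double> lowpass_;  // one lowpass state per stage
    double alpha_;
    double gain_;
public:
    HighpassCascade(int stages, double alpha, double gain)
        : lowpass_(stages, 0.0), alpha_(alpha), gain_(gain) {
        assert(stages > 0);
    }
    double process(double in) {
        double hp = in;                 // highpassf[0]
        for (double& lp : lowpass_) {
            lp += alpha_ * (hp - lp);   // lowpassf[i+1]
            hp -= lp;                   // highpassf[i+1]
        }
        return gain_ * hp;
    }
};

// Steady-state amplitude gain for a sine of freq_hz (below half the
// sample rate): output RMS over input RMS, measured during the second
// of two seconds so that the start-up transient has died away.
double measure_gain(double freq_hz, double sample_rate = 8000.0) {
    const double kPi = 3.14159265358979323846;
    HighpassCascade filter(12, 0.075, 1.45);  // values from Appendix A
    const int n = static_cast<int>(sample_rate);
    double in_sq = 0.0, out_sq = 0.0;
    for (int i = 0; i < 2 * n; ++i) {
        const double x = std::sin(2.0 * kPi * freq_hz * i / sample_rate);
        const double y = filter.process(x);
        if (i >= n) {
            in_sq += x * x;
            out_sq += y * y;
        }
    }
    return std::sqrt(out_sq / in_sq);
}
```

Evaluating measure_gain at a few frequencies around 300Hz shows where the cascade starts to attenuate, so a different AlphaHp can be judged the same way.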
3.2. Geigel Double Talk Detector
Talk detection can be done with a threshold on the microphone signal alone.
This approach is very sensitive to the threshold level. A more robust
approach is to compare the microphone level with the loudspeaker level. The
threshold in this solution is a relative one. Because we deal with echo, it
is not sufficient to compare only the current levels; we have to consider
previous levels, too.
The Geigel DTD combines these ideas in one simple formula: The last L levels
(index 0 for now and index L-1 for L-1 samples ago) of the loudspeaker
signal are compared to the current microphone signal. To avoid problems
with phase, the absolute values are used.
Double talk is declared if:
|d| >= c * max(|x[0]|, |x[1]|, .., |x[L-1]|)
where |d| is the absolute level of the current microphone signal,
c is a threshold value (typical value 0.5 for -6dB or 0.71 for -3dB),
|x[0]| is the absolute level of the current loudspeaker signal,
|x[L-1]| is the absolute level of the loudspeaker signal L-1 samples ago.
See references [3], [7], [9].
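The Geigel test can be written down directly from the formula. The sketch below is the plain, unoptimized form: it scans all L loudspeaker samples on every call, whereas the implementation in Appendix A keeps block maxima to reduce the work. The function name geigel_double_talk is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Direct Geigel test: double talk is declared when
//   |d| >= c * max(|x[0]|, ..., |x[L-1]|)
// d: current microphone sample
// x: the last L loudspeaker samples (x[0] is the newest)
// c: relative threshold, e.g. 0.5 for -6dB
bool geigel_double_talk(double d, const std::vector<double>& x, double c) {
    assert(!x.empty());
    double max_abs_x = 0.0;
    for (double sample : x) {
        max_abs_x = std::max(max_abs_x, std::fabs(sample));
    }
    return std::fabs(d) >= c * max_abs_x;
}
```

Note that an all-zero loudspeaker history makes the test true for any microphone sample. The source code handles that case separately with UpdateThreshold and the hangover counter Thold, which this sketch leaves out.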
3.3. Normalized Least Mean Squares - Pre-Whitening Filter
The NLMS-pw, NLMS and LMS belong to the family of gradient-descent based
algorithms. The good features of gradient-descent based algorithms are
simplicity and robustness.
First we look at the "echo cancelling" formula, the convolution. This
formula subtracts the microphone signal estimated from the loudspeaker
signal from the real microphone signal.
e = d - X' * W
where e is the linear error signal or echo-cancelled microphone signal,
d is the desired signal or the microphone signal with echo,
X' is the transpose of the loudspeaker signals vector,
W is the adaptive weights vector.
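In code, the echo cancelling formula is a dot product followed by a subtraction. The following minimal sketch uses std::vector for X and W; the function name cancel_echo is hypothetical, and Appendix A implements the same step with plain arrays and a separate dotp function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// e = d - X' * W: subtract the estimated echo from the microphone sample.
// d: microphone sample with echo
// X: the last L loudspeaker samples
// W: the L adaptive weights
double cancel_echo(double d, const std::vector<double>& X,
                   const std::vector<double>& W) {
    assert(X.size() == W.size());
    double estimate = 0.0;  // estimated microphone signal X' * W
    for (std::size_t i = 0; i < X.size(); ++i) {
        estimate += X[i] * W[i];
    }
    return d - estimate;
}
```

For example, with X = (1, 2) and W = (0.5, 0.25) the estimated echo is 1, so a microphone sample d = 1 is cancelled to e = 0.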
With a matching vector W the echo cancellation can be perfect.
Unfortunately, learning the vector W has limitations. The loudspeaker is
not the only audio source during filter learning. Ambient sounds and noises,
system-internal amplifier and converter noises, and non-linearities of
loudspeaker and microphone have a negative impact on learning.
Due to the simplicity of the LMS, all elements of W are updated with the
same "mikro * e" term. This simple approach makes the LMS robust and keeps
its demand for processing resources moderate, but this "one term fits all"
approach prevents "perfect" learning, too.
The LMS algorithm has the update formula:
W[n+1] = W[n] + 2*mikro*e*X[n]
where W[n+1] is the new adaptive weights vector,
W[n] is the previous adaptive weights vector,
mikro is the step size constant or variable,
e is the error signal,
X[n] is the loudspeaker signals vector.
The constant scalar mikro becomes a variable in NLMS. This variable is
calculated from the loudspeaker signals vector with:
           1
mikro = --------
         X' * X
where X' is the transpose of the loudspeaker signals vector,
X is the loudspeaker signals vector.
Note: The vector dot product is a scalar. It is the sum of the element-wise
multiplication of both vectors.
The constant value 2 in the LMS formula changes into a stability "tuning"
constant. For stable adaptation this constant should be between 0 and 2;
this NLMS-pw uses a value of 0.5.
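To make the NLMS update concrete, the sketch below performs one iteration without pre-whitening and then uses it to learn a short, known echo path. It rests on several assumptions: the names nlms_step and identify_echo_path are hypothetical, the guard against zero signal energy is an addition of this sketch, and the loudspeaker signal is a simple pseudo-random sequence standing in for white noise.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One NLMS iteration: e = d - X'W, then W += (step / X'X) * e * X.
// Returns the error e. The zero-energy guard is an addition of this sketch.
double nlms_step(std::vector<double>& W, const std::vector<double>& X,
                 double d, double step = 0.5) {
    assert(W.size() == X.size());
    double estimate = 0.0, energy = 0.0;
    for (std::size_t i = 0; i < X.size(); ++i) {
        estimate += X[i] * W[i];
        energy += X[i] * X[i];
    }
    const double e = d - estimate;
    if (energy > 0.0) {
        const double mikro = step / energy;  // normalized step size
        for (std::size_t i = 0; i < X.size(); ++i) {
            W[i] += mikro * e * X[i];
        }
    }
    return e;
}

// Hypothetical harness: learn a known echo path h from a noise-like
// loudspeaker signal, with no near-end talk and no ambient noise.
std::vector<double> identify_echo_path(const std::vector<double>& h,
                                       int samples) {
    assert(!h.empty());
    std::vector<double> W(h.size(), 0.0), X(h.size(), 0.0);
    std::uint32_t state = 12345u;  // linear congruential generator state
    for (int n = 0; n < samples; ++n) {
        state = state * 1664525u + 1013904223u;
        const double spk = (state >> 8) / 8388608.0 - 1.0;  // in [-1, 1)
        // shift the tap delay line; X[0] is the newest sample
        for (std::size_t i = X.size() - 1; i > 0; --i) X[i] = X[i - 1];
        X[0] = spk;
        double mic = 0.0;  // simulated microphone: echo only
        for (std::size_t i = 0; i < h.size(); ++i) mic += h[i] * X[i];
        nlms_step(W, X, mic);
    }
    return W;
}
```

Under these idealized conditions the learned weights approach the echo path h. With near-end speech or noise in the microphone signal the weights no longer match exactly, which is why the update is gated by the double talk detector.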
For the weights vector update and the calculation of mikro, the NLMS-pw
uses highpass-filtered values of e and X. The filtered values are used
because the NLMS converges best with white noise signals, and human voice is
not white noise. The fixed highpass filter approach used in this NLMS-pw
does not increase the overall complexity.
With
ef = highpass(e)
Xf = highpass(X)
we get our NLMS-pw weights vector update formulas:
           0.5
mikro = ----------
         Xf' * Xf
W[n+1] = W[n] + mikro*ef*Xf[n]
where ef is the highpass-filtered value of e,
Xf is the highpass-filtered value of X,
and the other values are as above.
Both highpass filters are first-order filters with a transfer frequency of
4000Hz; the source code implements them as class FIR1.
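The behaviour of the pre-whitening filter can be checked numerically. The sketch below copies the coefficient formulas from FIR1::init in Appendix A and feeds the filter two extreme inputs: a constant (0Hz) and an alternating +1/-1 sequence (half the sample frequency). The names PreWhitener and settle are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// First-order highpass with the coefficient formulas of FIR1::init:
//   x = exp(-2*pi*alpha), a0 = (1+x)/2, a1 = -(1+x)/2, b1 = x
// alpha is the transfer frequency divided by the sample frequency.
class PreWhitener {
    double a0_, a1_, b1_;
    double last_in_ = 0.0, last_out_ = 0.0;
public:
    explicit PreWhitener(double alpha) {
        assert(alpha > 0.0 && alpha <= 0.5);
        const double kPi = 3.14159265358979323846;
        const double x = std::exp(-2.0 * kPi * alpha);
        a0_ = (1.0 + x) / 2.0;
        a1_ = -(1.0 + x) / 2.0;
        b1_ = x;
    }
    double highpass(double in) {
        const double out = a0_ * in + a1_ * last_in_ + b1_ * last_out_;
        last_in_ = in;
        last_out_ = out;
        return out;
    }
};

// Feed `samples` values of a constant 1 (alternate == false) or of an
// alternating +1/-1 sequence (alternate == true); return the last output.
double settle(bool alternate, int samples, double alpha = 0.5) {
    PreWhitener f(alpha);
    double out = 0.0;
    for (int n = 0; n < samples; ++n) {
        const double in = (alternate && (n % 2 == 1)) ? -1.0 : 1.0;
        out = f.highpass(in);
    }
    return out;
}
```

Because a0 and a1 cancel for a constant input, the output decays to zero, while the alternating sequence passes with an amplitude of one. The filter therefore removes the DC component and leaves the highest frequencies untouched, with a gradual slope in between.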
For other pre-whitening algorithms see references [6], [8], [9]. For non-LMS
echo cancellation algorithms see references [6] and [9].
4. References
[1] B. Widrow, M. E. Hoff Jr., "Adaptive switching circuits", Western
    Electronic Show and Convention Record, Part 4, pp. 96-104, Aug. 1960
[2] B. Widrow, et al., "Stationary and Nonstationary Learning
    Characteristics of the LMS Adaptive Filter", Proc. of the IEEE,
    vol. 64, no. 8, pp. 1151-1162, Aug. 1976
[3] D. L. Duttweiler, "A twelve-channel digital echo canceller", IEEE
    Trans. Commun., vol. 26, pp. 647-653, May 1978
[4] B. Widrow, S. D. Stearns, Adaptive Signal Processing, Prentice-Hall,
    1985
[5] D. Messerschmitt, D. Hedberg, C. Cole, A. Haoui, P. Winship, "Digital
    Voice Echo Canceller with a TMS32020", Application report SPRA129,
    Texas Instruments, 1989
[6] R. Storn, "Echo Cancellation Techniques for Multimedia Applications
    - a Survey", TR-96-046, International Computer Science Institute,
    Berkeley, Nov. 1996
[7] J. Nikolic, "Implementing a Line Echo Canceller using the block update
    and NLMS algorithms on the TMS320C54x DSP", Application report
    SPRA188, Texas Instruments, Apr. 1997
[8] M. G. Siqueira, "Adaptive Filtering Algorithms in Acoustic Echo
    Cancellation and Feedback Reduction", Ph.D. thesis, University of
    California, Los Angeles, 1998
[9] T. Gaensler, S. L. Gay, M. M. Sondhi, J. Benesty, "Double-Talk robust
    fast converging algorithms for network echo cancellation", IEEE Trans.
    on Speech and Audio Processing, vol. 8, no. 6, Nov. 2000
[10] M. Hutson, "Acoustic Echo Cancellation using Digital Signal
    Processing", Bachelor of Engineering (Honours) thesis, The School of
    Information Technology and Electrical Engineering, The University of
    Queensland, Nov. 2003
[11] A. Adrian, "Audio Echo Cancellation", Free Software/Open Source
    Telephony Summit 2004, German Unix User Group, Geilenkirchen, Germany,
    Jan. 16-20, 2004
Appendix A. The C++ Source Code
/***************************************************************
A.1 aec.h
***************************************************************/
#ifndef _AEC_H /* include only once */
/* aec.h
* Acoustic Echo Cancellation NLMS-pw algorithm
* Author: Andre Adrian, DFS Deutsche Flugsicherung
* <Andre.Adrian at dfs.de>
*
* Version 1.1
*/
/* dB Values */
const float M0dB = 1.0f;
const float M3dB = 0.71f;
const float M6dB = 0.50f;
/* dB values for 16bit PCM */
const float M10dB_PCM = 10362.0f;
const float M20dB_PCM = 3277.0f;
const float M25dB_PCM = 1843.0f;
const float M30dB_PCM = 1036.0f;
const float M35dB_PCM = 583.0f;
const float M40dB_PCM = 328.0f;
const float M45dB_PCM = 184.0f;
const float M50dB_PCM = 104.0f;
const float M55dB_PCM = 58.0f;
const float M60dB_PCM = 33.0f;
const float MAXPCM = 32767.0f;
/* Design constants (change to fine-tune the algorithms) */
/* For Normalized Least Means Square - Pre-whitening */
#define NLMS_LEN (240*8) /* maximum NLMS filter length in taps */
const float PreWhiteAlphaTF = (4000.0f/8000.0f); /* FIR controls Transfer Frequency */
/* for Geigel Double Talk Detector */
const float GeigelThreshold = M3dB;
const int Thold = 30*8; /* DTD hangover in taps */
const float UpdateThreshold = M30dB_PCM;
/* for Non Linear Processor */
const float NLPAttenuation = M0dB;
/* Below this line there are no more design constants */
/* Exponential Smoothing or IIR Infinite Impulse Response Filter */
class IIR_HP {
float lowpassf;
float alphaTF; /* controls Transfer Frequency */
public:
IIR_HP() {
lowpassf = 0.0f;
alphaTF = 0.0f;
}
void init(float alphaTF_) {
alphaTF = alphaTF_;
}
float highpass(float in) {
/* Highpass = Signal - Lowpass. Lowpass = Exponential Smoothing */
lowpassf += alphaTF*(in - lowpassf);
return in - lowpassf;
}
};
#define POL 6 /* -6dB attenuation per octave per Pol */
class IIR_HP6 {
float lowpassf[2*POL+1];
float highpassf[2*POL+1];
public:
IIR_HP6();
float highpass(float in) {
const float AlphaHp6 = 0.075; /* controls Transfer Frequency */
const float Gain6 = 1.45f; /* gain to undo filter attenuation */
highpassf[0] = in;
int i;
for (i = 0; i < 2*POL; ++i) {
/* Highpass = Signal - Lowpass. Lowpass = Exponential Smoothing */
lowpassf[i+1] += AlphaHp6*(highpassf[i] - lowpassf[i+1]);
highpassf[i+1] = highpassf[i] - lowpassf[i+1];
}
return Gain6*highpassf[2*POL];
}
};
/* Recursive single pole FIR Finite Impulse response filter */
class FIR1 {
float a0, a1, b1;
float last_in, last_out;
public:
FIR1();
void init(float preWhiteTransferAlpha);
float highpass(float in) {
float out = a0 * in + a1 * last_in + b1 * last_out;
last_in = in;
last_out = out;
return out;
}
};
#define NLMS_EXT (10*8) // Extension in taps to reduce mem copies
#define DTD_LEN 16 // block size in taps to optimize DTD calculation
class AEC {
// Time domain Filters
IIR_HP6 hp0; // 300Hz cut-off Highpass
IIR_HP hp1; // DC-level remove Highpass
FIR1 Fx, Fe; // pre-whitening Highpass for x, e
// Geigel DTD (Double Talk Detector)
float max_max_x; // max(|x[0]|, .. |x[L-1]|)
int hangover;
float max_x[NLMS_LEN/DTD_LEN]; // optimize: less calculations for max()
int dtdCnt;
int dtdNdx;
// NLMS-pw
float x[NLMS_LEN+NLMS_EXT]; // tap delayed loudspeaker signal
float xf[NLMS_LEN+NLMS_EXT]; // pre-whitening tap delayed signal
float w[NLMS_LEN]; // tap weights
int j; // optimize: less memory copies
int lastupdate; // optimize: iterative dotp(x,x)
double dotp_xf_xf; // double to avoid loss of precision
public:
AEC();
/* Geigel Double-Talk Detector
*
* in d: microphone sample (PCM as floating point value)
* in x: loudspeaker sample (PCM as floating point value)
* return: 0 for no talking, 1 for talking */
int dtd(float d, float x);
/* Normalized Least Mean Square Algorithm pre-whitening (NLMS-pw)
* The LMS algorithm was developed by Bernard Widrow
* book: Widrow/Stearns, Adaptive Signal Processing, Prentice-Hall, 1985
*
* in mic: microphone sample (PCM as floating point value)
* in spk: loudspeaker sample (PCM as floating point value)
* in update: 0 for convolve only, 1 for convolve and update
* return: echo cancelled microphone sample */
float nlms_pw(float mic, float spk, int update);
/* Acoustic Echo Cancellation and Suppression of one sample
* in d: microphone signal with echo
* in x: loudspeaker signal
* return: echo cancelled microphone signal */
int doAEC(int d, int x);
};
#define _AEC_H
#endif
/***************************************************************
A.2 aec.cxx
***************************************************************/
/* aec.cxx
* Acoustic Echo Cancellation NLMS-pw algorithm
* Author: Andre Adrian, DFS Deutsche Flugsicherung
* <Andre.Adrian at dfs.de>
*
* Version 1.1
*/
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <string.h>
#include "aec.h"
IIR_HP6::IIR_HP6()
{
memset(this, 0, sizeof(IIR_HP6));
}
/* Vector Dot Product */
float dotp(float a[], float b[]) {
float sum0 = 0.0, sum1 = 0.0;
int j;
for (j = 0; j < NLMS_LEN; j+= 2) {
// optimize: partial loop unrolling
sum0 += a[j] * b[j];
sum1 += a[j+1] * b[j+1];
}
return sum0+sum1;
}
/*
* Algorithm: Recursive single pole FIR high-pass filter
*
* Reference: The Scientist and Engineer's Guide to Digital Signal Processing */
FIR1::FIR1()
{
}
void FIR1::init(float preWhiteTransferAlpha) {
float x = exp(-2.0 * M_PI * preWhiteTransferAlpha);
a0 = (1.0f + x) / 2.0f;
a1 = -(1.0f + x) / 2.0f;
b1 = x;
last_in = 0.0f;
last_out = 0.0f;
}
AEC::AEC()
{
hp1.init(0.01f); /* 10Hz */
Fx.init(PreWhiteAlphaTF);
Fe.init(PreWhiteAlphaTF);
max_max_x = 0.0f;
hangover = 0;
memset(max_x, 0, sizeof(max_x));
dtdCnt = dtdNdx = 0;
memset(x, 0, sizeof(x));
memset(xf, 0, sizeof(xf));
memset(w, 0, sizeof(w));
j = NLMS_EXT;
lastupdate = 0;
dotp_xf_xf = 0.0f;
}
float AEC::nlms_pw(float mic, float spk, int update) {
float d = mic; // desired signal
x[j] = spk;
xf[j] = Fx.highpass(spk); // pre-whitening of x
// calculate error value
// (mic signal - estimated mic signal from spk signal)
float e = d - dotp(w, x + j);
float ef = Fe.highpass(e); // pre-whitening of e
if (update) {
if (lastupdate) {
// optimize: iterative dotp(xf, xf)
dotp_xf_xf += (xf[j]*xf[j] - xf[j+NLMS_LEN-1]*xf[j+NLMS_LEN-1]);
} else {
dotp_xf_xf = dotp(xf+j, xf+j);
}
// calculate variable step size
float mikro_ef = 0.5f * ef / dotp_xf_xf;
// update tap weights (filter learning)
int i;
for (i = 0; i < NLMS_LEN; i += 2) {
// optimize: partial loop unrolling
w[i] += mikro_ef*xf[i+j];
w[i+1] += mikro_ef*xf[i+j+1];
}
}
lastupdate = update;
if (--j < 0) {
// optimize: decrease number of memory copies
j = NLMS_EXT;
memmove(x+j+1, x, (NLMS_LEN-1)*sizeof(float));
memmove(xf+j+1, xf, (NLMS_LEN-1)*sizeof(float));
}
return e;
}
int AEC::dtd(float d, float x)
{
// optimized implementation of max(|x[0]|, |x[1]|, .., |x[L-1]|):
// calculate max of block (DTD_LEN values)
x = fabsf(x);
if (x > max_x[dtdNdx]) {
max_x[dtdNdx] = x;
if (x > max_max_x) {
max_max_x = x;
}
}
if (++dtdCnt >= DTD_LEN) {
dtdCnt = 0;
// calculate max of max
max_max_x = 0.0f;
for (int i = 0; i < NLMS_LEN/DTD_LEN; ++i) {
if (max_x[i] > max_max_x) {
max_max_x = max_x[i];
}
}
// rotate Ndx
if (++dtdNdx >= NLMS_LEN/DTD_LEN) dtdNdx = 0;
max_x[dtdNdx] = 0.0f;
}
// The Geigel DTD algorithm with Hangover timer Thold
if (fabsf(d) >= GeigelThreshold * max_max_x) {
hangover = Thold;
}
if (hangover) --hangover;
if (max_max_x < UpdateThreshold) {
// avoid update with silence or noise
return 1;
} else {
return (hangover > 0);
}
}
int AEC::doAEC(int d, int x)
{
float s0 = (float)d;
float s1 = (float)x;
// Mic Highpass Filter - telephone users are used to 300Hz cut-off
s0 = hp0.highpass(s0);
// Spk Highpass Filter - to remove DC
s1 = hp1.highpass(s1);
// Double Talk Detector
int update = !dtd(s0, s1);
// Acoustic Echo Cancellation
s0 = nlms_pw(s0, s1, update);
// Acoustic Echo Suppression
if (update) {
// Non Linear Processor (NLP): attenuate low volumes
s0 *= NLPAttenuation;
}
// Saturation
if (s0 > MAXPCM) {
return (int)MAXPCM;
} else if (s0 < -MAXPCM) {
return (int)-MAXPCM;
} else {
return (int)roundf(s0);
}
}
/***************************************************************
A.3 aec_test.cxx
***************************************************************/
/* aec_test.cxx
* Test stub for Acoustic Echo Cancellation NLMS-pw algorithm
* Author: Andre Adrian, DFS Deutsche Flugsicherung
* <Andre.Adrian at dfs.de>
*
* Version 1.1
*/
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <string.h>
#include "aec.h"
#define TAPS (80*8)
typedef signed short MONO;
typedef struct {
signed short l;
signed short r;
} STEREO;
/* Read a raw audio file (8kHz sample frequency, 16bit PCM, stereo)
* from stdin, echo cancel it and write it to stdout */
int main(int argc, char *argv[]) {
STEREO inbuf[TAPS], outbuf[TAPS];
fprintf(stderr, "usage: aec_test <in.raw >out.raw\n");
AEC aec;
int taps;
while (taps = fread(inbuf, sizeof(STEREO), TAPS, stdin)) {
int i;
for (i = 0; i < taps; ++i) {
int s0 = inbuf[i].l; /* left channel microphone */
int s1 = inbuf[i].r; /* right channel speaker */
/* and do NLMS*/
s0 = aec.doAEC(s0, s1);
/* copy back */
outbuf[i].l = 0; /* left channel silence */
outbuf[i].r = (MONO)(s0); /* right channel echo cancelled mic */
}
fwrite(outbuf, sizeof(STEREO), taps, stdout);
}
fflush(NULL);
return 0;
}
/***************************************************************
A.4 Compile source code
***************************************************************/
On a Linux system with GNU C++ compiler enter:
g++ aec_test.cxx aec.cxx -o aec_test -lm
/***************************************************************
A.5 Test source code
***************************************************************/
The microphone and loudspeaker signals have to be synchronized on a
sample-to-sample basis for acoustic echo cancellation to work.
An AC97-compliant on-board soundcard in a Personal Computer can be set to a
special stereo mode: The left channel records the microphone signal and the
right channel records the loudspeaker signal.
To set up a Linux PC with the ALSA sound system, with the microphone
connected to Mic in and the loudspeaker connected to right Line out, enter:
amixer -q set 'Master',0 50% unmute
amixer -q set 'PCM',0 80% unmute
amixer -q set 'Line',0 0% mute
amixer -q set 'CD',0 0% mute
amixer -q set 'Mic',0 0% mute
amixer -q set 'Video',0 0% mute
amixer -q set 'Phone',0 0% mute
amixer -q set 'PC Speaker',0 0% mute
amixer -q set 'Aux',0 0% mute
amixer -q set 'Capture',0 50%,0%
amixer -q set 'Mic Boost (+20dB)',0 1
amixer -q cset iface=MIXER,name='Capture Source' 0,5
amixer -q cset iface=MIXER,name='Capture Switch' 1
To test the acoustic echo cancellation we simulate a real telephone
conversation in 5 steps:
(1) record the far-end speaker,
(2) perform acoustic echo cancellation (this should change nothing),
(3) play back the far-end speaker and at the same time record the near-end
speaker,
(4) perform acoustic echo cancellation,
(5) play back the near-end speaker (far-end speech should be cancelled).
To record 10 seconds of speech into the file b.raw enter:
arecord -D plug:hw:0 -c 2 -t raw -f S16_LE -r 8000 -d 10 >b.raw
To perform AEC at the far-end enter:
./aec_test <b.raw >b1.raw
To play back file b1.raw and simultaneously record b2.raw enter both
commands in one go:
aplay -D plug:hw:0 -c 2 -t raw -f S16_LE -r 8000 b1.raw &
arecord -D plug:hw:0 -c 2 -t raw -f S16_LE -r 8000 -d 10 >b2.raw
To perform AEC at the near-end enter:
./aec_test <b2.raw >b3.raw
To play back the echo-cancelled near-end enter:
aplay -D plug:hw:0 -c 2 -t raw -f S16_LE -r 8000 b3.raw
DFS Deutsche Flugsicherung GmbH
TWR-Süd, Gebäude 501
Frankfurt - Flughafen
D - 60549 Frankfurt
Tel.: +49-(0)69-69766-101
Fax: +49-(0)69-69766-105
Home Page: http://www.dfs.de