Introduction to CELP Coding

Published: 2023/12/13

Speex is based on CELP, which stands for Code Excited Linear Prediction. This section introduces the principles behind CELP, so if you are already familiar with CELP, you can safely skip to section 7. The CELP technique is based on three ideas:

The use of a linear prediction (LP) model to model the vocal tract
The use of (adaptive and fixed) codebook entries as input (excitation) of the LP model
The search performed in closed-loop in a "perceptually weighted domain"

This section describes the basic ideas behind CELP. Note that it's still incomplete.

Linear Prediction (LPC)

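Linear prediction is at the base of many speech coding techniques, including CELP. The idea behind it is to predict the signal $x[n]$ using a linear combination of its past samples:

$$y[n] = \sum_{i=1}^{N} a_i x[n-i]$$

where $y[n]$ is the linear prediction of $x[n]$. The prediction error is thus given by:

$$e[n] = x[n] - y[n] = x[n] - \sum_{i=1}^{N} a_i x[n-i]$$

The goal of the LPC analysis is to find the best prediction coefficients $a_i$, which minimize the quadratic error function over a frame of $L$ samples:

$$E = \sum_{n=0}^{L-1} \left( e[n] \right)^2 = \sum_{n=0}^{L-1} \left( x[n] - \sum_{i=1}^{N} a_i x[n-i] \right)^2$$

That can be done by making all derivatives $\partial E / \partial a_j$ equal to zero:

$$\frac{\partial E}{\partial a_j} = \frac{\partial}{\partial a_j} \sum_{n=0}^{L-1} \left( x[n] - \sum_{i=1}^{N} a_i x[n-i] \right)^2 = 0, \qquad j = 1, \ldots, N$$

The $a_i$ filter coefficients are computed using the Levinson-Durbin algorithm, which starts from the auto-correlation $R(m)$ of the signal $x[n]$:

$$R(m) = \sum_{n=0}^{L-1} x[n] x[n-m]$$

For an order $N$ filter, we have:

$$\mathbf{R} = \begin{bmatrix} R(0) & R(1) & \cdots & R(N-1) \\ R(1) & R(0) & \cdots & R(N-2) \\ \vdots & \vdots & \ddots & \vdots \\ R(N-1) & R(N-2) & \cdots & R(0) \end{bmatrix}, \qquad \mathbf{r} = \begin{bmatrix} R(1) \\ R(2) \\ \vdots \\ R(N) \end{bmatrix}$$

The filter coefficients $a_i$ are found by solving the system $\mathbf{R}\mathbf{a} = \mathbf{r}$. The Levinson-Durbin algorithm reduces the cost of solving this system to $\mathcal{O}(N^2)$ instead of $\mathcal{O}(N^3)$ by exploiting the fact that the matrix $\mathbf{R}$ is Hermitian Toeplitz. Also, it can be proven that all the roots of $A(z) = 1 - \sum_{i=1}^{N} a_i z^{-i}$ are within the unit circle, which means that $1/A(z)$ is always stable. This is in theory; in practice, because of finite precision, two techniques are commonly used to make sure the filter is stable. First, we multiply $R(0)$ by a number slightly above one (such as 1.0001), which is equivalent to adding noise to the signal. Also, we can apply a window to the auto-correlation, which is equivalent to filtering in the frequency domain, reducing sharp resonances.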
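The recursion described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the Speex implementation: the function names `autocorrelation`, `levinson_durbin` and `lpc` are hypothetical, samples before the start of the frame are assumed to be zero, and the 1.0001 scaling of the zero-lag term is the stabilisation trick mentioned in the text.

```python
def autocorrelation(x, order):
    """R(m) = sum_n x[n] * x[n-m] for m = 0..order.

    Assumption: samples before the start of the frame are zero.
    """
    return [sum(x[n] * x[n - m] for n in range(m, len(x)))
            for m in range(order + 1)]


def levinson_durbin(R, order):
    """Solve the Toeplitz system R a = r in O(N^2) operations.

    Returns (a, err), where a[i] holds the predictor coefficient a_{i+1}
    and err is the remaining prediction error energy.
    """
    a = [0.0] * (order + 1)      # a[0] is unused, a[i] holds a_i
    err = R[0]                   # error energy of the order-0 predictor
    for i in range(1, order + 1):
        if err <= 0.0:           # nothing left to predict (e.g. silent frame)
            break
        # Reflection coefficient for order i
        acc = R[i] - sum(a[j] * R[i - j] for j in range(1, i))
        k = acc / err
        # Update a_1..a_{i-1} from the order i-1 solution, then set a_i = k
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a[1:], err


def lpc(x, order, noise_floor=1.0001):
    """LPC analysis of one frame, with R(0) scaled slightly above one."""
    R = autocorrelation(x, order)
    R[0] *= noise_floor
    return levinson_durbin(R, order)
```

Calling `lpc(frame, 10)` on a list of samples returns ten coefficients and the residual energy. Each pass of the outer loop reuses the solution of the previous order, which is where the saving over a general linear solver comes from.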

The linear prediction model represents each speech sample as a linear combination of past samples, plus an error signal called the excitation (or residual).

$$x[n] = \sum_{i=1}^{N} a_i x[n-i] + e[n]$$

In the z-domain, this can be expressed as

$$X(z) = \frac{E(z)}{A(z)}$$

where $A(z)$ is defined as

$$A(z) = 1 - \sum_{i=1}^{N} a_i z^{-i}$$

We usually refer to $A(z)$ as the analysis filter and $1/A(z)$ as the synthesis filter. The whole process is called short-term prediction, as it predicts the signal $x[n]$ using only the $N$ past samples, where $N$ is usually around 10.
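As a rough illustration of the two filters, the sketch below computes the residual with the analysis filter and rebuilds the signal with the synthesis filter. The function names are hypothetical, and both filters are assumed to start from a zero state (samples before the frame are treated as zero), which a real codec would replace with memory carried over from the previous frame.

```python
def analysis_filter(x, a):
    """Residual e[n] = x[n] - sum_i a_i x[n-i], i.e. filtering by A(z).

    a[i] holds the coefficient a_{i+1}; earlier samples are assumed zero.
    """
    e = []
    for n in range(len(x)):
        pred = sum(a[i] * x[n - 1 - i]
                   for i in range(len(a)) if n - 1 - i >= 0)
        e.append(x[n] - pred)
    return e


def synthesis_filter(e, a):
    """Signal x[n] = e[n] + sum_i a_i x[n-i], i.e. filtering by 1/A(z)."""
    x = []
    for n in range(len(e)):
        pred = sum(a[i] * x[n - 1 - i]
                   for i in range(len(a)) if n - 1 - i >= 0)
        x.append(e[n] + pred)
    return x
```

Feeding the output of `analysis_filter` into `synthesis_filter` with the same coefficients returns the original samples up to rounding error, which is the sense in which the two filters are inverses of each other.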

Because LPC coefficients have very little robustness to quantization, they are converted to Line Spectral Pair (LSP) coefficients, which behave much better under quantization. One benefit is that it is easy to keep the filter stable.

Pitch Prediction

During voiced segments, the speech signal is periodic, so it is possible to take advantage of that property by approximating the excitation signal $e[n]$ by a gain times the past of the excitation:

$$e[n] \simeq p[n] = \beta e[n-T]$$

where $T$ is the pitch period and $\beta$ is the pitch gain. We call that long-term prediction since the excitation is predicted from $e[n-T]$ with $T \gg N$.
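One simple way to pick the pitch period and gain is to try every candidate lag and keep the one that leaves the smallest prediction error. The sketch below does this directly on a known excitation (an open-loop search), which is an assumption made for illustration only; the closed-loop search used in CELP is described later in this section. The function name and its parameters are hypothetical.

```python
def pitch_search(e, start, length, t_min, t_max):
    """Find (T, beta) minimising sum (e[n] - beta * e[n-T])^2 over
    n = start .. start+length-1.

    For a fixed lag T the best gain is beta = corr / energy, and the error
    removed by the predictor is corr^2 / energy, so the lag with the
    largest such score wins. Returns (None, 0.0) if no lag helps.
    """
    best_T, best_beta, best_score = None, 0.0, 0.0
    for T in range(t_min, t_max + 1):
        if start - T < 0:        # not enough past excitation for this lag
            break
        frame = range(start, start + length)
        corr = sum(e[n] * e[n - T] for n in frame)
        energy = sum(e[n - T] ** 2 for n in frame)
        if energy <= 0.0 or corr <= 0.0:
            continue             # only positive gains are considered here
        score = corr * corr / energy
        if score > best_score:
            best_T, best_beta, best_score = T, corr / energy, score
    return best_T, best_beta
```

For an excitation that repeats every 40 samples while decaying by a factor of 0.8 per period, a search over lags 20 to 60 should land on a lag of 40 with a gain close to 0.8.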

Innovation Codebook

The final excitation $e[n]$ will be the sum of the pitch prediction and an innovation signal $c[n]$ taken from a fixed codebook, hence the name Code Excited Linear Prediction. The final excitation is given by:

$$e[n] = p[n] + c[n] = \beta e[n-T] + c[n]$$

The quantization of $c[n]$ is where most of the bits in a CELP codec are allocated. It represents the information that couldn't be obtained either from linear prediction or pitch prediction. In the z-domain we can represent the final signal $X(z)$ as

$$X(z) = \frac{C(z)}{A(z) \left( 1 - \beta z^{-T} \right)}$$
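Read from the decoder side, the excitation is rebuilt from the innovation and the past excitation, then passed through the synthesis filter. The sketch below is a simplified illustration under stated assumptions: the function names are hypothetical, the innovation is used as-is without any codebook gain, and the synthesis filter starts from a zero state rather than carrying memory across frames.

```python
def build_excitation(past_e, c, T, beta):
    """e[n] = beta * e[n-T] + c[n], continuing from the past excitation.

    past_e must contain at least T samples. When T is shorter than the
    frame, the recursion reuses samples just produced in the same frame,
    which matches the filter 1 / (1 - beta * z^-T).
    """
    if T < 1 or T > len(past_e):
        raise ValueError("pitch period must satisfy 1 <= T <= len(past_e)")
    e = list(past_e)
    offset = len(past_e)
    for n in range(len(c)):
        e.append(beta * e[offset + n - T] + c[n])
    return e[offset:]


def synthesize(e, a):
    """x[n] = e[n] + sum_i a_i x[n-i], zero initial state (assumption)."""
    x = []
    for n in range(len(e)):
        pred = sum(a[i] * x[n - 1 - i]
                   for i in range(len(a)) if n - 1 - i >= 0)
        x.append(e[n] + pred)
    return x


def decode_frame(c, past_e, T, beta, a):
    """Innovation -> long-term (pitch) synthesis -> short-term synthesis."""
    e = build_excitation(past_e, c, T, beta)
    return e, synthesize(e, a)
```

The two stages correspond to the two factors in the denominator of the z-domain expression: `build_excitation` applies the long-term predictor and `synthesize` applies the short-term one.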

Analysis-by-Synthesis and Error Weighting

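Most (if not all) modern audio codecs attempt to "shape" the noise so that it appears mostly in the frequency regions where the ear cannot detect it. For example, the ear is more tolerant to noise in parts of the spectrum that are louder and vice versa. That's why instead of minimizing the simple quadratic error

$$E = \sum_{n} \left( x[n] - \overline{x}[n] \right)^2$$

where $\overline{x}[n]$ is the encoded signal, we minimize the error for the perceptually weighted signal

$$X_w(z) = W(z) X(z)$$

where $W(z)$ is the weighting filter, usually of the form

$$W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)} \qquad (1)$$

with control parameters $\gamma_1 > \gamma_2$. If the noise is white in the perceptually weighted domain, then in the signal domain its spectral shape will be of the form

$$A_{noise}(z) = \frac{1}{W(z)} = \frac{A(z/\gamma_2)}{A(z/\gamma_1)}$$

If a filter $A(z)$ has (complex) poles at $p_i$ in the $z$-plane, the filter $A(z/\gamma)$ will have its poles at $p'_i = \gamma p_i$, making it a flatter version of $A(z)$. The same scaling applies to the zeros of the filter.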
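In terms of coefficients, replacing $z$ by $z/\gamma$ in $A(z) = 1 - \sum_i a_i z^{-i}$ simply scales each coefficient: $A(z/\gamma) = 1 - \sum_i a_i \gamma^i z^{-i}$. The sketch below uses this to apply the weighting filter to a block of samples. The function names are hypothetical, the default values 0.9 and 0.6 for the two control parameters are illustrative assumptions rather than values taken from the text, and the filter starts from a zero state.

```python
def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma): a_i becomes a_i * gamma^i.

    a[i] holds the coefficient a_{i+1}, hence the exponent i + 1.
    """
    return [a_i * gamma ** (i + 1) for i, a_i in enumerate(a)]


def weighting_filter(x, a, gamma1=0.9, gamma2=0.6):
    """Apply W(z) = A(z/gamma1) / A(z/gamma2) to x, zero initial state."""
    num = bandwidth_expand(a, gamma1)   # numerator, the FIR part
    den = bandwidth_expand(a, gamma2)   # denominator, the recursive part
    y = []
    for n in range(len(x)):
        # Numerator A(z/gamma1): subtract the scaled prediction from x
        v = x[n] - sum(num[i] * x[n - 1 - i]
                       for i in range(len(num)) if n - 1 - i >= 0)
        # Denominator 1/A(z/gamma2): add back a prediction from past output
        v += sum(den[i] * y[n - 1 - i]
                 for i in range(len(den)) if n - 1 - i >= 0)
        y.append(v)
    return y
```

With equal control parameters the numerator and denominator cancel and the filter leaves the signal unchanged; the further apart they are, the more strongly the noise is shaped.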

Analysis-by-synthesis refers to the fact that when trying to find the best pitch parameters ($T$, $\beta$) and innovation signal $c[n]$, we do not work by making the excitation $e[n]$ as close as possible to the original one (which would be simpler), but apply the synthesis (and weighting) filter and try to make $X_w(z)$ as close to the original as possible.
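A closed-loop search over a fixed codebook can be sketched as follows: every candidate innovation is run through the synthesis filter, and the entry whose synthesized output is closest to the target is kept. This is a deliberately reduced illustration under several assumptions: the function names are hypothetical, the weighting filter is left out (a real coder would pass both the target and each candidate through it), there is no codebook gain, and the filter starts from a zero state.

```python
def synthesis_filter(e, a):
    """x[n] = e[n] + sum_i a_i x[n-i], zero initial state (assumption)."""
    x = []
    for n in range(len(e)):
        pred = sum(a[i] * x[n - 1 - i]
                   for i in range(len(a)) if n - 1 - i >= 0)
        x.append(e[n] + pred)
    return x


def search_codebook(target, codebook, a):
    """Return (index, error) of the entry whose synthesized signal is
    closest to the target in the squared-error sense.

    The comparison happens after synthesis, not on the excitation itself.
    """
    best_index, best_error = -1, float("inf")
    for k, c in enumerate(codebook):
        s = synthesis_filter(c, a)                  # synthesize candidate k
        err = sum((t - v) ** 2 for t, v in zip(target, s))
        if err < best_error:
            best_index, best_error = k, err
    return best_index, best_error
```

The cost of this approach is one filtering operation per candidate, which is why practical codecs structure their codebooks so that the search can be done efficiently.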

References:

1 Encyclopedia summary: https://zh.wikipedia.org/wiki/%E7%A0%81%E6%BF%80%E5%8A%B1%E7%BA%BF%E6%80%A7%E9%A2%84%E6%B5%8B
2 Detailed introduction: http://ntools.net/arc/Documents/speex/manual/node8.html
