[Speech Recognition] MFCC-Based Voiceprint Recognition with MATLAB Source Code
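I. Introduction
This article designs and implements a text-dependent voiceprint recognition system in MATLAB that can determine a speaker's identity.
1 System Principles
a. Voiceprint recognition
In the last few years, with the development of artificial intelligence, many mobile apps have introduced a voiceprint lock feature. The main technology behind it is voiceprint recognition. Voiceprint recognition, also called speaker recognition, differs from speech recognition: speaker recognition determines who is speaking, whereas speech recognition determines what is being said.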
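b. Mel-frequency cepstral coefficients (MFCC)
Mel-frequency cepstral coefficients (MFCC) are among the most commonly used features in speech signal processing.
Experiments show that the human ear behaves like a filter bank: it attends only to certain frequencies in the spectrum. The ear's perception of frequency is not linear in Hz, but is approximately linear on the Mel scale.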
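MFCC takes the characteristics of human hearing into account: the linear spectrum is first mapped onto the perceptually motivated, nonlinear Mel spectrum and then converted to the cepstrum. The conversion from ordinary frequency $f$ (in Hz) to Mel frequency is commonly written as:
$$\mathrm{Mel}(f) = 2595 \log_{10}\left(1 + \frac{f}{700}\right)$$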
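A minimal sketch of the Hz-to-Mel mapping, assuming the widely used form Mel(f) = 2595·log10(1 + f/700). The names `hz2mel`, `mel2hz`, `numFilters` and the 4 kHz upper limit are illustrative choices, not taken from the original code. It also shows how filter centre frequencies can be placed evenly on the Mel axis, which is the idea behind a Mel filter bank.

```matlab
% Hz <-> Mel conversion (assumed form: 2595*log10(1 + f/700))
hz2mel = @(f) 2595 * log10(1 + f / 700);
mel2hz = @(m) 700 * (10 .^ (m / 2595) - 1);

f = [0 500 1000 2000 4000];      % example frequencies in Hz
m = hz2mel(f);                   % corresponding Mel values
f_back = mel2hz(m);              % round trip back to Hz

% Centre frequencies of 24 filters spaced evenly on the Mel axis up to 4 kHz
numFilters = 24;
melEdges = linspace(hz2mel(0), hz2mel(4000), numFilters + 2);
centersHz = mel2hz(melEdges(2:end-1));

disp([f; m]);                    % first row: Hz, second row: Mel
```

Because the spacing is uniform in Mel, the centre frequencies are dense at low frequencies and sparse at high frequencies when viewed in Hz.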
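c. Vector quantization (VQ)
The system uses vector quantization to compress the extracted MFCC features.
Vector quantization (VQ) is a lossy data compression method based on block coding. Quantization is a core step of multimedia compression formats such as JPEG and MPEG-4; VQ applies the same idea to whole vectors rather than to individual values. The basic idea is to group several scalar values into a vector and then quantize that vector as a whole in vector space, which compresses the data without losing much information.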
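The idea can be sketched with a small k-means style codebook trainer. This is an illustration under assumptions, not the original system's code: the toy data, the codebook size `K`, the helper `sqdist` and the fixed 20 iterations are all hypothetical, and only base MATLAB functions are used.

```matlab
% Toy training set: 200 two-dimensional "feature vectors" on small circles
% around (0,0) and (5,5). Real input would be MFCC frames, one row per frame.
t = (0:99)';
X = [0.1*sin(t), 0.1*cos(t); 5+0.1*sin(t), 5+0.1*cos(t)];

K = 2;                                          % codebook size (assumed)
C = X(round(linspace(1, size(X, 1), K)), :);    % initial codewords taken from the data

% Squared Euclidean distance between every row of A and every row of B
sqdist = @(A, B) bsxfun(@plus, sum(A.^2, 2), sum(B.^2, 2)') - 2 * (A * B');

for iter = 1:20
    [~, idx] = min(sqdist(X, C), [], 2);        % nearest codeword for each vector
    for k = 1:K
        if any(idx == k)
            C(k, :) = mean(X(idx == k, :), 1);  % move codeword to its cluster mean
        end
    end
end

% Average squared error when each vector is replaced by its nearest codeword
distortion = mean(min(sqdist(X, C), [], 2));
disp(C);
disp(distortion);
```

After training, the 200 vectors are represented by just `K` codewords; the distortion value measures how much information that compression loses.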
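3 System Structure
The overall structure of the system is shown in the figure below:
- Training process
The speech signal is first preprocessed, then MFCC feature parameters are extracted and compressed by vector quantization to obtain a codebook for the speaker's utterance. The same speaker says the same content several times, repeating this training process, which finally yields a codebook library.
- Recognition process
During recognition, the speech signal is likewise preprocessed and its MFCC features are extracted. The Euclidean distance between these features and the codebooks in the training library is then computed. When the distance is below a threshold, the speaker and the spoken content are judged to match an entry in the training codebook library, and the match succeeds.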
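The recognition step can be sketched as follows. The source does not show its matching code, so this is only one common way to do it, assuming the score is the average Euclidean distance from each test frame to its nearest codeword. The codebooks, speaker names, feature values and threshold are all hypothetical.

```matlab
% Hypothetical codebooks for two enrolled speakers (rows are codewords)
codebooks = {[0 0; 1 1], [5 5; 6 6]};
names = {'speakerA', 'speakerB'};

% Hypothetical feature frames from the test utterance (rows are frames)
feat = [0.1 -0.1; 0.9 1.2; 0.2 0.1];
threshold = 0.5;                 % assumed acceptance threshold; tune on real data

% Squared Euclidean distance between every row of A and every row of B
sqdist = @(A, B) bsxfun(@plus, sum(A.^2, 2), sum(B.^2, 2)') - 2 * (A * B');

dist = zeros(1, numel(codebooks));
for k = 1:numel(codebooks)
    D = sqrt(max(sqdist(feat, codebooks{k}), 0));  % frames x codewords distances
    dist(k) = mean(min(D, [], 2));                 % mean distance to nearest codeword
end

[bestDist, bestIdx] = min(dist);
if bestDist < threshold
    result = names{bestIdx};     % accepted: closest enrolled speaker
else
    result = 'rejected';         % no codebook is close enough
end
disp(result);
```

The threshold trades false accepts against false rejects, so in practice it would be chosen from held-out recordings rather than fixed in advance.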
II. Source Code
function varargout = test4(varargin)
% TEST4 MATLAB code for test4.fig
%      TEST4, by itself, creates a new TEST4 or raises the existing
%      singleton*.
%
%      H = TEST4 returns the handle to a new TEST4 or the handle to
%      the existing singleton*.
%
%      TEST4('CALLBACK',hObject,eventData,handles,...) calls the local
%      function named CALLBACK in TEST4.M with the given input arguments.
%
%      TEST4('Property','Value',...) creates a new TEST4 or raises the
%      existing singleton*.  Starting from the left, property value pairs are
%      applied to the GUI before test4_OpeningFcn gets called.  An
%      unrecognized property name or invalid value makes property application
%      stop.  All inputs are passed to test4_OpeningFcn via varargin.
%
%      *See GUI Options on GUIDE's Tools menu.  Choose "GUI allows only one
%      instance to run (singleton)".
%
% See also: GUIDE, GUIDATA, GUIHANDLES

% Edit the above text to modify the response to help test4

% Last Modified by GUIDE v2.5 17-Mar-2019 09:58:00

% Begin initialization code - DO NOT EDIT
gui_Singleton = 1;
gui_State = struct('gui_Name',       mfilename, ...
                   'gui_Singleton',  gui_Singleton, ...
                   'gui_OpeningFcn', @test4_OpeningFcn, ...
                   'gui_OutputFcn',  @test4_OutputFcn, ...
                   'gui_LayoutFcn',  [] , ...
                   'gui_Callback',   []);
if nargin && ischar(varargin{1})
    gui_State.gui_Callback = str2func(varargin{1});
end

if nargout
    [varargout{1:nargout}] = gui_mainfcn(gui_State, varargin{:});
else
    gui_mainfcn(gui_State, varargin{:});
end
% End initialization code - DO NOT EDIT

% --- Executes just before test4 is made visible.
function test4_OpeningFcn(hObject, eventdata, handles, varargin)
% This function has no output args, see OutputFcn.
% hObject    handle to figure
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% varargin   command line arguments to test4 (see VARARGIN)

% Choose default command line output for test4
handles.output = hObject;

% Update handles structure
guidata(hObject, handles);

% UIWAIT makes test4 wait for user response (see UIRESUME)
% uiwait(handles.figure1);

% --- Outputs from this function are returned to the command line.
function varargout = test4_OutputFcn(hObject, eventdata, handles)
% varargout  cell array for returning output args (see VARARGOUT);
% hObject    handle to figure
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)

% Get default command line output from handles structure
varargout{1} = handles.output;

% --- Executes on button press in pushbutton1.
function pushbutton1_Callback(hObject, eventdata, handles)
% hObject    handle to pushbutton1 (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
global thk1 thk2 thk3
global tlc1 tlc2 tlc3
global tlyy1 tlyy2 tlyy3
global tqs1 tqs2 tqs3
global tyqc1 tyqc2 tyqc3
global startpos len
% Keep frames startpos..startpos+len (400 frames) and the first 12 coefficients.
startpos=601; len=399;

% The file name prefix '训练样本' means "training sample".
[s,fs]=audioread('训练样本hk1.wav');
thk1= MFCC2par(s,fs);
thk1=thk1(startpos:startpos+len,1:12);
[s,fs]=audioread('训练样本hk2.wav');
thk2= MFCC2par(s,fs);
thk2=thk2(startpos:startpos+len,1:12);
[s,fs]=audioread('训练样本hk3.wav');
thk3= MFCC2par(s,fs);
thk3=thk3(startpos:startpos+len,1:12);

[s,fs]=audioread('训练样本lc1.wav');
tlc1= MFCC2par(s,fs);
tlc1=tlc1(startpos:startpos+len,1:12);
[s,fs]=audioread('训练样本lc2.wav');
tlc2= MFCC2par(s,fs);
tlc2=tlc2(startpos:startpos+len,1:12);
[s,fs]=audioread('训练样本lc3.wav');
tlc3= MFCC2par(s,fs);
tlc3=tlc3(startpos:startpos+len,1:12);

[s,fs]=audioread('训练样本lyy1.wav');
tlyy1= MFCC2par(s,fs);
tlyy1=tlyy1(startpos:startpos+len,1:12);
[s,fs]=audioread('训练样本lyy2.wav');
tlyy2= MFCC2par(s,fs);
tlyy2=tlyy2(startpos:startpos+len,1:12);
[s,fs]=audioread('训练样本lyy3.wav');
tlyy3= MFCC2par(s,fs);
tlyy3=tlyy3(startpos:startpos+len,1:12);

[s,fs]=audioread('训练样本qs1.wav');
tqs1= MFCC2par(s,fs);
tqs1=tqs1(startpos:startpos+len,1:12);
[s,fs]=audioread('训练样本qs2.wav');
tqs2= MFCC2par(s,fs);
tqs2=tqs2(startpos:startpos+len,1:12);
[s,fs]=audioread('训练样本qs3.wav');
tqs3= MFCC2par(s,fs);
tqs3=tqs3(startpos:startpos+len,1:12);

[s,fs]=audioread('训练样本yqc1.wav');
tyqc1= MFCC2par(s,fs);
tyqc1=tyqc1(startpos:startpos+len,1:12);
[s,fs]=audioread('训练样本yqc2.wav');
tyqc2= MFCC2par(s,fs);
tyqc2=tyqc2(startpos:startpos+len,1:12);

function getmfcc= MFCC2par( x,fs)
%=========================================================
% No denoising or endpoint detection
% Input:  audio data x, sampling rate fs
% Output: feature matrix of size (N,M), where N is the number of frames
%         and M is the feature dimension
% Features: M=24, 12 cepstral coefficients plus 12 first-order differences
%=========================================================
%[x fs]=wavread(sound);
% Use a single channel
[~,etmp]=size(x);
if (etmp==2)
    x=x(:,1);
end
% Normalized Mel filter bank coefficients (melbankm is from the VOICEBOX toolbox)
bank=melbankm(24,256,fs,0,0.5,'m'); % 24 Mel filters, FFT length 256, sampling rate fs
bank=full(bank);
bank=bank/max(bank(:)); % [24*129]
% DCT coefficients
for k=1:12
    n=0:23;
    dctcoef(k,:)=cos((2*n+1)*k*pi/(2*24));
end
% Normalized cepstral liftering window
w=1+6*sin(pi*[1:12]./12);
w=w/max(w);
% Pre-emphasis filter: y(n) = x(n) - 0.9375*x(n-1)
% (the coefficients must be [1 -0.9375]; [1-0.9375] would only scale the signal)
xx=double(x);
xx=filter([1 -0.9375],1,xx);
% Split into frames of 256 samples with a frame shift of 80 (enframe is from VOICEBOX)
xx=enframe(xx,256,80);
% Compute the MFCC parameters of each frame
for i=1:size(xx,1)
    y=xx(i,:);           % one frame
    s=y'.*hamming(256);  % apply a Hamming window
    t=abs(fft(s));       % FFT magnitude spectrum
    t=t.^2;              % energy spectrum
    % Mel-filter the spectrum, take the logarithm, then compute the cepstrum via the DCT.
    % t(1:129) holds the 129 non-negative frequency bins of the 256-point FFT.
    c1=dctcoef*log(bank*t(1:129));
    c2=c1.*w';           % liftered cepstrum
    % MFCC parameters
    m(i,:)=c2';
end
% The listing ends here without assigning the output. Returning the 12 static
% coefficients lets the callers above run, since they use only columns 1:12;
% the first-order differences described in the header are not part of this excerpt.
getmfcc=m;

III. Results