05. Sequence Models W3. Sequence Models and Attention Mechanism (Assignments: Machine Translation + Trigger Word Detection)
Table of Contents
- Assignment 1: Machine Translation
- 1. Date Conversion
- 1.1 Dataset
- 2. Machine Translation with an Attention Model
- 2.1 Attention Mechanism
- 3. Visualizing Attention
- Assignment 2: Trigger Word Detection
- 1. Data Synthesis: Creating a Speech Dataset
- 1.1 Listen to the Data
- 1.2 Audio to Spectrogram
- 1.3 Generate a Single Training Example
- 1.4 Full Training Set
- 1.5 Dev Set
- 2. Model
- 2.1 Build the Model
- 2.2 Training
- 2.3 Test the Model
- 3. Predictions
- 3.3 Test on the Dev Set
- 4. Test with Your Own Sample
Quiz: see the companion blog post
Notes: W3. Sequence Models and Attention Mechanism
Assignment 1: Machine Translation
Build a Neural Machine Translation (NMT) model that translates human-readable dates ("25th of June, 2009") into machine-readable dates ("2009-06-25").
We will do this with an attention model, one of the most sophisticated sequence-to-sequence models.
Note the packages to install:

```
pip install Faker==2.0.0
pip install babel
```

- Import packages
1. Date Conversion
The model takes as input dates written in a variety of possible formats (e.g. "the 29th of August 1958", "03/30/1968", "24 JUNE 1987") and converts them into standardized, machine-readable dates ("1958-08-29", "1968-03-30", "1987-06-24"). We will have the model learn to output dates in the common machine-readable format YYYY-MM-DD.
1.1 Dataset
- 10,000 examples
- Print a few to take a look
Output:
```
[('9 may 1998', '1998-05-09'),
 ('10.11.19', '2019-11-10'),
 ('9/10/70', '1970-09-10'),
 ('saturday april 28 1990', '1990-04-28'),
 ('thursday january 26 1995', '1995-01-26'),
 ('monday march 7 1983', '1983-03-07'),
 ('sunday may 22 1988', '1988-05-22'),
 ('08 jul 2008', '2008-07-08'),
 ('8 sep 1999', '1999-09-08'),
 ('thursday january 1 1981', '1981-01-01')]
```
The loading step above gives us:
- dataset: a list of (human readable date, machine readable date) tuples
- human_vocab: a dictionary mapping characters of human-readable dates to integer indices
- machine_vocab: a dictionary mapping characters of machine-readable dates to integer indices
- inv_machine_vocab: the inverse mapping of machine_vocab, from indices back to characters
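For reference, a minimal sketch of how these objects and the padded/one-hot arrays are produced. The helper names `load_dataset` and `preprocess_data` come from the course's `nmt_utils` module and are assumed here:

```python
from nmt_utils import load_dataset, preprocess_data  # assignment helper module (assumption)

m = 10000                                   # number of examples
dataset, human_vocab, machine_vocab, inv_machine_vocab = load_dataset(m)

Tx = 30   # maximum length of a human-readable date (shorter inputs are padded)
Ty = 10   # length of "YYYY-MM-DD"
X, Y, Xoh, Yoh = preprocess_data(dataset, human_vocab, machine_vocab, Tx, Ty)

print("X.shape:", X.shape)      # padded index representation of the inputs
print("Xoh.shape:", Xoh.shape)  # one-hot representation
```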
Output:
```
X.shape: (10000, 30)
Y.shape: (10000, 10)
Xoh.shape: (10000, 30, 37)   # 37 = len(human_vocab)
Yoh.shape: (10000, 10, 11)   # 11 = characters appearing in machine dates: digits 0-9 and '-'
```
- Look at a sample (inputs shorter than the maximum length are padded, so every x has length 30)
Output:
```
Source date: saturday october 9 1976
Target date: 1976-10-09

Source after preprocessing (indices):
[29 13 30 31 28 16 13 34  0 26 15 30 26 14 17 28  0 12  0  4 12 10  9 36
 36 36 36 36 36 36]
Target after preprocessing (indices):
[ 2 10  8  7  0  2  1  0  1 10]

Source after preprocessing (one-hot):
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 1.]
 [0. 0. 0. ... 0. 0. 1.]
 [0. 0. 0. ... 0. 0. 1.]]
Target after preprocessing (one-hot):
[[0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]
```
2. Machine Translation with an Attention Model
2.1 Attention Mechanism
$$context^{<t>} = \sum_{t' = 0}^{T_x} \alpha^{<t,t'>}\, a^{<t'>}$$
Repeat the input several times: https://keras.io/zh/layers/core/#repeatvector
Concatenate a list of input tensors along a given axis: https://keras.io/zh/layers/merge/#concatenate_1
Bidirectional wrapper: https://keras.io/zh/layers/wrappers/#bidirectional
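These layers are the building blocks of the attention step listed below. A minimal sketch of `one_step_attention`, assuming the shared layers are created once globally (as in the assignment) and that `softmax` is the custom softmax over the time axis supplied by the notebook's `nmt_utils` (an assumption here):

```python
from keras.layers import RepeatVector, Concatenate, Dense, Activation, Dot
from nmt_utils import softmax  # assumption: the notebook's softmax applied over axis=1

# Shared layers, created once so the same weights are reused at every output time step
repeator = RepeatVector(Tx)                         # copies s_prev across the Tx input steps
concatenator = Concatenate(axis=-1)                 # joins a and s_prev on the feature axis
densor1 = Dense(10, activation="tanh")
densor2 = Dense(1, activation="relu")
activator = Activation(softmax, name="attention_weights")  # normalize over the Tx axis
dotor = Dot(axes=1)                                 # weighted sum over the time axis

def one_step_attention(a, s_prev):
    """a: Bi-LSTM activations (m, Tx, 2*n_a); s_prev: previous post-attention LSTM state (m, n_s)."""
    s_prev = repeator(s_prev)               # (m, Tx, n_s)
    concat = concatenator([a, s_prev])      # (m, Tx, 2*n_a + n_s)
    e = densor1(concat)                     # intermediate "energies"
    energies = densor2(e)                   # (m, Tx, 1)
    alphas = activator(energies)            # attention weights alpha^{<t,t'>}
    context = dotor([alphas, a])            # context^{<t>} = sum_{t'} alpha^{<t,t'>} a^{<t'>}
    return context
```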
- Compute the attention (one_step_attention, sketched above)
- Define the model
- Define the optimizer and compile the model
- Train
- To save time, the instructor has provided pre-trained weights
Output:
```
source: 5th Otc 2019          output: 2019-10-05
source: 5 April 09            output: 2009-04-05
source: 21th of August 2016   output: 2016-08-20
source: Tue 10 Jul 2007       output: 2007-07-10
source: Saturday May 9 2018   output: 2018-05-09
source: March 3 2001          output: 2001-03-03
source: March 3rd 2001        output: 2001-03-03
source: 1 March 2001          output: 2001-03-01
```
3. Visualizing Attention
```python
attention_map = plot_attention_map(model, human_vocab, inv_machine_vocab, "Tuesday 09 Oct 1993", num = 7, n_s = 64)
```
You can see that most of the attention is used when predicting the year.
Assignment 2: Trigger Word Detection
- Import packages
1. Data Synthesis: Creating a Speech Dataset
1.1 Listen to the Data
There are positive clips of "activates" (the trigger word), negative clips (non-trigger words), and background noise.
```python
IPython.display.Audio("./raw_data/backgrounds/1.wav")
```
1.2 Audio to Spectrogram
The audio is recorded at 44100 Hz and each clip is 10 seconds long.
```python
x = graph_spectrogram("audio_examples/example_train.wav")
```
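For reference, a rough sketch of what the provided `graph_spectrogram` helper does; the window parameters below follow the course's `td_utils` and should be treated as assumptions:

```python
import matplotlib.pyplot as plt
from scipy.io import wavfile

def graph_spectrogram(wav_file):
    rate, data = wavfile.read(wav_file)     # 44100 Hz, 10 s clip -> 441000 samples
    nfft = 200                              # length of each window segment (assumed)
    fs = 8000                               # sampling frequency passed to specgram (assumed)
    noverlap = 120                          # overlap between windows (assumed)
    if data.ndim == 1:                      # mono
        pxx, freqs, bins, im = plt.specgram(data, nfft, fs, noverlap=noverlap)
    else:                                   # stereo: use the first channel
        pxx, freqs, bins, im = plt.specgram(data[:, 0], nfft, fs, noverlap=noverlap)
    return pxx                              # shape (101, 5511) for a 10 s clip
```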
In this assignment the training examples are 10 seconds long and the spectrogram has 5511 time steps, so $T_x = 5511$.
Output:
```
Time steps in audio recording before spectrogram (441000,)
Time steps in input after spectrogram (101, 5511)
```
- Define the parameters (see the sketch below)
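A sketch of the parameters being defined, with the values used in this assignment:

```python
Tx = 5511      # number of time steps input to the model from the spectrogram
n_freq = 101   # number of frequencies fed to the model at each spectrogram time step
Ty = 1375      # number of time steps in the output of our model
```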
1.3 Generate a Single Training Example
- Randomly pick a 10-second background-noise clip
- Randomly insert 0-4 "activate" (trigger-word) clips
- Randomly insert 0-2 negative (non-trigger-word) clips
Output:
```
background len: 10000
activate[0] len: 721
activate[1] len: 731
```
- Get a random time segment from the background audio (helper functions sketched after this list)
- Check whether an inserted clip overlaps with previously inserted ones
- Insert an audio clip
- Insert the 1 labels
- Synthesize a training example
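As referenced in the list above, a minimal sketch of these helper functions, following the assignment's conventions (10,000 ms backgrounds, `Ty = 1375`, and 50 positive labels after each inserted "activate"):

```python
import numpy as np

def get_random_time_segment(segment_ms):
    # Pick a random segment of length segment_ms inside the 10,000 ms background
    segment_start = np.random.randint(low=0, high=10000 - segment_ms)
    segment_end = segment_start + segment_ms - 1
    return (segment_start, segment_end)

def is_overlapping(segment_time, previous_segments):
    # True if the candidate segment overlaps any previously inserted segment
    segment_start, segment_end = segment_time
    for previous_start, previous_end in previous_segments:
        if segment_start <= previous_end and segment_end >= previous_start:
            return True
    return False

def insert_ones(y, segment_end_ms):
    # Set 50 output labels to 1 right after the end of an inserted "activate" clip
    segment_end_y = int(segment_end_ms * Ty / 10000.0)
    for i in range(segment_end_y + 1, segment_end_y + 51):
        if i < Ty:
            y[0, i] = 1
    return y
```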
1.4 Full Training Set
The instructor has already preprocessed all the data.
```python
# Load preprocessed training examples
X = np.load("./XY_train/X.npy")
Y = np.load("./XY_train/Y.npy")
```
1.5 Dev Set
The dev set uses audio recorded by real people.
```python
# Load preprocessed dev set examples
X_dev = np.load("./XY_dev/X_dev.npy")
Y_dev = np.load("./XY_dev/Y_dev.npy")
```
2. Model
- Import packages
2.1 Build the Model
The model first uses a 1-D convolutional layer to extract features; this also speeds up the computation, since the GRUs only need to process 1375 time steps instead of 5511.
Note: do not use a bidirectional RNN here. We want to output an action as soon as the trigger word is detected; with a bidirectional RNN we would have to wait for the entire 10-second clip to be recorded before making a decision.
- Some Keras references
Conv1D: https://keras.io/zh/layers/convolutional/#conv1d
BatchNormalization: https://keras.io/zh/layers/normalization/#batchnormalization
GRU: https://keras.io/zh/layers/recurrent/#gru
TimeDistributed: https://keras.io/zh/layers/wrappers/#timedistributed
```python
# GRADED FUNCTION: model

def model(input_shape):
    """
    Function creating the model's graph in Keras.

    Argument:
    input_shape -- shape of the model's input data (using Keras conventions)

    Returns:
    model -- Keras model instance
    """

    X_input = Input(shape = input_shape)

    ### START CODE HERE ###

    # Step 1: CONV layer (≈4 lines)
    X = Conv1D(filters=196, kernel_size=15, strides=4)(X_input)  # CONV1D
    X = BatchNormalization()(X)                                  # Batch normalization
    X = Activation('relu')(X)                                    # ReLu activation
    X = Dropout(rate=0.8)(X)                                     # dropout (use 0.8)

    # Step 2: First GRU Layer (≈4 lines)
    X = GRU(128, return_sequences=True)(X)  # GRU (use 128 units and return the sequences)
    X = Dropout(rate=0.8)(X)                # dropout (use 0.8)
    X = BatchNormalization()(X)             # Batch normalization

    # Step 3: Second GRU Layer (≈4 lines)
    X = GRU(128, return_sequences=True)(X)  # GRU (use 128 units and return the sequences)
    X = Dropout(rate=0.8)(X)                # dropout (use 0.8)
    X = BatchNormalization()(X)             # Batch normalization
    X = Dropout(rate=0.8)(X)                # dropout (use 0.8)

    # Step 4: Time-distributed dense layer (≈1 line)
    X = TimeDistributed(Dense(1, activation="sigmoid"))(X)  # time distributed (sigmoid)

    ### END CODE HERE ###

    model = Model(inputs=X_input, outputs=X)

    return model

model = model(input_shape = (Tx, n_freq))
model.summary()
```
Output:
Model: "model_1" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= input_2 (InputLayer) (None, 5511, 101) 0 _________________________________________________________________ conv1d_2 (Conv1D) (None, 1375, 196) 297136 _________________________________________________________________ batch_normalization_2 (Batch (None, 1375, 196) 784 _________________________________________________________________ activation_2 (Activation) (None, 1375, 196) 0 _________________________________________________________________ dropout_2 (Dropout) (None, 1375, 196) 0 _________________________________________________________________ gru_2 (GRU) (None, 1375, 128) 124800 _________________________________________________________________ dropout_3 (Dropout) (None, 1375, 128) 0 _________________________________________________________________ batch_normalization_3 (Batch (None, 1375, 128) 512 _________________________________________________________________ gru_3 (GRU) (None, 1375, 128) 98688 _________________________________________________________________ dropout_4 (Dropout) (None, 1375, 128) 0 _________________________________________________________________ batch_normalization_4 (Batch (None, 1375, 128) 512 _________________________________________________________________ dropout_5 (Dropout) (None, 1375, 128) 0 _________________________________________________________________ time_distributed_1 (TimeDist (None, 1375, 1) 129 ================================================================= Total params: 522,561 Trainable params: 521,657 Non-trainable params: 9042.2 訓練
Training is time-consuming, so the instructor has already trained the model on 4,000 examples.
```python
model = load_model('./models/tr_model.h5')
```
Then fine-tune it on our dataset for one more epoch:
```python
opt = Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, decay=0.01)
model.compile(loss='binary_crossentropy', optimizer=opt, metrics=["accuracy"])
model.fit(X, Y, batch_size = 5, epochs=1)
```
2.3 Test the Model
```python
loss, acc = model.evaluate(X_dev, Y_dev)
print("Dev set accuracy = ", acc)
```
Output:
```
25/25 [==============================] - 1s 46ms/step
Dev set accuracy =  0.9427199959754944
```
However, accuracy is not a good metric here: most of the labels are 0, so a model that predicts 0 everywhere would also score high accuracy. A metric such as the F1 score should be used instead.
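This is not part of the assignment, but as a sketch of what a more informative metric could look like, one could threshold the per-time-step sigmoid outputs and compute an F1 score with scikit-learn:

```python
from sklearn.metrics import f1_score

y_prob = model.predict(X_dev)                 # shape (m, Ty, 1), per-step probabilities
y_pred = (y_prob > 0.5).astype(int).ravel()   # flatten all time steps
y_true = Y_dev.astype(int).ravel()
print("Dev set F1 =", f1_score(y_true, y_pred))
```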
3. Predictions
```python
def detect_triggerword(filename):
    plt.subplot(2, 1, 1)

    x = graph_spectrogram(filename)
    # the spectrogram outputs (freqs, Tx) and we want (Tx, freqs) to input into the model
    x = x.swapaxes(0, 1)
    x = np.expand_dims(x, axis=0)
    predictions = model.predict(x)

    plt.subplot(2, 1, 2)
    plt.plot(predictions[0, :, 0])
    plt.ylabel('probability')
    plt.show()
    return predictions
```
Once you have estimated the probability of detecting the word "activate" at each output step, you can trigger a "chiming" sound whenever that probability exceeds a certain threshold. Also, after "activate" is said, many consecutive values of y may be close to 1, yet we only want to chime once. So we insert at most one chime every 75 output steps. This helps prevent inserting two chimes for a single instance of "activate" (similar to non-max suppression in computer vision).
```python
chime_file = "audio_examples/chime.wav"

def chime_on_activate(filename, predictions, threshold):
    audio_clip = AudioSegment.from_wav(filename)
    chime = AudioSegment.from_wav(chime_file)
    Ty = predictions.shape[1]
    # Step 1: Initialize the number of consecutive output steps to 0
    consecutive_timesteps = 0
    # Step 2: Loop over the output steps in the y
    for i in range(Ty):
        # Step 3: Increment consecutive output steps
        consecutive_timesteps += 1
        # Step 4: If prediction is higher than the threshold and more than 75 consecutive output steps have passed
        if predictions[0, i, 0] > threshold and consecutive_timesteps > 75:
            # Step 5: Superpose audio and background using pydub
            audio_clip = audio_clip.overlay(chime, position = ((i / Ty) * audio_clip.duration_seconds) * 1000)
            # Step 6: Reset consecutive output steps to 0
            consecutive_timesteps = 0

    audio_clip.export("chime_output.wav", format='wav')
```
3.3 Test on the Dev Set
- The first audio clip contains 1 trigger word
- The second audio clip contains 2 trigger words
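A sketch of how these two clips are run through the pipeline; the filenames under `./raw_data/dev/` are assumed from the assignment's folder layout:

```python
# First dev clip (1 trigger word expected)
filename = "./raw_data/dev/1.wav"
prediction = detect_triggerword(filename)
chime_on_activate(filename, prediction, 0.5)
IPython.display.Audio("./chime_output.wav")

# Second dev clip (2 trigger words expected)
filename = "./raw_data/dev/2.wav"
prediction = detect_triggerword(filename)
chime_on_activate(filename, prediction, 0.5)
IPython.display.Audio("./chime_output.wav")
```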
4. Test with Your Own Sample
```python
# Preprocess the audio to the correct format
def preprocess_audio(filename):
    # Trim or pad audio segment to 10000ms
    padding = AudioSegment.silent(duration=10000)
    segment = AudioSegment.from_wav(filename)[:10000]
    segment = padding.overlay(segment)
    # Set frame rate to 44100
    segment = segment.set_frame_rate(44100)
    # Export as wav
    segment.export(filename, format='wav')

your_filename = "audio_examples/my_audio.wav"
preprocess_audio(your_filename)
IPython.display.Audio(your_filename)  # listen to the audio you uploaded

chime_threshold = 0.5
prediction = detect_triggerword(your_filename)
chime_on_activate(your_filename, prediction, chime_threshold)
IPython.display.Audio("./chime_output.wav")
```
Original post: https://michael.blog.csdn.net/article/details/108933798
My CSDN blog: https://michael.blog.csdn.net/
Long-press or scan the QR code to follow my WeChat official account (Michael阿明). Let's keep working hard and learning together!