
Words Half-Spoken, a Voice Like an Orchid: Microsoft Azure, the Strongest Free AI Text-to-Speech (TTS) Service Yet (Integration with Python 3.10)

Published: 2023/12/18 python 98 豆豆


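As the saying goes, there is no undisputed first in letters and no second in arms. Cloud-native AI is currently a three-way contest among Microsoft, Google, and Amazon, each with its own strengths and none clearly ahead. Azure, however, is more than just a PaaS platform: compared with AWS and GAE, it is arguably the platform offering the most comprehensive set of cloud AI services today. This is especially true of speech synthesis, where, in the author's view, no other platform matches the smoothness, naturalness, and realism of its AI voices.

In this article we use Python 3.10 to call the Azure speech synthesis API and build a local TTS (text-to-speech) service.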

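Preparation

First, follow the official Azure documentation: https://learn.microsoft.com/zh-cn/azure/cognitive-services/speech-service/get-started-text-to-speech?tabs=macos%2Cterminal&pivots=programming-language-python

Create a free subscription on the platform: https://azure.microsoft.com/zh-cn/free/cognitive-services/

Once the free subscription is active, move on to resource creation. Visit the following URL to create a free Speech resource: https://portal.azure.com/#create/Microsoft.CognitiveServicesSpeechServices

Set the subscription to Free Trial and the region to East Asia; if you are located elsewhere, choose the corresponding region for your location.

After the Speech resource has been created, go to the resource group list and open the resource to retrieve its key.

Never share the key, and never write it into code that gets committed to version control.

A safer approach is to store the key in an environment variable on the local system.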

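On Windows, use the following command (setx only affects console windows opened afterwards, so open a new one before running the script):

setx COGNITIVE_SERVICE_KEY your-key

On Linux, use the following command (it applies to the current shell session only):

export COGNITIVE_SERVICE_KEY=your-key

On macOS with the bash shell:

Edit ~/.bash_profile and add the environment variable:

export COGNITIVE_SERVICE_KEY=your-key

After adding the variable, run source ~/.bash_profile in the terminal for the change to take effect.

On macOS with the zsh shell:

Edit ~/.zshrc and add the environment variable:

export COGNITIVE_SERVICE_KEY=your-key

After adding the variable, run source ~/.zshrc in the terminal for the change to take effect.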

With that, the preparation is complete.
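Before moving on, it can help to confirm that Python actually sees the variable. The following is a minimal sketch rather than part of the original setup: the helper name load_speech_key is hypothetical, and the only thing it assumes is the variable name COGNITIVE_SERVICE_KEY used in the commands above.

```python
import os


def load_speech_key(env=None, name="COGNITIVE_SERVICE_KEY"):
    """Return the key stored in the given environment variable.

    load_speech_key is a hypothetical helper for illustration; env
    defaults to os.environ and can be replaced with a dict for testing.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        # Fail early with a readable message instead of a later auth error.
        raise RuntimeError(
            f"Environment variable {name} is not set; "
            "open a new terminal after setting it and try again."
        )
    return key
```

Calling load_speech_key() at the top of a script turns a missing or empty key into an immediate, readable error.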

Local Integration

Make sure the local Python version is 3.10 or later, then install the Azure SDK:

pip3 install azure-cognitiveservices-speech

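Create a file named test.py:

import os

import azure.cognitiveservices.speech as speechsdk

# Read the key from the environment variable set earlier; never hard-code it.
speech_config = speechsdk.SpeechConfig(
    subscription=os.environ.get("COGNITIVE_SERVICE_KEY"), region="eastasia"
)
# Send audio to the computer's default speaker.
audio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)

This defines the speech configuration. The os module reads the key from the environment variable set earlier, region is the region chosen when the Speech resource was created, and audio_config sends the output to the computer's default speaker.

Next, pick a voice from the official documentation: https://learn.microsoft.com/zh-cn/azure/cognitive-services/speech-service/language-support?tabs=stt-tts#prebuilt-neural-voices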

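Locale | Language | Speech to text | Speech customization | Text-to-speech voices | Custom neural voice
… | … | … | plain text | wuu-CN-XiaotongNeural (female), wuu-CN-YunzheNeural (male) | not supported
yue-CN | Chinese (Cantonese, Simplified) | yue-CN | plain text | yue-CN-XiaoMinNeural (female), yue-CN-YunSongNeural (male) | not supported
zh-CN | Chinese (Mandarin, Simplified) | zh-CN | audio + human-labeled transcript, plain text, structured text, phrase list | zh-CN-XiaochenNeural (female), zh-CN-XiaohanNeural (female), zh-CN-XiaomengNeural (female), zh-CN-XiaomoNeural (female), zh-CN-XiaoqiuNeural (female), zh-CN-XiaoruiNeural (female), zh-CN-XiaoshuangNeural (female), zh-CN-XiaoxiaoNeural (female), zh-CN-XiaoxuanNeural (female), zh-CN-XiaoyanNeural (female), zh-CN-XiaoyiNeural (female), zh-CN-XiaoyouNeural (female), zh-CN-XiaozhenNeural (female), zh-CN-YunfengNeural (male), zh-CN-YunhaoNeural (male), zh-CN-YunjianNeural (male), zh-CN-YunxiaNeural (male), zh-CN-YunxiNeural (male), zh-CN-YunyangNeural (male), zh-CN-YunyeNeural (male), zh-CN-YunzeNeural (male) | Custom Neural Voice Pro, Custom Neural Voice Lite (preview), cross-lingual voice (preview)
zh-CN-henan | Chinese (Zhongyuan Mandarin Henan, Mainland China) | not supported | not supported | zh-CN-henan-YundengNeural (male) | not supported
zh-CN-liaoning | Chinese (Northeastern Mandarin, Mainland China) | not supported | not supported | zh-CN-liaoning-XiaobeiNeural (female) | not supported
zh-CN-shaanxi | Chinese (Zhongyuan Mandarin Shaanxi, Mainland China) | not supported | not supported | zh-CN-shaanxi-XiaoniNeural (female) | not supported
zh-CN-shandong | Chinese (Jilu Mandarin, Mainland China) | not supported | not supported | zh-CN-shandong-YunxiangNeural (male) | not supported
zh-CN-sichuan | Chinese (Southwestern Mandarin, Simplified) | zh-CN-sichuan | plain text | zh-CN-sichuan-YunxiNeural (male) | not supported
zh-HK | Chinese (Cantonese, Traditional) | zh-HK | plain text | zh-HK-HiuGaaiNeural (female), zh-HK-HiuMaanNeural (female), zh-HK-WanLungNeural (male) | Custom Neural Voice Pro
zh-TW | Chinese (Taiwanese Mandarin) | zh-TW | plain text | zh-TW-HsiaoChenNeural (female), zh-TW-HsiaoYuNeural (female), zh-TW-YunJheNeural (male) | Custom Neural Voice Pro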

For Chinese alone, the range of available voices is quite broad.
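With this many voices, it can be convenient to sort candidate names by locale before choosing one. The sketch below is illustrative only: group_by_locale is a hypothetical helper, and it assumes that a voice name is the locale followed by a hyphen and the voice's own name, which is the pattern visible in the list above.

```python
from collections import defaultdict


def group_by_locale(voice_names):
    """Group voice names by locale, i.e. everything before the last hyphen."""
    groups = defaultdict(list)
    for name in voice_names:
        # "zh-CN-henan-YundengNeural" -> locale "zh-CN-henan"
        locale, _, _ = name.rpartition("-")
        groups[locale].append(name)
    return dict(groups)


# A few names copied from the list above.
voices = [
    "zh-CN-XiaomoNeural",
    "zh-CN-YunxiNeural",
    "zh-CN-henan-YundengNeural",
    "zh-HK-HiuGaaiNeural",
]
grouped = group_by_locale(voices)
```

Splitting on the last hyphen rather than matching a prefix keeps regional variants such as zh-CN-henan separate from plain zh-CN.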

Continue editing the code:

import os

import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(
    subscription=os.environ.get("COGNITIVE_SERVICE_KEY"), region="eastasia"
)
audio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)

# Select the voice used for synthesis.
speech_config.speech_synthesis_voice_name = "zh-CN-XiaomoNeural"

speech_synthesizer = speechsdk.SpeechSynthesizer(
    speech_config=speech_config, audio_config=audio_config
)

# Sample text: "hello everyone, this is an AI robot speaking"
text = "hello 大家好，这里是人工智能AI机器人在说话"
# Block until synthesis has finished.
speech_synthesis_result = speech_synthesizer.speak_text_async(text).get()

Here we choose zh-CN-XiaomoNeural as the voice and play the contents of the text variable through the speaker.
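The script above does not inspect speech_synthesis_result, so a wrong key or region can fail without any visible error. A minimal sketch of a check follows; describe_result is a hypothetical helper, and the speechsdk module is passed in as an argument so that the function itself carries no import-time dependency.

```python
def describe_result(result, speechsdk):
    """Summarize a synthesis result as a short status string.

    describe_result is a hypothetical helper; speechsdk is expected to be
    the azure.cognitiveservices.speech module imported by the caller.
    """
    if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
        return "completed"
    if result.reason == speechsdk.ResultReason.Canceled:
        # Cancellation details carry the reason and any service error text.
        details = result.cancellation_details
        return f"canceled: {details.reason} ({details.error_details})"
    return f"unexpected: {result.reason}"
```

In test.py this could be used as print(describe_result(speech_synthesis_result, speechsdk)) right after the speak_text_async call.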

If you prefer, the speech can also be written to a file:

import os

import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(
    subscription=os.environ.get("COGNITIVE_SERVICE_KEY"), region="eastasia"
)
# Write the audio to a file instead of playing it through the speaker.
file_config = speechsdk.audio.AudioOutputConfig(filename="./output.wav")

speech_config.speech_synthesis_voice_name = "zh-CN-XiaomoNeural"

speech_synthesizer = speechsdk.SpeechSynthesizer(
    speech_config=speech_config, audio_config=file_config
)

# Sample text: "hello everyone, this is an AI robot speaking"
text = "hello 大家好，这里是人工智能AI机器人在说话"
speech_synthesis_result = speech_synthesizer.speak_text_async(text).get()

Here file_config points to output.wav, a path relative to the directory the script is run from:

ls output.wav

The audio file is now saved and can be reused later.
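Beyond checking that the file exists, its duration gives a quick sanity check that audio was really written. The sketch below uses only the standard library; wav_duration_seconds is a hypothetical helper, and it assumes that output.wav is an uncompressed PCM WAV file, which is the only kind the wave module reads.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()    # total number of audio frames
        rate = wav_file.getframerate()    # frames per second
    return frames / float(rate)
```

For example, wav_duration_seconds("./output.wav") should return a value greater than zero after a successful synthesis.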

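Voice Tuning

After a while the default AI voice can start to sound flat. Fortunately, Azure provides Speech Synthesis Markup Language (SSML), which can improve how the synthesized speech sounds.

See the official Azure documentation: https://learn.microsoft.com/zh-cn/azure/cognitive-services/speech-service/speech-synthesis-markup

You can obtain a customized voice by adjusting the voice's role and style: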

Voice | Styles | Roles
en-GB-RyanNeural | cheerful, chat | not supported
en-GB-SoniaNeural | cheerful, sad | not supported
en-US-AriaNeural | chat, customerservice, narration-professional, newscast-casual, newscast-formal, cheerful, empathetic, angry, sad, excited, friendly, terrified, shouting, unfriendly, whispering, hopeful | not supported
en-US-DavisNeural | chat, angry, cheerful, excited, friendly, hopeful, sad, shouting, terrified, unfriendly, whispering | not supported
en-US-GuyNeural | newscast, angry, cheerful, sad, excited, friendly, terrified, shouting, unfriendly, whispering, hopeful | not supported
en-US-JaneNeural | angry, cheerful, excited, friendly, hopeful, sad, shouting, terrified, unfriendly, whispering | not supported
en-US-JasonNeural | angry, cheerful, excited, friendly, hopeful, sad, shouting, terrified, unfriendly, whispering | not supported
en-US-JennyNeural | assistant, chat, customerservice, newscast, angry, cheerful, sad, excited, friendly, terrified, shouting, unfriendly, whispering, hopeful | not supported
en-US-NancyNeural | angry, cheerful, excited, friendly, hopeful, sad, shouting, terrified, unfriendly, whispering | not supported
en-US-SaraNeural | angry, cheerful, excited, friendly, hopeful, sad, shouting, terrified, unfriendly, whispering | not supported
en-US-TonyNeural | angry, cheerful, excited, friendly, hopeful, sad, shouting, terrified, unfriendly, whispering | not supported
es-MX-JorgeNeural | cheerful, chat | not supported
fr-FR-DeniseNeural | cheerful, sad | not supported
fr-FR-HenriNeural | cheerful, sad | not supported
it-IT-IsabellaNeural | cheerful, chat | not supported
ja-JP-NanamiNeural | chat, customerservice, cheerful | not supported
pt-BR-FranciscaNeural | calm | not supported
zh-CN-XiaohanNeural | calm, fearful, cheerful, disgruntled, serious, angry, sad, gentle, affectionate, embarrassed | not supported
zh-CN-XiaomengNeural | chat | not supported
zh-CN-XiaomoNeural | embarrassed, calm, fearful, cheerful, disgruntled, serious, angry, sad, depressed, affectionate, gentle, envious | YoungAdultFemale, YoungAdultMale, OlderAdultFemale, OlderAdultMale, SeniorFemale, SeniorMale, Girl, Boy
zh-CN-XiaoruiNeural | calm, fearful, angry, sad | not supported
zh-CN-XiaoshuangNeural | chat | not supported
zh-CN-XiaoxiaoNeural | assistant, chat, customerservice, newscast, affectionate, angry, calm, cheerful, disgruntled, fearful, gentle, lyrical, sad, serious, poetry-reading | not supported
zh-CN-XiaoxuanNeural | calm, fearful, cheerful, disgruntled, serious, angry, gentle, depressed | YoungAdultFemale, YoungAdultMale, OlderAdultFemale, OlderAdultMale, SeniorFemale, SeniorMale, Girl, Boy
zh-CN-XiaoyiNeural | angry, disgruntled, affectionate, cheerful, fearful, sad, embarrassed, serious, gentle | not supported
zh-CN-XiaozhenNeural | angry, disgruntled, cheerful, fearful, sad, serious | not supported
zh-CN-YunfengNeural | angry, disgruntled, cheerful, fearful, sad, serious, depressed | not supported
zh-CN-YunhaoNeural | advertisement-upbeat | not supported
zh-CN-YunjianNeural | Narration-relaxed, Sports_commentary, Sports_commentary_excited | not supported
zh-CN-YunxiaNeural | calm, fearful, cheerful, angry, sad | not supported
zh-CN-YunxiNeural | narration-relaxed, embarrassed, fearful, cheerful, disgruntled, serious, angry, sad, depressed, chat, assistant, newscast | Narrator, YoungAdultMale, Boy
zh-CN-YunyangNeural | customerservice, narration-professional, newscast-casual | not supported
zh-CN-YunyeNeural | embarrassed, calm, fearful, cheerful, disgruntled, serious, angry, sad | YoungAdultFemale, YoungAdultMale, OlderAdultFemale, OlderAdultMale, SeniorFemale, SeniorMale, Girl, Boy
zh-CN-YunzeNeural | calm, fearful, cheerful, disgruntled, serious, angry, sad, depressed, documentary-narration | OlderAdultMale, SeniorMale

Here the text is rewritten in SSML format:

import os

import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(
    subscription=os.environ.get("COGNITIVE_SERVICE_KEY"), region="eastasia"
)
file_config = speechsdk.audio.AudioOutputConfig(filename="./output.wav")
speech_config.speech_synthesis_voice_name = "zh-CN-XiaomoNeural"

speech_synthesizer = speechsdk.SpeechSynthesizer(
    speech_config=speech_config, audio_config=file_config
)

# The SSML names its own voice in the <voice> element.
# Note: the styles and roles table lists no roles for zh-CN-XiaoxiaoNeural,
# so the role attribute may have no effect with this voice.
# The sample text is a greeting followed by lines of classical Chinese poetry.
text = """
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="zh-CN">
    <voice name="zh-CN-XiaoxiaoNeural">
        <mstts:express-as style="lyrical" role="YoungAdultFemale">
            <prosody rate="+12.00%">
                hello 大家好，这里是刘悦的技术博客
                大江东去，浪淘尽，千古风流人物。
                故垒西边，人道是，三国周郎赤壁。
                乱石穿空，惊涛拍岸，卷起千堆雪。
                江山如画，一时多少豪杰。
            </prosody>
        </mstts:express-as>
    </voice>
</speak>"""

# speak_ssml_async takes SSML markup rather than plain text.
result = speech_synthesizer.speak_ssml_async(ssml=text).get()

The style and role attributes customize the voice, and the rate attribute raises the speaking rate by 12 percent, which makes the AI voice sound more fluent. Note that the text is passed as ssml=text to mark it as SSML.
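Writing SSML by hand becomes error-prone once the text contains characters such as & or <. As a hedged sketch, the markup can be assembled from parameters with the standard library handling the escaping; build_ssml and its parameters are hypothetical names, and only the element and attribute names used in the example above are assumed.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice, style=None, rate=None, lang="zh-CN"):
    """Build an SSML document string; build_ssml is a hypothetical helper."""
    body = escape(text)  # escape &, < and > so the text cannot break the XML
    if rate is not None:
        body = f"<prosody rate={quoteattr(rate)}>{body}</prosody>"
    if style is not None:
        body = (
            f"<mstts:express-as style={quoteattr(style)}>"
            f"{body}</mstts:express-as>"
        )
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>{body}</voice>"
        "</speak>"
    )
```

The returned string can be passed to speak_ssml_async in place of the hand-written text, for example build_ssml("大江东去", "zh-CN-XiaoxiaoNeural", style="lyrical", rate="+12.00%").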

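Conclusion

AI speech systems bring artificial intelligence into practical use in the speech synthesis niche, saving cost and time for the many online businesses that need voice-over work.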
