ROS in Practice (3): Speech Synthesis and Playback under ROS with iFlytek TTS
1. Preface
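Continuing from the previous post, this one mainly walks through the workflow:
As the diagram shows, the xf_tts node first subscribes to the topic /voice/xf_tts_topic, whose type is std_msgs/String. The node then calls the iFlytek online synthesis code to turn the received text into a speech file in .wav format, and finally uses the system function to invoke the play command, which plays the .wav file.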
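The last step of this flow, handing a play command to the system function, can be sketched in isolation. The helper names shellQuote and buildPlayCommand below are hypothetical and are not part of the node shown later; the sketch assumes a POSIX shell and that the play command (from SoX) is installed. It only illustrates one way to derive the command string from the .wav path, so that the file name and the play command cannot drift apart.

```cpp
#include <string>

// Hypothetical helper (not in the original node): wrap a path in single
// quotes for a POSIX shell, escaping any single quote inside the path.
std::string shellQuote(const std::string& s) {
    std::string out = "'";
    for (char c : s) {
        if (c == '\'') {
            out += "'\\''";  // close the quote, emit an escaped ', reopen
        } else {
            out += c;
        }
    }
    out += "'";
    return out;
}

// Hypothetical helper: build the command line that would be passed to
// system(), e.g. system(buildPlayCommand(path).c_str()).
std::string buildPlayCommand(const std::string& wavPath) {
    return "play " + shellQuote(wavPath);
}
```

With the path used later in this post, buildPlayCommand("/home/zc/Music/voice.wav") produces a play command with the path in single quotes, which also survives paths that contain spaces.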
2. Procedure
This assumes you have already installed ROS and configured the relevant paths and environment. If you have not, see this earlier post of mine:
https://blog.csdn.net/weixin_40522162/article/details/79244089
Open a terminal and cd into the source folder of your ROS workspace, i.e. /catkin_ws/src, then first create a package of your own.
The unmodified tts_sample.c is as follows:
/*
 * Text To Speech (TTS) technology automatically converts arbitrary text into
 * continuous, natural speech in real time. It is an efficient and convenient
 * way to provide voice information to anyone, at any time and in any place,
 * and fits the information age's need for massive data, dynamic updates and
 * personalized queries.
 */
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>

#include "qtts.h"
#include "msp_cmn.h"
#include "msp_errors.h"

/* WAV audio header format */
typedef struct _wave_pcm_hdr
{
    char      riff[4];            // = "RIFF"
    int       size_8;             // = FileSize - 8
    char      wave[4];            // = "WAVE"
    char      fmt[4];             // = "fmt "
    int       fmt_size;           // = size of the next structure : 16

    short int format_tag;         // = PCM : 1
    short int channels;           // = number of channels : 1
    int       samples_per_sec;    // = sample rate : 8000 | 6000 | 11025 | 16000
    int       avg_bytes_per_sec;  // = bytes per second : samples_per_sec * bits_per_sample / 8
    short int block_align;        // = bytes per sample : wBitsPerSample / 8
    short int bits_per_sample;    // = bits per sample : 8 | 16

    char      data[4];            // = "data"
    int       data_size;          // = length of the raw data : FileSize - 44
} wave_pcm_hdr;

/* Default WAV audio header data */
wave_pcm_hdr default_wav_hdr =
{
    { 'R', 'I', 'F', 'F' },
    0,
    {'W', 'A', 'V', 'E'},
    {'f', 'm', 't', ' '},
    16,
    1,
    1,
    16000,
    32000,
    2,
    16,
    {'d', 'a', 't', 'a'},
    0
};

/* Text synthesis */
int text_to_speech(const char* src_text, const char* des_path, const char* params)
{
    int          ret          = -1;
    FILE*        fp           = NULL;
    const char*  sessionID    = NULL;
    unsigned int audio_len    = 0;
    wave_pcm_hdr wav_hdr      = default_wav_hdr;
    int          synth_status = MSP_TTS_FLAG_STILL_HAVE_DATA;

    if (NULL == src_text || NULL == des_path)
    {
        printf("params is error!\n");
        return ret;
    }
    fp = fopen(des_path, "wb");
    if (NULL == fp)
    {
        printf("open %s error.\n", des_path);
        return ret;
    }
    /* Start synthesis */
    sessionID = QTTSSessionBegin(params, &ret);
    if (MSP_SUCCESS != ret)
    {
        printf("QTTSSessionBegin failed, error code: %d.\n", ret);
        fclose(fp);
        return ret;
    }
    ret = QTTSTextPut(sessionID, src_text, (unsigned int)strlen(src_text), NULL);
    if (MSP_SUCCESS != ret)
    {
        printf("QTTSTextPut failed, error code: %d.\n", ret);
        QTTSSessionEnd(sessionID, "TextPutError");
        fclose(fp);
        return ret;
    }
    printf("Synthesizing ...\n");
    fwrite(&wav_hdr, sizeof(wav_hdr), 1, fp); // write the WAV header; the sample rate used is 16000
    while (1)
    {
        /* Fetch the synthesized audio */
        const void* data = QTTSAudioGet(sessionID, &audio_len, &synth_status, &ret);
        if (MSP_SUCCESS != ret)
            break;
        if (NULL != data)
        {
            fwrite(data, audio_len, 1, fp);
            wav_hdr.data_size += audio_len; // accumulate data_size
        }
        if (MSP_TTS_FLAG_DATA_END == synth_status)
            break;
        printf(">");
        usleep(150*1000); // avoid hogging the CPU
    }
    printf("\n");
    if (MSP_SUCCESS != ret)
    {
        printf("QTTSAudioGet failed, error code: %d.\n", ret);
        QTTSSessionEnd(sessionID, "AudioGetError");
        fclose(fp);
        return ret;
    }
    /* Fix up the sizes in the WAV header */
    wav_hdr.size_8 += wav_hdr.data_size + (sizeof(wav_hdr) - 8);

    /* Write the corrected values back into the file header; the audio file is in WAV format */
    fseek(fp, 4, 0);
    fwrite(&wav_hdr.size_8, sizeof(wav_hdr.size_8), 1, fp);       // write the value of size_8
    fseek(fp, 40, 0);                                             // move to where data_size is stored
    fwrite(&wav_hdr.data_size, sizeof(wav_hdr.data_size), 1, fp); // write the value of data_size
    fclose(fp);
    fp = NULL;
    /* Synthesis finished */
    ret = QTTSSessionEnd(sessionID, "Normal");
    if (MSP_SUCCESS != ret)
    {
        printf("QTTSSessionEnd failed, error code: %d.\n", ret);
    }

    return ret;
}

int main(int argc, char* argv[])
{
    int         ret          = MSP_SUCCESS;
    const char* login_params = "appid = 5afcee34, work_dir = ."; // login parameters; the appid is bound to the msc library, do not change it arbitrarily
    /*
     * rdn:           how digits are pronounced in the synthesized audio
     * volume:        volume of the synthesized audio
     * pitch:         pitch of the synthesized audio
     * speed:         speaking rate of the synthesized audio
     * voice_name:    voice (speaker) used for synthesis
     * sample_rate:   sample rate of the synthesized audio
     * text_encoding: encoding of the text to synthesize
     */
    const char* session_begin_params = "voice_name = xiaoyan, text_encoding = utf8, sample_rate = 16000, speed = 50, volume = 50, pitch = 50, rdn = 2";
    const char* filename             = "tts_sample.wav"; // name of the synthesized audio file
    /* Text to synthesize (kept in Chinese because it is the input to the Chinese voice) */
    const char* text                 = "亲爱的用户,您好,这是一个语音合成示例,感谢您对科大讯飞语音技术的支持!科大讯飞是亚太地区最大的语音上市公司,股票代码:002230";

    /* User login */
    ret = MSPLogin(NULL, NULL, login_params); // first argument: user name, second: password, third: login parameters; user name and password can be obtained by registering at http://www.xfyun.cn
    if (MSP_SUCCESS != ret)
    {
        printf("MSPLogin failed, error code: %d.\n", ret);
        goto exit; // login failed, log out and quit
    }
    printf("\n###########################################################################\n");
    printf("## Text To Speech (TTS) converts arbitrary text into continuous,         ##\n");
    printf("## natural speech in real time: an efficient way to provide voice        ##\n");
    printf("## information to anyone, at any time and in any place.                  ##\n");
    printf("###########################################################################\n\n");
    /* Text synthesis */
    printf("Starting synthesis ...\n");
    ret = text_to_speech(text, filename, session_begin_params);
    if (MSP_SUCCESS != ret)
    {
        printf("text_to_speech failed, error code: %d.\n", ret);
    }
    printf("Synthesis finished\n");

exit:
    printf("Press any key to exit ...\n");
    getchar();
    MSPLogout(); // log out

    return 0;
}

Modified into a ROS node file, it looks like this:
/*
 * Text To Speech (TTS) technology automatically converts arbitrary text into
 * continuous, natural speech in real time. It is an efficient and convenient
 * way to provide voice information to anyone, at any time and in any place,
 * and fits the information age's need for massive data, dynamic updates and
 * personalized queries.
 */
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <iostream>   // needed for std::cout in topicCallBack
#include <ros/ros.h>
#include <std_msgs/String.h>
#include "qtts.h"
#include "msp_cmn.h"
#include "msp_errors.h"

const char* fileName = "/home/zc/Music/voice.wav";
const char* playPath = "play /home/zc/Music/voice.wav";

/* WAV audio header format */
typedef struct _wave_pcm_hdr
{
    char      riff[4];            // = "RIFF"
    int       size_8;             // = FileSize - 8
    char      wave[4];            // = "WAVE"
    char      fmt[4];             // = "fmt "
    int       fmt_size;           // = size of the next structure : 16

    short int format_tag;         // = PCM : 1
    short int channels;           // = number of channels : 1
    int       samples_per_sec;    // = sample rate : 8000 | 6000 | 11025 | 16000
    int       avg_bytes_per_sec;  // = bytes per second : samples_per_sec * bits_per_sample / 8
    short int block_align;        // = bytes per sample : wBitsPerSample / 8
    short int bits_per_sample;    // = bits per sample : 8 | 16

    char      data[4];            // = "data"
    int       data_size;          // = length of the raw data : FileSize - 44
} wave_pcm_hdr;

/* Default WAV audio header data */
wave_pcm_hdr default_wav_hdr =
{
    { 'R', 'I', 'F', 'F' },
    0,
    {'W', 'A', 'V', 'E'},
    {'f', 'm', 't', ' '},
    16,
    1,
    1,
    16000,
    32000,
    2,
    16,
    {'d', 'a', 't', 'a'},
    0
};

/* Text synthesis */
int text_to_speech(const char* src_text, const char* des_path, const char* params)
{
    int          ret          = -1;
    FILE*        fp           = NULL;
    const char*  sessionID    = NULL;
    unsigned int audio_len    = 0;
    wave_pcm_hdr wav_hdr      = default_wav_hdr;
    int          synth_status = MSP_TTS_FLAG_STILL_HAVE_DATA;

    if (NULL == src_text || NULL == des_path)
    {
        printf("params is error!\n");
        return ret;
    }
    fp = fopen(des_path, "wb");
    if (NULL == fp)
    {
        printf("open %s error.\n", des_path);
        return ret;
    }
    /* Start synthesis */
    sessionID = QTTSSessionBegin(params, &ret);
    if (MSP_SUCCESS != ret)
    {
        printf("QTTSSessionBegin failed, error code: %d.\n", ret);
        fclose(fp);
        return ret;
    }
    ret = QTTSTextPut(sessionID, src_text, (unsigned int)strlen(src_text), NULL);
    if (MSP_SUCCESS != ret)
    {
        printf("QTTSTextPut failed, error code: %d.\n", ret);
        QTTSSessionEnd(sessionID, "TextPutError");
        fclose(fp);
        return ret;
    }
    printf("Synthesizing ...\n");
    fwrite(&wav_hdr, sizeof(wav_hdr), 1, fp); // write the WAV header; the sample rate used is 16000
    while (1)
    {
        /* Fetch the synthesized audio */
        const void* data = QTTSAudioGet(sessionID, &audio_len, &synth_status, &ret);
        if (MSP_SUCCESS != ret)
            break;
        if (NULL != data)
        {
            fwrite(data, audio_len, 1, fp);
            wav_hdr.data_size += audio_len; // accumulate data_size
        }
        if (MSP_TTS_FLAG_DATA_END == synth_status)
            break;
        printf(">");
        usleep(15*1000); // avoid hogging the CPU
    }
    printf("\n");
    if (MSP_SUCCESS != ret)
    {
        printf("QTTSAudioGet failed, error code: %d.\n", ret);
        QTTSSessionEnd(sessionID, "AudioGetError");
        fclose(fp);
        return ret;
    }
    /* Fix up the sizes in the WAV header */
    wav_hdr.size_8 += wav_hdr.data_size + (sizeof(wav_hdr) - 8);

    /* Write the corrected values back into the file header; the audio file is in WAV format */
    fseek(fp, 4, 0);
    fwrite(&wav_hdr.size_8, sizeof(wav_hdr.size_8), 1, fp);       // write the value of size_8
    fseek(fp, 40, 0);                                             // move to where data_size is stored
    fwrite(&wav_hdr.data_size, sizeof(wav_hdr.data_size), 1, fp); // write the value of data_size
    fclose(fp);
    fp = NULL;
    /* Synthesis finished */
    ret = QTTSSessionEnd(sessionID, "Normal");
    if (MSP_SUCCESS != ret)
    {
        printf("QTTSSessionEnd failed, error code: %d.\n", ret);
    }

    return ret;
}

/* make topic callback to wav file */
void makeTextToWav(const char* text, const char* filename)
{
    int         ret          = MSP_SUCCESS;
    const char* login_params = "appid = 5afcee34, work_dir = ."; // login parameters; the appid is bound to the msc library, do not change it arbitrarily
    /*
     * rdn:           how digits are pronounced in the synthesized audio
     * volume:        volume of the synthesized audio
     * pitch:         pitch of the synthesized audio
     * speed:         speaking rate of the synthesized audio
     * voice_name:    voice (speaker) used for synthesis
     * sample_rate:   sample rate of the synthesized audio
     * text_encoding: encoding of the text to synthesize
     */
    const char* session_begin_params = "voice_name = xiaowanzi, text_encoding = utf8, sample_rate = 16000, speed = 60, volume = 60, pitch = 50, rdn = 0";
    /* const char* filename = "tts_sample.wav"; // name of the synthesized audio file
    const char* text = "大家好,我叫小倩,我今年15岁,我的车牌号是A123456,今天是2018年5月17号"; // text to synthesize */

    /* User login */
    ret = MSPLogin(NULL, NULL, login_params); // first argument: user name, second: password, third: login parameters; user name and password can be obtained by registering at http://www.xfyun.cn
    if (MSP_SUCCESS != ret)
    {
        printf("MSPLogin failed, error code: %d.\n", ret);
    }
    /* Text synthesis */
    else
    {
        printf("Starting synthesis ...\n");
        ret = text_to_speech(text, filename, session_begin_params);
        if (MSP_SUCCESS != ret)
        {
            printf("text_to_speech failed, error code: %d.\n", ret);
        }
        printf("Synthesis finished\n");
    }
    MSPLogout();
}

/* play compose.wav file */
void playWav()
{
    system(playPath);
}

/* topic auto invoke,make text to wav file,then play wav file */
void topicCallBack(const std_msgs::String::ConstPtr& msg)
{
    std::cout << "get topic text:" << msg->data.c_str() << std::endl;
    makeTextToWav(msg->data.c_str(), fileName);
    playWav();
}

int main(int argc, char* argv[])
{
    /* Spoken at startup: "iFlytek online speech synthesis module started successfully" */
    const char* start = "科大讯飞在线语音合成模块启动成功";
    makeTextToWav(start, fileName);
    playWav();
    ros::init(argc, argv, "xf_tts_node");
    ros::NodeHandle nd;
    ros::Subscriber sub = nd.subscribe("/voice/xf_tts_topic", 3, topicCallBack);
    ros::spin();
    return 0;
}

The overall idea of the modification is as follows. Cut the contents of main and move them into a separate function of your own, which leaves main empty. Then, following the layout of main in a publisher node, add the initialization, the node handle, the callback function, the subscriber and so on to main. Also add the ROS header and the message-type header at the top of the file. For details, see the subscriber node file from the wiki tutorial on writing a subscriber, shown below:
#include "ros/ros.h"
#include "std_msgs/String.h"

/**
 * This tutorial demonstrates simple receipt of messages over the ROS system.
 */
void chatterCallback(const std_msgs::String::ConstPtr& msg)
{
  ROS_INFO("I heard: [%s]", msg->data.c_str());
}

int main(int argc, char **argv)
{
  /**
   * The ros::init() function needs to see argc and argv so that it can perform
   * any ROS arguments and name remapping that were provided at the command line. For programmatic
   * remappings you can use a different version of init() which takes remappings
   * directly, but for most command-line programs, passing argc and argv is the easiest
   * way to do it. The third argument to init() is the name of the node.
   *
   * You must call one of the versions of ros::init() before using any other
   * part of the ROS system.
   */
  ros::init(argc, argv, "listener");

  /**
   * NodeHandle is the main access point to communications with the ROS system.
   * The first NodeHandle constructed will fully initialize this node, and the last
   * NodeHandle destructed will close down the node.
   */
  ros::NodeHandle n;

  /**
   * The subscribe() call is how you tell ROS that you want to receive messages
   * on a given topic. This invokes a call to the ROS
   * master node, which keeps a registry of who is publishing and who
   * is subscribing. Messages are passed to a callback function, here
   * called chatterCallback. subscribe() returns a Subscriber object that you
   * must hold on to until you want to unsubscribe. When all copies of the Subscriber
   * object go out of scope, this callback will automatically be unsubscribed from
   * this topic.
   *
   * The second parameter to the subscribe() function is the size of the message
   * queue. If messages are arriving faster than they are being processed, this
   * is the number of messages that will be buffered up before beginning to throw
   * away the oldest ones.
   */
  ros::Subscriber sub = n.subscribe("chatter", 1000, chatterCallback);

  /**
   * ros::spin() will enter a loop, pumping callbacks. With this version, all
   * callbacks will be called from within this thread (the main one). ros::spin()
   * will exit when Ctrl-C is pressed, or the node is shutdown by the master.
   */
  ros::spin();

  return 0;
}

Finally, add the executable to build and the libraries it depends on to CMakeLists.txt:
add_executable(xf_tts_node src/xf_tts.cpp)
target_link_libraries(xf_tts_node ${catkin_LIBRARIES} -lmsc -lrt -ldl -lpthread)

Also add include to the include_directories call earlier in the same file:
include_directories(
  include
  ${catkin_INCLUDE_DIRS}
)

That is all that is needed.
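The header fix-up at the end of text_to_speech (seek to offset 4 to write size_8, seek to offset 40 to write data_size) can be checked on its own. The following is a minimal sketch that works on an in-memory byte buffer instead of a file; the names putLe32, getLe32 and patchWavSizes are hypothetical and do not appear in the iFlytek sample. It assumes the canonical 44-byte PCM header used above and writes the fields explicitly in little-endian order, whereas the sample relies on the host byte order and struct layout.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kWavHeaderBytes = 44;  // size of the PCM WAV header above
constexpr std::size_t kRiffSizeOffset = 4;   // position of size_8 (FileSize - 8)
constexpr std::size_t kDataSizeOffset = 40;  // position of data_size (FileSize - 44)

// Store a 32-bit value at the given offset in little-endian byte order.
void putLe32(std::vector<std::uint8_t>& buf, std::size_t offset, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) {
        buf[offset + i] = static_cast<std::uint8_t>((v >> (8 * i)) & 0xFF);
    }
}

// Read a 32-bit little-endian value back from the given offset.
std::uint32_t getLe32(const std::vector<std::uint8_t>& buf, std::size_t offset) {
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i) {
        v |= static_cast<std::uint32_t>(buf[offset + i]) << (8 * i);
    }
    return v;
}

// Patch both size fields of an in-memory WAV image (header followed by PCM
// data). Returns false if the buffer is too small to hold a header.
bool patchWavSizes(std::vector<std::uint8_t>& wav) {
    if (wav.size() < kWavHeaderBytes) {
        return false;
    }
    const std::uint32_t dataSize =
        static_cast<std::uint32_t>(wav.size() - kWavHeaderBytes);
    putLe32(wav, kRiffSizeOffset,
            static_cast<std::uint32_t>(dataSize + kWavHeaderBytes - 8));
    putLe32(wav, kDataSizeOffset, dataSize);
    return true;
}
```

For one second of audio at the settings used here (16000 Hz, 16 bit, mono) the payload is 32000 bytes, so data_size becomes 32000 and size_8 becomes 32036, which matches what the sample computes with data_size + (sizeof(wav_hdr) - 8).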
3. Test Run
Open several terminals.
First terminal:
Second terminal:
rosrun voice_system xf_tts_node
You should hear the startup announcement saying that the iFlytek online speech synthesis module started successfully.
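At this point you can inspect the topics to see which nodes are running, and publish a message manually from the command line to have it spoken.
The command is as follows:
Screenshot: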
If you now hear "你好" ("hello") spoken, it worked and the modified program is fine.