Research and Implementation of a Continuous Speech Recognition System Based on HTK and Microsoft Speech SDK

Date: 2025-04-22   Source: unknown

Abstract (Chinese version)

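Speech recognition has developed rapidly in recent years. Enabling computers to understand human speech, and even to converse with people, is a long-held goal that is expected to become reality in the near future. This thesis investigates continuous speech recognition.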

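The thesis first introduces the fundamentals of speech recognition, discussing in detail the methods of speech signal processing and the principles of speech recognition systems. It then proceeds along two lines.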

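The first line takes the perspective of pattern recognition research: it examines speech feature extraction and the principles of speech recognition, and builds the corresponding recognition models. Combining MFCC-based feature extraction, HMM theory and training algorithms, monophone modeling, the definition and application of context-free grammars, and the Viterbi algorithm, the thesis builds an experimental continuous speech recognition system on HTK 3.4 and TIMIT and runs experiments on the number of mixture components. The results show that as the number of mixture components rises from 1 to 128, the recognition rate rises from 47.01% to 62.33%.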
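As an illustration of the decoding step named above, the following is a minimal Viterbi sketch in Python. It is not the HTK decoder: the function name `viterbi`, the list-based model layout, and the toy two-state model are assumptions made for this example only.

```python
import math

def viterbi(log_init, log_trans, log_emit):
    """Return (best_state_path, best_log_prob) for a discrete-time HMM.

    log_init[i]     : log P(state i at frame 0)
    log_trans[i][j] : log P(state j at frame t+1 | state i at frame t)
    log_emit[t][j]  : log P(observation at frame t | state j)
    """
    n_states = len(log_init)
    # delta[j] is the best log score of any path ending in state j at the current frame
    delta = [log_init[j] + log_emit[0][j] for j in range(n_states)]
    backptr = []
    for t in range(1, len(log_emit)):
        new_delta, ptr = [], []
        for j in range(n_states):
            best_i = max(range(n_states), key=lambda i: delta[i] + log_trans[i][j])
            ptr.append(best_i)
            new_delta.append(delta[best_i] + log_trans[best_i][j] + log_emit[t][j])
        delta = new_delta
        backptr.append(ptr)
    # Trace back from the best final state
    last = max(range(n_states), key=lambda j: delta[j])
    path = [last]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]

# Hypothetical toy model: 2 states, 3 frames
log = math.log
init = [log(0.6), log(0.4)]
trans = [[log(0.7), log(0.3)], [log(0.4), log(0.6)]]
emit = [[log(0.5), log(0.1)], [log(0.4), log(0.3)], [log(0.1), log(0.6)]]
path, log_prob = viterbi(init, trans, emit)
```

Working in the log domain avoids numerical underflow on long utterances, which is why the sketch adds log scores instead of multiplying probabilities.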

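To address the increase in computation time that a larger number of mixture components brings in LVCSR, the thesis studies fast likelihood computation. Partial distance elimination (PDE), best mixture prediction (BMP), and feature component reordering (FCR) are implemented on HTK 3.4. Experiments show that these fast likelihood computation methods significantly reduce the time spent on likelihood computation while keeping the loss in recognition rate within an acceptable range.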
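The idea behind PDE and BMP can be sketched as follows, assuming the common nearest-neighbour approximation in which a state's likelihood is taken from its single best diagonal-covariance Gaussian. This is not the thesis's HTK implementation; the function names, the `(const, mean, inv_var)` tuple layout, and the `predicted` argument are hypothetical.

```python
def partial_distance(x, mean, inv_var, bound):
    """Accumulate the variance-weighted squared distance term by term.

    Returns None as soon as the partial sum exceeds `bound` (the PDE step),
    because the remaining non-negative terms can only make it larger.
    """
    total = 0.0
    for xd, md, iv in zip(x, mean, inv_var):
        total += (xd - md) ** 2 * iv
        if total > bound:
            return None
    return total

def best_component(x, gaussians, predicted=0):
    """Find the Gaussian with the highest score = const - 0.5 * distance.

    gaussians : list of (const, mean, inv_var) tuples (assumed layout)
    predicted : index evaluated first, e.g. the previous frame's winner (BMP idea)
    """
    order = [predicted] + [m for m in range(len(gaussians)) if m != predicted]
    best_m, best_score = None, float("-inf")
    for m in order:
        const, mean, inv_var = gaussians[m]
        # A distance above this bound cannot beat the current best score
        bound = 2.0 * (const - best_score)
        dist = partial_distance(x, mean, inv_var, bound)
        if dist is None:
            continue
        score = const - 0.5 * dist
        if score > best_score:
            best_m, best_score = m, score
    return best_m, best_score

# Hypothetical 3-component, 3-dimensional example
gaussians = [
    (-1.0, [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]),
    (-1.2, [5.0, 5.0, 5.0], [1.0, 1.0, 1.0]),
    (-0.8, [1.0, 1.0, 1.0], [2.0, 2.0, 2.0]),
]
x = [0.9, 1.1, 1.0]
winner, score = best_component(x, gaussians, predicted=2)
```

Evaluating a good guess first gives a tight bound early, so most other components are abandoned after one or two terms. The result matches an exhaustive search; only the amount of arithmetic changes.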

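The second line takes the perspective of recognition software development: it builds a speech recognition system for recording basketball game statistics. The thesis describes how the Microsoft Speech SDK is embedded in the system and introduces XML. It then presents a worked example that uses SAPI to build a domain-restricted continuous speech recognition system able to recognize several sentence patterns and a few dozen words, which serves as the voice interface of the basketball statistics system. In testing, the system reaches a recognition rate of 86%. Finally, the thesis describes some noise-control techniques and ways to improve the recognition rate.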

Keywords: continuous speech recognition; fast Gaussian computation; Speech API

ABSTRACT

Speech recognition has grown rapidly in recent years. Making computers understand human speech, and even communicate with human beings, is a long-held dream. In the near future, this dream may come true. The main purpose of this thesis is to discuss continuous speech recognition.

The thesis begins by introducing the basics of speech recognition, with a detailed discussion of speech signal processing and speech recognition theory. It then proceeds along two lines.

From the pattern recognition perspective, speech feature extraction and the principles of speech recognition are discussed, and the corresponding speech recognition model is built. First, the speech signal is preprocessed and MFCC feature parameters are extracted. Then, on the basis of HMMs and monophone models, an experimental large-vocabulary continuous speech recognition system is built, with HTK 3.4 as the platform and TIMIT as the corpus. An experiment on Gaussian mixture splitting shows that as the number of mixture components increases from 1 to 128, the recognition accuracy increases from 47.01% to 62.33%.
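Why more mixture components cost more time can be seen from how a state's output likelihood is typically evaluated. The sketch below assumes the standard diagonal-covariance Gaussian mixture formulation; the function names are illustrative and not taken from HTK.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(x, weights, means, variances):
    """Log of the weighted sum of component densities, via log-sum-exp."""
    logs = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(logs)  # subtract the peak so the exponentials cannot underflow to all zeros
    return peak + math.log(sum(math.exp(l - peak) for l in logs))
```

Each frame requires one pass over every component of every active state, so the work grows linearly with the number of mixture components.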

To reach a higher recognition accuracy, even more Gaussians can be used, and the share of recognition time spent on Gaussian evaluation then grows accordingly. This kind of likelihood-based statistical acoustic modeling is so time-consuming that recognition becomes very slow; some LVCSR systems may even decode speech several times slower than real time. It is therefore necessary to develop efficient techniques that reduce the time spent on likelihood computation without significantly degrading recognition accuracy. This thesis implements partial distance elimination (PDE), best mixture prediction (BMP), and feature component reordering (FCR). Experiments show that the combination of these techniques is effective for fast Gaussian likelihood computation.
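Feature component reordering can be illustrated with a small sketch. It assumes the usual reading of FCR: dimensions that contribute most to the distance on average are evaluated first, so that an early-exit test such as PDE fires sooner. How the thesis estimates the ordering is not stated here, so `component_order` and the sample frames are hypothetical.

```python
def component_order(frames, mean, inv_var):
    """Rank feature dimensions by average distance contribution, largest first."""
    dims = len(mean)
    avg = [
        sum((f[d] - mean[d]) ** 2 * inv_var[d] for f in frames) / len(frames)
        for d in range(dims)
    ]
    return sorted(range(dims), key=lambda d: avg[d], reverse=True)

def reorder(vec, order):
    """Apply the same permutation to a feature vector, mean, or inverse variance."""
    return [vec[d] for d in order]

def terms_until_exceeds(x, mean, inv_var, bound):
    """Count the terms accumulated before the partial distance exceeds bound."""
    total = 0.0
    for n, (xd, md, iv) in enumerate(zip(x, mean, inv_var), start=1):
        total += (xd - md) ** 2 * iv
        if total > bound:
            return n
    return len(x)

# Hypothetical data: the third dimension dominates the distance
frames = [[0.1, 0.0, 3.0], [0.0, 0.2, 2.5]]
mean, inv_var = [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]
order = component_order(frames, mean, inv_var)

x = [0.1, 0.1, 2.8]
before = terms_until_exceeds(x, mean, inv_var, bound=1.0)
after = terms_until_exceeds(
    reorder(x, order), reorder(mean, order), reorder(inv_var, order), bound=1.0
)
```

The permutation does not change the full distance, only the order in which its terms are added, so it affects speed and not the score of a fully evaluated Gaussian.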

The other line focuses on speech recognition software development: a speech recognition system for recording basketball game statistics is built. The thesis describes how to use the Microsoft Speech SDK as a voice interface and also introduces XML. An example of getting started with SAPI follows: a domain-specific continuous speech recognition system that can recognize a number of sentence patterns and dozens of words, used as the voice interface for basketball game statistics. Experiments show that its recognition accuracy is 86%. Finally, noise-control methods and ways to improve the recognition rate are introduced.
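A command grammar of this kind might be generated as sketched below. The sketch assumes the SAPI 5 XML text grammar format with its `GRAMMAR`, `RULE`, `L` (list of alternatives), and `P` (phrase) elements; the rule name `StatEvent`, the English phrases, and the helper `build_grammar` are hypothetical and do not reproduce the grammar used in the thesis.

```python
import xml.etree.ElementTree as ET

def build_grammar(players, events):
    """Build a one-rule command grammar: "number <player> <event>"."""
    # LANGID "409" is the hexadecimal language ID for US English (assumption for this example)
    grammar = ET.Element("GRAMMAR", LANGID="409")
    rule = ET.SubElement(grammar, "RULE", NAME="StatEvent", TOPLEVEL="ACTIVE")
    ET.SubElement(rule, "P").text = "number"
    player_list = ET.SubElement(rule, "L")  # exactly one of these alternatives is spoken
    for player in players:
        ET.SubElement(player_list, "P").text = player
    event_list = ET.SubElement(rule, "L")
    for event in events:
        ET.SubElement(event_list, "P").text = event
    return ET.tostring(grammar, encoding="unicode")

# Hypothetical vocabulary for a basketball statistics interface
xml_text = build_grammar(["five", "seven"], ["scores two points", "rebound"])
```

Restricting the recognizer to a small set of sentence patterns like this is what makes a domain-specific system practical: the search space is limited to the phrases the grammar allows.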

KEY WORDS: continuous speech recognition; fast likelihood computation; Speech API

Contents

Abstract (Chinese)

ABSTRACT

Chapter 1 Introduction

1.1 Overview of Speech Recognition

1.2 Current State of Speech Recognition

1.3 Main Content and Structure of the Thesis

Chapter 2 Speech Recognition Systems

2.1 Basic Principles of Speech Recognition

2.2 Speech Signal Preprocessing and Feature Extraction

2.2.1 Sampling and Quantization

2.2.2 Pre-emphasis

2.2.3 Windowing

……
