Technology

Audio signal processing in speech recognition

Time: 2021-12-14

This is a bachelor's thesis (50 pages) by JOEP DE JONG, Delft University of Technology, the Netherlands.

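Transcribing speech with neural networks is a technique that deserves attention, as voice assistants are becoming increasingly popular. Neural networks often find it hard to distinguish between a speaker and noise. Humans understand this distinction better and could apply their knowledge of signal structure to improve the neural network's understanding.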

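Understanding and transcribing the lyrics of a song is a very difficult problem. This thesis analyzes signal-processing techniques that can be applied to songs to improve the understanding of speech-recognition algorithms, focusing mainly on filtering the lyrics out of the accompaniment. Several basic filtering methods are introduced, including low-amplitude filtering and band-pass filtering. Two more complex filters that exploit the periodicity of the background music are also discussed. The first is a voice-separation method based on the two-dimensional Fourier transform. Proposed by Prem Seetharaman, Fatemeh Pishdadian and Bryan Pardo in 2017, it combines signal-processing and image-processing techniques, finding periodic repetitions in a signal by identifying peaks in the two-dimensional Fourier transform of the signal's spectrogram. The second is a newly proposed method for separating the background music. The algorithm compares columns of the spectrogram: if there are multiple occurrences (repetitions) of columns similar to the selected column, that column is classified as overlapping. The frequency components of the overlapping columns (the different frequencies obtained from a discrete short-time Fourier transform) are then compared with the components of the same frequency in other columns. In certain cases, overlapping frequency components are subtracted from the components in other columns of the spectrogram, which removes frequencies that repeat throughout the song. After several iterations of this method, the main remaining components of the spectrogram most likely correspond to the least repetitive parts of the song. The decisions made while constructing the column-comparison method are discussed and compared with the steps of the two-dimensional Fourier transform method. The research suggests that the two-dimensional Fourier transform method is expected to perform better on strictly periodic accompaniment, while the column-comparison method is more likely to perform better on songs with a less tight rhythm.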
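The core idea of the two-dimensional Fourier transform method of Seetharaman, Pishdadian and Pardo (2017) can be illustrated with a short NumPy sketch. This is a simplified illustration under stated assumptions, not the procedure from their paper or from the thesis: it takes a precomputed magnitude spectrogram (frequency bins by frames), treats local maxima of the 2D FFT magnitude as the repeating background, and assigns bins to the foreground with a simple binary mask. The function name `separate_2dft` and the parameters `neighborhood`, `peak_ratio` and `margin` are hypothetical choices made for this example.

```python
import numpy as np


def separate_2dft(mag, neighborhood=3, peak_ratio=0.1, margin=0.1):
    """Hypothetical 2D-FFT music/voice split of a magnitude spectrogram.

    mag: non-negative array of shape (n_freq, n_frames).
    Returns (foreground, background); the two always sum to mag.
    """
    mag = np.asarray(mag, dtype=float)
    spec2d = np.fft.fft2(mag)
    amp = np.abs(spec2d)

    # A coefficient is a peak if no neighbour within +-neighborhood
    # (wrapping around the edges) has a larger magnitude.
    is_peak = np.ones(amp.shape, dtype=bool)
    n = neighborhood
    for dk in range(-n, n + 1):
        for dl in range(-n, n + 1):
            if dk == 0 and dl == 0:
                continue
            is_peak &= amp >= np.roll(amp, (dk, dl), axis=(0, 1))

    # Discard weak local maxima (e.g. flat, noise-like regions).
    is_peak &= amp >= peak_ratio * amp.max()

    # Inverting only the peaks keeps the periodically repeating structure.
    background_est = np.maximum(np.fft.ifft2(spec2d * is_peak).real, 0.0)

    # Binary mask (assumed rule): bins clearly above the repeating
    # estimate are treated as foreground, everything else as background.
    fg_mask = mag > background_est + margin * mag.max()
    foreground = np.where(fg_mask, mag, 0.0)
    background = mag - foreground
    return foreground, background
```

The design choice behind the sketch: a strictly periodic accompaniment concentrates its energy in a few regularly spaced 2D FFT coefficients, so keeping only the peaks roughly approximates the repeating part, while a non-repeating vocal line spreads its energy more evenly and tends to end up in the foreground mask.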

The transcription of voice using neural networks is a technique that deserves attention, as speech assistants are becoming increasingly popular. Neural networks often have difficulty determining the difference between a talking person and noise. Humans have a much better understanding of this and could possibly apply their knowledge of the structure of the signals to improve the understanding of the neural network. A problem that is extremely difficult for a neural network is understanding and transcribing the lyrics of a song.

This thesis analyzes signal-processing techniques that can be applied to a song to improve the understanding of a speech-recognition algorithm. It is mainly focused on filtering the foreground lyrics from the accompaniment. Some basic filtering methods are described, including a low-amplitude filter and a band-pass filter. In addition, two more complicated filters that make use of the periodicity of the background music are treated.

The first filter is a method of voice separation using the two-dimensional Fourier transform. This method, proposed by Prem Seetharaman, Fatemeh Pishdadian and Bryan Pardo in 2017 [15], combines techniques of signal processing and image processing: it finds periodic repetitions in a signal by identifying peaks in the two-dimensional Fourier transform of the signal's spectrogram.

The second filter is a newly proposed method that can be used for the separation of foreground from background music. The algorithm compares columns in the spectrogram and classifies a column as overlapping if there are multiple occurrences of columns similar to the selected column (repetitions). The frequency components of overlapping columns (the different frequencies obtained from a discrete short-time Fourier transform) are afterwards compared with components of the same frequency in other columns. Under certain circumstances, overlapping frequency components are subtracted from components in other columns of the spectrogram. This removes repetitions of that frequency throughout the song. The components of the spectrogram that remain after several iterations of this method are most likely to correspond to the least repetitive parts of the song.

The decisions made while constructing the method of comparing spectrogram columns are discussed and compared with the steps performed in the method that uses the two-dimensional Fourier transform. An implementation and demonstration are also attached. From the research, the two-dimensional Fourier transform method is expected to perform better on strictly periodic accompaniment, while the method that compares spectrogram columns is more likely to perform better on songs with a less tight rhythm.
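The column-comparison method is only described in outline in the abstract, so the following is a hypothetical reading of it in NumPy, not the thesis's implementation. Assumptions: columns are compared with cosine similarity, a column counts as overlapping when at least `min_repeats` other columns reach `sim_threshold`, and the unspecified "certain circumstances" for subtraction are replaced by one simple rule: subtract from each group of similar columns the per-frequency minimum they all share. The name `remove_repeating_columns` and every threshold are illustrative.

```python
import numpy as np


def remove_repeating_columns(spec, sim_threshold=0.9, min_repeats=2, n_iter=3):
    """Hypothetical column-comparison filter for a magnitude spectrogram.

    spec: non-negative array of shape (n_freq, n_frames); each column is one frame.
    Returns the residual left after repeated content has been subtracted.
    """
    residual = np.array(spec, dtype=float)  # np.array copies the input
    for _ in range(n_iter):
        # Cosine similarity between every pair of columns.
        norms = np.linalg.norm(residual, axis=0)
        unit = residual / np.where(norms > 0, norms, 1.0)
        sim = unit.T @ unit
        np.fill_diagonal(sim, 0.0)  # a column is not a repetition of itself
        similar = sim >= sim_threshold

        # Overlapping columns: at least `min_repeats` similar columns elsewhere.
        overlapping = np.flatnonzero(similar.sum(axis=1) >= min_repeats)
        if overlapping.size == 0:
            break

        for j in overlapping:
            group = np.append(np.flatnonzero(similar[j]), j)
            # Per-frequency component shared by every column in the group
            # (assumed subtraction rule; keeps all values non-negative).
            shared = residual[:, group].min(axis=1)
            residual[:, group] -= shared[:, None]
    return residual


if __name__ == "__main__":
    bg = np.array([2.0, 1.0, 0.0, 0.0])
    spec = np.tile(bg[:, None], (1, 8))  # the same background column in 8 frames
    spec[3, 3] += 1.0                    # small non-repeating "voice" component in frame 3
    res = remove_repeating_columns(spec)
    print(res[:, 3])  # [0. 0. 0. 1.]
```

In the usage example, frame 3 is still similar enough to the repeating background to join its group, so the shared background is subtracted from it and only the added component survives, which mirrors the abstract's claim that what remains after the iterations is the least repetitive content.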

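1. Introduction

2. Signals, Sampling and Spectrum Theory

3. Filtering

4. Separating the Voice Signal by Comparing Spectrogram Columns

5. Implementation and Validation

6. Discussion and Conclusion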
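The two basic filters named in the abstract, a low-amplitude filter and a band-pass filter, are presumably part of chapter 3 (Filtering). A minimal sketch of both, assuming an ideal brick-wall band-pass applied to a single FFT of the whole signal and a low-amplitude rule that zeroes bins below a fraction of the strongest remaining bin; the thesis may define them differently (for example per STFT frame). The function name `basic_filter`, the default band of 80 Hz to 4 kHz and `rel_threshold` are assumptions for illustration only.

```python
import numpy as np


def basic_filter(x, fs, f_lo=80.0, f_hi=4000.0, rel_threshold=0.01):
    """Hypothetical band-pass plus low-amplitude filter in the frequency domain.

    x: 1D real signal; fs: sample rate in Hz.
    """
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)

    # Band-pass: zero every bin outside [f_lo, f_hi] (ideal brick-wall filter).
    spectrum[(freqs < f_lo) | (freqs > f_hi)] = 0.0

    # Low-amplitude filter: zero bins far below the strongest remaining bin.
    mags = np.abs(spectrum)
    if mags.max() > 0:
        spectrum[mags < rel_threshold * mags.max()] = 0.0

    return np.fft.irfft(spectrum, n=x.size)


if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs) / fs                   # one second of samples
    voice = np.sin(2 * np.pi * 200 * t)      # stand-in for the wanted signal
    x = (voice
         + np.sin(2 * np.pi * 6000 * t)      # above the band
         + 0.5 * np.sin(2 * np.pi * 30 * t)  # below the band
         + 0.001 * np.sin(2 * np.pi * 300 * t))  # in band but very weak
    y = basic_filter(x, fs)
    print(np.allclose(y, voice))  # True
```

A brick-wall filter on one FFT keeps the example short and exact for tones that fall on integer frequency bins; on real recordings, spectral leakage and time-varying content make a frame-wise (STFT) version more appropriate.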

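Disclaimer: This article is reposted from another platform and does not represent the views or position of this site. In case of infringement or objection, please contact us and it will be removed. Thank you!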