Journal of Signal and Information Processing, 2013, 4, 106-110
doi:10.4236/jsip.2013.42014 Published Online May 2013 (/journal/jsip)
Research on Different Feature Parameters in Speaker Recognition
Qiyue Liu, Mingqiu Yao, Han Xu, Fang Wang
Department of Communication and Information System, Hebei University of Science and Technology, Shijiazhuang, China.
Received March 15th, 2013; revised April 16th, 2013; accepted April 25th, 2013.
Copyright © 2013 Qiyue Liu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

ABSTRACT
Feature parameter extraction is critical for speaker recognition research. The paper presents the function of pitch, formant and Mel frequency cepstral coefficients (MFCC) in speaker recognition. Sorting the speech corpus by feature parameters can increase the identification rate effectively. Using Euclidean distance to compare feature parameters is very effective.
Keywords: Pitch; Formant; MFCC; Euclidean Distance
1. Introduction
People can distinguish different speakers by ear; since people can perceive the difference, a machine should also be able to do so by some method. Speaker recognition is about making a machine identify different people, that is, letting the machine know who is talking.

The ultimate purpose of speaker recognition is to identify who is speaking while ignoring the content of the speech. In fact, it is the recognition of the characteristics of the speech.

The human voice is a natural attribute; each person's speech organs have their own characteristics and pronunciation habits. Therefore, to identify a speaker exactly, parameters that fully reflect these personal characteristics must be extracted from the speech signal.
The feature parameters should have these characteristics [1,2]:
• They fully embody the large differences between different people, and remain relatively stable when the speaker's speech changes.
• They remain stable and robust when the voice suffers from outside interference.
• They cannot be imitated easily.
• They are easy to extract and compute, and the dimensions of the characteristic parameter are favorably independent of each other.
Voice is different from a fingerprint: a fingerprint is fixed, but the voice changes, so no parameter has yet been found that fully meets all of the requirements mentioned above. The sound is connected with human emotion, health and environment, etc., and also has a relationship with the speech content. Therefore, all of the characteristic parameters applied so far have some defects and cannot accurately represent the speaker's personality traits.
2. Research on Different Feature Parameters

A speaker's characteristics are generally reflected in the vocal tract features and the glottal features.

While maintaining the recognition rate, it is very difficult to reduce recognition time by reducing computational complexity; it is common practice to spend more computation time to improve the recognition rate. In speaker recognition, as the number of speakers increases, the time taken to identify a speaker grows linearly, because each recognition must be matched against every speaker model in turn and the closest model is taken as the final result. The more speakers that are registered, the longer the discrimination time; eventually a limit is reached where identification takes too long to meet the requirements. This problem can be nicely solved by classification.
2.1. Pitch Frequency
Pitch arises from the periodic vibration of the vocal cords when producing voiced sound, and pitch frequency is a very important parameter for describing the characteristics of the voice excitation source. The range of pitch frequency is generally from 50 Hz to 500 Hz; the male voice ranges from 50 Hz to 300 Hz and the female voice from 100 Hz to 500 Hz. Although each person's different vocal structure leads to a different fundamental frequency, the range of pitch frequency is small, so the gap between different people is small. Most importantly, pitch frequency is affected by many factors, such as emotion and tone, so it is very difficult to obtain an accurate fundamental frequency. Thus, the recognition rate is very low when using the fundamental frequency alone for speaker recognition. However, the male fundamental frequency is generally lower than the female, so it is a good basis for classification.
Since the beginning of research on voice signal analysis, pitch extraction has always been an important research topic. The speech signal changes in a complex way, is affected by the channel, and has rich harmonic content. Although many methods have been proposed so far, they all have limitations: they cannot fully represent different speakers' characteristics and cannot adapt to different requirements and environments.
There are a variety of methods to extract the fundamental tone [1]. These can be roughly divided into three categories: waveform estimation, correlation processing and transform techniques [3]. This paper uses a transform technique to extract pitch: it transforms the speech signal to the cepstrum domain, eliminates the channel's influence using a homomorphic analytical method, obtains the information of the excitation part, and then determines the fundamental frequency.

Only voiced sound has a pitch period. When producing unvoiced sound, the glottal excitation has little energy and its spectrum is evenly distributed like white noise; when producing voiced sound, the excitation is an impulse sequence with a certain period, and this period is the pitch period. A finite-length periodic impulse sequence

s(n) = Σ_{r=0}^{M} α_r δ(n − r·T_p),

where M is a positive integer, α_r is the amplitude (crest) factor of the r-th impulse, and T_p is the pitch period, also gives a periodic impulse sequence in the cepstrum domain. The period does not change in the cepstrum domain, while the amplitude decreases with r at a faster rate than in the time domain. In this way, the cepstrum-based method can be used to extract the fundamental frequency, and it has a good effect.
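The cepstrum procedure above can be sketched in a few lines. The paper's experiments used MATLAB 7.0; the sketch below uses Python purely for illustration. It estimates the pitch of one voiced frame by peak-picking the real cepstrum, restricting the search to the 50 - 500 Hz pitch range quoted above; the frame length, Hamming window and synthetic test signal are illustrative assumptions, not the authors' settings.

```python
# Minimal sketch of cepstrum-based pitch estimation (illustrative, not the authors' code).
import numpy as np

def cepstral_pitch(frame, fs=16000, fmin=50.0, fmax=500.0):
    """Estimate the pitch frequency (Hz) of one voiced frame via the real cepstrum."""
    frame = frame * np.hamming(len(frame))        # taper to reduce edge effects
    log_mag = np.log(np.abs(np.fft.rfft(frame)) + 1e-12)
    cepstrum = np.fft.irfft(log_mag)              # real cepstrum
    qmin, qmax = int(fs / fmax), int(fs / fmin)   # pitch period lies in [1/fmax, 1/fmin] s
    peak = qmin + np.argmax(cepstrum[qmin:qmax])  # strongest peak in the search range
    return fs / peak

# Synthetic test: an impulse train at 200 Hz filtered by a decaying exponential.
fs, f0 = 16000, 200
n = np.arange(int(0.04 * fs))
excitation = (n % (fs // f0) == 0).astype(float)
voiced = np.convolve(excitation, np.exp(-np.arange(80) / 10.0))[:len(n)]
print(round(cepstral_pitch(voiced, fs), 1))       # expected to be close to 200 Hz
```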
Lab settings: Intel(R) Core(TM)2 Duo T6400 (2 GHz), Windows XP, MATLAB 7.0 development platform. The experimental voice data were recorded with Cool Edit Pro: sampling frequency 16,000 Hz, sampling precision 16 bit, single track. The recorded speakers are 8 - 60 years old and speak Mandarin; each speaks 7 sentences of 3 - 12 s, including vowels, consonants, Chinese, English and figures.

The experimental results are shown in Table 1. No speaker's pitch frequency could be determined exactly with this method; the result appears as a range rather than an exact value. The ranges of different people's pitch frequencies have small gaps and intersect, so identifying a speaker from a pitch frequency value alone is clearly not feasible. However, the male voice's pitch frequency is generally lower than the female's, therefore pitch can be used to classify speakers.
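Using pitch only as a coarse classifier, as suggested above, can be sketched as follows; the 200 Hz decision threshold is an illustrative choice read off the ranges in Table 1, not a value given in the paper.

```python
# Minimal sketch of pitch-based gender pre-classification (threshold is an assumption).
def pitch_class(pitch_hz, threshold=200.0):
    """Coarse male/female decision from an estimated pitch frequency in Hz."""
    return "female" if pitch_hz >= threshold else "male"

# Only speaker models in the predicted class then need to be matched, which
# shortens the linear search over registered speakers described in Section 2.
```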
2.2. Formant
Formant information is contained in the spectral envelope, and a formant generally corresponds to a maximum of the spectral envelope, so the necessary step in extracting formants is to estimate the spectral envelope.

Methods of extracting formants include the cepstrum method and the linear prediction method [1]. A formant is generally defined as a damped sinusoidal component of the vocal tract impulse response. A primary problem in extracting formants is that the impulse response of the vocal tract cannot be measured directly. Voice signals are the convolution of an all-pole model with a quasiperiodic glottal excitation, so the analysis must undo this convolution and separate the impulse response from the excitation function.
The paper adopts the linear prediction method to estimate formants; the specific approach is peak detection. Analyzing formants with linear predictor coefficients is faster and better than other methods. The vocal tract transfer function described by the linear predictor coefficients (LPC) is computed first; this function is used to compute the spectrum, and from the spectrum the formant peaks, frequencies and bandwidths are obtained [4].
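A rough sketch of this LPC-based formant estimation is given below: autocorrelation-method LPC, the spectral envelope of the all-pole model, and peak picking. The LPC order (12), pre-emphasis coefficient and FFT resolution are common defaults assumed here, not the authors' exact settings.

```python
# Minimal sketch of formant estimation by peak detection on the LPC envelope.
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import freqz, find_peaks

def lpc(frame, order=12):
    """Autocorrelation-method LPC; returns the all-pole denominator [1, -a1, ..., -ap]."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    return np.concatenate(([1.0], -a))

def formants(frame, fs=16000, order=12, n_formants=3):
    """Return the first few peak frequencies (Hz) of the LPC spectral envelope."""
    frame = np.append(frame[0], frame[1:] - 0.97 * frame[:-1])   # pre-emphasis
    frame = frame * np.hamming(len(frame))
    w, h = freqz([1.0], lpc(frame, order), worN=2048, fs=fs)     # LPC envelope
    peaks, _ = find_peaks(20 * np.log10(np.abs(h) + 1e-12))
    return w[peaks][:n_formants]                                  # approx. F1, F2, F3
```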
The experimental environment is identical with the pitch experiment's. Table 2 shows that each formant changes when the same person says different words. Even though each person's values vary within a range, this range overlaps with other people's. Therefore the formant parameter alone cannot be an effective one in speaker recognition. The experimental data show that children's F1 values are higher than adults', so this parameter can be used to distinguish children from adults.
Table 1. The result of pitch frequency (Hz) with the cepstrum method.

           Voice 1   Voice 2   Voice 3   Voice 4   Voice 5
Woman 1      333       301       307       311       318
Woman 2      266       262       210       250       243
Woman 3      262       231       280       271       250
Woman 4      202       213       210       220       206
Man 1        183       195       183       178       181
Man 2        172       141       168       141       156
Man 3        121       133       124       132       108
Man 4        112       109       134       114       129
Table 2. Formants (Hz) of speech signals.

                                      F1      F2      F3
Adult   Man 1                         704    1174    2456
        Man 1 (different content)     616    1831    2891
        Man 2                         652    1323    2721
        Man 3                         581    1830    2519
        Man 4                         438    1614    1780
        Woman 1                       618    1814    2617
        Woman 2                       544    1834    2960
        Woman 3                       551    2653    2630
        Woman 4                       590    1210    2279
Child   Girl 1                        749    1405    1643
        Girl 2                       1015    1733    2314
        Boy 1                         904    1353    2990
        Boy 2                         837    1379    2560
2.3. Mel Frequency Cepstral Coefficient (MFCC)
Even in a noisy environment, people can still correctly identify different sounds by ear; an important reason is the role played by the cochlea. The cochlea is equivalent to a set of filters that operate on a logarithmic frequency scale, which is why the human ear is more sensitive to low-frequency signals [5].
A set of Mel filters imitating the role of the cochlea are triangular filters whose centre frequencies are equally spaced on the Mel frequency axis and which have the same span on the Mel frequency scale. The number of filters is determined by the cutoff frequency of the signal, and together the filter bank covers the band from 0 to half the sampling frequency.
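The triangular Mel filter bank described above can be built as follows; the number of filters (24) and the FFT size are illustrative assumptions, while the coverage from 0 to half the sampling frequency follows the text.

```python
# Minimal sketch of a triangular Mel filter bank covering 0 to fs/2.
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters=24, n_fft=512, fs=16000):
    """Return an (n_filters, n_fft//2 + 1) matrix of triangular filters."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / fs).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):                 # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):                # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / max(right - centre, 1)
    return fbank
```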
To emphasize the low-frequency information of the signal, MFCC change the linear frequency scale into the Mel frequency scale, so the information useful for identification is highlighted and noise interference is suppressed effectively. If the Mel cepstrum is used, filtering and weighting in the cepstrum domain are based on linear spectrum processing [6].

MFCC mainly reflect static characteristics, but the human ear is more sensitive to the dynamic characteristics of voice. ΔMFCC can reflect this dynamic property; the parameter is obtained by computing first-order and second-order differences. The paper uses a parameter combining 12-dimensional MFCC with ΔMFCC.
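The 12-dimensional MFCC plus ΔMFCC feature described above can be sketched as below, reusing the mel_filterbank helper from the previous sketch. The framing parameters (25 ms window, 10 ms hop) are common defaults assumed here, and Δ is computed as a simple first-order frame difference; the paper also mentions second-order differences.

```python
# Minimal sketch of 12-dim MFCC plus first-order delta features.
import numpy as np
from scipy.fftpack import dct   # assumes mel_filterbank() from the previous sketch

def mfcc_with_delta(signal, fs=16000, n_fft=512, n_filters=24, n_ceps=12):
    frame_len, hop = int(0.025 * fs), int(0.010 * fs)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * np.hamming(frame_len)
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2             # per-frame power spectrum
    log_energy = np.log(power @ mel_filterbank(n_filters, n_fft, fs).T + 1e-12)
    ceps = dct(log_energy, type=2, axis=1, norm="ortho")[:, 1:n_ceps + 1]
    delta = np.diff(ceps, axis=0, prepend=ceps[:1])             # first-order ΔMFCC
    return np.hstack([ceps, delta])                             # (n_frames, 2 * n_ceps)
```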
The experimental environment is identical with the pitch experiment's. Several methods of comparing two MFCC feature sets were attempted.
1) Correlation coefficient
In theory, the correlation coefficient is highest when the same person speaks the same words, and second highest when the same person speaks different words; only then can the speaker be identified. Analyzing the experimental results, Table 3 shows that female speaker L cannot be identified by the correlation coefficient, because the value for the same person speaking different words is lower than that for different people speaking the same words. So this cannot be an effective comparison method in speaker recognition.

Table 3. Correlation coefficient of MFCC of two voice signals.

                                        y         L         x
Same people, same content             0.5298    0.6947    0.6371
Same people, different content        0.3665    0.4116    0.4446
Different people, same content        0.4161    0.6666    0.4463
Different people, different content   0.3932    0.5084    0.4544
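The whole-matrix correlation comparison attempted in 1) can be expressed as follows; the feature matrices are assumed to come from an MFCC front end such as the sketch in Section 2.3, and the two utterances are simply truncated to a common number of frames.

```python
# Minimal sketch of comparing two MFCC matrices by a single correlation coefficient.
import numpy as np

def mfcc_correlation(feat_a, feat_b):
    """Pearson correlation between two (frames x dims) feature matrices."""
    n = min(len(feat_a), len(feat_b))               # align to the shorter utterance
    return np.corrcoef(feat_a[:n].ravel(), feat_b[:n].ravel())[0, 1]
```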
2) Comparing the similarity of the corresponding three-dimensional maps

The data of Figures 1 and 2 are from the same person. Although they are similar in general, MFCC contain many data points with no obvious regularity; the drawn three-dimensional map is intricate, and even after smoothing it is difficult to compare similarity.
3) Comparing the correlation coefficient of each dimension

Because the MFCC dimensions are uncorrelated with each other, they can be compared independently.

As shown in Table 4, this method is not useful for comparing MFCC. In the first dimension, the correlation coefficient for different people speaking different content (women) is larger than that for the same person speaking the same content. Moreover, the correlation is negative in the first, second, fifth, sixth, tenth and twelfth dimensions for the same person speaking the same content. So this method cannot serve as a way of comparing MFCC.
4) Euclidean distance

Table 5 shows that the Euclidean distance is smallest when the same person speaks the same words, and second smallest when the same person speaks different words. Regardless of what the speaker says, the speaker whose model gives the minimum Euclidean distance is taken as the recognition result.
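A minimal sketch of the Euclidean-distance decision described in 4) is given below. The per-speaker reference is assumed to be a single stored MFCC matrix and the utterances are truncated to a common number of frames; the paper does not state its exact matching scheme, so this only illustrates the nearest-distance rule.

```python
# Minimal sketch of nearest-speaker identification by accumulated Euclidean distance.
import numpy as np

def euclidean_score(test_feat, ref_feat):
    """Summed frame-wise Euclidean distance over the overlapping frames."""
    n = min(len(test_feat), len(ref_feat))
    return float(np.sum(np.linalg.norm(test_feat[:n] - ref_feat[:n], axis=1)))

def identify_speaker(test_feat, references):
    """references: dict mapping speaker name -> stored reference feature matrix."""
    return min(references, key=lambda name: euclidean_score(test_feat, references[name]))
```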
3. Conclusions

Pitch and formant are both important parameters of the speech signal. In theory, because of the differences in oral structure and vocal tract, everyone should have their own distinct pitch and formant characteristics. However, the speech signal changes in a complex way, the channel and noise affect the signal, and the extraction methods are imperfect, so at present neither pitch nor formant is an effective parameter for speaker recognition on its own; they can only play a supporting role. MFCC are effective for speaker identification because they combine the perceptual characteristics of the human ear with the production mechanism of the voice.
Figure 1. Three-dimensional map of female vowel.

Figure 2. Three-dimensional map of female consonant.
Table 4. Correlation coefficient of each dimension of MFCC.

        Same people     Same people        Different people,       Different people,
Dim     same content    different content  same content            different content
                                           woman       man         woman       man
 1        -0.0053         -0.3237          -0.1266    -0.1800       0.0782    -0.2611
 2        -0.3008         -0.1839          -0.2274    -0.3389       0.2223     0.1788
 3         0.4592          0.3923           0.5468     0.3992       0.3423    -0.1024
 4         0.3984         -0.1197          -0.0152     0.4367       0.2005     0.2433
 5        -0.0324         -0.1635           0.3900     0.2537      -0.2118    -0.1130
 6        -0.0547          0.0859          -0.2082    -0.0695       0.1711    -0.0897
 7         0.0890         -0.0764           0.1685    -0.2432      -0.0870     0.1036
 8         0.0187          0.2787           0.1434     0.1532       0.1763    -0.1018
 9         0.4090          0.0681           0.2142     0.0786       0.2639    -0.0544
10        -0.0865         -0.1368          -0.0766    -0.1188       0.2791     0.3179
11         0.1750         -0.3922           0.4273    -0.2079      -0.2229     0.3064
12        -0.1299          0.0124           0.3571     0.0360       0.2908     0.0185
Table 5. Euclidean distance of MFCC of two speech signals.

                                               y          L          x
Same people, same content                    92,338     41,214     79,086
Same people, different content              124,110     90,346    139,190
Different people, same content (woman)      141,340     94,724    199,120
Different people, same content (man)        182,240    183,370    140,270
Different people, different content (woman) 176,860     92,334    219,090
Different people, different content (man)   149,040    116,800    188,970
A speaker's personality cannot be represented well by a single parameter; using only one describes just part of the speaker's characteristics. Therefore, to improve the speaker recognition rate, multiple parameters should be combined for identification.
REFERENCES
[1] H. Hu, "Introduction to Speech Signal Processing," Harbin Institute of Technology Press, Harbin, 2000.
[2] X. J. Yang and H. S. Chi, "Digital Processing of Speech Signals," Electronic Industry Press, Beijing, 1995.
[3] M. M. Sondhi, "New Methods of Pitch Extraction," IEEE Transactions on Audio and Electroacoustics, Vol. 16, No. 1, 1968, pp. 262-266.
[4] K. Du, "LPC Analysis on Formant of Speech Signal," Natural Science Journal of Harbin Normal University, Vol. 2, 1998, pp. 49-52.
[5] N. Do Minh, "An Automatic Speaker Recognition System," Audio Visual Communications Laboratory, Swiss Federal Institute of Technology, Lausanne, 2001.
[6] Y. Chen, Z. Y. Qu, Y. Liu, K. Jiu, A. P. Guo and Z. G. Yang, "Extraction and Application on One of Speech Parameters: MFCC," Journal of Hunan Agricultural University (Natural Science), Vol. 35, No. 1, 2009, pp. 106-107.