CHARACTERISTICS OF A LOW REJECT MODE SPEAKER VERIFICATION SYSTEM


Daniel Elenius and Mats Blomberg
Department of Speech Music and Hearing, Centre for Speech Technology (CTT)
KTH/TMH, Sweden
daniele@speech.kth.se, matsb@speech.kth.se
ABSTRACT
The performance of a speaker verification (SV) system is normally determined by the false reject (FRR) and false accept (FAR) rates as averages on a population of test speakers. However, information on the FRR distribution is required when estimating the portion of clients that will suffer from an unacceptably high reject rate. This paper studies this distribution in a population using an SV system operating in low reject mode. Two models of the distribution are proposed and compared with test data. An attempt is also made to tune the decision threshold in order to obtain a desired portion of clients having a reject rate lower than a specified value.
1. INTRODUCTION
The performance of a speaker verification (SV) system is normally assessed by measuring the false reject (FRR) and false accept (FAR) rates as averages on a test population. However, as the FRR of each client is individual, some clients may suffer from a high FRR while others are very seldom rejected. The portion of clients with an individual FRR above a certain threshold cannot be determined solely from the average FRR of the population. As an example, the proportion of clients with an unacceptably high individual FRR is expected to be larger in a population with a high spread among the speakers than in a homogeneous one, if the average FRR values of the two sets are identical. More information on the detailed characteristics of the distribution function is required for this purpose.
In this paper, we study the problem in an SV system operating in low reject mode, intended for combined use with other identification methods, such as a PIN code. The task of the SV module is to detect impostors who have knowledge of the PIN code of a target client. To limit the number of true clients being annoyed by the SV system frequently denying them access, a low rejection mode of operation is needed. It is of interest for the service provider to set the acceptance threshold to a value where the proportion of true clients having this problem is below a specified limit. As pointed out above, the average FRR of a population is not sufficient for this purpose.
Direct measurement of the FRR distribution is possible but needs to be based on a large number of utterances spoken by many speakers. A statistical model may therefore be needed to estimate this measure based on fewer observations.
The purpose of the experiments in this paper has been to:
• Examine the FRR distribution in a population using an SV system in a low reject mode.
• Create statistical models that estimate the distribution of the false reject rate of a population sample.
• Compare the model distributions with those measured on an SV database.
• Try a novel threshold setting method based on two parameters: the accepted rejection rate and the accepted size of population having a reject rate greater than the desired one.
2. METHODS
The resolution and accuracy of an FRR distribution estimate are limited by the size of the population sample. In this case, an appropriate statistical model may give a better approximation of the true distribution.
In speaker verification the decision is normally based on the verification score. Other solutions have been proposed [1], but in this paper we concentrate on score-based verification systems. In such a system, the FRR distribution may be derived from a score model or obtained by modeling the FRR directly. Methods based on the two principles are described below.
2.1. Score-based model
The score an SV system assigns an utterance spoken by the client may be viewed as a stochastic variable. A model of the distribution of this stochastic variable may be designed. As the model parameters are individual, a distribution of parameters is expected in a client population. Modeling the parameter distributions forms a population model. This model may then be used to estimate the scores of the modeled speaker population.
In this paper a model is used where the score of the i'th client is assumed to be Gaussian with an average, $m_i$, and a standard deviation, $\sigma_i$. This results in a population model consisting of two stochastic variables for each client. To simplify calculations, the variables are assumed to be independent and normally distributed. For the standard deviation, one departure from the Gaussian distribution was necessary. Since it is a non-negative measure, its distribution was cut off at zero and then re-normalized.
The individual false reject rate may be viewed as an approximation of the probability of rejecting an utterance made by the client. According to the model, this is given by

\[ P(S \le th \mid c_i) = \{c_i = (m_i, \sigma_i)\} = \frac{1}{\sigma_i \sqrt{2\pi}} \int_{-\infty}^{th} e^{-\frac{(x - m_i)^2}{2\sigma_i^2}} \, dx \;\Rightarrow\; \mathrm{FRR}_i = \mathrm{FRR}(m_i, \sigma_i) \qquad (1) \]

where the probability that a score, $S$, for an utterance spoken by client $c_i$ is not greater than the threshold, $th$, is calculated.
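The Gaussian integral in Equation (1) is the normal cumulative distribution function evaluated at the threshold, so it can be computed with the error function. The following is a minimal sketch under that reading; the function name `individual_frr` is hypothetical and not from the paper.

```python
import math


def individual_frr(threshold: float, mean: float, std: float) -> float:
    """P(S <= threshold) for a Gaussian score S with the given mean and std.

    This is the model FRR of one client as in Equation (1).
    """
    if std <= 0.0:
        raise ValueError("std must be positive")
    z = (threshold - mean) / (std * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


if __name__ == "__main__":
    # A client whose mean score lies two standard deviations above the threshold.
    print(individual_frr(threshold=0.0, mean=2.0, std=1.0))
```

A threshold equal to the client's mean score gives a model FRR of exactly one half, which is a convenient sanity check.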
In a population of clients the probability of rejecting an individual client may vary, resulting in a distribution of FRR. The probability of a client in the population having an individual FRR not greater than $x$ is given by

\[ P(\mathrm{FRR} \le x) = \sum_{c_i :\, \mathrm{FRR}(c_i) \le x} p(c_i) = \{p(c_i) = 1/N\} = \frac{1}{N} \sum_{c_i :\, \mathrm{FRR}(c_i) \le x} 1 \qquad (2) \]

where the probability, $p(c_i)$, of a client speaking is summed for all clients having an FRR not greater than $x$.
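With equal client probabilities, Equation (2) reduces to counting the clients whose individual FRR does not exceed $x$. A minimal sketch of that count follows; the function name and the example rates are illustrative assumptions, not data from the paper.

```python
def frr_cumulative_fraction(client_frrs, x):
    """Fraction of clients whose individual FRR is not greater than x.

    Corresponds to Equation (2) with p(c_i) = 1/N.
    """
    if not client_frrs:
        raise ValueError("at least one client is required")
    return sum(1 for frr in client_frrs if frr <= x) / len(client_frrs)


if __name__ == "__main__":
    # Hypothetical individual reject rates for five clients.
    example = [0.0, 0.0, 0.01, 0.04, 0.20]
    print(frr_cumulative_fraction(example, 0.05))
```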
In the population model, the average score and standard deviation are continuous stochastic variables. The sum in equation (2) therefore becomes an integral of a speaker density function. This integral is given by
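
\[ P(\mathrm{FRR} \le x) = \iint_{(m,\sigma) :\, \mathrm{FRR}(m,\sigma) \le x} f_{M,\Sigma}(m,\sigma) \, dm \, d\sigma = \{M, \Sigma \ \text{independent}\} = \iint_{(m,\sigma) :\, \mathrm{FRR}(m,\sigma) \le x} f_M(m) \, f_\Sigma(\sigma) \, dm \, d\sigma \qquad (3) \]
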
where the Gaussian probability density functions of the mean, $f_M(m)$, and standard deviation, $f_\Sigma(\sigma)$, are integrated over the domain where FRR is not greater than the limit, $x$. The FRR is given by $m$ and $\sigma$ according to Equation (1). Solving integral (3) is complicated; numerical integration may therefore be needed.
2.2. A model of the false reject rate
An alternative to the score-based approach is to model the false reject rate directly. An appropriate distribution to approximate the FRR may be chosen by observing the shape of this measure for a population sample. One possibility is to use the exponential distribution. This distribution has one parameter, the average FRR.
2.3. Client-centered threshold setting
A client-centered threshold setting method is also proposed. In this method, a threshold is determined by two criteria. The maximum acceptable frequency of rejection may be chosen based on client interviews. The service provider may then require that at least a certain fraction of clients should have an individual reject rate below this value. An example setting might be: acceptable reject rate, 5%, and requested fraction of population, 95%. This requires an estimation of the FRR distribution. As was mentioned above, though, a large number of speakers are required in order to measure this distribution directly. An appropriate model offers a method of estimating this distribution based on fewer observations.
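The two criteria can be checked directly on a set of true-client scores, without any model. The sketch below scans the observed scores as candidate thresholds and returns the highest one that satisfies the criterion. The function names, the scan strategy and the toy scores are assumptions for illustration; an utterance is taken to be rejected when its score is not greater than the threshold, as in Equation (1).

```python
def individual_frrs(client_scores, threshold):
    """Per-client fraction of true-client utterances rejected at this threshold."""
    return [
        sum(1 for s in scores if s <= threshold) / len(scores)
        for scores in client_scores
    ]


def meets_criterion(client_scores, threshold, max_frr=0.05, min_fraction=0.95):
    """True if at least min_fraction of the clients have an FRR <= max_frr."""
    frrs = individual_frrs(client_scores, threshold)
    accepted = sum(1 for frr in frrs if frr <= max_frr)
    return accepted / len(frrs) >= min_fraction


def highest_threshold_meeting_criterion(client_scores, max_frr=0.05, min_fraction=0.95):
    """Scan observed scores from high to low and return the first that qualifies.

    Returns None if no observed score satisfies the criterion.
    """
    candidates = sorted({s for scores in client_scores for s in scores}, reverse=True)
    for threshold in candidates:
        if meets_criterion(client_scores, threshold, max_frr, min_fraction):
            return threshold
    return None


if __name__ == "__main__":
    # Two hypothetical clients with four utterance scores each.
    scores = [[1.0, 2.0, 3.0, 4.0], [0.0, 0.5, 2.5, 3.5]]
    print(highest_threshold_meeting_criterion(scores, max_frr=0.25, min_fraction=0.5))
```

Scanning from high to low keeps the threshold as strict as the criterion allows, which limits false accepts while honoring the reject-rate requirement.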
If the FRR is assumed to be exponentially distributed, the average FRR is directly given by the design criteria. A threshold may then be set to match this average FRR on a development set.
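For an exponential distribution with mean $\mu$, the cumulative distribution is $1 - e^{-x/\mu}$, so the design criterion can be inverted in closed form. The sketch below shows that inversion; the function names are hypothetical, and treating the FRR as unbounded above is an idealization of the exponential model.

```python
import math


def exponential_cdf(x: float, mean_frr: float) -> float:
    """P(FRR <= x) when the FRR is exponentially distributed with the given mean."""
    return 1.0 - math.exp(-x / mean_frr)


def mean_frr_for_criterion(max_frr: float, min_fraction: float) -> float:
    """Average FRR at which exactly min_fraction of clients have FRR <= max_frr."""
    if not 0.0 < min_fraction < 1.0:
        raise ValueError("min_fraction must be strictly between 0 and 1")
    return -max_frr / math.log(1.0 - min_fraction)


if __name__ == "__main__":
    # Example setting from the text: reject rate 5%, fraction of population 95%.
    print(mean_frr_for_criterion(0.05, 0.95))
```

For the example setting of 5% and 95%, the target average FRR comes out at about 1.7%.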
In a score-based SV system, the score model presented in 2.1 may be tuned on a development set. If the model holds for the development set, then the client-centered threshold may be estimated based on the model.
3. EXPERIMENT
To explore the distribution of FRR among clients, a database with clients speaking four-digit utterances was used. The similarity of the utterances to target speakers was scored by an SV system. Three a posteriori score thresholds were chosen to achieve an average FRR of 1, 2 or 3%. For each threshold, the individual FRR was observed in order to measure the FRR distribution among clients.
The score model was tuned to the speakers by measuring the mean and standard deviation of the speakers’ score means and standard deviations, resulting in a model using four parameters. Numerical integration was used to calculate the FRR distribution based on 100x100 samples of the $(m, \sigma)$-plane. The standard deviation was also cut off at 0.001 to avoid division by zero.
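A grid integration of this kind can be sketched as follows. The grid extent (four standard deviations around each parameter mean), the midpoint rule and all names are assumptions, not details from the paper. Only the 100x100 grid size, the four parameters and the 0.001 floor come from the text. Dividing by the total grid weight re-normalizes the truncated density.

```python
import math


def normal_pdf(x, mu, sd):
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu, sd):
    return 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0))))


def population_frr_cdf(x, threshold, mu_m, sd_m, mu_s, sd_s,
                       n=100, span=4.0, sigma_floor=0.001):
    """Approximate P(FRR <= x) on an n-by-n midpoint grid of the (m, sigma)-plane.

    mu_m, sd_m: mean and std of the clients' score means.
    mu_s, sd_s: mean and std of the clients' score standard deviations.
    """
    m_lo, m_hi = mu_m - span * sd_m, mu_m + span * sd_m
    s_lo, s_hi = max(sigma_floor, mu_s - span * sd_s), mu_s + span * sd_s
    dm = (m_hi - m_lo) / n
    ds = (s_hi - s_lo) / n
    inside = 0.0  # density weight where the model FRR is <= x
    total = 0.0   # density weight of the whole grid (for re-normalization)
    for i in range(n):
        m = m_lo + (i + 0.5) * dm
        weight_m = normal_pdf(m, mu_m, sd_m)
        for j in range(n):
            s = s_lo + (j + 0.5) * ds
            weight = weight_m * normal_pdf(s, mu_s, sd_s)
            total += weight
            if normal_cdf(threshold, m, s) <= x:
                inside += weight
    return inside / total


if __name__ == "__main__":
    # Hypothetical model parameters; not values measured in the paper.
    print(population_frr_cdf(0.05, threshold=0.0, mu_m=2.0, sd_m=0.5, mu_s=1.0, sd_s=0.3))
```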
The dissimilarity between the model and client distributions of FRR was compared for the score model and the exponential model. This comparison was performed on the cumulative frequency distribution of FRR to avoid having to choose an appropriate number of bins.
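One way to compare two cumulative distributions without binning is a root-mean-square difference taken on a grid of FRR values. The sketch below assumes an evenly spaced grid over [0, 1] and takes the model distribution as a callable. The names and the grid size are illustrative choices, not the paper's exact procedure.

```python
import math


def empirical_cdf(client_frrs, x):
    """Fraction of clients with an individual FRR not greater than x."""
    return sum(1 for frr in client_frrs if frr <= x) / len(client_frrs)


def rms_cdf_error(client_frrs, model_cdf, n_points=101):
    """RMS difference between the empirical and the model CDF over FRR in [0, 1]."""
    squared_sum = 0.0
    for k in range(n_points):
        x = k / (n_points - 1)
        diff = empirical_cdf(client_frrs, x) - model_cdf(x)
        squared_sum += diff * diff
    return math.sqrt(squared_sum / n_points)


if __name__ == "__main__":
    # Hypothetical client rates compared with an exponential model of mean 2%.
    frrs = [0.0, 0.0, 0.01, 0.02, 0.07]
    print(rms_cdf_error(frrs, lambda x: 1.0 - math.exp(-x / 0.02)))
```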
A small experiment with the client-centered threshold setting method was also performed. Due to the very limited size of the available speaker database, the results may only give an indication of the potential of the methods. In this experiment a design criterion of a given portion of the speakers not having an FRR greater than 5% was used. The exponential model, the score model, and, for comparison, a straightforward non-model-based approach were used in order to set an a posteriori threshold in a development set of speakers. In the two model-based approaches, the threshold was adjusted to meet the design criterion in the respective model distribution of the development set. In the non-model-based approach, the threshold was adjusted to meet the design criterion directly in the development set. The thresholds were then applied to a test set, and the portion of speakers having an FRR not greater than 5% was measured.
3.1. Database
The experiments were conducted on speech material in the speaker verification database Gandalf [2]. This database is divided into two sets of speakers (called “dev-set” and “eval-set”) as in [3]. In this paper the speaker sets will be called pop 1 and pop 2 respectively. Pop 1 consisted of 40 speakers and pop 2 of 42 speakers. Each speaker made up to 26 phone calls. During every phone call all subjects spoke a set of four-digit sequences. This resulted in 5544 utterances being spoken by speakers in pop 1 and 5304 utterances for pop 2. The utterances were spoken during a period of 12 months.
3.2. Speaker verification system
A text-dependent, digit-based SV system was used [3], [4]. The system was trained on 25 five-digit utterances made by each client. Each client was modeled by ten HMMs, one per digit. Recordings of utterances were made through an ISDN line and 13 mean-subtracted mel-cepstrum coefficients with cepstral liftering were computed at a frame rate of 100 Hz. In this process a 24-channel mel-filterbank and a Hamming window of 25.6 ms were applied to the speech signal.
The system operates as follows. First the utterance is segmented into digit words. Each digit is then assigned a duration-normalized score, based on the logarithm of the ratio between the likelihoods of the word being produced by the client and the background models. The word scores are averaged to produce a score for the whole utterance.
A background model was chosen from a set of two models: one trained on male speech and one on female speech. For each word in the utterance, the background model most likely to have generated the word was selected.
This may result in different thresholds being needed depending on whether a male or female background model was used. In this paper, the verification scale was assumed to be identical for all models used, justifying the use of a global threshold.
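The scoring procedure described above can be sketched as follows, assuming the recognizer already supplies per-word log-likelihoods. Dividing by the number of frames is one possible reading of "duration-normalized". The input layout and function names are hypothetical, not the system's actual interface.

```python
def word_score(client_loglik, background_logliks, n_frames):
    """Duration-normalized log-likelihood ratio for one digit word.

    The background model with the highest log-likelihood is selected per word.
    """
    best_background = max(background_logliks)
    return (client_loglik - best_background) / n_frames


def utterance_score(words):
    """Average of the word scores.

    `words` is a list of (client_loglik, background_logliks, n_frames) tuples.
    """
    scores = [word_score(c, b, n) for c, b, n in words]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Two hypothetical digit words, each with (male, female) background log-likelihoods.
    words = [(-100.0, (-120.0, -110.0), 10), (-50.0, (-40.0, -60.0), 5)]
    print(utterance_score(words))
```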
4. RESULTS
Choosing an average FRR of 1, 2 and 3% resulted in an
Figure 1: Detection error tradeoff curve based on four-digit utterances spoken by clients and casual impostors.
The distribution of FRR was investigated further for each average FRR. Figure 2 shows this distribution in the database and the distributions estimated based on the score model and the exponential model. As can be seen, both models fit the data they are tuned on fairly well. The score model seems to match the data more closely than the exponential model does.
The cumulative frequency distribution of FRR in the population sample was compared with those of the two models, on the same speaker group that was used for modeling. The difference between the distributions was measured by integrating the RMS-error along the FRR axis in the interval [0, 100]%. This was done for 100 threshold settings in the average FRR interval [0, 100]%. The RMS-error is shown in Figure 3. The score model seems to model the data better than the exponential model in the low reject region, which is of special interest in this paper. In the interval between 10 and around 30%, the opposite is true. At high average FRR, the exponential distribution is obviously not a good model. For the score model, however, the error is quite stable throughout the interval. It should be noted that FAR is 0% for FRR higher than around 45% according to Fig. 1, which makes the error in that region irrelevant.
Figure 3: RMS-error of modeling the cumulative distribution of client FRR in pop 1 and pop 2.
4.2. Model-based threshold
A design criterion of 95% of clients not having an FRR greater than 5% was used. The threshold was set based on a statistical model of the dev-set. This threshold was then applied to an eval-set of speakers. The measured number of speakers in each population having an FRR not greater than 5% is shown in Table 1.
                   Requested    By score   By exp.   By score   By exp.   No model
                   (95% of N)   method     method    method     method
Dev-set (N=40)     38           40         36        -          -         -
Eval-set (N=42)    40           -          -         36         35        35
Table 1: The number of dev-set and eval-set speakers not having an FRR greater than 5%. A threshold was chosen based on a statistical model tuned on the 40-speaker dev-set. This threshold was then applied to the eval-set of 42 speakers.
The number of speakers is similar to the requested number for all the methods. However, the sets of speakers are too small to draw more detailed conclusions.
5. DISCUSSION AND CONCLUSIONS
A new statistical model, the score model, was developed for predicting the FRR distribution. It operates on the verification score distribution. A second method is direct matching of the FRR distribution by an exponential distribution model.
The RMS-error of the cumulative frequency function of the model FRR to that of the population sample was computed for both models. This measure was lower for the score model than for the exponential model in the low FRR region.
The score model needs to be explored further as the model assumptions may not hold for a general SV system. In particular, the standard deviation of client scores may not be Gaussian, since it is a non-negative measure.
An application of the statistical models used in the experiments was tried: the client-centered threshold. In this method the accepted rejection rate may be chosen based on client inquiry, and the accepted size of population having a reject rate greater than the desired one may be chosen by the service provider. Further investigation of the verification threshold setting method needs to be done using a much larger database, but the initial results are encouraging.
6. ACKNOWLEDGEMENTS
This research was carried out at the Centre for Speech Technology, supported by Vinnova, KTH and participating Swedish companies and organizations. A number of good suggestions and comments were given by Arne Leijon and Håkan Melin.
7. REFERENCES
[1] Monrose F., Reiter M. K., Li Q. (Peter), Wetzel S., “Cryptographic Key Generation From Voice”, Proceedings of the IEEE Conference on Security and Privacy, Oakland, CA, May 2001.
[2] Melin H., “Gandalf - A Swedish Telephone Speaker Verification Database”, Proc. Fourth International Conference on Spoken Language Processing (ICSLP’96), pp 1954-1957.
[3] Melin H. and Lindberg J., “Variance Flooring, Scaling and Tying for Text-Dependent Speaker Verification”, Proceedings of the 6th European Conference on Speech Communication and Technology (EUROSPEECH'99), Budapest, Hungary, 5-9th of September, pp 1975-1978.
[4] Melin H., Sandell A. and Ihse M., “CTT-bank: A speech controlled telephone banking system - an initial evaluation”, TMH-QPSR 2001, KTH, Stockholm, 1:1-28.
