
[sphinx] Chinese Acoustic Model Training

Updated: 2023-06-30 20:43:33


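I. Training an Acoustic Model with CMUSphinx

The CMUSphinx toolkit ships with several high-quality acoustic models: US English, French and Chinese, among others. These models are optimized for best performance, so most command-and-control applications can use them directly, and even some large-vocabulary applications can.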

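Beyond that, CMUSphinx can adapt an existing model when you need higher accuracy. Adaptation works well when you use a different recording environment (close-talking or distant microphone, or a telephone channel), or when you need to handle a different accent, such as moving between American and British English, or Indian English. Adaptation can even help when you have to support a new language in a short time: you only need to map the acoustic model's phone set to the target phone set in the dictionary.

In some cases, however, the existing models simply cannot be used, for example for handwriting recognition or for recognizing another language. Then you need to train your own acoustic model. The following tutorial shows how to get started.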

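II. Getting Started

Before you start, make sure you have enough data:

for single-speaker command and control, at least 1 hour of recordings;

for multi-speaker command and control, 5 hours of recordings from 200 speakers;

for single-speaker dictation, 10 hours of that speaker's recordings;

for multi-speaker dictation, 50 hours of recordings from 200 speakers.

You also need knowledge of the phonetics of the language, and enough time (around a month) to train the model.

If you don't have enough data, time, or experience, you are better off adapting an existing model to meet your needs.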

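Data preparation

The trainer needs to know which sound units to learn parameters for, and at least the order in which they occur in every utterance of the training set. This information is stored in the transcript file.

It then uses the dictionary, which maps every word to its sequence of sound units.

So in addition to the speech data you need a transcript file and two dictionaries: one that maps each word to its pronunciation, and one that lists non-speech units, called the filler dictionary.

Setting up training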

Before training, prepare the following two directories:

etc
    your_db.dic - Phonetic dictionary
    your_db.phone - Phoneset file
    your_db.lm.DMP - Language model
    your_db.filler - List of fillers
    your_db_train.fileids - List of files for training
    your_db_train.transcription - Transcription for training
    your_db_test.fileids - List of files for testing
    your_db_test.transcription - Transcription for testing
wav
    speaker_1
        file_1.wav - Recording of speech utterance
    speaker_2
        file_2.wav

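The fileids files (your_db_train.fileids and your_db_test.fileids) list the file names of the speech data. For multi-speaker recordings you can prefix each name with the speaker directory. Do not include the file extension.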

speaker_1/file_1

speaker_2/file_2

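The transcription files (your_db_train.transcription and your_db_test.transcription) contain the text of each recording. Enclose each sentence in <s> and </s> tags and end the line with the utterance id in parentheses: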

<s> hello world </s> (file_1)

<s> foo bar </s> (file_2)

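The order of lines in the transcription must match the order in the fileids file. In the following fileids file the second utterance comes first, which is wrong and will cause an error: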

speaker_2/file_2

speaker_1/file_1

//Error! Do not create fileids file like this!
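Because a single misplaced line breaks training, it can help to check the two files against each other before running sphinxtrain. The following Python sketch assumes the format shown above: one "<s> ... </s> (id)" line per utterance, where the id equals the last path component of the matching fileid. check_alignment is a hypothetical helper name; sphinxtrain's own verify step remains the authoritative check.

```python
import re

# One transcription line: "<s> words ... </s> (utterance_id)"
TRANSCRIPT_LINE = re.compile(r"^<s>\s+(?P<words>.*?)\s+</s>\s+\((?P<uttid>[^)]+)\)\s*$")


def check_alignment(fileids_lines, transcript_lines):
    """Return a list of problems; an empty list means the two files agree line by line."""
    fileids = [line.strip() for line in fileids_lines if line.strip()]
    transcripts = [line.strip() for line in transcript_lines if line.strip()]
    errors = []
    if len(fileids) != len(transcripts):
        errors.append(f"line count mismatch: {len(fileids)} fileids vs {len(transcripts)} transcriptions")
    for n, (fileid, text) in enumerate(zip(fileids, transcripts), start=1):
        match = TRANSCRIPT_LINE.match(text)
        if match is None:
            errors.append(f"line {n}: malformed transcription {text!r}")
            continue
        # The id in parentheses carries no speaker directory, only the file name.
        expected = fileid.rsplit("/", 1)[-1]
        if match.group("uttid") != expected:
            errors.append(f"line {n}: fileid {fileid!r} vs utterance id {match.group('uttid')!r}")
    return errors


# The example files above; for real files use open(path).read().splitlines().
fileids = ["speaker_1/file_1", "speaker_2/file_2"]
transcription = ["<s> hello world </s> (file_1)", "<s> foo bar </s> (file_2)"]
print(check_alignment(fileids, transcription))  # prints []
```

Swapping the two fileids lines, as in the error example, produces one mismatch message per line.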

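Audio files should be in a format such as MS WAV: 16 kHz, 16-bit, mono for desktop applications, or 8 kHz, 16-bit, mono for telephone applications. Pay attention to this, because a wrong audio format is a common cause of training failures. Utterances should be neither too long nor too short, typically between 5 and 30 seconds. Make sure your audio really is 16 kHz, 16-bit, mono. For telephone applications 8 kHz is fine, but then the training parameters must also be set for 8 kHz. Never upsample audio: you cannot train a 16 kHz model from 8 kHz data.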
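A quick way to catch format problems early is to read each file's header. This sketch uses Python's standard wave module and the targets stated above (16 kHz, 16-bit, mono; pass expected_rate=8000 for telephone data). The 5-30 s duration check follows the rule of thumb above and is only a warning, not a sphinxtrain requirement. check_wav is a hypothetical helper name.

```python
import wave


def check_wav(path, expected_rate=16000):
    """Return a list of format problems for one WAV file; empty means it looks right."""
    problems = []
    with wave.open(path, "rb") as w:
        rate, width, channels = w.getframerate(), w.getsampwidth(), w.getnchannels()
        if rate != expected_rate:
            problems.append(f"sample rate {rate} Hz, expected {expected_rate} Hz")
        if width != 2:
            problems.append(f"{8 * width}-bit samples, expected 16-bit")
        if channels != 1:
            problems.append(f"{channels} channels, expected mono")
        # Rule of thumb from the text, not a hard sphinxtrain limit.
        seconds = w.getnframes() / rate
        if not 5.0 <= seconds <= 30.0:
            problems.append(f"duration {seconds:.1f} s is outside the suggested 5-30 s")
    return problems
```

For example, run it over every file under wav/ and print any path whose result is not empty.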

The dictionary file (your_db.dic) has one word per line, followed by a space and its pronunciation:

HELLO HH AH L OW

WORLD W AO R L D

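If you need to create a phonetic dictionary yourself, learn the phonetics of the language first. sphinxtrain does not support phone symbols such as "*" or "/". Symbols such as "+", "-" and ":" are supported, but it is better to stick to alphanumeric names; for example, replace "a~" with "aa".

If you really have no phonetic dictionary, you can simply spell out each word: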

ONE O N E

TWO T W O
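Generating such a spelled-out dictionary is mechanical. A minimal sketch; spelling_dictionary is a hypothetical name, and upper-casing words and dropping non-letter characters are my assumptions, not sphinxtrain rules.

```python
def spelling_dictionary(words):
    """Dictionary lines that use each word's letters as its pronunciation (ONE -> O N E)."""
    lines = []
    for word in sorted({w.upper() for w in words}):
        letters = [ch for ch in word if ch.isalpha()]  # assumption: drop digits and punctuation
        lines.append(f"{word} {' '.join(letters)}")
    return lines


print("\n".join(spelling_dictionary(["one", "two"])))
# ONE O N E
# TWO T W O
```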

For small vocabularies CMUSphinx differs from other toolkits. Other toolkits often recommend training word-based models for small-vocabulary databases such as digits, but that only makes sense if your HMMs can have variable length. CMUSphinx does not support word models. Instead, you need a word-dependent phone dictionary:

ONE W_ONE AH_ONE N_ONE

TWO T_TWO UH_TWO

NINE N_NINE AY_NINE N_END_NINE


This is actually equivalent to word-based models and sometimes even gives better accuracy. Do not use word-based models with CMUSphinx.
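Converting an ordinary dictionary entry into word-dependent units can be scripted. This sketch appends the word to every phone. For a phone that repeats within a word it adds a counter (N2_NINE), which is my own convention; the example above labels the repeat N_END_NINE instead, and any unique label should work as long as the phone file lists it. word_dependent_entry is a hypothetical name.

```python
def word_dependent_entry(word, phones):
    """Tie every phone of a pronunciation to its word, e.g. NINE N AY N -> N_NINE AY_NINE N2_NINE."""
    counts = {}
    units = []
    for phone in phones:
        counts[phone] = counts.get(phone, 0) + 1
        # Hypothetical convention: a phone repeated inside the word gets a counter.
        label = phone if counts[phone] == 1 else f"{phone}{counts[phone]}"
        units.append(f"{label}_{word}")
    return f"{word} {' '.join(units)}"


print(word_dependent_entry("NINE", ["N", "AY", "N"]))  # NINE N_NINE AY_NINE N2_NINE
```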

The phone file (your_db.phone) lists one phone per line. It must include every phone that appears in the dictionary, plus one extra entry: SIL.
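The phone file can be derived from the dictionary instead of being written by hand. This sketch collects every phone, adds SIL, and flags names containing the "*" or "/" characters that sphinxtrain rejects. phone_list and bad_phones are hypothetical helper names.

```python
def phone_list(dictionary_lines):
    """Every phone used in the dictionary, plus SIL, sorted for a your_db.phone file."""
    phones = {"SIL"}
    for line in dictionary_lines:
        fields = line.split()
        if len(fields) >= 2:            # skip blank or malformed lines
            phones.update(fields[1:])   # fields[0] is the word itself
    return sorted(phones)


def bad_phones(phones):
    """Phone names containing characters that sphinxtrain does not accept."""
    return [p for p in phones if "*" in p or "/" in p]


phones = phone_list(["HELLO HH AH L OW", "WORLD W AO R L D"])
print(phones)              # ['AH', 'AO', 'D', 'HH', 'L', 'OW', 'R', 'SIL', 'W']
print(bad_phones(phones))  # []
```

Writing the result is then one line per phone, e.g. "\n".join(phones) + "\n".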

The language model file (your_db.lm.DMP) must be in ARPA format (extension .lm) or in DMP format.

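The filler dictionary (your_db.filler) contains units that the language model cannot cover, such as pauses, breathing, interjections and laughter. It can contain just silence: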

<s> SIL

</s> SIL

<sil> SIL

or entries like the following, which must then actually appear in the transcription:

+um+ ++um++

+noi+ ++noi++
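Two warnings that sphinxtrain prints at the verify stage (duplicate dictionary words, and transcript words missing from the dictionary) can be checked in advance. This sketch treats entries differing only in case as duplicates and looks up transcript words case-sensitively, since a case mismatch is also reported as a missing word. vocabulary_problems is a hypothetical name, and sphinxtrain's own verify step remains the authoritative check.

```python
import re
from collections import Counter


def vocabulary_problems(transcript_lines, dictionary_lines, filler_lines):
    """Report duplicate dictionary words and transcript words with no dictionary entry."""
    dict_words = [line.split()[0] for line in dictionary_lines if line.strip()]
    # Entries that differ only in case count as duplicates too.
    counts = Counter(word.upper() for word in dict_words)
    problems = [f"duplicate dictionary entry: {w}" for w, c in sorted(counts.items()) if c > 1]
    # Lookup is case-sensitive, because a case mismatch is also reported as missing.
    known = set(dict_words) | {line.split()[0] for line in filler_lines if line.strip()}
    for line in transcript_lines:
        text = re.sub(r"\([^)]*\)\s*$", "", line.strip())  # drop the trailing (utterance_id)
        for word in text.split():
            if word not in known:
                problems.append(f"not in dictionary or filler dictionary: {word}")
    return problems


dictionary = ["HELLO HH AH L OW", "WORLD W AO R L D"]
fillers = ["<s> SIL", "</s> SIL", "<sil> SIL", "+um+ ++um++"]
transcript = ["<s> HELLO +um+ WORLD </s> (file_1)", "<s> HELLO world </s> (file_2)"]
print(vocabulary_problems(transcript, dictionary, fillers))
# ['not in dictionary or filler dictionary: world']
```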

You can use the an4 database as an example.

Preparing the tools

Toolkits needed for training:

sphinxbase-5prealpha (pocketsphinx needs this package alongside it)

sphinxtrain-5prealpha (for training acoustic models)

pocketsphinx-5prealpha (for recognition)

plus two other tools, perl and python:

perl, for example ActivePerl on Windows

python, for example ActivePython on Windows

Training on Linux is recommended, since it lets you use all sphinxtrain features. On Windows, you can use ActivePerl.

********************************************************************************************************

*****The an4 database could not be downloaded here; if you can get it, it is worth examining its format, content and size******

*********************************************************************

After installing, set up the paths with the following commands:

export PATH=/usr/local/bin:$PATH

export LD_LIBRARY_PATH=/usr/local/lib

export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig

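If you don't want to install into system directories, you can install under your home directory by passing the following option to autogen.sh:

--prefix=/home/user/local

If you get an error such as "failed to open libsphinx.so.0: no such file or directory", the installation path or environment variables are not set up correctly.

Setting up the training files

a. Create the directory structure with the following command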

On Linux

sphinxtrain -t an4 setup (replace an4 with the name of your database, e.g. data3 or data4)

On Windows

python ../sphinxtrain/scripts/sphinxtrain -t an4 setup

This gives you the following two directories under data3:

etc

wav

After training, data3 looks like this:

etc

feat

logdir

model_parameters

model_architecture

result

wav

b. Configuring the training parameters

Make the following changes in etc/sphinx_train.cfg.

(1) Audio data format

$CFG_WAVFILES_DIR = "$CFG_BASE_DIR/wav";

$CFG_WAVFILE_EXTENSION = 'sph';

$CFG_WAVFILE_TYPE = 'nist'; # one of nist, mswav, raw

If your audio is in WAV format, change sph and nist as follows:

$CFG_WAVFILES_DIR = "$CFG_BASE_DIR/wav";

$CFG_WAVFILE_EXTENSION = 'wav';

$CFG_WAVFILE_TYPE = 'mswav'; # one of nist, mswav, raw
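If you set up several databases, this edit can be scripted. This sketch rewrites only the quoted values on the two lines in the text of sphinx_train.cfg and keeps any trailing comment. use_mswav is a hypothetical name, and the regular expressions assume the line layout shown above.

```python
import re


def use_mswav(cfg_text):
    """Switch sphinx_train.cfg text from NIST .sph input to MS WAV input."""
    # Only the value between the quotes changes; the rest of each line is kept.
    cfg_text = re.sub(r"^(\$CFG_WAVFILE_EXTENSION\s*=\s*)'[^']*'", r"\1'wav'", cfg_text, flags=re.M)
    cfg_text = re.sub(r"^(\$CFG_WAVFILE_TYPE\s*=\s*)'[^']*'", r"\1'mswav'", cfg_text, flags=re.M)
    return cfg_text
```

For example, with pathlib: cfg = Path("etc/sphinx_train.cfg"), then cfg.write_text(use_mswav(cfg.read_text())).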

(2) Training file paths

Make sure CFG_DB_NAME is the name your database files use in the directory:

# Variables used in main training of models

$CFG_DICTIONARY = "$CFG_LIST_DIR/$CFG_DB_NAME.dic";

$CFG_RAWPHONEFILE = "$CFG_LIST_DIR/$CFG_DB_NAME.phone";

$CFG_FILLERDICT = "$CFG_LIST_DIR/$CFG_DB_NAME.filler";

$CFG_LISTOFFILES = "$CFG_LIST_DIR/${CFG_DB_NAME}_train.fileids";

$CFG_TRANSCRIPTFILE = "$CFG_LIST_DIR/${CFG_DB_NAME}_train.transcription";

(3) Model type and model parameters

$CFG_HMM_TYPE = '.cont.'; # Sphinx4, Pocketsphinx

#$CFG_HMM_TYPE = '.mi.'; # PocketSphinx only

#$CFG_HMM_TYPE = '.ptm.'; # Sphinx4, Pocketsphinx, faster model

$CFG_FINAL_NUM_DENSITIES = 8;

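Use 32 if you have plenty of data (more than 100 hours), otherwise 8. The value must be a power of two, such as 4, 8, 16, 32 or 64.

If you are training a semi-continuous or PTM model, use 256 Gaussians.

# Number of tied states (senones) to create in decision-tree clustering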

$CFG_N_TIED_STATES = 1000;

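This is the number of senones (tied states) the model is trained with. More senones give finer acoustic distinctions, but do not overdo it: with too many senones for your data you get an error like this:

ERROR: "gauden.c", line 1700: Variance (mgau= 948, feat= 0, density=3, component=38) is less then 0. Most probably the number of senones is too high for such a small training database. Use smaller $CFG_N_TIED_STATES.

The table below lists, for some typical tasks, the vocabulary size, hours of training data, number of senones and number of densities: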

Vocabulary   Hours in db   Senones   Densities   Example
20           5             200       8           Tidigits Digits Recognition
100          20            2000      8           RM1 Command and Control
5000         30            4000      16          WSJ1 5k Small Dictation
20000        80            4000      32          WSJ1 20k Big Dictation
60000        200           6000      16          HUB4 Broadcast News
60000        2000          12000     64          Fisher Rich Telephone Transcription

Key point: the best values depend on your own database.

(4) Acoustic feature parameters

If your corpus is 8 kHz, make the following changes:

# Feature extraction parameters

$CFG_WAVFILE_SRATE = 8000.0;

$CFG_NUM_FILT = 31; # For wideband speech it's 40, for telephone 8khz reasonable value is 31

$CFG_LO_FILT = 200; # For telephone 8kHz speech value is 200

$CFG_HI_FILT = 3500; # For telephone 8kHz speech value is 3500
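To see what these values mean: the front end places $CFG_NUM_FILT triangular filters between $CFG_LO_FILT and $CFG_HI_FILT, spaced evenly on a mel scale, and the upper edge must stay below half the sample rate (4000 Hz for 8 kHz audio). This sketch assumes the common mel formula 2595 * log10(1 + f / 700); sphinxbase's exact filter placement may differ in detail. filter_centers is a hypothetical name.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def filter_centers(num_filt, lo_filt, hi_filt, srate):
    """Centre frequencies (Hz) of num_filt triangular filters spaced evenly in mel."""
    if hi_filt > srate / 2:
        raise ValueError(f"upper edge {hi_filt} Hz is above the Nyquist frequency {srate / 2} Hz")
    lo, hi = hz_to_mel(lo_filt), hz_to_mel(hi_filt)
    step = (hi - lo) / (num_filt + 1)  # num_filt + 2 evenly spaced edge points
    return [mel_to_hz(lo + step * (k + 1)) for k in range(num_filt)]


centers = filter_centers(31, 200, 3500, 8000)  # the 8 kHz settings above
print(len(centers), round(centers[0]), round(centers[-1]))
```

Keeping wideband values such as an upper edge near 7 kHz with 8 kHz audio raises the ValueError, which mirrors why these three settings must change together.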

(5) Training speed

On a multi-core machine, change the following settings to speed up training: switch the job queue from Queue (single CPU) to Queue::POSIX (multiple CPUs on the local machine), and split Baum-Welch estimation into several parts.

$CFG_QUEUE_TYPE = "Queue::POSIX";

$CFG_NPART = 4; # for example, one part per CPU core

(6) Decoding parameters

In etc/sphinx_train.cfg (is that right here, or should it be decode.cfg?):

$DEC_CFG_DICTIONARY = "$DEC_CFG_BASE_DIR/etc/$DEC_CFG_DB_NAME.dic";

$DEC_CFG_FILLERDICT = "$DEC_CFG_BASE_DIR/etc/$DEC_CFG_DB_NAME.filler";

$DEC_CFG_LISTOFFILES = "$DEC_CFG_BASE_DIR/etc/${DEC_CFG_DB_NAME}_test.fileids";

$DEC_CFG_TRANSCRIPTFILE = "$DEC_CFG_BASE_DIR/etc/${DEC_CFG_DB_NAME}_test.transcription";

$DEC_CFG_RESULT_DIR = "$DEC_CFG_BASE_DIR/result";

# The variables, used by the decoder, have to be user defined, and

# may affect the decoder output

$DEC_CFG_LANGUAGEMODEL_DIR = "$DEC_CFG_BASE_DIR/etc";

$DEC_CFG_LANGUAGEMODEL = "$DEC_CFG_LANGUAGEMODEL_DIR/an4.lm.DMP";

Training

cd an4 to enter the database directory, which already contains the subdirectories and configuration files.

On Linux

sphinxtrain run

On Windows

python ../sphinxtrain/scripts/sphinxtrain run

Training first checks that all files are in the correct format. Do not ignore any errors.


Do not ignore the errors reported on the first 00.verify_all step.

Typical output during training looks like this:

Baum welch starting for 2 Gaussian(s), iteration: 3 (1 of 1)

0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%

Normalization for iteration: 3

Current Overall Likelihood Per Frame = 30.6558644286942

Convergence Ratio = 0.633864444461992

Baum welch starting for 2 Gaussian(s), iteration: 4 (1 of 1)

0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%

Normalization for iteration: 4

The training process in detail

The scripts_pl directory contains stages numbered 00 to 99, which run in the following order:

perl scripts_pl/000.comp_feat/slave_feat.pl

perl scripts_pl/00.verify/verify_all.pl

perl scripts_pl/10.vector_quantize/slave.VQ.pl

perl scripts_pl/20.ci_hmm/slave_convg.pl

perl scripts_pl/30.cd_hmm_untied/slave_convg.pl

perl scripts_pl/40.buildtrees/slave.treebuilder.pl

perl scripts_pl/45.prunetree/slave-state-tying.pl

perl scripts_pl/50.cd_hmm_tied/slave_convg.pl

perl scripts_pl/90.deleted_interpolation/deleted_interpolation.pl

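The file an4.html records which stages have been run.

Running stages 00 to 90 produces many sets of acoustic models, each of which can be used for recognition. Some stages are needed only for semi-continuous models; when you train continuous models, some of them are skipped.

20.ci_hmm trains context-independent models (single phones).

30.cd_hmm_untied trains context-dependent (triphone) models, called CD-untied models. They are needed to build the decision trees and to tie states.

40.buildtrees builds a decision tree for each state of each sub-word unit.

45.prunetree prunes the decision trees and ties the states.

50.cd_hmm_tied trains the final triphone models, called CD-tied models. CD-tied models are trained in several stages: for each HMM state we start with 1 Gaussian, then 2, and so on up to 8 Gaussians.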
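The growth of the mixtures can be written out explicitly. This sketch lists the stages for a given final density count, assuming each stage doubles the number of Gaussians per state as described above; density_schedule is a hypothetical name.

```python
def density_schedule(final_densities):
    """Gaussians per state at each CD-tied stage: 1, 2, 4, ... up to final_densities."""
    if final_densities < 1 or final_densities & (final_densities - 1):
        raise ValueError("final_densities must be a power of two")
    stages, n = [], 1
    while n <= final_densities:
        stages.append(n)
        n *= 2  # each stage doubles the mixture size
    return stages


print(density_schedule(8))  # [1, 2, 4, 8]
```

This is also why $CFG_FINAL_NUM_DENSITIES has to be a power of two.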

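Transformation matrix training (advanced)

Enabling some settings in the config file adds extra training steps that can improve recognition accuracy.

MMIE Training (advanced)

Same as above.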

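Testing

sphinxtrain -s decode run

This reports the word error rate (WER) and sentence error rate. For a 10-hour application the WER should be around 10%; for larger applications it may be around 30%.

The result directory contains the detailed results. For example, an4.align contains: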

p I T t s b u r g H (MMXG-CEN5-MMXG-B)

p R EIGHTY t s b u r g EIGHT (MMXG-CEN5-MMXG-B)

Words: 10 Correct: 7 Errors: 3 Percent correct = 70.00% Error = 30.00% Accuracy = 70.00%

Insertions: 0 Deletions: 0 Substitutions: 3

october twenty four nineteen seventy (MMXG-CEN8-MMXG-B)

Words: 5 Correct: 5 Errors: 0 Percent correct = 100.00% Error = 0.00% Accuracy = 100.00%

Insertions: 0 Deletions: 0 Substitutions: 0

TOTAL Words: 773 Correct: 587 Errors: 234

TOTAL Percent correct = 75.94% Error = 30.27% Accuracy = 69.73%

TOTAL Insertions: 48 Deletions: 15 Substitutions: 171
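The totals follow from the error counts: Correct = Words - Substitutions - Deletions, Error = (S + D + I) / Words, and Accuracy = (Correct - Insertions) / Words, which reproduces 75.94%, 30.27% and 69.73% from 773 words, 171 substitutions, 15 deletions and 48 insertions. The sketch below also aligns two word sequences with a standard Levenshtein dynamic program to obtain S, D and I; sphinxtrain's word_align.pl may break ties between equally good alignments differently. align_counts and summary are hypothetical names.

```python
def align_counts(ref, hyp):
    """Return (substitutions, deletions, insertions) of a minimum-edit-distance alignment."""
    # cost[i][j] = edit distance between ref[:i] and hyp[:j]
    cost = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        cost[i][0] = i
    for j in range(1, len(hyp) + 1):
        cost[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(sub, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    # Walk back from the corner, counting each kind of error on one optimal path.
    s = d = ins = 0
    i, j = len(ref), len(hyp)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            s += ref[i - 1] != hyp[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            d += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return s, d, ins


def summary(words, s, d, ins):
    """Percent correct, error and accuracy, as on the TOTAL lines of an4.align."""
    correct = words - s - d
    return (100.0 * correct / words,
            100.0 * (s + d + ins) / words,
            100.0 * (correct - ins) / words)


ref = "p i t t s b u r g h".split()
hyp = "p r eighty t s b u r g eight".split()
print(align_counts(ref, hyp))  # (3, 0, 0)
print([round(x, 2) for x in summary(773, 171, 15, 48)])  # [75.94, 30.27, 69.73]
```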

Using the model

The trained model is in:

model_parameters/<your_db_name>.cd_cont_<number_of_senones>

or in

model_parameters/<your_db_name>.cd_semi_<number_of_senones>

It contains the following files:

mdef

feat.params

mixture_weights

means

noisedict

transition_matrices

variances
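Before decoding, it can save time to confirm that the model directory is complete and to build the command line programmatically. This sketch checks for the files listed above and assembles the pocketsphinx_continuous arguments used in the command below. The -infile option (decode a WAV file instead of the microphone) is my assumption for pocketsphinx 5prealpha; check pocketsphinx_continuous -help for your version. missing_model_files and continuous_command are hypothetical names.

```python
import os

# The files listed above for a trained model directory.
REQUIRED_MODEL_FILES = ("mdef", "feat.params", "mixture_weights", "means",
                        "noisedict", "transition_matrices", "variances")


def missing_model_files(model_dir):
    """Names from REQUIRED_MODEL_FILES that are not regular files in model_dir."""
    return [name for name in REQUIRED_MODEL_FILES
            if not os.path.isfile(os.path.join(model_dir, name))]


def continuous_command(model_dir, lm, dictionary, wav=None):
    """Argument list for pocketsphinx_continuous, ready for subprocess.run."""
    cmd = ["pocketsphinx_continuous", "-hmm", model_dir, "-lm", lm, "-dict", dictionary]
    if wav is not None:
        cmd += ["-infile", wav]  # assumption: decode a file instead of the microphone
    return cmd
```

For example, if missing_model_files(path) returns an empty list, pass continuous_command(path, lm, dic, wav) to subprocess.run.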

Run it with the following command:

pocketsphinx_continuous -hmm <your_new_model_folder> -lm <your_lm> -dict <your_dict>

For sphinx4, set the model path in code:

configuration.setAcousticModelPath("file:model_parameters/db.cd_cont_200");

Troubleshooting

When something goes wrong, first check the detailed logs in logdir, then check your_project_name.html.

Some common problems and their causes:

WARNING: this phone (something) appears in the dictionary (dictionary file name), but not in the phone list (phone file name).

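WARNING: This word (word) has duplicate entries in (dictionary file name). Check for duplicates. (A dictionary must not contain duplicate words; two entries differing only in case also count as duplicates.)

WARNING: This word: word was in the transcript file, but is not in the dictionary (transcript line) Do cases match? (Every word in the transcription must be in the dictionary.)

WARNING: CTL file, audio file name.mfc, does not exist, or is empty. (Did you add new audio without running feature extraction for it?)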

Very low recognition accuracy.

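ERROR: "backward.c", line 430: Failed to align audio to transcript: final state of the search is not reached.

This happens when the audio and the transcription do not match: the recording might say "hello hello world" while the transcription only has "hello world". Fixes: 1. make the transcription match the audio as exactly as possible; 2. enable forced alignment in sphinx_train.cfg:

$CFG_FORCEDALIGN = 'yes';

then retrain. This runs stages 10 and 11 to filter the database.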

Can't open */*-1-1.match word_align.pl failed with error code 65280

This error occurs because the decoder did not run properly after training. First check if the correct executable (psdecode_batch if the decoding script being used is psdecode.pl as set by $DEC_CFG_SCRIPT variable in sphinx_train.cfg) is present in PATH. On Linux run

which pocketsphinx_batch

and see if it is located. If it is not, you need to set the PATH variable properly. Similarly on Windows, run

where pocketsphinx_batch*

If the path to decoding executable is set properly, read the log files at logdir/decode/ to find out other reasons behind the error.
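The same PATH check can be done from Python on either platform: shutil.which searches PATH (and, on Windows, PATHEXT), so no wildcard is needed. find_decoders is a hypothetical name.

```python
import shutil


def find_decoders(names=("pocketsphinx_batch", "pocketsphinx_continuous")):
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


missing = [name for name, path in find_decoders().items() if path is None]
print("not on PATH:", missing)
```

Any name reported here needs its install directory added to PATH before decoding.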


+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


For the version I currently use (roughly the sphinx3 era), the steps on Linux are:

1. Create the directories:

SphinxTrain/scripts_pl/setup_SphinxTrain.pl -task data3

2. Edit the configuration: same as above.

3. Start training:

cd /sphinx/MyTrain

./scripts_pl/make_feats.pl -ctl etc/my_db_train.fileids

./scripts_pl/make_feats.pl -ctl etc/my_db_test.fileids

./scripts_pl/RunAll.pl


Article link: https://www.wtabcd.cn/fanwen/fan/89/1062083.html
