input_data.py in the official TensorFlow speech recognition example --- isolated word recognition
I recommend downloading the file and reading it in Notepad++ or PyCharm, where functions can be folded.
I'm a beginner myself, which is why I go through the code line by line; experts, please don't laugh~~
In this speech_commands example, I mainly read these three files:
input_data.py
train.py
models.py
input_data.py -- how silence is added, how noise is added, and how the needed training set is extracted; the logic is very clear, and I got a lot out of reading it
train.py -- just the usual training steps, fairly easy to follow
models.py -- mainly how the 5 different models are built; for each model there is also a corresponding paper you can download
I downloaded the dataset too, but training keeps hitting bugs, erroring out in one place or another, and I still haven't gotten a complete run. And I'd like to try deploying it to an Android phone; who knows when I'll manage that.
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Model definitions for simple speech recognition.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import hashlib
import math
import os.path
import random
import re
import sys
import tarfile
import numpy as np
from six.moves import urllib
from six.moves import xrange  # pylint: disable=redefined-builtin
import tensorflow as tf
# tensorfly.cn/ -- TF Chinese community
# tensorflow.org/ -- English community
from tensorflow.contrib.framework.python.ops import audio_ops as contrib_audio
from tensorflow.python.ops import io_ops
# file IO done through graph ops
from tensorflow.python.platform import gfile
'''
tf.gfile file read/write functions; see:
/p/d8f5357b95b3
blog.csdn/pursuit_zhangyu/article/details/80557958
'''
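Since those links may not survive, here is a minimal tf.gfile sketch (TF 1.x API; my own snippet, not part of the file). gfile.GFile behaves like Python's built-in open(), but also understands remote paths such as gs://:

from tensorflow.python.platform import gfile

with gfile.GFile('/tmp/example.txt', 'w') as f:  # write, like open(..., 'w')
    f.write('hello gfile')
with gfile.GFile('/tmp/example.txt', 'r') as f:  # read it back
    print(f.read())                              # -> hello gfile
print(gfile.Exists('/tmp/example.txt'))          # -> True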
from tensorflow.python.util import compat
'''
Module: tf.compat
Functions for Python 2 vs. 3 compatibility.
'''
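For example, compat.as_bytes (used by which_set below) returns bytes whether it is given str or bytes, on both Python 2 and 3 (my own snippet):

from tensorflow.python.util import compat

print(compat.as_bytes('bobby'))   # -> b'bobby'
print(compat.as_bytes(b'bobby'))  # unchanged: b'bobby'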
MAX_NUM_WAVS_PER_CLASS = 2**27 - 1  # ~134M
SILENCE_LABEL = '_silence_'
SILENCE_INDEX = 0
UNKNOWN_WORD_LABEL = '_unknown_'
UNKNOWN_WORD_INDEX = 1
BACKGROUND_NOISE_DIR_NAME = '_background_noise_'
RANDOM_SEED = 59185
def prepare_words_list(wanted_words):
  # This function is used near line 331.
  """Prepends common tokens to the custom word list.

  The total number of commands to train is however many you picked plus 2.
  For example, if you only picked 'up', you train 3 commands. The official
  example picks 10 commands ('yes,no,up,down,left,right,on,off,stop,go'),
  so it trains 12. Returns the list of command labels.

  Args:
    wanted_words: List of strings containing the custom words.

  Returns:
    List with the standard silence and unknown tokens added.
  """
  return [SILENCE_LABEL, UNKNOWN_WORD_LABEL] + wanted_words
  # Returns the list of command labels to train, with the silence and unknown labels added.
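A quick check of what this returns (my own snippet):

print(prepare_words_list(['up']))
# -> ['_silence_', '_unknown_', 'up'], i.e. 3 commands to train
words = 'yes,no,up,down,left,right,on,off,stop,go'.split(',')
print(len(prepare_words_list(words)))  # -> 12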
def which_set(filename, validation_percentage, testing_percentage):
  # This function is used near line 298.
  """Determines which data partition the file should belong to.

  Decides whether this file goes into the training, validation, or testing
  set, and returns the answer as a string.
  As for why a file must belong to one fixed set across training runs, the
  docstring below explains it (or look it up yourself).

  We want to keep files in the same training, validation, or testing sets even
  if new ones are added over time. This makes it less likely that testing
  samples will accidentally be reused in training when long runs are restarted
  for example. To keep this stability, a hash of the filename is taken and used
  to determine which set it should belong to. This determination only depends on
  the name and the set proportions, so it won't change as other files are added.

  It's also useful to associate particular files as related (for example words
  spoken by the same person), so anything after '_nohash_' in a filename is
  ignored for set determination. This ensures that 'bobby_nohash_0.wav' and
  'bobby_nohash_1.wav' are always in the same set, for example.

  Args:
    filename: File path of the data sample.
    validation_percentage: How much of the data set to use for validation.
    testing_percentage: How much of the data set to use for testing.

  Returns:
    String, one of 'training', 'validation', or 'testing'.
  """
  base_name = os.path.basename(filename)
  # We want to ignore anything after '_nohash_' in the file name when
  # deciding which set to put a wav in, so the data set creator has a way of
  # grouping wavs that are close variations of each other.
  hash_name = re.sub(r'_nohash_.*$', '', base_name)
  # This looks a bit magical, but we need to decide whether this file should
  # go into the training, testing, or validation sets, and we want to keep
  # existing files in the same set even if more files are subsequently
  # added.
  # To do that, we need a stable way of deciding based on just the file name
  # itself, so we do a hash of that and then use that to generate a
  # probability value that we use to assign it.
  hash_name_hashed = hashlib.sha1(compat.as_bytes(hash_name)).hexdigest()
  percentage_hash = ((int(hash_name_hashed, 16) %
                      (MAX_NUM_WAVS_PER_CLASS + 1)) *
                     (100.0 / MAX_NUM_WAVS_PER_CLASS))
  if percentage_hash < validation_percentage:
    result = 'validation'
  elif percentage_hash < (testing_percentage + validation_percentage):
    result = 'testing'
  else:
    result = 'training'
  return result
  # Based on the file name (and the set proportions), decides which set this
  # file belongs to, and returns a string: one of 'training', 'validation',
  # or 'testing'. (See the docstring above for why a file must always stay
  # in the same fixed set.)
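A small demo of that stability property (my own snippet): the hash only sees the part of the basename before '_nohash_', so both recordings below always land in the same partition, run after run:

print(which_set('data/yes/bobby_nohash_0.wav', 10, 10))
print(which_set('data/yes/bobby_nohash_1.wav', 10, 10))
# Both print the same set name, e.g. 'training'; the answer depends only
# on the name 'bobby' and the two percentages, never on other files.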
def load_wav_file(filename):
  """Loads an audio file and returns a float PCM-encoded array of samples.

  Ultimately returns a Numpy array representing this wav.

  Args:
    filename: Path to the .wav file to load.

  Returns:
    Numpy array holding the sample data as floats between -1.0 and 1.0.
  """
  with tf.Session(graph=tf.Graph()) as sess:
    wav_filename_placeholder = tf.placeholder(tf.string, [])
    wav_loader = io_ops.read_file(wav_filename_placeholder)
    # the file is fetched through a graph op
    wav_decoder = contrib_audio.decode_wav(wav_loader, desired_channels=1)
    return sess.run(
        wav_decoder,
        feed_dict={wav_filename_placeholder: filename}).audio.flatten()
'''
wav_decoder comes from the decode_wav function, which ultimately returns a
tensor of numbers between -1 and 1. decode_wav takes a 16-bit wav file as input:
Decode a 16-bit PCM WAV file to a float tensor.
The -32768 to 32767 signed 16-bit values will be scaled to -1.0 to 1.0 in float.
'''
# This function is only defined here; it is not used in input_data.py,
# train.py, or models.py, but it is used near line 150 of input_data_test.py.
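A hypothetical call (the path is made up; any 16-bit mono wav from the dataset would do):

samples = load_wav_file('/tmp/speech_dataset/yes/0a7c2a8d_nohash_0.wav')
print(samples.shape)                 # e.g. (16000,) for a 1s clip at 16kHz
print(samples.min(), samples.max())  # floats inside [-1.0, 1.0]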
def save_wav_file(filename, wav_data, sample_rate):
  """Saves audio sample data to a .wav audio file.

  Args:
    filename: Path to save the file to.
    wav_data: 2D array of float PCM-encoded audio data.
    sample_rate: Samples per second to encode in the file.
  """
  with tf.Session(graph=tf.Graph()) as sess:
    wav_filename_placeholder = tf.placeholder(tf.string, [])
    sample_rate_placeholder = tf.placeholder(tf.int32, [])
    wav_data_placeholder = tf.placeholder(tf.float32, [None, 1])
    wav_encoder = contrib_audio.encode_wav(wav_data_placeholder,
                                           sample_rate_placeholder)
    '''
    encode_wav:
    Encode audio data using the WAV file format.
    This operation will generate a string suitable to be saved out to create a .wav
    audio file. It will be encoded in the 16-bit PCM format. It takes in float
    values in the range -1.0f to 1.0f, and any outside that value will be clamped to
    that range.
    '''
    wav_saver = io_ops.write_file(wav_filename_placeholder, wav_encoder)
    '''
    write_file:
    Writes contents to the file at input filename. Creates file and recursively
    creates directory if not existing.
    '''
    sess.run(
        wav_saver,
        feed_dict={
            wav_filename_placeholder: filename,
            sample_rate_placeholder: sample_rate,
            wav_data_placeholder: np.reshape(wav_data, (-1, 1))
        })
# This function does the opposite of the one above; it is also only defined here and not used for now.
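Still, a round trip with load_wav_file shows both functions at work (my own snippet; the filename is made up): write one second of a 440 Hz sine wave, then read it back:

import numpy as np

sample_rate = 16000
t = np.arange(sample_rate) / float(sample_rate)
sine = 0.5 * np.sin(2 * np.pi * 440.0 * t)        # floats in [-0.5, 0.5]
save_wav_file('/tmp/sine.wav', sine.reshape(-1, 1), sample_rate)
recovered = load_wav_file('/tmp/sine.wav')
print(np.max(np.abs(recovered - sine)))  # tiny: limited by 16-bit quantization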
def get_features_range(model_settings):
  """Returns the expected min/max for generated features.

  Returns the max and min of the features, presumably for normalization.

  Args:
    model_settings: Information about the current model being trained.

  Returns:
    Min/max float pair holding the range of features.

  Raises:
    Exception: If preprocessing mode isn't recognized.
  """
  # TODO(petewarden): These values have been derived from the observed ranges
  # of spectrogram and MFCC inputs. If the preprocessing pipeline changes,
  # they may need to be updated.
  if model_settings['preprocess'] == 'average':
    features_min = 0.0
    features_max = 127.5
  elif model_settings['preprocess'] == 'mfcc':
    features_min = -247.0
    features_max = 30.0
  else:
    raise Exception('Unknown preprocess mode "%s" (should be "mfcc" or'
                    ' "average")' % (model_settings['preprocess']))
  return features_min, features_max
  # This function is only defined in this file and not used here, but it is
  # used near line 135 of train.py.
  # Returns the max and min of the features, presumably for normalization.
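For instance, a simple min-max normalization sketch using these bounds (my own illustration, not the train.py code, which feeds them into quantization-aware training instead):

import numpy as np

features_min, features_max = get_features_range({'preprocess': 'mfcc'})
raw = np.random.uniform(features_min, features_max, size=(49, 40))  # fake MFCCs
normalized = (raw - features_min) / (features_max - features_min)
print(normalized.min() >= 0.0, normalized.max() <= 1.0)  # -> True True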
class AudioProcessor(object):
  """Handles loading, partitioning, and preparing audio training data.

  Loads the audio, splits it into sets, and prepares the audio training data.
  """

  def __init__(self, data_url, data_dir, silence_percentage, unknown_percentage,
               wanted_words, validation_percentage, testing_percentage,
               model_settings, summaries_dir):
    if data_dir:  # if this directory path is non-empty
      self.data_dir = data_dir
      self.maybe_download_and_extract_dataset(data_url, data_dir)
      self.prepare_data_index(silence_percentage, unknown_percentage,
                              wanted_words, validation_percentage,
                              testing_percentage)
      self.prepare_background_data()
    # Whether or not data_dir is empty, the following call always runs.
    self.prepare_processing_graph(model_settings, summaries_dir)
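A hedged construction sketch (not in the file), with the parameter values train.py uses by default; model_settings would come from models.prepare_model_settings(...), whose exact signature varies across TF versions:

audio_processor = AudioProcessor(
    data_url='http://download.tensorflow.org/data/speech_commands_v0.02.tar.gz',
    data_dir='/tmp/speech_dataset',
    silence_percentage=10.0,
    unknown_percentage=10.0,
    wanted_words='yes,no,up,down,left,right,on,off,stop,go'.split(','),
    validation_percentage=10,
    testing_percentage=10,
    model_settings=model_settings,  # from models.prepare_model_settings(...)
    summaries_dir='/tmp/retrain_logs')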
  def maybe_download_and_extract_dataset(self, data_url, dest_directory):
    # This function is used near line 232.
    """Download and extract data set tar file.

    Downloads and unpacks the archive.

    If the data set we're using doesn't already exist, this function
    downloads it from the TensorFlow website and unpacks it into a
    directory.
    If the data_url is none, don't download anything and expect the data
    directory to contain the correct files already.

    Args:
      data_url: Web location of the tar file containing the data set.
      dest_directory: File path to extract data to.
    """
    if not data_url:  # if the data_url string is empty, do nothing, as if this function didn't exist
      return
    if not os.path.exists(dest_directory):
      os.makedirs(dest_directory)
    filename = data_url.split('/')[-1]
    filepath = os.path.join(dest_directory, filename)
    if not os.path.exists(filepath):

      def _progress(count, block_size, total_size):
        sys.stdout.write(
            '\r>> Downloading %s %.1f%%' %
            (filename, float(count * block_size) / float(total_size) * 100.0))
        sys.stdout.flush()

      try:
        filepath, _ = urllib.request.urlretrieve(data_url, filepath, _progress)
      except:
        tf.logging.error('Failed to download URL: %s to folder: %s', data_url,
                         filepath)
        tf.logging.error('Please make sure you have enough free space and'
                         ' an internet connection')
        raise
      print()
      statinfo = os.stat(filepath)
      tf.logging.info('Successfully downloaded %s (%d bytes)', filename,
                      statinfo.st_size)
      tarfile.open(filepath, 'r:gz').extractall(dest_directory)
  # Downloads and unpacks the archive.
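After extraction there is one folder per word plus the noise folder; a quick way to see that (my own snippet; folder names from the v0.02 archive):

import os

print(sorted(os.listdir('/tmp/speech_dataset'))[:4])
# e.g. ['_background_noise_', 'backward', 'bed', 'bird']; each word folder
# holds wavs named like '0a7c2a8d_nohash_0.wav'.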
  def prepare_data_index(self, silence_percentage, unknown_percentage,
                         wanted_words, validation_percentage,
                         testing_percentage):
    # This function is used near line 233.
    """Prepares a list of the samples organized by set and label.

    It should end up setting these three things:
    self.data_index,
    self.words_list,
    self.word_to_index

    The training loop needs a list of all the available data, organized by
    which partition it should belong to, and with ground truth labels attached.
    This function analyzes the folders below the `data_dir`, figures out the
    right labels for each file based on the name of the subdirectory it belongs
    to, and uses a stable hash to assign it to a data set partition.

    Args:
      silence_percentage: How much of the resulting data should be background.
      unknown_percentage: How much should be audio outside the wanted classes.
      wanted_words: Labels of the classes we want to be able to recognize.
      validation_percentage: How much of the data set to use for validation.
      testing_percentage: How much of the data set to use for testing.

    Returns:
      Dictionary containing a list of file information for each set partition,
      and a lookup map for each class to determine its numeric index.

    Raises:
      Exception: If expected files are not found.
    """
    # Make sure the shuffling and picking of unknowns is deterministic.
    random.seed(RANDOM_SEED)
    wanted_words_index = {}
    for index, wanted_word in enumerate(wanted_words):
      wanted_words_index[wanted_word] = index + 2
'''
wanted_words = ['yes', 'no', 'up', 'down', 'left', 'right', 'on', 'off', 'stop', 'go']
wanted_words_index:
{'down': 5,
'go': 11,
'left': 6,
'no': 3,
'off': 9,
'on': 8,
'right': 7,
'stop': 10,
'up': 4,
'yes': 2}
'''
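Note how this dict lines up with prepare_words_list: indices 0 and 1 are reserved for silence and unknown, so wanted word i gets index i + 2. A quick consistency check (my own snippet):

wanted_words = 'yes,no,up,down,left,right,on,off,stop,go'.split(',')
wanted_words_index = {w: i + 2 for i, w in enumerate(wanted_words)}
words_list = prepare_words_list(wanted_words)
for word, index in wanted_words_index.items():
    assert words_list[index] == word  # e.g. words_list[2] == 'yes'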