Plotting a Spectrogram in Python (with detailed comments)
1. Steps:
1) Import the required modules
2) Read in the audio file and get its parameters
3) Convert the audio into a processable form (note that the frames are read in as a byte string and must be converted to int/short values)
The code is as follows:
import numpy as np
import matplotlib.pyplot as plt
import os
import wave
# Read in the audio.
path = "E:\SpeechWarehou\zmkm"
name = 'zmkm0.wav'
#我⾳频的路径为E:\SpeechWarehou\zmkm\zmkm0.wav
filename = os.path.join(path, name)
# Open the speech file.
f = wave.open(filename,'rb')
# Get the audio parameters
params = f.getparams()
nchannels, sampwidth, framerate,nframes = params[:4]
# Convert the byte-string data to short integers
strData = f.readframes(nframes)
waveData = np.frombuffer(strData, dtype=np.short)  # np.fromstring is deprecated, use np.frombuffer
# Normalize
waveData = waveData * 1.0/max(abs(waveData))
# Rearrange the signal so that each row holds one channel, i.e. one row of samples per channel, nchannels rows in total
waveData = np.reshape(waveData, [nframes, nchannels]).T  # .T means transpose
f.close()  # close the file
The getparams() method is described as follows:
getnchannels() -- returns number of audio channels (1 for mono, 2 for stereo)
getsampwidth() -- returns sample width in bytes
getframerate() -- returns sampling frequency
getnframes() -- returns number of audio frames
getparams() -- returns a namedtuple containing all of the above (plus comptype and compname), in that order
In brief:
nchannels: the number of audio channels, via getnchannels()
sampwidth: the number of bytes per audio sample, via getsampwidth()
framerate: the sampling frequency, via getframerate()
nframes: the number of audio frames (sample points), via getnframes()
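A minimal sketch of reading these parameters on their own (the filename 'test.wav' below is just a placeholder, not part of the original example):

import wave

with wave.open('test.wav', 'rb') as f:  # placeholder filename
    params = f.getparams()
    print(params)  # full namedtuple: nchannels, sampwidth, framerate, nframes, comptype, compname
    print(params.nchannels, params.sampwidth, params.framerate, params.nframes)  # fields can also be accessed by name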
4) Plot the time-domain waveform:
4.1) Compute the time axis: t = n/fs
4.2) Plot it
'''Plot the speech waveform'''
time = np.arange(0, nframes) * (1.0 / framerate)
time = np.reshape(time, [nframes, 1]).T
plt.plot(time[0, :nframes], waveData[0, :nframes], c="b")
plt.xlabel("time(seconds)")
plt.ylabel("amplitude")
plt.title("Original wave")
plt.show()
The time-domain waveform is shown below:
5) Plot the spectrogram:
5.1) Choose the frame length, typically 20~30 ms.
N = t*fs: the number of points per frame equals the frame duration times the sampling rate.
The frame overlap is usually 1/3~1/2 of the points per frame,
and the FFT size equals the points per frame (i.e., no zero padding).
For example, at a 16000 Hz sampling rate a 25 ms frame gives N = 400 points, and the nearest power of 2 is 512.
5.2) Plot the spectrogram with the specgram() method
# Plot the spectrogram
print("")
framelength = 0.025  # frame length, 20~30 ms
framesize = framelength * framerate  # points per frame, N = t*fs; usually 256 or 512, and it must equal NFFT
# NFFT is best chosen as a power of 2, so framesize should also be a power of 2
# find the power of 2 closest to the current framesize
nfftdict = {}
lists = [32, 64, 128, 256, 512, 1024]
for i in lists:
    nfftdict[i] = abs(framesize - i)
sortlist = sorted(nfftdict.items(), key=lambda x: x[1])  # sort by distance from the current framesize, ascending
framesize = int(sortlist[0][0])  # take the power of 2 closest to the current framesize as the new framesize
NFFT = framesize  # NFFT must equal the number of time-domain points framesize, i.e. an FFT without zero padding
overlapSize = 1.0/3 * framesize  # number of overlapping points, about 1/3~1/2 of the points per frame
overlapSize = int(round(overlapSize))  # round to an integer
spectrum, freqs, ts, fig = plt.specgram(waveData[0], NFFT=NFFT, Fs=framerate, window=np.hanning(M=framesize), noverlap=overlapSize, mode='default', scale_by_freq=True, sides='default')
plt.ylabel('Frequency')
plt.xlabel('Time(s)')
plt.title('Spectrogram')
plt.show()
Overview of the specgram() method (see the matplotlib documentation for details):
matplotlib.pyplot.specgram(x, NFFT=None, Fs=None, Fc=None, detrend=None, window=None, noverlap=None, cmap=None, xextent=None, pad_to=None, sides=None, scale_by_freq=None, mode=None, scale=None, vmin=None, vmax=None, *,
data=None, **kwargs)
# Parameters:
x : the signal, a 1-D array or sequence
NFFT : the number of FFT points; default 256. This should not be used for zero padding (use pad_to for that), and it is best chosen as a power of 2
Fs : the sampling rate; default 2
Fc : the center frequency of the signal x; default 0, used to offset the frequency extent of the plot
window : the window function; its length must equal NFFT (the frame length); default is a Hanning window
window_hanning(), window_none(), numpy.blackman(), numpy.hamming(), numpy.bartlett(), scipy.signal.get_window(), etc.
sides : {'default', 'onesided', 'twosided'}, one-sided or two-sided spectrum
Default gives the default behavior, which returns one-sided for real data and both for complex data.
'onesided' forces the return of a one-sided spectrum,
while 'twosided' forces two-sided.
pad_to : the number of points to which the data are zero-padded when performing the FFT; may differ from NFFT. Zero padding does not increase frequency resolution, but it reduces the picket-fence effect; default is None, i.e. equal to NFFT.
scale_by_freq : bool, optional; whether the values are scaled by the frequency resolution. The default is True, as in MATLAB.
Specifies whether the resulting density values should be scaled by the scaling frequency, which gives density in units of Hz^-1.
This allows for integration over the returned frequency values. The default is True for MATLAB compatibility.
mode : which kind of spectrum to use; default is the PSD (power spectrum) {'default', 'psd', 'magnitude', 'angle', 'phase'}
'complex' returns the complex-valued frequency spectrum.
'magnitude' returns the magnitude spectrum.
'angle' returns the phase spectrum without unwrapping.
'phase' returns the phase spectrum with unwrapping.
noverlap : the number of overlapping points between frames; default 128
scale : {'default', 'linear', 'dB'}, the units of the spectrogram values; default is 'dB'
xextent : None or (xmin, xmax), the x-axis range of the image
cmap : a colors.Colormap instance; if None, the default determined by rc is used
detrend : {'default', 'constant', 'mean', 'linear', 'none'}
The function applied to each gment before fft-ing, designed to remove the mean or linear trend.
Unlike in MATLAB, where the detrend parameter is a vector, in matplotlib is it a function.
The mlab module defines detrend_none(), detrend_mean(), and detrend_linear(), but you can u a custom function as well.
You can also use a string to choose one of the functions. 'default', 'constant', and 'mean' call detrend_mean(). 'linear' calls detrend_linear(). 'none' calls detrend_none().
# Returns:
spectrum : the spectrogram matrix
freqs : the frequency corresponding to each row of the spectrogram
ts : the time corresponding to each column of the spectrogram
fig : the image (an AxesImage instance)
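The returned arrays can also be redrawn manually, for example to control the dB conversion yourself. A minimal sketch, assuming waveData, NFFT, framerate and overlapSize have been computed as in the code above:

spectrum, freqs, ts, fig = plt.specgram(waveData[0], NFFT=NFFT, Fs=framerate, noverlap=overlapSize)
plt.figure()
# convert the power spectral density to dB; the small constant avoids log(0) on silent frames
plt.pcolormesh(ts, freqs, 10 * np.log10(spectrum + 1e-10), shading='auto')
plt.colorbar(label='Power (dB)')
plt.xlabel('Time(s)')
plt.ylabel('Frequency')
plt.title('Spectrogram (manual dB plot)')
plt.show()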
The result is shown below:
2. Complete code
import numpy as np
import matplotlib.pyplot as plt
import os
import wave
# Read in the audio.
path = "E:\SpeechWarehou\zmkm"
name = 'zmkm0.wav'
#我⾳频的路径为E:\SpeechWarehou\zmkm\zmkm0.wav
filename = os.path.join(path, name)
# Open the speech file.
f = wave.open(filename,'rb')
# Get the audio parameters
params = f.getparams()
nchannels, sampwidth, framerate,nframes = params[:4]
#---------------------------------------------------------------#
# Convert the byte-string data to short integers
print("reading ")
strData = f.readframes(nframes)
waveData = np.frombuffer(strData, dtype=np.short)  # np.fromstring is deprecated, use np.frombuffer
# Normalize
waveData = waveData * 1.0/max(abs(waveData))
# Rearrange the signal so that each row holds one channel, i.e. one row of samples per channel, nchannels rows in total
waveData = np.reshape(waveData, [nframes, nchannels]).T  # .T means transpose
f.close()  # close the file
print("file is clod!")
#----------------------------------------------------------------#
'''Plot the speech waveform'''
print("plotting ")
time = np.arange(0, nframes) * (1.0 / framerate)  # compute the time axis
time= np.reshape(time,[nframes,1]).T
plt.plot(time[0,:nframes],waveData[0,:nframes],c="b")
plt.xlabel("time")
plt.ylabel("amplitude")
plt.title("Original wave")
plt.show()
#--------------------------------------------------------------#
'''
Plot the spectrogram