Natural Language Processing: Sequence Labeling (BiLSTM-CRF)

Updated: 2023-06-17 09:25:09

Reference:
Tagging Scheme
IOBES: Inside, Outside, Beginning, End, Single
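As a minimal sketch of what the IOBES scheme changes in practice, the function below converts tags from the plain BIO scheme (assumed here as the starting point) into IOBES: the last token of a multi-token entity becomes `E-`, and a single-token entity becomes `S-`. The function name `bio_to_iobes` and the example sentence are hypothetical.

```python
def bio_to_iobes(tags):
    """Convert a list of BIO tags (e.g. "B-Person", "I-Person", "O") to IOBES."""
    out = []
    for i, tag in enumerate(tags):
        if tag == "O":
            out.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        # The entity continues only if the next tag is an I- tag of the same label.
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = next_tag == f"I-{label}"
        if prefix == "B":
            out.append(f"B-{label}" if continues else f"S-{label}")
        else:  # prefix == "I"
            out.append(f"I-{label}" if continues else f"E-{label}")
    return out


example = ["B-Person", "I-Person", "O", "B-Organization", "O"]
converted = bio_to_iobes(example)
print(converted)
```

The extra `E-` and `S-` prefixes give the model explicit information about entity boundaries, at the cost of a larger tag set.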
Bidirectional LSTM Networks
By using a bidirectional LSTM, we can efficiently make use of past features (via forward states) and future features (via backward states) for a specific time frame.
We train the bidirectional LSTM network using backpropagation through time (BPTT), with special treatment at the beginning and the end of the data points, such as resetting the hidden states to 0 at the beginning of each sentence. At time step $t$ for the NER task:
the input layer represents features, which could be a one-hot encoding of the word feature, dense vector features, or sparse features.
the output layer represents a probability distribution (produced by softmax) over labels. It has the same dimensionality as the number of labels. The label with the maximum probability is used as the output of time step $t$.
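The output layer described above can be sketched in a few lines of NumPy. This is only an illustration: the label set and the score values are made up, and the scores stand in for whatever the BiLSTM would produce at each time step.

```python
import numpy as np


def softmax(scores):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


# Hypothetical label set and per-time-step scores (one row per word).
labels = ["B-Person", "I-Person", "O"]
scores = np.array([
    [1.5, 0.9, 0.1],
    [0.2, 0.4, 0.1],
    [0.1, 0.3, 0.8],
])

probs = softmax(scores)  # one probability distribution over labels per time step
predicted = [labels[i] for i in probs.argmax(axis=1)]  # independent decision per step
print(predicted)
```

Each time step is decided on its own here, which is exactly the independence that a CRF layer is meant to remove.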
Why use CRF networks?
Although using the hidden states $h_t$ as features to make independent tagging decisions for each output $y_t$ is successful in POS tagging, independent classification decisions are limiting when there are strong dependencies across output labels. NER is one such task, since the “grammar” that characterizes interpretable sequences of tags imposes several hard constraints that would be impossible to model with independence assumptions.
There are two ways to take neighboring tags into account.
The first way is to predict a distribution of tags at each time step and then use beam-like decoding to find optimal tag sequences, as in the maximum entropy classifier (Ratnaparkhi, 1996) and maximum entropy Markov models (McCallum et al., 2000).
The second way is to focus on the sentence level instead of individual positions, which leads to Conditional Random Field (CRF) models, in which the inputs and outputs are directly connected, as opposed to LSTM networks, where memory cells/recurrent components are employed.
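As the rightmost figure below shows, ignoring the relationships between the tags of words at different positions produces incorrect tags.
For sequence labeling, the input word sequence is the observation sequence and the tag sequence is the hidden state sequence. Models that assume the states are independent cannot handle hard constraints between states; MEMM and CRF are well suited to this kind of problem.
MEMM assumes that the current state depends only on the previous state (the Markov assumption) and normalizes locally at each step. A CRF instead scores and normalizes whole tag sequences, so global state information is taken into account when predicting each state.
HMM (Markov assumption, observation independence assumption) -> MEMM (Markov assumption) -> CRF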
CRF Networks
BiLSTM-CRF networks
Let $x = (x_1, x_2, \cdots, x_n)$ and $y = (y_1, y_2, \cdots, y_n)$ represent an input sequence and a sequence of predicted tags respectively, where $n$ is the length of the input sentence.
Emission score
We consider $P \in \mathbb{R}^{n \times k}$ to be the matrix of scores output by the BiLSTM network, where $k$ is the number of distinct tags and the element $P[i, j]$ corresponds to the score of the $j$-th tag for the $i$-th word in a sentence.
In BiLSTM-CRF networks, emission scores come from the BiLSTM layer. For instance, according to the figure above, the score of $w_0$ labeled as B-Person is 1.5.
For a sentence of length $n$, the emission matrix $P$ contains $n$ hidden states of dimension $k$.
Transition score
CRF introduces a transition matrix $A \in \mathbb{R}^{(k+2) \times (k+2)}$ that is position independent; its element $A[i, j]$ measures the score of moving from the $i$-th tag to the $j$-th tag.
To make the transition score matrix more robust, we add the START and END tags of a sentence to the set of possible tags. $A$ is therefore a square matrix of size $k+2$.
Here is an example of the transition score matrix, including the extra START and END labels.
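| Transition Matrix | START | B-Person | I-Person | B-Organization | I-Organization | O | END |
| --- | --- | --- | --- | --- | --- | --- | --- |
| START | 0 | 0.8 | 0.007 | 0.7 | 0.0008 | 0.9 | 0.08 |
| B-Person | 0 | 0.6 | 0.9 | 0.2 | 0.0006 | 0.6 | 0.009 |
| I-Person | -1 | 0.5 | 0.53 | 0.55 | 0.0003 | 0.85 | 0.008 |
| B-Organization | 0.9 | 0.5 | 0.0003 | 0.25 | 0.8 | 0.77 | 0.006 |
| I-Organization | -0.9 | 0.45 | 0.007 | 0.7 | 0.65 | 0.76 | 0.2 |
| O | 0 | 0.65 | 0.0007 | 0.7 | 0.0008 | 0.9 | 0.08 |
| END | | | | | | | |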
As the table above shows, the transition matrix has learned some useful constraints.
The label of the first word in a sentence should start with “B-” or “O”, not “I-”, and so on.
Where does the transition matrix come from?
The transition matrix is a parameter of the CRF layer. It is initialized with random values and gradually becomes more reasonable as the number of training iterations increases.
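The constraints visible in the example transition table can be checked directly. The sketch below hard-codes the six rows of that table whose values are given (the END row is left out because its values are not available) and assumes rows are the "from" tag and columns the "to" tag. The helper name `transition_score` is hypothetical.

```python
import numpy as np

tags = ["START", "B-Person", "I-Person", "B-Organization", "I-Organization", "O", "END"]
from_tags = tags[:-1]  # the END row of the example table has no values, so it is omitted

# Rows: previous tag (from_tags). Columns: next tag (tags). Values as read from the table.
A = np.array([
    [0.0, 0.8, 0.007, 0.7, 0.0008, 0.9, 0.08],     # START
    [0.0, 0.6, 0.9, 0.2, 0.0006, 0.6, 0.009],      # B-Person
    [-1.0, 0.5, 0.53, 0.55, 0.0003, 0.85, 0.008],  # I-Person
    [0.9, 0.5, 0.0003, 0.25, 0.8, 0.77, 0.006],    # B-Organization
    [-0.9, 0.45, 0.007, 0.7, 0.65, 0.76, 0.2],     # I-Organization
    [0.0, 0.65, 0.0007, 0.7, 0.0008, 0.9, 0.08],   # O
])


def transition_score(prev_tag, next_tag):
    """Score of moving from prev_tag to next_tag."""
    return float(A[from_tags.index(prev_tag), tags.index(next_tag)])


# Rank the real tags by how well they can follow START (i.e. label the first word).
first_word_ranking = sorted(
    tags[1:-1], key=lambda t: transition_score("START", t), reverse=True
)
print(first_word_ranking)
print(transition_score("B-Person", "I-Person"), transition_score("B-Person", "I-Organization"))
```

In this table the two "I-" tags rank last as labels for the first word, and B-Person is followed far more readily by I-Person than by I-Organization.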
Decoding
For a sentence with 5 words, $x_1, x_2, x_3, x_4, x_5$, the real tag sequence $y$ is:
"START B-Person I-Person O B-Organization O END"
Here, we add two extra words that denote the start and the end of the sentence: $x_0, x_6$. A linear-chain CRF defines a global score $s(x, y)$ that consists of 2 parts:
$$s(x, y) = s_e(x, y) + s_t(y) = \sum_{i=1}^{n} P[x_i, y_i] + \sum_{i=0}^{n} A[y_i, y_{i+1}]$$
where $y_0 = \text{START}$ and $y_{n+1} = \text{END}$.
Emission Score
$$s_e(x, y) = P[x_0, \text{START}] + P[x_1, \text{B-Person}] + \cdots + P[x_6, \text{END}]$$
where $P_0$ and $P_6$ are simply set to zero, and $P_1, \cdots, P_5$ come from the preceding BiLSTM.
Transition Score
$$s_t(y) = A[\text{START}, \text{B-Person}] + A[\text{B-Person}, \text{I-Person}] + \cdots + A[\text{O}, \text{END}]$$
These scores are the parameters of the CRF layer.
Illustration of the scoring of a sentence with a linear-chain CRF:
The path PER-O-LOC has a score of:
1 + 10 + 4 + 3 + 2 + 11 + 0 = 31
The path PER-PER-LOC has a score of:
1 + 10 + 2 + 4 - 2 + 11 + 0 = 26
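The two path scores above can be reproduced with a small scoring function. This is a sketch under an assumption: since the illustration itself is not available, each number in the two sums is read, in order, as a begin score, an emission score, a transition score, and so on, ending with an end score. Only the entries that appear in the two sums are filled in, and the names `begin`, `end`, `emission`, `transition` and `path_score` are hypothetical.

```python
# Assumed reading of the two sums: begin + emission + transition + ... + end.
begin = {"PER": 1}   # score of starting the sentence with a tag
end = {"LOC": 0}     # score of ending the sentence with a tag
emission = [         # one dict of tag scores per word
    {"PER": 10},
    {"O": 3, "PER": 4},
    {"LOC": 11},
]
transition = {       # (previous tag, next tag) -> score
    ("PER", "O"): 4,
    ("O", "LOC"): 2,
    ("PER", "PER"): 2,
    ("PER", "LOC"): -2,
}


def path_score(path):
    """Sum begin, emission, transition and end scores along one tag path."""
    score = begin[path[0]]
    for i, tag in enumerate(path):
        score += emission[i][tag]
        if i + 1 < len(path):
            score += transition[(tag, path[i + 1])]
    return score + end[path[-1]]


score_per_o_loc = path_score(["PER", "O", "LOC"])
score_per_per_loc = path_score(["PER", "PER", "LOC"])
print(score_per_o_loc, score_per_per_loc)
```

Under this reading, PER-O-LOC wins even though the second word has a higher emission score for PER than for O, because the transition scores favor it.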
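Now that we understand the scoring function of the CRF, we need to do 2 things:
Find the sequence of tags with the best score.
Compute a probability distribution over all the sequences of tags (total score).
The simplest way to measure the total score is to enumerate all possible paths and sum their scores. However, this is very inefficient: there are $k^n$ possible paths, giving $O(k^n)$ complexity. The recurrent nature of our formula makes it a perfect candidate for dynamic programming.
Finding the best path with the Viterbi algorithm
Let’s suppose that $c_t(y_t, x)$ is the solution for time steps $t, \cdots, n$ for sequences that start with $y_t$:
$$\begin{aligned}
c_t(y_t, x) &= \max_{y_{t+1}, \cdots, y_n} s(y_t, \cdots, y_n, x) \\
&= \max_{y_{t+1}, \cdots, y_n} P[x_t, y_t] + A[y_t, y_{t+1}] + s(y_{t+1}, \cdots, y_n, x) \\
&= \max_{y_{t+1}} P[x_t, y_t] + A[y_t, y_{t+1}] + c_{t+1}(y_{t+1}, x)
\end{aligned}$$
The best score and path are:
$$s(x, y^*) = c_0(y_0 = \text{START}, x), \quad y^* = \arg\max_{\tilde{y} \in Y} s(x, \tilde{y})$$
As we perform $n$ steps, each costing $O(k^2)$, the final cost is $O(nk^2)$, much less than $O(k^n)$. This is the same idea as the Viterbi algorithm for CRF decoding (given the model and the observations, find the most likely state sequence).
Computing the normalization factor with dynamic programming
The training objective of the model is to maximize the probability of the target tag sequence. Softmax is generally used to convert scores into probabilities:
$$p(y \mid x) = \frac{1}{Z} \exp(s(x, y)), \quad Z = \sum_{\tilde{y}} \exp(s(x, \tilde{y}))$$
The softmax denominator $Z$ requires the sum over the scores of all possible tag sequences and is also called the normalization factor. The following describes how to compute it with forward-recursive dynamic programming.
Suppose that the total score of all possible tag sequences that end with tag $y_t$ at time $t$ is known to be $Z_t(y_t)$, that is,
$$\begin{aligned}
Z_t(y_t) &= \sum_{y_1, \cdots, y_{t-1}} \exp(s(y_1, \cdots, y_t, x)) \\
&= \sum_{y_{t-1}} \exp(P[x_t, y_t] + A[y_{t-1}, y_t]) \sum_{y_1, \cdots, y_{t-2}} \exp(s(y_1, \cdots, y_{t-1}, x)) \\
&= \sum_{y_{t-1}} \exp(P[x_t, y_t] + A[y_{t-1}, y_t]) \cdot Z_{t-1}(y_{t-1})
\end{aligned}$$
For a sequence with $n$ time steps in total, the total score of all possible tag sequences is
$$Z = \sum_{y_n} Z_n(y_n)$$
Computing the log normalization factor with dynamic programming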
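As a minimal NumPy sketch of the two dynamic programs, the code below implements the backward Viterbi recursion for the best path and the forward recursion for the normalization factor. The forward pass works in log space: taking logs of the recursion gives $\log Z_t(y_t) = P[x_t, y_t] + \log \sum_{y_{t-1}} \exp(A[y_{t-1}, y_t] + \log Z_{t-1}(y_{t-1}))$, which avoids overflow in the exponentials. All function names are hypothetical, the inputs are random, and `start` and `end` are assumed to hold the START row and the END column of the transition matrix. Both results are compared with brute-force enumeration on a tiny example.

```python
import itertools
import numpy as np


def logsumexp0(v):
    """Numerically stable log(sum(exp(v))) along axis 0."""
    m = v.max(axis=0)
    return m + np.log(np.exp(v - m).sum(axis=0))


def path_score(P, A, start, end, path):
    """s(x, y): emission scores plus transition scores, including START and END."""
    emission = sum(P[t, y] for t, y in enumerate(path))
    inner = sum(A[a, b] for a, b in zip(path, path[1:]))
    return emission + start[path[0]] + inner + end[path[-1]]


def viterbi(P, A, start, end):
    """Best score and tag path, using c_t(y_t) = P[t, y_t] + max(A[y_t, :] + c_{t+1})."""
    n, k = P.shape
    c = P[-1] + end                          # scores of the last step, incl. move to END
    back = np.zeros((n - 1, k), dtype=int)   # back[t, y_t] = best tag at step t + 1
    for t in range(n - 2, -1, -1):
        cand = A + c[None, :]                # cand[y_t, y_next]
        back[t] = cand.argmax(axis=1)
        c = P[t] + cand.max(axis=1)
    total = start + c                        # add the move from START to the first tag
    y = int(total.argmax())
    best = float(total[y])
    path = [y]
    for t in range(n - 1):
        y = int(back[t, y])
        path.append(y)
    return best, path


def log_partition(P, A, start, end):
    """log Z via the forward recursion, kept in log space."""
    alpha = start + P[0]                     # log Z_1(y_1)
    for t in range(1, P.shape[0]):
        # alpha[:, None] + A has entries log Z_{t-1}(y_prev) + A[y_prev, y_t]
        alpha = P[t] + logsumexp0(alpha[:, None] + A)
    return float(logsumexp0(alpha + end))


# Tiny random example (n = 4 words, k = 3 tags) checked against enumeration.
rng = np.random.default_rng(0)
n, k = 4, 3
P, A = rng.normal(size=(n, k)), rng.normal(size=(k, k))
start, end = rng.normal(size=k), rng.normal(size=k)

all_paths = list(itertools.product(range(k), repeat=n))
all_scores = np.array([path_score(P, A, start, end, p) for p in all_paths])

best_score, best_path = viterbi(P, A, start, end)
log_z = log_partition(P, A, start, end)
print(best_path, best_score, log_z)
```

The log-probability of any tag sequence is then its path score minus `log_z`, which is the quantity maximized during training.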
