Hammersley-Clifford theorem for Markov random fields
1.Markov random fields and Gibbs distributions
astronautsLet {X t :t ∈T }be a finite collection of random variables—a stochastic process—with X t taking values in a finite t S t .For simplicity of notation,suppo the index t T is {1,2,...,n }and S t ={0,1,...,m t }.The joint distribution of the variables is
Q {x }=P {X t =x t for t ∈T }where x =(x 1,...,x n ),
with 0≤x t ≤m t .More formally,the vector X =(X 1,...,X n )takes values in X = t ∈T S t ,the t of all n -tuples x =(x 1,...,x n )with x t ∈S t for each t .
Suppo T is the t of nodes of a graph.Let N t denote the t of nodes (the neighbors of t )for which (t ,s )is an edge of the graph.
到期英语<1>Definition.The process is said to be a Markov random field if
(i)Q {x }>0for every x in X
(ii)for each t and x ,
P {X t =x t |X s =x s for s =t }=P {X t =x t |X s =x s for s ∈N s }.Property (ii)is equivalent to the requirement:
globalization翻译(ii) the conditional probability P {X t =x t |X s =x s for s ∈N s }depends only on x s for s ∈{t }∪N t .
A subt A of T is said to be complete if each pair of vertices in A de fines an edge of the graph.Write C for the collection of all complete subts.<2>Definition.The probability distribution Q is called a Gibbs distribution for the graph if it can be written in the form Q {x }= A ∈C V A
(x ),where each V A is a positive function that depends on x only through the coordinates {x t :t ∈A }.
The Hammersley-Clifford Theorem asrts that the process {X t :t ∈T }is a Markov random field if and only if the corresponding Q is a Gibbs distribution.
It is mostly a matter of bookkeeping to show that every Gibbs distribution de fines a Markov random field.
<3>Example.With only a slight abu of notation,we may write V A (x )as V A (x i 1,...,x i k )if A ={i 1,...,i k },ignoring the arguments that do not affect V A .交替传译
commieSuppo N 1={2,3}.Consider the variables x j that actually appear in the conditional probability
P {X 1=x 1|X 2=x 2,X 3=x 3,...,X n =x n }=P {X 1=x 1,X 2=x 2,X 3=x 3,...,X n =x n }2233n n =
A ∈C V A (x 1,x 2,...,x n ) w
A ∈C V A 2n For example,which terms actually involve the value x 4?By assumption,V A depends on x 4only if 4∈A .For such an A ,we cannot also have 1∈A ,becau then we would have (1,4)as an edge of the graph,contradicting the assumption that 4/∈N 1.For concreteness,suppo A ={4,7,19}.Then V A (x 4,x 7,x 19)appears once as a factor in the numerator and once as a factor in each summand in the denominator.It cancels from the ratio.
The only factors that do not cancel are tho for which 1∈A .By de finition of a complete subt,tho factors can depend only on x j values for
口才训练视频j ∈{1}∪N 1.
It is slightly harder to show that every Markov random field corresponds to some Gibbs distribution.The simplest proof that I know depends on a general reprentation of a function as a sum of simpler functions.
<4>Lemma.Let g be any real-valued function on X .For each subt A ⊆S define g A (x )=g (y )where
y i = x i
if i ∈A 0if i ∈A c
and A (x )=
B ⊆A (−1)#(A \B )g B (x ).
ebbe
Then
(i)the function A depends on x only through tho coordinates x j with j ∈A (in particular, ∅is a constant)
(ii)for A =∅,if x i =0for at least one i in A then A (x )=0(iii)g (x )= A ⊆T A (x )
Proof .Asrtion (i)is trivial:every g B appearing in the de finition of A (x )does not depend on the variables {x j :j /∈A }.
willson
For (ii),divide the subts of A into two subcollections:tho that contain i and tho that do not contain i .For each B of the first type there is a unique t,˜B ={i }∪B ,of the cond type.Note that g
B (x )=g ˜B (x )becau x i =0.The contributions to A (x )from the ts B ,˜B cancel,becau one of the two numbers #A \B and #A \˜B
is odd and the other is even.For (iii),note that the coef ficient of g B in the double sum <5> A ⊆T A (x )= A ⊆T B ⊆A
(−1)#(A \B )g B (x )equals A {B ⊆A ⊆T }(−1)#(A \B )= E ⊆B c (−1)#E .
For B equal to T ,the last sum reduces to (−1)0becau ∅is the only subt of T c .For B c =∅,half of the subts E have #E even and the other half have #E odd,which reduces the coef ficient to 0.Thus the double sum <5>simpli fies to g T (x )=g (x ).
Applying the Lemma with g (y )=log Q {y }gives <6>Q {x }=exp A ⊆I
A (x ) To show that the expression on the right-hand side is a Gibbs distribution,we have only to prove that A (x )≡0when A is not a complete subt of the graph.
<7>Theorem.For a Markov random field,the term A in <6>is identically
tenderness>suspender
zero if A is not a complete subt of T .
Proof .For simplicity of notation,suppo 1,2∈A but nodes 1and 2are not connected by an edge of the graph,that is,they are not neighbors.Consider the contributions to A (x )from pairs B ,˜B ,where 1/∈B and ˜B =B ∪{1}.The numbers #A \B and #A \˜B differ by 1;the pair contributes ± g ˜B (x )−g B (x ) to the sum.De fine y i = x i
if i ∈B 0if i ∈B c
Then
g˜B(x)−g B(x)=log P{X1=x1,X2=y2,...,X n=y n} 122n n
=log P{X1=x1|X2=y2,...,X n=y n} 122n n
A common factor of P{X2=y2,...,X n=y n}has cancelled from numerator
and denominator.
The Markov property ensures that the conditional probabilities in the last ratio do not depend on the value y2.The ratio is unchanged if we replace y2 by0.The same argument works for every B,˜B pair.Thus A(x)is unchanged if we put x2equal to0.From Lemma<4>(ii),deduce that A(x)=0for all x,as asrted.
References
Griffeath,D.(1976),Introduction to Markov Random Fields,Springer.Chapter 12of Denumerable Markov Chains by Kemeny,Knapp,and Snell(2nd
edition).