【代码实践】使⽤Garch模型估计VaR title: “Value at Risk estimation using GARCH model”
author: “Ionas Kelepouris & Dimos Kelepouris”
date: “July 6, 2019”
output:
html_document:
fig_height: 7
fig_width: 10
highlight: tango
toc: yes
knitr::opts_chunk$t(echo = TRUE , warning = FALSE , error = FALSE , message = FALSE)
Introduction
本分析的⽬的:通过给定的时变波动率构建VaR。VaR被⼴泛地运⽤于计算市场风险。
我们的时间序列包括2668天的黄⾦期货收益。为了解释⽇度收益的部分波动率,我们
使⽤Box-Jenkins⽅法拟合ARIMA模型,并测试其假设。
其次,我们检测了收益的概率密度分布是否符合正态分布,以先择最合适的分布形式。
再次,我们使⽤Garch模型估算了残差的条件⽅差,并与delta-normal⽅法及进⾏对⽐。
最后,我们进⾏了1-step的VaR预测,并且回测数据以检验模型是否完善。
Data & Libraries
为了建模,我们收集了10年的黄⾦期货⽇度收益,共计2668个观测值。
# Load libraries
library(tidyver)
library(ggthemes)
library(forecast)
library(tries)
library(gridExtra)
library(rugarch)
# Load data
stocks = read.csv('E:\\BaiduNetdiskWorkspace\\Gold_CSS\\Model\\R.csv' , header = T)
stocks = stocks %>% lect(Date , R , Css_e)
rets = stocks$R
p1 = qplot(x = 1:length(rets) , y = rets , geom = 'line') + geom_line(color = 'darkblue') +
geom_hline(yintercept = mean(rets) , color = 'red' , size = 1) +
labs(x = '' , y = 'Daily Returns')
p2 = qplot(rets , geom = 'density') + coord_flip() + geom_vline(xintercept = mean(rets) , color = 'red' , size = 1) + geom_density(fill = 'lightblue' , alpha = 0.4) + labs(x = '')
grid.arrange(p1 , p2 , ncol = 2)
为了识别收益的平稳性,我们使⽤Augmented Dickey-Fuller检验。
其零假设为:不平稳。
Small P-value (<0.01) 表明可以拒绝零假设。我们的收益率是平稳的。
Box-Jenkins Methodology
Box-Jenkins approach可应⽤ARIMA模型以发现时间序列最好的拟合,该⽅法主要有三个
阶段:
a) identification, b) estimation, c) diagnostic checking.
Identification
前期我们已经检验时间序列的平稳性。基于Autocorelation Function (ACF) and Partial Autocorrelation Function (PACF),以选择合适的参数p,d,q。Akaike Information Criterion (AICc)同样可以⽤于选取参数。
$AIC = ln\frac{\sum\hat{u}^2}{T} + \frac{2k}{T}$where = Sum of Squared Residuals = number of obrvations
= number of model parameters (p + q + 1)
AIC可以解决过度拟合以及拟合不⾜的风险。AIC值最低的模型参数将被选择。
model.arima = auto.arima(rets , der = c(3 , 0 ,3) , stationary = TRUE , trace = T , ic = 'aicc')
根据上述结果,选择ARIMA(0,0,2)
Estimation
可以估计所选参数的coefficients,我们使⽤Maximum Likelihood。
model.arima
Therefore the process can be described as:
$r_{t} = -0.1238*\epsilon_{t-1} - 0.1611*\epsilon_{t-2} + \epsilon_{t}$ where $\epsilon_{t}$ is White Noi
Diagnostics Checking
该过程包含obrving residual plot以及ACF & PACF diagra,以及Ljung-Box的检验
结果。如果ACF & PACF表明模型残差不存在显著滞后项,说明模型合适。
model.arima$residuals %>% pe = 'hist' , lag.max = 14)
ACF 和 PACF结果相似,其相关性接近于0。右侧残差接近正态分布 N(0 , ).
为了进⼀步验证残差不相关,we perform Ljung-Box test.
$$Q_{LB} = T(T+2)\sum_{s=1}^{m} \frac{\hat\rho_{s}^{2}}{T-s}$$
The statistic follows asymmetrically a distribution with m-p-q degrees of freedom. The null hypothesis refers s = model.arima$residuals
我们不能拒绝零假设,因此残差的过程就像⽩噪声⼀样,所以没有可能被建模的模式的迹象。
GARCH Implementation
尽管ACF & PACF of residuals都表明没有显著的滞后性,但时间序列图仍然表明存在波动聚集。ARIMA对数据线性建模,⽆法反应新信息。为了对波动率建模,我们使⽤ARCH模型。
ARCH是⼀个时间序列数据的统计模型,它描述了当前误差项的⽅差作为前⼀时间段误差项的实际⼤⼩的函数。
We assume that the time ries of interest, , is decompod into two parts, the predictable and unpredictable component, , where is the information t at time and and is the unpredictable part, or innovation process.
∑u ^2T k σ2Q LB X 2H :0ρ=1ρ=2⋯=ρ=m 0
r t r =t E (r ∣I )+t t −1ϵt I t −1t −1E (r ∣I )=t t −10.0437∗r −t −10.0542∗r t −2ϵt
The unpredictable component, can be expresd as a GARCH process in the following form:
where is a quence of independently and identically distributed random variables with zero mean and variance equal to 1.The conditional variance of is , a time-varying function of the information t at time .
Next step is to define the the cond part of the error term decomposition which is the conditional variance, . For such a task, we can u a GARCH(1 , 1) model, expresd as:
当残差平⽅存在⾃相关时,可以使⽤Garch模型。ACF and PACF plots clearly indicate significant correlation.
s^2 , main = 'Squared Residuals')
另⼀种检测残差平⽅的异⽅差性的⽅法测试参数的显著性。
and parameters.
# Model specification
model.spec = del = list(model = 'sGARCH' , garchOrder = c(1 , 1)) ,
model.fit = ugarchfit(spec = model.spec , data = ar.res , solver = 'solnp')
options(scipen = 999)
model.fit@fit$matcoef
Both and are significantly different from zero,
因此可以假设残差存在波动聚集现象。
With successive replacement of the term, the GARCH equation can be written as:
When we replace with the coefficient estimates given by optimization we get the following equation:
Given that , as lag increas the effect of the squared residual decreas.
Value at Risk
VaR⽤于测度当下位置的下跌风险。
A VaR statistic has three components: a) time period, b) confidence level, c) loss ammount (or loss p
ercentage). For 95%confidence level, we can say that the worst daily loss will not exceed VaR estimation. If we u historical data, we can estimate VaR by taking the 5% quantile value. For our data this estimation is:
quantile(rets , 0.05)
qplot(rets , geom = 'histogram') + geom_histogram(fill = 'lightblue' , bins = 30) +
geom_histogram(aes(rets[rets < quantile(rets , 0.05)]) , fill = 'red' , bins = 30) +
labs(x = 'Daily Returns')
ϵ=t z ∗t σt
z t ϵt σt t −1σt σ=t 2ω+a ∗1ϵ+t −12β∗1σt −1
2a 1β1a 1β1σt −12σ=t 2+1−β1ωa ∗1β∗i =1∑∞1i −1ϵt −i
2σ=t 20.000087+0.108∗(ϵ+t −120.825∗ϵ+t −220.680∗ϵ+t −320.561∗ϵ+t −42…)
0<β<11
Red bars refer to returns lower than 5% quantile.
Distributional Properties
为了估算VaR,我们需要适当定义假设分布的quantile。
对于正态分布⽽⾔,5%对应的quantile为-1.645。实证结果表明正态分布的假设往往
产⽣weak results。Jarque-Bera test can test the hypothesis that stock returns follow a normal distribution.
where is skewness and is kurtosis. A normal distributed sample would return a JB score of zero. The low p-value indicates stock returns are not normally distributed.
st(rets)
p2_1 = qplot(rets , geom = 'density') + geom_density(fill = 'blue' , alpha = 0.4) +
geom_density(aes(rnorm(200000 , 0 , sd(rets))) , fill = 'red' , alpha = 0.25) +
labs(x = '')
p2_2 = qplot(rets , geom = 'density') + geom_density(fill = 'blue' , alpha = 0.4) +
geom_density(aes(rnorm(200000 , 0 , sd(rets))) , fill = 'red' , alpha = 0.25) +
coord_cartesian(xlim = c(-0.07 , -0.02) , ylim = c(0 , 10)) +
geom_vline(xintercept = c(qnorm(p = c(0.01 , 0.05) , mean = mean(rets) , sd = sd(rets))) ,
color = c('darkgreen' , 'green') , size = 1) + labs(x = 'Daily Returns')
grid.arrange(p2_1 , p2_2 , ncol = 1)
如图所⽰,Density plots are shown for 收益(蓝)以及正态分布(红)数据。
Vertical lines表明了5%和1%对应的quantile。
正态分布都会⾼估风险。
Student’s t-distribution
为了更好对收益建模,我们使⽤其他分布。
The t-distribution is symmetric and bell-shaped, like the normal distribution, but has heavier tails, meaning that it is more prone to producing values that fall far from its mean. We u the fitdist function from rugarch package to get the fitting parameters of t-distribution.
fitdist(distribution = 'std' , x = rets)$pars
cat("For a = 0.05 the quantile value of normal distribution is: " ,
qnorm(p = 0.05) , "\n" ,
"For a = 0.05 the quantile value of t-distribution is: " ,
qdist(distribution = 'std' , shape = 3.7545967917 , p = 0.05) , "\n" , "\n" ,
'For a = 0.01 the quantile value of normal distribution is: ' ,
qnorm(p = 0.01) , "\n" ,
"For a = 0.01 the quantile value of t-distribution is: " ,
qdist(distribution = 'std' , shape = 3.7545967917 , p = 0.01) , p = "")
95%置信区间下,正态分布⾼估风险;
99%置信区间下,正态分布未能捕捉outliers,低估风险。
Garch VaR vs Delta-normal approach
Delta-normal approach 假设收益正态分布。
JB =∗6
n −k +1(S +2∗41
(C −3))2S C V aR (a )=μ+σ∗N (a )
−1
where is the mean stock return, is the standard deviation of returns, is the lected confidence level and is the inver PDF function, generating the corresponding quantile of a normal distribution given .
The results of such a simple model is often disapointing and are rarely ud in practice today. The as
sumption of normality and constant daily variance is usually wrong and that is the ca for our data as well.
Previously we obrved that returns exhibit time-varying volatility. Hence for the estimation of VaR we u the conditional variance given by GARCH(1,1) model. For the underlined ast’s distribution properties we u the student’s t-
distribution. For this method Value at Risk is expresd as:
where is the conditional standard deviation given the information at and is the inver PDF function of t-distribution.
qplot(y = rets , x = 1:2668 , geom = 'point') + geom_point(colour = 'lightgrey' , size = 2) +
geom_line(aes(y = model.fit@fit$sigma*(-1.485151) , x = 1:2668) , colour = 'red') +
geom_hline(yintercept = sd(rets)*qnorm(0.05) , colour = 'darkgreen' , size = 1.2) + theme_light() +
labs(x = '' , y = 'Daily Returns' , title = 'Value at Risk Comparison')
Red line denotes VaR produced by GARCH model and green line refers to delta-normal VaR.
VaR forecasting
The ugarchroll method 可以进⾏滚动预测,it returns分布预测参数以计算forecasted density的必要测度。我们将最后500个观测设为测试集,对条件标准差进⾏1-step的预测。
refit.window = 'moving')
# Test t 500 obrvations
VaR95_td = mean(rets) + ll@forecast$density[,'Sigma']*qdist(distribution='std', shape=3.7545967917, p=0.05)
Backtesting
设 为收益低于VaR估计的个数。
VaR estimate, where is 1 if and 0 if . Hence, N is the obrved number of exceptions in the sample. As argued in Kupiec (1995), the failure number follows a binomial distribution, B(T, p).
p = c()
p[1] = pbinom(q = 0 , size = 500 , prob = 0.05)
for(i in 1:50){
p[i] = (pbinom(q = (i-1) , size = 500 , prob = 0.05) - pbinom(q = (i-2) , size = 500 , prob = 0.05))
}
qplot(y = p , x = 1:50 , geom = 'line') + scale_x_continuous(breaks = q(0 , 50 , 2)) +
annotate('gment' , x = c(16 , 35) , xend = c(16 , 35) , y = c(0 , 0) , yend = p[c(16 , 35)] , color = 'red' ,
size = 1) + labs(y = 'Probability' , x = 'Number of Exceptions') + theme_light()该图代表了异常的分布概率。The expected number为25个,两条红线表⽰95%的置信区间。 the lower being 16 and the upper 35.我们期待16-35之间的数字,to state that GARCH model as successfully preditive.
qplot(y = VaR95_td , x = 1:500 , geom = 'line') +
geom_point(aes(x = 1:500 , y = rets[2169:2668] , color = as.factor(rets[2169:2668] < VaR95_td)) , size = 2) + scale_color_manual(values = c('gray' , 'red ')) +
labs(y = 'Daily Returns' , x = 'Test t Obrvation') + theme_light() +
theme(legend.position = 'none')
Black line为由Garch模型所预测VaR,red points refer to returns lower than VaR.
Final step is to count the number of exceptions and compare it with the one generated with delta-normal approach.μσa N −1a V aR (a )=μ+∗σ^t ∣t −1F (a )
−1σ^t ∣t −1t −1F −1σ^t +1∣t N =I ∑t =1T
t I t y <t +1V aR t +1∣t y ≥t +1V aR t +1∣t