
When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10-Ks


THE JOURNAL OF FINANCE • VOL. LXVI, NO. 1 • FEBRUARY 2011


TIM LOUGHRAN and BILL MCDONALD∗

ABSTRACT

Previous research uses negative word counts to measure the tone of a text. We show that word lists developed for other disciplines misclassify common words in financial text. In a large sample of 10-Ks during 1994 to 2008, almost three-fourths of the words identified as negative by the widely used Harvard Dictionary are words typically not considered negative in financial contexts. We develop an alternative negative word list, along with five other word lists, that better reflect tone in financial text. We link the word lists to 10-K filing returns, trading volume, return volatility, fraud, material weakness, and unexpected earnings.

A GROWING BODY of finance and accounting research uses textual analysis to examine the tone and sentiment of corporate 10-K reports, newspaper articles, press releases, and investor message boards. Examples are Antweiler and Frank (2004), Tetlock (2007), Engelberg (2008), Li (2008), and Tetlock, Saar-Tsechansky, and Macskassy (2008). The results to date indicate that negative word classifications can be effective in measuring tone, as reflected by significant correlations with other financial variables.

A commonly used source for word classifications is the Harvard Psychosociological Dictionary, specifically, the Harvard-IV-4 TagNeg (H4N) file. One positive feature of this list for research is that its composition is beyond the control of the researcher. That is, the researcher cannot pick and choose which words have negative implications. Yet English words have many meanings, and a word categorization scheme derived for one discipline might not translate effectively into a discipline with its own dialect.

In a survey of textual analysis, Berelson (1952) notes that: “Content analysis stands or falls by its categories. Particular studies have been productive to the extent that the categories were clearly formulated and well adapted to the problem” (p. 92). In some contexts, the H4N list of negative words may effectively capture the tone of a text. The question we address in this paper is whether a word list developed for psychology and sociology translates well into the realm of business.

∗ Loughran and McDonald are with University of Notre Dame. We are indebted to Paul Tetlock for comments on a previous draft. We also thank Robert Battalio, Peter Easton, James Fuehrmeyer, Paul Gao, Campbell Harvey (Editor), Nicholas Hirschey, Jennifer Marietta-Westberg, Paul Schultz, an anonymous referee, an anonymous associate editor, and seminar participants at the 2009 FMA meeting, University of Notre Dame, and York University for helpful comments. We thank Hang Li for research assistance.


While measuring document tone using any word classification scheme is inherently imprecise, we provide evidence based on 50,115 firm-year 10-Ks between 1994 and 2008 that the H4N list substantially misclassifies words when gauging tone in financial applications. Misclassified words that are not likely correlated with the variables under consideration (for example, taxes or liabilities) simply add noise to the measurement of tone and thus attenuate the estimated regression coefficients. However, we also find evidence that some high frequency misclassifications in the Harvard list, such as mine or cancer, could introduce type I errors into the analysis to the extent that they proxy for industry segments or firm attributes.

We make several contributions to the literature on textual analysis. Most notably, we find that almost three-fourths (73.8%) of the negative word counts according to the Harvard list are attributable to words that are typically not negative in a financial context. Words such as tax, cost, capital, board, liability, foreign, and vice are on the Harvard list. These words also appear with great frequency in the vast majority of 10-Ks, yet often do no more than name a board of directors or a company’s vice-presidents. Other words on the Harvard list, such as mine, cancer, crude (oil), tire, or capital, are more likely to identify a specific industry segment than reveal a negative financial event.

We create a list of 2,337 words that typically have negative implications in a financial sense. The prevalence of polysemes in English (words that have multiple meanings) makes an absolute mapping of specific words into financial sentiment impossible. We can, however, develop lists based on actual usage frequency that are most likely associated with a target construct. We use the term Fin-Neg to describe our list of negative financial words. Some of these words also appear on the H4N list, but others, such as felony, litigation, restated, misstatement, and unanticipated do not.
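The contrast between the two lists can be illustrated with a minimal Python sketch of a proportional word count. This is an illustration only, not the procedure used in the paper: the two word sets are toy excerpts containing only the words named above, the function names `tokenize` and `negative_proportion` are hypothetical, and the two sample sentences are invented for the example.

```python
import re

# Toy excerpts, limited to words named in the text. The actual H4N and
# Fin-Neg lists are far longer; these sets are assumptions for illustration.
H4N_EXAMPLES = {"tax", "cost", "capital", "board", "liability", "foreign", "vice"}
FIN_NEG_EXAMPLES = {"felony", "litigation", "restated", "misstatement", "unanticipated"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def negative_proportion(text, word_list):
    """Return the share of tokens in `text` that appear on `word_list`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in word_list)
    return hits / len(tokens)


# Invented sentences: one routine disclosure, one describing an adverse event.
routine = "The board approved the tax provision and the vice president reviewed capital cost"
troubled = "The company restated earnings after litigation over a misstatement"

for label, text in (("routine", routine), ("troubled", troubled)):
    print(label,
          round(negative_proportion(text, H4N_EXAMPLES), 3),
          round(negative_proportion(text, FIN_NEG_EXAMPLES), 3))
```

In this toy setting the routine sentence scores as heavily negative under the H4N excerpt and not at all under the Fin-Neg excerpt, while the reverse holds for the sentence describing a restatement.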

When testing the 10-K sample, whether tone should be gauged by the entire document or just the Management Discussion and Analysis (MD&A) section is an empirical question. We show that the MD&A section does not produce tone measures that have a more discernable impact on 10-K file date excess returns. Thus, the MD&A section does not allow us to assess tone through a clearer lens.

In our results, we find that dividing firms into quintiles according to the proportion of H4N words (with inflections) in their 10-Ks produces no discernable pattern. That is, the proportion of H4N words does not systematically increase as 10-K filing returns decrease. However, when we use our financial negative list to sort firms, we observe a strong pattern. Regressions with multiple control variables confirm the univariate findings of no effect for the proportional counts from the Harvard list versus a significant impact for the Fin-Neg list.

We also show that the attenuation bias introduced by misclassifications, especially by high frequency words (which may be overweighted based on simple proportional measures), can be substantially mitigated by using term weighting. Most textual analysis uses a “bag of words” method where a document is summarized in a vector of word counts, and then combined across documents into a term-document matrix. In other disciplines, term weighting is typically used in any vector space representation of documents.¹ With term weighting, where the enormous differences in frequencies are dampened through a log transformation and common words are weighted less, both the Harvard list and our Fin-Neg list generally produce similar results.
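The bag-of-words representation and the effect of term weighting can be sketched in a few lines of Python. The weighting formula below, (1 + log tf) × log(N / df), is a common tf-idf variant and is an assumption made here for illustration; this introduction states only that frequencies are dampened through a log transformation and that common words are weighted less. The function names and the three toy documents are likewise hypothetical.

```python
import math
from collections import Counter


def term_document_counts(docs):
    """Summarize each document as a Counter of raw word counts (bag of words)."""
    return [Counter(doc.lower().split()) for doc in docs]


def weighted_matrix(docs):
    """Build a term-document matrix with an assumed log-dampened tf-idf weight.

    weight = (1 + log(tf)) * log(N / df) if tf >= 1, else 0,
    where tf is the count of the word in the document, N is the number of
    documents, and df is the number of documents containing the word.
    """
    counts = term_document_counts(docs)
    n_docs = len(docs)
    vocab = sorted(set().union(*counts))
    doc_freq = {word: sum(1 for c in counts if word in c) for word in vocab}
    matrix = []
    for c in counts:
        row = {}
        for word in vocab:
            tf = c[word]
            if tf >= 1:
                row[word] = (1 + math.log(tf)) * math.log(n_docs / doc_freq[word])
            else:
                row[word] = 0.0
        matrix.append(row)
    return vocab, matrix


# Toy corpus: "tax" is frequent and appears everywhere; "loss" is rare.
docs = ["tax tax tax tax loss", "tax cost", "tax litigation"]
vocab, matrix = weighted_matrix(docs)
for row in matrix:
    print({word: round(weight, 3) for word, weight in row.items()})
```

Because "tax" occurs in every toy document, its weight is zero regardless of how often it is repeated, whereas a word confined to one document keeps a positive weight. This is the sense in which a high frequency misclassified word contributes less once weights replace simple proportions.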

To expand the word classification categories, we create five additional word lists. Specifically, in addition to the negative word lists, we consider positive, uncertainty, litigious, strong modal, and weak modal word categories.² When we assess whether the word lists actually gauge tone, we find significant relations between our word lists and file date returns, trading volume, subsequent return volatility, standardized unexpected earnings, and two separate samples of fraud and material weakness. We also examine whether negative tone classifications are related to future returns in terms of a trading strategy, and find no evidence of return predictability based on the competing measures.

The nature of word usage in firm-related news is not identical across media. Whether our results hold for samples beyond 10-Ks is an important question. We provide preliminary evidence in alternative contexts showing that in comparison with the Harvard list, the Fin-Neg list has larger correlations with returns in samples of seasoned equity offerings and news articles.

The remainder of the paper is organized as follows. Section I discusses related research on textual analysis. Section II introduces the data sources, variables, and term weighting method used in our analysis. Section III describes the various word lists and Section IV reports the empirical results. Finally, Section V concludes.

I. Research on Textual Analysis

Textual analysis is a subset of a broader literature in finance on qualitative information. This literature is confronted by the difficult process of accurately converting qualitative information into quantitative measures. Examples of qualitative studies not based on textual analysis include Coval and Shumway (2001), who examine the relation between trading volume in futures contracts and noise levels in the trading pits, and Mayew and Venkatachalam (2009), who analyze conference call audio files for positive or negative vocal cues revealed by managers’ vocal signatures.

Although we focus on the more common word categorization (bag of words) method for measuring tone, other papers consider alternative approaches based on vector distance, Naïve Bayes classifications, likelihood ratios, or other classification algorithms. (See, for example, Das and Chen (2001), Antweiler and Frank (2004), or Li (2009).) Li discusses the benefits of using a statistical approach over a word categorization one, arguing that categorization might have low power for corporate filings because “there is no readily available dictionary that is built for the setting of corporate filings” (p. 12). Tetlock (2007, p. 1440) discusses the drawbacks of using methods that require the estimation of likelihood ratios based on difficult to replicate and subjective classification of texts’ tone.³

¹ See Manning and Schütze (2003), Jurafsky and Martin (2009), or Singhal (2009).

² Modal verbs are used to express possibility (weak) and necessity (strong). We extend this categorization to create our more general classification of modal words.

Authors commonly use external word lists, like Harvard’s General Inquirer, to evaluate the tone of a text. The General Inquirer has 182 tag categories. Examples include positive, negative, strong, weak, active, pleasure, and even pain categories. Finance and accounting researchers generally focus on the Harvard IV-4 negative and positive word categories, although none seems to find much incremental value in the positive word lists.

The limitations of positive words in prior tests, as noted by others, are likely attributable to their frequent negation. It is common to see the framing of negative news using positive words (“did not benefit”), whereas corporate communications rarely convey positive news using negated negative words (“not downgraded”).
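The negation problem described above can be made concrete with a short Python sketch that separates plain from negated occurrences of positive words. The set of negators, the three-word look-back window, and the function name `count_positive` are assumptions chosen for illustration; nothing in this section specifies a particular negation rule.

```python
import re

# Assumed negators for illustration; not taken from this section.
NEGATORS = {"no", "not", "none", "neither", "never", "nobody"}


def count_positive(text, positive_words, window=3):
    """Return (plain, negated) counts of positive words in `text`.

    A positive word is treated as negated when any of the `window` tokens
    immediately preceding it is a negator.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    plain, negated = 0, 0
    for i, token in enumerate(tokens):
        if token not in positive_words:
            continue
        preceding = tokens[max(0, i - window):i]
        if any(word in NEGATORS for word in preceding):
            negated += 1
        else:
            plain += 1
    return plain, negated


print(count_positive("The firm did not benefit from the change", {"benefit"}))
print(count_positive("Shareholders benefit from the change", {"benefit"}))
```

A simple count of positive words would treat both sentences identically, which is one way to see why unadjusted positive word lists add little incremental information.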

While not every prior work uses the Harvard negative word list to gauge text tone, it is a typical example of word classification schemes. We choose to use the Harvard list for our tests because, unlike many other word lists, the Harvard list is nonproprietary. This allows us to assess exactly which words contribute most to the aggregate counts.

Perhaps the best known study in this area is Tetlock (2007), who links the Wall Street Journal’s popular “Abreast of the Market” column with subsequent stock returns and trading volume. Tetlock finds that high levels of pessimistic words in the column precede lower returns the next day. Pessimism is initially determined by word counts using a factor derived from 77 General Inquirer categories in the Harvard dictionary. However, later in his paper, Tetlock focuses on both negative words and weak words, as these are most highly correlated with pessimism. Tetlock notes that “negative word counts are noisy measures of qualitative information” and that the noisy measures attenuate estimated regression coefficients. In a subsequent study, Tetlock, Saar-Tsechansky, and Macskassy (2008) focus exclusively on the Harvard negative word list using firm-specific news stories. Our study shows that the noise of misclassification (nontonal words classified as negative) in the Harvard list is substantial when analyzing 10-Ks and that some of the misclassified words might unintentionally capture other effects.


³ Other researchers link the tone of newspaper articles (Kothari, Li, and Short (2008)) or company press releases (Demers and Vega (2008), Engelberg (2008), and Henry (2008)) with lower firm earnings, earnings drift, or stock returns. Also considered are a firm’s 10-K or IPO prospectus (Li (2008, 2009), Hanley and Hoberg (2010), and Feldman et al. (2008)). The main point of these papers is that the linguistic content of a document is useful in explaining stock returns, stock volatility, or trading volume.


Table I

10-K Sample Creation

This table reports the impact of various data filters on initial 10-K sample size.

Source/Filter                                                                    Sample Size   Observations Removed

Full 10-K Document
EDGAR 10-K/10-K405 1994–2008 complete sample (excluding duplicates)                  121,217
Include only first filing in a given year                                            120,290                    927
At least 180 days between a given firm’s 10-K filings                                120,074                    216
CRSP PERMNO match                                                                     75,252                 44,822
Reported on CRSP as an ordinary common equity firm                                    70,061                  5,191
CRSP market capitalization data available                                             64,227                  5,834
Price on filing date day minus one ≥ $3                                               55,946                  8,281
Returns and volume for day 0–3 event period                                           55,630                    316
NYSE, AMEX, or Nasdaq exchange listing                                                55,612                     18
At least 60 days of returns and volume in year prior to and following file date       55,038                    574
Book-to-market COMPUSTAT data available and book value > 0                            50,268                  4,770
Number of words in 10-K ≥ 2,000                                                       50,115                    153

Firm-Year Sample                                                                      50,115
Number of unique firms                                                                 8,341
Average number of years per firm                                                           6

Management Discussion and Analysis (MD&A) Subsection
Subset of 10-K sample where MD&A section could be identified                          49,179                    936
MD&A section ≥ 250 words                                                              37,287                 11,892
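The filter cascade in Table I can be checked arithmetically: each reported sample size should equal the previous size minus the observations removed. The following Python sketch performs that check using only the figures reported in the table; the names `FULL_SAMPLE_STEPS`, `MDA_STEPS`, and `check_cascade` are hypothetical and introduced here for illustration.

```python
# (filter, sample size after filter, observations removed), copied from Table I.
FULL_SAMPLE_STEPS = [
    ("First filing in a given year", 120_290, 927),
    ("At least 180 days between filings", 120_074, 216),
    ("CRSP PERMNO match", 75_252, 44_822),
    ("Ordinary common equity firm", 70_061, 5_191),
    ("CRSP market capitalization available", 64_227, 5_834),
    ("Price on day minus one >= $3", 55_946, 8_281),
    ("Returns and volume for day 0-3", 55_630, 316),
    ("NYSE, AMEX, or Nasdaq listing", 55_612, 18),
    ("At least 60 days of returns and volume", 55_038, 574),
    ("Book-to-market available, book value > 0", 50_268, 4_770),
    ("Number of words >= 2,000", 50_115, 153),
]

MDA_STEPS = [
    ("MD&A section could be identified", 49_179, 936),
    ("MD&A section >= 250 words", 37_287, 11_892),
]


def check_cascade(start, steps):
    """Return the names of filters whose reported size is inconsistent.

    A step is consistent when (previous size - removed) equals the reported size.
    """
    mismatches = []
    size = start
    for name, reported, removed in steps:
        if size - removed != reported:
            mismatches.append(name)
        size = reported
    return mismatches


print(check_cascade(121_217, FULL_SAMPLE_STEPS))  # full 10-K sample
print(check_cascade(50_115, MDA_STEPS))           # MD&A subsection
print(round(50_115 / 8_341, 2))                   # firm-years per unique firm
```

An empty list from `check_cascade` indicates that the reported sizes and removals agree at every step, and the final ratio corresponds to the reported average number of years per firm after rounding.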

II. Data, Variables, and Term Weights

A. The 10-K Sample

We download all 10-Ks and 10-K405s, excluding amended documents, from EDGAR over 1994 to 2008.⁴ Table I shows how the original sample of 10-Ks is impacted by our data filters and data requirements. Most notably, the requirement of a CRSP PERMNO match reduces the original sample of 121,217 10-Ks by 44,822 firms.⁵ This is not surprising as many of the

⁴ A 10-K405 is a 10-K where a box on the first page is checked indicating that a “disclosure of delinquent filers pursuant to Item 405” was not included in the current filing. Until this distinction was eliminated in 2003, a substantial portion of 10-Ks were categorized as 10-K405. The SEC eliminated the 405 classification due to confusion and inconsistency in its application. The choice does not impact our study, so we include both form types in our sample and simply refer to their aggregation as 10-Ks.

⁵ We use the Wharton Data Services CIK file to link SEC CIK numbers to the CRSP PERMNOs.
