THE JOURNAL OF FINANCE • VOL. LXVI, NO. 1 • FEBRUARY 2011
When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10-Ks
TIM LOUGHRAN and BILL MCDONALD∗
ABSTRACT
Previous research uses negative word counts to measure the tone of a text. We show
that word lists developed for other disciplines misclassify common words in financial
text. In a large sample of 10-Ks during 1994 to 2008, almost three-fourths of the words
identified as negative by the widely used Harvard Dictionary are words typically not
considered negative in financial contexts. We develop an alternative negative word
list, along with five other word lists, that better reflect tone in financial text. We link
the word lists to 10-K filing returns, trading volume, return volatility, fraud, material
weakness, and unexpected earnings.
A GROWING BODY of finance and accounting research uses textual analysis to examine the tone and sentiment of corporate 10-K reports, newspaper articles, press releases, and investor message boards. Examples are Antweiler and Frank (2004), Tetlock (2007), Engelberg (2008), Li (2008), and Tetlock, Saar-Tsechansky, and Macskassy (2008). The results to date indicate that negative word classifications can be effective in measuring tone, as reflected by significant correlations with other financial variables.
A commonly used source for word classifications is the Harvard Psychosociological Dictionary, specifically, the Harvard-IV-4 TagNeg (H4N) file. One positive feature of this list for research is that its composition is beyond the control of the researcher. That is, the researcher cannot pick and choose which words have negative implications. Yet English words have many meanings, and a word categorization scheme derived for one discipline might not translate effectively into a discipline with its own dialect.
In a survey of textual analysis, Berelson (1952) notes that: "Content analysis stands or falls by its categories. Particular studies have been productive to the extent that the categories were clearly formulated and well adapted to the problem" (p. 92). In some contexts, the H4N list of negative words may effectively capture the tone of a text. The question we address in this paper is whether a word list developed for psychology and sociology translates well into the realm of business.
∗Loughran and McDonald are with the University of Notre Dame. We are indebted to Paul Tetlock for comments on a previous draft. We also thank Robert Battalio, Peter Easton, James Fuehrmeyer, Paul Gao, Campbell Harvey (Editor), Nicholas Hirschey, Jennifer Marietta-Westberg, Paul Schultz, an anonymous referee, an anonymous associate editor, and seminar participants at the 2009 FMA meeting, University of Notre Dame, and York University for helpful comments. We thank Hang Li for research assistance.
While measuring document tone using any word classification scheme is inherently imprecise, we provide evidence based on 50,115 firm-year 10-Ks between 1994 and 2008 that the H4N list substantially misclassifies words when gauging tone in financial applications. Misclassified words that are not likely correlated with the variables under consideration (for example, taxes or liabilities) simply add noise to the measurement of tone and thus attenuate the estimated regression coefficients. However, we also find evidence that some high frequency misclassifications in the Harvard list, such as mine or cancer, could introduce type I errors into the analysis to the extent that they proxy for industry segments or firm attributes.
We make several contributions to the literature on textual analysis. Most notably, we find that almost three-fourths (73.8%) of the negative word counts according to the Harvard list are attributable to words that are typically not negative in a financial context. Words such as tax, cost, capital, board, liability, foreign, and vice are on the Harvard list. These words also appear with great frequency in the vast majority of 10-Ks, yet often do no more than name a board of directors or a company's vice-presidents. Other words on the Harvard list, such as mine, cancer, crude (oil), tire, or capital, are more likely to identify a specific industry segment than reveal a negative financial event.
We create a list of 2,337 words that typically have negative implications in a financial sense. The prevalence of polysemes in English (words that have multiple meanings) makes an absolute mapping of specific words into financial sentiment impossible. We can, however, develop lists based on actual usage frequency that are most likely associated with a target construct. We use the term Fin-Neg to describe our list of negative financial words. Some of these words also appear on the H4N list, but others, such as felony, litigation, restated, misstatement, and unanticipated, do not.
When testing the 10-K sample, whether tone should be gauged by the entire document or just the Management Discussion and Analysis (MD&A) section is an empirical question. We show that the MD&A section does not produce tone measures that have a more discernible impact on 10-K file date excess returns. Thus, the MD&A section does not allow us to assess tone through a clearer lens.
In our results, we find that dividing firms into quintiles according to the proportion of H4N words (with inflections) in their 10-Ks produces no discernible pattern. That is, the proportion of H4N words does not systematically increase as 10-K filing returns decrease. However, when we use our financial negative list to sort firms, we observe a strong pattern. Regressions with multiple control variables confirm the univariate findings of no effect for the proportional counts from the Harvard list versus a significant impact for the Fin-Neg list. We also show that the attenuation bias introduced by misclassifications, especially by high frequency words (which may be overweighted based on simple proportional measures), can be substantially mitigated by using term weighting. Most textual analysis uses a "bag of words" method where a document is summarized in a vector of word counts, and then combined across documents into a term-document matrix. In other disciplines, term weighting is typically used in any vector space representation of documents.1 With term weighting, where the enormous differences in frequencies are dampened through a log transformation and common words are weighted less, both the Harvard list and our Fin-Neg list generally produce similar results.
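As an illustration of the kind of term weighting described above, the sketch below computes a standard log-dampened tf-idf weight: within-document counts are compressed with a log transformation, and terms that appear in many documents receive lower weight. This is a generic textbook weighting offered only as a sketch; the precise specification used in the empirical tests may differ.

```python
import math
from collections import Counter

def tfidf_weights(docs: list[list[str]]) -> list[dict[str, float]]:
    """Log-dampened tf-idf over a list of tokenized documents."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))
    weighted = []
    for doc in docs:
        tf = Counter(doc)
        weighted.append({
            term: (1.0 + math.log(count)) * math.log(n_docs / doc_freq[term])
            for term, count in tf.items()
        })
    return weighted

# Toy term-document example: "loss" is rare, "company" appears everywhere
# and therefore receives zero weight.
docs = [["company", "loss", "loss"], ["company", "growth"], ["company", "tax"]]
for weights in tfidf_weights(docs):
    print(weights)
```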
To expand the word classification categories, we create five additional word lists. Specifically, in addition to the negative word lists, we consider positive, uncertainty, litigious, strong modal, and weak modal word categories.2 When we assess whether these word lists actually gauge tone, we find significant relations between our word lists and file date returns, trading volume, subsequent return volatility, standardized unexpected earnings, and two separate samples of fraud and material weakness. We also examine whether negative tone classifications are related to future returns in terms of a trading strategy, and find no evidence of return predictability based on the competing measures. The nature of word usage in firm-related news is not identical across media. Whether our results hold for samples beyond 10-Ks is an important question. We provide preliminary evidence in alternative contexts showing that in comparison with the Harvard list, the Fin-Neg list has larger correlations with returns in samples of seasoned equity offerings and news articles.
The remainder of the paper is organized as follows. Section I discusses related research on textual analysis. Section II introduces the data sources, variables, and term weighting method used in our analysis. Section III describes the various word lists and Section IV reports the empirical results. Finally, Section V concludes.
I. Research on Textual Analysis
Textual analysis is a subset of a broader literature in finance on qualitative information. This literature is confronted by the difficult process of accurately converting qualitative information into quantitative measures. Examples of qualitative studies not based on textual analysis include Coval and Shumway (2001), who examine the relation between trading volume in futures contracts and noise levels in the trading pits, and Mayew and Venkatachalam (2009), who analyze conference call audio files for positive or negative vocal cues revealed by managers' vocal signatures.
Although we focus on the more common word categorization (bag of words) method for measuring tone, other papers consider alternative approaches based on vector distance, Naïve Bayes classifications, likelihood ratios, or other classification algorithms (see, for example, Das and Chen (2001), Antweiler and Frank (2004), or Li (2009)). Li discusses the benefits of using a statistical approach over a word categorization one, arguing that categorization might have low power for corporate filings because "there is no readily available dictionary that is built for the setting of corporate filings" (p. 12). Tetlock (2007, p. 1440) discusses the drawbacks of using methods that require the estimation of likelihood ratios based on difficult to replicate and subjective classification of texts' tone.3

1 See Manning and Schütze (2003), Jurafsky and Martin (2009), or Singhal (2009).
2 Modal verbs are used to express possibility (weak) and necessity (strong). We extend this categorization to create our more general classification of modal words.
Authors commonly use external word lists, like Harvard's General Inquirer, to evaluate the tone of a text. The General Inquirer has 182 tag categories. Examples include positive, negative, strong, weak, active, pleasure, and even pain categories. Finance and accounting researchers generally focus on the Harvard IV-4 negative and positive word categories, although none seems to find much incremental value in the positive word lists.
The limitations of positive words in prior tests, as noted by others, are likely attributable to their frequent negation. It is common to see the framing of negative news using positive words ("did not benefit"), whereas corporate communications rarely convey positive news using negated negative words ("not downgraded").
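One common way researchers handle this problem in practice (offered here purely as an illustrative sketch, not as a method used in this paper) is a proximity rule: a positive word is discounted whenever a negator such as "not" or "no" appears within a few preceding tokens.

```python
NEGATORS = {"not", "no", "never", "none"}  # illustrative set, not the paper's

def negated(tokens: list[str], i: int, window: int = 3) -> bool:
    """Treat token i as negated if a negator occurs within the preceding window."""
    start = max(0, i - window)
    return any(t in NEGATORS for t in tokens[start:i])

tokens = "the offering did not benefit shareholders".split()
print(negated(tokens, tokens.index("benefit")))  # True: "benefit" is negated
```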
While not every prior work uses the Harvard negative word list to gauge text tone, it is a typical example of word classification schemes. We choose to use the Harvard list for our tests because, unlike many other word lists, the Harvard list is nonproprietary. This allows us to assess exactly which words contribute most to the aggregate counts.
Perhaps the best known study in this area is Tetlock (2007), who links the Wall Street Journal's popular "Abreast of the Market" column with subsequent stock returns and trading volume. Tetlock finds that high levels of pessimistic words in the column precede lower returns the next day. Pessimism is initially determined by word counts using a factor derived from 77 General Inquirer categories in the Harvard dictionary. However, later in his paper, Tetlock focuses on both negative words and weak words, as these are most highly correlated with pessimism. Tetlock notes that "negative word counts are noisy measures of qualitative information" and that these noisy measures attenuate estimated regression coefficients. In a subsequent study, Tetlock, Saar-Tsechansky, and Macskassy (2008) focus exclusively on the Harvard negative word list using firm-specific news stories. Our study shows that the noise of misclassification (nontonal words classified as negative) in the Harvard list is substantial when analyzing 10-Ks and that some of the misclassified words might unintentionally capture other effects.
3 Other researchers link the tone of newspaper articles (Kothari, Li, and Short (2008)) or company press releases (Demers and Vega (2008), Engelberg (2008), and Henry (2008)) with lower firm earnings, earnings drift, or stock returns. Also considered are a firm's 10-K or IPO prospectus (Li (2008, 2009), Hanley and Hoberg (2010), and Feldman et al. (2008)). The main point of these papers is that the linguistic content of a document is useful in explaining stock returns, stock volatility, or trading volume.
Table I
10-K Sample Creation
This table reports the impact of various data filters on the initial 10-K sample size.

Source/Filter                                                          Sample Size   Observations Removed

Full 10-K Document
  EDGAR 10-K/10-K405 1994–2008 complete sample (excluding duplicates)      121,217
  Include only first filing in a given year                                120,290          927
  At least 180 days between a given firm's 10-K filings                    120,074          216
  CRSP PERMNO match                                                         75,252       44,822
  Reported on CRSP as an ordinary common equity firm                        70,061        5,191
  CRSP market capitalization data available                                 64,227        5,834
  Price on filing date day minus one ≥ $3                                   55,946        8,281
  Returns and volume for day 0–3 event period                               55,630          316
  NYSE, AMEX, or Nasdaq exchange listing                                    55,612           18
  At least 60 days of returns and volume in year prior to and
    following file date                                                     55,038          574
  Book-to-market COMPUSTAT data available and book value > 0                50,268        4,770
  Number of words in 10-K ≥ 2,000                                           50,115          153
  Firm-Year Sample                                                          50,115
  Number of unique firms                                                     8,341
  Average number of years per firm                                               6

Management Discussion and Analysis (MD&A) Subsection
  Subset of 10-K sample where MD&A section could be identified              49,179          936
  MD&A section ≥ 250 words                                                  37,287       11,892
II. Data, Variables, and Term Weights
A. The 10-K Sample
We download all 10-Ks and 10-K405s, excluding amended documents, from the EDGAR website (www.sec.gov) over 1994 to 2008.4 Table I shows how the original sample of 10-Ks is impacted by our data filters and data requirements. Most notably, the requirement of a CRSP PERMNO match reduces the original sample of 121,217 10-Ks by 44,822 firms.5 This is not surprising as many of the
4 A 10-K405 is a 10-K where a box on the first page is checked indicating that a "disclosure of delinquent filers pursuant to Item 405" was not included in the current filing. Until this distinction was eliminated in 2003, a substantial portion of 10-Ks were categorized as 10-K405. The SEC eliminated the 405 classification due to confusion and inconsistency in its application. The choice does not impact our study, so we include both form types in our sample and simply refer to their aggregation as 10-Ks.
5 We use the Wharton Research Data Services CIK file to link SEC CIK numbers to the CRSP PERMNOs.
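To make the sample-construction waterfall in Table I concrete, the following hypothetical pandas sketch shows how a subset of such filters might be applied and logged. The column names (permno, year, file_date, prc_m1, exchcd, book_value, word_count) are invented for illustration, and the real construction requires CRSP and COMPUSTAT merges that are not shown.

```python
import pandas as pd

def apply_filters(df: pd.DataFrame) -> pd.DataFrame:
    """Apply a few Table I-style filters to a hypothetical firm-year DataFrame,
    printing a waterfall of remaining/removed observations at each step."""
    steps = [
        ("first filing in a given year",
         lambda d: d.sort_values("file_date").drop_duplicates(["permno", "year"])),
        ("price on day -1 >= $3", lambda d: d[d["prc_m1"] >= 3]),
        ("NYSE/AMEX/Nasdaq listing", lambda d: d[d["exchcd"].isin([1, 2, 3])]),
        ("book value > 0", lambda d: d[d["book_value"] > 0]),
        ("at least 2,000 words in the 10-K", lambda d: d[d["word_count"] >= 2000]),
    ]
    for label, rule in steps:
        before = len(df)
        df = rule(df)
        print(f"{label}: {len(df)} remain ({before - len(df)} removed)")
    return df
```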