Some Coding Notes (extended) from the Response to Sexism journal
Procedural Memo (10/22/97): Unitizing Reliability: Figures and Formulas
Holsti, O.R. (1969). Content analysis for the social sciences and humanities. Reading, MA: Addison-Wesley.
SUMMARY OF SOURCES
HOLSTI (pp. 138-141) provides a couple of fairly basic formulas for determining intercoder (IC) reliability. One follows the formula:
C.R. = 2M / (N1 + N2), where “M is the number of coding decisions on which the two judges are in agreement, and N1 and N2 refer to the number of coding decisions made by judges 1 and 2, respectively” (p. 140).
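To make the arithmetic concrete, here is a minimal sketch of Holsti's C.R. in Python (the function name and the sample numbers are mine, not Holsti's):

```python
def holsti_cr(m, n1, n2):
    """Holsti's coefficient of reliability: C.R. = 2M / (N1 + N2).

    m  -- number of coding decisions on which the two judges agree
    n1 -- total coding decisions made by judge 1
    n2 -- total coding decisions made by judge 2
    """
    return (2 * m) / (n1 + n2)

# Hypothetical example: 80 agreements, 100 decisions per judge.
print(holsti_cr(80, 100, 100))  # 0.8
```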
“This formula has been criticized, however, because it does not take into account the extent of inter-coder agreement which may result from chance (Bennett, Alpert, and Goldstein, 1954). By chance alone, agreement should increase as the number of categories decreases” (p. 140).
Scott’s pi corrects not only for the number of categories in the category set, but also for the probable frequency with which each is used (reword this before putting it in press; several phrases are word for word):
pi = (% observed agreement - % expected agreement) / (1 - % expected agreement)
% expected agreement is found by “finding the proportion of items falling into each category of a category set, and summing the square of those proportions” (p. 140). Holsti gives an example, but the example only seems to reflect the categories of one of the two coders in his comparison (or it might reflect both; in this example, both coders have used each of 4 categories the same number of times, which I don't think would always happen in real life). Holsti (1969) gives a third method, but it does not seem to apply to the case at hand.
While several other methods are cited (from the 1940s and 1950s), Holsti (1969) sees Scott’s pi as a good conservative estimate.
However, note that the problem with the first formula (henceforth called Holsti’s formula) is that it capitalizes on chance *when there are a small number of categories*.
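As a sketch of Scott's pi computed the way Holsti describes it (the function name is mine; I am assuming the category proportions pool both coders' decisions, which is how Scott's chance correction is usually computed):

```python
from collections import Counter

def scotts_pi(codes_a, codes_b):
    """Scott's pi for two coders' parallel lists of category assignments.

    Observed agreement: share of items both coders put in the same category.
    Expected agreement (per Holsti, p. 140): the proportion of items falling
    into each category, squared and summed; the proportions here pool both
    coders' decisions (an assumption on my part).
    """
    n = len(codes_a)
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    pooled = Counter(codes_a) + Counter(codes_b)
    expected = sum((count / (2 * n)) ** 2 for count in pooled.values())
    return (observed - expected) / (1 - expected)

# Hypothetical codings of 4 items into categories "x" and "y":
print(scotts_pi(["x", "x", "y", "y"], ["x", "y", "y", "y"]))
```

Note that when both coders use every category equally often, as in Holsti's example, pooling changes nothing; the pooled proportions matter precisely when the coders' marginal distributions differ.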
Guetzkow, H. (1950). Unitizing and categorizing problems in coding
qualitative data. Journal of Clinical Psychology, 6, 47-58.
Totals:  CODER A = 289   CODER B = 312
These totals were determined by adding the number of units each coder saw across 5 questions for 20 surveys.
U = (O1 - O2) / (O1 + O2)
where O1 is the number of units Observer 1 sees in a text, and O2 is the number of units Observer 2 sees in the same text.
U = (289 - 312) / (289 + 312)
= -23 / 601
= -.038
Since U is actually a measure of disagreement (Folger, Hewes, & Poole), we could say that agreement is 1 - |U| = .962.
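The calculation above, as a sketch (function name mine):

```python
def guetzkow_u(o1, o2):
    """Guetzkow's U = (O1 - O2) / (O1 + O2).

    The sign only shows which coder saw more units; |U| is the
    measure of disagreement in unit counts.
    """
    return (o1 - o2) / (o1 + o2)

u = guetzkow_u(289, 312)   # -23/601, about -.038
agreement = 1 - abs(u)     # about .962
print(round(u, 3), round(agreement, 3))  # -0.038 0.962
```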
One of the problems with this figure is that it can obscure many differences. For example, in our data set there were several occasions when Coder A saw one more unit than Coder B did, and others where Coder B saw one more than Coder A. Depending on how the units of a “text” are calculated, these differences can cancel each other out, giving an inflated reliability.
Folger et al. state the problem clearly:
“Although Guetzkow’s index is certainly useful, it falls short of being ideal. To be ideal, an index of unitizing reliability should estimate the degree of agreement between two or more coders in identifying specific segments of text. That is, an ideal index should quantify the unit-by-unit agreement between two or more coders. Neither U nor his more sophisticated index based on U does this. Guetzkow’s indices only show the degree to which two coders identify the same number of units in a text of fixed length, not whether those units were in fact the same units.” (p. 120).
At the same time, Folger et al. suggest the index may be appropriate in certain situations. They suggest (but do not demonstrate) a way of looking at agreement in each objective segment (in our case, the amount of agreement for each question?) and then calculating across segments. They refer the reader to:
Hewes et al. (1980)
Newtson & Engquist (1976)
Newtson et al. (1977) and
Ebbeson & Allen (1979)
for examples.
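Since Folger et al. only suggest the per-segment approach, the following is just my guess at what it might look like for our data: compute U separately for each question, so that differences in opposite directions cannot cancel. The per-question counts below are hypothetical (only their totals, 289 and 312, come from our data).

```python
def per_segment_u(counts_a, counts_b):
    """Guetzkow's U computed separately for each segment (here, each
    question), so opposite-direction differences cannot cancel out."""
    return [(a - b) / (a + b) for a, b in zip(counts_a, counts_b)]

# Hypothetical per-question unit counts summing to the observed totals:
a = [60, 55, 62, 58, 54]   # Coder A, sums to 289
b = [55, 60, 67, 63, 67]   # Coder B, sums to 312
us = per_segment_u(a, b)
# Per-question U values can vary in sign and size even though the
# overall U for the pooled totals is small (about -.038).
print([round(u, 3) for u in us])
```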
I will later check on the cites. One of them is:
Hewes, D.E., Planalp, S. K., & Streibel, M. (1980). Analyzing social interaction: Some excruciating models and exhilarating results. Communication Yearbook 4, 123-144.
Folger et al. ask:
“Is it always necessary to go to so much work to provide evidence of unitizing reliability? Probably not in all cases. If one is using an exhaustive coding system, i.e., a coding system in which each and every act is coded, and Guetzkow’s U is quite low, perhaps .10 or below, it may prove unnecessary to perform a unit-by-unit [segment-by-segment?] analysis. Similarly, if the actual unit is relatively objective and easily coded, Guetzkow’s indices may suffice. On the other hand, if the units are subjective, the coding scheme is not exhaustive, or the data are to be used for sequential analysis (lagged-sequential analysis, Markov processes, etc.), unit-by-unit analysis is essential. In any event, some measure of unitizing reliability should be reported in any quantitative study of social interaction.” (p. 121).