ThemeRiver: Visualizing Theme Changes over Time
Susan Havre, Beth Hetzler, and Lucy Nowell
Battelle Pacific Northwest Division
Richland, Washington 99352 USA
1+509+375-6948
{susan.havre | beth.hetzler | well}@v
Abstract
ThemeRiver™ is a prototype system that visualizes thematic variations over time within a large collection of documents. The “river” flows from left to right through time, changing width to depict changes in thematic strength of temporally associated documents. Colored “currents” flowing within the river narrow or widen to indicate decreases or increases in the strength of an individual topic or a group of topics in the associated documents. The river is shown within the context of a timeline and a corresponding textual presentation of external events.
Keywords: visualization metaphors, trend analysis, timeline
1. Introduction
In exploratory information visualization, one goal is to present information so that users can easily discern patterns. Patterns reveal trends, relationships, anomalies, and structure in the data, and may help users confirm knowledge or hypotheses. Perhaps more importantly, they also raise unexpected questions leading users to new insights. The challenge is to create visualizations that enable users to find patterns quickly and easily. ThemeRiver, shown in Figure 1, is a prototype system designed to reveal temporal patterns in text collections.
Figure 1: ThemeRiver™ uses a river metaphor to represent theme changes over time.
Information visualization systems such as Envision [13], BEAD [1], LyberWorld [3, 4], and SPIRE [18] represent each document or group of documents with a glyph or icon, portraying various document attributes. Various methods have been explored for showing change over time in document-centric visualizations; see Section 3 below.
However, a user may be less interested in documents themselves than in theme changes within the whole collection over time. For example, how did Shakespeare’s themes change during various periods of his life or in relation to contemporary events? Such information is difficult, if not impossible, to glean from most visualizations. A visualization that focuses on themes, rather than documents, could be more useful for such exploration.
ThemeRiver provides users with a macro-view of thematic changes in a corpus of documents over a serial dimension. It is designed to facilitate the identification of trends, patterns, and unexpected occurrence or non-occurrence of themes or topics. In our prototype, we use time as the serial dimension. We provide contextual information through a timeline and markers for co-occurring events of interest. Figure 1 shows a sample ThemeRiver visualization. This paper describes the design of ThemeRiver, walks through a sample information exploration session, and discusses results of formative usability testing.
2. Design
Our major design goal was to provide a visualization of theme change over time. Consider using a histogram to visualize the changes. In a histogram (such as the one shown in Figure 2), each bar represents a time slice, and color variations and size within the bar represent the relative strength of themes specific to that slice. However, understanding the histogram requires users to work at integrating the themes across time because the bars are anchored to a baseline and the position of a particular theme within the bars may vary considerably.
Like a histogram, ThemeRiver uses variations in width to represent variations in strength or degree of representation. However, it connects the strength values in adjacent time slices with smooth and continuous curves. The horizontal flow of the river represents the flow of time. Colored currents that run horizontally within the river represent themes. Each vertical section of the river corresponds to an ordered time slice.
The width of each current changes to reflect the thematic strength for each time slice. For example, in Figure 1 the theme “soviet” increases in relative strength in June 1960, as indicated by the widening of the upper bright orange current. “Soviet” loses relative strength in July and August; thus the same current narrows in the next two time slices. “Soviet” then increases significantly in relative strength in September; the current widens proportionately.
Currents maintain their integrity as a single entity over time. If a theme ceases to occur in the documents for a period of time and then recurs, the current likewise disappears and then reappears. Consistent color and relative position to other themes make theme currents easy to recognize. In Figure 1, the lower purple band depicts the changes in relative strength of the theme “cane.” The “cane” current grows and shrinks over time; “cane” occurs most strongly in March 1961.
We believe that ThemeRiver’s continuous curves have much to do with its usability. The Gestalt School of Psychology [8], founded in 1919 in Germany, theorized that with perception, “the whole is greater than the sum of the parts.” Simply put, during the perception process humans do not organize individual, low-level, sensed elements, but sense more complete “packages” that represent objects or patterns. In his recent book [6], Hoffman presents a compelling discussion of how our perceptual processes identify curves and silhouettes, recognize parts, and group them together into objects. Numerous aspects of the image influence our ability to perceive the parts and objects, including similarity, continuity, symmetry, proximity, and closure. For example, it is easier to perceive objects that are bounded by continuous curves than those that contain abrupt changes [17].
The vertical proximity of the river currents makes it easy for users to judge the relative width of currents and thus the relative strength of the themes. Similarly, symmetry around the horizontal axis of the river, a current, or group of currents makes it easier for users to perceive flow patterns and changes. Widths of currents combine to show cumulative widening and narrowing, representing changing strength for the selected set of themes as a whole.
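The way current widths combine into the overall river width can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the prototype's drawing algorithm: the function name `layout_currents` and the sample strengths are hypothetical, and the sketch assumes the river is centered on a horizontal axis at y = 0 in every time slice.

```python
def layout_currents(strengths):
    """Stack theme strengths into currents centered on the river axis (y = 0).

    strengths: dict mapping theme name -> list of strengths, one per time slice.
    Returns a dict mapping theme name -> list of (lower, upper) edges per slice.
    """
    themes = list(strengths)                 # fixed order keeps currents recognizable
    n_slices = len(strengths[themes[0]])
    edges = {theme: [] for theme in themes}
    for t in range(n_slices):
        # The river width at slice t is the sum of all current widths.
        total = sum(strengths[theme][t] for theme in themes)
        lower = -total / 2.0                 # center the river on the axis
        for theme in themes:
            upper = lower + strengths[theme][t]
            edges[theme].append((lower, upper))
            lower = upper                    # next current sits directly on top
    return edges


# Hypothetical strengths for three themes over three time slices.
sample = {"soviet": [2, 1, 4], "oil": [1, 3, 1], "cane": [3, 2, 1]}
layout = layout_currents(sample)
```

Each current keeps the same vertical order in every slice, so a theme's width can be read off as the distance between its two edges, and the outermost edges give the cumulative width of the selected themes.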
Values for theme strength can be calculated in various ways. For example, they might represent the number of documents containing the word. Because the river loses its continuity and structure if there are too few or too many themes, we created several theme subsets for exploration.
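As one concrete reading of the document-count measure mentioned above, the following Python sketch counts, for each time slice, how many documents contain each theme word. It is an assumption-laden illustration: the function name `theme_strengths`, the (time slice, text) input format, the simple whitespace tokenization, and the miniature corpus are all hypothetical and are not taken from the prototype.

```python
from collections import Counter


def theme_strengths(documents, themes):
    """Count, per time slice, the documents that contain each theme word.

    documents: iterable of (time_slice, text) pairs, e.g. ("1960-06", "...").
    themes: iterable of lowercase theme words.
    Returns a dict mapping theme -> Counter of {time_slice: document count}.
    """
    counts = {theme: Counter() for theme in themes}
    for time_slice, text in documents:
        words = set(text.lower().split())    # each document counts once per theme
        for theme in counts:
            if theme in words:
                counts[theme][time_slice] += 1
    return counts


# Hypothetical miniature corpus.
docs = [
    ("1960-06", "Soviet trade and oil"),
    ("1960-06", "soviet aid soviet ships"),
    ("1960-07", "cane harvest and oil"),
]
strengths = theme_strengths(docs, ["soviet", "oil", "cane"])
```

A theme that does not occur in a time slice simply has a count of zero there, which corresponds to a current that disappears and later reappears.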
We have implemented a proof-of-principle prototype and used it to explore data from multiple sources. Figure 1 portrays data from a collection of speeches, interviews, articles, and other text associated with Fidel Castro. The visualization includes the river, a timeline below the river, and markers for related historical events along the top. With ThemeRiver, users may
•display topic and event labels
•display time and event grid lines
•display the raw data points
•choose among drawing algorithms for the currents and river.
Users may also display the associated time or theme name by simply moving the mouse across the image. In addition, users may pan and zoom to see other time periods or parts of the river and to see more detail or broader context. In this sample data set, we found several interesting correspondences between themes and events, such as the expansion of the “oil” theme just before Castro confiscated American oil refineries (see Figure 1).
3. Related Work
Many systems include features for viewing time. One common method is to show discrete time slices. For example, in the Spatial Paradigm for Information Retrieval and Exploration (SPIRE) Galaxy visualization [18], users may choose to progressively step through time, showing only the icons for documents originating within each specified time period. Another common approach is to show time as an attribute of documents, as done in Virginia Tech’s Envision system, which lets users map various metadata values, including date, to x-axis, y-axis, or color, shape, or size graphical encodings [13].
More similar to ThemeRiver in intent are systems that focus directly on time. The LifeLines system, developed jointly by the University of Maryland and IBM, has been used to visualize medical records and juvenile criminal records [14, 15]. The visualization displays time along the x-axis and uses the y-axis to categorize events. Bars depict duration for a given event, and graphical attributes such as color show event attributes. TmViewer uses a similar approach, adding the ability to show parent-child relationships with lines between related time bars [10]. The DIVA system [12] uses animation to show how particular measured values change in relation to the temporal flow of a video. To help groups collaborating to create a document or other artifact, the Timewarp system developed at Xerox PARC [2] lets users view and edit multiple timelines of the changing state of that artifact. The metaphor used is similar to a state diagram, with lines connecting state nodes and branches. Additional work on timelines includes Karam’s [7] and Kullberg’s [9].
We know of no other systems that use the river metaphor to depict the passage of time. However, Tufte [16] presents a similar idea in an artist’s illustration showing trends in music. In that illustration, width represents sales and proximity indicates influence of preceding styles. Our work differs in several aspects, such as the use of color, the inclusion of contextual events, and the ability to generate the visualization automatically from a potentially very large collection of documents.
4. Usability Evaluation
Early in ThemeRiver’s development, we carried out a simple formative usability evaluation with two users. Questions we wanted to answer with this evaluation included
•Do users understand the metaphor?
•Can they identify themes that are more often discussed?
•Does the visualization help them raise new questions about the data?
•Do they interpret details of the visualization in ways we had not expected?
•How does their interpretation of the visualization differ from that of a histogram showing the same data?
The data were the Castro collection described above, focusing on the years 1960-1963. We represented the same data both in ThemeRiver and in a histogram that we created using a spreadsheet (see Figure 2). We made the content of the histogram as similar as possible to ThemeRiver’s. For example, the histogram depicted thematic content by months, using the same values that drive ThemeRiver. The month timeline was shown along the bottom, and we added an event line to the histogram like the one in ThemeRiver.
Usability evaluation began with a brief explanation of the purpose of the session, followed by an introduction to the data. Both participants viewed the data in both visualizations; one participant started with the histogram and one with ThemeRiver. We asked each participant questions about what they observed in each display.
Examples of specific questions include
•In July ’62, what are the three most discussed themes?
•Where is a new theme introduced?
Examples of more general questions include
•What looks interesting here? What do you want to explore?
•How would you like to change or manipulate the view?
We captured verbal protocol during this discussion. At the end, we asked participants to complete a short questionnaire, with feedback about the visualization and possible enhancements.
From the verbal protocol and from user behavior, we observed that the users had no difficulty in understanding the metaphor. They were able to identify themes that were strongly represented and able to understand the relationship between the width of the current and theme strength. The visualization also triggered questions about the reasons behind certain theme strengths and patterns. For exploratory visualizations, this is a good result; we believe that a visualization should help the user identify questions of interest to explore.
Questionnaire responses showed that users found ThemeRiver easy to understand. They also found ThemeRiver useful, particularly for identifying macro trends. They told us that it was less useful for identifying minor trends because the curves tend to de-emphasize very small values. We asked about the value of the river metaphor, and users rated it highly as well. They observed that the connectedness of the river helped them follow a trend more easily over time than in the histogram; this result is compatible with the perception principles described by Ware [17].
Figure 2: Like ThemeRiver™ in Figure 1, this histogram uses the Castro collection data and depicts changes in thematic content over time.
Users liked some features of the histogram and recommended adding them to ThemeRiver. One such feature is the ability to see the numeric values that drive the histogram and river currents. One user expressed more trust in the histogram, because she “knew” that the bars were exactly the data values, whereas she was not sure exactly what the data values were in ThemeRiver. Her point is a valid one, especially because the curved lines of ThemeRiver do require that we interpolate between data points to produce the curves. We have added the capability for users to see the exact data points on demand.
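To make the interpolation concern concrete, the sketch below shows one simple way to draw a smooth curve that still passes exactly through every data point. It is only an illustration: the paper does not state which interpolation the prototype uses, so the cosine blend, the function name `cosine_interpolate`, and the sample widths are assumptions.

```python
import math


def cosine_interpolate(values, steps):
    """Return a smooth curve through `values` with `steps` segments per interval.

    Each original data point is reproduced exactly; intermediate points are
    blended with a cosine ease so the curve has no abrupt corners.
    """
    curve = []
    for left, right in zip(values, values[1:]):
        for i in range(steps):
            fraction = i / steps
            weight = (1 - math.cos(math.pi * fraction)) / 2   # 0 at left, 1 at right
            curve.append(left + (right - left) * weight)
    curve.append(values[-1])                 # close the curve on the last data point
    return curve


widths = [2.0, 1.0, 4.0]                     # hypothetical current widths per slice
smooth = cosine_interpolate(widths, steps=4)
```

Only every `steps`-th point of the result is measured data; the points in between are produced by the blend, which is why displaying the raw data points on demand addresses the user's concern.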
Although users liked the abstraction to the whole collection and thus away from individual documents, both users suggested adding features to access documents if desired. They wanted the ability to see the total number of documents during any time period and to get the text of each document on demand. They wanted to select a current and see the documents that contributed to it.
Users also wanted the ability to reorder the theme currents. Options they discussed included user-defined ordering and ordering by correlation, so that themes appearing together in the documents would be nearby in the river.
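Ordering by correlation is described here only as a user suggestion, so the following Python sketch is one possible interpretation rather than a feature of the prototype. It assumes Pearson correlation between theme strength series and a greedy placement strategy; the function names `pearson` and `order_by_correlation` and the sample series are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def order_by_correlation(strengths):
    """Greedily order themes so each one follows its most correlated neighbor."""
    remaining = list(strengths)
    ordered = [remaining.pop(0)]             # start from the first theme given
    while remaining:
        last = strengths[ordered[-1]]
        best = max(remaining, key=lambda theme: pearson(last, strengths[theme]))
        remaining.remove(best)
        ordered.append(best)
    return ordered


# Hypothetical strength series per theme.
series = {
    "kuwait": [0, 1, 5, 9],
    "budget": [6, 5, 2, 1],
    "baghdad": [1, 2, 6, 8],
    "deficit": [5, 6, 1, 2],
}
order = order_by_correlation(series)
```

In this example the themes that rise together ("kuwait" and "baghdad") end up adjacent, as do the two that decline together. A greedy pass is cheap but does not guarantee the best overall ordering.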
5. Interactions and Sample Usage
Based on usability evaluation results, we added a number of features to combine the best of both the river metaphor and histogram capabilities. This section presents a sample usage scenario, illustrating the capabilities of the current version.
We used ThemeRiver to explore the 1990 Associated Press (AP) newswire data from the TREC5 distribution disks, a set of over 100,000 documents (see Figure 3). To explore the selected themes in this collection, a user might begin with a high-level survey of the visualization by panning along the course of the river. The user might look for wider currents that signal heavy use of a topic, such as the one for “baghdad” in Figure 3. Changes in the color distribution of the river signal changes in themes. We see such a change in August 1990, when the “kuwait” current, which had vanished in late July, suddenly appears and rapidly widens. The user could also look for narrow currents in the river that signal relatively light use of particular themes.
In an earlier paper, Hetzler et al. [5] explored the AP data set with a variety of our visual analysis tools, focusing on large theme changes surrounding the Iraqi invasion of Kuwait on August 2. ThemeRiver also reflects the large theme changes. Near the right side of Figure 3, we see several currents that expand dramatic-
Figure 3: AP data from July - August 1990. A wide current in the river indicates heavy use of a topic, while changes in color distribution correlate to changes in themes.