Enhancing Visual Exploration by Appropriate Color
Coding
Petra Schulze-Wollgast
University of Rostock
Institute for Computer Science
A.-Einstein-Str. 21
18059 Rostock, Germany
psw@informatik.uni-rostock.de
Christian Tominski
University of Rostock
Institute for Computer Science
A.-Einstein-Str. 21
18059 Rostock, Germany
ct@informatik.uni-rostock.de
Heidrun Schumann
University of Rostock
Institute for Computer Science
A.-Einstein-Str. 21
18059 Rostock, Germany
schumann@informatik.uni-rostock.de
ABSTRACT
Visualization is an effective means for exploring and analyzing complex data. Color coding is a fundamental technique for mapping data to visual representations. Although color coding is widely used in a large variety of visualizations, it is often provided only in a limited way or is not used effectively. Therefore, we describe in this paper how appropriate (automatic) color coding can enhance the visual exploration of spatial-temporal data. We demonstrate our techniques with a system for visualizing human health data by means of choropleth maps. Furthermore, we focus on how to use color coding for facilitating comparison tasks in visualization.
Keywords
Visualization, Color Coding, Perception-Based Color Scales, Comparison.
1. INTRODUCTION
Visualization is an effective means for exploring and analyzing complex data, and color plays an important role in it. Color coding is a fundamental technique for mapping data to visual representations. Although color coding is widely used in a large variety of visualizations, it is often provided only in a limited way or is not used effectively. Furthermore, adapting color scales automatically by applying a simple minimum-maximum scaling often results in visual representations of different views on the data that cannot be compared with each other. Therefore, we describe in this paper how appropriate automatic color coding can enhance the visual exploration of spatial-temporal data. This is achieved by taking into account:
• Perception-based color schemes,
• User aims, and
• Characteristics of the data.
We use perception-based color schemes suggested in [Bre94] and [Ber95], which have proven to be effective. From a collection of such schemes we choose the most appropriate one with respect to the users’ visualization goal. In this context, we focus on comparison: on the one hand, comparison is a major visualization task in interactive data exploration; on the other hand, it is not sufficiently supported by most existing color-based visualization systems. By considering data characteristics, we are able to combine our color scale legends with Box-Whisker plots, which gives users more insight into the data. We demonstrate our techniques on color-coded maps that represent human health data.
The paper is structured as follows. In Section 2 we give a general overview of color scales, describe problems regarding the use of color in visualization, and give some guidelines on how color can be used efficiently. Known approaches addressing these issues are reviewed before presenting our approach in Section 3. In Section 4 we describe a system for visualizing spatial-temporal human health data on maps and show how our approach can enhance the visual exploration of such data. Section 5 concludes the paper and gives an outlook on future work.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
WSCG 2005 conference proceedings, ISBN 80-903100-7-9
WSCG’2005, January 31-February 4, 2005
Plzen, Czech Republic.
Copyright UNION Agency – Science Press
2. COLOR SCALES
Color is a retinal variable that is very effective for mapping (abstract) data [Ber83]. This is due to the spontaneous perception of color by the human visual system. Since the interrelation between the physical phenomenon of color and its perception by the human visual system is not yet fully understood, color scales have to be chosen carefully to fully exploit the effectiveness of color-based visual representations and to avoid misinterpretations.
A color can be described as a point in a 3-dimensional color space. In technical applications, the primary colors red, green, and blue (RGB) span the dimensions of the color space. Hue, saturation, and brightness (HSB) are used as dimensions in perception-based applications. A color scale provides a range of colors varying in hue, saturation, and/or brightness. Color scales are defined by:
• A set of control points and
• A mapping function describing the transition between colors.
Control points associate parameter values with colors. They are used to add colors to a color scale according to a parameter range $t: 0.0 \le t \le 1.0$. A color scale consists of at least two control points, one for $t = 0.0$ and one for $t = 1.0$; however, more control points can be used to create more sophisticated color scales. The mapping function describes how color is interpolated between two control points. A requirement for color coding is that the value range to be displayed must be scaled to [0.0; 1.0], the range of parameter $t$.
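The definition above (control points plus a mapping function over a normalized parameter) can be illustrated with a minimal sketch. It assumes a linear mapping function and interpolation in RGB, i.e. the kind of standard scale discussed next rather than a perception-based one; the function names `normalize` and `color_at` are illustrative and not taken from the paper.

```python
from bisect import bisect_right

def normalize(value, lo, hi):
    """Scale a data value from [lo, hi] to the parameter range [0.0, 1.0]."""
    if hi == lo:
        return 0.0
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))

def color_at(t, control_points):
    """Linearly interpolate a color for parameter t.

    control_points: list of (t_i, (r, g, b)) sorted by t_i, with the
    first t_i equal to 0.0 and the last t_i equal to 1.0.
    """
    positions = [p for p, _ in control_points]
    # Index of the control point at or below t, clamped to a valid interval.
    i = bisect_right(positions, t) - 1
    i = max(0, min(i, len(control_points) - 2))
    t0, c0 = control_points[i]
    t1, c1 = control_points[i + 1]
    w = (t - t0) / (t1 - t0)
    return tuple(a + w * (b - a) for a, b in zip(c0, c1))

# Two control points: black at t = 0.0 and green at t = 1.0.
black_green = [(0.0, (0.0, 0.0, 0.0)), (1.0, (0.0, 1.0, 0.0))]
t = normalize(1000, 0, 2000)
print(color_at(t, black_green))  # (0.0, 0.5, 0.0)
```

Adding further control points to the list yields multi-hue scales without changing the interpolation code.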
Figure 1 A standard RGB-based color scale and a rainbow color scale.
Nowadays, visualizations mainly use one of two kinds of color scales. On the one hand, color scales are created by linearly interpolating between two colors from the RGB color space (e.g. black at $t = 0.0$ and green at $t = 1.0$; cp. Figure 1). On the other hand, a rainbow color scale is used. This scale contains all spectral colors from blue to red (i.e. the colors appearing in a rainbow). Though these scales are intended to map colors linearly to a scalar value range, the resulting color scales are not perceived as linear. Quite the contrary, users perceive differently sized regions in the color scale, which vary not only in hue but in saturation and brightness as well (cp. Figure 1). Therefore, providing only standard color scales for visualization is not sufficient.
In the literature, several approaches are known that address the creation of color scales or give guidelines for the use of color. Brewer describes the use of color for mapping data on cartographic maps [Bre94, Bre99]. In this regard, binary, qualitative, sequential, and diverging color schemes are described. These schemes are based on human perception and are, therefore, a good basis for creating effective visualizations. Additionally, the suggested color schemes have been evaluated on different output devices such as CRT screens, TFT displays, and LCD projectors. However, all these scales are aimed at categorized (i.e. segmented) maps; special scales for continuous quantitative data are not provided. Such scales can be found in PRAVDAColor [Ber95], developed by Bergman et al. The authors describe a rule-based mechanism for supporting users in choosing appropriate color scales. In order to decide which scale fits best, data type (ratio or interval data), spatial frequency (low or high), and representation task (isomorphic representation, segmentation, highlighting) are taken into account. Figure 2 shows what color scales for different visualization tasks may look like.
Figure 2 Color scales for isomorphic representation (left), segmentation (middle), and highlighting (right).
Though these works strongly support developers in designing and users in choosing effective color scales for visualization, most of today’s visualizations barely utilize them. Another aspect that is still underestimated is the dependency on properties of the data (e.g. the distribution of data values). Though a general solution is hard to find, if it is possible at all, an integration of known concepts complementing one another is a vital step toward further facilitating the use of colors in data representation. A modern visualization system, therefore, should provide all of the following: potentially effective color scales, methods for choosing and adapting color scales according to data characteristics, and intuitive interactive tools that enable users to adapt color scales according to their needs.
3. ENHANCED COLOR CODING
In this section we review factors influencing color scaling. Based on this, we introduce our approach for enhancing the effectiveness of color-based visualizations.
3.1. Factors influencing color scaling
Creating effective color scales for data visualization is not a trivial task. There are no general guidelines or methods for choosing color scales automatically. This is due to the complexity of the factors influencing the decision for a concrete color scale.
We identified 3 main categories of factors:
•Data properties,
•Visualization goal, and
•General context.
In the following, we look at these categories in more detail. Regarding data properties we distinguish two subcategories: scaling of the value range and statistical characteristics.
The scaling of the value range is considered for each variable within a data set. Data variables can be of nominal, ordinal, or quantitative scaling. While for nominal variables no ordering of data values is given, ordinal variables comprise an order of the data values. This has to be considered for the visualization. To be effective, color scales for nominal variables must NOT imply an ordered perception of the colors used, whereas color scales for ordinal variables must. Furthermore, quantitative variables allow for distances between data values. Therefore, perceptual distances within a color scale must reflect distances in the data. Moreover, a color scale should reveal whether a data variable contains a special zero value sectioning the value range (i.e. ratio data).
By considering statistical properties of a variable (e.g. element count, minimum, maximum, average, mean, quartiles, etc.) the effectiveness of color scales can be improved. This can be realized mainly by adapting the control points or the mapping function rather than by choosing certain colors.
The goal a user wants to achieve when using visualization has much influence on the choice of color scale. Bergman et al. [Ber95] differentiate between isomorphic representations (the user seeks an exact image of the data), representations for segmentation (the user intends to detect segments within the data), and representations for highlighting (the user is interested in particular values). Besides these goals, a variety of further tasks are possible (e.g. comparison, detection of correlations or clusters, etc.). We are especially interested in comparison tasks. Since comparison is one of the main tasks in data exploration, we do not limit our considerations to intercomparison within a single visual representation but also discuss the comparison of different representations of varying portions of the data (e.g. different time steps or different regions).
The third category of influencing factors is related to the general context. Here we identified the following aspects: perception of color, colorblindness, output device, and user preferences.
Regarding color perception, the capabilities of the human eye can be considered. Though this issue is hard to grasp due to the complexity of the human visual system, some aspects of perception can still be considered when designing color scales. For example, it is possible to take visual resolution into account, i.e. the degree to which small and differently colored spatial structures can be visually distinguished. Furthermore, color scales could be enhanced by paying attention to adaptation mechanisms of the eye (e.g. a negative afterimage of what has previously been seen remains on the retina for a short time), to the relation of color and size (i.e. differently sized objects of the same color are perceived as differently colored), and to time-dependent color perception (i.e. perception changes over time and under different environmental lighting); these aspects, however, are hard to integrate into a real system. In contrast, colorblindness and the addressed output device can easily be taken into account by providing parameters that users can adjust. Even if users do not know whether they are colorblind, simple tests can reveal this (cp. [Mey88]).
By considering user preferences (e.g. favorite color), visual analysis can be enhanced in general. When visualization is used to communicate facts found in the data among users of different cultural backgrounds, the relevance of user preferences increases even further. More concretely, colors may be associated with different things in different cultures. The color red, for instance, is associated with danger in Germany, with death in Egypt, with life in India, and with happiness in China. This example underlines the difficulty of solving the general color coding problem.
In order to assess all the mentioned factors, we distinguish mandatory and optional factors (cp. Table 1). This distinction is based on the relation between the effort for integrating a factor into a real system and the resulting benefit for the effectiveness of the color scales.
Table 1: Factors influencing color scaling.
Mandatory factors              Optional factors
Scaling of the value range     Colorblindness
Statistical characteristics    Output device
Visualization goal             Cultural environment
                               Visual resolution
                               Adaptation mechanisms
                               Relation of color and size
                               Time dependent color perception
                               Interaction of different colors
3.2. Automatic color coding
We have developed our approach (cp. [Rut03] for a detailed description) based on previous work by Brewer [Bre94, Bre99] and by Bergman, Rogowitz, and Treinish [Ber95, Rog96, Rog98]. Namely, we use a collection of color scales suggested in these publications. It is important to mention that all of these color scales are perception-based and are, therefore, potentially effective for visualization tasks. Furthermore, we follow [Ber95] in using their rule-based approach. Depending on the scaling of the value range (nominal, ordinal, quantitative, or ratio data) and the user’s visualization goal (isomorphic, segmentation, or highlighting), the most suitable color scale is chosen. We extend the approach of Bergman et al. [Ber95] by:
• Extracting statistical metadata from the data set,
• Adapting the chosen color scale according to the metadata, and
• Creating an expressive legend for the chosen color scale.
Extracting metadata. Statistical characteristics can be easily extracted from a data set. We use average, median, mode, minimum, maximum, skewness, and quartiles as metadata for the automatic adaptation of the mapping function for a chosen color scale. While all of these statistics can be calculated for quantitative data, only median, mode, and quartiles can be determined for ordinal data. In the case of nominal data, mode is the only characteristic considered.
Adapting the color scale. Based on the metadata, we allow for the following automatic adaptations of a chosen color scale:
• Expansion of the mapped value range,
• Adjustment of control points, and
• Alteration of the mapping functions.
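The metadata extraction step for a quantitative variable could look roughly like the following sketch. It relies only on Python's standard `statistics` module; the function name `extract_metadata`, the dictionary keys, the population form of the skewness estimate, and the "inclusive" quartile method are assumptions made for illustration, since the paper does not state which estimators were used.

```python
import statistics

def extract_metadata(values):
    """Return descriptive statistics for one quantitative variable."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    # Population skewness (third standardized moment); 0.0 if all values are equal.
    if sd == 0:
        skewness = 0.0
    else:
        skewness = sum((v - mean) ** 3 for v in values) / (n * sd ** 3)
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": n,
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "median": median,
        "mode": statistics.mode(values),
        "q1": q1,
        "q3": q3,
        "skewness": skewness,
    }

meta = extract_metadata([1, 2, 2, 3, 10])
print(meta["median"], meta["mode"], meta["skewness"] > 0)
```

For ordinal or nominal variables, only the subset of entries named in the text (median, mode, and quartiles, or mode alone) would be meaningful.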
Value range expansion is used to create an adequate value range for the color scale mapping. Additionally, the lower and upper bounds of the value range are intended to be intuitively comprehensible. When considering dynamic data sets, the lower and upper bounds of the same variable might change. Especially in this context, value range expansion allows for coherent visualization. To realize value range expansion, the lower and upper bounds are calculated from the minimum and maximum values of a variable. An example (cp. Figure 3) is the expansion of a variable’s range from 225 to 1778 to a range of values reaching from 0 to 2000.
Figure 3 Range expansion is used to increase the comprehensibility of color scales.
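The paper does not spell out the rounding rule behind value range expansion. One plausible rule that reproduces the example above is to round the minimum down and the maximum up to multiples of the largest power of ten not exceeding the span of the data. The sketch below implements that assumed rule; `expand_range` is a hypothetical name.

```python
import math

def expand_range(lo, hi):
    """Expand [lo, hi] to bounds that are multiples of a power of ten.

    Assumed rule: the step is the largest power of ten that does not
    exceed the span hi - lo.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    step = 10 ** math.floor(math.log10(hi - lo))
    return math.floor(lo / step) * step, math.ceil(hi / step) * step

print(expand_range(225, 1778))  # (0, 2000)
```

Other "nice number" schemes (for example steps of 1, 2, or 5 times a power of ten) would give tighter bounds; the choice here only aims at bounds that are easy to read on a legend.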
The adjustment of control points is mainly used for improving color scales for ratio data as well as for segmentation and highlighting scales. Color scales for highlighting tasks are adjusted by setting a special control point (i.e. a control point denoting the value to highlight) according to the average, median, or mode of a variable. For ratio data it is also possible to consider a “real” zero when adjusting control points. Color scales for segmentation can be adjusted based on quartiles. With respect to the number of segments to be differentiated, quartiles are calculated such that each contains the same number of values. The control points of the color scale are then positioned according to the quartiles. This results in a color scale that supports the user in detecting similar regions within the data. Figure 4 depicts how a standard segmentation color scale (for 4 segments) can be adapted according to quartiles.
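The quartile-based adjustment for segmentation scales can be sketched as follows: the data are cut into equally populated groups, and the cut values are converted to positions in the parameter range [0.0, 1.0], where the control points are then placed. The function name `segment_control_points` and the "inclusive" quantile method are assumptions for illustration.

```python
import statistics

def segment_control_points(values, segments=4):
    """Return control point positions in [0, 1] for equally populated segments."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("values must not all be equal")
    # Cut points that split the data into `segments` groups of equal size.
    cuts = statistics.quantiles(values, n=segments, method="inclusive")
    bounds = [lo, *cuts, hi]
    return [(b - lo) / (hi - lo) for b in bounds]

# One outlier (100) would otherwise compress the remaining values.
print(segment_control_points([0, 1, 2, 3, 4, 5, 6, 7, 100]))
```

In this example three of the four segments cover only the lowest 6% of the value range, which is where most of the values lie.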
An alteration of the mapping function might become necessary for data with certain value distributions. Usually, the mapping function performs a linear interpolation between the colors associated with the control points. Since the colors are perception-based, this is effective and the structure of the data is accurately represented. However, linear interpolation leads to problems if the values of a variable are not uniformly distributed (e.g. if outliers are present). In this case a wide range of the color scale represents a small number of values, and the majority of values has to make do with only a narrow range of the color scale. Similar problems are known from the field of computer vision, where histogram equalization is used to handle unfavorable color distributions in images. We deal with this problem by means of nonlinear mapping functions. In order to decide how to adjust the mapping function, we take the skewness of the value distribution into account. Depending on whether a variable has positive or negative skewness, we apply an exponential or logarithmic mapping function. By doing so, we “stretch” the range of colors used for the majority of data values. The visualization is improved in that differences between values from the range of high value density can be detected more easily (cp. Section 3.3 for examples). It is very important that the color legend clearly reveals the color scale as exponential or logarithmic. Otherwise, misinterpretations are inevitable.
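A skewness-driven choice of mapping function might be sketched as below. The concrete functional forms, the curvature parameter `k`, and the skewness threshold are assumptions, not values from the paper. The sketch fixes only the direction of the stretch: for positive skewness (most values near the minimum) it uses a concave transfer that spends more of the color scale on low values, and for negative skewness the inverse, convex transfer. Whether such a function is labeled "exponential" or "logarithmic" depends on whether it is read from value to color or from color to value.

```python
import math

def stretch_low(t, k=9.0):
    """Concave transfer on [0, 1]: more of the scale is used for low values."""
    return math.log1p(k * t) / math.log1p(k)

def stretch_high(t, k=9.0):
    """Convex transfer on [0, 1] (inverse of stretch_low): more for high values."""
    return math.expm1(t * math.log1p(k)) / k

def identity(t):
    return t

def choose_mapping(skewness, threshold=0.5):
    """Pick a transfer function for the normalized parameter t from the skewness."""
    if skewness > threshold:
        return stretch_low
    if skewness < -threshold:
        return stretch_high
    return identity

mapping = choose_mapping(1.8)  # strongly right-skewed variable
print(mapping(0.1) > 0.1)      # low values are pushed further along the scale
```

The transformed parameter is then passed to the usual interpolation between control points, and the legend ticks have to be placed with the same function so that the nonlinearity is visible to the user.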
Figure 4 Adaptation of a segmentation color scale according to quartiles. Note that on the adapted scale (right) the majority of values (50% of the value range) are exactly encoded by the two green segments.
Figure 5 A color legend enhanced with a Box-Whisker plot.
Creating a color legend. In order to make the data and the color-coded visualization easy to comprehend, a color legend has to be provided. The color legend should show all colors used and an additional scale that allows characteristic data values to be associated with a color. By using value range expansion, we ensure that the scales of our color legends represent characteristic data values. Moreover, we attach a Box-Whisker plot to the color legend (inspired by [And01]). Such a color legend is presented in Figure 5. This gives users better insight into the distribution of data values.
3.3. Considering comparison tasks
Comparison is an essential task when visualization is used for data exploration and analysis. In the literature, only a few publications explicitly focus on this essential task. Therefore, we describe comparison in more detail.
Comparison is used to assess characteristics of the data. Before comparing, an aspect of the data has to be chosen with respect to which the comparison is performed. A requirement for the comparison of objects is the equality of the objects according to a common basis (e.g. objects from the same domain). If this requirement is satisfied, the objects are comparable. Then an object of reference has to be chosen. Other objects can then be compared with respect to the reference object. This means that the process of comparison consists of four steps:
1. Choose an aspect for comparison
2. Check for comparability
3. Choose a reference object
4. Check for equality or differences.
Comparison facilitates 3 basic tasks regarding the assessment of data characteristics. Depending on the scaling of the value range, the following basic tasks can be performed. For nominal variables only equality or inequality can be checked ($a = b$; $a \neq b$), for ordinal variables the ordering of objects can be determined ($a < b$; $a > b$), and for quantitative data the amount of difference can be detected ($a = b + \Delta$).
The duration of comparability is another aspect that has to be taken into account. Regarding this aspect we distinguish:
• Intercomparison in one single visualization,
• Comparison in one single visualization session,
• Comparison among multiple visualization sessions, and
• Long term comparisons.
The difficulty of supporting comparison tasks by appropriate color coding increases with the duration of comparability. If the data is represented in only one view, a single effective color scale has to be created (cp. Section 3.2). To support comparison in one visualization session, a single color scale should be used during the whole session. This ensures that visual representations created later in a session can be compared to views that have already been analyzed. Human color memory is notoriously poor, which makes comparisons across different visual representations difficult. To alleviate the problem of comparison among multiple visualization sessions and long term comparison, it is necessary to save the color scales used in one session and import them into another session. Moreover, a special “color memory” could be used. This means that the visualization system remembers which color scales have been used to visualize which variables. This memory can then be used to ensure that the same variables are encoded using the same color scales (i.e. ensure visual continuity) and to avoid using the same color scale for different variables (i.e. avoid visual misinterpretations). By doing so, a binding between color scale and encoded variable is established in the users’ mind. Since this procedure is limited to a rather small number of variables (6-8), a “color memory” should be created for each data set. By using the concept of “color memory”, we support comparison tasks among different visualization sessions and facilitate long term visualization tasks.
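A "color memory" of this kind can be thought of as a small persistent mapping from variables to color scales. The sketch below is one possible shape for it, assuming color scales are identified by name and the memory is stored as JSON between sessions; the class name, method names, scale names, and variable names are all hypothetical.

```python
import json

class ColorMemory:
    """Remembers which color scale encodes which variable of one data set."""

    def __init__(self, scales, assigned=None):
        self._assigned = dict(assigned or {})
        # Scales already bound to a variable are not handed out again.
        self._free = [s for s in scales if s not in self._assigned.values()]

    def scale_for(self, variable):
        """Return the scale bound to the variable, binding an unused one if needed."""
        if variable not in self._assigned:
            if not self._free:
                raise LookupError("no unused color scale left for " + variable)
            self._assigned[variable] = self._free.pop(0)
        return self._assigned[variable]

    def dumps(self):
        """Serialize the bindings so they can be imported in a later session."""
        return json.dumps(self._assigned)

    @classmethod
    def loads(cls, scales, text):
        return cls(scales, json.loads(text))

scales = ["blues", "reds", "greens"]
memory = ColorMemory(scales)
print(memory.scale_for("influenza"), memory.scale_for("measles"))  # blues reds

# A later session restores the bindings and keeps them stable.
later = ColorMemory.loads(scales, memory.dumps())
print(later.scale_for("measles"))  # reds
```

Raising an error once the scales are exhausted reflects the remark that the procedure only works for a small number of variables per data set.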
For visualization of spatial-temporal data on maps the following specific comparison tasks can be refined:
1. Comparison of different regions of a map.
2. Comparison of different time steps of a data set.
3. Comparison of different variables of a data set.
Intercomparison of regions of the map is supported by adjusting the mapping functions for variables with positive or negative skewness. This can be seen in Figure 6. When using a linear mapping, the regions of the island Rügen seem to have equal values. However, when using exponential mapping, users can clearly detect that differences exist. Note that the attached Box-Whisker plots give an idea of the distribution of the data values.
For the comparison of different variables or time steps, multiple view techniques are suitable, where each of the views shows a map color coding one time step or one variable. However, if more than one view is used, the construction of a color scale is more difficult, because the color scale has to be effective not only for a single view but for all views.
When comparing different time steps, two opposing goals exist. On the one hand, a global color scale can ensure comparability among the views, but regions within a single view might become hard to distinguish. On the other hand, local color scales can encode each view effectively, but comparison of time steps is hardly possible. In order to alleviate this problem, we collect metadata for the entirety of the data represented in the views to be compared and select the most suitable color scale. Furthermore, value range expansion is applied to the data range common to all views. Based on this, we set a global color scale for all views. Additionally, each view is equipped with its own local color legend, including a Box-Whisker plot representing the statistical characteristics of the respective view.
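The combination of one global color scale with per-view legends can be sketched as follows. The sketch assumes each view is simply a list of values keyed by a label such as a time step; `shared_scale` and the dictionary keys are illustrative names, and value range expansion of the global bounds is left out for brevity.

```python
import statistics

def shared_scale(views):
    """Build one global normalization plus a local summary per view.

    views: dict mapping a view label (e.g. a time step) to its data values.
    """
    all_values = [v for values in views.values() for v in values]
    lo, hi = min(all_values), max(all_values)

    def normalize(x):
        # The same mapping is used in every view, so colors stay comparable.
        return (x - lo) / (hi - lo)

    legends = {}
    for label, values in views.items():
        q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
        # Five-number summary that a local Box-Whisker plot would display.
        legends[label] = {"min": min(values), "q1": q1, "median": median,
                          "q3": q3, "max": max(values)}
    return normalize, legends

views = {"week 1": [10, 20, 30], "week 2": [40, 50, 110]}
normalize, legends = shared_scale(views)
print(normalize(60), legends["week 1"]["median"])  # 0.5 20.0
```

Each legend would be drawn over the same global color band, with its own box and whiskers marking where that view's values actually fall.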
In order to support the comparison of different variables, the described “color memory” can be used. Moreover, if the variables to be compared have similar value ranges, it is also possible to generate a global color scale and to provide local color legends (analogous to time step comparison).
Figure 6 Using an exponential mapping function supports comparison of different regions.
4. COLOR CODING FOR VISUALIZING HUMAN HEALTH DATA ON MAPS
The human health data we consider consist of the number of cases for a variety of diseases. The data depend on time (i.e. number of cases per week) and space (i.e. number of cases in different regions). The system TeCoMed (TeleConsultation for Medics) [Sch03] has been developed for visualizing such data via the Internet. TeCoMed provides rich functionality for interactively selecting diagnoses, time steps (e.g. day, week, month, quarter, and year), and geographical regions.
A variety of concepts for visualizing human health data according to their spatial and temporal dependencies at different levels of granularity have been realized. Among the visualization facilities, color-coded maps (i.e. choropleth maps) are a key feature of the system. Our approach of choosing appropriate color scales automatically has been integrated into TeCoMed in order to improve the effectiveness of color coding. Figure 7 shows the integration of the approach into the architecture of the system TeCoMed.
The metadata extraction performs a statistical analysis of the data to be visualized. Since our human