A Change of Theme: The Role of Generalization in Thematic Mapping

Cartographic generalization research has focused almost exclusively in recent years on topographic mapping, and has thereby gained an incorrect reputation for having to do only with reference or positional data. The generalization research community needs to broaden its scope to include thematic cartography and geovisualization. Generalization is not new to these areas of cartography, and has in fact always been involved in thematic geographic visualization, despite rarely being acknowledged. We illustrate this involvement with several examples of famous, public-audience thematic maps, noting the generalization procedures involved in drawing each, both across their basemap and thematic layers. We also consider, for each map example we note, which generalization operators were crucial to the formation of the map’s thematic message. The many incremental gains made by the cartographic generalization research community while treating reference data can be brought to bear on thematic cartography in the same way they were used implicitly on the well-known thematic maps we highlight here as examples.


Introduction
People outside the discipline and students early in their studies are often surprised to learn that mapmakers willfully filter and modify the information included on any given map in the interest of clarifying the overall message. Omission of things and places on maps is perhaps one of the first things people become aware of when they learn this, giving them a new appreciation of the saying about how some event caused some place to be "put on the map". They may further learn that roads, coastlines, or borders are actually more complex than the lines drawn for them, and might even be displaced from their actual location to avoid symbol overlaps. These sorts of disparities between reality and maps are often presented in teaching alongside more deliberate or even nefarious techniques for obscuring the truth [1]. However, aside from mistakes or intentional falsehood [2], we know that these disparities with reality are caused by generalization, and that this is entirely necessary for human understanding and the limitations of graphic resolution at map scales, among other reasons. This highlights a truth that seems so obvious to cartographers that we seem to overlook it completely: generalization plays a role in every map.
To illustrate the ubiquity of generalization in thematic cartography, we highlight the operators (i.e., generalization procedures and transformations) involved in compiling and drawing a series of famous, public-audience thematic and navigational maps, using mostly contemporary examples. Specifically, we tag each map for generalization operators used in their creation, using the typology of operators presented by Roth, Brewer, and Stryker [3]. Our tagging is not exhaustive; we are in almost all cases inferring the operators that must have been involved for each map, though indeed other unapparent processes of generalization may have taken place. We use the common conception of a thematic map as composed of a basemap (i.e., contextual geographic information) and one or more overlaid thematic layers. For each map example, we identify which operators were employed in producing either the basemap or thematic layers, and we rank these operators as critical to supporting the map's thematic message, or incidental.
Our aim here is to reintroduce generalization in thematic cartography to scholarly cartographers so that generalization's power in general, and recent advances by the map generalization research community in particular, can be consciously brought to bear on the often very large and varied datasets today's data visualization specialists engage with. Generalization and multiple representations have always been pertinent to thematic mapping, well before our saying so now. Our thesis comes as the map generalization research community is awakening to the importance of the broad world of thematic cartography lying outside of the topographic and reference map realm it has recently been focused on.

Classifying Generalization Techniques
The elemental techniques used in generalization of geospatial data or map symbols for geographic entities are termed operators [4], and numerous typologies of these have been defined throughout recent years. Most of these have emphasized geometry: different manipulations and adjustments of spatial data resulting in shapes or groups of pixels that lose undesired or illegible detail, or are arranged together on the map plane such that their essential, if not planimetrically true positional relationships remain illustrated. We use a relatively unique typology here, namely that of Roth, Brewer, and Stryker [3] because, in addition to providing clear definitions of each operator named, this typology approaches the suite of generalization techniques more holistically in that it includes thematic filtering, changes in graphic variables, and variations in textual elements, as well as geometric changes, as parts of the generalization process, all being constrained by map resolution and legibility [5]. While this typology is less exhaustive in defining geometric operators than others (e.g., [6][7][8][9][10][11]), it nonetheless recognizes that generalization and multiple representations are achieved not only by geometric manipulation, but also in close relation to the themes being mapped and the graphic resolution at which the map is being presented. Furthermore, it reflects how generalization and symbol and style design are intertwined cartographic processes. We reproduce the Roth, Brewer, and Stryker coding scheme in Table 1; for each example map that follows, we note which of the operators in the scheme are used to create that map across both thematic and basemap layers, by presenting this in a tabular "schedule" of operators. We identify operators in each schedule as either critical or incidental to the map's thematic message. 
"Looking at the old map of the railways, it occurred to me that it might be possible to tidy it up by straightening the lines, experimenting with diagonals and evening out the distances between stations."
This map has been emulated so frequently that most of the metro or subway rail systems of the world, as well as many overland systems, now use a similar schematized geometry with consistent rail line junction angles (typically multiples of 45°), rather than representing the planimetrically-accurate rail system in question, even when that rail system is relatively simple. It even inspired a specific field of automated map design, namely map schematization [13], and cartographers have developed automated methods to produce layouts like the Tube Map [14,15]. While the popularity of this map and its style likely has much to do with aesthetics, it also illustrates the power of elucidation that generalization furnishes when done successfully. Transit maps such as Beck's retain topological relations between stations and lines, which is all a typical subway rider really needs to know when routing themselves from one station to another [16]. Lines that connect the two end points, the points where those lines intersect and allow transfers between trains, and the waypoint stations along the route are more easily read from a map such as this because they mostly lie along easy-to-follow straight lines, rather than the meandering real routes that train tracks often take. Furthermore, since actual speeds over distances or directions moved in are not within the rider's control, providing these in the form of a planimetric map does not serve the rider in this use case, and qualifies as information that is best reduced or removed (i.e., generalized) in order to lighten overall cognitive load and thereby improve usability.
In addition to obvious geometric differences between this map and the rail system it represents, content has been added as a basemap to contextualize the network (i.e., the Thames River), rail lines and station types have been reduced to consistent, typified symbols (e.g., junction stations are all diamond-shaped), and labels have been carefully placed to clarify stations along the heavily-simplified lines.
Figure 1. H. C. Beck's London subway "Tube" map design, as originally published in 1933. This design differed significantly from previous designs in that it introduced a schematized (i.e., heavily generalized) geometry, rather than present the Tube rail lines with planimetric accuracy.
Elimination (C−) plays a strong role in this map: much that Beck could have included, especially in the basemap about the London context, has been removed in favor of keeping the focus exclusively on the transit routes. Reclassification (Cc) and typification (Sf), also reflected in color-coded labels (La), are used to identify station types (i.e., interchange and not) and the lines to which stations belong; the reader quickly understands that switching from a tick-marked station named in green text to a tick-marked station in black text will involve passing through a diamond-shaped station that the green and black lines share. Most relevant to this map's renown, however, are the generalization operators that modify the rail lines' shapes to give rise to its schematized [13] character: displacement (Gd), simplification (Gs), and smoothing (Go). These operators are what take the planimetrically-correct rail routes and represent them in heavily abstracted, angular form, for the sake of yielding easily-read straight and sometimes parallel lines and orthogonal junctures. The Thames River in the basemap gets the same treatment so as to be useful as context for the rail lines.
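The angular constraint described above can be illustrated with a minimal Python sketch. It is not the method of the schematization literature cited here [13-15], which optimizes whole networks at once; it only shows, for a single segment, what restricting directions to multiples of 45° means. The function name `snap_to_octilinear` and the coordinates are hypothetical.

```python
import math

def snap_to_octilinear(start, end):
    """Rotate the segment start->end about `start` so that its direction
    becomes the nearest multiple of 45 degrees; its length is preserved."""
    dx, dy = end[0] - start[0], end[1] - start[1]
    length = math.hypot(dx, dy)
    angle = math.atan2(dy, dx)
    step = math.pi / 4                      # 45 degrees in radians
    snapped = round(angle / step) * step    # nearest allowed direction
    return (start[0] + length * math.cos(snapped),
            start[1] + length * math.sin(snapped))

# Hypothetical station positions: a nearly horizontal stretch of track
# is displaced (Gd) onto an exactly horizontal line.
new_end = snap_to_octilinear((0.0, 0.0), (10.0, 1.0))
```

Applied segment by segment, such snapping would break shared junctions, which is one reason automated schematization is usually treated as a network-wide optimization problem rather than a local operation.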

Minard's Map of Napoleon's March
One of the most celebrated maps among information visualization enthusiasts is Charles Minard's 1869 map of Napoleon's 1812 march on Moscow (see Figure 2 and Table 3). A great strength of this map is in its parsimony: it illustrates a series of events over time and space with very little extraneous information, any of which is clearly removed to the background using strongly contrasting beige and black ink for the thematic information presented in the foreground. Symbols are kept to an extreme minimum in this map: contextualizing basemap elements consist of only a few selected and simplified rivers and labeled waypoints. The resulting clarity of the graphics presented, as well as the relatively simple east-then-west movement of the marching army, allow Minard to illustrate time and travel immediately and unambiguously. The greatly simplified trajectory makes canvas space available for wide variation in line thickness, being the graphic variable used to illustrate the main theme of dwindling numbers of soldiers. Heavy generalization in this map provides major support for its rhetorical power. This map has inspired research into automated flow map creation, a challenge which invariably has to use generalization to unclutter masses of trajectories [17][18][19][20][21][22][23][24][25][26][27][28][29][30][31], and this map in particular has inspired in-depth analysis on visualization of time and movement [32].

Critical Incidental
Thematic C− Gm Gd Gs Sc Sz

Similarly to the Tube map, this map's minimalism (that is, its extreme generalization) brings its thematic message to the fore. The relatively bare canvas allows Minard to focus reader attention clearly on the story of army numbers: we see easily how the size of the army began to dwindle almost immediately after it left its initial position, losing numbers to desertion even before arriving at the front. Much detail is eliminated (C−), especially on the nearly non-existent basemap, where only a few notes about river crossings and way-point towns are added (C+), though these are usually done exclusively with text (L+). In addition, like the Tube map, the trajectory of the army's movement is heavily geometrically simplified (Gs) and displaced (Gd) onto long, straight lines, which help to make very clear Minard's use of changing line width as a graphic variable for army size. Also geometrically important is the use of merging (Gm) to illustrate subsets of the army leaving or rejoining the main group. Especially critical to the thematic message of this map are adjustments in color (Sc), used to denote the army before and after giving up their advance, and size (Sz), dramatically illustrating losses in an easily-perceived manner.

Harrison's the World Divided
Richard Edes Harrison's maps for Fortune Magazine in the mid-20th Century were famous for depicting topographic, demographic, political, and economic data around the world using unusual perspectives and map projections. One such map is The World Divided (see Figure 3, [33], and Table 4), set in a polar azimuthal equidistant projection, explained to the non-specialist audience as "the world centrifuged". Drawn for a US audience in 1941, during the period when the US had not yet entered WWII, the map had a strongly rhetorical and persuasive aim: to illustrate the geopolitical position of the USA and its allies, distributed around the world, as imperiled by any territorial advances by Axis powers, themselves distributed around the world (Harrison made a similar point a year earlier in [34]). The projection choice is a transformation that helps illustrate Harrison's point, but generalization in this map is another such transformation: this map succinctly tells a persuasive story because it generalizes away any particular details about the precise nature of relationships with the United States into a simple, ordinal classification of relative alliance. Simple connecting lines indicate nations with which the USA had relations.
The basemap here plays an important role in providing context for the thematic choropleth and connection layers, but its generalization, especially by eliminating (C−), merging (Gm), and simplifying (Gs) islands and landmasses, is incidental, being due to scale and resolution rather than to the definition of Harrison's message. For thematic information, Harrison's use of reclassification (Cc) is particularly important, since a key part of the map's message is to suggest that various territories are held by governments who are more or less allied to the United States in an ordinal ranking.
Merging (Gm) and size adjustment (Sz) are present along the lines drawn to illustrate relationships to allied territories; splits and merges serve to group local territories (e.g., present-day Indonesia, then under Dutch colonial rule) along major arterial branches leading to relevant ports in the USA (e.g., Pacific routes meeting at San Francisco), while sea-route line thickness varies according to relative importance. Basemap labels (L+) are particularly important because the map projection presents a world layout unfamiliar to many non-geographer readers.

Geography Used by Gapminder
The Gapminder Foundation [35] has in recent years attracted attention for visually compelling illustrations of world health and economic data, partly due to engaging, Internet published talks given by their co-founder, the late Hans Rosling. Particularly famous have been animated versions of Gapminder's graduated-symbol scatterplots summarizing international levels of life expectancy and income (see Figure 4 and Table 5), where the video steps through time, beginning in the 19th Century and ending with today. The series of "bubble" graphs paints an optimistic general picture, where the trend across all nations is toward longer lives and higher incomes. Differences between countries are highlighted by partitioning them, both into clusters on the scatterplot plane and into continent-wide regions. The graphs are available online, where they are presented in a straightforward geovisualization interface allowing for querying and selection, as well as playback of time-series plots.  The Gapminder site is an outreach effort, and is designed for a non-specialist, public audience. In order to simplify their data presentation, Gapminder compiles its country-level data throughout the 19th, 20th, and 21st Centuries according to today's national borders. In their own words, "this is absolutely wrong, but it's necessary to make the animations easier to understand. If our animated bubble charts had displayed all the divisions of territories by splitting bubbles as they animated, the general movement would be much harder to follow, and we would have to manage a much more complex database" [36].
Because Gapminder's geography is more of a concept than a particular map, it has no basemap. Critical to the national analysis units are reclassification (Cc) into continental units, as well as aggregations (Gg) of statistics from subnational units. Collapse (Gc) and merging (Gm) of areas as they change national identities through historical border changes, as explained in the quote above, are also important for maintaining consistent national units through historical time; had Gapminder not done this, modifiable areal unit problem (MAUP) effects would likely be present in the time-series data and subsequent analyses.
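The merging of historical territories into present-day national units can be sketched in Python as follows. This is only an illustration of the general idea; the source does not describe Gapminder's actual procedure, and the population weighting, the function name `aggregate_to_present_borders`, and the unit and country names are all assumptions made for the example.

```python
from collections import defaultdict

def aggregate_to_present_borders(records, unit_to_country):
    """Merge historical units into present-day countries.

    records: iterable of (historical_unit, population, indicator_value).
    unit_to_country: maps each historical unit to a present-day country.
    Returns {country: (total_population, population_weighted_mean)}.
    """
    weighted_sum = defaultdict(float)
    population = defaultdict(float)
    for unit, pop, value in records:
        country = unit_to_country[unit]
        weighted_sum[country] += pop * value
        population[country] += pop
    return {c: (population[c], weighted_sum[c] / population[c])
            for c in population}

# Hypothetical data: two historical territories that today form one country.
lookup = {"Unit A": "Country X", "Unit B": "Country X"}
rows = [("Unit A", 100, 40.0), ("Unit B", 300, 44.0)]
result = aggregate_to_present_borders(rows, lookup)
```

Keeping the lookup table fixed for every year is what yields one continuous bubble per country across the whole time series.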

Interactive, Multi-Scale "Slippy" Maps
Online maps offering interactive zooming and panning, or "slippy maps" as they are called in the industry, are almost certainly the most popular maps in the world today. A few commercial companies are the best known for these, namely Google (e.g., Figure 5 and Table 6), Bing (Microsoft), Esri, Mapbox, and MapQuest, but popular free and open-source platforms or libraries abound as well, such as OpenStreetMap, Leaflet, and OpenLayers. These are often used both as navigational aids and basemaps for thematic data (see Figure 6). Because these maps allow the user to zoom (i.e., to interactively change map scale), generalization is critical to them, even if only for maintaining legibility. The generalization procedures undertaken by commercial providers are frequently not disclosed, but a few organizations do in fact communicate generalization functionality in their platforms. Mapbox [37], for instance, offers several explicit generalization procedures as options available to developers using their vector mapping platforms, including capabilities to filter features, simplify geometries using the Ramer-Douglas-Peucker algorithm [38,39], or modify feature attribute values, all dependent on zoom level and consequent screen and map resolution [40]. Even if generalization is not always properly performed, it remains a way to improve these maps [41]. In particular, consistency between thematic data and the basemaps used to contextualize those data is crucial [42]; for example, the roads selected as the shortest routes between Paris and Amsterdam in Figure 5 must be retained and not eliminated in the basemap.
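For readers unfamiliar with the Ramer-Douglas-Peucker algorithm mentioned above, a minimal Python sketch follows. It is a textbook-style recursive version written for illustration, not the code of Mapbox or any other provider; the function names and the sample coordinates are hypothetical, and the tolerance is in the same units as the coordinates.

```python
import math

def perpendicular_distance(point, start, end):
    """Distance from `point` to the infinite line through `start` and `end`."""
    (x, y), (x1, y1), (x2, y2) = point, start, end
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:                 # degenerate baseline
        return math.hypot(x - x1, y - y1)
    return abs(dy * (x - x1) - dx * (y - y1)) / math.hypot(dx, dy)

def rdp(points, tolerance):
    """Simplify a polyline, dropping vertices that lie within `tolerance`
    of the line joining the retained vertices around them."""
    if len(points) < 3:
        return list(points)
    start, end = points[0], points[-1]
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], start, end)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:               # everything in between is noise
        return [start, end]
    left = rdp(points[:index + 1], tolerance)
    right = rdp(points[index:], tolerance)
    return left[:-1] + right                # do not repeat the split vertex

line = [(0, 0), (1, 0.05), (2, 0), (3, 2), (4, 0)]
simplified = rdp(line, 0.5)
```

In a zoomable map, the tolerance would typically grow as the user zooms out, so that the same source geometry yields progressively coarser lines at smaller scales.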
Slippy maps are themselves frequently used as basemaps, providing context for navigational or thematic overlays. Because these basemaps often cover very diverse parts of the Earth, many different operators might be observed across this wide category. Elimination (C−, L−) of map features as one zooms out is probably the single most important operator used in slippy basemaps, accounting for most of their generalization. Attendant to it is the addition (C+) of surrounding detail (e.g., land cover tints added around roads and points of interest) as features toggle on and off based on scale. When thematic information is overlaid on a slippy map, ordering features (Co) is often critical to ensure overlapping symbols do not obscure each other (e.g., a map pin over a route line). Simplification (Gs) is often critical to the legibility of complex shapes, both in the thematic and basemap layers.
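Scale-dependent elimination (C−) and feature ordering (Co) can be sketched together in a few lines of Python. The feature records, the `min_zoom` and `z_order` attribute names, and the function `visible_features` are hypothetical and do not correspond to any particular provider's data model.

```python
def visible_features(features, zoom):
    """Return the features to draw at `zoom`, in drawing order.

    Features whose `min_zoom` exceeds the current zoom are eliminated (C-);
    the rest are sorted so that a higher `z_order` is drawn last, on top (Co).
    """
    kept = [f for f in features if f["min_zoom"] <= zoom]
    return sorted(kept, key=lambda f: f["z_order"])

# Hypothetical layer content: a thematic pin over two basemap road classes.
features = [
    {"name": "map pin", "min_zoom": 0, "z_order": 9},
    {"name": "motorway", "min_zoom": 5, "z_order": 1},
    {"name": "footpath", "min_zoom": 15, "z_order": 0},
]
names_at_8 = [f["name"] for f in visible_features(features, 8)]
names_at_16 = [f["name"] for f in visible_features(features, 16)]
```

At zoom 8 the footpath is eliminated while the pin is still drawn above the motorway; at zoom 16 all three features appear.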
The interactivity of these maps makes symbol-based generalization pertinent to their thematic layers. Enhancement (Se) is often seen, for example in drawing intersections between cased road symbols such that their casings don't cross the junction. Adjustments in color (Sc) are particularly common, with changes in hue often used to indicate feature selections made interactively; sometimes the same is indicated by changing shape (Ss), for example by going from a dot to a pin shape upon user selection. Label additions (L+) often attend such interactive selections.

Stamen Watercolor Map
A particular "slippy" interactive map, the Stamen Watercolor Map (available at http://maps.stamen.com/watercolor) is especially known for its artistic style (see Figure 7 and Table 7). Stamen is a design studio focusing on data visualisation; here, they use computer graphics techniques to mimic watercolor painting, characterized by translucent pigments that irregularly build up on paper. This results in fuzzy borders and margins. As a consequence, the watercolor-styled map symbols need more map surface area than more traditional vector symbols, and information density is greatly decreased in comparison to other, more typical topographic maps at the same scale. Only hydrography, roads, and forests are kept in the Stamen Watercolor Map, with roads having gone through a very selective retention process. The map has no point symbols, which would be difficult to render with a watercolor style. The fuzzy edges of colored areas naturally result in smoothed lines.

Table 7 schedule of operators:
Thematic: not applicable
Basemap, critical: C+ C− Gm Go Se
Basemap, incidental: Gc Gx Gs Sc Ss Sz St
The watercolor map has cartographic merit [43] in that within some level of accuracy, it is expressive, beautiful, and original; the smooth lines of the watercolor strokes mean that accuracy and precision are lesser and much more inconsistent than in typical maps, but there is undeniable originality and, to many readers, a great deal of beauty to the design. While presenting real-world roads, coastlines, and land cover in an image meant to be emotionally affective, the style is expressive. Part of the appeal of the map may be that it evokes hand-drawn art, having a "sketchy" or human feeling rather than a precise, machine-driven character [44]. Artistic styles such as this one enhance the emotional appeal of a map, which often increases its impact, and finally its use [45]. Cartographers have developed propositions to help map makers reproduce antique cartographic styles [46], or painting styles such as Pop Art [45,46]. Similar to the Watercolor Map, these styles have their own generalization requirements, often calling for even heavier generalization than classical map styles typically do.
The Watercolor map is made available by Stamen through an online API for use as a basemap; while it has its own aesthetic appeal, it typically is not used for its own thematic communication.
As with the basemaps seen in the slippy map example, addition of surrounding symbols (C+), this time with color fills to indicate land cover types, plays an important role in these basemaps, while elimination (C−) is probably the single most important operator used, here seen in a very judicious selection of roads and other linear features. Merging (Gm) and smoothing (Go) are particularly important in the watercoloring emulation here; separate but clustered instances of things like islands are naturally merged into larger masses by the blot-like technique, and perimeters and line strokes are inherently smoothed with the blob-like shapes of the watercolor "brushstrokes". Enhancement (Se), defined as "graphic embellishments around or within a feature to maintain or emphasize feature relationships" [3], is used throughout, frequently by allowing important features such as selected roads to be symbolized with large brushstrokes. Unlike most basemaps, the Watercolor map has no labels.

Beccario's Earth
As with "slippy" maps, virtual globes need to generalize for viewing resolution on demand, since they offer dynamic zooming and panning and, frequently, changes in perspective or viewing angle. Likely the most famous example of these is Google Earth [47]. Another particularly compelling virtual globe is Cameron Beccario's earth (see Figure 8, [48], and Table 8), which continuously gathers oceanographic and atmospheric data to render scientific visualizations of these using colors, textures, and animated vector strokes. It allows users to select between views of various environmental parameters, as well as map projections beyond an orthographic one for a globe. Panning in each map projection continuously alters that projection's aspect (i.e., its central meridian, areas shown and occluded, and its areas of relative distortion). In any projection and at any zoom level, symbols for atmospheric and oceanographic phenomena are recalculated and rendered according to the user's screen resolution, and thus generalization happens on-the-fly.
This map makes very minimal use of a basemap: there are landmasses and little else, with small islands having been eliminated (C−), and resolution-based incidental simplification (Gs) of coastlines (clearly visible while zooming on the map). Most of the generalization present happens with the many small, animated strokes used to represent the moving thematic oceanic or atmospheric fluids, and with colors used to classify measures of these fluids. Aggregations (Gg) of strokes, when they flow into each other in an area too small to show individual strokes, play a crucial role in keeping the moving patterns legible and uncluttered. The strokes form areal patterns that are frequently adjusted (Sp) to maintain relative densities across the map surface. Reclassifications (Cc) and adjustments to color (Sc) are important to illustrate relative magnitudes of themes such as wind speed.

Table 8. Schedule of operators for Beccario's earth.
Critical Incidental
Thematic Cc Gg Sc Sp
Basemap C− Cc Gs Go La

Hennig's UK Election Cartograms
Cartograms are maps where areas are distorted to convey some variable. The technique is usually used for thematic cartography [49]. A striking example of the visual power of cartograms is the 2019 British elections map by Benjamin Hennig (see Figure 9, [50], and Table 9). The non-distorted map suggests an overwhelming win for the Conservative Party (in blue), but a very different story is illustrated when the geographical areas are distorted such that Parliamentary constituencies have equivalent plotted area: in this view, the Conservative electoral win is seen as more moderate in proportion. In these maps, we consider the borders of the British Isles as the basemap, with the Parliamentary constituencies mapped inside these borders as the thematic information. Generalization, by means of content and geometric transformations, plays a clear role in the clarity of this cartogram. An important goal of most map generalization is to preserve the geography (i.e., shape and form), despite geometric transformations. In this case, the stacked hexagons still recognizably represent the United Kingdom because shapes and topology have been largely preserved. The essence of this map is the preservation of the shape of the British Isles (the basemap) while transforming the thematic information for clarity; both are critical, as shown by the cartogram on the right where geography preservation is much less convincing, making this map less readable than the one with hexagons. This use case is interesting because it illustrates the differences between a map without generalization (left), with good generalization decisions (middle), and with worse generalization decisions (right, without preservation of recognizable shapes). Cartograms represent a unique map transformation. By their nature, they exemplify exaggeration (Gx) and adjustments of size (Sz) to make their thematic points, across both basemaps and thematic layers.
The adjustment of shape (Ss) follows as an incidental consequence of the resizing. Hennig's two cartograms presented in this example also rely heavily on reclassification (Cc) of electoral constituencies across the UK political parties, aggregating (Gg) votes to that geographical unit. In addition, important to the center map among the three presented is the adjustment of pattern (Sp) into a lattice of hexagons in such a way that it still resembles the un-distorted United Kingdom and preserves some of the topological connections and relative proximity relationships of the electoral units. The lattice in the middle map has a kind of clarity that neither the planimetrically-accurate map nor the continuous cartogram has, in that it represents the many electoral constituencies in the UK (discrete and generally uniform things in the political world, but irregular things in real space) as uniform, patterned, individual hexagons that can be easily seen one-by-one or in collection. Cartograms in general, and their automated generation or generalization specifically, are areas wherein further research would be welcome.

Counting Operators and Going beyond Operators
Our map examples vary widely (most are particular maps, while "slippy" maps are a whole category) and have been chosen from subjective, personal experience. Statistical frequency analysis on our operator counts would be contrived because our sample is not random and we cannot be certain that it is a representative sample of the very diverse world of thematic cartography. We have, however, tried to observe different rates of operator use across basemap and thematic layers in each of our maps, and have ranked instances of operator use in terms of whether it was critical to the thematic message, or simply incidental. Our counts are summarized in Table 10, collecting the generalization "schedules" that we have identified throughout the preceding discussion as involved in making each map.
While almost all operators in the typology are observed at least a few times across our sampled maps, we note that content and geometry operators are most frequently and consistently used in ways critical to the map message. Reclassification (Cc), aggregation (Gg), merging (Gm) and simplification (Gs) are observed to be most frequent among our map examples. Elimination (C−) is also frequently observed, though mainly on basemaps. This makes sense, since reducing features on basemaps to a relative minimum, while still keeping the basemap informative enough to provide context for the thematic layer, helps to accentuate the generally more content-rich thematic layers. Reclassification is probably frequent because many thematic maps have ordinal or numerical information to illustrate, while aggregation, merging and simplification are geometric operators often driven by reductions in map scale. Generalizations of symbolization are applied less consistently across our examples, but it is interesting that they are observed more frequently in the interactive setting of slippy maps. These operators would appear to lend themselves more readily to driving visual feedback for user interaction, such as when map features may change colour or shape to indicate a selection made by the user. There appears to be no real pattern in the instances of labeling generalizations across our maps, except that label additions (L+) in slippy maps frequently attend the symbolization generalizations seen there.
Generalization researchers have developed a wealth of theory that readily applies, beyond the reference cartography on which it was mostly developed, to thematic and navigational mapping. Work done on conceptualizing the transformative process of generalization, such as controlling it by a system of constraints [52][53][54][55], or using machine learning and agent-based modeling to drive it [56,57], is easily ported to automated thematic mapping. Measurements of the content of a map and how this can relate to making multiple representations [58][59][60][61] also apply just as easily to thematic maps. Evaluation techniques developed to consider generalization procedures [62][63][64][65][66] likewise apply, since 'what is a good generalized map?' easily extends to 'what is a good map?' For instance, the UK election cartograms show two different thematic maps of the election, and one is clearly a 'good map' with respect to legibility and clarity of message, while the other is not. The clutter and the poor shape preservation of the second cartogram could easily be measured using the evaluation techniques developed for map generalization. These and other gains made by the cartographic generalization research community while treating reference data can be brought to bear on issues in thematic cartography, including the visualization and sense-making of massive, heterogeneous datasets.

Conclusions
We have tried to illustrate an obvious but frequently overlooked point: that generalization is ubiquitous and critical in all cartography, and by corollary that it is an important aspect of the highly popular thematic mapping currently capturing public and otherwise non-cartographer attention. The map generalization and multiple representation research community will be missing a vital opportunity if it remains focused on reference mapping to the near exclusion of thematic visualization, as it has in recent years. These examples show that map generalization in thematic maps is not limited to the basemap, but also critically applies to the thematic information displayed on the map. Thus, this is not a simple change of use case that is required (i.e., generalizing the basemaps of thematic pieces), but a more general orientation of automated map generalization principles to the way thematic information is displayed in thematic maps. The field has much to offer in terms of theory and practice, and itself stands to benefit from applying and testing its achievements on a wealth of new, thematic data streams and types.
Author Contributions: All aspects of the work were shared by all authors. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.