Exploring Life in Concentration Camps through a Visual Analysis of Prisoners’ Diaries

Diaries are private documentations of people’s lives. They contain descriptions of events, thoughts, fears, and desires. While diaries are usually kept in private, published ones, such as the diary of Anne Frank, show that they bear the potential to give personal insight into events and into the emotional impact on their authors. We present a visualization tool that provides insight into the Bergen-Belsen memorial’s diary corpus, which consists of dozens of diaries written by concentration camp prisoners. We designed a calendar view that documents when authors wrote about concentration camp life. Different modes support quantitative and sentiment analyses, and we provide a solution for historians to create thematic concepts that can be used for searching and filtering for specific diary entries. The usage scenarios illustrate the importance of the tool for researchers and memorial visitors as well as for commemorating the Holocaust.


Introduction
The Bergen-Belsen concentration camp was initially built for Jewish hostages. Later, it was also used to imprison people from other camps who were considered unfit for work, as well as prisoners who were to be sent to other concentration camps in northern Germany for forced labor. During its existence (1943-1945), a total of around 120,000 people were imprisoned on site. At least 52,000 of them died, mainly of starvation or disease. In Bergen-Belsen, the prisoners of the so-called exchange camp were destined to be exchanged for Germans from abroad or important goods for the war industry.
Therefore, and to disguise the general purpose of the concentration camps, these prisoners were allowed to keep some of their luggage, and living conditions in the exchange camp were somewhat better. Still, taking notes or even writing diary entries in a concentration camp was an act of resistance, as it was a way for prisoners to cope with camp life and to provide evidence of their treatment. In general, this was only possible under certain circumstances.
Usually, the authors feared discovery and punishment by the SS and therefore kept their records secret. Thus, we do not know how many diaries were actually written in the Bergen-Belsen concentration camp. As of today, about 40 documents classified as diaries are known to the Bergen-Belsen Memorial. In contrast to this, one of the large issues of historic reappraisal of the concentration camp is the lack of (official) documentation. Shortly before the liberation of the camp by the British Army, the SS burned all official documents of the camp, but because of the secretly written and well-hidden nature of the diaries, the SS could not destroy these texts, making them essential in documenting, e.g., the names, structure, conditions, and developments of the camp.
Contrary to their importance, educators and researchers do not regularly and frequently draw on these sources, because the texts are difficult to access and use. They are

Related Work
We present a tool that visualizes historical diary data, which places the paper in a visual text analysis context [5,6]. More specifically, with regard to the data resources, we discuss related works in the category of historical visualizations. We show examples of visualizations of war victims and various projects that have applied visual analysis tools to diary data. On the technical visual side, we include visualization systems in the form of timelines and calendar views, as we also use these forms of representation to provide a quick overview of the data.

Timeline Visualizations
Changes concerning time are often visualized with timeline visualizations [7]. This type of visualization allows for engaging story-telling [8]. Examples of timeline visualizations include Joseph Priestley's Chart of Biography, which shows the lifetimes of 2000 famous people in mathematics, physics, religion, and more between 1200 BC and 1800 AD [9] or its more recent implementation as an interactive visualization [10].
Some timeline visualizations use a space-filling algorithm to place the lines compactly, which is useful for quantitative analysis of events, for example. Others, such as the Chart of Biography, use stacked timelines that divide the dataset into different classes. This allows for easier tracking of trends and tendencies within a category. While the visualization mentioned above categorizes by occupation, we used a simple categorization to track a single author through our time period.
Other systems that show categorized changes over time focus on, e.g., the popularity of people mentioned in news articles [11] or changes in the value of stocks [12,13]. Today, many variants of timelines exist, such as the ThemeRiver [14,15]. A detailed analysis and overview of possible timeline visualizations can be found in Brehmer et al.'s survey [16].

Calendar Visualizations
A comprehensive overview of different calendar visualizations was published by Hartl in the form of a diploma thesis in 2008 [17]. In the report, Hartl discusses the advantages of digital calendars over analog calendars, tasks, such as analysis and organization of data in 2D and 3D, and the development of calendar-based tools over the years. A great advantage of digital calendars is the possibility to include visualizations, e.g., to obtain a better overview of appointments.
Many software tools offer calendar-like views, but we focus on those that provide visual analysis capabilities to their users. Related to our work are calendar views that make use of a circular layout to map and highlight recurring patterns across multiple seasons of temporal data [18,19].
An early prototype of our work employed a similar principle but was later discarded primarily because of two drawbacks in our particular case. First, this layout places a strong focus on recurring data, making it difficult to provide insight into non-periodic patterns. Second, traditional calendar views are more intuitive because most people are used to them and their layout, which reduces the inhibition to engage with the tool.
More traditional calendar visualizations use coloring to visualize univariate data. These works showed examples of how such a design, paired with a graph, can provide insights into patterns and trends at the temporal levels of years, months, and days [20,21]. Other works merged the calendar view and graph into a single window to provide insights into daily-level data in addition to organizing events [22].
While the graph and calendar share the time component, no further connection between them is mandatory here. Still, it shows a way to link different visualizations in a single, easy-to-understand view. A last example of the use of calendar views can be found in the industrial domain. Here, calendar visualizations are used to show production patterns and provide insight into data anomalies, showing dates where automated processes may have caused problems and need to be re-evaluated or the products reviewed in detail [23].

Visualization of World War II Victims
Map-based visualizations are a popular approach to visualizing World War II victims. They help the user to understand the state of the political world during the time period depicted and make it easier to grasp the distance between locations. This is used by several map visualizations that show deportation routes. Examples are provided by the World Holocaust Remembrance Center Yad Vashem under the heading "Transports to Extinction. Holocaust (Shoah) Deportation Database". They provide an interactive, linked view of the timeline and map showing the sources and destinations of deportations for each year between 1939 and 1945 [24].
Other works add graph-based route visualizations adding a temporal order to each intermediate step. On a small scale, the University of Osnabrück visualized a single individual, such as Hermann Helfgott (Zvi Asaria). They included a story told through annotated photographs and text showing the route consisting of a lifetime of escape, imprisonment, and deportation [25]. A small subset, in this case of prisoners from the Auschwitz concentration camp, was visualized by Boern [26].
A larger scope was taken by Martin Gilbert's famous work of the so-called "Atlas of the Holocaust", which offers insights into a more global structure of deportations [27]. In addition to depicting deportation routes, other works focused on the concentration camps themselves. A visualization by Christoph Rass tailored for historians attempted to provide new insights into existing data through visualizations. At least for German historians, this had a powerful impact and initiated a shift in how visualizations are being used to open up new research questions about existing data. Examples of his visualization work also use a map that is animated over time.
The data is based on the Totenbuch (death book to document which prisoner died and when) from the "Mittelbau Dora" concentration camp, where prisoners were abused for forced labor for the German war industry. For each entry in the death book, a single glyph is located in that person's birthplace, representing the passing of a prisoner. Thus, the visualization provides information about the relationship between a prisoner's origin and the time of his or her murder and reveals noteworthy patterns [28].
On the international level, research was conducted by historian Tim Cole in collaboration with Stanford University. Cole showed visualizations of the Budapest ghetto and maps showing the temporal evolution of the camp's foundation and expansion. According to his publications, the visualizations with the title "The Evolution of the SS Concentration Camp System, 1933-1945" raised new questions. Such emerging questions demonstrate the expansion of traditional research through digital approaches and visualizations. The challenges of using historical data, such as incomplete data and temporal uncertainties, have been discussed as well [29]. Another visualization under Stanford University's banner offers insights into the "Arrests of Italian Jews, 1943-1945", using a map, time slider, and scatterplot [30]. Finally, a 2021 publication by Ehmel et al. provides a visualization of violence against Jews in Germany between 1930 and 1939, also using a combination of map and glyphs to represent different types of violence against individuals, businesses, and institutions [31].

Visualization of Diaries
Diaries, as a form of self-documented text, can provide a glimpse into the daily life at the time that they were written. In addition to facts and events, they also contain the emotions and thoughts of the author. Visualization of diary data has proven useful for a variety of analyses. Examples include cultural research on specific time periods. For this purpose, several papers by Toledo et al. allowed the visual analysis of Japanese diaries from the Heian period (12th century). These visualization tools opened up important access to, and enabled research on, this period of Japanese history. In their interactive web tool, they provide stacked-graph visualizations for inspecting time series [32].
Furthermore, a more structural pattern analysis can be performed, as seen by Vrotsou et al. They presented a tool for discovering unexpected and interesting patterns for social science diary data. Here, visualization is coupled with an automatic pattern mining algorithm for automatic pattern extraction [33]. Other diaries help provide insights into travel data (spatial, temporal, and attributional correlations) [34], time use [35], and what and when contact occurred between people [36].

Dataset and Preprocessing
As more digitization projects are successfully completed, the amount of available digital data increases. Digitization itself can be very beneficial and important. Examples include preserving knowledge from decaying sources, such as paper or other organic materials, and being able to work with this data safely without risking stress or even damage to sensitive objects. While the mere availability of such data is important, the focus of research has shifted from the provision to the analysis and accessibility of digitally available resources.
In this move, the Digital Humanities emerged as a handshake between the humanities, educational professions, and computer scientists. This interdisciplinary collaboration enables pipelines from digitizing resources to making them accessible and explorable to large audiences. Following this, we formed a core for our project consisting of a historian and two computer scientists. The historian works at the Bergen-Belsen Memorial, located at the historic site of the Bergen-Belsen concentration camp during World War II, and is also responsible for the public relations work of the memorial.
The two computer scientists work at the universities of Southern Denmark and Leipzig and both specialize in visual (text) analysis. The project arose from the need for a comparative diary tool suitable for visual data analysis tasks. This is motivated by the fact that increasing time has passed since the events of World War II, and thus more generations are no longer familiar with World War II and the concentration camps in particular. Therefore, it is crucial to show the system of persecution and dehumanization as endured by the victims and their struggle to survive.
For this purpose, we offer several ways of viewing the textual data of diaries describing life in the concentration camp. In addition to the traditional close reading of the digitized texts via the web browser, we enable distant reading approaches. These help to obtain an overview of when and how regularly entries were written in the diaries, which terms and topics were relevant in which time periods, how the total amount of texts written by the authors about their time in the camps developed, the moods of the descriptions (which days and events were described with positive or negative words), and finally to enable a comparison between the authors.
While the Bergen-Belsen Memorial has a large amount of textual resources, only about 40 documents are classified as diaries. In some cases, this classification is difficult. Some texts resemble a diary (a text divided into separate parts, each describing events of a certain date) but were written at a later date. This makes a fundamental difference in the perspective offered by those sources. Texts written after a person's liberation are skewed by the knowledge that he or she survived the ordeal and may be influenced by the knowledge he or she gathered after liberation.
The diaries themselves show the exact information that was available to the prisoners at the time of writing and their struggle to deal with the emotional and physical stress caused by not knowing when the liberation would take place or if they would survive at all. Some texts remain untranslated, which makes it more difficult to check and categorize them as diaries for this project. Of the 40 texts categorized as diaries, the authors have fairly inhomogeneous biographical data.
For example, the ages of the authors, who are mostly male prisoners (73%), range from 11 years (Jovan Rajs, not included) to around 60 years (Szidonia Devecseri, not included), and the languages reflect the nationalities of the prisoners (14 Hungarian, nine Dutch, eight German, five Polish, three Hebrew, two French, and one Czech). At the moment of writing, German translations were available for 12 of these texts. In addition, the diaries were written by people in different parts of the camp. Most (38) known diaries were written by prisoners from the exchange camp, where conditions made taking notes easier (prisoners could keep part of their luggage and belongings), although it was still illegal to write texts.
Today, some of these diaries (13) have been distributed through different publishers, either as digital or printed versions. Furthermore, 29 diaries are available at the memorial in their entirety; the rest only exist as excerpts. From this corpus of 40 diaries, we decided to focus on the texts that we can safely classify as diaries and that are also available as German texts (in the original or through available translations), resulting in 12 diaries. Among all of these chosen sources, the first date described is 20 October 1943, and the last is 30 July 1945, 2 months after the end of World War II in Europe.
Despite some entries being written after the liberation of the concentration camp and thus not being within the expected range of data, these entries are strongly influenced by the time spent in the concentration camp and help to capture the significance of the liberation and the days that followed. In these cases, we decided to include the later texts. In terms of content, the available diaries come from people with different backgrounds and views. These include a staunch Communist from Yugoslavia [37], a Polish agent for Jewish organizations [38], a family focus from the point of view of parents [39,40], spouses [39,41,42], and children [41,43], as well as descriptions of people in the fields of law [44], psychology [45], and medicine [42].

Preprocessing
Each diary was offered as a digital copy without a common standard. Some were offered as PDFs in camera-ready versions; others were freshly scanned from the original documents for this project and did not even have OCR-extracted data. Others were offered in various .doc formats. The first step of preprocessing (as seen in the left part of the process illustration in Figure 1) was a simple scan and OCR process to obtain text data from each document.
Next, we deleted all added footnotes from the published versions but kept inline additions (as some of the authors coded certain names and terms out of fear in the texts) that were needed to understand the sentences ("Manual Correction"). In the next step, we created one file per author with lists of the texts. Each resulting entry begins with the specific date, followed by the text (which allows for easier access to metadata later on). Since some diaries already start their texts with the date, but not all, we decided to put the date at the beginning in a consistent format to allow for automatic processing of the texts.
The original date was left untouched as text. This leads to example lines, such as this: "16.08.1944: BB. 16.8.1944. ...Mein Inneres ist wie erstarrt, und ich fühle, wie..." After formatting each text into a consistent structure, we used TextBlobDE's sentence algorithm for sentiment analysis to create three sentiment collections for each author and each day: a value for the entire text on the date, a list of sentiment values for each sentence on each day, and finally for each word on each day ("Sentiment Analysis"). These steps were performed once and saved to disk. In real-time, each text was then tokenized for the tag clouds, and stop words were filtered ("Metadata Addition").
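As a minimal sketch of how such date-prefixed lines could be processed automatically, the following Python snippet splits one line into a normalized date and the untouched original text. The exact file layout beyond the example line above is an assumption, and the names `ENTRY_PATTERN` and `parse_entry` are hypothetical.

```python
import re
from datetime import date, datetime
from typing import Tuple

# Assumed layout: "DD.MM.YYYY: <original text>", as in the example line above.
ENTRY_PATTERN = re.compile(r"^(\d{2}\.\d{2}\.\d{4}):\s*(.*)$", re.DOTALL)


def parse_entry(line: str) -> Tuple[date, str]:
    """Split one preprocessed line into its normalized date and the untouched text."""
    match = ENTRY_PATTERN.match(line.strip())
    if match is None:
        raise ValueError("line does not start with a DD.MM.YYYY date: %r" % line[:30])
    day = datetime.strptime(match.group(1), "%d.%m.%Y").date()
    return day, match.group(2)


entry_date, entry_text = parse_entry(
    "16.08.1944: BB. 16.8.1944. ...Mein Inneres ist wie erstarrt, und ich fühle, wie..."
)
print(entry_date.isoformat())
print(entry_text)
```

The date written by the author ("16.8.1944.") stays part of the text, while the normalized prefix becomes a `date` object that can serve as a key for the calendar and timeline views.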

Visualizations
To develop a useful tool that leverages both distant reading and the close reading of texts, we followed the mantra of Shneiderman [46]. In a DH context, this leads to a distant reading approach as an overview and close reading as a detailed view of the data on demand. To this end, the initial state of the tool gives a view of the data with a calendar-like visualization as well as a timeline.

CalendarView
The first thing the user encounters on the page is the calendar (see Figure 2), which shows the three years 1943, 1944, and 1945. For each year, each month is represented as tiles of days. Each day is, in turn, tiled with the number of authors who wrote diary entries on that day. This tiled and colored calendar gives a rough overview of the distribution of dates, showing which days have diary entries and, to a lesser extent, which author wrote when. As a side note, important dates can be marked in this view. In the figure, a red border can be seen around 15 April, which is the day of liberation for the Bergen-Belsen concentration camp, and 8 May, which is the official date of the end of World War II in Europe.
The calendar is complemented by the timeline, which provides further access to the distribution of dates over time. This helps to better explore patterns in the writing activity of authors. A linearly distributed and slim-shaped form shows when an entry was written. Each author is positioned on a separate line, which helps reduce the clutter and visual overload encountered in the calendar. This additional view is especially helpful for less global questions, such as analyzing the writing activities of a single author. At the bottom of the page, a legend shows the color mapping between diaries/authors and color in all visualizations. The size in the legend indicates the number of days described by the authors. They are also sorted in ascending order by their first date of writing.
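The data behind the calendar tiles and the legend can be derived from simple (author, date) pairs. The following Python sketch illustrates one possible way to do this; the sample pairs are invented placeholders, and the function names are hypothetical rather than taken from the tool.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Set, Tuple

# Hypothetical (author, date) pairs; the real corpus is not reproduced here.
entries: List[Tuple[str, date]] = [
    ("Author A", date(1944, 8, 16)),
    ("Author B", date(1944, 8, 16)),
    ("Author A", date(1944, 8, 18)),
    ("Author C", date(1945, 4, 15)),
]


def authors_per_day(pairs: List[Tuple[str, date]]) -> Dict[date, Set[str]]:
    """Map each day to the set of authors who wrote an entry on it (one tile each)."""
    days: Dict[date, Set[str]] = defaultdict(set)
    for author, day in pairs:
        days[day].add(author)
    return dict(days)


def legend_rows(pairs: List[Tuple[str, date]]) -> List[Tuple[str, date, int]]:
    """Return (author, first date, number of described days), sorted by first date."""
    by_author: Dict[str, Set[date]] = defaultdict(set)
    for author, day in pairs:
        by_author[author].add(day)
    rows = [(author, min(days), len(days)) for author, days in by_author.items()]
    return sorted(rows, key=lambda row: row[1])


tiles = authors_per_day(entries)
legend = legend_rows(entries)
print(len(tiles[date(1944, 8, 16)]))  # number of tiles drawn for 16 August 1944
print([row[0] for row in legend])     # legend order by first date of writing
```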

Tag Cloud
While the visualizations mentioned above provide an insight into the purely temporal metadata of the diaries, a distant reading view of the texts themselves can be acquired by toggling the tag clouds. These are available for the month (see Figure 3) and year (see Figure 4) levels by clicking on either the free space or by pressing the two buttons below the year label. These tag clouds use all tags from the underlying data. After removing stop words, the frequency of tf-idf is used to map a tag to its size. Thus, a tag is large if its frequency in that time window is higher than in the others. Using a normal frequency would result in tags, such as "work", "warehouse", or "barrack" being dominant in almost all tag clouds.
While this would depict the true word count, it does not show useful information due to its consistency. An alternative would be to tag such terms as stop words. However, we still believe that such terms are of high value, and the use of a tf-idf frequency mapped to size allows these terms to still be displayed if they are particularly important for the time period as opposed to all the others. Thus, although all prisoners were forced to work in the camp, the term can also be seen in times when work was written about particularly often. Other interesting tags found by this frequency were "Waggon" and "Luggage", indicating months when deportations took place. Furthermore, clicking on a tag leads to a close reading of the diaries as contexts in which these terms were used. This allows for further investigation of ambiguous or unknown terms. It also opens a bar chart showing the use of the term in question over time, which is not visible in the tag clouds due to the tf-idf frequency. Finally, a more general inspection of tag occurrence can be done by hovering over a tag, which causes the same tag to be highlighted in all other tag clouds, as seen in Figure 5. In this example, the term "Luftalarm" (German for "air raid warning") was found in three different months of 1944.
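To illustrate the weighting idea, the following Python sketch treats every month as one document and computes a tf-idf weight per term. The exact formula used by the tool is not stated in the text; the smoothed inverse document frequency below is an assumption chosen so that terms occurring in every month keep a non-zero weight. The stop word list and the sample texts are invented placeholders.

```python
import math
import re
from collections import Counter
from typing import Dict, List

STOP_WORDS = {"der", "die", "das", "und", "ich", "wir", "ist"}  # tiny illustrative list


def tokenize(text: str) -> List[str]:
    """Lowercase the text, split it into word tokens, and drop stop words."""
    return [t for t in re.findall(r"\w+", text.lower()) if t not in STOP_WORDS]


def tfidf_per_window(windows: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """Weight each term per time window: relative frequency times a smoothed idf."""
    n_windows = len(windows)
    doc_freq: Counter = Counter()
    for tokens in windows.values():
        doc_freq.update(set(tokens))
    weights: Dict[str, Dict[str, float]] = {}
    for name, tokens in windows.items():
        counts = Counter(tokens)
        weights[name] = {
            # Assumed smoothing: idf = ln((1 + N) / (1 + df)) + 1, never zero.
            term: (count / len(tokens))
            * (math.log((1 + n_windows) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        }
    return weights


months = {
    "1944-06": tokenize("Arbeit und Arbeit, die Baracke, der Waggon"),
    "1944-07": tokenize("Arbeit, Baracke und Luftalarm"),
    "1944-08": tokenize("Arbeit, Baracke, Baracke"),
}
tag_sizes = tfidf_per_window(months)
for term, weight in sorted(tag_sizes["1944-06"].items(), key=lambda kv: -kv[1]):
    print(term, round(weight, 3))
```

In this toy example, "waggon" outweighs "baracke" in June although both occur once there, because "baracke" appears in every month. "arbeit" also appears in every month but still receives its highest weight in June, where it is written about most often.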

Close Reading Text
The system provides further access to the raw text material. While the tag clouds provide insight into common terms, a click on the days in the CalendarView provides access to the digital version of the authors' diaries. In this view, all authors of the current selection are listed with their texts or a short note if no text was written on that day. For navigation, long texts can be scrolled down and at the top of the view, two buttons allow navigation to the previous or next day if one wants to continue reading or compare the text with the next or the previous one.
The displayed text is digitized but left in its original state as much as possible. As a result, abbreviations in the text, spelling and grammatical errors, and incomplete sentences remain in the texts. In the case of published diaries, annotated information from the editor is included, but footnotes were removed. This helps with texts that are particularly difficult to read due to the use of unfamiliar terms or abbreviations (such as the use of only one letter or nicknames used by prisoners as a safety measure in case the hidden text is found and read by guards).

Filters
A basic feature for working with such a text corpus is a means of filtering. The filter is paired with the legend. Clicking on the name or shape of an author in the legend selects or deselects it. Immediately after the click, the visualizations are rendered with the modified text corpus.

Concepts
In addition to a filter functionality to select the authors of the corpus, a content-based filter was considered to help access concept-specific information on camp life. Therefore, we introduced a "concept" filter to help define topics of interest. We provide a separate tool for creating such concepts (ConceptTool, Figures 6-8). Here, the user can enter a name of the concept (e.g., "weather") and a word to start the concept creation with (e.g., "rain", German "Regen").
The user is then presented with a tag cloud that displays all co-occurrences (filtered by stop word) in all texts in the corpus. Frequency is mapped to size and the top N (where N is user-defined and defaults to ten) tags are highlighted in white. The range of co-occurrences can also be manipulated with a slider (the default is three). Next, the tag cloud is used to add relevant other words to the concept.
In the first step of the weather concept creation, starting with "rain", we can find "wind", "cold" and "weather" as high-frequency terms and additionally "sunny" as a low-frequency term. Clicking on these terms adds them to the concept tokens above. After selecting all relevant terms from the tags, the user can use the current concept to recalculate the tag cloud. Instead of using the starting word and all co-occurrences of it, all co-occurrences of all words in the concept tokens are now being used, resulting in a better-populated tag cloud and allowing access to new candidates. In addition to this iterative concept creation, the user can also freely type in words to add to the concept, which is helpful when clear ideas of the concepts under investigation already exist.
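The iterative concept creation can be sketched in Python as follows. This is a minimal illustration, assuming that the "range of co-occurrences" is a window of tokens to the left and right of each concept token; the tokenized sample texts and the function name `cooccurrences` are hypothetical.

```python
from collections import Counter
from typing import List, Set


def cooccurrences(texts: List[List[str]], concept: Set[str], window: int = 3) -> Counter:
    """Count tokens appearing within `window` positions of any concept token."""
    counts: Counter = Counter()
    for tokens in texts:
        for i, token in enumerate(tokens):
            if token not in concept:
                continue
            start, stop = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(start, stop):
                # Skip the concept tokens themselves; they are already selected.
                if j != i and tokens[j] not in concept:
                    counts[tokens[j]] += 1
    return counts


# Invented, already stop-word-filtered token lists standing in for diary entries.
texts = [
    ["regen", "wind", "kalt", "appell", "stunden", "suppe"],
    ["heute", "regen", "wetter", "kalt"],
    ["brot", "suppe", "baracke", "arbeit", "sonnig", "regen"],
]

first_round = cooccurrences(texts, {"regen"})
print(first_round.most_common(10))  # candidates shown for the start word

# After the user adds "kalt", the tag cloud is recalculated with the larger concept.
second_round = cooccurrences(texts, {"regen", "kalt"})
print(second_round.most_common(10))
```

With the enlarged concept, tokens such as "stunden" become candidates although they never occur within the window of the start word alone, which mirrors how each iteration opens up new candidates.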
After the concept creation, the concept can be exported to the diary visualization tool, which results in filtering all texts to use only those that have at least one token of the concept per tag in the text (an example is shown in Figure 9). Thus, we can focus on the texts that deal with the description of the weather, which is closely related to the roll call where the prisoners had to wait and endure the weather conditions for hours.
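A concept filter of this kind can be sketched in a few lines of Python. The sketch assumes the simplest reading of the rule, namely that an entry is kept if it contains at least one concept token; the entries and the name `filter_by_concept` are hypothetical.

```python
from datetime import date
from typing import List, Set, Tuple

Entry = Tuple[str, date, List[str]]  # (author, day, stop-word-filtered tokens)


def filter_by_concept(entries: List[Entry], concept: Set[str]) -> List[Entry]:
    """Keep only the entries whose tokens contain at least one concept token."""
    return [entry for entry in entries if not concept.isdisjoint(entry[2])]


weather = {"regen", "wind", "kalt", "wetter", "sonnig"}
entries: List[Entry] = [
    ("Author A", date(1944, 8, 16), ["appell", "regen", "stunden"]),
    ("Author B", date(1944, 8, 16), ["brot", "suppe"]),
    ("Author A", date(1944, 8, 18), ["sonnig", "arbeit"]),
]

weather_entries = filter_by_concept(entries, weather)
for author, day, _tokens in weather_entries:
    print(author, day.isoformat())
```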

Sentiments
From the memorial's perspective, shedding light on the sentiments in the written text is another helpful and new task. Showing the sentiments of the texts can be an effective way for a user to grasp the meaning of living in a concentration camp. While daily life in the camps was characterized by strict rules, obedience, and violence, prisoners sometimes used more emotional language in their diaries to channel perceptions and reactions, which would have been dangerous to express openly. Strongly emotionally colored descriptions can help counterbalance the destructive and threatening situation in the camp by highlighting positive aspects and experiences, however minute.
Discussing and reflecting on such aspects helps to understand that the linguistic definition of an emotional state is usually learned long before the time of imprisonment. Diary authors use language and its tonality that is familiar to them; it is the specific context that frames the different values of words that often cannot be gleaned from emotion. For now, we offer two modes of visual analysis of sentiment. The first colors the daily tiles by the mean of the sentiment of all authors' texts, resulting in a more neutral coloring (yellow). The few positive (green) and negative (red) tiles show days with a significant dominance of one feeling among all authors and may be of interest to analyze and link to events causing this dominance. The second mode is colored according to the sum of the calculated sentiments (in the range −1 as negative to +1 as positive). Thus, more neutral sentiments are weighted less, and even if only one author used strong negative sentiments, it is displayed.
This results in a much more mixed color scheme. When the user has come across an interesting daily tile, clicking on it leads to the view for close reading. Instead of the simple listing of all texts as before, the texts are now rearranged by sentence, with each sentence on its own line. We now also see three more sentiment calculations (besides the already seen coloring of the daily tile depending on the mode). First, the sentiment calculation of the text for each author of that particular date (background of the heading); second, the sentiment of each sentence (background of the sentence); and finally at the word level for each word (background of the word).
For the word level, white coloring means that there is no sentiment available (since that word is not included in a sentiment list). It is important to note that, while word-level coloring is easy to track in the view and is highly meaningful, it lacks the contextual information that is included in the other calculations. In particular, the use of "very" and "not" results in altered sentiments. Thus, a sentence may have negative sentiment, while the word level provides only positive sentiments preceded by "not". Finally, a mouse hovering over a sentiment displays a small tooltip indicating the calculated sentiment in a range from −1 to +1 to better compare results with a similar color.
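The difference between word-level and sentence-level sentiment, as well as the two tile coloring modes, can be illustrated with a toy Python sketch. It does not use TextBlobDE; the lexicon values, the negation rule (a negation flips the next scored word), and the clamping of the sum to the range −1 to +1 are all assumptions made for illustration.

```python
import re
from statistics import mean
from typing import Dict, List, Optional

# Toy lexicon with invented polarity values; a stand-in for a real sentiment list.
LEXICON: Dict[str, float] = {"gut": 0.7, "schön": 0.8, "schlecht": -0.7, "kalt": -0.4}
NEGATIONS = {"nicht", "kein", "keine"}


def word_sentiment(word: str) -> Optional[float]:
    """Polarity of a single word, or None if it is not in the lexicon (white)."""
    return LEXICON.get(word.lower())


def sentence_sentiment(sentence: str) -> float:
    """Mean polarity of the scored words; a negation flips the next scored word."""
    scores: List[float] = []
    negate = False
    for token in re.findall(r"\w+", sentence.lower()):
        if token in NEGATIONS:
            negate = True
            continue
        score = LEXICON.get(token)
        if score is not None:
            scores.append(-score if negate else score)
            negate = False
    return mean(scores) if scores else 0.0


def tile_value(author_scores: List[float], mode: str) -> float:
    """Aggregate the per-author sentiments of one day for the tile coloring."""
    if mode == "mean":
        return mean(author_scores)
    if mode == "sum":
        # Assumed clamping so the result stays within the color scale.
        return max(-1.0, min(1.0, sum(author_scores)))
    raise ValueError("unknown mode: %s" % mode)


print(word_sentiment("gut"), sentence_sentiment("Das Essen war nicht gut."))
day_scores = [0.0, 0.0, 0.0, -0.8]  # three neutral authors, one strongly negative
print(tile_value(day_scores, "mean"), tile_value(day_scores, "sum"))
```

In the example sentence, the only scored word is positive at the word level, while the sentence as a whole is negative. For the sample day, the mean mode yields a value close to neutral, whereas the sum mode preserves the single strongly negative entry.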

Text Volume
The final feature for visual analysis is a heatmap that is integrated into the CalendarView and can be selected at the top of the tool. The heatmap uses a sequential color map from white (no content) to red (lots of content) to show the amount of text written daily. Two modes use this approach to visualize text volume. The first mode colors the relative frequency of the number of diary entries for each day. This corresponds to the number of authors describing the day and also to the number of tiles in the default "colorful" mode. Therefore, days with high writing frequency are easy to find.
The second mode colors according to the amount of written text (number of words). Therefore, days shown in dark red have a high number of words compared to the other days. This is not the same as the number of authors writing (mode 1). This can be seen at, e.g., 17 June 1944, which is the day with the largest amount of text. On that day, a single author wrote more than 8700 words (by hand and without normal writing instruments), which corresponds to the amount of text in Sections 1-6.2 of this paper (hence, far more text than has been read in this paper so far).
On that day, Jenö Weiczner took up writing about his journey and that of his family and thus began to describe the events since March of that year. Like many other authors, he wrote in pencil and sometimes in ink. Weiczner had a few small notebooks in his luggage, in which he then wrote his diary entries, filling a total of seven notebooks.
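The two heatmap modes described in this section can be sketched in Python as follows. The linear white-to-red interpolation and the sample entries are assumptions for illustration; the function names are hypothetical.

```python
from collections import Counter
from datetime import date
from typing import List, Tuple

Entry = Tuple[str, date, str]  # (author, day, text)


def entries_per_day(entries: List[Entry]) -> Counter:
    """Mode 1: number of diary entries (authors writing) per day."""
    return Counter(day for _author, day, _text in entries)


def words_per_day(entries: List[Entry]) -> Counter:
    """Mode 2: total number of whitespace-separated words written per day."""
    counts: Counter = Counter()
    for _author, day, text in entries:
        counts[day] += len(text.split())
    return counts


def heat_color(value: int, maximum: int) -> Tuple[int, int, int]:
    """Sequential RGB color from white (no content) to red (the maximum)."""
    if maximum <= 0:
        return (255, 255, 255)
    share = min(1.0, value / maximum)
    fade = round(255 * (1.0 - share))
    return (255, fade, fade)


# Invented entries: one long text by a single author, two short ones on the next day.
entries: List[Entry] = [
    ("Author A", date(1944, 6, 17), "wort " * 50),
    ("Author B", date(1944, 6, 18), "kurzer eintrag heute"),
    ("Author C", date(1944, 6, 18), "noch einer"),
]

by_entries = entries_per_day(entries)
by_words = words_per_day(entries)
for day in sorted(by_words):
    print(
        day.isoformat(),
        heat_color(by_entries[day], max(by_entries.values())),
        heat_color(by_words[day], max(by_words.values())),
    )
```

In this toy data, the day with only one author is the darkest tile in the word-count mode but not in the entry-count mode, which is the same effect as described for 17 June 1944.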

Use Cases
While much effort has gone into adapting the tool to the needs of historians and mapping their traditional workflows with digital processing, the value of the tool is not meant to be only on a theoretical level. To this end, we conducted several sessions in which we observed user behavior and how they used the tool for their own research interests. We describe the users' tasks based on Munzner's task classification [47].

Lookup
The interface first supports looking up diary entries. This is also perhaps the task most related to traditional humanities practice. The historian among us wanted to obtain more information about the event of the departure of a group of people from the "Hungary Camp" (camp with mostly Hungarian prisoners). Accessing the data needed was very easy, as both the location in the visualization and the goal of the search were known (class of lookup tasks). The historian knew that the group left on August 18 and clicked on the corresponding day tile in the calendar view.
The close-up view shows diary entries by Gitler and Koretz describing the number of prisoners, the changes that accompanied the departure, and more. Thus, checking the date of events in the camp in the diaries provides an opportunity to see what an individual prisoner or group of people knew about things that were going on in their own immediate environment or in different parts of the camp. In this example, both authors were imprisoned in the neighboring subcamps on either side of the Hungarians' camp.
Moreover, starting from a particular keyword relevant to the event under review, one can follow the entries of the following (and perhaps preceding) days to learn more about that particular aspect. While this trivial use case hardly motivates a complex visualization (the required text could also be found in a simple Excel spreadsheet), this search-and-find functionality is important as a basis for the more complex use cases. Moreover, this use case can also be extended to perform deeper analysis, such as examining the sentiments associated with the departure or the arrival of prisoners.

Locate
A similar task is the retrieval of specific diary entries. Here, the establishment of the so-called "Frauenlager" (Women's camp) is to be traced. Our historian remembered that it was founded "somewhere in the second half of August 1944" and consisted of tents erected by work groups. Thus, we obtained three pieces of information: an uncertain date (1); related terms (women, camp, and tents/tent camp) (2); and construction by prisoners from other parts of the camp (3). To find the desired information, the first step was to access the Concept Tool (see Section 4.2.2).
With this, a simple concept consisting of women and camp was created. For simplicity, we ignored the terms tents and tent camps because we were curious whether the concept was strong enough without them (which turned out to be true). We then filtered the day-level data to only those entries that contained these terms. Given (2) and (3), at least one of the diary authors might have been part of the work group that assembled the tent camp, or at least knew about it and described it in the texts. This step led to a reduction of the texts and entries in the CalendarView.
As a result (seen in Figure 10), the visualizations for 2, 4, 8, and 10 August showed blank days without any reference to the concept. In contrast, we got access to a small number of texts for the other days that likely addressed the topic being searched. With this reduction and the uncertain date (1), we were able to make a guess and clicked on 16 August, as the subcamp should have opened in the second half of the month.
For this day, the author Koretz writes of an additional transport of women arriving. Thus, on this date, the Women's camp was already built and inhabited ("additional"). A jump back to 11 August (the day after the first day without naming the terms) again showed a text by Koretz about how the preparations for the Women's camp changed the daily tasks. At this point, we learned that the construction of the Women's camp had already begun in early August.
Another look at the filtered texts for the dates before that led us to successfully pinpoint the beginning of the Women's camp: On 6 August, Koretz describes the beginning of the construction of a new camp with tents. The arrival of the prisoners could be determined through texts by the author Gitler, who describes the first arrivals on 15 August. Since there is no official documentation of the Bergen-Belsen concentration camp (as all available documents were burned by the SS prior to the camp liberation), such approaches to the temporal location of events are a common practice for historians.
Traditionally, the researcher would have had multiple sources on the desk and would have had to painstakingly search for the days of August 1944 in each of them (and then also read through and look up all the diaries describing the events). With the help of the tool, we reduced this task to reading only the sources relevant to this topic (only two authors used the terms in this time period) through the concept tool, which also provides quick and easy access to the texts of specific dates.
Having found the necessary information so surprisingly easily and quickly, our historian wanted to continue and began to narrow down the end of the Women's camp. She knew that it was later destroyed due to heavy storms, which led to the integration of the prisoners into the more durable barracks. Therefore, we added the terms "tents" and "storm" to the already constructed concept of the Women's camp.
Again, she had an approximate date in mind and focused on November 1944. Figure 11 shows the resulting written texts for November 1944. Since we found most of the month described as stormy, an iteration over the first few days of the month resulted in a false positive on 1 November (while Weiczner describes bad weather conditions, it is a description of ongoing conditions over the previous two weeks, not the current weather).
Over the next few days, we were able to follow the description that the weather was becoming more severe (Koretz on the 6th), e.g., as indicated by "terrible storm", "tent camp collapsed and flew in all directions" and "pouring rain". Finally, Zielenziger describes the last moments of the Tent camp and the conditions in which the prisoners had to live with "Words cannot describe the chaos in the cold rain."
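The concept-based filtering used in this use case can be sketched as follows. This is a minimal illustration, not the ConceptTool's implementation: it assumes that a concept is a plain list of terms, that an entry matches when all terms occur as whole words, and that entries are grouped by day. The function names and the sample texts are hypothetical placeholders. A real system for German texts would likely also need to handle inflected forms and compounds.

```python
import re


def matches_concept(text, concept_terms):
    """Return True if every term of the concept occurs in the text."""
    words = set(re.findall(r"\w+", text.lower()))
    return all(term.lower() in words for term in concept_terms)


def filter_days(entries_by_day, concept_terms):
    """Keep, per day, only the (author, text) entries matching the concept."""
    filtered = {}
    for day, entries in entries_by_day.items():
        hits = [
            (author, text)
            for author, text in entries
            if matches_concept(text, concept_terms)
        ]
        filtered[day] = hits  # an empty list corresponds to a blank day
    return filtered


# Invented placeholder entries, not quotes from the corpus.
entries_by_day = {
    "1944-08-10": [("Author A", "Roll call lasted for hours.")],
    "1944-08-11": [("Author A", "Preparations for the women's camp changed our tasks.")],
}
result = filter_days(entries_by_day, ["women", "camp"])
blank_days = [day for day, hits in result.items() if not hits]
```

Days whose filtered list is empty correspond to the blank days in the CalendarView, and the remaining days are the candidates for close reading.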

Present
In a conversation with an employee of the memorial for public relations, we learned that the Bergen-Belsen Memorial regularly publishes excerpts from the diaries on social media (https://twitter.com/belsenmemorial (accessed on 8 December 2021)). These are usually published under the label "on this day" (#otd) and contain text written on the current date (day and month) in 1943, 1944, and 1945. Due to the time-consuming nature of this task, it was done once a month by two volunteers (1).
They went through the corpus of diaries looking for the time period needed (2), wrote down the texts and presented them to the public relations team (3), who then held an editorial meeting with quality control (4) of the typed texts, a discussion on which texts were useful for presentation (5), and finally edited the texts to comply with social media rules and guidelines.
Although not planned (and not even noticed by us for a long time), the tool has supported and improved this workflow in the Public Relations team at the memorial. Now, they use the calendar view to obtain a quick overview of days when criteria, such as the amount of text or authors, seem promising (2). Sometimes they also use the concept tool to see if the current month's texts contain terms that also appear in today's news (political events, weather, ...). This helps greatly to increase the relevance of the posted texts to the audience (5) and to reduce the workload for the Public Relations team (4). Moreover, the time needed to create such posts can be greatly reduced. For example, this can be done quickly, conveniently, and directly by a member of the Public Relations team, reducing the time and number of people needed for this task (1).
Furthermore, the digital availability and duplication of text allows for a reduction in potential typographical errors (3) and (4). It has been reported to us that this reduced effort has greatly increased the number of days on which such a post is published online. Whereas previously, the entire process took place once a month and required several people and hours of work, it can now be done with a few clicks and in a few minutes. Figures 12 and 13 show two recent examples of such posts (in this case, two tweets).
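The selection step behind such "on this day" posts can be sketched in a few lines. The sketch assumes that each entry carries a date object, an author, and a text; the function name and the data layout are hypothetical and only illustrate the idea of matching day and month across the years 1943 to 1945.

```python
from datetime import date


def on_this_day(entries, today, years=(1943, 1944, 1945)):
    """Return entries written on today's day and month in the given years."""
    return [
        (written, author, text)
        for written, author, text in entries
        if written.year in years
        and (written.month, written.day) == (today.month, today.day)
    ]


# Invented placeholder entries, not quotes from the corpus.
sample = [
    (date(1944, 8, 18), "Author A", "placeholder text one"),
    (date(1945, 8, 18), "Author B", "placeholder text two"),
    (date(1944, 8, 19), "Author A", "placeholder text three"),
]
candidates = on_this_day(sample, date(2021, 8, 18))
# Longest texts first, mirroring the "amount of text" criterion mentioned above.
candidates.sort(key=lambda entry: len(entry[2].split()), reverse=True)
```

The editorial decisions (which text is suitable and how it is shortened for social media) remain manual; the sketch only covers the lookup that previously required going through every diary.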

Discussion
Since this is a project with close collaboration between computer scientists and historians, the design and development of the tool was heavily influenced by both. Since the tool is used by historians, as well as by casual users (e.g., visitors to the memorial), we will discuss the benefits, advantages, and problems from these perspectives.

Usage
The project began with a computer scientist asking historians about their data and whether they would be interested in visualizing it. At this point, it was unclear to the historian how the visualization might work, as she had less experience in this area. She was curious about how visualization and visual analysis could offer new insights (1) into the data she already knew and was working with. Hence, there was no particular expectation to meet, but interest was piqued. After the initial conceptual meetings, the main task for the tool was defined as "exploration" (2).
Through an iterative design process, accompanied by weekly virtual meetings and evaluations, the need for a more global system was identified early on, leading to the present state of the tool, which allows for a wide range of tasks (3). Later, it was found that the tool also helped clarify aspects that were known but not at a conscious level (4).
This was also motivated by the awareness of having developed a familiarity with the texts that made it more difficult to approach the data from new angles through traditional close reading. The development of new research questions through newfound access to the data sparked curiosity. The ConceptTool proved to be a complementary tool, characterized by defining different themes and manipulating the sources to gain insight into these themes and if and when they were significant to the authors (5).

Usability
For the previously defined uses of the tool (1)-(5) (see Section 6.1), the main focus was on (1) and (3). Offering an exploration tool is of high value for both casual users, such as visitors to memorial sites, and experts, such as researchers. Presenting the data through visualizations provides a new perspective on the diaries. Through traditional workflows, such as reading a single source from front to back in close reading and hence following the descriptions of a single person, a reader can identify with the author/protagonist. Therefore, some sort of bias or distortion in the analysis is not uncommon. The top-down approach, which starts with visualization at a distant reading level and moves to close reading of texts through interaction, allows for a quick comparison of texts by different authors for a given period of time. While this reduces bias, it also removes contextual information. A text may refer to earlier events or entries and may be understood differently without having read them. We consider both approaches important and decided to include the ability to access only a single author's text as well as simply navigate to the previous or next day so as not to deny this approach.
Thus, for usage (1), we decided to enrich the traditional approaches with new ones. This is especially appreciated by collaborating historians, as they are still free to decide which approaches they choose (and, in the case of the more traditional approach, have easier access to it, as they do not need to access the analog texts). In terms of the supported tasks, we chose to provide a wide range of options and interactions to accommodate both the traditional and digital approaches. While we did not include a specific use case for exploration, this type of interaction is used as a starting point for the tool.
Getting users interested and motivated to explore the tool and the underlying data is done by giving them an overview, regardless of their expertise. After finding an aspect that interests them, they can follow a process motivated by Shneiderman's mantra [46] that uses interactions to break down the data to a subset to begin specific tasks. In Section 5, we included three use cases for the Lookup, Locate, and Present tasks that show examples of scenarios in which the tool is used (2) and (3).
We know from historians that they have built up a great deal of knowledge and expertise through their years of work in the field, and this tool can help them to categorize and link knowledge. These include many logical conclusions that are perhaps too obvious to be the focus in a research context but which may be of great value to visitors to the memorial, for example. Above all, they help to understand and contextualize the information contained in the diaries.
During the development process, we encountered such a logical linkage through the ConceptTool, for example. A dominance was found in the co-occurrences between weather terms and roll-call terms. Although it is not surprising that forced long standing and waiting in bad weather is particularly stressful, it is perhaps too obvious to consider. The combination of prior knowledge (most diaries talk about weather during roll call) and a very plausible statement (weather is especially important when forced to wait outside for several hours) leads to a quick realization of the important relationship between weather and roll calls (4).
For a visitor attempting to better understand the situation of prisoners, it can be of great value to follow the descriptions of the weather during these roll calls. This functionality is provided by either defining a weather-and-roll-call concept oneself or selecting this concept from a list of predefined concepts by the memorial staff and loading it into the CalendarView (5).
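How such a co-occurrence between two concepts might be quantified can be sketched as follows. The measure used here (the share of texts mentioning one concept that also mention the other) is an assumption chosen for illustration; the source does not state how co-occurrences are computed in the ConceptTool. The term lists and sample texts are hypothetical.

```python
import re


def tokens(text):
    """Lowercased set of word tokens in a text."""
    return set(re.findall(r"\w+", text.lower()))


def cooccurrence_share(texts, terms_a, terms_b):
    """Share of texts mentioning any term of A that also mention any term of B."""
    a = {term.lower() for term in terms_a}
    b = {term.lower() for term in terms_b}
    with_a = [words for words in map(tokens, texts) if words & a]
    if not with_a:
        return 0.0
    both = sum(1 for words in with_a if words & b)
    return both / len(with_a)


# Invented placeholder texts, not quotes from the corpus.
texts = [
    "Appell in the rain for three hours.",
    "Cold wind during Appell.",
    "Appell was short today.",
    "Rain all night.",
]
roll_call_terms = ["appell"]
weather_terms = ["rain", "cold", "storm", "wind"]
share = cooccurrence_share(texts, roll_call_terms, weather_terms)
```

In the toy data, two of the three roll-call texts also mention the weather, so the share is two thirds.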

CalendarView
At the time of writing, the tool is not widely used or publicly available at Bergen-Belsen Memorial. This tool is known to a number of memorial staff and some external partners; however, ongoing development discourages integration into most workflows. Nonetheless, our principal historian on the project is already making use of the tool. We described the previous and the extended workflow of accessing desired diary entries in the use case section (see Section 5.3).
Briefly, she told us that the tool helps to obtain quick access to all diary entries up to a certain date without having to review and open all diaries individually (either as analog manuscripts scattered on the desk or through multiple PDFs scattered on a digital desktop). This is used at the Bergen-Belsen Memorial to feed a Twitter account several times a month with quotes from these diaries. She also praised the exploration part of the tool several times and helped in designing it for on-site use by visitors.
For the historians' research, we were told that they found some surprising aspects. With the inclusion of sentiment analysis, they had access to the sentiments of the text for the first time. Before we gave access to the sentiment functionality, we gathered thoughts and expectations about what the sentiments in the texts would look like with the historians: "Under the impression of three freshly read diaries from the Stern camp, I would guess that the perceptions of the prisoners change from slightly negative to rather negative, but rather 'gently' and not very steeply. I suspect this because most prisoners had previous prison experiences and associated a certain expectation with the Bergen-Belsen camp, which also had to do with 'exchange'. However, as the overall situation deteriorates over time, I believe at the same time that negative moods accumulate over time because the nervous strain on the prisoners increases significantly. According to my current knowledge of the prisoners' reports, the liberation, which is perceived as so striking from today's perspective, is not described as strongly and unambiguously positive as one would expect in retrospect, because it still takes place in parallel with many negative experiences in people's situation. Therefore, I do not expect a strong or only a very moderate increase in positive sentiments for this period." While this explanation sounded very logical to the inquiring computer scientists, the actual visualization of the moods shown (see Figure 14) shows a surprisingly positive (green) to neutral (yellow) representation in the diaries. This led to further (albeit rather brief) investigation. It is striking that the texts show a higher degree of mood variation than expected. The expected smooth change between moods is hardly to be found. Instead, the visualization shows a rather chaotic character with positive texts close to negative texts.
As later discussed in the limitations (Section 6.3), the sentiment analysis in this project requires further refinement and was more or less included as a prototype. Nevertheless, the current state of the tool has not only shown future potential but also interesting approaches and analysis possibilities available right now.

Stopword List
We also used a German stopword list to filter words for the sentiment list as well as the tag clouds. Similar to the sentiment list, we also encountered topic-specific problems. While some words from the stopword list are missing, resulting in less useful entries in the tool, we also found some words that are normally considered stopwords but are important for our cases. Although we talked about these problems several times, none of us felt confident enough to completely revise these predefined lists. Therefore, a future challenge would be to involve a linguist (preferably someone with experience in (collaboration with) historical studies) and work on a more appropriate list.

ConceptTool
As it stands, the ConceptTool provides a quick and easy interface to generate and export terms that characterize a topic or concept. Motivated by historians, it mainly includes features needed by them, while excluding possible aspects that could be useful for users unfamiliar with the texts and authors. Thus, the impact of the ConceptTool is the lowest of the entire tool, and as far as we know, it is not used by users outside of guided sessions.

Visualizations
The linchpin of the project is the different approaches to the data through visualizations. Therefore, it is important to look at the value of each visualization through the eyes of historians. The following are statements we learned through oral evaluation and conversations with historians, focused on the visualizations.
Calendar. "The Calendar was very familiar. The structure and system of the visualization was intuitive and clear. I can easily follow the visualization. On the other hand, I have a more difficult access to the data at close range. The degree of abstraction is high. This gives easy access to the overview, while access to the data itself is more difficult."
Timeline. When asked about her first impression of the timeline, the historian responded with "delight". As with the calendar, the structure of the visualization, including the temporal component, was familiar. The data access itself, on the other hand, was closer to the text, but still provided new perspectives.
Tag Cloud. Lastly, we solicited feedback on the tag cloud. At first, the reaction was indifferent: "This looks like a normal tag cloud." After a closer look at the tags, it seemed to spark curiosity. While the size of the labels, which corresponds to the tf-idf frequency, was rather uninteresting, interest was shown in the many words representing the time period. When asked if the size of the tags should be disregarded, the response was that while it is uninteresting for exploring the cloud, it helps reduce the clutter of the visualization, so we kept it anyway.
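To make the tf-idf weighting behind the tag sizes concrete, the following minimal sketch computes scores per month. It assumes that each month is treated as one document, that stopwords have already been removed, and that the plain variant tf × log(N/df) is used; the source does not specify the tf-idf variant or the document unit, so all of these are assumptions.

```python
import math
from collections import Counter


def tfidf_by_month(month_tokens):
    """Compute tf-idf scores per month.

    `month_tokens` maps a month label to its list of tokens
    (stopwords already removed); each month counts as one document.
    """
    n_months = len(month_tokens)
    doc_freq = Counter()
    for toks in month_tokens.values():
        doc_freq.update(set(toks))

    scores = {}
    for month, toks in month_tokens.items():
        counts = Counter(toks)
        total = len(toks)
        scores[month] = {
            term: (count / total) * math.log(n_months / doc_freq[term])
            for term, count in counts.items()
        }
    return scores


# Invented placeholder tokens, not taken from the corpus.
month_tokens = {
    "1944-08": ["tent", "camp", "tent"],
    "1944-11": ["storm", "camp"],
}
scores = tfidf_by_month(month_tokens)
largest_tag = max(scores["1944-08"], key=scores["1944-08"].get)
```

A term that occurs in every month (here "camp") receives a score of zero and would be drawn smallest, while terms specific to one month are emphasized, which matches the observation that the cloud surfaces words representing the time period.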

Limitations
Although the tool has been used successfully by historians (see Section 5), we are aware of a number of limitations.
Design. During our collaboration with historians, we opted for a visualization approach designed to be focused on distant reading. While this allowed for a wide range of new research possibilities, in some cases, it complicates the traditional close reading approach to which historians are accustomed. This increases the reluctance of these historians to use the tool.
Data. Although we offer a generic import of sources, there are some limitations to the input. First of all, we decided to limit the number of authors/sources to be included. This is mainly motivated by the use of colors to represent authors. We offer a set of eleven colors provided by colorbrewer.org. A larger number of colors would run the risk of users confusing the colors and biasing the analysis (without explicit interaction with the visualizations).
We also limited the number of years to three (1943-1945). World War II did start earlier, but we developed the system using the sources of the Bergen-Belsen concentration camp, which existed during these three years. A final data limitation applies to the languages of the input. While most parts of the system are language-independent, the sentiment analysis uses pre-defined lists of word-sentiment pairs. Sources in other languages (even mixed) may be loaded, but the results of sentiment analysis and even the tag clouds would be (near) useless.
Sentiment Analysis. In addition to the potential problems of loading mixed or non-German texts, the value of the sentiment analysis is limited. We used a merge of the German SentiWS (SentimentWortschatz) [48] and TextBlobDE [49]. While their authors went to great lengths to create sophisticated sentiment lists for German, we found that the vocabulary of the diaries and the dire situation of the prisoners would require a more refined and tailored sentiment analysis not offered by these frameworks. In our analysis and evaluation, we encountered three types of problems that limit the use of the sentiment analysis: missing sentiments (white) (1), wrong sentiment type (2), and wrong value (3).
(1) For all diaries, almost 39,000 words are missing from the sentiment list (stop words excluded). Although this is a very high number, most of them are not relevant because they carry no sentiment. Some of the remaining words are either not relevant in a general context (but very relevant to our setting) or are simply missing, although we would have expected them in at least one of the lists. Examples of words with missing sentiments are barbed wire, suicidal thoughts, and crematorium. In addition, some words are limited to this period but are also very important, such as air raid alarm, SS, and exchange.
(2) In special cases, a word may also have a meaning that is plausible in everyday life, but not for a concentration camp. An example of this is Versteckt (hidden), which is rated as −0.7 on a scale whose most negative value is −1, i.e., negative, while in the diaries it is used as a positive word.
(3) Similarly, we sometimes encountered values that did not correctly reflect the effect, such as Merciless, Severe, and Broken, of which the first two also have a value of −0.7 (as with "hidden") and the last one has a value of −0.79, which is still not negative enough for the meaning these words have in a concentration camp setting.
While aware of these limitations, none of our collaborating persons was experienced enough in linguistic approaches to offer a more specialized implementation.
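The three problem types can be illustrated with a small lexicon-based sketch. It does not reproduce how SentiWS and TextBlobDE were actually merged in the tool: the merge rule (averaging values when both lists contain a word), the toy lexicon entries, and the override mechanism are assumptions made for illustration. Only the value of −0.7 for "versteckt" is taken from the text above.

```python
def merge_lexicons(primary, secondary):
    """Merge two word-to-polarity dicts; average when both list a word."""
    merged = dict(secondary)
    for word, value in primary.items():
        merged[word] = (value + merged[word]) / 2 if word in merged else value
    return merged


def score_text(tokens, lexicon, overrides=None):
    """Return (mean polarity or None, tokens without any polarity)."""
    lookup = {**lexicon, **(overrides or {})}
    values = [lookup[t] for t in tokens if t in lookup]
    missing = [t for t in tokens if t not in lookup]
    mean = sum(values) / len(values) if values else None
    return mean, missing


# Toy stand-ins for the two word lists; not their real contents.
list_a = {"versteckt": -0.7}
list_b = {"gut": 0.5}
lexicon = merge_lexicons(list_a, list_b)

tokens = ["versteckt", "stacheldraht"]
plain_score, plain_missing = score_text(tokens, lexicon)

# Hypothetical domain-specific corrections curated by historians:
# a flipped sign (problem 2) and a newly added word (problem 1).
overrides = {"versteckt": 0.5, "stacheldraht": -0.9}
fixed_score, fixed_missing = score_text(tokens, lexicon, overrides)
```

Reporting the missing tokens alongside the score makes problem (1) visible per entry, and a curated override list is one possible way to address problems (2) and (3) without rewriting the general-purpose lists.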
ConceptTool. The ConceptTool was not included in the first iterations of our tool. It was only added later at the explicit request of our historian. Therefore, it only contains information that is useful and necessary for the historian. This limits its use for casual users. They would need additional contextual information to use the ConceptTool effectively. This is also reflected by the fact that we only saw concepts that were created in closely guided sessions.
Visualizations. The calendar view was the element most discussed in our evaluation sessions. While we are pleased with the usefulness of the current version, we are aware of its limitations. First of all, the tiling of the days can lead to a distorted perception of the distribution. In the current version, the size of each day tile depends on the number of other authors who wrote on that particular day. This leads to a high visual representation of an author when he is the only one who wrote on that day.
The arrangement of days in seven columns, months in six columns, and finally years in three rows leads to an uneven distribution of space between adjacent dates. Although there is only one day between 31 December 1944 and 1 January 1945, they are positioned on opposite sides of the screen, resulting in a greater focus on logical elements, such as weeks, months, and years, which are not decisive for most approaches.
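The layout effect described above can be reproduced with a small sketch that maps a date to a grid cell. It assumes that each year block consists of two rows of six months, that weeks start on Monday, and that each month reserves six week rows; these details, like the function names, are assumptions and not taken from the tool. The tile share of one over the number of authors is likewise a simplified reading of the tiling rule.

```python
from datetime import date

FIRST_YEAR = 1943
MONTH_COLUMNS = 6      # months per row within a year block
WEEK_ROWS = 6          # week rows reserved per month (assumed)
MONTH_ROWS_PER_YEAR = 12 // MONTH_COLUMNS


def grid_position(day):
    """Return the (row, column) of a date in the calendar grid."""
    first_weekday = date(day.year, day.month, 1).weekday()  # Monday == 0
    week_in_month = (day.day - 1 + first_weekday) // 7
    month_row, month_col = divmod(day.month - 1, MONTH_COLUMNS)
    year_row = day.year - FIRST_YEAR
    row = (year_row * MONTH_ROWS_PER_YEAR + month_row) * WEEK_ROWS + week_in_month
    column = month_col * 7 + day.weekday()
    return row, column


def tile_share(authors_on_day):
    """Share of a day cell given to each author who wrote on that day."""
    return 1.0 / authors_on_day if authors_on_day > 0 else 0.0


last_day_1944 = grid_position(date(1944, 12, 31))
first_day_1945 = grid_position(date(1945, 1, 1))
column_gap = abs(last_day_1944[1] - first_day_1945[1])
```

Under these assumptions, 31 December 1944 lands in the last of the 42 day columns and 1 January 1945 in the first, although the two dates are adjacent. The tile share also shows why a single author on a day fills the whole cell, while each of three authors receives only a third.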
As for the tag cloud, the year window can hold a high number of tags, but the month clouds are rather limited by the space available. Using a full-screen window with a resolution of 1920 × 1080 pixels and no browser zoom, the clouds can hold between 40 and 60 tags each, depending on the character length of the tags. Nevertheless, interesting and important tags could be found even in the less-frequent parts of the tag lists. A deeper analysis and evaluation regarding its usefulness should be done in the future.

Future Work
With the current state, we offer a prototype version of the diary visualization tool. We are still motivated to expand the project. This includes adding more diaries, which is mainly handled by the Bergen-Belsen Memorial and their translation and digitization work, which could add up to 30 more diaries. In addition, we can reach out to other memorials that also have a corpus of diaries available, such as Camp Westerbork, and even test the tool with (diary) data not linked to concentration camps.
On the more technical side, we are aware of the work needed for sentiment and language processing in general (see Section 6.3). In addition to improving the features already included, more focused work on linguistic approaches, e.g., named entity recognition (NER), could also lead to high-value access and help to provide new insights and support for working with the texts.

Conclusions
With the passage of time and the accompanying disappearance of survivors who had been imprisoned in concentration camps, who play a major role in sharing their memories with younger generations, it becomes increasingly important to develop new solutions for the transmission of this knowledge. Our project targets this direction and uses digital technology to make memories inherent in the diaries written by prisoners of the Bergen-Belsen concentration camp persistently accessible and explorable.
Our design generates a new perspective on concentration camp life by arranging diary entries in a calendar view, and the multifaceted framework allows for investigation of individual as well as shared prisoner experiences. In addition, we analyze sentiments and offer the ability to search for thematic concepts in the corpus, which generates user-driven, non-linear entry points to the materials.
We followed a participatory design process [50] to ensure the development of a valuable visualization for the targeted audience of historians who can use the tool for educational purposes in various settings, such as social media activities and guided group visits to the memorial. Our solution for visualizing a diary corpus is a first step towards virtualized access to concentration camp heritage, and in the future this can be expanded to related time-based resources, including letter exchanges or death registers.
Author Contributions: Conceptualization, S.J., S.B. and R.K.; programming: R.K.; Writing: R.K.; Review and Editing: S.J. and S.B. All authors have read and agreed to the published version of the manuscript.