An Ontology-Based Recommender System with an Application to the Star Trek Television Franchise

Collaborative filtering-based recommender systems have proven to be extremely successful in settings where user preference data on items is abundant. However, collaborative filtering algorithms are hindered by their weakness against the item cold-start problem and their general lack of interpretability. Ontology-based recommender systems exploit hierarchical organizations of users and items to enhance browsing, recommendation, and profile construction. While ontology-based approaches address the shortcomings of their collaborative filtering counterparts, ontological organizations of items can be difficult to obtain for items that mostly belong to the same category (e.g., television series episodes). In this paper, we present an ontology-based recommender system that integrates the knowledge represented in a large ontology of literary themes to produce fiction content recommendations. The main novelty of this work is an ontology-based method for computing similarities between items and its integration with the classical Item-KNN (K-nearest neighbors) algorithm. As a case study, we evaluated the proposed method against other approaches by performing the classical rating prediction task on a collection of Star Trek television series episodes in an item cold-start scenario. This transverse evaluation provides insights into the utility of different information resources and methods for the initial stages of recommender system development. We found our proposed method to be a convenient alternative to collaborative filtering approaches for collections of mostly similar items, particularly when other content-based approaches are not applicable or otherwise unavailable. Aside from the new methods, this paper contributes a testbed for future research and an online framework to collaboratively extend the ontology of literary themes to cover other narrative content.


Introduction
A literary theme, or theme for short, is broadly defined as a topic that is featured in a story. Stock literary themes are typically communicable in the form of a single word or short phrase, as exemplified by such themes as "betrayal", "obsession", and "the quest for immortality". A typical story will feature multiple themes. In the present work, we distinguish between central themes (i.e., themes that recur throughout a major part of a story or are otherwise important to its conclusion) and peripheral themes (i.e., briefly featured themes that are not part of the main story narrative). This structure is shared not only by pieces of fiction but also by other types of documents, such as news or social network content, where themes provide elements for analyzing the narrative structure of a story that spans several documents. To sum up: the themes of a story furnish the partaker thereof with a thumbnail sketch of what it is really all about.
Stories from diverse genres employing altogether different settings, character types, and styles are nonetheless liable to be bound by commonly shared themes. Take, for example, the archetypal "hero's journey" theme as advanced by Joseph Campbell in his comparative mythological classic The Hero with a Thousand Faces [1]. The narrative of a hero who sets out on a journey, faces down a seemingly insurmountable challenge, and returns home with a valuable lesson to share with their community has been a staple of epic storytelling dating back to ancient times. Modern-day television series are no exception. Consider the sci-fi series Star Trek: Voyager, which was produced by Rick Berman for seven seasons from 1995 to 2001 [2]. It chronicles the adventures of Captain Janeway and her crew aboard the starship USS Voyager as they journey home to Earth after having become stranded on the far side of the Milky Way galaxy. Running the gauntlet past one hostile alien force after another, their return journey serves as a living testament to the need for cooperation and tolerance in times of hardship. Compare this with the Ancient Mesopotamian epic poem The Epic of Gilgamesh, which was probably compiled from older sources by Sîn-lēqi-unninni in eleven tablets sometime between 1650 and 1150 BC [3]. In the epic, Gilgamesh, king of Uruk, is wracked with death anxiety in the wake of the sudden passing of his companion Enkidu, and resolves to embark on a journey to the land of Dilmun in search of the secret to everlasting life. Gilgamesh goes through hell and high water to reach Dilmun, but life everlasting he does not find. He returns home instead, having ascertained that immortality is better left to the gods, and that mortal men would do well to embrace an "eat, drink, and be merry for tomorrow we die" philosophy of life. Thus, we find that the "hero's journey" theme links two stories that are otherwise unrelated by genre (sci-fi vs. mythology), character type (heroine vs. hero), setting (outer space vs. Ancient Mesopotamia), and style (television series vs. epic poem).
In the present work, we propose a story recommender system catered to those story enthusiasts who prize thematic content above all else. The system is constructed to return a list of stories, sorted according to thematic similarity with respect to a user-selected story. Two main challenges must be overcome in the making of a working system of this kind. First, a protocol is required for theming stories in such a manner that they can be meaningfully compared in terms of their shared themes. To this end, we have launched the Theme Ontology (beta version) online community platform [4]. The website features a controlled vocabulary of defined themes, hierarchically arranged into a draft ontology. Community members are encouraged to tag whatever stories they please (e.g., short stories, novels, films, TV shows) with themes drawn from the ontology, and to adorn the ontology with newly coined themes as necessary. It is our aim to build up a large and freely available themed story collection over time through this story theming system. Second, a method of determining pairwise thematic similarity between stories must be specified. To this end, we evaluated the performance of three contrasting approaches to the determination of story thematic similarity: 1) a class of natural language processing baseline methods that assign story similarity without regard to human-curated themes; 2) a class of methods that apply similarity functions, such as the cosine and related indices [5,6], to the sets of themes that represent stories; and 3) a class of soft cardinality methods that extend (2) by adapting traditional similarity functions to exploit information about ontological proximity between themes in the determination of pairwise story similarity [7,8]. The traditional "crisp" cardinality-based similarity functions (e.g., the Jaccard, Dice, and cosine indices) cannot take into account similarities between the elements being compared [9]. In addition to the choice of similarity function, the recommender system implements a handful of other knowledge-based filtering options, including theme level (i.e., whether to include central and/or peripheral themes), minimum theme overlap (i.e., filtering out those stories that do not share at least a certain number of themes with a user-selected story), background storyset (i.e., the list of stories to compare to a user-selected story), and a blacklist of themes to be excluded from consideration.
As a proof of concept, we demonstrate our proposed system's viability through the recommendation of Star Trek television series episodes. In the study, we evaluate the performance of different similarity functions on a number of manually curated benchmark lists of similar episodes. The main takeaway is that the proposed theme-based representation approaches performed significantly better than a baseline constructed from the episode transcripts using standard information retrieval practices for comparing texts. In addition, we found that the methods exploiting the ontology hierarchy did not, on average, significantly outperform the methods based on sets of themes. However, we did find that the ontology-based methods outperformed their set-of-themes counterparts at recovering stories related by high-level themes (i.e., themes of a very general nature occupying high levels of abstraction in the ontology). The recommender system is implemented in the R package stoRy version 0.1.2 [10]. A related R Shiny web application is available for download at the Theme Ontology GitHub repository at https://github.com/theme-ontology/shiny-apps.
The rest of the paper is organized as follows. Section 2 covers some general background material along with the methodology underlying the proposed recommender system. In Subsection 2.1, we provide an overview of the Star Trek television and film franchise. In Subsection 2.2, we acquaint the reader with our draft theme ontology, a controlled vocabulary of over 2000 defined themes, hierarchically arranged into the following four domains: the human condition, society, the pursuit of knowledge, and alternate reality. In Subsection 2.3, we describe a related toy dataset consisting of 452 manually themed Star Trek episodes. In Subsection 2.4, we describe the inner workings of the recommender system; in particular, Subsubsections 2.4.1 and 2.4.2 concern the technical details of the set-of-themes and ontology-based methods, respectively. Section 3 deals with the application of the system to the recommendation of Star Trek television series episodes. In Subsection 3.1, we walk the reader through an example usage of our R Shiny web application. In Subsection 3.2, we introduce the collection of manually curated benchmark storysets used to validate our methodology. Subsection 3.3 describes the methodology we use to evaluate our various proposed similarity functions. Subsection 3.4 covers the technical details of the baseline methods used in our comparisons. In Subsection 3.5, we present the results of the proof of concept study. Our primary conclusion is that the theme-based methods significantly outperform the natural language processing baseline methods. Section 4 concludes the paper with a discussion of the steps involved in scaling the recommender system so that it might appeal to a broad audience.

Materials and Methods
In this section, we provide an introduction to the Star Trek television and film franchise, describe an ontology of themes and a related toy dataset of themed Star Trek television series episodes, and spell out the details of how our story recommender system works. The theme ontology and themed Star Trek episode dataset have been made accessible in a structured manner through the R package stoRy (version 0.1.2) [10] and the Theme Ontology website [4].

The Star Trek Television and Film Franchise
Star Trek has influenced American popular culture for more than 50 years [11], and remains a favorite among sci-fi enthusiasts the world over [12]. The Star Trek sci-fi media franchise canon comprises seven television series and thirteen feature films [2]. Figure 1 shows an overview. The first series, known as Star Trek: The Original Series (or simply TOS), aired from 1966 to 1969 over three seasons and 80 episodes. Set in the mid-23rd century, it depicts the adventures of Captain James T. Kirk and his crew aboard the starship Enterprise on a mission to explore the galaxy. The Enterprise crew resumed their mission in cartoon form in Star Trek: The Animated Series (TAS), which ran from 1973 to 1974 in two seasons consisting of 22 episodes. Six feature films following the TOS/TAS cast on subsequent adventures were released from 1979 to 1991. TOS spawned the spin-off television series Star Trek: The Next Generation (TNG), which ran from 1987 to 1994 in seven seasons consisting of 178 episodes. It is set a century or so after Captain Kirk's original mission. In the series, a fresh cast of characters is led by Captain Jean-Luc Picard on a similar mission of galactic exploration aboard a newfangled starship Enterprise. There are four associated feature films. Three subsequent television series have been produced: Star Trek: Deep Space Nine, Star Trek: Voyager, and Star Trek: Enterprise. Star Trek: Voyager (a.k.a. Voyager), set in the generation after TNG, is particularly noteworthy for being the first series in the franchise to feature a female starship captain (i.e., Captain Kathryn Janeway). A total of 172 Voyager episodes were aired over seven seasons. Three reboot films based on TOS have also been released to date, and a seventh television series, Star Trek: Discovery (2017–), is presently in its second season.

A Theme Ontology
A theme ontology is a controlled vocabulary of defined terms representing literary themes in fiction. In this subsection, we describe a draft theme ontology that comprises 2129 unique themes, hierarchically arranged into the following four domains:
The Human Condition: Themes pertaining to "characteristics, key events, and situations which compose the essentials of human existence, such as birth, growth, emotionality, aspiration, conflict, and mortality"¹.
Society: Themes pertaining to a "community of people living in a particular country or region and having shared customs, laws, and organizations"².
The Pursuit of Knowledge: Themes pertaining to "facts, information, and skills acquired through experience or education; the theoretical or practical understanding of a subject"³.
Alternate Reality: Themes related to subject matter falling outside of reality as it is presently understood. These are classical science fiction and fantasy themes⁴.
Figure 2 depicts a bird's-eye view of the ontology. The abstract theme "literary thematic entity" is taken as the root theme. Each domain is structured as a tree descended from the root, with "the human condition", "society", "the pursuit of knowledge", and "alternate reality" serving as the top themes of their respective domains. Each child theme is made to bear a subtype relationship with its parent. One theme is said to be a child of another if the former theme is directly connected to the latter when proceeding away from the root. A parent theme is just the reverse notion of a child theme. One theme is an ancestor of another when it is reachable by repeatedly proceeding from child to parent. Likewise, a descendant is the reverse notion of an ancestor theme. A leaf theme is one that has no children. The height of a tree is the number of parent/child relations in the longest path from root to leaf. A theme is said to be observed with respect to a story if it is featured in the story in such a manner that none of its descendant themes in the ontology hierarchy are also featured. An ancestor of an observed theme is said to be latent with respect to a story. The tree in Figure 2 depicts the ontology to a height of three levels, although the domains actually branch out a number of levels further still, as summarized along with other information in Table 1. In designing the ontology, we strove to make sibling themes mutually exclusive, but not necessarily jointly exhaustive. All non-root themes are accompanied by definitions and, when possible, references. We appealed to the principle of falsifiability in definition writing. That is to say, a good theme definition is such that it can be appealed to in order to show that the theme is not featured in a story. Take "to tell the truth vs. offering a comforting lie" as an example, which is defined as "A character must choose between telling a comforting white lie on one hand, and being honest on the other." We contend that definition writing of this sort helps to bring the conversation of whether a theme is featured in a given story into the realm of rational argumentation. The theme ontology data has been made accessible in a structured manner through the R package stoRy [10]. This paper uses version 0.1.2 of the ontology, which can be accessed through the identically versioned stoRy package (version 0.1.2). Functions for exploring the ontology are described in the package reference manual. For example, the command theme$print() prints summary information for the theme object theme, and the function print tree takes a theme object as input and prints the corresponding theme together with its descendants in tree format to the console. We encourage non-R users to explore the current version of the ontology on the Theme Ontology website⁵. Previous versions of the ontology are available there for download.
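The hierarchy vocabulary above (ancestor, height, observed and latent themes) can be made concrete with a short Python sketch. It assumes the tree is stored as a child-to-parent dictionary; the helper names and the placement of the example themes are hypothetical, and are not taken from the stoRy package or from the actual ontology.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical child -> parent map; the placement of these themes is
# illustrative only and is not copied from the actual theme ontology.
PARENT: Dict[str, str] = {
    "the human condition": "literary thematic entity",
    "society": "literary thematic entity",
    "avarice": "the human condition",
    "the lust for gold": "avarice",
    "fraud": "society",
}


def ancestors(theme: str) -> List[str]:
    """Themes reachable by repeatedly proceeding from child to parent."""
    result = []
    while theme in PARENT:
        theme = PARENT[theme]
        result.append(theme)
    return result


def height() -> int:
    """Number of parent/child relations on the longest root-to-leaf path."""
    return max((len(ancestors(t)) for t in PARENT), default=0)


def observed_and_latent(featured: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Split the themes featured in a story into observed and latent themes.

    A featured theme is observed if none of its descendants is featured;
    every ancestor of an observed theme is latent.
    """
    latent = {a for t in featured for a in ancestors(t)}
    observed = set(featured) - latent
    return observed, latent
```

For a story featuring both "avarice" and "the lust for gold" in this toy tree, only the deeper theme is observed, while "avarice", "the human condition", and the root become latent.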

A Themed Star Trek Episode Dataset
We manually tagged a total of 452 Star Trek television series episodes with themes drawn from the theme ontology described above. This covers all TOS, TAS, TNG, and Voyager television series episodes. Table 2 shows a basic statistical summary of the dataset. Note that a theme is made to bear either a central or a peripheral relationship to any particular episode it tags. A look at an example episode will help to illustrate the system of theming we employ. Table 3 catalogs the themes for the Voyager episode False Profits (1996). In this episode, the USS Voyager crew discover a planet on which two Ferengi, named Arridor and Kol, have duped the comparatively primitive inhabitants into thinking them holy prophets. The story begins with Commander Chakotay and Lieutenant Tom Paris beaming down to the Takarian homeworld to investigate signs of "matter replicator" usage among the local inhabitants. This is considered odd, because the Takarians otherwise manifest only a Bronze Age level of technology. Chakotay and Paris soon uncover how the Ferengi had traveled through a "wormhole", crash-landed on the planet, and, in a naked display of "science as magic to the primitive", convinced the Takarians that they had come in "fulfillment of prophesy" through the performance of "matter replicator" powered conjuring tricks. In the intervening years, the Ferengi have used "religion as a control mechanism" to shape the Takarian economy to suit their own self-interest. Arridor and Kol now wallow in the muck of "avarice" as a result of their "fraud". Back aboard the ship, Captain Janeway confers with her senior staff about "the ethics of interfering in less advanced societies" before venturing to determine a proper course of action. Janeway decides that this appalling "exploitation of sentient beings" must be brought to an end with minimal interruption to the internal development of Takarian civilization. Because forcibly removing Arridor and Kol could undermine Takarian religion, she reasons that the pair must be made to leave the planet of their own accord. Morale officer Neelix, disguised as a representative of the Ferengi head of state, beams down to the planet in an effort to trick Arridor and Kol into returning to their homeworld. But the Ferengi, driven by an insatiable "lust for gold", refuse to leave without putting up a fight. The situation quickly spirals out of control when the Takarians opt to burn Arridor, Kol, and Neelix at the stake. This, according to their "primitive point of view", would deliver the holy prophets back to the heavens whence they came. Arridor and Kol resort to blatant "casuistry in interpretation of scripture" in a last-ditch attempt to save their skins, but to no avail. Finally, the condemned men are beamed up to Voyager just as the smoke begins to overwhelm them, while the Takarian onlookers watch in amazement at the return of their holy prophets to the stars.
We recorded themes for each of the 452 episodes of TOS, TAS, TNG, and Voyager in a similar manner. The basic process we used in assigning themes is summed up as follows. We individually tagged episodes with themes before comparing notes with a view toward building a consensus set of themes for each episode. We aimed to abide by the principle of low-hanging fruit in the compilation of consensus themes. In the present context, this means we aimed to capture the more striking topics featured in each episode with appropriate themes. Another guiding principle is the minimization of false positives (i.e., the tagging of episodes with themes that are not featured) at the expense of tolerating false negatives (i.e., neglecting to tag episodes with themes that they feature). This strategy amounts to erring on the side of caution. We acknowledge that this process does not preclude the tagging of stories with themes that are idiosyncratic and unique to our point of view. We return to this subject in the discussion section.

A Story Recommender System for Theme Lovers
In this subsection, we delve into the technical details of our proposed story recommender system. It is a simple knowledge-based filtering recommender system [13], insofar as themes are used to describe stories and users are able to set certain filtering options to suit their personal thematic preferences. In essence, the user selects a story of interest, and is then given the choice of altering the following knowledge-based filtering options from their default settings:
Similarity function: The function used to score the similarity between pairs of stories. The choices are cosine similarity (Eq. 1), cosine similarity with idf weighting (Eq. 2), and the soft cosine index (Eq. 5). Cosine similarity is used by default.
Theme level: The level of theme to include in the scoring of story similarity. The choices are central and/or peripheral. For example, the user may elect to obtain recommendations on the basis of shared central themes alone. The default is to base story recommendations on both central and peripheral themes.
Minimum thematic overlap: The minimum number of themes that a story must share with the user-selected story to be included among the recommendations. The value is set to 1 by default.
Background storyset: A user-supplied background list of stories to be evaluated for similarity with the story selected by the user.
Theme blacklist: A user-supplied list of themes to be excluded from the analysis.
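The filtering options above compose naturally into a single ranking routine. The Python sketch below is hypothetical and is not the stoRy or R Shiny implementation: it assumes each story is stored as a dictionary of central and peripheral theme sets, uses the crisp cosine index as the default similarity, and breaks ties by story ID, which is an arbitrary choice.

```python
from math import sqrt
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple

# story ID -> {"central": {...}, "peripheral": {...}}
Stories = Dict[str, Dict[str, Set[str]]]
Similarity = Callable[[Set[str], Set[str]], float]


def cosine(q: Set[str], r: Set[str]) -> float:
    """Crisp cosine index between two theme sets."""
    if not q or not r:
        return 0.0
    return len(q & r) / sqrt(len(q) * len(r))


def recommend(
    stories: Stories,
    query: str,
    similarity: Similarity = cosine,
    levels: Tuple[str, ...] = ("central", "peripheral"),
    min_overlap: int = 1,
    background: Optional[Iterable[str]] = None,
    blacklist: Iterable[str] = (),
) -> List[Tuple[str, float]]:
    """Rank background stories by thematic similarity to the query story."""
    banned = set(blacklist)

    def themes(story_id: str) -> Set[str]:
        # Union of the selected theme levels, minus blacklisted themes.
        selected: Set[str] = set()
        for level in levels:
            selected |= stories[story_id].get(level, set())
        return selected - banned

    q = themes(query)
    candidates = stories.keys() if background is None else background
    scored = []
    for story_id in candidates:
        if story_id == query:
            continue
        r = themes(story_id)
        if len(q & r) < min_overlap:  # minimum thematic overlap filter
            continue
        scored.append((story_id, similarity(q, r)))
    # Most similar first; ties broken by story ID for a stable ordering.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))
```

Passing a different function as `similarity` is how the sketch models the choice of similarity function; the other keyword arguments mirror the theme level, minimum overlap, background storyset, and blacklist options.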
The remainder of this section is focused on defining the aforementioned trio of similarity functions, and explaining the details of how we integrate them into our recommender system framework.

Methods Based on Sets of Themes
The dataset presented in Subsection 2.3 allows us to represent each of the 452 stories as the set of themes it is tagged with. In formal terms, consider a story, S, that is labeled by a unique identifier, s, called a story ID. A storyset is defined as a set of story IDs. We use the symbol \(\mathcal{S}\) to denote a storyset consisting of n unique story IDs. The goal is to assign a pairwise similarity score, \(0 \le \mathrm{Sim}(Q, R) \le 1\), between a query story of interest, Q, with story ID \(q \notin \mathcal{S}\) and theme set \(\mathbf{Q}\), and each reference story, R, with story ID \(r \in \mathcal{S}\) and theme set \(\mathbf{R}\). A pair of stories represented by their corresponding sets of tagged themes \(\mathbf{Q}\) and \(\mathbf{R}\) can be compared using the cosine coefficient [5]:
\[ \mathrm{Sim}_{\cos}(Q, R) = \frac{|\mathbf{Q} \cap \mathbf{R}|}{\sqrt{|\mathbf{Q}| \, |\mathbf{R}|}} \tag{1} \]
The cardinality function of a set X, denoted by |X|, gives the number of elements in that set. We will refer to this method in our experiments as COS. We also tested the cosine-idf function [6], which we refer to as COS IDF, and a variant (COS IDF') that raises the idf values to a power. The term \(W_{\mathcal{S}}(t) = \log\left(1 + n / F(t, \mathcal{S})\right)\) is the idf weighting term for a theme t with respect to storyset \(\mathcal{S}\). In the inverse document frequency factor, \(1 + n / F(t, \mathcal{S})\), the function \(F(t, \mathcal{S}) = \sum_{s \in \mathcal{S}} |\{t\} \cap T_s|\), where \(T_s\) denotes the theme set of the story with ID s, gives the total number of times the theme t occurs in the theme sets associated with storyset \(\mathcal{S}\). The difference in performance between these methods and the baseline methods reflects the gain from using the proposed set of themes and its manually constructed mapping to the stories, compared with a less informative resource such as the episode transcripts.
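The set-based scores can be sketched in a few lines of Python. COS follows the cosine coefficient directly, and the idf weight follows the definition of W(t) above, with F(t) counting the theme sets that contain t. For COS IDF, the sketch assumes the common form in which each theme is weighted by W(t), the two weighted binary vectors are compared with the cosine, and the exponent `power` stands in for the COS IDF' variant; this may differ in detail from Eq. 2 of [6]. Giving weight 0 to themes that never occur in the storyset is also an assumption, and all function names are hypothetical.

```python
from math import log, sqrt
from typing import Dict, Set


def cos_index(q: Set[str], r: Set[str]) -> float:
    """COS: crisp cosine coefficient |Q ∩ R| / sqrt(|Q| |R|)."""
    if not q or not r:
        return 0.0
    return len(q & r) / sqrt(len(q) * len(r))


def idf_weights(theme_sets: Dict[str, Set[str]]) -> Dict[str, float]:
    """W(t) = log(1 + n / F(t)), where F(t) counts the theme sets containing t."""
    n = len(theme_sets)
    counts: Dict[str, int] = {}
    for themes in theme_sets.values():
        for t in themes:
            counts[t] = counts.get(t, 0) + 1
    return {t: log(1 + n / f) for t, f in counts.items()}


def cos_idf(q: Set[str], r: Set[str], w: Dict[str, float], power: float = 1.0) -> float:
    """Assumed COS IDF form: cosine of binary theme vectors weighted by W(t)**power.

    power=1.0 plays the role of COS IDF, other exponents the COS IDF' variant.
    Themes that never occur in the storyset get weight 0 (an assumption).
    """
    def weight(t: str) -> float:
        return w.get(t, 0.0) ** power

    numerator = sum(weight(t) ** 2 for t in q & r)
    denominator = sqrt(sum(weight(t) ** 2 for t in q)) * sqrt(sum(weight(t) ** 2 for t in r))
    return numerator / denominator if denominator > 0 else 0.0
```

With uniform weights, the weighted score reduces to the crisp cosine; for example, {a, b} and {b, c} score 0.5 under both.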

Methods Exploiting the Ontology Hierarchy
As was shown in Subsection 2.2, the proposed theme ontology arranges the set of themes in a hierarchical semantic organization. Such an arrangement allows for a richer representation of the set of themes, which can in turn enrich the representation of the stories. To that end, the theme hierarchy can be used to build similarity functions for comparing pairs of themes. In turn, such theme-similarity functions can be used to leverage the information coded in the hierarchy to build story-similarity functions. In so doing, we follow the exposition of [9]. The soft cardinality \(|x|'\) of an element, x, from set X is defined by the formula
\[ |x|' = \frac{1}{\sum_{y \in X} S(x, y)^{p}} \tag{3} \]
where \(0 \le S(x, y) \le 1\) is a similarity function, and p > 0 a softness-control parameter. Note that \(|x|'\) attains its maximum value of |x| = 1 when S(x, y) = 1 if x = y, and 0 otherwise. In other words, the soft cardinality of \(x \in X\) reduces to the standard notion of element cardinality when x is wholly similar to itself (i.e., S(x, x) = 1) and utterly dissimilar to all other elements \(y \in X\) (i.e., S(x, y) = 0). The softness-control parameter p is discussed at length in [9]. Suffice it to say here that maximal "softness" is obtained in the limit as p approaches 0+ (i.e., \(S(x, y)^{p} = 1\) for all \(y \in X\)), maximal "crispness" is obtained in the limit as p approaches ∞ (i.e., \(S(x, y)^{p} = 1\) if x = y, and 0 otherwise), while setting p = 1 leaves the values of S(x, y) unmolested. This brings us to the soft cardinality \(|X|'\) of set X, which is defined as the sum of the soft cardinalities of its elements:
\[ |X|' = \sum_{x \in X} |x|' \tag{4} \]
Two remarks are in order. First, soft cardinality generalizes the set-theoretic notion of cardinality defined above to non-whole-number values by exploiting pairwise similarities between elements in the calculation of element cardinality. Element cardinality, |·|, by contrast, is confined to the whole numbers. The intuition underlying soft cardinality is this: if elements x and y in a set X are similar in some respect, then they should contribute less to the "size" of X than their respective cardinalities, |x| and |y|. Second, soft cardinality has been employed with some success in the field of natural language processing [14,15,16].
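Equations (3) and (4) translate directly into code. In the Python sketch below, the similarity S is passed in as a function and is assumed to satisfy S(x, x) = 1; the function names are illustrative rather than taken from stoRy.

```python
from typing import Callable, Hashable, Iterable

Sim = Callable[[Hashable, Hashable], float]


def element_soft_cardinality(x: Hashable, X: Iterable[Hashable], sim: Sim, p: float = 1.0) -> float:
    """|x|' = 1 / sum_{y in X} S(x, y)**p  (Eq. 3); assumes S(x, x) = 1."""
    return 1.0 / sum(sim(x, y) ** p for y in X)


def soft_cardinality(X: Iterable[Hashable], sim: Sim, p: float = 1.0) -> float:
    """|X|' = sum of the element soft cardinalities (Eq. 4)."""
    items = list(X)
    return sum(element_soft_cardinality(x, items, sim, p) for x in items)
```

With a crisp similarity (1 for identical elements, 0 otherwise), a three-element set has soft cardinality 3. If every pair of distinct elements instead has similarity 0.5 and p = 1, each element contributes 1/(1 + 0.5 + 0.5) = 0.5, so the set has soft cardinality 1.5; raising p toward large values pushes the result back toward 3, in line with the crispness limit described above.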
The final similarity function we employ is the cosine index integrated with soft cardinality:
\[ \mathrm{Sim}_{\text{soft-}\cos}(Q, R) = \frac{|\mathbf{Q} \cap \mathbf{R}|'}{\sqrt{|\mathbf{Q}|' \, |\mathbf{R}|'}}, \qquad |\mathbf{Q} \cap \mathbf{R}|' = |\mathbf{Q}|' + |\mathbf{R}|' - |\mathbf{Q} \cup \mathbf{R}|' \tag{5} \]
We hereafter refer to this similarity function as the soft cosine index. In our scenario, Q and R are stories represented, as before, as sets of themes, and the soft cardinality function \(|\cdot|'\) is implemented by a theme-similarity function S(·, ·) through Eq. (3) and Eq. (4). In that way, Eq. (5) preserves the same conceptual structure as the cardinality-based coefficient in Eq. (1), but is boosted by an ancillary theme-similarity function. By using the ontology hierarchy to build the theme-similarity function, the resulting story-similarity function of Eq. (5) exploits that hierarchy.
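A minimal Python sketch of the soft cosine index follows, assuming the union-based soft intersection \(|\mathbf{Q} \cap \mathbf{R}|' = |\mathbf{Q}|' + |\mathbf{R}|' - |\mathbf{Q} \cup \mathbf{R}|'\) used in the soft cardinality literature [9]. The function names are hypothetical, and the theme-similarity values in the usage note are made up for illustration.

```python
from math import sqrt
from typing import Callable, Hashable, Iterable

Sim = Callable[[Hashable, Hashable], float]


def soft_cardinality(X: Iterable[Hashable], sim: Sim, p: float = 1.0) -> float:
    """|X|' from Eqs. (3)-(4); assumes sim(x, x) == 1."""
    items = list(X)
    return sum(1.0 / sum(sim(x, y) ** p for y in items) for x in items)


def soft_cosine(Q: Iterable[Hashable], R: Iterable[Hashable], sim: Sim, p: float = 1.0) -> float:
    """|Q ∩ R|' / sqrt(|Q|' |R|'), with |Q ∩ R|' = |Q|' + |R|' - |Q ∪ R|'."""
    qs, rs = set(Q), set(R)
    q, r = soft_cardinality(qs, sim, p), soft_cardinality(rs, sim, p)
    if q == 0 or r == 0:
        return 0.0
    intersection = q + r - soft_cardinality(qs | rs, sim, p)
    return intersection / sqrt(q * r)
```

With a crisp similarity, the soft cosine between {a, b} and {b, c} equals the crisp cosine, 0.5. If two different themes a and c are instead given similarity 0.8, the singleton sets {a} and {c} score 8/9 ≈ 0.89, whereas their crisp cosine is 0. This is the sense in which the ancillary theme similarity lets related but non-identical themes count toward story similarity.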
The following paragraphs elaborate on the construction of different options for the function S(•, •).
A semantic similarity function is a function that computes the pairwise similarities between hierarchically arranged terms [17]. Let t and s be themes in the ontology. Most of these functions are built from the following primitives:

d:
The maximum number of steps from any theme to the root theme (d = 8 in our ontology).

m:
The maximum number of steps between any pair of themes (m = 13 in our ontology).

depth(t):
The minimum number of steps from theme t to the root-theme.

path(t, s):
The minimum number of steps from theme t to theme s.

LCS(t, s):
The least common subsumer of t and s (i.e., their deepest common ancestor).

IC(t):
The information content of theme t as defined by [18]. IC is a measure of the informativeness of a theme obtained from statistics gathered from a corpus. In our scenario, the corpus is the collection of episodes, each represented as a set of themes from the ontology.
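The primitives above can be computed with a few tree traversals. The Python sketch below uses a hypothetical five-theme toy tree rather than the real ontology, and its information content follows a common corpus-based formulation, IC(t) = -log p(t), where p(t) is the fraction of stories tagged with t or any of its descendants; whether this matches the exact definition in [18] is an assumption. All names are illustrative.

```python
from math import log
from typing import Dict, List, Set

# Hypothetical toy hierarchy (child -> parent); names are placeholders.
PARENT: Dict[str, str] = {"A": "root", "B": "root", "A1": "A", "A2": "A", "B1": "B"}
THEMES: Set[str] = set(PARENT) | set(PARENT.values())


def chain(t: str) -> List[str]:
    """t followed by its ancestors, ending at the root theme."""
    out = [t]
    while out[-1] in PARENT:
        out.append(PARENT[out[-1]])
    return out


def depth(t: str) -> int:
    """Minimum number of steps from t to the root theme."""
    return len(chain(t)) - 1


def lcs(t: str, s: str) -> str:
    """Least common subsumer: deepest common ancestor (t counts as its own ancestor)."""
    above_s = set(chain(s))
    return next(a for a in chain(t) if a in above_s)


def path(t: str, s: str) -> int:
    """Minimum number of steps between t and s (through their LCS in a tree)."""
    return depth(t) + depth(s) - 2 * depth(lcs(t, s))


def information_content(corpus: List[Set[str]]) -> Dict[str, float]:
    """Assumed corpus-based IC(t) = -log p(t), where p(t) is the fraction of
    stories tagged with t or with any descendant of t."""
    counts: Dict[str, int] = {}
    for themes in corpus:
        covered = {anc for t in themes for anc in chain(t)}
        for anc in covered:
            counts[anc] = counts.get(anc, 0) + 1
    return {t: -log(c / len(corpus)) for t, c in counts.items()}


d = max(depth(t) for t in THEMES)                    # maximum depth
m = max(path(t, s) for t in THEMES for s in THEMES)  # maximum distance
```

In this toy tree, d = 2 and m = 4; the root appears in every story's ancestor chain, so its information content is 0, and deeper, rarer themes receive larger values.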
The theme-similarity functions used in our experiments are \(S_{\mathrm{path1}}\), \(S_{\mathrm{path2}}\), \(S_{\mathrm{lch}}\), \(S_{\mathrm{wup}}\), \(S_{\mathrm{res}}\), \(S_{\mathrm{lin}}\), and \(S_{\mathrm{jcn}}\). The functions \(S_{\mathrm{path1}}\) and \(S_{\mathrm{path2}}\) are simple conversions of the number of steps between themes t and s to a similarity score in the unit interval. The procedure for obtaining a story-similarity function from \(S_{\mathrm{path1}}\) or \(S_{\mathrm{path2}}\) is to use it as a replacement for S(·, ·) in Eq. (3). Next, the soft cardinality function for a story Q is obtained using Eq. (4). Finally, using such a soft cardinality function in Eq. (5) produces the final story-similarity function. This procedure provides a similarity function that exploits both the representation of the stories as sets of themes and the hierarchical arrangement of themes in the ontology. In our experiments, the resulting functions obtained from \(S_{\mathrm{path1}}\) and \(S_{\mathrm{path2}}\) are referred to as PATH1 and PATH2, respectively.
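The three-step procedure (a theme similarity, then Eq. (3) and Eq. (4), then Eq. (5)) can be sketched end to end in Python. The path-to-similarity conversion used here, 1 - path(t, s)/m, is an illustrative assumption and need not coincide with either \(S_{\mathrm{path1}}\) or \(S_{\mathrm{path2}}\); the toy tree and all function names are hypothetical.

```python
from math import sqrt
from typing import Dict, Iterable, List, Set

# Hypothetical toy hierarchy (child -> parent); names are placeholders.
PARENT: Dict[str, str] = {"A": "root", "B": "root", "A1": "A", "A2": "A", "B1": "B"}
THEMES: Set[str] = set(PARENT) | set(PARENT.values())


def chain(t: str) -> List[str]:
    """t followed by its ancestors up to the root."""
    out = [t]
    while out[-1] in PARENT:
        out.append(PARENT[out[-1]])
    return out


def path(t: str, s: str) -> int:
    """Steps from t up to the deepest common ancestor, plus steps down to s."""
    ct, cs = chain(t), chain(s)
    common = next(a for a in ct if a in cs)
    return ct.index(common) + cs.index(common)


M = max(path(t, s) for t in THEMES for s in THEMES)  # plays the role of m


def theme_sim(t: str, s: str) -> float:
    """Illustrative path-based theme similarity in [0, 1] (an assumption)."""
    return 1.0 - path(t, s) / M


def soft_cardinality(X: Iterable[str], p: float = 1.0) -> float:
    """Eqs. (3)-(4) with theme_sim standing in for S(., .)."""
    items = list(X)
    return sum(1.0 / sum(theme_sim(x, y) ** p for y in items) for x in items)


def story_sim(Q: Set[str], R: Set[str], p: float = 1.0) -> float:
    """Soft cosine (Eq. 5) between two theme sets."""
    q, r = soft_cardinality(Q, p), soft_cardinality(R, p)
    if q == 0 or r == 0:
        return 0.0
    intersection = q + r - soft_cardinality(set(Q) | set(R), p)
    return intersection / sqrt(q * r)
```

Two stories tagged only with the sibling themes A1 and A2 share no theme, so their crisp cosine is 0, yet this sketch scores them 2/3 because the siblings sit two steps apart; stories tagged with A1 and B1, at the maximum distance, still score 0.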
\(S_{\mathrm{lch}}\) corresponds to the measure defined in [19], and the resulting story-similarity function is referred to as LCH. Analogously, \(S_{\mathrm{wup}}\) is the measure from [20], which produces WUP; \(S_{\mathrm{res}}\) is the measure from [21], which produces RES; \(S_{\mathrm{lin}}\) is the measure from [22], which produces LIN; and finally, \(S_{\mathrm{jcn}}\) is the measure from [23], which produces JCN. The measures LCH, RES, and JCN produce similarity scores that can be larger than 1.
To keep these measures in the unit interval, the constants \(c_i\) scale the scores, with the values \(c_1 = 3\), \(c_2 = 10\), and \(c_3 = 2\). Notice that measures based on the IC() primitive exploit both the hierarchy and the corpus of episodes annotated with themes.
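For the depth- and IC-based measures, the textbook forms of Wu-Palmer, 2·depth(LCS)/(depth(t) + depth(s)), and Lin, 2·IC(LCS)/(IC(t) + IC(s)), already lie in the unit interval and need no scaling constant. The Python sketch below implements those textbook forms on a hypothetical toy tree; it assumes, without confirmation from the text, that the paper's \(S_{\mathrm{wup}}\) and \(S_{\mathrm{lin}}\) follow them, and the IC values in the usage note are made up.

```python
from typing import Dict, List

# Hypothetical toy hierarchy (child -> parent); names are placeholders.
PARENT: Dict[str, str] = {"A": "root", "B": "root", "A1": "A", "A2": "A", "B1": "B"}


def chain(t: str) -> List[str]:
    """t followed by its ancestors up to the root."""
    out = [t]
    while out[-1] in PARENT:
        out.append(PARENT[out[-1]])
    return out


def depth(t: str) -> int:
    """Steps from t to the root (the root has depth 0)."""
    return len(chain(t)) - 1


def lcs(t: str, s: str) -> str:
    """Deepest common ancestor of t and s."""
    above_s = set(chain(s))
    return next(a for a in chain(t) if a in above_s)


def wup(t: str, s: str) -> float:
    """Textbook Wu-Palmer similarity: 2 depth(LCS) / (depth(t) + depth(s))."""
    total = depth(t) + depth(s)
    return 1.0 if total == 0 else 2 * depth(lcs(t, s)) / total


def lin(t: str, s: str, ic: Dict[str, float]) -> float:
    """Textbook Lin similarity: 2 IC(LCS) / (IC(t) + IC(s))."""
    total = ic[t] + ic[s]
    return 1.0 if total == 0 else 2 * ic[lcs(t, s)] / total
```

With made-up values IC(A) = 0.5 and IC(A1) = IC(A2) = 1.5, for instance, `lin` gives 2 × 0.5 / 3 ≈ 0.33 for the siblings A1 and A2, while `wup` gives 2 × 1 / 4 = 0.5; themes whose only common ancestor is the root score 0 under both.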

Results
In this section, we evaluate the performance of the proposed recommender system in a proof of concept study using a number of curated benchmark storysets. In particular, this study aims to answer the following questions:
1. Which story-similarity function offers the best performance according to the proposed benchmarks?
2. Which resources among the story transcripts, the themes associated with the stories, and the ontology hierarchy should be used to build such a similarity function?
3. How much does each of those resources contribute to the performance of the proposed story-similarity functions?

A Star Trek: Voyager Episode Case Study
The present case study serves as a lead-in to a more comprehensive proof of concept study. In particular, we show how our R Shiny web application can be employed to recommend Star Trek television series episodes that are similar to the Voyager episode False Profits (1996; story ID=voy3x05). Recall that a synopsis of the False Profits episode is provided in Subsection 2.3, and an inventory of the themes featured therein is found in Table 3.
Figure 3 shows a screenshot of the R Shiny web application in action. Viewing the screenshot, it is easy to imagine a hypothetical user who, having selected False Profits from a dropdown menu of episodes, peruses the returned table of recommended Star Trek episodes with marked delight. Our hypothetical user has elected to use the cosine index in an effort to find similar stories on the basis of shared central themes. They have uploaded the file StarTrek.smt, which contains Star Trek episode story IDs, as the background storyset so as to restrict the recommendations to the 452 Star Trek episodes considered in this work. Note that the StarTrek.smt file is included among the supplementary information items. Now on to the recommendations. The most similar episode to False Profits is revealed to be Devil's Due (1991, story ID = tng4x13). In all, the TNG classic shares six central themes with its Voyager counterpart. In the story, a woman claiming to be the devil of Ventaxian mythology returns to enslave the people of Ventax II in accordance with an ancient contract. However, Captain Picard is convinced she is an opportunistic charlatan. If our user can somehow restrain themself from watching Devil's Due straightaway, they will no doubt be pleased to notice a combination of "avarice/the lust for gold" and "the ethics of interfering in less advanced societies" featured in the three subsequent recommendations. It is interesting to note that, unlike Devil's Due, these three episodes do not touch on religion. Each of the remaining top ten recommended episodes is related to False Profits by themes from exactly one of the following domains: the human condition (e.g., "avarice" and "the lust for gold"), society (e.g., "the ethics of interfering in less advanced societies"), or the pursuit of knowledge (e.g., "religion as a control mechanism" and "the fulfillment of prophesy"). Thus, we see how our user, furnished with these recommendations, is well launched on the selection of their next episodes to watch.
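The ranking behind such a table can be sketched in a few lines of Python. The sketch below assumes the set form of the cosine index, |A ∩ B| / sqrt(|A| · |B|), applied to sets of central themes, and restricts the candidates to a background storyset. The function names and the toy annotations are hypothetical and are not taken from the stoRy package.

```python
import math


def cosine_index(a: set, b: set) -> float:
    """Set form of the cosine index: |A & B| / sqrt(|A| * |B|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def recommend(query_id, themes, background, k=10):
    """Return the k background stories most similar to the query story.

    themes: dict mapping story ID -> set of central themes
    background: iterable of story IDs that restricts the candidates
    """
    query = themes[query_id]
    scored = [
        (sid, cosine_index(query, themes[sid]))
        for sid in background
        if sid != query_id
    ]
    # Highest similarity first; ties broken by story ID for a stable order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


# Hypothetical toy annotations, not the real episode themes.
themes = {
    "query": {"avarice", "the lust for gold", "religion as a control mechanism"},
    "ep1": {"avarice", "the lust for gold", "diplomacy"},
    "ep2": {"religion as a control mechanism"},
    "ep3": {"time travel"},
}
print(recommend("query", themes, background=themes.keys(), k=2))
```

Restricting `background` to the IDs listed in a file such as StarTrek.smt plays the role of the uploaded background storyset; how stoRy actually parses that file is not shown here.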

Benchmark Storyset Collection
We curated a total of 28 benchmark storysets. The collection is made available in smt format via the benchmark storysets.smt file in the stoRy package. Moreover, the spreadsheet file benchmark storysets.xlsx in the supplementary notes contains a host of benchmark storyset summary details, including definitions, summary statistics, and detailed justifications for episode inclusion.
Table 4 shows an overview of the collection. Each row in the table summarizes a benchmark storyset according to name, number of episodes (i.e., n), most commonly observed theme (i.e., top observed theme), proportion of episodes in the benchmark storyset featuring the said theme (i.e., P_BS), and proportion of episodes in the background storyset, consisting of all 452 Star Trek episodes, featuring the said theme (i.e., P_Bg). In the "quest for immortality" benchmark storyset, for example, P_BS = 1.000 means the top observed theme "the quest for immortality" is featured in all n = 8 episodes. Meanwhile, the value P_Bg = 0.018 indicates that the same theme is featured in 1.8% of the background episodes (i.e., 8 of 452 episodes). In other words, an episode is included in the "quest for immortality" benchmark storyset if, and only if, its namesake theme "the quest for immortality" is observed in the episode. However, not all benchmark storysets are defined in this simple manner. Consider the "recreation gone wrong" benchmark storyset, defined as "A list of episodes where a character enjoying a recreational diversion winds up in trouble.", which was proposed by Memory Alpha website user LauraCC (http://memory-alpha.wikia.com/wiki/User:LauraCC). The term "recreation gone wrong" is not at present a theme in the theme ontology. Instead, we find that the top observed theme in episodes satisfying the definition is "virtual reality room", with P_BS = 0.818. This makes sense because Star Trek is well known for its use of the holodeck virtual reality environment. As a final example, take the "existential risk" benchmark storyset. Curiously, the top observed theme in this case is "annoyance", which is featured in 31% of the episodes. This observation is explained by the fact that the namesake theme "existential risk" is a high-level theme in the ontology. Indeed, it is often, if not always, the case that the theme "existential risk" is latent with respect to the stories in which it is featured; instead, some descendant theme of "existential risk" is observed in a given story. As a result, the theme "annoyance" ends up as the top observed theme for no obvious reason.

Table 4: Benchmark storyset collection at a glance. Recorded for each benchmark storyset is the number of episodes contained therein (i.e., n), the most commonly observed theme found to occur in the episodes (i.e., top observed theme), the proportion of episodes featuring the top observed theme (i.e., P_BS), and the proportion of background storyset episodes featuring the top observed theme (i.e., P_Bg). The background storyset is taken to consist of all 452 Star Trek episodes described in the present work. Definitions for the benchmark storysets are found in the supplementary materials.
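The quantities in Table 4 reduce to simple counts. The following sketch, with a hypothetical helper name and synthetic data that only mirrors the counts quoted for the "quest for immortality" row, computes the top observed theme of a benchmark storyset together with P_BS and P_Bg, including the 8 / 452 ≈ 0.018 arithmetic.

```python
from collections import Counter


def top_observed_theme(benchmark, background, themes):
    """Return (theme, P_BS, P_Bg) for the theme featured in most benchmark episodes.

    themes: dict mapping story ID -> set of themes observed in that story
    """
    counts = Counter(t for sid in benchmark for t in themes[sid])
    # Most frequent theme; ties broken alphabetically for determinism.
    theme = min(counts, key=lambda t: (-counts[t], t))
    p_bs = counts[theme] / len(benchmark)
    p_bg = sum(theme in themes[sid] for sid in background) / len(background)
    return theme, p_bs, p_bg


# Synthetic data mirroring the counts quoted above: 8 of 452 background
# episodes feature the theme, and those 8 form the benchmark storyset.
themes = {
    f"s{i}": {"the quest for immortality"} if i < 8 else {"some other theme"}
    for i in range(452)
}
benchmark = [f"s{i}" for i in range(8)]
theme, p_bs, p_bg = top_observed_theme(benchmark, list(themes), themes)
print(theme, p_bs, round(p_bg, 3))  # the quest for immortality 1.0 0.018
```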

Evaluation Method
Taking the benchmark storyset collection as our standard of truth, we aim to assess the quality of a particular story-similarity function sim(Q, R), which returns scores in the unit interval reflecting the degree of similarity of a pair of stories: 1 for identical stories, close to 1 for similar ones, and close to 0 for dissimilar ones. For that purpose, each story Q in a benchmark storyset S of cardinality n is compared, using sim(Q, R), against each of the 452 − 1 other background stories R (i.e., all background stories save for Q). The result is a list of stories sorted in descending order of similarity, in which the first n − 1 positions should ideally be occupied by the other stories in S. In practice, sim() falls short of this ideal, and stories not in S can appear in the leading positions of the list. If only a few such stories appear in the leading positions, sim() is considered of high quality; if many do, it is considered a low-quality similarity function. To quantify this degree of quality, we used two performance measures, namely precision at 10 (P@10) and mean average precision (MAP). In the field of information retrieval, P@10 and MAP are among the most common evaluation measures for the ad hoc task, i.e., retrieving the items in a collection that are relevant to a query. P@10 measures the proportion of the ten top-ranked stories in a list that belong to S. The P@10 measure for a benchmark storyset S is then the average of these proportions over the n lists corresponding to each story in S. This measure is convenient for evaluating scoring functions in scenarios where users obtain a short list of results to a query. In our recommender system scenario, we deem it a most suitable performance measure on the grounds that users have limited time to peruse long lists of episodes.
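A minimal sketch of the P@10 protocol described above, assuming that sim(Q, R) is any Python callable returning a score and that stories are identified by string IDs (both assumptions made for illustration; the function names are hypothetical):

```python
def precision_at_k(ranking, relevant, k=10):
    """Proportion of the top-k ranked story IDs that belong to `relevant`."""
    return sum(sid in relevant for sid in ranking[:k]) / k


def storyset_p_at_k(storyset, background, sim, k=10):
    """Average P@k over the n rankings obtained with each story of S as query."""
    scores = []
    for q in storyset:
        # Rank every other background story by similarity to the query.
        ranking = sorted((r for r in background if r != q),
                         key=lambda r: sim(q, r), reverse=True)
        scores.append(precision_at_k(ranking, set(storyset) - {q}, k))
    return sum(scores) / len(scores)


# Toy check: a perfect similarity function on a 3-story storyset.
background = [f"s{i}" for i in range(20)]
storyset = ["s0", "s1", "s2"]
perfect = lambda q, r: float(q in storyset and r in storyset)
print(storyset_p_at_k(storyset, background, perfect))
```

Under this reading, P@10 divides by 10 even when a storyset has fewer than 11 stories, so a perfect function scores at most (n − 1)/10 on small storysets; the toy check (n = 3) therefore tops out at 0.2.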
The average precision (AP) measure can be defined using the concept of P@m, i.e., the precision at the m-th position in a list of stories sorted in descending order of similarity to a query story Q. AP is the average of P@m over the positions m occupied by the n − 1 other stories of the benchmark storyset S. Finally, MAP is the average of the APs over all the stories in a storyset. Unlike P@10, the MAP measure evaluates the entire ranking of stories ordered by similarity, not only the top ten positions.
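AP and MAP can be sketched in the same style. Because each ranking contains every other background story, all n − 1 relevant stories appear somewhere in it, so dividing the accumulated P@m values by n − 1 gives their mean. The function names are again illustrative only.

```python
def average_precision(ranking, relevant):
    """Mean of P@m over the ranks m at which the relevant stories appear."""
    hits, total = 0, 0.0
    for m, sid in enumerate(ranking, start=1):
        if sid in relevant:
            hits += 1
            total += hits / m  # P@m at this relevant position
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(storyset, background, sim):
    """Average of AP over the rankings obtained with each story of S as query."""
    aps = []
    for q in storyset:
        ranking = sorted((r for r in background if r != q),
                         key=lambda r: sim(q, r), reverse=True)
        aps.append(average_precision(ranking, set(storyset) - {q}))
    return sum(aps) / len(aps)


# Relevant stories at ranks 1 and 3: AP = (1/1 + 2/3) / 2 = 5/6.
print(average_precision(["a", "x", "b", "y"], {"a", "b"}))
```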

Baseline Methods
A simple story-similarity function can be built using the episode transcripts⁶. Such a function provides a baseline against which to compare the performance of the functions based on the proposed set of themes and ontology. To build it, we followed common practices from the field of information retrieval for document comparison [24]. First, the transcripts were preprocessed using the following steps: 1) the sequence of characters was converted to a sequence of tokens using the NLTK⁷ tokenizer; 2) words were selected from the tokens by removing punctuation marks, non-alphabetic tokens, tokens longer than 20 characters, tokens shorter than three characters, and tokens occurring only once in the transcript collection; 3) words in the NLTK⁸ list of English stop words were removed; 4) the remaining words were reduced to their stems using Porter's algorithm [25].
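The four preprocessing steps can be sketched as follows. To keep the example self-contained, it swaps in a regular-expression tokenizer for the NLTK tokenizer and a tiny illustrative stop-word list for NLTK's English list, and it lower-cases the text, which the description above does not state; none of these choices should be read as the authors' exact pipeline.

```python
import re
from collections import Counter

# Stand-ins (assumptions): a regex tokenizer instead of the NLTK tokenizer,
# and a tiny illustrative stop-word list instead of NLTK's English list.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")
STOP_WORDS = {"the", "and", "you", "are", "this", "that", "with", "for"}


def preprocess(transcripts, stem=lambda w: w, stop_words=STOP_WORDS):
    """Return one list of (stemmed) words per transcript."""
    # 1) Tokenize; lower-casing is our assumption, not stated in the text.
    docs = [TOKEN_RE.findall(t.lower()) for t in transcripts]
    # 2a) Keep alphabetic tokens of 3 to 20 characters (drops punctuation).
    docs = [[w for w in d if w.isalpha() and 3 <= len(w) <= 20] for d in docs]
    # 2b) Drop tokens that occur only once in the whole collection.
    freq = Counter(w for d in docs for w in d)
    docs = [[w for w in d if freq[w] > 1] for d in docs]
    # 3) Remove stop words; 4) reduce the remaining words to stems.
    return [[stem(w) for w in d if w not in stop_words] for d in docs]


print(preprocess(["Engage! Warp speed, engage now.", "Make it so, warp nine."]))
```

With NLTK installed, passing `stem=PorterStemmer().stem` (from `nltk.stem`) would carry out step 4.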
The first baseline function uses the well-known vector space model proposed by Salton [6], which assigns tf-idf weights to the words (stems) of each document (story). Each document is represented by a vector indexed by the words in the collection vocabulary, whose entries are the corresponding tf-idf weights. The similarity score between two documents is then the cosine coefficient of their vectors. We refer to this function as VSM in our experiments.
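A pure-Python sketch of the VSM baseline follows. It assumes one common tf-idf variant, raw term frequency multiplied by log(N / df), since the exact weighting used in the authors' Gensim script is not stated here; the cosine coefficient is then computed on the sparse vectors.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Sparse tf-idf vectors (term -> weight): raw tf times log(N / df)."""
    n = len(docs)
    df = Counter(term for d in docs for term in set(d))
    return [{t: tf * math.log(n / df[t]) for t, tf in Counter(d).items()}
            for d in docs]


def cosine(u, v):
    """Cosine coefficient between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy preprocessed transcripts (lists of stems).
vecs = tfidf_vectors([["warp", "core"], ["warp", "drive"], ["tea"]])
print(round(cosine(vecs[0], vecs[1]), 3), cosine(vecs[0], vecs[2]))
```

A stem that occurs in every transcript receives an idf of zero and therefore contributes nothing to any similarity score.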
Given that our episode collection is relatively small (452 episodes) in comparison with the size of the resulting vocabulary of the transcript collection (14,262 stems), we proposed a second baseline aimed at reducing the dimension of the vectors to be compared. The common practice for this is latent semantic indexing (LSI), which uses singular value decomposition to reduce the vectors to a fixed dimensionality k [26]. In our experiments, we used k ∈ {10, 20, 40, 80, 160, 320}, and we refer to the resulting functions as LSI-10, LSI-20, and so on.
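The LSI reduction can be sketched directly with NumPy's SVD rather than Gensim's LsiModel. The sketch assumes a dense term-by-document matrix (for example, of tf-idf weights) and represents each document by its row of V_k S_k, the truncated right singular vectors scaled by the singular values; it is an illustration, not the baseline script itself.

```python
import numpy as np


def lsi_similarities(term_doc: np.ndarray, k: int) -> np.ndarray:
    """Cosine similarities between documents in a rank-k LSI space.

    term_doc: (num_terms, num_docs) matrix, e.g. of tf-idf weights.
    Returns a (num_docs, num_docs) similarity matrix.
    """
    _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    k = min(k, s.size)
    # Document coordinates: rows of V_k scaled by the top-k singular values.
    docs = vt[:k].T * s[:k]
    norms = np.linalg.norm(docs, axis=1, keepdims=True)
    docs = docs / np.where(norms == 0, 1.0, norms)  # avoid division by zero
    return docs @ docs.T


# Toy term-by-document matrix: documents 0 and 1 have identical columns.
A = np.array([[1.0, 1.0, 0.0],
              [2.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])
print(np.round(lsi_similarities(A, k=2), 3))
```

The function would be called once per value of k to produce LSI-10, LSI-20, and so on.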
We provide a Python script implementing these baselines with the Gensim [27] library, along with the transcript data⁹.

Study Results
Figure 4 shows the results in three groups: on the left, in light-colored bars, the baselines (see Section 3.4); in the center, the functions based on themes (see Section 2.4.1); and on the right, in dark bars, the functions based on themes that also make use of the ontology hierarchy (see Section 2.4.2). Each bar height is the MAP or P@10 score averaged over the 28 benchmark storysets. The COS IDF' results correspond to the optimal value of the IDF exponent, namely 0.3. Similarly, for each measure in the right group, we varied the softness parameter p in Eq. (3) over p = 1, 2, ..., 10; the figure shows only the results for the optimal value of p for each measure.
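Section 2.4.1, where COS IDF' is defined, is not reproduced here, so the following is only one plausible reading of an idf weighting with an exponent: each theme t gets the weight idf(t)^α with idf(t) = log(N / df(t)), and the cosine is taken between the weighted binary theme vectors. The function names and this exact formula are assumptions.

```python
import math
from collections import Counter


def idf_prime_weights(themes, alpha):
    """Hypothetical IDF' weights: log(N / df(t)) ** alpha for every theme t."""
    n = len(themes)
    df = Counter(t for ts in themes.values() for t in ts)
    return {t: math.log(n / d) ** alpha for t, d in df.items()}


def weighted_cosine(a, b, w):
    """Cosine between binary theme vectors whose entries are scaled by w[t]."""
    num = sum(w[t] ** 2 for t in a & b)
    den = math.sqrt(sum(w[t] ** 2 for t in a) * sum(w[t] ** 2 for t in b))
    return num / den if den else 0.0


themes = {"a": {"x", "y"}, "b": {"x", "z"}, "c": {"w"}}
for alpha in (0.0, 0.3):
    w = idf_prime_weights(themes, alpha)
    print(alpha, round(weighted_cosine(themes["a"], themes["b"], w), 3))
```

With α = 0 every weight is 1 and the sketch reduces to the unweighted set cosine; increasing α shifts weight toward rare themes. Sweeping α over a grid and keeping the value with the best average MAP or P@10 mirrors how an optimal exponent such as 0.3 would be selected.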
These results clearly show that the episode representation based on themes (center and right groups) far surpasses the transcript-based representation used by the baselines (left group). The difference in performance between the center and right groups is much less marked, with a slight advantage for the methods that exploit the ontology hierarchy. The only function that performed poorly is PATH2, which obtained an unexpectedly low MAP of 0.242, below the 0.263 obtained by COS. In general, the performance of all measures is fairly uniform, with a small advantage for the measures that use information content (IC) in their formulation (i.e., RES, LIN, and JCN). We tested the differences between COS and each function of the right group using the Wilcoxon signed-rank test. The only significant improvements were obtained by LIN for the MAP measure (p-value = 0.034) and by JCN for the P@10 measure (p-value = 0.008).
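The significance test can be reproduced with `scipy.stats.wilcoxon`; for a dependency-free illustration, the sketch below computes the exact two-sided Wilcoxon signed-rank p-value by enumerating rank-sum counts. It assumes the paired samples are per-storyset scores of two functions and that there are no zero or tied differences, cases that a library implementation handles and this sketch simply rejects.

```python
def wilcoxon_signed_rank_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Assumes no zero differences and no tied absolute differences.
    Returns (w_plus, p_value), where w_plus is the sum of the ranks of the
    positive differences x[i] - y[i].
    """
    if len(x) != len(y):
        raise ValueError("samples must be paired")
    d = [a - b for a, b in zip(x, y)]
    abs_d = sorted(abs(v) for v in d)
    if 0 in abs_d or len(set(abs_d)) != len(abs_d):
        raise ValueError("zero or tied differences are not handled here")
    n = len(d)
    rank = {v: i + 1 for i, v in enumerate(abs_d)}  # 1 = smallest |d|
    w_plus = sum(rank[abs(v)] for v in d if v > 0)
    # counts[s] = number of the 2**n sign patterns whose positive ranks sum to s
    max_sum = n * (n + 1) // 2
    counts = [1] + [0] * max_sum
    for r in range(1, n + 1):
        for s in range(max_sum, r - 1, -1):
            counts[s] += counts[s - r]
    total = 2 ** n
    p_lower = sum(counts[: w_plus + 1]) / total  # P(W+ <= w_plus)
    p_upper = sum(counts[w_plus:]) / total       # P(W+ >= w_plus)
    return w_plus, min(1.0, 2 * min(p_lower, p_upper))


# All five differences positive: W+ = 15, exact two-sided p = 2 / 32.
print(wilcoxon_signed_rank_exact([1, 2, 3, 4, 5], [0, 0, 0, 0, 0]))  # (15, 0.0625)
```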
Although the methods exploiting the ontology hierarchy did not clearly outperform the set-of-themes methods on average, Table 5 hints at some interesting differences. When averaged over all benchmark storysets, the P@10 scores of the COS (average P@10 = 0.299), COS IDF' (average P@10 = 0.293), and soft cosine (average P@10 = 0.306) indices end up in a virtual tie for best overall performance.
Table 5: Selected results for the recommender system with COS IDF' exponent 0.3 and softness parameter p = ???. P@10 outcomes are reported for each benchmark storyset, with the benchmark storysets sorted by P_Bg in increasing order. P_Bg, P_BS, and n are defined in the main text. The bottom row gives the P@10 outcomes averaged over all benchmark storysets. Bold marks the top outcome in each table row.

But the average P@10 values do not tell the whole story. On closer examination, each similarity function seems to excel at recommending episodes from a specific type of benchmark storyset. The COS IDF' index tends to come out on top when a rare theme (i.e., P_Bg ≤ 0.025) underlies the episodes of a benchmark storyset. However, COS generally outperforms its idf cousin when a common theme (i.e., P_Bg > 0.025) underlies the episodes of a benchmark storyset. Meanwhile, the soft cosine index performs best at recommending episodes from those storysets defined by high-level, latent themes in the ontology, such as the "existential risk", "diplomacy", "family affairs", and "tough decision" storysets. A more comprehensive examination of these behaviors could prove an interesting direction for future work.
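The rare-versus-common comparison amounts to grouping storysets by P_Bg and tallying which function attains the top P@10 in each row. The sketch below assumes a hypothetical row format of (P_Bg, {function name: P@10}); the numbers in the example are invented for illustration and are not the Table 5 values.

```python
def wins_by_group(rows, threshold=0.025):
    """Count, per P_Bg group, how often each function attains the top P@10.

    rows: list of (p_bg, {function_name: p_at_10}) tuples, one per storyset.
    """
    tally = {"rare": {}, "common": {}}
    for p_bg, scores in rows:
        group = "rare" if p_bg <= threshold else "common"
        best = max(scores.values())
        for name, value in scores.items():
            if value == best:  # a tie credits every top-scoring function
                tally[group][name] = tally[group].get(name, 0) + 1
    return tally


# Invented example rows, not taken from Table 5.
rows = [
    (0.018, {"COS": 0.30, "COS IDF'": 0.40}),
    (0.200, {"COS": 0.50, "COS IDF'": 0.40}),
    (0.010, {"COS": 0.20, "COS IDF'": 0.20}),
]
print(wins_by_group(rows))
```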

Conclusion
The overarching aim of this paper has been to propose a system of story recommendation that uses commonly featured themes as a basis for calculating pairwise story similarity. In a proof of concept study, we demonstrated the utility of this approach through the successful recommendation of Star Trek television franchise episodes. In particular, we showed that representing stories as sets of themes is useful insofar as the resulting recommender system significantly outperforms conventional natural language based methods. The utility of the ontology hierarchy for story recommendation remains unclear, but we contend that it is at minimum useful for the purposes of navigation and annotation. The recommender system is implemented in the R package stoRy version 0.1.2, and a related R Shiny web application has been made available for download at https://github.com/theme-ontology/shiny-apps.
On the identification of story themes, something must be said. Theme identification is admittedly subjective, but this is not to say the endeavor is altogether arbitrary. Oxford Dictionaries defines a theme as "An idea that recurs in or pervades a work of art or literature." [28]. This definition proves adequate for most literary critical purposes, but it does not help us identify themes. Two people may understand the same story differently and therefore assign it different themes. The potential for a system of this kind to degenerate into a wasteland of subjectivity depends on the extent to which themes can be made precise. This may appear difficult at first glance because it is often the subtle and contentious themes that are most discussed. A great many themes in a great many stories are, however, comparatively obvious, and it is not vain to hope for near universal agreement about whether they apply. At the Theme Ontology community platform, we are presently drafting a policies and guidelines document that will emphasize the need for clarity in and verifiability of theme definitions. The theme "the desire for vengeance", which is defined as "A character seeks vengeance over a perceived injury or wrong", constitutes a model definition. Growing pains are inevitable. But by concentrating on cataloging precisely defined and verifiable themes, we hope to ensure that Theme Ontology becomes a useful digital humanities resource in the future.
Finally, the importance of laying out a plan for how to scale the recommender system so as to broaden its appeal to a general audience cannot be overstated. We describe a two-pronged strategy involving a fusion of manual and automated story theme curation. On the manual side, we aim to grow the Theme Ontology online community platform by following the example of Wikipedia and related non-profit enterprises. We anticipate that development will proceed slowly at first, but with much hard work and dedication, we hope the Theme Ontology will mature into a flourishing online community dedicated to story theme curation. On the automated side, in the short term, we intend to build up the Theme Ontology library by using topic modeling [29,30] to automatically gather themes for large numbers of stories in the public domain. In the fullness of time, Theme Ontology community members will be able to subject these quick-and-dirty, computationally gathered themes to human curation. By following this general approach, we hope to scale the recommender system to operate on a database of stories numbering in the thousands to millions.
To conclude, the results of our proof of concept study allow us to say that the representation achieved for the episodes is adequate and that it can be exploited with good performance using methods as simple as the cosine coefficient. Clearly, theme-based representation is appropriate for constructing convenient similarity functions and has the potential to be applied to other tasks, such as the detection of storylines. This approach could be combined with the proximity among characters to identify a story that is spread across different documents [31]. Finally, we want to highlight that the complete ontology of themes, the Star Trek episodes annotated with their themes, the episode scripts, and the benchmark storysets constitute (to our knowledge) the largest and most complete free-access resource for the investigation of multidisciplinary aspects of narrative analysis. Likewise, the Star Trek episodes annotated with their themes and the episode scripts can be considered a corpus, according to Finlayson's criteria, for use by computational, narrative, and cognitive methods [32].

Figure 1: The Star Trek television series and film franchise overview.

Figure 2: Theme ontology hierarchies shown to three levels of depth.

Figure 3: Story recommender R Shiny web application screenshot. The table lists the top ten most similar episodes to the Voyager episode False Profits. Pairwise episode similarity is determined by applying the cosine index to central themes. StarTrek.smt is a background storyset file containing story IDs for all 452 themed Star Trek television series episodes considered in this work.

Figure 4: Proof of concept study results. The methods based on sets of themes (center) and the methods based on exploiting the ontology hierarchy (right) outperform the baseline methods (left) as judged according to the MAP (blue) and P@10 (red) performance evaluation measures. Numbers above bars in the center group are exponents of idf scores. Numbers above bars in the right group are values of the p exponent.

Table 1: Theme ontology summary statistics.

Table 2: Star Trek television franchise episode theme summary statistics by series.

Table 3: Inventory of the Star Trek: Voyager episode False Profits (1996) themes.