Extracting Semantic Relationships in Greek Literary Texts

In the era of Big Data, the digitization of texts and advancements in Artificial Intelligence (AI) and Natural Language Processing (NLP) are enabling the automatic analysis of literary works, allowing us to delve into the structure of artifacts and to compare, explore, manage and preserve the richness of our written heritage. This paper proposes a deep-learning-based approach to discovering semantic relationships in literary texts (19th century Greek literature), facilitating the analysis, organization and management of collections through the automation of metadata extraction. Moreover, we provide a new annotated dataset used to train our model. Our proposed model, REDSandT_Lit, recognizes six distinct relationships, the richest set of relations extracted from literary texts to date. It efficiently captures the semantic characteristics of the time period under investigation by fine-tuning the state-of-the-art transformer-based language model (LM) for Modern Greek on our corpora. Extensive experiments and comparisons with existing models on our dataset reveal that REDSandT_Lit achieves superior performance (90% accuracy), manages to capture infrequent relations (100% F1 on long-tail relations) and can also correct mislabelled sentences. Our results suggest that our approach efficiently handles the peculiarities of literary texts and is a promising tool for managing and preserving cultural information in various settings.


Introduction
An important part of humanity's cultural heritage resides in its literature [1], a rich body of interconnected works revealing the history and workings of human civilization across the eras. Major novelists have produced their works by engaging with the spirit of their time [2] and capturing the essence of society, human thought and accomplishment.
Cultural Heritage (CH) in its entirety constitutes a "cultural capital" for contemporary societies because it contributes to the constant valorization of cultures and identities. Moreover, it is also an important tool for the transmission of expertise, skills and knowledge across generations and is closely related to the promotion of cultural diversity, creativity and innovation [3]. For this reason, proper management of the development potential of CH requires a sustainability-oriented approach, i.e., one that ensures both the preservation of the heritage from loss and its connection to the present and the future. Proper management of literary cultural heritage, therefore, requires extensive digitization of collections and procedures that allow for the automatic extraction of semantic information and metadata to ensure the organization of past collections and their linkage with present and future documents.
Until recently, engaging with a large body of literature and discovering insights and links between storytellers and cultures was a painstaking process which relied mainly on close reading [4]. Nowadays, however, the large-scale digitization of texts, as well as developments in Artificial Intelligence (AI) and Natural Language Processing (NLP), makes it possible to explore the richness of our written heritage at an unprecedented scale, with methods that were previously impossible, while facilitating the management and preservation of texts [5].
GPT [19] under a DS setting. Extensive experimentation and comparison of our model to several existing models for RE reveal REDSandT_Lit's superiority. Our model captures with great precision (75-100% P) all relations, including the infrequent ones that other models failed to capture. Moreover, we observe that fine-tuning a transformer-based model under a DS setting and incorporating entity-type side information greatly boosts RE performance, especially for the relations in the long tail of the distribution. Finally, REDSandT_Lit manages to find additional relations that were missed during annotation.
Our proposed model is the first to extract semantic relationships from 19th century Greek literary texts, and the first, to our knowledge, to extract relationships between entities other than person and place; thus, we provide a broader and more diverse set of semantic information on literary texts. More precisely, we expand the boundaries of current research from narration understanding to extended metadata extraction. Even though online repositories provide several metadata that accompany digitized books to facilitate search and indexing, digitized literary texts contain rich semantic and cultural information that often goes unused. The six relationships identified by our model can further enrich books' metadata, preserve more information and facilitate search and comparisons. Moreover, having access to a broader set of relations can boost downstream tasks, such as recommending similar books based on hidden relations. Finally, in the spirit of distant reading [4], our approach goes one step further, helping readers and scholars understand a story's setting more quickly and easily.
The remainder of this paper is organized as follows: Section 2 contains a brief literature review, Section 3 discusses our dataset and proposed methodology. Sections 4 and 5 contain our results and discussion, respectively.

Related Work
Our work is related to distantly supervised relation extraction, information extraction from literary texts and metadata enhancement.

Distantly-Supervised Relation Extraction
Distant supervision [20,21] plays a key role in RE, meeting its need for abundant training data in a simple and cost-effective manner. Mintz et al. [6] were the first to propose DS to automatically construct corpora for RE, assuming that every sentence containing an entity pair that holds a relation in a KB expresses that relation. Of course, this assumption is very loose and yields noisy labels. Multi-instance learning methods were proposed to alleviate the problem by performing relation classification at the bag level, where a bag contains all instances that mention the same entity pair [7,8].
With this training framework in place, research focused on features and models that better suppress noise. Until the advent of neural networks (NNs), researchers used simple models relying heavily on handcrafted features (part-of-speech tags, named entity tags, morphological features, etc.) [7,8]. Later, the focus turned to model architecture. Initially, a convolutional neural network (CNN) was proposed by [22] to automatically capture the semantics of sentences, while the piecewise CNN (PCNN) [23] became the common architecture for embedding sentences and handling DS noise [24][25][26][27][28][29]. Moreover, graph CNNs (GCNNs) proved effective for encoding syntactic information from text [30].
Pre-trained language models (LMs) that rely on the transformer architecture [31] and transfer common knowledge to downstream tasks have been shown to capture semantic and syntactic features better [32]. In particular, pre-trained LMs significantly improve performance in text classification tasks, prevent overfitting and increase sample efficiency [33]. Moreover, methods that fine-tune pre-trained LMs [34,35], including [18,19], which extended GPT [32] and BERT [17], respectively, to the DS setting by incorporating a multi-instance training mechanism, show that pre-trained LMs provide a stronger signal for DS than specific linguistic and side-information features [30].

Information Extraction from Literary Texts
While relation extraction has been extensively studied in news and biomedical corpora, extracting semantic relationships from literary texts is a much less studied area. Existing research attempts to understand narration mostly from the viewpoint of character relationships, not to augment existing KBs or enrich a story's metadata in an online repository. One explanation, based on [10], is the difficulty of automatically determining meaningful interpretations (i.e., predefined relations) and the lack of semantically annotated corpora. Therefore, most research focuses on extracting a limited set of relationships among characters, such as "interaction" [10][11][12], "mention" [10] and "family" [13].
The key challenges in extracting relations from literary texts are listed in [36], an excellent survey on extracting relations among fictional characters. The authors point out that there can be significant stylistic differences among authors and grammatical variations across books of different periods, while the closed-world nature of fiction, whose plots involve recurring entities, entails coreference resolution issues. This work aims at capturing relations not only between people and places but also between organizations, dates and titles of works of art.

Metadata Enhancement
Only two decades ago, book information was available only by visiting libraries. Nowadays, in contrast, we suffer from information overload, with libraries maintaining their own databases to facilitate search [37].
With ever more digital content being added to the enormous collections of libraries, archives, etc., providing machine-readable structured information to facilitate information integration and presentation [38] is becoming increasingly important and challenging. Moreover, research has shown that the metadata provided with fiction books strongly affects readers' selection of a book and their perception of the story [39,40]. For that reason, we believe that enhancing the metadata of literary texts is crucial.

Materials and Methods
As discussed in the Introduction, extracting cultural information from literary texts demands either a plethora of annotations or robust augmentation techniques that can capture a representative sample of annotations and boost machine learning techniques. Meanwhile, automatically augmented datasets always come with noise, while the characteristics of creative writing pose an extra challenge.
In this section, we present a new dataset for Greek literary fiction of the 19th century. The dataset was created by aligning entity pair-relation triplets to a representative sample of Greek 19th century books. Even though we efficiently augment the training samples in this way, they inevitably suffer from noise and include imbalanced labels. Moreover, the special nature of 19th century Greek sets an extra challenge. We then present our model: a distantly supervised transformer-based RE method based on [18], which has been shown to efficiently suppress DS noise through multi-instance learning and the exploitation of a pre-trained transformer-based LM. Our model proposes a simpler configuration for the final sentence embedding, which manages to capture a larger number of relations by using entity-type information and the pre-trained GREEK-BERT [16] model.

Benchmark Dataset
Preserving semantic information from cultural artifacts requires either extensive annotation, which is rarely available, or automatically augmented datasets that sufficiently capture context. In the case of literary texts, no dataset exists to train our models. Taking into account that the greatest part of digitized Modern Greek literature dates from the 19th century, we construct our dataset by aligning relation triples from [41] to twenty-six (26) literary Greek books of the 19th century (see Table A1). Namely, we use the provided relation triplets (i.e., head-tail-relationship triplets) as an external knowledge base (KB) to automatically extract sentences that include the entity pairs, assuming that these sentences also express the same relationship (distant supervision).
The dataset's six specific relations and their statistics can be found in Table 1. The train, validation and test sets follow an 80%-10%-10% split. We assume that a relationship can occur within a window of three consecutive sentences and only between two named entities. Sentences that include at least two named entities of different types but do not constitute a valid entity pair are annotated with the "NoRel" relation. These reflect either sentences with no actual underlying relation or sentences whose annotation was missed. The dataset also includes the named entity types of each sentence's entity pair. The following five entity types are used: person (PER), place (GPE), organization (ORG), date (DATE) and book title (TITLE). We made this dataset publicly available (data available at: https://github.com/intelligence-csd-auth-gr/extracting-semantic-relationships-fromgreek-literary-texts (accessed on 3 August 2021)) to encourage further research on 19th century Greek literary fiction. The challenges of this dataset are threefold. First, like all datasets created via distant supervision, ours suffers from noisy labels (false positives) and is imbalanced, with relations having a varying number of samples. Second, the dataset includes misspellings stemming from the books' digitization through OCR systems. Third, the documents use katharevousa, a conservative form of Modern Greek in use between the late 18th century and 1976. Katharevousa, which covers a significant part of Modern Greek literature, is more complex than Modern Greek, including additional cases, compound words and other grammatical features that pose an extra challenge for the algorithm.
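The distant-supervision alignment described above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function name, the substring-based entity matching and the data layout are our assumptions.

```python
from itertools import permutations

def distant_align(sentences, kb, entities, window=3):
    """Align KB triples to text via distant supervision.

    sentences: list of sentence strings
    kb: dict mapping (head, tail) -> relation name
    entities: dict mapping entity surface form -> entity type
    Returns (sample_text, head, tail, relation) tuples. Pairs of
    differently-typed entities with no KB relation get "NoRel".
    """
    samples = []
    for i in range(len(sentences)):
        # relations are assumed to occur within three consecutive sentences
        span = " ".join(sentences[i:i + window])
        # naive substring matching stands in for proper NER spans;
        # real pipelines would also deduplicate overlapping windows
        present = [e for e in entities if e in span]
        for h, t in permutations(present, 2):
            if (h, t) in kb:
                samples.append((span, h, t, kb[(h, t)]))
            elif entities[h] != entities[t]:
                samples.append((span, h, t, "NoRel"))
    return samples
```

For example, a KB triple (Papadiamantis, Skiathos, birthPlace) matched in a window produces a positive sample, while the reverse pair falls back to "NoRel".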

The Proposed Model Architecture
In this section, we present our approach towards extracting semantic relationships from literary texts. We highlight that the specific challenges that we have to address are as follows: DS noise, imbalanced relations, character misspellings due to OCR, Katharevousa form of Greek language and creative writing peculiarities. Inspired by [18,19] who showed that DS and pre-trained models can suppress noise and capture a wider set of relations, we propose an approach that efficiently handles the aforementioned challenges by using multi-instance learning, exploiting a pre-trained transformer-based language model and incorporating entity type side-information.
In particular, given a bag of sentences {s_1, s_2, ..., s_n} that concern a specific entity pair, our model generates a probability distribution over the set of possible relations. The model utilizes the GREEK-BERT pre-trained LM [16] to capture the semantic and syntactic features of sentences by transferring pre-trained common-sense knowledge. In order to capture the specific patterns of our corpus, we fine-tune the model using multi-instance learning; namely, we train the model to extract the entity pairs' underlying relation given their associated sentences.
During fine-tuning, we employ a structured, RE-specific input representation to minimize architectural changes to the model [42]. Each sentence is transformed to a structured format, including a compressed form of the sentence along with the entity pair and their entity types. We transform the input into a sub-word level distributed representation using byte-pair encoding (BPE) and positional embeddings from GREEK-BERT fine-tuned on our corpus. Lastly, we concatenate the head and tail entities' types embeddings, as shaped from BERT's last layer, to form the final sentence representation that we used to classify the bag's relation.
The proposed model can be summarized in three components: the sentence encoder, the bag encoder and model training. The components are described in the following sections, with the overall architecture shown in Figures 1 and 2.
Figure 2. Transformer architecture (left) and training framework (right). We use the BERT transformer architecture, specifically the bert-base-greek-uncased-v1 GREEK-BERT LM. Sentence representation s_i is formed as shown in Figure 1. Reprinted with permission from [18]. Copyright 2021 Despina Christou.

Sentence Encoder
Our model encodes sentences into a distributed representation by concatenating the head (h) and tail (t) entity type embeddings. The overall sentence encoding is depicted in Figure 1, while the following sections briefly examine the parts of the sentence encoder in a bottom-up manner.
In order to capture the relation hidden between an entity pair and its surrounding context, RE requires structured input. To this end, we encode sentences as a sequence of tokens. At the very bottom of Figure 1 is this representation, which starts with the head entity type and token(s) followed by the delimiter [H-SEP], continues with the tail entity type and token(s) followed by the delimiter [T-SEP] and ends with the token sequence of a compressed form of the sentence. The whole input starts and ends with the special delimiters [CLS] and [SEP], respectively, which are typically used in transformer models. In BERT, for example, [CLS] acts as a pooling token representing the whole sequence in downstream tasks such as RE; we do not follow that convention. Furthermore, tokens refer to the sub-word tokens of each word, where each word is also lower-cased and normalized in terms of accents and other diacritics; for example, the word "Αρσάκειο" (Arsakeio) is split into the "αρ" ("ar"), "##σα" ("##sa") and "##κειο" ("##keio") sub-word tokens.
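The structured input sequence described above can be sketched as follows; the function name is ours, and token strings stand in for the sub-word token ids produced by the tokenizer.

```python
def build_input(head, head_type, tail, tail_type, trimmed_tokens):
    """Assemble the RE-specific structured input:
    [CLS] <head-type> head [H-SEP] <tail-type> tail [T-SEP] text [SEP]
    where <head-type>/<tail-type> are entity-type tokens (PER, GPE, ...)
    and trimmed_tokens is the compressed form of the sentence."""
    return (["[CLS]", head_type] + head.split()
            + ["[H-SEP]", tail_type] + tail.split()
            + ["[T-SEP]"] + trimmed_tokens + ["[SEP]"])
```

For instance, the pair (Παπαδιαμάντης PER, Σκιάθος GPE) with its compressed context yields one flat token sequence that the LM consumes directly, keeping architectural changes minimal.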

Input Representation
As discussed in Section 3.1, samples including a relation can span up to three sentences; thus, samples (generally referred to as sentences throughout this document) can contain information that is not directly related to the underlying relation. Moreover, creative writing's focus on narration results in long subordinate clauses that further disrupt the content linking the two entities. In order to focus on the tokens important to the relation, we adopt two distinct compression techniques:
• trim_text_1: Given a sentence, it preserves the text starting from the three words preceding the head entity to the three words following the tail entity;
• trim_text_2: Given a sentence, it preserves only the text surrounding the head and tail entities, i.e., the three words preceding and following each entity.
Our choice rests on the fact that context closer to the entities holds the most important relational information. We experimented with two compressed versions of the text: one that keeps all text between the two entities (trim_text_1) and one that keeps only the nearby context (trim_text_2), assuming that the in-between text, if long enough, typically constitutes a subordinate clause irrelevant to the underlying relation. Our assumption is confirmed by our experiments (see Sections 4 and 5).
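The two compression techniques can be sketched as below, assuming word-tokenized sentences with known head/tail token indices (h_idx, t_idx); the overlap handling in trim_text_2 is our addition, not stated in the paper.

```python
def trim_text_1(tokens, h_idx, t_idx, ctx=3):
    """Keep from 3 words before the head entity to 3 words after the
    tail entity, preserving all in-between text."""
    return tokens[max(h_idx - ctx, 0):min(t_idx + ctx + 1, len(tokens))]

def trim_text_2(tokens, h_idx, t_idx, ctx=3):
    """Keep only the 3-word windows around each entity, dropping the
    (possibly long) text between them."""
    # if the two windows overlap, fall back to one contiguous span
    if t_idx - h_idx <= 2 * ctx:
        return tokens[max(h_idx - ctx, 0):min(t_idx + ctx + 1, len(tokens))]
    head = tokens[max(h_idx - ctx, 0):h_idx + ctx + 1]
    tail = tokens[max(t_idx - ctx, 0):t_idx + ctx + 1]
    return head + tail
```

When the entities are far apart, trim_text_2 discards the middle span on the assumption that it forms a subordinate clause irrelevant to the relation.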
After compressing the sentences into a more compact form, we also prepend the head and tail entities' text and types to the structured input to bias the LM towards the features important for the entity pair. Extensive experimentation reveals that the extracted entity-type embeddings hold the most significant information for extracting the underlying relation between two entities. Entity types are considered known and are provided in the dataset.

Input Embeddings
Input embeddings to GREEK-BERT are denoted h_0 in Figure 1. Each token's embedding results from summing its positional and byte-pair embeddings.
Positional embeddings are an essential part of BERT's attention mechanism, while byte-pair embeddings are an efficient method for encoding sub-words, accounting for vocabulary variability and previously unseen words at inference time.
To make use of sub-word information, the input is tokenized using byte-pair encoding (BPE). We use the pre-trained model's tokenizer (35,000 BPEs), to which we added seven task-specific tokens (the [H-SEP] and [T-SEP] delimiters and the five entity-type tokens). We force the model not to decompose the added tokens into sub-words because of their special meaning in the input representation.
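The constraint that task-specific tokens are never decomposed can be illustrated with a minimal greedy longest-match sub-word tokenizer. This is a sketch of the idea, not GREEK-BERT's actual tokenizer; in practice one would register the extra tokens with the HuggingFace tokenizer (e.g., via its add_tokens method).

```python
SPECIAL_TOKENS = {"[H-SEP]", "[T-SEP]", "PER", "GPE", "ORG", "DATE", "TITLE"}

def tokenize_word(word, vocab):
    """Greedy longest-prefix sub-word split; '##' marks continuations."""
    pieces, start = [], 0
    while start < len(word):
        for end in range(len(word), start, -1):
            piece = word[start:end]
            cand = piece if start == 0 else "##" + piece
            if cand in vocab:
                pieces.append(cand)
                start = end
                break
        else:
            return ["[UNK]"]          # no matching piece found
    return pieces

def tokenize(tokens, vocab):
    out = []
    for tok in tokens:
        if tok in SPECIAL_TOKENS:     # never decompose task tokens
            out.append(tok)
        else:
            out.extend(tokenize_word(tok, vocab))
    return out
```

With a toy vocabulary {"αρ", "##σα", "##κειο"}, the word "αρσακειο" splits exactly as in the paper's example, while [H-SEP] passes through intact.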

Sentence Representation
The input sequence is transformed into feature vectors (h^L) using GREEK-BERT's pre-trained language model fine-tuned on our task. Each sub-word token's feature vector (h^L_i, i = 1, ..., D_t) results from BERT's attention mechanism over all tokens. Intuitively, the feature vectors of specific tokens are more informative and contribute more to identifying the underlying relationship.
To the extent that each relation constrains the types of the entities involved and vice versa [30,43], we represent each sentence by concatenating the head and tail entities' type embeddings:

s_i = [h^L_head-type ; h^L_tail-type], where s_i ∈ R^{2·d_h}

While it is typical to encode sentences using the vector of the [CLS] token in h^L [11], our experiments show that representing a sentence as a function of the examined entity pair's types reduces noise, improves precision and helps in capturing the infrequent relations.
We tested several other representation techniques, i.e., additionally concatenating the [CLS] vector to embed the overall sentence's information, as well as using the sentence representation from [18], which includes relation embeddings and further attention mechanisms; the presented method outperformed them all. Our intuition is that the LM cannot efficiently capture patterns in Katharevousa, since manual inspection revealed that most words are split into many sub-words. This occurs because Katharevousa differs from Modern Greek, while some words/characters were also misspelled during the OCR process.
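Concatenating the head and tail entities' type embeddings reduces to picking two token vectors out of the last hidden layer; a minimal sketch with plain lists standing in for d_h-dimensional tensors:

```python
def sentence_representation(hidden_states, head_type_idx, tail_type_idx):
    """Form s_i = [h_head-type ; h_tail-type] (a 2*d_h vector) from
    BERT's last layer. hidden_states holds one d_h-dim vector per input
    token; the two indices point at the entity-type tokens."""
    # list concatenation stands in for torch.cat on real tensors
    return hidden_states[head_type_idx] + hidden_states[tail_type_idx]
```

With real tensors this would be a concatenation (e.g., `torch.cat`) over the two corresponding rows of the final hidden state.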

Bag Encoder
Bag encoding, i.e., the aggregation of sentence representations within a bag, reduces the noise generated by the erroneously annotated relations that accompany DS.
Assuming that not all sentences contribute equally to the bag's representation, we use selective attention [24] to highlight the sentences that better express the underlying relation. Selective attention represents each bag as a weighted sum over its individual sentences:

B = Σ_i α_i s_i

where the attention weight α_i is calculated by comparing each sentence representation s_i against a learned representation r:

α_i = exp(s_i · r) / Σ_j exp(s_j · r)

Finally, the bag representation B is fed to a softmax classifier in order to obtain the probability distribution over the relations:

p(l | B) = softmax(W_r B + b_r)

where W_r is the relation weight matrix and b_r ∈ R^{d_r} is the bias vector.
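Selective attention and the softmax classifier can be sketched as follows, with plain-Python vectors standing in for tensors; in practice this runs as batched matrix operations in a deep learning framework.

```python
import math

def softmax(xs):
    m = max(xs)                              # subtract max for stability
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def bag_representation(sentence_reprs, r):
    """Selective attention: B = sum_i a_i * s_i, where
    a_i = softmax(s_i . r) over the bag's sentences and r is a
    learned relation query vector."""
    alphas = softmax([dot(s, r) for s in sentence_reprs])
    dim = len(sentence_reprs[0])
    return [sum(a * s[j] for a, s in zip(alphas, sentence_reprs))
            for j in range(dim)]

def classify(bag_repr, W_r, b_r):
    """p(l | B) = softmax(W_r B + b_r)."""
    logits = [dot(row, bag_repr) + b for row, b in zip(W_r, b_r)]
    return softmax(logits)
```

A sentence whose representation aligns with r dominates the weighted sum, so noisy sentences in the bag are down-weighted rather than discarded.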

Training
Our model utilizes a transformer model, specifically GREEK-BERT, which we fine-tune in our specific setup to capture the semantic features of relational sentences. Below, we briefly present the overall process.

Pre-training
For our experiments, we use the pre-trained bert-base-greek-uncased-v1 language model [16], which consists of 12 layers, 12 attention heads and 110M parameters, where each layer is a bidirectional Transformer encoder [31]. The model is trained on uncased Modern Greek texts from Wikipedia, the European Parliament Proceedings Parallel Corpus (Europarl) and OSCAR (the clean part of Common Crawl), totalling 3.04B tokens. GREEK-BERT is pre-trained on two unsupervised tasks, masked LM and next sentence prediction, with masked LM being its core component, as it enables deep bidirectional training.

Fine-tuning
We initialize our model's weights with the pre-trained GREEK-BERT model and, after experimentation, fine-tune only the last four layers under the multi-instance learning setting presented in Figure 2, using the input representation shown in Figure 1.
During fine-tuning, we optimize the following objective:

J(θ) = Σ_{i=1}^{|B|} log p(l_i | B_i; θ)

i.e., over all entity-pair bags |B| in the dataset, we maximize the log-probability of correctly predicting each bag's relation l_i given its sentences' representations and the model parameters θ.
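The training objective amounts to a negative log-likelihood over bags; a minimal sketch, where `predict` is a stand-in for the full model returning a probability per relation:

```python
import math

def bag_nll(bags, predict):
    """J(theta) as a loss to minimize: the negative sum of
    log p(l_i | B_i) over all (bag, gold_relation) pairs.
    predict(bag) returns a probability distribution over relations."""
    return -sum(math.log(predict(bag)[gold]) for bag, gold in bags)
```

Minimizing this quantity is equivalent to the cross-entropy criterion mentioned in the hyper-parameter settings.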

Hyper-Parameter Settings
In our experiments, we utilize the bert-base-greek-uncased-v1 model with hidden-layer dimension D_h = 768, while we fine-tune the model with max_seq_length D_t = 128. We use the Adam optimization scheme [44] with β_1 = 0.9, β_2 = 0.999 and a cosine learning-rate decay schedule with warm-up over 0.1% of training updates. We minimize loss using the cross-entropy criterion.
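The warm-up plus cosine decay schedule can be sketched as follows; `warmup_frac=0.001` reflects the stated 0.1% of training updates, and the exact implementation (e.g., whether decay reaches zero) is our assumption.

```python
import math

def lr_schedule(step, total_steps, base_lr, warmup_frac=0.001):
    """Linear warm-up over the first fraction of updates, then cosine
    decay towards zero; Adam supplies the per-parameter adaptivity."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

The learning rate ramps up quickly, peaks at base_lr, then falls smoothly, which stabilizes fine-tuning of the unfrozen layers.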
Regarding the dataset-specific hyper-parameters of the REDSandT_Lit model, we tune them automatically on the validation set based on F1-score. Table 2 shows the applied search space and the selected values for the dataset-specific hyper-parameters. Experiments are conducted in Python 3.6, on a PC with 32.00 GB RAM, an Intel i7-7800X CPU @ 3.5 GHz and an NVIDIA GeForce GTX 1080 with 8 GB. Fine-tuning takes about 5 min for the three epochs. The implementation of our method is based on the following code: https://github.com/DespinaChristou/REDSandT (accessed on 18 May 2021).

Baseline Models
In order to show the proposed method's effectiveness, we compare it against three strong baselines on our dataset. More precisely, we compare REDSandT_Lit to the standard feature-based [45] and NN-based [46] approaches used in the literature, while also comparing to the Greek version of BERT [16]. All models were tested on both sentence compression formats presented in Section 3.2.1, indicated with the respective superscripts (1, 2). For the Bi-LSTM approach, we also experimented with both full-word and BPE tokenization, indicated with respective superscripts.

Feature-based Methods
• SVM^1: A Support Vector Machine classifier. Sentences are encoded using the first-presented compression format.
• SVM^2: A Support Vector Machine classifier. Sentences are encoded using the second-presented compression format.

Transformer-based Methods
• GREEK-BERT: BERT (bert-base-uncased) fine-tuned on Modern Greek corpora. We fine-tune it on our specific dataset and task.
• REDSandT^2: The default REDSandT approach for distantly supervised RE. We use GREEK-BERT as base and fine-tune the model on our corpus and specific task. Sentences are encoded using the second-presented compression format.
• REDSandT_Lit^1: The proposed variant of REDSandT fine-tuned on our corpora and specific task. Sentences are encoded using the first-presented compression format.
• REDSandT_Lit^2: The proposed variant of REDSandT fine-tuned on our corpora and specific task. Sentences are encoded using the second-presented compression format.

Evaluation Criteria
In order to evaluate our model against the baselines, we report accuracy, macro-averaged P, R and F1 and weighted-averaged P, R and F1 for all models. For a more in-depth analysis of each model's performance on each relation, we also report precision, recall and F1-score for all models and relations. Moreover, we conduct Friedman's statistical significance test to compare all presented models on our dataset, following [47,48].
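The macro vs. weighted averaging distinction can be made concrete with a small stdlib sketch (in practice, libraries such as scikit-learn provide the same metrics); macro treats every relation equally, while weighted scales each relation's F1 by its support.

```python
from collections import Counter

def prf_per_class(gold, pred):
    """Per-class precision, recall and F1 from parallel label lists."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1   # predicted p, but gold was g
            fn[g] += 1   # gold g was missed
    out = {}
    for label in set(gold) | set(pred):
        P = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        R = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        F = 2 * P * R / (P + R) if P + R else 0.0
        out[label] = (P, R, F)
    return out

def macro_weighted_f1(gold, pred):
    per = prf_per_class(gold, pred)
    support = Counter(gold)
    macro = sum(f for _, _, f in per.values()) / len(per)
    weighted = sum(per[l][2] * support[l] for l in per) / len(gold)
    return macro, weighted
```

On an imbalanced dataset like ours, a model that ignores long-tail relations can still score a high weighted F1; the macro score exposes that failure, which is why both are reported.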

Results
In this section, we present the results of our model against the predefined baselines, both overall and for each relation separately. Table 3 compares our model to the baseline models mentioned above. We observed the following: (1) both REDSandT_Lit^1 and REDSandT_Lit^2 are better overall in terms of precision, recall and F1-score, followed by SVM^2 and the full-word Bi-LSTM^2; (2) preserving the surrounding context of entity pairs (trim_text_2) almost always yields better results; and (3) using full-word tokenization in the Bi-LSTM models shows a tremendous performance improvement over BPE tokenization. Focusing on the REDSandT_Lit models, a detailed investigation of their performance on each separate relation showed that the high accuracy achieved by REDSandT_Lit^1 was mainly due to that model being highly accurate in identifying "NoRel" relations. This explains the differences between the macro and weighted metrics of REDSandT_Lit^1.

Overall Models Evaluation
Moreover, when it comes to training times, the SVM models are the clear winners, training in under a second, with the remaining models ranging from 4 min (BERT-based, trained on GPU) to 20 min (Bi-LSTM, trained on CPU). It is also worth mentioning that the extra complexity added by bag training induces only 10 s of additional training time in REDSandT_Lit compared to the simple BERT models.
Table 3. Baseline comparison. We report overall accuracy (ACC), precision (P), recall (R) and F1-score (F1) on the test set. For P, R and F1 we present both the macro and weighted versions of the metrics.
In order to validate the contribution of all presented models, we compare (i) all examined models and (ii) the best-performing ones using Friedman's statistical test. As observed in Table 4, the p-value for both compared model sets is less than 0.05 (in fact, close to zero); thus, we have sufficient evidence to conclude that different models produce statistically significant differences in the predicted relations and that our outcomes are statistically significant.

Models Evaluation on Each Relation
Tables 5-7 compare our models to the above-mentioned baselines across all relations, reporting precision, recall and F1-score, respectively. Overall, we observed the following: (1) the REDSandT_Lit models exhibit strong performance across all relations, while REDSandT_Lit^2 best captures relations in the long tail; (2) SVM^1, SVM^2 and BERT^2 are generally consistent, but all Bi-LSTM models exhibit significant performance variability; and (3) the SVM models perform well regardless of the chosen sentence compression. Figure 3 presents the confusion matrices for the REDSandT_Lit^2 and SVM^2 models. Even though the SVM model seems to slightly outperform the REDSandT_Lit approach, the confusion matrices show that this superiority comes from the "NoRel" relation. Excluding "NoRel", the REDSandT_Lit^2 model performs much better across all relations, including those in the long tail. As previously discussed, the "NoRel" relation can include sentences that contain no relation or that were simply not annotated. For this reason, we further analyze performance on this class below.

Effectiveness on Mislabelled Sentences
Sentences marked with the "NoRel" relation include at least two recognized entities but were not annotated with a relation. This corresponds either to no underlying relation within the sentence or to a missed annotation. In order to examine this, we further investigate the performance of the best-performing models on the "NoRel" relation. Our goal is to identify the model that can capture missed annotations and propose it as an efficient tool for correcting mislabels and augmenting samples, which in our case, and in industry, is of high importance. Table 8 compares the two best models on predicting mislabelled samples. We observe that REDSandT_Lit^2 is superior to SVM^2 in this task, specifically in identifying "artAuthor" relations within sentences that were not annotated.
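Detecting candidate missed annotations can be sketched as follows; this is an illustrative procedure, not the authors' pipeline, and the threshold value is our assumption.

```python
def candidate_missed_annotations(samples, predict_proba, threshold=0.9):
    """Flag 'NoRel' samples that a trained model assigns to a real
    relation with high confidence -- likely missed annotations.

    samples: (text, label) pairs from the DS-annotated dataset
    predict_proba: callable returning a dict relation -> probability
    Returns (text, proposed_relation) pairs for human review."""
    flagged = []
    for text, label in samples:
        probs = predict_proba(text)
        best = max(probs, key=probs.get)
        if label == "NoRel" and best != "NoRel" and probs[best] >= threshold:
            flagged.append((text, best))
    return flagged
```

Routing the flagged pairs through a quick human check turns the model into a label-correction and data-augmentation tool, which is valuable both for our dataset and in industrial settings.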

Conclusions and Future Work
We proposed a novel distantly supervised transformer-based relation extraction model, REDSandT_Lit, that can automate metadata extraction from literary texts, thus helping to sustain important cultural insights that could otherwise be lost in unindexed raw texts. Specifically, our model efficiently captures semantic relationships from Greek literary texts of the 19th century. We constructed the first dataset for this language and period, including 3649 samples annotated through distant supervision with six semantic relationships. The dataset is in the Katharevousa variant of Greek, in which a great part of Modern Greek literature is written. In order to capture the semantic and syntactic characteristics of the language, we exploited GREEK-BERT, a language model pre-trained on Modern Greek, which we fine-tuned on our specific task and language. To handle the problem of noisy instances, as well as the long sentences that are typical of literary writing, we guided REDSandT_Lit to focus solely on a compressed form of each sentence and the entity types of the entity pair. Extensive experiments and comparisons with existing models on our dataset revealed that REDSandT_Lit has superior performance, manages to capture infrequent relations and can correct mislabelled sentences.
Extensions of this work could focus on augmenting our dataset to enable direct BERT pre-training on the Katharevousa form of Greek. Even though we achieve high accuracy with a model pre-trained on Modern Greek and fine-tuned on the Katharevousa variant, this mismatch suggests that augmenting the studied data and building a model specific to them could further improve results. Moreover, we would like to investigate the effect of additional side-information, such as POS tags and entity descriptions, as well as an end-to-end model that does not rely on pre-recognized entities but extracts both entities and relations in one pass. Finally, although there is extensive research on ancient Greek philosophy, literature and culture, as well as on NLP tools for Modern Greek, the culturally, literarily and linguistically important Katharevousa form of Greek has not been studied in terms of automatic NLP tools. Thus, creating automated tools specific to this form is a step towards revealing important cultural insights into the early years of the modern Greek state.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: