Paragraph Boundary Recognition in Novels for Story Understanding

Abstract: The understanding of narrative stories by computers is an important task for their automatic generation. To date, high-performance neural-network technologies such as BERT have been applied to tasks such as the Story Cloze Test and Story Completion. In this study, we focus on the text segmentation of novels into paragraphs, which is an important writing technique that helps readers deepen their understanding of the texts. This type of segmentation, which we call "paragraph boundary recognition", can be considered a binary classification problem in terms of the presence or absence of a paragraph boundary between target sentences. However, in this case, the data imbalance becomes a bottleneck because the number of paragraphs is generally smaller than the number of sentences. To deal with this problem, we introduced several cost-sensitive loss functions into BERT, namely, focal loss, dice loss, and anchor loss, which are robust for imbalanced classification. In addition, introducing the threshold-moving technique into the model was effective in estimating paragraph boundaries. In experiments on three newly created datasets, BERT with dice loss and threshold moving obtained a higher F1 than the original BERT using cross-entropy loss as its loss function (76% to 80%, 50% to 54%, 59% to 63%).


Introduction
With the improvement in natural-language processing, commercial expectations for the automatic generation of scenarios in novels, movies, games, etc. have been increasing. For example, in Japan, there is a literary award that explicitly allows for works written by artificial intelligence (https://hoshiaward.nikkei.co.jp/, accessed on 18 June 2021). Story generation has been studied for a long time [1][2][3]; recently, various approaches based on neural networks were proposed [4][5][6][7][8]. However, it is very difficult to generate high-quality narrative sentences that satisfy readers. Therefore, as a step toward the ultimate goal of the automatic generation of novels, it is important for the computer to understand the special sentence structure of the novel. Story understanding by a computer, for a text such as a novel, is generally a task of modeling the consistency of sentences. Thus far, research has been conducted on the understanding and generation of narrative sentences from various perspectives, such as the Story Cloze Test (SCT) [9,10], Story Completion (SC) [11], and ordering sentences into a story [12][13][14].
In relation to the tasks aimed at story understanding described above, we focused on segmenting the text of novels into paragraphs, which is an important writing technique in novels. It is also an operation that divides a document into paragraphs to improve readability. Seki et al. [15] discussed the importance of paragraphs in texts. In their study, the effect of a paragraph display as a document layout on the reader's understanding of content was investigated. Specifically, they conducted an experiment in which collaborators read correctly paragraphed sentences, intentionally incorrectly paragraphed sentences, and unparagraphed sentences, and then compared their comprehension of the contents. They concluded that a proper paragraph setting was important for facilitating the reader's understanding of the document.
The features of paragraphs differ slightly between descriptive or logical texts, such as academic treatises, and literary texts, such as novels and essays. This is essentially because logical texts focus on accurately and concisely disclosing information to the reader, whereas literary texts focus on emotionally impressing the reader [16]. In the former, one paragraph focuses solely on one topic, centered on a core sentence, which is called a topic sentence. The content and assertions in each sentence composing a paragraph must be consistent, and studies have numerically defined the consistency of paragraphs for scientific and technological texts [17]. By contrast, the latter is based on the transition of topics and scenes over time, and generally does not use a topic sentence [16]. Therefore, there are no common rules that the writer should uniformly follow in the text segmentation of a novel, and such segmentation is an extremely difficult operation that requires a high level of skill. Although the demand for systems that assist in the text segmentation of novels into paragraphs is high, few studies have been conducted in this area owing to this difficulty.
Placing new paragraphs in the appropriate positions on the basis of the transitions of scenes and topics helps readers to fully understand the story. Therefore, the position where a new paragraph starts contains important sensory information for people who write or read novels. On the basis of this assumption, in this study, as a stepwise approach with the automatic creation of a novel as the ultimate objective, we estimate paragraph boundaries from the perspective of a computer's understanding of the story in the novel. We regarded this task as a binary classification problem regarding whether target sentences belong to the same paragraph, which we call "paragraph-boundary recognition". However, because the number of paragraphs is small relative to the number of sentences, it is necessary to consider the imbalance in the number of data. Therefore, we used focal loss [18], dice loss [19], and anchor loss [20], which were confirmed to be robust to imbalanced classification, as the loss function of BERT, which is highly accurate in various natural-language processing (NLP) tasks. Through experiments, we confirmed that the novel could be divided into appropriate paragraphs by using our method. This result suggests that it is possible to divide narrative sentences generated by humans or computers into appropriate paragraphs and improve the readability of the texts. Our contributions are summarized as follows:
• We regarded the text segmentation of novels into paragraphs as imbalanced classification regarding the presence or absence of a paragraph boundary between two consecutive sentences, and applied BERT with multiple cost-sensitive loss functions.
• We confirmed that the accuracy of paragraph-boundary recognition by our approach can be improved by applying threshold moving, which is one of the methods for dealing with imbalanced classification.
• Our experiment using multiple author-specific datasets, newly created for this study, showed that the proposed method could recognize paragraph boundaries with higher accuracy than that of a conventional text-segmentation method.

Related Works
The tasks in this study are strongly related to text segmentation and studies on classification problems for imbalanced data. We describe the background of such research in this section.

Story Understanding
Thus far, research has been conducted on the understanding and generation of narrative sentences from various perspectives. For example, the Story Cloze Test (SCT) [9,10] selects the optimal ending following the input sentences, and evaluates the performance of the model on the basis of the correct answer rate. Various studies included a classifier using recurrent neural networks (RNN) and a model based on the transfer learning of bidirectional encoder representations from transformers (BERT) [21][22][23][24][25][26].
Guan et al. [11] proposed Story Completion (SC), which is an extension of SCT, and is the task of generating a sentence that is missing from a given story. In addition, Gupta et al. [27] focused on the fact that the ending of a story is not uniquely determined and that multiple endings are possible, and proposed a method for generating various story endings. Mori et al. [28] assumed an actual writing-support scene and added missing-position prediction, which is an operation used to estimate the position of a missing sentence, to a conventional SC, in which the position of the complementary sentence is specified.
Another task for understanding stories is ordering sentences of narrative stories. Sentence ordering [29] is the task of rearranging sentences to maximize the evaluation value for the consistency of a document. A method for obtaining context information at the paragraph level using the Pointer Network [30] based on an RNN was also proposed [12,31]. Wang et al. [13], and Oh et al. [32] proposed models using the attention mechanism, and Cui et al. [14] proposed a model that utilizes the dependency between sentences acquired by BERT.

Text Segmentation
The operation of dividing a sentence into semantic groups based on topics is generally known as text segmentation [33][34][35]. This is an important task applied to various aspects in the field of NLP, such as sentence summarization and question answering, from the viewpoint of understanding the meaning of natural language by a computer. Text-segmentation methods proposed thus far are roughly divided into unsupervised and supervised algorithms.
TextTiling [33] is a type of unsupervised text segmentation that utilizes the fact that specific words frequently appear in the same segment, and calculates the similarity of each segment from such vectors. Glavaš et al. [36] proposed an unsupervised algorithm for constructing semantic-relevance graphs of sentences using the word-embedding expression and a measure of semantic relevance of short sentences. The nodes in this graph represent sentences, and the edges between the two sentences indicate that the sentences are semantically similar. Segmentation is then determined by finding the maximal cliques of adjacent sentences and heuristically completing the segmentation.
In addition, there is a model that uses long short-term memory (LSTM), which is a type of RNN, as a method of supervised learning [37]. Such a model can efficiently model the input sequence by controlling the flow of information over time. Badjatiya et al. [38] proposed an attention-based convolutional neural network bidirectional LSTM model that introduced the attention mechanism and learned the relative importance of each sentence in the text to achieve segmentation. Glavaš et al. [39] proposed a multitask learning model that couples the sentence-level segmentation objective with the coherence objective that differentiates correct sequences of sentences from corrupt ones.
In our study, we treat text segmentation as an imbalanced classification that should consider the imbalance in the number of segments and the number of sentences, as opposed to the above-mentioned studies.

Imbalanced Classification
There are two main approaches to the classification problem, in which the number of data in each class is imbalanced: a resampling method and cost-sensitive learning.
Resampling is a method of generating balanced distribution by making changes to imbalanced data. Imbalance in the number of data is eliminated by undersampling the majority class [40,41] or oversampling the minority class. The simplest oversampling method is to randomly duplicate a minority-class instance, but can cause overfitting owing to the redundant distribution of data. To solve this problem, Chawla et al. [42] proposed SMOTE, which is a basic approach using data synthesis. SMOTE randomly selects seed samples to balance the dataset, and applies linear interpolation between the seed sample and one of its neighbors to synthesize a new sample.
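The interpolation step of SMOTE described above can be illustrated with a minimal sketch. It assumes plain Python lists as feature vectors and squared Euclidean distance for the neighbor search; the function names (`smote`, `nearest_neighbors`) and the parameters `k` and `seed` are hypothetical and are not taken from Chawla et al. [42].

```python
import random

def nearest_neighbors(point, others, k):
    """Return the k points of `others` closest to `point` (squared Euclidean)."""
    def dist(q):
        return sum((a - b) ** 2 for a, b in zip(point, q))
    return sorted((q for q in others if q is not point), key=dist)[:k]

def smote(minority, n_new, k=2, seed=0):
    """Synthesize n_new minority samples by linear interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        seed_sample = rng.choice(minority)                 # random seed sample
        neighbor = rng.choice(nearest_neighbors(seed_sample, minority, k))
        lam = rng.random()                                 # interpolation weight in [0, 1)
        synthetic.append([s + lam * (n - s)
                          for s, n in zip(seed_sample, neighbor)])
    return synthetic

minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_samples = smote(minority, n_new=5)
```

Every synthetic point lies on a line segment between two existing minority samples, which is why it stays inside the region already covered by the minority class.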
By contrast, cost-sensitive learning is a method for improving the classifier itself through learning, which applies a loss function with different weights to each data sample, instead of changing the distribution of training data. This method is often associated with research dealing with object-detection problems, particularly in the field of image processing. This is because the background occupies most of the image in an object-detection problem, and it is necessary to eliminate the imbalance of the label to identify a specific object in the minority class. Thus far, studies segmented medical images using a loss function based on the Dice coefficient [43,44], and robust losses for imbalanced data, such as focal loss [18], dice loss [19], and anchor loss [20], were proposed. We applied BERT, adopting focal loss and dice loss as the loss functions, to the text segmentation of novels into paragraphs, and demonstrated the effectiveness of the approach [45][46][47]. Furthermore, Li et al. [48] applied BERT with dice loss to imbalanced classification tasks in the field of NLP, such as part-of-speech tagging and named-entity recognition, and clarified its effectiveness.
Threshold moving is an alternative technique that can deal with class imbalance. As the main difference between resampling and threshold-based methods, the former relies on data preprocessing before the learning phase, whereas the latter relies on manipulating the model output. This technique is utilized using some popular learning methods, including ensemble learning [49][50][51][52]. In this study, we apply BERT, which introduces a cost-sensitive loss function, to the paragraph-boundary recognition of the novel, and adjust the decision threshold as a hyperparameter to improve estimation accuracy.

Technical Background
In this section, we describe in detail BERT, which is the basis of our paragraph-boundary recognition model, and the loss functions used in the classification.

BERT
BERT [22] is a general-purpose language model based on a multilayer bidirectional transformer [53] that outputs a distributed representation of an input sequence and of the words included in the sequence. In this work, we used BERT BASE (L = 12, H = 768, A = 12, Total Parameters = 110M), where L, H, and A are the number of transformer blocks, the hidden size, and the number of self-attention heads, respectively. BERT improves the performance of a language model by pretraining on a large-scale corpus. For pretraining, masked word prediction was applied to predict the original word of a sentence, in which a portion of the input sentence had been replaced with a token [MASK], and next sentence prediction was used to correctly identify the continuity of the two sentences given as the input.
BERT represents a single sentence or a pair of sentences (for example, pair < question, answer >) as a sequence of tokens according to the following features: BERT uses WordPiece embeddings [54]. To apply BERT in classification tasks, such as polarity determination and document classification, the vector output for token [CLS] added to the head of the input sentence was input into the classifier. In particular, when inputting two sentences into the model, token [SEP] is inserted between the two sentences, which are combined and treated as a single sequence. Embedding is added to every token indicating whether it belongs to the first or the second sentence. For a given token, its input representation is constructed by summing the corresponding token, position, and segment embeddings. In BERT, after converting a sentence or sentence pair into a distributed representation, it is used as an input to solve applied tasks such as classification and regression using a multilayer perceptron. At this time, fine tuning using a pretrained model can be applied to the tasks to be solved.
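The sentence-pair input described above can be sketched as follows. This is only an illustration of the layout, not the actual BERT tokenizer: whitespace splitting stands in for WordPiece, and the function name `build_pair_input` is hypothetical.

```python
def build_pair_input(sentence1, sentence2):
    """Lay out a sentence pair as [CLS] s1 [SEP] s2 [SEP].

    Whitespace tokens are a stand-in for WordPiece tokens (assumption).
    """
    tokens1 = sentence1.lower().split()
    tokens2 = sentence2.lower().split()
    tokens = ["[CLS]"] + tokens1 + ["[SEP]"] + tokens2 + ["[SEP]"]
    # Segment ids: 0 for [CLS], the first sentence and its [SEP];
    # 1 for the second sentence and the final [SEP].
    segment_ids = [0] * (len(tokens1) + 2) + [1] * (len(tokens2) + 1)
    # Position ids index each token within the combined sequence.
    position_ids = list(range(len(tokens)))
    return tokens, segment_ids, position_ids

tokens, segment_ids, position_ids = build_pair_input("He left.", "She stayed.")
```

In the model itself, each of the three id sequences is mapped to an embedding, and the three embeddings are summed per token to form the input representation.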

Loss Functions
We detail the loss functions adopted to address the data imbalance, which is the bottleneck of the task on which this study focuses. For convenience, the loss functions shown below assume a binary-classification problem. Let $X$ denote a set of training instances. Each instance $x_i \in X$ is associated with a golden binary label $y_i = [y_{i0}, y_{i1}]$ that denotes the ground-truth class to which $x_i$ belongs, and $p_i = [p_{i0}, p_{i1}]$ denotes the predicted probabilities of the two classes, where $y_{i0}, y_{i1} \in \{0, 1\}$, $p_{i0}, p_{i1} \in [0, 1]$, and $p_{i1} + p_{i0} = 1$.

Cross Entropy Loss
The original BERT uses cross-entropy (CE) loss to solve classification problems, which is given as follows:

$$\mathrm{CE}(x_i) = -\sum_{j \in \{0, 1\}} y_{ij} \log p_{ij} \quad (1)$$

In general, for classification problems that target imbalanced data, weights $\alpha_j \in [0, 1]$ are introduced into the CE loss to adjust the balance, as shown in Equation (2), so that the importance of each class is considered on the basis of its size:

$$\mathrm{Weighted\ CE}(x_i) = -\sum_{j \in \{0, 1\}} \alpha_j \, y_{ij} \log p_{ij} \quad (2)$$

In many cases, the reciprocal of the number of data included in each class is adopted as a practical value of $\alpha_j$.
Madabushi et al. [55] showed the effectiveness of changing the loss function in a fully connected layer, which is the final layer of BERT, into weighted CE loss for the classification problem of imbalanced data in identifying propaganda.
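As a small numerical illustration of CE loss and its class-weighted variant, the following sketch uses one-hot labels and the reciprocal of the class size as the weight, as mentioned above. The function names and the clipping constant `eps` are assumptions made for this example only.

```python
import math

def cross_entropy(y, p, eps=1e-12):
    """CE for one sample: y is a one-hot label, p the predicted probabilities."""
    return -sum(yj * math.log(max(pj, eps)) for yj, pj in zip(y, p))

def inverse_frequency_weights(labels):
    """Weight of each class = reciprocal of the number of its samples."""
    return {c: 1.0 / labels.count(c) for c in set(labels)}

def weighted_cross_entropy(y, p, alpha, eps=1e-12):
    """CE in which class j contributes with weight alpha[j]."""
    return -sum(alpha[j] * yj * math.log(max(pj, eps))
                for j, (yj, pj) in enumerate(zip(y, p)))

# Three negatives (no boundary) and one positive (boundary).
alpha = inverse_frequency_weights([0, 0, 0, 1])
loss_pos = weighted_cross_entropy([0, 1], [0.2, 0.8], alpha)
loss_neg = weighted_cross_entropy([1, 0], [0.8, 0.2], alpha)
```

With these weights, a positive sample contributes three times as much as a negative sample predicted with the same confidence, which compensates for the 3:1 class ratio.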

Focal Loss
Focal loss (FL) is a loss function proposed by Lin et al. [18] that dynamically scales CE loss. The above-mentioned weighted CE loss makes it possible to consider importance on the basis of the size of each class; however, it cannot distinguish between examples that are easy and difficult to identify. By contrast, FL introduces a modulation factor that attenuates the contribution of errors from easily identifiable examples and prevents them from overwhelming the loss function. This allows the model to effectively focus on examples that are difficult to identify. Specifically, the term $(1 - p_t)^{\gamma}$ with a tunable parameter $\gamma \geq 0$ is introduced into CE loss, as shown in Equation (3):

$$\mathrm{FL}(x_i) = -(1 - p_t)^{\gamma} \log p_t \quad (3)$$

where $p_t = \sum_{j \in \{0, 1\}} y_{ij} \, p_{ij}$ is the predicted probability of the ground-truth class of $x_i$.
When γ = 0, the FL is equivalent to the CE loss.
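The effect of the modulation factor can be checked numerically with a short sketch. It assumes the standard form of focal loss from Lin et al. [18], with $p_t$ taken as the predicted probability of the ground-truth class; the function name and the default `gamma=2.0` are illustrative choices, not the values tuned in this study.

```python
import math

def focal_loss(y, p, gamma=2.0, eps=1e-12):
    """Focal loss for one sample with one-hot label y and probabilities p."""
    p_t = sum(yj * pj for yj, pj in zip(y, p))   # probability of the true class
    p_t = min(max(p_t, eps), 1.0)                # clip to avoid log(0)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

easy = focal_loss([0, 1], [0.1, 0.9])   # confident and correct
hard = focal_loss([0, 1], [0.9, 0.1])   # confident and wrong
same_as_ce = focal_loss([0, 1], [0.3, 0.7], gamma=0.0)
```

For the easy example the factor $(1 - p_t)^{\gamma}$ is $0.1^2 = 0.01$, so its loss is scaled down by two orders of magnitude, whereas the hard example keeps a factor of $0.81$.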

Dice Loss
One of the indicators used to evaluate a classification model for imbalanced data is F1. The Dice coefficient (Sørensen-Dice coefficient: DSC) is an F1-oriented statistical index. Although the DSC is generally an index for measuring the similarity between two sets, it is also used in the segmentation of affected regions in medical images in connection with the imbalanced classification problem [56]. Li et al. [48] showed the relationship between the Dice coefficient and F1 as follows. First, given two sets $A$ and $B$, the DSC is given as follows:

$$\mathrm{DSC}(A, B) = \frac{2 |A \cap B|}{|A| + |B|}$$

In this study, set $A$ is the set of samples determined to be positive by the model, and $B$ is the set of ground-truth positive samples. Here, using the numbers of true positives (TP), false positives (FP), and false negatives (FN), the relationship between the Dice coefficient and F1 is expressed as follows:

$$\mathrm{DSC} = \frac{2\,\mathrm{TP}}{2\,\mathrm{TP} + \mathrm{FN} + \mathrm{FP}} = \frac{2 \cdot \mathrm{Pre} \cdot \mathrm{Rec}}{\mathrm{Pre} + \mathrm{Rec}} = \mathrm{F1}$$

where $\mathrm{Pre} = \mathrm{TP} / (\mathrm{TP} + \mathrm{FP})$ and $\mathrm{Rec} = \mathrm{TP} / (\mathrm{TP} + \mathrm{FN})$. On the basis of the above definition, the value of the DSC for each sample $x_i$ is given as follows:

$$\mathrm{DSC}(x_i) = \frac{2 p_{i1} y_{i1} + \epsilon}{p_{i1} + y_{i1} + \epsilon}$$

where $\epsilon$ provides numerical stability to prevent division by zero. Milletari et al. [19] proposed an objective function that squares each term of the denominator in the Dice coefficient, and defined the dice loss (DL) as a loss function to maximize it as follows:

$$\mathrm{DL}(x_i) = 1 - \frac{2 p_{i1} y_{i1} + \epsilon}{p_{i1}^2 + y_{i1}^2 + \epsilon}$$
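The relationship between the Dice coefficient and F1, and the per-sample dice loss with squared denominator terms, can be sketched as follows. The code follows the forms described by Li et al. [48] and Milletari et al. [19]; the function names and the smoothing value `eps=1.0` are assumptions made for this example.

```python
def set_dice(tp, fp, fn):
    """Dice coefficient of predicted-positive and true-positive sets."""
    return 2 * tp / (2 * tp + fp + fn)

def f1_score(tp, fp, fn):
    """F1 computed from precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def dice_loss(p1, y1, eps=1.0):
    """Per-sample dice loss; p1 is the positive probability, y1 the 0/1 label.

    eps is a smoothing term that prevents division by zero (assumed value).
    """
    return 1.0 - (2 * p1 * y1 + eps) / (p1 ** 2 + y1 ** 2 + eps)

dice_value = set_dice(tp=8, fp=2, fn=4)
f1_value = f1_score(tp=8, fp=2, fn=4)
loss_confident_hit = dice_loss(1.0, 1)   # boundary predicted with probability 1
loss_missed = dice_loss(0.0, 1)          # boundary predicted with probability 0
```

Because the set-level Dice coefficient coincides with F1, minimizing a differentiable per-sample version of it moves the training objective closer to the evaluation metric than CE loss does.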

Anchor Loss
Motivated by focal loss, anchor loss (AL) [20] is a loss function that dynamically scales CE loss on the basis of the difficulty of predicting the sample. Similar to focal loss, AL was proposed for use in an object-detection task where the imbalance between the number of pixels of the background and the target object is a bottleneck. Focal loss addresses the class-imbalance issue by down-weighting the gradient contribution of samples that are easy to predict. AL, by contrast, takes advantage of the difference in probabilities for targeted and nontargeted objects, and adjusts the scale of loss for the sample during training. The difficulty of the prediction is defined using a reference value obtained from the network prediction, called the anchor probability $p^*$, and the loss value is dynamically reweighted on the basis of this difficulty. Penalties larger than or equal to CE loss are imposed when the predicted probabilities for nontargets are higher than the anchor probability. We set the target class-prediction score as the anchor probability on the basis of the report by Ryou et al. [20]. Anchor loss is given as follows using hyperparameter γ ≥ 0:
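A minimal sketch of this reweighting is given below. It assumes the binary form described by Ryou et al. [20], in which the term for each nontarget class $j$ is scaled by $(1 + p_{ij} - p^*)^{\gamma}$ and the anchor $p^*$ is the predicted probability of the target class; the exact formulation used in the experiments may differ, and the function name and default `gamma` are hypothetical.

```python
import math

def anchor_loss(y, p, gamma=0.5, eps=1e-12):
    """Anchor-loss sketch for one sample (one-hot label y, probabilities p)."""
    anchor = sum(yj * pj for yj, pj in zip(y, p))   # p*: target-class probability
    loss = 0.0
    for yj, pj in zip(y, p):
        if yj == 1:
            # Target class: plain log-likelihood term.
            loss -= math.log(max(pj, eps))
        else:
            # Nontarget class: scaled up when pj exceeds the anchor.
            modulator = (1.0 + pj - anchor) ** gamma
            loss -= modulator * math.log(max(1.0 - pj, eps))
    return loss

wrong = anchor_loss([0, 1], [0.8, 0.2])   # nontarget probability above the anchor
right = anchor_loss([0, 1], [0.2, 0.8])   # nontarget probability below the anchor
```

In the first case the modulator is $(1 + 0.8 - 0.2)^{0.5} > 1$, so the penalty exceeds the unscaled one; in the second it is $(1 + 0.2 - 0.8)^{0.5} < 1$, so the penalty is reduced.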

Evaluational Experiment
This section details the conducted experiments to confirm the effectiveness of the proposed method.

Paragraph-Boundary Recognition Dataset
A paragraph is a unit of semantically divided sentences, as shown in Figure 1. We set the task called "paragraph-boundary recognition" to identify whether any two consecutive sentences in a novel belonged to the same paragraph, that is, whether there was a paragraph boundary between any two consecutive sentences in a novel (Figure 2).
For the experiments conducted in this study, we created new datasets from the novels in Project Gutenberg (https://www.gutenberg.org/, accessed on 1 April 2021). The data were divided into sentences using the PUNKT tokenizer from NLTK [57]. We defined a set of sentences from an indentation at the beginning of a sentence to a line break as a paragraph. Under this definition, a conversational sentence forms a single independent paragraph. However, a conversational sentence generally starts with a quotation mark; therefore, it is easier to discriminate on surface or symbolic grounds than a paragraph among descriptive sentences. Therefore, we did not count conversational sentences as a single paragraph. Table 1 shows examples in the datasets. The input format for the model is based on the BERT pretraining format, that is, [CLS] Sentence1 [SEP] Sentence2 [SEP]. We used samples where Sentence 1 and Sentence 2 were in different paragraphs, that is, samples with a paragraph boundary between the two target sentences, as positive samples. On the other hand, we used samples where Sentence 1 and Sentence 2 were in the same paragraph, that is, samples with no paragraph boundary between the two target sentences, as negative samples. We constructed the author-specific datasets for our experiment from works by Fitzgerald, Stevenson, and Twain.

Table 1. Examples in the datasets.

Label: 1
Sentence 1: "Whenever you feel like criticizing anyone," he told me, "just remember that all the people in this world haven't had the advantages that you've had."
Sentence 2: He didn't say any more, but we've always been unusually communicative in a reserved way, and I understood that he meant a great deal more than that.

Label: 0
Sentence 1: He didn't say any more, but we've always been unusually communicative in a reserved way, and I understood that he meant a great deal more than that.
Sentence 2: In consequence, I'm inclined to reserve all judgements, a habit that has opened up many curious natures to me and also made me the victim of not a few veteran bores.

Label: 0
Sentence 1: In consequence, I'm inclined to reserve all judgements, a habit that has opened up many curious natures to me and also made me the victim of not a few veteran bores.
Sentence 2: The abnormal mind is quick to detect and attach itself to this quality when it appears in a normal person, and so it came about that in college I was unjustly accused of being a politician, because I was privy to the secret griefs of wild, unknown men.

Figure 3 shows how the text of each novel was split into training, validation, and test data. In this study, the first 90% of the text of each work was used for the training and validation data, and the last 10% of the text was used as the test data. This is based on the assumption that, when actually writing a novel, part of the novel is manually divided into paragraphs, and the rest of the text is automatically divided into paragraphs on the basis of the tendency learned from that part. Table 2 shows the statistical information on the number of labels for each dataset.
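The construction of labeled sentence pairs and the position-based split can be sketched as follows. The sketch assumes that the text has already been split into paragraphs and sentences (for example, with a sentence tokenizer), and it does not implement the special handling of conversational sentences; the function names are hypothetical.

```python
def make_pairs(paragraphs):
    """Build (sentence1, sentence2, label) for every pair of consecutive sentences.

    paragraphs: list of paragraphs, each a list of sentences.
    label 1 = paragraph boundary between the two sentences, 0 = same paragraph.
    """
    sentences, paragraph_ids = [], []
    for pid, paragraph in enumerate(paragraphs):
        for sentence in paragraph:
            sentences.append(sentence)
            paragraph_ids.append(pid)
    return [
        (sentences[i], sentences[i + 1],
         int(paragraph_ids[i] != paragraph_ids[i + 1]))
        for i in range(len(sentences) - 1)
    ]

def split_by_position(samples, test_ratio=0.1):
    """Keep document order: the last test_ratio of the samples is the test set."""
    n_test = int(len(samples) * test_ratio)
    cut = len(samples) - n_test
    return samples[:cut], samples[cut:]

paragraphs = [["A.", "B."], ["C.", "D.", "E."], ["F."]]
pairs = make_pairs(paragraphs)
train_val, test = split_by_position(list(range(10)))
```

Splitting by position rather than at random mirrors the writing-support scenario: the model learns from the already paragraphed beginning of a work and is applied to its remainder.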

Setup
We set the BERT parameters in the proposed model as follows: a maximal sequence length of 256, training batch size of 32, learning rate of $5 \times 10^{-6}$, and 5 training epochs. We used a standard three-layer perceptron as the final classifier to compare how the estimation accuracy of the task differs with the loss function used for classification. We used a pretrained model (uncased_L-24_H-1024_A-16) publicly available from Google Research (https://github.com/google-research/bert, accessed on 1 October 2020). The compared models in the experiment are as follows:
TextTiling [33]: Baseline model. This is one of the first unsupervised algorithms for linear text segmentation. It uses the fact that words tend to be repeated in coherent segments, and measures the similarity between paragraphs by comparing their sparse term vectors.
Koshorek et al. Model [37]: Baseline model. This is a text-segmentation method based on LSTM. The distributed expression for words contained in a sentence obtained by Word2Vec [58] is input using bidirectional LSTM, and the output is used as the distributed expression of the sentence. For word embeddings, we used the Google News word2vec pretrained model (https://code.google.com/archive/p/word2vec/, accessed on 25 November 2020).
BERT + CE, BERT + FL, BERT + DL, BERT + AL: These models adopt cross-entropy loss, focal loss, dice loss, and anchor loss as the loss function of BERT, respectively. The values of the γ hyperparameters of focal loss and anchor loss were determined through a grid search on the validation data.
BERT + CE + TM, BERT + FL + TM, BERT + DL + TM, BERT + AL + TM: These models apply threshold moving to the above-mentioned BERT + CE, BERT + FL, BERT + DL, and BERT + AL. Decision threshold τ was determined through a grid search on the validation data.
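The grid search for the decision threshold can be sketched as follows. The sketch assumes that the model's positive probabilities on the validation data are already available as a list, and that F1 is the selection criterion; the function names and the toy data are hypothetical.

```python
def f1_from_predictions(labels, predictions):
    """F1 of the positive class from 0/1 labels and 0/1 predictions."""
    tp = sum(1 for l, p in zip(labels, predictions) if l == 1 and p == 1)
    fp = sum(1 for l, p in zip(labels, predictions) if l == 0 and p == 1)
    fn = sum(1 for l, p in zip(labels, predictions) if l == 1 and p == 0)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)

def tune_threshold(probabilities, labels):
    """Search tau in 0.00, 0.01, ..., 0.50 and keep the first best F1."""
    best_tau, best_f1 = None, -1.0
    for step in range(0, 51):
        tau = step / 100
        predictions = [int(p > tau) for p in probabilities]
        score = f1_from_predictions(labels, predictions)
        if score > best_f1:
            best_tau, best_f1 = tau, score
    return best_tau, best_f1

# Toy validation outputs: boundaries tend to receive low probabilities.
val_probabilities = [0.10, 0.30, 0.35, 0.60, 0.20, 0.05]
val_labels = [0, 1, 1, 1, 0, 0]
tau, best_f1 = tune_threshold(val_probabilities, val_labels)
```

With the default threshold of 0.5 only one of the three boundaries in this toy example would be detected; lowering the threshold recovers the other two without adding false positives.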
We used F1 and $P_k$ as evaluation metrics. $P_k$ is an evaluation metric for text segmentation proposed by Beeferman et al. [59]. For every pair of sentences separated by a distance of $k$, this metric checks whether the two sentences belong to the same segment, both in the system output and in the correct-answer data. The ratio of pairs for which the two judgments disagree is the $P_k$ score; the smaller the value is, the better the model performance. Following Koshorek et al. [37], $k$ was set to half the average size of the correct segments.
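The computation of $P_k$ can be sketched as follows. The sketch assumes that a segmentation is given as a list of 0/1 boundary labels between consecutive sentences, and that $k$ is rounded to an integer of at least 1; these representation choices and the function names are assumptions of this example.

```python
def segment_ids(boundaries):
    """Convert boundary labels to a segment id per sentence.

    boundaries[i] == 1 means a boundary between sentence i and sentence i + 1.
    """
    ids, current = [0], 0
    for b in boundaries:
        current += b
        ids.append(current)
    return ids

def p_k(reference, hypothesis, k=None):
    """Ratio of sentence pairs at distance k on which the segmentations disagree."""
    ref, hyp = segment_ids(reference), segment_ids(hypothesis)
    n = len(ref)
    if k is None:
        n_segments = ref[-1] + 1
        k = max(1, round(n / n_segments / 2))   # half the average segment size
    windows = n - k                              # assumes n > k
    errors = sum(
        (ref[i] == ref[i + k]) != (hyp[i] == hyp[i + k])
        for i in range(windows)
    )
    return errors / windows

reference = [0, 1, 0, 0, 1, 0]        # 7 sentences in 3 segments
no_boundaries = [0, 0, 0, 0, 0, 0]    # system that never predicts a boundary
score_perfect = p_k(reference, reference)
score_none = p_k(reference, no_boundaries)
```

Unlike F1, this metric gives partial credit to a boundary that is placed close to the correct position, because most windows of width $k$ are still judged consistently.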

Results and Analysis
We adjusted the γ hyperparameters of the loss functions in BERT + FL and BERT + AL, and the decision threshold, through experiments on the validation data. The hyperparameter values shown in Table 3 were set in the models, which were then evaluated on the test data. Table 4 shows the experiment results for each model. The models based on BERT outperformed the baseline models, that is, TextTiling and the model of Koshorek et al., in estimation accuracy.

Table 3. Hyperparameters tuned through grid search on validation data (one column per dataset). Value of γ is a hyperparameter for focal loss and anchor loss. Value of τ is the decision threshold. Value of τ for models with threshold moving could be tuned between 0 and 0.5 in 0.01 increments. However, values for BERT + FL and BERT + AL were fixed at 0.5 because threshold moving was not introduced in these models.

Table 4. Experiment results for each model (columns: Model, Fitzgerald, Stevenson, Twain). The higher F1 is, the better the outcome. By contrast, the lower $P_k$ is, the better the results. Bold values represent the best evaluation results for each dataset.
We confirmed that each model using dice loss, focal loss, or anchor loss as the loss function estimated paragraph boundaries with higher accuracy than BERT + CE, which is used in conventional BERT classification tasks. In particular, BERT + DL achieved the best F1 and $P_k$ in all datasets. Introducing threshold moving to the models also improved their performance in each dataset. This may have been because the properties of the validation data and test data were similar, so that the decision threshold adjusted as a hyperparameter on the validation data could identify paragraph boundaries with higher accuracy in the test data as well. These results suggest that the introduction of a cost-sensitive loss function and threshold moving into BERT improves the accuracy of paragraph-boundary estimation as an imbalanced-classification problem. Figure 4 shows the probability of being a positive example (positive probability) of each sample output by BERT + CE and BERT + DL for part of the Fitzgerald test dataset. Samples located near the bottom of the figure were predicted to be negative samples, while samples located near the top of the figure were predicted to be positive samples. If the positive probability for a sample was higher than threshold τ, then the sample was determined to be a positive sample. Therefore, the positive probability output for a sample on a red vertical line, which marks the actual position of a paragraph boundary, should exceed the threshold, whereas that for a sample not on a red vertical line should not. Since threshold moving increases the number of samples judged as positive, the number of samples that are correctly identified as positive increases, but the number of actually negative samples that are mistakenly identified as positive also increases. However, the negative samples that were mistakenly judged as positive accounted for a small proportion of all negative samples.
As a result, the value of each evaluation metric indicating the performance of the model improved. Table 5 shows examples of the test samples and the outputs of BERT + CE and BERT + DL for the samples. Example 1 is a sample that both BERT + CE and BERT + DL judged as positive with high probability. In this sample, Sentence1 and Sentence2 belonged to different sections, and it is clear that the scenes and topics were different; thus, the sample could easily be identified as a positive example. By contrast, Examples 2 and 3 are samples for which both BERT + CE and BERT + DL predicted, with high probability, a label different from the correct one. The dataset also included samples that were difficult even for humans to discriminate because the sentences that they contained were short, and little information was given. Example 4 is a sample that was correctly identified as a positive sample by introducing threshold moving. On the other hand, Example 5 is an actually negative sample mistakenly identified as a positive sample by BERT + CE + TM and BERT + DL + TM. Example 6 was correctly determined to be a positive sample by BERT + DL + TM, but was determined to be a negative sample by BERT + CE, BERT + CE + TM, and BERT + DL.

Table 5. Examples of samples in the Fitzgerald dataset, and positive probability of BERT + CE and BERT + DL for these samples (columns: Sample Information, Positive Probability $p_i$). If the positive probability was higher than the threshold, the sample was recognized as a paragraph boundary.

Discussion and Conclusions
In this study, we proposed a method for paragraph-boundary recognition to divide an existing novel into paragraphs from the viewpoint of story understanding by a computer. We regarded paragraph-boundary recognition as a binary classification problem of whether a paragraph boundary existed between two consecutive targeted sentences. However, in this case, the number of paragraphs was extremely small compared to the number of sentences; thus, the data imbalance became a bottleneck. Therefore, we improved the estimation accuracy of the model by introducing cost-sensitive loss functions, namely, focal loss, dice loss, and anchor loss, which are robust against imbalanced classification, into BERT as a loss function. We experimentally confirmed that our approach showed high estimation accuracy compared to that of the conventional text-segmentation method and the original BERT.
Furthermore, we improved estimation accuracy by introducing threshold moving, which adjusts the threshold value when the model determines the presence or absence of paragraph boundaries. It was also experimentally confirmed that paragraph boundaries can be recognized with higher accuracy by setting the determination threshold value as a hyperparameter to a value smaller than the conventional 0.5 using validation data. Threshold moving is a simple idea, but it is expected to improve the performance of the classifier in cases where it is assumed that the properties of validation data and test data are similar, such as the task dealt with in this work.
From the above results, our work brings to the story-understanding community a new perspective: treating the division of the text of a novel into paragraphs as an imbalanced classification problem. Our work is also related to research on creation support. Related research on creation support includes a plot-creating support system that considers the reader's preference for the transition of happiness in the story [60], and a system that supports efficient story generation using the similarity between sentences and templates [61]. The results obtained in this work could be applied in a system that supports humans in dividing novels into paragraphs. Specifically, we envision a system in which a model that has learned the paragraph division of existing novels recommends appropriate paragraph-division positions to the writer.
In this study, the effectiveness of the method was confirmed for works written by multiple writers. However, it is necessary to confirm the difference in model performance depending on the characteristics of the work. Therefore, we aim to evaluate the model by cross-validation with each novel used as training or validation data in further work.
Future studies also include the adoption of approaches utilizing the time-series nature of sentences, such as anomaly detection with paragraph breaks as outliers. This method has the advantage of being able to consider not only the information of the preceding and following sentences, but also the information of past sentences.
We evaluated the performance of the paragraph-boundary recognition model by focusing on quantitative indicators. However, in discussing readability as a novel, it is also necessary to qualitatively evaluate the output results. Therefore, in the future, we would like to evaluate the model from both the quantitative evaluation described in this study and qualitative evaluation, such as a questionnaire-based experiment regarding the impression of the collaborator after reading the sentences divided into paragraphs when applying the proposed model.