3.2.1. Remapping of Emotion Labels
BERT-based models have demonstrated superior performance in natural language understanding tasks by evaluating each part of the text in conjunction with other parts, thanks to their bidirectional context modeling [16]. For emotion detection, transformer-based models were fine-tuned on the GoEmotions dataset [40].
The GoEmotions dataset originally offers 27 distinct emotion labels and a neutral category. However, for the purposes of this study, it was necessary to convert this broad spectrum of emotions into a more manageable subset. Specifically, to identify clear emotional axes that can be tracked in film scripts and user reviews, similar or overlapping emotions were combined, thus defining nine main emotion categories:
new_emotions = {joy, sadness, anger, fear, surprise, disgust, love, curiosity, neutral}
For example, positively charged labels such as admiration, amusement, approval, excitement, gratitude, optimism, pride, and relief were grouped under joy. Similarly, annoyance and anger were combined under anger, and caring and desire under love. Additionally, confusion and curiosity were grouped under the curiosity label to consolidate states of wonder and perplexity. Likewise, the sadness category includes emotions such as disappointment, embarrassment, grief, and remorse. Disapproval and disgust were merged into the disgust category, consolidating strongly negative, repulsive emotions under a single label. Realization was placed under surprise, which best reflects an unexpected insight or understanding. Finally, the neutral category maintains emotional neutrality.
This emotion mapping facilitates the model's ability to cope with complex and diverse emotions while keeping the analysis at a more understandable and manageable scale. The mapping was automated with helper functions: the original labels field of the GoEmotions dataset was adapted to the new_emotions set via the map_labels function. Thus, each text sample is represented by a multi-label vector over the nine newly defined basic emotions. While this approach reduces the complex emotional texture to a simpler representation, it helps mitigate data imbalance and enables more stable learning during training.
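A minimal sketch of this remapping, following the groupings listed above: the function name map_labels comes from the text, while the dictionary name and the placement of nervousness under fear are assumptions, not details stated in the paper.

```python
# Remapping of the 28 GoEmotions labels (27 emotions + neutral) to 9 categories.
NEW_EMOTIONS = ["joy", "sadness", "anger", "fear", "surprise",
                "disgust", "love", "curiosity", "neutral"]

GOEMOTIONS_TO_NEW = {
    "admiration": "joy", "amusement": "joy", "approval": "joy",
    "excitement": "joy", "gratitude": "joy", "optimism": "joy",
    "pride": "joy", "relief": "joy", "joy": "joy",
    "annoyance": "anger", "anger": "anger",
    "caring": "love", "desire": "love", "love": "love",
    "confusion": "curiosity", "curiosity": "curiosity",
    "disappointment": "sadness", "embarrassment": "sadness",
    "grief": "sadness", "remorse": "sadness", "sadness": "sadness",
    "disapproval": "disgust", "disgust": "disgust",
    "realization": "surprise", "surprise": "surprise",
    "fear": "fear",
    "nervousness": "fear",  # assumption: the text does not state this mapping
    "neutral": "neutral",
}

def map_labels(example, label_names):
    """Convert a GoEmotions example's label ids to a 9-dim multi-hot vector."""
    vector = [0.0] * len(NEW_EMOTIONS)
    for label_id in example["labels"]:
        new_label = GOEMOTIONS_TO_NEW[label_names[label_id]]
        vector[NEW_EMOTIONS.index(new_label)] = 1.0
    return {"labels": vector}
```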
The GoEmotions dataset is loaded via the Hugging Face datasets library. Training, validation, and test splits are managed with the DatasetDict structure. Tokenized data is fed to the model as input_ids, attention_mask, and labels.
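A minimal sketch of this pipeline, assuming the "simplified" GoEmotions configuration on the Hugging Face Hub and the map_labels helper sketched above; the max_length value is illustrative.

```python
from datasets import load_dataset, DatasetDict
from transformers import AutoTokenizer

dataset: DatasetDict = load_dataset("go_emotions", "simplified")  # train/validation/test splits
label_names = dataset["train"].features["labels"].feature.names   # original label names

# Replace the original label ids with 9-dim multi-hot vectors.
dataset = dataset.map(lambda ex: map_labels(ex, label_names),
                      remove_columns=["labels"])

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    # Produces the input_ids and attention_mask fed to the model.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

tokenized = dataset.map(tokenize, batched=True)
```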
3.2.5. Integration with Hybrid Architectures and Sequence Models
Although the BERT model alone is a powerful tool for contextual representations, it can be enriched with layers such as GRU, LSTM, or CNN when processing long sequences like dialogues or scripts. Studies such as [35,36] have shown that combining BERT with recurrent or convolutional layers enhances performance. In this study, we evaluated several architectures, including BERT, DistilBERT, RoBERTa, and hybrid extensions (BERT + BiLSTM, BERT + CNN, BERT + SCNN). Among these, the BERT + CNN model was selected, as it provided a strong balance between classification reliability and computational efficiency (described in detail in Section 4.1, Comparative Model Performance). The CNN layer was incorporated to capture local feature patterns and sentence-level emotional cues, which complemented BERT's contextual embeddings and enhanced the detection of subtle emotional signals.
Model evaluation indicated that frequent categories such as joy, sadness, and anger were classified with relatively high reliability, while minority categories such as disgust and surprise yielded lower accuracy due to data sparsity and the inherent ambiguity of emotional expression in user-generated texts. This distribution of results is typical in multi-label emotion analysis and reflects the natural imbalance of affective states in real-world systems. Importantly, the overall performance was sufficiently stable to support large-scale comparative analysis, enabling the detection of systemic patterns of alignment and divergence across scripts and audience reviews.
In systems terms, the classifier served as a monitoring instrument capable of capturing both dominant affective signals and weaker, low-frequency emotional states that nonetheless play a role in the adaptive dynamics of the cinematic ecosystem.
3.2.6. Model Training, Hyperparameter Settings, and Performance Evaluation
For emotion detection, we fine-tuned BERT-based models using the GoEmotions dataset [40]. The performance of these models was evaluated through standard classification metrics, including accuracy, precision, recall, and F1-score. In multi-label emotion classification tasks, micro- and macro-averaged F1-scores are particularly informative, as they provide a balanced view of model performance across emotions with varying frequencies [40]. Thus, these evaluation metrics are widely adopted in studies utilizing the GoEmotions dataset, enabling consistent benchmarking and meaningful comparison of results across different model architectures [40]. The detailed performance results for our models are presented in the following sections.
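These metrics can be computed directly from the model's per-label sigmoid outputs. A minimal sketch, assuming a 0.5 decision threshold and scikit-learn's metric functions; the threshold and the averaging choices are assumptions, not values reported in the text.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

def compute_metrics(eval_pred, threshold: float = 0.5):
    logits, labels = eval_pred
    probs = 1.0 / (1.0 + np.exp(-logits))      # sigmoid, one probability per label
    preds = (probs >= threshold).astype(int)   # multi-hot predictions
    return {
        "accuracy": accuracy_score(labels, preds),  # exact-match (subset) accuracy
        "f1": f1_score(labels, preds, average="macro", zero_division=0),  # logged as eval_f1
        "micro_f1": f1_score(labels, preds, average="micro", zero_division=0),
        "precision": precision_score(labels, preds, average="macro", zero_division=0),
        "recall": recall_score(labels, preds, average="macro", zero_division=0),
    }
```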
BERT-Based Model
This section examines the performance values obtained during the training process of our BERT-based emotion analysis model. First, the change in loss values obtained on the training and validation sets per epoch is analyzed. Then, class-based metrics (precision, recall, and F1-score) and confusion matrices are evaluated. Finally, the model's overall performance is discussed and the findings are interpreted.
The model training utilized a bert-base-uncased network architecture, fine-tuned with the GoEmotions dataset for up to five epochs. As shown in Figure 4, the training loss decreases rapidly from the first epoch and continues to decrease more slowly thereafter. The validation loss, initially higher than the training loss, fluctuates after the second epoch, but the model reaches its lowest validation loss around epoch 3 to 3.5. The trend of the learning curves indicates that the model is still learning, but an overfitting tendency begins after the second epoch. The early stopping mechanism intervened at epoch 3.5, terminating training at the epoch with the best validation loss and F1-score.
The class-based precision, recall, and F1-scores obtained by the model are summarized graphically in Figure 5, and some of the numerical values are presented in the classification report in Table 1. The class-based results are as follows:
joy: One of the emotions with the highest F1-score (79%), with high precision (83%) and recall (76%) values. This can be attributed to the fact that samples labeled joy are more abundant and consistently distributed in the training set compared to other emotions.
sadness: Precision (65%) and recall (59%) are moderate. Data imbalance and multi-label structures in some samples (such as sadness with neutral or sadness with anger) are thought to affect these results.
anger: The model achieved 58% precision and 51% recall for anger. The confusion of anger samples with similar negative emotions (e.g., disgust, fear) relatively reduced success in this class.
fear and surprise: In these classes with relatively few samples, performance was more limited, with F1-scores of 68% for fear and 47% for surprise. The scarcity of data makes it difficult for the model to detect these emotions.
disgust: This class stands out with the lowest F1-score (37%). The frequent confusion of the disgust label with the anger and fear classes can also be inferred from the confusion matrices in Figure 6. Although each matrix does not directly show which classes a label is confused with, the high number of false negatives (273) for the disgust class suggests that this label is frequently confused with other classes (presumably anger and fear).
love, curiosity, neutral: These three classes have higher complexity as they span both positive and neutral emotions. love exhibits a fairly stable performance with a 68% F1-score, while curiosity (58% F1) and neutral (62% F1) classes are also at reasonable levels.
The binary confusion matrices created for each emotion class show the model’s performance over two classes: the current emotion (1) and all other emotions (0). Each matrix focuses on a single emotion, where the label “1” denotes the presence of the target emotion (such as joy), and “0” aggregates all other emotions into a single opposing class. In this setup, four key outcomes define the model’s success and error rates. A True Positive (TP) occurs when the model correctly identifies an instance of the target emotion as “1.” A False Negative (FN) happens when the model incorrectly predicts “0” for an instance that actually belongs to the target emotion. A True Negative (TN) is recorded when the model correctly classifies instances of other emotions as “0.” Finally, a False Positive (FP) arises when the model incorrectly assigns a “1” label to an instance that does not belong to the target emotion.
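This per-label binarization can be reproduced with scikit-learn's multilabel_confusion_matrix; using that function is an assumption about tooling, not the paper's stated implementation. A minimal sketch with toy multi-hot arrays:

```python
import numpy as np
from sklearn.metrics import multilabel_confusion_matrix

EMOTIONS = ["joy", "sadness", "anger", "fear", "surprise",
            "disgust", "love", "curiosity", "neutral"]

# y_true and y_pred are (n_samples, 9) multi-hot arrays (toy values here).
y_true = np.array([[1, 0, 0, 0, 0, 0, 0, 0, 0],
                   [0, 1, 0, 0, 0, 1, 0, 0, 0]])
y_pred = np.array([[1, 0, 0, 0, 0, 0, 0, 0, 0],
                   [0, 1, 0, 0, 0, 0, 0, 0, 0]])

# One 2x2 matrix per label, laid out as [[TN, FP], [FN, TP]].
for name, matrix in zip(EMOTIONS, multilabel_confusion_matrix(y_true, y_pred)):
    (tn, fp), (fn, tp) = matrix
    print(f"{name}: TP={tp} FP={fp} FN={fn} TN={tn}")
```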
Experimental results show that the model is quite successful on majority emotions (e.g., joy, neutral, love). Data imbalance, especially in classes like disgust, surprise, and fear, has led to lower success rates. The macro F1-score (59%) indicates a significant performance gap between classes, whereas the micro F1-score (65%) reflects a more balanced overall performance, since the micro average emphasizes the successes of classes with many samples. This is a consequence of the unequal class distribution in the dataset. In conclusion, our BERT-based emotion analysis model performs well, especially on common emotions, but requires further improvement for minority emotions. The multi-label structure is important for reflecting emotional expressions in cinematic scripts; however, it also makes inter-class label transitions and data imbalances inevitable. In the following sections, different transformer models and hybrid architectures are tested as alternative methods.
DistilBERT Model
In this section, emotion analysis was performed on the GoEmotions dataset using the DistilBERT (distilbert-base-uncased) model, a lighter and faster version of the BERT architecture. Training parameters were set as follows:
Number of epochs: 5;
Batch size: 16;
Learning rate: ;
Weight decay: 0.01;
Early Stopping: early_stopping_patience = 8.
Evaluation was performed at specific steps (eval_steps) within each epoch, and the best model (best_model) was saved based on the eval_f1 metric.
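The settings above map directly onto the Hugging Face Trainer API. A minimal sketch, reusing the tokenized splits and compute_metrics helper from the earlier sketches; the learning rate (omitted in the text), the eval_steps value, and the output directory are placeholders, and recent transformers versions name the argument eval_strategy (older ones use evaluation_strategy).

```python
from transformers import (AutoModelForSequenceClassification, EarlyStoppingCallback,
                          Trainer, TrainingArguments)

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=9,
    problem_type="multi_label_classification")  # uses BCEWithLogitsLoss internally

args = TrainingArguments(
    output_dir="distilbert-goemotions",  # placeholder path
    num_train_epochs=5,
    per_device_train_batch_size=16,
    learning_rate=5e-5,                  # library default; value not given in the text
    weight_decay=0.01,
    eval_strategy="steps",
    eval_steps=500,                      # illustrative eval_steps value
    save_strategy="steps",
    save_steps=500,                      # must align with eval_steps for best-model selection
    load_best_model_at_end=True,         # restores the best checkpoint (best_model)
    metric_for_best_model="eval_f1",
)

trainer = Trainer(
    model=model, args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=8)],
)
trainer.train()
```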
The change in training and validation losses per epoch after DistilBERT training is shown in Figure 7. As can be seen from the figure, the training loss dropped rapidly from the initial epochs, while the validation loss followed a less regular trend, fluctuating at moderate levels. The model reached its lowest validation loss and highest F1-score around the 3rd–4th epoch.
Figure 8 presents the precision, recall, and F1 scores of the DistilBERT model on a per-class basis.
The confusion matrices for each emotion class of the model are given in Figure 9. While acceptable accuracy is achieved in majority classes like joy and neutral, an increased error rate is observed for emotions with fewer samples or high confusability, such as disgust, surprise, and fear.
Table 2 summarizes the class-based precision, recall, and F1-scores obtained by the model.
The DistilBERT model, thanks to its smaller parameter count, offers speed advantages in training and inference. However, it may suffer some performance loss compared to BERT, especially with imbalanced data or samples where several emotions coexist. While acceptable results are obtained for common labels like joy, neutral, and love, the model appears indecisive for labels like surprise and disgust, and the macro F1 average (58%) lags behind. Conversely, the micro average of 65% indicates that the model performs considerably well in majority classes.
RoBERTa Model
In this section, a RoBERTa-based model was applied to the GoEmotions dataset. Training parameters were set as follows:
Number of epochs: 5;
Batch size: 16;
Learning rate: ;
Weight decay: 0.01;
Early stopping: early_stopping_patience = 8.
Figure 10 shows the change in loss values obtained on the training and validation sets per epoch. After a significant drop from the first epoch, the loss decreased more gradually and stabilized in subsequent epochs.
Precision, recall, and F1-scores obtained on the validation dataset are shown in Figure 11. This graph indicates that the RoBERTa model, similar to the other models, exhibits higher success on common labels like joy, but performance is limited for labels like fear, disgust, and surprise.
As shown in Table 3, the macro F1 value is around 50%. Considering the micro and weighted averages, it is observed that the model experiences a performance loss due to low recall, especially in minority classes.
The macro F1 value (50%) and accuracy (50%) obtained by the RoBERTa model leave room for improvement on minority emotions. These results show that the model performs better on common classes like joy and neutral but suffers performance loss due to low recall for minority or complex emotions.
BERT + BiLSTM
In this experiment, a hybrid approach was implemented by combining the bert-base-uncased language model with a BiLSTM layer (CustomHybridModel); a sketch of this architecture is given after the parameter list. The following parameters were used for model training:
Base Model: bert-base-uncased;
lstm_hidden_dim: 256;
Number of epochs: 5;
Batch size: 16;
Learning rate: ;
Weight decay: 0.01;
Early stopping: early_stopping_patience = 8.
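A minimal sketch of how such a CustomHybridModel can be assembled; the class name and lstm_hidden_dim come from the text, while the exact wiring (in particular, summarizing the BiLSTM output at the first-token position) is a plausible reconstruction, not the study's verbatim code.

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class CustomHybridModel(nn.Module):
    def __init__(self, num_labels: int = 9, lstm_hidden_dim: int = 256):
        super().__init__()
        self.bert = AutoModel.from_pretrained("bert-base-uncased")
        self.bilstm = nn.LSTM(self.bert.config.hidden_size, lstm_hidden_dim,
                              batch_first=True, bidirectional=True)
        # Bidirectional output concatenates forward and backward states.
        self.classifier = nn.Linear(2 * lstm_hidden_dim, num_labels)
        self.loss_fn = nn.BCEWithLogitsLoss()

    def forward(self, input_ids, attention_mask, labels=None):
        hidden = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        lstm_out, _ = self.bilstm(hidden)         # (batch, seq_len, 2 * lstm_hidden_dim)
        logits = self.classifier(lstm_out[:, 0])  # first-token summary (assumption)
        loss = self.loss_fn(logits, labels.float()) if labels is not None else None
        return {"loss": loss, "logits": logits}
```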
Training this model took longer (~9 h) than the other architectures, owing to the additional parameters of the hybrid structure. Given the scale and time constraints of this study, long training times presented a disadvantage for practical application.
Figure 12 shows the loss values of the hybrid model on the training and validation sets per epoch.
The class-based precision, recall, and F1-scores obtained by the hybrid model on the validation set are summarized in Figure 13.
The classification report data are summarized in Table 4. The macro-averaged F1-score is 59%, while the weighted average is around 65%.
The hybrid model achieved a reasonable macro F1 (59%) and weighted F1 (65%) score compared to other models. However, the training process was longer (~9 h) compared to similar experiments due to the increased number of parameters with the additional BiLSTM layer. From an application perspective, high time cost is a factor to consider in practical usage scenarios. The results indicate that the hybrid architecture brings some improvement to the BERT-based approach in certain classes but extends the training time.
BERT + CNN
In this experiment, a hybrid emotion analysis architecture was designed by combining a BERT-based embedding layer with a single-layer CNN (Conv1D). The components of the model are as follows, with a code sketch after the list:
1. BERT Encoder: bert-base-uncased model; configured in multi_label_classification mode.
2. Conv1D + ReLU: Processes the output from BERT's last layer (format: batch_size, seq_len, hidden_dim) with a Conv1D layer (kernel_size = 3, out_channels = 256) and ReLU activation. In PyTorch (PyTorch 2.5.1), since the Conv1D input format is (batch_size, channels, seq_len), the output from BERT in shape (batch_size, seq_len, hidden_dim) needs to be transposed to (batch_size, hidden_dim, seq_len).
3. Global Max Pooling: Global max pooling is applied to the (batch_size, out_channels, seq_len) tensor obtained after convolution, reducing it to (batch_size, out_channels).
4. Classifier (Linear Layer): In the final layer, the obtained vector is converted into classification scores (logits) of num_labels dimension through a linear layer.
5. BCEWithLogitsLoss: BCEWithLogitsLoss is used to support the multi-label structure, independently evaluating the presence/absence of each label.
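A minimal sketch assembling the five components above (kernel_size = 3, out_channels = 256, global max pooling, linear classifier, BCEWithLogitsLoss); a plausible reconstruction rather than the study's verbatim code.

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class BertCnnModel(nn.Module):
    def __init__(self, num_labels: int = 9, out_channels: int = 256):
        super().__init__()
        self.bert = AutoModel.from_pretrained("bert-base-uncased")
        self.conv = nn.Conv1d(self.bert.config.hidden_size, out_channels,
                              kernel_size=3)
        self.classifier = nn.Linear(out_channels, num_labels)
        self.loss_fn = nn.BCEWithLogitsLoss()

    def forward(self, input_ids, attention_mask, labels=None):
        # (batch, seq_len, hidden) -> (batch, hidden, seq_len) for Conv1d.
        hidden = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        features = torch.relu(self.conv(hidden.transpose(1, 2)))
        pooled = features.max(dim=-1).values  # global max pooling over time
        logits = self.classifier(pooled)
        loss = self.loss_fn(logits, labels.float()) if labels is not None else None
        return {"loss": loss, "logits": logits}
```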
The following basic parameters were used:
Number of epochs: 5;
Batch size: 16;
Learning rate: ;
Weight decay: 0.01;
Early stopping: early_stopping_patience = 8.
Figure 14 shows the change in training and validation losses of the BERT + CNN model per epoch.
The obtained metrics show how the model performed on the GoEmotions validation set.
Eval Loss: 0.196;
Accuracy: 55.8%;
Macro F1: 58.96%;
Precision: 66.27%;
Recall: 53.72%.
Table 5 contains the report summarizing class-based details.
Binary confusion matrices generated for each emotion label are shown in Figure 15.
The results indicate that the BERT + CNN architecture performs relatively well on common emotions like joy, love, and neutral, while for less frequent emotions like disgust and surprise, the F1-score declines due to the model's low recall. The convolutional layer provided an advantage in capturing n-gram-like patterns within the sequence, achieving a macro F1 of 59% and a weighted F1 of 65%. This is consistent with the results obtained by BERT + CNN hybrids in [36]. Although the training time was slightly longer than for the base BERT model, the CNN layer was observed to be lighter than recurrent networks like BiLSTM. Overall, this BERT + CNN model offers good performance in multi-label emotion analysis tasks with a macro F1 of 59%.
BERT + SCNN
In this experiment, a hybrid architecture for multi-label emotion classification was designed by combining a BERT-based embedding layer with a Sequence-based CNN (SCNN) approach. The designed model comprised several key components. Initially, a bert-base-uncased model, configured for multi-label classification, served as the BERT encoder; its output was a tensor of size (batch_size, sequence_length, hidden_dimension). Following the encoder, multiple Conv1D layers were employed, each utilizing a different kernel size (e.g., 3, 4, 5).
The input to these convolutional layers was the BERT output, first transposed from shape (batch_size, seq_len, hidden_dim) to (batch_size, hidden_dim, seq_len). Each convolutional layer used out_channels = 128 filters and applied a ReLU activation function. Subsequently, global max pooling was performed over the output of each Conv1D layer. The pooled outputs from each convolution layer were then concatenated horizontally, creating a combined feature vector. This concatenation allowed features from different n-gram windows, captured by the multiple convolution filters, to be represented together.
This combined feature vector, with a dimensionality equal to out_channels multiplied by the number of kernel sizes (128 × 3 = 384), was then passed to a fully connected linear layer. The output of this layer produced logits of num_labels = 9 dimensions. Finally, the BCEWithLogitsLoss function was used to support the multi-label structure of the model.
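A minimal sketch of this multi-branch design, with parallel Conv1d layers of kernel sizes 3, 4, and 5 and out_channels = 128, concatenated after global max pooling; a plausible reconstruction rather than the study's verbatim code.

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class BertScnnModel(nn.Module):
    def __init__(self, num_labels: int = 9, out_channels: int = 128,
                 kernel_sizes=(3, 4, 5)):
        super().__init__()
        self.bert = AutoModel.from_pretrained("bert-base-uncased")
        hidden = self.bert.config.hidden_size
        # One Conv1d branch per kernel size, capturing different n-gram windows.
        self.convs = nn.ModuleList(
            nn.Conv1d(hidden, out_channels, kernel_size=k) for k in kernel_sizes)
        # Concatenated feature vector: out_channels * len(kernel_sizes) = 384.
        self.classifier = nn.Linear(out_channels * len(kernel_sizes), num_labels)
        self.loss_fn = nn.BCEWithLogitsLoss()

    def forward(self, input_ids, attention_mask, labels=None):
        x = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        x = x.transpose(1, 2)  # (batch, hidden, seq_len) for Conv1d
        pooled = [torch.relu(conv(x)).max(dim=-1).values for conv in self.convs]
        logits = self.classifier(torch.cat(pooled, dim=1))
        loss = self.loss_fn(logits, labels.float()) if labels is not None else None
        return {"loss": loss, "logits": logits}
```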
The model includes a bert-base-uncased encoder and Conv1D layers. The following basic hyperparameters were used:
Number of epochs: 5;
Train batch size: 16;
Learning rate: ;
Weight decay: 0.01;
Early Stopping Callback: early_stopping_patience = 8.
The total training time is similar to other BERT-based approaches, although the additional parameters of the Conv1D layers can partially extend the training. In the current setup, training approached optimal performance around 3–4 epochs.
Figure 16 shows the change in loss values obtained by the model on the training and validation sets per epoch. While the training loss shows a steady decrease from the first epoch, the validation loss tended to rise and fluctuate after the 2nd–3rd epoch before dropping again to a lower level, indicating ongoing learning.
Figure 17 shows the per-class precision, recall, and F1-scores for the BERT + SCNN model.
Table 6 contains the report summarizing class-based details.
The SCNN architecture aimed to enrich the embedding layer from BERT by focusing on various n-gram-like information through different kernel sizes. However, the results show more pronounced improvement in majority emotions, while progress for minority emotions remains limited due to the current data distribution and model complexity.