Article

AI for Sustainable Cultural Industries: A Screenplay-Aware Knowledge-Enhanced State Space Model with LLM-Derived Narrative Features for Forecasting Film Industry Sustainability Across National Economies

1 Film Academy, Macao University of Science and Technology, Macao 999078, China
2 School of Microelectronics, Xidian University, Xi’an 710071, China
* Author to whom correspondence should be addressed.
Sustainability 2026, 18(12), 6117; https://doi.org/10.3390/su18126117 (registering DOI)
Submission received: 5 April 2026 / Revised: 4 June 2026 / Accepted: 5 June 2026 / Published: 14 June 2026

Abstract

This paper examines how artificial intelligence can support sustainability assessment in cultural industries, using national film industries as a test case. The Film Industry Sustainability Index (FISI) is introduced as a composite indicator covering cultural diversity, economic resilience, and Sustainable Development Goal (SDG) alignment for 42 national economies from 2005 to 2023. Knowledge-Enhanced Mamba (KE-Mamba), a selective state-space forecasting model, is then proposed to combine annual panel indicators with country-level film-industry knowledge graph (KG) embeddings and large language model (LLM)-derived screenplay-oriented narrative proxies from film synopses. To reduce factual errors in title-level narrative scoring, the LLM is anchored to verified United Nations Educational, Scientific and Cultural Organization (UNESCO) records and the European Audiovisual Observatory’s LUMIERE film-admissions database using rank-one model editing (ROME). On the 2020–2023 held-out test period, KE-Mamba achieves a composite FISI mean absolute error (MAE) of 0.0389, a mean absolute percentage error (MAPE) of 5.61%, and an $R^2$ of 0.934, outperforming autoregressive integrated moving average (ARIMA), tree-based, long short-term memory (LSTM), and base Mamba baselines. Additional robustness checks using a pre-pandemic split, two-way fixed-effects panel regression, alternative FISI weighting schemes, KG embedding ablations, and human validation of LLM narrative scores support the reliability of the proposed framework. Policy simulations are interpreted as model-based projected associations rather than causal estimates. The results show that knowledge-enhanced sequence models can provide transparent forecasting support for sustainable cultural-industry policy.

1. Introduction

The global film industry generates roughly USD 342 billion in annual output and supports 14.2 million jobs worldwide, placing it among the largest subsectors of the cultural and creative industries [1,2]. It plays a dual role, operating both as a commercial industry subject to market forces and as a medium through which national and collective identities are expressed [3,4]. Whether national film industries can sustain economic viability while preserving cultural diversity and advancing social development is therefore a question with both economic and policy stakes.
Several Sustainable Development Goals (SDGs) bear directly on the film industry. SDG 8 links to employment quality and revenue resilience in cultural production; SDG 10 concerns unequal access to cultural markets and distribution channels; SDG 17 is reflected in co-production and cross-border knowledge sharing; and SDG 5 relates to persistent gender gaps in creative leadership [4,5,6,7]. The UNESCO Convention on the Protection and Promotion of the Diversity of Cultural Expressions further frames cultural diversity as a condition for sustainable development [4], yet operational tools for measuring and forecasting the SDG alignment of national film industries remain scarce. Artificial intelligence has become a practical tool for sustainability forecasting and sustainable-development decision support across industries [8,9], but its applications and impacts remain unevenly distributed: the literature concentrates on environmental and resource sectors with comparatively little attention to how AI can support sustainable development in cultural and creative industries. In the cultural sector, computational work has focused on box office prediction [10], film network analysis [11,12], and cultural diversity measurement, while forecasting composite sustainability trajectories for national film industries has been neglected. Extending AI-for-sustainable-development research to this under-covered industry is therefore both a methodological opportunity—film panels combine structural relational data, long textual content, and medium-horizon macro dynamics—and a policy need for cultural ministries and international organizations working toward the 2030 Agenda.
This paper makes four contributions. First, it constructs and documents the Film Industry Sustainability Index (FISI), which is a three-pillar composite indicator linking cultural diversity, economic resilience, and SDG alignment for a balanced 42-country panel from 2005 to 2023. Second, it proposes a knowledge-enhanced selective state-space forecasting architecture in which a gated fusion layer injects static country-level film-industry KG embeddings into dynamic Mamba hidden states. Third, it introduces a transparent narrative-feature pipeline that scores film synopses with an LLM, anchors film-level factual knowledge through model editing, and validates the resulting scores against both human expert ratings and an alternative LLM. Fourth, it provides a policy-oriented validation suite, including ablations, a pre-pandemic temporal split, panel-econometric baselines, FISI weighting sensitivity, KG relation decomposition, and non-causal policy stress tests with placebo checks.
The remainder of the paper develops the theoretical framework (Section 2), reviews related work (Section 3), describes data and variables (Section 4), details the methodology (Section 5), reports experimental results (Section 6), discusses policy implications and limitations (Section 7), and concludes (Section 8).

2. Theoretical Framework: Linking Film Industries to the SDGs

National film industries are conceptualized as socio-economic ecosystems whose health can be characterized along dimensions that map onto SDG targets. Drawing on the economics of culture [3] and cross-national cultural-value frameworks [13], three pillars are identified, each operationalized through six indicators. Cultural Diversity (Shannon genre entropy, female director share, co-production rate, language diversity, independent production share, domestic market share) addresses the range of creative expressions and equitable participation, connecting to SDGs 4, 5, 10, and 17. Economic Resilience (revenue growth stability, screen density growth, digital distribution ratio, export revenue share, production volume trend, inverted Herfindahl–Hirschman Index (HHI) box office concentration) captures the diversification of the economic base and maps onto SDGs 8, 9, and 12. SDG Alignment (film employment share, inverted screen access Gini, urban–rural infrastructure ratio, youth employment ratio, public funding accessibility, audience reach equity) measures distributional fairness and relates to SDG targets 8.5, 8.6, 10.3, 11.a, and 16.6. Each indicator is min–max normalized within each annual cross-section, each pillar score is the arithmetic mean of its six constituents, and the composite FISI is the geometric mean of the three pillar scores, which rewards simultaneous progress across dimensions rather than allowing one pillar to compensate for another. Table 1 reports the complete indicator–SDG mapping.
Formally, for a positive-direction indicator $v_{jct}$, the normalized value is
$$z_{jct} = \frac{v_{jct} - \min_{c} v_{jct}}{\max_{c} v_{jct} - \min_{c} v_{jct} + \epsilon}, \tag{1}$$
where normalization is performed within year $t$ and $\epsilon = 10^{-8}$ prevents division by zero. For negative-direction indicators (urban–rural infrastructure ratio), $z_{jct}$ is replaced by $1 - z_{jct}$. Pillar score $p$ is computed as
$$P_{p,ct} = \frac{1}{6} \sum_{j \in p} z_{jct}, \tag{2}$$
and the composite index is the equal-weight geometric mean
$$\mathrm{FISI}_{ct} = \left[ (P_{\mathrm{CD},ct} + \epsilon)\,(P_{\mathrm{ER},ct} + \epsilon)\,(P_{\mathrm{SA},ct} + \epsilon) \right]^{1/3}. \tag{3}$$
The geometric mean penalizes unbalanced development across pillars; alternative arithmetic, principal-component, Delphi-weighted, and winsorized z-score aggregation schemes are evaluated in Section 6.
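The construction above can be illustrated with a minimal Python sketch. The function names and the example pillar scores are hypothetical and chosen only for illustration; the sketch follows the normalization, pillar-mean, and geometric-mean steps as stated in the text and is not the authors' code.

```python
EPS = 1e-8  # the epsilon stated in the text

def minmax(values, positive=True):
    """Min-max normalize one indicator across countries within a single year."""
    lo, hi = min(values), max(values)
    z = [(v - lo) / (hi - lo + EPS) for v in values]
    # Negative-direction indicators are flipped to 1 - z.
    return z if positive else [1.0 - x for x in z]

def pillar_score(normalized_indicators):
    """Arithmetic mean of one country's normalized indicators in a pillar."""
    return sum(normalized_indicators) / len(normalized_indicators)

def fisi(p_cd, p_er, p_sa):
    """Equal-weight geometric mean of the three pillar scores."""
    return ((p_cd + EPS) * (p_er + EPS) * (p_sa + EPS)) ** (1.0 / 3.0)

# Hypothetical pillar scores with the same arithmetic mean (0.5).
balanced = fisi(0.5, 0.5, 0.5)
unbalanced = fisi(0.9, 0.5, 0.1)
print(round(balanced, 4), round(unbalanced, 4))
```

Both hypothetical countries have the same arithmetic mean across pillars, but the unbalanced one receives a lower composite score, which is the penalty on uneven development that the geometric mean is meant to impose.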

3. Related Work

3.1. AI for Sustainability Forecasting and Cultural-Industry Analytics

Early computational analyses of the film industry centered on box office prediction with star power, genre, and marketing budget as features [14,15,16]. Recent work has applied gradient boosting and neural architectures to revenue forecasting [10] and network analysis to film festivals and industry ecosystems [11,12], but the focus remains on short-run commercial outcomes, and the longitudinal panel structure most relevant for policy-oriented sustainability forecasting has been largely overlooked. At the same time, AI is an increasingly practical tool for sustainability forecasting [8]: Chen et al. used machine learning to forecast multidimensional sustainability indices for EU member states [9], but the intersection of AI and cultural industries remains underexplored [17] with no prior study applying machine learning to a composite cultural sustainability index of the kind proposed here.

3.2. Knowledge Graphs and Knowledge-Grounded LLMs

Tilly and Livan showed that features from statistically validated knowledge graphs improve macroeconomic forecasts over purely tabular inputs [18]; among embedding methods, RotatE interprets each relation as a rotation in complex space and produces geometrically interpretable vectors with strong link-prediction performance [19]. This paper extends this line to the film industry, where co-production, distribution, and funding relationships form a relational structure that tabular features cannot capture. LLMs can reason about screenplay structure, character arcs, and themes in both generative [20] and evaluative settings, yielding interpretable quality signals that complement metadata-based features; because LLMs encode parametric knowledge about specific films that may be outdated or inaccurate, knowledge editing methods such as ROME [21] and subsequent mass-editing techniques [22] can update factual associations without full retraining.
Recent knowledge-grounded LLM systems show why structured retrieval remains important even when LLMs provide fluent reasoning. TrumorGPT combines graph-based retrieval with an LLM for fact checking, demonstrating that graph structure can help constrain generation and improve evidence traceability [23]. Similarly, hybrid fact-checking pipelines that combine KG retrieval, LLM classification, and search-based retrieval agents achieve interpretable claim verification by activating external evidence sources when KG coverage is insufficient [24]. These studies are relevant to the present framework because the narrative scores used here are not treated as ungrounded LLM outputs: film-level country, year, and genre facts are first checked against verified film records, and the residual risk of hallucination is explicitly evaluated.

3.3. Hybrid Deep Architectures for Complex Digital and Multimodal Data

The broader literature on hybrid deep architectures also motivates the integration of heterogeneous feature streams. Chechkin et al. [25] propose a hybrid neural-network Transformer for detecting and classifying destructive digital content, emphasizing temporal dynamics, nonlinear dependencies, and multilayered data analysis. Although their application domain differs from film-industry sustainability, the study is methodologically relevant because it illustrates how modern AI systems increasingly combine sequence modeling, attention-like mechanisms, and heterogeneous digital signals. This paper differs by replacing quadratic self-attention with a selective state-space backbone and by using KG-derived country embeddings as a structured relational prior rather than raw token-level features.

3.4. Selective State Space Models for Panel Time-Series Forecasting

For the temporal backbone, this paper uses Mamba, a selective state space model that captures long-range dependencies with linear complexity [26]. Variants such as MambaTS [27], Mamba4Cast [28], and SiMBA [29] have been applied to time series forecasting with systematic benchmarking [30]. The selective scan mechanism is particularly suited to film market data where policy shifts and macroeconomic shocks create uneven temporal salience across years [31]. Unlike Transformer-based sequence models with quadratic self-attention costs, the linear-time selective scan allows efficient processing of the multi-country annual panel while preserving the model’s ability to emphasize shock periods and structural breaks.

4. Data and Variables

4.1. Data Sources and Coverage

The dataset is a balanced panel of 42 national economies observed annually from 2005 to 2023 (798 country-year observations), covering roughly 92% of global theatrical box office revenues in 2023 based on a tabulation of Motion Picture Association (MPA) THEME 2024 [2]. Primary sources are the UNESCO UIS Feature Films and Cinema Database [32], the European Audiovisual Observatory LUMIERE database [33], the World Bank World Development Indicators (WDI) [34], and the OECD Culture and Creative Economy database [35], which are supplemented by MPA THEME [2], PwC Global Entertainment and Media Outlook [1], McKinsey [36], and the United Nations Development Programme Human Development Report (UNDP HDR) [37]. Linear interpolation was applied to gaps of up to three consecutive years; candidate countries with more than three consecutive missing years in at least one required FISI component were excluded. This rule removed seven countries from the initial candidate set, leaving the final balanced panel of 42: Greece and Portugal (incomplete public-funding and audience-equity series), Peru and Morocco (discontinuous domestic-share, export-revenue, and digital-distribution data), Pakistan (production and box-office coverage before 2012), Vietnam (cinema-admission and co-production records), and Ukraine (territorial reporting discontinuities after 2014). The longest continuous gap and the affected indicators for each excluded country are reported in Appendix B (Table A2).
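The gap-handling rule can be sketched as follows. This is a minimal illustration under stated assumptions: the function name is hypothetical, missing values are represented as `None`, and leading or trailing gaps are treated as unfillable, which the text does not specify.

```python
def interpolate_short_gaps(series, max_gap=3):
    """Linearly fill interior runs of None that are at most max_gap long.

    Returns (filled_series, ok). ok is False when some gap is too long or
    touches the start or end of the series (an assumption of this sketch),
    which would correspond to excluding the country.
    """
    out = list(series)
    n = len(out)
    ok = True
    i = 0
    while i < n:
        if out[i] is not None:
            i += 1
            continue
        start = i
        while i < n and out[i] is None:
            i += 1
        end = i  # index of the first observed value after the gap
        gap = end - start
        if start == 0 or end == n or gap > max_gap:
            ok = False
            continue
        left, right = out[start - 1], out[end]
        for k in range(start, end):
            frac = (k - start + 1) / (gap + 1)
            out[k] = left + frac * (right - left)
    return out, ok

filled, ok = interpolate_short_gaps([1.0, None, None, 4.0])
print(filled, ok)
```

A series with four or more consecutive missing years returns `ok = False`, mirroring the exclusion of the seven candidate countries.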

4.2. Variable Construction

The 18 FISI component indicators are grouped into the three pillars of Section 2; full construction rules and data sources for each indicator are reported in Appendix E (Table A4). Briefly, the genre diversity index is the Shannon entropy of annual domestic production across 12 standardized genres, which is normalized by $\ln(12)$; revenue growth stability is the inverse coefficient of variation of box office revenue on a rolling five-year window; and the digital distribution ratio is the share of total film revenue from legitimate subscription video-on-demand (SVOD), transactional video-on-demand (TVOD), and digital rental/purchase channels. Table 2 reports descriptive statistics, Table 3 reports the pairwise correlation matrix among the composite FISI, its pillars, and selected indicators, and Figure 1 summarizes the research pipeline.
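Two of these indicators can be sketched directly from their verbal definitions. The function names are hypothetical, and the use of the population standard deviation and a small stabilizing constant in the inverse coefficient of variation are assumptions of this sketch; Appendix E of the paper holds the authoritative construction rules.

```python
import math
import statistics

def genre_diversity(counts, n_genres=12):
    """Shannon entropy of genre counts, normalized by ln(n_genres)."""
    total = sum(counts)
    if total == 0:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return entropy / math.log(n_genres)

def revenue_stability(window_revenues, eps=1e-8):
    """Inverse coefficient of variation (mean / std) over one rolling window."""
    mean = statistics.fmean(window_revenues)
    std = statistics.pstdev(window_revenues)  # population std: an assumption
    return mean / (std + eps)

# Hypothetical inputs: an even genre mix and two five-year revenue windows.
print(genre_diversity([10] * 12))
print(revenue_stability([90, 100, 110, 100, 100]))
print(revenue_stability([50, 100, 150, 100, 100]))
```

An even split across all 12 genres gives a diversity of about 1, a single-genre slate gives 0, and the steadier revenue window receives the higher stability value.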
The English-synopsis filter used for narrative feature extraction improves prompt consistency but creates a coverage bias. Retention is highest for internationally distributed films (74.8% in high-income markets) and lowest for hyper-local productions in lower-middle and developing–fragile markets (32.8–46.9%), as visualized in Figure 2a. To bound this concern, an inverse-coverage reweighted narrative aggregation produces a composite FISI correlation of 0.982 with the baseline narrative aggregation and shifts the KE-Mamba test MAE only from 0.0389 to 0.0396. Two further robustness aggregations—raising the minimum synopsis length to 50 words and excluding countries with coverage below 35%—yield MAEs of 0.0393 and 0.0384, respectively, and correlations of 0.987 and 0.991 with the baseline. The English-synopsis bias is therefore substantively important for interpretation but does not drive the headline forecasting results.

5. Methodology

5.1. Research Hypotheses and Validation Design

The empirical design evaluates five hypotheses. H1: Adding structured film-industry KG information to a selective state-space model improves FISI forecasting relative to non-knowledge baselines. H2: LLM-derived narrative features provide incremental predictive information beyond macroeconomic and FISI indicators. H3: Factual anchoring through knowledge editing reduces title-level factual errors and improves downstream forecast reliability. H4: The proposed model generalizes across both crisis and non-crisis temporal splits. H5: Policy simulations should be interpreted as projected associations and should remain distinguishable from random placebo interventions. Table 4 maps each hypothesis to the corresponding experiment.

5.2. Knowledge Graph Construction and Embedding

The structured relational database defines five entity types: Country, ProductionEntity (studios, production companies, public funding bodies), Film, Genre (12 categories), and Person (directors, producers, key creative personnel). It also defines seven relation types: produces, funds, belongs_to, co_produced_by, directed_by, operates_in, and exports_to. Populated by cross-referencing UNESCO UIS, LUMIERE, and national film registries, the database contains 14,600 triples. Continuous vector representations are extracted with RotatE [19] using embedding dimension 128 (selected from $\{64, 128, 256\}$), an 80/10/10 split stratified by country, entity, and relation type, and self-adversarial negative sampling with 256 negatives, a margin of 9.0, a learning rate of 0.001, and 500 epochs with early stopping, yielding a mean reciprocal rank (MRR) of 0.387 and hits at cutoff 10 (Hits@10) of 0.524 on the test split. The resulting 128-dimensional country embeddings encode each country’s structural position in the global film production and distribution network. The KG embedding is used as a static, time-invariant structural prior for each country; consistent with the forecasting protocol in Section 5.5, no target-year-specific triples enter as predictors of that year’s FISI.
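For readers unfamiliar with RotatE, the sketch below shows its core scoring idea: each relation rotates the head embedding element-wise in the complex plane, and a triple is plausible when the rotated head lands near the tail. The function names are hypothetical, the distance is taken as the sum of complex moduli over dimensions (a common implementation choice, assumed here), and the two-dimensional vectors are toy values rather than the paper's 128-dimensional embeddings.

```python
import cmath
import math

def rotate_distance(head, phases, tail):
    """Sum over dimensions of |h_k * exp(i * theta_k) - t_k|."""
    return sum(abs(h * cmath.exp(1j * theta) - t)
               for h, theta, t in zip(head, phases, tail))

def rotate_score(head, phases, tail, margin=9.0):
    """Margin minus distance; larger values mean a more plausible triple."""
    return margin - rotate_distance(head, phases, tail)

# Toy example: the relation rotates dimension 0 by 90 degrees and
# dimension 1 by 180 degrees, which maps head exactly onto tail.
head = [1 + 0j, 0 + 1j]
phases = [math.pi / 2, math.pi]
tail = [0 + 1j, 0 - 1j]
print(rotate_score(head, phases, tail))
```

The margin of 9.0 matches the value reported in the text; the training loss with self-adversarial negative sampling is omitted.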

5.3. LLM-Based Screenplay Narrative Features with Knowledge Editing

In parallel with the structural RotatE features, this paper extracts screenplay-oriented narrative proxies from film metadata, English-language synopses, and available critic excerpts using a large language model. The term “screenplay-aware” therefore refers to narrative constructs commonly associated with screenplay analysis (narrative structure, pacing, character development, thematic depth, dialogue quality, genre coherence, cultural specificity, and projected audience reach) rather than full-script access for all 12,438 titles. Llama-3-8B-Instruct is used as the base model (selected because open weights allow the knowledge editing step below), and robustness is checked by repeating the entire scoring pipeline with Qwen2.5-7B-Instruct; per-dimension Pearson correlations between the two models’ scores range from 0.82 to 0.91 with an average of 0.87 on the validation subset. The film sample consists of 12,438 titles obtained by intersecting UNESCO UIS production records with The Movie Database (TMDb) snapshot of 20 December 2024, retaining titles with an English-language synopsis of at least 80 words; this biases the sample toward internationally distributed titles from non-English markets, which is a limitation revisited in Section 7. Each film is scored on a 0–10 integer scale along eight dimensions: (i) narrative structure, (ii) pacing, (iii) character development, (iv) thematic depth, (v) dialogue quality (inferred from synopsis and critic excerpts), (vi) genre coherence, (vii) cultural specificity, and (viii) projected audience reach. The scores are then rescaled to $[0, 1]$ and aggregated to the country-year level by a simple mean. The full prompt and anchoring rubric are in Appendix D.
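The rescaling and aggregation step is simple enough to sketch. The function name, the tuple layout, and the example scores are hypothetical; only the division by 10 and the simple country-year mean come from the text.

```python
from collections import defaultdict

N_DIMS = 8  # the eight narrative dimensions listed in the text

def country_year_narrative(film_scores):
    """film_scores: iterable of (country, year, [8 integer scores on 0-10]).

    Returns {(country, year): [8 mean scores rescaled to 0-1]}.
    """
    sums = defaultdict(lambda: [0.0] * N_DIMS)
    counts = defaultdict(int)
    for country, year, scores in film_scores:
        key = (country, year)
        for k, s in enumerate(scores):
            sums[key][k] += s / 10.0  # rescale 0-10 to 0-1
        counts[key] += 1
    return {key: [v / counts[key] for v in vec] for key, vec in sums.items()}

# Hypothetical title-level scores for illustration only.
films = [
    ("FR", 2019, [8, 6, 7, 7, 5, 8, 9, 6]),
    ("FR", 2019, [6, 6, 5, 7, 5, 6, 7, 4]),
    ("KR", 2019, [10] * 8),
]
narrative = country_year_narrative(films)
print(narrative[("FR", 2019)])
```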
LLM parametric knowledge about specific titles can be outdated or hallucinated, which is addressed with a ROME-style knowledge editing step [21,22] before scoring. For the 4127 titles on which Llama-3-8B-Instruct fails a factual probe (country of production, release year, or genre tag), rank-one update targets are constructed from verified UNESCO UIS and LUMIERE records. Following the Mass-Editing Memory in a Transformer (MEMIT) variant of ROME, edits are applied to the multilayer perceptron (MLP) down-projection matrices in layers 3–8 with the title token as subject and the verified fact as object; all edits are applied on a frozen copy of the base model. On the full 12,438-title sample, editing raises all-fact accuracy from 66.8% to 89.7% (country 89.2% → 97.6%, year 86.7% → 96.4%, genre 82.4% → 93.1%; Figure 2b). The improvement is concentrated on the targeted 4127-title failed subset: by construction, the failed subset has all-fact accuracy 0.0% before editing, with country 67.9%, year 59.6%, and genre 47.3% correctly recalled in isolation; after ROME/MEMIT-style editing, these rise to 93.5%, 91.8%, and 86.7%, respectively, and the joint all-fact pass rate on the same 4127 titles reaches 75.4%, indicating that the bulk of the full-sample gain is attributable to a direct correction of the failed subset rather than diffuse changes on already-correct titles. The 500-film neighborhood specificity falls only from 0.92 to 0.89, indicating limited bleed-through. Edits shift narrative scores by an average of 0.39 points on the 0–10 scale across 33.9% of titles with cultural specificity (mean |shift| 0.64) and genre coherence (0.58) moving most and narrative structure and dialogue least, which is consistent with the role of factual anchoring rather than wholesale rewriting. 
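The core of a rank-one edit can be sketched in a few lines of NumPy. This is the single-layer closed form associated with ROME, shown on a toy matrix; it is not the multi-layer MEMIT procedure the paper applies to Llama-3-8B-Instruct, and the names `rank_one_edit`, `k_star`, `v_star`, and `C` (an estimate of the key covariance) are illustrative assumptions.

```python
import numpy as np

def rank_one_edit(W, k_star, v_star, C):
    """Return W' = W + lam * u^T such that W' @ k_star == v_star.

    u = C^{-1} k_star weights the update by the key covariance C, which
    limits the change on other keys; lam scales the residual so that the
    edited association is satisfied exactly.
    """
    u = np.linalg.solve(C, k_star)                # C^{-1} k*
    lam = (v_star - W @ k_star) / (u @ k_star)    # scaled residual
    return W + np.outer(lam, u)

# Toy dimensions; real MLP down-projections are far larger.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))
K = rng.normal(size=(3, 10))           # hypothetical sample of other keys
C = K @ K.T + 1e-3 * np.eye(3)         # positive-definite covariance estimate
k_star = rng.normal(size=3)            # key for the edited subject
v_star = rng.normal(size=4)            # value encoding the verified fact
W_new = rank_one_edit(W, k_star, v_star, C)
print(np.allclose(W_new @ k_star, v_star))
```

The update changes the weight matrix by a rank-one term only, which is why locality on unrelated titles can remain high after editing.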
Compared with alternative editing strategies on the same probe set, ROME/MEMIT-style editing attains 89.7% all-fact accuracy at 0.9 graphics processing unit (GPU) hours and downstream MAE 0.0389, dominating fine-tuning/LoRA (92.8%/0.0397, but locality drops to 0.76 and runtime rises to 4.8 h) and matching GRACE-style editing (87.9%/0.0392) at roughly half the runtime. Ablating the editing step yields a small but consistent MAE increase (0.0389 → 0.0399), supporting the claim that anchoring the LLM’s film-level knowledge improves the reliability of the aggregated narrative signals.
The eight narrative scores are externally validated on a stratified 200-film subset that crosses country income group, production decade, genre, and domestic/international distribution status. Each film in the subset is scored independently by three human raters with film-studies or script-development backgrounds using the same 0–10 rubric as the LLM. Table 5 reports the per-dimension human inter-rater Krippendorff’s $\alpha$, Llama-3-8B–expert Pearson and Spearman correlations, mean absolute deviation on the 0–10 scale, and the cross-model Qwen2.5-7B–Llama-3-8B Pearson correlation on the same subset; Figure 2c provides a visual summary. Inter-rater agreement averages $\alpha = 0.75$, Llama–expert correlations range from $r = 0.62$ for dialogue quality (inferred from synopsis text only) to $r = 0.81$ for genre coherence with an average of $r = 0.72$, and Qwen–Llama correlations range from 0.82 to 0.91 with an average of 0.87, supporting the interpretation of the eight scores as weak but consistent narrative proxies.
The resulting eight-dimensional country-year narrative vector $n^{(t)} \in \mathbb{R}^{8}$ is concatenated with the 18 FISI indicators and 8 macroeconomic covariates to form a 34-dimensional dynamic input vector that is processed by the selective scan. The 128-dimensional country knowledge-graph embedding $e_c$ is not concatenated to every input time step in the full KE-Mamba model; instead, it is treated as a static relational prior, projected into the hidden dimension, and injected after the selective scan through the gated KG–temporal fusion layer (Equations (6) and (7)). The model therefore uses 162 features in total but through two distinct pathways: 34 dynamic country-year features and 128 static KG-prior dimensions.

5.4. Knowledge-Enhanced Mamba Architecture

The temporal forecasting component is built on the selective state-space model [26]. Let $x^{(t)}$ denote the 34-dimensional dynamic country-year feature vector (18 FISI indicators, 8 macro covariates, 8 narrative features); the 128-dimensional KG embedding $e_c$ is not part of this dynamic input and instead enters as a static prior through the gated fusion layer below. The 8 macro covariates, drawn from the World Bank WDI [34] and UNDP HDR [37], are GDP per capita (constant 2015 USD), real GDP growth, consumer price inflation, unemployment, internet penetration, urban population share, the HDI, and a trade openness ratio (trade/GDP), which are each standardized to zero mean and unit variance on the training window.
Following selective state-space modeling, the continuous parameters are discretized at each time step through input-dependent step sizes $\Delta_t$:
$$\bar{A}_t = \exp(\Delta_t A), \qquad \bar{B}_t = (\bar{A}_t - I)\, A^{-1} B_t. \tag{4}$$
The recurrent scan is then
$$h_t = \bar{A}_t h_{t-1} + \bar{B}_t x_t, \qquad o_t = C_t h_t + D x_t. \tag{5}$$
To ensure stable dynamics, the diagonal entries of $A$ are parameterized as $A_i = -\mathrm{softplus}(\tilde{A}_i)$, so every entry is strictly negative, implying $|\exp(\Delta_t A_i)| < 1$ for all $\Delta_t > 0$. This is important in the present panel setting because annual film-market indicators can contain abrupt shocks (e.g., pandemic-driven box-office collapses), but the hidden state should not diverge during long-range propagation across the 19-year sequence.
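The discretization and recurrence can be sketched for a single input channel with a diagonal state matrix. This is a simplified illustration, not the paper's implementation: the step sizes and the $B_t$, $C_t$ sequences are passed in directly rather than produced by learned input-dependent projections, and all names and shapes are assumptions.

```python
import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))

def selective_scan(x, delta, B, C, A_raw, D):
    """x, delta: (T,); B, C: (T, N); A_raw: (N,); D: scalar.

    Returns the outputs o_t and the discretized transitions A_bar_t.
    """
    A = -softplus(A_raw)                    # strictly negative diagonal
    h = np.zeros_like(A)
    outputs, a_bars = [], []
    for t in range(len(x)):
        A_bar = np.exp(delta[t] * A)        # lies in (0, 1) when delta > 0
        B_bar = (A_bar - 1.0) / A * B[t]    # (A_bar - I) A^{-1} B_t, diagonal
        h = A_bar * h + B_bar * x[t]
        outputs.append(C[t] @ h + D * x[t])
        a_bars.append(A_bar)
    return np.array(outputs), np.array(a_bars)

# Toy sequence of 19 annual steps with a 4-dimensional state.
rng = np.random.default_rng(0)
T, N = 19, 4
x = rng.normal(size=T)
delta = softplus(rng.normal(size=T))        # positive step sizes
y, a_bars = selective_scan(x, delta, rng.normal(size=(T, N)),
                           rng.normal(size=(T, N)), rng.normal(size=N), 0.5)
print(y.shape, float(a_bars.max()) < 1.0)
```

Because every discretized transition stays below one in magnitude, a shock at one step decays rather than compounds, which is the stability property the parameterization is designed to guarantee.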
The main architectural modification is a gated fusion layer that modulates the temporal hidden states with the static country knowledge embedding. The country KG embedding $e_c \in \mathbb{R}^{128}$ is static within the annual sequence and acts as a relational prior. It is first projected into the hidden dimension, $\tilde{e}_c = W_e e_c + b_e$. The gate
$$\alpha_{ct} = \sigma\!\left( W_g \left[ h_{ct};\, \tilde{e}_c;\, x_{ct} \right] + b_g \right) \tag{6}$$
learns when the dynamic time-series state or the static relational prior should dominate. Here, $\alpha_{ct} \in [0, 1]^{d}$ is an element-wise gate over the $d$ hidden units. The fused state is
$$\tilde{h}_{ct} = \alpha_{ct} \odot h_{ct} + (1 - \alpha_{ct}) \odot \tilde{e}_c. \tag{7}$$
Because $e_c$ is injected after the selective scan rather than concatenated to every input vector, the model preserves linear-time temporal processing while allowing the country-specific relational structure to modulate the forecast head. When $\alpha_{ct}$ is close to one, the model relies on the temporal signal; when it is close to zero, the knowledge embedding dominates.
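The gated fusion step can be sketched in NumPy for a single country-year. The dimensions and parameter values below are toy assumptions (the paper uses a 128-dimensional embedding and a 34-dimensional input), and the function name is hypothetical; the sketch only follows the projection, gate, and convex combination described in the text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_fusion(h, e, x, W_e, b_e, W_g, b_g):
    """Blend temporal state h with the projected static KG embedding e."""
    e_tilde = W_e @ e + b_e                                    # project to hidden size
    alpha = sigmoid(W_g @ np.concatenate([h, e_tilde, x]) + b_g)
    fused = alpha * h + (1.0 - alpha) * e_tilde                # element-wise gate
    return fused, alpha

# Toy sizes: hidden d=4, embedding m=6, input k=3.
rng = np.random.default_rng(0)
d, m, k = 4, 6, 3
h, e, x = rng.normal(size=d), rng.normal(size=m), rng.normal(size=k)
W_e, b_e = rng.normal(size=(d, m)), np.zeros(d)
W_g = np.zeros((d, 2 * d + k))

# A strongly positive gate bias keeps the temporal state; a strongly
# negative one hands control to the knowledge embedding.
fused_temporal, _ = gated_fusion(h, e, x, W_e, b_e, W_g, np.full(d, 50.0))
fused_prior, _ = gated_fusion(h, e, x, W_e, b_e, W_g, np.full(d, -50.0))
print(np.allclose(fused_temporal, h), np.allclose(fused_prior, W_e @ e + b_e))
```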
A two-layer feed-forward head with GELU activation maps $h^{(t)}$ to four outputs (the composite FISI and its three pillar scores), which are trained with an equally weighted MSE loss.
Mamba is preferred over a Transformer in this setting because the panel contains short-to-medium annual sequences with many heterogeneous features, and the objective is stable forecasting rather than token-level representation learning. The selective scan has linear complexity in sequence length, $O(Td)$, whereas full self-attention has $O(T^{2}d)$ complexity. Compared with LSTM, the input-dependent state transition allows the model to emphasize shock years and policy-transition periods without relying solely on fixed recurrent gates. The gated KG fusion further separates static relational structure from dynamic time-series evidence, which plain concatenation cannot achieve; this design is supported by the ablation in which removing the gated fusion and replacing it with direct concatenation increases MAE by 9.8%.

5.5. Baselines, Training, and Evaluation

Five baselines are considered: ARIMA (per-country with AICc order selection), Random Forest (500 trees, depth 12), XGBoost (300 rounds, depth 8, lr 0.05), LSTM (two layers of 128 units, dropout 0.2, 5-year lookback), and a base Mamba identical to KE-Mamba but without the gated KG fusion layer or KG embeddings. A two-way fixed-effects panel regression with lagged dependent variables (FE-LDV) is also included as an interpretable econometric benchmark, which is specified as
$$\mathrm{FISI}_{c,t+1} = \rho\, \mathrm{FISI}_{c,t} + \beta^{\top} z_{c,t} + \mu_c + \lambda_t + \varepsilon_{c,t}, \tag{8}$$
where $z_{c,t}$ is the same 34-dimensional panel feature vector used by the other baselines and $\mu_c$ and $\lambda_t$ are country and year fixed effects. Tree baselines and neural baselines share the same 34-dimensional panel feature set; neural models use Adam with an initial learning rate of 0.001, cosine annealing, and early stopping with a patience value of 20. The final hyperparameter configuration, search ranges, and sensitivity results are in Appendix C.
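One standard way to estimate a two-way fixed-effects model on a balanced panel is the within transformation: subtract country means and year means, add back the grand mean, and run ordinary least squares on the transformed data. The sketch below assumes that estimator and uses hypothetical function names and array layouts; the paper does not state which estimator or software it used.

```python
import numpy as np

def two_way_demean(a):
    """a: (C, T) or (C, T, K) balanced panel. Removes country and year means."""
    return (a - a.mean(axis=1, keepdims=True)
              - a.mean(axis=0, keepdims=True)
              + a.mean(axis=(0, 1), keepdims=True))

def fit_fe_ldv(y_next, y_lag, z):
    """y_next, y_lag: (C, T) arrays; z: (C, T, K). Returns (rho, beta)."""
    X = np.concatenate([y_lag[..., None], z], axis=2)       # (C, T, 1 + K)
    Xd = two_way_demean(X).reshape(-1, X.shape[2])
    yd = two_way_demean(y_next).reshape(-1)
    coef, *_ = np.linalg.lstsq(Xd, yd, rcond=None)
    return coef[0], coef[1:]

# Noise-free synthetic panel with known coefficients (illustration only).
rng = np.random.default_rng(1)
n_c, n_t, n_k = 6, 8, 2
rho, beta = 0.6, np.array([0.3, -0.2])
mu, lam = rng.normal(size=n_c), rng.normal(size=n_t)
z = rng.normal(size=(n_c, n_t, n_k))
y_lag, y_next = np.zeros((n_c, n_t)), np.zeros((n_c, n_t))
y_lag[:, 0] = rng.normal(size=n_c)
for t in range(n_t):
    y_next[:, t] = rho * y_lag[:, t] + z[:, t] @ beta + mu + lam[t]
    if t + 1 < n_t:
        y_lag[:, t + 1] = y_next[:, t]
rho_hat, beta_hat = fit_fe_ldv(y_next, y_lag, z)
print(round(float(rho_hat), 6), np.round(beta_hat, 6))
```

With noisy data and a short time dimension, the lagged dependent variable introduces the well-known dynamic-panel (Nickell) bias, so exact recovery as in this noise-free example should not be expected in practice.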
The dataset is split temporally: 2005–2017 for training (546 observations), 2018–2019 for validation (84), and 2020–2023 for testing (168), so evaluation is strictly out-of-sample and covers the pandemic shock. ARIMA is fitted per country. Performance is reported using MAE, RMSE, MAPE, and $R^2$, and statistical significance is assessed with the Diebold–Mariano test [38] using the Harvey–Leybourne–Newbold small-sample correction: DM statistics are computed country by country on the 2020–2023 loss-differential sequences and combined across the 42 countries with Stouffer’s Z-score method, yielding a single panel-level test. Because the composite FISI is bounded away from zero in the sample (minimum 0.127; Table 2), MAPE is well defined and is not affected by small-denominator inflation. All experiments use an NVIDIA A100 40 GB GPU; results are means over five random seeds with a seed-to-seed standard deviation below 0.0015 for all neural models (seed-level results in Appendix F). Figure 3 shows the temporal evolution of FISI and its pillars by income group.
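The panel-level significance test can be sketched as follows for one-step-ahead forecasts. The function names and loss values are hypothetical. The sketch also makes a simplifying assumption that should be read with care: the corrected country-level statistics are treated as approximately standard normal before Stouffer combination, whereas the Harvey–Leybourne–Newbold statistic is usually referred to a Student t distribution, and the paper does not describe how it converts statistics to Z-scores.

```python
import math
from statistics import NormalDist

def dm_hln(loss_a, loss_b, h=1):
    """Diebold-Mariano statistic with the HLN small-sample correction.

    Negative values favour model A. For h = 1 only the lag-0
    autocovariance of the loss differential is needed.
    """
    d = [a - b for a, b in zip(loss_a, loss_b)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((v - mean_d) ** 2 for v in d) / n   # must be non-zero
    dm = mean_d / math.sqrt(var_d / n)
    correction = math.sqrt((n + 1 - 2 * h + h * (h - 1) / n) / n)
    return correction * dm

def stouffer(z_scores):
    """Combine independent Z-scores into one panel-level Z."""
    return sum(z_scores) / math.sqrt(len(z_scores))

# Hypothetical absolute errors for one country over 2020-2023.
stat = dm_hln([0.10, 0.20, 0.15, 0.12], [0.20, 0.35, 0.20, 0.30])
panel_z = stouffer([stat, stat, stat, stat])        # four identical toy countries
p_one_sided = NormalDist().cdf(panel_z)             # normal approximation
print(round(stat, 3), round(panel_z, 3), p_one_sided < 0.05)
```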
All models forecast the composite FISI and its three pillar scores one year ahead. For each forecast origin $t$, the neural models use the realized feature sequence from the preceding five-year lookback window and produce $\hat{y}_{c,t+1}$ in a single step; predictions are not fed back recursively as inputs, so every validation and test forecast conditions on observed lagged features rather than model-generated values, giving the neural, tree-based, and FE-LDV baselines an identical information set. This is therefore an annual ex-post one-step-ahead forecasting protocol rather than a start-of-year real-time nowcasting exercise. No target-year predictors enter the forecasting input: the dynamic input sequence contains only information available up to the forecast origin, and narrative scores for a country-year are computed solely from titles released in that country–year, so they can enter the model only when that year lies in the lagged input window and never when it is the prediction target. Within-year min–max normalization defines the FISI labels and component indicators from the same-year country cross-section, while the macroeconomic covariates are standardized using training-window statistics only, so no future information leaks into the predictors.
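The one-step-ahead protocol can be made concrete with a small windowing sketch. The function name and dictionary layout are hypothetical, and the placeholder feature values stand in for the 34-dimensional vectors; how the earliest training years with incomplete lookback windows are handled is not stated in the text and is not modeled here.

```python
def make_windows(features_by_year, targets_by_year,
                 first_target, last_target, lookback=5):
    """Build (input_window, target, target_year) samples for one country.

    Each window holds the `lookback` years strictly before the target
    year, so no target-year feature can enter the input.
    """
    samples = []
    for target_year in range(first_target, last_target + 1):
        window_years = list(range(target_year - lookback, target_year))
        if target_year not in targets_by_year:
            continue
        if not all(y in features_by_year for y in window_years):
            continue
        window = [features_by_year[y] for y in window_years]
        samples.append((window, targets_by_year[target_year], target_year))
    return samples

# Placeholder data: the feature for each year is just a label.
years = range(2005, 2024)
features = {y: f"features_{y}" for y in years}
targets = {y: f"fisi_{y}" for y in years}
test_samples = make_windows(features, targets, 2020, 2023)
print(len(test_samples), test_samples[0][0][0], test_samples[0][0][-1])
```

For the 2020 target, the window spans 2015 to 2019, all of which are realized values; the 2021 target then uses the realized 2020 features rather than the model's own 2020 prediction.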

6. Experimental Results

6.1. Overall Forecasting Performance

Table 6 reports the out-of-sample performance on the primary 2020–2023 test period (Panel A) and on a pre-pandemic 2018–2019 split (Panel B). The latter retrains all models with the years 2005–2015 used for training, the years 2016–2017 used for validation, and the years 2018–2019 used for testing ($n = 84$ country-year observations) so that the model ranking can be evaluated outside the high-volatility pandemic regime. On Panel A, KE-Mamba achieves the lowest error on all four metrics with a composite FISI MAE of 0.0389 (MAPE 5.61%, $R^2 = 0.934$), a 54.1% reduction over ARIMA and 15.6% over the base Mamba. KE-Mamba also outperforms the two-way fixed-effects panel model with lagged dependent variables (FE-LDV; MAE 0.0585), which represents the standard interpretable econometric benchmark; the 33.5% MAE reduction demonstrates the gain from incorporating nonlinear temporal dynamics, knowledge graph priors, and narrative features. The DM test rejects equal predictive accuracy between KE-Mamba and every baseline at the 1% level (base Mamba: $p = 0.004$; others: $p < 0.001$). On the pre-pandemic split (Panel B), all models produce lower absolute errors than on 2020–2023, as expected given the absence of pandemic-driven structural breaks, but the relative ranking is preserved: KE-Mamba retains the lowest MAE (0.0314, $R^2 = 0.952$) and outperforms the base Mamba by 14.4% and the FE-LDV panel by 28.0%, confirming that the advantage is not specific to the pandemic regime.
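The reported metrics and relative reductions follow standard definitions, sketched below with hypothetical function names. The only values taken from the text are the two MAEs used in the final line (0.0585 for FE-LDV and 0.0389 for KE-Mamba), which reproduce the stated 33.5% reduction.

```python
def mae(y_true, y_pred):
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    """Mean absolute percentage error; requires non-zero y_true."""
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_y) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot

def pct_reduction(baseline_error, model_error):
    """Relative error reduction of the model over a baseline, in percent."""
    return 100.0 * (baseline_error - model_error) / baseline_error

# MAE reduction of KE-Mamba over FE-LDV, using the values in the text.
print(round(pct_reduction(0.0585, 0.0389), 1))
```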
Table 7 breaks down KE-Mamba results by pillar. The SDG Alignment Pillar is easiest to forecast ( R 2 = 0.941 ), as its indicators are structural and slow moving, while the Economic Resilience Pillar is hardest (MAPE 6.87%, R 2 = 0.912 ) because the 2020–2021 pandemic shock disrupted historical economic patterns more than cultural or institutional ones. The year-wise error profile is consistent: ERP MAPE peaks at 8.92% in 2020 and declines to 5.18% by 2023 as cinema closures, delayed releases, and digital substitution stabilize [1,2,36], whereas SAP MAPE remains in a 4.76–5.91% band across all four test years because institutional indicators move more slowly and are partly buffered against macroeconomic shocks by statutory funding schemes, employment programs, and infrastructure investment policies [6,7]. Figure 4 plots actual versus predicted trajectories for six representative countries, showing that the model tracks both the pre-pandemic trend and the 2020 downturn.

6.2. Knowledge Graph Embedding Quality and Alternative Models

The RotatE embeddings used in KE-Mamba achieve a macro-average MRR of 0.387 and Hits@10 of 0.524 on the held-out link-prediction test set. Per-relation MRR is higher on the regular many-to-one relations BELONGS_TO (0.528) and PRODUCES (0.441), moderate on OPERATES_IN (0.404) and FUNDS (0.356), and lowest on the sparser long-tail relations CO_PRODUCED_BY (0.333), EXPORTS_TO (0.317), and DIRECTED_BY (0.305), which is consistent with the well-known difficulty of person- and bilateral-export edges in cultural-industry graphs. Although the overall MRR is moderate, the downstream FISI forecasting benefit of the KG stream is substantial: removing KG embeddings increases MAE by 18.5%, indicating that the embeddings’ value lies in encoding a country’s structural position rather than requiring perfect graph completion.
Table 8 compares RotatE with TransE and ComplEx; RotatE achieves the best combination of link-prediction quality and downstream FISI forecasting accuracy and is thus retained as the primary embedding model. Embedding-dimension sensitivity is reported jointly on the validation and test sets in Table 9. The link-prediction loss continues to decrease from 64 to 256 dimensions, but the downstream validation MAE attains its minimum at 128 dimensions (0.0397) and rises slightly at 256 (0.0405) even though the test MAE remains comparable; in other words, the 256-dimensional model fits the KG link structure better while showing an early sign of overfitting on the FISI validation set. Selecting 128 dimensions therefore satisfies both the link-prediction quality criterion and the validation-loss minimum criterion.
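As a reminder of how the two link-prediction metrics are defined, the sketch below computes MRR and Hits@10 from the rank of the true entity in each test triple. The ranks are made-up values for illustration, and averaging per-relation MRRs with equal weight is one common reading of "macro-average", taken here as an assumption.

```python
import numpy as np

def mrr(ranks):
    """Mean reciprocal rank for 1-based ranks of the true entity."""
    return float(np.mean(1.0 / np.asarray(ranks, dtype=float)))

def hits_at_k(ranks, k=10):
    """Fraction of test triples whose true entity is ranked in the top k."""
    return float(np.mean(np.asarray(ranks) <= k))

# Hypothetical ranks grouped by relation type.
ranks_by_relation = {
    "BELONGS_TO": [1, 2, 1, 4],
    "EXPORTS_TO": [3, 12, 25, 2],
}
per_relation_mrr = {rel: mrr(r) for rel, r in ranks_by_relation.items()}
macro_mrr = float(np.mean(list(per_relation_mrr.values())))
all_ranks = [r for ranks in ranks_by_relation.values() for r in ranks]
overall_hits10 = hits_at_k(all_ranks, k=10)
```

In this toy case the regular relation receives a much higher MRR than the sparse one, which mirrors the qualitative pattern reported per relation.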

6.3. Ablation Study

Table 10 reports an ablation study in which components are sequentially removed from the full KE-Mamba.

6.4. Feature Importance Analysis

TreeSHAP is used for XGBoost and KernelSHAP is used for KE-Mamba [39]. KernelSHAP is computed on the test-set predictions using a background set of 100 k-means-summarized training instances and 2048 sampled coalitions per explained instance; grouped SHAP values are obtained by summing absolute attributions within the 128-dimensional KG block and the 8-dimensional narrative block, while the remaining 34 dynamic input features are attributed individually. Table 11 lists the top 10 explicit features (18 FISI indicators, 8 macro covariates, 8 narrative scores) by mean absolute SHAP value for KE-Mamba; the 128 KG embedding dimensions are reported as a grouped “KG-embedding” contribution following the grouped-SHAP convention, accounting for 34.1% of total SHAP mass, which is consistent with the ablation finding that removing the KG stream produces the largest accuracy degradation. Linear probes from the country embeddings to per-relation exposure summaries further decompose this grouped KG-SHAP mass: PRODUCES, CO_PRODUCED_BY, and EXPORTS_TO jointly account for 69.2% of the KG-SHAP contribution (29.9%, 22.0%, and 17.3% within the KG group, respectively) with FUNDS (12.3%) and OPERATES_IN (8.8%) playing a secondary role and DIRECTED_BY (6.7%) and BELONGS_TO (2.9%) being smallest; Figure 5e shows the relation-level breakdown. The pattern is consistent with the substantive interpretation that domestic production capacity, international co-production networks, and cross-border export reach are the primary structural determinants of film-industry sustainability captured by the knowledge graph. Figure 5 shows the SHAP summary and dependence patterns.
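The grouped-SHAP aggregation can be sketched as follows. The attribution matrix is random and the column layout (34 explicit features followed by a 128-dimensional KG block) is an illustrative assumption; only the aggregation rule, summing mean absolute attributions within a block and normalizing by the total, reflects the description above.

```python
import numpy as np

def grouped_shap_shares(shap_values, groups):
    """Share of total mean-|SHAP| mass per group.

    shap_values: (n_instances, n_features) attribution matrix.
    groups: mapping from group name to the column indices it covers.
    """
    mean_abs = np.abs(shap_values).mean(axis=0)  # per-feature mean |SHAP|
    mass = {name: float(mean_abs[list(cols)].sum()) for name, cols in groups.items()}
    total = sum(mass.values())
    return {name: value / total for name, value in mass.items()}

n_explicit, n_kg = 34, 128  # assumed column layout
rng = np.random.default_rng(6)
shap_values = rng.normal(scale=0.01, size=(168, n_explicit + n_kg))  # synthetic
groups = {f"explicit_{i}": [i] for i in range(n_explicit)}
groups["KG-embedding"] = range(n_explicit, n_explicit + n_kg)
shares = grouped_shap_shares(shap_values, groups)
```

Reporting the KG block as one group avoids ranking 128 individually uninterpretable embedding dimensions against named indicators.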

6.5. FISI Aggregation and Indicator Robustness

Because the composite FISI involves choices in indicator normalization and pillar aggregation, Table 12 reports the robustness of the headline forecast to five alternative constructions. Across schemes, the resulting indices remain highly correlated with the baseline geometric mean (Pearson 0.964, Kendall τ 0.913), and the KE-Mamba test MAE varies only between 0.0389 and 0.0402, which is well within the seed-to-seed standard deviation of 0.0015. Two further perturbations support the same conclusion: dropping one indicator from each pair with within-pillar r > 0.80 leaves the test MAE at 0.0401 (correlation 0.955), and shrinkage-weighted indicators that penalize within-pillar correlation give 0.0397 (correlation 0.962), so the framework is not sensitive to any single weighting or to mechanical redundancy among constituent indicators.
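The kind of comparison summarized in Table 12 can be sketched on synthetic data. The example assumes an equal-weight geometric mean of three pillar scores as the baseline and an arithmetic mean as the alternative; the pillar values are random, and the small floor that keeps scores strictly positive is an added assumption so that the geometric mean is well defined.

```python
import numpy as np

def min_max(x):
    """Column-wise min-max scaling over the country cross-section."""
    return (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))

def kendall_tau(a, b):
    """Kendall tau-a for tie-free data: (concordant - discordant) / pairs."""
    n = len(a)
    s = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            s += np.sign(a[i] - a[j]) * np.sign(b[i] - b[j])
    return s / (n * (n - 1) / 2)

rng = np.random.default_rng(2)
pillars = min_max(rng.uniform(size=(42, 3)))  # hypothetical CDP, ERP, SAP scores
pillars = 0.05 + 0.95 * pillars               # assumed floor for the geometric mean

geometric = np.exp(np.log(pillars).mean(axis=1))  # baseline aggregation
arithmetic = pillars.mean(axis=1)                 # alternative aggregation
pearson = float(np.corrcoef(geometric, arithmetic)[0, 1])
tau = float(kendall_tau(geometric, arithmetic))
```

The geometric mean penalizes imbalance across pillars, so it never exceeds the arithmetic mean; rank correlation shows how far that penalty reorders countries.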

6.6. Country Clustering and Sustainability Archetypes

Spectral clustering is applied to the 42-country panel, using mean pillar scores together with annualized 2005–2023 pillar growth rates as input features rather than the composite FISI alone. Because growth trajectory and structural composition carry as much weight as absolute level, small high-income markets with saturated growth (e.g., Ireland, Austria, New Zealand) can appear closer to the Emerging–Dynamic centroid than to the Mature–Diverse one, as their pillar-growth profiles resemble those of catching-up economies. A four-cluster solution is selected from both the silhouette score (0.41) and the gap statistic (peak at $k = 4$); clustering uses a k-NN affinity graph ($k = 7$) and is stable under bootstrap resampling with 38 of 42 countries retaining their assignment in ≥95% of 500 replicates. Table 13 characterizes the four archetypes, Figure 6 shows the spectral projection, and the full country listing is in Appendix A.
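A from-scratch NumPy sketch of spectral clustering on a k-NN affinity graph is given below. It is not the authors' implementation: the feature matrix is synthetic (four well-separated groups of six hypothetical features), the k-means step uses a simple deterministic initialization, and the bootstrap stability check is omitted.

```python
import numpy as np

def knn_affinity(X, k=7):
    """Symmetric 0/1 affinity: i and j are linked if either is among the
    other's k nearest neighbours (Euclidean distance)."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    neighbours = np.argsort(dist, axis=1)[:, 1:k + 1]  # column 0 is the point itself
    A = np.zeros_like(dist)
    A[np.repeat(np.arange(len(X)), k), neighbours.ravel()] = 1.0
    return np.maximum(A, A.T)

def spectral_embedding(A, n_clusters):
    """Row-normalized eigenvectors for the smallest eigenvalues of the
    symmetric normalized graph Laplacian."""
    degree = A.sum(axis=1)
    laplacian = np.eye(len(A)) - A / np.sqrt(np.outer(degree, degree))
    _, vectors = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    U = vectors[:, :n_clusters]
    return U / np.linalg.norm(U, axis=1, keepdims=True)

def kmeans(U, n_clusters, n_iter=50):
    """Lloyd's algorithm with deterministic farthest-point initialization."""
    centers = [U[0]]
    for _ in range(n_clusters - 1):
        nearest = np.min([np.linalg.norm(U - c, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(nearest)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.linalg.norm(U[:, None, :] - centers[None, :, :], axis=-1)
        labels = np.argmin(dists, axis=1)
        for j in range(n_clusters):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels

# Synthetic stand-in for 42 countries x 6 features (pillar means and growth rates).
rng = np.random.default_rng(5)
true_groups = np.repeat(np.arange(4), [11, 11, 10, 10])
blob_centers = 10.0 * np.eye(4, 6)
X = blob_centers[true_groups] + rng.normal(scale=0.5, size=(42, 6))
labels = kmeans(spectral_embedding(knn_affinity(X, k=7), 4), 4)
```

On real pillar data the groups overlap, which is why the text relies on the silhouette score, the gap statistic, and bootstrap resampling to justify the four-cluster solution.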

6.7. Associative Policy Stress Tests

These simulations are not causal estimates. They are model-based policy stress tests that perturb one feature at a time while holding the other 2023 features fixed, thereby measuring the KE-Mamba model’s learned associative sensitivity. The results should be interpreted as projected associations useful for scenario prioritization—not as evidence that the same change would be produced by an implemented policy in the absence of further causal identification.
Using the trained KE-Mamba, specified feature increases are applied across all 42 countries and the projected Δ FISI relative to baseline is recorded, holding all other features at their observed 2023 values. Table 14 reports the mean Δ FISI by intervention and cluster.
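The one-feature-at-a-time perturbation can be sketched as below. The `toy_predict` function is a hypothetical stand-in for the trained forecaster, and the feature matrix and cluster labels are synthetic; only the perturb-and-difference logic corresponds to the procedure described above.

```python
import numpy as np

def projected_delta(predict, features, column, increase):
    """Change in predictions when one feature column is raised by `increase`
    while every other feature stays at its observed value."""
    perturbed = features.copy()
    perturbed[:, column] += increase
    return predict(perturbed) - predict(features)

# Hypothetical stand-in model: any callable mapping a (countries, features)
# array to one prediction per country could be used instead.
weights = np.array([0.30, 0.10, 0.05])

def toy_predict(x):
    return 1.0 / (1.0 + np.exp(-(x @ weights)))

rng = np.random.default_rng(3)
features_2023 = rng.normal(size=(42, 3))  # synthetic 2023 feature values
clusters = rng.integers(0, 4, size=42)    # synthetic cluster labels

delta = projected_delta(toy_predict, features_2023, column=0, increase=0.10)
mean_delta_by_cluster = {
    int(c): float(delta[clusters == c].mean()) for c in np.unique(clusters)
}
```

The result is a sensitivity of the fitted function, which is why it is read as a projected association and not as a policy effect.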
Note on placebo validation: As a minimum diagnostic against spurious patterns, a placebo test randomly reassigns intervention features within clusters 1000 times, keeping the intervention magnitude fixed, and compares the observed projected Δ FISI with the resulting placebo distribution; Figure 2d visualizes the five observed effects against their placebo distributions. The observed projected changes lie above the 95th percentile of every placebo distribution (digital distribution +0.0297 vs. placebo mean 0.0031, 99.6th percentile, p < 0.001; female director share +0.0131, p = 0.012; genre diversity +0.0164, p = 0.004; co-production rate +0.0127, p = 0.018; screen density +0.0118, p = 0.025), suggesting that the patterns are not merely artifacts of random reassignment, while still not constituting causal identification.
Across all clusters, a ten-percentage-point increase in digital distribution yields the largest projected FISI association; this is strongest in Developing–Fragile markets (+0.041) where the physical exhibition infrastructure is most constrained. Screen density expansion shows a similarly asymmetric pattern with modest projected associations in Mature–Diverse markets and substantial projected improvements in Developing–Fragile economies. Figure 7 visualizes these results, and Figure 8 shows that the SHAP ordering in Table 11 is stable across 2018–2023 rolling evaluation windows rather than an artifact of one year.
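One possible reading of the placebo procedure is sketched below. It assumes that each replicate draws one non-intervention feature per cluster and applies the same perturbation magnitude to it; this interpretation, the toy linear score, and its weights are all assumptions. The summary function computes the percentile of the observed effect and a one-sided empirical p-value.

```python
import numpy as np

def placebo_summary(observed, placebo_draws):
    """Percentile of the observed effect within the placebo distribution and
    a one-sided empirical p-value with the usual +1 correction."""
    placebo_draws = np.asarray(placebo_draws)
    percentile = 100.0 * float(np.mean(placebo_draws < observed))
    p_value = (1 + int(np.sum(placebo_draws >= observed))) / (1 + placebo_draws.size)
    return percentile, p_value

rng = np.random.default_rng(4)
n_countries, n_features, n_clusters = 42, 6, 4
# Toy linear score: column 0 plays the role of the intervention feature.
weights = np.array([0.30, 0.02, 0.01, 0.00, -0.01, 0.02])
clusters = np.arange(n_countries) % n_clusters

def mean_delta(columns, increase=0.10):
    """Mean score change when country i's feature columns[i] rises by `increase`."""
    return float(np.mean(increase * weights[columns]))

observed = mean_delta(np.zeros(n_countries, dtype=int))
placebo = np.empty(1000)
for b in range(1000):
    # One randomly drawn non-intervention feature per cluster.
    column_per_cluster = rng.integers(1, n_features, size=n_clusters)
    placebo[b] = mean_delta(column_per_cluster[clusters])
percentile, p_value = placebo_summary(observed, placebo)
```

With 1000 replicates the smallest attainable p-value under this convention is 1/1001, which is consistent with reporting p < 0.001 for the strongest effect.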

7. Discussion and Policy Implications

The ablation in Table 10 decomposes the KE-Mamba gain over baselines: removing the knowledge embeddings raises MAE by 18.5% (0.0389 to 0.0461), replacing the gated KG fusion layer with plain concatenation raises it by 9.8% (to 0.0427), removing the LLM narrative features raises it by 4.9% (to 0.0408), and further removing the knowledge editing step while keeping the raw LLM scores raises it by 2.6% (to 0.0399). The last two effects are smaller than the knowledge graph contribution but stable across seeds, indicating that micro-level narrative signals carry information complementary to the macro panel indicators and that anchoring the LLM to verified film records noticeably improves their reliability. The three mechanisms—structural KG embeddings, screenplay-oriented narrative proxies, and knowledge editing—combine to deliver the full accuracy, which is consistent with evidence from knowledge-enhanced macroeconomic forecasting [18].
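The relative changes quoted above follow directly from the reported MAE values, as the short check below shows. The dictionary labels are shorthand introduced here; the numbers are those given in the text.

```python
full_mae = 0.0389  # full KE-Mamba, composite FISI MAE

ablation_mae = {
    "without KG embeddings": 0.0461,
    "concatenation instead of gated fusion": 0.0427,
    "without LLM narrative features": 0.0408,
    "without knowledge editing": 0.0399,
}

# Relative MAE increase of each ablated variant over the full model, in percent.
relative_increase = {
    name: 100.0 * (mae - full_mae) / full_mae
    for name, mae in ablation_mae.items()
}

for name, pct in relative_increase.items():
    print(f"{name}: +{pct:.1f}%")
```

Each percentage is computed against the full model, so the four figures are separate comparisons and are not meant to be added together.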
The SHAP analysis points to three policy-relevant variables. Domestic market share ranks first, which is consistent with the long-standing emphasis in cultural economics on maintaining a viable local production base [3,40]. The digital distribution ratio ranks second, above genre diversity and GDP per capita, suggesting that the shift to digital channels is not merely a commercial trend but a structural determinant of sustainability; the positive interaction with lower GDP per capita indicates that digital channels can partially bypass the capital constraints that limit physical exhibition in lower-income countries.
The four-cluster typology supports differentiated policy approaches. In Mature–Diverse markets (e.g., France, South Korea, the UK), the associative stress tests suggest limited returns from broad interventions, so targeted efforts on gender representation and defense of domestic share against global streaming platforms are more relevant. Emerging–Dynamic markets (e.g., China, India, Italy, Spain) stand to gain the most from accelerating digital distribution while investing in genre diversification. In State-Regulated markets the Cultural Diversity Pillar lags, pointing to content and distribution regulations as the binding constraint. Developing–Fragile markets face a compounding problem in which low economic resilience limits investment in cultural diversity and access, and international cooperation may be needed to break this cycle. The nonlinear (inverted-U) relationship between co-production rate and sustainability also suggests that co-production incentive programs should include graduated support rewarding collaboration without creating dependence on foreign partners.
At the country-year level, narrative dimensions should not be interpreted as judgments of national artistic value. They are aggregate proxies for the kinds of films that enter internationally visible metadata channels. For example, a high cultural-specificity score indicates that retained titles from a country-year contain more localized setting, social context, or culturally specific plot elements in their synopses; it does not imply that the entire national cinema is culturally specific or that non-retained domestic titles lack such features. Similarly, thematic depth captures the density of stated social, moral, or political themes in available synopses. These variables are therefore used as weak content signals for forecasting rather than as normative rankings of film quality.
Residual hallucination risk remains even after factual editing. The editing protocol corrects verifiable title-level attributes such as country, year, and genre, but qualitative dimensions such as thematic depth, cultural specificity, and projected audience reach are evaluative rather than factual. They may therefore retain systematic LLM biases, including Western-centric assumptions about narrative structure, genre coherence, and audience reach. The human-validation results reduce but do not eliminate this concern. For this reason, the narrative features are treated as weak aggregate signals and are interpreted together with coverage diagnostics, cross-model agreement, and human-rating correlations.
Ethically, the framework should not be used to rank the cultural worth of national cinemas. A low narrative score may reflect sparse English metadata, limited international distribution, or LLM training-data imbalance rather than lower artistic quality. Future deployments should incorporate multilingual synopses, local-language models, culturally diverse expert panels, and uncertainty intervals for narrative scores—especially in developing or non-English markets.
The full pipeline is more complex than a conventional panel model because it requires KG construction, LLM scoring, and model editing, but the most expensive steps are one-off or annual preprocessing tasks. The dominant cost is the LLM narrative scoring of the 12,438-title sample (5.6 GPU h one-off, 18–35 min annual incremental); KG construction (7 CPU min) and RotatE embedding training (16 GPU min) are negligible by comparison, and knowledge editing of the 4127 failed-probe titles adds only 0.9 GPU h. Once features are cached, KE-Mamba training takes 53 s, and inference for all 42 countries takes 0.19 s on a single A100, making the framework practical for annual dashboards. For policy institutions with limited computational resources, a simplified deployment version without LLM scoring and KG retraining runs end-to-end in under 10 s on a central processing unit (CPU) at the cost of higher MAE (0.0461 without KG; see Table 10).
Several limitations should be acknowledged. The scenario analysis is based on observational correlations learned by the forecasting model. It cannot separate policy-induced variation from selection effects, omitted institutions, or reverse causality. The placebo test only checks whether the learned sensitivity pattern is stronger than random reassignment; it does not establish identification. Stronger causal claims would require instruments, staggered policy variation, natural experiments, or synthetic-control designs around clearly dated policy changes. The FISI involves subjective choices in indicator selection and weighting; sensitivity analyses with alternative aggregation schemes (arithmetic, PCA, Delphi, winsorized z-score) are reported in Section 6 (Table 12) and indicate that the headline forecast is robust to these choices. The 42-country sample excludes smaller markets without consistent time series, and linear interpolation for short within-country gaps may understate short-run volatility. Four limitations concern the LLM narrative stream specifically: (i) the pipeline relies on English-language TMDb synopses, biasing scores toward internationally distributed titles rather than the domestic long tail; (ii) residual hallucinations remain possible for very recent releases absent from the base model’s pre-training corpus, and the editing protocol touches only the 4127 titles that failed the factual probe; (iii) scores on dimensions such as “dialogue quality” are inferred from synopses rather than full scripts and should be read as coarse proxies rather than substitutes for expert reader judgments; and (iv) the knowledge editing step delivers a modest 2.6% MAE reduction, so it is treated as a cheap safeguard rather than an essential component—one that plausibly becomes more valuable as the sample extends into years less covered by the base model’s parametric knowledge.
Future work should extend the narrative stream beyond synopses by incorporating trailers, posters, audience reviews, festival selections, and streaming-platform engagement data. Multimodal signals could improve the measurement of audience reach and cultural specificity, while real-time platform data could support nowcasting rather than annual retrospective forecasting.

8. Conclusions

This paper introduced the Film Industry Sustainability Index, a composite of 18 indicators spanning cultural diversity, economic resilience, and SDG alignment, and applied it to a balanced panel of 42 countries over 2005–2023. The Screenplay-Aware Knowledge-Enhanced Mamba architecture, integrating knowledge graph embeddings, LLM-derived screenplay-oriented narrative proxies, and a knowledge editing step that anchors the LLM to verified UNESCO and LUMIERE records, achieved the best forecast accuracy among all tested models ($R^2 = 0.934$ on 2020–2023). The ablation shows that both the narrative stream and the editing step contribute measurable gains over a macro-only baseline, supporting the value of coupling macro panel signals with micro-level narrative signals. Domestic market share, digital distribution ratio, and genre diversity emerged as the three strongest explicit predictors, jointly accounting for roughly 31% of the total SHAP mass, with the grouped KG embedding contributing a further 34%. The four-cluster typology and associative policy stress tests translate the forecasting results into differentiated policy hypotheses that can guide further institutional analysis, but they should not be interpreted as causal policy effects. Future work should combine the proposed forecasting framework with multilingual and multimodal cultural data, real-time platform indicators, and causal-inference designs such as instrumental variables, staggered policy evaluation, or synthetic controls.

Author Contributions

Conceptualization, P.Q. and W.Z.; methodology, P.Q. and W.Z.; software, P.Q. and W.Z.; validation, P.Q. and W.Z.; formal analysis, P.Q.; investigation, P.Q. and W.Z.; data curation, P.Q. and W.Z.; writing—original draft preparation, P.Q. and W.Z.; writing—review and editing, P.Q. and W.Z.; visualization, P.Q. and W.Z.; supervision, W.Z.; project administration, W.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The analysis draws on public statistical sources and third-party industry datasets documented in Appendix E. The principal public sources include the UNESCO Institute for Statistics Feature Films and Cinema Database (https://data.uis.unesco.org, accessed on 1 April 2026), the European Audiovisual Observatory LUMIERE film-admissions database (https://lumiere.obs.coe.int, accessed on 1 April 2026), the World Bank World Development Indicators (https://databank.worldbank.org/source/world-development-indicators, accessed on 1 April 2026), the OECD Culture and Creative Economy database (https://www.oecd.org/en/topics/culture-creative-industries-and-sports.html, accessed on 1 April 2026), and the UNDP Human Development Report (https://hdr.undp.org, accessed on 1 April 2026); title-level metadata are obtained from The Movie Database (https://www.themoviedb.org, accessed on 1 April 2026). Additional third-party industry datasets used for selected indicators—including MPA THEME, the PwC Global Entertainment and Media Outlook, McKinsey reports, Omdia, Ampere Analysis, national box-office trackers, ILO-STAT, and IMDb—are accessed under their respective licensing terms and are listed alongside the corresponding indicators in Appendix E. The 18 FISI component indicators are reconstructed from these sources using the rules in Appendix E; the screenplay scoring prompt and rubric anchors are reported in Appendix D; hyperparameter search ranges and seed-level results are reported in Appendix C and Appendix F. The source code for the KE-Mamba model, the FISI construction pipeline, the knowledge-graph extraction, and the LLM scoring and editing modules is available from the corresponding author upon reasonable request subject to third-party licensing terms on the commercial and metadata-content sources.

Acknowledgments

The authors acknowledge the data providers: the UNESCO Institute for Statistics, European Audiovisual Observatory, World Bank, and OECD, for making their film industry and macroeconomic data publicly accessible.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AI        Artificial Intelligence
ARIMA     Autoregressive Integrated Moving Average
CDP       Cultural Diversity Pillar
DDR       Digital Distribution Ratio
DM        Diebold–Mariano
EAO       European Audiovisual Observatory
ERP       Economic Resilience Pillar
FISI      Film Industry Sustainability Index
GDI       Genre Diversity Index
GELU      Gaussian Error Linear Unit
GPU       Graphics Processing Unit
HDR       Human Development Report
HHI       Herfindahl–Hirschman Index
KE        Knowledge Editing/Knowledge-Enhanced
KG        Knowledge Graph
LLM       Large Language Model
LSTM      Long Short-Term Memory
LUMIERE   European Audiovisual Observatory film-admissions database
MAE       Mean Absolute Error
MAPE      Mean Absolute Percentage Error
MEMIT     Mass-Editing Memory in a Transformer
ML        Machine Learning
MLP       Multilayer Perceptron
OECD      Organisation for Economic Co-operation and Development
RMSE      Root Mean Squared Error
ROME      Rank-One Model Editing
RotatE    Relation-as-Rotation Knowledge Graph Embedding
SAP       SDG Alignment Pillar
SDG       Sustainable Development Goal
SHAP      SHapley Additive exPlanations
SSM       State-Space Model
TMDb      The Movie Database
UNESCO    United Nations Educational, Scientific and Cultural Organization
UIS       UNESCO Institute for Statistics
WDI       World Development Indicators

Appendix A. Country List and Cluster Assignments

Table A1. Complete list of 42 countries with cluster assignments and 2023 FISI scores.
| Country | Cluster | FISI (2023) | CDP | ERP | SAP |
|---|---|---|---|---|---|
| France | Mature–Diverse | 0.78 | 0.82 | 0.71 | 0.76 |
| South Korea | Mature–Diverse | 0.76 | 0.79 | 0.73 | 0.74 |
| United Kingdom | Mature–Diverse | 0.75 | 0.76 | 0.72 | 0.75 |
| Germany | Mature–Diverse | 0.74 | 0.77 | 0.69 | 0.73 |
| Japan | Mature–Diverse | 0.73 | 0.74 | 0.71 | 0.72 |
| Sweden | Mature–Diverse | 0.72 | 0.75 | 0.68 | 0.71 |
| Denmark | Mature–Diverse | 0.71 | 0.73 | 0.67 | 0.72 |
| Canada | Mature–Diverse | 0.71 | 0.72 | 0.69 | 0.70 |
| United States | Mature–Diverse | 0.71 | 0.68 | 0.74 | 0.69 |
| Australia | Mature–Diverse | 0.70 | 0.71 | 0.68 | 0.69 |
| Norway | Mature–Diverse | 0.70 | 0.73 | 0.66 | 0.70 |
| Netherlands | Mature–Diverse | 0.69 | 0.72 | 0.65 | 0.68 |
| Belgium | Mature–Diverse | 0.66 | 0.68 | 0.63 | 0.64 |
| Switzerland | Mature–Diverse | 0.65 | 0.67 | 0.63 | 0.62 |
| China | Emerging–Dynamic | 0.58 | 0.61 | 0.57 | 0.52 |
| India | Emerging–Dynamic | 0.56 | 0.63 | 0.48 | 0.51 |
| Spain | Emerging–Dynamic | 0.54 | 0.57 | 0.51 | 0.50 |
| Italy | Emerging–Dynamic | 0.53 | 0.56 | 0.49 | 0.51 |
| Poland | Emerging–Dynamic | 0.52 | 0.54 | 0.50 | 0.49 |
| Czech Republic | Emerging–Dynamic | 0.51 | 0.53 | 0.49 | 0.48 |
| Hungary | Emerging–Dynamic | 0.50 | 0.52 | 0.48 | 0.47 |
| Turkey | Emerging–Dynamic | 0.49 | 0.52 | 0.47 | 0.44 |
| Ireland | Emerging–Dynamic | 0.48 | 0.51 | 0.44 | 0.46 |
| Argentina | Emerging–Dynamic | 0.48 | 0.51 | 0.44 | 0.45 |
| New Zealand | Emerging–Dynamic | 0.47 | 0.50 | 0.43 | 0.45 |
| Austria | Emerging–Dynamic | 0.47 | 0.49 | 0.44 | 0.45 |
| Brazil | Emerging–Dynamic | 0.47 | 0.49 | 0.44 | 0.46 |
| Mexico | Emerging–Dynamic | 0.46 | 0.48 | 0.43 | 0.44 |
| Chile | Emerging–Dynamic | 0.45 | 0.47 | 0.42 | 0.43 |
| Russia | State-Regulated | 0.48 | 0.42 | 0.52 | 0.47 |
| Israel | State-Regulated | 0.51 | 0.48 | 0.52 | 0.50 |
| Singapore | State-Regulated | 0.50 | 0.44 | 0.54 | 0.49 |
| UAE | State-Regulated | 0.47 | 0.41 | 0.53 | 0.44 |
| Thailand | State-Regulated | 0.43 | 0.39 | 0.46 | 0.42 |
| Malaysia | State-Regulated | 0.42 | 0.38 | 0.45 | 0.41 |
| Indonesia | State-Regulated | 0.41 | 0.37 | 0.44 | 0.40 |
| Egypt | State-Regulated | 0.38 | 0.34 | 0.41 | 0.37 |
| Romania | Developing–Fragile | 0.35 | 0.37 | 0.31 | 0.34 |
| Colombia | Developing–Fragile | 0.33 | 0.36 | 0.28 | 0.32 |
| South Africa | Developing–Fragile | 0.32 | 0.35 | 0.27 | 0.31 |
| Nigeria | Developing–Fragile | 0.29 | 0.32 | 0.22 | 0.28 |
| Philippines | Developing–Fragile | 0.28 | 0.31 | 0.23 | 0.27 |

Appendix B. Countries Excluded from the Balanced Panel

Seven candidate countries are excluded from the final balanced panel because at least one required FISI component series contains a continuous gap exceeding three years. Table A2 reports each excluded country, the longest continuous gap (in years), and the affected indicators.
Table A2. Candidate countries excluded from the balanced panel.
| Excluded Country | Main Reason for Exclusion | Longest Gap (Years) | Affected Indicators |
|---|---|---|---|
| Greece | Incomplete public funding and regional screen-access series | 5 | Public funding accessibility; screen access Gini |
| Portugal | Missing annual youth film-employment and audience equity data | 4 | Youth employment ratio; audience reach equity |
| Peru | Discontinuous domestic market-share and export-revenue series | 6 | Domestic market share; export revenue share |
| Morocco | Missing digital distribution and subnational access measures | 5 | Digital distribution ratio; screen access Gini |
| Pakistan | Incomplete production and box-office coverage before 2012 | 7 | Production volume trend; box-office concentration |
| Vietnam | Inconsistent cinema-admission and co-production records | 5 | Co-production rate; audience reach equity |
| Ukraine | Territorial and reporting discontinuities after 2014 | 4 | Box-office revenue; screen density; export revenue |

Appendix C. Hyperparameter Sensitivity Analysis

This appendix reports the results of hyperparameter sensitivity analysis for the KE-Mamba model. Forecasting performance was examined along four key hyperparameters—the Mamba state dimension, the number of Mamba layers, the lookback window length, and the learning rate—by varying each in turn while holding the others at their default settings and evaluating on the validation set.
Table A3. Hyperparameter configuration for the final KE-Mamba model.
| Hyperparameter | Search Space | Optimal Value |
|---|---|---|
| Mamba state dimension | {32, 64, 128, 256, 512} | 128 |
| Number of Mamba layers | {1, 2, 3, 4, 5} | 2 |
| Lookback window (years) | {3, 4, 5, 6, 7} | 5 |
| Learning rate | {0.01, 0.005, 0.001, 0.0005, 0.0001} | 0.001 |
| Batch size | {16, 32, 64} | 32 |
| Dropout rate | {0.0, 0.1, 0.2, 0.3} | 0.1 |
| KG embedding dimension | {64, 128, 256} | 128 |
| Gated KG fusion hidden dim | {64, 128, 256} | 128 |
| Weight decay | {0.0, 0.0001, 0.001} | 0.0001 |
| Early stopping patience | {10, 15, 20, 30} | 20 |
Figure A1. Hyperparameter sensitivity analysis for the KE-Mamba model. (a) Sensitivity to the Mamba state dimension; (b) to the number of Mamba layers; (c) to the lookback-window length; (d) to the learning rate (validation MAE and $R^2$); (e) validation-loss curves across training epochs for representative learning rates.

Appendix D. LLM Narrative Scoring Prompt and Rubric

The eight narrative-quality scores are obtained by prompting Llama-3-8B-Instruct (and, for the robustness check, Qwen2.5-7B-Instruct) with the template below. The model is instructed to return a JSON object with integer scores from 0 to 10 along each dimension, which are then linearly rescaled to $[0, 1]$.
You are an experienced film development reader. Given the following film information, rate the film on a 0–10 integer scale along eight narrative-quality dimensions. Use the full scale and avoid defaulting to the middle. Return only a JSON object with keys: structure, pacing, character, theme, dialogue, genre_coherence, cultural_specificity, audience_reach.
TITLE: {title}
YEAR: {year}
COUNTRY: {country}
GENRE TAGS: {genre_tags}
SYNOPSIS: {synopsis}
Rubric anchors (abbreviated; full rubric provided below in this appendix):
0–2 = severe deficiency; 3–4 = noticeably weak; 5–6 = competent; 7–8 = strong; 9–10 = exemplary.
For each of the eight dimensions, a paragraph-length anchor description is appended to the prompt at evaluation time (structure: three-act integrity and causal chain; pacing: scene-length rhythm and dead zones; character: motivation, agency, and arc; theme: depth and consistency; dialogue: voice differentiation and subtext, inferred from synopsis and critic excerpts; genre coherence: consistency with declared genre; cultural specificity: localized detail versus generic setting; audience reach: breadth of plausible demographic appeal). Inter-run agreement across three independent samples at temperature 0.3 exceeds Krippendorff’s α = 0.81 on a 200-film calibration subset.
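A minimal sketch of how a reply to this prompt might be validated and rescaled is given below. The function name and the error handling are assumptions; only the eight keys, the 0–10 integer scale, and the linear rescaling to the unit interval come from the description above.

```python
import json

DIMENSIONS = (
    "structure", "pacing", "character", "theme", "dialogue",
    "genre_coherence", "cultural_specificity", "audience_reach",
)

def parse_scores(raw_response):
    """Validate the JSON reply and rescale integer 0-10 scores to [0, 1]."""
    data = json.loads(raw_response)
    if not isinstance(data, dict) or set(data) != set(DIMENSIONS):
        raise ValueError("reply must contain exactly the eight rubric keys")
    scores = {}
    for key in DIMENSIONS:
        value = data[key]
        # bool is a subclass of int in Python, so it is rejected explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or not 0 <= value <= 10:
            raise ValueError(f"{key} must be an integer between 0 and 10")
        scores[key] = value / 10.0
    return scores

# Hypothetical model reply used only to exercise the parser.
reply = (
    '{"structure": 7, "pacing": 6, "character": 8, "theme": 5, "dialogue": 4, '
    '"genre_coherence": 9, "cultural_specificity": 10, "audience_reach": 0}'
)
scores = parse_scores(reply)
```

Rejecting malformed replies outright, instead of repairing them, keeps out-of-range or missing scores from entering the country-year aggregates unnoticed.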

Appendix E. Indicator Construction and Data Sources

Table A4. Construction rules and primary sources for the 18 FISI component indicators.
| Indicator | Primary Source | Construction/Proxy Rule |
|---|---|---|
| Domestic market share | UNESCO UIS; national film boards | Domestic box office/total box office. |
| Genre diversity (Shannon) | UNESCO UIS; TMDb (genre tags) | Shannon entropy of annual genre shares, normalized by $\ln(12)$. |
| International co-production | LUMIERE; UNESCO UIS | Share of films with at least one foreign co-producer. |
| Female director share | EAO; national film boards; IMDb | Share of films with at least one credited female director. |
| Language diversity | UNESCO UIS; LUMIERE | Shannon entropy of original-language distribution of domestic releases. |
| Independent production share | LUMIERE; national registries | Share of films whose lead producer is not among the top 20 national producers by 5-year revenue. |
| Revenue growth stability | MPA; PwC; World Bank WDI | Inverse of the five-year rolling coefficient of variation of real box office revenue. |
| Screen density growth | UNESCO UIS; MPA | Five-year compound annual growth of screens per million population. |
| Digital distribution ratio | MPA; PwC; Omdia | Share of total film revenue from SVOD, TVOD, and digital rental/purchase. |
| Export revenue share | LUMIERE; UIS; national agencies | Foreign admissions revenue/total admissions revenue of domestic films. |
| Production volume trend | UNESCO UIS | Trend coefficient of a log-linear regression on annual feature production count over a five-year window. |
| Box office concentration | national box office trackers | $1 - \mathrm{HHI}$ on the top-25 titles’ market shares. |
| Film employment share | ILO-STAT; OECD Culture | ISIC 5911/5912 employment/total employment, proxied by NACE J59.1 for EU countries. |
| Screen access Gini (inverted) | UNESCO UIS; national censuses | $1 - G$ where $G$ is the Gini of screens per capita across subnational regions. |
| Urban–rural infrastructure ratio | UNESCO UIS; World Bank WDI | Urban screens per capita divided by rural screens per capita (higher is more unequal; enters with negative direction). |
| Youth employment ratio in film | ILO-STAT; national LFS | Share of film-sector workers aged 15–29, proxied by the 15–34 band for countries without finer granularity. |
| Public funding accessibility | OECD Culture; national film boards | Per-capita public film funding divided by the number of active applicants, rescaled cross-sectionally; a 3-year mean is used where annual data are missing. |
| Audience reach equity | Omdia; Ampere Analysis | $1 - G$ of per-capita cinema attendance across subnational regions, with SVOD subscribers used as a supplement where theatrical data are sparse. |
Several indicators (public funding accessibility, audience reach equity, youth employment ratio in film) are constructed rather than directly downloaded, and these rest on proxy rules that are documented here explicitly so that the results can be reproduced or contested on their own terms.
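Three of the construction rules can be sketched in a few lines, taking the inverted concentration and inequality measures as one minus the Herfindahl–Hirschman index and one minus the Gini coefficient. The function names are hypothetical, shares for the concentration measure are assumed to be computed within the top-25 titles, and the Gini coefficient is assumed to be unweighted across regions.

```python
import numpy as np

def normalized_shannon(counts, n_categories=12):
    """Shannon entropy of category shares divided by ln(n_categories)."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]  # treat 0 * log(0) as 0
    return float(-(p * np.log(p)).sum() / np.log(n_categories))

def one_minus_hhi(revenues, top_n=25):
    """1 - HHI over the top_n titles, with shares taken within those titles."""
    top = np.sort(np.asarray(revenues, dtype=float))[::-1][:top_n]
    shares = top / top.sum()
    return float(1.0 - (shares ** 2).sum())

def one_minus_gini(values):
    """1 - G, with G the unweighted Gini coefficient of regional values."""
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    gini = 2.0 * np.sum(np.arange(1, n + 1) * x) / (n * x.sum()) - (n + 1) / n
    return float(1.0 - gini)

# Toy inputs: genre counts, title revenues, and regional screens per capita.
genre_diversity = normalized_shannon([30, 25, 20, 10, 5, 5, 3, 2])
concentration = one_minus_hhi([120, 80, 60, 40, 20, 10])
screen_access = one_minus_gini([4.0, 3.5, 1.0, 0.5])
```

All three measures are oriented so that higher values indicate more diversity or more equal access, which matches the direction in which they enter the index.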

Appendix F. Seed-Level Results and Reproducibility

Table A5. Composite FISI MAE on the 2020–2023 test set across five random seeds.
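| Model | Seed 0 | Seed 1 | Seed 2 | Seed 3 | Seed 4 | Mean | Std |
|---|---|---|---|---|---|---|---|
| LSTM | 0.0491 | 0.0504 | 0.0497 | 0.0489 | 0.0509 | 0.0498 | 0.00086 |
| Mamba (base) | 0.0455 | 0.0468 | 0.0459 | 0.0457 | 0.0466 | 0.0461 | 0.00057 |
| KE-Mamba (ours) | 0.0384 | 0.0392 | 0.0387 | 0.0391 | 0.0391 | 0.0389 | 0.00035 |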
All neural model results reported in Section 6 are means over the five seeds tabulated above. The seed-to-seed standard deviation of the KE-Mamba composite FISI MAE is 0.00035, about 0.9% of the mean, and the model ranking is the same in every seed: KE-Mamba has the lowest MAE, followed by base Mamba and then LSTM.
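The summary columns of Table A5 can be rechecked directly from the per-seed values. The sketch below assumes the reported Std is the sample standard deviation (n − 1 denominator), which the source does not state; under that assumption the recomputed values should agree with the table up to rounding.

```python
from statistics import mean, stdev

# Per-seed composite FISI MAE values copied from Table A5.
seed_mae = {
    "LSTM": [0.0491, 0.0504, 0.0497, 0.0489, 0.0509],
    "Mamba (base)": [0.0455, 0.0468, 0.0459, 0.0457, 0.0466],
    "KE-Mamba": [0.0384, 0.0392, 0.0387, 0.0391, 0.0391],
}

# Mean and sample standard deviation per model (sample std is an assumption).
summary = {name: (mean(v), stdev(v)) for name, v in seed_mae.items()}

# Coefficient of variation: seed-to-seed std as a fraction of the mean.
cv = {name: s / m for name, (m, s) in summary.items()}

# Check that KE-Mamba < Mamba (base) < LSTM holds for every individual seed.
ranking_stable = all(
    ke < mamba < lstm
    for ke, mamba, lstm in zip(
        seed_mae["KE-Mamba"], seed_mae["Mamba (base)"], seed_mae["LSTM"]
    )
)

for name, (m, s) in summary.items():
    print(f"{name}: mean={m:.4f} std={s:.5f} cv={cv[name]:.2%}")
print("ranking stable across seeds:", ranking_stable)
```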

References

  1. PricewaterhouseCoopers. Global Entertainment and Media Outlook 2024–2028; Technical Report; PwC: London, UK, 2024. [Google Scholar]
  2. Motion Picture Association. THEME Report: A Comprehensive Analysis and Survey of the Theatrical and Home/Mobile Entertainment Market Environment; Technical Report; MPA: Washington, DC, USA, 2024. [Google Scholar]
  3. Throsby, D. Economics and Culture; Cambridge University Press: Cambridge, UK, 2001. [Google Scholar]
  4. UNESCO. Convention on the Protection and Promotion of the Diversity of Cultural Expressions; UNESCO: Paris, France, 2005. [Google Scholar]
  5. United Nations. Transforming Our World: The 2030 Agenda for Sustainable Development; United Nations General Assembly Resolution A/RES/70/1; United Nations: New York, NY, USA, 2015. [Google Scholar]
  6. UNESCO. Re|Shaping Policies for Creativity: Addressing Culture as a Global Public Good; Technical Report; UNESCO: Paris, France, 2022. [Google Scholar]
  7. OECD. The Culture Fix: Creative People, Places and Industries; Technical Report; OECD Publishing: Paris, France, 2022. [Google Scholar] [CrossRef]
  8. Vinuesa, R.; Azizpour, H.; Leite, I.; Balaam, M.; Dignum, V.; Domisch, S.; Fellander, A.; Langhans, S.; Tegmark, M.; Fuso Nerini, F. The Role of Artificial Intelligence in Achieving the Sustainable Development Goals. Nat. Commun. 2020, 11, 233. [Google Scholar] [CrossRef] [PubMed]
  9. Chen, L.; Rabbi, M.; Wang, Y. A machine learning framework for forecasting multidimensional sustainability and informing integrated policy thresholds in the EU. Environ. Dev. Sustain. 2025, 1–56. [Google Scholar] [CrossRef]
  10. Tang, S. The Box Office Prediction Model Based on the Optimized XGBoost Algorithm in the Context of Film Marketing and Distribution. PLoS ONE 2024, 19, e0309227. [Google Scholar] [CrossRef] [PubMed]
  11. Zemaityte, V.; Karjus, A.; Rohn, U.; Schich, M.; Ibrus, I. Quantifying the Global Film Festival Circuit: Networks, Diversity, and Public Value Creation. PLoS ONE 2024, 19, e0297404. [Google Scholar] [CrossRef] [PubMed]
  12. Dadlani, A.; Vo, V.; Khemka, A.; Harvey, S.T.; Kantoro Kyzy, A.; Jones, P.; Verhoeven, D. Leading by the nodes: A survey of film industry network analysis and datasets. Appl. Netw. Sci. 2024, 9, 76. [Google Scholar] [CrossRef] [PubMed]
  13. Hofstede, G. Culture’s Consequences: International Differences in Work-Related Values, 2nd ed.; Sage Publications: Thousand Oaks, CA, USA, 2001. [Google Scholar]
  14. De Vany, A.; Walls, W. Motion Picture Profit, the Stable Paretian Hypothesis, and the Curse of the Superstar. J. Econ. Dyn. Control. 2004, 28, 1035–1057. [Google Scholar] [CrossRef]
  15. Einav, L. Seasonality in the U.S. Motion Picture Industry. RAND J. Econ. 2007, 38, 127–145. [Google Scholar] [CrossRef]
  16. Rosen, S. The Economics of Superstars. Am. Econ. Rev. 1981, 71, 845–858. [Google Scholar]
  17. Gurel, E. AI-driven Experiences in Cultural and Creative Industries: A Review of Literature and Development of a Multifaceted Framework. Serv. Ind. J. 2026, 46, 583–622. [Google Scholar] [CrossRef]
  18. Tilly, S.; Livan, G. Macroeconomic Forecasting with Statistically Validated Knowledge Graphs. Expert Syst. Appl. 2022, 186, 115765. [Google Scholar] [CrossRef]
  19. Sun, Z.; Deng, Z.; Nie, J.; Tang, J. RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space. In Proceedings of the International Conference on Learning Representations (ICLR), New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
  20. Mirowski, P.; Mathewson, K.W.; Pittman, J.; Evans, R. Co-Writing Screenplays and Theatre Scripts with Language Models: Evaluation by Industry Professionals. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, Hamburg, Germany, 23–28 April 2023. [Google Scholar] [CrossRef]
  21. Meng, K.; Bau, D.; Andonian, A.; Belinkov, Y. Locating and Editing Factual Associations in GPT. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), New Orleans, LA, USA, 28 November–9 December 2022; Curran Associates Inc.: Red Hook, NY, USA, 2022; Volume 35. [Google Scholar]
  22. Yao, Y.; Wang, P.; Tian, B.; Cheng, S.; Deng, Z.; Zhang, H.; Chen, H.; Zhang, N. Editing Large Language Models: Problems, Methods, and Opportunities. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP), Singapore, 6–10 December 2023; Association for Computational Linguistics: Stroudsburg, PA, USA, 2023; pp. 10222–10240. [Google Scholar] [CrossRef]
  23. Hang, C.N.; Yu, P.D.; Tan, C.W. TrumorGPT: Graph-Based Retrieval-Augmented Large Language Model for Fact-Checking. IEEE Trans. Artif. Intell. 2025, 6, 3148–3162. [Google Scholar] [CrossRef]
  24. Kolli, S.; Rosenbaum, R.; Cavelius, T.; Strothe, L.; Lata, A.; Diesner, J. Hybrid Fact-Checking that Integrates Knowledge Graphs, Large Language Models, and Search-Based Retrieval Agents Improves Interpretable Claim Verification. In Proceedings of the 9th Widening NLP Workshop, Suzhou, China, 8 November 2025; Association for Computational Linguistics: Stroudsburg, PA, USA, 2025; pp. 106–115. [Google Scholar] [CrossRef]
  25. Chechkin, A.; Pleshakova, E.; Gataullin, S. A Hybrid Neural Network Transformer for Detecting and Classifying Destructive Content in Digital Space. Algorithms 2025, 18, 735. [Google Scholar] [CrossRef]
  26. Gu, A.; Dao, T. Mamba: Linear-time Sequence Modeling with Selective State Spaces. arXiv 2023, arXiv:2312.00752. [Google Scholar]
  27. Cai, X.; Zhu, Y.; Wang, X.; Yao, Y. MambaTS: Improved Selective State Space Models for Long-Term Time Series Forecasting. arXiv 2024, arXiv:2405.07992. [Google Scholar]
  28. Bhethanabhotla, S.; Swelam, O.; Siems, J.; Salinas, D.; Hutter, F. Mamba4Cast: Efficient Zero-Shot Time Series Forecasting with State Space Models. In Proceedings of the NeurIPS 2024 Workshop on Time Series in the Age of Large Models, Vancouver, BC, Canada, 15 December 2024. [Google Scholar]
  29. Patro, B.N.; Agneeswaran, V.S. SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time Series. arXiv 2024, arXiv:2403.15360. [Google Scholar]
  30. Wang, Z.; Kong, F.; Feng, S.; Wang, M.; Yang, X.; Zhao, H.; Wang, D.; Zhang, Y. Is Mamba Effective for Time Series Forecasting? Neurocomputing 2024, 597, 129178. [Google Scholar] [CrossRef]
  31. Hosseini, E.; Rajabipoor Meybodi, A. Proposing a Model for Sustainable Development of Creative Industries Based on Digital Transformation. Sustainability 2023, 15, 11451. [Google Scholar] [CrossRef]
  32. UNESCO Institute for Statistics. Feature Films and Cinema Data. Available online: https://uis.unesco.org (accessed on 1 April 2026).
  33. European Audiovisual Observatory. LUMIERE Database: Admissions of Films Released in Europe. Available online: https://lumiere.obs.coe.int (accessed on 1 April 2026).
  34. World Bank. World Development Indicators. Available online: https://databank.worldbank.org (accessed on 1 April 2026).
  35. OECD. Culture and the Creative Economy. Available online: https://www.oecd.org/en/topics/culture-creative-industries-and-sports.html (accessed on 1 April 2026).
  36. McKinsey & Company and Business of Fashion. The State of Fashion 2024; Technical Report; McKinsey & Company: New York, NY, USA, 2024. [Google Scholar]
  37. United Nations Development Programme. Human Development Report 2023–2024; Technical Report; UNDP: New York, NY, USA, 2024. [Google Scholar]
  38. Diebold, F.; Mariano, R. Comparing Predictive Accuracy. J. Bus. Econ. Stat. 2002, 20, 134–144. [Google Scholar] [CrossRef]
  39. Lundberg, S.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA, 4–9 December 2017; Curran Associates, Inc.: Red Hook, NY, USA, 2017; pp. 4765–4774. [Google Scholar]
  40. Lorenzen, M.; Taeube, F. Breakout from Bollywood? The Roles of Social Networks and Regulation in the Evolution of Indian Film Industry. J. Int. Manag. 2008, 14, 286–299. [Google Scholar] [CrossRef]
Figure 1. Proposed research framework illustrating the four sequential stages from data collection through knowledge graph embedding, model training, and policy-oriented output analysis.
Figure 2. Validation evidence supporting the KE-Mamba framework. (a) English-synopsis retention rates and mean narrative scores across the four country income groups with the inverse-coverage reweighted aggregation shown for comparison; coverage is lowest in lower-middle and developing–fragile markets, but the reweighted scores remain within 0.018 of the baseline aggregation. (b) Title-level factual probe accuracy before and after ROME/MEMIT-style knowledge editing for country, year, and genre attributes on the full 12,438-title sample with neighborhood locality reported in the upper-right annotation. (c) Pearson correlation between Llama-3-8B-Instruct narrative scores and three independent human raters across the eight scoring dimensions on a stratified 200-film validation subset with cross-model agreement against Qwen2.5-7B-Instruct shown for reference. (d) Placebo distributions over 1000 within-cluster random intervention reassignments for the five policy stress-test scenarios; the observed projected ΔFISI lies in the upper tail of every distribution (all empirical p < 0.05; digital distribution p < 0.001).
Figure 3. Temporal evolution of Film Industry Sustainability Index and pillar scores across country income groups, 2005–2023. Panels show the (a) composite FISI and the (b) Cultural Diversity, (c) Economic Resilience, and (d) SDG Alignment pillars. The grey dashed vertical line marks the 2020 COVID-19 onset, and the colored shaded bands denote the within-group range across countries in each income group (High-Income, Upper-Middle, Lower-Middle, Developing).
Figure 4. Actual versus predicted FISI trajectories for six representative countries, 2020–2023. Panels (a–f) correspond to South Korea, the United States, France, India, Nigeria, and Brazil, respectively. The solid black line is the actual FISI; the blue line is the KE-Mamba forecast and the shaded blue band is its 90% confidence interval, with LSTM (dashed) and ARIMA (dotted) shown for comparison.
Figure 5. SHAP feature importance and dependence analysis for the KE-Mamba model. (a) Mean |SHAP value| ranking of the leading explicit features, separating standard inputs from the grouped KG-embedding dimensions; (b) SHAP summary (beeswarm) plot showing the magnitude and direction of each feature’s contribution, colored by feature value; (c) SHAP dependence plot for the international co-production rate, showing the inverted-U relationship that peaks near a 22% co-production rate; (d) SHAP interaction plot for the digital distribution ratio, split by below- and above-median GDP per capita. Panel (e) decomposes the grouped 128-dimensional KG-embedding SHAP contribution by relation type, using linear probes from the country embeddings to relation-exposure summaries; PRODUCES, CO_PRODUCED_BY, and EXPORTS_TO jointly account for 69.2% of the KG-SHAP mass.
Figure 6. Country clustering and sustainability mapping based on spectral clustering of FISI pillar scores. (a) Spectral projection of the 42 countries onto the first two embedding dimensions, colored by cluster; (b) geographic distribution of the sample colored by 2023 FISI; (c) radar chart of the mean Cultural Diversity, Economic Resilience, and SDG Alignment pillar scores for the four clusters.
Figure 7. Model-based associative policy stress-test results for five targeted interventions across the four country clusters. Bars show the projected change in the composite index (ΔFISI) and in the Cultural Diversity, Economic Resilience, and SDG Alignment pillars (ΔCDP, ΔERP, ΔSAP). (a) Digital distribution +10 pp; (b) female director share intervention; (c) domestic market share intervention; (d) international co-production intervention; (e) internet penetration +20 pp, applied only to below-median-penetration countries. In panel (e), the hatched grey N/A box marks Mature–Diverse markets, to which the intervention does not apply because their penetration is already above the median.
Figure 8. Rolling-window model performance and feature importance dynamics, 2018–2023. (a) Rolling-window MAE by country cluster across successive test windows (2020–21 to 2022–23); (b) year-by-year importance-rank trajectories of the leading explicit features; (c) actual versus predicted FISI for the disrupted 2020 test year ($R^2 = 0.891$); (d) actual versus predicted FISI for the stabilized 2023 test year ($R^2 = 0.962$).
Table 1. Mapping of film industry indicators to Sustainable Development Goal targets.
| Film Industry Indicator | SDG Target | Pillar | Direction |
|---|---|---|---|
| Domestic film market share | 8.3 (Productive employment) | Cultural Diversity | Positive |
| Genre diversity index (Shannon) | 10.2 (Social inclusion) | Cultural Diversity | Positive |
| International co-production rate | 17.6 (Knowledge sharing) | Cultural Diversity | Positive |
| Female director share | 5.5 (Gender equality in leadership) | Cultural Diversity | Positive |
| Language diversity ratio | 4.7 (Cultural literacy) | Cultural Diversity | Positive |
| Independent production share | 10.2 (Social inclusion) | Cultural Diversity | Positive |
| Revenue growth stability | 8.1 (Sustained economic growth) | Economic Resilience | Positive |
| Screen density growth | 11.4 (Cultural heritage access) | Economic Resilience | Positive |
| Digital distribution ratio | 12.2 (Efficient resource use) | Economic Resilience | Positive |
| Export revenue share | 8.2 (Economic productivity) | Economic Resilience | Positive |
| Production volume trend | 9.5 (Scientific research capacity) | Economic Resilience | Positive |
| Box office concentration (inverted HHI) | 8.3 (Productive employment) | Economic Resilience | Positive |
| Film employment share | 8.5 (Full employment, decent work) | SDG Alignment | Positive |
| Screen access Gini (inverted) | 10.3 (Equal opportunity) | SDG Alignment | Positive |
| Urban–rural infrastructure ratio | 11.a (Urban–rural linkages) | SDG Alignment | Negative |
| Youth employment ratio in film | 8.6 (Youth employment) | SDG Alignment | Positive |
| Public funding accessibility index | 16.6 (Accountable institutions) | SDG Alignment | Positive |
| Audience reach equity | 10.2 (Social inclusion) | SDG Alignment | Positive |
Table 2. Descriptive statistics of key variables (42 countries, 2005–2023).
| Variable | Mean | Std. Dev. | Min | Max | Obs. |
|---|---|---|---|---|---|
| Box office revenue (million USD) | 1847.3 | 3412.6 | 12.4 | 21,083.5 | 798 |
| Films produced (annual) | 168.4 | 241.7 | 8 | 1986 | 798 |
| Domestic market share (%) | 32.7 | 22.4 | 1.3 | 88.6 | 798 |
| Genre diversity index (Shannon) | 0.714 | 0.138 | 0.312 | 0.946 | 798 |
| Screen density (per million pop.) | 38.2 | 24.6 | 2.1 | 132.8 | 798 |
| Digital distribution ratio (%) | 41.3 | 28.7 | 0.0 | 89.4 | 798 |
| Female director share (%) | 18.6 | 8.4 | 2.1 | 34.9 | 798 |
| International co-production rate (%) | 14.8 | 11.2 | 0.4 | 52.3 | 798 |
| GDP per capita (thousand USD) | 28.4 | 19.7 | 1.8 | 87.3 | 798 |
| Internet penetration (%) | 68.3 | 22.1 | 8.7 | 98.2 | 798 |
| FISI (composite) | 0.518 | 0.162 | 0.127 | 0.871 | 798 |
| Cultural Diversity Pillar | 0.542 | 0.178 | 0.093 | 0.912 | 798 |
| Economic Resilience Pillar | 0.507 | 0.193 | 0.084 | 0.897 | 798 |
| SDG Alignment Pillar | 0.489 | 0.171 | 0.102 | 0.846 | 798 |
Table 3. Pairwise correlation matrix of selected variables.
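| | FISI | CDP | ERP | SAP | DMS | GDI | DDR | GDPpc |
|---|---|---|---|---|---|---|---|---|
| FISI | 1.000 | | | | | | | |
| CDP | 0.847 | 1.000 | | | | | | |
| ERP | 0.812 | 0.634 | 1.000 | | | | | |
| SAP | 0.789 | 0.598 | 0.671 | 1.000 | | | | |
| DMS | 0.623 | 0.741 | 0.412 | 0.387 | 1.000 | | | |
| GDI | 0.587 | 0.692 | 0.498 | 0.413 | 0.534 | 1.000 | | |
| DDR | 0.714 | 0.523 | 0.782 | 0.618 | 0.298 | 0.456 | 1.000 | |
| GDPpc | 0.681 | 0.574 | 0.723 | 0.612 | 0.267 | 0.501 | 0.687 | 1.000 |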
Note: FISI = Film Industry Sustainability Index; CDP = Cultural Diversity Pillar; ERP = Economic Resilience Pillar; SAP = SDG Alignment Pillar; DMS = Domestic Market Share; GDI = Genre Diversity Index; DDR = Digital Distribution Ratio; GDPpc = GDP per Capita. All correlations with $|r| > 0.15$ are significant at the 1% level.
Table 4. Hypothesis–experiment alignment.
| H | Claim Tested | Experiment/Table | Success Criterion | Validation Evidence |
|---|---|---|---|---|
| H1 | KG-enhanced selective state-space forecasting improves accuracy | Main baseline comparison; KG ablation | KE-Mamba MAE lower than base Mamba, LSTM, XGBoost, FE panel | KE-Mamba MAE 0.0389 vs. base Mamba 0.0461 and FE-LDV 0.0585 |
| H2 | Narrative features add useful content signal | Remove LLM narrative features | MAE increases when removed | MAE 0.0408, +4.9% |
| H3 | Knowledge editing improves factual reliability | Factual probe; no-edit ablation | Accuracy increases; downstream MAE decreases | All-fact accuracy 66.8% → 89.7%; MAE 0.0399 → 0.0389 |
| H4 | Model is not pandemic-specific | Pre-pandemic split 2018–2019 | KE-Mamba remains best | MAE 0.0314, $R^2$ 0.952 |
| H5 | Projected associations exceed placebo patterns | Placebo intervention reassignment | Observed ΔFISI exceeds placebo distribution | Digital distribution +0.0297 vs. placebo +0.0031, p < 0.001 |
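The placebo comparison in H5 amounts to an empirical upper-tail p-value. The sketch below is an illustration only: the observed effect (+0.0297) and the placebo mean (+0.0031) are taken from Table 4, while the placebo spread, the Gaussian draws, and the add-one correction are assumptions, not the authors' procedure.

```python
import random


def empirical_p_value(observed, placebo_effects):
    """Upper-tail empirical p-value with an add-one correction (assumption)."""
    exceed = sum(1 for effect in placebo_effects if effect >= observed)
    return (1 + exceed) / (1 + len(placebo_effects))


# Hypothetical placebo distribution: 1000 draws centred on the reported
# placebo mean; the spread of 0.005 is illustrative only.
rng = random.Random(0)
placebo = [rng.gauss(0.0031, 0.005) for _ in range(1000)]

p = empirical_p_value(0.0297, placebo)
print(f"empirical p-value: {p:.4f}")
```

With 1000 reassignments and this correction, the smallest attainable p-value is 1/1001, which is consistent with a reported p < 0.001 when no placebo draw reaches the observed effect.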
Table 5. Human expert validation of LLM narrative scores ( n = 200 films).
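| Narrative Dimension | Human α | Llama–Expert r | Llama–Expert ρ | MAE (0–10) | Qwen–Llama r |
|---|---|---|---|---|---|
| Narrative structure | 0.79 | 0.78 | 0.75 | 0.83 | 0.90 |
| Pacing | 0.73 | 0.71 | 0.69 | 0.96 | 0.88 |
| Character development | 0.76 | 0.74 | 0.72 | 0.90 | 0.87 |
| Thematic depth | 0.78 | 0.76 | 0.73 | 0.88 | 0.89 |
| Dialogue quality | 0.68 | 0.62 | 0.59 | 1.12 | 0.82 |
| Genre coherence | 0.82 | 0.81 | 0.79 | 0.75 | 0.91 |
| Cultural specificity | 0.71 | 0.70 | 0.68 | 0.99 | 0.84 |
| Projected audience reach | 0.69 | 0.66 | 0.64 | 1.05 | 0.86 |
| Average | 0.75 | 0.72 | 0.70 | 0.94 | 0.87 |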
Note: Dialogue quality and projected audience reach show the weakest LLM–expert agreement because they are inferred from synopses rather than full scripts or box-office records; genre coherence and narrative structure show the strongest agreement, which is consistent with synopsis text being most informative for genre- and structure-level signals.
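The agreement columns of Table 5 (Pearson r, Spearman ρ, and MAE on the 0–10 scale) can be computed for one narrative dimension as sketched below. The score vectors are hypothetical, and comparing the LLM score with the mean of the human raters is an assumption; the source does not specify how the three raters were combined.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rho: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


# Hypothetical scores for five films on one dimension (0-10 scale).
llm_scores = [6.5, 7.0, 4.0, 8.5, 5.5]
human_mean_scores = [6.0, 7.5, 4.5, 8.0, 5.0]

r = pearson(llm_scores, human_mean_scores)
rho = spearman(llm_scores, human_mean_scores)
mae = sum(abs(a - b) for a, b in zip(llm_scores, human_mean_scores)) / len(llm_scores)
print(f"r={r:.3f} rho={rho:.3f} MAE={mae:.2f}")
```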
Table 6. Out-of-sample forecasting performance on the composite FISI: pandemic-included test (Panel A) and pre-pandemic test (Panel B).
| Model | MAE | RMSE | MAPE (%) | $R^2$ | DM p-Value |
|---|---|---|---|---|---|
| Panel A: Train 2005–2017, validation 2018–2019, test 2020–2023 (n = 168) | | | | | |
| ARIMA | 0.0847 | 0.1123 | 12.34 | 0.724 | <0.001 |
| Two-way FE-LDV panel | 0.0585 | 0.0782 | 8.44 | 0.865 | <0.001 |
| Random Forest | 0.0612 | 0.0834 | 8.91 | 0.847 | <0.001 |
| XGBoost | 0.0573 | 0.0791 | 8.42 | 0.863 | <0.001 |
| LSTM | 0.0498 | 0.0687 | 7.15 | 0.897 | <0.001 |
| Mamba (base) | 0.0461 | 0.0642 | 6.73 | 0.910 | 0.004 |
| KE-Mamba (ours) | 0.0389 | 0.0548 | 5.61 | 0.934 | |
| Panel B: Train 2005–2015, validation 2016–2017, test 2018–2019 (n = 84) | | | | | |
| ARIMA | 0.0602 | 0.0797 | 9.42 | 0.812 | <0.001 |
| Two-way FE-LDV panel | 0.0436 | 0.0569 | 6.42 | 0.906 | <0.001 |
| Random Forest | 0.0476 | 0.0619 | 7.22 | 0.887 | <0.001 |
| XGBoost | 0.0448 | 0.0586 | 6.81 | 0.900 | <0.001 |
| LSTM | 0.0391 | 0.0504 | 5.92 | 0.924 | 0.006 |
| Mamba (base) | 0.0367 | 0.0478 | 5.48 | 0.933 | 0.018 |
| KE-Mamba (ours) | 0.0314 | 0.0419 | 4.61 | 0.952 | |
Note: The DM p-value is the p-value of the Diebold–Mariano test comparing each model’s forecast errors against those of KE-Mamba. All neural model results are means over five random seed initializations. The two-way fixed-effects panel model with lagged dependent variables (FE-LDV) uses the same 34-dimensional panel feature set and includes country and year fixed effects; it serves as an interpretable econometric baseline. Panel B indicates that the KE-Mamba ranking is not specific to the pandemic period.
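For readers unfamiliar with the Diebold–Mariano comparison, a minimal sketch follows. It assumes absolute-error loss, a one-step horizon (so the long-run variance reduces to the plain variance of the loss differential), and a normal reference distribution; the source does not report which loss, horizon, or panel adjustment was used, and the error values below are hypothetical.

```python
import math


def diebold_mariano(errors_a, errors_b):
    """Two-sided DM test on absolute-error loss, horizon h = 1 (assumptions)."""
    # Loss differential: positive values mean model A has the larger loss.
    d = [abs(a) - abs(b) for a, b in zip(errors_a, errors_b)]
    n = len(d)
    d_bar = sum(d) / n
    # With h = 1 no autocovariance terms are needed in the variance estimate.
    var_d = sum((x - d_bar) ** 2 for x in d) / n
    dm_stat = d_bar / math.sqrt(var_d / n)
    # Two-sided p-value under the standard normal approximation.
    p_value = math.erfc(abs(dm_stat) / math.sqrt(2))
    return dm_stat, p_value


# Hypothetical forecast errors for a baseline and a competing model.
baseline_errors = [0.06, -0.05, 0.07, -0.04, 0.05, 0.06, -0.07, 0.05]
model_errors = [0.03, -0.04, 0.04, -0.03, 0.04, 0.03, -0.05, 0.04]

dm_stat, p_value = diebold_mariano(baseline_errors, model_errors)
print(f"DM statistic: {dm_stat:.2f}, p-value: {p_value:.2g}")
```

A positive statistic with a small p-value indicates that the first model's losses are significantly larger than the second's. In a country-year panel the loss differentials are unlikely to be independent, so a clustered or panel-adjusted variant would be the more cautious choice.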
Table 7. Pillar-level forecasting performance of the KE-Mamba model (test period: 2020–2023).
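| Target | MAE | RMSE | MAPE (%) | $R^2$ |
|---|---|---|---|---|
| Composite FISI | 0.0389 | 0.0548 | 5.61 | 0.934 |
| Cultural Diversity Pillar | 0.0412 | 0.0573 | 5.94 | 0.928 |
| Economic Resilience Pillar | 0.0467 | 0.0638 | 6.87 | 0.912 |
| SDG Alignment Pillar | 0.0358 | 0.0497 | 5.23 | 0.941 |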
Table 8. KG embedding model comparison.
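| KG Embedding Model | Link MRR | Hits@10 | Test MAE | Test RMSE | Test $R^2$ |
|---|---|---|---|---|---|
| TransE | 0.341 | 0.487 | 0.0408 | 0.0571 | 0.929 |
| ComplEx | 0.372 | 0.511 | 0.0397 | 0.0558 | 0.932 |
| RotatE | 0.387 | 0.524 | 0.0389 | 0.0548 | 0.934 |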
Table 9. KG embedding dimension sensitivity (validation and test losses, 2020–2023 test split).
| KG Dim | Link Val Loss | Link MRR | FISI Val MAE | FISI Test MAE | FISI Test $R^2$ | Interpretation |
|---|---|---|---|---|---|---|
| 64 | 0.0289 | 0.361 | 0.0421 | 0.0413 | 0.927 | Under-parameterized |
| 128 | 0.0267 | 0.387 | 0.0397 | 0.0389 | 0.934 | Optimal validation/test balance |
| 256 | 0.0279 | 0.401 | 0.0405 | 0.0394 | 0.933 | Link quality up but FISI val MAE rises |
Table 10. Ablation study results on the composite FISI forecast.
| Configuration | MAE | RMSE | $R^2$ | ΔMAE (%) |
|---|---|---|---|---|
| Full KE-Mamba | 0.0389 | 0.0548 | 0.934 | |
| w/o LLM narrative features | 0.0408 | 0.0573 | 0.928 | +4.9 |
| w/o knowledge editing step | 0.0399 | 0.0559 | 0.931 | +2.6 |
| w/o gated KG fusion (concat instead) | 0.0427 | 0.0594 | 0.923 | +9.8 |
| w/o knowledge graph embeddings | 0.0461 | 0.0642 | 0.910 | +18.5 |
| LSTM + gated KG fusion + KE | 0.0452 | 0.0621 | 0.916 | +16.2 |
| KE-Mamba w/ TransE (vs. RotatE) | 0.0408 | 0.0571 | 0.929 | +4.9 |
| KE-Mamba, KG embedding dim = 64 | 0.0413 | 0.0579 | 0.927 | +6.2 |
| KE-Mamba, KG embedding dim = 256 | 0.0394 | 0.0554 | 0.933 | +1.3 |
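The ΔMAE (%) column of Table 10 appears to be the relative increase in MAE over the full model. The short sketch below recomputes it from the tabulated MAE values under that reading; the dictionary keys are shortened labels introduced here for illustration.

```python
# MAE of the full KE-Mamba model and of each ablated variant (Table 10).
full_mae = 0.0389
ablation_mae = {
    "w/o LLM narrative features": 0.0408,
    "w/o knowledge editing step": 0.0399,
    "w/o gated KG fusion": 0.0427,
    "w/o knowledge graph embeddings": 0.0461,
    "LSTM + gated KG fusion + KE": 0.0452,
    "KG embedding dim = 64": 0.0413,
    "KG embedding dim = 256": 0.0394,
}

# Relative MAE increase over the full model, in percent.
delta_pct = {
    name: 100.0 * (mae - full_mae) / full_mae
    for name, mae in ablation_mae.items()
}

for name, delta in delta_pct.items():
    print(f"{name}: +{delta:.1f}%")
```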
Table 11. Top 10 explicit features by mean absolute SHAP value for the composite FISI forecast.
| Rank | Feature | Mean $\lvert\text{SHAP}\rvert$ | Share of $\lvert\text{SHAP}\rvert$ Mass (%) | Direction |
|---|---|---|---|---|
| 1 | Domestic film market share | 0.037 | 12.1 | Positive |
| 2 | Digital distribution ratio | 0.031 | 10.3 | Positive |
| 3 | Genre diversity index | 0.028 | 8.7 | Positive |
| 4 | GDP per capita | 0.024 | 7.1 | Positive |
| 5 | Screen density | 0.019 | 5.4 | Positive |
| 6 | Revenue growth stability | 0.017 | 4.7 | Positive |
| 7 | Internet penetration | 0.015 | 4.2 | Positive |
| 8 | Female director share | 0.014 | 3.8 | Positive |
| 9 | International co-production rate | 0.012 | 2.9 | Nonlinear |
| 10 | Films produced (annual) | 0.009 | 2.3 | Positive |
| | KG embedding (group, 128 dims) | | 34.1 | |
| | Narrative score (group, 8 dims) | | 4.4 | Positive |
Note: Shares are fractions of the total mean absolute SHAP mass over all 162 features of the KE-Mamba model (the 34-dimensional dynamic input together with the 128-dimensional static KG prior); the 128 knowledge graph embedding dimensions and the 8 LLM narrative scores are reported as grouped contributions following the grouped-SHAP convention.
Table 12. Sensitivity of KE-Mamba forecast to alternative FISI aggregation schemes.
| FISI Construction | Pillar Weights/Rule | Corr. with Baseline | Kendall τ | Test MAE | Test $R^2$ |
|---|---|---|---|---|---|
| Baseline geometric mean | CDP/ERP/SAP = 1/3 each | 1.000 | 1.000 | 0.0389 | 0.934 |
| Arithmetic pillar mean | Equal pillar weights | 0.981 | 0.947 | 0.0395 | 0.930 |
| Principal component analysis (PCA)-derived weights | CDP 0.38, ERP 0.34, SAP 0.28 | 0.964 | 0.913 | 0.0402 | 0.926 |
| Delphi budget allocation | CDP 0.36, ERP 0.32, SAP 0.32 | 0.978 | 0.941 | 0.0391 | 0.933 |
| Winsorized z-score normalization | Equal geometric mean | 0.969 | 0.921 | 0.0400 | 0.927 |
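The aggregation schemes compared in Table 12 differ mainly in how the three pillar scores are combined. The sketch below contrasts a weighted geometric mean with a weighted arithmetic mean for one hypothetical country-year. The pillar values are illustrative, and applying the PCA-derived weights inside a geometric mean is an assumption, since the source lists the weights but not the combining rule.

```python
import math


def geometric_index(pillars, weights=None):
    """Weighted geometric mean of strictly positive pillar scores."""
    weights = weights or [1.0 / len(pillars)] * len(pillars)
    total = sum(weights)
    return math.exp(sum(w / total * math.log(p) for w, p in zip(weights, pillars)))


def arithmetic_index(pillars, weights=None):
    """Weighted arithmetic mean of pillar scores."""
    weights = weights or [1.0 / len(pillars)] * len(pillars)
    total = sum(weights)
    return sum(w / total * p for w, p in zip(weights, pillars))


# Hypothetical (CDP, ERP, SAP) pillar scores for one country-year.
pillars = [0.60, 0.50, 0.40]
pca_weights = [0.38, 0.34, 0.28]  # weights as listed in Table 12

baseline = geometric_index(pillars)
arithmetic = arithmetic_index(pillars)
pca_weighted = geometric_index(pillars, pca_weights)
print(f"geometric={baseline:.4f} arithmetic={arithmetic:.4f} pca={pca_weighted:.4f}")
```

The geometric mean never exceeds the arithmetic mean and penalizes imbalance across pillars, so a country that is weak on one pillar scores lower under the baseline rule than under the arithmetic alternative.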
Table 13. Characteristics of the four national film industry sustainability archetypes.
| Characteristic | Mature–Diverse | Emerging–Dynamic | State-Regulated | Developing–Fragile |
|---|---|---|---|---|
| N countries | 14 | 15 | 8 | 5 |
| Mean FISI (2023) | 0.72 | 0.50 | 0.45 | 0.31 |
| Mean CDP | 0.73 | 0.53 | 0.41 | 0.34 |
| Mean ERP | 0.69 | 0.47 | 0.49 | 0.26 |
| Mean SAP | 0.70 | 0.47 | 0.44 | 0.30 |
| Annual FISI growth | +0.010 | +0.022 | +0.009 | +0.017 |
| Domestic market share (%) | 38.6 | 34.1 | 29.4 | 10.2 |
| Digital distribution (%) | 59.7 | 41.8 | 38.1 | 15.4 |
| Female director share (%) | 23.4 | 15.2 | 11.7 | 10.8 |
| GDP per capita (k USD) | 44.3 | 18.6 | 18.3 | 3.9 |
Table 14. Model-based associative policy stress-test results: mean projected Δ FISI by intervention and cluster.
| Intervention | Magnitude | Mature-Div. | Emerg.-Dyn. | State-Reg. | Devel.-Frag. |
|---|---|---|---|---|---|
| Digital distribution | +10 pp | +0.018 | +0.032 | +0.029 | +0.041 |
| Female director share | +10 pp | +0.009 | +0.014 | +0.017 | +0.012 |
| Genre diversity | +0.1 (Shannon) | +0.011 | +0.019 | +0.015 | +0.022 |
| Co-production rate | +10 pp | +0.007 | +0.016 | +0.013 | +0.018 |
| Screen density | +10 per million | +0.006 | +0.011 | +0.010 | +0.024 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Qi, P.; Zhu, W. AI for Sustainable Cultural Industries: A Screenplay-Aware Knowledge-Enhanced State Space Model with LLM-Derived Narrative Features for Forecasting Film Industry Sustainability Across National Economies. Sustainability 2026, 18, 6117. https://doi.org/10.3390/su18126117