AI for Sustainable Cultural Industries: A Screenplay-Aware Knowledge-Enhanced State Space Model with LLM-Derived Narrative Features for Forecasting Film Industry Sustainability Across National Economies

Qi, Peixuan; Zhu, Weidong

doi:10.3390/su18126117


by Peixuan Qi ¹,* and Weidong Zhu ²

¹ Film Academy, Macao University of Science and Technology, Macao 999078, China

² School of Microelectronics, Xidian University, Xi’an 710071, China

* Author to whom correspondence should be addressed.

Sustainability 2026, 18(12), 6117; https://doi.org/10.3390/su18126117 (registering DOI)

Submission received: 5 April 2026 / Revised: 4 June 2026 / Accepted: 5 June 2026 / Published: 14 June 2026

(This article belongs to the Special Issue AI for Sustainable Development: Applications and Impacts across Industries)


Abstract

This paper examines how artificial intelligence can support sustainability assessment in cultural industries, using national film industries as a test case. The Film Industry Sustainability Index (FISI) is introduced as a composite indicator covering cultural diversity, economic resilience, and Sustainable Development Goal (SDG) alignment for 42 national economies from 2005 to 2023. Knowledge-Enhanced Mamba (KE-Mamba), a selective state-space forecasting model, is then proposed to combine annual panel indicators with country-level film-industry knowledge graph (KG) embeddings and large language model (LLM)-derived screenplay-oriented narrative proxies from film synopses. To reduce factual errors in title-level narrative scoring, the LLM is anchored to verified United Nations Educational, Scientific and Cultural Organization (UNESCO) records and the European Audiovisual Observatory’s LUMIERE film-admissions database using rank-one model editing (ROME). On the 2020–2023 held-out test period, KE-Mamba achieves a composite FISI mean absolute error (MAE) of 0.0389, a mean absolute percentage error (MAPE) of 5.61%, and an $R^{2}$ of 0.934, outperforming autoregressive integrated moving average (ARIMA), tree-based, long short-term memory (LSTM), and base Mamba baselines. Additional robustness checks using a pre-pandemic split, two-way fixed-effects panel regression, alternative FISI weighting schemes, KG embedding ablations, and human validation of LLM narrative scores support the reliability of the proposed framework. Policy simulations are interpreted as model-based projected associations rather than causal estimates. The results show that knowledge-enhanced sequence models can provide transparent forecasting support for sustainable cultural-industry policy.

Keywords:

AI for sustainable development; cross-industry applications; large language models; screenplay narrative analysis; knowledge editing; film industry sustainability; cultural and creative industries; Sustainable Development Goals

1. Introduction

The global film industry generates roughly USD 342 billion in annual output and supports 14.2 million jobs worldwide, placing it among the largest subsectors of the cultural and creative industries [1,2]. It plays a dual role, operating both as a commercial industry subject to market forces and as a medium through which national and collective identities are expressed [3,4]. Whether national film industries can sustain economic viability while preserving cultural diversity and advancing social development is therefore a question with both economic and policy stakes.

Several Sustainable Development Goals (SDGs) bear directly on the film industry. SDG 8 links to employment quality and revenue resilience in cultural production; SDG 10 concerns unequal access to cultural markets and distribution channels; SDG 17 is reflected in co-production and cross-border knowledge sharing; and SDG 5 relates to persistent gender gaps in creative leadership [4,5,6,7]. The UNESCO Convention on the Protection and Promotion of the Diversity of Cultural Expressions further frames cultural diversity as a condition for sustainable development [4], yet operational tools for measuring and forecasting the SDG alignment of national film industries remain scarce. Artificial intelligence has become a practical tool for sustainability forecasting and sustainable-development decision support across industries [8,9], but its applications and impacts remain unevenly distributed: the literature concentrates on environmental and resource sectors and pays comparatively little attention to how AI can support sustainable development in cultural and creative industries. In the cultural sector, computational work has focused on box office prediction [10], film network analysis [11,12], and cultural diversity measurement, while forecasting composite sustainability trajectories for national film industries has been neglected. Extending AI-for-sustainable-development research to this under-covered industry is therefore both a methodological opportunity, because film panels combine structural relational data, long textual content, and medium-horizon macro dynamics, and a policy need for cultural ministries and international organizations working toward the 2030 Agenda.

This paper makes four contributions. First, it constructs and documents the Film Industry Sustainability Index (FISI), which is a three-pillar composite indicator linking cultural diversity, economic resilience, and SDG alignment for a balanced 42-country panel from 2005 to 2023. Second, it proposes a knowledge-enhanced selective state-space forecasting architecture in which a gated fusion layer injects static country-level film-industry KG embeddings into dynamic Mamba hidden states. Third, it introduces a transparent narrative-feature pipeline that scores film synopses with an LLM, anchors film-level factual knowledge through model editing, and validates the resulting scores against both human expert ratings and an alternative LLM. Fourth, it provides a policy-oriented validation suite, including ablations, a pre-pandemic temporal split, panel-econometric baselines, FISI weighting sensitivity, KG relation decomposition, and non-causal policy stress tests with placebo checks.

The remainder of the paper develops the theoretical framework (Section 2), reviews related work (Section 3), describes data and variables (Section 4), details the methodology (Section 5), reports experimental results (Section 6), discusses policy implications and limitations (Section 7), and concludes (Section 8).

2. Theoretical Framework: Linking Film Industries to the SDGs

National film industries are conceptualized as socio-economic ecosystems whose health can be characterized along dimensions that map onto SDG targets. Drawing on the economics of culture [3] and cross-national cultural-value frameworks [13], three pillars are identified, each operationalized through six indicators. Cultural Diversity (Shannon genre entropy, female director share, co-production rate, language diversity, independent production share, domestic market share) addresses the range of creative expressions and equitable participation, connecting to SDGs 4, 5, 10, and 17. Economic Resilience (revenue growth stability, screen density growth, digital distribution ratio, export revenue share, production volume trend, inverted Herfindahl–Hirschman Index (HHI) box office concentration) captures the diversification of the economic base and maps onto SDGs 8, 9, and 12. SDG Alignment (film employment share, inverted screen access Gini, urban–rural infrastructure ratio, youth employment ratio, public funding accessibility, audience reach equity) measures distributional fairness and relates to SDG targets 8.5, 8.6, 10.3, 11.a, and 16.6. Each indicator is min–max normalized within each annual cross-section, each pillar score is the arithmetic mean of its six constituents, and the composite FISI is the geometric mean of the three pillar scores. The geometric mean rewards simultaneous progress across dimensions and limits the extent to which a strong pillar can compensate for a weak one. Table 1 reports the complete indicator–SDG mapping.

Formally, for a positive-direction indicator $v_{jct}$, the normalized value is

$$z_{jct} = \frac{v_{jct} - \min_{c} v_{jct}}{\max_{c} v_{jct} - \min_{c} v_{jct} + \epsilon}, \tag{1}$$

where normalization is performed within year $t$ and $\epsilon = 10^{-8}$ prevents division by zero. For negative-direction indicators (urban–rural infrastructure ratio), $z_{jct}$ is replaced by $1 - z_{jct}$. The score of pillar $p$ is computed as

$$P_{p,ct} = \frac{1}{6} \sum_{j \in p} z_{jct}, \tag{2}$$

and the composite index is the equal-weight geometric mean

$$\mathrm{FISI}_{ct} = \left[(P_{CD,ct} + \epsilon)(P_{ER,ct} + \epsilon)(P_{SA,ct} + \epsilon)\right]^{1/3}, \tag{3}$$

where CD, ER, and SA denote the Cultural Diversity, Economic Resilience, and SDG Alignment pillars.

The geometric mean penalizes unbalanced development across pillars; alternative arithmetic, principal-component, Delphi-weighted, and winsorized z-score aggregation schemes are evaluated in Section 6.
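Equations (1)–(3) can be sketched in NumPy for a single annual cross-section. The sketch assumes the 18 raw indicators are stored as a countries × 18 array whose columns are grouped six per pillar (Cultural Diversity, Economic Resilience, SDG Alignment) and that a boolean mask marks negative-direction columns; this layout, the column chosen for the urban–rural ratio, and the function name are illustrative rather than the authors' implementation.

```python
import numpy as np

EPS = 1e-8  # epsilon from Equations (1) and (3)


def fisi_cross_section(v, negative):
    """Pillar scores and composite FISI for one annual cross-section.

    v: raw indicators, shape (n_countries, 18); columns 0-5 are Cultural Diversity,
       6-11 Economic Resilience, 12-17 SDG Alignment (assumed ordering).
    negative: boolean mask, shape (18,), marking negative-direction indicators.
    Returns (pillars, fisi) with shapes (n_countries, 3) and (n_countries,).
    """
    # Eq. (1): min-max normalization across countries within the year.
    z = (v - v.min(axis=0)) / (v.max(axis=0) - v.min(axis=0) + EPS)
    # Flip negative-direction indicators so that higher always means better.
    z = np.where(negative, 1.0 - z, z)
    # Eq. (2): each pillar is the mean of its six normalized indicators.
    pillars = z.reshape(v.shape[0], 3, 6).mean(axis=2)
    # Eq. (3): equal-weight geometric mean of the three pillars.
    fisi = np.prod(pillars + EPS, axis=1) ** (1.0 / 3.0)
    return pillars, fisi


rng = np.random.default_rng(0)
v = rng.random((42, 18))                 # synthetic indicators for 42 countries
negative = np.zeros(18, dtype=bool)
negative[14] = True                      # hypothetical column of the urban-rural ratio
pillars, fisi = fisi_cross_section(v, negative)
print(pillars.shape, fisi.shape)         # (42, 3) (42,)
```

Because a geometric mean never exceeds the arithmetic mean of the same values, a country with one weak pillar receives a lower FISI than it would under arithmetic aggregation, which is the non-compensatory behavior described above.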

3. Related Work

3.1. AI for Sustainability Forecasting and Cultural-Industry Analytics

Early computational analyses of the film industry centered on box office prediction with star power, genre, and marketing budget as features [14,15,16]. Recent work has applied gradient boosting and neural architectures to revenue forecasting [10] and network analysis to film festivals and industry ecosystems [11,12], but the focus remains on short-run commercial outcomes, and the longitudinal panel structure most relevant for policy-oriented sustainability forecasting has been largely overlooked. At the same time, AI is an increasingly practical tool for sustainability forecasting [8]; for example, Chen et al. used machine learning to forecast multidimensional sustainability indices for EU member states [9]. The intersection of AI and cultural industries nonetheless remains underexplored [17], and, to the authors' knowledge, no prior study has applied machine learning to a composite cultural sustainability index of the kind proposed here.

3.2. Knowledge Graphs and Knowledge-Grounded LLMs

Tilly and Livan showed that features from statistically validated knowledge graphs improve macroeconomic forecasts over purely tabular inputs [18]. Among embedding methods, RotatE models each relation as a rotation in complex space, producing geometrically interpretable vectors with strong link-prediction performance [19]. This paper extends this line of work to the film industry, where co-production, distribution, and funding relationships form a relational structure that tabular features cannot capture. LLMs can reason about screenplay structure, character arcs, and themes in both generative [20] and evaluative settings, yielding interpretable quality signals that complement metadata-based features. Because LLMs also encode parametric knowledge about specific films that may be outdated or inaccurate, knowledge editing methods such as ROME [21] and subsequent mass-editing techniques [22] are relevant: they can update individual factual associations without full retraining.

Recent knowledge-grounded LLM systems show why structured retrieval remains important even when LLMs provide fluent reasoning. TrumorGPT combines graph-based retrieval with an LLM for fact checking, demonstrating that graph structure can help constrain generation and improve evidence traceability [23]. Similarly, hybrid fact-checking pipelines that combine KG retrieval, LLM classification, and search-based retrieval agents achieve interpretable claim verification by activating external evidence sources when KG coverage is insufficient [24]. These studies are relevant to the present framework because the narrative scores used here are not treated as ungrounded LLM outputs: film-level country, year, and genre facts are first checked against verified film records, and the residual risk of hallucination is explicitly evaluated.

3.3. Hybrid Deep Architectures for Complex Digital and Multimodal Data

The broader literature on hybrid deep architectures also motivates the integration of heterogeneous feature streams. Chechkin et al. [25] propose a hybrid neural-network Transformer for detecting and classifying destructive digital content, emphasizing temporal dynamics, nonlinear dependencies, and multilayered data analysis. Although their application domain differs from film-industry sustainability, the study is methodologically relevant because it illustrates how modern AI systems increasingly combine sequence modeling, attention-like mechanisms, and heterogeneous digital signals. This paper differs by replacing quadratic self-attention with a selective state-space backbone and by using KG-derived country embeddings as a structured relational prior rather than raw token-level features.

3.4. Selective State Space Models for Panel Time-Series Forecasting

For the temporal backbone, this paper uses Mamba, a selective state space model that captures long-range dependencies with linear complexity in sequence length [26]. Variants such as MambaTS [27], Mamba4Cast [28], and SiMBA [29] have been applied to time series forecasting, with systematic benchmarking in [30]. The selective scan mechanism is particularly suited to film market data, where policy shifts and macroeconomic shocks create uneven temporal salience across years [31]. Unlike Transformer-based sequence models with quadratic self-attention costs, the linear-time selective scan allows efficient processing of the multi-country annual panel while preserving the model’s ability to emphasize shock periods and structural breaks.

4. Data and Variables

4.1. Data Sources and Coverage

The dataset is a balanced panel of 42 national economies observed annually from 2005 to 2023 (42 × 19 = 798 country-year observations), covering roughly 92% of global theatrical box office revenues in 2023 based on a tabulation of Motion Picture Association (MPA) THEME 2024 [2]. Primary sources are the UNESCO UIS Feature Films and Cinema Database [32], the European Audiovisual Observatory LUMIERE database [33], the World Bank World Development Indicators (WDI) [34], and the OECD Culture and Creative Economy database [35], supplemented by MPA THEME [2], PwC Global Entertainment and Media Outlook [1], McKinsey [36], and the United Nations Development Programme Human Development Report (UNDP HDR) [37]. Linear interpolation was applied to gaps of up to three consecutive years. Seven candidate countries had more than three consecutive missing years in at least one required FISI component and were excluded: Greece and Portugal (incomplete public-funding and audience-equity series), Peru and Morocco (discontinuous domestic-share, export-revenue, and digital-distribution data), Pakistan (production and box-office coverage before 2012), Vietnam (cinema-admission and co-production records), and Ukraine (territorial reporting discontinuities after 2014). The longest continuous gap and the affected indicators for each excluded country are reported in Appendix B (Table A2).
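The gap rule can be sketched as follows, assuming each country-indicator series is a 19-element NumPy array with NaN marking missing years. The text does not say how missing values at the start or end of a series are treated; this sketch, which is an illustration rather than the authors' code, treats them as non-interpolable and therefore as grounds for exclusion.

```python
from typing import Optional

import numpy as np

MAX_GAP = 3  # longest run of consecutive missing years that may be interpolated


def longest_nan_run(series: np.ndarray) -> int:
    """Length of the longest run of consecutive missing (NaN) values."""
    longest = current = 0
    for is_missing in np.isnan(series):
        current = current + 1 if is_missing else 0
        longest = max(longest, current)
    return longest


def fill_or_exclude(series: np.ndarray) -> Optional[np.ndarray]:
    """Linearly interpolate interior gaps of up to MAX_GAP years.

    Returns None, meaning the country is excluded, if any gap is longer than
    MAX_GAP or if the series starts or ends with a missing year (assumed rule).
    """
    missing = np.isnan(series)
    if not missing.any():
        return series.copy()
    if missing[0] or missing[-1] or longest_nan_run(series) > MAX_GAP:
        return None
    idx = np.arange(series.size)
    filled = series.copy()
    filled[missing] = np.interp(idx[missing], idx[~missing], series[~missing])
    return filled


# Two synthetic 19-year series (2005-2023).
ok = np.array([1.0, 2.0, np.nan, np.nan, 5.0] + [6.0] * 14)
bad = np.array([1.0] + [np.nan] * 4 + [6.0] * 14)
print(fill_or_exclude(ok)[:5])  # [1. 2. 3. 4. 5.]
print(fill_or_exclude(bad))     # None
```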

4.2. Variable Construction

The 18 FISI component indicators are grouped into the three pillars of Section 2; full construction rules and data sources for each indicator are reported in Appendix E (Table A4). Briefly, the genre diversity index is the Shannon entropy of annual domestic production across 12 standardized genres, normalized by $\ln(12)$ (the maximum attainable entropy) so that it lies in $[0, 1]$; revenue growth stability is the inverse coefficient of variation of box office revenue over a rolling five-year window; and the digital distribution ratio is the share of total film revenue from legitimate subscription video-on-demand (SVOD), transactional video-on-demand (TVOD), and digital rental/purchase channels. Table 2 reports descriptive statistics, Table 3 reports the pairwise correlation matrix among the composite FISI, its pillars, and selected indicators, and Figure 1 summarizes the research pipeline.
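The two indicator formulas can be illustrated with a short sketch. It assumes genre counts arrive as a length-12 array per country-year and uses the population standard deviation in the coefficient of variation; neither convention, nor the handling of years before a full five-year window exists (NaN here), is specified in the text, so all three are assumptions.

```python
import numpy as np

N_GENRES = 12


def genre_diversity(counts):
    """Shannon entropy of a country-year genre distribution, divided by ln(12)."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]  # genres with zero films contribute 0 * ln(0) = 0
    return float(-(p * np.log(p)).sum() / np.log(N_GENRES))


def revenue_stability(revenue, window=5):
    """Inverse coefficient of variation (mean / std) over a trailing window.

    Years without a full window are returned as NaN (assumed convention).
    """
    revenue = np.asarray(revenue, dtype=float)
    out = np.full(revenue.size, np.nan)
    for t in range(window - 1, revenue.size):
        w = revenue[t - window + 1 : t + 1]
        std = w.std()  # population standard deviation (assumed)
        out[t] = w.mean() / std if std > 0 else np.inf
    return out


print(genre_diversity(np.ones(12)))  # close to 1.0: uniform over 12 genres
print(revenue_stability([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]))
```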

The English-synopsis filter used for narrative feature extraction improves prompt consistency but creates a coverage bias. Retention is highest for internationally distributed films (74.8% in high-income markets) and lowest for hyper-local productions in lower-middle and developing–fragile markets (32.8–46.9%), as visualized in Figure 2a. To bound this concern, an inverse-coverage reweighted narrative aggregation produces a composite FISI correlation of 0.982 with the baseline narrative aggregation and shifts the KE-Mamba test MAE only from 0.0389 to 0.0396. Two further robustness aggregations—raising the minimum synopsis length to 50 words and excluding countries with coverage below 35%—yield MAEs of 0.0393 and 0.0384, respectively, and correlations of 0.987 and 0.991 with the baseline. The English-synopsis bias is therefore substantively important for interpretation but does not drive the headline forecasting results.
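One plausible form of the inverse-coverage reweighted aggregation is a Horvitz–Thompson-style weighted mean in which each retained film is weighted by the reciprocal of its stratum's retention rate, so that strata thinned out by the English-synopsis filter regain their share. The text does not define the strata or the exact weights; the stratum labels and rates below are hypothetical and only illustrate the mechanics.

```python
import numpy as np


def reweighted_mean(scores, strata, retention):
    """Inverse-coverage weighted mean of film-level narrative scores.

    scores: (n_films, 8) scores already rescaled to [0, 1].
    strata: stratum label of each retained film (labels are hypothetical).
    retention: share of the stratum's films kept by the English-synopsis filter.
    """
    w = np.array([1.0 / retention[s] for s in strata])  # rarely retained films count more
    return (w[:, None] * np.asarray(scores, dtype=float)).sum(axis=0) / w.sum()


scores = np.array([[0.8] * 8, [0.4] * 8])            # one widely and one locally distributed film
strata = ["international", "local"]
retention = {"international": 0.75, "local": 0.25}  # illustrative rates, not from the paper
print(scores.mean(axis=0)[0])                         # unweighted mean of the first dimension
print(reweighted_mean(scores, strata, retention)[0])  # pulled toward the under-covered film
```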

5. Methodology

5.1. Research Hypotheses and Validation Design

The empirical design evaluates five hypotheses. H1: Adding structured film-industry KG information to a selective state-space model improves FISI forecasting relative to non-knowledge baselines. H2: LLM-derived narrative features provide incremental predictive information beyond macroeconomic and FISI indicators. H3: Factual anchoring through knowledge editing reduces title-level factual errors and improves downstream forecast reliability. H4: The proposed model generalizes across both crisis and non-crisis temporal splits. H5: Policy simulations should be interpreted as projected associations and should remain distinguishable from random placebo interventions. Table 4 maps each hypothesis to the corresponding experiment.

5.2. Knowledge Graph Construction and Embedding

The structured relational database defines five entity types: Country, ProductionEntity (studios, production companies, public funding bodies), Film, Genre (12 categories), and Person (directors, producers, key creative personnel). It also defines seven relation types: produces, funds, belongs_to, co_produced_by, directed_by, operates_in, and exports_to. Populated by cross-referencing UNESCO UIS, LUMIERE, and national film registries, the database contains 14,600 triples. Continuous vector representations are extracted with RotatE [19] using embedding dimension 128 (selected from {64, 128, 256}), an 80/10/10 train/validation/test split stratified by country, entity, and relation type, and self-adversarial negative sampling with 256 negatives, a margin of 9.0, a learning rate of 0.001, and up to 500 epochs with early stopping. This yields a mean reciprocal rank (MRR) of 0.387 and a Hits@10 (share of test triples whose correct entity ranks in the top 10) of 0.524 on the test split. The resulting 128-dimensional country embeddings encode each country’s structural position in the global film production and distribution network. The KG embedding is used as a static, time-invariant structural prior for each country; consistent with the forecasting protocol in Section 5.5, no target-year-specific triples enter as predictors of that year’s FISI.
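The RotatE distance and the self-adversarial negative-sampling loss from [19] can be sketched in NumPy as below, using the margin of 9.0 stated above. Entity embeddings are complex vectors and each relation is a phase vector, so the rotation r = exp(iθ) has unit modulus in every dimension. The sampling temperature, the tiny dimension, and the tail-corruption setup are illustrative assumptions, and training (gradient updates, early stopping) is omitted.

```python
import numpy as np

GAMMA = 9.0     # margin stated in the text
ADV_TEMP = 1.0  # self-adversarial sampling temperature (assumed value)


def log_sigmoid(x):
    return -np.logaddexp(0.0, -x)


def rotate_distance(h, phase, t):
    """RotatE distance ||h o r - t|| with r = exp(i * phase), summed over dimensions."""
    r = np.exp(1j * phase)  # unit-modulus rotation per dimension
    return np.abs(h * r - t).sum(axis=-1)


def self_adversarial_loss(h, phase, t, neg_t):
    """Loss for one positive triple (h, r, t) and negatives that corrupt the tail.

    h, t: complex (k,); phase: real (k,); neg_t: complex (n_neg, k).
    """
    pos = -log_sigmoid(GAMMA - rotate_distance(h, phase, t))
    d_neg = rotate_distance(h, phase, neg_t)   # (n_neg,)
    logits = -ADV_TEMP * d_neg                 # harder negatives get larger weights
    w = np.exp(logits - logits.max())
    w /= w.sum()                               # in training, not back-propagated through
    neg = -(w * log_sigmoid(d_neg - GAMMA)).sum()
    return float(pos + neg)


rng = np.random.default_rng(0)
k = 8                                          # tiny for illustration; the text uses 128
h = rng.normal(size=k) + 1j * rng.normal(size=k)
phase = rng.uniform(-np.pi, np.pi, size=k)
t = h * np.exp(1j * phase)                     # a perfectly modeled positive triple
neg_t = rng.normal(size=(16, k)) + 1j * rng.normal(size=(16, k))
print(rotate_distance(h, phase, t))            # 0.0
print(self_adversarial_loss(h, phase, t, neg_t))
```

Because the weights come from a softmax over negative scores, negatives that lie close to the rotated head contribute most to the loss, which is the purpose of self-adversarial sampling.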

5.3. LLM-Based Screenplay Narrative Features with Knowledge Editing

In parallel with the structural RotatE features, this paper extracts screenplay-oriented narrative proxies from film metadata, English-language synopses, and available critic excerpts using a large language model. The term “screenplay-aware” therefore refers to narrative constructs commonly associated with screenplay analysis (narrative structure, pacing, character development, thematic depth, dialogue quality, genre coherence, cultural specificity, and projected audience reach) rather than to full-script access for all 12,438 titles. Llama-3-8B-Instruct is used as the base model (selected because its open weights allow the knowledge editing step below), and robustness is checked by repeating the entire scoring pipeline with Qwen2.5-7B-Instruct; per-dimension Pearson correlations between the two models range from 0.82 to 0.91, with an average of 0.87 on the validation subset. The film sample consists of 12,438 titles obtained by intersecting UNESCO UIS production records with a snapshot of The Movie Database (TMDb) taken on 20 December 2024, retaining titles with an English-language synopsis of at least 80 words. For non-English-language markets, this filter biases the sample toward internationally distributed titles, a limitation revisited in Section 7. Each film is scored on a 0–10 integer scale along eight dimensions: (i) narrative structure, (ii) pacing, (iii) character development, (iv) thematic depth, (v) dialogue quality (inferred from synopsis and critic excerpts), (vi) genre coherence, (vii) cultural specificity, and (viii) projected audience reach. The scores are then rescaled to $[0, 1]$ and aggregated to the country-year level by a simple mean. The full prompt and anchoring rubric are in Appendix D.
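The rescaling and country-year aggregation amount to a single pandas group-by. The column names below are hypothetical stand-ins for the eight dimensions, kept in the order listed above, and the three-film table is synthetic.

```python
import pandas as pd

# Hypothetical column names for the eight dimensions, in the order listed above.
DIMS = ["structure", "pacing", "character", "theme",
        "dialogue", "genre_coherence", "cultural_specificity", "audience_reach"]


def country_year_narrative(films: pd.DataFrame) -> pd.DataFrame:
    """Rescale 0-10 integer scores to [0, 1] and take the simple mean per country-year."""
    scaled = films[DIMS] / 10.0
    return scaled.groupby([films["country"], films["year"]]).mean()


films = pd.DataFrame({"country": ["FR", "FR", "KR"], "year": [2019, 2019, 2019],
                      **{dim: [6, 8, 7] for dim in DIMS}})
narrative = country_year_narrative(films)
print(narrative.loc[("FR", 2019), "pacing"])  # mean of 0.6 and 0.8
```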

LLM parametric knowledge about specific titles can be outdated or hallucinated, which is addressed with a ROME-style knowledge editing step [21,22] before scoring. For the 4127 titles on which Llama-3-8B-Instruct fails a factual probe (country of production, release year, or genre tag), rank-one update targets are constructed from verified UNESCO UIS and LUMIERE records. Following the Mass-Editing Memory in a Transformer (MEMIT) extension of ROME, edits are applied to the multilayer perceptron (MLP) down-projection matrices in layers 3–8, with the title token as subject and the verified fact as object; all edits are applied to a frozen copy of the base model. On the full 12,438-title sample, editing raises all-fact accuracy from 66.8% to 89.7% (country 89.2% → 97.6%, year 86.7% → 96.4%, genre 82.4% → 93.1%; Figure 2b). The improvement is concentrated on the targeted 4127-title failed subset. By construction, this subset has an all-fact accuracy of 0.0% before editing, although individual facts are recalled correctly for part of it (country 67.9%, year 59.6%, genre 47.3%). After ROME/MEMIT-style editing, these rates rise to 93.5%, 91.8%, and 86.7%, respectively, and the joint all-fact pass rate on the same 4127 titles reaches 75.4%. The bulk of the full-sample gain is therefore attributable to direct correction of the failed subset rather than to diffuse changes on already-correct titles. Specificity on a 500-film neighborhood set falls only from 0.92 to 0.89, indicating limited bleed-through to unedited titles. Edits shift narrative scores on 33.9% of titles, by an average of 0.39 points on the 0–10 scale. Cultural specificity (mean absolute shift 0.64) and genre coherence (0.58) move most, and narrative structure and dialogue move least, which is consistent with factual anchoring rather than wholesale rewriting.
Compared with alternative editing strategies on the same probe set, ROME/MEMIT-style editing attains 89.7% all-fact accuracy at 0.9 graphics processing unit (GPU) hours, with a downstream MAE of 0.0389. Fine-tuning (LoRA) reaches a higher all-fact accuracy (92.8%) but a worse downstream MAE (0.0397), lower locality (0.76), and a 4.8 h runtime, while GRACE-style editing gives comparable results (87.9%; MAE 0.0392) and needs roughly twice the runtime of ROME/MEMIT. Ablating the editing step yields a small but consistent MAE increase (0.0389 → 0.0399), supporting the claim that anchoring the LLM’s film-level knowledge improves the reliability of the aggregated narrative signals.
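The closed-form rank-one update at the core of ROME [21] can be checked numerically. For an MLP projection W that maps a subject key k to a value, and a key second-moment matrix C estimated from many keys, ROME sets W' = W + Λ(C⁻¹k*)ᵀ with Λ = (v* − Wk*)/((C⁻¹k*)ᵀk*), so that W'k* = v*. The sketch below uses random matrices in place of Llama-3 activations and performs a single edit; MEMIT, which the text follows, distributes many edits across layers 3–8 with a batched least-squares update that is not reproduced here.

```python
import numpy as np


def rome_update(W, C, k_star, v_star):
    """Rank-one edit so that the edited matrix maps key k_star exactly to value v_star.

    W: (d_out, d_in) MLP projection; C: (d_in, d_in) key second-moment matrix
    (symmetric positive definite); k_star: (d_in,) subject key; v_star: (d_out,) target value.
    """
    c_inv_k = np.linalg.solve(C, k_star)  # C^{-1} k*
    residual = v_star - W @ k_star         # error of the current weights on k*
    return W + np.outer(residual, c_inv_k) / (c_inv_k @ k_star)


rng = np.random.default_rng(0)
d_in, d_out = 32, 16
W = rng.normal(size=(d_out, d_in))
keys = rng.normal(size=(1000, d_in))       # stand-in for keys of many other subjects
C = keys.T @ keys / len(keys)
k_star, v_star = rng.normal(size=d_in), rng.normal(size=d_out)
W_new = rome_update(W, C, k_star, v_star)
print(np.allclose(W_new @ k_star, v_star))  # True
```

Any key u with (C⁻¹k*)ᵀu = 0 is mapped exactly as before, one reason why a rank-one edit can leave unrelated titles largely unaffected.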

The eight narrative scores are externally validated on a stratified 200-film subset that crosses country income group, production decade, genre, and domestic/international distribution status. Each film is scored independently by three human raters with film-studies or script-development backgrounds using the same 0–10 rubric as the LLM. Table 5 reports the per-dimension human inter-rater Krippendorff’s $\alpha$, the Llama-3-8B–expert Pearson and Spearman correlations, the mean absolute deviation on the 0–10 scale, and the cross-model Qwen2.5-7B–Llama-3-8B Pearson correlation on the same subset; Figure 2c provides a visual summary. Inter-rater agreement averages $\alpha = 0.75$. Llama–expert correlations range from $r = 0.62$ for dialogue quality (inferred from synopsis text only) to $r = 0.81$ for genre coherence, with an average of $r = 0.72$, and Qwen–Llama correlations range from 0.82 to 0.91, with an average of 0.87. These results support interpreting the eight scores as weak but consistent narrative proxies.
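For complete ratings (every rater scores every film, as in the 200-film design), Krippendorff's α with the interval metric reduces to α = 1 − D_o/D_e, where D_o is the observed disagreement among ratings of the same film and D_e the disagreement expected among all pooled ratings. The sketch assumes the interval metric and no missing ratings, neither of which the text states, and runs on synthetic scores rather than the study's data.

```python
import numpy as np


def krippendorff_alpha_interval(ratings):
    """Krippendorff's alpha, interval metric, for complete data.

    ratings: (n_units, n_raters) array with every unit rated by every rater.
    """
    ratings = np.asarray(ratings, dtype=float)
    n_units, m = ratings.shape
    n = n_units * m  # number of pairable values
    # Observed disagreement: squared differences over ordered rater pairs within units.
    within = ((ratings[:, :, None] - ratings[:, None, :]) ** 2).sum()
    d_o = within / (m - 1) / n
    # Expected disagreement: squared differences over all ordered pairs of pooled values.
    pooled = ratings.ravel()
    d_e = ((pooled[:, None] - pooled[None, :]) ** 2).sum() / (n * (n - 1))
    return 1.0 - d_o / d_e


rng = np.random.default_rng(0)
true_score = rng.integers(0, 11, size=200).astype(float)  # synthetic 200-film subset
experts = np.clip(true_score[:, None] + rng.normal(0, 1, size=(200, 3)), 0, 10)
llm = np.clip(true_score + rng.normal(0, 1.5, size=200), 0, 10)
print(krippendorff_alpha_interval(experts))               # inter-rater alpha
print(np.corrcoef(llm, experts.mean(axis=1))[0, 1])       # LLM-expert Pearson r
```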

The resulting eight-dimensional country-year narrative vector $n(t) \in \mathbb{R}^{8}$ is concatenated with the 18 FISI indicators and 8 macroeconomic covariates to form a 34-dimensional dynamic input vector that is processed by the selective scan. The 128-dimensional country knowledge-graph embedding $e_{c}$ is not concatenated to every input time step in the full KE-Mamba model; instead, it is treated as a static relational prior, projected into the hidden dimension, and injected after the selective scan through the gated KG–temporal fusion layer (Equations (6) and (7)). The model therefore uses 162 features in total, through two distinct pathways: 34 dynamic country-year features and 128 static KG-prior dimensions.

5.4. Knowledge-Enhanced Mamba Architecture

The temporal forecasting component is built on the selective state-space model [26]. Let $x(t)$ denote the 34-dimensional dynamic country-year feature vector (18 FISI indicators, 8 macro covariates, 8 narrative features); the 128-dimensional KG embedding $e_{c}$ is not part of this dynamic input and instead enters as a static prior through the gated fusion layer below. The 8 macro covariates, drawn from the World Bank WDI [34] and UNDP HDR [37], are GDP per capita (constant 2015 USD), real GDP growth, consumer price inflation, unemployment, internet penetration, urban population share, the Human Development Index (HDI), and trade openness (trade/GDP). Each covariate is standardized to zero mean and unit variance using training-window statistics.

Following selective state-space modeling, the continuous parameters are discretized at each time step through input-dependent step sizes $\Delta_{t}$:

$$\bar{A}_{t} = \exp(\Delta_{t} A), \qquad \bar{B}_{t} = (\bar{A}_{t} - I) A^{-1} B_{t}. \tag{4}$$

The recurrent scan is then

$$h_{t} = \bar{A}_{t} h_{t-1} + \bar{B}_{t} x_{t}, \qquad o_{t} = C_{t} h_{t} + D x_{t}. \tag{5}$$

To ensure stable dynamics, the diagonal entries of $A$ are parameterized as $-\mathrm{softplus}(\tilde{A})$, which implies $|\exp(\Delta_{t} A_{i})| < 1$ for all $\Delta_{t} > 0$. This is important in the present panel setting because annual film-market indicators can contain abrupt shocks (e.g., pandemic-driven box-office collapses), but the hidden state should not diverge during long-range propagation across the 19-year sequence.
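A minimal NumPy sketch of Equations (4) and (5) follows, written in the channel-wise form used by Mamba [26]: each input channel carries its own N-dimensional diagonal state, B_t and C_t are linear functions of the current input, and Δ_t passes through a softplus so that it stays positive. The projection matrices are random placeholders, the state size N = 4 is an assumption, and the hardware-aware parallel scan of the reference implementation is replaced by a plain loop.

```python
import numpy as np


def softplus(x):
    return np.logaddexp(0.0, x)


def selective_scan(x, A_tilde, W_delta, W_B, W_C, D):
    """Sequential selective scan with zero-order-hold discretization (Eqs. 4-5).

    x: (T, d) inputs; A_tilde: (d, N) raw parameters of the diagonal A;
    W_delta: (d, d), W_B: (d, N), W_C: (d, N) input projections; D: (d,) skip weights.
    Returns the outputs o_t stacked into a (T, d) array.
    """
    A = -softplus(A_tilde)                    # strictly negative, so exp(delta * A) < 1
    h = np.zeros_like(A)                      # one N-dimensional state per channel
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        delta = softplus(x[t] @ W_delta)      # (d,) positive, input-dependent step sizes
        B_t, C_t = x[t] @ W_B, x[t] @ W_C     # (N,) input-dependent B_t and C_t
        A_bar = np.exp(delta[:, None] * A)    # Eq. (4), element-wise for diagonal A
        B_bar = (A_bar - 1.0) / A * B_t[None, :]
        h = A_bar * h + B_bar * x[t][:, None]                # Eq. (5), state update
        out[t] = (h * C_t[None, :]).sum(axis=1) + D * x[t]  # Eq. (5), readout
    return out


rng = np.random.default_rng(0)
T, d, N = 19, 34, 4                           # 19 annual steps, 34 dynamic features
x = rng.normal(size=(T, d))
y = selective_scan(x, rng.normal(size=(d, N)), 0.1 * rng.normal(size=(d, d)),
                   0.1 * rng.normal(size=(d, N)), 0.1 * rng.normal(size=(d, N)),
                   np.ones(d))
print(y.shape)                                # (19, 34)
```

Because every entry of A is negative, each element of Ā_t lies in (0, 1), so the state decays geometrically when inputs are zero; this is the stability property described above.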

The main architectural modification is a gated fusion layer that modulates the temporal hidden states with the static country knowledge embedding. The country KG embedding $e_{c} \in \mathbb{R}^{128}$ is static within the annual sequence and acts as a relational prior. It is first projected into the hidden dimension, $\tilde{e}_{c} = W_{e} e_{c} + b_{e}$. The gate

$$\alpha_{ct} = \sigma\left(W_{g} [h_{ct}; \tilde{e}_{c}; x_{ct}] + b_{g}\right) \tag{6}$$

learns when the dynamic time-series state or the static relational prior should dominate. Here, $\alpha_{ct} \in [0, 1]^{d}$ is an element-wise gate over the $d$ hidden units. The fused state is

$$h'_{ct} = \alpha_{ct} \odot h_{ct} + (1 - \alpha_{ct}) \odot \tilde{e}_{c}. \tag{7}$$

Because $e_{c}$ is injected after the selective scan rather than concatenated to every input vector, the model preserves linear-time temporal processing while allowing the country-specific relational structure to modulate the forecast head. When $\alpha_{ct}$ is close to one, the model relies on the temporal signal; when it is close to zero, the knowledge embedding dominates.
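Equations (6) and (7) can be sketched for one country-year as below. The hidden size d = 64 and the random weights are illustrative assumptions; in the model these parameters are learned jointly with the scan.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gated_kg_fusion(h, e_c, x, W_e, b_e, W_g, b_g):
    """Fuse the post-scan state h (d,) with the static KG embedding e_c (128,).

    x is the 34-dimensional dynamic input of the same country-year.
    Returns the fused state h' (d,) and the gate alpha (d,).
    """
    e_tilde = W_e @ e_c + b_e                                     # project prior to d units
    alpha = sigmoid(W_g @ np.concatenate([h, e_tilde, x]) + b_g)  # Eq. (6)
    return alpha * h + (1.0 - alpha) * e_tilde, alpha             # Eq. (7)


rng = np.random.default_rng(0)
d = 64                                                            # assumed hidden size
h, e_c, x = rng.normal(size=d), rng.normal(size=128), rng.normal(size=34)
W_e, b_e = 0.1 * rng.normal(size=(d, 128)), np.zeros(d)
W_g = 0.1 * rng.normal(size=(d, 2 * d + 34))
fused, alpha = gated_kg_fusion(h, e_c, x, W_e, b_e, W_g, np.zeros(d))
# A large positive gate bias drives alpha toward 1, so the temporal state passes through.
fused_temporal, _ = gated_kg_fusion(h, e_c, x, W_e, b_e, W_g, np.full(d, 50.0))
```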

A two-layer feed-forward head with GELU activation maps $h'(t)$ to four outputs (the composite FISI and its three pillar scores), and the model is trained with an equally weighted mean squared error (MSE) loss over these outputs.

Mamba is preferred over a Transformer in this setting because the panel contains short-to-medium annual sequences with many heterogeneous features, and the objective is stable forecasting rather than token-level representation learning. The selective scan has complexity $O(Td)$, linear in the sequence length $T$, whereas full self-attention has $O(T^{2} d)$ complexity. Compared with an LSTM, the input-dependent state transition allows the model to emphasize shock years and policy-transition periods without relying solely on fixed recurrent gates. The gated KG fusion further separates static relational structure from dynamic time-series evidence, which plain concatenation does not do explicitly; this design is supported by an ablation in which replacing the gated fusion with direct concatenation increases MAE by 9.8%.

5.5. Baselines, Training, and Evaluation

Five baselines are considered: ARIMA (fitted per country with AICc order selection), Random Forest (500 trees, depth 12), XGBoost (300 rounds, depth 8, learning rate 0.05), LSTM (two layers of 128 units, dropout 0.2, 5-year lookback), and a base Mamba identical to KE-Mamba but without the gated KG fusion layer or KG embeddings. A two-way fixed-effects panel regression with a lagged dependent variable (FE-LDV) is also included as an interpretable econometric benchmark, specified as

$$\mathrm{FISI}_{c,t+1} = \rho\, \mathrm{FISI}_{c,t} + \beta^{\top} z_{c,t} + \mu_{c} + \lambda_{t} + \varepsilon_{c,t}, \tag{8}$$

where $z_{c,t}$ is the same 34-dimensional panel feature vector used by the other baselines, and $\mu_{c}$ and $\lambda_{t}$ are country and year fixed effects. The tree and neural baselines share the same 34-dimensional panel feature set; neural models use Adam with an initial learning rate of 0.001, cosine annealing, and early stopping with a patience of 20. The final hyperparameter configuration, search ranges, and sensitivity results are in Appendix C.
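For a balanced panel, the country and year effects in Equation (8) can be removed with the two-way within transformation, ỹ_ct = y_ct − ȳ_c· − ȳ_·t + ȳ_··, after which ρ and β follow from ordinary least squares. The sketch assumes a balanced panel and plain OLS without any dynamic-panel bias correction (the text does not say whether one was used) and checks itself on noise-free synthetic data.

```python
import numpy as np


def two_way_demean(a):
    """Two-way within transformation for a balanced panel; a is (C, T) or (C, T, K)."""
    return (a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True)
            + a.mean(axis=(0, 1), keepdims=True))


def fit_fe_ldv(fisi_next, fisi_now, z_now):
    """OLS estimates of rho and beta in Eq. (8) after sweeping out country and year effects.

    fisi_next, fisi_now: (C, T) arrays; z_now: (C, T, K) covariates.
    """
    X = np.concatenate([fisi_now[..., None], z_now], axis=-1)  # (C, T, 1 + K)
    X_w = two_way_demean(X).reshape(-1, X.shape[-1])
    y_w = two_way_demean(fisi_next).reshape(-1)
    coef, *_ = np.linalg.lstsq(X_w, y_w, rcond=None)
    return coef[0], coef[1:]


# Noise-free synthetic panel, used only to check that the estimator recovers the coefficients.
rng = np.random.default_rng(0)
C, T, K = 42, 12, 3
fisi_now = rng.random((C, T))
z_now = rng.normal(size=(C, T, K))
mu, lam = rng.normal(size=(C, 1)), rng.normal(size=(1, T))  # country and year effects
beta_true = np.array([0.02, -0.01, 0.03])
fisi_next = 0.6 * fisi_now + z_now @ beta_true + mu + lam
rho_hat, beta_hat = fit_fe_ldv(fisi_next, fisi_now, z_now)
print(round(rho_hat, 6), np.round(beta_hat, 6))
```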

The dataset is split temporally: 2005–2017 for training (546 observations), 2018–2019 for validation (84), and 2020–2023 for testing (168), so evaluation is strictly out-of-sample and covers the pandemic shock. ARIMA is fitted per country. Performance is reported using MAE, RMSE, MAPE, and $R^{2}$. Statistical significance is assessed with the Diebold–Mariano (DM) test [38] using the Harvey–Leybourne–Newbold small-sample correction: DM statistics are computed country by country on the 2020–2023 loss-differential sequences and combined across the 42 countries with Stouffer’s Z-score method, yielding a single panel-level test. Because the composite FISI is bounded away from zero in the sample (minimum 0.127; Table 2), MAPE is well defined and is not dominated by near-zero denominators. All experiments use an NVIDIA A100 40 GB GPU; results are means over five random seeds, with seed-to-seed standard deviations below 0.0015 for all neural models (seed-level results in Appendix F). Figure 3 shows the temporal evolution of FISI and its pillars by income group.
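The panel-level test can be sketched with the standard library and NumPy. Under three assumptions not stated in the text, namely absolute-error loss, a one-sided alternative (KE-Mamba more accurate), and four test years per country, each country's HLN-corrected statistic is referred to a Student t distribution with 3 degrees of freedom (which has a closed-form CDF), converted to a standard normal score, and combined with Stouffer's method, Z = Σz_i/√k. The error arrays are synthetic.

```python
import math
from statistics import NormalDist

import numpy as np


def t3_cdf(x):
    """CDF of Student's t distribution with 3 degrees of freedom (closed form)."""
    theta = math.atan(x / math.sqrt(3.0))
    return 0.5 + (theta + math.sin(theta) * math.cos(theta)) / math.pi


def hln_dm(loss_base, loss_model):
    """HLN-corrected Diebold-Mariano statistic for one-step-ahead forecasts (h = 1).

    Positive values mean the model has lower loss than the baseline.
    """
    d = np.asarray(loss_base, float) - np.asarray(loss_model, float)
    T = d.size
    gamma0 = np.mean((d - d.mean()) ** 2)  # with h = 1 no autocovariance terms are added
    dm = d.mean() / math.sqrt(gamma0 / T)
    return dm * math.sqrt((T - 1) / T)     # HLN factor sqrt((T + 1 - 2h + h(h - 1)/T) / T), h = 1


def stouffer_panel_test(err_base, err_model):
    """Combine per-country HLN-DM statistics with Stouffer's Z-score method.

    err_base, err_model: (n_countries, 4) forecast errors for 2020-2023.
    Returns the combined Z and a one-sided p-value for 'model more accurate'.
    """
    normal = NormalDist()
    z_scores = []
    for eb, em in zip(err_base, err_model):
        if len(eb) != 4:
            raise ValueError("t3_cdf assumes four test years (T - 1 = 3 df)")
        stat = hln_dm(np.abs(eb), np.abs(em))                # absolute-error loss (assumed)
        p_left = min(max(t3_cdf(stat), 1e-12), 1.0 - 1e-12)  # keep inv_cdf finite
        z_scores.append(normal.inv_cdf(p_left))
    z_comb = sum(z_scores) / math.sqrt(len(z_scores))
    return z_comb, 1.0 - normal.cdf(z_comb)


rng = np.random.default_rng(0)
err_base = rng.normal(scale=0.06, size=(42, 4))
err_model = 0.6 * err_base         # synthetic case: the model is uniformly more accurate
z_comb, p_value = stouffer_panel_test(err_base, err_model)
print(z_comb > 0, p_value < 0.01)  # True True
```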

All models forecast the composite FISI and its three pillar scores one year ahead. For each forecast origin $t$, the neural models use the realized feature sequence from the five-year lookback window ending at $t$ and produce $\hat{y}_{c,t+1}$ in a single step. Predictions are not fed back recursively as inputs, so every validation and test forecast conditions on observed lagged features rather than model-generated values, giving the neural, tree-based, and FE-LDV baselines an identical information set. This is therefore an annual ex-post one-step-ahead forecasting protocol rather than a start-of-year real-time nowcasting exercise. No target-year predictors enter the forecasting input: the dynamic input sequence contains only information available up to the forecast origin, and narrative scores for a country-year are computed solely from titles released in that country-year, so they enter the model only when that year lies in the lagged input window and never when it is the prediction target. Within-year min–max normalization defines the FISI labels and component indicators from the same-year country cross-section, while the macroeconomic covariates are standardized using training-window statistics only, so no future information leaks into the predictors.
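The one-step-ahead protocol can be sketched as a window builder over a (country, year, feature) array. How the first years without a full five-year history are handled is not stated; this sketch simply skips those origins, which is an assumption. The feature and target arrays are synthetic.

```python
import numpy as np


def one_step_windows(features, targets, years, lookback=5):
    """Pair each five-year input window with the next year's targets.

    features: (C, Y, F) realized country-year inputs; targets: (C, Y, 4) FISI and
    pillar labels; years: (Y,) calendar years. For origin index o, inputs cover
    indices o-lookback+1..o and the target is index o+1, so no target-year input enters.
    """
    X, y, target_year = [], [], []
    for o in range(lookback - 1, len(years) - 1):
        X.append(features[:, o - lookback + 1 : o + 1])  # (C, lookback, F)
        y.append(targets[:, o + 1])                       # (C, 4)
        target_year.append(np.full(features.shape[0], years[o + 1]))
    return np.concatenate(X), np.concatenate(y), np.concatenate(target_year)


years = np.arange(2005, 2024)
rng = np.random.default_rng(0)
features = rng.normal(size=(42, years.size, 34))
targets = rng.random((42, years.size, 4))
X, y, target_year = one_step_windows(features, targets, years)
is_test = target_year >= 2020  # 2020-2023 targets
print(X[is_test].shape)        # (168, 5, 34)
```

The first target year in the sketch is therefore 2010, and every input window ends one year before its target.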

6. Experimental Results

6.1. Overall Forecasting Performance

Table 6 reports the out-of-sample performance on the primary 2020–2023 test period (Panel A) and on a pre-pandemic 2018–2019 split (Panel B). For the latter, all models are retrained with 2005–2015 used for training, 2016–2017 for validation, and 2018–2019 for testing (n = 84 country-year observations), so that the model ranking can be evaluated outside the high-volatility pandemic regime. On Panel A, KE-Mamba achieves the lowest error on all four metrics, with a composite FISI MAE of 0.0389 (MAPE 5.61%, R² = 0.934), a 54.1% reduction relative to ARIMA and a 15.6% reduction relative to the base Mamba. KE-Mamba also outperforms the two-way fixed-effects panel model with lagged dependent variables (FE-LDV; MAE 0.0585), the standard interpretable econometric benchmark; this 33.5% MAE reduction is consistent with a gain from incorporating nonlinear temporal dynamics, knowledge graph priors, and narrative features. The DM test rejects equal predictive accuracy between KE-Mamba and every baseline at the 1% level (base Mamba: p = 0.004; others: p < 0.001). On the pre-pandemic split (Panel B), all models produce lower absolute errors than on 2020–2023, as expected given the absence of pandemic-driven structural breaks, but the relative ranking is preserved: KE-Mamba retains the lowest MAE (0.0314, R² = 0.952) and outperforms the base Mamba by 14.4% and the FE-LDV panel by 28.0%, indicating that the advantage is not specific to the pandemic regime.
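For reference, the four evaluation metrics and the relative MAE reductions quoted above can be computed as in the sketch below; the function names are illustrative, and the base-Mamba MAE of 0.0461 is taken from Appendix F.

```python
import numpy as np


def forecast_metrics(y_true, y_pred):
    """MAE, RMSE, MAPE (%) and R^2 for one set of forecasts."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    # Well defined here because the composite FISI stays away from zero.
    mape = 100.0 * np.mean(np.abs(err) / np.abs(y_true))
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2}


def relative_reduction(mae_baseline, mae_model):
    """Fractional MAE reduction of a model relative to a baseline."""
    return (mae_baseline - mae_model) / mae_baseline


print(f"vs. base Mamba: {relative_reduction(0.0461, 0.0389):.1%}")  # -> 15.6%
print(f"vs. FE-LDV:     {relative_reduction(0.0585, 0.0389):.1%}")  # -> 33.5%
```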

Table 7 breaks down KE-Mamba results by pillar. The SDG Alignment Pillar is the easiest to forecast (R² = 0.941), as its indicators are structural and slow-moving, while the Economic Resilience Pillar is the hardest (MAPE 6.87%, R² = 0.912) because the 2020–2021 pandemic shock disrupted historical economic patterns more than cultural or institutional ones. The year-wise error profile is consistent with this explanation: ERP MAPE peaks at 8.92% in 2020 and declines to 5.18% by 2023 as cinema closures, delayed releases, and digital substitution stabilize [1,2,36], whereas SAP MAPE remains in a 4.76–5.91% band across all four test years because institutional indicators move more slowly and are partly buffered against macroeconomic shocks by statutory funding schemes, employment programs, and infrastructure investment policies [6,7]. Figure 4 plots actual versus predicted trajectories for six representative countries, showing that the model tracks both the pre-pandemic trend and the 2020 downturn.

6.2. Knowledge Graph Embedding Quality and Alternative Models

The RotatE embeddings used in KE-Mamba achieve a macro-average MRR of 0.387 and Hits@10 of 0.524 on the held-out link-prediction test set. Per-relation MRR is higher on the regular many-to-one relations BELONGS_TO (0.528) and PRODUCES (0.441), moderate on OPERATES_IN (0.404) and FUNDS (0.356), and lowest on the sparser long-tail relations CO_PRODUCED_BY (0.333), EXPORTS_TO (0.317), and DIRECTED_BY (0.305), which is consistent with the greater difficulty of completing sparse person-level and bilateral-export edges in cultural-industry graphs. Although the overall MRR is moderate, the downstream FISI forecasting benefit of the KG stream is substantial: removing KG embeddings increases MAE by 18.5%, suggesting that the embeddings’ value lies in encoding a country’s structural position rather than in achieving perfect graph completion.
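One common way to compute these link-prediction metrics is sketched below, assuming 1-based (filtered) ranks of the true entity for each test triple; the ranking protocol and helper names are assumptions. A macro-average weights every relation equally, whereas a micro-average pools all test triples, so the two can differ when relations have very different test-set sizes.

```python
import numpy as np


def link_prediction_scores(ranks, k=10):
    """MRR and Hits@k from 1-based ranks of the true entity."""
    ranks = np.asarray(ranks, dtype=float)
    return float(np.mean(1.0 / ranks)), float(np.mean(ranks <= k))


def macro_average(ranks_by_relation, k=10):
    """Unweighted mean over relations of per-relation MRR and Hits@k."""
    per_relation = {rel: link_prediction_scores(r, k) for rel, r in ranks_by_relation.items()}
    macro_mrr = float(np.mean([mrr for mrr, _ in per_relation.values()]))
    macro_hits = float(np.mean([hits for _, hits in per_relation.values()]))
    return per_relation, macro_mrr, macro_hits


def micro_average(ranks_by_relation, k=10):
    """MRR and Hits@k pooled over all test triples regardless of relation."""
    pooled = np.concatenate([np.asarray(r, dtype=float) for r in ranks_by_relation.values()])
    return link_prediction_scores(pooled, k)
```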

Table 8 compares RotatE with TransE and ComplEx; RotatE achieves the best combination of link-prediction quality and downstream FISI forecasting accuracy and is therefore retained as the primary embedding model. Embedding-dimension sensitivity on the validation and test sets is reported jointly in Table 9. The link-prediction loss continues to decrease from 64 to 256 dimensions, but the downstream validation MAE reaches its minimum at 128 dimensions (0.0397) and rises slightly at 256 (0.0405), even though the test MAE remains comparable; in other words, the 256-dimensional model fits the KG link structure better but shows an early sign of overfitting on the FISI validation set. Selecting 128 dimensions therefore satisfies both the link-prediction quality criterion and the validation-loss minimum criterion.
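RotatE represents each relation as an element-wise rotation in complex space, so a triple (h, r, t) is plausible when h ∘ r ≈ t with every |r_j| = 1. The sketch below scores a single triple; taking the distance as the sum of element-wise complex moduli is an assumption about the norm, and training details such as negative sampling and the margin are omitted.

```python
import numpy as np


def rotate_distance(head, phase, tail):
    """RotatE-style distance between h o r and t, with r_j = exp(i * phase_j).

    head, tail: complex entity vectors; phase: real rotation angles, one per
    complex dimension, so every |r_j| = 1. Lower distance = more plausible triple.
    The distance is taken here as the sum of element-wise complex moduli.
    """
    relation = np.exp(1j * np.asarray(phase, dtype=float))
    return float(np.sum(np.abs(np.asarray(head) * relation - np.asarray(tail))))
```

Because each r_j has unit modulus, the rotation preserves the magnitude of the head embedding and only changes its phase.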

6.3. Ablation Study

Table 10 reports an ablation study in which components are sequentially removed from the full KE-Mamba.

6.4. Feature Importance Analysis

TreeSHAP is used for XGBoost and KernelSHAP for KE-Mamba [39]. KernelSHAP is computed on the test-set predictions using a background set of 100 k-means-summarized training instances and 2048 sampled coalitions per explained instance. Grouped SHAP values are obtained by summing absolute attributions within the 128-dimensional KG block and within the 8-dimensional narrative block, while the 34 explicit dynamic input features are also attributed individually. Table 11 lists the 10 most important of these 34 explicit features (18 FISI indicators, 8 macroeconomic covariates, and 8 narrative scores) by mean absolute SHAP value for KE-Mamba. Following the grouped-SHAP convention, the 128 KG embedding dimensions are reported as a single grouped “KG-embedding” contribution, which accounts for 34.1% of total SHAP mass; this is consistent with the ablation finding that removing the KG stream produces the largest accuracy degradation. Linear probes from the country embeddings to per-relation exposure summaries further decompose this grouped KG-SHAP mass: PRODUCES, CO_PRODUCED_BY, and EXPORTS_TO jointly account for 69.2% of the KG-SHAP contribution (29.9%, 22.0%, and 17.3% within the KG group, respectively), FUNDS (12.3%) and OPERATES_IN (8.8%) play a secondary role, and DIRECTED_BY (6.7%) and BELONGS_TO (2.9%) are the smallest; Figure 5e shows the relation-level breakdown. This pattern is consistent with the substantive interpretation that domestic production capacity, international co-production networks, and cross-border export reach are the primary structural determinants of film-industry sustainability captured by the knowledge graph. Figure 5 shows the SHAP summary and dependence patterns.
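The grouped-SHAP convention amounts to summing mean absolute attributions within each block and dividing by the total mass. A minimal sketch, assuming the per-instance SHAP matrix has already been produced (for example by KernelSHAP); the column indices and group names are hypothetical, and columns assigned to a group are not listed individually here.

```python
import numpy as np


def grouped_shap_shares(shap_values, groups):
    """Share of total mean |SHAP| mass per group of columns.

    shap_values: (n_instances, n_features) attribution matrix.
    groups: {group name: list of column indices}; columns outside every group
    are reported individually as "feature_<index>".
    """
    mean_abs = np.abs(np.asarray(shap_values, dtype=float)).mean(axis=0)
    total = mean_abs.sum()
    shares, grouped = {}, set()
    for name, cols in groups.items():
        shares[name] = float(mean_abs[list(cols)].sum() / total)
        grouped.update(cols)
    for j in range(mean_abs.size):
        if j not in grouped:
            shares[f"feature_{j}"] = float(mean_abs[j] / total)
    return shares
```

For KE-Mamba, the KG block would be passed as one group of 128 column indices, and its share corresponds to the 34.1% figure reported above.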

6.5. FISI Aggregation and Indicator Robustness

Because the composite FISI involves choices in indicator normalization and pillar aggregation, Table 12 reports the robustness of the headline forecast to five alternative constructions. Across schemes, the resulting indices remain highly correlated with the baseline geometric mean (Pearson ≥ 0.964, Kendall τ ≥ 0.913), and the KE-Mamba test MAE varies only between 0.0389 and 0.0402, a spread of 0.0013 that is smaller than the 0.0015 upper bound on seed-to-seed standard deviation reported for the neural models. Two further perturbations support the same conclusion: dropping one indicator from each pair with within-pillar r > 0.80 leaves the test MAE at 0.0401 (correlation 0.955), and shrinkage-weighted indicators that penalize within-pillar correlation give 0.0397 (correlation 0.962), so the headline forecast is not sensitive to any single weighting scheme or to mechanical redundancy among constituent indicators.
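The correlation diagnostics behind Table 12 can be sketched as follows, assuming equal pillar weights (the baseline weights are not restated in this section) and illustrative helper names. Kendall’s τ is computed here as τ-b from pairwise sign agreement, which is quadratic in the number of countries but trivial for 42.

```python
import numpy as np


def _weights(n_pillars, weights):
    return np.full(n_pillars, 1.0 / n_pillars) if weights is None else np.asarray(weights, dtype=float)


def geometric_fisi(pillars, weights=None):
    """Weighted geometric mean of pillar scores (rows = countries)."""
    p = np.asarray(pillars, dtype=float)
    return np.exp(np.log(p) @ _weights(p.shape[1], weights))


def arithmetic_fisi(pillars, weights=None):
    """Weighted arithmetic mean of pillar scores (rows = countries)."""
    p = np.asarray(pillars, dtype=float)
    return p @ _weights(p.shape[1], weights)


def kendall_tau_b(x, y):
    """Kendall's tau-b from the signs of all pairwise differences."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    sx = np.sign(x[:, None] - x[None, :])
    sy = np.sign(y[:, None] - y[None, :])
    return float((sx * sy).sum() / np.sqrt((sx ** 2).sum() * (sy ** 2).sum()))


def pearson(x, y):
    return float(np.corrcoef(x, y)[0, 1])
```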

6.6. Country Clustering and Sustainability Archetypes

Spectral clustering is applied to the 42-country panel, using mean pillar scores together with annualized 2005–2023 pillar growth rates as input features rather than the composite FISI alone. Because growth trajectory and structural composition carry as much weight as absolute level, small high-income markets with saturated growth (e.g., Ireland, Austria, New Zealand) can appear closer to the Emerging–Dynamic centroid than to the Mature–Diverse one, as their pillar-growth profiles resemble those of catching-up economies. A four-cluster solution is selected on the basis of both the silhouette score (0.41) and the gap statistic (peak at k = 4); clustering uses a k-nearest-neighbor affinity graph with 7 neighbors and is stable under bootstrap resampling, with 38 of 42 countries retaining their assignment in ≥95% of 500 replicates. Table 13 characterizes the four archetypes, Figure 6 shows the spectral projection, and the full country listing is in Appendix A.
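A compact NumPy sketch of normalized spectral clustering on a k-nearest-neighbor affinity graph follows. The symmetrized connectivity graph, the normalized Laplacian, and the deterministic k-means initialization are assumptions rather than the authors’ exact settings; in practice an established implementation such as scikit-learn’s `SpectralClustering` with `affinity="nearest_neighbors"` would normally be preferred.

```python
import numpy as np


def knn_affinity(X, n_neighbors=7):
    """Symmetrized k-NN connectivity graph W = 0.5 * (A + A^T)."""
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(sq_dist, np.inf)  # a point is not its own neighbor
    nearest = np.argsort(sq_dist, axis=1)[:, :n_neighbors]
    A = np.zeros(sq_dist.shape)
    rows = np.repeat(np.arange(len(X)), n_neighbors)
    A[rows, nearest.ravel()] = 1.0
    return 0.5 * (A + A.T)


def spectral_clusters(X, n_clusters=4, n_neighbors=7, n_iter=100):
    """Normalized spectral clustering, then k-means on the row-normalized embedding."""
    W = knn_affinity(np.asarray(X, dtype=float), n_neighbors)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    U = eigvecs[:, :n_clusters].copy()
    U /= np.linalg.norm(U, axis=1, keepdims=True)
    # Deterministic farthest-point initialization of the k-means centers.
    centers = [U[0]]
    for _ in range(1, n_clusters):
        dist = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    # Lloyd iterations (an empty cluster would yield NaN centers; not handled here).
    for _ in range(n_iter):
        labels = np.argmin(((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1), axis=1)
        new_centers = np.array([U[labels == k].mean(axis=0) for k in range(n_clusters)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels
```

Here the rows of `X` would hold each country’s mean pillar scores and annualized pillar growth rates; model selection (silhouette, gap statistic) and the bootstrap stability check are not shown.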

6.7. Associative Policy Stress Tests

These simulations are not causal estimates. They are model-based policy stress tests that perturb one feature at a time while holding the other 2023 features fixed, thereby measuring the KE-Mamba model’s learned associative sensitivity. The results should be interpreted as projected associations useful for scenario prioritization—not as evidence that the same change would be produced by an implemented policy in the absence of further causal identification.

Using the trained KE-Mamba, specified feature increases are applied across all 42 countries, and the projected ΔFISI relative to baseline is recorded while all other features are held at their observed 2023 values. Table 14 reports the mean ΔFISI by intervention and cluster.
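The one-at-a-time perturbation can be sketched as below. Here `predict` is a stand-in for the trained KE-Mamba applied to a fixed 2023 feature matrix (the real model consumes a five-year input sequence), and the feature index, perturbation size, and cluster labels are hypothetical; the output is an associative sensitivity, not a causal effect.

```python
import numpy as np


def stress_test(predict, X_base, feature_idx, delta, clusters):
    """Mean projected change in FISI per cluster after shifting one feature.

    predict: callable mapping an (n_countries, n_features) array to projected FISI.
    X_base:  observed baseline features; every other column is held fixed.
    """
    X_base = np.asarray(X_base, dtype=float)
    X_shift = X_base.copy()
    X_shift[:, feature_idx] += delta  # e.g. delta = 0.10 for ten points on a 0-1 share
    change = predict(X_shift) - predict(X_base)
    clusters = np.asarray(clusters)
    return {c.item(): float(change[clusters == c].mean()) for c in np.unique(clusters)}
```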

Note on placebo validation: As a minimum diagnostic against spurious patterns, a placebo test randomly reassigns intervention features within clusters 1000 times, keeping the intervention magnitude fixed, and compares the observed projected ΔFISI with the resulting placebo distribution; Figure 2d visualizes the five observed effects against their placebo distributions. The observed projected changes lie above the 95th percentile of every placebo distribution (digital distribution +0.0297 vs. placebo mean 0.0031, 99.6th pct., p < 0.001; female director share +0.0131, p = 0.012; genre diversity +0.0164, p = 0.004; co-production rate +0.0127, p = 0.018; screen density +0.0118, p = 0.025), suggesting that the patterns are not merely artifacts of random reassignment, while still not constituting causal identification.
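Under one plausible reading of “reassigns intervention features within clusters” (shuffling a feature’s values across countries inside each cluster), the placebo machinery reduces to the two helpers sketched below; the helper names and the +1-corrected one-sided p-value convention are assumptions. With 1000 placebo draws, the smallest attainable p-value under this convention is 1/1001, just under 0.001.

```python
import numpy as np


def permute_within_clusters(values, clusters, rng):
    """Shuffle a feature's values across countries, but only inside each cluster."""
    values = np.asarray(values, dtype=float)
    clusters = np.asarray(clusters)
    out = values.copy()
    for c in np.unique(clusters):
        idx = np.flatnonzero(clusters == c)
        out[idx] = values[rng.permutation(idx)]
    return out


def empirical_p_value(observed, placebo):
    """One-sided permutation p-value with the +1 correction."""
    placebo = np.asarray(placebo, dtype=float)
    return (1 + np.sum(placebo >= observed)) / (1 + placebo.size)
```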

Across all clusters, a ten-percentage-point increase in digital distribution yields the largest projected FISI association; this association is strongest in Developing–Fragile markets (+0.041), where physical exhibition infrastructure is most constrained. Screen density expansion shows a similarly asymmetric pattern, with modest projected associations in Mature–Diverse markets and substantial projected improvements in Developing–Fragile economies. Figure 7 visualizes these results, and Figure 8 shows that the SHAP ordering in Table 11 is stable across 2018–2023 rolling evaluation windows rather than an artifact of a single year.

7. Discussion and Policy Implications

The ablation in Table 10 decomposes the KE-Mamba gain over the baselines. Relative to the full model (MAE 0.0389), removing the knowledge embeddings raises MAE by 18.5% (to 0.0461), replacing the gated KG fusion layer with plain concatenation raises it by 9.8% (to 0.0427), removing the LLM narrative features raises it by 4.9% (to 0.0408), and using the raw LLM scores without the knowledge editing step raises it by 2.6% (to 0.0399). The last two effects are smaller than the knowledge graph contribution but stable across seeds, indicating that micro-level narrative signals carry information complementary to the macro panel indicators and that anchoring the LLM to verified film records measurably improves their reliability. The three mechanisms (structural KG embeddings, screenplay-oriented narrative proxies, and knowledge editing) combine to deliver the full accuracy, which is consistent with evidence from knowledge-enhanced macroeconomic forecasting [18].
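Because every percentage in this paragraph is expressed relative to the full model’s MAE of 0.0389 rather than to the preceding ablation row, the figures can be reproduced directly from the quoted MAEs; the dictionary labels below are descriptive only.

```python
full_mae = 0.0389  # full KE-Mamba, 2020-2023 test set
ablated_mae = {
    "raw LLM scores (no knowledge editing)": 0.0399,
    "no LLM narrative features": 0.0408,
    "concatenation instead of gated KG fusion": 0.0427,
    "no KG embeddings": 0.0461,
}
# Each increase is measured against the full model, not against the previous row.
increase_pct = {name: 100.0 * (mae - full_mae) / full_mae for name, mae in ablated_mae.items()}
for name, pct in increase_pct.items():
    print(f"{name}: +{pct:.1f}% MAE")
```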

The SHAP analysis points to three policy-relevant variables. Domestic market share ranks first, which is consistent with the long-standing emphasis in cultural economics on maintaining a viable local production base [3,40]. The digital distribution ratio ranks second, above genre diversity and GDP per capita, suggesting that the shift to digital channels is not merely a commercial trend but a structural determinant of sustainability; its positive interaction with lower GDP per capita suggests that digital channels can partially bypass the capital constraints that limit physical exhibition in lower-income countries.

The four-cluster typology supports differentiated policy approaches. In Mature–Diverse markets (e.g., France, South Korea, the UK), the associative stress tests suggest limited returns from broad interventions, so targeted efforts on gender representation and defense of domestic share against global streaming platforms are more relevant. Emerging–Dynamic markets (e.g., China, India, Italy, Spain) stand to gain substantially from accelerating digital distribution while investing in genre diversification. In State-Regulated markets, the Cultural Diversity Pillar lags, pointing to content and distribution regulations as a likely binding constraint. Developing–Fragile markets face a compounding problem in which low economic resilience limits investment in cultural diversity and access, and international cooperation may be needed to break this cycle. The nonlinear (inverted-U) relationship between co-production rate and sustainability also suggests that co-production incentive programs should include graduated support that rewards collaboration without creating dependence on foreign partners.

At the country-year level, narrative dimensions should not be interpreted as judgments of national artistic value. They are aggregate proxies for the kinds of films that enter internationally visible metadata channels. For example, a high cultural-specificity score indicates that retained titles from a country-year contain more localized setting, social context, or culturally specific plot elements in their synopses; it does not imply that the entire national cinema is culturally specific or that non-retained domestic titles lack such features. Similarly, thematic depth captures the density of stated social, moral, or political themes in available synopses. These variables are therefore used as weak content signals for forecasting rather than as normative rankings of film quality.

Residual hallucination risk remains even after factual editing. The editing protocol corrects verifiable title-level attributes such as country, year, and genre, but qualitative dimensions such as thematic depth, cultural specificity, and projected audience reach are evaluative rather than factual. They may therefore retain systematic LLM biases, including Western-centric assumptions about narrative structure, genre coherence, and audience reach. The human-validation results reduce but do not eliminate this concern. For this reason, the narrative features are treated as weak aggregate signals and are interpreted together with coverage diagnostics, cross-model agreement, and human-rating correlations.

Ethically, the framework should not be used to rank the cultural worth of national cinemas. A low narrative score may reflect sparse English metadata, limited international distribution, or LLM training-data imbalance rather than lower artistic quality. Future deployments should incorporate multilingual synopses, local-language models, culturally diverse expert panels, and uncertainty intervals for narrative scores—especially in developing or non-English markets.

The full pipeline is more complex than a conventional panel model because it requires KG construction, LLM scoring, and model editing, but the most expensive steps are one-off or annual preprocessing tasks. The dominant cost is the LLM narrative scoring of the 12,438-title sample (5.6 GPU h one-off, 18–35 min annual incremental); KG construction (7 CPU min) and RotatE embedding training (16 GPU min) are negligible by comparison, and knowledge editing of the 4127 failed-probe titles adds only 0.9 GPU h. Once features are cached, KE-Mamba training takes 53 s, and inference for all 42 countries takes 0.19 s on a single A100, making the framework practical for annual dashboards. For policy institutions with limited computational resources, a simplified deployment version without LLM scoring and KG retraining runs end-to-end in under 10 s on a central processing unit (CPU) at the cost of higher MAE (0.0461 without KG; see Table 10).

Several limitations should be acknowledged. The scenario analysis is based on observational correlations learned by the forecasting model. It cannot separate policy-induced variation from selection effects, omitted institutions, or reverse causality. The placebo test only checks whether the learned sensitivity pattern is stronger than random reassignment; it does not establish identification. Stronger causal claims would require instruments, staggered policy variation, natural experiments, or synthetic-control designs around clearly dated policy changes. The FISI involves subjective choices in indicator selection and weighting; sensitivity analyses with alternative aggregation schemes (arithmetic, PCA, Delphi, winsorized z-score) are reported in Section 6 (Table 12) and indicate that the headline forecast is robust to these choices. The 42-country sample excludes smaller markets without consistent time series, and linear interpolation for short within-country gaps may understate short-run volatility. Four limitations concern the LLM narrative stream specifically: (i) the pipeline relies on English-language TMDb synopses, biasing scores toward internationally distributed titles rather than the domestic long tail; (ii) residual hallucinations remain possible for very recent releases absent from the base model’s pre-training corpus, and the editing protocol touches only the 4127 titles that failed the factual probe; (iii) scores on dimensions such as “dialogue quality” are inferred from synopses rather than full scripts and should be read as coarse proxies rather than substitutes for expert reader judgments; and (iv) the knowledge editing step delivers a modest 2.6% MAE reduction, so it is treated as a cheap safeguard rather than an essential component—one that plausibly becomes more valuable as the sample extends into years less covered by the base model’s parametric knowledge.

Future work should extend the narrative stream beyond synopses by incorporating trailers, posters, audience reviews, festival selections, and streaming-platform engagement data. Multimodal signals could improve the measurement of audience reach and cultural specificity, while real-time platform data could support nowcasting rather than annual retrospective forecasting.

8. Conclusions

This paper introduced the Film Industry Sustainability Index, a composite of 18 indicators spanning cultural diversity, economic resilience, and SDG alignment, and applied it to a balanced panel of 42 countries over 2005–2023. The Screenplay-Aware Knowledge-Enhanced Mamba architecture, integrating knowledge graph embeddings, LLM-derived screenplay-oriented narrative proxies, and a knowledge editing step that anchors the LLM to verified UNESCO and LUMIERE records, achieved the best forecast accuracy among all tested models (R² = 0.934 on 2020–2023). The ablation shows that both the narrative stream and the editing step contribute measurable, if modest, accuracy gains, supporting the value of coupling macro panel signals with micro-level narrative signals. Domestic market share, digital distribution ratio, and genre diversity emerged as the three strongest explicit predictors, jointly accounting for roughly 31% of the total SHAP mass, with the grouped KG embedding contributing a further 34%. The four-cluster typology and associative policy stress tests translate the forecasting results into differentiated policy hypotheses that can guide further institutional analysis, but they should not be interpreted as causal policy effects. Future work should combine the proposed forecasting framework with multilingual and multimodal cultural data, real-time platform indicators, and causal-inference designs such as instrumental variables, staggered policy evaluation, or synthetic controls.

Author Contributions

Conceptualization, P.Q. and W.Z.; methodology, P.Q. and W.Z.; software, P.Q. and W.Z.; validation, P.Q. and W.Z.; formal analysis, P.Q.; investigation, P.Q. and W.Z.; data curation, P.Q. and W.Z.; writing—original draft preparation, P.Q. and W.Z.; writing—review and editing, P.Q. and W.Z.; visualization, P.Q. and W.Z.; supervision, W.Z.; project administration, W.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The analysis draws on public statistical sources and third-party industry datasets documented in Appendix E. The principal public sources include the UNESCO Institute for Statistics Feature Films and Cinema Database (https://data.uis.unesco.org, accessed on 1 April 2026), the European Audiovisual Observatory LUMIERE film-admissions database (https://lumiere.obs.coe.int, accessed on 1 April 2026), the World Bank World Development Indicators (https://databank.worldbank.org/source/world-development-indicators, accessed on 1 April 2026), the OECD Culture and Creative Economy database (https://www.oecd.org/en/topics/culture-creative-industries-and-sports.html, accessed on 1 April 2026), and the UNDP Human Development Report (https://hdr.undp.org, accessed on 1 April 2026); title-level metadata are obtained from The Movie Database (https://www.themoviedb.org, accessed on 1 April 2026). Additional third-party industry datasets used for selected indicators—including MPA THEME, the PwC Global Entertainment and Media Outlook, McKinsey reports, Omdia, Ampere Analysis, national box-office trackers, ILO-STAT, and IMDb—are accessed under their respective licensing terms and are listed alongside the corresponding indicators in Appendix E. The 18 FISI component indicators are reconstructed from these sources using the rules in Appendix E; the screenplay scoring prompt and rubric anchors are reported in Appendix D; hyperparameter search ranges and seed-level results are reported in Appendix C and Appendix F. The source code for the KE-Mamba model, the FISI construction pipeline, the knowledge-graph extraction, and the LLM scoring and editing modules is available from the corresponding author upon reasonable request subject to third-party licensing terms on the commercial and metadata-content sources.

Acknowledgments

The authors acknowledge the data providers: the UNESCO Institute for Statistics, European Audiovisual Observatory, World Bank, and OECD, for making their film industry and macroeconomic data publicly accessible.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AI	Artificial Intelligence
ARIMA	Autoregressive Integrated Moving Average
CDP	Cultural Diversity Pillar
DDR	Digital Distribution Ratio
DM	Diebold–Mariano
EAO	European Audiovisual Observatory
ERP	Economic Resilience Pillar
FISI	Film Industry Sustainability Index
GDI	Genre Diversity Index
GELU	Gaussian Error Linear Unit
GPU	Graphics Processing Unit
HDR	Human Development Report
HHI	Herfindahl–Hirschman Index
KE	Knowledge Editing/Knowledge-Enhanced
KG	Knowledge Graph
LLM	Large Language Model
LSTM	Long Short-Term Memory
LUMIERE	European Audiovisual Observatory film-admissions database
MAE	Mean Absolute Error
MAPE	Mean Absolute Percentage Error
MEMIT	Mass-Editing Memory in a Transformer
ML	Machine Learning
MLP	Multilayer Perceptron
OECD	Organisation for Economic Co-operation and Development
RMSE	Root Mean Squared Error
ROME	Rank-One Model Editing
RotatE	Relation-as-Rotation Knowledge Graph Embedding
SAP	SDG Alignment Pillar
SDG	Sustainable Development Goal
SHAP	SHapley Additive exPlanations
SSM	State-Space Model
TMDb	The Movie Database
UNESCO	United Nations Educational, Scientific and Cultural Organization
UIS	UNESCO Institute for Statistics
WDI	World Development Indicators

Appendix A. Country List and Cluster Assignments

Table A1. Complete list of 42 countries with cluster assignments and 2023 FISI scores.

Country	Cluster	FISI (2023)	CDP	ERP	SAP
France	Mature–Diverse	0.78	0.82	0.71	0.76
South Korea	Mature–Diverse	0.76	0.79	0.73	0.74
United Kingdom	Mature–Diverse	0.75	0.76	0.72	0.75
Germany	Mature–Diverse	0.74	0.77	0.69	0.73
Japan	Mature–Diverse	0.73	0.74	0.71	0.72
Sweden	Mature–Diverse	0.72	0.75	0.68	0.71
Denmark	Mature–Diverse	0.71	0.73	0.67	0.72
Canada	Mature–Diverse	0.71	0.72	0.69	0.70
United States	Mature–Diverse	0.71	0.68	0.74	0.69
Australia	Mature–Diverse	0.70	0.71	0.68	0.69
Norway	Mature–Diverse	0.70	0.73	0.66	0.70
Netherlands	Mature–Diverse	0.69	0.72	0.65	0.68
Belgium	Mature–Diverse	0.66	0.68	0.63	0.64
Switzerland	Mature–Diverse	0.65	0.67	0.63	0.62
China	Emerging–Dynamic	0.58	0.61	0.57	0.52
India	Emerging–Dynamic	0.56	0.63	0.48	0.51
Spain	Emerging–Dynamic	0.54	0.57	0.51	0.50
Italy	Emerging–Dynamic	0.53	0.56	0.49	0.51
Poland	Emerging–Dynamic	0.52	0.54	0.50	0.49
Czech Republic	Emerging–Dynamic	0.51	0.53	0.49	0.48
Hungary	Emerging–Dynamic	0.50	0.52	0.48	0.47
Turkey	Emerging–Dynamic	0.49	0.52	0.47	0.44
Ireland	Emerging–Dynamic	0.48	0.51	0.44	0.46
Argentina	Emerging–Dynamic	0.48	0.51	0.44	0.45
New Zealand	Emerging–Dynamic	0.47	0.50	0.43	0.45
Austria	Emerging–Dynamic	0.47	0.49	0.44	0.45
Brazil	Emerging–Dynamic	0.47	0.49	0.44	0.46
Mexico	Emerging–Dynamic	0.46	0.48	0.43	0.44
Chile	Emerging–Dynamic	0.45	0.47	0.42	0.43
Russia	State-Regulated	0.48	0.42	0.52	0.47
Israel	State-Regulated	0.51	0.48	0.52	0.50
Singapore	State-Regulated	0.50	0.44	0.54	0.49
UAE	State-Regulated	0.47	0.41	0.53	0.44
Thailand	State-Regulated	0.43	0.39	0.46	0.42
Malaysia	State-Regulated	0.42	0.38	0.45	0.41
Indonesia	State-Regulated	0.41	0.37	0.44	0.40
Egypt	State-Regulated	0.38	0.34	0.41	0.37
Romania	Developing–Fragile	0.35	0.37	0.31	0.34
Colombia	Developing–Fragile	0.33	0.36	0.28	0.32
South Africa	Developing–Fragile	0.32	0.35	0.27	0.31
Nigeria	Developing–Fragile	0.29	0.32	0.22	0.28
Philippines	Developing–Fragile	0.28	0.31	0.23	0.27

Appendix B. Countries Excluded from the Balanced Panel

Seven candidate countries are excluded from the final balanced panel because at least one required FISI component series contains a continuous gap exceeding three years. Table A2 reports each excluded country, the longest continuous gap (in years), and the affected indicators.

Table A2. Candidate countries excluded from the balanced panel.

Excluded Country	Main Reason for Exclusion	Longest Gap (Years)	Affected Indicators
Greece	Incomplete public funding and regional screen-access series	5	Public funding accessibility; screen access Gini
Portugal	Missing annual youth film-employment and audience equity data	4	Youth employment ratio; audience reach equity
Peru	Discontinuous domestic market-share and export-revenue series	6	Domestic market share; export revenue share
Morocco	Missing digital distribution and subnational access measures	5	Digital distribution ratio; screen access Gini
Pakistan	Incomplete production and box-office coverage before 2012	7	Production volume trend; box-office concentration
Vietnam	Inconsistent cinema-admission and co-production records	5	Co-production rate; audience reach equity
Ukraine	Territorial and reporting discontinuities after 2014	4	Box-office revenue; screen density; export revenue
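The exclusion rule above, together with the linear interpolation of short within-country gaps mentioned among the limitations in Section 7, can be sketched as follows. Using the same three-year threshold for interpolation is an assumption, as are the helper names.

```python
import numpy as np


def longest_gap(series):
    """Length of the longest run of consecutive missing (NaN) years."""
    longest = current = 0
    for missing in np.isnan(np.asarray(series, dtype=float)):
        current = current + 1 if missing else 0
        longest = max(longest, current)
    return longest


def interpolate_short_gaps(series, max_gap=3):
    """Linearly fill interior gaps of at most max_gap years; leave longer gaps and edges as NaN."""
    y = np.asarray(series, dtype=float).copy()
    n, i = len(y), 0
    while i < n:
        if not np.isnan(y[i]):
            i += 1
            continue
        j = i
        while j < n and np.isnan(y[j]):
            j += 1
        if i > 0 and j < n and j - i <= max_gap:
            y[i:j] = np.interp(np.arange(i, j), [i - 1, j], [y[i - 1], y[j]])
        i = j
    return y
```

Under this rule a country would be dropped when `longest_gap` exceeds 3 for any required FISI component series, which matches the gap lengths of 4 to 7 years listed in Table A2.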

Appendix C. Hyperparameter Sensitivity Analysis

This appendix reports the results of hyperparameter sensitivity analysis for the KE-Mamba model. Forecasting performance was examined along four key hyperparameters—the Mamba state dimension, the number of Mamba layers, the lookback window length, and the learning rate—by varying each in turn while holding the others at their default settings and evaluating on the validation set.
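The one-at-a-time procedure can be sketched generically. Here `evaluate` is a hypothetical callable that would train KE-Mamba with a given configuration and return validation MAE; a toy function stands in for it in the check.

```python
def one_at_a_time_sweep(evaluate, defaults, search_space):
    """Vary one hyperparameter at a time, holding the others at their defaults.

    evaluate: callable(config dict) -> validation MAE (lower is better).
    Returns {hyperparameter: [(value, validation MAE), ...]}.
    """
    results = {}
    for name, values in search_space.items():
        results[name] = []
        for value in values:
            config = dict(defaults, **{name: value})  # override a single entry
            results[name].append((value, evaluate(config)))
    return results
```

Because interactions between hyperparameters are not explored, this design is cheaper than a full grid but can miss jointly optimal settings.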

Table A3. Hyperparameter configuration for the final KE-Mamba model.

Hyperparameter	Search Space	Optimal Value
Mamba state dimension	{32, 64, 128, 256, 512}	128
Number of Mamba layers	{1, 2, 3, 4, 5}	2
Lookback window (years)	{3, 4, 5, 6, 7}	5
Learning rate	{0.01, 0.005, 0.001, 0.0005, 0.0001}	0.001
Batch size	{16, 32, 64}	32
Dropout rate	{0.0, 0.1, 0.2, 0.3}	0.1
KG embedding dimension	{64, 128, 256}	128
Gated KG fusion hidden dim	{64, 128, 256}	128
Weight decay	{0.0, 0.0001, 0.001}	0.0001
Early stopping patience	{10, 15, 20, 30}	20

Figure A1. Hyperparameter sensitivity analysis for the KE-Mamba model. (a) Sensitivity to the Mamba state dimension; (b) to the number of Mamba layers; (c) to the lookback-window length; (d) to the learning rate (validation MAE and R²); (e) validation-loss curves across training epochs for representative learning rates.


Appendix D. LLM Narrative Scoring Prompt and Rubric

The eight narrative-quality scores are obtained by prompting Llama-3-8B-Instruct (and, for the robustness check, Qwen2.5-7B-Instruct) with the template below. The model is instructed to return a JSON object with integer scores from 0 to 10 along each dimension, which are then linearly rescaled to [0, 1].

You are an experienced film development reader. Given the following film information, rate the film on a 0–10 integer scale along eight narrative-quality dimensions. Use the full scale and avoid defaulting to the middle. Return only a JSON object with keys: structure, pacing, character, theme, dialogue, genre_coherence, cultural_specificity, audience_reach.

TITLE: {title}

YEAR: {year}

COUNTRY: {country}

GENRE TAGS: {genre_tags}

SYNOPSIS: {synopsis}

Rubric anchors (abbreviated; full rubric provided below in this appendix):

0–2 = severe deficiency; 3–4 = noticeably weak; 5–6 = competent; 7–8 = strong; 9–10 = exemplary.

For each of the eight dimensions, a paragraph-length anchor description is appended to the prompt at evaluation time (structure: three-act integrity and causal chain; pacing: scene-length rhythm and dead zones; character: motivation, agency, and arc; theme: depth and consistency; dialogue: voice differentiation and subtext, inferred from synopsis and critic excerpts; genre coherence: consistency with declared genre; cultural specificity: localized detail versus generic setting; audience reach: breadth of plausible demographic appeal). Inter-run agreement across three independent samples at temperature 0.3 exceeds Krippendorff’s α = 0.81 on a 200-film calibration subset.
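The parsing, rescaling, and agreement steps can be sketched as below. The validation behavior (rejecting missing keys and out-of-range values) is an assumption about how malformed replies are handled, and the agreement function implements the interval-metric form of Krippendorff’s α for complete data, that is, every calibration film scored in every run.

```python
import json

import numpy as np

DIMENSIONS = ("structure", "pacing", "character", "theme", "dialogue",
              "genre_coherence", "cultural_specificity", "audience_reach")


def parse_scores(raw_reply):
    """Check the JSON reply and rescale each 0-10 integer score to [0, 1]."""
    scores = json.loads(raw_reply)
    rescaled = {}
    for key in DIMENSIONS:
        value = scores[key]  # raises KeyError if a dimension is missing
        if isinstance(value, bool) or not isinstance(value, int) or not 0 <= value <= 10:
            raise ValueError(f"{key}: expected an integer in 0..10, got {value!r}")
        rescaled[key] = value / 10.0
    return rescaled


def krippendorff_alpha_interval(ratings):
    """Krippendorff's alpha with the interval (squared-difference) metric.

    ratings: (n_units, n_runs) array with no missing values, e.g. one
    dimension scored for each calibration film in each sampled run.
    """
    r = np.asarray(ratings, dtype=float)
    n_units, m = r.shape
    n = n_units * m  # number of pairable values
    # Observed disagreement: squared differences between runs on the same film.
    d_obs = ((r[:, :, None] - r[:, None, :]) ** 2).sum() / (n * (m - 1))
    # Expected disagreement: squared differences between all pooled values.
    v = r.ravel()
    d_exp = ((v[:, None] - v[None, :]) ** 2).sum() / (n * (n - 1))
    return 1.0 - d_obs / d_exp
```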

Appendix E. Indicator Construction and Data Sources

Table A4. Construction rules and primary sources for the 18 FISI component indicators.

Indicator	Primary Source	Construction/Proxy Rule
Domestic market share	UNESCO UIS; national film boards	Domestic box office/total box office.
Genre diversity (Shannon)	UNESCO UIS; TMDb (genre tags)	Shannon entropy of annual genre shares, normalized by ln(12).
International co-production	LUMIERE; UNESCO UIS	Share of films with at least one foreign co-producer.
Female director share	EAO; national film boards; IMDb	Share of films with at least one credited female director.
Language diversity	UNESCO UIS; LUMIERE	Shannon entropy of original-language distribution of domestic releases.
Independent production share	LUMIERE; national registries	Share of films whose lead producer is not among the top 20 national producers by 5-year revenue.
Revenue growth stability	MPA; PwC; World Bank WDI	Inverse of the five-year rolling coefficient of variation of real box office revenue.
Screen density growth	UNESCO UIS; MPA	Five-year compound annual growth of screens per million population.
Digital distribution ratio	MPA; PwC; Omdia	Share of total film revenue from SVOD, TVOD, and digital rental/purchase.
Export revenue share	LUMIERE; UIS; national agencies	Foreign admissions revenue/total admissions revenue of domestic films.
Production volume trend	UNESCO UIS	Trend coefficient of a log-linear regression on annual feature production count over a five-year window.
Box office concentration	national box office trackers	1 − HHI on the top-25 titles’ market shares.
Film employment share	ILO-STAT; OECD Culture	ISIC 5911/5912 employment/total employment, proxied by NACE J59.1 for EU countries.
Screen access Gini (inverted)	UNESCO UIS; national censuses	$1 - G$ where G is the Gini of screens per capita across subnational regions.
Urban–rural infrastructure ratio	UNESCO UIS; World Bank WDI	Urban screens per capita divided by rural screens per capita (higher is more unequal; enters with negative direction).
Youth employment ratio in film	ILO-STAT; national LFS	Share of film-sector workers aged 15–29, proxied by the 15–34 band for countries without finer granularity.
Public funding accessibility	OECD Culture; national film boards	Per-capita public film funding divided by the number of active applicants, rescaled cross-sectionally; a 3-year mean is used where annual data are missing.
Audience reach equity	Omdia; Ampere Analysis	$1 - G$ of per-capita cinema attendance across subnational regions, with SVOD subscribers used as a supplement where theatrical data are sparse.

Several indicators (public funding accessibility, audience reach equity, and youth employment ratio in film) are constructed rather than downloaded directly. Their proxy rules are documented here explicitly so that the results can be reproduced or contested on their own terms.
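Several of the proxy rules in Table A4 reduce to a few standard formulas. The following sketch implements four of them under these assumptions:

- Genre shares are rescaled to sum to one before the entropy is normalized by $\ln(12)$.
- The top-25 shares passed to the HHI are fractions of the national box office. Whether the paper renormalizes them within the top 25 is not stated.
- The regional Gini is not weighted by population.
- The five-year coefficient of variation uses the sample standard deviation of revenues that are assumed to be already deflated.

All function names are hypothetical.

```python
import math
from statistics import mean, stdev


def normalized_shannon(shares, n_categories=12):
    """Shannon entropy of category shares, divided by ln(n_categories).

    shares may be counts or fractions; they are rescaled to sum to 1.
    """
    total = sum(shares)
    probs = [s / total for s in shares if s > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(n_categories)


def inverted_hhi(shares):
    """1 - HHI, with shares given as fractions of the total market."""
    return 1.0 - sum(s * s for s in shares)


def inverted_gini(values):
    """1 - G, with G the (unweighted) Gini coefficient of non-negative values."""
    xs = sorted(values)
    n = len(xs)
    # Rank-based formula on sorted data, ranks i = 1..n:
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    gini = 2.0 * weighted / (n * sum(xs)) - (n + 1) / n
    return 1.0 - gini


def revenue_stability(real_revenues, window=5):
    """Inverse coefficient of variation of the last `window` annual values."""
    recent = real_revenues[-window:]
    cv = stdev(recent) / mean(recent)  # sample standard deviation (n - 1)
    return math.inf if cv == 0 else 1.0 / cv
```

For example, twelve equally frequent genres give `normalized_shannon` a value of 1, and a market split evenly across 25 titles gives `inverted_hhi` a value of $1 - 25 \times 0.04^{2} = 0.96$.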

Appendix F. Seed-Level Results and Reproducibility

Table A5. Composite FISI MAE on the 2020–2023 test set across five random seeds.

Model	Seed 0	Seed 1	Seed 2	Seed 3	Seed 4	Mean	Std
LSTM	0.0491	0.0504	0.0497	0.0489	0.0509	0.0498	0.00086
Mamba (base)	0.0455	0.0468	0.0459	0.0457	0.0466	0.0461	0.00057
KE-Mamba (ours)	0.0384	0.0392	0.0387	0.0391	0.0391	0.0389	0.00035

All neural model results reported in Section 6 are means over the five seeds tabulated above. The seed-to-seed standard deviation of the KE-Mamba composite FISI MAE is 0.00035, about 0.9% of the mean. The model ranking (KE-Mamba, then Mamba, then LSTM) holds on every individual seed.
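The summary columns of Table A5 can be recomputed from the five tabulated seeds. The sketch below assumes the Std column is the sample standard deviation ($n - 1$ denominator). From the four-decimal entries it reproduces the means exactly. It reproduces the standard deviations to within one unit in the last reported digit (about 0.00034 for KE-Mamba versus the reported 0.00035), a gap that rounding of the seed-level entries could plausibly explain. The helper names are illustrative.

```python
from statistics import mean, stdev

# Seed-level composite FISI MAE values copied from Table A5.
seed_mae = {
    "LSTM": [0.0491, 0.0504, 0.0497, 0.0489, 0.0509],
    "Mamba (base)": [0.0455, 0.0468, 0.0459, 0.0457, 0.0466],
    "KE-Mamba": [0.0384, 0.0392, 0.0387, 0.0391, 0.0391],
}


def summarize(runs):
    """Mean and sample standard deviation (n - 1 denominator) across seeds."""
    return mean(runs), stdev(runs)


def ranking_per_seed(table):
    """Model names ordered from lowest to highest MAE, one list per seed."""
    n_seeds = len(next(iter(table.values())))
    return [sorted(table, key=lambda m: table[m][s]) for s in range(n_seeds)]


for model, runs in seed_mae.items():
    mu, sd = summarize(runs)
    print(f"{model:14s} mean={mu:.4f} std={sd:.5f} rel={sd / mu:.2%}")

rankings = ranking_per_seed(seed_mae)
print("Same ranking on every seed:", all(r == rankings[0] for r in rankings))
```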

References

PricewaterhouseCoopers. Global Entertainment and Media Outlook 2024–2028; Technical Report; PwC: London, UK, 2024.
Motion Picture Association. THEME Report: A Comprehensive Analysis and Survey of the Theatrical and Home/Mobile Entertainment Market Environment; Technical Report; MPA: Washington, DC, USA, 2024.
Throsby, D. Economics and Culture; Cambridge University Press: Cambridge, UK, 2001.
UNESCO. Convention on the Protection and Promotion of the Diversity of Cultural Expressions; UNESCO: Paris, France, 2005.
United Nations. Transforming Our World: The 2030 Agenda for Sustainable Development; United Nations General Assembly Resolution A/RES/70/1; United Nations: New York, NY, USA, 2015.
UNESCO. Re|Shaping Policies for Creativity: Addressing Culture as a Global Public Good; Technical Report; UNESCO: Paris, France, 2022.
OECD. The Culture Fix: Creative People, Places and Industries; Technical Report; OECD Publishing: Paris, France, 2022.
Vinuesa, R.; Azizpour, H.; Leite, I.; Balaam, M.; Dignum, V.; Domisch, S.; Fellander, A.; Langhans, S.; Tegmark, M.; Fuso Nerini, F. The Role of Artificial Intelligence in Achieving the Sustainable Development Goals. Nat. Commun. 2020, 11, 233.
Chen, L.; Rabbi, M.; Wang, Y. A machine learning framework for forecasting multidimensional sustainability and informing integrated policy thresholds in the EU. Environ. Dev. Sustain. 2025, 1–56.
Tang, S. The Box Office Prediction Model Based on the Optimized XGBoost Algorithm in the Context of Film Marketing and Distribution. PLoS ONE 2024, 19, e0309227.
Zemaityte, V.; Karjus, A.; Rohn, U.; Schich, M.; Ibrus, I. Quantifying the Global Film Festival Circuit: Networks, Diversity, and Public Value Creation. PLoS ONE 2024, 19, e0297404.
Dadlani, A.; Vo, V.; Khemka, A.; Harvey, S.T.; Kantoro Kyzy, A.; Jones, P.; Verhoeven, D. Leading by the nodes: A survey of film industry network analysis and datasets. Appl. Netw. Sci. 2024, 9, 76.
Hofstede, G. Culture’s Consequences: International Differences in Work-Related Values, 2nd ed.; Sage Publications: Thousand Oaks, CA, USA, 2001.
De Vany, A.; Walls, W. Motion Picture Profit, the Stable Paretian Hypothesis, and the Curse of the Superstar. J. Econ. Dyn. Control. 2004, 28, 1035–1057.
Einav, L. Seasonality in the U.S. Motion Picture Industry. RAND J. Econ. 2007, 38, 127–145.
Rosen, S. The Economics of Superstars. Am. Econ. Rev. 1981, 71, 845–858.
Gurel, E. AI-driven Experiences in Cultural and Creative Industries: A Review of Literature and Development of a Multifaceted Framework. Serv. Ind. J. 2026, 46, 583–622.
Tilly, S.; Livan, G. Macroeconomic Forecasting with Statistically Validated Knowledge Graphs. Expert Syst. Appl. 2022, 186, 115765.
Sun, Z.; Deng, Z.; Nie, J.; Tang, J. RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space. In Proceedings of the International Conference on Learning Representations (ICLR), New Orleans, LA, USA, 6–9 May 2019.
Mirowski, P.; Mathewson, K.W.; Pittman, J.; Evans, R. Co-Writing Screenplays and Theatre Scripts with Language Models: Evaluation by Industry Professionals. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, Hamburg, Germany, 23–28 April 2023.
Meng, K.; Bau, D.; Andonian, A.; Belinkov, Y. Locating and Editing Factual Associations in GPT. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), New Orleans, LA, USA, 28 November–9 December 2022; Curran Associates Inc.: Red Hook, NY, USA, 2022; Volume 35.
Yao, Y.; Wang, P.; Tian, B.; Cheng, S.; Deng, Z.; Zhang, H.; Chen, H.; Zhang, N. Editing Large Language Models: Problems, Methods, and Opportunities. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP), Singapore, 6–10 December 2023; Association for Computational Linguistics: Stroudsburg, PA, USA, 2023; pp. 10222–10240.
Hang, C.N.; Yu, P.D.; Tan, C.W. TrumorGPT: Graph-Based Retrieval-Augmented Large Language Model for Fact-Checking. IEEE Trans. Artif. Intell. 2025, 6, 3148–3162.
Kolli, S.; Rosenbaum, R.; Cavelius, T.; Strothe, L.; Lata, A.; Diesner, J. Hybrid Fact-Checking that Integrates Knowledge Graphs, Large Language Models, and Search-Based Retrieval Agents Improves Interpretable Claim Verification. In Proceedings of the 9th Widening NLP Workshop, Suzhou, China, 8 November 2025; Association for Computational Linguistics: Stroudsburg, PA, USA, 2025; pp. 106–115.
Chechkin, A.; Pleshakova, E.; Gataullin, S. A Hybrid Neural Network Transformer for Detecting and Classifying Destructive Content in Digital Space. Algorithms 2025, 18, 735.
Gu, A.; Dao, T. Mamba: Linear-time Sequence Modeling with Selective State Spaces. arXiv 2023, arXiv:2312.00752.
Cai, X.; Zhu, Y.; Wang, X.; Yao, Y. MambaTS: Improved Selective State Space Models for Long-Term Time Series Forecasting. arXiv 2024, arXiv:2405.07992.
Bhethanabhotla, S.; Swelam, O.; Siems, J.; Salinas, D.; Hutter, F. Mamba4Cast: Efficient Zero-Shot Time Series Forecasting with State Space Models. In Proceedings of the NeurIPS 2024 Workshop on Time Series in the Age of Large Models, Vancouver, BC, Canada, 15 December 2024.
Patro, B.N.; Agneeswaran, V.S. SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time Series. arXiv 2024, arXiv:2403.15360.
Wang, Z.; Kong, F.; Feng, S.; Wang, M.; Yang, X.; Zhao, H.; Wang, D.; Zhang, Y. Is Mamba Effective for Time Series Forecasting? Neurocomputing 2024, 597, 129178.
Hosseini, E.; Rajabipoor Meybodi, A. Proposing a Model for Sustainable Development of Creative Industries Based on Digital Transformation. Sustainability 2023, 15, 11451.
UNESCO Institute for Statistics. Feature Films and Cinema Data. Available online: https://uis.unesco.org (accessed on 1 April 2026).
European Audiovisual Observatory. LUMIERE Database: Admissions of Films Released in Europe. Available online: https://lumiere.obs.coe.int (accessed on 1 April 2026).
World Bank. World Development Indicators. Available online: https://databank.worldbank.org (accessed on 1 April 2026).
OECD. Culture and the Creative Economy. Available online: https://www.oecd.org/en/topics/culture-creative-industries-and-sports.html (accessed on 1 April 2026).
McKinsey & Company and Business of Fashion. The State of Fashion 2024; Technical Report; McKinsey & Company: New York, NY, USA, 2024.
United Nations Development Programme. Human Development Report 2023–2024; Technical Report; UNDP: New York, NY, USA, 2024.
Diebold, F.; Mariano, R. Comparing Predictive Accuracy. J. Bus. Econ. Stat. 2002, 20, 134–144.
Lundberg, S.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA, 4–9 December 2017; Curran Associates, Inc.: Red Hook, NY, USA, 2017; pp. 4765–4774.
Lorenzen, M.; Taeube, F. Breakout from Bollywood? The Roles of Social Networks and Regulation in the Evolution of Indian Film Industry. J. Int. Manag. 2008, 14, 286–299.

Figure 1. Proposed research framework illustrating the four sequential stages from data collection through knowledge graph embedding, model training, and policy-oriented output analysis.

Figure 2. Validation evidence supporting the KE-Mamba framework. (a) English-synopsis retention rates and mean narrative scores across the four country income groups, with the inverse-coverage reweighted aggregation shown for comparison; coverage is lowest in lower-middle and developing–fragile markets, but the reweighted scores remain within 0.018 of the baseline aggregation. (b) Title-level factual probe accuracy before and after ROME/MEMIT-style knowledge editing for country, year, and genre attributes on the full 12,438-title sample, with neighborhood locality reported in the upper-right annotation. (c) Pearson correlation between Llama-3-8B-Instruct narrative scores and three independent human raters across the eight scoring dimensions on a stratified 200-film validation subset, with cross-model agreement against Qwen2.5-7B-Instruct shown for reference. (d) Placebo distributions over 1000 within-cluster random intervention reassignments for the five policy stress-test scenarios; the observed projected $\Delta$FISI lies in the upper tail of every distribution (all empirical $p < 0.05$; digital distribution $p < 0.001$).


Figure 3. Temporal evolution of the Film Industry Sustainability Index and its pillar scores across country income groups, 2005–2023. Panels show the (a) composite FISI and the (b) Cultural Diversity, (c) Economic Resilience, and (d) SDG Alignment pillars. The grey dashed vertical line marks the 2020 COVID-19 onset, and the colored shaded bands denote the within-group range across countries in each income group (High-Income, Upper-Middle, Lower-Middle, Developing).

Figure 4. Actual versus predicted FISI trajectories for six representative countries, 2020–2023. Panels (a–f) correspond to South Korea, the United States, France, India, Nigeria, and Brazil, respectively. The solid black line is the actual FISI; the blue line is the KE-Mamba forecast and the shaded blue band is its 90% confidence interval, with LSTM (dashed) and ARIMA (dotted) shown for comparison.

Figure 5. SHAP feature importance and dependence analysis for the KE-Mamba model. (a) Mean |SHAP value| ranking of the leading explicit features, separating standard inputs from the grouped KG-embedding dimensions; (b) SHAP summary (beeswarm) plot showing the magnitude and direction of each feature’s contribution, colored by feature value; (c) SHAP dependence plot for the international co-production rate, showing the inverted-U relationship that peaks near a 22% co-production rate; (d) SHAP interaction plot for the digital distribution ratio, split by below- and above-median GDP per capita. Panel (e) decomposes the grouped 128-dimensional KG-embedding SHAP contribution by relation type, using linear probes from the country embeddings to relation-exposure summaries; PRODUCES, CO_PRODUCED_BY, and EXPORTS_TO jointly account for 69.2% of the KG-SHAP mass.

Figure 6. Country clustering and sustainability mapping based on spectral clustering of FISI pillar scores. (a) Spectral projection of the 42 countries onto the first two embedding dimensions, colored by cluster; (b) geographic distribution of the sample colored by 2023 FISI; (c) radar chart of the mean Cultural Diversity, Economic Resilience, and SDG Alignment pillar scores for the four clusters.

Figure 7. Model-based associative policy stress-test results for five targeted interventions across the four country clusters. Bars show the projected change in the composite index ($\Delta$FISI) and in the Cultural Diversity, Economic Resilience, and SDG Alignment pillars ($\Delta$CDP, $\Delta$ERP, $\Delta$SAP). (a) Digital distribution $+10$ pp; (b) female director share intervention; (c) domestic market share intervention; (d) international co-production intervention; (e) internet penetration $+20$ pp, applied only to below-median-penetration countries; the hatched grey N/A box marks Mature–Diverse markets, to which the intervention does not apply because their penetration is already above the median.


Figure 8. Rolling-window model performance and feature importance dynamics, 2018–2023. (a) Rolling-window MAE by country cluster across successive test windows (2020–21 to 2022–23); (b) year-by-year importance-rank trajectories of the leading explicit features; (c) actual versus predicted FISI for the disrupted 2020 test year ($R^{2} = 0.891$); (d) actual versus predicted FISI for the stabilized 2023 test year ($R^{2} = 0.962$).


Table 1. Mapping of film industry indicators to Sustainable Development Goal targets.

Film Industry Indicator	SDG Target	Pillar	Direction
Domestic film market share	8.3 (Productive employment)	Cultural Diversity	Positive
Genre diversity index (Shannon)	10.2 (Social inclusion)	Cultural Diversity	Positive
International co-production rate	17.6 (Knowledge sharing)	Cultural Diversity	Positive
Female director share	5.5 (Gender equality in leadership)	Cultural Diversity	Positive
Language diversity ratio	4.7 (Cultural literacy)	Cultural Diversity	Positive
Independent production share	10.2 (Social inclusion)	Cultural Diversity	Positive
Revenue growth stability	8.1 (Sustained economic growth)	Economic Resilience	Positive
Screen density growth	11.4 (Cultural heritage access)	Economic Resilience	Positive
Digital distribution ratio	12.2 (Efficient resource use)	Economic Resilience	Positive
Export revenue share	8.2 (Economic productivity)	Economic Resilience	Positive
Production volume trend	9.5 (Scientific research capacity)	Economic Resilience	Positive
Box office concentration (inverted HHI)	8.3 (Productive employment)	Economic Resilience	Positive
Film employment share	8.5 (Full employment, decent work)	SDG Alignment	Positive
Screen access Gini (inverted)	10.3 (Equal opportunity)	SDG Alignment	Positive
Urban–rural infrastructure ratio	11.a (Urban–rural linkages)	SDG Alignment	Negative
Youth employment ratio in film	8.6 (Youth employment)	SDG Alignment	Positive
Public funding accessibility index	16.6 (Accountable institutions)	SDG Alignment	Positive
Audience reach equity	10.2 (Social inclusion)	SDG Alignment	Positive

Table 2. Descriptive statistics of key variables (42 countries, 2005–2023).

Variable	Mean	Std. Dev.	Min	Max	Obs.
Box office revenue (million USD)	1847.3	3412.6	12.4	21,083.5	798
Films produced (annual)	168.4	241.7	8	1986	798
Domestic market share (%)	32.7	22.4	1.3	88.6	798
Genre diversity index (Shannon)	0.714	0.138	0.312	0.946	798
Screen density (per million pop.)	38.2	24.6	2.1	132.8	798
Digital distribution ratio (%)	41.3	28.7	0.0	89.4	798
Female director share (%)	18.6	8.4	2.1	34.9	798
International co-production rate (%)	14.8	11.2	0.4	52.3	798
GDP per capita (thousand USD)	28.4	19.7	1.8	87.3	798
Internet penetration (%)	68.3	22.1	8.7	98.2	798
FISI (composite)	0.518	0.162	0.127	0.871	798
Cultural Diversity Pillar	0.542	0.178	0.093	0.912	798
Economic Resilience Pillar	0.507	0.193	0.084	0.897	798
SDG Alignment Pillar	0.489	0.171	0.102	0.846	798

Table 3. Pairwise correlation matrix of selected variables.

	FISI	CDP	ERP	SAP	DMS	GDI	DDR	GDPpc
FISI	1.000
CDP	0.847	1.000
ERP	0.812	0.634	1.000
SAP	0.789	0.598	0.671	1.000
DMS	0.623	0.741	0.412	0.387	1.000
GDI	0.587	0.692	0.498	0.413	0.534	1.000
DDR	0.714	0.523	0.782	0.618	0.298	0.456	1.000
GDPpc	0.681	0.574	0.723	0.612	0.267	0.501	0.687	1.000

Note: FISI = Film Industry Sustainability Index; CDP = Cultural Diversity Pillar; ERP = Economic Resilience Pillar; SAP = SDG Alignment Pillar; DMS = Domestic Market Share; GDI = Genre Diversity Index; DDR = Digital Distribution Ratio; GDPpc = GDP per Capita. All correlations with $|r| > 0.15$ are significant at the 1% level.

Table 4. Hypothesis–experiment alignment.

H	Claim Tested	Experiment/Table	Success Criterion	Validation Evidence
H1	KG-enhanced selective state-space forecasting improves accuracy	Main baseline comparison; KG ablation	KE-Mamba MAE lower than base Mamba, LSTM, XGBoost, FE panel	KE-Mamba MAE 0.0389 vs. base Mamba 0.0461 and FE-LDV 0.0585
H2	Narrative features add useful content signal	Remove LLM narrative features	MAE increases when removed	MAE 0.0408, +4.9%
H3	Knowledge editing improves factual reliability	Factual probe; no-edit ablation	Accuracy increases; downstream MAE decreases	All-fact accuracy 66.8% → 89.7%; MAE 0.0399 → 0.0389
H4	Model is not pandemic-specific	Pre-pandemic split 2018–2019	KE-Mamba remains best	MAE 0.0314, $R^{2}$ 0.952
H5	Projected associations exceed placebo patterns	Placebo intervention reassignment	Observed $\Delta$FISI exceeds placebo distribution	Digital distribution $+0.0297$ vs. placebo $+0.0031$, $p < 0.001$
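The empirical $p$-values quoted for H5 and in Figure 2(d) compare an observed projected $\Delta$FISI with a placebo distribution. The sketch below assumes the common one-sided convention with a $+1$ correction. It does not reproduce the within-cluster reassignment scheme itself. The placebo draws are synthetic, centred on the reported placebo mean of $+0.0031$ with an arbitrary spread. Under this convention, 1000 placebo draws give a smallest attainable $p$ of $1/1001 \approx 0.000999$. A reported $p < 0.001$ therefore corresponds to no placebo draw reaching the observed value.

```python
import random


def empirical_p_value(observed, placebo, upper_tail=True):
    """One-sided placebo (permutation) p-value with the +1 correction.

    p = (1 + #{placebo draws at least as extreme as observed}) / (1 + m)
    """
    if upper_tail:
        exceed = sum(1 for d in placebo if d >= observed)
    else:
        exceed = sum(1 for d in placebo if d <= observed)
    return (1 + exceed) / (1 + len(placebo))


# Illustrative only: synthetic placebo draws centred near the reported
# placebo mean (+0.0031); the spread (0.006) is an arbitrary assumption.
rng = random.Random(0)
placebo = [rng.gauss(0.0031, 0.006) for _ in range(1000)]
p = empirical_p_value(0.0297, placebo)
print(f"empirical p = {p:.4f}")
```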

Table 5. Human expert validation of LLM narrative scores ($n = 200$ films).


Narrative Dimension	Human $\alpha$	Llama–Expert $r$	Llama–Expert $\rho$	MAE (0–10)	Qwen–Llama $r$
Narrative structure	0.79	0.78	0.75	0.83	0.90
Pacing	0.73	0.71	0.69	0.96	0.88
Character development	0.76	0.74	0.72	0.90	0.87
Thematic depth	0.78	0.76	0.73	0.88	0.89
Dialogue quality	0.68	0.62	0.59	1.12	0.82
Genre coherence	0.82	0.81	0.79	0.75	0.91
Cultural specificity	0.71	0.70	0.68	0.99	0.84
Projected audience reach	0.69	0.66	0.64	1.05	0.86
Average	0.75	0.72	0.70	0.94	0.87

Note: Dialogue quality and projected audience reach show the weakest LLM–expert agreement because they are inferred from synopses rather than full scripts or box-office records; genre coherence and narrative structure show the strongest agreement, which is consistent with synopsis text being most informative for genre- and structure-level signals.
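The agreement columns of Table 5 rest on standard statistics. The sketch below computes three of them for one dimension: Pearson’s $r$, Spearman’s $\rho$, and the mean absolute error on the 0–10 scale. Spearman’s $\rho$ is computed as Pearson’s $r$ on average ranks, so ties are handled. The sketch assumes each film’s LLM score is paired with a single expert reference score, for example the mean of the three raters. The function names and toy scores are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mae(x, y):
    """Mean absolute difference between paired scores."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


# Hypothetical 0-10 scores for five films on one dimension.
llm_scores = [6.5, 7.0, 4.5, 8.0, 5.5]
expert_scores = [6.0, 7.5, 5.0, 8.5, 5.0]  # e.g. mean of three raters
print(pearson(llm_scores, expert_scores), spearman(llm_scores, expert_scores),
      mae(llm_scores, expert_scores))
```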

Table 6. Out-of-sample forecasting performance on the composite FISI: pandemic-included test (Panel A) and pre-pandemic test (Panel B).

Model	MAE	RMSE	MAPE (%)	$R^{2}$	DM p-Value
Panel A: Train 2005–2017, validation 2018–2019, test 2020–2023 ($n = 168$)
ARIMA	0.0847	0.1123	12.34	0.724	<0.001
Two-way FE-LDV panel	0.0585	0.0782	8.44	0.865	<0.001
Random Forest	0.0612	0.0834	8.91	0.847	<0.001
XGBoost	0.0573	0.0791	8.42	0.863	<0.001
LSTM	0.0498	0.0687	7.15	0.897	<0.001
Mamba (base)	0.0461	0.0642	6.73	0.910	0.004
KE-Mamba (ours)	0.0389	0.0548	5.61	0.934	—
Panel B: Train 2005–2015, validation 2016–2017, test 2018–2019 ($n = 84$)
ARIMA	0.0602	0.0797	9.42	0.812	<0.001
Two-way FE-LDV panel	0.0436	0.0569	6.42	0.906	<0.001
Random Forest	0.0476	0.0619	7.22	0.887	<0.001
XGBoost	0.0448	0.0586	6.81	0.900	<0.001
LSTM	0.0391	0.0504	5.92	0.924	0.006
Mamba (base)	0.0367	0.0478	5.48	0.933	0.018
KE-Mamba (ours)	0.0314	0.0419	4.61	0.952	—

Note: The DM p-value is the p-value of the Diebold–Mariano test of equal predictive accuracy, comparing each model’s forecast errors with those of KE-Mamba. All neural model results are means over five random seed initializations. The two-way fixed-effects panel model with lagged dependent variables (FE-LDV) uses the same 34-dimensional panel feature set and includes country and year fixed effects; it serves as an interpretable econometric baseline. Panel B confirms that the KE-Mamba ranking is not specific to the pandemic period.
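For readers reproducing the DM column, a minimal sketch of the Diebold–Mariano test follows. It assumes absolute-error loss (matching the MAE comparison), one-step-ahead forecasts ($h = 1$), and the asymptotic normal approximation. The table note does not specify any of these choices. Applying the function to pooled country-year errors treats them as a single series and ignores cross-country dependence, so the sketch illustrates the statistic rather than the paper’s exact procedure. The error values are hypothetical.

```python
import math


def diebold_mariano(errors_a, errors_b, h=1, loss=abs):
    """Diebold-Mariano test of equal predictive accuracy.

    errors_a, errors_b: forecast errors of two models on the same targets,
    in the same order. loss: applied to each error (absolute error here).
    Returns (statistic, two-sided p-value) under the asymptotic
    standard-normal approximation; a positive statistic means model A
    has the larger average loss.
    """
    d = [loss(a) - loss(b) for a, b in zip(errors_a, errors_b)]
    t = len(d)
    d_bar = sum(d) / t

    def autocov(k):
        return sum((d[i] - d_bar) * (d[i - k] - d_bar) for i in range(k, t)) / t

    # Long-run variance with h - 1 autocovariance lags for h-step forecasts.
    long_run_var = autocov(0) + 2 * sum(autocov(k) for k in range(1, h))
    if long_run_var <= 0:
        raise ValueError("non-positive long-run variance; DM statistic undefined")
    stat = d_bar / math.sqrt(long_run_var / t)
    p_value = math.erfc(abs(stat) / math.sqrt(2))  # equals 2 * (1 - Phi(|stat|))
    return stat, p_value


# Hypothetical errors of a baseline and of KE-Mamba on eight targets.
baseline_errors = [0.3, -0.4, 0.5, -0.35, 0.45, -0.5, 0.4, -0.3]
kemamba_errors = [0.1, -0.05, 0.12, -0.08, 0.1, -0.09, 0.07, -0.11]
print(diebold_mariano(baseline_errors, kemamba_errors))
```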

Table 7. Pillar-level forecasting performance of the KE-Mamba model (test period: 2020–2023).

Target	MAE	RMSE	MAPE (%)	$R^{2}$
Composite FISI	0.0389	0.0548	5.61	0.934
Cultural Diversity Pillar	0.0412	0.0573	5.94	0.928
Economic Resilience Pillar	0.0467	0.0638	6.87	0.912
SDG Alignment Pillar	0.0358	0.0497	5.23	0.941
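The four accuracy measures in Tables 6 and 7 are sketched below with their standard definitions. The sketch assumes the paper uses the usual forms: errors pooled over all country-years, MAPE in percent, and $R^{2} = 1 - \mathrm{SS}_{\mathrm{res}}/\mathrm{SS}_{\mathrm{tot}}$. MAPE is well defined for this target because observed FISI values are strictly positive (Table 2 minimum 0.127). The function name and inputs are hypothetical.

```python
import math


def forecast_metrics(actual, predicted):
    """MAE, RMSE, MAPE (%) and R^2 using their standard definitions."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAPE divides by the actual value, so it requires actual != 0.
    mape = 100.0 * sum(abs(e) / abs(a) for a, e in zip(actual, errors)) / n
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": 1.0 - ss_res / ss_tot}


# Hypothetical FISI values for four country-years.
print(forecast_metrics([0.42, 0.55, 0.61, 0.73], [0.45, 0.52, 0.63, 0.70]))
```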

Table 8. KG embedding model comparison.

KG Embedding Model	Link MRR	Hits@10	Test MAE	Test RMSE	Test $R^{2}$
TransE	0.341	0.487	0.0408	0.0571	0.929
ComplEx	0.372	0.511	0.0397	0.0558	0.932
RotatE	0.387	0.524	0.0389	0.0548	0.934
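The link-prediction columns of Table 8 can be illustrated in two steps. The first is a RotatE-style score, in which each relation rotates the complex head embedding element-wise. The second computes the mean reciprocal rank (MRR) and Hits@10 over the rank of the true tail. The sketch rests on several assumptions. It uses the sum of element-wise complex moduli as the distance and counts ties against the true entity. It does not implement the margin term or the common filtered ranking setting, which removes other known true triples from the candidate list. The paper does not state which norm or ranking protocol it used. All names are hypothetical.

```python
import cmath
import math


def rotate_score(head, phases, tail):
    """RotatE-style plausibility of (head, relation, tail).

    head, tail: complex entity vectors; phases: relation angles theta_j, so
    the relation acts as an element-wise rotation r_j = exp(i * theta_j).
    Score = -(sum of element-wise moduli of h * r - t); higher is more plausible.
    """
    return -sum(abs(h * cmath.exp(1j * th) - t) for h, th, t in zip(head, phases, tail))


def rank_of_true(scores, true_index):
    """1-based rank of the true candidate; ties counted against it."""
    true_score = scores[true_index]
    better = sum(1 for i, s in enumerate(scores) if i != true_index and s >= true_score)
    return 1 + better


def mrr_and_hits(ranks, k=10):
    """Mean reciprocal rank and Hits@k over a list of 1-based ranks."""
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits = sum(1 for r in ranks if r <= k) / len(ranks)
    return mrr, hits


# Toy check: rotating 1 by 90 degrees lands on i, so candidate 0 should win.
candidates = [[1j], [-1j], [1 + 0j]]
scores = [rotate_score([1 + 0j], [math.pi / 2], c) for c in candidates]
print(rank_of_true(scores, 0), mrr_and_hits([1, 2, 4, 20]))
```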

Table 9. KG embedding dimension sensitivity (validation and test losses, 2020–2023 test split).

KG Dim	Link Val Loss	Link MRR	FISI Val MAE	FISI Test MAE	FISI Test $R^{2}$	Interpretation
64	0.0289	0.361	0.0421	0.0413	0.927	Under-parameterized
128	0.0267	0.387	0.0397	0.0389	0.934	Optimal validation/test balance
256	0.0279	0.401	0.0405	0.0394	0.933	Link quality up but FISI val MAE rises

Table 10. Ablation study results on the composite FISI forecast.

Configuration	MAE	RMSE	$R^{2}$	$Δ$ MAE (%)
Full KE-Mamba	0.0389	0.0548	0.934	—
w/o LLM narrative features	0.0408	0.0573	0.928	$+ 4.9$
w/o knowledge editing step	0.0399	0.0559	0.931	$+ 2.6$
w/o gated KG fusion (concat instead)	0.0427	0.0594	0.923	$+ 9.8$
w/o knowledge graph embeddings	0.0461	0.0642	0.910	$+ 18.5$
LSTM + gated KG fusion + KE	0.0452	0.0621	0.916	$+ 16.2$
KE-Mamba w/ TransE (vs. RotatE)	0.0408	0.0571	0.929	$+ 4.9$
KE-Mamba, KG embedding dim $= 64$	0.0413	0.0579	0.927	$+ 6.2$
KE-Mamba, KG embedding dim $= 256$	0.0394	0.0554	0.933	$+ 1.3$
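The $\Delta$ MAE (%) column is the relative change against the full model’s MAE of 0.0389. The following arithmetic check uses the MAE values from the table and reproduces every entry after rounding to one decimal place. The helper name is illustrative.

```python
FULL_MAE = 0.0389  # Full KE-Mamba composite FISI MAE (Table 10)

ablation_mae = {
    "w/o LLM narrative features": 0.0408,
    "w/o knowledge editing step": 0.0399,
    "w/o gated KG fusion (concat instead)": 0.0427,
    "w/o knowledge graph embeddings": 0.0461,
    "LSTM + gated KG fusion + KE": 0.0452,
    "KE-Mamba w/ TransE (vs. RotatE)": 0.0408,
    "KE-Mamba, KG embedding dim = 64": 0.0413,
    "KE-Mamba, KG embedding dim = 256": 0.0394,
}


def delta_mae_pct(mae, reference=FULL_MAE):
    """Relative MAE change (%) against the reference configuration."""
    return 100.0 * (mae - reference) / reference


for name, value in ablation_mae.items():
    print(f"{name:40s} {delta_mae_pct(value):+.1f}%")
```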

Table 11. Top 10 explicit features by mean absolute SHAP value for the composite FISI forecast.

Rank	Feature	Mean $|\mathrm{SHAP}|$	Share of $|\mathrm{SHAP}|$ Mass (%)	Direction
1	Domestic film market share	0.037	12.1	Positive
2	Digital distribution ratio	0.031	10.3	Positive
3	Genre diversity index	0.028	8.7	Positive
4	GDP per capita	0.024	7.1	Positive
5	Screen density	0.019	5.4	Positive
6	Revenue growth stability	0.017	4.7	Positive
7	Internet penetration	0.015	4.2	Positive
8	Female director share	0.014	3.8	Positive
9	International co-production rate	0.012	2.9	Nonlinear
10	Films produced (annual)	0.009	2.3	Positive
–	KG embedding (group, 128 dims)	—	34.1	—
–	Narrative score (group, 8 dims)	—	4.4	Positive

Note: Shares are fractions of the total mean absolute SHAP mass over all 162 features of the KE-Mamba model (the 34-dimensional dynamic input together with the 128-dimensional static KG prior); the 128 knowledge graph embedding dimensions and the 8 LLM narrative scores are reported as grouped contributions following the grouped-SHAP convention.
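A minimal sketch of the grouped-share computation follows. It assumes the grouping sums each member feature’s mean $|\mathrm{SHAP}|$, so a group’s share equals the total share of its members. An alternative convention first sums signed SHAP values within the group for each forecast and then averages the absolute value. That convention can give a smaller group share when members offset each other. The SHAP matrix, feature names, and function name are hypothetical.

```python
def shap_mass_shares(shap_rows, feature_names, groups):
    """Share (%) of total mean |SHAP| mass per feature or feature group.

    shap_rows: one list of SHAP values per explained forecast, aligned with
    feature_names. groups: group name -> member feature names; features not
    in any group are reported on their own.
    """
    n_rows = len(shap_rows)
    mean_abs = {
        name: sum(abs(row[j]) for row in shap_rows) / n_rows
        for j, name in enumerate(feature_names)
    }
    total = sum(mean_abs.values())
    member_of = {f: g for g, members in groups.items() for f in members}
    shares = {}
    for name, value in mean_abs.items():
        key = member_of.get(name, name)  # group name, or the feature itself
        shares[key] = shares.get(key, 0.0) + 100.0 * value / total
    return shares


# Hypothetical example: one explicit feature and a two-dimensional KG block.
rows = [[0.2, -0.1, 0.1], [-0.2, 0.1, 0.3]]
names = ["domestic_share", "kg_0", "kg_1"]
print(shap_mass_shares(rows, names, {"KG embedding": ["kg_0", "kg_1"]}))
```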

Table 12. Sensitivity of KE-Mamba forecast to alternative FISI aggregation schemes.

FISI Construction	Pillar Weights/Rule	Corr. with Baseline	Kendall $\tau$	Test MAE	Test $R^{2}$
Baseline geometric mean	CDP/ERP/SAP = 1/3 each	1.000	1.000	0.0389	0.934
Arithmetic pillar mean	Equal pillar weights	0.981	0.947	0.0395	0.930
Principal component analysis (PCA)-derived weights	CDP 0.38, ERP 0.34, SAP 0.28	0.964	0.913	0.0402	0.926
Delphi budget allocation	CDP 0.36, ERP 0.32, SAP 0.32	0.978	0.941	0.0391	0.933
Winsorized z-score normalization	Equal geometric mean	0.969	0.921	0.0400	0.927
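The aggregation schemes in Table 12 can be sketched as follows. The baseline composite is taken to be the equal-weight geometric mean of the three pillars. Applying the PCA-derived and Delphi weights inside the same geometric form is an assumption, since the table lists only the weights. Kendall’s $\tau$ is shown in its tau-a form without tie correction, which is adequate for continuous scores. The pillar values are hypothetical.

```python
import math
from itertools import combinations


def geometric_fisi(cdp, erp, sap, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted geometric mean of strictly positive pillar scores."""
    w_c, w_e, w_s = weights  # assumed to sum to 1
    return math.exp(w_c * math.log(cdp) + w_e * math.log(erp) + w_s * math.log(sap))


def arithmetic_fisi(cdp, erp, sap, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted arithmetic mean of the pillar scores."""
    w_c, w_e, w_s = weights
    return w_c * cdp + w_e * erp + w_s * sap


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / all pairs; no tie correction."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (len(x) * (len(x) - 1) / 2)


# Hypothetical pillar scores (CDP, ERP, SAP) for four countries.
pillars = [(0.80, 0.70, 0.75), (0.55, 0.50, 0.45), (0.40, 0.52, 0.44), (0.30, 0.28, 0.33)]
baseline = [geometric_fisi(*p) for p in pillars]
pca_weighted = [geometric_fisi(*p, weights=(0.38, 0.34, 0.28)) for p in pillars]
print(kendall_tau_a(baseline, pca_weighted))
```

By the weighted AM–GM inequality, the geometric composite never exceeds the arithmetic one with the same weights. The geometric form therefore penalizes imbalance across pillars, which is the usual motivation for choosing it as a baseline.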

Table 13. Characteristics of the four national film industry sustainability archetypes.

Characteristic	Mature–Diverse	Emerging–Dynamic	State-Regulated	Developing–Fragile
N countries	14	15	8	5
Mean FISI (2023)	0.72	0.50	0.45	0.31
Mean CDP	0.73	0.53	0.41	0.34
Mean ERP	0.69	0.47	0.49	0.26
Mean SAP	0.70	0.47	0.44	0.30
Annual FISI growth	$+ 0.010$	$+ 0.022$	$+ 0.009$	$+ 0.017$
Domestic market share (%)	38.6	34.1	29.4	10.2
Digital distribution (%)	59.7	41.8	38.1	15.4
Female director share (%)	23.4	15.2	11.7	10.8
GDP per capita (k USD)	44.3	18.6	18.3	3.9

Table 14. Model-based associative policy stress-test results: mean projected $\Delta$FISI by intervention and cluster.


Intervention	Magnitude	Mature-Div.	Emerg.-Dyn.	State-Reg.	Devel.-Frag.
Digital distribution	$+ 10$ pp	$+ 0.018$	$+ 0.032$	$+ 0.029$	$+ 0.041$
Female director share	$+ 10$ pp	$+ 0.009$	$+ 0.014$	$+ 0.017$	$+ 0.012$
Genre diversity	$+ 0.1$ (Shannon)	$+ 0.011$	$+ 0.019$	$+ 0.015$	$+ 0.022$
Co-production rate	$+ 10$ pp	$+ 0.007$	$+ 0.016$	$+ 0.013$	$+ 0.018$
Screen density	$+ 10$ per million	$+ 0.006$	$+ 0.011$	$+ 0.010$	$+ 0.024$
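Table 14 reports model-based, associative what-if results rather than causal effects. The procedure can be sketched in three steps: shift one input for every country, re-forecast, and average the change within each cluster. The sketch below uses a hypothetical linear stand-in for the trained KE-Mamba forecaster. It includes an optional cap so that percentage inputs stay at or below 100. The paper does not state how it handles capping or the forecast horizon.

```python
def stress_test(rows, predict, feature, shift, cap=None):
    """Mean change in the model forecast per cluster after shifting one input.

    rows: list of dicts holding input features plus a "cluster" label.
    predict: any callable mapping such a dict to a FISI forecast.
    cap: optional upper bound for the shifted feature (e.g. 100 for a %).
    """
    changes = {}
    for row in rows:
        shocked = dict(row)
        shocked[feature] = row[feature] + shift
        if cap is not None:
            shocked[feature] = min(shocked[feature], cap)
        changes.setdefault(row["cluster"], []).append(predict(shocked) - predict(row))
    return {cluster: sum(v) / len(v) for cluster, v in changes.items()}


# Hypothetical linear stand-in for the trained forecaster.
def toy_predict(x):
    return 0.30 + 0.002 * x["digital_pct"]


rows = [
    {"cluster": "A", "digital_pct": 50.0},
    {"cluster": "A", "digital_pct": 95.0},
    {"cluster": "B", "digital_pct": 20.0},
]
print(stress_test(rows, toy_predict, "digital_pct", 10.0, cap=100.0))
```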

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Qi, P.; Zhu, W. AI for Sustainable Cultural Industries: A Screenplay-Aware Knowledge-Enhanced State Space Model with LLM-Derived Narrative Features for Forecasting Film Industry Sustainability Across National Economies. Sustainability 2026, 18, 6117. https://doi.org/10.3390/su18126117

AMA Style

Qi P, Zhu W. AI for Sustainable Cultural Industries: A Screenplay-Aware Knowledge-Enhanced State Space Model with LLM-Derived Narrative Features for Forecasting Film Industry Sustainability Across National Economies. Sustainability. 2026; 18(12):6117. https://doi.org/10.3390/su18126117

Chicago/Turabian Style

Qi, Peixuan, and Weidong Zhu. 2026. "AI for Sustainable Cultural Industries: A Screenplay-Aware Knowledge-Enhanced State Space Model with LLM-Derived Narrative Features for Forecasting Film Industry Sustainability Across National Economies" Sustainability 18, no. 12: 6117. https://doi.org/10.3390/su18126117

APA Style

Qi, P., & Zhu, W. (2026). AI for Sustainable Cultural Industries: A Screenplay-Aware Knowledge-Enhanced State Space Model with LLM-Derived Narrative Features for Forecasting Film Industry Sustainability Across National Economies. Sustainability, 18(12), 6117. https://doi.org/10.3390/su18126117

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
