1. Introduction
The global film industry generates roughly USD 342 billion in annual output and supports 14.2 million jobs worldwide, placing it among the largest subsectors of the cultural and creative industries [1, 2]. It plays a dual role, operating both as a commercial industry subject to market forces and as a medium through which national and collective identities are expressed [3, 4]. Whether national film industries can sustain economic viability while preserving cultural diversity and advancing social development is therefore a question with both economic and policy stakes.
Several Sustainable Development Goals (SDGs) bear directly on the film industry. SDG 8 links to employment quality and revenue resilience in cultural production; SDG 10 concerns unequal access to cultural markets and distribution channels; SDG 17 is reflected in co-production and cross-border knowledge sharing; and SDG 5 relates to persistent gender gaps in creative leadership [4, 5, 6, 7]. The UNESCO Convention on the Protection and Promotion of the Diversity of Cultural Expressions further frames cultural diversity as a condition for sustainable development [4], yet operational tools for measuring and forecasting the SDG alignment of national film industries remain scarce. Artificial intelligence has become a practical tool for sustainability forecasting and sustainable-development decision support across industries [8, 9], but its applications and impacts remain unevenly distributed: the literature concentrates on environmental and resource sectors, with comparatively little attention to how AI can support sustainable development in cultural and creative industries. In the cultural sector, computational work has focused on box office prediction [10], film network analysis [11, 12], and cultural diversity measurement, while forecasting composite sustainability trajectories for national film industries has been neglected. Extending AI-for-sustainable-development research to this under-covered industry is therefore both a methodological opportunity (film panels combine structural relational data, long textual content, and medium-horizon macro dynamics) and a policy need for cultural ministries and international organizations working toward the 2030 Agenda.
This paper makes four contributions. First, it constructs and documents the Film Industry Sustainability Index (FISI), a three-pillar composite indicator linking cultural diversity, economic resilience, and SDG alignment for a balanced 42-country panel from 2005 to 2023. Second, it proposes a knowledge-enhanced selective state-space forecasting architecture in which a gated fusion layer injects static country-level film-industry KG embeddings into dynamic Mamba hidden states. Third, it introduces a transparent narrative-feature pipeline that scores film synopses with an LLM, anchors film-level factual knowledge through model editing, and validates the resulting scores against both human expert ratings and an alternative LLM. Fourth, it provides a policy-oriented validation suite, including ablations, a pre-pandemic temporal split, panel-econometric baselines, FISI weighting sensitivity, KG relation decomposition, and non-causal policy stress tests with placebo checks.
The remainder of the paper develops the theoretical framework (Section 2), reviews related work (Section 3), describes data and variables (Section 4), details the methodology (Section 5), reports experimental results (Section 6), discusses policy implications and limitations (Section 7), and concludes (Section 8).
2. Theoretical Framework: Linking Film Industries to the SDGs
National film industries are conceptualized as socio-economic ecosystems whose health can be characterized along dimensions that map onto SDG targets. Drawing on the economics of culture [3] and cross-national cultural-value frameworks [13], three pillars are identified, each operationalized through six indicators.
Cultural Diversity (Shannon genre entropy, female director share, co-production rate, language diversity, independent production share, domestic market share) addresses the range of creative expressions and equitable participation, connecting to SDGs 4, 5, 10, and 17.
Economic Resilience (revenue growth stability, screen density growth, digital distribution ratio, export revenue share, production volume trend, inverted Herfindahl–Hirschman Index (HHI) box office concentration) captures the diversification of the economic base and maps onto SDGs 8, 9, and 12.
SDG Alignment (film employment share, inverted screen access Gini, urban–rural infrastructure ratio, youth employment ratio, public funding accessibility, audience reach equity) measures distributional fairness and relates to SDGs 8.5, 8.6, 10.3, 11.a, and 16.6. Each indicator is min–max normalized within each annual cross-section, each pillar score is the arithmetic mean of its six constituents, and the composite FISI is the geometric mean of the three pillar scores, enforcing simultaneous progress across dimensions rather than compensation.
Table 1 reports the complete indicator–SDG mapping.
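Two of the indicators listed above, Shannon genre entropy and the inverted HHI, have standard closed forms, and a minimal sketch may help make them concrete. The function names, the use of the natural logarithm, and the choice of `1 - HHI` as the inversion are assumptions made here for illustration; the paper does not state its exact conventions or the unit (distributor, title, or studio) over which box office shares are computed.

```python
import math

def shannon_entropy(counts):
    """Shannon entropy (natural log) of a genre count distribution.

    `counts` holds the number of films per genre in one country-year.
    Genres with zero films contribute nothing to the sum.
    """
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in probs)

def inverted_hhi(revenues):
    """One minus the Herfindahl-Hirschman Index of revenue shares.

    Assumption: inversion is taken as 1 - HHI, so higher values mean
    a less concentrated box office.
    """
    total = sum(revenues)
    shares = [r / total for r in revenues]
    return 1.0 - sum(s * s for s in shares)

# Hypothetical country-year: four genres with equal output,
# and four revenue units with equal box office.
print(shannon_entropy([10, 10, 10, 10]))
print(inverted_hhi([25.0, 25.0, 25.0, 25.0]))
```

Under these conventions both indicators increase with diversity, so both would enter the normalization as positive-direction indicators.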
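Formally, for a positive-direction indicator $x_{c,t}^{(k)}$ (indicator $k$ for country $c$ in year $t$), the normalized value is
$$\tilde{x}_{c,t}^{(k)} = \frac{x_{c,t}^{(k)} - \min_{c'} x_{c',t}^{(k)}}{\max_{c'} x_{c',t}^{(k)} - \min_{c'} x_{c',t}^{(k)} + \epsilon},$$
where normalization is performed within year $t$ and $\epsilon$ prevents division by zero. For negative-direction indicators (urban–rural infrastructure ratio), the numerator $x_{c,t}^{(k)} - \min_{c'} x_{c',t}^{(k)}$ is replaced by $\max_{c'} x_{c',t}^{(k)} - x_{c,t}^{(k)}$. Pillar score $P_{c,t}^{(p)}$ for pillar $p$, with indicator set $\mathcal{K}_p$ of size six, is computed as
$$P_{c,t}^{(p)} = \frac{1}{6} \sum_{k \in \mathcal{K}_p} \tilde{x}_{c,t}^{(k)},$$
and the composite index is the equal-weight geometric mean
$$\mathrm{FISI}_{c,t} = \left( \prod_{p=1}^{3} P_{c,t}^{(p)} \right)^{1/3}.$$
The geometric mean penalizes unbalanced development across pillars; alternative arithmetic, principal-component, Delphi-weighted, and winsorized z-score aggregation schemes are evaluated in Section 6.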
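The aggregation just described (min–max normalization within a year, arithmetic pillar means, geometric composite) can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: the function names, the value of the small constant, and the example scores are all assumptions.

```python
import numpy as np

EPS = 1e-8  # assumed value; the text only says a small constant avoids division by zero

def minmax_within_year(x, negative=False, eps=EPS):
    """Min-max normalize one indicator across countries for a single year.

    For negative-direction indicators the numerator becomes (max - x),
    so that lower raw values map to higher normalized scores.
    """
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    numerator = (hi - x) if negative else (x - lo)
    return numerator / (hi - lo + eps)

def pillar_score(normalized):
    """Arithmetic mean over the indicator axis; input shape (n_countries, 6)."""
    return np.asarray(normalized, dtype=float).mean(axis=1)

def fisi(pillars):
    """Equal-weight geometric mean of pillar scores; input shape (n_countries, 3)."""
    pillars = np.asarray(pillars, dtype=float)
    return np.prod(pillars, axis=1) ** (1.0 / 3.0)

# Two hypothetical countries with the same arithmetic mean of pillar scores (0.6):
# one balanced, one unbalanced.
scores = fisi([[0.6, 0.6, 0.6],
               [0.9, 0.6, 0.3]])
balanced, unbalanced = scores
print(balanced, unbalanced)
```

The example illustrates the non-compensation property: both hypothetical countries average 0.6 across pillars, but the unbalanced one receives a lower composite score because a weak pillar cannot be fully offset by a strong one.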
7. Discussion and Policy Implications
The ablation in Table 10 decomposes the KE-Mamba gain over baselines: removing the knowledge embeddings raises MAE by 18.5% (0.0389 to 0.0461), replacing the gated KG fusion layer with plain concatenation raises it by 9.8% (to 0.0427), removing the LLM narrative features raises it by 4.9% (to 0.0408), and removing only the knowledge editing step while keeping the raw LLM scores raises it by 2.6% (to 0.0399). The last two effects are smaller than the knowledge graph contribution but stable across seeds, indicating that micro-level narrative signals carry information complementary to the macro panel indicators and that anchoring the LLM to verified film records measurably improves the reliability of those signals. The three mechanisms (structural KG embeddings, screenplay-oriented narrative proxies, and knowledge editing) combine to deliver the full accuracy, which is consistent with evidence from knowledge-enhanced macroeconomic forecasting [18].
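The percentages quoted for the ablation are relative increases over the full-model MAE of 0.0389. A short check, using only the MAE values stated in the text (the variant labels and the function name are illustrative), reproduces them:

```python
FULL_MAE = 0.0389  # full KE-Mamba model, as quoted in the text

# MAE of each ablated variant, as quoted in the text
ABLATED_MAE = {
    "no KG embeddings": 0.0461,
    "concatenation instead of gated fusion": 0.0427,
    "no LLM narrative features": 0.0408,
    "no knowledge editing (raw LLM scores kept)": 0.0399,
}

def relative_increase(ablated, full=FULL_MAE):
    """Percentage increase in MAE relative to the full model."""
    return 100.0 * (ablated - full) / full

for name, mae in ABLATED_MAE.items():
    print(f"{name}: +{relative_increase(mae):.1f}%")
```

Because each variant is compared with the same full model, the four increases are separate ablations rather than a cumulative sequence, and they need not sum to the total gain.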
The SHAP analysis points to three policy-relevant variables. Domestic market share ranks first, which is consistent with the long-standing emphasis in cultural economics on maintaining a viable local production base [3, 40]. The digital distribution ratio ranks second, above genre diversity and GDP per capita, suggesting that the shift to digital channels is not merely a commercial trend but a structural determinant of sustainability; the positive interaction with lower GDP per capita indicates that digital channels can partially bypass the capital constraints that limit physical exhibition in lower-income countries.
The four-cluster typology supports differentiated policy approaches. In Mature–Diverse markets (e.g., France, South Korea, the UK), the associative stress tests suggest limited returns from broad interventions, so targeted efforts on gender representation and defense of domestic share against global streaming platforms are more relevant. Emerging–Dynamic markets (e.g., China, India, Italy, Spain) stand to gain the most from accelerating digital distribution while investing in genre diversification. In State-Regulated markets the Cultural Diversity Pillar lags, pointing to content and distribution regulations as the binding constraint. Developing–Fragile markets face a compounding problem in which low economic resilience limits investment in cultural diversity and access, and international cooperation may be needed to break this cycle. The nonlinear (inverted-U) relationship between co-production rate and sustainability also suggests that co-production incentive programs should include graduated support rewarding collaboration without creating dependence on foreign partners.
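One simple way to locate the peak of an inverted-U relationship such as the one reported for co-production rate is to fit a quadratic and read off its vertex. The sketch below uses synthetic, noise-free data with a peak placed at 0.35 purely for illustration; the numbers are not estimates from the paper, and the paper does not state that it uses a quadratic fit.

```python
import numpy as np

def turning_point(x, y):
    """Vertex of a fitted quadratic y = a*x^2 + b*x + c.

    Returns None when the fitted curve is not concave (a >= 0),
    since there is then no interior maximum.
    """
    a, b, _ = np.polyfit(x, y, deg=2)
    if a >= 0:
        return None
    return -b / (2.0 * a)

# Synthetic illustration only: sustainability peaks at a co-production rate of 0.35.
rate = np.linspace(0.0, 0.8, 9)
index = 0.7 - 1.5 * (rate - 0.35) ** 2

peak = turning_point(rate, index)
print(peak)
```

In a policy setting, an estimated turning point of this kind could mark the co-production rate beyond which graduated support would taper, though with real panel data the estimate would carry uncertainty that a point value hides.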
At the country-year level, narrative dimensions should not be interpreted as judgments of national artistic value. They are aggregate proxies for the kinds of films that enter internationally visible metadata channels. For example, a high cultural-specificity score indicates that retained titles from a country-year contain more localized setting, social context, or culturally specific plot elements in their synopses; it does not imply that the entire national cinema is culturally specific or that non-retained domestic titles lack such features. Similarly, thematic depth captures the density of stated social, moral, or political themes in available synopses. These variables are therefore used as weak content signals for forecasting rather than as normative rankings of film quality.
Residual hallucination risk remains even after factual editing. The editing protocol corrects verifiable title-level attributes such as country, year, and genre, but qualitative dimensions such as thematic depth, cultural specificity, and projected audience reach are evaluative rather than factual. They may therefore retain systematic LLM biases, including Western-centric assumptions about narrative structure, genre coherence, and audience reach. The human-validation results reduce but do not eliminate this concern. For this reason, the narrative features are treated as weak aggregate signals and are interpreted together with coverage diagnostics, cross-model agreement, and human-rating correlations.
Ethically, the framework should not be used to rank the cultural worth of national cinemas. A low narrative score may reflect sparse English metadata, limited international distribution, or LLM training-data imbalance rather than lower artistic quality. Future deployments should incorporate multilingual synopses, local-language models, culturally diverse expert panels, and uncertainty intervals for narrative scores—especially in developing or non-English markets.
The full pipeline is more complex than a conventional panel model because it requires KG construction, LLM scoring, and model editing, but the most expensive steps are one-off or annual preprocessing tasks. The dominant cost is the LLM narrative scoring of the 12,438-title sample (5.6 GPU h one-off, 18–35 min annual incremental); KG construction (7 CPU min) and RotatE embedding training (16 GPU min) are negligible by comparison, and knowledge editing of the 4127 failed-probe titles adds only 0.9 GPU h. Once features are cached, KE-Mamba training takes 53 s, and inference for all 42 countries takes 0.19 s on a single A100, making the framework practical for annual dashboards. For policy institutions with limited computational resources, a simplified deployment version without LLM scoring and KG retraining runs end-to-end in under 10 s on a central processing unit (CPU) at the cost of higher MAE (0.0461 without KG; see Table 10).
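The GPU figures quoted above can be tallied to see how the preprocessing budget is distributed. The sketch below treats LLM scoring, RotatE training, and knowledge editing as one-off GPU costs, which is an assumption made here; the text states this explicitly only for LLM scoring.

```python
# GPU-hour figures quoted in the text; treating all three as one-off is an assumption.
ONE_OFF_GPU_HOURS = {
    "LLM narrative scoring": 5.6,
    "RotatE embedding training": 16 / 60,  # 16 GPU min
    "knowledge editing": 0.9,
}

def gpu_budget(costs):
    """Return the total GPU hours and each step's share of that total."""
    total = sum(costs.values())
    shares = {name: hours / total for name, hours in costs.items()}
    return total, shares

total_hours, shares = gpu_budget(ONE_OFF_GPU_HOURS)
print(f"total: {total_hours:.2f} GPU h")
for name, share in shares.items():
    print(f"{name}: {share:.0%}")
```

Under this tally, LLM scoring accounts for most of the GPU preprocessing time, which is consistent with the text's description of it as the dominant cost.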
Several limitations should be acknowledged. The scenario analysis is based on observational correlations learned by the forecasting model. It cannot separate policy-induced variation from selection effects, omitted institutions, or reverse causality. The placebo test only checks whether the learned sensitivity pattern is stronger than random reassignment; it does not establish identification. Stronger causal claims would require instruments, staggered policy variation, natural experiments, or synthetic-control designs around clearly dated policy changes. The FISI involves subjective choices in indicator selection and weighting; sensitivity analyses with alternative aggregation schemes (arithmetic, PCA, Delphi, winsorized z-score) are reported in Section 6 (Table 12) and indicate that the headline forecast is robust to these choices. The 42-country sample excludes smaller markets without consistent time series, and linear interpolation for short within-country gaps may understate short-run volatility. Four limitations concern the LLM narrative stream specifically: (i) the pipeline relies on English-language TMDb synopses, biasing scores toward internationally distributed titles rather than the domestic long tail; (ii) residual hallucinations remain possible for very recent releases absent from the base model’s pre-training corpus, and the editing protocol touches only the 4127 titles that failed the factual probe; (iii) scores on dimensions such as “dialogue quality” are inferred from synopses rather than full scripts and should be read as coarse proxies rather than substitutes for expert reader judgments; and (iv) the knowledge editing step delivers a modest 2.6% MAE reduction, so it is treated as a cheap safeguard rather than an essential component, one that plausibly becomes more valuable as the sample extends into years less covered by the base model’s parametric knowledge.
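The placebo logic mentioned among the limitations above, comparing an observed sensitivity pattern with what random reassignment would produce, can be illustrated with a generic permutation test. This is a sketch of the general technique, not the paper's protocol: the function name, the difference-in-means statistic, and the example values are all hypothetical.

```python
import numpy as np

def placebo_p_value(sensitivity, exposed, n_perm=2000, seed=0):
    """Permutation p-value for the gap in mean sensitivity between groups.

    `sensitivity`: one model-derived sensitivity value per country.
    `exposed`: boolean flag per country (e.g. has the policy or not).
    Labels are reshuffled at random to build the placebo distribution.
    """
    rng = np.random.default_rng(seed)
    sensitivity = np.asarray(sensitivity, dtype=float)
    exposed = np.asarray(exposed, dtype=bool)

    def gap(labels):
        return sensitivity[labels].mean() - sensitivity[~labels].mean()

    observed = abs(gap(exposed))
    hits = sum(
        abs(gap(rng.permutation(exposed))) >= observed for _ in range(n_perm)
    )
    # Add-one correction keeps the p-value strictly positive.
    return (1 + hits) / (1 + n_perm)

# Hypothetical values: exposed countries show clearly higher sensitivity.
sens = [0.90, 0.80, 0.85, 0.95, 0.10, 0.20, 0.15, 0.05]
flags = [True, True, True, True, False, False, False, False]
p_value = placebo_p_value(sens, flags)
print(p_value)
```

A small p-value here says only that the observed pattern is unlikely under random labeling. As the text notes, that is a statement about the learned association, not evidence of a causal policy effect.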
Future work should extend the narrative stream beyond synopses by incorporating trailers, posters, audience reviews, festival selections, and streaming-platform engagement data. Multimodal signals could improve the measurement of audience reach and cultural specificity, while real-time platform data could support nowcasting rather than annual retrospective forecasting.
8. Conclusions
This paper introduced the Film Industry Sustainability Index, a composite of 18 indicators spanning cultural diversity, economic resilience, and SDG alignment, and applied it to a balanced panel of 42 countries over 2005–2023. The Screenplay-Aware Knowledge-Enhanced Mamba architecture, integrating knowledge graph embeddings, LLM-derived screenplay-oriented narrative proxies, and a knowledge editing step that anchors the LLM to verified UNESCO and LUMIERE records, achieved the best forecast accuracy among all tested models on 2020–2023. The ablation shows that both the narrative stream and the editing step contribute measurable gains over a macro-only baseline, supporting the value of coupling macro panel signals with micro-level narrative signals. Domestic market share, digital distribution ratio, and genre diversity emerged as the three strongest explicit predictors, jointly accounting for roughly 31% of the total SHAP mass, with the grouped KG embedding contributing a further 34%. The four-cluster typology and associative policy stress tests translate the forecasting results into differentiated policy hypotheses that can guide further institutional analysis, but they should not be interpreted as causal policy effects. Future work should combine the proposed forecasting framework with multilingual and multimodal cultural data, real-time platform indicators, and causal-inference designs such as instrumental variables, staggered policy evaluation, or synthetic controls.