Against the dual backdrop of the rural revitalization strategy and the pursuit of high-quality, balanced urban–rural education, optimizing rural campus spaces has emerged as an important lever for addressing educational resource disparities and improving pedagogical quality. However, conventional evaluation of campus space optimization faces two systemic dilemmas. First, top-down decision-making often neglects the authentic needs of diverse stakeholders and place-based knowledge, resulting in spatial interventions that lose regional distinctiveness. Second, routine public participation is constrained by geographical barriers, time costs, and sample-size limitations, which can amplify professional cognitive bias and impede the formation of comprehensive feedback. Together, these challenges contribute to a disconnect between spatial optimization outcomes and perceived needs, thereby constraining the distinctive development of rural educational spaces. To address these constraints, this study proposes a novel method that integrates regional spatial feature recognition with digital media-based public perception assessment. At the data collection and ethical governance level, the study adheres strictly to platform compliance requirements and academic ethics. A total of 12,800 preliminary comments were scraped from major social media platforms (e.g., Douyin, Dianping, and Xiaohongshu) and processed through a three-stage screening workflow (keyword screening, rule-based filtering, and manual verification) to yield 8,616 valid records covering diverse public groups across China. All user-identifying information was fully anonymized to ensure lawful use and privacy protection. At the analytical modeling level, we develop a Transformer-based deep learning system that leverages multi-head attention mechanisms to capture implicit spatial-sentiment features and metaphorical expressions embedded in review texts.
On an independent test set, the model achieves a classification accuracy of 89.2%, consistent with balanced and stable scoring performance. Robustness is further examined by introducing an equal-weight alternative strategy and running stability checks that test the consistency of model outputs across weighting assumptions. At the scenario interpretation level, we combine grounded-theory coding with semantic network analysis to establish a three-tier spatial analysis framework: macro (landscape pattern/hydro-topological patterns), meso (architectural interface), and micro (teaching scenes/pedagogical scenarios). We also incorporate an interpretive stakeholder typology (tourists, residents, parents, and professional groups) to systematically identify and quantify the key features shaping public spatial perception. Findings show that, at the macro level, naturally integrated scenarios such as “campus–farmland integration” and “mountain–water embeddedness” exhibit high affective association, aligning with the “mountain-water-field-village” spatial sequence logic and suggesting broad public endorsement of ecological campus concepts, whereas vernacular settlement-pattern scenarios receive relatively low attention due to cognitive discontinuities. At the meso level, innovative corridor strategies (e.g., framed vistas and expanded corridor spaces) strengthen the building–nature interaction and suggest latent value in stimulating exploratory spatial experience. At the micro level, place-based, practice-oriented teaching scenes (e.g., intangible cultural heritage handcraft and creative workshops) achieve higher scores, consistent with vernacular education’s “differential esthetics,” while urban convergence-oriented interdisciplinary curriculum scenes suggest an interpretive gap relative to public expectations.
These results indicate an embedded relationship between public perception and regional spatial features, which is further shaped by a multi-actor governance process, characterized by “Government + Influencers + Field Study,” that mediates how rural educational spaces are produced, communicated, and interpreted in digital environments. The study’s innovative value lies in integrating sociological theories (e.g., embeddedness) with deep learning techniques to fill the regional and multi-actor perspective gap in rural campus post-occupancy evaluation (POE) and to promote a methodological shift from “experience-based induction” toward a “data-theory” dual-drive model. The findings provide inferential evidence for rural campus renewal and optimization; the methodological pipeline is transferable to small-scale rural primary schools with media exposure and salient regional ecological characteristics, and it offers a new pathway for incorporating digital media-driven public perception feedback into planning and design practice.

The research methodology of this study consists of four sequential stages, implemented in a systematic and progressive manner. First, data collection: Python and the Octopus Collector were used to crawl online comment data related to Fuwen Township Central Primary School, in strict compliance with the user agreements of the Douyin, Dianping, and Xiaohongshu platforms. Second, semantic preprocessing: the comment text was segmented to generate word frequency statistics and semantic networks; qualitative analysis was conducted using Origin software, and the results were translated into quantitative form via Sankey diagrams. Third, spatial scene coding: building on a spatial characteristic identification system, a macro–meso–micro three-tier classification of spatial scene characteristics was constructed to encode the textual content and express it quantitatively. Finally, sentiment quantification and correlation analysis: a deep learning model based on the Transformer framework was used to assign a sentiment score to each comment, and Sankey diagrams were used to quantitatively correlate spatial scenes with sentiment tendencies, thereby exploring the public’s perceptual associations with the architectural spatial environment of rural campuses.
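
The final stage links spatial scenes to sentiment tendencies through Sankey diagrams. As a minimal sketch of how the link weights for such a diagram might be tabulated, the Python snippet below counts comments along tier-to-scene and scene-to-sentiment pairs. The sample records, the 0 to 1 score scale, the label thresholds, and the function names `label` and `sankey_links` are illustrative assumptions, not the study's data or implementation.

```python
from collections import Counter

# Hypothetical coded comments: (tier, scene, sentiment score in [0, 1]).
# These values are made up for illustration and are not results from the study.
comments = [
    ("macro", "campus-farmland integration", 0.92),
    ("macro", "campus-farmland integration", 0.81),
    ("meso", "framed-vista corridor", 0.55),
    ("micro", "heritage handcraft workshop", 0.88),
    ("micro", "interdisciplinary curriculum", 0.21),
]


def label(score, low=0.4, high=0.6):
    """Map a score to a coarse sentiment label (thresholds are assumptions)."""
    if score < low:
        return "negative"
    if score > high:
        return "positive"
    return "neutral"


def sankey_links(records):
    """Count comments along tier -> scene and scene -> sentiment links.

    Returns a Counter whose keys are (source, target) pairs and whose
    values are the flow weights a Sankey diagram would draw.
    """
    links = Counter()
    for tier, scene, score in records:
        links[(tier, scene)] += 1
        links[(scene, label(score))] += 1
    return links


links = sankey_links(comments)
for (source, target), weight in sorted(links.items()):
    print(f"{source} -> {target}: {weight}")
```

Each comment contributes one unit of flow to both of its links, so the resulting table can be passed to any plotting tool that accepts source, target, and weight triples.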