A Context-Conditioned Reinforcement Learning Framework for Space Frame Structure Optimization
Abstract
1. Introduction
- Based on the Soft Actor–Critic framework in DRL, the SFO-Agent is developed to interact directly with FEM and achieve automated optimization design of space frame structures through training.
- A DRL environment is constructed by integrating parametric modeling and FEM, providing an efficient state representation and enabling coordinated optimization of structural geometry and component engineering.
- A piecewise reward function is formulated in accordance with relevant design-code provisions for space frame structures, guiding the optimization to satisfy safety requirements while reducing material consumption.
- A context-conditioned off-policy training strategy is adopted, in which varying design conditions are incorporated into the training process, enabling the agent to learn reusable design policies and improve generalization across multiple structural design scenarios.
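The piecewise reward idea in the contributions above can be illustrated with a minimal sketch. It assumes only two code checks (maximum deflection and first-order buckling factor) and a hypothetical reference mass. The function name, the branch shapes, and every limit passed in are illustrative assumptions, not the reward formulated in the paper.

```python
def piecewise_reward(mass_t, deflection_mm, buckling_factor,
                     deflection_limit_mm, buckling_limit, mass_ref_t):
    """Hypothetical piecewise reward (assumed form, not the paper's).

    Penalty branch when any check fails; otherwise a positive branch
    that grows as material usage falls.
    """
    # Demand/capacity style ratios; a value above 1.0 means the check fails.
    deflection_ratio = deflection_mm / deflection_limit_mm
    buckling_ratio = buckling_limit / buckling_factor
    worst = max(deflection_ratio, buckling_ratio)
    if worst > 1.0:
        # Infeasible branch: negative, larger violation gives lower reward.
        return -(worst - 1.0)
    # Feasible branch: positive, lighter designs score higher.
    return mass_ref_t / mass_t


# Illustrative call with made-up numbers (all inputs are assumptions).
feasible = piecewise_reward(50.0, 100.0, 5.0, 125.0, 4.2, 100.0)
infeasible = piecewise_reward(50.0, 150.0, 5.0, 125.0, 4.2, 100.0)
```

Separating the two branches keeps safety violations strictly below any feasible design, so the agent is never rewarded for saving material at the cost of a failed check.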
2. Methodology
2.1. SFO-Environment
2.2. State and Action
2.3. Reward Design
2.4. SFO-Agent Design
- Actor–environment interaction: Given the current state $s_t$, the policy outputs an action $a_t$, which is applied to the SFO-Environment to obtain the reward $r_t$ and the next state $s_{t+1}$. The transition tuple $(s_t, a_t, r_t, s_{t+1})$ is then stored in the replay buffer for subsequent training.
- Critic update: Given $s_{t+1}$, the actor outputs the next action $a_{t+1}$ and the entropy-related term $\log \pi(a_{t+1} \mid s_{t+1})$. Concatenating $s_{t+1}$ and $a_{t+1}$ forms the input to the target critics, which produce two target Q-values, $\bar{Q}_1(s_{t+1}, a_{t+1})$ and $\bar{Q}_2(s_{t+1}, a_{t+1})$. Similarly, concatenating the sampled $s_t$ and $a_t$ provides the input to the critics, yielding the action-value estimates $Q_1(s_t, a_t)$ and $Q_2(s_t, a_t)$. The target temporal difference and temporal difference error used for network updates are given in Equations (12) and (13).
- Soft update of the target critics: The target critics share the same network architecture as the critics, and this study adopts a soft update controlled by $\tau$, which incrementally moves the target parameters toward the critic parameters, as given in Equation (14), where $\bar{\theta}$ and $\theta$ denote the parameters of the target critics and critics, respectively.
- Actor update: The actor is updated under the guidance of the critics to favor higher-value actions. For sampled states $s_t$, the actor produces a new action $\tilde{a}_t$, which is evaluated by the critics to obtain $Q_1(s_t, \tilde{a}_t)$ and $Q_2(s_t, \tilde{a}_t)$. The actor objective encourages both high action value and high entropy, leading to the loss function in Equation (15).
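The four steps above follow the usual Soft Actor–Critic pattern, which can be sketched numerically as below. This is a minimal NumPy sketch of the textbook SAC expressions, assuming a fixed temperature `alpha`, a discount `gamma`, and a terminal flag `done`; these names and the exact forms are assumptions and may differ from Equations (12)–(15).

```python
import numpy as np


def td_target(reward, q1_next, q2_next, logp_next, alpha, gamma, done):
    """Soft TD target: r + gamma * (1 - done) * (min(Q1', Q2') - alpha * log pi)."""
    soft_value = np.minimum(q1_next, q2_next) - alpha * logp_next
    return reward + gamma * (1.0 - done) * soft_value


def critic_loss(q_pred, target):
    """Mean squared TD error for one critic."""
    return float(np.mean((q_pred - target) ** 2))


def soft_update(target_params, critic_params, tau):
    """Polyak averaging: target <- tau * critic + (1 - tau) * target."""
    return [tau * c + (1.0 - tau) * t
            for t, c in zip(target_params, critic_params)]


def actor_loss(q1_new, q2_new, logp_new, alpha):
    """Actor objective to minimise: alpha * log pi(a|s) - min(Q1, Q2)."""
    return float(np.mean(alpha * logp_new - np.minimum(q1_new, q2_new)))


# Tiny worked example with made-up numbers.
y = td_target(np.array([1.0]), np.array([2.0]), np.array([3.0]),
              np.array([-1.0]), alpha=0.2, gamma=0.9, done=np.array([0.0]))
```

Taking the minimum of the two critics is the standard guard against overestimated Q-values, and the `-alpha * log pi` term is what rewards higher-entropy policies.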
3. Experiments and Results
3.1. Computational Setup
3.2. Training Results
3.3. Typical Case Analysis
4. Discussion of Agent Performance
4.1. Agent Evaluation
4.2. Algorithm Comparison
4.3. Extension to Other Structural Types
5. Conclusions
- The SFO-Agent was established by coupling an actor network with critics and target critics. Parametric modeling, FEM, and a code-compliant reward function were then developed and embedded into the SFO-Environment. Off-policy training within a context-conditioned framework enabled continuous interaction between SFO-Agent and SFO-Environment, allowing the agent to learn reusable policies across varying design cases for automated optimization of space frame structures.
- A single-layer Kiewitt dome was selected as the benchmark, and the 600-episode off-policy training results demonstrated the progressive improvement of the agent. The analysis of a representative design case verified the rationality of the learned strategy. In addition, tests on 100 randomly generated cases and full-domain sweeps across context parameters showed robust performance over varying design cases.
- Compared with the GA, the context-conditioned DRL agent learned reusable design patterns during training and therefore required substantially less time for optimization during the inference stage. The maximum-entropy formulation enhanced action exploration, leading to improved performance on more challenging models. Moreover, the proposed method was extended to several additional forms of space frame structures, further supporting its effectiveness.
- In this study, the SFO-Agent was evaluated on common forms of space frame structures to demonstrate its effectiveness, while its applicability to a broader range of structural types still requires further extension, particularly for non-standard configurations. In addition, the current generalization capability mainly refers to adaptation to different design contexts within the same parametric structural type, rather than direct topological generalization across different structural systems. The test results also indicate that, under certain design conditions, the agent may produce low-reward solutions, suggesting that the robustness and overall optimization performance of the proposed framework should be further improved.
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A
| Algorithm A1 Training procedure of the SFO-Agent |
|---|
| Given: |
| Initialize: actor, critics, target critics, replay buffer RB |
| for episode = 1, …, N do |
| for t = 1, …, T do |
| Store the transition tuple in RB |
| if then |
| Sample transitions ~ RB |
| Update the critics, target critics, and actor as described in Section 2.4 (Equations (12)–(15)) |
| Update PER priorities based on TD error |
| end if |
| end for |
| end for |
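The last step of Algorithm A1 updates prioritized experience replay (PER) priorities from the TD error. A minimal sketch of a proportional PER buffer is given below; the class name, the exponents `alpha` and `beta`, the small constant `eps`, and the linear-array storage are all assumptions chosen for brevity, not details taken from the paper.

```python
import numpy as np


class PrioritizedReplayBuffer:
    """Proportional PER sketch (hypothetical; array-based, no sum-tree)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6, seed=0):
        self.capacity = capacity
        self.alpha = alpha          # how strongly priorities skew sampling
        self.eps = eps              # keeps every priority strictly positive
        self.data = []
        self.priorities = np.zeros(capacity)
        self.pos = 0
        self.rng = np.random.default_rng(seed)

    def add(self, transition):
        # New transitions get the current maximum priority so they are seen soon.
        max_p = self.priorities[:len(self.data)].max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        n = len(self.data)
        probs = self.priorities[:n] ** self.alpha
        probs /= probs.sum()
        idx = self.rng.choice(n, size=batch_size, p=probs)
        # Importance-sampling weights correct the bias of non-uniform sampling.
        weights = (n * probs[idx]) ** (-beta)
        weights /= weights.max()
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, idx, td_errors):
        # Larger absolute TD error means the transition is replayed more often.
        self.priorities[idx] = np.abs(td_errors) + self.eps
```

In a training loop, `update_priorities` would be called with the indices returned by `sample` and the TD errors computed during the critic update.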
| No. | Parameter | Definition | Unit |
|---|---|---|---|
| 1 | D | Spherical shell span diameter | mm |
| 2 | | Rise-to-span ratio | - |
| 3 | DL | Dead load | N/mm² |
| 4 | LL | Live load | N/mm² |
| 5 | | Radial grid size | mm |
| 6 | | Ratio of circumferential to radial grid size | - |
| 7 | | Section index of radial members | - |
| 8 | | Section index of circumferential members | - |
| 9 | | Section index of diagonal members | - |

| No. | Parameter | Definition | Unit |
|---|---|---|---|
| 1 | M | Material usage | t |
| 2 | UZ | Maximum deflection | mm |
| 3 | BF | First-order buckling factor | - |
| 4 | | | - |
| 5 | | | - |
| 6 | | | - |

| No. | Parameter | Unit | Min | Max | Resolution | Description |
|---|---|---|---|---|---|---|
| 1 | D | mm | 20,000 | 80,000 | 1000 | Spherical shell span diameter |
| 2 | | - | 1/10 | 1/4 | 0.01 | Rise-to-span ratio |
| 3 | DL | N/mm² | 0.0003 | 0.0015 | 0.0001 | Dead load |
| 4 | LL | N/mm² | 0.0005 | 0.0015 | 0.0001 | Live load |
| 5 | | mm | 2000 | 6000 | 100 | Radial grid size |
| 6 | | - | 0.5 | 1.5 | 0.1 | Ratio of circumferential to radial grid size |
| 7 | | - | 0 | 74 | 1 | Section index of radial members |
| 8 | | - | 0 | 74 | 1 | Section index of circumferential members |
| 9 | | - | 0 | 74 | 1 | Section index of diagonal members |

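The ranges and resolutions in this table suggest how a continuous policy output could be mapped onto the discrete design grid, and how a design context could be sampled for each training episode. The sketch below is one way to do this under stated assumptions: the dictionary keys are hypothetical names (the source symbols for several rows are not shown), rows 1 to 4 are treated as context and rows 5 to 9 as design variables, and the policy output is assumed to lie in [-1, 1].

```python
import random

# (min, max, resolution) copied from the parameter-range table.
# Keys are hypothetical names; 1/10 and 1/4 are written as 0.10 and 0.25.
BOUNDS = {
    "D": (20000.0, 80000.0, 1000.0),
    "rise_span": (0.10, 0.25, 0.01),
    "DL": (0.0003, 0.0015, 0.0001),
    "LL": (0.0005, 0.0015, 0.0001),
    "grid": (2000.0, 6000.0, 100.0),
    "grid_ratio": (0.5, 1.5, 0.1),
    "sec_radial": (0.0, 74.0, 1.0),
    "sec_circ": (0.0, 74.0, 1.0),
    "sec_diag": (0.0, 74.0, 1.0),
}
CONTEXT_KEYS = ("D", "rise_span", "DL", "LL")  # assumed context parameters


def normalize(name, value):
    """Map a physical value to [-1, 1] using the table bounds."""
    lo, hi, _ = BOUNDS[name]
    return 2.0 * (value - lo) / (hi - lo) - 1.0


def denormalize(name, u):
    """Map u in [-1, 1] to the nearest admissible grid value."""
    lo, hi, step = BOUNDS[name]
    u = min(max(u, -1.0), 1.0)               # clip out-of-range outputs
    raw = lo + (u + 1.0) / 2.0 * (hi - lo)
    k = round((raw - lo) / step)             # snap to the table resolution
    return min(lo + k * step, hi)


def sample_context(rng):
    """Draw one design context uniformly on the discrete grid."""
    context = {}
    for name in CONTEXT_KEYS:
        lo, hi, step = BOUNDS[name]
        n_steps = round((hi - lo) / step)
        context[name] = lo + rng.randint(0, n_steps) * step
    return context


context = sample_context(random.Random(0))
```

Sampling a fresh context per episode and appending its normalized values to the state is one common way to realise context conditioning; whether the paper does exactly this is not stated in the text shown here.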
| Network | Hidden Layer | Hidden Layer Dimension | Activation Function | Learning Rate |
|---|---|---|---|---|
| Actor | 1 | 512 | ReLU | |
| Actor | 2 | 512 | Tanh | |
| Critic 1 | 1 | 512 | ReLU | |
| Critic 1 | 2 | 512 | Tanh | |
| Critic 2 | 1 | 512 | ReLU | |
| Critic 2 | 2 | 512 | Tanh | |
| Target critic 1 | 1 | 512 | ReLU | - |
| Target critic 1 | 2 | 512 | Tanh | - |
| Target critic 2 | 1 | 512 | ReLU | - |
| Target critic 2 | 2 | 512 | Tanh | - |

| Step | Radial grid size (mm) | Ratio of circumferential to radial grid size | Section index of radial members | Section index of circumferential members | Section index of diagonal members | Reward | Compliance | M (t) |
|---|---|---|---|---|---|---|---|---|
| 1 | 3600 | 1.2 | 37 | 20 | 33 | 9.30 | Pass | 59.93 |
| 2 | 4400 | 1.2 | 56 | 33 | 32 | 10.20 | Pass | 54.96 |
| 3 | 3600 | 1.4 | 46 | 24 | 29 | 10.57 | Pass | 52.93 |
| 4 | 4400 | 1.4 | 51 | 29 | 28 | 12.44 | Pass | 42.67 |
| 5 | 3900 | 1.5 | 52 | 27 | 27 | 10.03 | Pass | 55.92 |
| 6 | 4200 | 1.5 | 49 | 29 | 26 | 10.28 | Pass | 54.55 |
| 7 | 4100 | 1.4 | 52 | 27 | 26 | 9.43 | Pass | 59.22 |
| 8 | 4200 | 1.4 | 52 | 27 | 26 | 8.95 | Pass | 61.83 |
| 9 | 4000 | 1.5 | 49 | 26 | 28 | 10.65 | Pass | 52.51 |
| 10 | 4300 | 1.5 | 52 | 29 | 27 | 10.08 | Pass | 56.64 |

| Case for Comparison | D (mm) | Rise-to-span ratio | DL (N/mm²) | LL (N/mm²) |
|---|---|---|---|---|
| Case 1 | 25,000 | 1/8 | 0.0014 | 0.001 |
| Case 2 | 50,000 | 1/7 | 0.0015 | 0.0005 |
| Case 3 | 70,000 | 1/6 | 0.0008 | 0.0005 |

| Case for Comparison | GA Time (s) | GA Reward | GA M (t) | SFO-Agent Time (s) | SFO-Agent Reward | SFO-Agent M (t) |
|---|---|---|---|---|---|---|
| Case 1 | 2036 | 14.54 | 10.67 | 57 | 13.60 | 12.44 |
| Case 2 | 3296 | 12.29 | 43.48 | 102 | 12.44 | 42.67 |
| Case 3 | 5862 | 5.86 | 86.06 | 137 | 7.90 | 73.82 |
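
The comparison can be made concrete with a short calculation on the tabulated values. The sketch below derives the time ratio, reward difference, and relative change in material usage for each case; the dictionary layout and function name are illustrative, while the numbers are copied from the table. The SFO-Agent times are taken here to be inference-stage times, in line with the conclusions, so training cost is not included.

```python
# (time in s, reward, material usage in t) for each method, from the table.
RESULTS = {
    "Case 1": {"ga": (2036, 14.54, 10.67), "agent": (57, 13.60, 12.44)},
    "Case 2": {"ga": (3296, 12.29, 43.48), "agent": (102, 12.44, 42.67)},
    "Case 3": {"ga": (5862, 5.86, 86.06), "agent": (137, 7.90, 73.82)},
}


def compare(ga, agent):
    """Return speedup, reward difference, and percent change in mass."""
    ga_time, ga_reward, ga_mass = ga
    agent_time, agent_reward, agent_mass = agent
    return {
        "speedup": ga_time / agent_time,
        "reward_diff": agent_reward - ga_reward,
        "mass_change_pct": 100.0 * (agent_mass - ga_mass) / ga_mass,
    }


summary = {case: compare(r["ga"], r["agent"]) for case, r in RESULTS.items()}
for case, s in summary.items():
    print(f"{case}: {s['speedup']:.1f}x faster, "
          f"reward {s['reward_diff']:+.2f}, mass {s['mass_change_pct']:+.1f}%")
```

On these figures the agent is roughly 32 to 43 times faster per case. The GA finds the lighter design in Case 1 (the agent uses about 16.6% more material), while the agent is about 1.9% lighter in Case 2 and about 14.2% lighter in Case 3.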
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Li, Y.; Xiao, C.; Fan, F.; Zhi, X. A Context-Conditioned Reinforcement Learning Framework for Space Frame Structure Optimization. Buildings 2026, 16, 2321. https://doi.org/10.3390/buildings16122321

