Article

An Approach to Thermal Management and Performance Throttling for Federated Computation on a Low-Cost 3D ESP32-S3 Package Stack

by Yi Liu 1, Parth Sandeepbhai Shah 2, Tian Xia 3,* and Dryver Huston 1,*

1 Department of Mechanical Engineering, University of Vermont, Burlington, VT 05405, USA
2 Intel Corporation, Chandler, AZ 85224, USA
3 Department of Computer Science, University of Vermont, Burlington, VT 05405, USA
* Authors to whom correspondence should be addressed.
Computers 2025, 14(4), 147; https://doi.org/10.3390/computers14040147
Submission received: 17 February 2025 / Revised: 31 March 2025 / Accepted: 3 April 2025 / Published: 11 April 2025

Abstract
The rise of 3D heterogeneous packaging holds promise for increased performance in applications such as AI by bringing compute and memory modules into close proximity. This increased performance comes with increased thermal management challenges. This research explores the use of thermal sensing and load throttling combined with federated computation to manage localized internal heating in a multi-chip 3D package. The overall concept is that individual chiplets may heat at different rates due to operational and geometric factors. Shifting computational loads from hot to cooler chiplets can prevent local overheating while maintaining overall computational output. This concept is verified with experiments in a low-cost test vehicle that mimics a 3D chiplet stack with a tightly stacked assembly of SoC devices. These devices can sense and report internal temperature and dynamically adjust frequency. In this configuration, ESP32-S3 microcontrollers work on a federated computational task while reporting internal temperature to a host controller. The tight packing of processors causes temperatures to rise, with those internal to the stack rising more quickly than external ones. With real-time temperature monitoring, when temperatures exceed a threshold, the AI system reduces the processor frequency, i.e., throttles the processor, to save power and dynamically shifts part of the workload to other ESP32-S3s with lower temperatures. This approach maximizes overall efficiency, maintaining thermal safety without compromising computational power. Experimental results with up to six processors confirm the validity of the concept.

1. Introduction

The relentless demand for high-performance computing and miniaturized electronic systems has driven the evolution of packaging technologies, culminating in heterogeneous 3D integration. Historically, electronic packaging focused on two-dimensional (2D) assembly methods, where components were mounted side-by-side on printed circuit boards (PCBs). This approach, however, reached its physical and performance limits as the semiconductor industry pursued Moore’s Law. The advent of Through-Silicon Vias (TSVs) in the early 21st century enabled three-dimensional (3D) integration. TSVs provided vertical interconnects through silicon wafers, reducing interconnect distances, improving signal integrity, and enhancing power efficiency [1]. Over time, heterogeneous packaging extended these advancements by integrating disparate technologies, such as logic, memory, and analog devices, into unified systems, leveraging hybrid bonding techniques and advanced interconnect methodologies [2,3].
Heterogeneous packaging enables the stacking of chips with different functionalities, such as processors and memory, into a single module. This approach, combined with technologies like hybrid wafer bonding, facilitates unprecedented levels of system integration, opening doors for advanced applications in artificial intelligence (AI) and edge computing [4]. By minimizing signal propagation delays and reducing device footprints, heterogeneous 3D packaging enhances computational efficiency, making it indispensable for tasks requiring high-speed processing and minimal latency [5]. Yet, these benefits come at a cost—thermal management remains a formidable challenge.
As chip densities increase and transistor sizes shrink, thermal hotspots emerge as a critical bottleneck. Uneven heat dissipation, particularly in densely stacked 3D architectures, leads to localized overheating, which degrades performance, increases power consumption, and shortens the lifespan of electronic systems [6]. Unlike traditional 2D systems, where heat can be dissipated through the package surface, 3D architectures exacerbate thermal issues by limiting the available cooling surface area while concentrating heat sources in a confined space [7]. The problem is further compounded in heterogeneous systems, where the thermal properties of integrated components can vary significantly, creating thermal mismatches and hotspots [8].
To address these challenges, researchers have explored a variety of innovative thermal management strategies. Traditional methods, such as heat sinks and thermal interface materials (TIMs), have been adapted for 3D systems but often fall short in meeting the unique demands of heterogeneous integration [9]. For instance, microfluidic cooling, which involves circulating coolant through microchannels embedded in the package, offers high thermal efficiency but adds complexity to the manufacturing process [6]. Similarly, liquid cooling using cold plates has shown promise in mitigating heat in high-power applications, yet its scalability remains a challenge for compact edge devices [9].
A more recent approach leverages dynamic thermal management (DTM) techniques, which adjust system parameters in real-time to balance power consumption and thermal output. AI-driven load balancing and real-time thermal sensing are integral to DTM, enabling predictive adjustments to workload distribution and processor frequency based on temperature data [7]. These methods not only mitigate thermal hotspots but also optimize overall system performance, making them particularly suitable for heterogeneous 3D systems. Predictive modeling using machine learning algorithms has further enhanced the efficacy of these techniques, allowing systems to anticipate thermal events and adapt accordingly [2,8].
Parallel computing has emerged as another promising avenue for thermal management in heterogeneous architectures. By distributing computational tasks across multiple processing units, parallel computing reduces the thermal load on individual cores, preventing localized overheating. This approach is particularly effective in edge computing applications, where workloads can be dynamically allocated to minimize energy consumption and heat generation [9]. For example, parallelized algorithms for tasks like image recognition and natural language processing not only improve computational efficiency but also help maintain thermal stability in 3D integrated systems [3].
The integration of ESP32-S3 microcontrollers in edge computing systems exemplifies the potential of parallel computing for thermal management. These devices, known for their low power consumption and high computational efficiency, are ideal for implementing parallelized workloads in a compact form factor. By using real-time temperature monitoring and dynamic task allocation, systems can adapt to thermal variations, ensuring reliable operation under diverse conditions. AI-based thermal prediction models further enhance this adaptability, enabling systems to preemptively reconfigure tasks to avoid overheating.
Artificial intelligence (AI) has revolutionized thermal management in heterogeneous packaging by enabling adaptive, real-time control over heat dissipation. AI-driven techniques leverage predictive analytics and machine learning to monitor, model, and manage thermal conditions dynamically. These methods surpass traditional thermal management systems by responding to localized overheating and optimizing task allocation and processor frequencies based on predictive data [10]. For instance, machine learning algorithms have been employed to predict temperature spikes and adjust workloads accordingly, effectively preventing thermal bottlenecks in multi-die systems [11]. This capability is especially critical in 3D stacked architectures where thermal gradients can severely impact performance and reliability.
Furthermore, AI-driven approaches facilitate the integration of advanced cooling technologies, such as microfluidics and phase-change materials, by coordinating their operations with processor workload requirements [12]. These systems employ real-time data from thermal sensors distributed throughout the package to identify hotspots and adapt cooling mechanisms, thereby optimizing energy efficiency while maintaining performance thresholds. Additionally, AI applications in System-in-Package (SiP) technologies enable complex heat mapping and predictive load balancing, providing a holistic solution for managing the intricate thermal demands of heterogeneous systems [13].
In edge computing and AI-intensive tasks, AI-based thermal management not only addresses heat dissipation but also improves overall system efficiency by reducing power consumption and extending component lifespans. By incorporating sophisticated neural networks, these systems can autonomously adapt to varying workloads and environmental conditions, ensuring sustainable operation across a wide range of applications. As heterogeneous integration progresses, AI will play a pivotal role in overcoming thermal challenges, ensuring the viability of next-generation computing architectures.

2. Methodology

2.1. Test Vehicle

The test vehicle for this research is a custom-designed 3D-printed shield that houses a compact assembly of six ESP32-S3 SoC devices. The ESP32-S3 is a powerful and versatile microcontroller from Espressif Systems, equipped with a dual-core Xtensa® LX7 processor that supports three clock frequencies: 80 MHz, 160 MHz, and 240 MHz. The processor includes a built-in temperature sensor, and each device provides 8 MiB of flash memory and 512 KB of SRAM. Each component is highlighted in Figure 1 and listed in Table 1.
Each ESP32-S3 within the stack is configured to execute parallel computing tasks—edge detection for this research. This setup highlights the processing power of the ESP32-S3 and its suitability for localized computation in scenarios requiring high efficiency and low latency. In addition to their computational capabilities, the ESP32-S3 devices are equipped with in-built temperature sensors to monitor and report their internal temperatures in real-time. This real-time feedback is essential for dynamic thermal management, as each device can adjust its operating frequency based on external control. The structural design of the test vehicle is based on a 3D-printed enclosure, as shown in Figure 2. This enclosure is designed to hold the 6 ESP32-S3 devices in a vertically stacked arrangement, ensuring both stability and accessibility. The stack is a compact assembly with a small footprint for test scenarios requiring dense integration. These ESP32-S3 devices are labeled 1 to 6 from top to bottom as shown in Figure 2.

2.2. Conceptual Approach

Dynamic load balancing and thermal management are the core of this system. By leveraging real-time temperature data from the ESP32-S3 chiplets, computational tasks are distributed dynamically. Tasks are allocated based on the thermal and performance state of each chiplet, enabling higher clock frequencies while keeping temperatures close to the pre-defined limit for optimal performance.
The AI-driven model is trained on the thermal behavior of the six ESP32-S3 devices. By monitoring the SoC temperature, the AI model predicts the temperature 10 s ahead and proactively adjusts processor frequencies and workload distribution. During computational tasks like edge detection, workloads are divided into subtasks and allocated dynamically according to the current state of the chiplets.
The federated computation framework, inspired by stackable SoC architectures, distributes tasks across multiple ESP32-S3 devices. This framework leverages the parallel processing capabilities of the ESP32-S3 to manage workloads effectively while controlling thermal conditions.

3. Experimental Setup

3.1. Initial Data Collection

To study the thermal behavior of the ESP32-S3 devices under different clock frequencies and to collect data for AI-model training, the six ESP32-S3 devices were configured to operate at full load while running an edge detection algorithm. During the data collection period, Unity Editor (2022.3.39f1) was used as the host to read and control the collection of ESP32-S3 SoC temperature data through serial port communication. The monitoring and control dashboard is shown in Figure 3. The interface includes a list of connected devices identified by their port names (e.g., COM4, COM6), with each entry displaying the SoC temperature, the number of completed computation rounds, and the current status of the device, such as “Stand by”, “Creating Matrix”, “Simulating”, or “Complete”. On the right side, control buttons labeled “Scan”, “Connect”, “Start”, and “Stop” allow the user to manage the operations of connected ESP32-S3 devices. A timer in the bottom right corner displays the elapsed time in minutes, seconds, and milliseconds.
The ESP32-S3 devices are programmed using the Arduino IDE (version 2.3.4) to report the SoC temperature as well as a status marker (such as “Creating Matrix” and “Simulating” in Figure 4) in real-time back to the host. The status marker is maintained throughout the data collection phase to keep track of how many computation cycles the ESP32-S3 stack package completes in total.
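The reporting channel is simple line-based serial I/O. As an illustration only, the following Python sketch (using pyserial) polls each device for a temperature reading; the actual host in this study is the Unity dashboard, and the message format shown here is an assumption rather than the firmware's exact protocol.

```python
# Illustrative host-side monitor (hypothetical; the study's actual host is a
# Unity Editor dashboard). Assumes each ESP32-S3 prints lines such as
# "TEMP:42.5" or "STATUS:Simulating" over its serial port.
import serial  # pyserial

PORTS = ["COM4", "COM6"]  # port names as shown in the dashboard

def poll_devices(ports, baud=115200):
    """Read one line from each connected device and collect temperatures."""
    readings = {}
    for name in ports:
        with serial.Serial(name, baud, timeout=1) as port:
            line = port.readline().decode(errors="ignore").strip()
            if line.startswith("TEMP:"):
                readings[name] = float(line.split(":", 1)[1])
    return readings

if __name__ == "__main__":
    print(poll_devices(PORTS))
```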
As shown in Figure 4, the edge detection process begins with a “Creating Matrix” step, where a 1024 × 768 matrix is initialized and a “Creating Matrix” marker is sent to the host. The flow then moves to the simulation step, where the edge detection algorithm is applied: pixel gradients are calculated with the Sobel operator, the gradient magnitude is computed, and the output values are clamped to a valid range. A status marker labeled “Simulating” is sent to the host during this step. The simulation is repeated 10 times per round because the high-resolution image requires multiple iterations to fully complete the edge detection operations. After each simulation, the process checks whether 10 iterations have been completed; if so, it checks whether the test has run for 60 min. If not, the flow loops back to the “Creating Matrix” step. Once the 60 min condition is met, the results are saved as a .csv file, marking the end of the process.
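For concreteness, the following Python/NumPy sketch mirrors one computation round from Figure 4. The random matrix, SciPy convolution, and function names are illustrative stand-ins for the on-device Arduino implementation, not the firmware itself.

```python
# Minimal sketch of one computation round (assumptions: random pixel data
# stands in for a real image; SciPy's convolve2d replaces the on-device
# per-pixel gradient loop).
import numpy as np
from scipy.signal import convolve2d

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
SOBEL_Y = SOBEL_X.T

def edge_detect(img):
    """Sobel gradients, magnitude, then clamp to the valid 8-bit range."""
    gx = convolve2d(img, SOBEL_X, mode="same", boundary="symm")
    gy = convolve2d(img, SOBEL_Y, mode="same", boundary="symm")
    mag = np.sqrt(gx**2 + gy**2)
    return np.clip(mag, 0, 255).astype(np.uint8)

def one_round(iterations=10, shape=(768, 1024)):
    """'Creating Matrix' followed by 'Simulating' x10, as in Figure 4."""
    img = np.random.randint(0, 256, size=shape).astype(float)  # Creating Matrix
    for _ in range(iterations):                                # Simulating
        result = edge_detect(img)
    return result
```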
As mentioned before, each ESP32-S3 was set to a fixed frequency among the three options (80 MHz, 160 MHz, and 240 MHz) during separate rounds of testing. Figure 5 shows the thermal behavior of each ESP32-S3 SoC in the package stack during the one-hour collection period.
At the beginning of each round, the temperature starts from 27 °C. In Figure 5a, the temperature rises gradually, stabilizing between 35 °C and 40 °C, showing the slowest increase and lowest stabilization temperature due to the lower power consumption. At 160 MHz, in Figure 5b, the temperature climbs more quickly and stabilizes between 40 °C and 50 °C, reflecting moderate heat generation. At 240 MHz, in Figure 5c, the temperature rises most rapidly, stabilizing between 50 °C and 55 °C, highlighting the highest thermal load. These curves demonstrate the relationship between operating frequency and thermal behavior, with higher frequencies generating more heat and reaching higher stabilization temperatures. Table A1 presents the performance metrics of the six ESP32-S3 devices at 80 MHz, 160 MHz, and 240 MHz. At 80 MHz, the average time for creating a matrix is around 30.2 s, and each computation round takes approximately 123.16 s, resulting in 30 completed rounds in one hour. At 160 MHz, the matrix creation time is halved to 14.97 s, and the total time per computation round is reduced to about 60.98 s, allowing 60 rounds to be completed. At 240 MHz, the matrix creation time further decreases to 9.95 s, with each round taking approximately 40.57 s, enabling 89 rounds in one hour. These data demonstrate that higher frequencies significantly reduce computation time per round, allowing a greater number of completed rounds but at the cost of increased energy consumption and potential thermal stress. The comparison between this table and the AI-model-driven result is discussed later.

3.2. AI Model Training

The collected temperature data serve as the foundation for training a logistic regression model designed to predict short-term thermal behavior. Specifically, in Figure 6, the model forecasts whether the temperature of an ESP32-S3 device will increase or decrease over the next 10 s. By analyzing both current and historical temperature data, the model provides insights into the system’s thermal trends.
A total of 18 temperature–time datasets (6 devices at 3 different frequencies) were obtained, each capturing the heating process from an initial ambient temperature to stabilization over 3600 s. At each frequency, among the six curves, the dataset from the third ESP32-S3 was chosen for model training because it exhibited the widest overall temperature range and thus best represents the extreme thermal conditions of the package stack.
The AI model for predicting future temperature trends involves processing raw time–temperature data, creating predictive features, and training a logistic regression model using the Scikit-Learn library. The data include timestamps and temperature values recorded at a fixed interval of 0.05 s, described as pairs

$(t_i, T_i)$

where $T_i$ is the temperature (in °C) recorded at time $t_i$ (in seconds).
To predict the temperature 10 s into the future based on recent behavior, a sliding-window approach is adopted. For each valid index $i$, the history covering the previous 10 s is defined by the $10\,\text{s} / 0.05\,\text{s} = 200$ data points preceding the current index, since each step is 0.05 s. The predictive feature, the average temperature over the past 200 steps, is

$$\bar{T}_{i-200,\,i-1} = \frac{1}{200} \sum_{j=i-200}^{i-1} T_j$$

The temperature 10 s later is $T_{i+200}$. Two features are then extracted: the current temperature $T_i$ and the rolling average of temperatures over the preceding 10 s, $\bar{T}_{i-200,\,i-1}$. The label $y$ is derived by comparing $T_i$ and $T_{i+200}$:

$$y = \begin{cases} 1, & T_{i+200} > T_i \\ 0, & T_{i+200} \le T_i \end{cases}$$
The logistic regression model predicts the probability that the temperature will increase based on these two features. The model is represented mathematically as

$$P(y = 1 \mid \mathbf{T}) = \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad z = \beta_0 + \beta_1 T_i + \beta_2 \bar{T}_{i-200,\,i-1},$$

where $P(y = 1 \mid \mathbf{T})$ denotes the probability that the target variable $y = 1$, $\sigma(z)$ is the sigmoid activation function, $z$ is the linear combination of the features $T_i$ and $\bar{T}_{i-200,\,i-1}$, and $\beta_0$, $\beta_1$, and $\beta_2$ are the model coefficients.
The model is trained using Scikit-Learn’s LogisticRegression class [14]. By repeating the training process on the 80 MHz, 160 MHz, and 240 MHz datasets, three AI models are trained. The tuned coefficients $\beta_0$, $\beta_1$, and $\beta_2$, as well as the accuracy of each model, are listed in Table 2.
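The following Python sketch reproduces this pipeline under the definitions above. The CSV file name and variable names are hypothetical, but the windowing and Scikit-Learn calls follow the stated equations.

```python
# Sketch of the Section 3.2 training pipeline (assumption: the trace file
# holds (t_i, T_i) rows sampled every 0.05 s; file name is illustrative).
import numpy as np
from sklearn.linear_model import LogisticRegression

WINDOW = 200  # 10 s / 0.05 s sampling interval

def build_dataset(temps):
    """Features: current temperature and 10 s rolling mean; label: rise 10 s later."""
    X, y = [], []
    for i in range(WINDOW, len(temps) - WINDOW):
        current = temps[i]
        rolling = np.mean(temps[i - WINDOW:i])      # T-bar over the past 200 steps
        X.append([current, rolling])
        y.append(1 if temps[i + WINDOW] > current else 0)
    return np.array(X), np.array(y)

data = np.loadtxt("esp32s3_240mhz_device3.csv", delimiter=",")  # hypothetical file
temps = data[:, 1]                                              # T_i column
X, y = build_dataset(temps)
model = LogisticRegression().fit(X, y)
print(model.intercept_, model.coef_)  # beta_0, then (beta_1, beta_2)
```

Repeating this fit on the 80 MHz, 160 MHz, and 240 MHz traces yields the three models whose coefficients appear in Table 2.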

3.3. Load-Sharing Model

Combined with the AI models, load sharing is a critical strategy in distributed computing, particularly in systems where temperature control and efficiency are paramount. By dynamically distributing tasks across multiple devices based on their thermal and computational states, load sharing not only enhances performance but also prevents localized overheating. Efficient load sharing ensures that devices operating at higher temperatures are assigned fewer tasks, allowing them to cool down, while cooler devices with higher CPU frequencies handle a larger share of the workload.
Figure 7 depicts a distributed processing system designed for efficient task execution across multiple ESP32-S3 microcontrollers. Following the same procedure as the raw temperature data collection, the system processes a high-resolution matrix (10,240 × 7680 pixels) by partitioning it into smaller subtasks, leveraging the hardware capabilities of the ESP32-S3 and incorporating AI-driven CPU frequency adjustment. Matrix processing begins with the creation of a pixel matrix, which is subsequently divided into 10 subtasks. Each subtask is slightly larger than 1024 × 768 pixels (1026 × 770 pixels) because the Sobel operator requires a two-pixel overlap to accommodate convolution-based edge detection. This ensures seamless integration of the processed subtasks into the final result, addressing boundary inconsistencies inherent to edge detection algorithms. By breaking the problem into independent units, the system prepares the data for parallel computation.
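A minimal sketch of this partitioning follows: each 1024 × 768 tile is padded by one pixel on every side (yielding 1026 × 770, i.e., a two-pixel overlap between neighbors) so the 3 × 3 Sobel kernel has valid neighbors at tile boundaries. The tiling loop is illustrative; it walks the full matrix exhaustively, whereas the paper's coarser split into 10 subtasks would use larger tiles with the same overlap logic.

```python
# Sketch of subtask partitioning with a one-pixel border per side (two-pixel
# overlap between adjacent tiles); tile geometry here is illustrative.
import numpy as np

TILE_H, TILE_W, PAD = 768, 1024, 1

def split_with_overlap(matrix):
    """Yield (row, col, padded_tile) for each tile of the full matrix."""
    h, w = matrix.shape
    for r in range(0, h, TILE_H):
        for c in range(0, w, TILE_W):
            r0, r1 = max(r - PAD, 0), min(r + TILE_H + PAD, h)
            c0, c1 = max(c - PAD, 0), min(c + TILE_W + PAD, w)
            yield r, c, matrix[r0:r1, c0:c1]

full = np.zeros((7680, 10240))  # full-resolution matrix from the text
subtasks = list(split_with_overlap(full))
```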
A critical component of the system is its task allocation mechanism, which is governed by the dynamic availability of ESP32-S3 devices. The framework continuously scans serial ports to detect the presence of ESP32-S3 devices. Upon identifying available devices, the system applies a task allocation model coupled with the AI model, which dynamically evaluates the current temperature and operating CPU frequency of each device and prioritizes those with lower temperatures and higher frequencies.
The process monitors task completion and reassigns unfinished subtasks as necessary. Once all tasks are completed, the system checks whether 60 min has elapsed since the last save and, if so, saves the processed data to a .csv file. This safeguards the data and enables long-term operation. The iterative workflow continues, making the system scalable, efficient, and suitable for processing large datasets while maintaining hardware safety and performance.
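One way to realize this policy is sketched below in Python: devices are ranked by temperature and frequency, a device predicted to keep heating near its limit is throttled one frequency step, and subtasks are dealt out over the ranking. The device dictionaries, threshold, and round-robin assignment are assumptions for illustration, not the study's exact implementation.

```python
# Hedged sketch of the allocation policy: prefer cool, fast devices; throttle
# a device one step when the trained model predicts it will keep heating
# near the limit. Device records and thresholds are illustrative.
FREQS = [80, 160, 240]  # MHz steps supported by the ESP32-S3

def score(device):
    """Cooler devices first; among equal temperatures, faster devices first."""
    return (device["temp"], -device["freq"])

def assign(subtasks, devices, model, limit_c=45.0):
    for dev in devices:
        features = [[dev["temp"], dev["rolling_avg"]]]   # same two features as training
        rising = model.predict(features)[0] == 1
        if dev["temp"] >= limit_c and rising and dev["freq"] > FREQS[0]:
            dev["freq"] = FREQS[FREQS.index(dev["freq"]) - 1]  # throttle one step
    queue = sorted(devices, key=score)
    plan = {d["port"]: [] for d in devices}
    for i, task in enumerate(subtasks):
        plan[queue[i % len(queue)]["port"]].append(task)  # round-robin over ranking
    return plan
```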

4. Results

4.1. Temperature Adjustments

With the AI models trained with Scikit-Learn [14] and the load-sharing model in place, the test was divided into two groups: the first with a temperature limit of 45 °C and the second with a limit of 50 °C. As in the raw data collection, each test lasted one hour, and the temperature variation curves for the two groups are shown in Figure 8 and Figure 9.
In the 50 °C group, the temperature gradually increases and stabilizes near the upper threshold, showing some variability among devices. The 45 °C group stabilizes more quickly and exhibits tighter control, with less variability between devices, indicating more consistent thermal management. The 45 °C group shows slight overshooting; however, both groups enter a relatively stable phase after 1500 s.

4.2. AI Model Performance

The status markers are also recorded during the test period. The 45 °C group completes 493 computation rounds, while the 50 °C group completes 542. The comparison between the ESP32-S3 package stack driven by the AI and load-sharing models and the stack without them is shown in Figure 10.
Without the AI and load-sharing models, at 80 MHz the system completes 180 rounds (34%), reflecting minimal performance and a low thermal load ($T_{\max}$: 39 °C). At 160 MHz, performance improves significantly to 360 rounds (67%), with $T_{\max}$ rising to 49 °C, while at 240 MHz the system achieves peak performance with 534 rounds (100%) but at the highest temperature of 56 °C. With AI-driven load balancing, the 45 °C limit achieves 493 rounds (92%) with a $T_{\max}$ of 48 °C, demonstrating effective thermal control. Raising the limit to 50 °C yields 542 rounds (101%), a slight 1% gain over the fixed 240 MHz baseline with a 3 °C increase in $T_{\max}$ (51 °C). These results highlight the AI model’s ability to optimize task allocation, maintaining near-maximum performance under thermal constraints while balancing the trade-off between computational throughput and temperature.

5. Discussion

5.1. Achievements

The system achieved real-time thermal management across multiple devices by utilizing a logistic regression model trained on collected temperature data to predict temperature changes over the next 10 s. This predictive capability enabled proactive adjustments to the SoC frequencies as temperatures approached critical thresholds, effectively preventing overheating while maintaining performance. As shown in Figure 10, the AI model dynamically optimized SoC frequencies based on these predictions, demonstrating a significant improvement in temperature control and computational efficiency. Furthermore, the implementation of a dynamic load balancing and task allocation mechanism ensured efficient workload distribution. Tasks were allocated to ESP32-S3 devices with lower temperatures and higher frequency capacities, maintaining balanced thermal states and maximizing system throughput. These combined strategies allowed the system to achieve superior performance under thermal constraints, as evidenced by the AI-driven 50 °C group completing 542 computation rounds while maintaining a manageable $T_{\max}$ of 51 °C.

5.2. Limitations and Future Directions

The current study faced several limitations that impacted the performance of the AI-driven thermal management system. Limited data during model training were a significant constraint, as the relatively small dataset used in the training phase may not have fully captured the complex thermal dynamics of the ESP32-S3 stack under varying operating conditions. This limitation likely contributed to the observed overshoot in temperature predictions, as the model lacked sufficient variability in the data to generalize effectively. Additionally, the approach to stack-level modeling versus individual device modeling introduced further challenges. The AI model was trained on aggregated temperature data from all devices in the stack, representing the average thermal behavior of the system. While this approach simplifies the modeling process, it does not account for the unique thermal profiles and behaviors of individual devices, potentially reducing the precision of temperature predictions and control.
For future research, data collection and model training should be expanded, with a focus on gathering more granular performance data from each individual ESP32-S3 device. Training separate models for each device would enhance the precision of the temperature control system by accommodating the unique thermal characteristics of each component. Additionally, AI model comparison should be conducted to explore alternative AI models that may offer superior performance in predicting thermal behavior and optimizing temperature control. Comparing various machine learning algorithms, such as neural networks or ensemble models, could uncover more effective solutions for managing thermal dynamics. Lastly, optimizing load-sharing algorithms offers a promising avenue for further improvement. Dynamic task allocation based on both temperature and processing capability could ensure an even more efficient distribution of workloads across the stack, reducing thermal hotspots and enhancing overall system performance. These advancements would contribute to a more robust, efficient, and scalable thermal management framework for the ESP32-S3 package stack.

6. Conclusions

This study demonstrates the effectiveness of integrating real-time thermal management and AI-driven load balancing strategies in managing performance throttling within a low-cost, heterogeneous 3D ESP32-S3 package stack. The experimental results confirmed that dynamically adjusting processor frequencies and strategically redistributing computational workloads across multiple ESP32-S3 devices significantly improved thermal stability without compromising computational performance. Specifically, the AI-driven approach enabled the system to achieve near-optimal computational throughput while maintaining thermal constraints, effectively preventing localized overheating in densely integrated chip stacks.
Despite the promising outcomes, further research is required to address identified limitations, such as the model’s reliance on limited training data and its generalization across individual device behaviors. Future investigations should focus on collecting more comprehensive datasets, exploring device-specific predictive models, and refining load-sharing algorithms to enhance precision and efficiency. Continued advancements in AI-based thermal management strategies will be essential to meeting the increasing performance and miniaturization demands of next-generation heterogeneous computing architectures.

Author Contributions

Conceptualization, Y.L., P.S.S., D.H. and T.X.; methodology, Y.L., P.S.S., D.H., and T.X.; software, Y.L.; writing—original draft preparation, Y.L.; writing—review and editing, D.H. and T.X.; visualization, Y.L.; supervision, D.H. and T.X.; funding acquisition, D.H. and T.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Science Foundation, grant no. 2119485.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding authors.

Conflicts of Interest

Author Parth Sandeepbhai Shah is employed by Intel Corporation, Chandler, AZ, USA. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Appendix A

Table A1. Performance metrics of ESP32-S3 devices running at different frequencies, including average times for matrix creation, simulation, and one computation round, as well as total completed rounds. (a) At 80 MHz. (b) At 160 MHz. (c) At 240 MHz.

(a) At 80 MHz

Device Number | Avg. Creating Matrix Time (s) | Avg. Simulation Time (s) | Avg. One Round Time (s) | Completed Computation Rounds
1 | 30.1979 | 92.9652 | 123.1631 | 30
2 | 30.1987 | 92.9674 | 123.1661 | 30
3 | 30.1997 | 92.9650 | 123.1647 | 30
4 | 30.1978 | 92.9617 | 123.1595 | 30
5 | 30.1979 | 92.9649 | 123.1628 | 30
6 | 30.1981 | 92.9678 | 123.1659 | 30

(b) At 160 MHz

Device Number | Avg. Creating Matrix Time (s) | Avg. Simulation Time (s) | Avg. One Round Time (s) | Completed Computation Rounds
1 | 14.9700 | 46.0055 | 60.9755 | 60
2 | 14.9707 | 46.9996 | 60.9703 | 60
3 | 14.9715 | 46.0066 | 60.9781 | 60
4 | 14.9700 | 46.0061 | 60.9761 | 60
5 | 14.9713 | 46.9992 | 60.9705 | 60
6 | 14.9707 | 46.0110 | 60.9818 | 60

(c) At 240 MHz

Device Number | Avg. Creating Matrix Time (s) | Avg. Simulation Time (s) | Avg. One Round Time (s) | Completed Computation Rounds
1 | 9.9533 | 30.6195 | 40.5725 | 89
2 | 9.9535 | 30.6160 | 40.5690 | 89
3 | 9.9537 | 30.6206 | 40.5740 | 89
4 | 9.9537 | 30.6206 | 40.5734 | 89
5 | 9.9536 | 30.6155 | 40.5687 | 89
6 | 9.9535 | 30.6158 | 40.5689 | 89

References

1. Lau, J.H. Overview and outlook of through-silicon via (TSV) and 3D integrations. Microelectron. Int. 2011, 28, 8–22.
2. Shaikh, S.F. Heterogeneous Integration Strategy for Obtaining Physically Flexible 3D Compliant Electronic Systems. Ph.D. Dissertation, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia, 2020.
3. Singh, N.; Kumar, A.; Srivastava, K.; Yadav, N.; Singh, R.; Verma, A.S.; Gehlot, A.; Yadav, A.K.; Kumar, T.; Pandey, K.; et al. Challenges and Opportunities in Engineering of Next Generation 3D Microelectronic Devices: Improved Performance, Higher Integration Density. Nanoscale Adv. 2024, 6, 6044–6060.
4. Peng, X.; Kaul, A.; Bakir, M.S.; Yu, S. Heterogeneous 3-D integration of multitier compute-in-memory accelerators: An electrical-thermal co-design. IEEE Trans. Electron Devices 2021, 68, 5598–5605.
5. Yu, W.; Cheng, S.; Li, Z.; Liu, L.; Zhang, Z.; Zhao, Y.; Guo, Y.; Liu, S. The Application of Multi-scale Simulation in Advanced Electronic Packaging. Fundam. Res. 2024, 4, 1442–1454.
6. Cheemalamarri, H.K.; Bonam, S.; Vanjari, S.R.K.; Singh, S.G. Ti/Si interface enabling complementary metal oxide semiconductor compatible, high reliable bonding for inter-die micro-fluidic cooling for future advanced 3D integrated circuit integration. J. Micromech. Microeng. 2020, 30, 105005.
7. Chang, Y.W. Physical Design Challenges in Modern Heterogeneous Integration. In Proceedings of the 2024 International Symposium on Physical Design, Taipei, Taiwan, 12–15 March 2024; pp. 125–134.
8. Joseph, J.M. Networks-on-Chip for Heterogeneous 3D Systems-on-Chip. Ph.D. Thesis, University of Halle, Halle, Germany, 2019.
9. Wang, C.; Vafai, K. Heat transfer enhancement for 3D chip thermal simulation and prediction. Appl. Therm. Eng. 2024, 236, 121499.
10. Chen, S.; Zhang, H.; Ling, Z.; Zhai, J.; Yu, B. The Survey of Chiplet-based Integrated Architecture: An EDA perspective. arXiv 2024, arXiv:2411.04410.
11. Benelhaouare, A.Z.; Mellal, I.; Oumlaz, M.; Lakhssassi, A. Mitigating Thermal Side-Channel Vulnerabilities in FPGA-Based SiP Systems Through Advanced Thermal Management and Security Integration Using Thermal Digital Twin (TDT) Technology. Electronics 2024, 13, 4176.
12. Ramamoorthi, V. Multi-Objective Optimization Framework for Cloud Applications Using AI-Based Surrogate Models. J. Big-Data Anal. Cloud Comput. 2021, 6, 23–32.
13. Yakoubi, S. Sustainable Revolution: AI-Driven Enhancements for Composite Polymer Processing and Optimization in Intelligent Food Packaging. Food Bioprocess Technol. 2024, 18, 82–107.
14. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
Figure 1. Annotated diagram of ESP32-S3’s key components.
Figure 2. The ESP32-S3 stack configuration with device numbering, where the gaps between each ESP32-S3 device are 4.2 mm.
Figure 3. ESP32-S3 management interface built in Unity showing the monitoring and control dashboard.
Figure 4. Flowchart of the thermal behavior collection and edge detection algorithm.
Figure 5. Thermal behavior of each ESP32-S3 at three different frequencies in the package stack.
Figure 6. Flowchart of the AI-model workflow and process.
Figure 7. Load-sharing framework for task allocation across the distributed ESP32-S3 package stack.
Figure 8. AI-driven model performance at the 45 °C temperature limit.
Figure 9. AI-driven model performance at the 50 °C temperature limit.
Figure 10. Comparison of ESP32-S3 performance under fixed frequencies and AI-driven thermal management. Blue: results under fixed frequency; green: results with AI-controlled frequency.
Table 1. Mapping of figure annotations to ESP32-S3 components.

Number | Description
1 | ESP32-S3 Module
2 | 8 MiB Flash Memory
3 | Power Supply
4 | WS2812 LED
5 | USB Serial/JTAG Interface
6 | USB OTG Interface
7 | Serial Port Chip
8 | Boot Button
9 | Reset Button
10 | 40 MHz Crystal Oscillator
Table 2. Values of trained $\beta_0$, $\beta_1$, and $\beta_2$.

Target Frequency (MHz) | $\beta_0$ | $\beta_1$ | $\beta_2$ | Accuracy
80 | −2.1549 | −8.7292 | 8.6622 | 97.70%
160 | −3.0975 | −8.9133 | 8.8846 | 96.92%
240 | −1.9406 | −8.9865 | 8.9373 | 97.45%
