1. Introduction
The relentless demand for high-performance computing and miniaturized electronic systems has driven the evolution of packaging technologies, culminating in heterogeneous 3D integration. Historically, electronic packaging focused on two-dimensional (2D) assembly methods, where components were mounted side-by-side on printed circuit boards (PCBs). This approach, however, reached its physical and performance limits as the semiconductor industry pursued Moore’s Law. The advent of Through-Silicon Vias (TSVs) in the early 21st century enabled three-dimensional (3D) integration. TSVs provided vertical interconnects through silicon wafers, reducing interconnect distances, improving signal integrity, and enhancing power efficiency [1]. Over time, heterogeneous packaging extended these advancements by integrating disparate technologies, such as logic, memory, and analog devices, into unified systems, leveraging hybrid bonding techniques and advanced interconnect methodologies [2,3].
Heterogeneous packaging enables the stacking of chips with different functionalities, such as processors and memory, into a single module. This approach, combined with technologies like hybrid wafer bonding, facilitates unprecedented levels of system integration, opening doors for advanced applications in artificial intelligence (AI) and edge computing [4]. By minimizing signal propagation delays and reducing device footprints, heterogeneous 3D packaging enhances computational efficiency, making it indispensable for tasks requiring high-speed processing and minimal latency [5]. Yet these benefits come at a cost: thermal management remains a formidable challenge.
As chip densities increase and transistor sizes shrink, thermal hotspots emerge as a critical bottleneck. Uneven heat dissipation, particularly in densely stacked 3D architectures, leads to localized overheating, which degrades performance, increases power consumption, and shortens the lifespan of electronic systems [6]. Unlike traditional 2D systems, where heat can be dissipated through the package surface, 3D architectures exacerbate thermal issues by limiting the available cooling surface area while concentrating heat sources in a confined space [7]. The problem is further compounded in heterogeneous systems, where the thermal properties of integrated components can vary significantly, creating thermal mismatches and hotspots [8].
To address these challenges, researchers have explored a variety of innovative thermal management strategies. Traditional methods, such as heat sinks and thermal interface materials (TIMs), have been adapted for 3D systems but often fall short in meeting the unique demands of heterogeneous integration [9]. For instance, microfluidic cooling, which involves circulating coolant through microchannels embedded in the package, offers high thermal efficiency but adds complexity to the manufacturing process [6]. Similarly, liquid cooling using cold plates has shown promise in mitigating heat in high-power applications, yet its scalability remains a challenge for compact edge devices [9].
A more recent approach leverages dynamic thermal management (DTM) techniques, which adjust system parameters in real time to balance power consumption and thermal output. AI-driven load balancing and real-time thermal sensing are integral to DTM, enabling predictive adjustments to workload distribution and processor frequency based on temperature data [7]. These methods not only mitigate thermal hotspots but also optimize overall system performance, making them particularly suitable for heterogeneous 3D systems. Predictive modeling using machine learning algorithms has further enhanced the efficacy of these techniques, allowing systems to anticipate thermal events and adapt accordingly [2,8].
Parallel computing has emerged as another promising avenue for thermal management in heterogeneous architectures. By distributing computational tasks across multiple processing units, parallel computing reduces the thermal load on individual cores, preventing localized overheating. This approach is particularly effective in edge computing applications, where workloads can be dynamically allocated to minimize energy consumption and heat generation [9]. For example, parallelized algorithms for tasks like image recognition and natural language processing not only improve computational efficiency but also help maintain thermal stability in 3D integrated systems [3].
The integration of ESP32-S3 microcontrollers in edge computing systems exemplifies the potential of parallel computing for thermal management. These devices, known for their low power consumption and high computational efficiency, are ideal for implementing parallelized workloads in a compact form factor. By using real-time temperature monitoring and dynamic task allocation, systems can adapt to thermal variations, ensuring reliable operation under diverse conditions. AI-based thermal prediction models further enhance this adaptability, enabling systems to preemptively reconfigure tasks to avoid overheating.
AI has revolutionized thermal management in heterogeneous packaging by enabling adaptive, real-time control over heat dissipation. AI-driven techniques leverage predictive analytics and machine learning to monitor, model, and manage thermal conditions dynamically. These methods surpass traditional thermal management systems by responding to localized overheating and optimizing task allocation and processor frequencies based on predictive data [10]. For instance, machine learning algorithms have been employed to predict temperature spikes and adjust workloads accordingly, effectively preventing thermal bottlenecks in multi-die systems [11]. This capability is especially critical in 3D stacked architectures, where thermal gradients can severely impact performance and reliability.
Furthermore, AI-driven approaches facilitate the integration of advanced cooling technologies, such as microfluidics and phase-change materials, by coordinating their operations with processor workload requirements [12]. These systems employ real-time data from thermal sensors distributed throughout the package to identify hotspots and adapt cooling mechanisms, thereby optimizing energy efficiency while maintaining performance thresholds. Additionally, AI applications in System-in-Package (SiP) technologies enable complex heat mapping and predictive load balancing, providing a holistic solution for managing the intricate thermal demands of heterogeneous systems [13].
In edge computing and AI-intensive tasks, AI-based thermal management not only addresses heat dissipation but also improves overall system efficiency by reducing power consumption and extending component lifespans. By incorporating sophisticated neural networks, these systems can autonomously adapt to varying workloads and environmental conditions, ensuring sustainable operation across a wide range of applications. As heterogeneous integration progresses, AI will play a pivotal role in overcoming thermal challenges, ensuring the viability of next-generation computing architectures.
3. Experimental Setup
3.1. Initial Data Collection
To study the thermal behavior of ESP32-S3 devices under different clock frequencies and to collect data for AI model training, six ESP32-S3 devices were configured to operate at full load while running an edge detection algorithm. During the data collection period, Unity Editor (2022.3.39f1) was used as the host to read and control the collection of ESP32-S3 SoC temperature data through serial port communication. The monitoring and control dashboard is shown in Figure 3. The interface includes a list of connected devices identified by their port names (e.g., COM4, COM6), with each entry displaying the SoC temperature, the number of completed computation rounds, and the current device status, such as “Stand by”, “Creating Matrix”, “Simulating”, or “Complete”. On the right side, control buttons labeled “Scan”, “Connect”, “Start”, and “Stop” allow the user to manage the operations of the connected ESP32-S3 devices. A timer in the bottom right corner displays the elapsed time in minutes, seconds, and milliseconds.
The ESP32-S3 devices are programmed using the Arduino IDE (version 2.3.4) to report the SoC temperature as well as the status marker (such as “Creating Matrix” and “Simulating” in Figure 4) in real time back to the host. The status marker is maintained throughout the data collection phase to keep track of how many computation cycles the ESP32-S3 stack package completes in total.
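For context, host-side reading of these status reports could look roughly like the pyserial sketch below. The actual host in this work is the Unity dashboard of Figure 3, and the message layout shown here (temperature and status marker separated by a comma) is only an assumption for illustration, not the protocol used in the study.

```python
import serial  # pyserial

# Open the serial port a device enumerated on (port name and baud rate are examples)
with serial.Serial("COM4", 115200, timeout=1) as ser:
    for _ in range(100):  # read a short burst of reports
        line = ser.readline().decode(errors="ignore").strip()
        if "," not in line:
            continue
        # Assumed message layout: "<soc_temperature>,<status marker>"
        temp_str, status = line.split(",", 1)
        print(f"SoC temperature: {float(temp_str):.1f} °C, status: {status}")
```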
As shown in Figure 4, the edge detection process begins with a “Creating Matrix” step, where a pixel matrix is initialized and a “Creating Matrix” marker is sent to the host. Next, the flow moves to the simulation step, where the edge detection algorithm is applied. This includes calculating pixel gradients with the Sobel operator, computing the gradient magnitude, and clamping the output values to ensure they fall within a valid range. A status marker labeled “Simulating” is sent to the host during this step. The simulation is repeated 10 times to process a high-resolution image, since this large-scale resolution requires multiple iterations to complete the necessary edge detection operations. After completing the simulation, the process checks whether the simulation has been repeated 10 times. If 10 iterations are complete, it then checks whether the simulation has run for 60 min. If not, the flow loops back to the “Creating Matrix” step. Once the 60 min condition is met, the results are saved as a .csv file, marking the end of the process.
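For illustration, the per-pixel Sobel step described above (gradient computation, gradient magnitude, and clamping) can be sketched in Python as follows. The actual implementation runs as Arduino firmware on the ESP32-S3, so the function and variable names here are purely illustrative.

```python
import numpy as np

# 3x3 Sobel kernels for horizontal (x) and vertical (y) gradients
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.int32)
SOBEL_Y = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=np.int32)

def sobel_edge_detect(image: np.ndarray) -> np.ndarray:
    """Compute the Sobel gradient magnitude and clamp it to the 0-255 pixel range."""
    h, w = image.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = image[y - 1:y + 2, x - 1:x + 2].astype(np.int32)
            gx = int((window * SOBEL_X).sum())      # horizontal gradient
            gy = int((window * SOBEL_Y).sum())      # vertical gradient
            mag = int((gx * gx + gy * gy) ** 0.5)   # gradient magnitude
            out[y - 1, x - 1] = min(mag, 255)       # clamp to the valid range
    return out

# Small synthetic test image; the firmware processes much larger matrices
test = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)
edges = sobel_edge_detect(test)  # 62 x 62 result
```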
As mentioned above, each ESP32-S3 was set to a fixed frequency among the three options, 80 MHz, 160 MHz, and 240 MHz, during separate rounds of testing.
Figure 5 shows the thermal behavior of each ESP32-S3 SoC in the package stack during the one-hour collection period.
At the beginning of each round, the temperature starts from the initial ambient level. At 80 MHz, in Figure 5a, the temperature rises gradually and stabilizes at the lowest level of the three settings, showing the slowest increase and the lowest stabilization temperature due to the lower power consumption. At 160 MHz, in Figure 5b, the temperature climbs more quickly and stabilizes at an intermediate level, reflecting moderate heat generation. At 240 MHz, in Figure 5c, the temperature rises most rapidly and stabilizes at the highest level, highlighting the highest thermal load. These curves demonstrate the relationship between operating frequency and thermal behavior, with higher frequencies generating more heat and reaching higher stabilization temperatures.
Table A1 presents the performance metrics of the six ESP32-S3 devices at 80 MHz, 160 MHz, and 240 MHz. At 80 MHz, the average time for creating a matrix is around 30.2 s, and each computation round takes approximately 123.16 s, resulting in 30 completed rounds in one hour. At 160 MHz, the matrix creation time is halved to 14.97 s, and the total time per computation round is reduced to about 60.98 s, allowing 60 rounds to be completed. At 240 MHz, the matrix creation time further decreases to 9.95 s, with each round taking approximately 40.57 s, enabling 89 rounds in one hour. These data demonstrate that higher frequencies significantly reduce the computation time per round, allowing a greater number of completed rounds at the cost of increased energy consumption and potential thermal stress. The comparison between this table and the AI-model-driven result is discussed later.
3.2. AI Model Training
The collected temperature data serve as the foundation for training a logistic regression model designed to predict short-term thermal behavior. Specifically, as illustrated in Figure 6, the model forecasts whether the temperature of an ESP32-S3 device will increase or decrease over the next 10 s. By analyzing both current and historical temperature data, the model provides insights into the system’s thermal trends.
A total of 18 temperature–time datasets (six devices at three frequencies) were obtained, each capturing the heating process from the initial ambient temperature to a stabilized state over 3600 s. For each frequency, the dataset from the third ESP32-S3 was chosen for model training because, among the six curves, it exhibited the widest overall temperature range and thus best represents the extreme thermal conditions of the package stack.
The AI model for predicting future temperature trends involves processing the raw time–temperature data, creating predictive features, and training a logistic regression model using the Scikit-Learn library. The data include timestamps and temperature values recorded at a fixed interval of 0.05 s and are described as pairs $(t_i, T_i)$, where $T_i$ is the temperature (in °C) recorded at time $t_i$ (in seconds).
To predict the temperature 10 s into the future based on recent behavior, a sliding-window approach is adopted. For each valid index $i$, the history covering the previous 10 s consists of the 200 data points preceding the current index, since each step is 0.05 s. The predictive feature, i.e., the average temperature over the past 200 steps, is calculated as
$$\bar{T}_i = \frac{1}{200} \sum_{k=i-200}^{i-1} T_k .$$
The temperature 10 s later is $T_{i+200}$. Two features are thus extracted: the current temperature $T_i$ and the rolling average $\bar{T}_i$ of the temperatures over the preceding 10 s. The label $y_i$ is derived by comparing $T_{i+200}$ and $T_i$; specifically,
$$y_i = \begin{cases} 1, & \text{if } T_{i+200} > T_i, \\ 0, & \text{otherwise.} \end{cases}$$
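As a concrete illustration, this sliding-window feature and label construction can be sketched in Python as follows, assuming one temperature sample every 0.05 s as described above; the array and function names are illustrative.

```python
import numpy as np

STEPS_PER_10S = 200  # 10 s of history or look-ahead at 0.05 s per sample

def build_features(temps: np.ndarray):
    """Build [current temperature, 10 s rolling average] features and
    rise/fall labels for every valid index of a temperature series."""
    X, y = [], []
    for i in range(STEPS_PER_10S, len(temps) - STEPS_PER_10S):
        current = temps[i]                              # T_i
        rolling = temps[i - STEPS_PER_10S:i].mean()     # average over the previous 10 s
        future = temps[i + STEPS_PER_10S]               # temperature 10 s later
        X.append([current, rolling])
        y.append(1 if future > current else 0)          # 1 = temperature will rise
    return np.array(X), np.array(y)
```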
The logistic regression model predicts the probability that the temperature will increase based on these two features. The model is represented mathematically as
$$P(y_i = 1 \mid T_i, \bar{T}_i) = \sigma(z), \qquad z = \beta_0 + \beta_1 T_i + \beta_2 \bar{T}_i, \qquad \sigma(z) = \frac{1}{1 + e^{-z}},$$
where $P(y_i = 1 \mid T_i, \bar{T}_i)$ denotes the probability that the target variable $y_i$ equals 1, $\sigma$ is the sigmoid activation function, $z$ is the linear combination of the features $T_i$ and $\bar{T}_i$, and $\beta_0$, $\beta_1$, and $\beta_2$ are the model coefficients.
The model is trained using Scikit-Learn’s LogisticRegression class [14]. By repeating the training process three times, once for each of the 80 MHz, 160 MHz, and 240 MHz datasets, three AI models are obtained. The tuned coefficients $\beta_0$, $\beta_1$, and $\beta_2$, as well as the resulting accuracy, are listed in Table 2.
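A minimal training sketch with Scikit-Learn, reusing the build_features helper from the previous snippet, might look as follows. The temperature series generated here is only a stand-in so the snippet runs; in the study it would be one of the collected 0.05 s series loaded from the saved .csv file, and the random train/test split is used purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Stand-in for one collected series: a smooth heating curve plus sensor noise
t = np.arange(0, 3600, 0.05)
temps = 25 + 35 * (1 - np.exp(-t / 900)) + np.random.normal(0, 0.2, t.size)

X, y = build_features(temps)                      # helper from the previous sketch
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)          # simple random split for illustration

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

print("beta_0 (intercept):", model.intercept_[0])
print("beta_1, beta_2:    ", model.coef_[0])
print("hold-out accuracy: ", model.score(X_test, y_test))
```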
3.3. Load-Sharing Model
Building on these AI models, load sharing is a critical strategy in distributed computing, particularly in systems where temperature control and efficiency are paramount. By dynamically distributing tasks across multiple devices based on their thermal and computational states, load sharing not only enhances performance but also prevents localized overheating. Efficient load sharing ensures that devices operating at higher temperatures are assigned fewer tasks, allowing them to cool down, while cooler devices with higher CPU frequencies handle a larger share of the workload.
Figure 7 depicts a distributed processing system designed for efficient task execution across multiple ESP32-S3 microcontrollers. Following the same procedure as the raw temperature data collection, the system addresses the task of processing a high-resolution matrix (10,240 × 7680 pixels) by partitioning it into smaller subtasks, leveraging the hardware capabilities of the ESP32-S3, and incorporating AI-driven CPU frequency adjustment. The matrix processing starts with the creation of a pixel matrix, which is subsequently divided into 10 subtasks. Each subtask, slightly larger than 1024 × 768 pixels (1026 × 770 pixels) due to the application of the Sobel operator, includes a two-pixel overlap to accommodate convolution-based edge detection. This ensures seamless integration of the processed subtasks into the final result, addressing boundary inconsistencies inherent to edge detection algorithms. By breaking the problem down into independent units, the system prepares the data for parallel computation.
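The overlap mechanism can be sketched as follows: each 1024 × 768 core region is handed out with a one-pixel border on every side (1026 × 770 in total), so the 3 × 3 Sobel kernel has valid neighbors at tile edges. The exact partition geometry in the study follows Figure 7; the dimensions in this Python example are chosen only so the snippet runs quickly, and the edge-replication padding of the outermost image border is an assumption.

```python
import numpy as np

CORE_H, CORE_W = 768, 1024  # core tile size processed by each subtask
PAD = 1                     # one pixel on each side -> 1026 x 770 tiles (two-pixel overlap)

def split_with_overlap(matrix: np.ndarray):
    """Split a large pixel matrix into overlapping tiles so a 3x3 Sobel kernel
    can be applied without boundary artifacts at tile edges."""
    h, w = matrix.shape
    padded = np.pad(matrix, PAD, mode="edge")  # assumed handling of the outer border
    tiles = []
    for top in range(0, h, CORE_H):
        for left in range(0, w, CORE_W):
            tile = padded[top:top + CORE_H + 2 * PAD,
                          left:left + CORE_W + 2 * PAD]
            tiles.append(((top, left), tile))  # remember where the tile belongs
    return tiles

# Small illustrative image: 2 x 2 core regions
image = np.random.randint(0, 256, size=(1536, 2048), dtype=np.uint8)
subtasks = split_with_overlap(image)
print(len(subtasks), subtasks[0][1].shape)  # 4 tiles of shape (770, 1026)
```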
A critical component of the system is its task allocation mechanism, which is governed by the dynamic availability of ESP32-S3 devices. The framework continuously scans serial ports to detect connected ESP32-S3 devices. Upon identifying available devices, the system applies a task allocation model coupled with the AI model: the AI model dynamically evaluates the current temperature and operating CPU frequency of each device, prioritizing those with lower temperatures and higher frequencies.
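The paper specifies the priority (lower temperature, higher frequency) but not a particular scoring formula; the following is a minimal sketch of one possible selection rule, with the weights, field names, and use of the AI model's rise probability as a penalty all chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass
class Device:
    port: str           # serial port name, e.g., "COM4"
    temperature: float  # latest reported SoC temperature in deg C
    frequency_mhz: int  # current CPU frequency (80, 160, or 240 MHz)
    p_rise: float       # AI-predicted probability of a temperature rise in the next 10 s

def pick_device(devices):
    """Pick the next device for a subtask: prefer high frequency and low
    temperature, and penalize devices predicted to heat up."""
    def score(d):
        return d.frequency_mhz / 240.0 - d.temperature / 100.0 - 0.5 * d.p_rise
    return max(devices, key=score)

# Example: three devices reporting their current state
fleet = [
    Device("COM4", 62.0, 240, 0.9),
    Device("COM6", 48.5, 160, 0.2),
    Device("COM7", 55.0, 240, 0.4),
]
print(pick_device(fleet).port)  # prints "COM7": fast, moderately warm, low predicted rise
```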
The process monitors task completion and reassigns unfinished subtasks as necessary. Once all tasks are completed, the system checks whether 60 min has elapsed since the last save and, if so, saves the processed data to a .csv file. This safeguards the processed data and enables long-term operation. The iterative workflow then continues, making the system scalable, efficient, and suitable for processing large datasets while maintaining hardware safety and performance.