1. Introduction
The relentless demand for high-performance computing and miniaturized electronic systems has driven the evolution of packaging technologies, culminating in heterogeneous 3D integration. Historically, electronic packaging focused on two-dimensional (2D) assembly methods, where components were mounted side-by-side on printed circuit boards (PCBs). This approach, however, reached its physical and performance limits as the semiconductor industry pursued Moore’s Law. The advent of Through-Silicon Vias (TSVs) in the early 21st century enabled three-dimensional (3D) integration. TSVs provided vertical interconnects through silicon wafers, reducing interconnect distances, improving signal integrity, and enhancing power efficiency [1]. Over time, heterogeneous packaging extended these advancements by integrating disparate technologies, such as logic, memory, and analog devices, into unified systems, leveraging hybrid bonding techniques and advanced interconnect methodologies [2,3].
Heterogeneous packaging enables the stacking of chips with different functionalities, such as processors and memory, into a single module. This approach, combined with technologies like hybrid wafer bonding, facilitates unprecedented levels of system integration, opening doors for advanced applications in artificial intelligence (AI) and edge computing [4]. By minimizing signal propagation delays and reducing device footprints, heterogeneous 3D packaging enhances computational efficiency, making it indispensable for tasks requiring high-speed processing and minimal latency [5]. Yet these benefits come at a cost: thermal management remains a formidable challenge.
As chip densities increase and transistor sizes shrink, thermal hotspots emerge as a critical bottleneck. Uneven heat dissipation, particularly in densely stacked 3D architectures, leads to localized overheating, which degrades performance, increases power consumption, and shortens the lifespan of electronic systems [6]. Unlike traditional 2D systems, where heat can be dissipated through the package surface, 3D architectures exacerbate thermal issues by limiting the available cooling surface area while concentrating heat sources in a confined space [7]. The problem is further compounded in heterogeneous systems, where the thermal properties of integrated components can vary significantly, creating thermal mismatches and hotspots [8].
To address these challenges, researchers have explored a variety of innovative thermal management strategies. Traditional methods, such as heat sinks and thermal interface materials (TIMs), have been adapted for 3D systems but often fall short in meeting the unique demands of heterogeneous integration [9]. For instance, microfluidic cooling, which involves circulating coolant through microchannels embedded in the package, offers high thermal efficiency but adds complexity to the manufacturing process [6]. Similarly, liquid cooling using cold plates has shown promise in mitigating heat in high-power applications, yet its scalability remains a challenge for compact edge devices [9].
A more recent approach leverages dynamic thermal management (DTM) techniques, which adjust system parameters in real time to balance power consumption and thermal output. AI-driven load balancing and real-time thermal sensing are integral to DTM, enabling predictive adjustments to workload distribution and processor frequency based on temperature data [7]. These methods not only mitigate thermal hotspots but also optimize overall system performance, making them particularly suitable for heterogeneous 3D systems. Predictive modeling using machine learning algorithms has further enhanced the efficacy of these techniques, allowing systems to anticipate thermal events and adapt accordingly [2,8].
Parallel computing has emerged as another promising avenue for thermal management in heterogeneous architectures. By distributing computational tasks across multiple processing units, parallel computing reduces the thermal load on individual cores, preventing localized overheating. This approach is particularly effective in edge computing applications, where workloads can be dynamically allocated to minimize energy consumption and heat generation [9]. For example, parallelized algorithms for tasks like image recognition and natural language processing not only improve computational efficiency but also help maintain thermal stability in 3D integrated systems [3].
The integration of ESP32-S3 microcontrollers in edge computing systems exemplifies the potential of parallel computing for thermal management. These devices, known for their low power consumption and high computational efficiency, are ideal for implementing parallelized workloads in a compact form factor. By using real-time temperature monitoring and dynamic task allocation, systems can adapt to thermal variations, ensuring reliable operation under diverse conditions. AI-based thermal prediction models further enhance this adaptability, enabling systems to preemptively reconfigure tasks to avoid overheating.
AI has revolutionized thermal management in heterogeneous packaging by enabling adaptive, real-time control over heat dissipation. AI-driven techniques leverage predictive analytics and machine learning to monitor, model, and manage thermal conditions dynamically. These methods surpass traditional thermal management systems by responding to localized overheating and optimizing task allocation and processor frequencies based on predictive data [10]. For instance, machine learning algorithms have been employed to predict temperature spikes and adjust workloads accordingly, effectively preventing thermal bottlenecks in multi-die systems [11]. This capability is especially critical in 3D stacked architectures, where thermal gradients can severely impact performance and reliability.
Furthermore, AI-driven approaches facilitate the integration of advanced cooling technologies, such as microfluidics and phase-change materials, by coordinating their operations with processor workload requirements [12]. These systems employ real-time data from thermal sensors distributed throughout the package to identify hotspots and adapt cooling mechanisms, thereby optimizing energy efficiency while maintaining performance thresholds. Additionally, AI applications in System-in-Package (SiP) technologies enable complex heat mapping and predictive load balancing, providing a holistic solution for managing the intricate thermal demands of heterogeneous systems [13].
In edge computing and AI-intensive tasks, AI-based thermal management not only addresses heat dissipation but also improves overall system efficiency by reducing power consumption and extending component lifespans. By incorporating sophisticated neural networks, these systems can autonomously adapt to varying workloads and environmental conditions, ensuring sustainable operation across a wide range of applications. As heterogeneous integration progresses, AI will play a pivotal role in overcoming thermal challenges, ensuring the viability of next-generation computing architectures.
3. Experimental Setup
3.1. Initial Data Collection
To study the thermal behavior of ESP32-S3 devices under different clock frequencies and to collect data for AI model training, six ESP32-S3 devices were configured to operate at full load while running an edge detection algorithm. During the data collection period, Unity Editor (2022.3.39f1) was used as the host to read and control the collection of ESP32-S3 SoC temperature data through serial port communication. The monitoring and control dashboard is shown in Figure 3. The interface includes a list of connected devices identified by their port names (e.g., COM4, COM6), with each entry displaying the SoC temperature, the number of completed computation rounds, and the current device status, such as “Stand by”, “Creating Matrix”, “Simulating”, or “Complete”. On the right side, control buttons labeled “Scan”, “Connect”, “Start”, and “Stop” allow the user to manage the operations of the connected ESP32-S3 devices. A timer in the bottom right corner displays the elapsed time in minutes, seconds, and milliseconds.
The ESP32-S3 devices are programmed using the Arduino IDE (version 2.3.4) to report the SoC temperature as well as the status marker (such as “Creating Matrix” and “Simulating” in Figure 4) in real time back to the host. The status marker is maintained throughout the data collection phase to keep track of how many computation cycles the ESP32-S3 stack package completes in total.
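For context, host-side reading of these status reports could look roughly like the pyserial sketch below. The actual host in this work is the Unity dashboard of Figure 3, and the message layout shown here (temperature and status marker separated by a comma) is only an assumption for illustration, not the protocol used in the study.

```python
import serial  # pyserial

# Open the serial port a device enumerated on (port name and baud rate are examples)
with serial.Serial("COM4", 115200, timeout=1) as ser:
    for _ in range(100):  # read a short burst of reports
        line = ser.readline().decode(errors="ignore").strip()
        if "," not in line:
            continue
        # Assumed message layout: "<soc_temperature>,<status marker>"
        temp_str, status = line.split(",", 1)
        print(f"SoC temperature: {float(temp_str):.1f} °C, status: {status}")
```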
As shown in Figure 4, the edge detection process begins with a “Creating Matrix” step, where a pixel matrix is initialized and a “Creating Matrix” marker is sent to the host. Next, the flow moves to the simulation step, where the edge detection algorithm is applied. This includes calculating pixel gradients with the Sobel operator, computing the gradient magnitude, and clamping the output values to ensure they fall within a valid range. A status marker labeled “Simulating” is sent to the host during this step. The simulation is repeated 10 times to process a high-resolution image, since this large-scale resolution requires multiple iterations to complete the necessary edge detection operations. After completing the simulation, the process checks whether the simulation has been repeated 10 times. If 10 iterations are complete, it then checks whether the simulation has run for 60 min. If not, the flow loops back to the “Creating Matrix” step. Once the 60 min condition is met, the results are saved as a .csv file, marking the end of the process.
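For illustration, the per-pixel Sobel step described above (gradient computation, gradient magnitude, and clamping) can be sketched in Python as follows. The actual implementation runs as Arduino firmware on the ESP32-S3, so the function and variable names here are purely illustrative.

```python
import numpy as np

# 3x3 Sobel kernels for horizontal (x) and vertical (y) gradients
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.int32)
SOBEL_Y = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=np.int32)

def sobel_edge_detect(image: np.ndarray) -> np.ndarray:
    """Compute the Sobel gradient magnitude and clamp it to the 0-255 pixel range."""
    h, w = image.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = image[y - 1:y + 2, x - 1:x + 2].astype(np.int32)
            gx = int((window * SOBEL_X).sum())      # horizontal gradient
            gy = int((window * SOBEL_Y).sum())      # vertical gradient
            mag = int((gx * gx + gy * gy) ** 0.5)   # gradient magnitude
            out[y - 1, x - 1] = min(mag, 255)       # clamp to the valid range
    return out

# Small synthetic test image; the firmware processes much larger matrices
test = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)
edges = sobel_edge_detect(test)  # 62 x 62 result
```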
As mentioned above, each ESP32-S3 was set to a fixed frequency among the three options, 80 MHz, 160 MHz, and 240 MHz, during separate rounds of testing.
Figure 5 shows the thermal behavior of each ESP32-S3 SoC in the package stack during the one-hour collection period.
At the beginning of each round, the temperature starts from the initial ambient level. At 80 MHz, in Figure 5a, the temperature rises gradually and stabilizes at the lowest level of the three settings, showing the slowest increase and the lowest stabilization temperature due to the lower power consumption. At 160 MHz, in Figure 5b, the temperature climbs more quickly and stabilizes at an intermediate level, reflecting moderate heat generation. At 240 MHz, in Figure 5c, the temperature rises most rapidly and stabilizes at the highest level, highlighting the highest thermal load. These curves demonstrate the relationship between operating frequency and thermal behavior, with higher frequencies generating more heat and reaching higher stabilization temperatures.
Table A1 presents the performance metrics of the six ESP32-S3 devices at 80 MHz, 160 MHz, and 240 MHz. At 80 MHz, the average time for creating a matrix is around 30.2 s, and each computation round takes approximately 123.16 s, resulting in 30 completed rounds in one hour. At 160 MHz, the matrix creation time is halved to 14.97 s, and the total time per computation round is reduced to about 60.98 s, allowing 60 rounds to be completed. At 240 MHz, the matrix creation time further decreases to 9.95 s, with each round taking approximately 40.57 s, enabling 89 rounds in one hour. These data demonstrate that higher frequencies significantly reduce the computation time per round, allowing a greater number of completed rounds at the cost of increased energy consumption and potential thermal stress. The comparison between this table and the AI-model-driven result is discussed later.
3.2. AI Model Training
The collected temperature data serve as the foundation for training a logistic regression model designed to predict short-term thermal behavior. Specifically, as illustrated in Figure 6, the model forecasts whether the temperature of an ESP32-S3 device will increase or decrease over the next 10 s. By analyzing both current and historical temperature data, the model provides insights into the system’s thermal trends.
A total of 18 temperature–time datasets (six devices at three frequencies) were obtained, each capturing the heating process from the initial ambient temperature to a stabilized state over 3600 s. For each frequency, the dataset from the third ESP32-S3 was chosen for model training because, among the six curves, it exhibited the widest overall temperature range and thus best represents the extreme thermal conditions of the package stack.
The AI model for predicting future temperature trends involves processing the raw time–temperature data, creating predictive features, and training a logistic regression model using the Scikit-Learn library. The data include timestamps and temperature values recorded at a fixed interval of 0.05 s and are described as pairs $(t_i, T_i)$, where $T_i$ is the temperature (in °C) recorded at time $t_i$ (in seconds).
To predict the temperature 10 s into the future based on recent behavior, a sliding-window approach is adopted. For each valid index $i$, the history covering the previous 10 s consists of the 200 data points preceding the current index, since each step is 0.05 s. The predictive feature, i.e., the average temperature over the past 200 steps, is calculated as
$$\bar{T}_i = \frac{1}{200} \sum_{k=i-200}^{i-1} T_k .$$
The temperature 10 s later is $T_{i+200}$. Two features are thus extracted: the current temperature $T_i$ and the rolling average $\bar{T}_i$ of the temperatures over the preceding 10 s. The label $y_i$ is derived by comparing $T_{i+200}$ and $T_i$; specifically,
$$y_i = \begin{cases} 1, & \text{if } T_{i+200} > T_i, \\ 0, & \text{otherwise.} \end{cases}$$
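As a concrete illustration, this sliding-window feature and label construction can be sketched in Python as follows, assuming one temperature sample every 0.05 s as described above; the array and function names are illustrative.

```python
import numpy as np

STEPS_PER_10S = 200  # 10 s of history or look-ahead at 0.05 s per sample

def build_features(temps: np.ndarray):
    """Build [current temperature, 10 s rolling average] features and
    rise/fall labels for every valid index of a temperature series."""
    X, y = [], []
    for i in range(STEPS_PER_10S, len(temps) - STEPS_PER_10S):
        current = temps[i]                              # T_i
        rolling = temps[i - STEPS_PER_10S:i].mean()     # average over the previous 10 s
        future = temps[i + STEPS_PER_10S]               # temperature 10 s later
        X.append([current, rolling])
        y.append(1 if future > current else 0)          # 1 = temperature will rise
    return np.array(X), np.array(y)
```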
The logistic regression model predicts the probability that the temperature will increase based on these two features. The model is represented mathematically as
$$P(y_i = 1 \mid T_i, \bar{T}_i) = \sigma(z), \qquad z = \beta_0 + \beta_1 T_i + \beta_2 \bar{T}_i, \qquad \sigma(z) = \frac{1}{1 + e^{-z}},$$
where $P(y_i = 1 \mid T_i, \bar{T}_i)$ denotes the probability that the target variable $y_i$ equals 1, $\sigma$ is the sigmoid activation function, $z$ is the linear combination of the features $T_i$ and $\bar{T}_i$, and $\beta_0$, $\beta_1$, and $\beta_2$ are the model coefficients.
The model is trained using Scikit-Learn’s LogisticRegression class [14]. By repeating the training process three times, once for each of the 80 MHz, 160 MHz, and 240 MHz datasets, three AI models are obtained. The tuned coefficients $\beta_0$, $\beta_1$, and $\beta_2$, as well as the resulting accuracy, are listed in Table 2.
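A minimal training sketch with Scikit-Learn, reusing the build_features helper from the previous snippet, might look as follows. The temperature series generated here is only a stand-in so the snippet runs; in the study it would be one of the collected 0.05 s series loaded from the saved .csv file, and the random train/test split is used purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Stand-in for one collected series: a smooth heating curve plus sensor noise
t = np.arange(0, 3600, 0.05)
temps = 25 + 35 * (1 - np.exp(-t / 900)) + np.random.normal(0, 0.2, t.size)

X, y = build_features(temps)                      # helper from the previous sketch
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)          # simple random split for illustration

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

print("beta_0 (intercept):", model.intercept_[0])
print("beta_1, beta_2:    ", model.coef_[0])
print("hold-out accuracy: ", model.score(X_test, y_test))
```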
3.3. Load-Sharing Model
Building on these AI models, load sharing is a critical strategy in distributed computing, particularly in systems where temperature control and efficiency are paramount. By dynamically distributing tasks across multiple devices based on their thermal and computational states, load sharing not only enhances performance but also prevents localized overheating. Efficient load sharing ensures that devices operating at higher temperatures are assigned fewer tasks, allowing them to cool down, while cooler devices with higher CPU frequencies handle a larger share of the workload.
Figure 7 depicts a distributed processing system designed for efficient task execution across multiple ESP32-S3 microcontrollers. Following the same procedure as the raw temperature data collection, the system addresses the task of processing a high-resolution matrix (10,240 × 7680 pixels) by partitioning it into smaller subtasks, leveraging the hardware capabilities of the ESP32-S3, and incorporating AI-driven CPU frequency adjustment. The matrix processing starts with the creation of a pixel matrix, which is subsequently divided into 10 subtasks. Each subtask, slightly larger than 1024 × 768 pixels (1026 × 770 pixels) due to the application of the Sobel operator, includes a two-pixel overlap to accommodate convolution-based edge detection. This ensures seamless integration of the processed subtasks into the final result, addressing boundary inconsistencies inherent to edge detection algorithms. By breaking the problem down into independent units, the system prepares the data for parallel computation.
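The overlap mechanism can be sketched as follows: each 1024 × 768 core region is handed out with a one-pixel border on every side (1026 × 770 in total), so the 3 × 3 Sobel kernel has valid neighbors at tile edges. The exact partition geometry in the study follows Figure 7; the dimensions in this Python example are chosen only so the snippet runs quickly, and the edge-replication padding of the outermost image border is an assumption.

```python
import numpy as np

CORE_H, CORE_W = 768, 1024  # core tile size processed by each subtask
PAD = 1                     # one pixel on each side -> 1026 x 770 tiles (two-pixel overlap)

def split_with_overlap(matrix: np.ndarray):
    """Split a large pixel matrix into overlapping tiles so a 3x3 Sobel kernel
    can be applied without boundary artifacts at tile edges."""
    h, w = matrix.shape
    padded = np.pad(matrix, PAD, mode="edge")  # assumed handling of the outer border
    tiles = []
    for top in range(0, h, CORE_H):
        for left in range(0, w, CORE_W):
            tile = padded[top:top + CORE_H + 2 * PAD,
                          left:left + CORE_W + 2 * PAD]
            tiles.append(((top, left), tile))  # remember where the tile belongs
    return tiles

# Small illustrative image: 2 x 2 core regions
image = np.random.randint(0, 256, size=(1536, 2048), dtype=np.uint8)
subtasks = split_with_overlap(image)
print(len(subtasks), subtasks[0][1].shape)  # 4 tiles of shape (770, 1026)
```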
A critical component of the system is its task allocation mechanism, which is governed by the dynamic availability of ESP32-S3 devices. The framework continuously scans serial ports to detect connected ESP32-S3 devices. Upon identifying available devices, the system applies a task allocation model coupled with the AI model: the AI model dynamically evaluates the current temperature and operating CPU frequency of each device, prioritizing those with lower temperatures and higher frequencies.
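The paper specifies the priority (lower temperature, higher frequency) but not a particular scoring formula; the following is a minimal sketch of one possible selection rule, with the weights, field names, and use of the AI model's rise probability as a penalty all chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass
class Device:
    port: str           # serial port name, e.g., "COM4"
    temperature: float  # latest reported SoC temperature in deg C
    frequency_mhz: int  # current CPU frequency (80, 160, or 240 MHz)
    p_rise: float       # AI-predicted probability of a temperature rise in the next 10 s

def pick_device(devices):
    """Pick the next device for a subtask: prefer high frequency and low
    temperature, and penalize devices predicted to heat up."""
    def score(d):
        return d.frequency_mhz / 240.0 - d.temperature / 100.0 - 0.5 * d.p_rise
    return max(devices, key=score)

# Example: three devices reporting their current state
fleet = [
    Device("COM4", 62.0, 240, 0.9),
    Device("COM6", 48.5, 160, 0.2),
    Device("COM7", 55.0, 240, 0.4),
]
print(pick_device(fleet).port)  # prints "COM7": fast, moderately warm, low predicted rise
```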
The process monitors task completion and reassigns unfinished subtasks as necessary. Once all tasks are completed, the system checks whether 60 min has elapsed since the last save and, if so, saves the processed data to a .csv file. This safeguards the processed data and enables long-term operation. The iterative workflow then continues, making the system scalable, efficient, and suitable for processing large datasets while maintaining hardware safety and performance.