Article

PreEdgeDB: A Lightweight Platform for Energy Prediction on Low-Power Edge Devices

Energy Environment IT Convergence Group, Plant Engineering Center, Institute for Advanced Engineering, Yongin 17180, Republic of Korea
* Author to whom correspondence should be addressed.
Electronics 2025, 14(10), 1912; https://doi.org/10.3390/electronics14101912
Submission received: 14 April 2025 / Revised: 6 May 2025 / Accepted: 7 May 2025 / Published: 8 May 2025
(This article belongs to the Section Computer Science & Engineering)

Abstract

Rising energy costs due to environmental degradation, climate change, global conflicts, and pandemics have prompted the need for efficient energy management. Edge devices are increasingly recognized for improving energy efficiency; however, their role as primary computing units remains underexplored. This study presents PreEdgeDB, a lightweight platform deployed on low-power edge devices to optimize energy usage in industrial complexes, which consume approximately 57.29% of South Korea’s total energy. The platform integrates real-time data preprocessing, time-series storage, and prediction capabilities, enabling independent operation at individual factories. A low-resource preprocessing module was developed to handle missing and anomalous data. For storage, RocksDB—a lightweight, high-performance key–value database—was optimized for edge environments. For prediction, Light Gradient Boosting Machine (LightGBM) was adopted due to its efficiency and high accuracy on limited-resource systems. The resulting model achieved a coefficient of variation of the root mean squared error (CV(RMSE)) of 14.36% and an R2 of 0.8240. The total processing time from data collection to prediction was under 300 milliseconds. With memory usage below 150 MB and CPU utilization around 60%, PreEdgeDB enables fully autonomous energy prediction and analysis on edge devices, without relying on centralized servers.

1. Introduction

Today, solutions to rising energy costs are being pursued in response to various issues, such as environmental degradation, climate change, the threats of war, and pandemics. In South Korea, these efforts are being addressed through national research projects. Under the 3rd Basic Plan for Energy [1], research and demonstrations have been conducted to reduce energy consumption in energy-intensive factories by utilizing factory energy management systems.
As of 2023, the industrial sector accounted for 72.8% of the country’s total energy usage, with industrial complexes consuming 78.7% of the energy used in this sector [2]. Accordingly, industrial complexes accounted for 57.29% of the total energy consumption. Given this high proportion, South Korea has prioritized addressing energy issues in industrial complexes. Recognizing that factories within these complexes often use similar utilities, a study on a “virtual energy utility plant” was undertaken to improve energy efficiency [3]. This study involved establishing shared energy equipment among companies using similar utilities, thereby reducing the costs incurred by purchasing energy from external suppliers. Additionally, the system facilitated energy trading by linking companies that generate surplus energy with those in need. The study not only reduced energy consumption in industrial complexes—a significant contributor to national energy use—but also contributed to the reduction in greenhouse gas emissions. Additionally, it demonstrated strong potential for energy conservation and value creation through surplus energy trading.
The operation of virtual energy utility plants relies on several critical components, such as forecasting, individual plant operation, and data stabilization. Among these, stable data collection is crucial, as each plant requires accurate data for effective analysis. Highly accurate data are vital for prediction and transaction processes; if missing or anomalous data are not properly handled, the reliability of predictions may be compromised. Additionally, factory data are often production related and, therefore, highly sensitive, making external data transfer a significant concern owing to security and confidentiality risks. To overcome these challenges, this study investigated optimizing RocksDB [4] for reliable data collection using edge devices, handling abnormal and missing data, and storing time-series data on edge devices, as shown in Figure 1. It also developed an artificial intelligence (AI) system for forecasting energy demand within each factory, which includes both data collection and demand forecasting at the regional level.
The main contributions of this study are as follows:
(1)
We develop a lightweight energy prediction model optimized for edge devices with low power consumption.
(2)
The proposed PreEdgeDB system improves prediction accuracy by optimizing time-series data while minimizing processing overhead.
(3)
We validate the effectiveness of our approach using real-world industrial datasets and demonstrate its superiority over existing methods.
The structure of this paper is as follows: Section 2 discusses related work, Section 3 covers the background, Section 4 details the data preprocessing system, and Section 5 outlines the process of optimizing time-series data in the database. Section 6 focuses on predictions for individual factories using Light Gradient Boosting Machine (LightGBM), whereas Section 7 addresses the limitations of the study. Finally, Section 8 discusses the results and future research directions.

2. Related Work

Research on the independent use of edge devices has not traditionally been a primary focus largely because of the challenges of performing high-performance prediction, storage, and preprocessing tasks on low-spec edge devices.
While prior studies have explored the use of AI algorithms on edge devices for prediction, they have primarily emphasized data collection. In cases requiring substantial computational resources, such as prediction tasks, these studies fundamentally relied on techniques such as offloading to servers or cloud systems. As a result, few studies have treated edge devices as primary computing resources. Some studies have attempted to address these limitations by using lightweight AI libraries such as TensorFlow Lite. However, building an entire platform—encompassing data collection, preprocessing, AI training and inference, and data storage—entirely on a low-spec edge device has rarely been investigated [5,6,7,8,9,10]. Additionally, previous studies that used “multilayer perceptrons” (MLPs) for edge deployment often neglected preprocessing. MLPs yield good predictive performance when enough variables are available. However, with limited variables, such as those typically gathered from individual factories within industrial complexes, both MLP performance and “coefficient of variation of the root mean squared error” (CV(RMSE)) scores are inadequate for practical application. Furthermore, MyRocks, a variant of RocksDB, showed higher resource usage than the optimized RocksDB used in this study, imposing a significant load on the system. Therefore, a database solution better suited for low-spec devices was adopted [11].
Additionally, various studies have explored the use of AI on single-board computers and Arduino platforms. However, these studies have focused primarily on preprocessing data on a single-board computer and transmitting it to a cloud or central system rather than investigating the standalone use of such devices. Even in cases where AI systems were deployed, the research often centered solely on integrating AI, which is quite distinct from our aim of developing a platform capable of independent operation [12,13,14].
In response to these limitations, this study proposes an integrated lightweight platform, PreEdgeDB, which enables edge devices to operate independently, eliminating the need for large server systems within factory environments. The name PreEdgeDB reflects its key functions: “Pre” for preprocessing and prediction, “DB” for data storage, and “Edge” for edge deployment. By optimizing the database and developing prediction models capable of operating with only two input variables—time and sensor values—this study aimed to build a more efficient and practical system for standalone edge device utilization. Accordingly, we conducted research to enhance the integration of AI systems, preprocessing systems, and server functionalities directly within edge devices, offering simpler and more efficient operation than previous studies. A comparison to existing research is shown in Table 1.

3. Preliminaries

3.1. Data Preprocessing

Real-time data were collected through sensors installed in a factory applying a virtual energy utility plant. However, owing to factors such as network errors, device issues, and human errors, missing values or outliers may have been recorded. This study adopted interpolation techniques, among various possible solutions, for data preprocessing and stored the processed data in a database.
Our study focused on predicting the compressed air demand for individual factories. The data collected for each factory comprised compressed air demand recorded at minute-level intervals. Using these data, we conducted research to forecast the compressed air demand for each factory 15 min ahead.

3.2. RocksDB

RocksDB, a high-performance key–value store developed by Facebook, is based on LevelDB [15] and is optimized for storage media such as SSDs. It uses a log-structured merge-tree architecture, which is particularly efficient for write-heavy operations, and offers several useful features. As a basic key–value store, RocksDB is simpler to operate than traditional relational database management systems (DBMSs), such as MySQL [16] and Oracle DB [17]; therefore, it is more suitable for deployment on low-spec devices.

3.3. Artificial Intelligence

AI enables computers to learn from data and make decisions in a manner similar to that of humans. AI systems identify patterns, solve problems, or make predictions based on the information learned. AI techniques are broadly classified as machine learning or deep learning. Algorithms such as LightGBM [18] are widely used within machine learning, whereas deep learning includes models such as the MLP [19] and long short-term memory (LSTM) [20]. This study compared LightGBM, MLP, and LSTM—algorithms that can be sufficiently executed on low-power devices—and selected the optimal algorithm for deployment.
In addition, this study compared LSTM, LightGBM, and a hybrid model, with a focus on optimizing the model for time-series data. Specifically, we explored a hybrid model that combines the widely used LSTM and MLP to achieve the best fit for time-series forecasting. Given the need for continuous predictions on a single-board computer with limited resources, we selected LightGBM owing to its ability to perform efficiently even under low-performance conditions. Furthermore, upon analyzing the data patterns, we confirmed the nonlinear nature of the data, which made LightGBM a suitable choice for the task. As described in Section 6.1, we decomposed the timestamp and sensor value data to apply them effectively in the model.

3.3.1. MLP

The MLP [19] is a type of artificial neural network used for supervised learning. It consists of multiple layers of fully connected perceptron neurons and is commonly used in classification and regression problems. The model propagates input data through multiple hidden layers using nonlinear activation functions, enabling it to learn complex patterns. However, MLPs typically require substantial amounts of data and computational resources for effective training, which can be challenging for low-power devices.

3.3.2. LSTM

LSTM [20] is a variant of recurrent neural networks that excels at modeling temporal patterns and long-term dependencies in time-series data. It has proven effective for time-series prediction tasks. By utilizing a system of gates (input, forget, and output), LSTMs effectively mitigate the vanishing gradient problem and retain information over extended time periods. Nonetheless, their sequential nature often results in higher computational costs and longer training times than other models.

3.3.3. LightGBM

LightGBM [18] is a gradient boosting-based machine learning model designed for high-speed and high-performance prediction on large-scale datasets. Owing to its low memory footprint and computational efficiency, it is particularly suitable for low-power edge devices. Compared with traditional boosting methods, LightGBM uses histogram-based algorithms and leaf-wise tree growth strategies, enabling faster convergence and improved accuracy. Its ability to handle sparse data and categorical features natively makes it particularly attractive for real-time applications on resource-constrained platforms.

3.4. Performance Metrics for AI Models

After predictions are made based on time-series data, the performance of the model must be evaluated to ensure that the predictions are reliable. According to the ASHRAE Guideline criteria [21], a reliable model should achieve an R2 value above 80% and a coefficient of variation of the root mean squared error (CV(RMSE)) below 30%. As this study performed hourly predictions, the same criteria, an R2 above 80% and a CV(RMSE) below 30%, were applied to evaluate model reliability.
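For reference, both metrics can be computed directly from the predicted and observed series. The sketch below (plain NumPy, with hypothetical array names y_true and y_pred) reflects the definitions used throughout this paper: CV(RMSE) is the RMSE normalized by the mean of the observed values, expressed as a percentage, and R2 is the coefficient of determination.

```python
import numpy as np

def cv_rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Coefficient of variation of the RMSE, in percent."""
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return 100.0 * rmse / np.mean(y_true)

def r_squared(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Coefficient of determination (R2)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

# A model is considered reliable here when
# cv_rmse(y_true, y_pred) < 30 and r_squared(y_true, y_pred) > 0.8.
```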

4. Preprocessing Methodology

In a virtual energy utility plant, real-time data collection from sensors is essential for applications such as forecasting, energy transaction analysis, and usage monitoring. However, continuous data acquisition can encounter issues such as network problems and device reception errors. Despite these challenges, reliable data are critical for accurate analysis and prediction. Because data are collected exclusively through PLC [22] and OPC-UA [23], a robust preprocessing system is necessary to overcome potential inconsistencies and ensure data quality. To overcome these limitations, this study established a preprocessing system integrated within a lightweight platform, as shown in Figure 2. Figure 2 and Figure 3 present detailed representations of the subsystems depicted in Figure 1.
First, upon receiving data, the system checks whether the value is −999, which has been designated as a placeholder for missing data. If this condition is met, the value is classified as missing. The system then references existing data to perform linear interpolation, after which _2 is appended to the interpolated value before it is sent to the DBMS.
The appropriate interpolation method can vary depending on the data characteristics. Linear interpolation was chosen to replace missing values because it yields the most realistic and smooth reconstruction, given the continuous nature of the sensor data collected at 1 min intervals. This holds particularly where values are accumulated as integrated data, so minor variations introduced by interpolation have minimal impact.
If the value is not −999, the system then applies the Z-score method [24] to detect outliers. If the Z-score exceeds 3, the value is classified as an outlier, and _1 is appended to it before it is sent to the DBMS. For all other regular data points, _0 is appended.
This approach enables the system to distinguish between outliers, missing data (with interpolated values), and normal data. It also enables future queries to identify whether a data point was interpolated or flagged as an outlier. By implementing this preprocessing system, the study aimed to improve the accuracy of subsequent predictions and analyses.
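A minimal sketch of this tagging logic is given below. It assumes Python, that the neighboring readings needed for linear interpolation are available, and that the mean and standard deviation used by the Z-score test have been computed beforehand; the function and variable names are illustrative rather than the actual implementation.

```python
MISSING_SENTINEL = -999.0  # placeholder received when a reading is missing

def linear_fill(prev_value: float, next_value: float) -> float:
    # Linear interpolation of a single missing minute between two known readings.
    return (prev_value + next_value) / 2.0

def tag_point(value: float, prev_value: float, next_value: float,
              mean: float, std: float) -> str:
    """Return the value with its classification suffix appended."""
    if value == MISSING_SENTINEL:
        return f"{linear_fill(prev_value, next_value):.2f}_2"  # interpolated missing value
    z = abs(value - mean) / std if std > 0 else 0.0
    if z > 3.0:
        return f"{value:.2f}_1"  # outlier by the Z-score test
    return f"{value:.2f}_0"      # normal reading
```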

5. Time-Series Data Optimization in PreEdgeDB

The deployment of virtual energy utility plants on low-power devices requires the use of a DBMS that can operate smoothly on low-spec devices. In this study, a database optimized for time-series data and low-spec devices was developed to store data processed by the preprocessing system and to overcome performance limitations inherent to such environments. For this purpose, RocksDB was selected and optimized.
The primary factor for choosing RocksDB was the performance burden associated with previously used systems such as MyRocks. Although MyRocks was favored in earlier studies for its usability and integration within a broader ecosystem, it became a bottleneck as system complexity increased. Compared with RocksDB, a lightweight key–value store that implements only essential functions, MyRocks showed higher CPU and memory usage. Therefore, running MyRocks on a platform that operates various applications entirely on a single edge device is difficult.
Furthermore, another important reason for using RocksDB is that it integrates seamlessly into the system and can be used directly by the other modules, as shown in Figure 3. In this study, data were first stored in RocksDB and then used for prediction tasks. A dedicated module was also included to transmit the stored data to an upper-level server, enabling the edge device to support more advanced prediction processes externally. To enhance this functionality, research was conducted on generating SQL queries so that the collected data can be transferred to an upper SQL-based DBMS rather than simply transmitting raw data. RocksDB was considered the most suitable option for implementing this functionality with low overhead.
In this process, the data stream passes through the preprocessing system, where missing and anomalous data are filtered out before being transmitted to the DBMS. At this stage, RocksDB, optimized for time-series data, is used to store the filtered data efficiently. The stored time-series data are then used for prediction with the LightGBM model, and the resulting predictions are stored in the DBMS once again. Finally, when necessary, a key–value translator module enables the transmission of either sensor data or prediction results to the cloud, allowing the system to replicate edge functionalities in a cloud environment.
In RocksDB, as shown in Figure 4, data can be organized into column families [25] for more efficient management. As the data structure varies across factories, a separate column family was assigned to each plant, enabling the use of delta compression and other optimization techniques. The column families were named after their respective factories. As a result, performance improvements were observed during data insertion.
Additionally, for the key, the seconds in the timestamp were removed and only the minutes were retained, as shown in Figure 5. Because the seconds may vary slightly depending on when the data are inserted, dropping them removes the burden of matching exact seconds when performing a get operation. The value is stored as received, with the missing/abnormal/normal classification from the preprocessing system appended as a numeric suffix, which makes later analysis more convenient. Moreover, as delta compression offers advantages in value processing, this format was maintained.
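As an illustration of this key–value layout, the sketch below builds a minute-truncated key and a suffixed value string; the factory name, helper names, and formatting details are assumptions for illustration only.

```python
from datetime import datetime

def make_key(factory: str, ts: datetime) -> bytes:
    # Seconds are dropped so that a later get() only needs the minute-level timestamp.
    return f"{factory}:{ts.strftime('%Y-%m-%d %H:%M')}".encode()

def make_value(reading: float, tag: int) -> bytes:
    # tag: 0 = normal, 1 = outlier, 2 = interpolated missing value
    return f"{reading:.2f}_{tag}".encode()

# Example pair for a hypothetical factory "plant_a":
key = make_key("plant_a", datetime(2025, 2, 18, 14, 35, 27))   # b"plant_a:2025-02-18 14:35"
value = make_value(123.45, 0)                                  # b"123.45_0"
```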
For the AI prediction system, a program was created to transmit timestamp and value data. In particular, a feature was added to support predictions by deleting outliers and missing data from the overall dataset.
Finally, to transmit data to the server system, a program was created to support SQL-based data storage. This was implemented using the MySQL C++ library for the stored time-series key–value data and functions as long as the address of the target DBMS is known. Thus, the system not only supports standalone use but also serves as a functional edge node.
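A simplified sketch of this key–value-to-SQL translation is shown below. The actual module was written with the MySQL C++ library; the Python version here only illustrates the idea, and the table and column names are hypothetical.

```python
def to_insert_sql(key: bytes, value: bytes) -> tuple:
    """Translate one stored key-value pair into a parameterized INSERT statement."""
    factory, timestamp = key.decode().split(":", 1)
    reading, tag = value.decode().rsplit("_", 1)
    sql = ("INSERT INTO sensor_data (factory, measured_at, reading, quality_tag) "
           "VALUES (%s, %s, %s, %s)")
    return sql, (factory, timestamp, float(reading), int(tag))

# The resulting (sql, params) pairs can be executed against the upper-level
# SQL DBMS with any standard connector once its address is known.
```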
A performance test was conducted to verify the effectiveness of the DBMS. The test device used was the HardKernel Odroid M1S [26] model, which is equipped with 8 GB of RAM and 64 GB of eMMC storage.
Furthermore, the write buffer size, write buffer count, and background thread values were reduced to ensure proper allocation on the edge device; the modified parameter settings are listed in Table 2.
After the optimization was performed as mentioned above and data were continuously inserted for testing, it was observed that a single compaction operation used approximately 30 MB of memory and resulted in a CPU usage of approximately 4%. This observation indicates that resource usage was very low, making the system suitable for tasks such as prediction and cloud data transmission. Through various optimization techniques—such as parameter optimization and the use of column families—the system was designed to deliver optimal performance on low-power edge devices.
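The settings in Table 2 could be applied roughly as follows, assuming the python-rocksdb bindings; the option names mirror the RocksDB parameters, while the database path and the exact binding API are assumptions rather than the configuration actually deployed.

```python
import rocksdb  # python-rocksdb bindings (assumed available on the device)

opts = rocksdb.Options(create_if_missing=True)
opts.write_buffer_size = 32 * 1024 * 1024        # Write_Buffer_Size = 32 MB
opts.max_write_buffer_number = 2                 # Max_Write_Buffer_Number = 2
opts.max_background_compactions = 1              # Max_Background_Compactions = 1
opts.max_background_flushes = 1                  # Max_Background_Flushes = 1
opts.compression = rocksdb.CompressionType.lz4_compression  # Compression = LZ4
opts.table_factory = rocksdb.BlockBasedTableFactory(
    block_cache=rocksdb.LRUCache(256 * 1024 * 1024)          # Block_Cache_Size = 256 MB
)
# Level_Compaction_Dynamic_Level_Bytes is enabled analogously where the binding exposes it.

db = rocksdb.DB("/data/preedgedb", opts)  # hypothetical database path
```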

6. Experimental Evaluation

Demand forecasting is a critical area of study for the platform of a virtual energy utility plant because forecasting the compressed air demand of individual factories enables the efficient operation and prediction of shared facility supply. Accurate forecasting helps avoid unnecessary energy consumption during periods of low demand and avoid under-supply during high-demand periods, which could significantly impact production quality. Therefore, demand forecasting plays an important role in ensuring both energy efficiency and production stability. Similarly, it is a key component in facilitating energy trading, as the energy produced by each factory must exceed its forecasted demand to enable effective supply to others.

6.1. Used Data Type

This study used compressed air data from individual factories to forecast their respective demand levels.
As shown in Table 3, the timestamp was decomposed into month, day, hour, minute, and weekday to increase the number of input variables and compensate for the fundamental lack of features in the collected data, thereby improving prediction accuracy. These time features, together with the compressed air values, are used to predict demand 60 min into the future.
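A pandas-based sketch of this timestamp decomposition and of forming the 60 min ahead target is shown below; the DataFrame and column names (timestamp, flow) are hypothetical.

```python
import pandas as pd

def build_features(df: pd.DataFrame) -> pd.DataFrame:
    """df has a 'timestamp' column (1 min resolution) and a 'flow' column (compressed air)."""
    out = df.copy()
    ts = pd.to_datetime(out["timestamp"])
    out["month"] = ts.dt.month
    out["day"] = ts.dt.day
    out["hour"] = ts.dt.hour
    out["minute"] = ts.dt.minute
    out["weekday"] = ts.dt.weekday
    # Target: compressed air flow 60 minutes ahead (60 rows at 1 min intervals).
    out["target"] = out["flow"].shift(-60)
    return out.dropna(subset=["target"])
```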

6.2. Dataset and Experimental Settings

The data used in this study were collected from 21 March 2024 to 20 February 2025. To verify the final results using a graph based on the most recent data, the model was trained on data up to 15 February 2025 and validated using the data collected thereafter. To verify the reliability of the AI model, verification was conducted using a random sample of 2880 data points, which is equivalent to two days of data. Additionally, to evaluate the CPU usage of both RocksDB and the AI model, 110 data points per minute were stored and predicted during the validation process.
The LSTM model was implemented using PyTorch 2.4.1, with a hidden size of 64, four layers, and 500 epochs. The MLP model was trained with three layers and 500 epochs. Lastly, the LightGBM model was configured with 512 leaves, a learning rate of 0.05, a feature fraction of 0.7, and a maximum depth of −1 (i.e., unconstrained depth).
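The LightGBM configuration above can be expressed roughly as follows using the native lightgbm training API; the train_df/valid_df frames are assumed to come from a feature-engineering step like the one sketched in Section 6.1, and the number of boosting rounds is left at its default because it is not reported here.

```python
import lightgbm as lgb

params = {
    "objective": "regression",
    "num_leaves": 512,
    "learning_rate": 0.05,
    "feature_fraction": 0.7,
    "max_depth": -1,          # -1 leaves the tree depth unconstrained
    "metric": "rmse",
}

feature_cols = ["month", "day", "hour", "minute", "weekday", "flow"]
train_set = lgb.Dataset(train_df[feature_cols], label=train_df["target"])
valid_set = lgb.Dataset(valid_df[feature_cols], label=valid_df["target"], reference=train_set)

model = lgb.train(params, train_set, valid_sets=[valid_set])
y_pred = model.predict(valid_df[feature_cols])
```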

6.3. Performance Evaluation According to the AI Model

We conducted training and prediction using LSTM, which specializes in time-series data; an MLP, which offers advantages in extracting features from input data; and LightGBM, which is advantageous for low-power devices and shows high accuracy when sufficient data are available.
First, even after the MLP was trained with a sufficiently deep architecture and Bayesian optimization was applied, the model failed to learn small values below a certain threshold and tended to overpredict certain values, as shown in Figure 6. Even after increasing the number of layers, extending the training epochs, and raising the model complexity, the predictions remained inaccurate. Despite these optimization efforts, the model showed poor performance, with a CV(RMSE) of 29.32% and an R2 of 0.2564, primarily due to overestimation tendencies. In particular, the prediction accuracy deteriorated as the data values approached zero. While various methods—such as changing the scaler and switching to LeakyReLU—were attempted, the previously mentioned values were the best. Therefore, the MLP is not suitable for predicting data from individual factories.
Additionally, owing to the nature of MLPs, increasing the model depth results in high resource consumption, making their operation difficult on low-power edge devices. Thus, MLPs may not be a suitable choice for this application.
When predictions were made using LSTM, we tested various window sizes and layer configurations to accommodate data from 60 min prior. However, as shown in Figure 7, the model tended to predict lower values more accurately than higher ones. Considering that splitting the timestamp might be disadvantageous for LSTM, we retrained the model without separating the timestamp components. Despite this adjustment, the model exhibited overfitting and low overall performance. The model achieved a CV(RMSE) of 26% and an R2 of 0.4124. Additionally, LSTM imposed a high computational load, making it a difficult algorithm to use. Similar to the MLP, LSTM becomes increasingly burdensome for edge devices as the window size increases, limiting its suitability for deployment in low-power environments.
However, as LSTM was more accurate at predicting lower-value data and the MLP performed better in the higher range, we explored a hybrid method that combines the strengths of both LSTM and the MLP (LSTM + MLP).
As shown in Figure 8, the hybrid model was able to predict both low and relatively high consumption values more effectively than the individual models. However, it still struggled to capture sudden spikes in special cases, leading to limitations in practical application. Although the model achieved an improved performance, with a CV(RMSE) of 20.48% and an R2 of 0.6394, it did not meet the ASHRAE Guideline criteria, which require a CV(RMSE) below 30% and an R2 above 0.8. These results indicate that the model lacks sufficient reliability for deployment in real-world environments.
LSTM struggled to learn the nonlinear and short-term changes in the compressed air flow of the factory. While the MLP showed relatively better performance when the high-dimensional output vectors from LSTM were used, information loss during input transfer reduced its effectiveness. However, the hybrid model also failed to meet the model explanatory standards outlined in the ASHRAE Guideline.
Moreover, despite being more effective than individual algorithms, the hybrid approach imposes a high computational load, making it unsuitable for deployment on low-spec devices. Therefore, after further consideration of the algorithms to be employed in this study, it was concluded that a gradient boosting algorithm would be more suitable, as it is well suited for regression tasks and effectively handles nonlinear and imbalanced data.
While gradient boosting machines can be too slow for low-performance environments such as edge devices, a viable solution was found in LightGBM, which was developed to overcome these limitations. This study, thus, focused on evaluating LightGBM in terms of prediction accuracy and execution time. LightGBM proved to be a more suitable algorithm owing to its low memory and CPU usage, operating primarily on the CPU.
The prediction results are shown in Figure 9. Similar to LSTM, LightGBM performs well on small datasets and handles highly specialized and outlier data effectively.
Given the nonlinear nature of the compressed air data, improved results were achieved by using features such as the hour, minute, day, month, and weekday. These variables helped the model better understand temporal trends, and it can be inferred that the model effectively captured patterns specific to certain time periods. Accordingly, LightGBM was found to be a highly suitable algorithm, achieving a CV(RMSE) of 14.36% and an R2 of 0.8240.
Figure 10 illustrates feature importance. The results show that the sensor data from the previous 60 min had the highest correlation with the target variable, followed by month, day, weekday, and hour, which also had high correlations. However, the minute variable showed the lowest correlation.
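Importance values of this kind can be read directly from the trained booster; a minimal sketch, continuing from the training snippet above, is:

```python
import pandas as pd

importance = pd.Series(
    model.feature_importance(importance_type="gain"),
    index=model.feature_name(),
).sort_values(ascending=False)
print(importance)  # e.g., the lagged flow value ranks first, minute last
```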
It is difficult to conclude that a model is suitable for deployment on an edge device based solely on its accuracy. Therefore, a test set was created from the current data to evaluate the model’s ability to predict values 60 min in advance, and both the resulting accuracy and the execution time were measured.
To test the practical application of the AI model, the actual values from February 18 were compared with the values predicted for the factory using the virtual energy utility plant, as shown in Figure 11. The comparison revealed an average error rate of approximately 13.03%, corresponding to an accuracy of approximately 86.97%. Moreover, it was confirmed that predicting 110 data points took only 0.41 s, which demonstrated that the system can perform stable, real-time predictions in environments where one prediction per minute is sufficient.
Data were received continuously at 1 min intervals and stored using RocksDB. Experiments were conducted to test how resource usage is affected during real-time data collection and prediction. Data insertion consumed approximately 5% of the CPU and approximately 30 MB of memory until a flush occurred. Additionally, during the prediction process, data retrieval consumed an average of 30% of the CPU and approximately 83 MB of memory. The prediction used approximately 30 MB of memory and an average of 25% CPU. Therefore, the memory consumption totaled approximately 143 MB per minute: 30 MB during data reception, 83 MB for data retrieval, and 30 MB for prediction. In terms of CPU usage, data collection used approximately 5%, whereas data retrieval and prediction consumed 30% and 25%, respectively, resulting in a combined CPU usage of approximately 55%. These results indicate that performing predictions every minute does not pose a significant system load. Hence, the use of LightGBM for prediction and RocksDB for data storage proved to be well-suited choices, specifically optimized for deployment on low-spec edge devices. A comparison of the AI algorithms is shown in Table 4.

7. Discussion and Limitations

In this study, we analyzed real-time sensor data using the LightGBM model to predict factory operation rates. The prediction results were generally satisfactory, and the model demonstrated stable performance even in the presence of multivariate time-series structures and missing data. These results highlight the model’s high potential for real-world industrial applications. LightGBM was particularly well suited for the conditions of this study, thanks to its fast training speed, automatic handling of missing values, and ability to model complex feature interactions efficiently.
Feature importance analysis revealed that certain sensor values and time-related variables significantly impacted prediction performance. This insight can support on-site operators in proactively conducting equipment inspections and making timely process adjustments. Moreover, the proposed model can continuously collect real-time data and perform ongoing updates, allowing for early detection of unexpected process interruptions or inefficiencies. Consequently, it has the potential to enhance productivity, reduce energy consumption, and lower equipment maintenance costs. If implemented in industrial environments, such a system could significantly improve the level of automation in decision making based on predictive analytics.
However, several limitations were identified. Predicting compressed air flow 15 min ahead using limited factory data was challenging due to the nonlinear nature of the data. To address the limited number of input variables, we decomposed timestamps into finer components. While this improved the accuracy, it also introduced some performance constraints. To mitigate these issues, we employed LightGBM for its efficiency in handling nonlinear relationships and low resource consumption.
For future work, we plan to integrate neural processing units (NPUs) to mitigate performance degradation and support the use of more sophisticated AI models. We also aim to incorporate additional measurements, such as power meter readings, to enhance predictive accuracy. Beyond standalone deployment on a single-board computer, future studies will explore more advanced algorithms that may surpass LightGBM in performance. In particular, we plan to investigate NPU-accelerated models for improved computational efficiency and predictive power [27]. Comparative evaluations with emerging models such as Prophet and gated recurrent units (GRUs) will also be conducted to further refine the predictive framework.

8. Conclusions

This study proposed PreEdgeDB, an independent platform designed for deployment in individual plants within a virtual energy utility plant. The study focused on transforming a database, which originally served as an edge DB for preprocessing and prediction, into a fully integrated platform capable of executing all necessary roles at the edge. Through the preprocessing system, missing and anomalous data were identified and stored accordingly. By labeling these data types, the system facilitated easier classification for future analysis. Additionally, the database was optimized for time-series data storage, and a system was created to transmit SQL data to the server system, enhancing its functionality as an edge device. Because existing prediction models have limitations in forecasting the energy demand for individual factories, the study implemented LightGBM, which achieved a higher accuracy than previous methods.
The results confirmed that when one data point per minute was received and predictions were made, the entire process—from data collection to prediction—was completed within 300 ms. Even with continuous data storage and compaction, which is a characteristic of RocksDB, CPU usage remained at approximately 60%, and memory usage remained below 150 MB. These outcomes demonstrate the feasibility of running the platform independently on low-spec edge devices in factory environments. The study successfully enhanced the practicality of edge computing for industrial settings by developing a low-power platform that replaced traditional server systems exceeding 1000 W with a 15 W solution capable of real-time prediction. This advancement enables improved factory automation and energy management, contributing not only to additional energy savings but also to a significant reduction in greenhouse gas emissions.
Future research will focus on further optimizing AI algorithms. Additionally, we aim to investigate how the use of NPUs can minimize the computational load during AI training and prediction while also exploring the potential for enhancing other aspects of the system.

Author Contributions

Conceptualization, W.C., J.G. and B.L.; methodology, W.C.; software, W.C.; writing—original draft preparation, W.C.; writing—review and editing, W.C.; visualization, W.C. and D.K.; supervision, J.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Korea Institute of Energy Technology Evaluation and Planning (KETEP) and the Ministry of Trade, Industry & Energy (MOTIE) of the Republic of Korea, grant number 20202020900170.

Data Availability Statement

Due to privacy concerns, the data for this study cannot be made available.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
MLP: Multilayer perceptron
AI: Artificial intelligence
LSTM: Long short-term memory
DBMS: Database management system
CPU: Central processing unit

References

  1. Ministry of Trade, Industry and Energy. 3rd Basic Energy Plan. Available online: https://www.korea.kr/briefing/pressReleaseView.do?newsId=156334773 (accessed on 27 March 2025).
  2. Korea Energy Agency. Energy Usage Statistics for 2023. Available online: https://www.energy.or.kr/front/board/List9.do (accessed on 27 March 2025).
  3. Kim, H. Transformation of Energy, Environmental Plant Industry, Carbon Neutrality, and Digitalization. Available online: http://www.engjournal.co.kr/news/articleView.html?idxno=2801 (accessed on 27 March 2025).
  4. Facebook. RocksDB. Available online: https://rocksdb.org/ (accessed on 27 March 2025).
  5. Zhu, G.; Liu, D.; Du, Y.; You, C.; Zhang, J.; Huang, K. Toward an intelligent edge: Wireless communication meets machine learning. IEEE Commun. Mag. 2020, 58, 19–25.
  6. Merenda, M.; Porcaro, C.; Iero, D. Edge machine learning for AI-enabled IoT devices: A Review. Sensors 2020, 20, 2533.
  7. Murshed, M.S.; Murphy, C.; Hou, D.; Khan, N.; Ananthanarayanan, G.; Hussain, F. Machine learning at the network edge: A survey. ACM Comput. Surv. 2021, 54, 1–37.
  8. Yazici, M.T.; Basurra, S.; Gaber, M.M. Edge machine learning: Enabling smart internet of things applications. Big Data 2018, 6, 26.
  9. Verhelst, M.; Murmann, B. Machine learning at the edge. In NANO-CHIPS 2030: On-Chip AI for an Efficient Data-Driven World; Springer: Cham, Switzerland, 2020; pp. 293–322.
  10. Liu, F.; Tang, G.; Li, Y.; Cai, Z.; Zhang, X.; Zhou, T. A survey on edge computing systems and tools. Proc. IEEE 2019, 107, 1537–1562.
  11. Cho, W.; Lee, H.; Gu, J.-H. Optimization techniques and evaluation for building an integrated lightweight platform for AI and data collection systems on low-power edge devices. Energies 2024, 17, 1757.
  12. Karunaratna, S.; Maduranga, P. Artificial intelligence on single board computers: An experiment on sound event classification. In Proceedings of the 2021 5th SLAAI International Conference on Artificial Intelligence (SLAAI-ICAI), Colombo, Sri Lanka, 6–7 December 2021.
  13. Iqbal, U.; Davies, T.; Perez, P. A review of recent hardware and software advances in GPU-accelerated edge-computing single-board computers (SBCs) for computer vision. Sensors 2024, 24, 4830.
  14. Guato Burgos, M.F.; Morato, J.; Vizcaino Imacaña, F.P. A review of smart grid anomaly detection approaches pertaining to artificial intelligence. Appl. Sci. 2024, 14, 1194.
  15. Google. LevelDB. Available online: https://github.com/google/leveldb (accessed on 27 March 2025).
  16. MySQL. MySQL. Available online: https://www.mysql.com/ (accessed on 27 March 2025).
  17. Oracle. OracleDB. Available online: https://www.oracle.com/kr/database/ (accessed on 27 March 2025).
  18. Microsoft. LightGBM. Available online: https://github.com/microsoft/LightGBM (accessed on 27 March 2025).
  19. Lee, H.; Park, J.; Cho, W.; Kim, D.; Gu, J. Energy demand/supply prediction and simulator UI design for energy efficiency in the industrial complex. J. Converg. Cult. Technol. 2024, 10, 693–700.
  20. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
  21. ASHRAE. ASHRAE Guideline. Available online: https://www.ashrae.org/technical-resources/standards-and-guidelines (accessed on 27 March 2025).
  22. Wikipedia. Programmable Logic Controller. Available online: https://en.wikipedia.org/wiki/Programmable_logic_controller (accessed on 27 March 2025).
  23. Wikipedia. OPC Unified Architecture. Available online: https://en.wikipedia.org/wiki/OPC_Unified_Architecture (accessed on 27 March 2025).
  24. Wikipedia. Z-Score. Available online: https://en.wikipedia.org/wiki/Standard_score (accessed on 27 March 2025).
  25. Wikipedia. Column Family. Available online: https://en.wikipedia.org/wiki/Column_family (accessed on 27 March 2025).
  26. HardKernel. Odroid M1S. Available online: https://www.hardkernel.com/shop/odroid-m1s-with-8gbyte-ram-io-header/ (accessed on 27 March 2025).
  27. Wikipedia. Neural Processing Unit. Available online: https://en.wikipedia.org/wiki/Neural_processing_unit (accessed on 27 March 2025).
Figure 1. Overall architecture of PreEdgeDB, illustrating the integrated workflow, from real-time sensor data collection, preprocessing for anomaly correction, time-series data storage using RocksDB, energy demand prediction with LightGBM, and optional transmission to a central server for advanced analytics.
Figure 2. Preprocessing system overview.
Figure 3. Overview of PreEdgeDB operation flow.
Figure 4. Column families assigned to RocksDB.
Figure 5. RocksDB key–value pair.
Figure 6. MLP prediction data graph.
Figure 7. LSTM prediction data graph.
Figure 8. Hybrid model prediction graph.
Figure 9. LightGBM prediction data graph.
Figure 10. LightGBM feature importance graph.
Figure 11. LightGBM prediction result graph for February 18.
Table 1. Comparison of related work and the proposed PreEdgeDB method.
Category | Related Work | PreEdgeDB
Edge device role | Primarily for data collection, with AI models offloaded to servers | Performs data collection, preprocessing, storage, and prediction independently
Database usage | Typically file DBs or MyRocks, configured arbitrarily | Optimized RocksDB for low-resource environments
AI model application | Small AI models offloaded to cloud or server | Runs LightGBM entirely on the edge, being capable of standalone operation
System architecture | Reliant on servers/cloud | Fully functional on local devices, with independent operation
Key limitation | Inability to adapt to low-resource devices | Capable of standalone operation on edge devices
Data type applied | Various sensor data and large-scale variables | Optimized for predicting single-factory data
Table 2. RocksDB parameter value table.
Parameter | Setting Value
Block_Cache_Size | 256 MB
Write_Buffer_Size | 32 MB
Max_Write_Buffer_Number | 2
Level_Compaction_Dynamic_Level_Bytes | True
Max_Background_Compactions | 1
Max_Background_Flushes | 1
Compression | LZ4
Table 3. Data used for AI.
Utilized Data
Month
Day
Hour
Minute
Weekday
Compressed air flow
Table 4. Summary of AI algorithm effects.
AI Algorithm | CV(RMSE) | R2
MLP | 29.32% | 0.2564
LSTM | 26% | 0.4124
Hybrid model | 20.48% | 0.6394
LightGBM | 14.36% | 0.8240
