A Study on Acer Mono Sap Integration Management System Based on Energy Harvesting Electric Device and Sap Big Data Analysis Model

Abstract: This study set out to develop an Information and Communication Technologies (ICT)-based smart Acer mono sap collection device that makes efficient use of the labor force by eliminating the inefficient manual recording of sap exudation and state information. Assuming that environmental information is closely connected with Acer mono sap exudation, and aiming to reinforce the competitive edge of forest product production, the study analyzed correlations between sap exudation and environmental information and predicted sap exudation. The smart collection devices gathered hourly data on Acer mono sap exudation together with outdoor temperature, humidity, conductivity, and wind direction and velocity, and were installed in four areas of the Republic of Korea: Sancheong, Gwangyang, Geoje, and Inje. The collected data were used to analyze correlations between environmental information and sap exudation with four algorithms, namely linear regression, Support Vector Machine (SVM), Artificial Neural Network (ANN), and random forest, and to predict sap exudation. All of the algorithms except linear regression produced good results, demonstrating close connections between environmental information and Acer mono sap exudation. The random forest model, which performed best, was used to build a mobile app that provides predicted Acer mono sap exudation and the collected environmental information.


Introduction
With the arrival of the Fourth Industrial Revolution era in recent years, researchers have been conducting studies across a wide range of fields using its core technologies, including big data, artificial intelligence, and the Internet of Things [1,2].
Ref. [3] proposed increasing the reliability of the farming journal by automatically saving data on product conditions and controlled environments and by entering multimedia data on products. The system consisted of soil sensors for the cultivation plot, internal and external sensors for the cultivation field, a database of cultivation environments, a middle layer encompassing videos, sensors, and server management, and a management layer providing users with a Graphical User Interface (GUI). The farming journal was designed to record pest and disease predictions as well as general work, and to check the data entered as videos, voices, texts, and images. Ref. [4] proposed a system to manage and monitor the growth and development environment of a crop to increase its yield. The proposed [...] temperatures will rise by 4 °C across the Korean Peninsula by the end of the 22nd century, starting from the end of the 21st century, and that daily lows will rise more than daily highs, with the annual range dropping by 1.7 °C. It is also predicted that precipitation will increase by 17% across all regions of the peninsula. Such weather changes will likely have enormous impacts on agriculture and forestry on the peninsula. Forest products with the most unfavorable cultivation conditions will be the most vulnerable to them. If accurate information can be obtained about the supporting capacity of production-based elements and the major factors of cultivation management, so as to reinforce the productive competitive edge of forest products, it will be possible to predict outputs according to the major cultivation conditions of trees in forestry, including changing weather conditions and unusual weather events, based on changes in statistical outputs. In the Republic of Korea, Acer mono is an important tree species for sap collection. Acer mono is a broadleaf tree in the family Aceraceae and is called the maple tree in North America.
In the Republic of Korea, the major producing areas of Acer mono sap include Inje, Gwangyang, and Sancheong, which are usually in alpine zones 500 m above sea level. Because Acer mono grows in rugged mountains where management is difficult, managing the trees and collecting their sap requires a substantial labor force and carries accident risk. Despite these unfavorable conditions, Acer mono sap accounts for a large part of farmers' income in the Jeonnam region and is managed for research purposes. The old management system, however, requires people to check and record sap exudation in person, which has a couple of disadvantages: the recorded information is inaccurate and is difficult to use efficiently. In the area of energy harvesting, various fields have conducted research on collecting energy from renewable sources, including thermal, piezoelectric, and vibration energy. In recent years, IoT and other devices have required an energy supply, raising the need for energy self-sufficient IoT devices. Research is underway on energy collection devices combined with IoT devices [1,2,7]. Forest products in deep mountains or alpine zones pose many limits due to their extreme geographical conditions. For data analysis, data must be collected in alpine zones where there is no reliable supply of electricity. When batteries are used, they drain quickly at low temperatures, which makes it difficult to collect data normally. These problems can be solved by giving big data collection devices a self-sufficient supply of energy.
The present study applied energy harvesting technology to solve these problems. It set out to develop an ICT-based smart Acer mono sap collection device that promotes efficient use of the labor force and reduces accident risk by cutting out unnecessary activities, such as the manual recording of sap exudation used in previous studies, and that collects eight environmental factors and sap exudation at hourly intervals. Based on farmers' experience, which suggests close connections between environmental information and Acer mono sap exudation, the study analyzed the correlations between them with linear regression, SVM, ANN, and random forest and tested the hypothesis by building a prediction model of Acer mono sap output for each algorithm. Of the prediction models built in the study, one was selected for its suitability for a mobile app based on learning time, prediction time, and prediction accuracy, so that its predictions could be provided via the app along with the environmental information collected by the smart collection device.
Electronics 2020, 9, 1979

Overall Block Diagram of Proposed System
Figure 1 shows the overall block diagram of the proposed system. The proposed Acer mono sap storage system consists of hardware and software. The hardware consists of three major parts: a collection device to store sap from Acer mono trees, a big data collection device for data from environmental sensors inside and outside the collection device, and a data transmission device to send the collected data to the server. The Acer mono sap collection device was made of stainless steel with a volume of 1000 L. The data collector gathered water level, pH, temperature, and humidity inside the collection device, along with outdoor temperature and humidity, ground temperature and humidity, solar radiation, conductivity, and wind direction and velocity. The data transmission device sends the data gathered by the data collector to the external server via Ethernet and LTE communication.
In addition, the system software comprised an Android-based app, which displays the data collected by the big data collection device so that it can be checked on a mobile terminal in real time, and analysis software, which analyzes correlations between the various pieces of environmental information and Acer mono sap yields. The analysis software preprocessed the collected data, analyzed correlations with linear regression, SVM, ANN, and random forest, predicted Acer mono sap exudation, and presented the outcomes in the mobile app. The app also shows the volume and state of the collected sap and the external environmental data, as well as the predicted Acer mono sap exudation, in graphs or tables.

Design of Acer Mono Sap Collection Device
The hardware of the data collection system for the Acer mono environment and sap consists of three major parts: the big data collection device, which collects data about the meteorological environment of the widely scattered Acer mono sap producing areas and about sap quality; the ICT-based smart Acer mono sap collection device needed for big data collection; and the communication relay device, which gathers the sensing data of the sap collection device. Figure 1 (left) shows the block diagram of the proposed system hardware. Figure 2 presents the blueprint of the smart collection tank. Conventional tanks were plastic containers that simply stored Acer mono sap and did almost nothing for the quality management of the collected sap.

In the present study, a 1000 L Acer mono sap collection tank was made of stainless steel to prevent corrosion. Measuring devices for temperature, water level, and pH were added to it, along with communication nodes to collect the data. The proposed collection tank was also designed to ensure a stable energy supply and eliminate any need for battery replacement by using solar panels and controlling the use of the energy they collect. The energy generated by the photovoltaic modules was supplied to the Acer mono sap collection device and the data collection device. The capacity of the energy harvesting device was sized according to the sensors that collect meteorological data, the electric power load of the sap and data collection devices, and the number of sunless days. Figure 3 shows the power circuit diagram of the energy harvesting-based data collection device. The input power was DC 12 V, chosen for a stable power supply on the communication board. The device was designed to operate at 3.3 V and under 0.5 A for efficient power consumption.
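As a rough illustration of how a harvesting capacity can be sized from the electric load and the number of sunless days, the following sketch works through the arithmetic. Only the 3.3 V and 0.5 A figures come from the text. The number of sunless days, the usable depth of discharge, the peak sun hours, and the system efficiency are illustrative assumptions rather than values from the study, and the function names are hypothetical.

```python
# Hedged sizing sketch. Only LOAD_VOLTAGE_V and LOAD_CURRENT_A come from the text;
# every argument passed in the example at the bottom is an assumed value.
LOAD_VOLTAGE_V = 3.3   # designed operating voltage (from the text)
LOAD_CURRENT_A = 0.5   # upper bound on load current (from the text)
HOURS_PER_DAY = 24.0


def daily_load_wh() -> float:
    """Worst-case energy drawn by the load in one day, in watt-hours."""
    load_w = LOAD_VOLTAGE_V * LOAD_CURRENT_A
    return load_w * HOURS_PER_DAY


def required_battery_wh(sunless_days: float, depth_of_discharge: float) -> float:
    """Storage needed to run the load through `sunless_days` without charging."""
    if not 0.0 < depth_of_discharge <= 1.0:
        raise ValueError("depth_of_discharge must be in (0, 1]")
    return daily_load_wh() * sunless_days / depth_of_discharge


def required_panel_w(peak_sun_hours: float, system_efficiency: float) -> float:
    """Panel rating that replaces one day of load energy on an average day."""
    if peak_sun_hours <= 0.0 or not 0.0 < system_efficiency <= 1.0:
        raise ValueError("peak_sun_hours must be > 0 and efficiency in (0, 1]")
    return daily_load_wh() / (peak_sun_hours * system_efficiency)


if __name__ == "__main__":
    # Assumed inputs: 3 sunless days, 50% usable capacity,
    # 3.5 peak sun hours per day, 70% overall efficiency.
    print(f"daily load: {daily_load_wh():.1f} Wh")
    print(f"battery:    {required_battery_wh(3, 0.5):.1f} Wh")
    print(f"panel:      {required_panel_w(3.5, 0.7):.1f} W")
```

Sizing against the worst-case load (the 0.5 A upper bound) is a conservative choice; a measured average current would give a smaller battery and panel.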
Figure 4 shows a block diagram of the control panel of the smart sap collection device. The control panel was designed for the stable acquisition of sensing data with a programmable logic controller. It can check the current state of Acer mono sap with an electrical box and was designed as a dust- and water-proof panel.

Design of Monitoring System S/W
The software of the data collection system for the Acer mono environment and sap quality used an Android-based user monitoring interface. The user interface receives and displays the sensor data collected by the data and sap collection devices, transmits the data to the web server, and saves it in the database. Figure 5 shows a block diagram of the proposed monitoring software. Figure 6 presents the class diagram of the user's smartphone application.

Classes are defined according to the required functions, and methods are invoked from each UI button. The login class handles entry into the system and asks the user for an ID and PIN during authentication. The onResume() method creates an instance of the connector class and connects it to the server. The login button click event method checks the ID and PIN. After a successful login, users move to the main class, which is connected to the other classes, creates an instance of each, and enables navigation to them. The WatertankManager class manages the sap volume and collection in the collection devices and saves the data in the database. The WatertankMonitor, WeatherMonitor, and SensorMonitor classes retrieve the data tables of a farm and display data about its sap collection devices, weather, or sensors in their button click event methods.

Design of Acer Mono Sap Data Analysis S/W
Figure 7 shows the flow chart of data analysis for the Acer mono sap exudation prediction model proposed in the present study. It consists of four stages. First, the data comprise those collected in the study and those from farmers' manual record books or data loggers. Second, preprocessing unifies the parameters of the collected data and removes missing values and outliers that would adversely affect the analysis. Third, an optimal model is selected and tested for each algorithm: linear regression, support vector machine, artificial neural network, and random forest. Finally, the optimal models are compared in prediction accuracy and time, and the most efficient model suitable for a mobile app is chosen to supply predicted Acer mono sap exudation to the app.

Big Data Collection
In the present study, three years of temperature, humidity, and Acer mono sap data were collected from smart sap collection devices attached to 50 Acer mono trees that were 30 years old or older in Sancheong, Gwangyang, Geoje, and Inje. The data were transmitted every hour and compiled on a daily basis. In addition, data were collected from Acer mono sap farmers' manual record books and data loggers.

Big Data Preprocessing
Outliers were removed in the preprocessing stage, before sap exudation was predicted with the prediction models, because they could affect prediction accuracy. The collected data could include cases in which sap exudation was affected by artificial or external problems, such as the suspension of sap collection because the collection device was full for the day, the cleaning of the rubber tubes that carry Acer mono sap to the collection devices, and damage to the rubber tubes by wild animals. Such data were deemed outliers and removed. Records with missing values were also removed. The exudation volume was rounded to the nearest 1 L to reduce complexity.
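The preprocessing rules above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's actual code: the record layout, the field name `exudation_l`, and the `disrupted` flag (marking a day with a full tank, tube cleaning, or tube damage) are hypothetical.

```python
import math
from typing import Any, Dict, List

Record = Dict[str, Any]


def preprocess(records: List[Record]) -> List[Record]:
    """Drop disrupted and incomplete records, then round exudation to 1 L.

    `disrupted` is a hypothetical flag for days on which collection was
    interrupted; such records are treated as outliers and removed.
    """
    cleaned: List[Record] = []
    for rec in records:
        if rec.get("disrupted"):
            continue  # outlier: collection was interrupted that day
        if any(value is None for value in rec.values()):
            continue  # missing value: remove the whole record
        out = {key: value for key, value in rec.items() if key != "disrupted"}
        # Round half up to the nearest litre; volumes are non-negative.
        out["exudation_l"] = math.floor(rec["exudation_l"] + 0.5)
        cleaned.append(out)
    return cleaned


sample = [
    {"avg_temp": 3.2, "exudation_l": 12.4, "disrupted": False},
    {"avg_temp": None, "exudation_l": 5.0, "disrupted": False},  # missing value
    {"avg_temp": 1.0, "exudation_l": 0.0, "disrupted": True},    # tube damaged
    {"avg_temp": -0.3, "exudation_l": 7.5, "disrupted": False},
]
result = preprocess(sample)
```

`math.floor(x + 0.5)` is used instead of the built-in `round` because `round` applies banker's rounding to halves, which would map 6.5 to 6 rather than 7.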

Big Data Type
The data comprised components in different forms depending on the collection methods described above. To integrate the data sets into a single form, their common elements were selected and unified into the components in Table 1. The integrated data sets contained average temperature, daily high and low temperatures, daily temperature range, maximum and minimum humidity, and exudation. Up to 66 L of exudation was recorded. The entire data set of 408,864 records was randomly divided into 75% learning data and 25% test data for training and testing the Acer mono sap exudation prediction models. Table 2 shows the data distribution at intervals of approximately 10 L to give the rough distribution of the classified data. Exudation in the range of 60~66 L was extremely rare. Since there was an exudation event for each liter, only learning data were organized.
Of the data applied to the study, avg_temp was in the range of −17.3~23 °C; hight_temp in the range of −8.9~24.1 °C; low_temp in the range of −21.3~20.1 °C; daily_temp in the range of 0.3~28 °C; hight_humi in the range of 3.2~100%; low_humi in the range of 0.7~100%; precipitation in the range of 0~96 L; and Acer mono sap yield in the range of 0~66 L. A total of 3,270,912 (408,864 × 8) data points were used. Figure 8 shows the amount of data used in the study. The data were mostly concentrated in a certain range. Because the graphs had to display numbers too large to be expressed to one decimal place, some data could not be shown properly. Precipitation, in particular, was classified in detail by value, and the overwhelming number of days with a precipitation of 0 (no rain) made it impossible to display properly. The most frequent value of avg_temp was 3.2 °C, with 3754 records. The most frequent values of the other components were 7.6 °C (3124 records) for hight_temp; −0.3 °C (5916) for low_temp; 9.3 °C (4168) for daily_temp; 48% (1184) for hight_humi; 23% (1558) for low_humi; 0 L (403,290) for precipitation; and 0 L (169,397) for RISE.
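The counts quoted above can be checked with a short sketch of a random 75/25 split. The record count, column count, and split ratio come from the text; the shuffle-based method and the seed are assumptions, since the text does not say how the random division was carried out.

```python
import random
from typing import List, Tuple

TOTAL_RECORDS = 408_864   # from the text
N_COMPONENTS = 8          # seven environmental parameters plus exudation
TRAIN_FRACTION = 0.75     # from the text


def split_indices(n_records: int, train_fraction: float,
                  seed: int) -> Tuple[List[int], List[int]]:
    """Shuffle record indices and cut them into learning and test parts."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_train = int(n_records * train_fraction)
    return indices[:n_train], indices[n_train:]


# seed=0 is an arbitrary, assumed value.
train_idx, test_idx = split_indices(TOTAL_RECORDS, TRAIN_FRACTION, seed=0)
total_values = TOTAL_RECORDS * N_COMPONENTS

if __name__ == "__main__":
    print(len(train_idx), len(test_idx), total_values)
```

With these inputs the learning set holds 306,648 records and the test set 102,216, and the product of records and components reproduces the 3,270,912 data points stated in the text.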

Implementation of Acer Mono Sap Monitoring System
Figure 9a presents the prototype of the smart sap collection tank. Based on the blueprint, a 1000 L Acer mono sap collection tank was made of stainless steel to prevent corrosion. Figure 9b presents the prototype of the renewable energy-based energy harvesting device, which was installed in the Acer mono sap collection areas. Renewable energy was used to generate, store, and supply electricity and to ensure a smooth supply of electricity to the data collection devices. Electrical boxes were added to prevent damage from the external environment and to provide dust- and water-proofing. The renewable energy system had a modular structure comprising storage devices, solar modules, and solar charging controllers so that renewable energy sources could be combined efficiently to suit the external environment. For its performance assessment, the energy harvesting device was installed in Gwangyang, Jeollanam Province and Sancheong, Gyeongsangnam Province to ensure a smooth supply of power to the sensor nodes.
Figure 10 shows the hardware of the communication nodes in the sap collection tank. The nodes collect the tank temperature, water level, and pH and transmit the data via the gateway. Figure 11 shows the hardware of the multi-channel gateway. The nodes were connected via Ethernet and LTE modules for the collection and processing of data transmitted from multiple sensing devices. Outdoor and indoor electrical boxes with dust- and water-proof features were made so that the multi-channel gateway could withstand the harsh external environment.
Figure 11. The board of multi-channel gateway.
Figure 12 shows the GUI with which users monitor a farm's sensing information on a smartphone. The monitoring service consists of sensor data by the hour and date and water level data by the hour and date. Sensor data monitoring offers atmospheric temperature and humidity, ground temperature and humidity, EC, solar radiation, and wind direction and velocity. Water level data monitoring shows the water level changes caused by sap collection, as measured by the water level sensors in the sap collection devices.

Evaluation of Acer Mono Sap Output Amount Prediction Model
Farmers' experience suggests that Acer mono sap exudation increases with a large daily temperature range, owing to osmotic pressure effects and drying, and decreases as temperature rises. Based on the assumption that the environmental elements around Acer mono trees therefore affect sap exudation, the present study designed Acer mono sap exudation prediction models with a total of seven parameters (average temperature, high, low, daily temperature range, maximum humidity, minimum humidity, and precipitation) using four algorithms: linear regression [26–35], SVM [36–46], ANN [47–56], and random forest [57–61]. Linear regression predicts and classifies based on linear regression equations derived from the analysis of correlations between dependent and independent variables. SVM is a technique designed to classify data that cannot be separated linearly: it maps the data into a higher-dimensional space, defines a hyperplane as the decision boundary, and classifies according to that boundary. Random forest is an ensemble classification method that builds multiple decision trees, gathers the classification results from the trees, and makes the final decision by majority vote. ANN is a machine learning algorithm that mimics the principle and structure of the human neural network. It consists of input, hidden, and output layers and solves a problem by learning the optimal weights and biases with an activation function. The present study found that environmental elements (temperature, humidity, and more) had a strong influence on the yield of Acer mono sap and confirmed first-hand, based on the Acer mono data, that some of the elements had direct effects on it.
Based on the results of previous studies, the study used linear regression to determine whether there were linear relations between the environmental elements and Acer mono sap yield. SVM was used as a multidimensional mapping method to perform classification based on many different environmental elements. Random forest was used to perform classification according to the relationships among the environmental elements and their values. Finally, ANN, which performs well in both regression and classification, was used to develop a model. For each algorithm, candidate models were compared in accuracy through grid searches over hyper-parameters or hidden layers to build and test an optimal model. The optimal models were then compared in accuracy, learning time, and prediction time to choose one applicable to a mobile app. Figure 13 shows a confusion matrix that explains the prediction accuracy indicator, which is given by Equation (1).

Linear Regression Model
Based on the assumption that surrounding environmental elements affect Acer mono sap exudation, the present study selected linear regression as an Acer mono sap exudation prediction model to determine whether there were linear relations between the elements and exudation. OLS was applied to the linear regression model to identify and select significant parameters (input variables). In addition, scikit-learn was used to build a multiple linear regression model [26–35].

OLS
OLS (ordinary least squares) is the most basic deterministic linear regression method; it obtains the weight vector that minimizes the residual sum of squares by matrix differentiation [31–35]. Applying this regression analysis to the preprocessed data makes it possible to check the coefficient and significance probability of each variable. OLS was applied to the preprocessed data to examine whether each independent variable had a significant effect on the dependent variable (significance probability) and thereby to select the input variables for the multiple linear regression model. Table 3 shows the OLS outcomes for all the data after preprocessing. The significance probability was 0.05 or lower for every variable, which means that all seven variables were statistically significant; all were thus chosen as input variables for the multiple linear regression model.
Figure 14 shows the correlations between each parameter and the amount of sap. The correlations were negative for every parameter except daily temperature range (daily_temp). In OLS, however, the volume of exudation was positively related to average temperature (avg_temp), which indicates that the independent variables influenced one another and that multiple linear regression, rather than simple linear regression, would be valid for an Acer mono sap exudation prediction model. Equation (2) is the multiple linear regression model reflecting the OLS outcomes.
Figure 15 shows the analysis results based on Pearson's correlation coefficients between the environmental elements and Acer mono sap yield. The yield had clear correlations with avg_temp, low_temp, daily_temp, and low_humi. All the elements other than daily_temp had negative correlations with it. The analysis results indicate that Acer mono sap yield is greater with lower average and low temperatures, a larger daily temperature range, and lower humidity.
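As a rough illustration of the OLS step, the sketch below solves the normal equations for a small multiple linear regression in plain Python. It is not the study's code: the function name `fit_ols`, the two-variable toy data, and the coefficients 20, -1, and 2 are hypothetical, and the significance probabilities reported in Table 3 are not computed here.

```python
def fit_ols(rows, targets):
    """Solve the normal equations (X^T X) w = X^T y; w[0] is the intercept."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 for the intercept
    k = len(design[0])
    # Build the augmented matrix [X^T X | X^T y].
    aug = [[sum(r[i] * r[j] for r in design) for j in range(k)]
           + [sum(r[i] * t for r, t in zip(design, targets))]
           for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

# Hypothetical toy data generated from y = 20 - 1.0 * x1 + 2.0 * x2.
rows = [(0.0, 1.0), (1.0, 3.0), (2.0, 2.0), (3.0, 5.0), (4.0, 4.0)]
targets = [20.0 - 1.0 * x1 + 2.0 * x2 for x1, x2 in rows]
weights = fit_ols(rows, targets)  # [intercept, weight of x1, weight of x2]
```

Because the toy targets contain no noise, the recovered weights match the generating coefficients up to floating-point error; with real measurements the weights would only minimize the residual sum of squares.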

Result of Linear Regression Model
The accuracy of exudation prediction by the multiple linear regression model was expressed in MAE (mean absolute error), RMSE (root mean squared error), and R². Equation (3) presents these measures of prediction accuracy. MAE is the mean of the absolute differences between actual and predicted values. RMSE is the square root of the mean of the squared differences between actual and predicted values. R² indicates the proportion of the variance in the actual values that is explained by the predicted values; the closer it is to 1, the higher the prediction accuracy. Table 4 shows the exudation prediction results of the linear regression model. The mean absolute error between actual and predicted values was approximately 5.5. The difference from the observations in an actual environment (RMSE) was 7.12, with a prediction accuracy (R²) of 0.649.
Figure 16 shows the predictions of the linear regression model. The x axis represents the actual amount of sap, while the y axis represents the predicted amount. The outcomes are plotted as dots, and the red line is the linear function y = x, on which predicted values match the correct ones. The dots form elongated bars along the y axis and, unlike on the x axis, are not spaced at regular intervals, which shows that the predictions were output as real numbers rather than integers. Another peculiarity is that many predictions are negative. Across the entire data, a broad band of predicted values also lies below the correct ones along the red line. The cause can be found in Equation (2), in which most variables' weights were negative; as a result, predicted values were generally lower than the correct ones. The negative predictions seem to have occurred when Acer mono sap was frozen owing to a low average temperature (x1) and when there was no exudation owing to the absence of osmotic action at a low daily temperature range (x2).
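The three accuracy measures described above can be sketched as follows, assuming the standard definitions of MAE, RMSE, and R² (which agree with the verbal description in the text). The function names and the four sample values are hypothetical and are not taken from Table 4.

```python
import math

def mae(actual, predicted):
    """Mean of the absolute differences between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Square root of the mean squared difference."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

def r2(actual, predicted):
    """1 minus the ratio of residual to total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical sap volumes in liters; note the negative prediction.
actual = [0.0, 5.0, 12.0, 20.0]
predicted = [-1.0, 6.0, 10.0, 22.0]

print(mae(actual, predicted), rmse(actual, predicted), r2(actual, predicted))
```

RMSE is never smaller than MAE because squaring weights large errors more heavily, which is consistent with the reported 7.12 versus approximately 5.5.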

Support Vector Machine Model
The present study chose SVM (support vector machine), which is highly efficient in high dimensions, as a regression analysis model to be compared with the linear regression model. SVM was implemented with scikit-learn. Since Acer mono sap exudation was predicted in a high-dimensional space of seven parameters, the RBF kernel, which remains efficient in high dimensions, was used to search for an optimal model [36–46].

Optimization of SVM Model
In SVM with the RBF kernel, a model is optimized by regulating gamma, which sets the curvature of the decision boundary through the distance over which a single data sample exerts influence, and C (cost), which sets how many outliers are tolerated. Table 5 shows accuracy according to the gamma and C values. When C was 0.01 or lower, too many outliers were allowed, which resulted in underfitting. When gamma was lower than 0.001, the overall influence of each data sample grew so large that underfitting occurred and accurate predictions became impossible. When gamma grew to 1 or higher, each sample had a smaller influence, which resulted in overfitting in which only the learning data were classified optimally.
Overall, high C values led to high accuracy because outliers were identified. As the influence of data samples shrank, however, overfitting occurred: many outliers were identified even within the small influence of the samples, so that only the learning data were fit, which led to lower accuracy.
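The effect of gamma on a sample's reach can be illustrated with the RBF kernel itself, K(x, z) = exp(-gamma * ||x - z||^2). The sketch below is only an illustration: the two feature vectors are hypothetical, and the gamma values are chosen to span the range discussed above rather than to reproduce Table 5.

```python
import math

def rbf_kernel(x, z, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

# Two hypothetical, already-scaled feature vectors (squared distance = 25).
x = [0.0, 0.0]
z = [3.0, 4.0]

for gamma in (0.001, 0.1, 1.0):
    # A similarity near 1 means the sample still influences this distant point.
    print(gamma, rbf_kernel(x, z, gamma))
```

With a small gamma the similarity stays close to 1 even for distant points (broad influence, smoother boundary); with gamma at 1 it is practically zero, so each sample affects only its immediate neighborhood, which is the overfitting regime described in the text.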

SVM Optimal Model
The optimal SVM model had a gamma of 0.001 and a C of 100. For accurate testing, its exudation prediction accuracy was expressed in precision, recall, and accuracy. In these measures, TP represents true positives, FP false positives, FN false negatives, and TN true negatives. Precision is the percentage of predicted values that are correct. Recall is the percentage of actual values that are predicted correctly. Accuracy is the overall share of correctly predicted data. Table 6 and Figure 17 show the prediction accuracy of the optimal SVM model. When the volume of exudation was 0 L, recall was 0.998; its error rate was 0.002, with 101 errors in which an actual value of 0 L was predicted as another value. Precision was 0.988; its error rate was 0.012, with 520 errors in which an actual value other than 0 L was predicted as 0 L. These precision errors were numerous relative to the support of the other exudation volumes, which carries the risk of overfitting toward 0 L. In the section of 1 L~9 L, where there was a relatively small amount of learning data, both precision and recall averaged about 0.8. In the section of 10 L~30 L, where there was a lot of data, precision and recall were close to 0.9. As the volume of data decreased in the following sections, precision and recall dropped. Accuracy thus grew with the amount of data.
Figure 18 plots the exudation predictions. The outcomes spread wider from the red line where there were more data, which suggests that predicted values have a wider range of errors as the amount of data grows. This issue appears as wide-ranging errors in the predictions for 0 L~38 L. Where the amount of data is small, on the other hand, the outcomes lie closer to the red line: even when they are not correct, the predicted values are similar to the actual ones.
The outcomes of the optimal SVM model indicate that the learning data had broad influence owing to the low gamma value and that outliers were identified within that influence owing to the high C value. This process yields high prediction accuracy, but a large volume of data means broader influence, which leads to errors over a wider range, including partial outliers that are unrelated to the sample.
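A minimal sketch of how precision, recall, and accuracy can be computed per exudation volume is given below. It assumes the standard definitions, Precision = TP / (TP + FP) and Recall = TP / (TP + FN), applied to each liter value in turn (one class against the rest); that per-class treatment, the function name, and the sample labels are assumptions for illustration only.

```python
from collections import Counter

def per_class_metrics(actual, predicted):
    """Return ({label: (precision, recall)}, overall accuracy)."""
    pairs = Counter(zip(actual, predicted))  # (actual, predicted) -> count
    labels = sorted(set(actual) | set(predicted))
    metrics = {}
    for label in labels:
        tp = pairs[(label, label)]
        # FP: predicted as this label although the actual value differs.
        fp = sum(n for (a, p), n in pairs.items() if p == label and a != label)
        # FN: actual value is this label but another value was predicted.
        fn = sum(n for (a, p), n in pairs.items() if a == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        metrics[label] = (precision, recall)
    accuracy = sum(n for (a, p), n in pairs.items() if a == p) / len(actual)
    return metrics, accuracy

# Hypothetical exudation classes in whole liters.
actual = [0, 0, 0, 0, 1, 1, 2, 2]
predicted = [0, 0, 0, 1, 1, 0, 2, 2]
metrics, accuracy = per_class_metrics(actual, predicted)
```

In this reading, a recall error at 0 L is an actual 0 L predicted as something else, and a precision error at 0 L is a non-zero volume predicted as 0 L, which matches the two error counts reported for the optimal SVM model.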

Artificial Neural Network Model
ANN (artificial neural network) was chosen as a prediction model because it can build an approximation function from the learning data and thus a suitable Acer mono sap exudation prediction model. In the present study, ANN was implemented with TensorFlow. In the model, errors were reduced with a cross-entropy loss function. ReLU (rectified linear unit) was used as the activation function, and the learning rate was set at 0.001 in the learning process to predict the range of Acer mono sap exudation [47–56].
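To make the roles of ReLU and cross-entropy concrete, the sketch below runs one forward pass through a tiny network in plain Python. It is not the study's TensorFlow model: the layer sizes, weights, and input are hypothetical, the softmax output layer is an assumption, and no training step (and hence no learning rate) is shown.

```python
import math

def relu(values):
    """Rectified linear unit: negative activations become 0."""
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    """One fully connected layer; weights[j] holds the input weights of node j."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_class):
    """Loss is small when the true class receives a high probability."""
    return -math.log(probs[true_class])

# Hypothetical toy network: 2 inputs -> 3 hidden nodes (ReLU) -> 2 classes.
hidden_w = [[0.5, -0.2], [-0.3, 0.8], [0.1, 0.1]]
hidden_b = [0.0, 0.1, -0.5]
out_w = [[1.0, -1.0, 0.5], [-1.0, 1.0, 0.5]]
out_b = [0.0, 0.0]

x = [1.0, 2.0]
hidden = relu(dense(x, hidden_w, hidden_b))
probs = softmax(dense(hidden, out_w, out_b))
loss = cross_entropy(probs, true_class=1)
```

Training would adjust the weights and biases to lower this loss over all learning data; the third hidden node here is switched off by ReLU because its pre-activation is negative.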

Optimization of ANN Model
For model optimization, the middle (hidden) layers were configured as multi-layer and deep neural networks with a different number of nodes for each layer, as shown in Table 7 and Figure 19. Models A and B are multi-layer neural networks with a single middle layer that differ in its number of nodes. Models C and D are deep neural networks with five middle layers that differ in their numbers of nodes. These ANN models were examined for accuracy according to the learning frequency. Table 8 shows prediction accuracy at learning frequencies of 1000, 10,000, and 100,000. The models recorded higher accuracy with more learning, but overfitting set in faster as the amount of learning and the complexity of the models increased. Models C and D, in particular, recorded the highest accuracy at the learning frequency of 100,000, but they were unstable, as their accuracy dropped sharply three times in the re-testing process.

Table 7. Configuration of ANN model.
Middle layer   A    B    C    D
1              5    20   5    20
2              -    -    6    26
3              -    -    7    28
4              -    -    6    18
5              -    -    5    15

ANN Optimal Model
In ANN, the optimal model was Model B with a learning frequency of 100,000. Model D had higher accuracy than Model B, but it was unstable, as its accuracy went down to 0.44 in the testing process. Being relatively more stable, Model B was chosen as the optimal model. Table 9 shows the prediction accuracy of the optimal ANN model, which recorded high precision and recall values of 1.0 for 0 L of exudation. The values in Table 9 are rounded at the fourth decimal place; for 0 L, recall and precision actually had 15 and two errors, respectively. The overall accuracy was very high at 0.9 or higher. Accuracy was also relatively high even in the section of 47 L~59 L, where the amount of data was small. Recall was in the range of 0.6~0.8 for some volumes of exudation. This phenomenon was estimated to derive from incorrect predictions of approximate (adjacent) values, since the following volume of exudation had low precision and high recall.
Figure 20 shows the exudation predictions. At 0 L, there were 15 recall errors: nine were predicted as the approximate value of 1 L, followed by two as 7 L and one each as 17 L, 18 L, 30 L, and 31 L. There were not many errors other than approximate values, but they varied widely in size. Overall, the predictions were close to the red line, which means that actual and predicted values were similar overall. Some data, however, contained big errors far (±7) from the red line, with a total of 29 big errors.
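The distinction drawn above between approximate-value errors (off by 1 L) and big errors (±7) can be tallied with a few lines of Python. This is only a sketch: the function name and the sample values are hypothetical, and the threshold of 7 L or more for a big error is one reading of the "±7" in the text.

```python
from collections import Counter

def error_size_counts(actual, predicted):
    """Count wrong predictions by absolute difference in liters."""
    return Counter(abs(a - p) for a, p in zip(actual, predicted) if a != p)

# Hypothetical actual and predicted exudation volumes in whole liters.
actual = [0, 0, 0, 5, 12, 20, 31]
predicted = [0, 1, 7, 5, 13, 12, 31]

counts = error_size_counts(actual, predicted)
approximate = counts[1]                                  # off by exactly 1 L
big = sum(n for diff, n in counts.items() if diff >= 7)  # off by 7 L or more
```

Grouping errors by size in this way separates near misses, which matter little to a farm manager, from the rare large misses that move a prediction far from the red line.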

Random Forest Model
Random forest is an ensemble technique and was selected as an Acer mono sap prediction model in the present study because it can offer more reliable predictions than a single optimal model. In the present study, random forest was implemented with scikit-learn [23–25,57–61].

Optimization of Random Forest Model
To find an optimal random forest model, the present study regulated the number of models, i.e., trees (n_estimators), and the maximum number of independent variables (max_features). The other hyper-parameters were kept at their defaults during the comparison of models. Table 10 shows the accuracy of the random forest models by hyper-parameter. Accuracy rose with the number of independent variables and was highest at a maximum of five; at six, however, accuracy dropped slightly. As the number of models increased, overall accuracy increased slightly as well.

Random Forest Optimal Model
Table 11 and Figure 21 show the prediction accuracy of the optimal model. When the volume of exudation was 0 L, precision rounded to 1.0 with six errors, and recall was 0.997 with 122 errors. The large number of recall errors at 0 L affected the overall precision of the data, but accuracy was 0.9 or higher for most volumes of exudation, with stable prediction results. The more data there were, the higher the accuracy became. Overall accuracy was low in the section of 50 L~59 L, where there were fewer than 100 pieces of learning data per liter.
Figure 22 shows the exudation predictions of random forest. At 0 L, there were six precision errors, which involved the approximate value of 1 L. There were 122 recall errors, which fell to a total of 107 after the ones at the approximate value of 1 L were removed. These errors ranged from a minimum of 9 L to a maximum of 35 L: 36 errors for 9 L~19 L, 55 for 20 L~29 L, and 16 for 30 L~35 L. Apart from 0 L and approximate values, there were a total of 55 errors with a difference of ±2 L or more. Errors with a difference of 2 L were the most common at 37, followed by five with a difference of 3 L, six of 4 L, one of 5 L, two of 6 L, one of 7 L, and three of 8 L.
There was a big difference between actual and predicted values in about ten cases other than 0 L. In the section of 50 L~59 L, which had a small amount of data, most predicted values were within the approximate range (±1) of the correct value, contrary to the concern about low prediction accuracy. Only six data points out of a total support of 115 had a difference of ±2.
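The two ingredients of the ensemble, bootstrapped training sets and a majority vote over the trees, can be sketched in plain Python as follows. The tree training itself is omitted; the helper names, the row indices, and the five tree outputs are hypothetical, and scikit-learn's own implementation may combine trees differently (for example by averaging class probabilities).

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement, as one tree's training set."""
    return [rng.choice(rows) for _ in rows]

def majority_vote(tree_predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]

rng = random.Random(0)
training_rows = list(range(10))  # hypothetical row indices
sample = bootstrap_sample(training_rows, rng)

# Hypothetical outputs of five trees for one observation, in whole liters.
votes = [12, 12, 13, 12, 11]
final_prediction = majority_vote(votes)
```

Because every tree sees a different resampled data set, individual trees err in different ways, and the vote tends to cancel those errors; this is also why prediction takes longer than evaluating a single model.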

Comparative Evaluation of Optimal Prediction Models
The optimal models of the linear regression, SVM, ANN, and random forest algorithms were compared in learning time, prediction time, and accuracy to select one applicable to a mobile app. Learning time is the time required for a model to learn the data. It was chosen as an evaluation criterion for the expandability of a model, that is, for additional learning with data collected from Acer mono sap collection devices. Prediction time is the time required for a model to predict Acer mono sap exudation. It was added as a criterion to account for the time until outcomes appear when users check the exudation prediction on a mobile app. Accuracy is the degree of match between the exudation predicted by a model and the actual exudation. It was added as a criterion to reflect how accurate the outcomes delivered to users are when they check predicted exudation on a mobile app. Table 12 shows the learning time, prediction time, and accuracy of the optimal models.
The linear regression algorithm recorded the shortest learning time for a model to learn 306,648 pieces of data. It calculates the weights of the parameters from the learning data to form a linear equation and thus creates a learning model within a short time. The SVM algorithm recorded a relatively longer learning time, as it creates a decision boundary by selecting support vectors after mapping the data into a feature space. The ANN algorithm recorded a long learning time because more learning meant higher accuracy for a model. Even though a GPU was used to shorten it, ANN still recorded the longest learning time. The random forest algorithm took a short time because it does the relatively simple work of creating random models with bootstrapping and providing outcomes with an ensemble technique.
The linear regression and ANN algorithms recorded the shortest prediction time for the 102,216 pieces of test data. These two algorithms apply the weights obtained in learning and run a calculation to provide outcomes, thus recording a short prediction time of less than a second. The SVM algorithm, on the other hand, recorded the longest prediction time, as it had to map the test data into the same feature space as in learning and separate them with the decision boundary. The random forest algorithm recorded a relatively longer prediction time, as its final outcomes are based on a majority vote over the results of its models (trees) according to the ensemble technique.
The random forest algorithm recorded the highest prediction accuracy. In the sap exudation predictions of each algorithm (Figures 16, 18, 20 and 22), random forest showed the most stable form, lying closest to the red line. The linear regression algorithm recorded a very low prediction accuracy of 0.404, even after its prediction outcomes were adjusted for comparison with the other algorithms by converting real numbers into integers and removing negative predictions with the minimum value limited to 0. Both the SVM and ANN algorithms recorded high accuracy, but their exudation predictions were spread relatively wide along the red line compared with the random forest algorithm, thus leaving room for improvement.
In summary, the linear regression algorithm recorded short learning and prediction times, but its accuracy was very low, which made it unfit for a mobile app to predict sap exudation. The SVM algorithm recorded high accuracy, but its learning and prediction were slow. It took a long prediction time because of the mapping into a feature space even when it used a small amount of test data. The SVM model was thus not fit for a mobile app. The ANN algorithm was slow in learning, but this can be resolved with an improved GPU. With its short prediction time and high accuracy, it seems to be a fit model for predicting sap exudation on a mobile app. The random forest algorithm recorded the highest accuracy and the most stable prediction of the models. Its learning time was also short compared with the other models except for linear regression. Its prediction was slower than that of the other models, but it recorded as short a prediction time as the other models for a data amount of approximately 10,000. The speed issue can be resolved with an improved CPU clock. The random forest model was therefore the fittest of the models for predicting sap exudation on a mobile app.
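The adjustment applied to the linear regression outputs before comparison (conversion to integers and a minimum of 0) can be sketched as below. The exact rounding rule used in the study is not stated, so Python's built-in `round` is an assumption here, as are the function names and the sample values.

```python
def to_liter_class(prediction):
    """Round a real-valued prediction to whole liters and clip negatives to 0."""
    return max(0, round(prediction))

def exact_match_accuracy(actual, predicted):
    """Share of predictions that equal the actual value exactly."""
    hits = sum(1 for a, p in zip(actual, predicted) if a == p)
    return hits / len(actual)

# Hypothetical raw regression outputs and actual volumes in liters.
raw_predictions = [-3.2, 0.4, 7.6, 12.49, 30.2]
actual = [0, 0, 8, 13, 30]

classes = [to_liter_class(p) for p in raw_predictions]
accuracy = exact_match_accuracy(actual, classes)
```

Exact-match accuracy is a much stricter measure than R², since a prediction that is off by a fraction of a liter before rounding can still count as wrong, which helps explain why the adjusted linear regression accuracy remains low.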

Conclusions
The present study built an Acer mono sap collection device and developed a mobile app with which farm managers can check predicted Acer mono sap exudation in real time, based on the analysis of data collected from the device, including exudation and environmental factors such as outdoor temperature, humidity, conductivity, and wind direction and velocity.
Based on the assumption that Acer mono sap exudation depends on the environment of Acer mono trees, the study designed prediction models for Acer mono sap exudation with the linear regression, SVM, ANN, and random forest algorithms. All the algorithms except linear regression recorded high prediction accuracy, which supports the assumption that Acer mono sap exudation is determined by the surrounding environment. The models were also compared in learning time, prediction time, and accuracy, and the random forest model was chosen as the one applicable to a mobile app.
A follow-up study will examine the correlations between Acer mono sap exudation and environmental information more clearly and design a new algorithm by gathering more data based on the findings of the present study and resolving the data imbalance issue. If the data imbalance issue cannot be resolved owing to climate characteristics, a new approach will be proposed that combines ANN and random forest in an ensemble technique, so that the two algorithms compensate for each other's disadvantages: the overfitting of ANN and the number of errors at 0 L of random forest.