# Predicting High or Low Transfer Efficiency of Photovoltaic Systems Using a Novel Hybrid Methodology Combining Rough Set Theory, Data Envelopment Analysis and Genetic Programming


## Abstract


## 1. Introduction

## 2. An Overview of PV Systems

**Figure 1.** A diagram of a PV system [21].

#### 2.1. PV System

#### 2.2. Factors Influencing PV Systems

#### 2.3. Evaluating PV System Transfer Efficiency

## 3. Using DEA to Determine Efficiencies

Let $\mathrm{DMU}_{o}$ ($o=1,2,\ldots,n$) be the DMU whose relative efficiency is to be maximized. The DEA model is formulated as the following LP:
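For reference, the ratio form of the CCR model of Charnes, Cooper and Rhodes [28] can be written as follows. The notation is assumed for this sketch and may differ from the paper's own. Here $x_{ij}$ ($i=1,\ldots,m$) and $y_{rj}$ ($r=1,\ldots,s$) are the inputs and outputs of DMU $j$, $v_{i}$ and $u_{r}$ are the corresponding weights, and $\varepsilon$ is a small positive constant.

$$
\begin{aligned}
\max\quad & h_{o}=\frac{\sum_{r=1}^{s}u_{r}y_{ro}}{\sum_{i=1}^{m}v_{i}x_{io}}\\
\text{s.t.}\quad & \frac{\sum_{r=1}^{s}u_{r}y_{rj}}{\sum_{i=1}^{m}v_{i}x_{ij}}\le 1,\quad j=1,2,\ldots,n,\\
& u_{r}\ge\varepsilon,\quad v_{i}\ge\varepsilon,\quad r=1,\ldots,s,\ i=1,\ldots,m.
\end{aligned}
$$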

Here $h_{o}$ is the relative efficiency score of $\mathrm{DMU}_{o}$. Its maximum value cannot exceed 1, because the constraints cap the efficiency ratio of every DMU, including $\mathrm{DMU}_{o}$ itself, at 1. If $h_{o}=1$, $\mathrm{DMU}_{o}$ lies on the constant returns to scale (CRS) frontier [30]. Two CCR models are used in practice: the input-oriented model minimizes inputs, and the output-oriented model maximizes outputs. To obtain maximum energy efficiency, this work uses the output-oriented CCR model to find the optimal value of the objective function, $h_{o}$.
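The general CCR model needs an LP solver such as LINGO. In the special case of one input and one output, however, the CRS efficiency of a DMU reduces to its output/input ratio divided by the largest ratio in the sample. This makes the bound $h_{o}\le 1$ and the meaning of the frontier easy to see.

The Python sketch below covers only that special case and uses hypothetical toy data. It is not a substitute for the nine-input model solved in this work. It also shows that, under CRS, the output-oriented expansion factor is the reciprocal of the efficiency score.

```python
def ccr_ratio_efficiency(inputs, outputs):
    """CRS (CCR) efficiency for the single-input, single-output special case.

    Each DMU's output/input ratio is divided by the best ratio in the sample,
    so DMUs on the CRS frontier score exactly 1 and no score exceeds 1.
    """
    if not inputs or len(inputs) != len(outputs):
        raise ValueError("need one (input, output) pair per DMU")
    if any(x <= 0 for x in inputs) or any(y < 0 for y in outputs):
        raise ValueError("inputs must be positive and outputs non-negative")
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    if best == 0:
        raise ValueError("at least one DMU must have a positive output")
    return [r / best for r in ratios]


def output_oriented_score(h):
    """Under CRS, the output-oriented expansion factor is 1 / h."""
    return 1.0 / h


if __name__ == "__main__":
    # Hypothetical toy data: three DMUs, one input and one output each.
    x = [2.0, 4.0, 5.0]
    y = [4.0, 6.0, 10.0]
    print(ccr_ratio_efficiency(x, y))  # [1.0, 0.75, 1.0]
```

The second DMU would need to raise its output by a factor of 1/0.75 ≈ 1.33 to reach the frontier formed by the first and third.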

## 4. Rough Set Theory and Genetic Programming

#### 4.1. Basic Concepts of Rough Set Theory

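Let $S=(U,R,V,f)$ be an information system, where:

- $U$ is a nonempty finite set of objects ($U=\{x_{1},x_{2},\ldots,x_{n}\}$);
- $R$ is a finite set of attributes (features or variables);
- $V=\bigcup_{r\in R}V_{r}$, where $V_{r}$ is the domain of attribute $r$;
- $f:U\times R\to V$ is an information function such that $f(x,r)\in V_{r}$ for all $x\in U$ and $r\in R$.

In RST, accurate approximations are essential for extracting reliable decision rules. Let $P\subseteq R$ and $X\subseteq U$. The lower approximation of $X$ in $S$ by $P$, denoted $\underline{P}X$, and the upper approximation of $X$ in $S$ by $P$, denoted $\overline{P}X$, are defined as follows:

$$\underline{P}X=\{x\in U:[x]_{P}\subseteq X\},\qquad \overline{P}X=\{x\in U:[x]_{P}\cap X\neq\emptyset\},$$

where $[x]_{P}$ is the equivalence class of $x$ under the indiscernibility relation induced by $P$. This class is the set of objects whose values agree with those of $x$ on every attribute in $P$.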
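Both approximations can be computed directly by grouping objects into indiscernibility classes. An object enters the lower approximation when its whole class lies inside $X$, and the upper approximation when its class merely overlaps $X$.

The following is a minimal Python sketch, assuming a decision table stored as a dict that maps object ids to attribute-value dicts. The table, attribute names and object ids are hypothetical.

```python
from collections import defaultdict


def indiscernibility_classes(table, attrs):
    """Group object ids whose values agree on every attribute in attrs."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attrs)].add(obj)
    return list(classes.values())


def approximations(table, attrs, target):
    """Return (lower, upper) approximations of the object set target by attrs."""
    lower, upper = set(), set()
    for cls in indiscernibility_classes(table, attrs):
        if cls <= target:  # class lies entirely inside the target set
            lower |= cls
        if cls & target:   # class overlaps the target set
            upper |= cls
    return lower, upper


# Hypothetical decision table: two condition attributes per PV system.
EXAMPLE_TABLE = {
    "x1": {"module": "A", "inverter": "S"},
    "x2": {"module": "A", "inverter": "S"},
    "x3": {"module": "B", "inverter": "S"},
    "x4": {"module": "B", "inverter": "T"},
}

if __name__ == "__main__":
    high = {"x1", "x3", "x4"}  # hypothetical systems with high transfer efficiency
    lower, upper = approximations(EXAMPLE_TABLE, ["module", "inverter"], high)
    print(sorted(lower), sorted(upper))  # ['x3', 'x4'] ['x1', 'x2', 'x3', 'x4']
```

Objects x1 and x2 cannot be told apart by the two attributes, yet only x1 is in the target set. Their class therefore falls in the boundary region: it is in the upper approximation but not the lower.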

#### 4.2. Genetic Programming

## 5. The Proposed Hybrid Prediction Model

Parameter | Value |
---|---|
Population size | 400 |
Maximum number of generations | 1000 |
Function set | +, −, ×, ÷, sin, cos, exp, log, constant |
Crossover rate | 0.8 |
Mutation rate | 0.02 |
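The function set above includes ÷, log and exp, which are undefined or overflow for some arguments. GP implementations commonly replace them with protected versions so that every evolved tree returns a finite value.

The Python sketch below evaluates GP program trees under that assumption. The following choices are all hypothetical, not details reported for this model:

- division returns 1 for a near-zero divisor;
- log takes the absolute value of its argument;
- exp clips its argument;
- trees are encoded as nested tuples;
- a sign-based rule maps program output to class 1 (high) or 2 (low).

The configuration values mirror the table.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GPConfig:
    """Run parameters from the table above."""
    population_size: int = 400
    max_generations: int = 1000
    crossover_rate: float = 0.8
    mutation_rate: float = 0.02


def _pdiv(a, b):
    # Protected division (assumed convention): 1.0 when the divisor is ~0.
    return a / b if abs(b) > 1e-9 else 1.0


def _plog(a):
    # Protected log (assumed convention): log|a|, and 0.0 at zero.
    return math.log(abs(a)) if abs(a) > 1e-9 else 0.0


def _pexp(a):
    # Protected exp (assumed convention): clip the argument to avoid overflow.
    return math.exp(min(a, 50.0))


FUNCTIONS = {
    "+": (2, lambda a, b: a + b),
    "-": (2, lambda a, b: a - b),
    "*": (2, lambda a, b: a * b),
    "/": (2, _pdiv),
    "sin": (1, math.sin),
    "cos": (1, math.cos),
    "exp": (1, _pexp),
    "log": (1, _plog),
}


def evaluate(tree, env):
    """Evaluate a tree: (op, *children), a variable name, or a numeric constant."""
    if isinstance(tree, tuple):
        op, *children = tree
        arity, fn = FUNCTIONS[op]
        if len(children) != arity:
            raise ValueError(f"{op} expects {arity} argument(s)")
        return fn(*(evaluate(c, env) for c in children))
    if isinstance(tree, str):
        return env[tree]
    return float(tree)


def classify(tree, env):
    """Hypothetical class rule: 1 (high) if the output is >= 0, else 2 (low)."""
    return 1 if evaluate(tree, env) >= 0 else 2


if __name__ == "__main__":
    # Hypothetical evolved program: 2 * X10 - X1 / 0 (the division is protected).
    program = ("-", ("*", "X10", 2.0), ("/", "X1", 0.0))
    print(classify(program, {"X10": 0.9, "X1": 3.0}))  # 1
```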

## 6. Empirical Analysis

Variables | Description | Importance (obtained from RST) |
---|---|---|
$X_{1}$ | Texture type | 0.6424 |
$X_{2}$ | Output power of inverter | 0.5715 |
$X_{3}$ | Selection of PV module | 0.4817 |
$X_{4}$ | Number of inverters | 0.3914 |
$X_{5}$ | Weights of PV module | 0.3367 |
$X_{6}$ | Selection of inverter | 0.2893 |
$X_{7}$ | PV module capacity | 0.2567 |
$X_{8}$ | Selection of DC voltage | 0.2638 |
$X_{9}$ | Location of PV setting | 0.2476 |
$X_{10}$ | DMU (obtained from DEA) | – |

The nine independent variables ($X_{1}$–$X_{9}$) are significant (Table 2) because each of their importance values exceeds 0.2. Since no established criterion exists for choosing this threshold, the selection is supported in two further ways:

- The nine independent variables ($X_{1}$–$X_{9}$) are highly correlated with the output variable (the high or low transfer efficiency of PV systems), with correlation coefficients greater than 0.6.
- Experts in PV energy in Taiwan consider these nine variables to be important influences on the transfer efficiency of PV systems.
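The paper does not spell out the formula behind its importance values. One common RST measure is the significance of an attribute $a$. It is defined through the degree of dependency $\gamma_{C}(D)=|POS_{C}(D)|/|U|$ as the drop $\gamma_{C}(D)-\gamma_{C-\{a\}}(D)$ when $a$ is removed from the condition attributes $C$.

The Python sketch below works under that assumption. The toy decision table is hypothetical. The threshold filter uses the Table 2 values and the 0.2 cut-off given in the text.

```python
from collections import defaultdict


def dependency(rows, conds, decision):
    """Degree of dependency gamma_C(D) = |POS_C(D)| / |U|.

    An object is in the positive region when every object sharing its
    values on the condition attributes also shares its decision value.
    """
    labels_by_key = defaultdict(set)
    for row in rows:
        labels_by_key[tuple(row[a] for a in conds)].add(row[decision])
    positive = sum(
        1 for row in rows
        if len(labels_by_key[tuple(row[a] for a in conds)]) == 1
    )
    return positive / len(rows)


def significance(rows, conds, decision):
    """Drop in dependency when each attribute is removed: gamma_C - gamma_(C - {a})."""
    full = dependency(rows, conds, decision)
    return {
        a: full - dependency(rows, [c for c in conds if c != a], decision)
        for a in conds
    }


def select_significant(importance, threshold=0.2):
    """Keep the variables whose importance exceeds the threshold."""
    return [name for name, value in importance.items() if value > threshold]


# Importance values as reported in Table 2.
TABLE2_IMPORTANCE = {
    "X1": 0.6424, "X2": 0.5715, "X3": 0.4817, "X4": 0.3914, "X5": 0.3367,
    "X6": 0.2893, "X7": 0.2567, "X8": 0.2638, "X9": 0.2476,
}

# Hypothetical toy decision table: "a" determines the class, "b" is redundant.
TOY_ROWS = [
    {"a": 1, "b": 0, "d": "high"},
    {"a": 1, "b": 1, "d": "high"},
    {"a": 0, "b": 0, "d": "low"},
    {"a": 0, "b": 1, "d": "low"},
]

if __name__ == "__main__":
    print(significance(TOY_ROWS, ["a", "b"], "d"))  # {'a': 1.0, 'b': 0.0}
    print(select_significant(TABLE2_IMPORTANCE))    # all nine variables pass 0.2
```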

In Step 3, DEA is used to compute the DMU efficiency score ($X_{10}$). The DEA inputs are the nine significant variables obtained in Step 2, and the DEA output is PV system transfer efficiency. The DEA model is solved with the LINGO software, and Table 3 lists the resulting DMU values of the PV systems.

In Step 4, the significant independent variables from Step 2 and the DMU scores from Step 3 serve as input variables for GP, which predicts whether PV system transfer efficiency is high or low. To demonstrate the effectiveness of the proposed hybrid model, several basic classification models are used as benchmarks: K Nearest Neighbor (KNN), Naive Bayes (NB), support vector machine (SVM), artificial neural network (ANN), and GP. These classifiers are data-mining techniques that have been reported to achieve better prediction performance than traditional linear statistical methods such as linear regression [8,10].

No | DMU | No | DMU |
---|---|---|---|
PV001 | 1.0000 | PV023 | 0.7735 |
PV002 | 0.9482 | PV024 | 0.8059 |
PV003 | 0.9879 | PV025 | 1.0000 |
PV004 | 0.8392 | PV026 | 1.0000 |
PV005 | 1.0000 | PV027 | 1.0000 |
PV006 | 1.0000 | PV028 | 1.0000 |
PV007 | 1.0000 | PV029 | 1.0000 |
PV008 | 1.0000 | PV030 | 0.6981 |
PV009 | 1.0000 | PV031 | 0.6417 |
PV010 | 0.6902 | PV032 | 0.6608 |
PV011 | 0.9215 | PV033 | 0.4919 |
PV012 | 0.5153 | PV034 | 1.0000 |
PV013 | 0.4955 | PV035 | 0.8274 |
PV014 | 0.9667 | PV036 | 0.4947 |
PV015 | 0.7484 | PV037 | 0.8405 |
PV016 | 1.0000 | PV038 | 0.9944 |
PV017 | 0.6144 | | |
PV018 | 0.8630 | | |
PV019 | 1.0000 | | |
PV020 | 0.8630 | | |
PV021 | 1.0000 | | |
PV022 | 0.8832 | | |

For comparison, Model II uses only the nine significant variables, $X_{1}$–$X_{9}$, as the input variables for GP. In both Models I and II, leave-one-out cross-validation is used to test the accuracy of the prediction model.
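Leave-one-out cross-validation trains on all but one PV system and then predicts the held-out system. Repeating this for every system means each of the 38 systems is predicted exactly once, and the predictions fill a confusion matrix like those below.

The Python sketch illustrates the loop with a 1-nearest-neighbour classifier standing in for the learner. The choice of k = 1, the Euclidean distance and the toy data are assumptions for illustration. They are not the settings used for the GP or KNN models in this work.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def nn_predict(train_X, train_y, x):
    """1-nearest-neighbour prediction; ties go to the earliest training row."""
    best = min(range(len(train_X)), key=lambda i: euclidean(train_X[i], x))
    return train_y[best]


def leave_one_out(X, y, predict=nn_predict):
    """Hold out each sample once, train on the rest, and tally (actual, predicted)."""
    labels = sorted(set(y))
    confusion = {(a, p): 0 for a in labels for p in labels}
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        confusion[(y[i], predict(train_X, train_y, X[i]))] += 1
    return confusion


if __name__ == "__main__":
    # Hypothetical one-feature data: class 1 = high, class 2 = low efficiency.
    X = [[0.0], [0.1], [1.0], [1.1]]
    y = [1, 1, 2, 2]
    print(leave_one_out(X, y))
```

Passing a different `predict` function swaps in another learner without changing the cross-validation loop.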

Actual class | Classified as 1 (High-level) | Classified as 2 (Low-level) |
---|---|---|
1 (High-level) | 22 (95.65%) | 1 (4.35%) |
2 (Low-level) | 2 (13.33%) | 13 (86.67%) |

Actual class | Classified as 1 (High-level) | Classified as 2 (Low-level) |
---|---|---|
1 (High-level) | 21 (91.30%) | 2 (8.70%) |
2 (Low-level) | 4 (26.67%) | 11 (73.33%) |

Actual class | Classified as 1 (High-level) | Classified as 2 (Low-level) |
---|---|---|
1 (High-level) | 20 (86.96%) | 3 (13.04%) |
2 (Low-level) | 4 (26.67%) | 11 (73.33%) |

Actual class | Classified as 1 (High-level) | Classified as 2 (Low-level) |
---|---|---|
1 (High-level) | 20 (86.96%) | 3 (13.04%) |
2 (Low-level) | 5 (33.33%) | 10 (66.67%) |

Actual class | Classified as 1 (High-level) | Classified as 2 (Low-level) |
---|---|---|
1 (High-level) | 21 (91.30%) | 2 (8.70%) |
2 (Low-level) | 5 (33.33%) | 10 (66.67%) |

Actual class | Classified as 1 (High-level) | Classified as 2 (Low-level) |
---|---|---|
1 (High-level) | 19 (82.61%) | 4 (17.39%) |
2 (Low-level) | 5 (33.33%) | 10 (66.67%) |

Actual class | Classified as 1 (High-level) | Classified as 2 (Low-level) |
---|---|---|
1 (High-level) | 19 (82.61%) | 4 (17.39%) |
2 (Low-level) | 5 (33.33%) | 10 (66.67%) |

Actual class | Classified as 1 (High-level) | Classified as 2 (Low-level) |
---|---|---|
1 (High-level) | 18 (78.26%) | 5 (21.74%) |
2 (Low-level) | 5 (33.33%) | 10 (66.67%) |

Actual class | Classified as 1 (High-level) | Classified as 2 (Low-level) |
---|---|---|
1 (High-level) | 19 (82.61%) | 4 (17.39%) |
2 (Low-level) | 5 (33.33%) | 10 (66.67%) |

Actual class | Classified as 1 (High-level) | Classified as 2 (Low-level) |
---|---|---|
1 (High-level) | 19 (82.61%) | 4 (17.39%) |
2 (Low-level) | 6 (40.00%) | 9 (60.00%) |
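Overall accuracy and per-class hit rates follow directly from the confusion matrices above. For the first matrix, (22 + 13)/38 ≈ 92.11% of the PV systems are classified correctly.

The Python sketch below recomputes these figures for all ten matrices in the order they appear. It uses only the counts reported in the tables and makes no assumption about which model each matrix belongs to.

```python
def summarize(matrix):
    """Overall accuracy and per-class recall of a square confusion matrix.

    Rows are actual classes and columns are predicted classes.
    """
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    recalls = [row[i] / sum(row) for i, row in enumerate(matrix)]
    return correct / total, recalls


# Counts from the confusion matrices above, in order (rows: actual High, Low).
MATRICES = [
    [[22, 1], [2, 13]],
    [[21, 2], [4, 11]],
    [[20, 3], [4, 11]],
    [[20, 3], [5, 10]],
    [[21, 2], [5, 10]],
    [[19, 4], [5, 10]],
    [[19, 4], [5, 10]],
    [[18, 5], [5, 10]],
    [[19, 4], [5, 10]],
    [[19, 4], [6, 9]],
]

if __name__ == "__main__":
    for k, m in enumerate(MATRICES, start=1):
        acc, rec = summarize(m)
        print(f"matrix {k}: accuracy {acc:.2%}, "
              f"high-level recall {rec[0]:.2%}, low-level recall {rec[1]:.2%}")
    # First line: matrix 1: accuracy 92.11%, high-level recall 95.65%, low-level recall 86.67%
```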

Model | Computational time |
---|---|
Model I (RST-DEA-GP) | 65.13 |
Model II (RST-GP) | 62.34 |
Model III (RST-DEA-SVM) | 60.17 |
Model IV (RST-SVM) | 56.49 |
Model V (RST-DEA-ANN) | 63.28 |
Model VI (RST-ANN) | 60.67 |
Model VII (RST-DEA-KNN) | 55.23 |
Model VIII (RST-KNN) | 53.28 |
Model IX (RST-DEA-NB) | 54.87 |
Model X (RST-NB) | 52.81 |
Model XI (GP) | 51.78 |
Model XII (SVM) | 50.46 |
Model XIII (ANN) | 51.39 |
Model XIV (KNN) | 48.23 |
Model XV (NB) | 46.26 |

## 7. Conclusions

## References

1. Bureau of Energy, Ministry of Economic. Available online: http://www.moeaboe.gov.tw (accessed on 13 February 2012).
2. Industrial Technology Research Institute. Available online: http://www.solar.org.tw/aboutus/sense/battery.asp (accessed on 13 February 2012).
3. Zhou, X.; Liu, K.Y.; Wong, S.T.C. Cancer classification and prediction using logistic regression with Bayesian gene selection. J. Biomed. Inf. **2004**, 37, 249–259.
4. Worth, A.P.; Cronin, M.T.D. The use of discriminant analysis, logistic regression and classification tree analysis in the development of classification models for human health effects. J. Mol. Struct. Theochem. **2003**, 622, 97–111.
5. Kurt, I.; Ture, M.; Kurum, A.T. Comparing performances of logistic regression, classification and regression tree, and neural networks for predicting coronary artery disease. Expert Syst. Appl. **2008**, 34, 366–374.
6. Huang, Z.; Chen, H.; Hsu, C.J.; Chen, W.H.; Wu, S. Credit rating analysis with support vector machines and neural networks: a market comparative study. Decis. Support Syst. **2004**, 37, 543–558.
7. Luts, J.; Ojeda, F.; Plas, R.V.D.; Moor, B.D.; Huffel, S.V.; Suykens, J.A.K. A tutorial on support vector machine-based methods for classification problems in chemometrics. Anal. Chim. Acta **2010**, 665, 129–145.
8. Ong, C.S.; Huang, J.J.; Tzeng, G.H. Building credit scoring models using genetic programming. Expert Syst. Appl. **2005**, 29, 41–47.
9. Nath, R.; Rajagopalan, B.; Ryker, R. Determining the saliency of input neural classifiers. Comput. Oper. Res. **1997**, 24, 767–773.
10. Lee, D.G.; Lee, B.W.; Chang, S.H. Genetic programming model for long-term forecasting of electric power demand. Electr. Power Syst. Res. **1997**, 40, 17–22.
11. Muttil, N.; Lee, J.H.W. Genetic programming for analysis and real-time prediction of coastal algal blooms. Ecol. Model. **2005**, 189, 363–376.
12. Liong, S.Y.; Gautam, T.R.; Khu, S.T.; Babovic, V.; Muttil, N. Genetic Programming: a new paradigm in rainfall-runoff modelling. J. Am. Water Res. Assoc. **2002**, 38, 557–584.
13. Zhang, Y.; Bhattacharyya, S. Genetic Programming in classifying large-scale data: an ensemble method. Inf. Sci. **2004**, 163, 85–101.
14. Ang, B.W. Monitoring changes in economy-wide energy efficiency: From energy-GDP ratio to composite efficiency index. Energy Policy **2006**, 34, 574–582.
15. Boyd, J.X.; Pang, T.G. Estimating the linkage between energy efficiency and productivity. Energy Policy **2000**, 28, 289–296.
16. Hu, J.L.; Kao, C.H. Efficiency energy-saving targets for APEC economies. Energy Policy **2007**, 35, 373–382.
17. Pawlak, Z. Rough sets and intelligent data analysis. Inf. Sci. **2002**, 147, 1–12.
18. Ahn, B.S.; Cho, S.S.; Kim, C.Y. The integrated methodology of rough set theory and artificial neural network for business failure prediction. Expert Syst. Appl. **2000**, 18, 65–74.
19. Leung, Y.; Fischer, M.M.; Wu, W.Z.; Mi, J.S. A rough set approach for the discovery of classification rules in interval-valued information systems. Int. J. Approx. Reason. **2008**, 47, 233–246.
20. Dembczynski, K.; Greco, S.; Slowinski, R. Rough set approach to multiple criteria classification with imprecise evaluations and assignments. Eur. J. Oper. Res. **2009**, 198, 626–636.
21. A Guide to Photovoltaic (PV) System Design and Installation. Available online: http://www.energy.ca.gov/reports/2001-09-04_500-01-020.PDF (accessed on 13 February 2012).
22. Gregg, A.; Parker, T.; Swenson, R. A “real world” examination of PV system design and performance. In Proceedings of the IEEE Photovoltaic Specialists Conference, Austin, TX, USA, June 2005; pp. 1587–1592.
23. Tsai, H.C.; Chen, C.M.; Tzeng, G.H. The comparative productivity efficiency for global telecoms. Int. J. Prod. Econ. **2006**, 103, 509–526.
24. Guo, P.; Tanaka, H. Fuzzy DEA: a perceptual evaluation method. Fuzzy Sets Syst. **2001**, 119, 149–160.
25. Wu, D.; Yang, Z.; Liang, L. Using DEA-neural network approach to evaluate branch efficiency of a large Canadian bank. Expert Syst. Appl. **2006**, 31, 108–115.
26. Banker, R.D.; Charnes, A.; Cooper, W.W. Some models for estimating technical and scale inefficiencies in data envelopment analysis. Manag. Sci. **1984**, 30, 1078–1092.
27. Data Envelopment Analysis: Theory, Methodology and Applications; Charnes, A.; Cooper, W.W.; Lewin, A.Y.; Seiford, L.M. (Eds.) Springer: Boston, MA, USA, 1995.
28. Charnes, A.; Cooper, W.W.; Rhodes, E. Measuring the efficiency of decision making units. Eur. J. Oper. Res. **1978**, 2, 429–444.
29. Farrell, M.J. The measurement of productive efficiency. J. R. Stat. Soc. Ser. A. Gen. **1957**, 120, 253–289.
30. Chen, Y.; Ali, A.I. Output-input ratio analysis and DEA frontier. Eur. J. Oper. Res. **2002**, 142, 476–479.
31. Pawlak, Z. Rough sets. Int. J. Comput. Inf. Sci. **1982**, 11, 341–356.
32. Shyng, J.Y.; Wang, F.K.; Tzeng, G.H.; Wu, K.S. Rough set theory in analyzing the attributes of combination values for the Insurance Market. Expert Syst. Appl. **2007**, 32, 56–64.
33. Swiniarski, R.W.; Skowron, A. Rough set methods in feature selection and recognition. Pattern Recogn. Lett. **2003**, 24, 833–849.
34. Zhai, L.Y.; Khoo, L.P.; Fok, S.C. Feature extraction using rough set theory and genetic algorithms an application for the simplification of product quality evaluation. Comput. Ind. Eng. **2002**, 43, 661–676.
35. Walczak, B.; Massart, D.L. Tutorial rough sets theory. Chemom. Intell. Lab. Syst. **1999**, 47, 1–16.
36. Yeh, C.C.; Chi, D.J.; Hsu, M.F. A hybrid approach of DEA, rough set and support vector machines for business failure prediction. Expert Syst. Appl. **2010**, 37, 1535–1541.
37. Wen, K.L.; Wang, C.W.; Yeh, C.K. Apply rough set and GM (h,N) model to analyze the influence factor in gas breakdown. In Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics Society, London, UK, April 2007; pp. 2771–2775.
38. Li, G.D.; Yamaguchi, D.; Nagai, M. A grey-based rough decision-making approach to supplier selection. Int. J. Adv. Manuf. Technol. **2008**, 1032–1040.
39. Thangavel, K.; Pethalakshmi, A. Dimensionality reduction based on rough set theory: A review. Appl. Soft Comput. **2009**, 9, 1–12.
40. Koza, J. Genetic Programming: On the Programming of Computers by Natural Selection; MIT Press: Cambridge, MA, USA, 1992.
41. Davidson, J.W.; Savic, D.A.; Walters, G.A. Symbolic and numerical regression: Experiments and applications. Inf. Sci. **2003**, 150, 95–117.
42. Lee, Y.S.; Tong, L.I. Forecasting energy consumption using a grey model improved by incorporating genetic programming. Energy Convers. Manag. **2011**, 52, 147–152.
43. Huang, J.J.; Tzeng, G.H.; Ong, C.S. Two-stage genetic programming (2SGP) for the credit scoring model. Appl. Math. Comput. **2006**, 174, 1039–1053.
44. Wen, K.L.; Nagai, M.; Chang, T.C.; Wen, H.C. An Introduction to Rough Set Theory and Application; Wu-Nan Book Co. Ltd.: Taipei, Taiwan, 2008.
45. Komorowski, K.; Ohrn, A.; Skowron, A. The ROSETTA rough set software system. In Handbook of Data Mining and Knowledge Discovery; Klosgen, W., Zytkow, J., Eds.; Oxford University Press: New York, NY, USA, 2002.
46. Pai, P.F.; Lin, C.S. A hybrid ARIMA and support vector machines model in stock price forecasting. Omega **2005**, 33, 497–505.
47. Chen, K.Y.; Wang, C.H. A hybrid SARIMA and support vector machines in forecasting the production values of the machinery industry in Taiwan. Expert Syst. Appl. **2007**, 32, 254–264.
48. Cybenko, G. Approximation by superpositions of a sigmoidal function. Math. Control Signal. Syst. **1989**, 2, 303–314.

© 2012 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Lee, Y.-S.; Tong, L.-I.
Predicting High or Low Transfer Efficiency of Photovoltaic Systems Using a Novel Hybrid Methodology Combining Rough Set Theory, Data Envelopment Analysis and Genetic Programming. *Energies* **2012**, *5*, 545-560.
https://doi.org/10.3390/en5030545

**AMA Style**

Lee Y-S, Tong L-I.
Predicting High or Low Transfer Efficiency of Photovoltaic Systems Using a Novel Hybrid Methodology Combining Rough Set Theory, Data Envelopment Analysis and Genetic Programming. *Energies*. 2012; 5(3):545-560.
https://doi.org/10.3390/en5030545

**Chicago/Turabian Style**

Lee, Yi-Shian, and Lee-Ing Tong.
2012. "Predicting High or Low Transfer Efficiency of Photovoltaic Systems Using a Novel Hybrid Methodology Combining Rough Set Theory, Data Envelopment Analysis and Genetic Programming" *Energies* 5, no. 3: 545-560.
https://doi.org/10.3390/en5030545