A Critical Review of the Modeling and Optimization of Combined Heat and Power Dispatch

: Combined heat and power (CHP) systems are attracting increasing attention for their ability to improve the economics and sustainability of the electricity system. Determining how to best operate these systems is difﬁcult because they can consist of many generating units whose operation is governed by complex nonlinear physics. Mathematical programming is a useful tool to support the operation of CHP systems, and has been the subject of substantial research attention since the early 1990s. This paper critically reviews the modeling and optimization work that has been done on the CHP economic dispatch problem, and the CHP economic and emission dispatch problem. A summary of the common models used for these problems is provided, along with comments on future modeling work that would beneﬁcial to the ﬁeld. The majority of optimization approaches studied for CHP system operation are metaheuristic algorithms. A discussion of the limitations and beneﬁts of metaheuristic algorithms is given. Finally, a case study optimizing ﬁve classic CHP system test instances demonstrates the advantages of the using deterministic global search algorithms over metaheuristic search algorithms. the inconsistent periods associated with the operation of steam admission valves. We show that a simple piecewise-linear function has the potential to describe valve-point effects more accurately, while introducing less complexity to the CHP dispatch model. A study that validates the relationships describing CHP system operation by comparison with industrial CHP system data would be beneﬁcial to the ﬁeld. Once the relationships describing CHP system operation were validated, the choice of how to model these relationships can be made. Each relationship should be modeled with both accuracy


Combined Heat and Power Background
Conventional centralized power generation using fossil fuels is highly inefficient due to the accumulation of losses from the power plant to the end user. The average net thermal efficiency of natural gas, coal, and petroleum power plants in the US for 2018 were 43.6%, 32.5%, and 30.8%, respectively [1]. In comparison to other countries, the US has historically been roughly average in terms of power generation efficiency, with many developing countries having far lower efficiencies [2]. Once the electricity is generated at the power plant it still needs to be transmitted and distributed to the end users, resulting in an estimated loss of 5% of all electricity delivered [3].
The climate change crisis has created the need for the rapid penetration of renewable energy sources into the electricity grid, but with 64.5% of global electricity production being fueled by fossil fuels in 2017, up from 63.1% in 1990, the global adoption of renewables is not occurring rapidly enough to offset the increased reliance on fossil fuels. The issue is worsened by the fact that global electricity consumption has risen 117% over the same time period [4]. By all indications fossils fuels will play a major role in global electricity production for decades [5]. Accordingly, economically viable methods to reduce the environmental damage caused by fossil fuel electricity production are needed to ease the transition towards a sustainable global electricity system.
The major source of inefficiency in fossil fuel power generation is in the conversion from thermal energy to mechanical energy, or converting heat to work through a turbine [6]. The majority of heat entering the turbine is not converted to work, but rather leaves the system as a stream of waste heat that cannot efficiently be converted to useful work for electricity generation [7]. The waste heat stream has value in applications such as space heating, or thermally activated technologies such as absorption chillers. With centralized power plants often being located many kilometers away from consumers, there is no economically viable way of providing the waste heat to such consumers. Instead, the waste heat is commonly sent to the atmosphere and most consumers heat and power demands are met separately by receiving electricity from the utility, and also having on-site heating and cooling equipment.
Considering the waste of valuable heat from centralized electricity production and the losses arising from long-distance electrical transmission, the benefit of distributed power generation becomes clear. In a decentralized grid the power is generated in close proximity to the consumer, enabling easy delivery of the useful heat to the consumer for heating and/or cooling. Systems which meet consumer heating and power needs simultaneously from the same energy source are known as cogeneration systems, or combined heat and power (CHP) systems. Introducing thermally activated technologies into the CHP system, such as absorption or adsorption chillers, enables the same energy source to meet the cooling demand of the consumer, evolving the CHP system into a combined cooling, heating and power system (CCHP), or trigeneration system [8]. Modern CHP and CCHP systems can achieve thermal efficiencies of over 90%, enabling an overall reduction of environmental consequences from conventional fossil fuel energy sources [9].
Although the concept and necessary technology for decentralized cogeneration has been around for decades [10,11] without it becoming ubiquitous, the heightened social and economic interest in a sustainable electricity system has seen it garner substantial research and industrial attention in recent years [12]. In many situations, the economic advantages of cogeneration alone make it preferred over separate thermal and electrical energy production. In other situations, the costs of non-cogeneration and cogeneration systems are similar, but the environmental benefits of cogeneration make it preferred [13].

Optimal Combined Heat and Power Dispatch
To take full advantage of the economic and environmental advantages of CHP systems, their operation needs to be optimized. CHP systems can contain multiple power-only units, cogeneration units, and heat-only units, all with unique physical limitations and performance characteristics. As a result, it is nontrivial determining how to dispatch the available generating units to meet a given heat and power demand, and especially challenging to determine the most cost-effective and environmentally friendly dispatch. The CHP economic dispatch (CHP-ED) problem is to determine the most cost effective use of the available generating units to meet consumer heat and power demands [14]. In addition to cost, CHP system operators often consider the emissions required to meet consumer demands, resulting in the dispatch problem becoming multi-objective. We will refer to this extension to multi-objective optimization (MOO) as the CHP economic and emission dispatch (CHP-EED) problem [15].
In this paper, we summarize the modeling and optimization work that has been done on the CHP dispatch problems to-date, and from the process systems engineering perspective comment on areas that require further investigation. This paper is distinct from previous CHP dispatch review papers because we critically assess the adequacy of the modeling and optimization approaches investigated to-date through comparison to approaches used outside of the CHP dispatch literature. We also provide some preliminary investigation into alternative modeling and optimization approaches for the CHP dispatch problems which show potential. This is in contrast with other CHP dispatch review papers which focused almost exclusively on comparing methodologies and results from within the CHP dispatch literature.
There has been substantial research on the CHP dispatch problem since the early 1990s [16], with hundreds of papers exploring improved methods of solving the optimal CHP dispatch problems. The vast majority of methods focused on metaheuristic optimization approaches, with seemingly endless creativity in using naturally observable phenomena as justification for the merit of new solution approaches. The well-established metaheuristic methods were thoroughly applied to the problem in the years prior to 2010, such as ant colony search [17], the genetic algorithm [18][19][20], and particle swarm optimization [21,22]. In the years since these pioneering works, the imaginativeness of researchers has continued to flourish. New metaheuristic methods have been applied to the CHP dispatch problem such as cuckoo search [23][24][25][26], invasive weed optimization [27], the artificial immune system algorithm [28], the firefly algorithm [29], krill herd optimization [30], the harmony search algorithm [31][32][33][34], grey wolf optimization [35,36], whale optimization [37], and the hybridizing bat algorithm with artificial bee colony [38]. This is by no means an exhaustive list of metaheuristic approaches investigated.
Despite the study of optimal CHP dispatch being mature in the sense that there has been an abundance of research conducted over several decades, from the process systems engineering perspective there exists important questions that have not been answered in the current literature. The first question is concerned with the widely used CHP dispatch model. In all of the aforementioned studies the CHP dispatch model used has been largely the same. This is of concern because to our knowledge there exists no literature that thoroughly validates the model components for their accurate representation of the actual CHP system physics. Furthermore, over recent decades, novel modeling techniques have been developed which can substantially aid the optimization of process systems. For instance, surrogate models such as Kriging [39][40][41][42][43][44], radial basis functions [45][46][47][48][49][50], artificial neural networks [51][52][53][54][55][56], splines [57,58], among others were shown to accurately represent complex physical systems while aiding optimal search algorithms. No literature exists which explores the application of such techniques to advance the study of CHP dispatch. A thorough understanding of the actual CHP system physics should be established, and it should be modeled with consideration of accuracy and computational simplicity in mind.
The second research question is concerned with the application of deterministic global optimization techniques for solving the CHP dispatch problems. Deterministic global optimization refers to numerical optimization methods which search for solution of the problem within some predefined tolerance of the global optimum. Upon convergence, deterministic global optimization methods provide theoretical guarantees that the solution returned is within the tolerance of the global optimum, or the problem is infeasible. Methods of rigorously bounding function values from above and below over the search space are central to deterministic global optimization. The global search converges when the difference between the bounds on the objective value, known as the optimality gap, is within the predefined tolerance. These optimization methods are deterministic in nature because they do not need to use stochastic rules to explore the search space. Rather, they can rely solely on a sequence of deterministic bounding operations to guide the search to convergence within a finite number of iterations [59,60]. This is in contrast to metaheuristic search algorithms, which are referred to as stochastic global optimization methods. Metaheuristic algorithms rely on stochastic rules to explore the search space because they do not possess global information about the problem in the form of rigorous function bounds. Metaheuristic algorithms do not perform a rigorous global search and cannot provide theoretical guarantees for the solution they converge to. Accordingly, when we refer global optimization throughout this paper, we are referring to deterministic global optimization.
Many researchers justified the need for metaheuristic optimization methods because of the inability of classic optimization techniques to handle the nonconvex nature of the CHP dispatch problems. Basu (2011) states that nonlinear optimization methods cannot handle the nonconvex fuel cost functions of the generating units [61]. Mohammadi et al. (2013) reference the inability of classic gradient-based optimization methods to guarantee finding the global optimum due to the existence of multiple local minima [62]. Jayakumar et al. (2016) state that these methods are impractical because they must approximate the modeling of cost curves which are actually highly nonlinear, non-monotonic, and can contain discontinuities [36]. Beigvand et al. (2016) reference the convergence difficulties of more classical optimization techniques due to the structure of the CHP-ED dispatch problem [63]. Murugan et al. (2018) reference the ineffectiveness of derivative-based optimization approaches at solving nonconvex problems with complex search spaces and computationally expensive cost functions [38]. Zou et al. (2019) state that the classical numerical methods can hardly handle problems which simultaneously have nonconvexity, discontinuity and non-differentiability [64].
While the statements of the above researchers are true for classical deterministic local gradient-based search methods, the ability of global optimization methods to handle the structure of the CHP dispatch problems has seemingly been overlooked. Unlike metaheuristic search methods, tuning of search parameters is not needed to maintain algorithm performance for global optimization methods [65]. This is particularly important when solving the CHP dispatch problems for practical applications where algorithm reliability is of paramount importance. The main drawback of global optimizations methods is their computational complexity, which can limit their application to large-scale nonconvex problems. To investigate the potential for global optimization methods to handle the CHP-ED problem, we apply generic off-the-shelf global solvers in a case study with five classic benchmarking test systems and compare their performance to the metaheuristic approaches review by Nazari-Heris et al. 2018 [66].
The layout of this paper is as follows. Section 2 discusses the modeling of CHP systems, with Section 2.1 introducing the widely used CHP-ED and CHP-EED models, and Section 2.2 discussing potential areas for improving these models. This is followed by Section 3 where solution methods for the CHP-ED and CHP-EED problems are discussed, with Section 3.1 discussing the widely studied metaheuristic solution methods, and the case study in Section 3.2 investigating the application of global optimization algorithms for solving the CHP-ED problem.

Common CHP Dispatch Model
The CHP-ED problem is to determine the heat and power output of some number of cogeneration units, power-only units, and heat-only units, so that operation cost is minimized while the heat and power demand and other physical operating constraints are met. Tables 1-3 define the sets, parameters, and variables used throughout the paper to model CHP dispatch. In the CHP-ED problem all parameters are assumed to be constant and known with certainty. The objective function to minimize is the total operating cost of the generating units, defined as: In the early CHP-ED literature the cost functions of all generating units were represented as quadratic functions of heat and/or power [14,[16][17][18][19][20]67]:   However, it has long been well-known that the valve-point effects on the cost functions of steam power generators cannot be sufficiently described by smooth quadratic functions. The cost function of the power-only units is obtained by collecting operating data as the generator is varied across its operating region. A rippling effects is seen in the cost curve as steam admission valves open intermittently across the operating region, as seen in Figure 1 [68,69]. As a result, in more recent CHP-ED literature its is common to consider a more accurate cost function for power-only units which includes a rectified sinusoidal term, as follows [37,38,62,[70][71][72]: Consideration of valve-point effects introduces nonconvexity into the objective function of the CHP model. Without consideration of valve-point effects, the cost functions are all convex quadratic functions.
In comparison to the CHP-ED problem, the CHP-EED problem has been studied much less. To extend the CHP-ED problem to the CHP-EED problem the total emissions from the CHP dispatch must also be minimized. The following objective for total emissions is widely accepted in the literature addressing the CHP-EED problem [15,66,[73][74][75][76][77]: where the emissions for each unit are defined as: The constraints enforcing the heat and power demand to be met are defined as: The power loss p l variable in Equation (10) is needed to account for the power transmission losses in the system. The total system power loss is a function of the power production of all units, and in CHP dispatch literature is commonly accounted for using the following B-matrix method [62,66]: The physical limitations of the generating units are described through: Equations (15) and (16) capture the interdependence of heat and power production for cogeneration units, with the power and heat limits being dependent on the heat and power output, respectively. Combing these limits yields the two-dimensional feasible operating region of the cogeneration units, The feasible operating region of a set of typical cogeneration units are shown in Figure 2. It can be seen that the feasible operating region of cogeneration units is a potential source of model nonconvexity [17].

Discussion of Common CHP Dispatch Model
For nearly three decades the CHP-ED dispatch model has remained largely unchanged. Looking at the earliest considerations of optimizing the CHP-ED problem of Rooijers and Amerongen (1994) [16] and Guo et al. (1996) [14], and contrasting these to some of the most recent considerations [33,34,37,78,79], the only differences present are consideration of valve-point effects for the power-only units, and power transmission losses. The addition of these terms is not a result of a better understanding of the physical system, as valve-point effects and power transmission losses were understood as early as 1971 [80], and being applied in power economic dispatch problems as early as 1993 [68]. There were substantial technological advances in power plants since 1971, and advances in process system modeling techniques, so it begs the question if some improvements to the CHP dispatch model are necessary. There are hundreds of publications investigating different metaheuristics for solving the CHP-ED problem, but none justified their choice of model used, except to cite earlier metaheuristic research.
In this section, we discuss some areas of CHP dispatch models which require further investigation. Section 2.2.1 addresses the modeling of the fuel cost functions for the generation units. We discuss the lack of validation of the fuel cost functions since their introduction several decades ago. We pay special attention to the modeling of valve-point effects of power-only units. We discuss the computational complexity of the commonly employed rectified sinusoidal method of describing valve-point effects. The question of the applicability of the rectified sinusoidal term is raised by demonstrating the error it contains when predicting real thermal power unit operating data. Section 2.2.2 addresses the combined unit commitment and ED scheduling problem. We discuss the increased computational challenge of this problem, necessitating a less accurate CHP-ED model to remain tractable. Surrogate modeling techniques are introduced which may allow increased CHP-ED model accuracy, while maintaining a reasonable level of computational complexity.

Fuel Cost Functions of Generating Units
Both the CHP-ED and CHP-EED problems are directly concerned with minimizing the fuel cost of operating the generating units to meet heat and power demand. Accordingly, the accuracy of the relationships used to describe the fuel cost over the feasible operating region of the generators is critical for solution quality. It is especially important to validate the relationships describing the fuel cost of generating units as they are not theoretically derived, but rather empirically derived from process data [81][82][83][84]. There are two questions that should be answered regarding the current CHP fuel cost functions. First, have technological advances been made since the introduction of the CHP generating unit fuel cost functions in 1994 which require different functions to accurately describe? Second, are the most appropriate functions being used to describe the CHP generating unit fuel costs with consideration of both accuracy and computational complexity?
To the best of our knowledge, the first publication on the CHP-ED was by Rooijers and Amerongen in (1994) [16]. In this paper the authors state that the cost function of a conventional power-only unit and heat-only unit have quadratic cost functions, such as those shown in Equations (2) and (4). They also state that the practice at the time was to use the quadratic function of Equation (3) to model the cogeneration unit cost function. Recent papers addressing the CHP-ED consider valve-point effects on the cost function of thermal power-only units, which was known to be important as early as 1971 [80]. As a result, the cost function of power-only units which is currently widely accepted in CHP-ED literature is now Equation (5). However, the quadratic fuel cost function of cogeneration units has remained unchanged in CHP dispatch literature since the initial 1994 publication, which provides no validation or citation to justify the function. It is especially intriguing from the engineering perspective as certain cogeneration units consist of thermal power units with heat recovery. In these cases the mechanistic relationship describing the cost of producing power from a power-only unit or cogeneration unit should be the similar, suggesting that valve-point effects should be present in the cost function of this type of cogeneration unit.
With the lack of recent literature validating the cost function of the power-only units that considers valve-point effects, it is possible that advancements have been made in thermal power engineering which requires radically different modeling of valve-point effects since their initial discussion in 1971. This is nothing more than conjecture, we only point it out because of the scientific issues of not reassessing the applicability of a model of a technological system which has advanced substantially over the past 5 decades [2,[85][86][87][88].
For the sake of discussion we will assume the valve-point effects shown in Figure 1 to still be of relevance to thermal power units today, as other researchers have. The question then becomes if the rectified sinusoidal term (|D pc i sin(E pc i (P min i − p p i ))|) is the most appropriate method of modeling valve-point effects. It is assumed that the first introduction of this term was by Walters and Sheble in 1993 to describe the input-output curve data shown in Figure 1, as they provide no citation upon its introduction [68]. Walters and Sheble provide no statistics on the goodness of fit of the rectified sinusoidal term they chose.
When considering adding a component to an optimization model two main factors need to be weighed. One of these factor is the importance of the component to the model, which often means how the accuracy of the model is effected by the inclusion of the component under consideration. The other factor is the complexity of the component, or the computational cost associated with solving a model that contains the component. In general, the more complex the component under consideration is, the more important it must be to justify its use in the model. The rectified sinusoidal term of Walters and Sheble is highly complex as it adds both nonconvexity and nonsmoothness to the optimization problem, prohibiting many classical optimization methods from being applied to models that contain it. As a result, using the rectified sinusoidal term to capture valve-point effects has major implications on the efficient solution of CHP dispatch problems. Evidence of the importance of the rectified sinusoidal term is needed to justify its complexity. We cannot find such evidence in the existing literature, so we perform a simple investigation of the ability of Equation (5) to capturing the fuel input versus power output data provided by Walters and Sheble in Figure 1 [68]. Figure 3a shows the fuel rate data provided by Walters and Sheble, and the predicted fuel rate data using Equation (5) to capture valve-point effects. It is clear that the rectified sinusoidal term cannot fully describe the inconsistent periods of the valve-point effects. The discontinuity of the rectified sinusoidal term does not predict the discontinuity of operating Valve B, resulting in 4.03% error in the predicted fuel rate where Valve B is actually operated. However, it can also be concluded that the rectified sinusoidal term helps predict the actual data better than just a quadratic prediction would, with some of the periodic deviation from a quadratic curve being captured. We have more well-established modeling techniques at our disposal today than Walters and Sheble did when they introduced the rectified sinusoidal term in 1993. The choice of how to capture valve-point effects is not just whether or not a rectified sinusoidal term should be used, but also whether other surrogate models could be used to accurately capture the valve-point effects while introducing less complexity in the model. Given the simple 1-dimensional relationship of the input-output curve, more advanced surrogate modeling techniques such as Kriging or neural networks are unnecessary, although could be applied. Instead, we choose to investigate the ability of a piecewise-linear function to capture the valve-point effects of thermal power units. Figure 3b shows the fuel rate data provided by Walters and Sheble and the fuel rate data predicted using a piecewise-linear function. The piecewise-linear function is made up of 8 linear segments and has a maximum absolute relative approximation error of 1%. By comparing Figure 3a,b it is clear that this relatively simple piecewise-linear function is able to approximate the valve-point effects of thermal power units more accurately than the quadratic function with the rectified sinusoidal term. There are many instances where the computational complexity of complex nonlinear process system models is greatly simplified through piecewise-linear approximation [89][90][91][92][93], and in the case of valve-point effects the piecewise-linear approximation can be more accurate than the current widely used nonlinear model.

Combined Unit Commitment and Economic Dispatch
In the classic CHP-ED problem it is assumed that the number of generating units available for heat and/or power production is known, or the number of units to "commit" is known. It is further assumed that the heat and power demands are constant. In reality, the heat and power demands are frequently changing, resulting in the most economical operating policy being one that commits and decommits units as the changing heat and power demands dictate. To commit a generating unit requires ramping it up to the desired output level over time [94]. The dynamics of ramping units up are longer than the dynamics of the heat and power demands. As a result, the most economic unit commitment (UC) policy must be determined in advance by scheduling the unit commitment according to a forecast of the expected heat and power demands.
The UC scheduling problem typically considers a time horizon ranging from days to a week. The time horizon is then broken into multiple periods, often at hourly intervals [95]. The UC problem is much more difficult than the ED problem because it essentially involves solving multiple ED problems that are linked through the ramping constraints, and integer variables denoting which generating units are committed [88]. The increased difficulty of the UC problem results in the ED subproblem often being less detailed than the CHP-ED model previously introduced. The approach taken by the vast majority of researchers has been to approximate the fuel cost functions of generating units as linear or convex piecewise-linear functions, and their feasible operating regions as convex polytopes [96][97][98]. The resulting problems are mixed-integer linear programs (MILPs) that only require binary variables to indicate whether a particular unit is committed at each time period.
Solving the UC problem with the simplified CHP-ED model results in an operating schedule that is likely to be less economical than the best-possible operating schedule, due to model error. Further, it is possible that the convex approximations of the feasible operating windows of cogeneration units will result in determining operating schedules which are infeasible in practice. This can be avoided if it is ensured that the convex approximation is a restriction of the actual feasible operating window, but the sacrifice is solution quality as feasible operating regions which may yield more economical operation are removed.
Developing a surrogate model of the CHP-ED subproblem may be useful to reduce model complexity while maintaining a higher level of accuracy than the conventional simplification approaches that were previously employed. Given that many of the UC models to-date have been MILPs, owing to their tractability, it is natural to use a surrogate modeling approach which can be embedded in an MILP. Piecewise-linear surrogate models are easily implemented in MILPs, for which there is substantial research attention recently. Rectified neural networks are known to be continuous piecewise-linear functions with the universal approximation ability [99]. As a result, they are a subject of interest for capturing complex nonlinear physics in MILPs [51,100]. Neural networks are popular because of their ability to model large high-dimensional data-sets; however they also possess drawbacks when considering their application in modeling for use in an MILP. One of the reasons deep rectified neural networks are able to approximate large data-sets well, despite being trained by algorithms that are likely to converge to suboptimal solutions, is that they produce piecewise-linear functions with many more linear segments than necessary. In essence, the suboptimal convergence is overcome by using a piecewise-linear superstructure that is much larger than would be needed if convergence to the global optimum were possible. This is a drawback for embedding rectified neural networks in MILPs because the number of combinations of active/inactive binary variables is directly proportional to the number of linear segments in the piecewise-linear function, and so unnecessary complexity is likely to be introduced. As a result, when globally optimal piecewise-linear function generating methods are able to be applied, they should be used. For instance, the convex piecewise-linear function generating technique introduced by Toriello and Vielma (2012) is particularly well-used [101]. The obvious limitation of globally optimal piecewise-linear function generating methods is that they quickly become intractable as the number of data-points and linear segments grow.

Past Metaheuristic Approaches
The approaches used to solve the CHP-ED problem to-date were largely metaheuristic, and of the metaheuristics applied to the CHP-ED problem the vast majority were nature-inspired. The advantageous property of nature-inspired algorithms is their ability to perform a complex search of the search space through coordination of entities that follow simple rules.The coordination of the search entities is achieved through simple algebraic calculations which do not require any differential calculus, resulting in their ability to be applied to non-differentiable functions. Naturally observable phenomena provided a rich source of ideas for metaheuristic algorithms because nature is composed of many instances of entities that follow simple rules to achieve a complex objective with remarkable success. The ability for ants to find the shortest path from their nest to a food source while relying solely on the secretion of pheromone trails has inspired ant colony optimization, which has shown to be especially useful for certain discrete NP-hard problems such as the traveling salesman problem [102]. Particle swarm optimization is built on the hypothesis that individual members in schools of fish profit from the experiences of all other members in the school during the search for food, and this benefit outweighs the disadvantages when the food resource is unpredictably distributed in patches. This hypothesis easily extends to other groups of social animals, such as flocks of birds, and the particle swarm optimization algorithm is most easily understood as simulating the choreography of a flock of birds flying to forage for food [103]. Particle swarm optimization has been shown to efficiently solve certain unconstrained continuous non-convex optimization problems. The ability for a colony of bees to efficiently find sources of nectar by simple interactions between employed bees, onlooker bees, and scout bees, has inspired bee colony optimization, which has also shown to be useful for solving certain unconstrained continuous non-convex optimization problems [104]. The ability for natural selection and the evolution of genetics to produce offspring which are increasingly more suitable for surviving their environment by simple mutation, recombination and selection operators has inspired genetic algorithms [105].
An abundance of other natural phenomena inspired metaheuristic methods similar to those mentioned above. However, there are common limitations that all nature-inspired metaheuristics suffer from because nature cannot provide simple answers for some of the most fundamental issues faced in optimization problems. These common limitations are the ability to handle constraints, and the premature convergence of the search algorithm to solutions that are not the true global optimum [17,18,71].

Difficulty of Handling Constraints
Nature inspired metaheuristic methods do not explicitly consider constraints because the entities they are inspired by do not use simple mechanism to obey the constraints they are subject to, or they are not subject to the types of constraints that exist in many optimization problems. For instance, when ants are foraging for food and come across impassable terrain they are able to sense this through complex physiological processes and avoid it. When considering a constrained continuous optimization problem, there is no simple way to provide these same senses to the search agents in the ant colony optimization algorithm to avoid them from traversing into infeasible search spaces. Or consider a school of fish searching for food in a particular river, it is not possible for any fish in this school to discover a food source in a disjoint body of water. When considering optimization problems with disjoint feasible regions, there is no simple way to ensure the particles searching in particle swarm optimization algorithms move between the disjoint feasible regions to discover the globally optimal solution because this is not a situation dealt with by the natural organisms which they were inspired by.
To deal with the inability of metaheuristic algorithms to deal with constraints, it is common to remove the constraints from the problem and penalize their violation in the objective function using a penalty factor which diminishes as the search progresses, yielding an effect similar to an annealing schedule [17,18,71]. This is not a perfect fix though, because a penalty factor that is set too large initially will cause premature convergence to suboptimal solutions, and a penalty factor set too small initially can dramatically increase solution times because many candidate solutions will be produced which are infeasible. Further, the penalty functions are often ill conditioned near the boundaries of the feasible region, where optimal solutions are commonly located [19]. As a result, the most difficult aspect of the penalty function method is determining the progression of penalty factors which yields desirable algorithm convergence [20]. There exists no theory to determine the correct choice of the penalty factor progression for a given optimization problem, or even for instances of the same optimization problem, and so practitioners have no choice but to tune the penalty factor progression manually until desirable algorithm performance is obtained. This is a serious concern for practical problems such as the CHP-ED and CHP-EED which need to be resolved frequently as heat and power demands change. The penalty factor progression tuned offline cannot guarantee good algorithm performance when deployed online, putting the reliability of decision support tools based on metaheuristics into question. Table 4 summarizes the methods used to handle constraints by the metaheuristics applied to the CHP-ED and CHP-EED.

Premature Convergence Issues
Despite metaheuristics being referred to as stochastic global search algorithms, they do not possess global information about the optimization problem throughout the course of their search. From this, it is impossible for metaheuristic algorithms to guarantee convergence to globally optimal solutions because they cannot assess candidate solutions based on global information. The gradient-free nature of metaheuristic methods means that convergence to locally optimal solutions cannot even be guaranteed. These theoretical shortcomings give rise to the need for metaheuristics to balance the exploitation of the best-known candidate solutions at any stage, with the exploration of the search space in hopes of finding improved candidate solutions. Exploration of the search space is achieved by coding stochastic behavior in the algorithm which gives rise to some probability of search entities randomly selecting solutions outside of the candidate solutions. In ant colony optimization this is typically achieved by search agents selecting the node to move to based on a probabilistic rule that is a function of pheromone level connecting their current node with the prospective node. In bee colony optimization the stochastic behavior is achieved by scout bees which randomly search for new candidate solutions, and stochastic neighborhood search around known candidate solutions. In genetic algorithms the recombination of randomly selected parent genes and the random mutation operator contributes to the discovery of new candidate solutions in a stochastic way. Table 4 summarizes the stochastic rules used by the metaheuristics that were applied to the CHP-ED and CHP-EED to achieve random exploration.
The performance of an algorithm which is imbalanced between the competing objectives of exploitation and exploration will suffer. An algorithm which is overly greedy, or prioritizes exploitation of the best-known candidate solutions, will suffer from premature convergence to solutions that are not of high-quality. An algorithm which prioritizes exploration will suffer from long convergence times, or possibly complete lack of convergence. Like the progression of the penalty factor for constraint handling, in general there are no theoretical results to automate the tuning of parameters effecting the balance between exploration and exploitation. Practitioners are left to manually tune these parameters to achieve desirable performance for the problem instance at hand, but are left with no guarantees that these same parameters will extend to other problem instances. As a result of the random choices in metaheuristic algorithms, the final solution and computing times are actually random variables [106]. This means that repeatedly solving the same problem instance using a metaheuristic is likely to return several different solutions, with a variety of different solution times. This fact compounds the issue of reliability when implementing metaheuristic algorithms in decision support settings where repeatedly obtaining high quality solutions over varying problem instances is required.

Benefits for Multi-Objective Optimization
In MOO problems the objectives are often conflicting, resulting in many solutions which are equally good in the sense that each is better than the rest in at least one objective. These solutions are all optimal solutions because no other solutions exist which can yield an improvement in one objective without making at least one objective worse. These solutions are often referred to as non-dominated and they form what is known as the Pareto-optimal front [107]. A distinct benefit of metaheuristic algorithms is that they can easily be extended to MOO problems, owing to the fact that they naturally generate solutions throughout their search which can be compared to determine which of them are non-dominated. By storing the non-dominated solutions and iteratively applying heuristics to search for possible improvements, a search for the Pareto-optimal front is achieved.
Deterministic global optimization methods cannot so easily be extended to generate the Pareto-optimal front during the course of their natural search. For instance, the well-known branch-and-bound method relies on removing areas of the feasible region through refining bounds on a single objective function for each area. If just one of the objective functions from the MOO problem are chosen as the metric for bounding areas in the branch-and-bound algorithm, then areas which contain solutions on the Pareto-optimal front will be discarded without discovering the non-dominated solutions they contain. As a result, such a method can only generate a single point on the Pareto-optimal front with confidence. To overcome this, a linear weighted sum of the objective functions is often employed. The linear weighted sum of objective functions can repeatedly be solved with different weights, with each solution producing a point on the Pareto-optimal front. This approach is limiting due to the computational burden required to repeatedly solve the problem, but the most serious drawback is that it is usually not true that each Pareto-optimal point can be obtained through a suitable choice weights. The linear weighted sum method cannot generate portions of the Pareto-optimal front when its shape is non-convex, regardless of the weight combinations used, meaning it commonly fails for non-convex problems [108].
Evolutionary algorithms are the type of metaheuristic algorithm investigated the most for solving the CHP-EED problem. Of these algorithms is the multi-objective line-up competition algorithm (MO-LCA). The MO-LCA uses a composite fitness function to evaluation "families" (candidate solutions) based on their ranks for each of the objective functions. New generations of families are produced within a random distance of the previous generation of families, whose maximum range decreases as the composite fitness value of the family increases, and as the algorithm progresses. After the new generation of families is produced, their composite fitness values are compared to the fitness values of their parents, with the more fit family evolving to the next generation. In this way, the solutions which are non-dominated in the candidate solution set are preserved but allowed to make local searches for solutions which are globally non-dominated [109].
Another evolutionary algorithm which has been investigated for solving the CHP-EED is the non-dominated sorting genetic algorithm II (MO-NSGA-II). In this algorithm, solutions are first ranked based on their non-domination level, and a secondary rank in ascending order of crowding distance is assigned. The mating pool is then constructed using random tournament selection based primarily on the non-domination level and secondarily on the crowding distance. Crossover and mutation of the mating pool is then used to produce a child population, which is finally ranked alongside the parent population for selection to the next generation [73].
Recently a hybrid metaheuristic approach combining aspects from MO-NSGA-II and multi-objective particle swarm optimization has been proposed. At each iteration of this algorithm the higher ranked half of the population evolves using standard genetic algorithm principles, and the lower ranked half of the population searches using standard particle swarm optimization principles [77]. Table 4. Summary of key characteristics of past metaheuristic optimization approaches applied to the CHP-ED and CHP-EED.

Comparison with Global Optimization
Global optimization algorithms are not subject to the theoretical and practical shortcomings identified for the abundance of metaheuristics discussed in the previous section. Practitioners applying global optimization algorithms do not need to set parameters that influence the algorithm performance, and upon algorithm convergence a solution within any desired tolerance of the true global optimum is returned, or the problem is determined to be infeasible. The most serious limitation of global optimization techniques for solving NP-hard problems, such as the CHP-ED problem, is the computational complexity associated with the theoretical guarantees that are provided. However, if global optimization of the problem at hand can be solved within times useful for practical application, it is undeniably the preferred choice over metaheuristic algorithms, if not for the guaranteed global optimality of the solution returned, for the algorithm reliability that is ensured.
In this section, we investigate the ability for a global optimization algorithm to solve the five CHP-ED problem test instances that are commonly investigated in the literature. In all test instances the relative optimality gap is set at 0.01%. The objective values of the solutions reported are all rounded up to the nearest whole number to reflect the precision of the optimality gap. The details of these test instances can be found in Appendix A. For the first two test instance valve-point effects are not considered and so no rectified sinusoidal terms are present in the CHP-ED model. For these two test instance the global optimizer BARON version 18.11.12 was used to solve the problems [117,118]. For the remaining three test instance valve-point effects are considered, so the global optimizer we use is COUENNE version 26.1.0 because of its ability to handle the rectified sinusoidal terms [119]. The optimization software used for the case study was GAMS version 26.1.1 [120]. In all trials GAMS was ran using a single thread on an Oracle VirtualBox version 5.1.26 with the Ubuntu 16.04 operating system, a CPU frequency of 3.60GHz, and 4GB of RAM.
We use the metaheuristic algorithm results on these five test instances summarized by the review paper of Nazari-Heris et al. (2018) [66] for comparison to the results we obtain for the deterministic search algorithm. We note that many publications in the field of CHP-ED optimization reported solution times but did not report the hardware used to achieve such times. With solution times on the order of tens of seconds, comparison of solutions times is meaningless because differences may be due to the hardware used rather than the search algorithm itself. Regardless, we provide the solution times reported in the literature when available, simply to provide reference as to what were acceptable solution times in the field to-date.

Test Instance 1
Test instance 1 is made up of one poweronly unit without valve-point effects, two cogeneration units, and one heat-only unit. Of the two cogeneration units, one has a convex feasible operating range and one has a non-convex feasible operating range. Power transmission losses are not considered in this test instance. Table 5 summarizes the performance of 29 metaheuristic approaches and BARON for solving test instance 1. The high, median, and low columns in the table represent the highest value reported for the 29 metaheuristics, the median value of the 29 metaheuristics, and lowest value reported for the 29 metaheuristics. It can be seen that BARON is sufficient for handling CHP-ED problems of this size, with it reducing the relative optimality gap less than 0.01% in 0.06s. In the form of P = (P 1 , ..., P 3 ), H = (H 2 , ..., H 4 ), the optimal solution found by BARON for this test instance 1 is: It should be noted that of the 29 metaheuristic solutions provided by Nazari-Heris et al. (2018), 10 of the solutions provided were infeasible. Three of these infeasible solutions were due to violations of the cogeneration unit's feasible region, and seven were because the solution provided was more than 0.1MW short of the heat or power demand. Even for this simple test instance, of which the researchers were able to tune the constraint handling and premature convergence parameters to be tailored to this test instance specifically, feasibility and optimality issues were present in the many of the metaheuristic algorithms.

Test Instance 2
Test instance 2 is made up of one power-only unit without valve-point effects, three cogeneration units, and one heat-only unit. Of the three cogeneration units, one has a convex feasible operating range and two have a non-convex feasible operating range. Power transmission losses are not considered in this test instance. Table 6 summarizes the performance for 12 metaheuristic approaches and BARON for solving test instance 2, of which three heat and power demand profiles are studied. It can be seen that BARON still has good performance on this larger CHP-ED test instance, with it reducing the relative optimality gap to less than 0.01% for all profiles in under 0.07s. Of the 12 metaheuristic algorithms, seven were able to find feasible solutions for all three demand profiles. Of these seven search algorithms, five were able to find solutions with objective values within 0.01% of the global optimum on all three demand profiles. In the form of P = (P 1 , ...,

Test Instance 3
Test instance 3 is made up of four power-only units with valve-point effects, two cogeneration units, and one heat-only unit. Of the two cogeneration units, one has a convex feasible operating range and one has a non-convex feasible operating range. Power transmission losses are considered in this test instance. Table 7 summarizes the performance for 16 metaheuristic approaches and COUENNE for solving test instance 3. It can be seen that COUENNE has good performance on this test instance, with it reducing the relative optimality gap to less than 0.01% in 1.39s. In the form of P = (P 1 , ..., P 4 ), H = (H 5 , ..., H 6 ), the optimal solution found by COUENNE for test instance 3 is:   Test instance 4 makes a substantial leap in the CHP system size to consider 13 power-only units with valve-point effects, six cogeneration units, and five heat-only units. Of the six cogeneration units, three have convex feasible operating regions, and three have non-convex feasible operating regions. Power transmission losses are not considered in this test instance. Figure 4 shows the search progression of COUENNE over time, which converged to a solution with an objective value of $57826 at 16.47 s. The upper bound represents the objective value of the best-known feasible solution at that time, and the lower bound represents the best-possible objective value known at that time. As the search progresses the upper and lower bounds are refined through a branch-and-bound procedure, until the best-known feasible solution's objective value is within the set tolerance of 0.01% of the best-possible objective value. Figure 4 shows that although COUENNE took 16.47 s to reduce the optimality gap less than 0.01%, at points much earlier in the search feasible solutions were found which were superior to many of those found by the metaheuristic algorithms. The feasible solution found at 4.96 s with an objective value of $57900 is superior to seven of the metaheuristic solutions shown, and COUENNE arrived at the final solution 9.61 s into the search, despite not being certain of its objective value being within 0.01% of the best-possible value until 16.47 s. Further, three of the four metaheuristic solutions with a superior objective value than the COUENNE solution found at 4.96 s are infeasible with respect to the feasible operating regions of the cogeneration units. It should be noted that the solution found by MPSO in the paper by Basu (2015) [113] has not been identified as infeasible previously, but our inspection clearly shows a violation of the feasible operating regions of the cogeneration units identified as units 15 and 17. In the form of P = (P 1 , ..., P 19 ), H = (H 14 , ..., H 24 ), the optimal solution determined by COUENNE for test instance 4 is: 319, 299.199, 299.199, 60, 109.867, 109.867, 109.867, 109.867, 109.867, 76.950, 40, 55, 55, 81, 40, 81, 40, 10, 35), 75, 104.800, 75, 40, 20, 470.400, 60, 60, 120, 120).

Summary
In this critical review, we discussed the modeling and optimization of the CHP-ED and CHP-EED problems. With respect to the modeling of CHP systems, we raised two distinct questions. Namely, are the models introduced in the 1990s still valid for CHP systems today, and are the most appropriate CHP dispatch model components being selected with consideration of both accuracy and complexity? Addressing the latter of these questions, we demonstrate that the commonly employed rectified sinusoidal method of modeling the valve-point effects of thermal power units does not fully describe the inconsistent periods associated with the operation of steam admission valves. We show that a simple piecewise-linear function has the potential to describe valve-point effects more accurately, while introducing less complexity to the CHP dispatch model. A study that validates the relationships describing CHP system operation by comparison with industrial CHP system data would be beneficial to the field. Once the relationships describing CHP system operation were validated, the choice of how to model these relationships can be made. Each relationship should be modeled with both accuracy and computational complexity in mind. A study that justifies the model chosen for each relationship on the basis of these factors is needed. Such a study would likely benefit from investigating current surrogate modeling techniques. We also discussed the combined UC and ED problem, which has received far less research attention than the ED problem alone. Current combined UC and ED literature considers a simplified ED subproblem, which is likely to result in suboptimal and possibly infeasible solutions. Future research attention on both the modeling and optimization of the combined UC and ED problem is needed to improve the scheduling of the generating units in CHP systems.
With respect to solving the CHP-ED problem, we question the suitability of metaheuristic algorithms and demonstrate the strong advantages of using global optimization methods. We discuss the issues of constraint handling and premature convergence that are common among all metaheuristic algorithms. These issues are particularly concerning for practical applications where frequently resolving the problem subject to different operating conditions is required, such as the CHP-ED and CHP-EED problems. A case study is performed on five classic CHP-ED test instances and the computational performance of metaheurstic algorithms and commercial global solvers is compared. The global solvers were shown to converge to a solution within 0.01% of the global optimum in all test instances, and achieved this in solution times comparable to those reported for the metaheuristic methods in all instances. The success of these generic global solvers should inspire future research to focus on the global optimization of CHP dispatch. Based on the success of the naive application of global optimization here, it is likely that more advanced global optimization techniques that exploit the problem structure and composition of model relationships could yield significant improvements in computational performance.
We review the solution methods of the CHP-EED problem, and identify the advantageous properties of metaheuristic algorithms which make them well suited for addressing MOO problems. The CHP-EED problem is important because one of the main benefits of CHP systems is their lower emissions compared with conventional fossil fuel electricity generation. However, metaheuristics suffer the same limitations in solving MOO problems as they do in single objective problems, and so further research in their application to solving the CHP-EED problem is needed. A synergistic approach which integrates classic global optimization techniques alongside metaheuristic techniques could be useful in overcoming the constraint handling and premature convergence issues of metaheursitic algorithms, while also mitigating the theoretical challenges associated with applying global optimization to MOO problems. Research in this direction could be useful to the CHP-EED problem, and potential many other MOO problems.

Conflicts of Interest:
The authors declare no conflicts of interest.