Evolutionary Strategy for Practical Design of Passive Optical Networks

Abstract: Passive optical networks (PONs) have become an important technology for broadband access as a result of the growing demand for bandwidth over the past 10 years. An arduous and complex step in the design of such networks involves determining the placement of equipment, optical fiber cables and several other parameters relevant to the proper functioning of the network. In this paper, we propose an evolutionary strategy to optimize the infrastructure design of PONs using a genetic algorithm. This meta-heuristic is capable of producing fast, automatic and efficient solutions for the design and planning of PONs. Our proposal has been developed using real maps, aiming to minimize deployment costs and the time spent carrying out PON projects while meeting pre-defined quality criteria. In our simulations, we considered two scenarios (non-dense and dense), four possible topologies and two regions of interest. In the non-dense scenario, subscribers are distributed in a dispersed manner in the region of interest. The dense scenario has a considerably higher number of subscribers located very close to each other. The obtained results demonstrate the potential of our proposal, as well as its relevance from a technical, economic and commercial point of view.


Introduction
The emergence of new Internet services (applications such as video-on-demand, 3D high-definition television (TV), 4K or 8K ultra-high definition (UHD), machine-to-machine (M2M), cloud computing, virtual/augmented reality (VR/AR) and fifth-generation mobile networks (5G), for example) will require optical networks to transport traffic with different quality-of-service (QoS) and bandwidth requirements [1][2][3]. As reported by Cisco Systems [4], the number of global Internet users will be 5.3 billion (66% of the global population) by 2023. Fixed broadband speeds will more than double over the same period: by 2023, global fixed broadband speeds will reach 110.4 Mbps, up from 45.9 Mbps in 2018. According to the Federal Communications Commission's (FCC) broadband guide [5], a typical household that regularly uses two or three devices simultaneously, with applications that require basic or moderate traffic (e.g., email, browsing, video, voice-over-Internet protocol (VoIP) and at least one streaming HD video or online gaming application), requires a service that offers a bandwidth of 12 to 25 Mbps. As a result, consumers' demands may not be satisfactorily met if a high-quality access network service is not available in the subscriber's region.
Broadband services have traditionally been provided over digital subscriber line (DSL) or cable modem networks [6]. However, it is becoming increasingly hard to support the fast-growing demand of users due to the fundamental bandwidth limitations of copper-based networks [7]. As a result, technologies based on optical fiber have emerged as an alternative to meet these demands and the high quality of service required in access networks [8]. According to the Brazilian National Telecommunications Agency (ANATEL) report [9], in August 2019, approximately 65.2% of Brazil's access networks were still based on copper cables. However, it is worth mentioning the recent growth in the use of optical fiber (reaching 26.3% on the same date). Another relevant piece of data is that only 46.7% of Brazilian households have fixed broadband access, which suggests that optical fiber access network services will grow in the next few years.
Several fiber access network architectures have been developed (e.g., point-to-point, active optical network and passive optical network) [10]. Among them, passive optical networks (PONs) stand out. A network is called passive when it requires no electrical power in the outside plant. PONs' advantages include [11]: easy installation and upgrading; low operation and maintenance costs; high reliability; electromagnetic immunity; and compact cables. PONs have been commercially deployed with Gigabit-capable PON (GPON), followed by its next generations, XGS-PON and HS-PON (which are already standardized by ITU-T). The ITU-T recommendations containing all technical details are available in [12][13][14]. Network designers and planners face several key challenges at all stages of the design process, and the task is not always easy, as many possibilities exist [15]. Thus, the PON planning task can require several days to complete, consuming resources that could be allocated to other assignments and demands.
In this paper, we propose an algorithm for PON planning based on a genetic algorithm (GA). Using basic information such as streets, avenues, central office (CO) position and subscriber locations, the proposed algorithm finds a solution to the PON planning problem, considering four possible topologies in two simulation scenarios (non-dense and dense) and two different regions of interest. In the non-dense scenario, subscribers are distributed in a dispersed manner in the region of interest. The dense scenario has a considerably higher number of subscribers distributed very close to each other. The objective is to minimize the network deployment cost, as well as to reduce the time spent finding a solution. In addition, we explain in detail the capital expenditure (Capex) model, the graph theory concepts used by our proposed algorithm for PON planning optimization, the process of importing maps in real time and the optical power budget. This paper is organized as follows: In Section 2, the problem characterization is discussed. In Section 3, we present the four possible topologies adopted in our simulations. In Section 4, our proposed algorithm for PON optimization is presented and, in Section 5, the simulation scenarios considered in this paper are described in detail. In Section 6, the results are discussed and, finally, in Section 7, we present the conclusions.

Discussion
There are numerous optical network technologies. However, PON stands out by offering a highly efficient and cost-effective access network [1]. A PON's main feature is that it is implemented as a point-to-multipoint architecture, in which unpowered fiber optic splitters enable a single optical fiber to serve multiple end-points (consumers) [16]. Because many design factors must be considered, the optical network planning process often presents several challenges from the optimization point of view [17,18]. Designing a cost-effective PON requires considering many factors, such as the split ratio, the position of optical splitters in the optical distribution network (ODN) and the assignment of subscriber premises to the splitters. In addition, all the planning rules must be satisfied, including constraints on equipment capacity and the maximum cabling distances, which are derived from power budget constraints [19]. For this purpose, several optimization techniques have been used for access network planning in order to find an efficient infrastructure that meets the requirements at the lowest possible deployment cost [20]. Network planning is currently one of the most important topics in PON research. Therefore, several proposals based on mathematical techniques and on heuristic and meta-heuristic algorithms have been reported for this optimization problem [18,20,21,22].
Chu et al. [18] proposed a heuristic algorithm based on ant colony optimization (ACO) to minimize the PON deployment cost. In summary, the ACO algorithm is a probabilistic technique for solving computational problems. This technique is inspired by the behavior of ants that leave their nest seeking the shortest path between their colony and a source of food. Chu et al. [18] developed a system that was able to minimize the PON deployment cost for a specific tree PON topology. The considered topology was limited to just one level of optical splitters and drop closures for the distribution of optical fibers. The objective function was based on the network deployment cost. In addition, when the cost was evaluated, only optical cables, optical splitters and drop closures were considered, ignoring other important deployment costs such as optical splices, network elements in the central office, etc. Furthermore, Chu et al. [18] do not report how the obtained results were validated. Regarding the scenarios adopted for the tests, the authors stated only that they were based on real scenarios, but did not present the maps used or provide further details about them.
Pehnelt et al. [20] proposed an algorithm that deals with the optimization of PON designs. The proposed system used a combination of different methods to find the solutions. Firstly, Pehnelt et al. [20] introduced an algorithm based on Dijkstra's algorithm to find optimum (minimum) distances between optical line terminals (OLT) and optical network terminals (ONT). Then, for the second part (optimum splitter placement), k-means clustering and hierarchical clustering techniques were used. The proposed algorithm found an optimal metric, thus creating an optimized tree topology mainly focused on summary trenching distance (although endpoint attenuation and minimum summary length of optical fibers were also considered). For the tests, a real map of a small residential neighborhood of Prague (Czech Republic) was used. Raw data were obtained from Open Street Maps (OSM) software. For the simulations, a scenario with 40 ONTs and a single OLT, randomly placed on the map, using one to three levels of optical splitters in the outside plant, was considered. The proposed algorithm was a good alternative to conventional methods, and the results were presented in graphical and table forms. However, important parameters that also influence the deployment cost were not considered (such as splices, cabinets, etc.) and only one type of cable was used for the entire network. The scenario tested by Pehnelt et al. [20] was relatively small, which greatly reduces the search space and simplifies the task of finding the optimal solution.
Villalba et al. [21] proposed a genetic algorithm for solving PON design projects considering three basic topologies: tree, ring and bus. The proposed GA was responsible for determining the placement of the optical splitters in the network and the routing of the optical cables, seeking the lowest network deployment cost. However, despite performing well when compared to manual methods, the developed tool was quite simple and considered only the cost of the splitters and of a single type of cable, not taking into account several other factors involved in proper network planning. In [21], the maps used were not georeferenced and no further details were provided on the process of importing them.
Eira et al. [22] introduced an integer linear programming (ILP) model capable of designing single-stage and multiple-stage splitting PONs. In order to reduce the computation time for larger instances, a two-stage heuristic was proposed to tackle scenarios that were beyond the computational abilities of the optimal method. The aim of the paper was to find the least costly tree topology deployment configuration considering equipment and installation costs. For the single-stage ILP model, the authors adapted the classical concentrator location problem (CLP) applied to telecommunications networks. For the multistage problem, a new ILP model was formulated. The following inputs were considered: OLT and optical network unit (ONU) locations; number of candidate locations for splitter placement; and respective interconnection costs. The maximum split ratio allowed by each PON port was assumed to be 64. For the heuristic approach, developed to cope with large-scale multistage PONs, the authors proposed an algorithm that used the same inputs as the ILP model. The basic concept was to obtain an initial cost-effective configuration and then to search locally for alterations in the network layout in order to reduce the overall cost. It should be noted that the power budget constraint was considered in both approaches. Although not addressed in the paper, the authors highlighted that the presented approaches can be easily adapted to other PON technology types. However, georeferenced or real maps were not used in the simulations, which provides little detail compared to a real PON planning problem. Moreover, the number of candidate locations for splitter placement (the search space) was small, reaching only 50 possibilities in the worst case, which reduces the complexity of the optimization problem.
Some recent works involve evaluation scenarios that are not part of the scope of this paper, such as: radio-over-fiber, internet of things and data centers. As an example, in [23], the authors present a novel radio-over-optical fiber network architecture with multi-stratum resources optimization using software-defined networking. The proposed architecture can globally optimize radio frequency, optical spectrum and cloud baseband unit processing resources effectively to maximize radio coverage and meet the quality of service requirement. In [24], the authors propose a brain-like productive service provisioning scheme with federated learning for industrial internet of things. The scheme combines production information into network optimization, and uses the interfactory and intrafactory relations to enhance the accuracy of service prediction. In [25], the authors present a novel architecture that can enable cross-stratum optimization of application and optical network stratum resources, and enhance multiple-layer resource integration in ubiquitous data center optical interconnection.
In this paper, we propose an evolutionary strategy to optimize the design of PONs. Compared to the existing approaches [18,20,21,22], our proposal is simple but efficient and complete, addressing the main deficiencies found in those works. The proposed technique is based on a genetic algorithm, a choice mainly due to the simplicity of its implementation and the good results obtained in problems involving graphs. In summary, the algorithm proposed in this paper aims to minimize deployment costs and the time spent on the project, using real georeferenced maps and displaying the results in a graphical, friendly and complete way. For the network cost calculation, all the typical materials and services involved in GPON deployment were considered. The restrictions considered in this paper are also stricter and are based on real scenarios, which means more practical results. The efficiency of the proposed system has been validated by comparison with manual PON planning (the traditional way), yielding total cost values lower than those obtained manually. Moreover, all simulations and tests of the proposed system were performed on real maps, considering two different simulation scenarios (non-dense and dense), four GPON topologies and two regions of interest. Table 1 summarizes the contributions of the papers available in the literature and the main characteristics of our proposal.

Adopted Topologies for Proposed Systems
PON systems can use bus, ring, tree or a mix of these previous topologies [21]. This choice occurs at the network planning stage and each topology has different characteristics, being implemented according to the situation and requirements of each project [26]. Therefore, the appropriate topology choice depends on the project premises, such as redundancy requirement, service capillarity and how subscribers and potential subscribers will be geographically arranged.
Among the basic PON topologies, Internet service providers (ISPs) typically adopt the tree topology [27]. This is mainly due to cost reduction and greater network connectivity for subscribers who live in a certain region [22]. However, the wide range of splitter configuration and placement possibilities creates an extensive list of tree-topology types using single or multiple stages. The way the optical splitters are arranged in the network defines whether the system uses a centralized topology (with a single splitter stage in the outside plant) or a distributed (also known as cascaded) topology (with multiple splitter stages in the outside plant) [26,28]. In general, a centralized approach typically offers lower operational costs and is easier for technicians to access and control. On the other hand, a distributed topology brings faster return on investment, lower initial costs and lower fiber costs [28]. Choosing between them is therefore a typical trade-off between benefits and disadvantages.
It should also be highlighted that the filling rate of an OLT PON port has a direct impact on the number of OLT PON cards and the size of the OLT shelf. Consequently, it has a direct impact on the deployment cost. The way fiber distribution hubs (FDHs) are distributed and connected in the ODN (e.g., same areas or areas with different fiber service appetites) may influence how PON cards are gradually activated or utilized. In other words, PON card activations can be postponed or reduced depending on how FDHs are interconnected.
Although our proposal can be easily adapted for different scenarios, this paper focuses on fiber-to-the-home (FTTH) deployment planning using GPON system. For this purpose, the following topologies were considered (Sections 3.1-3.4): (1) Centralized Topology-Type 1 (CTT1), (2) Centralized Topology-Type 2 (CTT2), (3) Distributed Topology-Type 1 (DTT1) and (4) Distributed Topology-Type 2 (DTT2). It is worth mentioning that the specifics of these hypothetical topologies are considered in order to represent a real GPON deployment using the two aforementioned approaches for tree topologies (known as centralized and distributed topologies). These topologies are commonly used by ISPs and basic concepts are described in [28]. Figure 1 shows the physical topology diagram for: (a) CTT1, (b) CTT2, (c) DTT1 and (d) DTT2, and Table 2 presents the summary of the adopted topologies in terms of topology type approach, number of splitter stages, network split ratio and splitter's placement.

CTT1
This topology uses a single-stage splitter configuration with a split ratio of 1 × 64. This single-stage splitter is placed in an outside plant (OSP) telecommunications enclosure or cabinet. In this paper, we call this enclosure a fiber distribution hub (FDH). In comparison with the other topologies, due to the high split ratio of the access splitter (1 × 64), this topology has fewer passive network elements in the OSP. On the other hand, FDHs are generally installed further away from subscribers, leading to a significant increase in the length of drop cables. Figure 1a shows the physical diagram of the network when CTT1 is adopted. A feeder cable with a high fiber count feeds access splitters (with a split ratio of 1 × 64) placed at FDHs in the OSP, and these splitters serve subscribers.

CTT2
This approach uses a combined split ratio of 1 × 64, with two levels of splitters, these being 1 × 2 splitters placed inside the CO and 1 × 32 splitters placed in an OSP-FDH. This topology is also considered centralized because it still only has a single-stage splitter in the OSP. In comparison with the previous presented topology (Section 3.1), an increase of FDHs is expected in the external network (due to the lower capacity of access splitter) and a greater use of fibers in the feeder cable. However, on the other hand, the tendency is a reduction in the length of drop cables (used to serve subscribers) due to the increased number of access splitters in the OSP. Figure 1b shows the physical diagram of the network when CTT2 is adopted.

DTT1
This topology uses two levels of splitters in the OSP and has no splitter in the CO. The first level of splitting is used for distribution, has a split ratio of 1 × 4 and is installed in a fiber optic splice closure (FOSC) in the OSP. The second level of splitting is used for subscriber access, has a split ratio of 1 × 16 and is placed in FDHs. In relation to the above-mentioned topologies (Sections 3.1 and 3.2), a significant increase in the number of passive elements along the external network is expected, as is a reduction in the number of fibers used in the feeder cable. However, the tendency is a reduction in the length of drop cables used to serve subscribers since, in this topology, FDHs will be closer to subscribers. Figure 1c shows the physical diagram of the network when DTT1 is adopted.

DTT2
This last approach also uses two levels of splitters in the OSP and has no splitters inside the CO. The first splitter (distribution splitter) has a split ratio of 1 × 8 and is installed in a FOSC in the OSP, while the second level has further 1 × 8 splitters (access splitters) placed in FDHs. In this topology, access splitters are placed very close to subscribers, leading to a great reduction in drop cable length. On the other hand, among the topologies considered in this paper (Sections 3.1 and 3.3), this one has a higher number of passive network components in the OSP. Figure 1d shows the physical diagram of the network when DTT2 is adopted.
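As a quick cross-check of the four topologies described in this section, the sketch below multiplies the per-stage split ratios and estimates the ideal splitting loss, 10·log10(N) dB for a 1 × N splitter. The dictionary layout and function names are illustrative assumptions, and the ideal figure ignores excess loss, so vendor datasheet values will be somewhat higher.

```python
import math

# Per-stage split ratios for each topology, as described in the text.
# The dictionary layout and names are illustrative assumptions.
TOPOLOGIES = {
    "CTT1": [64],      # single 1x64 access splitter in the FDH
    "CTT2": [2, 32],   # 1x2 in the CO, 1x32 in the FDH
    "DTT1": [4, 16],   # 1x4 in a FOSC, 1x16 in the FDH
    "DTT2": [8, 8],    # 1x8 in a FOSC, 1x8 in the FDH
}


def combined_split_ratio(stages):
    """Number of subscribers one OLT PON port can reach through all stages."""
    return math.prod(stages)


def ideal_split_loss_db(stages):
    """Ideal (lossless-device) splitting loss: 10*log10(N) per 1xN stage."""
    return sum(10 * math.log10(n) for n in stages)


for name, stages in TOPOLOGIES.items():
    ratio = combined_split_ratio(stages)
    loss = ideal_split_loss_db(stages)
    print(f"{name}: 1 x {ratio}, ideal splitting loss {loss:.2f} dB")
```

All four topologies end up with the same network split ratio, so they differ in where the splitting happens rather than in how many subscribers share one PON port.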

Proposed Algorithm for PON Optimization
For the mathematical representation of the region in which a PON is planned and optimized (the region of interest), elementary concepts of graph theory were used. These concepts are discussed in detail in Section 4.1. Section 4.2 presents some basic aspects of genetic algorithms, and Section 4.3 presents our proposed GA.

Theory of Graphs
Graphs are an important mathematical tool and have been widely used to represent problems in the most diverse areas of knowledge [29]. They can model any network, such as a telecommunications network [29]. Figure 2 shows a simple, undirected graph, in which V represents vertices (or nodes) and E represents edges (or links). One of the simplest ways to represent a graph mathematically is through an adjacency matrix, which describes how the vertices of the graph are connected. In general, a graph with V vertices (nodes) can be represented using a matrix A of dimension V × V. For undirected graphs without edge weights, each element of A is defined by

$$a_{i,j} = \begin{cases} 1, & \text{if vertices } v_i \text{ and } v_j \text{ are adjacent;} \\ 0, & \text{otherwise.} \end{cases}$$
A binary matrix is therefore formed in this case. If the edges have associated weights, the entry $a_{i,j}$ holds, instead of 1, the numeric value associated with the edge. To illustrate, the graph shown in Figure 2 is represented by its adjacency matrix A (without weights).
It should be emphasized that the use of graphs as a tool for map representation is very common in the literature. Here, graphs are used to describe the PON design problem. In general, vertices represent street corners, edges represent streets and avenues, and the weight of each edge is the distance between its two vertices [20,21,27].
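The definitions above can be illustrated with a minimal sketch that builds both the binary and the weighted adjacency matrix of a small undirected graph. The four-corner street map and its distances are hypothetical (they are not the graph of Figure 2), and the function name is an assumption made for illustration.

```python
def adjacency_matrix(num_nodes, edges, weighted=False):
    """Build a symmetric adjacency matrix for an undirected graph.

    edges is an iterable of (i, j, weight) tuples with 0-based node indices.
    With weighted=False every edge is stored as 1 (binary matrix).
    """
    a = [[0] * num_nodes for _ in range(num_nodes)]
    for i, j, w in edges:
        value = w if weighted else 1
        a[i][j] = value
        a[j][i] = value  # undirected graph: the matrix is symmetric
    return a


# Hypothetical map with four street corners; weights are street lengths in meters.
EDGES = [(0, 1, 120.0), (1, 2, 80.0), (2, 3, 150.0), (0, 3, 95.0)]

A_binary = adjacency_matrix(4, EDGES)
A_weighted = adjacency_matrix(4, EDGES, weighted=True)

for row in A_binary:
    print(row)
```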

Genetic Algorithm
The first research on GAs was developed and published by John Holland [30]. Since then, genetic algorithms have been successfully applied to a wide range of optimization and machine learning problems, including the optimization of PONs planning [21].
Initially, when a GA is executed, P individuals are generated randomly and each individual represents a possible solution to the problem. Each individual is composed of chromosomes, which are represented by a set of genes. Next, this set of individuals (the population) is evaluated using a fitness function. The fittest individuals are selected from the current population, and unfit individuals are discarded. From this point on, the selected individuals have their genes modified through genetic operators known as crossover and mutation. The new generation of possible solutions is then used in the next iteration of the algorithm. This cycle is repeated until an appropriate solution is found or a previously defined number of iterations (generations) is reached. The main goal of this technique is to find the individual that best fits the environment during the evolution process [31].
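The generic cycle described above (random population, fitness evaluation, selection, crossover, mutation) can be sketched in a few lines. This is a minimal illustration on bit strings with a toy cost function, not the algorithm proposed in this paper; the function names, the uniform crossover and all parameter values are assumptions.

```python
import random


def run_ga(fitness, num_genes, pop_size=20, generations=50,
           selection_pct=0.5, mutation_rate=0.05, seed=0):
    """Minimize `fitness` over fixed-length bit strings with a basic GA."""
    rng = random.Random(seed)
    # Random initial population: each individual is one candidate solution.
    population = [[rng.randint(0, 1) for _ in range(num_genes)]
                  for _ in range(pop_size)]
    num_selected = max(2, int(pop_size * selection_pct))
    for _ in range(generations):
        # Evaluate and keep the fittest individuals (lowest cost first).
        population.sort(key=fitness)
        selected = population[:num_selected]
        children = []
        while len(selected) + len(children) < pop_size:
            p1, p2 = rng.sample(selected, 2)
            # Uniform crossover: each gene comes from one of the two parents.
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]
            # Mutation: flip each gene with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        # Selected individuals survive unchanged, so the best cost never worsens.
        population = selected + children
    return min(population, key=fitness)


def toy_cost(individual):
    """Toy fitness: the number of genes set to 1 (to be minimized)."""
    return sum(individual)


best = run_ga(toy_cost, num_genes=16)
print(best, toy_cost(best))
```

Keeping the selected individuals unchanged in the next population is one simple way to guarantee that the best solution found so far is never lost.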
One of the greatest advantages of GAs is the simplicity in the optimization problem formulation [32]. Usually, fixed-length bit strings are used as input for the algorithm, which perfectly adapts to problems involving graphs [33][34][35]. Another advantage is the fast convergence time, in comparison to polynomial algorithms, for problems that involve a large number of variables [36]. Although GA is an algorithm that usually presents robust results, as it is a meta-heuristic method, the solution can converge to a local minimum, presenting a false optimal response. However, this premature convergence can be minimized with an appropriate adjustment in the number of generations, population size and genetic operators [37,38]. Figure 3 shows the flowchart of our proposed genetic algorithm for optimization of PONs planning. The algorithm can be represented by the following steps:

Our Proposed Genetic Algorithm for Optimization of PONs Planning
(1) Import the georeferencing data obtained from the map: In this step, the georeferenced data of the region of interest are imported into Matlab®. OSM software is used to obtain the region's raw data. The region of interest is selected and, in the file exported by OSM, the intersections between streets (corners) are identified by their global positioning system (GPS) coordinates, as are the links among those intersections. These raw data (.xml file) are then imported into Matlab® and a georeferenced graph of the region of interest is created. The weight of each edge in the graph's adjacency matrix is the calculated distance (in meters) between the corresponding network nodes. Figure 4 shows a graphical example of importing georeferenced data into Matlab®, considering: (a) example of a region of interest (Costa Azul neighborhood, Salvador city, Brazil) with overlapping graph; (b) example of a region of interest to be imported into Matlab®; and (c) network graph, containing 108 nodes, imported into Matlab®.
(2) Set initial parameters: In this step, the initial parameters must be defined. These parameters are:
(2.1) Number of generations: Defines how many iterations the GA will run;
(2.2) Population size: Defines the number of individuals in each GA generation;
(2.3) Topology type: Defines the topology (one of the four possible topologies described in Section 3);
(2.4) Initial state: Defines the position of each potential subscriber in the respective scenario. Subscriber positions are represented by an N × M matrix, where N is 1 and M is the map graph dimension. Each value of the matrix is the number of subscribers at that specific node. In this paper, we allowed only two, four or seven subscribers as possible values. In this stage, we also set the location of the CO using another N × M matrix, in which the node position holding the CO is set to 1 while the other positions are set to 0. Figure 5 shows examples of initial states used as input to our simulations (as can be seen in Section 6). Green, yellow and red nodes represent, respectively, two, four and seven subscribers at the node;
(2.5) Selection percentage: Defines the share of the fittest individuals (in %) that will participate in the subsequent steps (the genetic operators, crossover and mutation) in each generation.
(3) Calculate the minimum graph distance matrix: Using the data obtained in step 1, the algorithm calculates the graph's minimum distance matrix using the well-known Floyd-Warshall algorithm [39]. This algorithm was chosen for its good performance and the simplicity of its implementation. The minimum distance matrix is M × M (in which M is the graph dimension) and contains the minimum distances (in meters) among all graph nodes.
(4) Generate the initial population randomly with P individuals: In this step, a set of P random individuals is generated, allowing a wide range of possible initial solutions. Each individual is represented by a row in the population matrix. This matrix is P × M, in which P is the population size and M is the graph dimension. In addition, each gene (of an individual) represents one possible node position for optical splitters in the network graph. Thus, each individual represents a possible solution to the problem. It is worth mentioning that topologies with two splitting stages are represented by two population matrices (one for each splitting level). The matrices $A_1$ and $A_2$ (Equations (3) and (4)) represent a random initial population for a hypothetical scenario of a PON using two splitter stages. Figure 6 graphically represents the first individual of both matrices.
Figure 6. Graphical example of the first individuals of the initial population matrices (with two optical splitter levels) presented in Equations (3) and (4). Each pair of individuals in the matrices represents a different solution to the problem.
(5) Evaluate the fitness function: Figure 7 shows the flowchart, and the following steps describe the fitness function in detail:
(5.2) Find the shortest path between subscribers and splitters (respecting constraints) and discard unused splitters: Using matrix operations, the algorithm finds the shortest path between all subscribers and splitters, respecting the following constraints: (i) the last-mile connection always uses a drop cable and must respect the maximum limit of 400 m; (ii) the number of subscribers served by an optical splitter cannot be greater than the number of available ports; (iii) every subscriber must be connected to an optical splitter. Splitters that have no assigned subscribers are discarded (the gene value in the corresponding position of the matrix is set to 0). Individuals that do not meet the restrictions have their costs set to infinity and, consequently, are discarded in the next generations;
(5.3) Find the shortest path between 2nd level splitters and 1st level splitters (respecting constraints) and discard unused splitters: Similarly to step (5.2), using matrix operations, the algorithm calculates the shortest path between all 2nd and 1st level splitters (if a multiple-stage topology is adopted). Connections must respect the available ports of the first level splitters. Empty splitters are discarded;
(5.4) Find the shortest path between 1st level splitters and the CO (and check the cabling reuse possibility): Again, using matrix operations, the algorithm calculates the shortest path between all 1st level splitters and the CO. Based on the minimum distances found, the algorithm checks the possibility of creating a shared feeder cable route among 1st level splitters;
(5.5) Calculate the link budget for each individual and discard non-viable solutions: In this step, the optical power budget of each individual is also calculated. To calculate the optical attenuation, typical attenuation values defined in ITU-T Recommendation G.671 [40], which characterizes optical components, were used.
To complement and to increase the accuracy of the calculation, we also adopted a vendor (Furukawa) real typical values of attenuation for passive (Table 3) and active network elements (Table 4). It is worth mentioning that, for optical link budget calculation, the lowest value of launched power was considered (worst case). In the proposed algorithm, the optical attenuation was calculated only for the 1310 nm wavelength (used for GPON upstream). This is due to the greater optical losses in this wavelength and, therefore, it can define the condition of the network functioning properly. The received power and total attenuation can be calculated for each optical link by and in which P r represents the received power (in dBm), P tx the transmitted power (in dBm), ∑ P at the sum of link attenuation (in dB), ∑ α c the total loss of each optical connector (in dB), ∑ α s the total loss of each splice (in dB), ∑ α sp the total loss of optical splitters (in dB), ∑(α f ,λ × L) the fiber loss (in dB/km) for the wavelength λ and L the optical link length (in km). In this paper, we only considered power budget in our proposal. Crosstalk analysis between different wavelength channels and the impact of the transmission impairments (chromatic dispersion, laser phase noise, fiber nonlinearity and limitations due to Kerr nonlinearity) are part of proposals for future works; (5.6) Calculate the network total cost for each individual: using the imported cost table, the shortest distances already stored and knowing the positions of all optical splitters, the network cost (in R$) is calculated (for each individual). (6) Select the most fit individuals: In this step, individuals are organized in ascending order according to each value of total deployment cost obtained by using the fitness function (step 5). Part of the most fit individuals are copied and maintained (to guarantee not worse solutions). 
Then, the fittest individuals (best solutions) are selected to participate in the crossover and mutation procedures (step 7). The remaining individuals are discarded (they do not participate in the next generations).
(7) Crossover and mutation procedures: The genetic operators are applied to the individuals selected in the previous step (step 6). The crossover function randomly combines characteristics (genes) of two individuals (among those selected). In the crossover operation, genes that are not drawn inherit the values of the first selected individual. The mutation function arbitrarily alters one or more characteristics (genes) of the selected individual. Figure 8 shows a graphical example of the crossover and mutation operators applied to hypothetical individuals. For better results, a generation index and a method to monitor the remaining number of generations were developed. In this way, as the generations advance, the system reduces the probability that the genetic operators make too many changes to the genes of the selected individuals. This adjustment gives a greater chance of fine-tuning when individuals have an advanced generation index. It is worth mentioning that all probability parameters were set empirically.
(8) New population: In this step, the new population is stored. It is formed by part of the fittest individuals kept in step 6 (so that the best individuals survive if the new solutions are worse) and by the new individuals produced by the crossover and mutation operators. With the new population, the algorithm performs G iterations (repeating the previous steps 1 to 7) until the predefined number of generations is reached. When G reaches $G_{Max}$, the algorithm recalculates the network cost for the last stored population, and the best individual is taken as the final solution to the proposed problem.
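Steps 6 to 8 can be illustrated with a minimal Python sketch. It is not the authors' Matlab implementation: the binary gene encoding, the linear decay of the mutation probability, the parameter values and the toy cost function (`toy_cost`) are all assumptions chosen only to make the example self-contained. It shows elitism, a crossover in which genes that are not drawn inherit from the first parent, and a mutation probability that shrinks as the generation index advances.

```python
import random

def crossover(parent_a, parent_b, rng, p_draw=0.5):
    # Genes that are "drawn" come from parent_b; all others inherit from parent_a.
    return [b if rng.random() < p_draw else a for a, b in zip(parent_a, parent_b)]

def mutate(individual, rng, p_mut):
    # Flip each binary gene independently with probability p_mut.
    return [1 - g if rng.random() < p_mut else g for g in individual]

def evolve(fitness, n_genes, pop_size=40, n_elite=4, n_parents=20,
           g_max=200, p_mut_start=0.10, p_mut_end=0.01, seed=0):
    # All default values are illustrative assumptions, not taken from the paper.
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    history = []  # best cost seen at the start of each generation
    for g in range(g_max):
        ranked = sorted(population, key=fitness)        # ascending cost (step 6)
        history.append(fitness(ranked[0]))
        elite = [ind[:] for ind in ranked[:n_elite]]    # kept unchanged
        parents = ranked[:n_parents]                    # the rest are discarded
        # Assumed linear decay: fewer gene changes as generations advance.
        frac = g / max(1, g_max - 1)
        p_mut = p_mut_start + (p_mut_end - p_mut_start) * frac
        children = []
        while len(elite) + len(children) < pop_size:    # step 7
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng, p_mut))
        population = elite + children                   # step 8
    best = min(population, key=fitness)
    return best, fitness(best), history

def toy_cost(individual):
    # Hypothetical cost: one unit per placed splitter; layouts with fewer than
    # three splitters are treated as infeasible and given infinite cost.
    placed = sum(individual)
    return float("inf") if placed < 3 else float(placed)

best, best_cost, history = evolve(toy_cost, n_genes=12)
```

Because the elite individuals are copied unchanged into every new population, the best cost recorded in `history` can never increase from one generation to the next, which is the purpose of keeping part of the fittest individuals.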

Simulation Setup
In our simulation scenarios, we considered two Brazilian regions, which we call Region I and Region II. For both regions, we assume an ISP willing to deploy a new GPON network. The operation of GPON technology is specified in the ITU-T G.984.x series of standards, and its main characteristics are described in Table 5 (together with the values adopted in all simulations carried out in this work). The first region is the central part of the Pituba neighborhood (Region I), located in the south of the city of Salvador, Bahia, Brazil. This region has an area of approximately 1.85 km² and, after the import process, is represented by a graph of 314 nodes. It was chosen because of the medium-large size of the neighborhood. The second region is significantly larger and has a larger number of nodes. It is the central part of Camaçari, another city in the state of Bahia, Brazil (Region II). This second region has an area of approximately 5.7 km² and, after the import process, is represented by a graph of 714 nodes. It was chosen mainly because of the large size of the resulting graph and because it represents a large non-capital city in Brazil.
Both regions (Region I and Region II) were tested in two different scenarios, called the non-dense scenario (NDS) and the dense scenario (DS). In the NDS, subscribers are dispersed across the region of interest. The DS has a considerably higher number of subscribers, located very close to each other. Figure 5 shows the initial state of the network for the non-dense and dense simulation scenarios, considering a network with 314 nodes (Region I) and 714 nodes (Region II). It is worth mentioning that the subscribers were distributed over the graph at random. In the non-dense scenario, nodes with two subscribers (green nodes) predominate. In the dense scenario, there is a higher incidence of nodes with seven subscribers (red nodes).
All the simulations in this paper were run on an Intel® Core i5 1.8 GHz processor with 8 GB of RAM and Windows 8®. The PON planning optimization was based on a GA technique (described in detail in Section 4.3) and developed in Matlab®. Table 6 describes the initial network characteristics for Region I and Region II (in the non-dense and dense scenarios).

Results
The results are discussed in terms of the genetic algorithm solutions and their evolution, the optical power link budget of the network and a comparison of the main information generated by the genetic algorithm. For centralized topologies, the green line represents the route of the feeder optical cable and the red diamond represents access splitters. For distributed topologies, the orange line represents the distribution cable, the red diamond represents first level splitters (distribution) and the black star represents second level splitters (access). Figure 5a,b (Figure 5c,d) show, respectively, the initial state of the network for the non-dense and dense scenarios, considering Region I with 314 nodes (and Region II with 714 nodes). This initial state is used as the algorithm's input to find a solution for each of the four topologies considered in this paper (described in detail in Section 3).
For the non-dense scenario, a density of approximately 88 subscribers per km² was obtained for Region I (76 subscribers per km² for Region II). The number of generations was set, empirically, at 750; with higher values, no major changes in the final total network cost were observed. For the dense scenario, nodes with seven subscribers were predominant in the initial state. A density of approximately 443 subscribers per km² was obtained for Region I (270.87 subscribers per km² for Region II). Again, it should be noted that subscribers were randomly distributed in the graph.
In the plotted GA solutions, the first level splitters of CTT2 are not represented (because they are placed at the CO). Although it is included in the total cost, the last-mile cable is not drawn, to avoid cluttering the figures. Figure 10a,b and Figure 10c,d show the GA evolution for each topology in the non-dense and dense scenarios, for Region I and Region II, respectively. As can be observed, there is a substantial reduction in the total network cost between the initial populations and the last ones. In addition, only negligible reductions are achieved if more generations are used. The stopping criteria of 750 generations for Region I NDS and 1000 generations for Region I DS were set empirically.
In the link budget plots, each bar represents the received optical power (in the upstream transmission) of each link between the OLT and the access splitters proposed for the network. We chose not to show each ONT, to avoid cluttering the figures. However, each link calculation considered the longest drop cable connected to each access splitter (the furthest subscriber, which is the worst case). The x axis represents, sequentially, the position (node) of each access splitter on the graph. Due to the high number of access splitters and the reduced scale, the node labels are not shown.
The red and green dotted lines show, respectively, the receiver sensitivity (threshold) in upstream transmission and the receiver sensitivity with an additional safety margin of 3 dB. This safety margin allows for unexpected losses and ensures that the performance criteria are met. Table 7 and Table 8 compare the main information generated by the genetic algorithm for the four possible topologies, considering the non-dense and dense scenarios, for Region I and Region II, respectively. It is worth mentioning that, in the maps plotted by the GA, the identification number of each node was hidden to improve the visualization of the graphical results (due to the reduced scale).
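The comparison of the received power against the sensitivity threshold plus the 3 dB safety margin can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and the numerical values (launch power, receiver sensitivity, per-element losses, link length) are placeholders rather than the values of Tables 3 to 5.

```python
def total_attenuation_db(n_connectors, n_splices, splitter_losses_db, length_km,
                         connector_loss_db, splice_loss_db, fiber_loss_db_per_km):
    # Sum of connector, splice, splitter and fiber losses along one link (dB).
    return (n_connectors * connector_loss_db
            + n_splices * splice_loss_db
            + sum(splitter_losses_db)
            + fiber_loss_db_per_km * length_km)

def received_power_dbm(p_tx_dbm, attenuation_db):
    # Received power is the launched power minus the total link attenuation.
    return p_tx_dbm - attenuation_db

def link_is_viable(p_rx_dbm, sensitivity_dbm, margin_db=3.0):
    # The link passes if it stays above the sensitivity plus the safety margin.
    return p_rx_dbm >= sensitivity_dbm + margin_db

# Placeholder values for one upstream link (assumptions, not the paper's data).
attenuation = total_attenuation_db(
    n_connectors=4, n_splices=6,
    splitter_losses_db=[10.5, 7.3],   # two splitting stages
    length_km=4.0,                    # includes the longest drop cable (worst case)
    connector_loss_db=0.5, splice_loss_db=0.1, fiber_loss_db_per_km=0.35,
)
p_rx = received_power_dbm(p_tx_dbm=0.5, attenuation_db=attenuation)
viable = link_is_viable(p_rx, sensitivity_dbm=-28.0)
```

Using the lowest launched power and the longest drop cable for each access splitter, as done in the paper, makes each check a worst-case test: if that link passes, shorter drops on the same splitter pass as well.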

Conclusions
We proposed a new GA-based approach for the optimization of PON planning. Tests on a large scenario demonstrated the potential of the tool. In addition, the results obtained for the proposed scenarios show that the topology adopted in each project has a direct influence on the total cost of network deployment. The distributed topology type 1 (DTT1) presented the lowest deployment cost, followed closely by the distributed topology type 2 (DTT2). This is explained by the larger number of access splitters used in the network, which reduces the amount of last-mile cable (a considerable part of the total deployment cost). However, large-capacity splitters are expensive and need a large concentration of subscribers to spread their cost. Therefore, in scenarios with few subscribers, access splitters with smaller capacities tend to have lower related costs, which makes distributed solutions better suited to these situations. There is thus a clear relationship between subscriber locations, splitters, PON ports and optical cables. Regarding the optical power link budget, all the solutions found by the GA are above the threshold, which indicates adequate operation of the proposed GPON system.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.