A Multiobjective Land Use Design Framework with Geo-Big Data for Station-Level Transit-Oriented Development Planning

ydwang@whu.edu.cn; Tel.: +86-27-6877-8969

Abstract: Transit-oriented development (TOD) is among the most feasible strategies for relieving urban issues caused by the unbalanced development of transportation and land use. This study proposes a multiobjective TOD land use design framework for optimizing the land use layout in station catchments. Given the high density and diverse development in Chinese megacities, a planning model that considers nonlinear impacts on ridership, land use efficiency, quality of life, and the environment is constructed. The model applies fine-grained geo-big data to fill gaps in the empirical and statistical data and improve practicability. A genetic multiobjective optimization approach that does not rely on objective weighting is used to generate alternative land use schemes. A metro station in Shanghai is used as a case study. The results indicate that the proposed ridership objective outperforms the commonly used linear function, and that the optimization method achieves better extreme optima and convergence than the baseline models. We also discuss the consistencies and conflicts among the objectives and provide a balanced land use scheme that considers local policies. This work provides suggestions for sustainable urban design with coordinated land use and transportation.


Introduction
Urban mass rail transit (MRT) systems provide residents with an effective means of travel and promote a shift from private vehicles to low-carbon public transport [1,2]. Nonetheless, cities face problems such as the inappropriate distribution of facilities and wasted social resources caused by the unbalanced development of land use and MRT. Therefore, comprehensive strategies that integrate MRT and the related land use are critical to sustainable urban planning [3]. Transit-oriented development (TOD) theory is considered one of the most feasible theories for integrating land use and transportation [4,5]. TOD planning can correct the imbalance between transportation and land use and offers a sustainable approach to urban development.
Numerous recent studies have focused on TOD planning and generated various effective strategies and methods [6][7][8]. In particular, TOD planning with mathematical models is prominent in generating sustainable land use plans and enabling intuitive evaluation [9,10]. An important branch of quantitative TOD planning is multiobjective optimization, which generates alternative land use schemes according to objectives derived from TOD principles and strategies [11,12]. However, mathematical TOD models for China remain limited; one noteworthy example is the model proposed by Li et al. for designing the land use layout, in terms of type and density, around a metro station in China [13][14][15]. Existing TOD planning models have three limitations. First, the parameters of the objectives mainly comprise coarse-grained statistical data and empirical indicators, which limit the expressive ability of the objective functions. Second, most objectives have been constructed with simple linear functions, which oversimplify the complicated relationship between the objectives and land use and thereby reduce model reliability. Finally, although previous studies have solved multiobjective problems by weighting the objectives, determining the relative weights is challenging because of uncertainty, and few studies have applied genuine multiobjective optimization to TOD planning. Fortunately, the emergence of urban geo-big data and advances in machine learning models offer excellent potential for the refined construction and optimization of planning objectives.
ISPRS Int. J. Geo-Inf. 2022, 11, 364
This paper presents a multiobjective TOD land use design framework for generating alternative land use schemes in the Chinese context; the framework consists of a TOD planning model and a multiobjective optimization approach for land use design. Traditional TOD planning models typically consider several objectives, including rail transit ridership, compactness, land use mix, land use conflicts, and environmental effects [12][13][14]. In this work, built environment elements are added to the ridership objective, and land use conflict is replaced by a more specific function of land use density. In addition, an objective of total distance to the station is designed to favor travel by public transit. The proposed method makes several contributions. (1) We apply geo-big data to describe the high-density and high-diversity development in the Chinese context, providing abundant information for constructing the objective functions and validating the method. (2) We explore the nonlinear impact of land use on ridership and represent the ridership objective with an ensemble learning method, so that the ridership after optimization approaches the model's expectation and the planning model becomes more practicable. (3) The method generates alternative land use schemes via a multiobjective genetic algorithm with a purpose-designed mixed coding, which removes the need for objective weighting and, through its reference point structure and normalization process, provides further support for decision-makers. A case study of a metro station in Shanghai demonstrates the feasibility of the proposed method. The ridership objective function is compared with a linear function to show its superiority. Compared with two commonly used genetic algorithms, the employed optimization method reliably achieves good optima and convergence.
In addition, an alternative land use layout obtained by the planning method appears to be balanced and consistent with TOD strategies.
The remainder of the paper is organized as follows. Section 2 reviews the related literature on TOD planning. Section 3 describes the research problem in detail and demonstrates the proposed TOD land use planning method. Section 4 introduces a case study in Shanghai and presents land use schemes to verify the effectiveness of the method. Section 5 discusses the relationship between the TOD planning objectives and provides several policy implications. Finally, the conclusions and limitations of the current work are discussed in Section 6.

Related Work
TOD was first introduced for designing mixed-use communities with high density and diversity that encourage residents to live near a transit station and reduce their dependence on private vehicles [16]. Cervero [17] proposed the basic "3D" principles of "density, diversity and design" for TOD planning and argued that land use with high density and diversity and a pedestrian-oriented design would encourage non-auto travel [18]. Since then, the principles and strategies of TOD have attracted numerous scholars and have grown in complexity to meet the requirements of urban sustainability [3,6,[19][20][21]. As a green, high-capacity travel mode, MRT is an appropriate research object for TOD planning, and building stable MRT systems based on TOD strategies is reasonable.
Although the principles and strategies of TOD have been comprehensively identified, gaps remain in understanding how to transform TOD theory into a detailed analytical model for sustainable urban planning. The land use design problem (LDP) refers to a mathematical programming model for optimizing the locations, types, and sizes of urban land use [22]. Because of the tight relationship between transportation systems and land use, recent studies have increasingly focused on planning the integration of land use and transit design [23][24][25]. In this context, TOD strategies have been applied to land use design. Lin & Gau [11] proposed a TOD model to increase MRT ridership, improve living and environmental quality, and ensure social equity, but only commercial and residential land use was involved. As an extension, Lin & Li [12] developed a city-region-level model that allocated recreational land use and added the objective of maximizing access to nonresidential activities. Li et al. [13] first established a TOD model for China based on the characteristics of Chinese megacities to maintain the sustainable development of the MRT system in China. On this basis, Ma et al. [14] considered the contribution of transportation to station-level planning and added local accessibility to the objectives. Sahu [9] introduced a TOD planning model for modifying land use in the city of Naya Raipur based on the global TOD parameters of density, diversity, and distance to transit.
A notable limitation of previous TOD planning studies is that they used only statistical and empirical data, which are insufficient for refined planning and yield impractical results for specific planning tasks. Recently, geo-big data such as points of interest, mobile sensor data, and smart card data, which contain abundant information about human mobility and urban spatial structure, have been widely applied to urban planning [26,27]. Geo-big data provide new opportunities to analyze TOD scenarios quantitatively and can accurately describe real-time urban spatial structures for TOD planning [28]. In addition, ridership indicates the usage of transit services and the mobility triggered by land use, making it one of the most important indicators in TOD. Previous TOD planning models have generally estimated ridership with statistical trip generation/attraction ratios and linear (or log-linear) functions. However, several studies have shown that the impact of land use on ridership is nonlinear and that ridership cannot be entirely explained by land use [29]. Therefore, a more comprehensive nonlinear model with multiple factors is necessary for constructing the ridership objective in the TOD planning model.
Even when a TOD model is constructed, optimization is challenging because the model involves multiple conflicting objectives, a vast search space, and a series of constraints. The genetic algorithm (GA) has been shown to be appropriate for such large-scale problems because it is adept at searching vast solution spaces automatically [30]. Most importantly, GAs have already been adapted to multiobjective problems [31,32]. Consequently, various GAs have been applied to optimize LDP and TOD models [9,33,34]. For multiobjective optimization, most TOD planning studies design a comprehensive weighted function, $F = \sum_{k=1}^{\mathrm{obj}} w_k F_k$, to simplify the optimization [13,14,35]. However, the weights of the objectives are defined subjectively, and different objectives have different dimensions, which makes such models prone to bias and sensitive to outliers.
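The dimension problem can be made concrete with a small, purely illustrative calculation. The sketch below uses made-up objective values for three hypothetical layouts and equal weights. With raw values, the ridership term (tens of thousands of passengers) decides the ranking on its own; rescaling the same objectives to [0, 1] changes which layout wins, although nothing about the layouts has changed.

```python
# Hypothetical objective values for three candidate layouts:
# (daily MRT ridership in passengers, land use mix entropy in [0, 1]); both maximized.
layouts = {
    "A": (52000.0, 0.35),
    "B": (50000.0, 0.90),
    "C": (51000.0, 0.80),
}


def weighted_sum(objs, weights):
    """Comprehensive function F = sum_k w_k * F_k."""
    return sum(w * f for w, f in zip(weights, objs))


def min_max_normalize(rows):
    """Rescale each objective column to [0, 1] across the candidate set."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]


names = list(layouts)
weights = (0.5, 0.5)  # "equal" weights, chosen subjectively

# Raw objectives: ridership (tens of thousands) swamps the entropy term (below 1).
best_raw = max(names, key=lambda n: weighted_sum(layouts[n], weights))

# Normalized objectives: the same weights now favor a different layout.
normalized = dict(zip(names, min_max_normalize([layouts[n] for n in names])))
best_normalized = max(names, key=lambda n: weighted_sum(normalized[n], weights))

print(best_raw, best_normalized)  # A C
```

Normalization removes the unit effect but not the subjectivity of the weights, which is one motivation for a weight-free Pareto approach.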
In summary, although land use and TOD planning models have been constructed from comprehensive perspectives, station-level mathematical TOD models can still be improved in terms of data and function representation. An improved TOD land use planning method is still needed for sustainable urban design that balances the development of the MRT system and land use in the Chinese context.

Problem Statement
This paper presents a TOD planning method, based on strategies appropriate to the Chinese context, for solving the land use design problem. As depicted in Figure 1, with the MRT station as the central point, the land use design problem aims to determine the type, density, and location of the surrounding undeveloped land use cells and to obtain an optimal land use layout based on objectives relevant to TOD. Finally, alternative land use sketch maps are generated to display the land use layouts [35,36]. Following [14], seven types of land use are considered in this work: public (e.g., school, government agency, hospital), industry (e.g., factory, warehouse), commercial (e.g., shopping mall, restaurant), economic (e.g., office building, corporate business, finance organization), residential (e.g., apartment, villa), road, and water. Only the first five types are allocated in the land use design because of the high cost of altering roads and water.
In general, this study proposes a multiobjective TOD land use design framework. As shown in Figure 2, the method comprises two parts: a TOD planning model and a multiobjective optimization approach for land use design in a station catchment area. Using both emerging geo-big data and traditional statistical and empirical data, a TOD planning model with linear and nonlinear objectives is constructed. Then, a robust heuristic algorithm is applied to generate alternative land use layouts. The details are introduced in the next sections.

TOD Planning Model
A dependable TOD planning model needs to consider numerous factors, including ridership, land value, land use efficiency, environmental effects, quality of life, and social equality [11,37]. Nonetheless, different contextual characteristics lead to different applications of TOD strategies [3]. In contrast to most American cities, which are characterized by low density, car orientation, and multicenter layouts [38], most Chinese cities are characterized by high density and high transit ridership. To channel megacity growth into MRT corridors, TOD in China should include high-density development, mixed land use, and pedestrian-friendly environments consistent with the features of Chinese cities. Based on the Chinese context and the concept of TOD, six objectives adapted from previous studies [11][12][13][14][15] are designed to promote the sustainable development of MRT and related land use: ridership, compactness, land use conflict, land use mix, environmental effects, and destination accessibility. Moreover, a set of constraints is included in the model.

Parameter Definition
The proposed TOD planning model contains the following parameters.
$I$: set of land use cells; cell $i \in I$.
$K$: set of land use types; type $k \in K$.
$BE$: built environment variables in the station area.
$T$: transit service variables in the station area.
$G$: demographic variables in the station area.
$N_i$: cells among the eight neighbors of cell $i$.
$c_{kl}$: conflict degree between adjacent cells of types $k$ and $l$.
$Side$: side length of the cells.
$Area$: area of the cells.
$N_{cell}$: number of cells.
$P_k^r$: $r$-type pollutants generated by unit density in a $k$-type land use cell.
$cost_r$: cost of treating $r$-type pollutants.
$L_i$: distance from cell $i$ to the metro station along the road.
$Attr_k$: passenger attraction of a unit $k$-type cell.
$lowerEII_k$ / $upperEII_k$: lower/upper bound, respectively, of the exploitative intensity index of each type of land use in each cell.
$lowerRatio_k$ / $upperRatio_k$: lower/upper bound, respectively, of the total percentage of each type of land use in the overall layout.

Decision Variables
Two decision variables are used to allocate land use type and density.
$X_{ik}$: binary variable; $X_{ik} = 1$ if cell $i$ is assigned to type $k$, otherwise $X_{ik} = 0$. It describes the land use type of each cell with one-hot encoding.
$D_{ik}$: $k$-type land use density of cell $i$; the unit of land use density is 100 m²/cell, and the density of each cell is a positive integer.
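As a minimal sketch of the two decision variables, assuming the five designable types are coded 0 to 4 in the order listed in the problem statement (a hypothetical mapping), the helpers below build the one-hot matrix $X$ and the density matrix $D$ from per-cell type codes and integer densities. The function and constant names are illustrative only.

```python
from typing import List

# Hypothetical code order for the five designable land use types.
LAND_USE_TYPES = ["public", "industry", "commercial", "economic", "residential"]


def one_hot_types(type_codes: List[int], n_types: int = len(LAND_USE_TYPES)) -> List[List[int]]:
    """Build X[i][k] = 1 if cell i is assigned type k, else 0."""
    X = []
    for code in type_codes:
        if not 0 <= code < n_types:
            raise ValueError(f"invalid land use type code: {code}")
        X.append([1 if k == code else 0 for k in range(n_types)])
    return X


def density_matrix(type_codes: List[int], densities: List[int],
                   n_types: int = len(LAND_USE_TYPES)) -> List[List[int]]:
    """Build D[i][k]: the density of cell i in its assigned type column, 0 elsewhere."""
    D = []
    for code, d in zip(type_codes, densities):
        if not isinstance(d, int) or d <= 0:
            raise ValueError("density must be a positive integer (units of 100 m^2 per cell)")
        D.append([d if k == code else 0 for k in range(n_types)])
    return D


types = [4, 2, 0]   # residential, commercial, public
dens = [35, 20, 12]
X = one_hot_types(types)
D = density_matrix(types, dens)
```

Storing one type code per cell guarantees that each row of $X$ sums to 1, so the one-hot encoding and the density matrix stay consistent by construction.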

MRT ridership
The first objective aims to maximize MRT ridership, which increases the utilization efficiency and profit of the MRT and supports economic sustainability. Before the TOD objectives are constructed, a model is required to capture the nonlinear relationship between land use and ridership. Tree ensemble models such as gradient boosting regression trees (GBRT) have been applied to describe ridership [39,40]. In this study, extreme gradient boosting (XGBoost) [41], one of the best-performing tree ensemble methods, is used for nonlinear modeling. Compared with GBRT, XGBoost has two main advantages: it approximates the cost function with a second-order Taylor expansion, which yields high model accuracy, and it adds penalty terms to the cost function, which improves generalization. XGBoost consists of numerous classification and regression trees (CARTs), and its prediction is the sum of the continuous scores in the leaves of the CARTs. The prediction function is as follows:
where $\phi(k)$ and $f_k(\cdot)$ are the learning rate and function of the $k$-th CART, respectively, and $x_i$ is the input feature vector of the $i$-th sample. Once trained on the selected indicators, the model can simulate the impact of multiple factors on MRT ridership, as shown in Equation (2).
where D, BE, T, G are the indicators of land use, built environment, transit service, and demographics, respectively [39]. The detailed data and indicators are presented in Section 4.2.
Because this work designs land use, only the land use factors are changeable, and the other factors are treated as fixed control variables. Under these circumstances, the ridership maximization objective is given by Equation (3):
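To illustrate what Equation (3) evaluates, the sketch below replaces the trained XGBoost model with a hypothetical ensemble of three depth-1 trees whose splits and leaf scores are invented. The prediction is a base score plus the learning-rate-scaled leaf scores, and the objective merges fixed control variables (here a made-up `bus_lines` transit indicator) with the land use indicators being optimized. In practice, the `predict` method of a fitted `xgboost.XGBRegressor` would take the place of `ensemble_predict`; all feature names and values are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Stump:
    """A depth-1 regression tree: one split on one feature, a score in each leaf."""
    feature: str
    threshold: float
    left_score: float   # leaf score when x[feature] <= threshold
    right_score: float  # leaf score when x[feature] > threshold

    def predict(self, x: Dict[str, float]) -> float:
        return self.left_score if x[self.feature] <= self.threshold else self.right_score


def ensemble_predict(trees: List[Stump], rates: List[float],
                     x: Dict[str, float], base: float = 0.0) -> float:
    """Additive prediction: base score plus the learning-rate-scaled leaf score of each tree."""
    return base + sum(rate * tree.predict(x) for tree, rate in zip(trees, rates))


# Hypothetical "trained" ensemble; real trees would come from fitting XGBoost.
TREES = [
    Stump("residential_density", 30.0, 200.0, 800.0),
    Stump("commercial_density", 10.0, 100.0, 600.0),
    Stump("bus_lines", 5.0, 50.0, 300.0),  # transit service variable, held fixed
]
RATES = [1.0, 1.0, 1.0]


def ridership_objective(land_use: Dict[str, float], fixed: Dict[str, float]) -> float:
    """F1: land use indicators vary; built environment, transit, demographics stay fixed."""
    x = {**fixed, **land_use}
    return ensemble_predict(TREES, RATES, x, base=1000.0)


fixed_controls = {"bus_lines": 8.0}
low = ridership_objective({"residential_density": 20.0, "commercial_density": 5.0}, fixed_controls)
high = ridership_objective({"residential_density": 40.0, "commercial_density": 15.0}, fixed_controls)
print(low, high)  # 1600.0 2700.0
```

Because each tree contributes a step function, pushing residential density past the 30 threshold adds ridership in one jump rather than linearly, which is the kind of nonlinear response a linear ridership function cannot represent.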

Compactness
Compactness is a key concern in urban planning and is significant for building resource-saving and environmentally friendly cities [42,43]. The second objective aims to maximize compactness among land use cells, which improves land utilization efficiency and living convenience. There is no unified measure of compactness owing to its diverse definitions and conceptual frameworks [44][45][46]. Following previous TOD planning applications, this study measures compactness as the number of neighboring cells with the same land use type [9,15]. For example, the compactness of cell $i$ with $k$-type land use is the number of cells assigned to $k$-type land use among the eight cells surrounding $i$. The objective is formulated as follows:
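A minimal sketch of the compactness count on a square grid, assuming each cell stores an integer type code, or `None` for a cell excluded from the count (a hypothetical convention). Each same-type neighbor relation is counted from both cells, so two adjacent same-type cells contribute 2; whether the original formulation counts pairs once or twice is not stated here.

```python
from typing import Iterator, List, Optional, Tuple


def neighbors8(r: int, c: int, n_rows: int, n_cols: int) -> Iterator[Tuple[int, int]]:
    """Yield the coordinates of the (up to) eight cells surrounding (r, c)."""
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr or dc) and 0 <= r + dr < n_rows and 0 <= c + dc < n_cols:
                yield r + dr, c + dc


def compactness(grid: List[List[Optional[int]]]) -> int:
    """Sum over typed cells of the number of 8-neighbors sharing the cell's type."""
    n_rows, n_cols = len(grid), len(grid[0])
    total = 0
    for r in range(n_rows):
        for c in range(n_cols):
            k = grid[r][c]
            if k is None:  # excluded cell (hypothetical convention)
                continue
            total += sum(1 for rr, cc in neighbors8(r, c, n_rows, n_cols)
                         if grid[rr][cc] == k)
    return total


grid = [
    [4, 4, 2],
    [4, 4, 2],
    [0, 2, 2],
]
print(compactness(grid))  # 20
```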

Land use conflict
Conflicts commonly arise between adjacent land use cells; for example, industrial land may generate noise and pollution that negatively affect people living on nearby residential land. The third objective aims to minimize the conflict between adjacent land parcels of different types to improve the quality of life of residents around the station. Because the conflict between adjacent cells is difficult to describe quantitatively, an empirical indicator from previous studies, the conflict degree $c_{kl}$, is employed. In addition, considering that regions of different density result in varying conflicts, this study introduces the identification of "adjacency" into the function [47]. The objective is formulated as follows:
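The sketch below sums the conflict degree over every pair of 8-adjacent cells, counting each pair once. The conflict matrix `C` is a hypothetical placeholder, not the empirical values used in the study, and the density-dependent identification of adjacency from [47] is not reproduced.

```python
from typing import List, Sequence

# Hypothetical symmetric conflict degrees c[k][l] for codes
# 0 public, 1 industry, 2 commercial, 3 economic, 4 residential (illustrative values only).
C = [
    [0.0, 0.8, 0.2, 0.2, 0.1],
    [0.8, 0.0, 0.6, 0.5, 1.0],
    [0.2, 0.6, 0.0, 0.1, 0.3],
    [0.2, 0.5, 0.1, 0.0, 0.2],
    [0.1, 1.0, 0.3, 0.2, 0.0],
]


def adjacency_conflict(grid: List[List[int]], c: Sequence[Sequence[float]]) -> float:
    """Sum c[k][l] over each unordered pair of 8-adjacent cells (each pair counted once)."""
    n_rows, n_cols = len(grid), len(grid[0])
    total = 0.0
    for r in range(n_rows):
        for col in range(n_cols):
            # These four "forward" offsets reach every 8-adjacent pair exactly once.
            for dr, dc in ((0, 1), (1, -1), (1, 0), (1, 1)):
                rr, cc = r + dr, col + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    total += c[grid[r][col]][grid[rr][cc]]
    return total


# One industry cell touching three residential cells.
print(adjacency_conflict([[1, 4], [4, 4]], C))  # 3.0
```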

Land use mix
Mixed land use not only stimulates activities but also reduces travel costs and promotes walking and cycling, which makes it a significant feature of compact and vital neighborhoods. The fourth objective aims to increase the degree of mixed land use to design a functionally diverse neighborhood consistent with the basic concept of TOD. The degree of mixed land use is measured by the entropy index [48], and the objective is formulated as follows:
where $A_k$ is the density percentage of $k$-type land use in the station area, $A_k = \sum_{i \in I} D_{ik} / \sum_{i \in I} \sum_{k \in K} D_{ik}$.
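A sketch of the entropy index over the shares $A_k$ defined above. Dividing by $\ln |K|$ so that the index lies in [0, 1] is a common convention and an assumption here, since the exact normalization of the original formula is not shown in this text.

```python
import math
from typing import List


def mix_entropy(D: List[List[float]]) -> float:
    """Entropy of the shares A_k = sum_i D_ik / sum_i sum_k D_ik, divided by ln(K).

    The ln(K) normalization is an assumption; zero shares contribute nothing (0 * ln 0 := 0).
    """
    n_types = len(D[0])
    col_totals = [sum(row[k] for row in D) for k in range(n_types)]
    grand_total = sum(col_totals)
    if grand_total == 0 or n_types < 2:
        return 0.0
    shares = [t / grand_total for t in col_totals]
    h = -sum(a * math.log(a) for a in shares if a > 0)
    return h / math.log(n_types)
```

An even split of density across all types gives 1, and a single-type layout gives 0.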

Environmental effects
The local environmental burden grows as development density and human activity increase, so reducing the environmental footprint while following TOD principles is important for sustainable planning [12]. Following previous studies, the fifth objective aims to reduce the pollution treatment cost, which reflects a decrease in the negative environmental influence of development and maintains environmental quality [13,14]. A high density of industrial land cells may increase the cost of pollution; thus, controlling the allocation of high-pollution industries is important for green planning. The treatment cost is highly correlated with land use density and must account for various types of pollution. The minimum pollution treatment cost is formulated as follows:

Destination accessibility
Destination accessibility, which is measured by the walkable distance from the origin to the transit station, is an essential principle of TOD [49]. The last objective aims to reduce the total walking distance of residents from the MRT station to their destinations to support travel by MRT. Land use types differ markedly in the number of passengers they attract, so highly attractive land use should be close to the MRT station, whereas dense development far from the station creates a high cost in terms of walking distance. The objective is formulated as follows:
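The two formulas are not reproduced in this text, so the sketch below assumes the simplest linear forms consistent with the parameters defined earlier: pollution cost $F_5 = \sum_i \sum_k \sum_r D_{ik} P_k^r \, cost_r$ and walking distance $F_6 = \sum_i \sum_k D_{ik} \, Attr_k \, L_i$. All pollutant names, coefficients, and distances are hypothetical.

```python
from typing import Dict, List, Sequence


def pollution_cost(D: List[List[float]], P: List[Dict[str, float]],
                   cost: Dict[str, float]) -> float:
    """Assumed form: sum over cells i, types k, pollutants r of D[i][k] * P[k][r] * cost[r]."""
    return sum(d_ik * p_kr * cost[r]
               for row in D
               for k, d_ik in enumerate(row)
               for r, p_kr in P[k].items())


def total_walking_distance(D: List[List[float]], attr: Sequence[float],
                           L: Sequence[float]) -> float:
    """Assumed form: sum over cells i and types k of D[i][k] * Attr[k] * L[i]."""
    return sum(d_ik * attr[k] * L[i]
               for i, row in enumerate(D)
               for k, d_ik in enumerate(row))


# Two cells and two hypothetical types: 0 = residential, 1 = industry.
D = [[30.0, 0.0],    # cell 0: residential, density 30
     [0.0, 10.0]]    # cell 1: industry, density 10
P = [{"wastewater": 0.5},                  # pollutants per unit residential density
     {"wastewater": 2.0, "solid": 1.0}]    # pollutants per unit industry density
cost = {"wastewater": 10.0, "solid": 4.0}  # treatment cost per unit pollutant
attr = [1.0, 0.3]                          # passenger attraction per unit density
L = [200.0, 600.0]                         # road distance from each cell to the station (m)

print(pollution_cost(D, P, cost))  # 390.0
print(total_walking_distance(D, attr, L))
```

Under these forms, both objectives are linear in density, so they push dense and highly attractive land use toward cells with small $L_i$ and keep polluting types at low density.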

Constraints
In addition to the above objectives, several constraints apply during optimization. First, because cells are the minimum units in the study, each cell must be allocated exactly one type of land use, and mixed land use within a cell is not allowed. Therefore, the land use type is encoded as the binary variable $X_{ik}$, and the constraint is as follows:
Second, overly intensive development would exceed environmental capacity; therefore, land use density is limited by the exploitative intensity index, such as the floor area ratio provided by the Urban Management and Planning Ordinance. The land use density of each cell is constrained as follows:
Finally, because regional planning varies across station locations (e.g., planning for a commercial center or a cultural center), the percentage of the land use density of each land use type is constrained by the government's overall urban planning:
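A minimal feasibility check for the three constraints, assuming the exploitative intensity bounds are already expressed in the same units as the cell densities and that the ratio bounds apply to each type's share of the total density. The parameter names mirror $lowerEII_k$/$upperEII_k$ and $lowerRatio_k$/$upperRatio_k$, but the function itself is illustrative.

```python
from typing import Sequence


def satisfies_constraints(types: Sequence[int], dens: Sequence[float],
                          lower_eii: Sequence[float], upper_eii: Sequence[float],
                          lower_ratio: Sequence[float], upper_ratio: Sequence[float]) -> bool:
    n_types = len(lower_eii)
    # (1) Exactly one land use type per cell: one valid type code per cell
    #     is equivalent to sum_k X_ik = 1 for the one-hot variables.
    if len(types) != len(dens) or any(not 0 <= k < n_types for k in types):
        return False
    # (2) Cell density within the exploitative intensity bounds of its type.
    if any(not lower_eii[k] <= d <= upper_eii[k] for k, d in zip(types, dens)):
        return False
    # (3) Each type's share of the total density within the planning ratio bounds.
    total = sum(dens)
    if total <= 0:
        return False
    for k in range(n_types):
        share = sum(d for t, d in zip(types, dens) if t == k) / total
        if not lower_ratio[k] <= share <= upper_ratio[k]:
            return False
    return True


ok = satisfies_constraints([0, 1, 1], [10.0, 20.0, 30.0],
                           lower_eii=[5, 5], upper_eii=[40, 40],
                           lower_ratio=[0.1, 0.5], upper_ratio=[0.5, 0.9])
too_dense = satisfies_constraints([0, 1, 1], [10.0, 20.0, 30.0],
                                  lower_eii=[5, 5], upper_eii=[40, 25],
                                  lower_ratio=[0.1, 0.5], upper_ratio=[0.5, 0.9])
```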

Optimization Approach
Land use design is a multiobjective programming problem, and it is impossible to achieve the best value for all the objectives simultaneously because they conflict. The optimization therefore aims to find the Pareto optimal land use maps. A method based on the nondominated sorting genetic algorithm III (NSGA-III) [32], which is well suited to searching for multiobjective optima without weights, is employed to generate alternative land use maps. Specifically, land use layout A is preferred over layout B only when A dominates B, that is, when A is no worse than B in every objective and strictly better in at least one. The land use layouts therefore optimize the multiple objectives simultaneously and approach the optimum over successive generations. Finally, a set of alternative land use layouts is obtained for further selection by decision-makers. The flowchart of the optimization model is shown in Figure 3, and the main operators are described below.
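The dominance relation can be sketched in a few lines, assuming every objective has been expressed so that larger is better (minimized objectives negated). `nondominated` extracts the first front with a brute-force filter for illustration; it is not the sorting routine used inside NSGA-III implementations.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every (maximized) objective and strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def nondominated(points: List[Sequence[float]]) -> List[int]:
    """Indices of points not dominated by any other point (the first Pareto front)."""
    return [i for i, p in enumerate(points)
            if not any(dominates(q, p) for j, q in enumerate(points) if j != i)]


layouts = [(3, 1), (1, 3), (2, 2), (1, 1)]
print(nondominated(layouts))  # [0, 1, 2]
```

Layouts 0, 1, and 2 each trade one objective against the other, so none dominates another; only layout 3 is beaten on both objectives.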
Figure 3. Flowchart of the optimization approach.

Representation and Initialization
In this work, we design a mixed coding tailored to the problem, in which real coding is applied to the decision variables: each undeveloped cell in the land use sketch map is transformed into two genes that represent its land use type (an integer from 0 to 4) and density (a positive floating-point number). Consequently, the chromosome of each individual is twice as long as the number of undeveloped cells. In the initialization step, individuals are generated randomly, and individuals that violate the constraints are removed. Initialization ends when the number of eligible individuals reaches the population size.
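A sketch of the mixed coding and rejection-based initialization described above. The density bounds, population size, and the feasibility rule `at_most_two_industry` are hypothetical; in the full model, the check would be the constraint set of the TOD planning model.

```python
import random
from typing import Callable, List, Sequence, Tuple

Genes = List[float]


def random_chromosome(n_cells: int, n_types: int, d_min: float, d_max: float,
                      rng: random.Random) -> Genes:
    """Mixed coding: gene 2j holds the type code of cell j, gene 2j+1 its density."""
    genes: Genes = []
    for _ in range(n_cells):
        genes.append(rng.randrange(n_types))     # land use type, integer 0..n_types-1
        genes.append(rng.uniform(d_min, d_max))  # land use density, positive float
    return genes


def decode(genes: Sequence[float]) -> Tuple[List[int], List[float]]:
    """Split a chromosome into per-cell type codes and densities."""
    return [int(g) for g in genes[0::2]], list(genes[1::2])


def initialize(pop_size: int, n_cells: int, n_types: int, d_min: float, d_max: float,
               is_feasible: Callable[[List[int], List[float]], bool],
               seed: int = 0, max_tries: int = 100_000) -> List[Genes]:
    """Draw random chromosomes, keeping only those that pass the constraint check."""
    rng = random.Random(seed)
    population: List[Genes] = []
    for _ in range(max_tries):
        genes = random_chromosome(n_cells, n_types, d_min, d_max, rng)
        if is_feasible(*decode(genes)):
            population.append(genes)
            if len(population) == pop_size:
                return population
    raise RuntimeError("too few feasible individuals; relax the constraints or bounds")


# Hypothetical constraint: at most two industry cells (type code 1) in the layout.
def at_most_two_industry(types: List[int], densities: List[float]) -> bool:
    return types.count(1) <= 2


pop = initialize(pop_size=10, n_cells=6, n_types=5, d_min=1.0, d_max=40.0,
                 is_feasible=at_most_two_industry)
```

Rejection sampling is simple but can become slow when the feasible region is small; the `max_tries` cap turns that case into an explicit error instead of an endless loop.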

Fitness
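Fitness is a crucial indicator for evaluating the performance of individuals and is the fundamental element of the genetic operators. In this work, six fitness values corresponding to the objectives are calculated: $\{F_1, F_2, -F_3, F_4, -F_5, -F_6\}$, so that the minimized objectives (land use conflict, pollution treatment cost, and walking distance) are negated and all fitness values are maximized.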

Preselection
For each iteration of reproduction, the candidate set $Q_t$ consists of both the parents ($P_t$) and the offspring ($C_t$) of the last iteration, and preselection chooses the best and most diverse individuals ($P_{t+1}$, of the population size) for the next iteration. The procedure of the operator is shown in Figure 4, and preselection involves two steps.
Nondominated sorting: The candidates in $Q_t$ are sorted into successive nondomination levels (F1, F2, ... in Figure 4), and whole levels are added to $P_{t+1}$ until adding the next level would exceed the population size.
Niche-preserving operator: This operator is applied to choose among individuals at the same level according to the principle of individual diversity (such as F3 in Figure 4). The sketch map of the operator is shown in Figure 5. First, two layers of reference points (112 in total) uniformly distributed on the unit simplex of the solution space are generated.
Second, the ideal point is determined by the minimum value of each objective, and the extreme points are chosen as the points nearest to the axes. Third, the objective values are normalized based on the ideal point and extreme points, and each solution is associated with the reference point whose reference line (the line connecting the reference point to the ideal point) is closest to the solution. Finally, solutions associated with reference points that have fewer associated solutions are more likely to be selected. Notably, the reference points intrinsically represent preferences among the objectives and can be regarded as a novel kind of "weight", and the normalized fitness provides a comparable "score" for all objectives, which supports further decision making.
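The count of 112 reference points is consistent with a standard two-layer Das and Dennis construction for six objectives with three divisions per layer, since each layer then has C(3 + 6 - 1, 3) = 56 points; that parameter choice is an inference, not stated in the text. The sketch generates both layers, uses a simplified normalization (dividing by each objective's range instead of computing the hyperplane intercepts through the extreme points, as NSGA-III does), and associates each normalized solution with the nearest reference line. It assumes the objectives are minimized, matching an ideal point built from per-objective minima.

```python
import math
from itertools import combinations
from typing import List, Sequence


def das_dennis(n_obj: int, divisions: int) -> List[List[float]]:
    """All points on the unit simplex whose coordinates are multiples of 1/divisions."""
    slots = divisions + n_obj - 1
    points = []
    # Stars and bars: place (n_obj - 1) bars among `slots` positions;
    # the gaps between bars are integer counts that sum to `divisions`.
    for bars in combinations(range(slots), n_obj - 1):
        counts, prev = [], -1
        for b in bars:
            counts.append(b - prev - 1)
            prev = b
        counts.append(slots - prev - 1)
        points.append([c / divisions for c in counts])
    return points


def two_layer_reference_points(n_obj: int, p_outer: int, p_inner: int,
                               shrink: float = 0.5) -> List[List[float]]:
    """Boundary layer plus an inner layer pulled toward the simplex centroid."""
    outer = das_dennis(n_obj, p_outer)
    inner = [[(1.0 - shrink) / n_obj + shrink * w for w in point]
             for point in das_dennis(n_obj, p_inner)]
    return outer + inner


def normalize(objs: List[Sequence[float]]) -> List[List[float]]:
    """Simplified: translate by the ideal point, divide by each objective's range."""
    m = len(objs[0])
    ideal = [min(o[j] for o in objs) for j in range(m)]
    worst = [max(o[j] for o in objs) for j in range(m)]
    return [[(o[j] - ideal[j]) / (worst[j] - ideal[j]) if worst[j] > ideal[j] else 0.0
             for j in range(m)] for o in objs]


def perpendicular_distance(f: Sequence[float], w: Sequence[float]) -> float:
    """Distance from point f to the reference line through the origin along w."""
    t = sum(a * b for a, b in zip(f, w)) / sum(b * b for b in w)
    return math.sqrt(sum((a - t * b) ** 2 for a, b in zip(f, w)))


def associate(normalized: List[Sequence[float]], refs: List[List[float]]) -> List[int]:
    """Index of the closest reference line for each normalized solution."""
    return [min(range(len(refs)), key=lambda j: perpendicular_distance(f, refs[j]))
            for f in normalized]


refs = two_layer_reference_points(n_obj=6, p_outer=3, p_inner=3)
print(len(refs))  # 112
```

Counting how many solutions are associated with each reference point (for example with `collections.Counter`) then gives the niche counts that favor solutions near sparsely populated reference lines.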

Conventional Genetic Operators
After the best individuals have been chosen by preselection, the model randomly selects pairs of parents from $P_{t+1}$. This work applies uniform crossover and uniform mutation operators, in which each gene independently decides whether to be exchanged and whether to mutate based on the constant probabilities $P_s$ and $P_m$, respectively. In this study, the land use type genes are exchanged directly in crossover and are mutated to random integers (0-4), whereas the land use density genes use simulated binary crossover and polynomial mutation. Finally, each individual must satisfy the constraints.
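A minimal sketch of these operators in plain Python follows, under stated assumptions: an individual is a hypothetical dict with equal-length "type" and "density" lists (one entry per cell), a single swap decision per cell applies to both genes, the density bounds and distribution indices (eta = 20) are illustrative values not given in the text, and the subsequent constraint check is omitted.

```python
import random

N_TYPES = 5                   # land use types encoded as integers 0-4
DENSITY_BOUNDS = (0.5, 4.0)   # hypothetical density (FAR) bounds, for illustration only

def sbx_pair(x1, x2, lo, hi, eta=20.0, rng=random):
    """Simulated binary crossover (SBX) for one pair of real-valued genes."""
    u = rng.random()
    if u <= 0.5:
        beta = (2.0 * u) ** (1.0 / (eta + 1.0))
    else:
        beta = (1.0 / (2.0 * (1.0 - u))) ** (1.0 / (eta + 1.0))
    c1 = 0.5 * ((1.0 + beta) * x1 + (1.0 - beta) * x2)
    c2 = 0.5 * ((1.0 - beta) * x1 + (1.0 + beta) * x2)
    return min(max(c1, lo), hi), min(max(c2, lo), hi)

def polynomial_mutation(x, lo, hi, eta=20.0, rng=random):
    """Basic polynomial mutation of one real-valued gene, clipped to bounds."""
    u = rng.random()
    if u < 0.5:
        delta = (2.0 * u) ** (1.0 / (eta + 1.0)) - 1.0
    else:
        delta = 1.0 - (2.0 * (1.0 - u)) ** (1.0 / (eta + 1.0))
    return min(max(x + delta * (hi - lo), lo), hi)

def crossover(p1, p2, p_s, rng=random):
    """Uniform crossover: each cell is recombined with probability p_s."""
    c1 = {"type": p1["type"][:], "density": p1["density"][:]}
    c2 = {"type": p2["type"][:], "density": p2["density"][:]}
    lo, hi = DENSITY_BOUNDS
    for g in range(len(p1["type"])):
        if rng.random() < p_s:
            # land use type genes are swapped directly
            c1["type"][g], c2["type"][g] = c2["type"][g], c1["type"][g]
            # density genes are recombined with SBX
            c1["density"][g], c2["density"][g] = sbx_pair(
                p1["density"][g], p2["density"][g], lo, hi, rng=rng)
    return c1, c2

def mutate(ind, p_m, rng=random):
    """Uniform mutation: each gene mutates independently with probability p_m."""
    lo, hi = DENSITY_BOUNDS
    for g in range(len(ind["type"])):
        if rng.random() < p_m:
            ind["type"][g] = rng.randrange(N_TYPES)       # random integer 0-4
        if rng.random() < p_m:
            ind["density"][g] = polynomial_mutation(ind["density"][g], lo, hi, rng=rng)
    return ind

rng = random.Random(42)
a = {"type": [0, 1, 2, 3], "density": [1.0, 1.5, 2.0, 2.5]}
b = {"type": [4, 4, 4, 4], "density": [3.0, 3.0, 3.0, 3.0]}
child1, child2 = crossover(a, b, p_s=0.5, rng=rng)
child1 = mutate(child1, p_m=0.05, rng=rng)
```

Passing a seeded `random.Random` instance keeps runs reproducible; SBX preserves the parents' mean when no clipping occurs, which keeps offspring densities close to their parents.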

Elitism and Termination
This work also introduces elitism into the model: the individuals with the best performance on each objective are preserved, which drives the population toward the optima. The algorithm terminates when the given maximum number of generations is reached; otherwise, it returns to reproduction.
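The elitism and termination logic can be sketched as a generic loop. This is a hypothetical skeleton, not the authors' code: `reproduce` stands in for preselection plus genetic operators, `survive` stands in for nondominated sorting with niche preservation, and all objectives are assumed to be minimized.

```python
def evolve(pop, objectives, reproduce, survive, max_gen):
    """Generic generational loop with per-objective elitism.

    objectives: callables to minimize; the best individual on each is always kept.
    reproduce(pop) -> offspring list; survive(candidates, n) -> n selected individuals.
    """
    size = len(pop)
    for _ in range(max_gen):                    # terminate at the maximum generation
        merged = pop + reproduce(pop)
        elite_ids = set()
        for f in objectives:                    # best individual for each objective
            elite_ids.add(min(range(len(merged)), key=lambda i: f(merged[i])))
        rest = [merged[i] for i in range(len(merged)) if i not in elite_ids]
        pop = [merged[i] for i in sorted(elite_ids)] + survive(rest, size - len(elite_ids))
    return pop

# Toy run: individuals are numbers; one objective minimizes x, the other maximizes it
final = evolve(
    pop=[0, 1, 2, 3],
    objectives=[lambda x: x, lambda x: -x],
    reproduce=lambda p: [x + 1 for x in p],
    survive=lambda cands, n: sorted(cands, key=lambda x: abs(x - 2))[:n],
    max_gen=3,
)
```

Although `survive` favours values near 2, the elites keep the extreme values 0 and 6, which is the effect elitism is meant to have on each objective's optimum.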

Study Area
Shanghai is considered to be among the cities with a highly developed MRT system. To date, 16 lines with 416 stations are in operation, and the total length of the Shanghai Metro is 705 km. There is a substantial need for planning that integrates the MRT system and land use to relieve urban issues, and the government has adopted TOD as a crucial tool in land use planning. A previous study found that the results of TOD typologies are valuable for site selection [7]. In this study, the area around Fanghua Road Station is selected as the study area because land development at this station has been shown to lag behind rail transit development in the TOD typology [6], and the station therefore has the potential to develop in coordination with land use optimization. Fanghua Road Station is at the end of Line 7 and near the Century Park subcenter, in the suburbs of Shanghai (Figure 6a). The land use allocation around the station is shown in Figure 6b: residential land occupies the largest area in the station catchment, and the diversity and density of land use lag behind the level of MRT development. A walking distance of 400 m to 800 m (about a 10-min walk) is generally considered appropriate for TOD planning; accordingly, and consistent with previous work [13], the planning area in this case study is the square circumscribing the 800 m radius circle around the station (1600 m × 1600 m). This area is divided into 20 × 20 cells of uniform size, each measuring 80 m × 80 m. On the one hand, almost every cell of this size contains a single land use type, so the cells distinguish different land use types and accurately describe the current land use status. On the other hand, this grid size keeps the optimization efficient and produces land use sketch maps at a scale appropriate for regional planning.
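The grid geometry follows directly from the numbers above: a square circumscribing an 800 m circle has 1600 m sides, and 20 cells per side give 80 m cells. The sketch below, a hypothetical helper assuming planar metric coordinates with the station at the origin, generates the cell centres and shows that the corner cells fall outside the 800 m circle.

```python
import math

RADIUS = 800.0            # catchment radius in metres (from the text)
N = 20                    # cells per side (from the text)
CELL = 2 * RADIUS / N     # 80 m, so the square is 1600 m x 1600 m

def cell_centres(station_x=0.0, station_y=0.0):
    """Centres of the N x N cells of the square circumscribing the catchment circle.

    Assumes planar metric coordinates (e.g. a projected CRS), row-major order.
    """
    x0, y0 = station_x - RADIUS, station_y - RADIUS
    return [(x0 + (c + 0.5) * CELL, y0 + (r + 0.5) * CELL)
            for r in range(N) for c in range(N)]

centres = cell_centres()
# cells whose centre lies within the 800 m walking radius of the station
inside = sum(1 for x, y in centres if math.hypot(x, y) <= RADIUS)
```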

Data Description and TOD Planning Model Construction
Compared with traditional survey data, geo-big data offer high spatial resolution, frequent updates, and convenient access, and they are well suited to describing the complex conditions around stations. The proposed method uses both high-resolution geographic data and empirical and statistical data to construct the objectives.
Land use data: As the core decision variables, the land use allocation around the station is derived from area of interest (AOI) data, a type of geographic data comprising land parcels with exact types, locations, and outlines. Land use density is measured mainly by the floor area ratio (FAR), which is obtained from planning data and housing websites for residential land and from building outlines with floor information for other land types.
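For non-residential land, FAR is the total gross floor area divided by the parcel area. The sketch below assumes, as a simplification, that a building's gross floor area equals its footprint area times its number of floors, and that outlines are given as planar coordinates in metres; the helper names and the example geometry are hypothetical.

```python
def polygon_area(coords):
    """Planar area of a simple polygon via the shoelace formula.

    coords: (x, y) vertices in metres, without repeating the first vertex.
    """
    s = 0.0
    for (x1, y1), (x2, y2) in zip(coords, coords[1:] + coords[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def floor_area_ratio(parcel, buildings):
    """FAR = total gross floor area / parcel area.

    buildings: list of (outline, floors); floor area is approximated as
    footprint area x number of floors (an assumption).
    """
    gross = sum(polygon_area(outline) * floors for outline, floors in buildings)
    return gross / polygon_area(parcel)

# Toy parcel of 100 m x 100 m with two buildings
parcel = [(0, 0), (100, 0), (100, 100), (0, 100)]
buildings = [
    ([(10, 10), (30, 10), (30, 40), (10, 40)], 10),   # 600 m2 footprint, 10 floors
    ([(50, 50), (60, 50), (60, 60), (50, 60)], 4),    # 100 m2 footprint, 4 floors
]
far = floor_area_ratio(parcel, buildings)
```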
Data for ridership: Ridership is measured as the inbound plus outbound passenger flow derived from smart card data, which contain abundant information on individual trips, including the user, time, stop, and line. In this study, the average workday ridership of 266 metro stations is calculated from January 2018 data, comprising a total of 258,896,211 records. The data corresponding to the ridership variables, computed within an 800 m buffer around each station, are listed in Table 1.
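Average workday ridership per station can be computed by counting every tap (inbound or outbound) per station and day and averaging over workdays. The record schema below is hypothetical, and the `holidays` parameter is an assumption: the text does not state how public holidays were handled.

```python
from collections import defaultdict
from datetime import date

def average_workday_ridership(records, holidays=frozenset()):
    """Mean daily boardings plus alightings per station over workdays.

    records: iterable of (card_id, day, station, direction) tuples, where day is
    a datetime.date and direction is "in" or "out" (hypothetical schema).
    Workdays are Monday to Friday, excluding the given holiday dates.
    """
    daily = defaultdict(lambda: defaultdict(int))   # station -> day -> tap count
    workdays = set()
    for _card, day, station, _direction in records:
        if day.weekday() < 5 and day not in holidays:
            daily[station][day] += 1                 # each tap, in or out, counts once
            workdays.add(day)
    n_days = len(workdays)
    return {s: sum(days.values()) / n_days for s, days in daily.items()}

records = [
    ("c1", date(2018, 1, 2), "A", "in"),    # Tuesday
    ("c1", date(2018, 1, 2), "B", "out"),
    ("c2", date(2018, 1, 3), "A", "in"),    # Wednesday
    ("c2", date(2018, 1, 3), "A", "out"),
    ("c3", date(2018, 1, 6), "A", "in"),    # Saturday: excluded
    ("c4", date(2018, 1, 1), "A", "in"),    # New Year's Day: excluded as a holiday
]
ridership = average_workday_ridership(records, holidays={date(2018, 1, 1)})
```

Dividing by the number of workdays observed in the whole dataset, rather than per station, counts a station with no taps on a given workday as zero ridership for that day.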
Empirical data: Several parameters in the planning model are estimated from empirical data. An artificial indicator, the conflict degree $c_{kl}$ (0-8), is designed to estimate the conflict between adjacent parcels, because quantitative evaluation involving numerous factors faces significant challenges [13,14]. The total pollution treatment cost is calculated from the pollution emissions $P_k^{r}$ and the unit cost $cost_r$ of each pollutant type $r$, which is estimated according to the relevant policy [50]. Passenger attraction $Attr_k$ determines the distribution of residents' destinations and is estimated according to the Technical Standards of the Traffic Impact Analysis of the Shanghai Construction Project [51].
The case study is conducted with the data described above. The initial land use layout is obtained by mapping the AOI land use allocation onto the grid cells and determining the density of existing development from FARs. The TOD planning model is then established for optimization. Using the total (boarding plus alighting) ridership from smart card data and the control and explanatory variables listed in Table 1, an XGBoost model is trained on the ridership data of all metro stations to construct the first objective, MRT ridership. The conflict degree $c_{kl}$, pollution emissions $P_k^{r}$, and unit pollutant costs $cost_r$ are estimated from empirical data and used to build the land use conflict and environmental effect objectives, respectively. To construct the destination accessibility objective, passenger attraction $Attr_k$ is estimated from empirical data, and the distances $L_i$ from land use cells to the metro station are calculated based on OpenStreetMap (OSM) data. Finally, the land use mix and compactness objectives are calculated from the land use layout alone and require no additional data.
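The three objectives that rely on these empirical parameters can be sketched as follows. The exact objective formulas are defined in the model section and are not reproduced here, so these are hypothetical simplified forms: pollution cost as $\sum_i \sum_r P_{k_i}^{r}\,cost_r$, accessibility as $\sum_i Attr_{k_i} L_i$, and conflict as the sum of $c_{kl}$ over 8-neighbouring cell pairs, with any scaling by cell area or density omitted. All parameter values in the example are toy numbers, not study data.

```python
def pollution_cost(types, emissions, unit_cost):
    """Sum over cells i and pollutants r of P[k_i][r] * cost[r] (simplified form).

    types: land use type k_i of each cell; emissions[k][r]: emission of pollutant r
    for type k; unit_cost[r]: treatment cost per unit of pollutant r.
    """
    return sum(emissions[k][r] * unit_cost[r]
               for k in types for r in range(len(unit_cost)))

def total_walking_distance(types, attraction, dist_to_station):
    """Sum over cells of Attr[k_i] * L_i (attracted trips times distance)."""
    return sum(attraction[k] * d for k, d in zip(types, dist_to_station))

def land_use_conflict(grid, conflict):
    """Sum of c[k][l] over each unordered pair of 8-neighbouring cells."""
    rows, cols = len(grid), len(grid[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            # look only "forward" so that each neighbouring pair is counted once
            for dr, dc in ((0, 1), (1, -1), (1, 0), (1, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    total += conflict[grid[r][c]][grid[rr][cc]]
    return total

# Toy example with two land use types (0 and 1)
cost = pollution_cost([0, 1, 1], emissions=[[1.0, 0.0], [2.0, 3.0]], unit_cost=[10.0, 100.0])
walk = total_walking_distance([0, 1, 1], attraction=[5.0, 2.0], dist_to_station=[100, 200, 300])
conf = land_use_conflict([[0, 1], [1, 0]], conflict=[[0, 2], [2, 0]])
```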

Ridership Modeling
Fivefold cross-validation (5-CV) is applied to search for the optimal structure of XGBoost and ensure a robust model, and the total sample is divided into a training set (80% of the data) and a test set (the remaining 20%) for evaluation. The resulting nonlinear XGBoost model uses a maximum of 10,000 CARTs and a learning rate of 0.005, and early stopping is applied to improve generalization. The average goodness of fit ($R^2$) and root mean square error (RMSE) of the 5-CV are used to evaluate the model. Table 2 compares the results of XGBoost with those of the linear model. XGBoost fits the training data better than the linear model, and it explains 61% of the variance in ridership on the test samples, which is slightly higher than linear regression. This study also compares XGBoost with two other popular tree models, random forest and gradient boosting decision tree (GBDT), and the results indicate that XGBoost is better suited to modeling MRT ridership than the other tree models. Nonetheless, the test set accuracy falls short of expectations for several reasons: substantial impact factors, such as car ownership, are missing because of limited data accessibility; the sample size may be too small to fully support the XGBoost model; and the data contain deviations and noise. Overall, the model provides an adequate basis for constructing the ridership objective.
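The evaluation loop of 5-fold cross-validation, in which each fold trains on 80% of the samples and tests on the remaining 20%, can be sketched with NumPy. To keep the sketch dependency-light, it plugs in the ordinary least squares baseline; an XGBoost model (for example `xgboost.XGBRegressor(n_estimators=10000, learning_rate=0.005)` with early stopping on a validation split) could be wrapped into the same hypothetical `fit` interface. The synthetic data are illustrative only.

```python
import numpy as np

def r2_rmse(y_true, y_pred):
    """Coefficient of determination and root mean square error."""
    resid = y_true - y_pred
    ss_res = float(resid @ resid)
    ss_tot = float(((y_true - y_true.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot, float(np.sqrt(np.mean(resid ** 2)))

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns a predict function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ coef

def kfold_cv(X, y, fit, k=5, seed=0):
    """Average R^2 and RMSE over k folds (each fold trains on (k-1)/k of the data)."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        predict = fit(X[train], y[train])
        scores.append(r2_rmse(y[test], predict(X[test])))
    r2s, rmses = zip(*scores)
    return float(np.mean(r2s)), float(np.mean(rmses))

# Synthetic, exactly linear data: the linear baseline should fit perfectly
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
y = 3.0 + 2.0 * X[:, 0] - X[:, 1]
mean_r2, mean_rmse = kfold_cv(X, y, fit_linear)
```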

Optimized Land Use Layouts
NSGA-III is used for multiobjective optimization with a population size of 112. The maximum number of generations is set to 2000 because searching such a large solution space for optimal solutions requires many generations. The initial land use map is obtained by transforming the land use allocation in the planning area into 20 × 20 cells, comprising 233 developed cells and 167 undeveloped cells (Figure 7a). After 2000 iterations, 105 alternative nondominated land use layouts are obtained. The extreme optimal solutions (those with the best performance on each of the six objectives) are selected to validate the effectiveness of the optimization approach, and the employed method is compared with the elitist genetic-based algorithm (EGA) [53] and the nondominated sorting genetic algorithm II (NSGA-II) [31]. Table 3 shows that the optimal land use layouts obtained by NSGA-III achieve a maximum ridership of 129,418 persons per day (A), a maximum compactness of 1520 (B), a minimum land use conflict of 3.328 (C), a maximum land use mix of 0.696 (C), a lowest pollution treatment cost of $8.917 \times 10^{5}$ RMB per year (D), and a shortest walking distance of $2.557 \times 10^{8}$ m (E). The employed method outperforms the EGA on several objectives (ridership, land use mix, and pollution cost), suggesting that its extreme optimization is comparable to that of the EGA. However, the EGA is less effective at solving multiobjective problems. In addition, the nondominated solution set of NSGA-III is compared with that of NSGA-II using the set coverage metric (C-metric) [54]. Only 0.8% of the solutions in the NSGA-III Pareto set are dominated by solutions in the NSGA-II Pareto set, whereas 2.7% of the solutions in the NSGA-II Pareto set are dominated by solutions in the NSGA-III Pareto set, which indicates that the applied model converges better. Therefore, NSGA-III is an appropriate optimization method for the TOD planning model that requires no empirical weights.
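The set coverage metric used above can be sketched in a few lines. Following the wording of the comparison, this sketch counts strict Pareto dominance and assumes all objectives are minimized, so maximized objectives such as ridership, compactness, and land use mix would be negated first; some definitions of the C-metric instead count weakly dominating (including equal) points as covering.

```python
def dominates(a, b):
    """True if a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def set_coverage(A, B):
    """C(A, B): fraction of solutions in B dominated by at least one solution in A."""
    if not B:
        return 0.0
    covered = sum(1 for b in B if any(dominates(a, b) for a in A))
    return covered / len(B)

# Toy two-objective sets
A = [(1.0, 1.0)]
B = [(2.0, 2.0), (0.0, 3.0)]
c_ab = set_coverage(A, B)   # (1, 1) dominates (2, 2) but not (0, 3)
c_ba = set_coverage(B, A)
```

In the notation of this sketch, the reported 0.8% corresponds to C(NSGA-II, NSGA-III) and the reported 2.7% to C(NSGA-III, NSGA-II); the metric is not symmetric, so both directions are needed.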

Further Selection: A Feasible Solution for the Study Station
As discussed above, several consistencies and conflicts among the objectives need to be considered by decision-makers. Although the extreme optimal solutions achieve the best performance on individual objectives, a balanced solution is more favorable for practical planning. According to the "comprehensive plan and general land use plan of Pudong New Area, Shanghai" [55], the Fanghua Road Station of rail transit Line 7 is regarded as the core, and both cultural parks and cultural facilities are planned to improve the comprehensive support of regional services and maintain ecological sustainability. Therefore, the share of public land use must be set high to satisfy this requirement, and the planning solution places more emphasis on land use conflict, land use mix, and pollution cost. Based on these planning purposes, a balanced solution is selected from the 105 alternative solutions: it achieves the highest normalized score of 0.455 and is associated with the reference point whose preference across the six objective dimensions is (0.083333, 0.083333, 0.25, 0.25, 0.25, 0.083333). For this solution, ridership is 58,616 persons per day, compactness is 1298, land use conflict is 4.742, land use mix is 0.256, pollution cost is $3.751 \times 10^{6}$ RMB per year, and total walking distance is $6.157 \times 10^{8}$ m. Figure 7b shows the optimized land use sketch map, and Figure 8 depicts the land use density of the selected solution around the station. Notably, commercial land use is extensively developed and concentrated mainly in the inner and intermediate areas around the station to attract residents. Economic land use is allocated uniformly across the planning area to provide ample employment. Residential land use is aggregated mainly at the periphery of the planning area, away from station noise, while residents can still walk home within a comfortable distance. Industrial land use is dispersed across a small number of cells to maintain land use mix and protect the environment. Finally, public land use, including parks, cultural venues, and public services, occupies a large share of the area, consistent with regional planning. The selected land use layout largely achieves the TOD planning objectives and conforms to local policy strategies [55], offering guidance for station-level land use development.
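The reported preference vector can be reproduced under an assumption about how the two reference-point layers were built, which this section does not state. If each layer is a Das-Dennis simplex lattice with 3 divisions over the six objectives, each layer has C(8, 5) = 56 points, giving 112 points in total (matching the population size), and shrinking the inner layer halfway toward the simplex centroid maps the lattice point (0, 0, 1/3, 1/3, 1/3, 0) to (1/12, 1/12, 0.25, 0.25, 0.25, 1/12). The objective order (ridership, compactness, conflict, mix, pollution, walking distance) is likewise assumed from the order in which the results are listed.

```python
from itertools import combinations
from math import comb

def das_dennis(m, p):
    """Points on the unit simplex in m dimensions with coordinates in multiples of 1/p.

    Uses a stars-and-bars enumeration: choose m - 1 bar positions among p + m - 1 slots.
    """
    points = []
    for bars in combinations(range(p + m - 1), m - 1):
        prev, coords = -1, []
        for b in bars:
            coords.append(b - prev - 1)        # stars between consecutive bars
            prev = b
        coords.append(p + m - 1 - prev - 1)    # stars after the last bar
        points.append(tuple(c / p for c in coords))
    return points

def two_layer(m, p_outer, p_inner, shrink=0.5):
    """Outer lattice plus an inner lattice shrunk toward the centroid (1/m, ..., 1/m)."""
    outer = das_dennis(m, p_outer)
    inner = [tuple(shrink * w + (1 - shrink) / m for w in pt)
             for pt in das_dennis(m, p_inner)]
    return outer + inner

refs = two_layer(6, 3, 3)
assert len(das_dennis(6, 3)) == comb(8, 5)   # 56 points per layer

preference = (0.083333, 0.083333, 0.25, 0.25, 0.25, 0.083333)
found = any(all(abs(a - b) < 1e-5 for a, b in zip(r, preference)) for r in refs)
```

Under these assumptions the selected preference is an inner-layer point that puts equal, higher weight on conflict, mix, and pollution cost, which agrees with the planning emphasis described above.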

Discussion
Analyzing the TOD planning objectives is essential for the integrated development of transportation and land use and yields valuable policy implications. To examine the relationships among the planning objectives, a correlation matrix is constructed from the pairwise Pearson correlation coefficients of the objectives. The correlation coefficients are shown in Figure 9, and several notable findings emerge.
Firstly, consistent with previous studies [39,40], we find that a higher degree of mixed land use is associated with higher ridership, suggesting that functionally diverse land use provides ample space for various activities and generates substantial travel demand for MRT systems. Therefore, promoting functional diversity in land use is crucial when planning undeveloped or developing metro station catchments.
Secondly, higher compactness enhances local land use efficiency and reduces the conflict between adjacent land cells, but it lowers the degree of mixed land use, which is consistent with a previous study [14]. For urban planners, it is therefore important to balance global functional diversity across the region against compact local land use when making TOD planning decisions.
Thirdly, the local pollution treatment cost increases with ridership, because high-density land development raises both residents' travel demand and pollution emissions. This is consistent with a previous study [13]. Therefore, policies should promote multimodal public and low-carbon transportation and expand efforts toward sustainable buildings, energy, and waste practices to build green and livable TODs.
Finally, as shown in Figure 9a, total walking distance increases with land use mix and ridership, because high-density and highly diverse development increases passenger flows. We further calculate the average walking distance per person (total walking distance divided by ridership). Figure 9b plots average personal walking distance against ridership (both standardized by min-max normalization) and shows that the average walking distance per person does not increase with ridership. This suggests that an appropriate land use allocation can shorten walking distances and make MRT stations more convenient to reach, both of which increase the attractiveness of rail transit to residents.
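The statistics behind Figure 9 can be sketched with NumPy: a Pearson correlation matrix over the objective columns of the Pareto set, and the per-person walking distance rescaled by min-max normalization. The toy values below are made up for illustration and are not results from this study; the helper names are hypothetical.

```python
import numpy as np

def objective_correlations(F):
    """Pearson correlation matrix of objective columns (rows = solutions)."""
    return np.corrcoef(F, rowvar=False)

def min_max(v):
    """Rescale values to [0, 1]; a constant vector maps to zeros."""
    v = np.asarray(v, dtype=float)
    span = v.max() - v.min()
    return np.zeros_like(v) if span == 0 else (v - v.min()) / span

# Toy Pareto set (illustrative numbers only): ridership and total walking distance
ridership = np.array([50000.0, 80000.0, 120000.0])
total_walk = np.array([2.0e8, 3.0e8, 4.2e8])
corr = objective_correlations(np.column_stack([ridership, total_walk]))

avg_walk = total_walk / ridership     # average walking distance per person
norm_ridership = min_max(ridership)
norm_avg_walk = min_max(avg_walk)
```

In this toy set, total walking distance is strongly positively correlated with ridership while the per-person distance falls, which is the pattern the two panels are used to distinguish.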

Conclusions
This paper proposes an improved multiobjective TOD land use design framework to optimize the land use layout surrounding MRT stations at a fine scale and to promote coordinated development of public transportation and land use in Chinese megacities. The method consists of a TOD planning model and a multiobjective optimization approach. Based on TOD strategies in the Chinese context, the planning model focuses on economic rail transit, efficient and functionally diverse land use, high-quality and convenient living, and a low-pollution environment. Specifically, the planning model introduces geo-big data and captures the practical nonlinear impacts of land use on MRT ridership. An improved genetic algorithm solves the multiobjective programming model and generates land use schemes with specific land use allocations and density features without objective weighting. Notably, the optimization approach provides objective preference information and normalized scores of the solutions, which support further selection in line with local policies. The proposed method is validated in a case study of Shanghai. The results indicate that (1) the TOD planning model, constructed with geo-big data as well as statistical and empirical data and accounting for nonlinear relationships between land use and the TOD objectives, is closer to actual conditions than the compared models; and (2) the employed optimization approach effectively solves the complicated TOD planning model and generates a set of alternative land use schemes with favorable extreme optima and convergence. Furthermore, by analyzing the relationships between the TOD planning objectives, this study yields several findings that offer meaningful suggestions for urban planning and policy-making.
Because most of the data in this framework come from open sources and the objective functions can be readily established, the method is applicable to other Chinese megacities with some revisions (e.g., different study areas may use different grid sizes, model parameters, and constraints corresponding to local policies), and it can provide valuable micro-level insights for the future balanced development of MRT and station-level land use.
Although the proposed method addresses several TOD planning problems, some aspects can be improved and extended in the future. Firstly, this study uses fixed-size cells for design, assigns a single land use category to each cell, and does not consider mixed-use buildings, which underrepresents the diversity of real urban environments. It would therefore be worthwhile to incorporate different land parcel shapes and mixed-use buildings into planning. Secondly, the land use sketch maps produced by the model are scattered and fragmented, which creates problems in practical urban planning; accordingly, critical spatial constraints will be introduced into the model. Moreover, some objectives, such as environmental effects, are simplifications that relate only to land use and ignore other elements of the built environment; incorporating more of the contributing factors could improve the TOD planning model. Because these factors change as land use is transformed, more complex dynamic models could be considered as an extension of this study. Finally, the attraction of land use is difficult to measure accurately with empirical data, but mobile sensor data such as cellular signaling data make it possible to estimate the distribution of passenger flow more precisely, and such data will be used in future work.