Data-Driven Design and Optimization for Smart Logistics Parks: Towards the Sustainable Development of the Steel Industry

The design of steel logistics parks acts as fundamental infrastructure supporting the operations of storage, allocation, and distribution of steel products in the steel logistics industry, which actually lags behind the development of other logistics industries, such as e-commerce logistics, due to its large lot bulk storage, low turnover rate, and costly transportation and operations. This research proposes a data-driven approach for a specific steel logistics park, aiming to improve its operational efficiency in terms of product layout and allocation in multiple yards. The entry and delivery order data are analyzed comprehensively so as to determine the products with high operational frequency and the corresponding relevancy among them. Experimental results show that, among the 69 steel specifications, 14 high-frequency products are identified, and the correlation among the 14 identified high-frequency products possesses evident distribution characteristics concerning their brands and specifications. The identified frequency and correlation among various products can not only facilitate the product layout and allocation in steel logistics parks, but also advance the vehicle scheduling efficiency for product pick-up and delivery. Moreover, the research methodology and framework can provide managerial insights for other industries with mass data processing requirements.


Introduction
The market condition of the steel industry continues to deteriorate due to a variety of reasons. On the one hand, overcapacity in the steel industry is becoming increasingly serious, which results in excessive product storage and a high storage cost correspondingly. On the other hand, the steel price holds on at a high level, and the export volume of steel falls significantly, further exacerbating market deterioration. In order to maintain sustainable development, more and more steel enterprises are seeking new profit options, among which steel logistics services become a promising choice. A multitude of large steel enterprises intend to handle their logistics activities, such as material procurement, production, and product distribution, internally by establishing their own steel logistics parks, which can help avoid the high cost of activity outsourcing. However, in contrast to other logistics industries, such as e-commerce logistics, the development of steel logistics lags due to its inherent industrial features. The product life cycle of steel is relatively long, and the throughput of a steel logistics park is significantly affected by the market condition. The profit of a steel logistics enterprise is reduced as well because of the high steel storage cost and low operational efficiency.
The design of steel logistics parks acts as fundamental infrastructure supporting the operations of storage, allocation, and distribution of steel products in the steel logistics industry. With the expansion of steel warehousing varieties and spot trading varieties, and the increase of e-commerce transactions, the delivery volume of a steel logistics park increases substantially. However, most current layout designs cannot meet the increasing requirements of inventory management and facility utilization. The operational efficiency of steel yards decreases because of excessive vehicle turnaround time, unexpected waiting time, and unnecessarily frequent facility operations. Queuing for product pick-up occurs frequently because of insufficient storage capacity and inappropriate product allocation, which also increases the warehouse operating cost. All the above problems necessitate the improvement and optimization of the layout and operation of steel logistics parks.
Moreover, both social development and technological progress have encouraged an operational and managerial paradigm shift in the steel industry towards sustainable development. Driven by social development, environmental considerations have become a top priority for industries, especially heavy industries such as the steel industry [1]. On the one hand, the continuous consumption of non-renewable resources, such as iron ore and carbon coke, undermines the development of the steel industry. On the other hand, environmental awareness is rising as an increasing number of consumers pay attention to how products are produced and transported. The operations of steel products are commonly environmentally hazardous in terms of noise pollution, carbon emission, air contamination, and so on. Concerning technological progress, the application of various Internet of things (IoT) devices in steel logistics generates a data-rich environment characterized by large data volume, high data velocity, and wide data variety [2]. For instance, the location of in-transit steel products can be precisely acquired, providing a more accurate product arrival estimation. The operations inside the steel logistics park can be digitalized and further exploited for optimization.
This research aims to improve the operational efficiency of yard management in steel logistics parks by redesigning the location and allocation schemes of different steel varieties according to their relevance degree. Statistical methods and data mining approaches are applied to analyze the warehousing entry and delivery data so as to find the operational frequency and correlation among different steel varieties. Steel varieties with high operational frequency are suggested to be located near the entrance/exit. Steel varieties with high relevance degree are suggested to be allocated in nearby yards considering the available yard capacities. Meanwhile, steel products are suggested to be deployed in multiple yards evenly, which can balance the usage rate of gantry cranes and improve the overall operational efficiency.
The rest of this paper is organized as follows. Related literature is reviewed and examined in Section 2 in terms of both the logistics park design and methodology application in related areas. Section 3 describes the case scenario in this paper. Then, algorithm application and data analysis are conducted in Section 4. Finally, conclusions are drawn in Section 5.

Literature Review
As a significant part of supply chain management, the design of a logistics park is a strategic issue for large-scale enterprises; it is a complex and integrated issue involving logistics management systems, advanced information systems, and cooperative freight transportation systems [3,4]. A well-designed logistics park can reduce inventory costs and ease freight pick-up so as to enhance the overall operational efficiency. Because establishing a new logistics park is considerably costly, especially in the steel industry, the operational optimization of steel logistics parks becomes critically important, attracting interest from an increasing number of researchers and practitioners towards the development of smart logistics [5,6]. Moreover, in practice, enterprises in the steel industry are commonly large-scale conglomerates, such as POSCO, TATA steel, JFE, Nippon Steel Corporation, Baosteel Group, etc., which exemplify the conventional operations of the steel industry and supply chain. Wu et al. [7] studied the supply chain and logistics patterns of these enterprises and concluded that the distinguishing features of iron and steel logistics indicate the importance of proper operational and tactical planning.
Both qualitative and quantitative approaches have been applied to facilitate the design and operation of steel logistics parks. For instance, Wang et al. [8] proposed a rule-based ontology reasoning method to support the decision-making of steel logistics managers. Yilmaz et al. [9] applied a fuzzy analytical network process to handle the material operations for in-plant logistics. Regarding the quantitative solutions, the systematic layout planning (SLP) method is frequently utilized for solving the logistics park planning problems [10,11]. Moreover, many researchers tend to combine SLP and other heuristic algorithms in the logistics parks study. For example, Zhang et al. [12] integrated a genetic algorithm (GA) and SLP for solving the functional area layout of a railway logistics park. Chen et al. [13] designed an improved adaptive GA with scatter search to handle the SLP problem in a non-rectangular logistics park with split lines. Palominos et al. [14] integrated SLP with quality function deployment (QFD) for the layout design of a service-oriented facility.
In addition to the conventional approaches for logistics park planning, data-driven methods become a promising choice with the support from data mining techniques due to the application of various data-enriched devices and facilities. For instance, He and Xue [15] applied fuzzy clustering for solving the layout planning problem of regional logistics parks. Hsieh and Huang [16] developed a k-means batching (KMB) and self-organization map batching (SOMB) mechanism to improve total travel distance and average vehicle utility in logistics parks. Kulak et al. [17] proposed a novel tabu search (TS) algorithm integrated with a novel clustering algorithm to solve the order batching and picker routing problems jointly for multiple-cross-aisle warehouse systems. Chiang et al. [18] presented a new association measure approach, named weighted support count (WSC), to analyze the intensity and nature of the relationships among different products in a distribution center in order to facilitate the order picking efficiency. Pang and Chan [19] applied a data mining-based algorithm for the storage location assignment of piece picking items in a randomized picker-to-parts warehouse by extracting and analyzing the association relationships between different products in customer orders.
In contrast to the aforementioned research work, frequent pattern growth (FP-growth) is applied in this research to explore and exploit the entry and departure orders. FP-growth is a classical association rule mining approach, which was initially introduced by Han et al. [20,21], as a successful and popular extension of the conventional Apriori method [22]. Ever since its first introduction, the FP-growth approach has gained much popularity due to its unique design of FP-tree and high processing capability. Kuo et al. [23] combined FP-growth and an artificial immune network (AIN) for supplier selection and order allocation in the logistics industry. Feng et al. [24] proposed an expert recommendation algorithm based on the Pearson correlation coefficient and FP-growth. Wu and Zhang [25] designed an analysis model for electronic evidence based on an FP-growth algorithm. An initial investigation suggests that although FP-growth has been applied to solve miscellaneous specific problems from multiple industries, little empirical research pertaining to the data analytics for the steel industry using data mining techniques has been conducted, which motivates this research.

Case Description
A case study is conducted in this research aiming to provide a straightforward and visualized understanding of the operations of steel logistics parks. The steel logistics service provider involved in this research, referred to by the alias GC logistics, is a subsidiary enterprise of a large Iron and Steel Corporate, which is among the top five Iron and Steel Corporates in China. The GC logistics park plays a vital role in the operations of the Iron and Steel Corporate, accounting for around 20% of the sales volume of the entire corporate. The GC logistics park not only serves as a regional logistics center providing storage and distribution services for steel products, but also serves as a favorable sales channel, which generates a large amount of sales profit for the Iron and Steel Corporate every year. However, along with the market development of the steel trade, the steel warehouse operations in the GC logistics park are facing a growing number of challenges. For example, queuing for product pick-up occurs frequently, which decreases the product turnover rate substantially. Workload imbalance among multiple yard cranes is frequently observed as well. Moreover, the allocation of arriving products is mainly conducted based on personal experience, without optimization. Therefore, the management team of the GC logistics park is seeking various approaches to improve the operational efficiency and reduce the operational cost with the purpose of sustainable development.
As shown in Figure 1, the GC logistics park possesses six yards, among which the first five yards are designed for rebar storage purposes, and Yard 6 is for backup and storage of other spare and scarce steel products. Each yard has one gantry crane for product stacking. When steel products arrive at this steel logistics park, the warehouse manager needs to assign storage units for different steel products considering their brands and specifications. When a delivery order arrives, the logistics manager needs to schedule proper vehicles and facilities to pick up steel products from different yards. An initial investigation finds that most of the entry and delivery operations are conducted based on the knowledge and experience of warehouse and logistics managers, and lack quantitative support and optimization. The increase in entry and delivery orders and in steel varieties necessitates the redesign of the product layout and the optimization of logistics activities.

Data Investigation and Analysis
In order to handle the location and allocation problem of various steel products in steel logistics parks, this research is conducted in two dimensions, i.e., steel order analysis and steel correlation analysis. The steel order data are firstly investigated and explored in view of the operational frequency of different steel types. Then, correlation among the steel types with high operational frequency is further examined using the FP-growth approach. The frequency identification and correlation analysis can facilitate the layout and deployment of different steel varieties, and further improve the overall efficiency of steel logistics parks.

Data Investigation
The collected order data comprise steel entry records and steel delivery records from October 2017 to August 2018. The steel entry data are relatively simple and contain 5725 records. The format of the steel entry record is fixed, as shown in Table 1. Each entry number contains only one type of rebar. By comparison, the steel delivery data are much more complex and contain 77,018 records, as illustrated in Table 2. One of the major differences between entry operations and delivery operations is that entry operations are mainly conducted at batch scale, while delivery operations are more scattered and diversified. One delivery number commonly contains multiple steel types. On average, there are around 15 entry operations and more than 200 delivery operations per day. Moreover, the delivery operations are more time-consuming and laborious. Therefore, the steel delivery data are further analyzed in detail to explore and exploit the correlation among high-frequency steel types. An initial data investigation shows that six steel types are handled in this steel logistics park, i.e., HB400, HB400E, HB500, HB500E, HTB600, and HTB600E. Each steel type comes in different specifications, defined by diameter (millimeter) and length (meter). Twenty-four specifications are involved: 12*12, 12*9, 14*12, 14*9, 16*12, 16*9, 16*7, 18*12, 18*9, 18*7, 20*12, 20*9, 20*7, 22*12, 22*9, 22*7, 25*14, 25*12, 25*9, 25*7, 28*12, 28*9, 32*12, and 32*9. In total, there are 69 combinations of steel types and steel specifications. Hereafter, the term steel specification refers to the combination of steel type and steel specification, for simplicity.
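The operational frequency of each steel specification can be obtained by counting how often it appears in the delivery records. The following Python fragment is a minimal sketch of that counting step, not the procedure actually used in this study: the record layout (delivery number, steel type, specification), the sample values, and the threshold `min_count` are all illustrative assumptions, since the fields of Table 2 and the cut-off used to select high-frequency products are not reproduced here.

```python
from collections import Counter

# Hypothetical delivery records: (delivery_number, steel_type, specification).
# The field layout and the values are illustrative, not taken from Table 2.
records = [
    ("D001", "HB400E", "16*12"),
    ("D001", "HB400E", "18*12"),
    ("D002", "HB400E", "16*12"),
    ("D002", "HB500E", "25*12"),
    ("D003", "HB400E", "16*12"),
    ("D003", "HB400", "12*9"),
    ("D004", "HB400E", "18*12"),
]


def operational_frequency(records):
    """Count the records in which each (type, specification) pair appears."""
    return Counter((steel_type, spec) for _, steel_type, spec in records)


def high_frequency(records, min_count):
    """Return pairs seen in at least min_count records, most frequent first."""
    freq = operational_frequency(records)
    return [(item, n) for item, n in freq.most_common() if n >= min_count]


# min_count=2 is an assumed threshold chosen only for this toy data set.
top = high_frequency(records, min_count=2)
for (steel_type, spec), n in top:
    print(steel_type, spec, n)
```

With real data, the same count could be weighted by tonnage or restricted to a time window; which of these the park would prefer is a modelling choice that the sketch leaves open.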

FP-Growth Analysis
Correlation analysis among the 14 frequent steel specifications is conducted in this section using the FP-growth approach. FP-growth (frequent pattern growth) is a classical and popular data mining approach, which was initially introduced by Han, Pei and Yin [20]. One of the inherent features of the FP-growth approach is the FP-tree structure, which speeds up data processing significantly compared with the earlier Apriori algorithm. Owing to the FP-tree, the frequent itemsets can be discovered with only two passes over the data. Moreover, the FP-growth approach comprises a header table and a node list enabling the construction and analysis of the FP-tree. The major operations of the FP-growth approach are described in Figure 4 and comprise two main steps, i.e., FP-tree generation and FP-tree mining. As described in the previous section, there are 77,018 records in the delivery data. However, there are only 37,458 distinct delivery numbers, which means that a delivery number can span multiple records. Therefore, data cleaning is first carried out to consolidate the delivery records with the same delivery number. After that, the FP-growth approach is applied to process the consolidated delivery data. Figure 5 illustrates an example of an FP-tree when considering the first 7 high-frequency steel specifications with the support degree set to 400. The notations in Figure 5 are refined for display purposes. The first and second lines in each node represent the steel brand and specification, respectively. The third line represents the support degree related to this specification. The FP-tree with 14 high-frequency specifications is not provided due to the page limit and its complicated structure.
The support degree in FP-growth is tuned by trial so as to find meaningful combinations among these high-frequency steel specifications. Meanwhile, the complexity of the FP-tree structure decreases as the support degree increases. Table 3 shows that 18 frequent 5-itemsets are found when the support degree is set to 100. When the support degree increases from 100 to 200, no frequent 5-itemsets remain; instead, 10 frequent 4-itemsets are found, as presented in Table 4. When the support degree is raised to 300, frequent 4-itemsets still exist, and Table 5 consists of the first two rows of Table 4. However, when the support degree increases from 300 to 400, the largest frequent itemsets contain only three items, as shown in Table 6. Furthermore, Table 7 presents the analytical results when the support degree increases to 900, in which only frequent 2-itemsets can be discovered.
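The two steps described above, consolidation by delivery number followed by FP-tree generation and mining, can be sketched in a few dozen lines of Python. This is a compact illustrative implementation written for this text, not the code used in the study; the record layout, the item labels A, B, and C, and the function names are assumptions. The support degree is treated as an absolute count of delivery numbers, which is how the thresholds of 100 to 900 are read here.

```python
from collections import defaultdict


class Node:
    """One FP-tree node: an item, its count, and links to parent and children."""
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}


def consolidate(records):
    """Merge records sharing a delivery number into one transaction (item set)."""
    baskets = defaultdict(set)
    for delivery_no, item in records:
        baskets[delivery_no].add(item)
    return list(baskets.values())


def build_tree(weighted, min_support):
    """Pass 1 counts items; pass 2 inserts the sorted frequent items."""
    counts = defaultdict(int)
    for items, weight in weighted:
        for item in items:
            counts[item] += weight
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    root = Node(None, None)
    header = defaultdict(list)  # header table: item -> nodes holding that item
    for items, weight in weighted:
        ordered = sorted((i for i in items if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += weight
            node = child
    return header, frequent


def mine(weighted, min_support, suffix=frozenset()):
    """Recursively mine frequent itemsets from conditional pattern bases."""
    header, frequent = build_tree(weighted, min_support)
    results = {}
    for item, support in frequent.items():
        itemset = suffix | {item}
        results[itemset] = support
        base = []  # prefix path of every node that holds this item
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        results.update(mine(base, min_support, itemset))
    return results


def fp_growth(transactions, min_support):
    """Return {frozenset(items): support count} for all frequent itemsets."""
    return mine([(t, 1) for t in transactions], min_support)


# Toy records (delivery number, item); D1 is deliberately listed twice for A.
records = [("D1", "A"), ("D1", "B"), ("D1", "C"), ("D1", "A"),
           ("D2", "A"), ("D2", "B"), ("D3", "A"), ("D3", "C"),
           ("D4", "B"), ("D4", "C"), ("D5", "A"), ("D5", "B"), ("D5", "C")]
transactions = consolidate(records)
itemsets = fp_growth(transactions, min_support=3)
```

In this toy data set each single item occurs in four of the five consolidated deliveries and each pair in three, while the triple occurs in only two, so it is dropped at a support degree of 3 and kept at 2.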
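The effect of the support degree on the size of the largest frequent itemset can be reproduced on a toy data set. The sketch below is illustrative only: it uses brute-force subset counting rather than FP-growth, which is adequate for a handful of small transactions but not for real delivery data, and the transactions and thresholds are invented for the example.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force count of every item subset; suitable only for tiny examples."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(transaction)
        for k in range(1, len(items) + 1):
            counts.update(combinations(items, k))
    return {s: c for s, c in counts.items() if c >= min_support}


def largest_size_by_threshold(transactions, thresholds):
    """Map each support threshold to the size of the largest frequent itemset."""
    sizes = {}
    for threshold in thresholds:
        found = frequent_itemsets(transactions, threshold)
        sizes[threshold] = max((len(s) for s in found), default=0)
    return sizes


# Invented transactions; A to D stand in for steel specifications.
transactions = [
    {"A", "B", "C", "D"},
    {"A", "B", "C"},
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "B"},
    {"A"},
]
sizes = largest_size_by_threshold(transactions, [1, 3, 5, 6, 7])
print(sizes)
```

As the threshold rises, larger itemsets drop out first, mirroring the progression from 5-itemsets to 2-itemsets reported in Tables 3 to 7.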

Discussion
The target of this study was to improve the overall operational efficiency and performance of steel logistics parks by exploiting the association rules among different steel products using data mining approaches. Based on the procedures carried on in the mentioned case study, a generic prototype design for data-driven optimization is proposed, as shown in Figure 6. Multiple data sources, i.e., product entrance records, product delivery records, and interior operations records, are taken into account to support the implementation of a data analytics module. The data analytics module comprises three steps of data preprocessing, model building, and parameter tuning, and identifies the operational frequency and association patterns among multiple steel products. After that, the output of the data analytics module underpins the operations management in view of the features of incoming orders and the real-time utilization of yards so as to achieve the optimal allocation and deployment of steel products in this steel logistics park.
To be more specific, the frequency and correlation analysis among various steel specifications can facilitate the layout and deployment of the steel logistics park in several aspects. First of all, the steel specifications with high operational frequency are suggested to be deployed in Yards 1 and 3, which are near the entrance and exit area of the logistics park. Secondly, when multiple steel specifications arrive at this steel logistics park simultaneously, the items with high correlation are suggested to be allocated to the same yard, considering the remaining capacity of each yard. Thirdly, arriving steel specifications with low correlation are advised to be assigned to yards holding the same product categories. In addition, the products with low operational frequency or rare types are suggested to be stored in a separate yard.
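One possible reading of these suggestions is a simple greedy rule, sketched below in Python. It is a hypothetical illustration rather than the allocation procedure of the GC logistics park: the function `allocate`, the capacities, the tonnages, and the specification labels are all assumed. The sketch covers the first, second, and fourth suggestions only; the third (matching product categories already stored in a yard) is omitted because it requires the current yard contents. It also assumes one arriving lot per specification and uses Yard 6 as the separate yard.

```python
def allocate(arrivals, capacity, high_freq, correlated_groups,
             near_gate=(1, 3), backup_yard=6):
    """Greedy sketch: return ({spec: yard}, remaining capacity per yard)."""
    remaining = dict(capacity)
    tons_of = dict(arrivals)  # assumes one lot per specification
    plan = {}

    def regular_yards():
        # Yards that are neither near the gate nor the backup, emptiest first.
        rest = [y for y in remaining if y not in near_gate and y != backup_yard]
        return sorted(rest, key=lambda y: -remaining[y])

    def place(specs, candidates):
        # Put all given specs into the first candidate yard that can hold them.
        total = sum(tons_of[s] for s in specs)
        for yard in candidates:
            if remaining[yard] >= total:
                remaining[yard] -= total
                for s in specs:
                    plan[s] = yard
                return True
        return False

    # Correlated specs arriving together share a yard if capacity allows.
    for group in correlated_groups:
        present = [s for s in group if s in tons_of and s not in plan]
        if len(present) >= 2:
            gate = list(near_gate) if any(s in high_freq for s in present) else []
            place(present, gate + regular_yards())

    # Remaining lots: high-frequency near the gate, others in the backup yard.
    for spec, _ in arrivals:
        if spec in plan:
            continue
        first = list(near_gate) if spec in high_freq else [backup_yard]
        place([spec], first + regular_yards())
    return plan, remaining


# Illustrative inputs; capacities and tonnages are invented.
capacity = {1: 100, 2: 100, 3: 100, 4: 100, 5: 100, 6: 50}
arrivals = [("HB400E 16*12", 60), ("HB400E 18*12", 50),
            ("HB500E 25*12", 30), ("HTB600 32*9", 20)]
high_freq = {"HB400E 16*12", "HB400E 18*12", "HB500E 25*12"}
correlated_groups = [["HB400E 16*12", "HB500E 25*12"]]

plan, remaining = allocate(arrivals, capacity, high_freq, correlated_groups)
print(plan)
```

Here the correlated pair shares Yard 1, the other high-frequency lot overflows to Yard 3, and the low-frequency lot goes to Yard 6. A lot that fits nowhere is simply left out of the plan, which a practical implementation would need to handle explicitly.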

Conclusions
This research aimed to provide tactical management insights towards the layout and deployment of steel logistics parks. The entry and delivery order data of a steel logistics park was investigated and analyzed using a statistical and data mining approach. The operational frequency analysis can facilitate the classification of different steel specifications and improve the utilization of different yards. The correlation analysis among different steel specifications can help to identify the products with high relevance, which can affect the product allocation decisions. The analysis results act as a tactical reference, facilitating the performance improvement of this steel logistics park. Moreover, the proposed data analytics procedures and approaches can be applied to the other steel logistics enterprises in view of their similar operations and management strategies, and the prototype design for a data-driven optimization framework can shed light on the role of data analytics and facilitate the digital transformation of the steel industry.
Future research can be conducted in terms of the operational decision-making requirements. For instance, the specific product allocation for each arrival can be optimized considering the daily operational requirements and real-time capacity remaining for each yard. The order pick-up sequence and vehicle scheduling for delivery tasks can be further analyzed. The performance evaluation criteria of logistics parks can be one promising research area considering different industrial features. Moreover, the application of IoT devices in steel logistics can be another fruitful research direction towards industry digitalization.

Conflicts of Interest:
The authors declare no conflict of interest.