A K-Nearest Neighbors Algorithm in Python for Visualizing the 3D Stratigraphic Architecture of the Llobregat River Delta in NE Spain

: The k-nearest neighbors (KNN) algorithm is a non-parametric supervised machine learning classiﬁer; which uses proximity and similarity to make classiﬁcations or predictions about the grouping of an individual data point. This ability makes the KNN algorithm ideal for classifying datasets of geological variables and parameters prior to 3D visualization. This paper introduces a machine learning KNN algorithm and Python libraries for visualizing the 3D stratigraphic architecture of sedimentary porous media in the Quaternary onshore Llobregat River Delta (LRD) in northeastern Spain. A ﬁrst HTML model showed a consecutive 5 m-equispaced set of horizontal sections of the granulometry classes created with the KNN algorithm from 0 to 120 m below sea level in the onshore LRD. A second HTML model showed the 3D mapping of the main Quaternary gravel and coarse sand sedimentary bodies (lithosomes) and the basement (Pliocene and older rocks) top surface created with Python libraries. These results reproduce well the complex sedimentary structure of the LRD reported in recent scientiﬁc publications and proves the suitability of the KNN algorithm and Python libraries for visualizing the 3D stratigraphic structure of sedimentary porous media, which is a crucial stage in making decisions in different environmental and economic geology disciplines.


Introduction
Numerical modeling has expanded the classic qualitative geological visualizations toward strategies for delineating essential stratigraphic elements of sedimentary basins under quantitative criteria. Nowadays, numerical modeling tools allow interactive 3D visualizations while measuring the magnitude and uncertainty of the modeled geological variables and parameters [1][2][3][4]. As a result, modern 3D modeling has the ability to assimilate new data into the models as they are generated, thus allowing more real, accurate, and interactive visualizations than classic 2D representations. These features make the numerical modeling of great interest to take decisions in different environmental and economic geology disciplines.
There is a variety of applications for 3D visualization based on different interpolation algorithms and programming languages, including commercial software such as MOVE (Petroleum Experts Ltd., Edinburgh, UK), 3D Geomodeller (Intrepid Geophysics), Autocad Civil (Autodesk, Inc.), Gocad (Emerson Paradigm Roxar), ArcGis, PETREL (Geology and Modeling from Schlumberger), VOXI (Earth Modeling from Geosoft), and Geoscene 3D (I-GIS), as well as open source tools such as Gempy [5] and OSGeo [6]. In general, commercial applications have friendly environments and technical support for users, but they are expensive. The advantage of open source applications is the zero cost and adaptability (modifying or extending the sources), but the absence of technical support for users and sometimes low reliability are its negative parts. The open source Python libraries [5][6][7] and posts listing libraries [8][9][10] devoted to Geographic Information Systems (GIS) and mapping are of special interest for visualizing geological structures and stratigraphic elements in the fields of mining, engineering, and hydrogeology. Some scientific documents develop or apply computer tools to different fields of geology [11][12][13][14], and some social media channels post data analytics and machine learning educational applications focused on geology [15].
The experience of the researchers' team of this paper with Python libraries for geological data handling and 3D visualization [16] was decisive in developing a new application for classifying and visualizing the 3D stratigraphic architecture of sedimentary porous media based on a machine learning KNN algorithm and Python libraries. This application uses (i) the KNN algorithm to create a consecutive 5 m-equispaced set of horizontal sections of the granulometry classes grouped as an interactive 3D HTML model; and (ii) different Python libraries to create an interactive 3D HTML model of the essential stratigraphic elements (coarse lithosomes and basement top surface) of the sedimentary basin. On the basis of the high density of boreholes and the subsequent geological knowledge gained during the last six decades, the Quaternary onshore Llobregat River Delta (LRD) near Barcelona city in northeastern Spain ( Figure 1) was selected to show the application. Autocad Civil (Autodesk, Inc.), Gocad (Emerson Paradigm Roxar), ArcGis, PETREL (Geology and Modeling from Schlumberger), VOXI (Earth Modeling from Geosoft), and Geoscene 3D (I-GIS), as well as open source tools such as Gempy [5] and OSGeo [6]. In general, commercial applications have friendly environments and technical support for users, but they are expensive. The advantage of open source applications is the zero cost and adaptability (modifying or extending the sources), but the absence of technical support for users and sometimes low reliability are its negative parts. The open source Python libraries [5][6][7] and posts listing libraries [8][9][10] devoted to Geographic Information Systems (GIS) and mapping are of special interest for visualizing geological structures and stratigraphic elements in the fields of mining, engineering, and hydrogeology. Some scientific documents develop or apply computer tools to different fields of geology [11][12][13][14], and some social media channels post data analytics and machine learning educational applications focused on geology [15]. The experience of the researchers' team of this paper with Python libraries for geological data handling and 3D visualization [16] was decisive in developing a new application for classifying and visualizing the 3D stratigraphic architecture of sedimentary porous media based on a machine learning KNN algorithm and Python libraries. This application uses (i) the KNN algorithm to create a consecutive 5 m-equispaced set of horizontal sections of the granulometry classes grouped as an interactive 3D HTML model; and (ii) different Python libraries to create an interactive 3D HTML model of the essential stratigraphic elements (coarse lithosomes and basement top surface) of the sedimentary basin. On the basis of the high density of boreholes and the subsequent geological knowledge gained during the last six decades, the Quaternary onshore Llobregat River Delta (LRD) near Barcelona city in northeastern Spain ( Figure 1) was selected to show the application.  [17,18], Alonso et al. [19], and Gámez et al. [20]. Geological cross-section A-A' is displayed in Figure 2. This paper uses the public granulometry dataset prepared by the Water Authority of Catalonia (Agència Catalana de l'Aigua, ACA) in the LRD region, which is available on request. A Jupyter notebook describing the data classification and 3D visualization, as well as an operative version of the Python code, can be downloaded from the GitHub repository described in Supplementary Materials. The HTML files do not require any additional tools apart from a web browser to view different perspectives, hide elements, enlarge or focus on specific areas or elements and take snapshots of a particular view. These files are included in Supplementary Materials.  [17], Medialdea and Solé-Sabarís [18], Alonso et al. [19], and Gámez et al. [20]. This paper uses the public granulometry dataset prepared by the Water Authority of Catalonia (Agència Catalana de l'Aigua, ACA) in the LRD region, which is available on request. A Jupyter notebook describing the data classification and 3D visualization, as well as an operative version of the Python code, can be downloaded from the GitHub repository described in Supplementary Materials. The HTML files do not require any additional tools apart from a web browser to view different perspectives, hide elements, enlarge or focus on specific areas or elements and take snapshots of a particular view. These files are included in Supplementary Materials.

Study Area
The LRD is a densely populated coastal plain of 98 km 2 , forming the southwestern sector of the metropolitan area of Barcelona city in the Catalonia region in northeastern Spain ( Figure 1). This area includes other minor-order cities such as El Prat de Llobregat, L'Hospitalet de Llobregat, Cornellà, Sant Boi, Viladecans, Gavà and Castelldefels. The water abundance and its strategic location near Barcelona city have favored the development of an important industrial activity since the XIX century. The high groundwater exploitation rates to supply to the increasing population and industrial activity have produced negative consequences on groundwater quantity and quality, including seawater intrusion into aquifers and high levels of pollution [17,[21][22][23][24].
Other modern development milestones occurred in Barcelona city, and its metropolitan area, such as the Olympic Games in 1992 and the Llobregat Delta Infrastructure and Environment Plan (Pla d'Infraestructures i Medi Ambient del Delta del Llobregat, PDL) started in 1994 [25] have modified the LRD land use. The PDL included large civil infrastructures with variable underground development affecting groundwater bodies and posing at risk the water provision for different uses. In response, in 2004, ACA created the Technical Unit of the Llobregat Aquifers (Mesa Tècnica dels Aqüífers del Llobregat, METALL) to compile and homogenize the huge geological and hydrogeological information in order to prepare groundwater flow numerical models aimed at assessing the cumulative impact of the large civil works on the groundwater resource [26]. This public geological and hydrogeological database is available on request and has been used in this paper.
From a geological point of view, the LRD is regionally fed by the Llobregat River and its tributaries arriving from the Pre-Pyrenean Range and locally from the Llobregat River lower valley reliefs (Garraf and Collserola massifs) belonging to the Catalonian Coastal Range [23,24]. This range is a NE-SW-oriented mountain chain that gives pass downward to the Mediterranean coast ( Figure 1). The LRD is also bounded toward the NE by the Montjuïc relief. The geological studies in the area started at the end of the XIX century by Almera [27]. In the previous century, several studies [18][19][20]28,29] allowed proposing geological maps and 2D cross-sections aimed to support hydrogeological studies. Sedimentological studies performed in the 1970s and 1980s [28][29][30] allowed clarifying the geology of the LRD. In the 1980s, the prodeltaic bodies of the emerged delta, dated as Holocene, were studied in detail [31] and the geological characterization of the continental shelf with support of 2D marine seismic reflection took place [32][33][34]. These studies, with a strong sedimentological component, allowed the sequential division of the LRD and the arrangement of the Quaternary materials. In the 1990s, this huge geological background was combined with the former geological information compiled from dozens of boreholes to make modern groundwater evaluations [35]. Coinciding with the PDL development at the beginning of the XXI century, new geological data allowed fine research [20,[36][37][38][39] aimed to detail the 2D stratigraphic architecture of the LRD, thus giving a response to the Pliocene-Quaternary boundary, the confident definition of those coarse detritic levels officially cataloged as productive, high-yielding aquifers and the detailed identification of those interconnected stratigraphic structures (of sedimentary and tectonic origin) through which seawater intrusion into aquifers and mobilization of pollutants take place [22,23,26].
The Quaternary record was divided into two depositional sequences, the Pleistocene and Holocene ages [18][19][20]32,33,37,39]. The terms upper detrital complex and lower detrital complex have also been adopted in the scientific literature [20] for the same depositional sequences ( Figure 2). According to geophysical studies in the offshore delta in the marine platform [32][33][34], the lower detrital complex can be divided into three parasequences [20]. In general terms, the lower detrital complex is made of conglomerate bodies (locally with sand) with intercalated silt-and clay-rich intervals [30][31][32][33][34]. The upper detrital complex, from bottom to top, is made of a sand layer, a silt bed, gravel (locally with sand), and upper silt and clay cover forming the current alluvial plain and the associated coastal wetlands and marshes [30,39]. The LRD coastal plain is also modeled by different streams coming from the neighbor reliefs [37][38][39] and the regional littoral drift, which distributes the shoreline sedimentation towards the SW [40,41].  [17][18][19][20]38,39]. The Quaternary record was divided into two depositional sequences, the Pleistocene and Holocene ages [17][18][19][20]32,33,37,39]. The terms upper detrital complex and lower detrital complex have also been adopted in the scientific literature [20] for the same depositional sequences ( Figure 2). According to geophysical studies in the offshore delta in the marine platform [32][33][34], the lower detrital complex can be divided into three parasequences [20]. In general terms, the lower detrital complex is made of conglomerate bodies (locally with sand) with intercalated silt-and clay-rich intervals [30][31][32][33][34]. The upper detrital complex, from bottom to top, is made of a sand layer, a silt bed, gravel (locally with sand), and upper silt and clay cover forming the current alluvial plain and the associated coastal wetlands and marshes [30,39]. The LRD coastal plain is also modeled by different streams coming from the neighbor reliefs [37][38][39] and the regional littoral drift, which distributes the shoreline sedimentation towards the SW [40,41].

Data Compilation
The public geological and hydrogeological database prepared by ACA, which is available on request, was consulted, and the granulometry dataset generated by METALL from both Lab test values and proxy values after visual recognition of the predominant lithologies in the LRD were compiled. The lithological records of 433 boreholes in the onshore LRD [42] and their granulometry records were checked in order to detect possible outliers. The detected outliers were suppressed or corrected when possible, thus providing the granulometry dataset used in this paper. The dataset consisted of georeferenced XLS (Excel) files with meter-by-meter granulometry values. The boreholes' location (coordinates x and y) and prospecting depth (coordinate z) lead to a georeferenced array of granulometry data associated with the prospected lithologies over space and depth. The georeferenced data were clustered into four main granulometry classes: clay-silt (<1 mm), coarse sand (1-5 mm), gravel (>5 mm), and basement. This working flow is synthesized in Figure 3A.

Data Compilation
The public geological and hydrogeological database prepared by ACA, which is available on request, was consulted, and the granulometry dataset generated by METALL from both Lab test values and proxy values after visual recognition of the predominant lithologies in the LRD were compiled. The lithological records of 433 boreholes in the onshore LRD [42] and their granulometry records were checked in order to detect possible outliers. The detected outliers were suppressed or corrected when possible, thus providing the granulometry dataset used in this paper. The dataset consisted of georeferenced XLS (Excel) files with meter-by-meter granulometry values. The boreholes' location (coordinates x and y) and prospecting depth (coordinate z) lead to a georeferenced array of granulometry data associated with the prospected lithologies over space and depth. The georeferenced data were clustered into four main granulometry classes: clay-silt (<1 mm), coarse sand (1-5 mm), gravel (>5 mm), and basement. This working flow is synthesized in Figure 3A.

Python Programing Language
Python is an open-source language widely used in many environmental top cluding geological ones. The object-oriented, high-level Python programming lan [43] was used for analyzing the granulometry data and visualizing the 3D HTML of essential stratigraphic elements. From the many packages or modules that use th gramming language for releasing a wide variety of problems, the Python package here were (i) NumPy [44] for data computing, (ii) Pandas [45] for data analysis an cessing, (iii) Plotly [46] as a graphing library, (iv) Scipy [47] for interpolation and algorithms, and (v) Scikit-learn [48] for the machine learning KNN algorithm. In ad an inverse distance weighted interpolation algorithm obtained from the 3D Terrain elling in Python viewer from the GEODOSE block [49] was also used. This workin is synthesized in Figure 3. The Jupyter notebooks describing the Python code and planatory instructions, as well as the operative HTML version of the code, can be loaded from the GitHub repository described in Supplementary Materials. This version can be opened and analyzed by using a web browser only, without the n install Python. The Jupyter notebooks also explain step by step how to proceed to a our goals (a figure, a prediction, a cluster, etc.). If the readers want to adapt the c their data or particular aims, they will need to know Python and might have to our auxiliary functions accordingly.

KNN Algorithm
The k-nearest neighbors (KNN) algorithm is a non-parametric supervised m learning classifier that uses proximity and similarity of data to make classifications dictions about the grouping of an individual data point [50][51][52][53]. This ability ma KNN algorithm well suited for regression and classification problems of geologic ables and parameters [51,52]. The KNN algorithm is used to classify the granul dataset prior to 3D visualization by assuming that the predicted value at a point is dependent on the neighboring data. Under this same assumption of similarity, th algorithm was used to infer data from a collection of measuring to a wider space o

Python Programing Language
Python is an open-source language widely used in many environmental topics, including geological ones. The object-oriented, high-level Python programming language [43] was used for analyzing the granulometry data and visualizing the 3D HTML models of essential stratigraphic elements. From the many packages or modules that use this programming language for releasing a wide variety of problems, the Python packages used here were (i) NumPy [44] for data computing, (ii) Pandas [45] for data analysis and processing, (iii) Plotly [46] as a graphing library, (iv) Scipy [47] for interpolation and render algorithms, and (v) Scikit-learn [48] for the machine learning KNN algorithm. In addition, an inverse distance weighted interpolation algorithm obtained from the 3D Terrain Modelling in Python viewer from the GEODOSE block [49] was also used. This working flow is synthesized in Figure 3. The Jupyter notebooks describing the Python code and its explanatory instructions, as well as the operative HTML version of the code, can be downloaded from the GitHub repository described in Supplementary Materials. This HTML version can be opened and analyzed by using a web browser only, without the need to install Python. The Jupyter notebooks also explain step by step how to proceed to achieve our goals (a figure, a prediction, a cluster, etc.). If the readers want to adapt the code to their data or particular aims, they will need to know Python and might have to modify our auxiliary functions accordingly.

KNN Algorithm
The k-nearest neighbors (KNN) algorithm is a non-parametric supervised machine learning classifier that uses proximity and similarity of data to make classifications or predictions about the grouping of an individual data point [50][51][52][53]. This ability makes the KNN algorithm well suited for regression and classification problems of geological variables and parameters [51,52]. The KNN algorithm is used to classify the granulometry dataset prior to 3D visualization by assuming that the predicted value at a point is mostly dependent on the neighboring data. Under this same assumption of similarity, the KNN algorithm was used to infer data from a collection of measuring to a wider space of individuals, in this case, the nodes of a grid in the LRD.
Before using the KKN algorithm, the strategy for predicting new data attending to the environmental forces controlling the granulometry spatial distribution must be analyzed. The onshore LRD is formed of sub-horizontal layers of sedimentary material, thus determining more accurate 2D predictions than 3D ones. Since the target is 3D visualization, a consecutive 5 m-equispaced set of horizontal layers was adopted. In each horizontal layer, the KNN algorithm searches the K-nearest granulometry data and adopts the predominant value. A weight inversely proportional to the distance was assigned to each neighbor's data in order to prioritize the nearest ones. The rationale for choosing the parameter K considered that smaller K values might produce unrealistic polygonal regions while larger K values lead to more natural and smother regions although they may ignore isolated data. This procedure follows the geological logic, thus avoiding problems with edge/boundary effects commonly seen in KNN. Once the granulometry dataset was carefully checked for possible outliers, we consider that every piece of data must be considered. So, we shall use K = 1, i.e., the KNN algorithm will only check the nearest neighbor to every point. This working flow is synthesized in Figure 3A. A Jupyter notebooks hosted in the GitHub repository described in Supplementary Materials include the code.

The 3D Mapping of the Granulometry Horizontal Sections
The Python module Pandas was used to read and process the boreholes' granulometry data XLS file. We used Google Earth to draw the LRD contour and the Python geometry function Polygon to create the LRD contour polygon. Then, we defined (i) the X and Y bounds and the grid size where the KNN algorithm will run, (ii) the exploring depth limit defined by the first prospection of the basement top surface, (iii) the adopted 5 m exploring depth interval, and (iv) the function 'layer_function' to classify the granulometry data attending to the above defined three classes and its depth.
Suitability of the 5 m equidistance of the horizontal KNN maps was based on (i) the LRD depth vs. length ratio since the Quaternary onshore LRD is about 120 m depth and its horizontal extension is about 15 km length; (ii) the borehole granulometry source data are arranged meter to meter; (iii) the sedimentary bodies interesting for us must have a certain mappable entity, for instance of 5 m thick at least; and (iv) there is a dense net of boreholes, but those are not closer of the order of the ten meters in the best of the cases.
Next, the KNN algorithm was executed, and the matplotlib function scatter used to create the 2D horizontal layers with predictions bounded within polygonal regions. The Python Plotly function scatter3d was used to integrate the 2D predictions into an interactive 3D HTML model (3D_Horizontal_Sections_LRD.html (accessed on 12 July 2022)). Some auxiliary functions to arrange data in a specific format were used. This working flow is synthesized in Figure 3A. The Jupyter notebooks hosted in the GitHub repository as Supplementary Materials include the code.

The 3D Mapping of the Stratigraphic Architecture and Basement Top Surface
This section describes the two steps followed to create the interactive 3D HTML model (3D_Lithosomes_LRD.html (accessed on 12 July 2022)) that allows a complete view of coarse lithosomes (gravel and coarse sand) of the onshore Quaternary LRD and the basement top surface.
Firstly, the spatial info about coarse granulometry classes clustered in every equispaced horizontal layer created with the above KNN algorithm was used to define the volume of coarse lithosomes in the incipient 3D HTML model. To this end, the function 'grouping' was the recursive cluster procedure used to group the points around a given start point selected in each cluster of points. The recursive nature of this function 'grouping' implies repeating the same calculation many times, but its definition is simple, and it does well with the nucleation strategy. Once the granulometry data of a lithosome were clustered, the Convex Hull algorithm developed by the SciPy community [54] was implemented. The 3D convex hull of a georeferenced dataset is the smallest polyhedron that wraps them all. The convex hull function must be applied to each obtained group. The function 'lithosome' was used to calculate the corresponding Convex Hull. The output of the function 'lithosome' is a list of four elements (points, vertices, simplices, and name) defining the computed convex hull. At this stage, an overall checking was needed to ensure output is geologically suitable by removing points distorting the shape of groups or adding others with specific granulometry values in sparse data areas. Attending to the general decreasing data density with depth, the radio to get proper clusters was redefined, and the 'grid' parameter was decreased from 150 to 50 to reduce the computation time. Once all grid points are classified by their granulometry by using the KNN algorithm, they are grouped by lithosomes by means of the function 'grouping'. All clusters should be wrapped together by using their corresponding convex wrappings.
In a second step, the function data_lithosome we defined used the function Mesh3d by plotly.graph_objects. This allowed to shape the data in the proper drawing format, and the basement top surface was added to complete the interactive 3D HTML model. To this end, the basement KNN predictions were used, and the grid size decreased again up to 50 to reduce the computation time. The Pandas package used the KNN predicted points to calculate where the basement is firstly prospected (defining the coordinate z) and saved the output data as a CSV file. An Inverse Distance Weighting (IDW) interpolation algorithm was used to map the basement top surface. The mapped basement top surface was processed again to remove possible outliers prior to being merged with the above grouped coarse lithosomes. This working flow is synthesized in Figure 3B. Jupyter notebooks hosted in the GitHub repository as Supplementary Materials include the code.

The 3D Mapping of the Granulometry Horizontal Sections
As described in Section 3.1, the public granulometry dataset produced by the ACA from 433 boreholes in the onshore LRD was clustered into the clay-silt (<1 mm), coarse sand (1-5 mm), gravel (>5 mm), and basement classes. The KNN algorithm was used to create the consecutive 5 m-equispaced set of horizontal layers of the granulometry classes from 0 to 120 m b.s.l. These horizontal layers used a regular 300 m × 300 m grid over the entire onshore LRD surface. The granulometry dataset at a given depth was used to produce the granulometry of the nodal grid points at that depth. Figure 4 shows six representative horizontal sections of the granulometry classes at different depths, in which the location of boreholes with granulometry data is also displayed. As shown, the data density (the number of boreholes) decreases with depth, so the accuracy of the KNN predictions also decreased with depth. However, how the mapping uncertainty of the spatial KNN predictions increases attending to the decreasing data density with depth was not evaluated because this task is out of the scope of this paper and the subject of ongoing research. Figure 5 shows the interactive 3D HTML model created from the consecutive 5 mequispaced set of horizontal layers of the granulometry classes created with the KNN algorithm from 0 to 120 m below sea level. An interactive 3D HTML version of this model (3D_Horizontal_Sections_LRD.html (accessed on 12 July 2022)) is provided in Supplementary Materials. The interactive 3D HTML model can be opened with any browser and allows observing different views, zooming, rotating, and moving around, as well as hiding elements by clicking in the legend to focus on details.     Figure 6 shows the interactive 3D HTML model created for visualizing the stratigraphic architecture (coarse lithosomes and the basement top surface) of the onshore Quaternary LRD, including some partial views relative to (i) coarse lithosomes and basement top surface ( Figure 6A), (ii) gravel lithosome and basement top surface ( Figure 6B), (iii) coarse sand lithosome and basement top surface ( Figure 6C), and (iv) basement top surface only ( Figure 6D). These partial views help to interpret the spatial distribution of lithosomes and the shape of the basement top surface. An interactive 3D HTML version of this model (3D_Lithosomes_LRD.html (accessed on 12 July 2022)) is included in Supplementary Materials. This interactive 3D HTML model can also be opened with any browser and allows observing different views, zooming, rotating, and moving around, as well as hiding elements by clicking in the legend to focus on details.

The 3D Mapping of the Stratigraphic Architecture and Basement Top Surface
ternary LRD, including some partial views relative to (i) coarse lithosomes and basement top surface ( Figure 6A), (ii) gravel lithosome and basement top surface ( Figure 6B), (iii) coarse sand lithosome and basement top surface ( Figure 6C), and (iv) basement top surface only ( Figure 6D). These partial views help to interpret the spatial distribution of lithosomes and the shape of the basement top surface. An interactive 3D HTML version of this model (3D_Lithosomes_LRD.html (accessed on 12 July 2022)) is included in Supplementary Materials. This interactive 3D HTML model can also be opened with any browser and allows observing different views, zooming, rotating, and moving around, as well as hiding elements by clicking in the legend to focus on details.
There is a large, continuous, in time, gravel lithosome near the current Llobregat River course and other minor ones at different depths in the SW sector of the onshore Quaternary LRD ( Figure 6B). There are also two extensive coarse sand lithosomes at different depths, the shallowest one being more important ( Figure 6C). The basement top surface shows a general steeped shape deepening toward the marine platform with an over-imposed horst-graben structure, probably due to faulting ( Figure 6C). When the subsquare raised and sunken sectors structure of the basement is compared with the geological sketch map from Figure 1, a clear correlation between the horst-graben boundaries and the main Tibidabo, Llobregat, and Morrot fault families and their associated minororder faults is observed. These results reproduce well the complex sedimentary structure of the onshore LRD reported in recent scientific publications [20,[37][38][39]55,56].  There is a large, continuous, in time, gravel lithosome near the current Llobregat River course and other minor ones at different depths in the SW sector of the onshore Quaternary LRD ( Figure 6B). There are also two extensive coarse sand lithosomes at different depths, the shallowest one being more important ( Figure 6C). The basement top surface shows a general steeped shape deepening toward the marine platform with an over-imposed horst-graben structure, probably due to faulting ( Figure 6C). When the sub-square raised and sunken sectors structure of the basement is compared with the geological sketch map from Figure 1, a clear correlation between the horst-graben boundaries and the main Tibidabo, Llobregat, and Morrot fault families and their associated minor-order faults is observed. These results reproduce well the complex sedimentary structure of the onshore LRD reported in recent scientific publications [20,[37][38][39]55,56].

Discussion and Conclusions
This paper shows the suitability of the combined use of the KNN algorithm and Python libraries for classifying geological data and visualizing the 3D stratigraphic structure of sedimentary basins. Visualizing first the 3D stratigraphic architecture of sedimentary basins is crucial for making decisions in different environmental and economic geology disciplines, with clear applications in groundwater, oil, engineering, and mineral exploration, as well as in surveying a variety of physical and mechanical ground properties. In the onshore LRD, coarse (gravel and coarse sand) lithosomes storing groundwater are what matter, so this paper focused on them. However, this target may change in other areas with other lithosomes and other resources or physical and mechanical ground properties to be explored. This paper also successfully meets the challenge of evolving from qualitative geological features subjected to interpretation toward quantitative geological data associated with physical parameters. This is a challenge in applied geology, which this research addresses by using parameter granulometry instead of lithological descriptions. The use of quantitative data allows successive numerical modeling exercises with assimilation of new data as they are generated to refine predictions progressively. The public granulometry dataset produced by ACA from 433 boreholes [10] in the onshore LRD was essential in conducting this research.
With regards to the KNN algorithm, it was used for classification purposes under the assumption that the granulometry value in a point must be similar to the measured magnitude in nearby locations. The KNN algorithm predicted the granulometry classes in a consecutive 5 m-equispaced set of horizontal sections created from 0 to 120 m below sea level in the onshore LRD. The created interactive 3D HTML model is included in Supplementary Materials (3D_Horizontal_Sections_LRD.html (accessed on 12 July 2022)). On the other hand, Python has a wide variety of libraries for data handling (such as Pandas and Numpy) and visualization (such as Matplotlib), including a friendly KNN interface available through the machine learning library scikit-learn to infer the magnitude of a given parameter or variable in points where data could not be collected. As deduced, Python is a suitable programming language for the modeling of geological parameters and variables, as the authors of this paper already proposed in previous research [16]. The created interactive 3D HTML model for visualizing the 3D stratigraphic architecture (coarse lithosomes and the basement top surface) of the LRD is included in Supplementary Materials (3D_Lithosomes_ LRD.html (accessed on 12 July 2022)). In general, the results reproduce well the complex sedimentary structure of the LRD reported in recent scientific publications [20,[37][38][39]55,56], thus proving the suitability of the open-source KNN algorithm and Python libraries for visualizing the 3D stratigraphic structure of sedimentary basins. Other proves of the 3D model reliability are as follows. We recently used an algorithm different from KNN to create other 3D models of the LRD; the results being quite similar to these [16]. Moreover, we have determined the horizontal and vertical reliability of our 3D model. At the LRD scale, the lower reliability of the 3D model was tentatively evaluated as 25% of the estimated value, whereas reliability at a small scale of observation was a function of distance and magnitude of granulometry data among nearby points reaching ±50% in the worst cases. The specific methodology to evaluate the 3D model reliability will be the subject of a future publication.
Although there are many commercial and open-source libraries and software devoted to geological visualization, its use requires money in the case of the commercial ones and effort to learn how they work. The open-source KNN algorithm and Python libraries have proven efficient, accessible, and friendly for the purpose of visualizing the 3D stratigraphic architecture of sedimentary basins. Currently, additional research to develop routines aimed at assessing the mapping uncertainty is ongoing.
The introduced Python code can be implemented in other areas with different geological features to model other variables and parameters. Here, the introduced application uses granulometry for visualizing coarse lithosomes of the LRD, which is a sedimentary porous media with huge information about this parameter. In this and other cases, the accuracy of predictions depends on available data density, as expected. In other sparse data areas, the readers can use widely-known empirical relationships to indirectly infer granulometry from physical parameters deduced from hydraulic tests such as pumping tests, geotechnical tests such as Lefranc, Digital Image Analysis of ground textures, or pore-water findings from geophysical surveys such as Electrical Resistivity Tomography and Ground Penetrating Radar techniques [57][58][59][60][61][62][63]. In these cases, a representative enough granulometry dataset from each lithological class is needed to ensure the statistical representativeness of the measured vs. deduced data pairs experimental relationships. In this paper, we used granulometry as a physical parameter, but other physical and chemical variables and parameters such as density, porosity, hydraulic conductivity, thermal conductivity, distribution of a pollutant, and occurrence of minerals, among most others, can also be used. This widens the scope of the application introduced in this paper to many environmental and/or economic interests.
Supplementary Materials: The two interactive 3D HTML models 3D_Horizontal_Sections_LRD.html (accessed on 12 July 2022) and 3D_Lithosomes_LRD.html (accessed on 12 July 2022) can be downloaded at: https://www.mdpi.com/article/10.3390/jmse10070986/s1. The Python code, the KNN algorithm, and the detailed instructions on how to download and run the code are hosted in a GitHub repository and can be downloaded at: https://github.com/dcabezas98/knn-stratigraphicvisualization (accessed on 12 July 2022).

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.