EventGeoScout: Fostering Citizen Empowerment and Augmenting Data Quality through Collaborative Geographic Information Governance and Optimization

Abstract: In this manuscript, we present EventGeoScout, an innovative framework for collaborative geographic information management, tailored to the dynamically changing landscape of geographic data integration and quality enhancement. EventGeoScout enables the seamless fusion of open data from different sources and provides users with tools to refine and improve data quality. A distinctive feature of our framework is its commitment to platform-agnostic data management, ensuring that processed datasets are accessible via standard Geographic Information System (GIS) tools, reducing the maintenance burden on organizations while ensuring the continued relevance of the data. Our approach goes beyond the boundaries of traditional data integration, enabling users to fully harness the power of geospatial information by simplifying the data creation process and providing a versatile solution to the complex challenges posed by layered geospatial data. To demonstrate the versatility and robustness of EventGeoScout as an optimization tool, we present a case study centered on the Uncapacitated Facility Location Problem (UFLP), where a genetic algorithm achieved outstanding performance on both traditional computing platforms and smartphone devices. As a concrete case study, we applied our solution in the context of the Málaga City Marathon, using data from its most recent edition.


Introduction
For those who use them, technology in general, and the Internet in particular, are no longer merely useful tools for certain activities at given moments or in specific contexts. Historically, users clearly differentiated between their offline and online lives, but nowadays this line is blurred and technology (especially the smartphone) has become a central part of everyday life and a natural way to access information and operate.
In this context, information acquires an importance that it has never previously enjoyed. The concept of an information society defines a society in which technology is used to create, distribute, and manipulate information. Information has therefore become easier to collect and structure, increasing the value of data, thanks to the publishing initiatives of Open Data [1] and Linked Data [2].
The Open Data initiative, or the opening up of data, is a philosophy and practice which pursues the idea that certain data and information from public administrations should be accessible and available to all, without legal or technical restrictions. The end goal of the Open Data initiative is that information can be redistributed and reused by citizens as well as companies, to the benefit of all parties. Any member of the public or any company can therefore analyze, reuse, and redistribute these data (open government) in collaboration with the administration. This, in turn, generates new services to meet their needs and, in doing so, creates wealth through the intelligent management of resources (intelligent government) [3].
Open data represent a valuable asset and resource for clarifying what is happening in our cities, allowing us to make informed decisions that guarantee the optimal use of resources in Smart Cities. Within the Smart City strategy, Open Data plays a fundamental role that goes beyond merely publishing a data catalog on a simple web page [4]. The Open Data initiative has to construct an all-encompassing, balanced, and active ecosystem of users, and this ecosystem is, in turn, responsible for the efficient management of the data. For this reason, the so-called Open Data community [5] is empowered through specific actions that seek to facilitate and encourage the reuse of data provided by central governments and local councils through the various agencies involved: infomediaries, research bodies, companies and entrepreneurs, civil organizations, the government itself and/or local councils, and, above all, the general public.
In this context, the citizens themselves could constitute a service of the platform (Internet of People) [6,7]. This service would be charged with identifying new data sets of interest, collecting and producing data, and amending and updating them (both published open data and their associated metadata). Responsibility for ensuring the integrity of restricted data would remain with the administration, while preventing the administration from offering only the data it considers relevant through the mediums it considers appropriate. In this way, citizens help ease the maintenance costs of the information and prevent it from becoming obsolete by returning new layers of information to the system. After supervision, these layers can then be reused by the ecosystem.
Given this backdrop, the concept of neogeography gains heightened significance. Neogeography refers to the practice whereby individuals engage in geographic data collection and mapping activities, often without formal training in traditional geographic disciplines. This phenomenon has been facilitated by the widespread accessibility of Internet technologies, GPS-enabled devices, and user-friendly mapping tools, allowing citizens to play an active role in the creation and manipulation of geographic information.
This participatory approach aligns with the principles of the Open Data initiative, as neogeography empowers citizens to contribute to the vast pool of geographic data. They can identify new datasets of interest, collect and produce data, and amend and update both published data and associated metadata. Such contributions are vital in ensuring the currency and relevance of geographic information, thus enhancing the quality and utility of data within our proposed GIS framework. By integrating neogeographical practices, our system not only democratizes data generation but also enriches the data ecosystem with diverse, real-time, and ground-level insights.
Therefore, our framework not only facilitates efficient data management but also harnesses the power of citizen-driven data generation, embodying the essence of neogeography.This approach underscores the transformative potential of integrating public participation in reshaping geographic data management and optimization paradigms, particularly in the context of Smart Cities and Intelligent Government.
This paper introduces a Geographic Information System (GIS) designed to effectively manage various components of geographic information sourced from diverse public entities. Our framework not only streamlines data management but also embraces active public participation, allowing citizens to engage with the system. Users are empowered to edit existing data and contribute additional layers of information, alleviating the financial burden on governmental bodies responsible for data compilation and maintenance.
To illustrate the framework's capabilities and validate its concepts, we present a practical case study focusing on the Uncapacitated Facility Location Problem (UFLP) in Section 5, which demonstrates how our framework efficiently locates a series of facilities to provide optimal services to a specific number of clients. The problem entails determining the total number of facilities to open, which facilities should be selected, and which should remain closed. The framework generates a visual map illustrating the allocation of clients to each facility, ultimately achieving an assignment with minimized costs.
In the realm of scientific and academic discourse, it is imperative to explore alternatives beyond Artificial Intelligence (AI) solutions, primarily due to emerging concerns regarding their substantial environmental impact. A recent study conducted by the University of Massachusetts [8] has shed light on the alarming carbon footprint associated with the training of large-scale AI models. This study reveals that the process can emit over 626,000 pounds of carbon dioxide, equivalent to nearly five times the lifetime emissions of an average automobile, inclusive of its manufacturing. This environmental concern underscores the necessity of seeking more sustainable problem-solving approaches in scientific research and technology development.
This paper highlights the transformative potential of our GIS framework in reshaping the paradigms of geographic data management and optimization. We place particular emphasis on the invaluable contributions of citizen engagement and the use of intelligent geoprocessing tools. To support our claims, we have implemented, tested, and evaluated a genetic algorithm that effectively identifies and displays peak citizen concentrations in real time during specific events. Before applying the genetic algorithm to the case study, its performance was tested on Android and traditional platforms using the widely used ORLIB dataset to verify the validity of the implementation. As a concrete case study, we applied our solution in the context of the Málaga City Marathon, using data from its most recent edition. Our framework's ability to provide instant insights into peak crowd densities at critical moments positions it as a key asset for emergency response efforts: real-time information on peak crowd densities can be instrumental in timely responses to potential incidents, enabling rapid action by medical professionals, firefighters, law enforcement, and other relevant authorities.
Structure. The paper is organized as follows. Section 2 offers a thorough review of prior research in collaborative geographic data management, highlighting key contributions and gaps in the literature. Section 3 presents the core of our framework, providing insights into its development and applications. Section 4 ventures into the realm of geoprocessing, focusing on the uncapacitated facility location problem and heuristic optimization methods. Section 5 unfolds a practical, real-world application of the framework in our case study. Lastly, Section 6 synthesizes our findings, offering a glimpse of future research directions and the broader implications of our work.

Related Works
The integration of genetic algorithms within the landscape of data science and Geographic Information Systems (GIS) has marked a significant advancement in tackling complex optimization challenges, such as the Uncapacitated Facility Location Problem (UFLP). Studies by Huang et al. [9], Ozyurt et al. [10], and Gallego et al. [11] exemplify their efficacy in optimizing vehicle routes, designing utility networks, and solving urban location-allocation problems. Despite these advancements, a notable gap remains in the application of genetic algorithms in GIS, particularly in the realm of collaborative open data management [12].
Our framework aims to bridge this gap by leveraging genetic algorithms in open data environments, transforming data publication into a dynamic, interactive process involving end-users in data quality enhancement. This approach is particularly pertinent in the era of mobile computing and IoT, where citizens increasingly contribute to data generation, echoing the concept of 'Open Data: People as Service' [13].
Collaborative platforms, such as GeoNode and Esri's ArcGIS Hub, have revolutionized the way open geo-data are integrated and applied, facilitating community engagement and data-driven decision-making [14,15]. In the realm of Volunteered Geographic Information (VGI), platforms like Mapillary and KartaView have utilized crowdsourcing to transform geographic data collection [16,17]. Virtual Research Environments (VREs), like the Australian Urban Research Infrastructure Network (AURIN) [18], have become instrumental in providing collaborative tools and data for urban research.
Standards and initiatives for interoperability and quality management, such as the Open Geospatial Consortium (OGC) standards [19] and the Infrastructure for Spatial Information in Europe (INSPIRE) Directive [20], are crucial in enhancing the usability and accessibility of spatial data. The role of institutions like the Lincoln Institute of Land Policy in enhancing the quality of Volunteered Geographic Information (VGI) is likewise central, particularly given the growing focus on data accuracy and validation methodologies. The Institute's work emphasizes the need for rigorous validation processes, a principle exemplified by platforms and initiatives such as GeoNames, Google Earth Engine, ArcGIS, Ushahidi, Mapillary, the Humanitarian OpenStreetMap Team (HOT), and Global Forest Watch, each of which provides unique insights into the challenges and solutions in VGI quality management.
GeoNames, for instance, is a vast database of geographical names where data quality is paramount. The Lincoln Institute's approach to VGI quality aligns with the practices in GeoNames, where user-contributed data undergo scrutiny for accuracy [21]. Similarly, Google Earth Engine offers an extensive array of environmental data analysis tools, presenting a case study in managing and verifying large datasets, a challenge also pertinent to VGI [22]. The use of ArcGIS tools in community mapping and spatial analysis further underscores the necessity of reliable data, a goal that the Lincoln Institute's focus on VGI quality seeks to achieve [23].
In the realm of crisis mapping, platforms like Ushahidi demonstrate the critical importance of accurate and reliable geographic information, especially in emergency situations. This aligns with the Lincoln Institute's emphasis on dependable data in VGI [24]. Mapillary's approach to street-level mapping through crowdsourced imagery presents another example where the quality of VGI is vital, resonating with the Lincoln Institute's objectives in promoting data reliability [16].
Moreover, the Humanitarian OpenStreetMap Team (HOT) leverages VGI in humanitarian contexts, where the quality and timeliness of data can significantly impact relief efforts. This use case exemplifies the kind of challenges that the Lincoln Institute aims to address in VGI quality assurance [25]. Lastly, Global Forest Watch's real-time monitoring of forests using satellite imagery highlights the need for high-quality, reliable VGI, a standard that the Lincoln Institute of Land Policy seeks to uphold in its advocacy and research efforts [26].
Recognizing the challenges faced by various geospatial platforms, our framework advocates for a user-centric approach. It integrates official data sources with user-generated data, ensuring quality and reliability. The capabilities and limitations of platforms like Mapbox, GeoServer, and KartaView highlight the diversity of contemporary GIS tools and the importance of addressing issues like data coverage, privacy, and technical expertise [17,27,28].
Additionally, volunteer-generated GIS has proven indispensable in emergency scenarios, such as the Gulf of Mexico oil spill and the earthquakes in Haiti and Chile [29]. The Sahana project exemplifies the power of free software in disaster management, improving coordination and resource allocation during emergencies [30]. Geographic information's relevance and accuracy are continually challenged by the rapid pace of real-world change. Projects like OSM and Wikimapia have showcased the potential of crowdsourced geographic information in maintaining up-to-date, relevant data [31,32]. Other initiatives, such as the Global Positioning System (GPS) and remote sensing technologies, have further enhanced data collection and accuracy in GIS.
Emerging technologies like Artificial Intelligence (AI) and Machine Learning (ML) are also reshaping the GIS landscape. Platforms integrating AI for automated data analysis and pattern recognition are becoming increasingly prevalent, offering sophisticated tools for geospatial data analysis and decision-making.
The expansion of collaborative platforms, the integration of advanced technologies like AI and ML in GIS, and the development of standards and tools for managing VGI and open geo-data are collectively redefining the field of geospatial data management. These advancements not only enhance the efficiency and accuracy of data management but also promote inclusivity and collaborative problem-solving in the realm of GIS.

Pioneering Collaborative Geographic Data Management: Our Cutting-Edge Framework
The framework arose to address the growing need to integrate geographic data of diverse natures, dimensions, and sources. More specifically, our framework operates with open data available from different government bodies as well as data captured by the users of these bodies' services. The integration of these information sources is conducted within the framework.
The framework consists of two different modules, illustrated in Figure 1: a geographic information collection system and an open geoprocessing tool. The collection part includes information from different sources, public services, and tools that we have developed, which will be explained in detail in Section 3.1. The original information can be processed in the information analysis module to produce new information. In this way, our proposal not only stores information but is also capable of generating new information based on that obtained by the capture module. This is the objective of the Geoprocessing Module for Data Evaluation and Optimization (see Section 3.3).
In a nutshell, the framework embodies essential functionalities that are characteristic of systems of this type, including (a) data ingestion, (b) information management, (c) data analysis, and (d) user interaction.In the following sections, we provide a comprehensive breakdown of each of these integral components, inviting the reader to delve deeper into the intricacies of our framework.

Client and Gateway Functionality in Geoinformation Collection
GeoServer is a server widely used in the community to host geospatial information. In creating a collaborative environment, we encountered two main problems that affect the final outcome of the system. First, GeoServer's access tools are currently not usable on mobile devices, but only on conventional platforms. Second, GeoServer manages data at the layer level, not at the level of individual elements such as points, lines, or polygons. This made client management far more complex than necessary, adding data- and layer-management tasks to the clients.
For these reasons, the solution adopted in the design of the system was the creation of an intermediate server, or gateway, which relieves the clients of layer management and simplifies interaction with GeoServer. The protocol between the gateway and GeoServer is realized through REST connections, while the connection between the clients and the gateway required the design of a proprietary protocol using Java sockets.
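Although the proprietary client-gateway protocol itself is not published, its request/acknowledge pattern over Java sockets can be sketched as follows. The message format (`ADD_POINT <lat> <lon> <layer>`) and the acknowledgment strings are purely illustrative assumptions, not the framework's actual wire format.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.net.Socket;

// Minimal sketch of a client-gateway exchange over Java sockets.
public class GatewaySketch {

    // Gateway side: accept one connection, read one message, acknowledge it.
    static Thread startGateway(ServerSocket server) {
        Thread t = new Thread(() -> {
            try (Socket client = server.accept();
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(client.getInputStream()));
                 PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
                String msg = in.readLine();
                // Acknowledge only well-formed point submissions.
                out.println(msg != null && msg.startsWith("ADD_POINT") ? "OK" : "ERROR");
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        t.start();
        return t;
    }

    // Client side: send one geolocated point and return the gateway's reply.
    static String sendPoint(int port, double lat, double lon, String layer) throws Exception {
        try (Socket socket = new Socket("localhost", port);
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream()))) {
            out.println("ADD_POINT " + lat + " " + lon + " " + layer);
            return in.readLine();
        }
    }

    // End-to-end demo on an ephemeral local port.
    static String demo() throws Exception {
        try (ServerSocket server = new ServerSocket(0)) {
            Thread gateway = startGateway(server);
            String reply = sendPoint(server.getLocalPort(), 36.7213, -4.4214, "ambulances");
            gateway.join();
            return reply;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());  // prints "OK"
    }
}
```

A plain line-oriented exchange like this keeps the mobile client thin: the gateway, not the client, is responsible for mapping individual points onto GeoServer layers.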
Both Java and Android clients play an important role in the field of geospatial data collection. The design of each client is specifically tailored to the capabilities of its respective devices. The Java client, designed for larger screen interfaces, allows users to enter data through an interactive process of selecting geographic points with a mouse. This method is facilitated by the client's user interface.
Conversely, the mobile client uses the built-in GPS functionality of smartphones to capture accurate, real-time geospatial data in a similar way to taking a photograph. Regardless of the client type, the system is capable of displaying a variety of geospatial objects, from single points to more complex shapes such as linear features, polygons, and volumes.
To increase functionality, reusability, and portability, we have developed two different types of clients. The Android client (Figure 2a) is optimized for cataloging individual objects in urban environments, while the Java client (Figure 2b) is more suitable for analyzing fixed points of interest that are easily identifiable on a map but may be difficult to reach physically.
The system framework includes a gateway that acts as a central link to various potential data sources. It performs several key functions: (a) configuration management, (b) user management, (c) integration with different spatial data sources, and (d) connection to external services for information retrieval. These functions are supported by a set of data layer management tools and an additional component that allows the gateway to interface with external geospatial data acquisition services using URLs as service parameters.

Publishing and Validation of Open Data through GeoServer Integration
The life cycle of the open data is closed through a component which links the gateway with GeoServer so as to publish, if so desired, the information layers on the server, making each layer newly available to the ecosystem in the form of open data. In other words, we can receive a point set from a public organization, edit it or add new elements, and then publish it once more in GeoServer so that the institution can validate and update the information with the users' contributions. As mentioned, this component allows the users' information to be monitored, validating data and improving its quality. The inclusion of GeoServer means that the information in the framework is available to any GIS through the services that GeoServer offers, in this specific case, the Web Feature Service (WFS) [33]. In addition, once the data are stored in GeoServer, they can be translated into any standard format, as shown in Figure 3, thus freeing the other parts of the system from format management and leaving this function to GeoServer.
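Once a layer is published, any GIS client can retrieve it through a standard WFS GetFeature request. The sketch below builds such a request URL; the host and layer name (`geoserver.example.org`, `events:ambulances`) are hypothetical placeholders, not the paper's actual deployment.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch of the standard OGC WFS 2.0 GetFeature request through which a
// published layer becomes reachable by any GIS client.
public class WfsRequestSketch {

    // Build a WFS 2.0 GetFeature URL (this URLEncoder overload requires Java 10+).
    static String getFeatureUrl(String host, String typeName, String outputFormat) {
        return "http://" + host + "/geoserver/wfs"
                + "?service=WFS"
                + "&version=2.0.0"
                + "&request=GetFeature"
                + "&typeNames=" + URLEncoder.encode(typeName, StandardCharsets.UTF_8)
                + "&outputFormat=" + URLEncoder.encode(outputFormat, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // GeoServer can serialize the same layer into several standard
        // formats, e.g. GeoJSON here, or GML, shapefile, CSV, ...
        System.out.println(getFeatureUrl("geoserver.example.org",
                "events:ambulances", "application/json"));
    }
}
```

Because the format conversion happens server-side, the clients and the gateway never need to understand the requested output format themselves.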
This step represents an innovation with respect to the routine use of open data, which is usually only available for user queries (subject or spatial searches and multilocation searches for subsequent analysis), with little opportunity to change the original information or to provide feedback to the system itself.

Geoprocessing Module for Data Evaluation and Optimization
Based on the data compiled in Sections 3.1 and 3.2, the system provides an open module to evaluate the data layers. This module is a tool for geoprocessing one or more data layers subject to a target function that the user wishes to maximize or minimize, together with an initial set of restrictions imposed by the context of the problem. Starting with this mathematical representation of the problem, the module carries out intelligent processing of the information in accordance with the imposed restrictions and returns the result to the user in distinct formats to make it clearer to read and understand.
The graphical representation in Figure 4a shows the application that we have carefully developed. This application implements the genetic algorithm described in Section 4.1. After user authentication, the application provides a comprehensive view of the available system layers. In our specific case, the image shows the layers representing the participants of the 2022 Málaga Marathon at different milestones: the 5th, 10th, 20th, and 30th, the finish line of the first runner, and the intervals one, two, and three hours after the finish of the first runner. In addition, a special layer was created to show the exact position of the ambulances along the marathon route. These are the layers that will be used later in the case study in Section 5.
This application boasts two distinct buttons situated at the bottom-left corner, each tailored to offer specific functionality:

1. Genetic: This button facilitates the execution of the algorithms detailed in Table 1, allowing for the evaluation of the genetic algorithm's performance and efficiency. Figure 4b serves as an illustrative example of the algorithm's execution in solving the 'cap71' problem, showcasing the solution derived through the genetic algorithm.

2. UFLP: The 'UFLP' button activates the Uncapacitated Facility Location Problem algorithm and allows the user to select two of the available layers, one designating the facility layer and the other acting as the client layer. The adaptable design of the application lends itself to different scenarios where the UFLP algorithm is applicable.
While we have used it in the context of a marathon, the application can be seamlessly adapted for use in other mass events, such as Easter Week, the Málaga Fair, or the Three Kings Cavalcade, all of which are very popular in the city of Málaga. It is also important to note that this framework has the potential to be extended to other cities and scenarios involving geolocated citizens in need of services, such as ambulances and police, ensuring effective coordination and resource allocation.
As an example of the versatility of the tool, in Figure 5 we show the execution of a problem to optimally locate a warehouse in a province of Andalusia. Note the difference in implementation between the Java and Android clients. A key aspect of our approach is to develop an accessible and straightforward interface that allows users to select appropriate output formats based on the dataset being used, the results being produced, and the objectives of the analysis. The system is designed to support a range of output options, including maps, graphs, and tables. This functionality is essential to ensure that the system is adaptable to different user requirements and analytical objectives.

Exploring Geoprocessing Capabilities: The UFLP Case Study and Heuristic Approaches
Our framework is designed to address various challenges in organizing geographic information in layered formats. To demonstrate its effectiveness, we have chosen the Uncapacitated Facility Location Problem (UFLP) as a case study to assess the system's geoprocessing capabilities.
As described by Verter et al. [34], the UFLP involves determining the optimal number of facilities and their locations so as to minimize both the fixed setup costs and the variable costs of meeting market demand from those facilities. This optimization problem, which has been studied extensively for decades, is classified as NP-hard, although certain cases can be solved in polynomial time.
The UFLP can be defined as follows: let I = {1, . . . , m} represent the set of possible facility locations and J = {1, . . . , n} the set of clients. Each facility i incurs a fixed cost f_i, while c_{ij} denotes the transportation cost from facility i to client j. The goal is to ascertain the set of facilities to open so as to minimize the total cost, represented by the following expression:

\min \sum_{i \in I} f_i y_i + \sum_{i \in I} \sum_{j \in J} c_{ij} x_{ij}; (1)

subject to:

\sum_{i \in I} x_{ij} = 1, for all j ∈ J; (2)

0 ≤ x_{ij} ≤ y_i and y_i ∈ {0, 1}, for each i ∈ I and for each j ∈ J; (3)

where x_{ij} represents the quantity supplied by facility i to client j, and y_i indicates whether facility i is open or not.
Below is an overview of some of the most relevant algorithms for resolving the UFLP. The dual-based algorithm [35] is recognized as one of the most effective techniques for generating potential solutions. Some researchers have also employed the branch and bound algorithm [36]. However, when dealing with search spaces as extensive as those found in the UFLP, and considering that the most efficient algorithms for solving it demand exponential time, it becomes evident that conventional search and optimization techniques are impractical due to their computational cost. Consequently, it is necessary to turn to heuristic methods, which offer a more practical solution within a shorter time frame.
A heuristic method is an approach that seeks to find good solutions, often nearly optimal, at a reasonable computational cost, even though it does not guarantee feasibility or optimality. In some cases, it is even impossible to determine how close a feasible solution is to being optimal. Among the state-of-the-art heuristic techniques, we can include methods such as genetic algorithms [37-41], simulated annealing [42], and tabu search [43-45].
Despite the advantages associated with these methods, the code for their implementation is not readily available to readers interested in running simulations. Nevertheless, computational times and reports on the behavior of these algorithms when applied to the ORLIB [46] problem collection are available.
Our primary objective, therefore, is to introduce a genetic algorithm capable of solving the UFLP more efficiently than existing approaches in the literature, particularly when addressing the ORLIB problem collection.To ensure experiment reproducibility, all the code and data used are available in the repository (https://github.com/montenegro-montes/EventGeoScout (accessed on 1 February 2024)).

A Genetic Algorithm Approach for the UFLP
Algorithm 1 presents the pseudocode that underlies our genetic algorithm. This algorithm adheres to the fundamental principles of genetic algorithms [47] while integrating additional elements, notably a meticulous selection process and an adept crossover operator. Additionally, we have implemented a pragmatic solution to circumvent premature convergence, a common challenge in genetic algorithms [48]. Our solution represents a significant advancement in mitigating convergence issues, further enhancing the robustness and efficiency of the algorithm.
The genetic algorithm's effectiveness and reliability have been rigorously evaluated and substantiated through extensive testing. For validation, we conducted a series of trials using benchmark problems sourced from the Operations Research Library (ORLIB) [46]. This comprehensive assessment provides solid evidence of the algorithm's capacity to deliver high-quality solutions. The utilization of such established benchmarks not only ensures the algorithm's trustworthiness but also offers valuable comparisons to other approaches in the field, thus highlighting its competitive edge.
This genetic algorithm, honed through meticulous design and validation, constitutes a pivotal component of our framework. It exemplifies our commitment to developing state-of-the-art tools that address complex geospatial optimization challenges. Its integration into the framework amplifies its potential for various applications, making it a valuable asset for researchers and practitioners in the field.
Our approach to representing genetic individuals aligns with methodologies presented by various researchers in the field. In our model, individuals are depicted as a binary string of length m. Each binary variable within this string takes a value of 1 to indicate the presence of a facility at that location, or 0 if a facility is not present. The optimal solution for the problem is identified as the representation that yields the minimum value of the genetic algorithm's evaluation function. Here, the values of x_{ij} are selected to minimize the distance to the nearest available facility.
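Under this representation, the evaluation function can be sketched as follows: each client is assigned to its cheapest open facility, and the fixed cost of every open facility is added. This is a minimal sketch, and the cost values in the example are illustrative, not taken from the paper's benchmark instances.

```java
// Sketch of the UFLP evaluation (fitness) function implied by the text:
// a chromosome is a binary string of length m over the facility set.
public class UflpFitness {

    // fixedCost[i] = f_i; transportCost[i][j] = c_ij
    static double evaluate(boolean[] open, double[] fixedCost, double[][] transportCost) {
        int m = fixedCost.length;
        int n = transportCost[0].length;
        double total = 0.0;
        // Fixed opening costs of all open facilities.
        for (int i = 0; i < m; i++)
            if (open[i]) total += fixedCost[i];
        // Each client is served by its cheapest open facility (x_ij = 1 there).
        for (int j = 0; j < n; j++) {
            double best = Double.POSITIVE_INFINITY;
            for (int i = 0; i < m; i++)
                if (open[i]) best = Math.min(best, transportCost[i][j]);
            total += best;
        }
        return total;
    }

    public static void main(String[] args) {
        double[] f = {4.0, 3.0};                   // fixed opening costs
        double[][] c = {{1.0, 5.0}, {4.0, 2.0}};   // c[i][j]
        // Opening only facility 0: 4 + 1 + 5 = 10
        System.out.println(evaluate(new boolean[]{true, false}, f, c));
        // Opening both facilities: 4 + 3 + 1 + 2 = 10
        System.out.println(evaluate(new boolean[]{true, true}, f, c));
    }
}
```

Note that a chromosome with no open facility evaluates to infinity, so such individuals are naturally discarded by minimization (or repaired before evaluation).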
A GA commences its operation with a randomly generated initial population of chromosomes. It then enhances this population by applying genetic operators rooted in natural processes, thereby yielding improved chromosomes. The population evolves through natural selection. Over successive iterations, called generations, the suitability (fitness or adaptation) of each chromosome is evaluated as a solution and, based on this evaluation, a new population of chromosomes is chosen using a selection mechanism and specific genetic operators, such as crossover and mutation. The majority of the approaches consulted determine the initial population through the random selection of individuals. Our solution, however, is based on the work in [38], which establishes a procedure for generating the initial population: essentially, the initial population is seeded in those areas of the search space where more optimal solutions are more likely to exist.
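The generational loop described above can be sketched minimally as follows, assuming tournament selection, uniform crossover, and bit-flip mutation; these operator choices, the parameters, and the random initialization (in place of the seeded procedure from [38]) are simplifying assumptions for illustration, not the paper's exact algorithm.

```java
import java.util.Arrays;
import java.util.Random;

// Minimal GA sketch for the UFLP over binary chromosomes.
public class UflpGaSketch {
    static final Random RNG = new Random(42);  // fixed seed for reproducibility

    // Same evaluation idea as above: fixed costs plus cheapest-open assignment.
    static double cost(boolean[] open, double[] f, double[][] c) {
        double total = 0;
        for (int i = 0; i < f.length; i++) if (open[i]) total += f[i];
        for (int j = 0; j < c[0].length; j++) {
            double best = Double.POSITIVE_INFINITY;
            for (int i = 0; i < f.length; i++)
                if (open[i]) best = Math.min(best, c[i][j]);
            total += best;
        }
        return total;
    }

    static boolean[] tournament(boolean[][] pop, double[] f, double[][] c) {
        boolean[] a = pop[RNG.nextInt(pop.length)], b = pop[RNG.nextInt(pop.length)];
        return cost(a, f, c) <= cost(b, f, c) ? a : b;
    }

    static boolean[] solve(double[] f, double[][] c, int popSize, int generations) {
        int m = f.length;
        boolean[][] pop = new boolean[popSize][m];
        for (boolean[] ind : pop) {
            for (int i = 0; i < m; i++) ind[i] = RNG.nextBoolean();
            ind[RNG.nextInt(m)] = true;  // repair: at least one open facility
        }
        boolean[] best = pop[0].clone();
        for (int g = 0; g < generations; g++) {
            boolean[][] next = new boolean[popSize][];
            for (int k = 0; k < popSize; k++) {
                boolean[] a = tournament(pop, f, c), b = tournament(pop, f, c);
                boolean[] child = new boolean[m];
                for (int i = 0; i < m; i++) {
                    child[i] = RNG.nextBoolean() ? a[i] : b[i];            // uniform crossover
                    if (RNG.nextDouble() < 1.0 / m) child[i] = !child[i];  // bit-flip mutation
                }
                boolean anyOpen = false;
                for (boolean bit : child) anyOpen |= bit;
                if (!anyOpen) child[RNG.nextInt(m)] = true;                // repair
                next[k] = child;
                if (cost(child, f, c) < cost(best, f, c)) best = child.clone();
            }
            pop = next;
        }
        return best;
    }

    public static void main(String[] args) {
        double[] f = {8, 4, 11};                            // toy fixed costs
        double[][] c = {{2, 9, 7}, {6, 3, 5}, {10, 4, 1}};  // toy transport costs
        boolean[] best = solve(f, c, 20, 50);
        System.out.println(Arrays.toString(best) + " cost=" + cost(best, f, c));
    }
}
```

On this toy instance the optimum is to open facility 1 alone (cost 4 + 6 + 3 + 5 = 18), which the sketch reliably finds given the tiny 3-bit search space.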

Performance Assessment of the Genetic Algorithm for UFLP
The experiments were carried out on a MacBook Pro running macOS Ventura, equipped with a 2.3 GHz Intel Core i5 processor and 8 GB of RAM for the desktop version. For the Android system, a Samsung Galaxy A14 running Android 13 was used. Our software implementation was developed using Java 1.8.0_152 and executed within the Eclipse 2023-09 development environment and Android Studio Giraffe 2022.3.1. This setup provided a stable and controlled platform for our algorithmic analysis.
To assess the performance and robustness of our genetic algorithm, we conducted tests using 15 different problem instances sourced from the ORLIB dataset. These problem instances were carefully selected to represent a range of complexities and were categorized into three groups. Additionally, we extended our analysis to the problem instances capA-capC, which pose a significant increase in complexity: they encompass 100 clients and 1000 facilities, a configuration that demands a more extensive exploration of the solution space. To tackle this, we increased the population size to 200 individuals.
To evaluate our genetic algorithm, we ran each of the problems in the ORLIB dataset 1000 times. The results of the runs on both the traditional and mobile platforms are shown in Table 1. Our comprehensive evaluation provided valuable insights. In the most demanding scenario (capC), an average of 2264 generations, equivalent to 53 s of computation time on the traditional platform and 63 s on the mobile platform, was required to solve the most complicated problem instance. This result highlights the ability of the algorithm to effectively deal with highly complex problem domains. In stark contrast, the simplest problem instance (cap74) was solved quickly, requiring only 10 generations and taking just 1 ms and 15 ms on the traditional and mobile platforms, respectively. This highlights the adaptability and efficiency of the algorithm when dealing with less challenging problem instances. In summary, our extensive experiments and analyses have demonstrated the versatility and reliability of the algorithm across a spectrum of problem complexities, confirming its suitability for various real-world optimization tasks.
When applying the genetic algorithm to solve the UFLP with a two-layered framework, fixed costs can be set in two ways: either randomly or manually entered by the user. If they are calculated randomly, they are assigned a value between 0 and 1000 units. An automatic calculation of the fixed costs makes the application of the algorithm easier, although the values can be edited by the user to include any target figures. The transportation costs, however, are automatically calculated from the latitude and longitude of each client and facility, using the Haversine Formula (4), which evaluates the difference between the latitudes and longitudes of two points:

d = 2R arcsin( √( sin²(Δϕ/2) + cos ϕ₁ · cos ϕ₂ · sin²(Δλ/2) ) )    (4)

where R represents the Earth radius, and ϕ and λ are latitude and longitude, respectively, in radians. Specifically, ϕ₁ is the latitude of point 1, ϕ₂ is the latitude of point 2, and Δϕ and Δλ are the latitude and longitude differences between points 1 and 2.
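As a concrete sketch, the transportation cost between two coordinate pairs can be computed with the Haversine Formula as follows. The Earth radius constant (6371 km) and the sample coordinates are assumptions for illustration, not values from the paper.

```java
// Sketch of the Haversine great-circle distance used for transportation costs.
public class Haversine {
    static final double R = 6371.0; // assumed mean Earth radius in km

    static double distance(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1), phi2 = Math.toRadians(lat2);
        double dPhi = Math.toRadians(lat2 - lat1);
        double dLambda = Math.toRadians(lon2 - lon1);
        // a = sin^2(dPhi/2) + cos(phi1) * cos(phi2) * sin^2(dLambda/2)
        double a = Math.sin(dPhi / 2) * Math.sin(dPhi / 2)
                 + Math.cos(phi1) * Math.cos(phi2)
                 * Math.sin(dLambda / 2) * Math.sin(dLambda / 2);
        return 2 * R * Math.asin(Math.sqrt(a)); // distance in km
    }

    public static void main(String[] args) {
        // Approximate distance between two hypothetical points in Málaga
        System.out.println(distance(36.7213, -4.4214, 36.7585, -4.3971));
    }
}
```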

Case Study: Analysis of Runner Distribution and Emergency Resource Allocation in Malaga Marathon
Our research is centered on the Malaga marathons of 2019, 2021, and 2022 (the 2020 edition was canceled due to COVID restrictions). This choice of case study is motivated by its relevance in addressing a significant issue: the notable and variable disparities among marathon runners, which present challenges in optimizing the allocation of emergency resources for effective coverage. Figure 6b visually compares the distances (in kilometers) between the first and last runners in these selected marathon editions. As depicted in the graph, this pattern of disparity remains consistent across the three editions we have analyzed. The gap between the front-runner and the last participant gradually widens as the marathon progresses, reaching a peak of approximately 25 to 27 km, depending on the edition, at the moment the marathon winner crosses the finish line. Following the winner's finish, this distance gradually contracts over the subsequent three and a half hours. This specific time frame holds particular importance, as it represents a concentrated period when runners who may potentially require medical assistance are most densely clustered.
To obtain the Malaga marathon route, we leveraged the Wikiloc website (https://www.wikiloc.com/ (accessed on 1 February 2024)), a collaborative platform where users share and upload trails designed for a variety of outdoor activities. Our initial step involved downloading a GPS eXchange Format (GPX) file delineating the marathon course (refer to Figure 6a) through a sequence of precise GPS points. The data concerning the runners was sourced from the race website (https://sportmaniacs.com/es/races/generali-maraton-malaga-2022 (accessed on 1 February 2024)). It is pertinent to highlight that our dataset exclusively encompasses data from runners who successfully completed the race.
Both the runner data and the marathon course were extensively processed using an R script. This script played a pivotal role in generating heat maps at different intervals, notably at the 5, 10, 20, 30, and 38 km markers and the finish line, as provided by the race organizers. For the heat maps pertaining to intervals after the marathon winner's finish, we estimated the runners' locations by taking into consideration the race pace of each individual participant. These heat maps represent invaluable tools for determining optimal medical services placement, as they provide a visual representation of runner concentration at various points along the marathon route.
We conducted this comprehensive evaluation for all three editions of the marathon to gain insights into the behavior of the runner cohort. Due to space constraints (the complete evaluation of the three editions at the different kilometer points can be consulted at https://github.com/montenegro-montes/EventGeoScout/tree/main/R/HeatMap (accessed on 1 February 2024)), we have included only the evaluation conducted two hours after the marathon winner's finish (see Figures 7 and 8), which serves as our primary case study. As evident from the figures, the distribution observed two hours after the marathon winner's finish exhibits consistent patterns across 2019, 2021, and 2022. This consistency allows us to assert that the findings and problem evaluation from one edition can be effectively extrapolated to the remaining years. In the heat map of the runners' distribution, the colour ramp from highest to lowest density is red, yellow, green, and blue.

Utilizing UFLP in Healthcare Service Provision
The application of the UFLP algorithm with ambulances as facilities and runners as customers can be described as follows:
• Facilities (F): the available ambulances designated for providing emergency services.
• Customers (C): the marathon runners who might require medical assistance during emergencies.
• Fixed Costs (F_i): the cost of each ambulance, which may represent the number of medical personnel in each ambulance. We used a fixed cost of five in the case studies in Sections 5.2 and 5.3.
• Transportation Costs (C_ij): the expenses associated with serving each customer (runner) from each facility (ambulance), with costs contingent upon the geographical distance, computed using the Haversine Formula (4) described in Section 4.2.
The UFLP algorithm serves as a pivotal tool to identify the optimal placement of facilities (ambulances) with the primary objective of minimizing overall costs.This optimization task hinges on making critical determinations regarding the strategic distribution of ambulances across various geographic regions to ensure an efficient response to runners in need of assistance.Upon resolving this problem, the algorithm yields an optimal ambulance deployment plan, including the allocation of ambulances to individual runners based on minimizing transportation costs.
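Once the set of open ambulances is fixed, deriving the deployment plan amounts to assigning every runner to the open ambulance with the lowest transportation cost. The sketch below illustrates this final step; the class and method names and the sample data are hypothetical.

```java
import java.util.Arrays;

// Illustrative sketch of the last step of the optimization: mapping each
// runner (customer) to the cheapest open ambulance (facility).
public class DeploymentPlan {

    // open[j] = true if ambulance j is deployed; cost[i][j] = transportation
    // cost (e.g., Haversine distance) of serving runner i from ambulance j.
    static int[] assign(boolean[] open, double[][] cost) {
        int[] plan = new int[cost.length];
        for (int i = 0; i < cost.length; i++) {
            int best = -1;
            for (int j = 0; j < open.length; j++) {
                if (open[j] && (best == -1 || cost[i][j] < cost[i][best])) best = j;
            }
            plan[i] = best; // index of the ambulance serving runner i
        }
        return plan;
    }

    public static void main(String[] args) {
        boolean[] open = {true, false, true};           // ambulance 1 stays closed
        double[][] cost = {{3, 1, 2}, {9, 9, 4}};       // 2 runners, 3 ambulances
        System.out.println(Arrays.toString(assign(open, cost))); // [2, 2]
    }
}
```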

Fixed Ambulance Assignment: Initial Approach to the Optimization Problem
The selected scenario is the 2022 Malaga Marathon. Examining Figure 7, we can see a significant similarity between the behavior of the 2022 cohort of runners and that of the 2019 and 2021 editions of the Malaga Marathon.
Let us first examine the case where the ambulances remain in a static position. In this specific illustration, each facility, denoting an ambulance, is associated with a fixed cost of 5, which can be construed as signifying the number of available ambulance operators stationed at each facility. For the sake of consistency, we have assigned the same costs to all facilities in the subsequent examples, but it is imperative to emphasize that our application allows for the assignment of distinct values, or even the introduction of random values. When uniform costs are applied to all ambulances, the allocation is predominantly influenced by transportation cost, which hinges on the distance of the ambulances from the marathon route. Regrettably, this approach results in an imbalanced allocation of ambulances, with certain ambulances being heavily utilized while others remain considerably underutilized.
Figure 9 offers insight into the distribution of runners' positions at the 5 and 20 km marks as the lead runner progresses. In Figure 9a, the genetic algorithm allocates merely two ambulances (ambulances 1 and 9) out of the eleven available, based on cost per distance. The solution yields a total cost of 71,198, achievable within 574 ms. It is worth noting, however, that this allocation disproportionately assigns the majority of runners (approximately 89%) to a single ambulance. This observation underscores the pressing need for a more equitably balanced resource allocation.
As the race advances to the 20 km mark, demonstrated in Figure 9b, there is a change in the race's trajectory. In this instance, four ambulances (ambulances 2, 3, 8, and 10) are selected, for a total cost of 92,242 and an execution time of 592 ms. Although the allocation remains somewhat unequal (spanning from 9% to 61%), it indicates a more equitable distribution than the preceding case.
The last two cases (Figure 10) correspond to one (Figure 10a) and two (Figure 10b) hours after the winner crosses the finish line, with 2379 and 794 runners still on the course, respectively. Despite the dwindling number of runners, these scenarios hold a pronounced interest, given that they likely encompass the runners most in need of assistance. The allocations obtained involve three ambulances (7-32%, 9-62%, 10-6%) and six ambulances (2-3%, 4-4%, 5-10%, 6-35%, 7-39%, 9-9%), respectively. Both the costs (79,604 and 22,701) and the execution times (508 and 279 ms) follow a decreasing trend, reflecting the reduction in active runners. The final case exhibits a more even distribution, although it confirms the initial observation that the allocation is still far from optimally balanced.

Optimizing Ambulance Allocation: A Dynamic Approach for Precise Assignment
The algorithm was applied to the same scenario, the marathon of the most recent edition available to date, as explained in the previous section. To illustrate the optimization process, we have focused on a specific scenario within the marathon context: the period from the conclusion of the race to two hours thereafter. The progression of allocating ambulances is graphically represented in Figure 11, with individual depictions provided in Figure 11a,b. The strategic placement of ambulances was determined by making use of the visual cues presented in Figure 7c and the heatmap, which is weighted based on runner density, as displayed in Figure 8c. It is crucial to emphasize that the initial fixed allocation cost of ambulances was 22,701, as is evident in Figure 10b. By systematically fine-tuning the allocation of ambulances along the marathon route, based on the heatmap that reflects the locations of the runners, we reduced the initial cost from 22,701 to 12,008, a reduction of nearly 50% in allocation expenses.
The allocation represented in Figure 11b, featuring a cost of 12,008, deploys nine of the eleven available ambulances. However, the assignment of runners to each ambulance is not distributed evenly, with proportions of 33%, 4%, 7%, 6%, 15%, 11%, 7%, 10%, and 7%. It is worth mentioning that during multiple problem runs, two of the eleven available ambulances remained unused. Consequently, we opted to run the algorithm with only nine ambulances. This adjustment not only yielded a significantly reduced cost but also delivered a more equitable distribution in ambulance allocation, with proportions of 25%, 22%, 8%, 6%, 14%, 11%, 5%, and 9%. Although the allocation's balance improved in the latter case, there is still room for further enhancement in the allocation process.
Significantly, the execution times for these scenarios were notably reduced, since there were fewer runners and fewer ambulances to consider; the computation took only 105 ms. It is essential to underline that while we have assigned uniform values to each ambulance in this case study, the program's flexibility allows for customized cost allocation, enabling a closer approximation to real-life problem scenarios.

Conclusions and Future Directions
This study explores neogeography, focusing on tools and techniques that enable users to create digital geographic products for personal use. These emerging fields have cultivated a community of volunteers who contribute georeferenced information, facilitating various applications like map updates and disaster data management.
Our key contribution is the development of a framework for managing geographic information from multiple sources. It allows users to edit and contribute data, reducing the burden on organizations to maintain and update this information. The framework supports collaborative open data initiatives where public input can enhance and update existing data. Data collection is facilitated by a mobile application that allows users to easily capture geographic information. For areas where mobile data collection is impractical, a Java client helps to select data on maps. The collected data is integrated into the framework and made available to other users.
The framework manages geographic data at the layer level, encouraging its wider use. It allows users to contribute to the data pool, promoting a collaborative approach to geographic information management. In addition, the framework includes geoprocessing capabilities, demonstrated through a case study on the Uncapacitated Facility Location Problem (UFLP). This capability allows users to identify optimal facility locations even in the absence of government-provided data.
Future research will explore different algorithms for solving the UFLP and apply genetic algorithms to large-scale real-world scenarios, such as optimizing the placement of electric vehicle charging stations. We are also working on incorporating new geospatial processing tools. The developed software and ongoing developments are available in the public GitHub repository (https://github.com/montenegro-montes/EventGeoScout (accessed on 1 February 2024)).
The framework has promising applications in emergency response scenarios, as demonstrated by the case study of the Malaga City Marathon 2022. Its versatility is also evident in its ability to manage crowd densities at various large-scale events, thereby improving emergency response coordination. By utilizing volunteer contributions and advanced geoprocessing techniques, our framework aims to make a significant contribution to collaborative geographic data management and analysis.

Figure 1. System and framework description.

Figure 4. Geoprocessing module for data evaluation and optimization. (a) Main screen of Java client. (b) Execution of cap71 problem.

Figure 5. Result of using the UFLP algorithm in Java and Android client.

Figure 6. Malaga Marathon information on the 2019, 2021, and 2022 editions. (a) Malaga Marathon circuit. (b) Distance between the first and last runner.

Figure 9. The allocation of 12 ambulances at km 5 and 20. (a) Position of runners at km 5. (b) Position of runners at km 20.

Figure 10. The allocation of 12 ambulances at the finish, after 1 and 2 h. (a) Position of runners at Finish +1 h. (b) Position of runners at Finish +2 h.

Table 1. Genetic algorithm evaluation on MacBook Pro and Android.