Multivariate Statistical Analysis for Water Quality Assessment: A Review of Research Published between 2001 and 2020

Abstract: Research on water quality is a fundamental step in supporting the maintenance of environmental and human health. The elements involved in water quality analysis are multidimensional, because numerous characteristics can be measured simultaneously. This multidimensional character encourages researchers to statistically examine the data generated through multivariate statistical analysis (MSA). The objective of this review was to explore the research on water quality through MSA between the years 2001 and 2020, as indexed in the Web of Science (WoS) database. Annual results, WoS subject categories, conventional journals, most cited publications, keywords, water sample types analyzed, country or territory where the study was conducted and most used multivariate statistical analyses were topics covered. The results demonstrate a considerable increase in research using MSA in water quality studies in the last twenty years, especially in developing countries. River, groundwater and lake were the most studied water sample types. In descending order, principal component analysis (PCA), hierarchical cluster analysis (HCA), factor analysis (FA) and discriminant analysis (DA) were the most used techniques. This review presents relevant information for researchers in choosing the most appropriate methods to analyze water quality data.


Introduction
The topic of water has received high visibility and attention on the global sustainability agenda. This is due to increasing pressure from factors such as economic development models, climate change, population growth and public health [1,2]. Sustainable Development Goal 6 (SDG6) of the United Nations 2030 Agenda for Sustainable Development is entirely dedicated to water and, in addition to addressing major challenges of universal access to sanitation and water in desirable quantity and quality, presents issues related to water resources management [3].
The analysis, assessment and monitoring of water quality are important tools for water resource management, providing a comprehensive understanding of the state of water [4,5]. Although water quality data at the global level remain sparse, mainly due to the lack of monitoring in less developed countries, there has been a tendency for the generation of these data to increase, via studies that analyze water quality [1,6].
Water quality can be understood as a measure of the suitability of water in relation to natural quality, pollution effects or specific use based on physical, chemical and biological attributes [7]. This measure provides objective evidence that is needed in decision making in water resource management, in the use of water quality monitoring programs [8], in alerting people to ongoing and emerging problems (including chemical and microbial contamination, eutrophication, emerging contaminants, issues related to climate change, among others), in determining compliance with legal standards, in protecting the beneficial uses of water, in the assessment of environmental status, in temporal trends in water quality [9] and in the assessment of the effects on aquatic ecosystems [10].
The elements involved in water quality measurement are naturally multidimensional, because many aspects must be considered. Furthermore, the presence of anthropogenic, geological, meteorological and hydrological external factors contributes to the spatial and temporal variation in water quality [11]. This multidimensional nature encourages researchers to statistically examine the data generated. Selecting the most appropriate statistical methods is critical when seeking to obtain meaningful results, especially when evaluating complex datasets.
Among the different approaches to exploring the variables analyzed in water quality, multivariate statistical analysis (MSA) stands out [12,13]. MSA is applied in many fields of study and its use has become very common, due in large part to the increasingly complex nature of research projects and questions. It aims to explain or predict the relationships between many independent and/or dependent variables that are correlated with each other. The greater the number of variables, the more difficult it is to analyze them via common methods. MSA can provide both a descriptive (patterns in the data) and an inferential (testing hypotheses about patterns of interest) approach [14,15].
MSA is a set of data analytical techniques that is under constant development. Highlights among the most established multivariate analyses include principal component analysis and factor analysis, multiple regression and multiple correlation, multiple discriminant analysis, multivariate analysis of variance and covariance, canonical correlation analysis, cluster analysis, and multidimensional scaling and correspondence analysis [16–18]. These techniques are valuable tools in scientific studies that assess water resources, and understanding how they have been applied is essential for the improvement of water quality research and management.
In this sense, scientometrics has emerged as a useful tool in mapping scientific literature and has been used in different areas of research, such as public health [19] and the social [20] and environmental sciences [21]. Scientometric analysis can increase the performance of research findings, identifying the characteristics of publications [22] and providing scientific and relevant results in the study of specific subjects [23].
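As an illustration of the kind of technique listed above, the following is a minimal sketch of principal component analysis on standardized variables, computed through a singular value decomposition. The dataset is synthetic and the variable names (conductivity, chloride, total dissolved solids, pH, dissolved oxygen) are assumptions chosen only for illustration; it is not taken from any of the reviewed studies.

```python
import numpy as np


def pca(data: np.ndarray, n_components: int = 2):
    """PCA on standardized variables (correlation-matrix PCA) via SVD."""
    # Standardize each column so variables with large units do not dominate.
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = s**2 / np.sum(s**2)       # fraction of variance per component
    scores = z @ vt[:n_components].T      # sample coordinates on the components
    return scores, vt[:n_components], explained


# Synthetic example: a hypothetical "mineralization" factor drives three
# correlated variables, while pH and dissolved oxygen vary independently.
rng = np.random.default_rng(0)
n_samples = 60
latent = rng.normal(size=n_samples)
conductivity = 500 + 120 * latent + rng.normal(scale=20, size=n_samples)
chloride = 40 + 10 * latent + rng.normal(scale=3, size=n_samples)
tds = 320 + 80 * latent + rng.normal(scale=15, size=n_samples)
ph = 7.2 + rng.normal(scale=0.3, size=n_samples)
dissolved_oxygen = 8.0 + rng.normal(scale=1.0, size=n_samples)

data = np.column_stack([conductivity, chloride, tds, ph, dissolved_oxygen])
scores, loadings, explained = pca(data, n_components=2)
print("Explained variance ratio:", explained.round(3))
print("Loadings of PC1:", loadings[0].round(2))
```

In this sketch the first component is expected to load mainly on the three correlated variables, which is how PCA is typically used to summarize many water quality parameters with a few components.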
In the area of water resources, it has been used, for example, in mapping research on drinking water [24], groundwater [25], the assessment and simulation of river water quality [26] and integrated water assessment and modeling [27].
This study presents a review of publications (2001-2020) that used MSA for water quality data analysis. Understanding the evolution of scientific research and how MSA has been applied is an important step for the water quality research process. The topics of review cover quantitative descriptive aspects of the publications, such as publication type, annual results, conventional journals, Web of Science subject categories, most cited publications, keywords, as well as the water sample type analyzed, country or territory where the study was conducted, and the MSAs most commonly used in studies involving water quality analysis.

Methodology
Data were obtained from Clarivate Analytics' expanded Web of Science (WoS) database, the world's most widely used and trusted database of research publications and citations [28,29]. According to the 2021 Journal Citation Reports™ (JCR), WoS indexed 20,942 journals in 254 search categories, with authorship from 113 countries represented [30]. An advanced search was performed with the terms TS (topic) = (water quality AND multivariate), limited to publication years from 2001 to 2020.
In total, 5006 publications met the search criteria. Records related to publication type, authors, title, journal name, language, keywords, abstract, year of publication, WoS subject categories and number of citations were downloaded from the database. Documents in languages other than English, experimental or laboratory studies, reviews and retractions were excluded, as were documents that did not meet two specific criteria: water as the analysis matrix, and the application of MSA in the evaluation of data (studies using only univariate analyses, indexes or models were excluded). The final database contained 2889 publications. Manual coding was performed for country/territory (where the water samples were collected), water sample type, MSA used in the studies, the h-index of the 15 countries with the highest number of publications and the journal impact factor (JIF) of the 10 most productive journals, the latter of which was taken from the JCR published in 2020. Keyword search was performed using VOSviewer™ software, version 1.6.18 (Leiden, The Netherlands), in order to identify the frequency of co-occurrence of keywords (in our case, the authors' keywords) and possible clusters of the most used terms.
The water sample types were classified into 12 different categories, taking into account sources or uses of water. Analogous or synonymous terms were grouped into the following categories: river, groundwater, lake, drinking water, seawater, wastewater, reservoir/dam, swamp, rainwater, aquaculture pond, meltwater and navigation channel. Figure 1 presents a flowchart of the steps of the scientometric review.
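The keyword co-occurrence step described above can be illustrated with a minimal counting sketch. This is not the algorithm implemented in the VOSviewer software; it only shows the underlying idea of counting how often two author keywords appear in the same publication. The `records` list and the `cooccurrence` helper are hypothetical and exist only for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(records):
    """Count how many publications list each pair of author keywords."""
    pair_counts = Counter()
    for keywords in records:
        # Normalize case and whitespace, and count each keyword once per record.
        unique = sorted({k.strip().lower() for k in keywords})
        pair_counts.update(combinations(unique, 2))
    return pair_counts


# Hypothetical author keywords for three publications.
records = [
    ["Water quality", "PCA", "Cluster analysis"],
    ["water quality", "PCA", "Groundwater"],
    ["Groundwater", "Factor analysis", "water quality"],
]

counts = cooccurrence(records)
for pair, n in counts.most_common(3):
    print(pair, n)
```

Pairs with high counts are the ones a co-occurrence map would draw as strongly linked terms; clustering of those links is a separate step not covered by this sketch.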

Publications Outputs and WoS Subject Categories
"Journal article" ranked first in publication type with 93.53% (2702), followed by "Articles published in conference proceedings" with 6.47% (187) of publications. The number of publications related to the use of MSA in water quality research increased from 32 in 2001 to 350 in 2020, an approximately elevenfold increase over 20 years, with 2020 being the year with the highest number of publications (Figure 2). This increase in the number of studies that used MSA in water quality research is directly linked to the overall increase in scientific publications. In the last decade alone, there has been an increase of approximately 4% per year in global research output, including peer-reviewed scientific articles and conference papers, in the most diverse areas, including water research [31].
Scientific production was divided into four distinct periods. The first period (2001-2005) consists of 197 publications, representing 6.82% of the total publications, with 2002 being the year with the fewest publications in the period (31 publications). Among the five most cited publications (according to the WoS database) of the period, there are studies that used MSA in the analysis of complex matrices to assess water quality in rivers [32–35] and groundwater [36].
The second period (2006-2010), composed of 460 publications, represents 15.92% of the total publications. It is in this period that the most cited publication on the subject is found, where MSA was used as a tool in the temporal and spatial evaluation of an extensive matrix of data from a river [37]. The other four most cited publications from the period used MSA to aid research in groundwater [38,39], lakes [40] and rivers [41].
The third period (2011-2015) consists of 866 publications, which corresponds to 29.98% of the total. For this period, the most cited publications applied MSA to an analysis of the influence of natural and anthropogenic factors on the quality of surface (river) and groundwater in urban and rural areas [42]; to the analysis of fluoride, arsenic and physicochemical parameters in groundwater [43]; to the evaluation of heavy metals in the water-sediment compartment of a river [44]; and to the identification of sources of contamination of groundwater in an aquifer system [45].
The fourth and last analyzed period (2016-2020) represents almost half of the total publications over the 20 years, with 47.28% or 1366 publications. This considerable increase is mainly due to the global growth of scientific publications in the last 10 years, driven by the economic growth of emerging countries, increased international collaboration in research and improved access to technology [31,46].
Of the five most cited articles in the fourth period, four of them applied MSA in groundwater quality research: for health risk assessment [47], for analyzing trace element contamination [48], for evaluating hydrogeochemical processes and evaluation of the quality of water for domestic use and irrigation [49] and in the evaluation of arsenic and heavy metals [50].
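The period shares reported in this section can be cross-checked from the publication counts, and the growth from 32 publications in 2001 to 350 in 2020 can be expressed as a compound annual growth rate. The following is a minimal sketch of that arithmetic; the growth rate it prints is a derived illustration and is not a figure reported in this review, and the helper names `share` and `cagr` are hypothetical.

```python
TOTAL_PUBLICATIONS = 2889

# Publication counts per period, as reported in the text.
period_counts = {
    "2001-2005": 197,
    "2006-2010": 460,
    "2011-2015": 866,
    "2016-2020": 1366,
}


def share(count: int, total: int = TOTAL_PUBLICATIONS) -> float:
    """Percentage of the total, rounded to two decimals."""
    return round(100 * count / total, 2)


def cagr(first: float, last: float, n_years: int) -> float:
    """Compound annual growth rate between two yearly counts."""
    return (last / first) ** (1 / n_years) - 1


shares = {period: share(count) for period, count in period_counts.items()}
growth = cagr(first=32, last=350, n_years=2020 - 2001)

print(shares)
print(f"Compound annual growth 2001-2020: {growth:.1%}")
```

Under these counts, the four periods sum to the 2889 publications in the final database, and the yearly growth in this literature works out to well above the approximately 4% per year cited for global research output.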
Studies related to the topic returned a total of 77 WoS subject categories. Of the 2889 publications, 1484 were classified in 1 WoS subject category, 727 in 2 categories, 552 in 3 categories, 108 in 4 categories and only 6 publications were classified in 5 subject categories. Figure 3 shows the 15 categories that appeared the most in the studies, with "Environmental Sciences" comprising a total of 1590 publications, followed by the categories "Water Resources" (852), "Multidisciplinary Geosciences" (418), "Marine and Freshwater Biology" (329) and "Environmental Engineering" (280 publications).
Figure 4 presents the time trend of the five main WoS subject categories between 2001 and 2020. The category "Environmental Sciences" is at the top of publications for each year of the analyzed period, with the exception of the year 2006, in which "Marine and Freshwater Biology" surpassed it. The "Water Resources" category showed growth from 2007 onwards, while the "Marine and Freshwater Biology" category showed an inverse behavior from the same year. As of 2012, the "Multidisciplinary Geosciences" category surpassed the "Marine and Freshwater Biology" category, remaining in third place until 2019.
According to the scope of the WoS subject categories, "Environmental Sciences" covers several areas of the environment, such as monitoring, technology, management, environmental contamination, toxicology, environmental health, geology, soil science and conservation, water resources research and engineering, climate change, biodiversity conservation and even regional natural resources. As it includes several interrelated disciplines, this category was included in more than half of the publications.

Key Journals and Most Cited Publications
A total of 604 journals published studies related to water quality analysis and the use of MSA in the period between 2001 and 2020. Among these, 498 (82.45%) contained fewer than 10 publications. The 10 journals that published the most on the use of MSA in water quality research, the impact factor of these journals (with and without self-citation) and the percentage in relation to the total number of publications analyzed (n = 2889) are shown in Table 1. Water Research (JIF 11.263), Science of the Total Environment (JIF 7.963) and Marine Pollution Bulletin (JIF 5.553) were the journals with the highest impact factors. Environmental Monitoring and Assessment was the journal with the most publications on the topic, with 8.10% of the total publications, followed by Environmental Earth Sciences (4.91%), Environmental Science and Pollution Research (3.08%), Science of the Total Environment (2.56%) and Water (2.32%). Water Research is ranked as the second most published journal in the "Water Resources" WoS subject category [51]. It is one of the leading and most comprehensive journals focusing on various aspects such as the anthropogenic water cycle, water quality and water management, thus reflecting advances in water science, technology and policy [52]. Water Research was also the most productive journal in a scientometric study of drinking water treatment technologies [53] and the second most productive journal in a scientometric study on quantitative microbial risk assessment in water quality analysis [54].
The journal Environmental Monitoring and Assessment (JIF 2.513) was the most productive in a scientometric analysis of water quality research in India [55] and in a scientific mapping of published literature on water quality indices (WQI) [56].
Table 2 presents the 15 most cited publications in water quality research using MSA, according to the WoS database.
As shown in Table 2, the 15 most cited studies in water quality research using MSA were published between 2002 and 2010. The water sample type classified as "River" was analyzed in 9 of the 15 publications, "Groundwater" in 6 publications, and "Lake" and "Seawater" in 1 publication each.

Table 2. The 15 most cited publications in water quality research using MSA (table body not recovered; the only surviving row fragment reads: Journal of Hazardous Materials; Yes; PCA-FA; River and Groundwater).
* Number of citations until submission date. MSA = multivariate statistical analysis; HCA = hierarchical cluster analysis; PCA = principal component analysis; FA = factor analysis; DA = discriminant analysis; NHCA = nonhierarchical cluster analysis (k-means); MR = multivariate regression; CCA = canonical correspondence analysis; DCA = detrended canonical correspondence analysis.
Of these 15 publications, 5 were published in the journal Water Research and all of them were published in an open-access system. Studies have shown that open-access articles receive more citations than the average for non-open-access journals and benefit from greater chances of dissemination and a broader increase in research confidence [62,63].
The most frequently cited publication, with 1207 citations, was "Assessment of surface water quality using multivariate statistical techniques: A case study of the Fuji river basin, Japan" [37]. In this article, the authors temporally and spatially evaluated a large matrix of water quality data from an important river in the region using MSA, such as cluster analysis, principal component analysis, factor analysis and discriminant analysis.
The second most cited publication, "Multivariate statistical techniques for the evaluation of spatial and temporal variations in water quality of Gomti River (India)-a case study" [32], with 975 citations, evaluated the water quality of the largest tributary of the River Ganga, India. The authors analyzed an extensive data matrix with 17,790 observations, using four different types of MSA.

Countries/Territories and Water Sample Types
The worldwide geographic distribution of water quality research using MSA between 2001 and 2020 is shown in Figure 5. The scientific research studies that used MSA in the analysis of water quality data were conducted in 134 different countries or territories. Of this total, 87 countries had fewer than 10 studies on the subject. China was the country with the highest number of studies that used MSA for water quality research, with a total of 441 publications, followed by India with 371 publications and the USA with 229 publications.
In a scientometric study carried out in 2017, it was shown that these three countries together were responsible for 38% of global research related to water. Of a total of 224,000 publications, China was responsible for 19% of publications, followed by the USA (14%) and India (5%) [1]. The fact that China, India and the USA lead the number of publications reflects the general trend for these countries to have the largest number of all scientific publications in the world [64]. Table 3 shows the 15 countries with the highest number of publications that used MSA in water quality research.
China's freshwater bodies account for nearly 7% of the world's total freshwater bodies, ranking sixth globally in terms of volume, and approximately one-third of its lakes and rivers are polluted to a level that renders them unfit for human consumption. With approximately 18.5% of the world's population, China has faced an unprecedented water crisis in terms of quantity and quality [65,66]. Since 2001, great efforts have been made to assess water pollution in the country. Such efforts can be evidenced by the increase in the number of scientific publications related to water in recent years in China, which has the highest number in terms of research impact [1,67]. These studies are mainly focused on optimizing water allocation, on advanced technologies for saving and protecting water resources, on restoring aquatic ecosystems and on exploiting unconventional water sources [68].
India, the country with the second most publications on water quality research and MSA, has approximately 17.7% of the world's population and approximately 4% of its fresh water. The country's rapid population and economic growth has put enormous pressure on its water resources [69]. More than 80% of freshwater resources are consumed by agriculture in the country, and the advent of new technologies has led to an increase in agricultural productivity and a consequent increase in the degradation of water bodies. Therefore, there has been an increase in research on water quality and its qualitative estimation in recent years. Asian countries, mainly India and China, have experienced rapid change in land use and land cover. Accelerated economic development has led to disorderly urbanization in these countries, which has affected the quantity and quality of their water resources [55,56].
The h-index aims to quantify the productivity and impact of scientists based on their most cited articles. The highest h-index described in Table 3 is correlated with the highest production of research and citation in a country or territory. The USA has the highest h-index, corresponding to its high potential to conduct research [54]. China, which was the most productive country, shows a lower h-index than other countries such as Canada and Australia. This can be explained by the higher level of cooperation in research between these countries, while China shows more reserved cooperation tendencies [70].
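For readers unfamiliar with the metric, the h-index of a set of publications is the largest number h such that h publications each have at least h citations. A minimal sketch of that definition follows; the citation counts are hypothetical and are not taken from Table 3.

```python
def h_index(citations):
    """Largest h such that h publications have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank   # the top `rank` papers all have >= rank citations
        else:
            break      # counts only decrease from here, so h cannot grow
    return h


# Hypothetical citation counts for two small sets of publications.
print(h_index([10, 8, 5, 4, 3]))   # 4
print(h_index([25, 8, 5, 3, 3]))   # 3
```

The second example shows why a single highly cited publication does not raise the index on its own: the h-index rewards a body of consistently cited work rather than one outlier.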
Table 4 presents the water sample types most commonly found in publications and their analogous or synonymous terms. The water sample classified as "River" was most evaluated in the studies, with 1231 publications (41%). "Groundwater" ranked second with 806 publications (27%), followed by the "Lake" category with 300 publications (10%). The three categories together accounted for 78% of the total publications (n = 2889). "Rainwater", "Meltwater" and "Navigation Channel" were the categories with the fewest publications, with 15, 3 and 1 publications, respectively. Figure 6 presents the water resource categories found in the publications.
"River", "Groundwater" and "Lake" were the most studied water sample types, as they are very useful freshwater sources and are important in maintaining freshwater aquatic life and the hydrological cycle [71]. The most recurrent category, rivers are the main inland water resources and provide a variety of services to humans, being widely used for domestic and irrigation purposes [72]. River water is subject to great stress and, as it is used in various human activities, it can be easily contaminated. Thus, studies of surface water pollution have increased and focused mainly on rivers, where most of the scientific tools developed by regulatory and protection agencies are applied to protect water quality in this segment of surface freshwater [73].
As shown in Figure 7, China, India and the USA are the countries that published the most studies in which researchers evaluated the quality of river water through MSA. These three countries share in common the fact that they have large watercourses, used for various purposes, such as the Mississippi River in the USA, and which have been facing serious pollution problems, such as the Yellow River in China and the Ganges River in India [74,75]. In addition, these countries rank 3rd (China), 4th (USA) and 7th (India) in terms of the size of their territories and together have approximately 40% of the planet's water-resource-dependent population [76,77].
Groundwater, the category with the second-highest number of studies, is an important water resource for irrigated agriculture and especially for domestic drinking water supply in several countries. It is a vulnerable resource that actively composes the hydrological cycle [78]. Groundwater research has increased in recent years, mainly due to the drastic decrease in aquifer water levels and general deterioration in water quality [25]. India, Iran and Pakistan are the countries where the number of publications that evaluated groundwater was higher than the publications that evaluated river water quality (Figure 7).
India is the largest consumer of groundwater in the world, with an annual extraction of 243 km³. Approximately 85% of rural areas use groundwater for supply, 62% for irrigation and more than 50% of the country's urban consumption comes from aquifers. Currently, the number of wells used for irrigation in the country is estimated at more than 25 million [79].
Iran is also among the largest consumers of groundwater in the world, with the majority of the population living in areas heavily dependent on groundwater for irrigation and supply. Groundwater provides approximately 60% of the total water supply, and agriculture accounts for over 90% of groundwater withdrawals. Since the 1960s, the number of irrigation wells and the amount of water pumped have increased, leading to a decrease in the groundwater level in many aquifers across the country [80,81]. In addition, agricultural, agro-industrial and domestic human activities have contributed to the pollution of groundwater resources in some regions of the country [82,83].
Pakistan is the third largest user of groundwater for irrigation in the world, where 73% of all irrigation comes directly or indirectly from groundwater resources. Total groundwater extraction is estimated to be approximately 60 billion m³, with 1.2 million private tube wells operating in the country [84].
Lakes, the third most commonly found category in this review, represent approximately 49.8% of the Earth's total surface freshwater. Lakes are important ecosystems that share many ecological and biogeochemical processes, with multiple uses ranging from water supply and irrigation to fishing and recreation. Population growth and urbanization have increased lake contamination problems. Furthermore, lakes are confined bodies of water with no strong self-cleaning flow and are therefore more prone to pollutant accumulation [85,86].
China, the USA and Canada were the countries that published the most studies related to the analysis of lake water quality through MSA (Figure 7). In China, lakes play a less important role compared with other bodies of water. However, they provide a wide range of services to Chinese ecological and social systems. Most of the country's freshwater lakes serve multiple uses, including drinking water, industrial and agricultural production, as well as aquaculture. Chinese lakes have undergone intense changes in the last three decades, mainly due to climate change, human activities and population density [87,88].
The United States has approximately 250 freshwater lakes that together add up to a surface area of approximately 35,000 km² [89]. Although many of these lakes are in good condition, a considerable proportion are in altered condition for nutrients, with 40% of the lakes containing excessive concentrations of total phosphorus and 35% having excessive concentrations of nitrogen [90,91]. In Canada, lakes are of great importance. Canada has more than two million lakes, 900,000 of them measuring up to 0.1 km² and 560 measuring more than 100 km², together representing 37% of the total lake area in the world. The United States and Canada share the Great Lakes, which together contain 18% of the world's fresh water [92,93].

Keywords Co-Occurrence
Keyword co-occurrence analysis is used to identify the main themes in a field of research or a domain of knowledge. It is based on the assumption that when two items appear in the same context, they are related to some degree [94,95].
In this scientometric review, a total of 5550 keywords were listed by the authors. After applying the criterion of minimum occurrence (a term must appear in at least 20 publications) and filtering synonymous words and similar terms, 67 keywords were selected, divided into 4 groups with 1107 links.
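The counting step behind such a network can be sketched in a few lines of Python. This is only an illustration and not the software or workflow used in this review: the function name `cooccurrence_links`, the toy keyword lists and the threshold of 2 (the review used 20) are assumptions chosen for the example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_links(keyword_lists, min_occurrence):
    """Count keyword pairs that appear together in the same publication.

    Only keywords found in at least `min_occurrence` publications are kept,
    mirroring a minimum-occurrence criterion.
    """
    # Count in how many publications each keyword appears (once per paper).
    occurrence = Counter(kw for kws in keyword_lists for kw in set(kws))
    kept = {kw for kw, n in occurrence.items() if n >= min_occurrence}

    links = Counter()
    for kws in keyword_lists:
        selected = sorted(set(kws) & kept)
        # Every unordered pair in one publication adds one to the link weight.
        links.update(combinations(selected, 2))
    return occurrence, links


# Hypothetical author keywords for four publications (not data from the review).
papers = [
    ["water quality", "river", "principal component analysis"],
    ["water quality", "groundwater", "principal component analysis"],
    ["water quality", "river", "cluster analysis"],
    ["groundwater", "arsenic"],
]
occurrence, links = cooccurrence_links(papers, min_occurrence=2)
```

In this toy example, "arsenic" and "cluster analysis" fall below the threshold and are dropped, while the remaining keywords form the nodes of the network and the pair counts form its weighted links.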
As shown in Figure 8, the size of each circle is proportional to the occurrence of the keyword. Red group 1 (n = 27) grouped terms with high occurrence in the publications, such as "water quality" (714), "analysis" (354), "river" (330) and "multivariate statistical analysis" (299), with terms related to water quality monitoring, biomonitoring and assessment, such as "monitoring", "pollution", "biomonitoring", "bioassessment", "bioindicator", "eutrophication", "phosphorus", "nutrient", "phytoplankton", "chlorophyll" and "diatom".
The keywords "water quality" and "multivariate statistical analysis" were two of the most commonly found terms in the publications, as they were included as search terms in the topic field (title, keywords and abstract) of the WoS database. The high occurrence of the keyword "river" can be explained by the fact that it was the most common water sample type analyzed in the publications (Figure 6).
In green group 2 (n = 26), the most frequent keywords "groundwater" (370) and "statistical analysis" (267) were grouped with terms frequently used in the analysis of groundwater quality such as "heavy metal", "water quality index" (WQI), "hydrogeochemistry", "hydrochemistry", "geochemistry", "drinking water", "risk assessment", "health risk", "salinity", "fluoride" and "arsenic", among others. The keyword "groundwater", with the highest occurrence in group 2, was the second most analyzed water sample type in the publications. Its connection with terms such as "heavy metal", "WQI", "drinking water", "fluoride" and "arsenic" demonstrates a tendency of these publications toward the evaluation of groundwater for human supply purposes.
Blue group 3 (n = 7) grouped the terms with high frequency related to MSA, such as "principal component analysis" (450), "cluster analysis" (327), "factor analysis" (233) and "discriminant analysis" (87), with terms such as "correlation analysis", "physicochemical parameters" and "water pollution". The high frequency of these keywords suggests that these MSAs are those used most frequently in water quality research. The connection between these terms further suggests that these MSAs are being used together in the studies.
Purple group 4 (n = 6) gathered the keywords with lower occurrence, such as "anthropogenic activity" (25), "water quality assessment" (3), "seasonal variation" (36), "source apportionment" (38), "spatial variation" (41) and "temporal variation" (43). The keywords with the highest occurrence among the four groups (water quality, groundwater and principal component analysis) had a total of 62, 53 and 61 links with other terms, respectively.

Multivariate Statistical Analysis (MSA) for Water Quality Assessment
MSA aims to analyze multiple variables in a single relationship or set of relationships [17]. It has been considered one of the most effective and widely used tools in assessing the water quality of a given water body [13,37]. Of the 2889 publications analyzed, 43.7% (1262) used only one MSA as a tool for assessing water quality. Another 45.0% (1300) used two analyses, 9.4% (272) applied three methods, and 1.9% of the publications (55) applied four or more MSA. Table 5 summarizes the main MSA that were applied in water quality research studies between 2001 and 2020.
As shown in Table 5, principal component analysis (PCA) was the most used MSA in the studies (1405 publications), followed by hierarchical cluster analysis (HCA) used in 1275 studies, factor analysis (FA) used in 248 publications and discriminant analysis (DA) used in 246 publications. The frequency of these MSA is directly linked to the clustering and high occurrence of these keywords in publications, as shown in Figure 8. A brief summary of the main MSA applied and their relationship to water quality research studies is given below.
Factor analysis refers to a class of MSA whose main purpose is to define the underlying structure in a data matrix. It analyzes the structure of correlations among a large number of variables by defining a set of common latent dimensions called factors. There are basically two types of factor analysis: exploratory factor analysis (EFA), the most frequently used, which aims to identify the nature of the factors that influence a set of responses, and confirmatory factor analysis, which tests whether a specified set of factors influences responses in a predicted way [17,96].
PCA is an exploratory statistical method for the graphical description of information present in large datasets. It is one of the best known and most used MSA in several scientific disciplines [97,98]. The central idea of the analysis is to reduce the dimensionality of a dataset in which there are a large number of interrelated variables, while keeping as much of the variation present in the dataset as possible [99]. The analysis transforms the original variables into new uncorrelated variables (axes), called principal components (PCs), which are linear combinations of the original variables. The PCs provide information on the most significant variables and represent the data matrix in reduced form with minimal loss of original information [100]. The first PC has the largest eigenvalue and accounts for the maximum share of the total variance in the dataset. The second PC is orthogonal to (uncorrelated with) the first, has a lower eigenvalue and accounts for the maximum residual variance [101].
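The transformation described above can be sketched with NumPy as an eigen-decomposition of the correlation matrix. This is a minimal illustration under stated assumptions, not the procedure of any reviewed study: the data are synthetic, the four columns stand in for arbitrary water quality parameters, and standardizing the variables first is a common choice that is assumed here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a water quality matrix: 50 samples x 4 parameters.
# Two hidden processes drive the four correlated columns (assumed data).
latent = rng.normal(size=(50, 2))
mixing = np.array([[1.0, 0.8, 0.1, 0.0],
                   [0.0, 0.2, 0.9, 1.0]])
X = latent @ mixing + 0.1 * rng.normal(size=(50, 4))

# Standardize so parameters measured in different units are comparable.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Eigen-decomposition of the correlation matrix.
R = np.corrcoef(Z, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(R)

# eigh returns eigenvalues in ascending order; reorder so PC1 comes first.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Scores: the samples expressed on the new, mutually uncorrelated axes.
scores = Z @ eigenvectors

# Share of the total variance carried by each principal component.
explained = eigenvalues / eigenvalues.sum()
```

The columns of `eigenvectors` hold the coefficients of the linear combinations, and `explained` shows how much of the total variance each PC retains, which is the basis for deciding how many components to keep.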
PCA-generated PCs are sometimes not readily interpreted. The use of EFA after PCA aims to reduce the contribution of less significant variables and further simplify the data structure taken from PCA [37]. This can be achieved by rotating the axes defined in the PCA, according to well-established rules, and building new variables (varifactors). As a result, large loadings become larger and small loadings become smaller, thus generating a small number of factors accounting for approximately the same amount of information as the larger set of original observations [102,103]. In summary, EFA should be used to make observations about the factors that are responsible for a set of observed responses, whereas PCA can be used simply for data reduction [104].
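The effect of axis rotation on the loadings can be illustrated with a short NumPy sketch. The text does not name a specific rotation rule; the example assumes the widely used varimax criterion, implemented with a standard SVD-based iteration. The loading matrix, the 30 degree "blurring" rotation and the function names `varimax` and `varimax_criterion` are hypothetical choices made for the illustration.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-6):
    """Rotate a loading matrix (variables x factors) by the varimax criterion."""
    n_vars, n_factors = loadings.shape
    rotation = np.eye(n_factors)
    previous = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient-like target matrix of the varimax criterion.
        target = rotated**3 - rotated @ np.diag((rotated**2).sum(axis=0)) / n_vars
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt  # orthogonal matrix, so the axes stay orthogonal
        current = s.sum()
        if previous != 0.0 and current / previous < 1.0 + tol:
            break
        previous = current
    return loadings @ rotation, rotation


def varimax_criterion(loadings):
    """Sum over factors of the variance of the squared loadings."""
    return (loadings**2).var(axis=0).sum()


# Hypothetical simple structure, deliberately blurred by a 30 degree rotation
# to mimic unrotated components that are hard to interpret.
simple = np.array([[0.9, 0.0], [0.8, 0.1], [0.1, 0.8], [0.0, 0.9]])
angle = np.deg2rad(30.0)
blur = np.array([[np.cos(angle), -np.sin(angle)],
                 [np.sin(angle), np.cos(angle)]])
unrotated = simple @ blur

rotated, rotation = varimax(unrotated)
```

Because the rotation is orthogonal, each variable keeps the same communality (row sum of squared loadings); only the distribution of the loadings across factors changes, which is what makes the rotated factors easier to read.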
In water quality research, EFA and PCA are tools used primarily to find parameters that describe the processes that govern water chemistry and to extract important information using only the most significant variables [105]. Principal component or factor loadings are commonly used to explain the relative contribution of variables to overall water quality.
Cluster analysis is the formal study of methods and algorithms for grouping objects according to measured or perceived intrinsic characteristics, or similarity [121]. In general, the objective of cluster analysis is to identify groups, or clusters, of similar objects, where elements in a cluster are more similar to each other than to elements in different clusters [122].
In cluster analysis, a large number of methods are available by which to classify objects based on their similarities. The main types of cluster analysis are hierarchical methods, partitioning methods, and methods that allow overlapping clusters. Within each type of method, there is a variety of specific techniques and algorithms [123].
In water quality research, HCA has often been used with the main objective of grouping similar sampling sites (spatial variability) [32,37,124]. The analysis can also extract useful information from complex datasets and provide a reasonable and efficient approach to studying the chemical characteristics of water [125]. Of the 15 publications most cited in this review, 9 used HCA as a multivariate statistical tool to assess water quality (Table 2).
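Grouping sampling sites with HCA can be sketched with SciPy as follows. The site data are invented, and the choices of z-score standardization, Euclidean distance and Ward linkage are assumptions made for the example; other distance measures and linkage rules are equally possible and may give different groupings.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(1)

# Hypothetical data: 6 sampling sites x 3 parameters. Sites 0-2 mimic
# less polluted locations, sites 3-5 more polluted ones.
clean = rng.normal(loc=[7.5, 200.0, 2.0], scale=[0.1, 10.0, 0.2], size=(3, 3))
polluted = rng.normal(loc=[6.5, 800.0, 9.0], scale=[0.1, 10.0, 0.2], size=(3, 3))
sites = np.vstack([clean, polluted])

# z-score each parameter so that no single unit dominates the distances.
Z = (sites - sites.mean(axis=0)) / sites.std(axis=0, ddof=1)

# Agglomerative clustering with Ward's criterion on Euclidean distances.
tree = linkage(Z, method="ward")

# Cut the dendrogram into two groups of sites.
labels = fcluster(tree, t=2, criterion="maxclust")
```

The array `tree` encodes the full merge history and can be passed to `scipy.cluster.hierarchy.dendrogram` for plotting; `labels` assigns each site to one of the two groups obtained by cutting that tree.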
Discriminant analysis is an MSA that analyzes whether the classification of data is adequate in relation to the survey data. It is used in situations where the groups are known, classifying an observation, or several observations, into these known groups [126,127]. It aims to predict and explain a categorical variable representing different groups using several quantitative variables as predictors [128].
In studies that analyze water quality, DA is used to differentiate a given classification variable using numerous characteristics. This classification variable can refer to land use types or sources of pollution, flow events and seasonal factors. In most cases, the DA approach is limited to the accuracy of the spatial classification, which is based on selected influential variables [129,130].
Among the most cited works (Table 2), the first two publications applied DA to each data matrix to assess spatial and temporal variation in water quality in rivers in the basin. Location (spatial) and season (temporal) were the grouping (dependent) variables, while all analysis parameters constituted the independent variables. Discriminant analysis gave the best results for spatial and temporal analysis. This allowed a reduction in the dimensionality of the large dataset, outlining some indicator parameters responsible for large variations in water quality [32,37].
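The idea of using a known grouping variable (here, season) as the dependent variable can be sketched with a two-group Fisher linear discriminant in NumPy. This is a simplified stand-in, not the stepwise procedures used in the cited studies: the samples are synthetic, the two seasons and three parameters are hypothetical, and equal group sizes and a shared covariance matrix are assumed.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical samples: 40 per season, 3 measured parameters each.
# Only the first two parameters differ between seasons by construction.
dry = rng.normal(loc=[8.0, 1.0, 5.0], scale=1.0, size=(40, 3))
wet = rng.normal(loc=[5.0, 4.0, 5.0], scale=1.0, size=(40, 3))
X = np.vstack([dry, wet])
y = np.array([0] * 40 + [1] * 40)  # known groups: 0 = dry, 1 = wet

mean_dry, mean_wet = dry.mean(axis=0), wet.mean(axis=0)

# Pooled within-group covariance matrix.
scatter_dry = np.cov(dry, rowvar=False) * (len(dry) - 1)
scatter_wet = np.cov(wet, rowvar=False) * (len(wet) - 1)
pooled = (scatter_dry + scatter_wet) / (len(X) - 2)

# Fisher's discriminant direction and the midpoint decision threshold.
w = np.linalg.solve(pooled, mean_wet - mean_dry)
threshold = w @ (mean_dry + mean_wet) / 2

# Classify each sample and measure how often the known group is recovered.
predicted = (X @ w > threshold).astype(int)
accuracy = (predicted == y).mean()
```

Because all three synthetic parameters share the same scale, the magnitudes of the entries of `w` hint at which parameters discriminate between the seasons; with real data in mixed units, the variables would need to be standardized before such a comparison.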

MSA Limitations
Multivariate statistical analysis has been used for variable reduction, grouping and classification in water quality studies. Its extensive application reflects the computational simplicity of these methods and the geometrically intuitive interpretation they provide; nevertheless, it has some limitations. Water quality assessment and monitoring programs can last for decades, increasing the likelihood of changes in sampling method, frequency, location and analytical accuracy, which in turn can limit the use of statistical analysis [11,131].
In the case of PCA and EFA, the two methods often provide descriptive rather than inferential information and are commonly used in exploratory data analysis in conjunction with other techniques. EFA also involves a considerable level of subjectivity: a researcher must make many methodological decisions to complete a single analysis, and the accuracy of the result depends largely on the quality of those decisions. Problems such as low correlations, outliers, missing data, poorly distributed data, small sample sizes and lack of linearity also limit the use of the methods [18,132,133].
In HCA, the various clustering methods often give very different results. This is due to the different criteria used for merging clusters (including individual cases). As clustering algorithms involve many parameters, generally operate in high-dimensional spaces, and have to deal with noisy, incomplete and sampled data, their performance can vary substantially for different applications and data types. In practice, it is difficult to choose a suitable clustering method for a given dataset or problem [134]. In DA, which is typically used to predict membership in naturally occurring groups rather than groups formed by random assignment, questions such as why group membership can be reliably predicted or what causes differential membership are often not asked [18].

Conclusions
Water quality analysis is essential for the integrated management of water resources. Due to the multidimensional properties involved in water quality assessment, many researchers have been encouraged to use statistical techniques as a way of interpreting the generated data. Among these tools, MSA has stood out. Therefore, this review proposed a mapping of the scientific literature published on the topic in a 20-year citation window. A total of 2889 publications, available between 2001 and 2020 in the main Web of Science database, were considered for review. The following main observations were recorded:

• The number of publications has increased considerably in the last 20 years, confirming a growing application of MSA in water quality studies. More than half of the studies were published in the last of the four analyzed periods (2016-2020).
• The three WoS subject categories in which the studies most fit were "Environmental Sciences", "Water Resources" and "Multidisciplinary Geosciences". The "Environmental Sciences" subject category covers several areas of the environment, and was therefore included in 1590 of the 2889 analyzed publications.
• The studies were carried out on water samples from 134 different countries or territories, and the most active countries in the research domain were discussed in the review. The review showed that developing countries have carried out more studies using MSA in water quality research.
• River, groundwater and lake were the water sample types most evaluated in the studies. Only one study analyzed the water quality in a navigation channel.
• China, India and the USA were the countries that most used MSA in river water quality research. India, Iran and Pakistan had the highest number of groundwater studies.
• More than 5000 keywords were listed, with the terms water quality, groundwater and principal component analysis having the highest occurrences.
• The most used MSAs were principal component analysis, hierarchical cluster analysis, factor analysis and discriminant analysis.
Multivariate statistical analysis has been widely used in the most diverse areas, especially in environmental sciences, including water quality analysis. The methods and techniques of MSA are applied for different purposes in water quality research, as discussed in this review. This study provides a practical reference and useful information for future research into the application of MSA in water quality studies.

Figure 1. Flowchart of the stages of the scientometric review of publications that used MSA in water quality assessment research between 2001 and 2020.

Figure 2. Relationship between the number of publications and the year.

Figure 3. The fifteen most encountered WoS subject categories.

Figure 4 presents the time trend of the five main WoS subject categories between 2001 and 2020. The category "Environmental Sciences" is at the top of publications for each year of the analyzed period, with the exception of the year 2006, in which "Marine and Freshwater Biology" surpassed it. The "Water Resources" category showed growth from 2007 onwards, while the "Marine and Freshwater Biology" category showed an inverse behavior from the same year. As of 2012, the "Multidisciplinary Geosciences" category surpassed the "Marine and Freshwater Biology" category, remaining in third place until 2019.

Figure 4. Time trend of the top five WoS subject categories between 2001 and 2020.

Hydrology 2023

Figure 5. Worldwide geographic distribution of water quality research using MSA between 2001 and 2020.

Figure 6. The twelve categories of water sample types found in the publications.

Figure 7. Water resource categories of the 15 countries that most published on the topic.

Figure 8. Networks of associations between keywords most commonly found in publications that used MSA in water quality studies between 2001 and 2020.

Table 1. Impact factors and percentage of total publications of the 10 most productive journals in the use of MSA for water quality research.
JIF A = journal impact factor in 2021, JIF B = journal impact factor without self-citation in 2021, TP = total publications.

Table 2. The 15 most cited publications in water quality research using MSA, according to the WoS database.

Table 3. The 15 countries with the highest number of publications that used MSA in water quality research and their h-index.

Table 4. Water sample types and analogous or synonymous terms.

Table 5. Multivariate statistical analyses most used in publications.
• A total of 604 journals published studies related to water quality research and the use of MSA in the analyzed period. The five most influential journals, in descending order of JIF, that published papers on the topic were: Water Research, Science of the Total Environment, Marine Pollution Bulletin, Ecological Indicators and Environmental Science and Pollution Research.
• All 15 most cited publications are open access and 9 of them were published in Water Research. The two most cited publications used four types of MSA to analyze large datasets.