Article

A Fresh Perspective on Freshwater Data Management and Sharing: Exploring Insights from the Technology Sector

1 School of Environment, Resources and Sustainability, University of Waterloo, 200 University Avenue, Waterloo, ON N2L 3G1, Canada
2 Canada Water Agency, 510-234 Donald Street, Winnipeg, MB R3C 1M8, Canada
3 Department of Biology, Wilfrid Laurier University, 75 University Ave W, Waterloo, ON N2L 3C5, Canada
4 Canadian Rivers Institute, University of Waterloo, 200 University Avenue, Waterloo, ON N2L 3G1, Canada
* Author to whom correspondence should be addressed.
Water 2025, 17(14), 2153; https://doi.org/10.3390/w17142153
Submission received: 30 May 2025 / Revised: 4 July 2025 / Accepted: 15 July 2025 / Published: 19 July 2025
(This article belongs to the Section Biodiversity and Functionality of Aquatic Ecosystems)

Abstract

It is well established that effective management and restoration of freshwater ecosystems are often limited by the availability of reusable data. Although numerous public, private, and nonprofit organizations collect data from freshwater ecosystems, much of what is collected remains inaccessible or unusable by Rights holders and end users (including researchers, practitioners, community members, and decision-makers). In Canada, the federal government plans to improve freshwater data sharing practices through the newly formed Canada Water Agency, which is currently drafting a National Freshwater Data Strategy. Our study aimed to support these efforts by synthesizing insights from the technology sector, where data management and sharing practices are more mature. We interviewed 12 experts from the technology sector, asking them for advice on how to improve data sharing practices in the freshwater science sector. Using a Reflexive Thematic Analysis of participants’ responses to semi-structured interview questions, we identified nine broad recommendations. Recommendations centred on motivating open data sharing, promoting data reuse through data licences, training and skill building, and developing standards and digital solutions that enable data discovery, accessibility, interoperability, and reuse. These recommendations can support the numerous initiatives that are working to improve access to high-quality freshwater data and help address the pressing crisis of global freshwater ecosystem degradation.

1. Introduction

Freshwater resource protection and restoration require data to assess the status of freshwater systems, understand stressors and cumulative effects, and predict future conditions [1,2,3,4,5]. Freshwater data are “paramount to supporting evidence-based decision-making” in freshwater resource management [6]. The ability to discover and access freshwater data is critical as the global community recognizes the rapid loss of freshwater biodiversity [7,8], and the importance of freshwater ecosystem services to human health and community well-being [9,10,11,12]. The United Nations Environment Programme recently found that many countries are struggling with freshwater data management, which is a “weak link in the water quality monitoring and assessment chain…” [8] (p. xii). In Canada, most watersheds are data-deficient for baseline studies, ecosystem health assessments, and water resource development project design [1,13,14,15]. Thus, decision-makers urgently need more timely access to comprehensive freshwater data [3,16,17,18].
Global efforts to collect freshwater data have intensified in response to the degraded state of freshwater ecosystems and decline in freshwater biodiversity, resulting in freshwater ecosystems being among the most intensively monitored ecosystems worldwide [7]. Data from freshwater ecosystems across Canada, such as surface water chemistry, water flows, fish tissue chemistry, and fish habitat, are collected by diverse groups, including government agencies, non-profit organizations, academics, communities, and private companies, but are mostly inaccessible [17,19,20,21]. These data often remain in filing cabinets, field books, and floppy discs, on bookshelves, and on personal hard drives [5,14,22,23]. Freshwater scientists can be reluctant to share their data for a number of reasons, including concerns about competition from other researchers (i.e., other researchers using their data for publications), the time required to manage data for sharing (including data cleaning), lack of recognition for data sharing (i.e., data are not treated the same as formal publications), and fears of others misinterpreting data or exposing errors [5,14,22,24]. Datasets are also sometimes withheld because of perceived liability concerns and data readiness [14]. Essentially, open data sharing is viewed by some scientists as a time sink that detracts from tasks that support their careers while posing a risk to their reputations. Those who are willing to share their data may be unable to do so for a variety of reasons, including a lack of data literacy, lack of detailed guidance on data sharing, confusion over who owns the data, inadequate digital infrastructure, staff turnover, lack of time and/or support, and/or weak data management policies and procedures [5,20,24,25,26]. Ultimately, the requirements for open data have increased without parallel increases in funding, which are needed for scientists to have the time and resources to effectively manage and share their data for reuse [24]. More work is needed in the freshwater science sector to support scientists in making the large quantities of data they collect available for reuse.
Even when datasets are technically available, freshwater scientists can struggle to discover and reuse data in combined datasets from multiple sources [5,22,24]. In Canada, the various organizations that collect and manage freshwater data store them on multiple platforms, creating a large, disparate, and fragmented network of data [3,19,21]. This issue extends to the global community, where online freshwater databases are created in isolation, resulting in data that are scattered and difficult to combine for synthetic analyses across multiple jurisdictions or over long timescales [4,7,24]. When data are collected and managed by multiple organizations using different methods, the resulting lack of interoperability makes it difficult to combine data into larger datasets [5]. When synthetic or collative exercises are undertaken, considerable time and effort are required to discover and organize data, which creates inefficiencies that reduce the time available for data analysis and interpretation [3,14,27,28]. Further, limited budgets and resources allocated for freshwater studies do not allow for extensive data searches and collation efforts. Overall, inefficient, non-standardized data management and sharing practices in freshwater science reduce the quality of the studies and limit the utility of those data for decision-makers [5].
The reuse of open and discoverable freshwater data is hindered when data are poorly or inconsistently managed [3,5,24,29]. Natural scientists struggle to know how to share their data effectively because they typically receive minimal formal data management training [24,29]. Many natural scientists are self-taught and develop personal protocols for data preparation (i.e., data wrangling: the process of data cleaning, transformation, and structuring), managing, and sharing [30]. Consequently, data are often manually prepared (without coding) in Microsoft Excel with data processing decisions documented in Excel, Word, or emails, if at all [30]. A survey of datasets from natural science research studies published in journals with open data mandates found that 56% of the datasets were incomplete (e.g., missing data or insufficient descriptors about how the data were collected—i.e., metadata) and 64% were not completely reusable (e.g., processed rather than raw data, non-machine-readable file formats) [29]. Journals and data repositories lack the funding and resources required for data quality control that ensures data are reusable [29]. Data must be analysis-ready to inform freshwater resource management decision-making [31].
The United Nations Environment Programme recently recommended that national monitoring authorities adopt open data policies and identify actions to improve freshwater data management practices in support of freshwater resource protection and restoration [8]. In Canada, the newly formed Canada Water Agency has made the development of a National Freshwater Data Strategy one of its priorities to ensure freshwater resource management decisions are informed by robust datasets [6]. To date, many of the recommendations for the strategy have come from stakeholders within the freshwater science sector who have environmental science education and experience [6,32]. While this expertise is essential, the Canada Water Agency may be missing insights from sectors that have more mature data management and sharing practices.
As the global freshwater science sector strives to improve its data management and sharing practices, it is important to learn from different sectors to accelerate progress, avoid duplication of effort, and facilitate the creation of a strategy that will effectively and efficiently achieve its goals. We therefore gathered insights from the technology (tech) sector to explore how the freshwater science sector can improve its data management and sharing practices to produce data that are discoverable, accessible, interoperable, and reusable. There is no standard definition for the “tech sector”, but the term is commonly used to generally refer to companies that build or use advanced technology as part of their business model [33]. For this study, we narrowed our focus to tech companies that have incentive structures associated with effective data management and sharing. We spoke with technology experts (software engineers, software developers, and data scientists) working in the tech sector to discover if there are pragmatic and effective solutions that the freshwater science sector has not yet considered or currently struggles to implement. Our working hypothesis was that technology experts who have worked in the tech sector would have in-depth knowledge of methods and tools not yet considered broadly by freshwater science experts. We present our findings as recommendations for improving freshwater data management and sharing that can be undertaken by individuals and organizations across the freshwater science sector.

2. Materials and Methods

2.1. Study Design

We used Reflexive Thematic Analysis (RTA) to gather recommendations for improving data management and sharing practices from the tech sector [34,35,36]. This widely used approach to thematic analysis was originally described in Braun and Clarke’s 2006 publication “Using thematic analysis in psychology”, which was the third most-cited paper of the twenty-first century [37]. “Reflexive thematic analysis is an easily accessible and theoretically flexible interpretative approach to qualitative data analysis that facilitates the identification and analysis of themes” [36].
RTA was particularly appropriate for our cross-sector approach. As freshwater scientists, we did not have a preconceived idea of the recommendations that the technology experts would offer, and we needed an inductive and iterative approach that would allow us to use our sectoral knowledge to interpret responses from technology experts and formulate applicable recommendations through the development and refinement of codes as the interviews progressed. Figure 1 illustrates the approach we undertook and how it aligns with the core steps of Virginia Braun and Victoria Clarke’s approach to RTA [36,38]. Unlike more traditional codebook approaches, the reflexive approach allowed us to learn from participant responses and organically revise codes to identify themes as we learned more with each successive interview [34,36]. Unlike Grounded Theory methods, RTA does not require coding all data; instead, it allows flexibility in coding coarseness to focus on addressing key research objectives [34]. We coded only opinions and insights that supported recommendations for freshwater data management and sharing. Themes identified through the interview results were reframed as recommendations once coding of all interview transcripts was completed.

2.2. Interviews

Between August and October 2024, the first author conducted one-on-one semi-structured interviews with 12 key informant technology experts (Table 1). Before reaching out to potential study participants, ethical approval was obtained from the University of Waterloo Research Ethics Board (REB #46395). Informed consent was secured from all participants, who were assured of their right to withdraw from the study, without consequence, at any point ahead of publication. The duration of each interview was approximately 60 min. First, we targeted people we knew to be technology experts and knowledgeable about data management and sharing best practices [39]. We then used snowball sampling by subsequently contacting additional potential participants who were recommended by initial participants [39]. Two participants were gained from the snowball sampling.
Using the methods described by Guest et al. [40], we assessed our data saturation rate every two interviews following the first four interviews. No new themes emerged in the two interviews following our first ten interviews. We thus inferred that we had reached data saturation—no new information gained—with our first ten interviews (Supplementary Material 1) and did not seek new participants after completing 12 interviews.
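To make the saturation check transparent, the sketch below expresses the same logic in Python: with a base of four interviews and a run length of two, a run signals saturation when it introduces no themes not already identified earlier. The theme labels and counting function are illustrative assumptions, not the exact bookkeeping procedure of Guest et al. [40] or the themes in our data.

```python
# Minimal sketch of the saturation check described above (base size = 4,
# run length = 2): a run adds no new information when it introduces zero
# themes that were not already identified in earlier interviews.
# Theme labels below are hypothetical placeholders.
from typing import Dict, List, Set

def new_themes_per_run(themes_by_interview: List[Set[str]],
                       base_size: int = 4,
                       run_length: int = 2) -> Dict[str, int]:
    """Count themes first appearing in each post-base run of interviews."""
    seen = set().union(*themes_by_interview[:base_size])  # themes in the base
    counts = {}
    for start in range(base_size, len(themes_by_interview), run_length):
        run = themes_by_interview[start:start + run_length]
        new = set().union(*run) - seen
        counts[f"interviews {start + 1}-{start + len(run)}"] = len(new)
        seen |= new
    return counts

# Hypothetical example: interviews 11-12 introduce no new themes,
# so saturation would be inferred after the first ten interviews.
example = [
    {"incentives", "licences"}, {"incentives", "standards"},
    {"training"}, {"central platform", "incentives"},
    {"standards", "maintenance"}, {"community"},
    {"versioning"}, {"privacy"},
    {"registry"}, {"blockchain caution"},
    {"incentives", "standards"}, {"community", "licences"},
]
print(new_themes_per_run(example))
```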
We used strict recruitment criteria. Our inclusion criteria for study participants were a minimum of five years’ experience as a software developer, software engineer, or data scientist AND either currently working for a company in which revenue is directly linked to effective data management OR, if not currently working, having held for a minimum of 10 years a senior role in a company/organization in which revenue was directly linked to effective data management. Currently working individuals include those who work independently as contractors. Key informants were sourced from the lead author’s personal network. Participants were recruited using a standard recruitment email (Supplementary Material 2).
At least one week prior to scheduled interviews, each participant was sent a study background document that summarized the current state of data management and sharing in the freshwater science sector (Supplementary Material 3). This allowed all participants to have the same base understanding of issues to be discussed during the interview. Each participant was also sent the interview guide so that they could come to the interview with prepared responses (Supplementary Material 4). The interview guide included a question that asked interviewees if they had specific advice for the federal Canada Water Agency. Participants were informed of the Canada Water Agency and its plans to develop a National Freshwater Data Strategy to provide context on current national freshwater data initiatives within Canada; however, we framed responses to this question as recommendations for federal government agencies generally.
Interviews were semi-structured, with prepared and emergent questions that allowed the interviewer to lead each interview based on participant responses. When participants described experiences or offered insights and opinions, the interviewer reframed the response as a recommendation for the freshwater science sector and asked for feedback. The participant would then agree or disagree with the interviewer’s interpretation and could add further clarification. Interviews were conducted online, and audio was recorded using Microsoft Teams. The audio files were uploaded to Otter.ai (Otter.ai Inc., Mountain View, CA, USA) for transcription [41]. Transcripts were edited in Otter.ai, and, when necessary, recordings were used to confirm the accuracy of the transcripts. The transcripts were reviewed to identify where participants offered insights and opinions that could be framed as recommendations for the freshwater science sector.
For each interview, recommendations, along with quotes from the transcript that supported them, were documented in an interview summary document. The summary document and the full interview transcript were sent to the participant for their review prior to analysis. While participant validation is not necessary in RTA, it is commonly used in thematic analysis to avoid misinterpretations [35]. We asked participants to review specific recommendations captured from their interviews, not the final recommendations we developed from interpreting the collection of recommendations from all of the interviews. Thus, our final recommendations were a product of our interpretive work, which is an appropriate outcome from RTA [35].

2.3. Analysis

We completed a thematic analysis of the participants’ recommendations discussed during interviews. “[Thematic analysis] can be likened to a distillation process by which the researcher identifies or comes face to face with the explicit and implicit meanings they have discerned in the data, and then synthesizes these findings” [42]. Recommendations included statements and discussion points around what the participants thought the freshwater science sector should do and/or should not do to improve data management and sharing practices. We uploaded transcripts to NVivo software (Lumivero, Denver, CO, USA) [43] (Version 14.24.0), which was used to code participant responses. We coded all recommendations, opinions, and insights that were directly relevant to how the freshwater science sector can improve data management and sharing. As described by Finlay [42], thematic analysis involves creating initial codes, searching for themes in the initial codes, and then collapsing or splitting initial themes such that they can be woven together into a narrative that answers the research question. Initial codes reflected specific recommendations offered directly by the participant or interpreted by the interviewer and confirmed by the participant in each interview. As the interviews progressed and the lead author produced themes, specific recommendations were re-coded under more general recommendations to capture overarching themes. Each of these overarching themes contained at least one subtheme that was framed as a specific recommendation for the freshwater science sector. The lead author reviewed the complete list of specific recommendations with co-author NB, who collaborated on developing the final list of broad recommendations. Broad recommendations encompassed several specific recommendations to reduce redundancies, and recommendations that were not supported by at least three participants were removed. The themes and corresponding recommendations are described in our codebook (Supplementary Material 5).

Study Assumptions and Limitations

Our key informants were not randomly selected. It is possible that interviewing a different group of people would have yielded different recommendations. The lead author’s social network was selected because it contained many highly experienced technology experts with a broad array of educational backgrounds and work experiences that span a range of companies in the tech sector. Participants included entrepreneurs who run one-person companies and employees of some of the largest technology firms in the world. Academic backgrounds ranged from no formal computer science training to a PhD in computer science (Table 1). Participants were also located in various jurisdictions, including Canada, the United States of America, Australia, and Germany. Thus, while our participants were not randomly sampled, we believe the interviewees reflected a diverse range of experiences and that the common themes we captured would have emerged from other groups as well.
The interviews were largely guided by the study background document (Supplementary Material 3). We strived to make this document a comprehensive snapshot of the current state of data management and sharing struggles within the freshwater science sector by reviewing academic and grey literature. However, it is possible that a different research team would have created a different summary that could have directed the interviews in different directions.
Developing codes as interviews progressed could have caused codes from earlier interviews to bias the coding of later interviews. We addressed this concern by using interview summaries to obtain participant verification of the initial recommendations we distilled from the interviews. However, not all participants reviewed their interview summary.
A core assumption of the RTA approach is that the researcher plays an active role in knowledge production; their subjective interpretations, which influence the organic and recursive coding of data, are seen as a strength [44]. Importantly, themes are generated by the researcher rather than “emerging” from the data [44]. The RTA approach does not involve the coding reliability measures used in other forms of thematic analysis to test for reliable and accurate coding [44]. All interviews were conducted by the first author, and it is possible that a different researcher, especially one with different knowledge and experiences with data management and sharing practices, could have interpreted the responses of participants differently and developed different themes. We could have improved the reliability of this study using a different methodological approach, such as a coding reliability approach that uses triangulation of researcher interpretations by having other co-authors independently review and code interview transcripts. However, due to our selected methodological approach and resource constraints, all interviews were coded by the first author, who then collaborated with co-author NB to define the final list of high-level broad recommendations. It is also possible that different reviewers would have developed different, broader recommendations. In the interest of full transparency, we welcome readers to review our original list of 33 specific recommendations in Supplementary Material 6 to see if they have different interpretations of our results. De-identified quotes that support each of the 33 specific recommendations are also available upon request.
All authors of this study are natural and/or social scientists who research freshwater topics. It is possible that including a co-author who was a technology expert would have led to different interpretations of participants’ responses. However, a technology expert may have imparted their own bias based on their experiences and preferences. Instead, we had, as an advisor, a senior software developer (with over 20 years of experience) who we did not interview and was available to clarify topics or terms without influencing coding or theme development.

3. Results and Discussion

Key terms used in the presentation of the results are defined in Table 2. While these terms refer to independent processes, they are sometimes used as umbrella terms that include multiple processes. For example, while data cleaning and data transformation are different steps in the data preparation process, some people consider data transformation as a step in data cleaning, and data cleaning as a step in data transformation [45]. All of these terms refer to important steps in the data preparation process that ensure data can be joined with other datasets and/or analyzed. We use the term data preparation to generally refer to the process that encompasses all of the terms described in Table 2.
We organized the participants’ responses into nine broad recommendations for the freshwater science sector to improve data management and sharing practices (Table 3). These broad recommendations encompass the initial 33 specific recommendations (Supplementary Material 6), except for those that were supported by fewer than three interviewees’ responses. The nine broad recommendations are described in detail below, and each recommendation is bolded.

3.1. Open Data Culture

The freshwater science sector should create a culture of open data with incentive structures that reward data sharing and leaders who demonstrate and promote effective open data practices (92% of participants). “That’s my big-picture take away from all this stuff. If you can’t solve the social problems, the technology won’t really help” (Participant 4). Participants generally focused on the need to prioritize cultural and social motivations that encourage people to want to share their data, over the technical solutions and skills people need to know how to share their data.
The freshwater science sector can learn from the tech sector’s open-source culture to foster an environment that motivates open data sharing (Participants 1, 2, 3, 4). “Even if you’re working for a large corporation, you’re still using open-source software where people have contributed for free to the shared project. It’s kind of crazy that it works. Everybody’s using this software that nobody paid for. But the reason it works is because the social dynamics have been figured out” (Participant 4). Over the past 20 years, open-source culture has permeated the tech sector, spurring rapid innovation by allowing everyone to build on each other’s work—for free (Participants 1, 4, 10). “…open source is one of the key reasons that software has developed as quickly as it has and the space has moved as quickly as it has…” (Participant 10). Similarly, open data culture could accelerate the freshwater science sector’s progress toward the goals of freshwater resource protection and restoration.
To achieve such a culture shift in the freshwater science sector, there need to be incentives beyond “a vague sense that you should do it out of obligation” to encourage open data sharing (Participant 4). The open-source culture works in the tech sector because there is a lot of prestige, recognition, and social credit granted to individuals who contribute to open-source software (Participant 4). “At tech conferences, all the headline speakers are open-source contributors. The stars of the field are open-source contributors” (Participant 4). Employers also generally show a preference toward hiring open-source contributors and will review open-source contributions to evaluate an applicant’s competencies (Participant 4). These social incentives motivate people to work on open-source projects during their evenings and weekends and seek employment with companies that contribute to open-source projects (Participant 4).
Incentives that make open data sharing prestigious and professionally valuable can motivate scientists of all career stages to upgrade their data skills, including recognizing data as formal publications, rewards for reproducible datasets, and open data mandates for funding and publications (Participants 1, 4, 5, 9, 12). “I think incentive alignment is going to be probably the most important factor” (Participant 3). Scientists have also acknowledged the need for incentives that reward open data sharing to promote a culture change [5,22,23,29,50]. For example, freshwater scientists have acknowledged that they wish to be cited for the datasets they openly share to help support career advancement [24]. Thus, incentives should be designed to help advance the careers of those who openly share high-quality datasets. Participants stressed that, to be effective, these incentives must be paired with processes that verify data against a standard; otherwise, data sharing remains merely a checkbox that fails to encourage people to spend the time and effort required to manage and share reusable data (Participants 1, 4, 5, 12).
Researchers have previously found that data shared in response to open data mandates can be difficult to reuse due, in part, to a lack of formal data peer-review processes [24,29,51]. To alleviate the burden of dataset review at the manuscript publication stage, Sholler et al. [51] recommended integrating data management review throughout the research process. Similarly, Participant 1 recommended that everyone involved in each step of a research study, including the principal investigators, supervisors, funding agencies, reviewers, and study publishers, should be responsible for verifying appropriate data management and sharing.
Federal government agencies interested in promoting better freshwater data management and sharing practices should prioritize developing incentives for data sharing (Participants 3, 4, 7, 8). A national government agency would have the widespread influence needed to bring freshwater stakeholders together to identify and support incentives for individuals and private and public organizations to share their data (Participants 3, 4, 7, 8). The national government agency could lead a working group of representatives from the public and private sectors to design and select incentive structures (Participant 4).
While incentives are developed at the institution level, individual freshwater scientists can promote the open data cultural transition by acting as thought leaders who widely communicate the how and why of open data sharing through blog posts, conference talks, social media posts, journal articles, and guest lectures (Participants 1, 3, 5). In addition to success stories and examples of useful tools and methods, thought leaders can motivate data sharing by communicating how open data can maximize the positive impacts of one’s work (Participants 1, 2, 3). According to Participant 1, open-source software became popular, in part, because people realized that was how they could “make impactful improvements on our world” (Participant 1). Scientists have also recommended continued discussion and demonstration of the benefits of open data sharing to continue the momentum of recent progress toward open data practices in the freshwater science sector [24]. Freshwater researchers are beginning to take on these leadership roles by publishing papers [3,5,7] and attending workshops [14,32] that discuss the need for open data to conserve freshwater ecosystems. When freshwater researchers see clear benefits to data sharing, mastering the requisite technical skills becomes a goal to achieve rather than a burden to avoid.

3.2. Data Licences

Freshwater scientists should use data licences to support data reusability (33% of participants). “I feel like licensing is going to inevitably be a big part of this… part of the reason open-source software grew as much as it did and had as much success as it has had, I think that’s because people had the right licences early on and tended towards very open licences…” (Participant 3). Data licences are documents that describe how data can be used (Participants 1, 2, 3). “When data is released, by default the author holds the full copyright to it. That means other people don’t have rights to use it.” (Participant 1). Thus, freshwater scientists who want to maximize the impact of their data need to share it under a licence that allows others to reuse it (Participants 1, 2, 3). An awareness of data licences may also encourage scientists to share their data, since licences also ensure the contributor receives the right attribution and protection from liability associated with the reuse of their data (Participants 1, 3). Data owners could use licences to guide how they wish their data to be reused and acknowledged rather than requiring individuals to contact them to request data access, which tends to delay projects or discourage people from using that data source [24].
Rather than create new licences, the freshwater science sector should use Creative Commons Licences, which are already appropriate for data sharing and allow a range of data reuse restrictions (Participants 1, 2, 3, 8). Using non-standard data licences can make accessing data time-consuming for those who have to interpret them [7]. It is becoming more common in the freshwater science sector to share data with a data licence, including Creative Commons licences [24]. Through widespread use of Creative Commons Licences, freshwater scientists will maintain control over how their data are used and credited, while freshwater science is advanced through data reuse.
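As a concrete illustration of this practice, the sketch below shows how a dataset’s metadata record might declare a Creative Commons licence so that reuse terms travel with the data. The field names and file layout are hypothetical assumptions for illustration, not part of any existing freshwater metadata standard; only the licence name and URL are real.

```python
# Minimal sketch: attach a machine-readable licence declaration to a dataset's
# metadata so downstream users know how the data may be reused and credited.
# Field names and values are hypothetical; the CC BY 4.0 licence URL is real.
import json

metadata = {
    "title": "Example stream temperature observations",      # hypothetical dataset
    "creator": "Example Watershed Monitoring Group",          # hypothetical holder
    "licence": {
        "name": "CC BY 4.0",
        "url": "https://creativecommons.org/licenses/by/4.0/",
        "attribution_text": "Data by Example Watershed Monitoring Group, CC BY 4.0",
    },
    "contact": "data@example.org",
}

# Write the metadata alongside the dataset so the licence is never separated
# from the data it describes.
with open("dataset_metadata.json", "w", encoding="utf-8") as f:
    json.dump(metadata, f, indent=2)
```

A repository or registry could read such a field to display reuse terms and attribution text automatically, sparing data users from interpreting bespoke licence documents.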

3.3. Data Skills Development

Data literacy in the freshwater science sector should be strengthened by integrating data skills into academic curricula and fostering workplace mentorship and peer-to-peer learning (58% of participants). “In science, I see a lot of really not normalized data, like data that can’t really be brought together into a centralised database, because it’s just a bit of a mess…” (Participant 3). Some participants agreed with scientists who have previously recommended formal education-based strategies to address the lack of data management training in the natural sciences [30,52]. Data management and sharing practices should be embedded in all academic courses that deal with data (Participants 3, 10, 12). Participants also stressed the importance of skills building outside of academia. The tech sector particularly values mentorship and peer-to-peer learning since academia does not provide up-to-date knowledge and skills development for technology that is constantly changing (Participants 2, 3, 4, 6). “I’m a huge advocate for mentorship…because, a lot of times that works out better than any kind of formal training” (Participant 6). Accordingly, the tech sector frequently hires people without academic degrees in computer science, including 42% of our participants (Table 1). Employers in the freshwater science sector should also offer employees mentorship and peer-to-peer learning opportunities to continuously develop their data skills (Participants 2, 3, 10, 12). Continuous training keeps everyone updated on changing technology and increases exposure to data literacy, making it “less scary and less intimidating” (Participants 10, 12). Academia can ensure scientists enter the workforce with a strong data literacy foundation while employers support the current cohort of scientists and keep the sector up-to-date on best practices and current technology.

3.4. Freshwater Data Standard Development

A freshwater data standard is needed to guide data collection and management so that data are reusable and comparable across datasets (58% of participants). Participant 4 described the development of a freshwater data standard as “a huge win, because even if you don’t have complicated IT solutions, or a centralised database, you are getting data that can be compared against other researchers without doing too much work.” Data standards should include data collection and entry methods so that data are in a reusable format from the start, simplifying data management and sharing (Participant 10). “…let’s make the experience at data collection and entry way better, so that it’s already in the format that we want to be able to then reuse” (Participant 10). A standard ontology (i.e., definition of standard terms and their relationships) and clear metadata standards (i.e., information describing how, when, where, and why the data were collected) are also needed to ensure data are reusable and comparable across datasets (Participants 4, 6, 8, 9, 12). While developing a complete ontology may be difficult and unnecessary, there needs to be at least standard definitions and formats for the minimum data required for reuse, such as units of measurement, dates, and location data (Participants 8, 9, 12). The minimum data requirements should be guided by the current and future questions freshwater data users will likely want to answer (Participants 7, 12). Essentially, a freshwater data standard must guide data collectors to enter their data in a way that others can understand and reuse.
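To make the idea of “minimum data requirements” concrete, the sketch below expresses a hypothetical minimal column specification in Python, covering standard field names, ISO 8601 dates, explicit units, and decimal-degree coordinates. The field names, ranges, and descriptions are illustrative assumptions only and are not the content of any proposed or existing freshwater standard.

```python
# Hypothetical minimal specification for a freshwater observation record:
# stable identifiers, an ISO 8601 date, explicit units, and decimal-degree
# coordinates. Illustrative only; not an actual proposed standard.
MINIMUM_FIELDS = {
    "site_id":   {"type": str,   "description": "Stable identifier for the sampling site"},
    "date":      {"type": str,   "format": "YYYY-MM-DD (ISO 8601)"},
    "latitude":  {"type": float, "range": (-90.0, 90.0),  "units": "decimal degrees (WGS84)"},
    "longitude": {"type": float, "range": (-180.0, 180.0), "units": "decimal degrees (WGS84)"},
    "parameter": {"type": str,   "description": "Measured variable, e.g. 'water_temperature'"},
    "value":     {"type": float, "description": "Measured value"},
    "units":     {"type": str,   "description": "Units of 'value', e.g. 'degC' or 'mg/L'"},
    "method":    {"type": str,   "description": "Reference to the collection/analysis method"},
}
```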
The freshwater science sector has acknowledged the need for a freshwater data standard that describes how data should be formatted and managed for reuse, including metadata standards [3,6,24,53]. Several recent initiatives aim to standardize freshwater data, including the publication of the Freshwater Data Publishing Guide by Lento and Schmidt-Kloiber [54]. According to Participant 5, existing standards should be reviewed prior to developing a new standard to avoid duplicated efforts. Thus, any initiative that aims to create a freshwater data standard should complete an updated global review of freshwater data standards as this field is developing rapidly.
A freshwater data standard should be created through a collaborative process and offer examples and support to ensure it is widely adopted and achieves standardization (67% of participants). “For anything that’s about standardization, you need buy-in from the people who are going to have to be using it” (Participant 12). Various freshwater stakeholders should be involved to create a data standard that makes sense for everyone (Participants 1, 4, 12). A draft standard, along with example datasets, should be publicly released for stakeholders to test the standard with real-world data and offer feedback (Participants 1, 4, 6, 8). This feedback process avoids a completely new standard having to be created in a few years (Participants 1, 8). “It’s totally okay to have a version two of the [standard]…But ideally, version two is an evolution, not a revolution” (Participant 8). The standard needs to provide specific guidance along with clear examples that help people know exactly how to use it; otherwise, people will interpret it in various ways, and it will fail to achieve standardization (Participants 1, 2, 5, 8). “I think giving people examples is always the best way to get them to understand the standard” (Participant 5). Managers of the standard can support users with checklists of required components, personnel that offer guidance, scripts that demonstrate how data should be aggregated and formatted, and tools that ingest data and return error messages with clear instructions on how to fix the data (Participants 1, 6, 8). Freshwater science stakeholders need to work together to create a data standard that is easy to follow and widely adopted so that freshwater data can be reusable.
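The supporting tooling the participants describe, which ingests data and returns error messages with clear instructions, can be simple. The sketch below checks a single record against the hypothetical minimum fields sketched above and reports actionable messages; it assumes those illustrative field names and is not an implementation of any existing standard checker.

```python
# Minimal sketch of a standard-compliance checker that returns clear,
# actionable messages rather than silently rejecting data.
# Assumes the hypothetical minimum fields sketched above.
from datetime import date

REQUIRED = ["site_id", "date", "latitude", "longitude", "parameter", "value", "units"]

def check_record(record: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means compliant."""
    problems = []
    for field in REQUIRED:
        if field not in record or record[field] in ("", None):
            problems.append(f"Missing required field '{field}'.")
    if "date" in record:
        try:
            date.fromisoformat(str(record["date"]))
        except ValueError:
            problems.append("Field 'date' must use ISO 8601 format, e.g. '2024-08-15'.")
    if not -90 <= record.get("latitude", 0) <= 90:
        problems.append("Field 'latitude' must be in decimal degrees between -90 and 90.")
    if not -180 <= record.get("longitude", 0) <= 180:
        problems.append("Field 'longitude' must be in decimal degrees between -180 and 180.")
    return problems

# Hypothetical record with two problems: a non-ISO date and missing units.
print(check_record({
    "site_id": "GR-01", "date": "15/08/2024", "latitude": 43.47,
    "longitude": -80.54, "parameter": "water_temperature", "value": 18.2, "units": "",
}))
```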
A national government agency with broad knowledge of and influence in the freshwater science sector would be well-positioned to take a leading role in developing a freshwater data standard that requires collaboration with stakeholders and widespread adoption (Participants 4, 5, 6, 9, 10, 12). “The hard part is getting people to sit down and agree on the parts that are common. That’s the kind of thing that’s hard for an individual to do. But a federal agency could force people to come together and do that” (Participant 4). A national agency would also likely have a clear understanding of who the Rights holders and stakeholders are that should be included in the collaborative process to gain consensus on standard development and application.
After creating the standard, the national agency could have the widespread reach and influence to take on the difficult task of getting people to want to use the standard (Participant 4). A national data standard would allow stakeholders throughout the freshwater science sector to have a single standard to point to rather than developing or finding their own standard (Participants 5, 6, 10). The Canada Water Agency is currently developing a data standard as part of the National Freshwater Data Strategy [6,32].

3.5. Building a Centralized Data Solution

A centralized data solution should be designed based on available resources, with the simplest and most essential being one that facilitates data discoverability, then advances with accessibility functionality, and culminates with functionality that ensures data reusability (83% of participants). A centralized data solution is a digital platform that can function as a data registry and/or repository to consolidate data discovery and access in a single place. The participants agreed that a centralized freshwater data solution is needed to reduce the time and resources spent finding, accessing, and reusing freshwater data [3,6,31,53,55]. However, the participants had a variety of opinions on what a centralized freshwater data solution should provide, presenting a spectrum of options that can be selected based on the resources available. These options range from the simplest, which allows users to discover data from one location, to the most resource-intensive option, which hosts analysis-ready data that conforms to a single data standard on a single platform.
The “minimum viable product” for a central data solution would be a data registry—a website with links to freshwater data to support data discoverability (Participant 5). “…as long as there’s a solution that allows me to know where I can get this data first, whether or not it’s standardized, whether or not it’s easily downloadable, that’s a problem for another day, as long as I can find it, I think that’s the key” (Participant 5). Data discoverability is important to prioritize because it is unlikely that a single repository will ever contain all freshwater data (Participants 1, 2, 12)—a prediction that has held true in the freshwater science sector, where centralization efforts remain fragmented, with open data records stored in different data portals and repositories [3,7]. Rather than creating yet another repository, a tool should be built that helps users search and access data from pre-existing sources (Participants 1, 5, 9, 12). Once the discoverability problem is addressed, this solution can eventually evolve into a platform with more advanced functions that improve data accessibility by providing a centralized place where people can share their data (Participants 5, 12).
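A registry of this kind does not require sophisticated infrastructure. The sketch below shows the core idea—a searchable index of pointers to data held elsewhere—using hypothetical dataset entries, keywords, and URLs; a real registry would simply expose the same structure through a website.

```python
# Minimal sketch of a data registry: an index of pointers to datasets hosted
# elsewhere, searchable by keyword. Entries, holders, and URLs are
# hypothetical placeholders.
REGISTRY = [
    {"title": "Provincial stream flow gauges", "keywords": {"flow", "hydrometric"},
     "url": "https://example.org/flow-data", "holder": "Example Agency"},
    {"title": "Volunteer lake water chemistry", "keywords": {"chemistry", "lakes"},
     "url": "https://example.org/lake-chemistry", "holder": "Example NGO"},
]

def find_datasets(keyword: str) -> list[dict]:
    """Return registry entries whose keywords match the search term."""
    return [entry for entry in REGISTRY if keyword.lower() in entry["keywords"]]

for hit in find_datasets("chemistry"):
    print(f'{hit["title"]} — {hit["url"]} (held by {hit["holder"]})')
```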
Data repositories that host and share data should focus on features that make data sharing easy (Participant 5). “…avoid having too many requirements and making it too much of a barrier to actually add the data…because then if people see that it’s a lot of work, they might just give up and not add it.” (Participant 5). Data uploads should be simple without numerous steps or detailed forms, and the repository should be easy to find without having to click through several pages to access it (Participant 5). Data contributors should not have to adhere to a data standard, especially freshwater scientists who may lack the skills and/or time required for data preparation (Participants 5, 6, 7, 8, 12). “Data cleaning is a tough, tough thing…So, I worry that it’s hard for the repository to push that back onto the researcher or the company who says, ‘Hey, I want to give you this data.’ And then you say, ‘Oh, well, can you spend 28 h cleaning it?’” (Participant 8). Mesman et al. [24] found that the difficulties in meeting the requirements of data repositories were among the main challenges experienced by freshwater scientists when trying to openly share their data. Data contributors can still be encouraged to follow a data standard, especially for new datasets (Participants 4, 8, 11, 12). The repository can identify which datasets comply with the standard to help data users select datasets that are appropriate for them based on the time and skills they have for data preparation (Participants 2, 8, 9, 12). Thus, data contributors and users who cannot complete data preparation tasks are not blocked from using the repository (Participant 8).
To maximize data contributors and users, a centralized solution would make it easy to reuse data by ensuring all data meet the same standard and are prepared for reuse (Participant 6). “You’re wanting to get it to the point where you have a data warehouse, where you have normalized the data in such a way that you can utilise it more effectively” (Participant 6). To achieve this, the repository needs personnel and tooling to prepare data that are received from contributors to ensure they meet a single standard and are reusable (Participants 6, 7). Volunteers, such as citizen scientists or students, can be recruited to help with data preparation tasks (Participant 11). For example, the Canadian Institute of Ecology and Evolution runs the Living Data Project that recruits students to help archive historical environmental science datasets [56]. Such a repository would meet the needs of freshwater scientists who have been calling for a data repository that helps them avoid learning how to use multiple repositories and join data from multiple sources [24]. While this is the most resource-intensive solution, it could be the most efficient because it relieves the need for all teams throughout the freshwater science sector to acquire data preparation skills—which is unrealistic.
A national government agency could lead the development of a centralized data solution, which it could either own and manage or support by funding, selecting, and coordinating the organization that manages it (Participants 1, 2, 4, 5, 6, 7, 9, 10, 12). The first step would be for the agency to create an inventory of the freshwater data it already owns (Participant 9; e.g., Miller et al. [3]). The agency can then make sharing data to the repository a requirement of the funding it provides to researchers and scientists (Participants 4, 6).
The manager of a centralized data solution needs to plan for its long-term maintenance (50% of participants). “…people not in software want to treat it like their civil engineering project where we built this bridge, now we can mostly leave it alone. The software tends to be a living thing that accrues rot much more quickly.” (Participant 6). Regardless of the type of data solution selected, it needs to be maintained to stay up-to-date with the latest data (Participants 1, 2, 5). “It’s like a garden, you have to keep grooming it and make it up to date” (Participant 2). The long-term maintenance of existing online open freshwater databases has been complicated by a lack of project funding and commitment [7]. This issue should be addressed and mitigated at the design stage. For example, four research institutes worked together to fund the Freshwater Information Platform, which was designed for long-term use and maintenance [7]. The choice of data solution should be guided by the long-term funding that is available to maintain necessary full-time staff and technology (Participants 1, 2, 5, 6, 7, 11).
A centralized data solution should be designed with user-friendly features, protections for sensitive and private data, and community engagement functionality so that it naturally emerges as the centralized solution (92% of participants). “I think the fragmentation is a big problem here. That’s something I don’t really have a lot of good solutions for myself, other than a community organising around something that works well for them…” (Participant 3). As online freshwater data platforms proliferate, scientists lament that data continue to be fragmented and difficult to find [7]. True centralization occurs when data contributors and users unite behind a single platform that they recognize as superior to alternatives (Participant 10). Participants offered features and functionality that attract users to a data platform, so that it becomes the centralized solution.
Builders of a centralized data solution should first explore pre-existing tools to inform the design of new tools with features that work well (Participants 1, 3, 7, 10). “…before whoever is the next organization to want to build one of these repositories, first look at a pre-existing tool like GitHub that is very popular in the tech space, and consider, can we just use that, and if not, what can we learn from it to then apply to our own model” (Participant 10). GitHub became the preferred platform in the tech sector because it “had the best tooling, the one that made discoverability the easiest, community contributions the easiest” (Participant 10). Accordingly, GitHub was the tool participants most commonly recommended for the freshwater science sector to explore (Participants 1, 3, 4, 5, 7, 8, 10). “I can’t think of a better example than GitHub” (Participant 5). The natural science community is increasingly using GitHub to share data and associated code used to prepare and analyze data [57], as described by researchers from the Ocean Health Index project who demonstrated how their group used GitHub to make their research more efficient and reproducible [30]. Participants highlighted two GitHub features: data tags, which improve data discoverability with descriptive identifiers that allow users to find the data they need; and version control, which supports data reusability and study reproducibility by allowing data users to know when/if datasets have changed (Participants 1, 2, 5, 7, 8, 9). Regardless of the feature, functionality must be user-friendly to attract users (Participants 5, 10).
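The two features participants singled out—descriptive tags for discoverability and version control for knowing when data changed—can be mirrored in simple dataset metadata. The sketch below records hypothetical tags and a checksum-based version log; it illustrates the concepts only and does not describe GitHub’s internal mechanics or any existing repository’s API.

```python
# Minimal sketch of the two features highlighted by participants: descriptive
# tags for discoverability, and a version log (file checksums) so users can
# tell when a dataset has changed. Illustrative only.
import hashlib
from datetime import datetime, timezone

def file_checksum(path: str) -> str:
    """Return a SHA-256 digest identifying the exact contents of a data file."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def record_version(version_log: list, path: str, note: str) -> None:
    """Append a new version entry only if the file contents have changed."""
    digest = file_checksum(path)
    if version_log and version_log[-1]["sha256"] == digest:
        return  # contents unchanged: no new version needed
    version_log.append({
        "sha256": digest,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "note": note,
    })

# Hypothetical usage: tags aid discovery; the log shows when data changed.
dataset_tags = {"water-temperature", "ontario", "2024", "daily"}
version_log: list = []
# record_version(version_log, "stream_temps.csv", "Initial upload")  # hypothetical file
```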
Data contributors are more willing to share their data with repositories that offer options to control access to sensitive and private data (Participants 2, 8). “…companies and people with a lot of useful, valuable data are never going to share their data with you unless you have some basic controls in place” (Participant 2). Although making all data openly accessible is easiest, the freshwater science sector must consider and respect data sovereignty and privacy [50]. Indigenous data sovereignty stipulates that Indigenous governments and communities have jurisdiction over the collection and sharing of data collected by or about their peoples and land [50,58]. As such, data sharing platforms need to support Indigenous Data Governance principles, including OCAP (ownership, control, access, and possession) and CARE (collective benefit, authority to control, responsibility, and ethics) [59,60]. While participants did not have experience with data sovereignty, they offered techniques to protect private and sensitive data, including allowing data contributors to restrict data access to certain individuals and organizations, masking sensitive data within a dataset, controlling the level of data aggregation, or having the authority to approve/deny data access requests (Participants 5, 8, 12). Exploring how the recommendations made here could be aligned with Indigenous Data sovereignty principles was beyond the scope of this research study but could be explored in the future by Indigenous-led studies. The ability to restrict data access will also be important to scientists with data containing sensitive information, such as location data for rare or endangered species [52]. Data platforms need functionality that enables data contributors to control access to their data to avoid missing out on valuable datasets.
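One of the access-control techniques mentioned above, controlling the level of aggregation, can be illustrated simply: coarsening coordinates before public release so that sensitive locations, such as rare species records, cannot be pinpointed. The rounding precision, field names, and example record below are illustrative assumptions, not a prescription from our participants.

```python
# Minimal sketch of one protection technique mentioned above: coarsening
# coordinates before public release so sensitive locations (e.g. rare species
# records) cannot be pinpointed. The precision choice is an illustrative assumption.
def coarsen_location(lat: float, lon: float, decimals: int = 1) -> tuple[float, float]:
    """Round coordinates; one decimal degree is roughly a 10 km grid cell."""
    return round(lat, decimals), round(lon, decimals)

# Hypothetical sensitive record; only the coarsened coordinates are released.
sensitive_record = {"species": "hypothetical rare mussel", "lat": 43.4723, "lon": -80.5449}
public_lat, public_lon = coarsen_location(sensitive_record["lat"], sensitive_record["lon"])
print(public_lat, public_lon)  # 43.5 -80.5 — released instead of the exact site
```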
Community support is one of the most important determinants of the popularity of a data platform. Managers and designers of a centralized data solution can gain community support by ensuring the repository is designed with functionality that facilitates peer-to-peer learning and reciprocity amongst data contributors and users (Participant 7). “All the contributors are part of a forum. If they have questions, if they have concerns, it’s like a whole community where people help each other out.” Online forums allow platform users to interact with questions, guidance, and analytical insights (Participant 7). The platform also needs mechanisms for data users to engage with data contributors about their data to support data reusability (Participant 2). Community members form a symbiotic relationship as contributors share their data with users, who return the favour by reporting data errors and sharing prepared and analyzed datasets (Figure 2; Participants 3, 7, 8, 9, 10, 12). Data users become data contributors, which can be especially important when original data contributors share their data but then become disengaged or unresponsive to questions about their data. The community benefits are amplified when the code (i.e., software) used for data preparation and analysis is also shared, exposing errors, reducing duplicated efforts, and making research truly reproducible (Participants 5, 8, 12). Scientists have also been promoting the sharing of code associated with datasets to further enhance the transparency and reproducibility of environmental science (e.g., [30,57]). Essentially, the community works together to improve and iterate on each other’s work, accelerating collective progress (Participant 10). This community attracts and engages data contributors and users and works to address data fragmentation because the sector unifies around one solution (Participants 3, 7, 8, 12).
The freshwater science sector should not seek out blockchain technology (75% of participants). “The big problem you have to focus on solving is getting people to come together and share data. And the technology part should be as boring and as simple as possible. And blockchain technology is not boring and simple. It’s abstract, complicated, and unapplied” (Participant 4). Blockchain technology is the only recommendation in the study background document (Supplementary Material 3) that respondents mentioned as something they would not recommend. Blockchain technology has previously been recommended for the freshwater science sector for data security and transparency, with the caveat that it may be cost-prohibitive [22,61]. However, none of the participants thought blockchain would be useful for the freshwater science sector, and none mentioned costs as its limiting factor. “I’m not a blockchain prioritizer in any way, shape or form. So let’s get that one off the table” (Participant 9). The blockchain recommendation from Gunn and Stanley [22] was based on recommendations for environmental data from the Fintech sector when blockchain was an emerging technology being widely considered for its potential applications. It is important to note that none of our participants identified themselves as working in Fintech, indicating perhaps a bias toward blockchain in the Fintech sector, where the technology is useful. A lot has also changed in technology in recent years, and perhaps experts from the Fintech sector would also no longer recommend blockchain technology for the freshwater science sector. Our study participants cautioned that blockchain is a relatively new, complicated technology with few proven real-world applications and may deter data contributions because attribution can never be altered or removed (Participants 1, 3, 4, 7, 8, 10). Rather than be a first mover with novel and complicated technology, the freshwater science sector needs solutions that are simple to adopt (Participants 1, 3, 4, 7, 8).

4. Conclusions

Through interviews with twelve experts from the technology sector, we developed nine recommendations to help the freshwater science sector improve its data management and sharing practices. Future work could seek to validate our recommendations with a survey of a larger sample of experts from the technology and freshwater science sectors.
One of the biggest hurdles the freshwater science sector needs to overcome is the general reluctance of people to share their data. To address this issue, a culture change that embraces open data is needed, which can be achieved through incentives and leadership that encourage open data sharing. Once people are motivated to share their data, they need to become knowledgeable about data sharing practices that make data reusable, including the use of data licences, like Creative Commons Licences. Data literacy foundations should be offered in academia and continuously developed in the workplace. A freshwater data standard can help make data reusable and comparable across datasets—if it contains clear guidance for data collection and entry that is developed in collaboration with freshwater stakeholders. Finally, the freshwater science sector needs simple digital tools that facilitate data discovery and access. The recommended options for a centralized data solution ranged from solutions focused on discovering data in its various original formats to the most resource-intensive option, which focuses on data reusability by sharing data that is prepared internally. The best option should be selected based on the need for sophistication and available resources. Regardless of the option selected, the solution must have user-friendly features and functionality that attract and retain data contributors and users so that it becomes the centralized data platform for freshwater scientists. These recommendations can help guide many of the initiatives throughout the global freshwater science sector that are striving to improve open data practices. Freshwater data that are findable, accessible, interoperable, and reusable will help progress the collective goals of the freshwater science sector to advance the practices of freshwater ecosystem protection and restoration.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/w17142153/s1, Table S1: Saturation assessment with a base size of 4 interviews and run length of 2 interviews; Table S2: Code Book; Table S3: Summary list of specific recommendations from technology experts for the freshwater science sector; Document S1: Recruitment Email; Document S2: Study Background Document [3,6,13,14,15,16,17,19,20,21,22,25,26,27,28,29,30,31,32,50,53,62,63,64,65,66,67,68,69]; Document S3: Interview Guide.

Author Contributions

Conceptualization: J.K.; methodology: J.K., N.T.B., G.E., G.G., S.C.C.; data collection: J.K.; data analysis and interpretation: J.K., N.T.B.; supervision: H.S., S.C.C.; visualization: J.K.; writing—original draft: J.K.; writing—review and editing: N.T.B., G.E., G.G., H.S., S.C.C. All authors have read and agreed to the published version of the manuscript.

Funding

During the time of this study, J.K. (lead author) received funding from the Province of Ontario and the University of Waterloo through an Ontario Graduate Scholarship (OGS)/Queen Elizabeth Scholarship in Science & Technology (QEII-GSST), and the Faculty of Environment at the University of Waterloo through funds provided by S. Courtenay (2700-100-10001-10256).

Data Availability Statement

De-identified quotes that support each of the recommendations are available on request from the corresponding author.

Acknowledgments

We thank the technology experts who generously gave their time to participate in interviews for this study, and Andrew McGrath for his knowledge and advice as a technology disciplinary expert. During the preparation of this study, J.K. (lead author) used Otter.ai [41] to generate transcripts of the audio files for each interview. J.K. reviewed and edited each transcript using the original audio files. J.K. used Zotero [70] (Version 7) to format the in-text citations and reference list. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Arnold, L.M.; Hanna, K.; Noble, B. Freshwater Cumulative Effects and Environmental Assessment in the Mackenzie Valley, Northwest Territories: Challenges and Decision-Maker Needs. Impact Assess. Proj. Apprais. 2019, 37, 516–525. [Google Scholar] [CrossRef]
  2. Blakley, J.; Russell, J. International Progress in Cumulative Effects Assessment: A Review of Academic Literature 2008–2018. J. Environ. Plan. Manag. 2022, 65, 186–215. [Google Scholar] [CrossRef]
  3. Miller, C.B.; Cleaver, A.; Huntsman, P.; Asemaninejad, A.; Rutledge, K.; Bouwhuis, R.; Rickwood, C.J. Predicting Water Quality in Canada: Mind the (Data) Gap. Can. Water Resour. J. Rev. Can. Ressour. Hydr. 2022, 47, 169–175. [Google Scholar] [CrossRef]
  4. Schmidt-Kloiber, A.; De Wever, A. Biodiversity and Freshwater Information Systems. In Riverine Ecosystem Management; Schmutz, S., Sendzimir, J., Eds.; Springer International Publishing: Cham, Switzerland, 2018; pp. 391–412. ISBN 978-3-319-73249-7. [Google Scholar]
  5. Smits, A.P.; Hall, E.K.; Deemer, B.R.; Scordo, F.; Barbosa, C.C.; Carlson, S.M.; Cawley, K.; Grossart, H.; Kelly, P.; Mammola, S.; et al. Too Much and Not Enough Data: Challenges and Solutions for Generating Information in Freshwater Research and Monitoring. Ecosphere 2025, 16, e70205. [Google Scholar] [CrossRef]
  6. Environment and Climate Change Canada. In Toward the Creation of a Canada Water Agency; Environment and Climate Change Canada: Gatineau, QC, Canada, 2021; p. 37.
  7. Schmidt-Kloiber, A.; Bremerich, V.; De Wever, A.; Jähnig, S.C.; Martens, K.; Strackbein, J.; Tockner, K.; Hering, D. The Freshwater Information Platform: A Global Online Network Providing Data, Tools and Resources for Science and Policy Support. Hydrobiologia 2019, 838, 1–11. [Google Scholar] [CrossRef]
  8. United Nations Environment Programme. In Progress on Ambient Water Quality: Mid-Term Status of SDG Indicator 6.3.2 and Acceleration Needs, with a Special Focus on Health; United Nations Environment Programme: Nairobi, Kenya, 2024.
  9. Brandes, O.; Simms, R.; O’Riordan, J.; Bridge, G. Towards Watershed Security: The Role of Water in Modernized Land Use Planning in British Columbia; POLIS Project on Ecological Governance: Victoria, BC, Canada, 2020; p. 56. [Google Scholar]
  10. Lynch, A.J.; Cooke, S.J.; Arthington, A.H.; Baigun, C.; Bossenbroek, L.; Dickens, C.; Harrison, I.; Kimirei, I.; Langhans, S.D.; Murchie, K.J.; et al. People Need Freshwater Biodiversity. WIREs Water 2023, 10, e1633. [Google Scholar] [CrossRef]
  11. Ncube, S.; Beevers, L.; Momblanch, A. Towards Intangible Freshwater Cultural Ecosystem Services: Informing Sustainable Water Resources Management. Water 2021, 13, 535. [Google Scholar] [CrossRef]
  12. Valenca Pinto, L.V.; Inácio, M.; Pereira, P. Green and Blue Infrastructure (GBI) and Urban Nature-Based Solutions (NbS) Contribution to Human and Ecological Well-Being and Health. Oxf. Open Infrastruct. Health 2023, 1, ouad004. [Google Scholar] [CrossRef]
  13. Carver, M.; Utzig, G.; Hartwig, K. Hydrology Workshop Proceedings—Expanding Water Monitoring within the Upper Columbia Basin; Living Lakes Canada: Nelson Invermere, BC, Canada, 2020; p. 50. [Google Scholar]
  14. Goucher, N.; DuBois, C.; Day, L. Workshop Report: Data Needs in the Great Lakes Region; The Gordon Foundation: Toronto, ON, Canada, 2021. [Google Scholar]
  15. Water Rangers. Looking to the Future. Available online: https://www.watershedreports.ca/looking-to-the-future/ (accessed on 6 June 2025).
  16. Hartwig, K.; Thurston, P.; Smith, H.; Carver, M.; Utzig, G.; MacDonald, R.; Jollymore, A.; Trigg, N. Water Security through Community-Directed Monitoring in the Canadian Columbia Basin: Democratizing Watershed Data. Water Int. 2024, 49, 429–438. [Google Scholar] [CrossRef]
  17. Stanley, M.; Gunn, G. Using Technology to Solve Today’s Water Challenges. 2018. Available online: https://www.iisd.org/system/files/publications/using-technology-solve-water-challenges.pdf (accessed on 29 April 2024).
  18. Water Europe. Water Resilience Strategy: A Step towards a Water-Smart Society, 2025. Available online: https://www.iawd.at/eng/libraryresource/1072/details/w/0/water-resilience-strategy-a-step-towards-a-water-smart-society/ (accessed on 6 June 2025).
  19. DataStream. House of Commons Standing Committee on Environment and Sustainable Development: Study on Freshwater, 2024. Available online: https://www.ourcommons.ca/Content/Committee/441/ENVI/Brief/BR12853527/br-external/DataStream-e.pdf (accessed on 24 January 2024).
  20. Ho, E. Freshwater Quality Monitoring in Ontario: Strengths, Weaknesses, Opportunities, Threats, and Recommendations. Ph.D. Thesis, University of Waterloo, Waterloo, ON, Canada, 2021. [Google Scholar]
  21. Wrona, F.J.; Munkittrick, K. Brief on Monitoring, Evaluation and Reporting (MER) Challenges for Freshwater Systems in Canada 2024. Available online: https://www.ourcommons.ca/Content/Committee/441/ENVI/Brief/BR12990321/br-external/WronaFrederick-e.pdf (accessed on 17 March 2024).
  22. Gunn, G.; Stanley, M. Harnessing the Flow of Data: Fintech Opportunities for Ecosystem Management; International Institute for Sustainable Development: Winnipeg, MB, Canada, 2018; p. 13. [Google Scholar]
  23. Stall, S.; Yarmey, L.; Cutcher-Gershenfeld, J.; Hanson, B.; Lehnert, K.; Nosek, B.; Parsons, M.; Robinson, E.; Wyborn, L. Make Scientific Data FAIR. Nature 2019, 570, 27–29. [Google Scholar] [CrossRef]
  24. Mesman, J.P.; Barbosa, C.C.; Lewis, A.S.L.; Olsson, F.; Calhoun-Grosch, S.; Grossart, H.-P.; Ladwig, R.; La Fuente, R.S.; Münzner, K.; Nkwalale, L.G.T.; et al. Challenges of Open Data in Aquatic Sciences: Issues Faced by Data Users and Data Providers. Front. Environ. Sci. 2024, 12, 1497105. [Google Scholar] [CrossRef]
  25. Hahnel, M.; Smith, G.; Scaplehorn, N.; Schoenenberger, H.; Day, L. The State of Open Data 2023; Digital Science: London, UK, 2023; p. 6220141. [Google Scholar] [CrossRef]
  26. Roche, D.G.; Granados, M.; Austin, C.C.; Wilson, S.; Mitchell, G.M.; Smith, P.A.; Cooke, S.J.; Bennett, J.R. Open Government Data and Environmental Science: A Federal Canadian Perspective. FACETS 2020, 5, 942–962. [Google Scholar] [CrossRef]
  27. Johnson, D.; Lalonde, K.; McEachern, M.; Kenney, J.; Mendoza, G.; Buffin, A.; Rich, K. Improving Cumulative Effects Assessment in Alberta: Regional Strategic Assessment. Environ. Impact Assess. Rev. 2011, 31, 481–483. [Google Scholar] [CrossRef]
  28. Wilkinson, M.D.; Dumontier, M.; Aalbersberg, I.J.; Appleton, G.; Axton, M.; Baak, A.; Blomberg, N.; Boiten, J.-W.; Da Silva Santos, L.B.; Bourne, P.E.; et al. The FAIR Guiding Principles for Scientific Data Management and Stewardship. Sci. Data 2016, 3, 160018. [Google Scholar] [CrossRef]
  29. Roche, D.G.; Kruuk, L.E.B.; Lanfear, R.; Binning, S.A. Public Data Archiving in Ecology and Evolution: How Well Are We Doing? PLoS Biol. 2015, 13, e1002295. [Google Scholar] [CrossRef]
  30. Lowndes, J.S.S.; Best, B.D.; Scarborough, C.; Afflerbach, J.C.; Frazier, M.R.; O’Hara, C.C.; Jiang, N.; Halpern, B.S. Our Path to Better Science in Less Time Using Open Data Science Tools. Nat. Ecol. Evol. 2017, 1, 0160. [Google Scholar] [CrossRef]
  31. Great Lakes Observing System. In Common Strategy for Smart Great Lakes; Great Lakes Observing System: Ann Arbor, MI, USA, 2021; p. 24.
  32. Canada Water Agency. National Freshwater Data Strategy Workshop 2024—Summary. Available online: https://www.canada.ca/en/canada-water-agency/data-collaboration/national-freshwater-data-strategy/workshop-2024/workshop-2024-summary.html (accessed on 15 February 2025).
  33. Hooton, C. Defining Tech: An Examination of How the ‘Technology’ Economy Is Measured. Nord. Balt. J. Inf. Commun. Technol. 2018, 2018, 101–120. [Google Scholar] [CrossRef]
  34. Braun, V.; Clarke, V. Can I Use TA? Should I Use TA? Should I Not Use TA? Comparing Reflexive Thematic Analysis and Other Pattern-based Qualitative Analytic Approaches. Couns. Psychother. Res. 2021, 21, 37–47. [Google Scholar] [CrossRef]
  35. Braun, V.; Clarke, V. Toward Good Practice in Thematic Analysis: Avoiding Common Problems and Be(Com)Ing a Knowing Researcher. Int. J. Transgender Health 2023, 24, 1–6. [Google Scholar] [CrossRef]
  36. Byrne, D. A Worked Example of Braun and Clarke’s Approach to Reflexive Thematic Analysis. Qual. Quant. 2022, 56, 1391–1412. [Google Scholar] [CrossRef]
  37. Pearson, H.; Ledford, H.; Hutson, M.; Van Noorden, R. Exclusive: The Most-Cited Papers of the Twenty-First Century. Nature 2025, 640, 588–592. [Google Scholar] [CrossRef]
  38. Braun, V.; Clarke, V. Using Thematic Analysis in Psychology. Qual. Res. Psychol. 2006, 3, 77–101. [Google Scholar] [CrossRef]
  39. Young, J.C.; Rose, D.C.; Mumby, H.S.; Benitez-Capistros, F.; Derrick, C.J.; Finch, T.; Garcia, C.; Home, C.; Marwaha, E.; Morgans, C.; et al. A Methodological Guide to Using and Reporting on Interviews in Conservation Science Research. Methods Ecol. Evol. 2018, 9, 10–19. [Google Scholar] [CrossRef]
  40. Guest, G.; Namey, E.; Chen, M. A Simple Method to Assess and Report Thematic Saturation in Qualitative Research. PLoS ONE 2020, 15, e0232076. [Google Scholar] [CrossRef] [PubMed]
  41. Otter.ai Inc. Otter.ai, 2024. Available online: https://otter.ai (accessed on 6 June 2025).
  42. Finlay, L. Thematic Analysis: The ‘Good’, the ‘Bad’ and the ‘Ugly’. Eur. J. Qual. Res. Psychother. 2021, 11, 103–116. [Google Scholar]
  43. QSR International Pty Ltd. NVivo 2023. Version 14.24.0. Available online: https://lumivero.com/products/nvivo/ (accessed on 6 June 2025).
  44. Braun, V.; Clarke, V. Reflecting on Reflexive Thematic Analysis. Qual. Res. Sport Exerc. Health 2019, 11, 589–597. [Google Scholar] [CrossRef]
  45. Enterprise Big Data Framework. The Difference between Data Wrangling and Data Cleaning, 2023. Available online: https://www.bigdataframework.org/knowledge/the-difference-between-data-wrangling-and-data-cleaning/ (accessed on 15 February 2025).
  46. Salesforce. Guide to Data Cleaning: Definition, Benefits, Components, and How to Clean Your Data. Available online: https://www.tableau.com/learn/articles/what-is-data-cleaning (accessed on 6 June 2025).
  47. Teradata. What Is Data Standardization? Available online: https://www.teradata.com/insights/data-platform/what-is-data-standardization (accessed on 6 June 2025).
  48. Ibitola, J. Data Normalization Demystified: A Guide to Cleaner Data. Available online: https://www.flagright.com/post/data-normalization-demystified-a-guide-to-cleaner-data (accessed on 6 June 2025).
  49. Snowflake Inc. Database Normalization for Faster Data Science. Available online: https://www.snowflake.com/trending/data-normalization-flexible-data-science/ (accessed on 15 January 2025).
  50. Zipper, S.C.; Stack Whitney, K.; Deines, J.M.; Befus, K.M.; Bhatia, U.; Albers, S.J.; Beecher, J.; Brelsford, C.; Garcia, M.; Gleeson, T.; et al. Balancing Open Science and Data Privacy in the Water Sciences. Water Resour. Res. 2019, 55, 5202–5211. [Google Scholar] [CrossRef]
  51. Sholler, D.; Ram, K.; Boettiger, C.; Katz, D.S. Enforcing Public Data Archiving Policies in Academic Publishing: A Study of Ecology Journals. Big Data Soc. 2019, 6, 2053951719836258. [Google Scholar] [CrossRef]
  52. Powers, S.M.; Hampton, S.E. Open Science, Reproducibility, and Transparency in Ecology. Ecol. Appl. 2019, 29, e01822. [Google Scholar] [CrossRef]
  53. Sheelanere, P.; Noble, B.F.; Patrick, R.J. Institutional Requirements for Watershed Cumulative Effects Assessment and Management: Lessons from a Canadian Trans-Boundary Watershed. Land Use Policy 2013, 30, 67–75. [Google Scholar] [CrossRef]
  54. Lento, J.; Schmidt-Kloiber, A. Freshwater Data Publishing Guide; Global Biodiversity Information Facility: Copenhagen, Denmark, 2025. [Google Scholar] [CrossRef]
  55. Maasri, A.; Jähnig, S.C.; Adamescu, M.C.; Adrian, R.; Baigun, C.; Baird, D.J.; Batista-Morales, A.; Bonada, N.; Brown, L.E.; Cai, Q.; et al. A Global Agenda for Advancing Freshwater Biodiversity Research. Ecol. Lett. 2022, 25, 255–263. [Google Scholar] [CrossRef] [PubMed]
  56. Canadian Institute of Ecology and Evolution. Living Data Project. Available online: https://www.ciee-icee.ca/ldp.html (accessed on 6 June 2025).
  57. Cheruvelil, K.S.; Soranno, P.A. Data-Intensive Ecological Research Is Catalyzed by Open Science and Team Science. BioScience 2018, 68, 813–822. [Google Scholar] [CrossRef]
  58. Carroll, S.R.; Garba, I.; Figueroa-Rodríguez, O.L.; Holbrook, J.; Lovett, R.; Materechera, S.; Parsons, M.; Raseroka, K.; Rodriguez-Lonebear, D.; Rowe, R.; et al. The CARE Principles for Indigenous Data Governance. Data Sci. J. 2020, 19, 43. [Google Scholar] [CrossRef]
  59. Jennings, L.; Anderson, T.; Martinez, A.; Sterling, R.; Chavez, D.D.; Garba, I.; Hudson, M.; Garrison, N.A.; Carroll, S.R. Applying the ‘CARE Principles for Indigenous Data Governance’ to Ecology and Biodiversity Research. Nat. Ecol. Evol. 2023, 7, 1547–1551. [Google Scholar] [CrossRef]
  60. First Nations Information Governance Centre. The First Nations Principles of OCAP®. Available online: https://fnigc.ca/ocap-training/ (accessed on 6 June 2025).
  61. DataStream. Blockchain and DataStream. Available online: https://datastream.org/en-ca/article/blockchain-and-datastream (accessed on 6 June 2025).
  62. Corporation for Digital Scholarship. Zotero, 2025. Version 7. Available online: https://www.zotero.org/ (accessed on 6 June 2025).
  63. Blodgett, D.; Read, E.; Lucido, J.; Slawecki, T.; Young, D. An Analysis of Water Data Systems to Inform the Open Water Data Initiative. JAWRA J. Am. Water Resour. Assoc. 2016, 52, 845–858. [Google Scholar] [CrossRef]
  64. Statistics Canada. About the Data Science Network for the Federal Public Service. Available online: https://www.statcan.gc.ca/en/data-science/network/about (accessed on 18 July 2024).
  65. Government of Canada. GC Data Community. Available online: https://www.csps-efpc.gc.ca/partnerships/data-community-eng.aspx (accessed on 18 July 2024).
  66. Canada School of Public Service. Learning Catalogue. Available online: https://catalogue.csps-efpc.gc.ca/catalog?cm_locale=en&reveal_topic=0&reveal_subtopic=0&reveal_competency=0&reveal_duration=0&reveal_type=0&reveal_delivery_method=0&query=Data&products_per_page=12&products_sort_order=relevancy&pagename=Catalog (accessed on 18 July 2024).
  67. Government of Canada. Standard on Geospatial Data. Available online: https://www.tbs-sct.canada.ca/pol/doc-eng.aspx?id=16553 (accessed on 18 July 2024).
  68. Kroetsch, N. Improving Environmental Monitoring Collaborations Through Co-Development of Data Management Plans: A Guide for Resource Management Agencies and Environmental Stewardship Groups; Simon Fraser University: Vancouver, BC, Canada, 2021; p. 44. [Google Scholar]
  69. Digital Research Alliance of Canada. DMP Assistant. Available online: https://dmp-pgd.ca/ (accessed on 18 July 2024).
  70. Government of Canada. 2023-2026 Data Strategy for the Federal Public Service. Available online: https://www.canada.ca/en/treasury-board-secretariat/corporate/reports/2023-2026-data-strategy.html (accessed on 18 July 2024).
Figure 1. Illustration of study methods and how methods align with the core steps of Reflexive Thematic Analysis [36,38]. The lead author refined the themes into the initial list of 33 specific recommendations and then reviewed the complete list of specific recommendations with co-author NB, who collaborated on developing the final list of broad recommendations. Broad recommendations encompassed several specific recommendations to reduce redundancies, and recommendations that were not supported by at least three participants were removed.
Figure 2. A symbiotic relationship between data contributors and users that can be facilitated with a data repository.
Table 1. Profile of study participants, including their participant number identifiers (No.), current job title (title), years of professional experience (years), skills and expertise, and formal education from a post-secondary academic institution.
No. | Title | Years | Skills and Expertise | Formal Education
1 | Independent Open-Source Developer | 20+ | Builds open-source software, some of which is used by millions of people | Bachelor of Computer Science, specialization in software engineering
2 | Director of Engineering | 12 | Web application development at scale | Bachelor of Science in Physics
3 | Software Developer | 15 | Product development and web front-end technology | Bachelor of Fine Arts
4 | Software Developer and Entrepreneur | 20 | Specializing in knowledge translation in the technology sector | Bachelor of Computer Science
5 | Data Scientist in E-Commerce | 7 | Big data analysis, machine learning, A/B testing, SQL, Python | Bachelor of Mathematics, minor in Computer Science
6 | Chief Software Architect for a Market Research Company | 25 | Large data processing and distributed systems; designed some of the largest performing order processing systems in the world | None
7 | Software Developer, Chief Technology Officer, and Entrepreneur | 8 | Location data and mapping | Bachelor of Computer Science
8 | Software Developer for Mobile Applications | 16 | Building and using relational databases | Master's degree in Software Engineering
9 | Technical Leader | 30 | Data architecture and data science | None
10 | Chief Technology Officer in Technology Startups | 20+ | Data engineering and data platforms | Bachelor of Computer Engineering
11 | Software Engineer | 15 | Data streaming systems | PhD in Programming Languages and Stream Fusion
12 | Data Scientist in Energy and Tech | 7 | Generalist | PhD in Math
Table 2. Definitions of data science terms used in describing recommendations.
Term | Definition
Data cleaning | Data cleaning is the process of removing data that should not be in your dataset [46]. This process includes removing duplicate data; identifying and correcting errors, inconsistencies, and missing values; and removing problematic outliers [46]. Data cleaning is a typical step in Quality Assurance and Quality Control (QA/QC) processes. Some view data cleaning as also ensuring the data is in a consistent format that conforms to a data standard, and transforming the data into a format that is useful for analysis [45].
Data transformation | Data transformation involves converting data from one format or structure into another (i.e., data wrangling, data munging) to prepare it for storage and analysis [46]. Transformations include converting data from wide to long or long to wide dimensions.
Data standardization | "Data standardization is the process of converting data from various sources into a common, uniform format" [47]. Data standardization involves data transformation [47] and includes standardizing data formats, naming conventions, and values [47]. For example, dates may appear in two different formats, such as "DD-MM-YYYY" and "MM-DD-YYYY"; data standardization transforms all dates into a selected uniform format (e.g., "YYYY-MM-DD").
Data normalization | Data normalization is a term typically used in database design [48]. It is used to "clean up" unstructured or semi-structured data that is difficult to analyze [49]. Normalization applies a set of rules to standardize and organize data, removing anomalies and redundancies so that data can be easily grouped, understood, and interpreted [49]. It is important for combining datasets from multiple sources [49]. Different rules are associated with data normalization; for example, the first rule is meant to make the data easier to search by ensuring all attributes have a unique name, each entry is unique, and each cell has only a single value [49]. Although they are distinct, data normalization is sometimes used interchangeably with data standardization and data cleaning, partly because one of the goals of normalization is standardization [47,48,49]. Data normalization aims to achieve clean, structured, and standardized data [48].
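To make these definitions concrete, the short sketch below (our illustration, not drawn from the interviews or the cited sources) walks through data cleaning, standardization, and transformation on a small hypothetical water-quality table using the pandas library; all column names and values are invented.

```python
# Minimal sketch illustrating the Table 2 terms with pandas.
# The sites, dates, and measurements below are hypothetical.
import pandas as pd

# Raw records with a duplicate row, a missing value, and two date formats.
raw = pd.DataFrame({
    "site": ["A", "A", "B", "C"],
    "date": ["01-05-2024", "01-05-2024", "02/05/2024", "03-05-2024"],
    "temp_C": [12.5, 12.5, None, 14.1],
    "pH": [7.8, 7.8, 8.1, 7.6],
})

# Data cleaning: remove duplicate rows and records with missing values [46].
clean = raw.drop_duplicates().dropna().copy()

# Data standardization: harmonize the two date formats into ISO "YYYY-MM-DD" [47].
clean["date"] = (
    pd.to_datetime(clean["date"].str.replace("/", "-"), format="%d-%m-%Y")
      .dt.strftime("%Y-%m-%d")
)

# Data transformation: reshape from wide (one column per parameter) to long
# (one row per site, date, parameter, and value) for storage or analysis [46].
long_format = clean.melt(id_vars=["site", "date"],
                         var_name="parameter", value_name="value")
print(long_format)
```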
Table 3. Summary list of broad recommendations from technology experts for the freshwater science sector. The participant identifier numbers are included to show which participants shared insights that supported each recommendation. Participant identifier numbers correspond to Table 1.
Theme/Recommendation | Participants
Open Data Culture
1. The freshwater science sector should create a culture of open data with incentive structures that reward data sharing and leaders who demonstrate and promote effective open data practices. | 1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12
Data Licences
2. Freshwater scientists should use data licences to support data reusability. | 1, 2, 3, 8
Skills Development
3. Data literacy in the freshwater science sector should be strengthened by integrating data skills into academic curricula and fostering workplace mentorship and peer-to-peer learning. | 2, 3, 4, 6, 7, 10, 12
Freshwater Data Standard Development
4. A freshwater data standard is needed to guide data collection and management so that data are reusable and comparable across datasets. | 4, 6, 7, 8, 9, 10, 12
5. A freshwater data standard should be created through a collaborative process and offer examples and support to ensure it is widely adopted and achieves standardization. | 1, 2, 4, 5, 6, 7, 8, 12
Building Centralized Data Solutions
6. A centralized data solution should be designed based on available resources, with the simplest and most essential being one that facilitates data discoverability, then advances with accessibility functionality, and culminates with functionality that ensures data reusability. | 1, 2, 4, 5, 6, 7, 8, 9, 11, 12
7. The manager of a centralized data solution needs to plan for its long-term maintenance. | 1, 2, 5, 6, 7, 11
8. A centralized data solution should be designed with user-friendly features, protections for sensitive and private data, and community engagement functionality to naturally emerge as the centralized solution. | 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12
9. The freshwater science sector should not seek out Blockchain technology. | 1, 3, 4, 6, 7, 8, 9, 10, 11
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
