The Data Stewardship Challenge
Effective fisheries management relies on robust, accessible data on abundance, health, and environmental conditions, integrated over large spatial areas and fine temporal scales. Yet across many fisheries, data remain fragmented, inconsistently measured, and incomplete across time, space, and life history stages, which limits their value for research, hypothesis testing, and decision-making (Marmorek et al., 2011; Inman et al., 2021; Diack et al., 2024). Collections are often managed by diverse agencies and organizations in isolation, undermining the timely, integrated analyses necessary for adaptive management. The absence of organizational support and dedicated roles for data stewardship further compounds this problem, with critical tasks often treated as ad hoc responsibilities rather than core management functions.
Fragmented Landscapes
For salmon in particular, the integration and mobilization of data mirror the complexity of the species' life cycles, life histories, and migrations across ecological regions (Groot & Margolis, 1991) and jurisdictional boundaries. Salmon biologists routinely collect information that is managed by diverse agencies and organizations, often isolated within its operational context and without a focus on interoperability. While localized successes in data coordination exist, particularly within regional fisheries management offices and treaty commissions (Fraser River Panel Data Application, 2025), salmon data integrated across agencies for each phase of the salmon life cycle remain uncommon and costly for most programs. Even within organizations, data can be siloed by data type, with freshwater, estuary, open-ocean, and commercial fishery data each housed in separate systems with limited ability to re-connect records through shared identifiers. The result is a highly fragmented data landscape that spans agencies, organizations, and monitoring programs, making integration especially difficult. This lack of coordinated stewardship has become a significant impediment to adaptive salmon management and conservation, preventing the full use of existing data to inform timely and effective decisions.
This fragmentation is compounded by the number of scientific disciplines involved in piecing together a comprehensive understanding of individual salmon stocks. Geneticists, oceanographers, freshwater ecologists, stock assessment biologists, and fisheries managers are just a sample of the disciplines that contribute data toward a complete life history for a salmon population, each with its own domain-specific conventions, workflows, standards, and metadata. This tangle of disciplinary and organizational fragmentation slows integration, hinders reproducibility, and delays analyses that could otherwise inform time-sensitive management decisions, conservation actions, or restoration plans. Large volumes of data collected through long-term monitoring programs hold tremendous value, especially for secondary users, but are often inaccessible due to a lack of time, resources, and incentives for data producers to publish them (Lindenmayer et al., 2012). When critical datasets are hard to find, access, or interpret, biologists and analysts lose valuable time trying to reconstruct or harmonize them. This reduces transparency, increases the risk of errors, and delays urgent conservation or management responses, which in turn undermines public trust in both the science and the resulting management decisions. Modernizing these systems will require coordinated communities of practice, shared practices, and a shift in collective expectations and culture towards shared international data standards and stewardship practices that accommodate the full disciplinary and geographic diversity of salmon science.
Coordinated Data Stewardship
Data stewardship offers a way to meet the growing challenges in salmon management posed by escalating environmental uncertainty under climate change, which demands rapid, regionally integrated, and robust data (Ward et al., 2025). The mismatch between fragmented data systems and fixed administrative and jurisdictional boundaries creates an urgent need for interoperable, dynamic, multi-scale data stewardship that can adapt to shifting ecological and management priorities. Despite the scale and importance of these datasets, biologists who collect and manage salmon data are often expected to act as de facto data stewards without training, guidance, incentives, organizational support, or access to community-agreed best practices. Tasks such as documenting methods, aligning terminology, formatting for data sharing, and publishing data are typically performed off the side of a biologist’s desk. Done well, data stewardship is skilled, labor-intensive work—combining domain expertise, information science, and relationship-building—and it requires dedicated time and institutional support rather than ad hoc effort. A lack of organizational support (Diack et al., 2024), training (Volk et al., 2014), and dedicated roles for data management further relegates critical stewardship tasks to ad hoc status. Deferred documentation, governance, and standardization tasks accumulate as data technical debt—work left undone that compounds over time, making integration and reanalysis progressively slower and more error-prone. The absence of clear roles, standards, and community-endorsed practices leaves even motivated scientists unsure how to structure their data for future use. As a result, data stewardship is inconsistent and reactive, and data integration remains a major bottleneck to adaptive management and ecosystem-scale learning.
Due to a lack of practical resources, salmon biologists and data stewards alike struggle to align their data with community-agreed principles such as FAIR (Findable, Accessible, Interoperable, and Reusable) (Wilkinson et al., 2016) and Indigenous Data Sovereignty frameworks like the CARE principles (Collective Benefit, Authority to Control, Responsibility, and Ethics) (Carroll et al., 2019; Jennings et al., 2023). Adhering to the CARE principles is all the more important for salmon-related data given the sociocultural importance of salmon to First Nations, Tribes, and Indigenous communities throughout the North Pacific and North Atlantic regions (Earth Economics, 2021; Ween & Colombi, 2013). While both FAIR and CARE present laudable goals for data stewardship, the two sets of principles can sometimes be at odds, and implementing both presents an inherent challenge. Nonetheless, several of the best practices we propose can help clarify shared objectives and increase the transparency and trust that are essential for integrating both FAIR and CARE principles. Without clear support and guidance, well-intentioned practitioners are left with ad hoc approaches that limit reuse and interoperability. This gap can only be bridged by equipping both data producers and stewards with the tools, support, and organizational backing to publish interoperable, machine-readable metadata and datasets in alignment with shared principles.
Framework for Action
In this paper, we provide a framework for translating data stewardship principles into actionable practices, examples, and workflows to help salmon biologists improve the usability, traceability, and long-term impact of their data. We offer seven practices for salmon data stewardship (Table 1) and use a retrospective, real-world case study of a cross-jurisdictional sockeye productivity synthesis (Peterman & Dorner, 2012) to illustrate how the absence of shared practices made integration time-consuming and keeps annual updates costly and difficult to this day. To support adoption, we provide real-world examples from multiple organizations (Appendix 1), a training roadmap for biologists taking on stewardship responsibilities (Appendix 2), and a getting-started checklist (Appendix 3). We also map the seven practices to a widely used data-lifecycle model (Table 2) to make adaptation straightforward outside salmon contexts. Although salmon provide the worked example, the approach is intended as a transferable blueprint: other scientific communities can substitute their own vocabularies, repositories, governance structures, and incentives to turn data stewardship principles into practical applications.
A coordinated approach to stewarding salmon data should follow established open science standards and principles (Carroll et al., 2020; Johnson & Stap, 2024; Wilkinson et al., 2016), tailored to the context of salmon research and management. Our practices build upon and extend existing data and metadata standards and infrastructure, including Darwin Core, OBIS, schema.org, and OBO Foundry ontologies, ensuring compatibility with broader biodiversity informatics infrastructure while avoiding reinventing foundational frameworks. Achieving meaningful interoperability of data among programs, organizations, and scientific domains demands both breadth and depth. Broad interoperability integrates diverse scientific domains, systems, and formats, requiring structured, machine-readable data and metadata published openly for maximum discoverability. Deep interoperability demands precise definitions of domain-specific terms and methods, ensuring data remain meaningful and usable across contexts. Salmon data stewards can improve outcomes for salmon by coordinating across boundaries to develop a shared foundation of data stewardship practices. To address these foundational challenges, we must establish clear data stewardship roles and practices that span the entire data lifecycle and salmon life cycle: from collection and documentation through integration, long-term preservation, and reuse, and from gravel to estuary to the high seas and back to the gravel.
Defining Data Stewardship in Salmon Science
Data stewardship encompasses the coordinated practices, roles, and responsibilities necessary to effectively manage, share, and reuse data throughout its lifecycle (NOAA, 2007; Peng et al., 2018; Plotkin, 2014). It includes ensuring data quality, compliance with agreed-upon standards, and the establishment of clear governance to guide data collection, documentation, integration, and preservation. However, given the complexity of the salmon data landscape, stewardship goes beyond mere technical data management to include actively facilitating collaboration, communication, and consensus-building among data producers and users across multiple organizations and jurisdictions.
Data stewardship represents a critical sub-discipline within the broader field of data science. While data science is often narrowly associated with machine learning and statistical modeling, we adopt a more comprehensive view that encompasses how we treat, handle, and represent data, along with the social and technical information systems that enable data-intensive science. Data stewardship focuses on the practical implementation of these principles—ensuring that data infrastructure, standards, and practices actually serve scientific and management needs rather than remaining theoretical constructs.
Effective salmon data stewards serve as boundary spanners, bridging agencies, Nations, and disciplines, and as community coordinators, convening diverse stakeholders to build sustained communities of practice. This boundary-spanning role is particularly critical in transboundary contexts where data integration requires navigating complex jurisdictional and cultural boundaries (Ward et al., 2025). These responsibilities demand specialized expertise and sustained effort; without dedicated capacity, standards and workflows tend to degrade, and technical debt accumulates. By facilitating communication, translating between different organizational cultures and technical systems, and maintaining long-term relationships, data stewards create the social infrastructure necessary for effective cross-boundary data collaboration.
Effective salmon data stewards perform several critical functions:
Technical oversight: Ensuring metadata completeness, adherence to standardized terminologies and vocabularies, and robust quality assurance protocols.
Social and organizational facilitation: Leading stakeholder engagement, capacity-building activities, and negotiation of data access and sharing agreements, including addressing First Nations, Tribes, and Indigenous Peoples’ rights and interests in data governance.
Organizational advocacy: Championing the organizational recognition of data stewardship roles, promoting sustained investment and dedicated resources for data management infrastructure and practices.
Implementation and adoption facilitation: Actively promoting data use and ensuring that standards and practices remain practical and relevant by maintaining close contact with real-world applications. This includes monitoring data utilization, gathering feedback from users, and iteratively refining standards based on actual implementation challenges to prevent theoretical approaches that fail in practice.
Data stewards can implement FAIR and CARE principles through concrete technical and governance mechanisms they control, such as documenting consent constraints and access levels in metadata, using controlled vocabularies to ensure consistent terminology, and establishing repository roles that enforce data sovereignty requirements. FAIR is sometimes interpreted as “open by default,” but it does not require unrestricted openness; rather, it requires that data and metadata be discoverable and accessible under explicit, machine-readable conditions. CARE’s Authority to Control can therefore place legitimate constraints on access and reuse, particularly when data are held or governed by First Nations, Tribes, and Indigenous Peoples. In these cases, stewardship often means publishing rich, standards-aligned metadata while implementing governed access, consent-aware reuse conditions, and culturally appropriate protocols for the data themselves. For example, stewards can document consent constraints in metadata fields and enforce access restrictions via repository user roles, ensuring that Indigenous data sovereignty is respected while maintaining data discoverability and appropriate reuse (Local Contexts, 2025; Montenegro, 2019). This governance approach is particularly critical for sensitive data such as Traditional Knowledge and sensitive habitat locations, where stewardship practices must balance open science principles with appropriate access controls and cultural protocols.
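As a concrete sketch of how such machine-readable access conditions might look, the following Python fragment encodes an access level, permitted uses, and Traditional Knowledge labels in dataset metadata, and routes access requests accordingly. All field names and values are illustrative, not drawn from any published standard.

```python
# Sketch: machine-readable access conditions in dataset metadata.
# Field names (access_level, permitted_uses, tk_labels) are illustrative.
DATASET_METADATA = {
    "title": "Sockeye escapement counts, example watershed",
    "access_level": "governed",          # open | governed | restricted
    "permitted_uses": {"stock-assessment", "habitat-restoration"},
    "tk_labels": ["TK Attribution"],     # e.g. Local Contexts-style labels
    "steward_contact": "data-steward@example.org",
}

def access_decision(metadata: dict, requested_use: str) -> str:
    """Return 'grant', 'review', or 'deny' for a requested use."""
    level = metadata["access_level"]
    if level == "open":
        return "grant"
    if level == "governed":
        # Governed data: uses agreed with the data's governing authority
        # are granted; anything else is routed to the steward for review.
        return "grant" if requested_use in metadata["permitted_uses"] else "review"
    return "deny"
```

The point of the sketch is that the conditions live in the metadata itself, so repositories can enforce them mechanically while keeping the record fully discoverable.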
Data stewards play a critical role bridging the gap between biologists and Information Technology (IT) staff by translating data needs into application or data system features. A user-centred design approach to salmon data stewardship is critical and focuses on creating tools that align with biologists’ needs. When data management is separated from biologists, accountability weakens, and quality issues go unnoticed. While IT expertise is essential for infrastructure and security, effective data system design requires IT to act as an enabler, rather than gatekeeper, provisioning self-serve data infrastructure. The Data Steward, serving as a translator between IT and biologists, enables biologists to engage independently with data systems, fostering ownership and accountability and ultimately improving data quality for research and management.
Dedicated stewardship roles empower salmon biologists to bridge disciplinary divides and jurisdictional barriers, transforming fragmented datasets into cohesive, interoperable resources. By proactively defining, implementing, and maintaining data standards and workflows, salmon data stewards create conditions for timely, accurate, and reproducible analyses. Such stewardship positions salmon biologists to better inform adaptive management decisions, ultimately strengthening salmon conservation and resilience.
Updating Sockeye Productivity Synthesis: A Case Study for What Agencies Could Do Now
This case study revisits a comprehensive Sockeye Salmon Oncorhynchus nerka productivity dataset assembled from several agency sources by academic researchers (Peterman & Dorner, 2012). We reflect not on the significant work the research team accomplished, but rather on the preventable organizational and technical barriers that impeded their work—and continue to burden data compilation updates and reuse efforts today. Their study examined productivity trends across 64 Sockeye Salmon stocks spanning Washington, British Columbia (B.C.), and Alaska and found common trends in declining productivity across the North American range of Sockeye Salmon (Peterman & Dorner, 2012). This work was important for sockeye conservation and management in that it pointed to common, broad-scale marine trends in sockeye salmon productivity, rather than idiosyncratic stock-specific, freshwater effects—an insight not accessible without a monumental data integration effort. Attempting to replicate or build upon this analysis today is an arduous, time-consuming, and error-prone endeavour due to fragmented data sources, inconsistent formats, and lack of standardized practices among the key organizations involved: the Washington Department of Fish and Wildlife (WDFW), Fisheries and Oceans Canada (DFO), and the Alaska Department of Fish and Game (ADF&G).
Each section below highlights a key challenge faced by the team and proposes practical steps based on our best practices (Table 1) that data-holding agencies could do to enable easier integration, validation, and updating of salmon datasets across jurisdictions and decades. This case study illustrates how implementing the foundational concepts and practical recommendations outlined in this paper can transform data stewardship practices within these organizations. By doing so, they can significantly enhance data accessibility, quality, and interoperability—ultimately enabling more efficient and accurate analyses that support salmon conservation and management.
Challenge 1: Interpreting the Data — What do these numbers actually mean?
Peterman’s team frequently worked with datasets that lacked basic contextual information. Fields such as “year,” “return,” or “age class” were often undefined or inconsistently used. For example, some datasets recorded returns by calendar year while others used brood year, and few included metadata to clarify the distinction. In many cases, the team had to reconstruct metadata by back-checking against reports or simulating assumptions (e.g., about age structure) to interpret the data correctly.
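The brood-year versus calendar-year ambiguity is easy to illustrate. Assuming a simple total-age convention, a hypothetical helper makes the conversion, and the potential for misinterpretation, explicit:

```python
def brood_to_return_year(brood_year: int, total_age: int) -> int:
    """Return (calendar) year for fish spawned in `brood_year` that
    return at a given total age in years."""
    return brood_year + total_age

# The same label means different things under the two conventions:
# a record tagged "2015" could be returns observed in calendar year 2015,
# or the brood-year 2015 cohort, whose age-4 fish return in 2019.
```

Without metadata stating which convention applies, two agencies' "2015" rows can be four or five years apart in real time, which is exactly the ambiguity the team had to resolve by hand.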
Remedies:
Best Practice 3: Make Data, People, Projects, and Outputs Discoverable, Linked and Citable with Persistent Identifiers (PIDs). Assigning PIDs such as digital object identifiers (DOIs) to protocols and methods and ORCID iDs to people, and linking them together through data stores and catalogues, ties data to its provenance and ensures that methods, context, and interpretation decisions are traceable.
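Identifier hygiene is one place stewards can automate checks. ORCID iDs carry an ISO 7064 MOD 11-2 check digit, so mistyped identifiers can be caught before they enter a catalogue. A minimal sketch in Python (the sample iD below is ORCID's published example):

```python
def orcid_checksum_valid(orcid: str) -> bool:
    """Check an ORCID iD against its ISO 7064 MOD 11-2 check digit."""
    base = orcid.replace("-", "")
    if len(base) != 16 or not base[:15].isdigit():
        return False
    total = 0
    for ch in base[:15]:
        total = (total + int(ch)) * 2
    check = (12 - (total % 11)) % 11
    expected = "X" if check == 10 else str(check)
    return base[15].upper() == expected
```

A catalogue ingest script can run this check on every contributor record, rejecting malformed iDs rather than silently storing broken provenance links.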
Best Practice 4: Use Shared Data Models, Vocabularies and Metadata to Enable Integration. To prevent this kind of ambiguity, agencies can adopt internationally recognized metadata schemas such as ISO 19115 or Ecological Metadata Language, data models such as the Darwin Core Data Package to represent age and age-type concepts, and controlled vocabularies that restrict the permissible values of a year field to calendar year, brood year, or other defined bases.
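In practice, a controlled vocabulary can be enforced with a few lines of validation code at data entry or ingest. The sketch below uses made-up term lists for illustration; a real project would reference a published, community-maintained vocabulary:

```python
# Illustrative controlled vocabularies; real projects would cite a
# published term list rather than hard-coding one.
YEAR_BASIS_TERMS = {"calendarYear", "broodYear"}
AGE_TYPE_TERMS = {"totalAge", "freshwaterAge", "oceanAge"}

def validate_record(record: dict) -> list:
    """Return a list of vocabulary violations for one data record."""
    errors = []
    if record.get("yearBasis") not in YEAR_BASIS_TERMS:
        errors.append(f"yearBasis {record.get('yearBasis')!r} not in vocabulary")
    if record.get("ageType") not in AGE_TYPE_TERMS:
        errors.append(f"ageType {record.get('ageType')!r} not in vocabulary")
    return errors
```

Run at submission time, such checks force the calendar-year/brood-year distinction to be stated explicitly instead of reconstructed years later.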
Challenge 2: Accessing and Using the Data — Where is it stored, and how do I get it?
The Peterman dataset was compiled from multiple files scattered across email inboxes, regional offices, and grey literature. Data were stored in inconsistent formats, lacked clear versioning, and were difficult to discover outside of specific research networks. Even today, no application programming interface (API) or structured access mechanism exists to update or query the data programmatically. As a result, researchers hoping to build on the dataset may have to start from scratch.
Remedies:
Best Practice 2: Reuse Proven Infrastructure to Save Time and Increase Interoperability
Rather than developing bespoke data catalogues or repositories, agencies should adopt existing catalogues used beyond their own organization, such as the Ocean Biodiversity Information System, Zenodo, or the Knowledge Network for Biocomplexity. These are proven platforms with a broad user base that support persistent storage, discoverability, and interoperability.
Best Practice 5: Store and Analyze Data in Ways That Others Can Easily Access, Use, and Trust
Agencies can use open-access data repositories or their own organizational data repositories or catalogues that make data discoverable using PIDs and provide programmatic access to data possible using APIs.
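One lightweight pattern, sketched below using only Python's standard library, is to publish each dataset as a small "data package": a CSV plus a JSON manifest describing its fields, so downstream users and scripts can discover the structure without emailing the author. The file names and manifest fields are illustrative, loosely inspired by the data-package idea rather than implementing any particular specification.

```python
import csv
import json
from pathlib import Path

def write_data_package(out_dir: Path, rows: list, metadata: dict) -> None:
    """Write a CSV plus a JSON manifest so the data are self-describing
    and programmatically discoverable."""
    out_dir.mkdir(parents=True, exist_ok=True)
    fields = list(rows[0])  # column names from the first record
    with open(out_dir / "productivity.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
    # Manifest: dataset-level metadata plus a machine-readable field list.
    manifest = {**metadata,
                "resources": [{"path": "productivity.csv", "fields": fields}]}
    (out_dir / "datapackage.json").write_text(json.dumps(manifest, indent=2))
```

Even this minimal layout lets a secondary user load the manifest first, learn the columns and provenance, and only then parse the data, the inverse of the reconstruct-from-scratch workflow the Peterman team faced.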
Challenge 3: Sustaining the Dataset — Who is responsible, and why should I contribute?
Once Peterman and his team completed their analysis, no formal plan existed for sustaining or updating the dataset. Responsibility for ongoing maintenance fell informally to former students and collaborators. Despite its national and international relevance, the dataset was never adopted by an agency as a living product. Moreover, the original data contributors often lacked incentives, support, or recognition for their efforts—conditions that persist in many data environments today.
Remedies:
Best Practice 1: Make Data Governance Explicit to Support Trust and Reuse
Agencies should define roles, responsibilities, and decision-making processes through formal governance mechanisms such as data product charters. Use a Data Management Plan with a responsibility matrix such as “responsible, approver, consulted, informed” (RACI) to clarify governance, assign maintenance responsibility, and ensure continuity across staff turnover and organizational change.
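A RACI matrix can itself be kept as a small, checkable artifact rather than a page in a forgotten document. The sketch below (hypothetical tasks and role assignments) flags tasks with unassigned roles so gaps surface before staff turnover turns them into orphaned responsibilities:

```python
# Hypothetical RACI matrix for a shared productivity dataset.
RACI = {
    "annual data update": {"R": "WDFW analyst", "A": "data steward",
                           "C": "DFO stock assessment", "I": "ADF&G"},
    "metadata review":    {"R": "data steward", "A": "program lead",
                           "C": "contributing biologists", "I": "all partners"},
}

def unassigned_roles(matrix: dict) -> list:
    """List (task, role) pairs where a RACI role is missing or empty."""
    return [(task, role) for task, roles in matrix.items()
            for role in ("R", "A", "C", "I") if not roles.get(role)]
```

Checking this at each governance review makes "who updates the dataset next year?" a mechanical question instead of an archaeological one.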
Best Practice 6: Incentivize and Track Data Sharing and Reuse
Visibility, credit, and metrics are critical for motivating data sharing. Agencies can embed citation guidance in metadata and track dataset reuse through COUNTER-compliant dashboards or DataCite APIs.
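Citation guidance can be generated directly from a dataset's own metadata rather than written by hand. The helper below assembles a DataCite-style citation string for embedding in a landing page or metadata record; the creator names and DOI shown in the usage are placeholders:

```python
def dataset_citation(creators: list, year: int, title: str,
                     publisher: str, doi: str) -> str:
    """Assemble a DataCite-style citation string from metadata fields."""
    return (f"{'; '.join(creators)} ({year}). {title}. "
            f"{publisher}. https://doi.org/{doi}")
```

For example, `dataset_citation(["Dorner, B.", "Peterman, R. M."], 2012, "Sockeye productivity time series", "Example Repository", "10.1234/example")` yields a ready-to-paste citation ending in a resolvable DOI link, giving contributors visible, trackable credit.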
Best Practice 7: Build Community Through Co-Development and Mutual Benefit
Effective data stewardship requires collaboration between biologists, First Nations, Tribes, Indigenous communities, managers, and data professionals. Participatory design ensures that systems and standards meet user needs and are adopted over time. Practical application: facilitate cross-jurisdictional working groups to co-develop data standards and align on shared outcomes for priority datasets.
While the barriers encountered in compiling, interpreting, and maintaining the data made the work considerably more difficult than necessary and hinder efforts by others seeking to extend or build upon this work, they remain instructive. These challenges are not unique to Peterman’s team—they reflect systemic gaps in data governance, documentation, infrastructure, and incentives. By adopting the seven best practices detailed in Table 1, organizations and researchers can transform legacy datasets into living resources, enabling reuse, easing collaboration, and accelerating insight across the salmon research and management community.
The challenges and solutions demonstrated in this salmon case study generalize across fisheries and environmental monitoring domains. Cross-jurisdictional data harmonization, quality assurance and control patterns, standardized metadata requirements, and long-term archiving strategies are universal needs that extend far beyond salmon science. Similar barriers and solutions apply to trawl survey data integration, invertebrate monitoring programs, and water quality datasets that span multiple agencies and jurisdictions.
| Best Practice | Start here |
|---|---|
| 1. Make Data Governance Foundational and Explicit to Establish and Sustain Trust and Reuse. Establishing clear governance structures promotes quality, accountability, and alignment with FAIR and CARE principles enabling trust and long-term stewardship across multi-organization projects. | - Document roles and responsibilities using a Data Product Governance Charter and structured responsibility frameworks. - Integrate CARE principles to respect First Nations, Tribes, and Indigenous data rights. - Form a governance or oversight committee to review data standards, timelines, and agreements. |
| 2. Reuse Proven Infrastructure to Save Time and Increase Interoperability. Leveraging existing platforms and technologies by building on and extending them rather than building bespoke solutions reduces costs and improves long-term interoperability and sustainability. | - Use domain-specific repositories - Publish and archive data with KNB or Zenodo. |
| 3. Make Data, People, Projects, and Outputs Discoverable, Linked and Citable with Persistent Identifiers. Persistent identifiers (PIDs) connect data with researchers, organizations, and outputs—supporting data citation, reuse, and automated attribution. | - Encourage use of ORCIDs for researchers. - Use ROR IDs for organizations. - Assign DOIs via DataCite for data packages. - Embed DOIs in dashboards and metadata. |
| 4. Use Shared Data Models, Vocabularies and Metadata to Enable Integration. Common vocabularies, metadata standards, and ontologies support integration across systems and preserve semantic meaning. | - Adopt ISO 19115, EML, or DataCite metadata standards. - Re-use terms defined in Salmon Domain Ontology. - Model datasets using the Darwin Core Data Package Model. |
| 5. Store and Analyze Data in Ways That Others Can Easily Access, Use and Trust. Structured and accessible data formats ease reuse and support integration with analytical tools and applications, while data analyzed or wrangled using programmatic scripts (R, Python, etc.) enable reproducibility and increase trust. | - Provide APIs using FastAPI, Flask, or Django REST. - Archive in trusted repositories (e.g., GBIF, FRDR, USGS). - Write scripts in a programming language to wrangle, transform, and analyze data. - Use GitHub to host code for collaboration and transparency, and the GitHub / Zenodo integration for DOI assignment and preservation. |
| 6. Incentivize and Track Data Sharing and Reuse. Recognizing data contributors and tracking reuse promotes a culture of sharing and supports professional recognition. | - License data with CC-BY 4.0. - Include citation text and visible credit fields. - Use COUNTER metrics and DataCite APIs to monitor reuse. - Encourage dataset citation in references. |
| 7. Build Community Through Co-Development and Mutual Benefit. Engaging users early ensures tools and standards meet real-world needs and enhances long-term stewardship. | - Participate in RDA Salmon Interest Group. - Facilitate workshops for metadata and vocabulary alignment. - Support community-engaged research with tangible benefits. |
Metadata governance as a cross-cutting foundation
Unlike the sequential stages of the data lifecycle, metadata governance operates as a continuous, cross-cutting practice that spans all phases simultaneously. While data moves through Plan → Obtain → Process → Preserve → Access → Disposition, metadata governance must be active throughout, ensuring that documentation, quality control, and discoverability are maintained at every stage. This cross-cutting nature means that metadata governance failures at any point can compromise the entire data stewardship effort, making it a critical foundation rather than a discrete step in the process.
The lifecycle mapping in Table 2 reveals that data governance elements appear in every stage: planning metadata requirements (Plan), documenting collection methods (Obtain), structuring and validating metadata (Process), ensuring long-term preservation (Preserve), enabling discovery and access (Access), and managing final disposition (Disposition). This pervasive presence underscores why metadata governance must be treated as an organizational capability rather than a project-specific task, requiring dedicated resources, trained personnel, and systematic processes that operate continuously across all data activities.
How our seven practices align to the data lifecycle model
Our seven best practices map directly to established data lifecycle models, demonstrating their broad applicability beyond salmon science. The NOAA Data Lifecycle provides a widely recognized framework with six sequential stages (Plan, Obtain, Process, Preserve, Access, Disposition) and four cross-cutting elements (Document, Track and Monitor, Quality, Security) (NOAA Data Governance Committee, 2024). This alignment ensures our practices are grounded in established federal data management standards and can be readily adopted by other agencies and research communities.
The mapping shown in Table 2 demonstrates how each practice addresses specific lifecycle stages while the cross-cutting elements ensure comprehensive data stewardship throughout the entire lifecycle. For example, Practice 1 (Data Governance) spans the entire lifecycle from planning through disposition, while Practice 4 (Shared Data Models) primarily supports the Process and Preserve stages. This systematic alignment with established frameworks enhances the credibility and portability of our approach across different domains and organizations.
| Best Practice | Plan | Obtain | Process | Preserve | Access | Disposition | Cross-cutting |
|---|---|---|---|---|---|---|---|
| 1. Data Governance | ● | ● | ● | ● | ● | ● | Document, Quality |
| 2. Reuse Infrastructure | ● | | | ● | ● | | Track and Monitor |
| 3. Persistent Identifiers | ● | | ● | ● | ● | | Document, Track |
| 4. Shared Data Models | | ● | ● | ● | | | Quality |
| 5. Accessible Storage | | | ● | ● | ● | | Security, Quality |
| 6. Incentivize Sharing | | | | ● | ● | ● | Track and Monitor |
| 7. Community Building | ● | ● | ● | ● | ● | ● | Document, Quality |
Conclusion
Salmon biologists and data stewards across the globe have generated extensive datasets on salmon abundance, environmental conditions, and biological characteristics. When integrated, these data become valuable assets, a fact powerfully demonstrated by studies such as Peterman and Dorner (2012). However, as noted in reports to the Cohen Commission investigating the decline of Fraser River sockeye salmon (Marmorek et al., 2011), these data are often incomplete, inconsistently collected, and fragmented across organizations and jurisdictions. Integration across such diverse sources is possible, but it requires effort that is often unaccounted for in smaller-scale studies or perpetually underestimated in long-term data projects. This fragmentation hinders a deeper understanding of the drivers of change across salmon life stages and regions and limits the effectiveness of management decisions, particularly in the face of anthropogenic climate change and biodiversity loss.
We argue that many persistent bottlenecks in salmon analysis and management are fundamentally data problems, and that coordinated data stewardship offers one of the most impactful, near-term opportunities for improvement. By adopting shared best practices in data governance, metadata standardization, persistent identification, infrastructure reuse, and community co-development, we can radically improve the transparency, reusability, and interoperability of salmon data. A coordinated, future-oriented data stewardship strategy can leverage the full potential of data in science and management. The case study presented in this paper—drawn from one of the Pacific Region’s most influential salmon survival syntheses (Peterman & Dorner, 2012)—illustrates how technical and social data management gaps directly obstructed efforts to answer pressing questions. Had some of the best practices we propose been adopted by the data producers, substantial time and resources could have been saved. The case study offers a clear and cautionary tale, as well as a hopeful roadmap. Because data practices underpin nearly every analysis and decision, strengthening data stewardship is one of the most effective ways salmon programs can improve outcomes within their control.
The emergence of the data stewardship role (Plotkin, 2014) represents one of the most critical organizational shifts needed to realize this vision. Historically, the work of managing, documenting, and maintaining data has been diffuse and undervalued—often falling to biologists without support, training, or recognition. As the volume and complexity of scientific data grow, so too does the need for clearly defined data stewardship responsibilities embedded within research teams and organizations. Training biologists in the principles and practices of data stewardship—while also supporting dedicated professionals who specialize in this work—is essential to sustaining trustworthy, reusable, and interoperable salmon data systems.
Realizing this vision requires concrete organizational commitments: organizations should formally appoint dedicated data stewards with clear roles, responsibilities, and reporting structures. Agencies can adopt centralized metadata repositories and establish compliance metrics to track progress toward the FAIR and CARE principles. Key implementation steps include: (1) designating stewardship roles within existing organizational structures, (2) investing in metadata management infrastructure, (3) establishing data governance committees with cross-organization representation, and (4) developing performance indicators that measure data discoverability, interoperability, and reuse. These changes ensure that data stewardship becomes embedded in organizational culture rather than remaining an ad hoc responsibility.
The visionary future state is one where salmon researchers and stewards—across agencies, Indigenous Nations, academic laboratories, and community groups—can easily access and contribute to well-documented, versioned, and machine-readable datasets. In this future, field biologists, Indigenous guardians, modellers, and policymakers interact with a living knowledge system—one that is flexible, easy to implement, and rooted in principles of FAIRness and Indigenous Data Sovereignty. Metadata standards, controlled vocabularies, and shared governance frameworks are not afterthoughts but integral to the culture of data collection and use. Scientists receive credit for publishing high-quality data, and users trust the provenance and structure of the datasets they rely on to make critical management decisions.
Realizing this vision will require investment in both people and systems. Key to this transformation is the emergence of the data steward as a professional role: a hybrid expert who understands operational field biology, information science, governance protocols, and community needs. As highlighted by Roche et al. (2020), institutionalizing data stewardship roles ensures long-term capacity for data governance, quality control, and interoperability—functions that are often neglected or left to informal actors. We must not only train new data stewards but also support and upskill biologists to take on stewardship responsibilities in collaborative, interdisciplinary settings. This is essential to pay down the technical debt of unmanaged data and to modernize research practices in line with open science norms. By embedding these practices into the everyday work of data generation, documentation, publication, and reuse, we can move salmon science decisively into the era of data-intensive discovery.
To do items:
- Discuss differences between data management plans, data governance charters, data sharing agreements
- Incorporate ref to Streamnet Data Exchange Standards somehow
- Add in figures
- fill out appendix 1 more thoroughly
- Refine/reorganize the content in Appendix 2 (training roadmap) and decide if it makes sense to put some of that content into a third column in Table 1
- Consider removing para 2 (Data Science adjacency) in Data Stewardship Defn
- Consider re-working table to replace practical applications column with ‘next step’
- Incorporate edits to Appendices from PSF folks
- Incorporate edits from Gottfried Pestal
- Update with links to DFO Salmon Ontology and Data Package Spec and Custom GPT
- Figure out where to cite https://doi.org/10.1139/cjfas-2024-0387 Atkinson et al 2025
- Cite Price et al 2025
Acknowledgements
Conflicts of Interest
None declared.
Ethics Statement
This manuscript does not involve primary research with human or animal subjects. All cited research was conducted in accordance with applicable ethical guidelines and legal requirements of the jurisdictions in which the studies were performed.
Funding
This work was supported in part by the Pacific Salmon Foundation and Fisheries and Oceans Canada.
References
Appendices
Appendix 1. Real-world Example Applications of the Best Practices
Here we provide detailed descriptions of the seven best practices for salmon data stewardship, along with practical applications and real-world examples. This is not an exhaustive list, but rather a starting point for salmon biologists and data stewards to implement effective data stewardship practices in their work based on examples from the salmon research and management community.
1. Make Data Governance Explicit to Support Trust and Reuse
Clear governance defines the roles, responsibilities, and procedures that ensure data quality, long-term maintenance, accountability, and compliance with community principles such as FAIR and CARE. Effective governance fosters trust, facilitates data sharing, reduces ambiguity in decision making, and is critical for coordinating both the technical and sociocultural aspects of data stewardship.
In collaborative international or multi-organizational settings, establishing governance at the outset of a project is crucial for aligning diverse groups, including biologists, data managers, Indigenous communities, policymakers, and other participants. Early governance planning should establish clear, collaborative frameworks that respect each group’s expertise and needs from the beginning.
Practical Applications:
1.1 Document roles and responsibilities clearly at project start using a Project or Data Product Governance Charter and structured frameworks (e.g., DACI or RACI charts) that relate to organizational data policies.
- Example of a Data Management Plan from the California Department of Water Resources
- Data Management Plan Templates from the DMPTool and the NOAA Data Management Handbook
1.2 Integrate CARE principles to ensure ethical governance and respect Indigenous data rights.
- The Northwest Indian Fisheries Commission uses a password-protected website to host WDFW and tribal data, giving co-managers a single point of access to the data they need for decision-making. https://fisheriesservices.nwifc.org/
1.3 Create a governance or oversight committee for regular data practice reviews and decision making regarding data structures, timelines, data sharing agreements and interoperability protocols
- Pacific Salmon Commission has formed a Technical Committee on Data Sharing including both US and Canadian data contributors. https://www.psc.org/membership-lists/
2. Reuse Proven Infrastructure to Save Time and Increase Interoperability
Building custom solutions should be avoided where possible. Making full use of existing platforms and technologies reduces costs, accelerates implementation, and increases data interoperability. Building modular, interoperable systems grounded in proven technologies ensures sustainable long-term stewardship.
Practical Applications:
2.2 Use free data catalogue services such as the Knowledge Network for Biocomplexity (KNB) or Zenodo
- The Pacific Salmon Foundation’s spawner surveys dataset on Zenodo (Carturan and Peacock 2025) received more views within weeks than an analogous dataset in the Salmon Data Library (Pacific Salmon Foundation 2025) did over several years, illustrating that leveraging established public data infrastructures, rather than developing institution-specific ones, can substantially increase discoverability.
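Established repositories also expose usage statistics programmatically, which makes comparisons like the one above easy to repeat. A minimal sketch follows, assuming Python and a placeholder record ID; the exact location of the counts has varied between Zenodo API versions, so the parser checks two likely shapes of the `stats` object and defaults missing values to zero:

```python
import json
import urllib.request

ZENODO_API = "https://zenodo.org/api/records/{record_id}"

def extract_stats(record: dict) -> dict:
    """Pull view/download counts from a Zenodo record payload.

    Depending on the API version, counts sit directly under 'stats'
    or under 'stats' -> 'all_versions'; missing keys default to 0.
    """
    stats = record.get("stats", {})
    if "views" not in stats:
        stats = stats.get("all_versions", {})
    return {"views": stats.get("views", 0), "downloads": stats.get("downloads", 0)}

def fetch_record_stats(record_id: int) -> dict:
    """Fetch a public Zenodo record and return its usage statistics."""
    with urllib.request.urlopen(ZENODO_API.format(record_id=record_id)) as resp:
        record = json.load(resp)
    return extract_stats(record)

# Example (network call; the record ID is a placeholder):
# print(fetch_record_stats(1234567))
```

A script like this can feed the reuse dashboards discussed under Best Practice 6.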
3. Make Data, People, Projects, and Outputs Discoverable, Linked and Citable with PIDs
Persistent identifiers (PIDs), including Digital Object Identifiers (DOIs), are essential for tracking the provenance and reuse of data and for linking data, protocols, organizations, and people. They allow for consistent referencing, integration across systems, and automated credit via data citation.
Practical Applications:
3.1 Encourage researchers to register for an Open Researcher and Contributor ID (ORCID) and include ORCIDs in metadata records and submission forms
3.2 Register your organization with the Research Organization Registry (ROR) and use ROR IDs to identify institutions involved in salmon science.
- Several salmon data holding institutions are already registered with ROR. As a result, those organizations can track and demonstrate their scholarly impact from data publications (see DataCite Commons: Pacific Salmon Foundation).
3.3 Assign DOIs to data packages, protocols, and reports using DataCite. Maintain version history for all metadata records and document the provenance of metadata creation, updates, and quality control processes to ensure accountability and traceability.
- The North Pacific Anadromous Fish Commission (NPAFC) assigns DOIs to IYS-related data packages, which are served by a CKAN catalogue at https://data.npafc.org. The Commission also assigns DOIs to NPAFC Technical Reports and Bulletins.
- The Pacific Salmon Foundation’s State of Salmon report (stateofsalmon.ca) has unique DOIs for each version, improving specificity and citability of each year’s results.
3.4 Embed DOIs in dashboards, figures, and metadata so they persist in derivative products.
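The ORCID iDs recommended in 3.1 end in an ISO 7064 MOD 11-2 check digit, so submission forms and metadata pipelines can validate identifiers before they enter the record. A minimal sketch of that check, following ORCID's published checksum algorithm:

```python
def orcid_is_valid(orcid: str) -> bool:
    """Check the format and ISO 7064 MOD 11-2 check digit of an ORCID iD.

    Accepts the usual hyphenated form, e.g. '0000-0002-1825-0097'.
    """
    digits = orcid.replace("-", "")
    if len(digits) != 16 or not digits[:-1].isdigit():
        return False
    # Accumulate the MOD 11-2 running total over the first 15 digits.
    total = 0
    for ch in digits[:-1]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    check = "X" if result == 10 else str(result)  # 10 is written as 'X'
    return digits[-1].upper() == check
```

A validator like this catches transposed or mistyped digits at data entry, long before a bad identifier propagates into published metadata.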
5. Store and Analyze Data in Ways That Others Can Easily Access, Use, and Trust
Making data easily accessible promotes its use in research and management, enabling seamless integration with tools and applications. Ensuring accessible, persistent data storage requires more than just file hosting. Data should be structured, accessible via API, and stored in repositories that support long-term preservation.
Practical Applications:
5.1 Provide direct data access via Application Programming Interfaces (APIs) using tools such as FastAPI, Flask, or Django REST Framework that allow users to access, filter, and retrieve data programmatically, facilitating automation and integration into analytical tools and decision-support systems.
- The Pacific States Marine Fisheries Commission makes its PIT Tag Information System data accessible via the PTAGIS API
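As an illustrative sketch only (the dataset, field names, and endpoint are hypothetical, and a production service would more likely use FastAPI or a similar framework as noted in 5.1), a read-only JSON endpoint with query-parameter filtering can be assembled from the Python standard library:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

# Hypothetical in-memory dataset; a real service would query a database.
RECORDS = [
    {"stock": "Chilko", "species": "sockeye", "year": 2021, "spawners": 550000},
    {"stock": "Quesnel", "species": "sockeye", "year": 2021, "spawners": 120000},
    {"stock": "Harrison", "species": "chinook", "year": 2022, "spawners": 8000},
]

def filter_records(records, species=None, year=None):
    """Apply optional species/year filters, mirroring API query parameters."""
    out = records
    if species is not None:
        out = [r for r in out if r["species"] == species]
    if year is not None:
        out = [r for r in out if r["year"] == year]
    return out

class EscapementHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Parse query parameters, e.g. /records?species=sockeye&year=2021
        query = parse_qs(urlparse(self.path).query)
        rows = filter_records(
            RECORDS,
            species=query.get("species", [None])[0],
            year=int(query["year"][0]) if "year" in query else None,
        )
        body = json.dumps(rows).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

def serve(port=8000):
    """Start the API; call serve() explicitly to run it locally."""
    HTTPServer(("", port), EscapementHandler).serve_forever()
```

Even a thin layer like this lets analysts pull filtered data into R or Python scripts instead of emailing spreadsheets.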
5.2 Archive data in certified long-term, domain-specific repositories such as the Global Biodiversity Information Facility, the Federated Research Data Repository (FRDR), NOAA’s NCEI, USGS ScienceBase, or EMODnet
- TODO
5.3 Leverage the integration between GitHub and Zenodo to automate archiving and DOI assignment, ensuring long-term data preservation.
- The Pacific Salmon Foundation conducted an extensive cleaning of DFO’s New Salmon Escapement Database System (NuSEDS) to maximize data recovery. The R code for this procedure is available on GitHub, while the resulting dataset is archived on Zenodo, with accompanying R Markdown documentation linked on both platforms. The code and dataset are updated in parallel with NuSEDS, and each dataset version is assigned a DOI to ensure proper citation and traceability.
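The GitHub–Zenodo integration reads deposit metadata from a `.zenodo.json` file at the repository root. A minimal sketch of generating such a file (the title, creator, and ORCID are placeholders; field names follow Zenodo's deposit metadata schema):

```python
import json

# Placeholder deposit metadata; field names (upload_type, creators,
# license, ...) follow Zenodo's deposition metadata schema.
metadata = {
    "upload_type": "dataset",
    "title": "Cleaned salmon escapement dataset (example)",
    "description": "Example metadata for a GitHub release archived on Zenodo.",
    "creators": [
        {
            "name": "Example, Ana",
            "affiliation": "Pacific Salmon Foundation",
            "orcid": "0000-0002-1825-0097",  # placeholder ORCID
        }
    ],
    "keywords": ["salmon", "escapement", "NuSEDS"],
    "license": "cc-by-4.0",
}

# Write the file that Zenodo reads when a GitHub release is archived.
with open(".zenodo.json", "w") as fh:
    json.dump(metadata, fh, indent=2)
```

Committing this file alongside the code means every tagged release is deposited with consistent, citable metadata rather than whatever defaults Zenodo infers.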
6. Incentivize and Track Data Sharing and Reuse
The currency of research lies in recognition—credit, citations, and opportunities for collaboration or co-authorship. Promoting data sharing therefore requires both cultural and technical infrastructure: a cultural shift towards viewing data publication as equal in importance to article publication, and technical systems that generate citation records crediting all contributing First Nations, Tribes, agencies, and organizations. By recognizing contributions, tracking reuse, and supporting citation, data stewards can create a system where sharing is rewarded.
Practical Applications:
6.1 License data for reuse using permissive licenses
- All data accessible through the NPAFC data catalogue is licensed under Creative Commons Attribution 4.0 International
6.2 Provide recommended citation text and visible credit fields in metadata
6.3 Create summary dashboards that highlight reuse, using COUNTER Code of Practice-compliant metrics to track dataset views and downloads via the DataCite APIs
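A dashboard backend can pull these counts from the public DataCite REST API. A minimal sketch (the DOI shown is a placeholder; attribute names such as `viewCount` follow DataCite's DOI attributes, and missing values default to zero):

```python
import json
import urllib.request

DATACITE_API = "https://api.datacite.org/dois/{doi}"

def summarize_usage(doi_payload: dict) -> dict:
    """Reduce a DataCite DOI record to the counts a dashboard would show."""
    attrs = doi_payload.get("data", {}).get("attributes", {})
    return {
        "views": attrs.get("viewCount", 0),
        "downloads": attrs.get("downloadCount", 0),
        "citations": attrs.get("citationCount", 0),
    }

def fetch_usage(doi: str) -> dict:
    """Fetch a DOI record from DataCite and summarize its usage metrics."""
    with urllib.request.urlopen(DATACITE_API.format(doi=doi)) as resp:
        return summarize_usage(json.load(resp))

# Example (network call; the DOI is a placeholder):
# print(fetch_usage("10.5281/zenodo.0000000"))
```

Scheduling a script like this keeps reuse dashboards current without manual copying of counts.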
6.4 Ensure that datasets are properly cited in journal articles using in-text citations and the recommended citation in the article’s list of references, not just in a Data Availability statement
- In late 2024, the NPAFC began citing datasets using in-text citations and the recommended citation in the list of references, starting with the publication of NPAFC Bulletin 7, Highlights of the 2022 International Year of the Salmon Pan–Pacific Winter Expedition.
- The cleaned NuSEDS dataset (see Section 5.3) was used in an analysis and cited in the associated publication (Atkinson et al., 2025) using the DOI and citation provided with the dataset on Zenodo.
6.5 Promote the view that well documented data publications are primary research outputs and are significant contributions to the field
7. Build Community Through Co-Development and Mutual Benefit
Creating infrastructure that standardizes and enables cross-border and cross-ecosystem data integration is only effective if there is community engagement. Standards and tools must be co-developed with their intended users following user-centred design principles (citation required). Engaging biologists, Indigenous stewards, and data managers ensures relevance, usability, and long-term participation.
Practical Applications:
7.1 Participate in salmon data focused communities such as the Research Data Alliance’s Salmon Research and Monitoring Interest Group
7.2 Run participatory workshops for metadata mapping and vocabulary alignment
- American Fisheries Society 2025 WA-BC Chapter Annual Meeting workshop, ‘Fishing for Clarity: Knowledge Modelling to Support Cross-organizational Collaboration and Data Sharing about Salmon Escapement’
7.3 Support and follow through on Community Engaged Research (e.g. The Salmon Prize Project) that provides tangible value to the communities in which research or monitoring was conducted.
Appendix 2: Training Roadmap for Salmon Biologists Transitioning to Data Stewardship
This roadmap outlines essential topics, resources, and learning materials salmon biologists should engage with to effectively transition into data stewardship roles. The roadmap follows a structured progression similar to roadmap.sh.
1. Foundations of Data Stewardship
- Principles:
- Seminal Papers:
- Wilkinson et al. 2016 FAIR Guiding Principles
- Carroll et al. 2019 Indigenous Data Governance
- Courses and Tutorials:
2. Data Management & Governance
- Seminal Papers and Reports:
- Plotkin, 2014 Data Stewardship
- NOAA, 2007 Environmental Data Management
- Practical Tools:
- Data Management Plan Templates (DMPTool)
- DACI and RACI Frameworks (Atlassian DACI Guide)
3. Metadata Standards and Ontologies
- Standards to Master:
- Case Studies & Examples:
4. Controlled Vocabularies & Persistent Identifiers (PIDs)
- PIDs to Implement:
- Practical Guides:
5. Data Integration & Interoperability
- Seminal Papers:
- Johnson & Stap, 2024 Salmon Ocean Observing System
- Bull et al. 2022 Likely Suspects Framework
- Technical Skills & Tools:
- APIs with FastAPI, Flask, Django REST Framework
- Zenodo-GitHub Integration
6. Data Sharing, Citation & Metrics
- Best Practices:
- Tracking & Metrics:
7. Community Engagement & Co-Development
- Communities & Groups:
- Approaches & Frameworks:
- User-centered Design (Interaction Design Foundation)
- Community Engaged Research (University of Victoria Guide)
Additional Resources
- Free Courses:
- Blogs & Websites:
This roadmap serves as a structured guide to equip salmon biologists with the practical and theoretical skills required to excel in data stewardship roles.
Appendix 3: Getting Started Checklist
Use this practical checklist to assess how well your project, program, or organization aligns with the seven Best Practices. Start at the Project level, then scale to Program and Organization. Check off items you’ve completed and note gaps to prioritize.
Tip: For each item, capture a link to the living source (e.g., repository, shared drive, policy page) and the responsible owner.
Practice 1 — Make Data Governance Explicit
Project
- [ ] Do you have a Data Management Plan (DMP) covering scope, sensitive data, retention, and sharing? (link)
- [ ] Is there a RACI (Responsible, Accountable, Consulted, Informed) table for key tasks? (owner)
- [ ] Are Indigenous knowledge holders or community members involved in the project?
- [ ] Are Indigenous Data Sovereignty (IDS) requirements identified and documented (who to consult, approvals needed)?
- [ ] Is a data product charter written for each dataset or analysis product, covering purpose, audience, quality thresholds, and release plan?
Program
- [ ] Are DMP and charter templates standardized across projects and stored centrally?
- [ ] Are role definitions for Data Steward, Product Owner, and Maintainer explicit and assigned for priority datasets?
- [ ] Is this 'community-engaged' research that provides tangible benefit to communities?
- [ ] Are data sharing agreements/MOUs and ethical review pathways documented and reusable?
Organization
- [ ] Does a governance policy exist that sets minimum requirements for DMPs, RACI, retention, IDS, and release reviews?
- [ ] Is there a standing review forum (e.g., monthly data governance check‑in) and a registry of governed data products?
Evidence to collect: DMP link, data product charter(s), RACI, IDS guidance, sharing agreements registry.
Practice 2 — Reuse Proven Infrastructure
Project
- [ ] Have you researched the existing data sharing infrastructure and data storage options specific to your data and context (e.g., Ocean Biodiversity Information System, Global Biodiversity Information Facility, Knowledge Network for Biocomplexity, Zenodo, Dataverse)?
- [ ] Is your code in version control (e.g., Git) with an issue tracker and releases?
- [ ] Are you using an approved repository or data store rather than creating a new silo? (where)
- [ ] Do you use existing organization authentication/authorization and backup processes?
Program
- [ ] Is there a preferred stack list (storage, metadata catalog, workflow runner, packaging, container base images)?
- [ ] Do projects consistently deposit finalized data in approved repositories with clear intake criteria?
Organization
- [ ] Are enterprise services available and documented (data lake, object store, catalog/portal, archival repository)?
- [ ] Is there a deprecation pathway for legacy systems and a migration plan for priority datasets?
Evidence to collect: repository URLs, infrastructure inventory, intake criteria, backup/DR documentation.
Practice 3 — Use Persistent Identifiers (PIDs) for People, Projects, Data, and Methods
Project
- [ ] Do all contributors have ORCID IDs recorded in metadata?
- [ ] Does the project have a resolvable PID (e.g., DOI for a project page or protocol, internal project ID)?
- [ ] Are datasets assigned DOIs (or other PIDs) at publication, and are versions tracked?
- [ ] Are methods/protocols published and citable (e.g., protocol DOI) and linked from dataset metadata?
Program
- [ ] Is there guidance on when to mint PIDs, by whom, and where they resolve?
- [ ] Are projects linked to organizational identifiers (e.g., ROR for institutions) in metadata?
Organization
- [ ] Is there a PID policy and a provider/registrar configured (e.g., DataCite) with a documented workflow?
- [ ] Are PID linkages automated in the catalog (people ↔ projects ↔ datasets ↔ publications)?
Evidence to collect: ORCID list, PID policy, DOI records, resolver links in the catalog.
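Where DataCite is the configured registrar, the documented minting workflow can be scripted against its REST API. A minimal sketch of the JSON:API payload for a draft DOI (the prefix, creator name, and URL are placeholders; the payload shape follows DataCite's `POST /dois` endpoint, where omitting the `event` attribute keeps the DOI in draft state):

```python
import json

def doi_draft_payload(prefix: str, title: str, creators: list, year: int, url: str) -> str:
    """Build a JSON:API payload for POST https://api.datacite.org/dois.

    Omitting the 'event' attribute registers the DOI as a draft;
    setting event='publish' would make it findable.
    """
    body = {
        "data": {
            "type": "dois",
            "attributes": {
                "prefix": prefix,  # registrar auto-generates the suffix
                "titles": [{"title": title}],
                "creators": [{"name": n} for n in creators],
                "publicationYear": year,
                "types": {"resourceTypeGeneral": "Dataset"},
                "url": url,  # landing page the DOI resolves to
            },
        }
    }
    return json.dumps(body, indent=2)

# Example with placeholder values:
payload = doi_draft_payload(
    "10.12345", "Example escapement dataset", ["Example, Ana"], 2025,
    "https://example.org/datasets/escapement",
)
```

Scripting payload construction this way keeps DOI metadata consistent across datasets and makes the minting step auditable.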
Practice 5 — Store and Analyze Data for Easy Access, Use, and Trust
Project
- [ ] Is raw data immutable and separated from processed/analysis outputs?
- [ ] Is there a fully reproducible workflow (scripts/notebooks + environment + parameters) that runs end‑to‑end?
- [ ] Is the computational environment captured (lockfile/conda env, container image) and versioned?
- [ ] Are QA/QC checks automated with logs and thresholds documented?
- [ ] Are access controls and sensitive data handling documented and implemented?
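The automated QA/QC item above need not imply elaborate tooling. A minimal Python sketch of rule-based row checks (the field names and rules are hypothetical examples for a spawner survey table):

```python
from datetime import date

# Hypothetical schema: each survey row needs a stock name, an ISO date,
# and a non-negative spawner count.
REQUIRED = ("stock", "survey_date", "spawner_count")

def qc_row(row: dict, line_no: int) -> list:
    """Return a list of human-readable issues for one data row."""
    issues = []
    for field in REQUIRED:
        if not row.get(field):
            issues.append(f"row {line_no}: missing '{field}'")
    # Dates must parse as ISO 8601 (YYYY-MM-DD).
    if row.get("survey_date"):
        try:
            date.fromisoformat(row["survey_date"])
        except ValueError:
            issues.append(f"row {line_no}: bad date '{row['survey_date']}'")
    # Counts must be numeric and non-negative.
    count = row.get("spawner_count")
    if count not in (None, ""):
        try:
            if int(count) < 0:
                issues.append(f"row {line_no}: negative count {count}")
        except ValueError:
            issues.append(f"row {line_no}: non-numeric count '{count}'")
    return issues

def qc_table(rows: list) -> list:
    """Run row checks over a whole table; an empty result means 'pass'."""
    issues = []
    for i, row in enumerate(rows, start=1):
        issues.extend(qc_row(row, i))
    return issues
```

Logging the returned issue list on every run satisfies the "automated with logs and thresholds documented" checklist item at the project scale.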
Program
- [ ] Do projects follow a common repo layout and release process (tags, changelog, signed artifacts)?
- [ ] Are standard storage classes, lifecycle policies, and archival rules applied?
Organization
- [ ] Are security, backup/retention, and audit requirements defined and routinely verified?
- [ ] Is there a trusted long‑term archive with fixity checking and preservation metadata?
Evidence to collect: workflow definition, environment files, container references, QA/QC reports, storage/backup settings.
Practice 6 — Incentivize and Track Sharing and Reuse
Project
- [ ] Is a clear citation and license statement included in metadata and README?
- [ ] Are reuse metrics collected (downloads, citations, API hits) and reviewed?
- [ ] Do release notes document what changed and implications for users?
Program
- [ ] Are common metrics dashboards available for priority datasets and updated automatically?
- [ ] Are data citations tracked in assessments, reports, and staff evaluations?
Organization
- [ ] Do policies require citation guidance and permissive, appropriate licensing where possible?
- [ ] Are automated reports of reuse (e.g., via DOI provider APIs) delivered to product owners and leadership?
Evidence to collect: LICENSE, CITATION, reuse dashboard link, policy excerpts, sample citations in reports.
Practice 7 — Build Community Through Co‑Development and Mutual Benefit
Project
- [ ] Are stakeholders identified, including First Nations/Tribes/Indigenous partners, and engagement needs documented?
- [ ] Have you held at least one co‑design session to validate user needs and success criteria?
- [ ] Is there an open feedback channel (issues form, contact) and a published roadmap?
Program
- [ ] Do cross‑project working groups exist for models, vocabularies, and tooling with regular cadence and notes?
- [ ] Are community contributions recognized (authorship, acknowledgements, meeting time, funding)?
Organization
- [ ] Is there an endorsed governance body or community of practice with decision records?
- [ ] Are procurement/funding mechanisms available to support shared components and Indigenous partnerships?
Evidence to collect: stakeholder map, engagement records, roadmap, working group notes, decision log.
Quick Start: 30/60/90‑Day Plan
- First 30 days
- By 60 days
- By 90 days
Minimal Artifacts Checklist (Project Level)
Maintain this list as a living issue in your repository and review quarterly.