Keywords: Salmon data stewardship, data interoperability, FAIR principles, persistent identifiers (PIDs), controlled vocabularies, metadata standards, application programming interface (API), data citation, ontology development
Abstract
Fisheries management, research, and broader conservation efforts increasingly generate vast, diverse datasets needed for timely decision-making. Yet these data remain fragmented across jurisdictions, disciplines, and legacy infrastructure, limiting responsive management. The lack of access, interoperability, and reuse of data is a major impediment to better science; data stewardship—the set of social and technical practices that ensure data are responsibly created, managed, shared, and reused over time—is the solution. Biologists are increasingly taking on data stewardship responsibilities to address these challenges, often without clear guidance, training, or support. Shared, community-agreed practices for implementing domain-specific data standards are needed to move beyond generic data management guidance toward fit-for-purpose tools and workflows. To address this gap—and to show how other communities can do so—we develop seven practices for salmon data stewardship. Through a case study of a historically important sockeye salmon productivity analysis that spanned Pacific Coast jurisdictions, we highlight how clearly defined data stewardship practices could improve data integration and management efficacy and reduce the overall cost of updating the analysis annually. We provide practical guidance for those transitioning into these essential stewardship roles, including salmon domain-specific data tools, templates, and examples. These practices integrate both sociocultural and technical approaches to ensure data meet modern open science principles and respect Indigenous Data Sovereignty. With a foundation of shared practices, data stewards will enable faster, more transparent decision-making. They will expand the reuse of cross-jurisdictional datasets and support machine-actionable data with high-quality metadata and consistent semantics. Together, these outcomes strengthen the management and conservation of salmon populations and the ecosystems they inhabit, with broader relevance to other data-rich fisheries domains.
The Data Stewardship Challenge
Effective fisheries management relies on robust, accessible data on abundance, health, and environmental conditions integrated over large spatial areas and fine temporal scales. Yet across many fisheries, data remain fragmented, inconsistently measured, and incomplete across time, space, and life history stages, which limits their value for research, hypothesis testing, and decision-making (Marmorek et al. 2011; Inman et al. 2021; Diack et al. 2024). Collections are often managed by diverse agencies and organizations in isolation, undermining the timely, integrated analyses necessary for adaptive management. The absence of organizational support and dedicated roles for data stewardship further compounds this problem, with critical tasks often treated as ad hoc responsibilities rather than core management functions.
Fragmented Landscapes
For salmon in particular, data integration and mobilization mirror the complexity of the species’ life cycle, life histories, and migrations across ecological regions (Groot and Margolis 1991) and jurisdictional boundaries. Salmon biologists routinely collect information managed by diverse agencies and organizations, often isolated within their operational context and without a focus on interoperability. While localized successes in data coordination exist—particularly within regional fisheries management offices and treaty commissions (“Fraser River Panel Data Application” 2025)—data integrated across agencies for each phase of the salmon life cycle is uncommon and costly for most programs. Even within organizations, data can be siloed by type, with freshwater, estuary, open-ocean, and commercial fishery data each housed in separate systems with limited ability to reconnect the data through shared identifiers. The result is a highly fragmented data landscape that spans agencies, organizations, and monitoring programs, making integration especially difficult. This lack of coordinated stewardship has become a significant impediment to adaptive salmon management and conservation, preventing the full use of existing data to inform timely and effective decisions.
This fragmentation is compounded by the number of scientific disciplines involved in piecing together a comprehensive understanding of individual salmon stocks. Geneticists, oceanographers, freshwater ecologists, stock assessment biologists, and fisheries managers are just a sample of the disciplines that contribute data toward a complete life history for a salmon population. Each uses its own domain-specific conventions, workflows, standards, and metadata. This tangle of disciplinary and organizational fragmentation slows integration, hinders reproducibility, and delays analyses that could otherwise inform time-sensitive management decisions, conservation actions, or restoration plans. Large volumes of data collected through long-term monitoring programs hold tremendous value, especially for secondary users—but are often inaccessible due to a lack of time, resources, and incentives for data producers to publish them (Lindenmayer et al. 2012). When critical datasets are hard to find, access, or interpret, biologists and analysts lose valuable time trying to reconstruct or harmonize them. This reduces transparency, increases the risk of errors, and delays urgent conservation or management responses, which in turn undermines public trust in both the science and the resulting management decisions. Modernizing these systems will require coordinated communities of practice and a shift in collective expectations and culture towards shared international data standards and stewardship practices that accommodate the full disciplinary and geographic diversity of salmon science.
Coordinated Data Stewardship
Data stewardship offers a way to meet the growing challenges in salmon management, where escalating environmental uncertainty due to climate change demands rapid, regionally integrated, and robust data (Ward et al. 2025). The mismatch between fragmented data systems and fixed administrative and jurisdictional boundaries creates an urgent need for interoperable, dynamic, multi-scale data stewardship that can adapt to shifting ecological and management priorities. Despite the scale and importance of these datasets, biologists who collect and manage salmon data are often expected to act as de facto data stewards without training, guidance, incentives, organizational support, or access to community-agreed best practices. Tasks such as documenting methods, aligning terminology, formatting for data sharing, and publishing data are typically performed off the side of a biologist’s desk. Done well, data stewardship is skilled, labor-intensive work—combining domain expertise, information science, and relationship-building—and it requires dedicated time and institutional support rather than ad hoc effort. A lack of organizational support (Diack et al. 2024), training (Volk, Lucero, and Barnas 2014), and dedicated roles for data management further relegates critical data stewardship tasks to an ad hoc status. Deferred documentation, governance, and standardization tasks accumulate as data technical debt—work left undone that compounds over time, making integration and reanalysis progressively slower and more error-prone. The absence of clear roles, standards, and community-endorsed practices leaves even motivated scientists unsure how to structure their data for future use. As a result, data stewardship is inconsistent and reactive, and data integration remains a major bottleneck to adaptive management and ecosystem-scale learning.
Due to a lack of practical resources, both data producers and data stewards struggle to align salmon data with community-agreed principles such as FAIR (Findable, Accessible, Interoperable, and Reusable) (Wilkinson et al. 2016) and Indigenous Data Sovereignty frameworks such as the CARE principles (Collective Benefit, Authority to Control, Responsibility, and Ethics) (Carroll, Rodriguez-Lonebear, and Martinez 2019; Jennings et al. 2023). Adhering to the CARE principles is all the more important for salmon-related data given the sociocultural importance of salmon to First Nations, Tribes, and Indigenous communities throughout the North Pacific and North Atlantic regions (Ween and Colombi 2013; Earth Economics 2021). While both FAIR and CARE present laudable goals for data stewardship, the two sets of principles can sometimes be at odds, and implementing both presents an inherent challenge. Nonetheless, several of the practices we propose help clarify shared objectives and increase the transparency and trust that are essential for integrating both FAIR and CARE principles. Without clear support and guidance, well-intentioned practitioners are left with ad hoc approaches that limit reuse and interoperability. This gap can only be bridged by equipping both data producers and stewards with tools, support, and organizational backing to publish interoperable, machine-readable metadata and datasets in alignment with shared principles.
Framework for Action
In this paper, we provide a framework for translating data stewardship principles into actionable practices, examples, and workflows to help salmon biologists improve the usability, traceability, and long-term impact of their data. We offer seven practices for salmon data stewardship (Table 1) and use a retrospective, real-world case study of a cross-jurisdictional sockeye productivity synthesis (Peterman and Dorner 2012) to illustrate how the absence of shared practices made integration time-consuming and keeps annual updates costly and difficult to this day. To support adoption, we provide real-world examples from multiple organizations (Appendix 1), a training roadmap for biologists taking on stewardship responsibilities (Appendix 2), and a getting-started checklist (Appendix 3). We also map the seven practices to a widely used data-lifecycle model (Table 2) to make adaptation straightforward outside salmon contexts. Although salmon provide the worked example, the approach is intended as a transferable blueprint: other scientific communities can substitute their own vocabularies, repositories, governance structures, and incentives to turn data stewardship principles into practical applications.
A coordinated approach to stewarding salmon data should follow established open science standards and principles (Johnson and Stap 2024; Wilkinson et al. 2016; Carroll et al. 2020), tailored specifically to the context of salmon research and management. Our practices build upon and extend existing data and metadata standards and infrastructure, including Darwin Core, OBIS, schema.org, and OBO Foundry ontologies, ensuring compatibility with broader biodiversity informatics infrastructure while avoiding reinventing foundational frameworks. Achieving meaningful interoperability of data among programs, organizations, and scientific domains demands both breadth and depth. Broad interoperability integrates diverse scientific domains, systems, and formats, requiring structured, machine-readable data and metadata published openly for maximum discoverability. Deep interoperability demands precise definitions of domain-specific terms and methods, ensuring data remain meaningful and usable across contexts. Salmon data stewards can improve outcomes for salmon by coordinating across boundaries to develop a shared foundation of data stewardship practices. To address these foundational challenges, we must establish clear data stewardship roles and practices that span the entire data lifecycle and salmon life cycle: from collection and documentation through integration, long-term preservation, and reuse, and from gravel to estuary to the high seas and back to the gravel.
Defining Data Stewardship in Salmon Science
Data stewardship encompasses the coordinated practices, roles, and responsibilities necessary to effectively manage, share, and reuse data throughout its lifecycle (NOAA 2007; Plotkin 2014; Peng et al. 2018). It includes ensuring data quality, compliance with agreed-upon standards, and the establishment of clear governance to guide data collection, documentation, integration, and preservation. However, given the complexity of the salmon data landscape, stewardship goes beyond mere technical data management to include actively facilitating collaboration, communication, and consensus-building among data producers and users across multiple organizations and jurisdictions.
Data stewardship represents a critical sub-discipline within the broader field of data science. While data science is often narrowly associated with machine learning and statistical modeling, we adopt a more comprehensive view that encompasses how we treat, handle, and represent data, along with the social and technical information systems that enable data-intensive science. Data stewardship focuses on the practical implementation of these principles—ensuring that data infrastructure, standards, and practices actually serve scientific and management needs rather than remaining theoretical constructs.
Effective salmon data stewards serve as boundary spanners, bridging agencies, Nations, and disciplines, and as community coordinators, convening diverse stakeholders to build sustained communities of practice. This boundary-spanning role is particularly critical in transboundary contexts where data integration requires navigating complex jurisdictional and cultural boundaries (Ward et al. 2025). These responsibilities demand specialized expertise and sustained effort; without dedicated capacity, standards and workflows tend to degrade, and technical debt accumulates. By facilitating communication, translating between different organizational cultures and technical systems, and maintaining long-term relationships, data stewards create the social infrastructure necessary for effective cross-boundary data collaboration.
Effective salmon data stewards perform several critical functions:
Technical oversight: Ensuring metadata completeness, adherence to standardized terminologies and vocabularies, and robust quality assurance protocols.
Social and organizational facilitation: Leading stakeholder engagement, capacity-building activities, and negotiation of data access and sharing agreements, including addressing First Nations, Tribes, and Indigenous Peoples’ rights and interests in data governance.
Organizational advocacy: Championing the organizational recognition of data stewardship roles, promoting sustained investment and dedicated resources for data management infrastructure and practices.
Implementation and adoption facilitation: Actively promoting data use and ensuring that standards and practices remain practical and relevant by maintaining close contact with real-world applications. This includes monitoring data utilization, gathering feedback from users, and iteratively refining standards based on actual implementation challenges to prevent theoretical approaches that fail in practice.
Data stewards can implement FAIR and CARE principles through concrete technical and governance mechanisms they control, such as documenting consent constraints and access levels in metadata, using controlled vocabularies to ensure consistent terminology, and establishing repository roles that enforce data sovereignty requirements. FAIR is sometimes interpreted as “open by default,” but it does not require unrestricted openness; rather, it requires that data and metadata be discoverable and accessible under explicit, machine-readable conditions. CARE’s Authority to Control can therefore place legitimate constraints on access and reuse, particularly when data are held or governed by First Nations, Tribes, and Indigenous Peoples. In these cases, stewardship often means publishing rich, standards-aligned metadata while implementing governed access, consent-aware reuse conditions, and culturally appropriate protocols for the data themselves. For example, stewards can document consent constraints in metadata fields and enforce access restrictions via repository user roles, ensuring that Indigenous data sovereignty is respected while maintaining data discoverability and appropriate reuse (Montenegro 2019; Local Contexts 2025). This governance approach is particularly critical for sensitive data such as Traditional Knowledge and sensitive habitat locations, where stewardship practices must balance open science principles with appropriate access controls and cultural protocols.
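As one concrete illustration, access conditions can travel with otherwise open metadata. The sketch below expresses consent constraints using schema.org Dataset properties (`conditionsOfAccess`, `license`, `usageInfo`); the dataset, constraint text, and Local Contexts pointer are hypothetical placeholders, not prescribed values:

```python
import json

# A minimal, hypothetical schema.org-style Dataset record illustrating how
# CARE-aligned access constraints can travel with otherwise FAIR metadata.
dataset_metadata = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example sockeye escapement estimates (hypothetical)",
    "description": "Annual escapement estimates; access governed by a data-sharing agreement.",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    # Machine-readable statement of reuse conditions (a schema.org property).
    "conditionsOfAccess": (
        "Spawning-site coordinates restricted; request access via the "
        "data governance committee per the Nation's data-sharing protocol."
    ),
    "isAccessibleForFree": True,
    # Hypothetical pointer to a Local Contexts Traditional Knowledge label.
    "usageInfo": "https://localcontexts.org/labels/traditional-knowledge-labels/",
}

print(json.dumps(dataset_metadata, indent=2))
```

Publishing a record like this keeps the dataset discoverable (FAIR) while stating, in machine-readable form, who controls access and under what conditions (CARE).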
Data stewards play a critical role bridging the gap between biologists and Information Technology (IT) staff by translating data needs into application or data system features. A user-centred design approach to salmon data stewardship is critical and focuses on creating tools that align with biologists’ needs. When data management is separated from biologists, accountability weakens, and quality issues go unnoticed. While IT expertise is essential for infrastructure and security, effective data system design requires IT to act as an enabler, rather than gatekeeper, provisioning self-serve data infrastructure. The Data Steward, serving as a translator between IT and biologists, enables biologists to engage independently with data systems, fostering ownership and accountability and ultimately improving data quality for research and management.
Dedicated stewardship roles empower salmon biologists to bridge disciplinary divides and jurisdictional barriers, transforming fragmented datasets into cohesive, interoperable resources. By proactively defining, implementing, and maintaining data standards and workflows, salmon data stewards create conditions for timely, accurate, and reproducible analyses. Such stewardship positions salmon biologists to better inform adaptive management decisions, ultimately strengthening salmon conservation and resilience.
Updating Sockeye Productivity Synthesis: A Case Study for What Agencies Could Do Now
This case study revisits a comprehensive sockeye productivity dataset assembled from several agency sources by academic researchers (Peterman and Dorner 2012). We reflect not on the significant work the research team accomplished, but rather on the preventable organizational and technical barriers that impeded their work—and continue to burden data compilation updates and reuse efforts today. Their study examined productivity trends across 64 sockeye salmon stocks spanning Washington, British Columbia (B.C.), and Alaska and found common trends in declining productivity across the North American range of sockeye salmon (Peterman and Dorner 2012). This work was important for sockeye conservation and management in that it pointed to common, broad-scale marine trends in sockeye salmon productivity, rather than idiosyncratic stock-specific, freshwater effects—an insight not accessible without a monumental data integration effort. Attempting to replicate or build upon this analysis today is an arduous, time-consuming, and error-prone endeavour due to fragmented data sources, inconsistent formats, and lack of standardized practices among the key organizations involved: the Washington Department of Fish and Wildlife (WDFW), Fisheries and Oceans Canada (DFO), and the Alaska Department of Fish and Game (ADF&G).
Each section below highlights a key challenge faced by the team and proposes practical steps, based on our best practices (Table 1), that data-holding agencies could take to enable easier integration, validation, and updating of salmon datasets across jurisdictions and decades. This case study illustrates how implementing the foundational concepts and practical recommendations outlined in this paper can transform data stewardship practices within these organizations. By doing so, they can significantly enhance data accessibility, quality, and interoperability—ultimately enabling more efficient and accurate analyses that support salmon conservation and management.
Challenge 1: Interpreting the Data — What do these numbers actually mean?
Peterman’s team frequently worked with datasets that lacked basic contextual information. Fields such as “year,” “return,” or “age class” were often undefined or inconsistently used. For example, some datasets recorded returns by calendar year while others used brood year, and few included metadata to clarify the distinction. In many cases, the team had to reconstruct metadata by back-checking against reports or simulating assumptions (e.g., about age structure) to interpret the data correctly.
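The cost of this ambiguity is easy to demonstrate: the same record implies different cohorts depending on the convention assumed. The sketch below is a minimal illustration with hypothetical field names:

```python
# Illustration of the brood-year vs. return-year ambiguity described above.
# Assumes records report total age in years; a real dataset would need
# metadata declaring which convention its "year" field uses.

def brood_year(return_year: int, total_age: int) -> int:
    """Brood year = year the parental generation spawned."""
    return return_year - total_age

record = {"year": 2012, "total_age": 4}

# The same observation means different things under each convention:
print("If 'year' is return year, brood year =",
      brood_year(record["year"], record["total_age"]))        # 2008
print("If 'year' is brood year, returns occurred in",
      record["year"] + record["total_age"])                   # 2016
```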
Remedies:
Best Practice 3: Make Data, People, Projects, and Outputs Discoverable, Linked, and Citable with Persistent Identifiers (PIDs). Assigning PIDs such as digital object identifiers (DOIs) to protocols and methods, and to people (via ORCID iDs), and linking them together through data stores and catalogues ties data to its provenance and ensures that methods, context, and interpretation decisions are traceable.
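As an illustration of such linking, a DataCite-style metadata record can connect a dataset DOI to its creator’s ORCID iD, the organization’s ROR ID, and the protocol that documents it. The sketch below follows the DataCite metadata structure with placeholder identifiers throughout:

```python
# Sketch of a DataCite-style record linking a dataset to people, organizations,
# and methods via PIDs. All identifiers below are hypothetical placeholders.
datacite_record = {
    "doi": "10.1234/example-sockeye-dataset",  # dataset DOI (placeholder)
    "creators": [{
        "name": "Example, Researcher",
        "nameIdentifiers": [{
            "nameIdentifier": "https://orcid.org/0000-0000-0000-0000",
            "nameIdentifierScheme": "ORCID",
        }],
        "affiliation": [{
            "name": "Example Fisheries Agency",
            "affiliationIdentifier": "https://ror.org/00000000",  # ROR ID (placeholder)
            "affiliationIdentifierScheme": "ROR",
        }],
    }],
    # Link the dataset to the protocol that documents its collection methods.
    "relatedIdentifiers": [{
        "relatedIdentifier": "10.1234/example-field-protocol",
        "relatedIdentifierType": "DOI",
        "relationType": "IsDocumentedBy",
    }],
}
```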
Best Practice 4: Use Shared Data Models, Vocabularies, and Metadata to Enable Integration. To prevent this kind of ambiguity, agencies can adopt internationally recognized metadata schemas such as ISO 19115 or the Ecological Metadata Language, use data models such as the Darwin Core Data Package to represent age and age-type concepts, and apply controlled vocabularies that restrict permissible values in the age-type field to, for example, calendar year or brood year.
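To make the vocabulary remedy concrete, the sketch below screens an age-type field against a small controlled list before series are merged; the term list, field names, and records are illustrative, not an established standard:

```python
# Illustrative validation of an "ageType" field against a controlled vocabulary.
# The terms, field names, and records here are examples, not a published standard.
AGE_TYPE_VOCABULARY = {"calendarYear", "broodYear"}

rows = [
    {"stock": "Chilko", "year": 2012, "ageType": "broodYear"},
    {"stock": "Quesnel", "year": 2012, "ageType": "return yr"},  # non-standard term
]

for row in rows:
    if row["ageType"] not in AGE_TYPE_VOCABULARY:
        # Flag for correction instead of silently merging incompatible series.
        print(f"Rejected {row['stock']}: unknown ageType {row['ageType']!r}; "
              f"expected one of {sorted(AGE_TYPE_VOCABULARY)}")
```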
Challenge 2: Accessing and Using the Data — Where is it stored, and how do I get it?
The Peterman dataset was compiled from multiple files scattered across email inboxes, regional offices, and grey literature. Data were stored in inconsistent formats, lacked clear versioning, and were difficult to discover outside of specific research networks. Even today, no application programming interface (API) or structured access mechanism exists to update or query the data programmatically. As a result, researchers hoping to build on the dataset may have to start from scratch.
Remedies:
Best Practice 2: Reuse Proven Infrastructure to Save Time and Increase Interoperability
Rather than developing bespoke data catalogues or repositories, agencies should adopt existing catalogues used beyond their own organization, such as the Ocean Biodiversity Information System, Zenodo, or the Knowledge Network for Biocomplexity. These are proven platforms with a broad user base that support persistent storage, discoverability, and interoperability.
Best Practice 5: Store and Analyze Data in Ways That Others Can Easily Access, Use, and Trust
Agencies can use open-access data repositories, or their own organizational repositories and catalogues, that make data discoverable using PIDs and make programmatic access possible through APIs.
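A minimal sketch of such programmatic access is shown below, using FastAPI (one of the frameworks listed in Table 1); the endpoint path, field names, and records are hypothetical examples, not an existing agency service:

```python
# Minimal illustrative API exposing a stock-recruit table for programmatic access.
# Endpoint paths, field names, and records are hypothetical examples.
from fastapi import FastAPI, HTTPException

app = FastAPI(title="Example salmon data API")

# Stand-in for a database of stock-recruit records.
RECORDS = [
    {"stock": "Chilko", "broodYear": 2008, "spawners": 420000, "recruits": 1350000},
    {"stock": "Chilko", "broodYear": 2009, "spawners": 310000, "recruits": 980000},
]

@app.get("/stocks/{stock}/recruitment")
def get_recruitment(stock: str, start_year: int = 0):
    """Return brood-year records for one stock, optionally filtered by year."""
    rows = [r for r in RECORDS if r["stock"] == stock and r["broodYear"] >= start_year]
    if not rows:
        raise HTTPException(status_code=404, detail=f"No records for stock {stock!r}")
    return rows
```

Served with a runner such as uvicorn, clients could query `/stocks/Chilko/recruitment` and receive structured JSON rather than requesting spreadsheets by email.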
Challenge 3: Sustaining the Dataset — Who is responsible, and why should I contribute?
Once Peterman and his team completed their analysis, no formal plan existed for sustaining or updating the dataset. Responsibility for ongoing maintenance fell informally to former students and collaborators. Despite its national and international relevance, the dataset was never adopted by an agency as a living product. Moreover, the original data contributors often lacked incentives, support, or recognition for their efforts—conditions that persist in many data environments today.
Remedies:
Best Practice 1: Make Data Governance Explicit to Support Trust and Reuse
Agencies should define roles, responsibilities, and decision-making processes through formal governance mechanisms such as data product charters. Use a Data Management Plan with a responsibility matrix such as “responsible, approver, consulted, informed” (RACI) to clarify governance, assign maintenance responsibility, and ensure continuity across staff turnover and organizational change.
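For illustration, one row of a RACI matrix for a shared dataset might look like the following (the task and role names are hypothetical examples):

| Task | Responsible | Approver | Consulted | Informed |
|---|---|---|---|---|
| Annual productivity dataset update | Agency data steward | Program lead | Nation data governance committee | Cross-jurisdiction analysts |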
Best Practice 6: Incentivize and Track Data Sharing and Reuse
Visibility, credit, and metrics are critical for motivating data sharing. Agencies can embed citation guidance in metadata and track dataset reuse through COUNTER-compliant dashboards or DataCite APIs.
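As a sketch of what reuse tracking could look like, the snippet below queries the public DataCite REST API for citation, view, and download counts on a dataset DOI; the DOI is a placeholder, and the counts reflect only what repositories and indexers have reported to DataCite:

```python
# Sketch: query the public DataCite REST API for reuse indicators on a dataset DOI.
# The DOI below is a placeholder; treat these metrics as indicative, not complete.
import requests

doi = "10.1234/example-sockeye-dataset"  # placeholder DOI
resp = requests.get(f"https://api.datacite.org/dois/{doi}", timeout=30)
resp.raise_for_status()
attrs = resp.json()["data"]["attributes"]

print("Citations:", attrs.get("citationCount"))
print("Views:", attrs.get("viewCount"))
print("Downloads:", attrs.get("downloadCount"))
```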
Best Practice 7: Build Community Through Co-Development and Mutual Benefit
Effective data stewardship requires collaboration between biologists, First Nations, Tribes, Indigenous communities, managers, and data professionals. Participatory design ensures that systems and standards meet user needs and are adopted over time. Practical application: facilitate cross-jurisdictional working groups to co-develop data standards and align on shared outcomes for priority datasets.
While the barriers encountered in compiling, interpreting, and maintaining the data made the work considerably more difficult than necessary and hinder efforts by others seeking to extend or build upon this work, they remain instructive. These challenges are not unique to Peterman’s team—they reflect systemic gaps in data governance, documentation, infrastructure, and incentives. By adopting the seven best practices detailed in Table 1, organizations and researchers can transform legacy datasets into living resources, enabling reuse, easing collaboration, and accelerating insight across the salmon research and management community.
The challenges and solutions demonstrated in this salmon case study generalize across fisheries and environmental monitoring domains. Cross-jurisdictional data harmonization, quality assurance and control patterns, standardized metadata requirements, and long-term archiving strategies are universal needs that extend far beyond salmon science. Similar barriers and solutions apply to trawl survey data integration, invertebrate monitoring programs, and water quality datasets that span multiple agencies and jurisdictions.
| Best Practice | Start here |
|---|---|
| 1. Make Data Governance Foundational and Explicit to Establish and Sustain Trust and Reuse. Establishing clear governance structures promotes quality, accountability, and alignment with FAIR and CARE principles, enabling trust and long-term stewardship across multi-organization projects. | - Document roles and responsibilities using a Data Product Governance Charter and structured responsibility frameworks. - Integrate CARE principles to respect First Nations, Tribes, and Indigenous data rights. - Form a governance or oversight committee to review data standards, timelines, and agreements. |
| 2. Reuse Proven Infrastructure to Save Time and Increase Interoperability. Leveraging existing platforms and technologies by building on and extending them, rather than building bespoke solutions, reduces costs and improves long-term interoperability and sustainability. | - Use domain-specific repositories. - Publish and archive data with KNB or Zenodo. |
| 3. Make Data, People, Projects, and Outputs Discoverable, Linked and Citable with Persistent Identifiers. Persistent identifiers (PIDs) connect data with researchers, organizations, and outputs—supporting data citation, reuse, and automated attribution. | - Encourage use of ORCIDs for researchers. - Use ROR IDs for organizations. - Assign DOIs via DataCite for data packages. - Embed DOIs in dashboards and metadata. |
| 4. Use Shared Data Models, Vocabularies and Metadata to Enable Integration. Common vocabularies, metadata standards, and ontologies support integration across systems and preserve semantic meaning. | - Adopt ISO 19115, EML, or DataCite metadata standards. - Re-use terms defined in Salmon Domain Ontology. - Model datasets using the Darwin Core Data Package Model. |
| 5. Store and Analyze Data in Ways That Others Can Easily Access, Use and Trust. Structured and accessible data formats ease reuse and support integration with analytical tools and applications, while data wrangled and analyzed with programmatic scripts (e.g., R, Python) enable reproducibility and increase trust. | - Provide APIs using FastAPI, Flask, or Django REST. - Archive in trusted repositories (e.g., GBIF, FRDR, USGS). - Write scripts in a programming language to wrangle, transform, and analyze data. - Use GitHub to host code for collaboration and transparency and the GitHub / Zenodo integration for DOI assignment and preservation. |
| 6. Incentivize and Track Data Sharing and Reuse. Recognizing data contributors and tracking reuse promotes a culture of sharing and supports professional recognition. | - License data with CC-BY 4.0. - Include citation text and visible credit fields. - Use COUNTER metrics and DataCite APIs to monitor reuse. - Encourage dataset citation in references. |
| 7. Build Community Through Co-Development and Mutual Benefit. Engaging users early ensures tools and standards meet real-world needs and enhances long-term stewardship. | - Participate in RDA Salmon Interest Group. - Facilitate workshops for metadata and vocabulary alignment. - Support community-engaged research with tangible benefits. |
Metadata governance as a cross-cutting foundation
Unlike the sequential stages of the data lifecycle, metadata governance operates as a continuous, cross-cutting practice that spans all phases simultaneously. While data moves through Plan → Obtain → Process → Preserve → Access → Disposition, metadata governance must be active throughout, ensuring that documentation, quality control, and discoverability are maintained at every stage. This cross-cutting nature means that metadata governance failures at any point can compromise the entire data stewardship effort, making it a critical foundation rather than a discrete step in the process.
The lifecycle mapping in Table 2 reveals that data governance elements appear in every stage: planning metadata requirements (Plan), documenting collection methods (Obtain), structuring and validating metadata (Process), ensuring long-term preservation (Preserve), enabling discovery and access (Access), and managing final disposition (Disposition). This pervasive presence underscores why metadata governance must be treated as an organizational capability rather than a project-specific task, requiring dedicated resources, trained personnel, and systematic processes that operate continuously across all data activities.
How our seven practices align to the data lifecycle model
Our seven best practices map directly to established data lifecycle models, demonstrating their broad applicability beyond salmon science. The NOAA Data Lifecycle provides a widely recognized framework with six sequential stages (Plan, Obtain, Process, Preserve, Access, Disposition) and four cross-cutting elements (Document, Track and Monitor, Quality, Security) (NOAA Data Governance Committee 2024). This alignment ensures our practices are grounded in established federal data management standards and can be readily adopted by other agencies and research communities.
The mapping shown in Table 2 demonstrates how each practice addresses specific lifecycle stages while the cross-cutting elements ensure comprehensive data stewardship throughout the entire lifecycle. For example, Practice 1 (Data Governance) spans the entire lifecycle from planning through disposition, while Practice 4 (Shared Data Models) primarily supports the Process and Preserve stages. This systematic alignment with established frameworks enhances the credibility and portability of our approach across different domains and organizations.
| Best Practice | Plan | Obtain | Process | Preserve | Access | Disposition | Cross-cutting |
|---|---|---|---|---|---|---|---|
| 1. Data Governance | ● | ● | ● | ● | ● | ● | Document, Quality |
| 2. Reuse Infrastructure | ● | | | ● | ● | | Track and Monitor |
| 3. Persistent Identifiers | | ● | ● | ● | ● | | Document, Track |
| 4. Shared Data Models | | ● | ● | ● | | | Quality |
| 5. Accessible Storage | | | ● | ● | ● | | Security, Quality |
| 6. Incentivize Sharing | ● | | | ● | ● | | Track and Monitor |
| 7. Community Building | ● | ● | ● | ● | ● | ● | Document, Quality |
Conclusion
Salmon biologists and data stewards across the globe have generated extensive datasets on salmon abundance, environmental conditions, and biological characteristics. When integrated, these data become valuable assets, a fact powerfully demonstrated by studies such as Peterman and Dorner (2012). However, as noted in reports to the Cohen Commission investigating the decline of Fraser River sockeye salmon (Marmorek et al. 2011), these data are often incomplete, inconsistently collected, and fragmented across organizations and jurisdictions. Integration across such diverse sources is achievable but requires effort that is often unaccounted for in smaller-scale studies or perpetually underestimated in long-term data projects. This fragmentation hinders a deeper understanding of the drivers of change across salmon life stages and regions and limits the effectiveness of management decisions—particularly concerning in the face of anthropogenic climate change and biodiversity loss.
We argue that many persistent bottlenecks in salmon analysis and management are fundamentally data problems, and that coordinated data stewardship offers one of the most impactful, near-term opportunities for improvement. By adopting shared best practices in data governance, metadata standardization, persistent identification, infrastructure reuse, and community co-development, we can radically improve the transparency, reusability, and interoperability of salmon data. A coordinated, future-oriented data stewardship strategy can leverage the full potential of data in science and management. The case study presented in this paper—drawn from one of the Pacific Region’s most influential salmon survival syntheses (Peterman and Dorner 2012)—illustrates how technical and social data management gaps directly obstructed efforts to answer pressing questions. Had the data producers adopted even some of the practices we propose, substantial time and resources could have been saved. The case study offers a clear and cautionary tale, as well as a hopeful roadmap. Because data practices underpin nearly every analysis and decision, strengthening data stewardship is one of the most effective ways salmon programs can improve outcomes within their control.
The emergence of the data stewardship role (Plotkin 2014) represents one of the most critical organizational shifts needed to realize this vision. Historically, the work of managing, documenting, and maintaining data has been diffuse and undervalued—often falling to biologists without support, training, or recognition. As the volume and complexity of scientific data grow, so too does the need for clearly defined data stewardship responsibilities embedded within research teams and organizations. Training biologists in the principles and practices of data stewardship—while also supporting dedicated professionals who specialize in this work—is essential to sustaining trustworthy, reusable, and interoperable salmon data systems.
Realizing this vision requires concrete organizational commitments: organizations should formally appoint dedicated data stewards with clear roles, responsibilities, and reporting structures. Agencies can adopt centralized metadata repositories and establish compliance metrics to track progress toward FAIR and CARE principles. Key implementation steps include: (1) designating stewardship roles within existing organizational structures, (2) investing in metadata management infrastructure, (3) establishing data governance committees with cross-organization representation, and (4) developing performance indicators that measure data discoverability, interoperability, and reuse. These organizational changes ensure that data stewardship becomes embedded in organizational culture rather than remaining an ad hoc responsibility.
The visionary future state is one where salmon researchers and stewards—across agencies, Indigenous Nations, academic laboratories, and community groups—can easily access and contribute to well-documented, versioned, and machine-readable datasets. In this future, field biologists, Indigenous guardians, modellers, and policymakers interact with a living knowledge system—one that is flexible, easy to implement, and rooted in principles of FAIRness and Indigenous Data Sovereignty. Metadata standards, controlled vocabularies, and shared governance frameworks are not afterthoughts but integral to the culture of data collection and use. Scientists receive credit for publishing high-quality data, and users trust the provenance and structure of the datasets they rely on to make critical management decisions.
Realizing this vision will require investment in both people and systems. Key to this transformation is the emergence of the data steward as a professional role: a hybrid expert who understands operational field biology, information science, governance protocols, and community needs. As highlighted by Roche et al. (2020), institutionalizing data stewardship roles ensures long-term capacity for data governance, quality control, and interoperability—functions that are often neglected or left to informal actors. We must not only train new data stewards but also support and upskill biologists to take on stewardship responsibilities in collaborative, interdisciplinary settings. This is essential to address the data technical debt of unmanaged data and to modernize research practices in line with open science norms. By embedding these practices into the everyday work of data generation, documentation, publication, and reuse, we can move salmon science decisively into the era of data-intensive discovery.
To do items:
- Discuss differences between data management plans, data governance charters, data sharing agreements
- Incorporate ref to Streamnet Data Exchange Standards somehow
- Add in figures
- fill out appendix 1 more thoroughly
- Refine and reorganize the content in appendix 2 (training roadmap) and decide if it makes sense to put some of that content into a 3rd column in table 1
- Consider removing para 2 (Data Science adjacency) in Data Stewardship Defn
- Consider re-working table to replace practical applications column with ‘next step’
- Incorporate edits to Appendices from PSF folks
- Incorporate edits from Gottfried Pestal