Keywords: Salmon data stewardship, data interoperability, FAIR principles, persistent identifiers (PIDs), controlled vocabularies, metadata standards, application programming interface (API), data citation, ontology development
To do items:
Shorten the paper title to make it punchier
Frame the paper more generally around the issues facing fisheries biology, then narrow to the salmon-specific example, in the abstract, the introduction, and the start and end of the conclusion.
Incorporate a reference to the StreamNet Data Exchange Standards
Abstract
Fisheries research, management, and conservation increasingly generate vast and diverse data crucial for timely decision-making. Yet these data remain largely fragmented across jurisdictions, disciplines, and outdated infrastructure, limiting their use in responsive fisheries management. Biologists are increasingly taking on data stewardship responsibilities to address these challenges, often without clear guidance, training, or support. Shared, community-agreed practices for implementing domain-specific data standards are needed to move beyond generic data management guidance toward fit-for-purpose tools and workflows. To address this gap, and to show how other communities can do the same, we develop seven practices for salmon data stewardship. We provide practical guidance for those transitioning into these essential stewardship roles, including domain-specific tools, templates, and examples from salmon research and management. These practices integrate sociocultural and technical approaches to ensure data meet modern open science principles and respect Indigenous Data Sovereignty. We argue that effective salmon management depends on formally establishing data stewardship as a dedicated, institutionally supported professional role. Through a case study of a historical sockeye salmon productivity analysis spanning Pacific Coast jurisdictions, we illustrate how clearly defined data stewardship practices enhance data reproducibility, integration, and management efficacy. With a foundation of shared practices, data stewards will enable faster, more transparent decision-making, support the development of machine-actionable datasets that can leverage advances in artificial intelligence, and expand the use of cross-jurisdictional datasets. Together, these gains strengthen the management and conservation of salmon populations and the ecosystems they inhabit and, by extension, of other data-rich fisheries domains.
Introduction
Integrated, timely, and high-quality data are essential for effective fisheries research, management, and conservation. Such data underpin robust stock assessments, inform adaptive management strategies, enable rapid responses to emerging threats, and support transparent decision-making. Yet, across the fisheries domain, biologists face persistent challenges in achieving these goals. Data on fish populations, health, and environmental conditions are often fragmented, inconsistently measured, and incomplete across time, space, and life-history stages (NOAA Data Governance Committee 2024). These issues limit the utility of fisheries data for research and management.
The challenges are especially pronounced in salmon science, where data must be integrated across multiple ecological regions and jurisdictional boundaries. Salmon biologists routinely collect information managed by diverse agencies and institutions, often in isolation and without a focus on interoperability. This fragmented landscape makes it difficult to conduct the timely, integrated analyses needed for effective management and conservation decisions (Marmorek et al. 2011; Inman et al. 2021; Diack et al. 2024). The experience of salmon biologists thus exemplifies broader data stewardship challenges faced throughout fisheries biology and highlights the pressing need for coordinated, community-driven solutions.
While our focus is salmon, these structural issues are not unique: many fisheries, wildlife, and environmental monitoring programs face the same cross-agency fragmentation and legacy systems. The practices we present are community-specific by design, with salmon as the worked case, and the process we use to convene and ratify implementations can be reused in other domains.
Despite the scale and importance of these datasets, biologists who collect and manage salmon data are often expected to act as de facto data stewards without training, guidance, institutional support, or access to community-agreed best practices. Tasks such as documenting methods, aligning terminology, formatting data for sharing, and publishing data are typically done off the side of a biologist’s desk. A lack of institutional support (Diack et al. 2024), training (Volk, Lucero, and Barnas 2014), and dedicated data management roles further relegates critical stewardship tasks to ad hoc status. The absence of clear roles, standards, and community-endorsed practices leaves even motivated scientists unsure how to structure their data for future use. As a result, data stewardship is inconsistent and reactive, and data integration remains a major bottleneck to adaptive management and ecosystem-scale learning.
The growing complexity of fisheries management, combined with escalating environmental uncertainties due to climate change, demands rapid, integrated, and robust data analyses (Bull et al. 2022). Biologists assuming data stewardship responsibilities need practical tools and guidance they can apply now. Communities of practice need to develop clarity regarding data standards, platforms and best practices that reduce friction when discovering, accessing, understanding and integrating third-party data. In this paper, we provide actionable practices, examples, and workflows to help salmon biologists improve the usability, reproducibility, and long-term impact of their data. Our goal is to support salmon biologists and the broader research and management community to effectively steward salmon data. To keep this broadly useful, we emphasize patterns—lifecycle planning, metadata governance, vocabulary alignment, reproducible publishing, and role clarity—that any taxa-centric community can adopt, substituting their own standards and tools. We also map the seven practices to widely used data-lifecycle models to make adoption straightforward outside salmon contexts.
The Issue
Effective integration and mobilization of salmon data mirrors the complexity of salmon biology itself: these fish traverse freshwater, estuarine, and marine ecosystems, crossing provincial, state, tribal, federal, and international management boundaries (Groot and Margolis 1991). While localized successes in data coordination exist, particularly within regional fisheries management offices and treaty commissions, salmon data integrated across agencies for each phase of the salmon life cycle are rare and prohibitively expensive to assemble for all but the most pressing challenges. Most salmon datasets remain confined within institutional silos, often undocumented, stored in outdated systems, or formatted according to internal standards that are incompatible with broader integration efforts. Even within organizations, data can be siloed by type: freshwater data go into one system, while estuary, open-ocean, and commercial fishery data are each housed in separate systems, with limited ability to reconnect them through shared identifiers. As a result, long-term datasets critical to stock assessment and environmental monitoring frequently become inaccessible, poorly understood, difficult to integrate, or effectively lost once original data holders retire or move on.
This fragmentation is compounded by the number of disciplines and organizations involved. Geneticists, oceanographers, freshwater ecologists, stock assessment biologists, and fisheries managers all contribute data using their own field-specific conventions and workflows. Meanwhile, data is distributed across federal, state, provincial, tribal, and academic institutions—each with its own mandates, technologies, and metadata requirements. Many salmon data-holding organizations rely on aging infrastructure or opaque, undocumented standards that lag behind modern open-science practices. This tangle of disciplinary and institutional fragmentation slows integration, hinders reproducibility, and delays analyses that could otherwise inform time-sensitive management decisions. Modernizing these systems will require coordinated investment, grounded in shared international data standards and stewardship practices that accommodate the full disciplinary and geographic diversity of salmon science.
The consequences of inaction are already visible. When critical datasets are hard to find, access, or interpret, biologists and analysts lose valuable time trying to reconstruct or harmonize them. This reduces transparency, increases the risk of errors, and delays urgent conservation or management responses. Without clear accountability for data stewardship, the system continues to rely on improvised, inconsistent, and ultimately unsustainable practices.
The Need for Coordinated Action
For fisheries managers, modernizing data systems and workflows is essential to improve the quality, speed, and interoperability of operational data assets. These systems must support an increasingly complex decision-making landscape that now depends on integrating broader types and sources of data, often in real time. At the same time, researchers face pressure to generate insights on future salmon abundance, the impacts of changing environmental conditions, and the effectiveness of restoration strategies across all salmon life stages. Yet the current scattered and siloed data landscape remains unfit for purpose—both for science and for management.
Despite operating under different mandates, both researchers and managers struggle to align their data with community-agreed principles such as FAIR (Findable, Accessible, Interoperable, and Reusable) (Wilkinson et al. 2016) and Indigenous Data Sovereignty frameworks like the CARE principles (Collective Benefit, Authority to Control, Responsibility, and Ethics) (Carroll, Rodriguez-Lonebear, and Martinez 2019; Jennings et al. 2023). Adhering to the CARE principles is all the more important for salmon-related data, given the sociocultural importance of salmon to Indigenous communities around the North Pacific and on both sides of the North Atlantic (Ween and Colombi 2013; Earth Economics 2021). Large volumes of data collected through long-term monitoring programs hold tremendous value, especially for secondary users, but are often inaccessible because data producers lack the time, resources, and incentives to publish them (Lindenmayer et al. 2012). Without clear support and guidance, well-intentioned practitioners are left with ad hoc approaches that limit reuse and interoperability. This gap can only be bridged by equipping both data producers and stewards with tools, support, and institutional backing to publish interoperable, machine-readable metadata and datasets in alignment with shared principles.
A coordinated approach to stewarding salmon data should follow established open science standards and adhere explicitly to FAIR principles, tailored specifically for salmon research and management (Johnson and Stap 2024). Achieving meaningful interoperability demands both breadth and depth. Broad interoperability integrates diverse scientific domains, systems, and formats, requiring structured, machine-readable data and metadata published openly for maximum discoverability. Deep interoperability demands precise definitions of salmon-specific terms and methods, ensuring data remains meaningful and usable across contexts. Salmon data stewards can improve conservation outcomes for salmon by coordinating across boundaries to develop a shared foundation of data stewardship practices.
Defining Data Stewardship in Salmon Science
Data stewardship encompasses the coordinated practices, roles, and responsibilities necessary to effectively manage, share, and reuse data throughout its lifecycle (NOAA 2007; Plotkin 2014; Peng et al. 2018). Within fisheries science, stewardship involves ensuring data quality, compliance with agreed-upon standards, and the establishment of clear governance to guide data collection, documentation, integration, and preservation. However, salmon data stewardship goes beyond mere technical data management; it involves actively facilitating collaboration, communication, and consensus-building among data producers and users across multiple institutions and jurisdictions.
Specifically, effective salmon data stewards perform several critical functions:
Technical oversight: Ensuring metadata completeness, adherence to standardized terminologies and vocabularies, and robust quality assurance protocols.
Social and organizational facilitation: Leading stakeholder engagement, capacity-building activities, and negotiation of data access and sharing agreements, including addressing Indigenous Peoples’ rights and interests in data governance.
Institutional advocacy: Championing the institutional recognition of data stewardship roles, promoting sustained investment and dedicated resources for data management infrastructure and practices.
A user-centred design approach to salmon data stewardship focuses on creating tools that align with biologists’ needs. Data stewards act as business analysts, bridging the gap between biologists and Information Technology (IT) staff by translating data needs into application or data system features. When data management is separated from biologists, accountability weakens and quality issues go unnoticed. While IT expertise is essential for infrastructure and security, effective data system design requires IT to act as an enabler rather than a gatekeeper, provisioning self-serve data infrastructure. As translators between IT and biologists, data stewards enable biologists to engage independently with data systems, fostering ownership and accountability and ultimately improving data quality for research and management.
Dedicated stewardship roles empower salmon biologists to bridge disciplinary divides and jurisdictional barriers, transforming fragmented datasets into cohesive, interoperable resources. By proactively defining, implementing, and maintaining data standards and workflows, salmon data stewards create conditions for timely, accurate, and reproducible analyses. Such stewardship positions salmon biologists to better inform adaptive management decisions, ultimately strengthening salmon conservation and resilience.
Updating Pacific-wide Sockeye Productivity: A Case Study for What Agencies Could Do Now
This case study revisits a Pacific Coast-wide sockeye productivity dataset assembled from diverse agency sources by academic researchers (Peterman and Dorner 2012). We reflect not on the significant work the research team accomplished, but rather on the preventable institutional and technical barriers that impeded their work—and continue to burden data updates and reuse efforts today. Their study examined productivity trends across 64 sockeye salmon stocks spanning Washington, British Columbia (B.C.), and Alaska. However, attempting to replicate or build upon this analysis today is an arduous, time-consuming, and error-prone endeavour due to fragmented data sources, inconsistent formats, and lack of standardized practices among the key institutions involved: the Washington Department of Fish and Wildlife (WDFW), Fisheries and Oceans Canada (DFO), and the Alaska Department of Fish and Game (ADF&G).
Each section below highlights a key challenge faced by the team and proposes practical steps, based on our best practices (Table 1), that data-holding agencies could take to enable easier integration, validation, and updating of salmon datasets across jurisdictions and decades. This case study illustrates how implementing the foundational concepts and practical recommendations outlined in this paper can transform data stewardship practices within these organizations. By doing so, they can significantly enhance data accessibility, quality, and interoperability, ultimately enabling more efficient and accurate analyses that support salmon conservation and management.
Challenge 1: Interpreting the Data — What do these numbers actually mean?
Peterman’s team frequently worked with datasets that lacked basic contextual information. Fields such as “year,” “return,” or “age class” were often undefined or inconsistently used. For example, some datasets recorded returns by calendar year while others used brood year, and few included metadata to clarify the distinction. In many cases, the team had to reconstruct metadata by back-checking against reports or simulating assumptions (e.g., about age structure) to interpret the data correctly.
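The calendar-year versus brood-year ambiguity becomes mechanical to resolve once the age convention is documented alongside the data. The sketch below assumes ages are recorded in European notation ("freshwater winters.ocean winters"), where total age is freshwater plus ocean winters plus one, and brood year is return year minus total age. The record layout, function names, and counts are hypothetical illustrations, not the Peterman dataset's actual structure.

```python
from collections import defaultdict


def total_age(european_age: str) -> int:
    """Total age from European notation 'F.O' (freshwater winters . ocean winters).

    Adds one year for the winter spent as an egg/alevin in the gravel.
    """
    freshwater, ocean = (int(part) for part in european_age.split("."))
    return freshwater + ocean + 1


def returns_by_brood_year(returns_by_return_year: dict) -> dict:
    """Re-index returns from return (calendar) year to brood year.

    `returns_by_return_year` maps (return_year, european_age) -> fish count.
    Returns a dict mapping brood_year -> total recruits summed across ages.
    """
    recruits = defaultdict(int)
    for (return_year, age), count in returns_by_return_year.items():
        brood_year = return_year - total_age(age)
        recruits[brood_year] += count
    return dict(recruits)


# Hypothetical counts for one stock, keyed by (return year, age).
returns = {
    (2010, "1.2"): 5000,  # age 4 -> brood year 2006
    (2011, "1.3"): 2000,  # age 5 -> brood year 2006
    (2011, "1.2"): 6500,  # age 4 -> brood year 2007
}
print(returns_by_brood_year(returns))  # prints {2006: 7000, 2007: 6500}
```

Summed by return year instead, the same records give 5000 for 2010 and 8500 for 2011, so a "year" column without a documented year type can silently shift recruits between cohorts.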
Remedies:
Best Practice 3: Make Data, People, Projects, and Outputs Discoverable, Linked and Citable with Persistent Identifiers (PIDs). Assigning PIDs, such as digital object identifiers (DOIs) for protocols and methods and ORCID iDs for people, and then linking them through data stores and catalogues connects data to its provenance and ensures that methods, context, and interpretation decisions are traceable.
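As a rough illustration of how PIDs tie a dataset to its people and methods, the sketch below expresses a dataset record as a Python dict that follows the general shape of the DataCite Metadata Schema (creators with name identifiers, related identifiers with relation types). The DOIs use DataCite's test prefix 10.5072 and are placeholders; the ORCID iD is ORCID's published example record. The helper `unresolved_links` is hypothetical, and it checks ORCID iDs using the ISO 7064 MOD 11-2 check character that ORCID documents.

```python
def orcid_checksum_ok(orcid: str) -> bool:
    """Validate an ORCID iD's final check character (ISO 7064 MOD 11-2)."""
    chars = orcid.replace("-", "")
    if len(chars) != 16 or not chars[:15].isdigit():
        return False
    total = 0
    for digit in chars[:15]:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return chars[15].upper() == expected


# Placeholder DOIs use DataCite's test prefix 10.5072; they are not real records.
dataset_metadata = {
    "doi": "10.5072/example-sockeye-returns",
    "titles": [{"title": "Hypothetical sockeye returns by brood year"}],
    "creators": [
        {
            "name": "Carberry, Josiah",
            "nameIdentifiers": [
                {
                    "nameIdentifier": "https://orcid.org/0000-0002-1825-0097",
                    "nameIdentifierScheme": "ORCID",
                    "schemeUri": "https://orcid.org",
                }
            ],
        }
    ],
    "relatedIdentifiers": [
        # Field protocol defining how returns and ages were measured.
        {"relatedIdentifier": "10.5072/example-escapement-protocol",
         "relatedIdentifierType": "DOI", "relationType": "IsDocumentedBy"},
        # Upstream agency dataset this product was derived from.
        {"relatedIdentifier": "10.5072/example-agency-source",
         "relatedIdentifierType": "DOI", "relationType": "IsDerivedFrom"},
    ],
}


def unresolved_links(metadata: dict) -> list:
    """List problems that would break provenance links (hypothetical helper)."""
    problems = []
    for creator in metadata.get("creators", []):
        for ident in creator.get("nameIdentifiers", []):
            if ident.get("nameIdentifierScheme") == "ORCID":
                orcid = ident["nameIdentifier"].rsplit("/", 1)[-1]
                if not orcid_checksum_ok(orcid):
                    problems.append(f"bad ORCID checksum: {orcid}")
    if not metadata.get("relatedIdentifiers"):
        problems.append("no related identifiers (methods/sources unlinked)")
    return problems


print(unresolved_links(dataset_metadata))  # prints []
```

A check like this could run before a record is registered, so that a mistyped ORCID iD or a dataset with no link to its protocol is caught while the original data holder can still fix it.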
Best Practice 4: Use Shared Data Models, Vocabularies and Metadata to Enable Integration. To prevent this kind of ambiguity, agencies can now adopt internationally recognized metadata schemas such as ISO 19115 or Ecological Metadata Language (EML), use data models such as the Darwin Core Data Package to represent age and year-type concepts, and use controlled vocabularies to restrict the permissible values of a year-type field to calendar year, brood year, or another defined option.
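A controlled vocabulary only helps if it is enforced where data enter a system. The sketch below, a minimal illustration rather than a prescribed tool, rejects rows whose `year_type` value is not an allowed term. The column name and the two terms are hypothetical; in practice the terms would be referenced by URI from a shared vocabulary such as the Salmon Domain Ontology rather than hard-coded.

```python
import csv
import io
from enum import Enum


class YearType(Enum):
    """Hypothetical controlled vocabulary for what a 'year' column means."""
    CALENDAR_YEAR = "calendar_year"
    BROOD_YEAR = "brood_year"
    # Further defined options would be added here, never typed freely.


ALLOWED = {term.value for term in YearType}


def validate_rows(csv_text: str) -> list:
    """Return (line_number, message) for rows whose year_type is not in the vocabulary."""
    errors = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        value = (row.get("year_type") or "").strip()
        if value not in ALLOWED:
            errors.append((line_number, f"year_type {value!r} not in {sorted(ALLOWED)}"))
    return errors


sample = """stock,year,year_type,returns
Stock A,2010,calendar_year,5000
Stock A,2006,brood_year,7000
Stock B,2010,BY,1200
"""
for line, message in validate_rows(sample):
    print(line, message)
```

Here the informal abbreviation "BY" on the last row is flagged instead of being silently accepted, which is the kind of ambiguity Peterman's team had to resolve by back-checking reports.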
Challenge 2: Accessing and Using the Data — Where is it stored, and how do I get it?
The Peterman dataset was compiled from multiple files scattered across email inboxes, regional offices, and grey literature. Data were stored in inconsistent formats, lacked clear versioning, and were difficult to discover outside of specific research networks. Even today, no API or structured access mechanism exists to update or query the data programmatically. As a result, researchers hoping to build on the dataset may have to start from scratch.
Remedies:
Best Practice 2: Reuse Proven Infrastructure to Save Time and Increase Interoperability
Rather than developing bespoke data catalogues or repositories, agencies should adopt existing catalogues used beyond their own institution, such as the Ocean Biodiversity Information System (OBIS), Zenodo, or the Knowledge Network for Biocomplexity (KNB). These are proven platforms with a broad user base that support persistent storage, discoverability, and interoperability.
Best Practice 5: Store and Analyze Data in Ways That Others Can Easily Access, Use, and Trust
Agencies can use open-access data repositories, or their own institutional repositories or catalogues, that make data discoverable through PIDs and provide programmatic access to data through application programming interfaces (APIs).
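To make "programmatic access" concrete, the sketch below serves a dataset as JSON from a single read-only endpoint using only the Python standard library. It is a minimal illustration, not a production design; a real service would more likely use one of the frameworks listed in Table 1 (FastAPI, Flask, or Django REST). The `/records` path, the `stock` query parameter, and the in-memory records are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical in-memory records; a real service would read a versioned data store.
RECORDS = [
    {"stock": "Stock A", "brood_year": 2006, "recruits": 7000},
    {"stock": "Stock A", "brood_year": 2007, "recruits": 6500},
    {"stock": "Stock B", "brood_year": 2006, "recruits": 1200},
]


class RecordsHandler(BaseHTTPRequestHandler):
    """Serve GET /records, optionally filtered with ?stock=<name>."""

    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/records":
            self.send_error(404, "Unknown endpoint")
            return
        # parse_qs decodes percent-escapes, so ?stock=Stock%20A matches "Stock A".
        stock = parse_qs(url.query).get("stock", [None])[0]
        rows = [r for r in RECORDS if stock is None or r["stock"] == stock]
        body = json.dumps({"count": len(rows), "records": rows}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int = 8000) -> ThreadingHTTPServer:
    """Create (but do not start) the server on localhost; port 0 picks a free port."""
    return ThreadingHTTPServer(("127.0.0.1", port), RecordsHandler)
```

Calling `make_server().serve_forever()` would let an analyst's script fetch `http://127.0.0.1:8000/records?stock=Stock%20A` and receive the same JSON every time, rather than requesting a spreadsheet by email.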
Challenge 3: Sustaining the Dataset — Who is responsible, and why should I contribute?
Once Peterman and his team completed their analysis, no formal plan existed for sustaining or updating the dataset. Responsibility for ongoing maintenance fell informally to former students and collaborators. Despite its national and international relevance, the dataset was never adopted by an agency as a living product. Moreover, the original data contributors often lacked incentives, support, or recognition for their efforts—conditions that persist in many data environments today.
Remedies:
Best Practice 1: Make Data Governance Explicit to Support Trust and Reuse. Agencies should define roles, responsibilities, and decision-making processes through formal governance mechanisms such as data product charters. A Data Management Plan with a responsibility matrix such as RACI (responsible, accountable, consulted, informed) clarifies governance, assigns maintenance responsibility, and ensures continuity across staff turnover and institutional change.
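A RACI matrix is easiest to keep current when it lives as structured data beside the Data Management Plan and is checked automatically. The sketch below applies two common RACI conventions: each task has exactly one Accountable party and at least one Responsible party. The task names, roles, and assignments are hypothetical.

```python
# task -> {role: RACI letter}; tasks, roles, and letters are hypothetical examples.
RACI = {
    "Annual data update": {"Field biologist": "R", "Data steward": "A",
                           "Stock assessment lead": "C", "Partner Nation": "I"},
    "Metadata review": {"Data steward": "R", "Program manager": "A"},
    "Access agreement renewal": {"Data steward": "R", "Partner Nation": "C"},
}

VALID = {"R", "A", "C", "I"}


def raci_problems(matrix: dict) -> list:
    """Flag tasks that break common RACI rules."""
    problems = []
    for task, assignments in matrix.items():
        letters = list(assignments.values())
        unknown = sorted(set(letters) - VALID)
        if unknown:
            problems.append(f"{task}: unknown codes {unknown}")
        if letters.count("A") != 1:
            problems.append(f"{task}: needs exactly one Accountable, found {letters.count('A')}")
        if "R" not in letters:
            problems.append(f"{task}: no Responsible party")
    return problems


for problem in raci_problems(RACI):
    print(problem)
```

In this example the check flags the access-agreement task, which has no Accountable party: the kind of gap that goes unnoticed until the person who informally handled it moves on.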
Best Practice 6: Incentivize and Track Data Sharing and Reuse. Visibility, credit, and metrics are critical for motivating data sharing. Agencies can embed citation guidance in metadata and track dataset reuse through COUNTER-compliant dashboards or DataCite APIs.
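As a sketch of reuse tracking, the code below reads reuse counts for a dataset DOI from the DataCite REST API. It assumes the public `https://api.datacite.org/dois/{doi}` endpoint and the `citationCount`, `viewCount`, and `downloadCount` attributes that DataCite reports for DOIs; these names should be checked against current DataCite documentation before use. Parsing is kept separate from the network call so it can be tested offline, and the sample response is abbreviated and invented.

```python
import json
import urllib.parse
import urllib.request

DATACITE_API = "https://api.datacite.org/dois/"


def doi_url(doi: str) -> str:
    """Build the DataCite REST URL for a single DOI (slashes left unescaped)."""
    return DATACITE_API + urllib.parse.quote(doi)


def reuse_metrics(payload: dict) -> dict:
    """Extract reuse counts from a DataCite JSON:API response body."""
    attributes = payload.get("data", {}).get("attributes", {})
    return {
        "doi": attributes.get("doi"),
        "citations": attributes.get("citationCount", 0),
        "views": attributes.get("viewCount", 0),
        "downloads": attributes.get("downloadCount", 0),
    }


def fetch_reuse_metrics(doi: str, timeout: float = 10.0) -> dict:
    """Query DataCite over the network and summarize reuse counts."""
    with urllib.request.urlopen(doi_url(doi), timeout=timeout) as response:
        return reuse_metrics(json.load(response))


# Abbreviated, invented response used to show the parsing step offline.
sample = {"data": {"id": "10.5072/example", "type": "dois",
                   "attributes": {"doi": "10.5072/example", "citationCount": 3,
                                  "viewCount": 120, "downloadCount": 45}}}
print(reuse_metrics(sample))
```

A dashboard could call `fetch_reuse_metrics` on a schedule for each agency dataset DOI. The 10.5072 prefix used here is DataCite's test prefix, so a live query on it will not return a real record.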
Best Practice 7: Build Community Through Co-Development and Mutual Benefit. Effective data stewardship requires collaboration between biologists, Indigenous communities, managers, and data professionals. Participatory design ensures that systems and standards meet user needs and are adopted over time. Practical application: Facilitate cross-jurisdictional working groups to co-develop data standards and align on shared outcomes for priority datasets.
While the analytical contribution of the Peterman productivity dataset remains significant, the barriers encountered in compiling, interpreting, and maintaining the data are instructive. These challenges are not unique to Peterman’s team—they reflect systemic gaps in data governance, documentation, infrastructure, and incentives. By adopting the seven best practices detailed in Table 1, agencies and researchers can transform legacy datasets into living resources, enabling reproducibility, easing collaboration, and accelerating insight across the salmon research and management community.
Best Practice | Practical Applications |
---|---|
1. Make Data Governance Explicit to Support Trust and Reuse. Establishing clear governance structures ensures quality, accountability, and compliance with FAIR and CARE principles. It enables trust and long-term stewardship across multi-organizational projects. | - Document roles and responsibilities using a Data Product Governance Charter and structured frameworks (e.g., DACI or RACI). - Integrate CARE principles to respect Indigenous data rights. - Form a governance or oversight committee to review data standards, timelines, and agreements. |
2. Reuse Proven Infrastructure to Save Time and Increase Interoperability. Leveraging existing platforms and technologies reduces costs and improves long-term interoperability and sustainability. | - Use domain-specific repositories like OBIS or GBIF. - Publish and archive data with KNB or Zenodo. |
3. Make Data, People, Projects, and Outputs Discoverable, Linked and Citable with PIDs. Persistent identifiers (PIDs) connect data with researchers, institutions, and outputs—supporting data citation, reuse, and automated attribution. | - Encourage use of ORCID iDs for researchers. - Use ROR IDs for institutions. - Assign DOIs via DataCite for data packages. - Embed DOIs in dashboards and metadata. |
4. Use Shared Data Models, Ontologies and Metadata to Enable Integration. Common vocabularies, metadata standards, and ontologies support integration across systems and preserve semantic meaning. | - Adopt ISO 19115, EML, or DataCite metadata standards. - Reuse terms defined in the Salmon Domain Ontology. - Model datasets using the Darwin Core Data Package Model. |
5. Store and Analyze Data in Ways That Others Can Easily Access, Use and Trust. Structured, accessible data formats ease reuse and support integration with analytical tools and applications, while wrangling and analyzing data with scripts (R, Python, etc.) enables reproducibility and increases trust. | - Provide APIs using FastAPI, Flask, or Django REST. - Archive in trusted repositories (e.g., GBIF, FRDR, USGS). - Write scripts in a programming language to wrangle, transform, and analyze data. - Use GitHub to host code for collaboration and transparency, and use the GitHub-Zenodo integration for DOI assignment and preservation. |
6. Incentivize and Track Data Sharing and Reuse. Recognizing data contributors and tracking reuse promotes a culture of sharing and supports professional recognition. | - License data with CC-BY 4.0. - Include citation text and visible credit fields. - Use COUNTER metrics and DataCite APIs to monitor reuse. - Encourage dataset citation in references. |
7. Build Community Through Co-Development and Mutual Benefit. Engaging users early ensures tools and standards meet real-world needs and enhances long-term stewardship. | - Participate in RDA Salmon Interest Group. - Facilitate workshops for metadata and vocabulary alignment. - Support community-engaged research with tangible benefits. |
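Table 1's recommendation to wrangle data with scripts rather than by hand can start as simply as a version-controlled mapping from each source's columns to one shared schema, applied identically on every update. The sketch below assumes two hypothetical sources, `agency_a` and `agency_b`; their column names and the target schema are invented for illustration and do not reflect the formats any named agency actually uses.

```python
import csv
import io

# Hypothetical column mappings from two source files to one shared schema.
COLUMN_MAPS = {
    "agency_a": {"Yr": "year", "YearType": "year_type", "Stock": "stock", "Esc": "spawners"},
    "agency_b": {"brood_yr": "year", "population": "stock", "spawner_count": "spawners"},
}
# Values a source leaves implicit, documented once here rather than in someone's memory.
DEFAULTS = {"agency_b": {"year_type": "brood_year"}}
SHARED_FIELDS = ["source", "stock", "year", "year_type", "spawners"]


def harmonize(source: str, csv_text: str) -> list:
    """Rename columns, fill documented defaults, cast types, and tag provenance."""
    mapping = COLUMN_MAPS[source]
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        row = {mapping[col]: value for col, value in raw.items() if col in mapping}
        for field, value in DEFAULTS.get(source, {}).items():
            row.setdefault(field, value)
        row["year"] = int(row["year"])
        row["spawners"] = int(row["spawners"])
        row["source"] = source
        rows.append({field: row.get(field) for field in SHARED_FIELDS})
    return rows


a = "Yr,YearType,Stock,Esc\n2006,brood_year,Stock A,3100\n"
b = "brood_yr,population,spawner_count\n2006,Stock B,800\n"
combined = harmonize("agency_a", a) + harmonize("agency_b", b)
for row in combined:
    print(row)
```

Committed to GitHub and archived through the GitHub-Zenodo integration, a script like this becomes a citable record of every transformation, so the next team updating the dataset does not have to rediscover them.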
Conclusion
Salmon biologists and data stewards across the globe have generated extensive datasets on salmon abundance, environmental conditions, and biological characteristics. When integrated, these data become valuable assets, a fact powerfully demonstrated by studies such as Peterman and Dorner (2012). However, as noted by reports to the Cohen Commission (Marmorek et al. 2011), these data are often incomplete, inconsistently collected, and fragmented across institutions and jurisdictions. Integrating across such diverse sources can be done, but requires effort that is often not accounted for in smaller-scale studies. This fragmentation is a missed opportunity to deepen our understanding of the drivers of change across salmon life stages and regions, and limits the effectiveness of management decisions, particularly in the face of climate change and biodiversity loss.
But this limitation also reveals an opportunity. By adopting shared best practices in data governance, metadata standardization, persistent identification, infrastructure reuse, and community co-development, we can radically improve the transparency, reusability, and interoperability of salmon data. A coordinated, future-oriented data stewardship strategy can transform the role of salmon data in science and management. The case study presented in this paper, drawn from one of the Pacific Region’s most influential salmon productivity syntheses (Peterman and Dorner 2012), illustrates how technical and social data management gaps directly obstructed efforts to answer pressing questions. Had data producers adopted even some of the practices we propose, such as documenting their datasets more thoroughly, storing data in accessible formats, or using persistent identifiers, substantial time and resources could have been saved. The case offers both a cautionary tale and a hopeful roadmap.
The emergence of the data stewardship role (Plotkin 2014) represents one of the most critical institutional shifts needed to realize this vision. Historically, the work of managing, documenting, and maintaining data has been diffuse and undervalued—often falling to biologists without support, training, or recognition. As the volume and complexity of scientific data grow, so too does the need for clearly defined data stewardship responsibilities embedded within research teams and organizations. Training biologists in the principles and practices of data stewardship—while also supporting dedicated professionals who specialize in this work—is essential to sustaining trustworthy, reusable, and interoperable salmon data systems.
The future we envision is one in which salmon researchers and stewards across agencies, Indigenous Nations, academic labs, and community groups can easily access and contribute to well-documented, versioned, and machine-readable datasets. In this future, field biologists, Indigenous guardians, modelers, and policymakers interact with a living knowledge system that is flexible, easy to implement, and rooted in the FAIR principles and Indigenous Data Sovereignty. Metadata standards, controlled vocabularies, and shared governance frameworks are not afterthoughts but integral to the culture of data collection and use. Scientists receive credit for publishing high-quality data, and users trust the provenance and structure of the datasets they rely on to make critical management decisions.
Realizing this vision will require investment in both people and systems. Key to this transformation is the emergence of the data steward as a professional role: a hybrid expert who understands operational field biology, information science, governance protocols, and community needs. As highlighted by Roche et al. (2020), institutionalizing data stewardship roles ensures long-term capacity for data governance, quality control, and interoperability—functions that are often neglected or left to informal actors. We must not only train new data stewards but also support and upskill biologists to take on stewardship responsibilities in collaborative, interdisciplinary settings. This is essential to address the “technical debt” of unmanaged data and to modernize research practices in line with open science norms. By embedding these best practices into the everyday work of data generation, documentation, publication, and reuse, we can move salmon science decisively into the era of data-intensive discovery.