Introduction

Warning

🚧 This section is still in active development and is subject to changes 🚧

Seedcase was born out of a desire to bring modern data engineering practices, technologies, and processes to research environments. Technological and computing skills and knowledge within health research (at least in our experience) often lag far behind the current state of the art. The connected design documents describe the architecture of Seedcase software as well as the underlying data architecture that Seedcase software implements as a framework. Our specific goals for the Seedcase Project are:

The framework acts as an (open source) starting template for setting up data infrastructures that follow the principles and requirements listed below. Framework here is defined as: 1) a bundled or recommended set of software programs; 2) a defined and standardized set of conventions on the structure and format of the filesystem, URL paths, and API calls; and 3) a defined and standardized structure for the data and associated documentation, all of which are linked together as modular components.

The community-based aims are covered in more detail on our Community site page. While these aims are not related to the software itself, they do affect the architectural and design decisions we make.

Requirements Overview

Primary goals of Seedcase software are:

  1. Ingesting data and metadata into standardized storage: Take generated data from source locations (such as clinics or laboratories) that may be distributed geographically or organizationally and store them in a secure, centralized location in a standardized and efficient format to form a Data Resource. Metadata are similarly stored and linked to the data.
  2. Showcasing data and metadata for discoverability and data access requests: Take stored metadata as well as some basic descriptive information about the data inside the Data Resource and display them on a web portal as a list. Users can select specific data they are interested in from the list and submit a request to use these data. These requests for data are saved and stored in the backend.
  3. Tracking data projects using the data for managing and auditing: The stored requests for data are displayed for admin users to approve or reject. Approved requests are converted to ongoing data projects, which are displayed on the web portal.
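
The request and approval flow in goals 2 and 3 can be sketched as a small state machine. This is only an illustrative sketch, not Seedcase's actual design: the names Portal, DataRequest, DataProject, and RequestStatus are hypothetical, and the in-memory lists are an assumed stand-in for whatever backend storage is eventually chosen.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class RequestStatus(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class DataRequest:
    # A request by one user for a selection of data in the Data Resource.
    requester: str
    selected_data: List[str]
    status: RequestStatus = RequestStatus.PENDING


@dataclass
class DataProject:
    # An ongoing project created from an approved request.
    requester: str
    data: List[str]


@dataclass
class Portal:
    # Hypothetical in-memory stand-in for the backend storage.
    requests: List[DataRequest] = field(default_factory=list)
    projects: List[DataProject] = field(default_factory=list)

    def submit(self, requester: str, selected_data: List[str]) -> DataRequest:
        """Save a new request so an admin user can review it later."""
        request = DataRequest(requester, list(selected_data))
        self.requests.append(request)
        return request

    def review(self, request: DataRequest, approve: bool) -> Optional[DataProject]:
        """Approve or reject a pending request; approval creates a project."""
        if request.status is not RequestStatus.PENDING:
            raise ValueError("request has already been reviewed")
        if not approve:
            request.status = RequestStatus.REJECTED
            return None
        request.status = RequestStatus.APPROVED
        project = DataProject(request.requester, list(request.selected_data))
        self.projects.append(project)
        return project
```

In this sketch every request is kept in the stored list regardless of outcome, which is one simple way to support the auditing mentioned in goal 3; only approved requests produce an entry in the projects list.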

The software architecture documentation largely covers the software and data components of the Seedcase Project. However, because of our secondary aims (to create a framework that other research groups and companies can relatively easily implement and modify for their own purposes), we also need to describe additional requirements that aren’t strictly technical. We’ll cover some Guiding Principles we need to follow for the Seedcase Project, describe our target users and use cases, and then list the main functionalities of the Seedcase framework.

Users

The Seedcase framework is designed and developed with four users and their associated use cases in mind. We’ve translated these four users into user roles, which are described in more detail on the User Requirements page:

  1. Data Contributor: inputs data to the Data Resource, e.g., authorized centers and researchers.
  2. Data Requester: requests access to a subset of data in the Data Resource, e.g., researchers and clinicians.
  3. Administrator: manages the Data Resource. Data controllers, as defined by the GDPR, fall into this category.
  4. External User: views updates on findings and results from the Data Resource such as aggregate statistics, e.g., policymakers, healthcare workers, journalists, researchers, and the general public.
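
One way to picture these four roles is as a mapping from role to permitted actions. The sketch below is purely illustrative: the Permission names, the ROLE_PERMISSIONS mapping, and the is_allowed helper are hypothetical, and the assumption that every role may also view public results is ours, not something stated on the User Requirements page.

```python
from enum import Enum, Flag, auto


class Permission(Flag):
    # Hypothetical actions derived from the role descriptions above.
    CONTRIBUTE_DATA = auto()
    REQUEST_DATA = auto()
    MANAGE_RESOURCE = auto()
    VIEW_PUBLIC_RESULTS = auto()


class Role(Enum):
    DATA_CONTRIBUTOR = "Data Contributor"
    DATA_REQUESTER = "Data Requester"
    ADMINISTRATOR = "Administrator"
    EXTERNAL_USER = "External User"


# Assumption: every role can see public, aggregate results.
ROLE_PERMISSIONS = {
    Role.DATA_CONTRIBUTOR: Permission.CONTRIBUTE_DATA | Permission.VIEW_PUBLIC_RESULTS,
    Role.DATA_REQUESTER: Permission.REQUEST_DATA | Permission.VIEW_PUBLIC_RESULTS,
    Role.ADMINISTRATOR: Permission.MANAGE_RESOURCE | Permission.VIEW_PUBLIC_RESULTS,
    Role.EXTERNAL_USER: Permission.VIEW_PUBLIC_RESULTS,
}


def is_allowed(role: Role, permission: Permission) -> bool:
    """Return True if the given role holds the given permission."""
    return permission in ROLE_PERMISSIONS[role]
```

For example, is_allowed(Role.EXTERNAL_USER, Permission.REQUEST_DATA) is False under these assumed permissions, while is_allowed(Role.DATA_REQUESTER, Permission.REQUEST_DATA) is True.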

Quality Goals

Please see the Quality requirements section for our thoughts on this.

Stakeholders

This section contains an explicit list of those who will need, use, or review the design documentation. This is the “audience” or readers of these documents. Specifically, these are the people who:

  • Should know the architecture
  • Have to be convinced of the architecture
  • Have to work with the architecture or the code
  • Need the documentation of the architecture for their work
  • Have to come up with decisions about the system or its development

Currently, we’ve identified these stakeholders along with some expectations we believe they have for these documents.

  • Developers/contributors: Either core developers or external contributors, who use these documents to better understand what needs to be done and why things were done the way they were. They will read them for both general and technical reasons.

  • Key partners (DD2, SDCA): They were co-applicants for the funding and will be the first users (and testers) of Seedcase software. They will likely review the overview sections and higher-level diagrams for general understanding, rather than for technical purposes.

  • External reviewers: Part of our plan is to send these documents to a software consultancy firm for critical review and feedback. This will help ensure that we do not miss anything important. These reviewers will read the documents critically from a technical perspective.

  • Funder (Novo Nordisk Foundation): We need to send progress reports to our funders. They will likely use these documents for record-keeping, accountability, and auditing purposes.

  • Data owners/generators (users of the software): These may include study coordinators or principal investigators of studies who will generate or have generated data and who are looking for solutions to store and share their data. They will likely read these documents at a high level, focusing on the overview sections and diagrams, to assess whether the software fits their needs.

  • Interested researchers: Researchers whose duties include research software or data engineering may be interested in the technical side of the Seedcase Project, either to become contributors or to modify the software for their own purposes.

  • Institutional admin/IT: Building Seedcase software will not be enough to get researchers using it. We also have to ensure that institutional IT is on board and that Seedcase software addresses any legal, privacy, and security concerns they may have.