Introduction

Notice: This documentation is being updated; some of the information may be out of date or incorrect.

The FAIR Data Pipeline is intended to enable tracking of the provenance of FAIR (findable, accessible, interoperable and reusable) data used in epidemiological modelling. Pipeline APIs written in C, C++, FORTRAN, Java, Julia, Python and R can be called by modelling software for data ingestion. These APIs interact with a local relational database that stores metadata and with the local filesystem, and are configured using a YAML file associated with the model run. Local files and metadata can be synchronised with a remote registry via a command line interface (fair).

The key benefits of using the FAIR Data Pipeline are:

  • Open source: all code is available on the FAIRDataPipeline GitHub
  • Data recorded in a FAIR fashion (metadata on all data and code open and available for inspection)
  • Provenance tracing allows model outputs to be traced to inputs and modelling code
  • Multiple language support
  • Designed to run on a broad range of platforms (including HPC, inside Safe Havens)
  • Designed to be set up and completed online (to download and upload data) and run offline (Safe Havens will require this)
  • Open metadata provides knowledge of or access to shared central data for specific domains (e.g. COVID-19 epidemiological modelling)

Running Models

To use the FAIR Data Pipeline with a piece of modelling software, you must add a language-specific Pipeline API as a dependency and interact with data registered in the pipeline via the methods it presents. Each model run must be configured using a config.yml file that specifies the run's inputs and outputs by their metadata.

[Diagram: a model run on localhost. Components: Model code, Pipeline API, config.yml, fair (CLI), Local API, Registry and File Store. Arrow labels: read/write/link_*, read/link_*, write_* (from link_write) and read_* (from link_read).]
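
The interaction described above can be sketched in Python. This is a toy model under stated assumptions, not the real Pipeline API: the class `ToyPipeline`, its `finalise` method, the `CONFIG` dictionary (standing in for a parsed config.yml) and the data product names are all hypothetical. Only the names link_read and link_write come from the diagram labels, and their behaviour here (returning a file path and recording the access) is an assumption made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path

# Hypothetical stand-in for a parsed config.yml. The "read"/"write" keys and
# the data product names are illustrative assumptions, not the real schema.
CONFIG = {
    "read": [{"data_product": "example/input"}],
    "write": [{"data_product": "example/output"}],
}


class ToyPipeline:
    """Toy model of a Pipeline API handle (hypothetical, not the real API)."""

    def __init__(self, config, store_dir):
        self.allowed_reads = {e["data_product"] for e in config["read"]}
        self.allowed_writes = {e["data_product"] for e in config["write"]}
        self.store = Path(store_dir)  # plays the role of the File Store
        self.registry = {}            # plays the role of the local Registry
        self.inputs = []
        self.outputs = []

    def _path(self, data_product):
        # Map a data product name to a file inside the toy file store.
        return self.store / (data_product.replace("/", "_") + ".txt")

    def link_read(self, data_product):
        """Return a path to read from and record the product as an input."""
        if data_product not in self.allowed_reads:
            raise KeyError(f"{data_product} is not listed under 'read'")
        self.inputs.append(data_product)
        return self._path(data_product)

    def link_write(self, data_product):
        """Return a path to write to and record the product as an output."""
        if data_product not in self.allowed_writes:
            raise KeyError(f"{data_product} is not listed under 'write'")
        self.outputs.append(data_product)
        return self._path(data_product)

    def finalise(self):
        """Register each output with a content hash and the inputs used."""
        for name in self.outputs:
            digest = hashlib.sha256(self._path(name).read_bytes()).hexdigest()
            self.registry[name] = {"sha256": digest, "inputs": list(self.inputs)}
        return self.registry


def run_model(store_dir):
    """A trivial 'model': sum the numbers in the input, write the total."""
    pipeline = ToyPipeline(CONFIG, store_dir)
    src = pipeline.link_read("example/input")
    dst = pipeline.link_write("example/output")
    values = [float(x) for x in src.read_text().split()]
    dst.write_text(str(sum(values)))
    return pipeline.finalise()


def demo():
    """Create a temporary file store holding one input, then run the model."""
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "example_input.txt").write_text("1 2 3")
        return run_model(tmp)
```

The point of the sketch is that the model code never chooses file locations itself: it names data products declared in the configuration, and the recorded inputs and output hashes are what would let an output be traced back to the data that produced it.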

Getting data

The command line interface, fair, is used to download the data and metadata required for model runs and to upload the data and metadata they produce.

[Diagram: data exchange between Local (Local Registry, Local Filesystem) and Remote (Remote Registry, Managed Object Store, Arbitrary URI). Arrows are labelled fair pull and fair push.]
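
As a rough illustration of the pull and push directions, the following Python sketch treats the local and remote registries as plain dictionaries. It is a toy model and says nothing about how fair is implemented: the functions `pull` and `push`, the dictionary layout and the entry names are all hypothetical.

```python
def pull(local, remote, names):
    """Copy the named entries from remote to local; return the names copied."""
    copied = []
    for name in names:
        if name not in remote:
            raise KeyError(f"{name} is not in the remote registry")
        if local.get(name) != remote[name]:
            local[name] = remote[name]
            copied.append(name)
    return copied


def push(local, remote):
    """Copy entries the remote lacks from local; return the names copied."""
    new_names = [name for name in local if name not in remote]
    for name in new_names:
        remote[name] = local[name]
    return new_names


# Hypothetical registries: the remote holds an input, the local starts empty.
remote_registry = {"example/input": {"sha256": "abc"}}
local_registry = {}

pulled = pull(local_registry, remote_registry, ["example/input"])
# A model run would happen here, offline, and register its output locally.
local_registry["example/output"] = {"sha256": "def"}
pushed = push(local_registry, remote_registry)
```

The sequence mirrors the workflow described above: inputs are fetched while online, the model can then run offline against the local registry, and the outputs are uploaded afterwards.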