Reference Data in Pharma

Written by Heiner Oberkampf | Jan 18, 2021 8:36:00 AM

Reference data in the pharma and life science industries is created in many places and within various functions, both inside and outside of enterprise boundaries. The distributed nature of reference data management often leads to information redundancy, low findability, additional integration and mapping efforts, or even data incompatibility issues. Mitigating these issues is critical for FAIR data management. In this article, we describe why reference data management often cannot be centralized and how easy-to-use global lookup services effectively increase findability, reuse, and interoperability.

Download the complete whitepaper Managing Data as a Product [1] published with Malcolm Chisholm and Christian Senger.

Reference data is managed in many distributed sources.

Reference data in pharma is created in many places and within various functions. The creation of information happens across institutions and along the translational value chain, going through early research, development, clinical studies, regulatory processes, production, marketing, and finally, the observation of evidence in the real world. Reference data in early research is usually based on public literature and information created by scientific institutions. Partnering entities, such as Contract Research Organizations (CROs), are involved in early research and clinical studies. As part of the process, clinical study and regulatory affairs data are legally required to be based on reference data from authorities (such as the EMA and FDA). Ultimately, the foundational workflows that eventually lead to a life science product involve the collection and reuse of data by many different systems.

The functions that fulfill specific tasks along these workflows often ignore the need for using particular reference data.

Depending on the function, the same or similar reference data points (such as a new indication) might be created in a strictly regulated environment as well as in settings where IDs and labels can be freely chosen. Thus, each function must recognize that it handles shared and distributed reference data. This involves opening up to share labels and codes with other systems, or ingesting and reusing codes from different functions and even from external organizations.

Although functions handle the same kinds of data (for example, records on regions and countries, species, indications, and drugs), this information is used in different contexts and often requires different levels of detail or granularity. Thus, data entities are viewed from different perspectives and have different roles, depending on the function and workflow, which in turn requires different attributes and metadata. In this context, data producers and users must align on the workflows for creating data, and on the data use conditions, along the value chain.

It is vital to bridge data silos between the different functions, minimizing costs and efforts when using interfaces to transfer data between functions.

Satisfying the demands of regulatory authorities is another chief concern, which may involve multiple functions in the product’s value chain.

3 Main Reference Data Challenges in Pharma

Missing Awareness and Redundancies

Internal and external functions are often unaware of existing reference data and reference data standards. This lack of awareness leads to reference data being reinvented, which in turn unintentionally creates data silos. Where multiple standards are required, alignment is necessary to enable integration and translation (mapping) between reference data sources. In the worst case, functions are aware of other systems but claim to provide the authoritative reference data while ignoring other data or standards. This happens between the different functions inside the enterprise, such as R&D, production, Regulatory Affairs, and Real World Evidence, as well as in the work with CROs. Redundancies also cannot be avoided if, for example, a proprietary source system in the lab has preconfigured non-standard units of measure. In that case, measurement attributes and their corresponding units need to be harmonized.
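
As a rough illustration of what harmonizing units from a proprietary source system can look like, the following minimal sketch maps source labels to one canonical unit and rescales the value. The unit labels, the choice of "mg" as the canonical unit, and the names UNIT_MAP and harmonize are assumptions made for this example; they do not describe any particular lab system or product.

```python
from decimal import Decimal

# Hypothetical mapping from source-system unit labels to
# (canonical unit, factor that converts the value into that unit).
# Labels are matched exactly because unit codes can be case-sensitive.
UNIT_MAP = {
    "mg": ("mg", Decimal("1")),
    "MG": ("mg", Decimal("1")),         # assumed non-standard instrument label
    "milligram": ("mg", Decimal("1")),
    "g": ("mg", Decimal("1000")),
    "ug": ("mg", Decimal("0.001")),
    "mcg": ("mg", Decimal("0.001")),
}


def harmonize(value, unit_label):
    """Return (value, unit) expressed in the canonical unit.

    Unknown labels raise KeyError so that they can be routed to a
    data steward instead of being guessed.
    """
    key = unit_label.strip()
    if key not in UNIT_MAP:
        raise KeyError(f"No mapping for unit label {unit_label!r}")
    canonical, factor = UNIT_MAP[key]
    # Decimal avoids binary floating-point rounding in the conversion.
    return Decimal(str(value)) * factor, canonical


measurements = [("250", "mcg"), ("5", "g"), ("12.5", "MG")]
harmonized = [harmonize(value, unit) for value, unit in measurements]
```

Keeping the mapping as data rather than logic means a steward can review and extend it without touching the conversion code.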

External Authorities

Authorities require particularly detailed information from functions, which is often unavailable because reference data is not shared and aligned. Thus, such demands may result in massive data wrangling and integration efforts along the value chain. The diversity of regulatory requirements in different regions and product categories adds to the complexity of this scenario.

Workflow Integration

Since many different functions need to adapt, extend, and enhance information, it is crucial to align and maintain reference data in a way that enables the subsequent orchestration of data distribution and sharing workflows. This must happen without impeding processes that also require the reference data in other functions.

Similarly, master data entities, such as “product” or “study”, are managed in a distributed manner, and this will not change. This is because people work in their specific business applications when capturing or processing data. In this case, switching to another system or requesting the creation of a new master entity through a managed service is often not feasible for business users. Moreover, such a system is often unavailable to them, or they do not have access to it.

How does Accurids support distributed reference data management?

Nowadays, many enterprise data strategies aim to centralize reference and master data management to regain control. Though there are legitimate reasons and interests for centralization, Accurids is built on the premise that centralized management alone is too slow to accommodate the fast-changing IT and data landscape. Purely centralized approaches fail to support the perspectives of different business units and the corresponding incompatibilities in large organizations. Nevertheless, both aspects are increasingly important in an era of digitalization and increased need for collaboration across enterprise boundaries.

Accurids is a registry for distributed reference and master data. For data stewards and data consumers, it serves as a discovery solution and provides reliable access for anyone in your organization.

The key functions of Accurids are:

Global lookup service that allows searching for any term or code used within the enterprise. This allows data stewards and business users to find the preferred terminologies for a given domain, encouraging reuse instead of re-creation.
Persistent Identifiers can be generated to provide long-term stable and resolvable IDs that function as data references that users and applications can rely on – even when the location where the reference data is managed changes. Persistent IDs are required for FAIR data [2].
Matching reference data is essential for aligning existing terminologies that are already in use. Accurids automatically generates matching proposals, which data stewards need to review and approve before the mappings become available.
Public standard terminologies, such as the Gene Ontology [3] or the NCBI taxonomy [4], can be imported into Accurids with one click so that all internal consumers use the same up-to-date and validated version.
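
To make the lookup, persistent-identifier, and matching ideas above more concrete, here is a minimal in-memory sketch. It is not the Accurids API: the Registry class, its method names, the source system names, the use of UUID-based URNs as persistent IDs, and the label-similarity threshold are all assumptions chosen for illustration.

```python
import difflib
import uuid
from dataclasses import dataclass, field


@dataclass
class Term:
    label: str   # label as used in the source system
    code: str    # local code in the source system
    source: str  # hypothetical name of the managing system
    # Opaque persistent ID, independent of where the term is managed.
    pid: str = field(default_factory=lambda: f"urn:uuid:{uuid.uuid4()}")


class Registry:
    """Toy registry for distributed reference data (illustration only)."""

    def __init__(self):
        self.terms = {}        # pid -> Term
        self.approved = set()  # approved mappings, each a frozenset of two pids

    def register(self, label, code, source):
        term = Term(label, code, source)
        self.terms[term.pid] = term
        return term.pid

    def lookup(self, query):
        """Global lookup: label substring or exact code, case-insensitive."""
        q = query.lower()
        return [t for t in self.terms.values()
                if q in t.label.lower() or q == t.code.lower()]

    def resolve(self, pid):
        return self.terms[pid]

    def relocate(self, pid, new_source):
        """The managing system changes; the persistent ID does not."""
        self.terms[pid].source = new_source

    def propose_matches(self, threshold=0.85):
        """Propose mappings between similarly labeled terms from different sources."""
        terms = list(self.terms.values())
        proposals = []
        for i, a in enumerate(terms):
            for b in terms[i + 1:]:
                if a.source == b.source:
                    continue
                score = difflib.SequenceMatcher(
                    None, a.label.lower(), b.label.lower()).ratio()
                if score >= threshold:
                    proposals.append((a.pid, b.pid, round(score, 2)))
        return proposals

    def approve(self, pid_a, pid_b):
        """A data steward confirms a proposal; only then does it count as a mapping."""
        self.approved.add(frozenset((pid_a, pid_b)))


registry = Registry()
research = registry.register("Homo sapiens", "9606", "research-lims")
clinical = registry.register("homo sapiens", "HUMAN", "clinical-edc")
mouse = registry.register("Mus musculus", "10090", "research-lims")

hits = registry.lookup("sapiens")        # finds both human entries
proposals = registry.propose_matches()   # suggestions awaiting steward review
pending = len(registry.approved)         # nothing is mapped before approval
registry.approve(research, clinical)
registry.relocate(research, "central-mdm")
```

The split between propose_matches and approve mirrors the review step described above: automatically generated proposals stay suggestions until a steward accepts them. Simple string similarity is only a stand-in here; real matching would likely also need synonyms, codes, and context.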

Download the full whitepaper to learn more about how Accurids helps solve your reference data challenges.

You can schedule a demo to learn more about your individual use cases and the benefits your corporation can realize through the simple implementation of Accurids.
