Data Scientists in Pharma

Data Science PSI
Dec 15, 2021

Written by Jennifer Bradford (Phastar) and Julia Chernova (Bayer)

In spring/summer 2021, the Data Science (DS) SIG ran a survey to learn more about data scientists in pharma, with the aim of finding out how we can help our data science and stats communities. Below, we summarise the survey results (July 2021, 66 respondents). For now, we will not cover exactly what DS is, but focus on the individuals involved in DS activities: who they are, their skill sets and where they come from.

Many thanks to all who contributed!

To read the main messages, follow the text in bold.

Naturally, as our SIG is part of EFSPI, many responses came from professionals based in Europe (77%) and working in the big pharma sector (90%).

What were the background characteristics of a typical respondent? A PhD (58%) or MSc (38%) graduate, most often with a degree in Stats (54%), Maths (21%) or Computer Science (10%), sometimes with multiple qualifications, and highly confident in their DS skills: 41% reported an advanced and 32% an intermediate level.

Professionals working on DS activities come from a variety of backgrounds, often using vocabularies and approaches different from those traditionally used by statisticians. This highlights a need to increase communication flow within and between the DS and Stats communities.

The most prevalent activities were visualisation and model-based analysis (both >70%), predictive modelling (machine learning) and data engineering/manipulation (both almost 60%), and app and tool development (35%).

The range of activities covers many areas of DS beyond the predictive modelling work that is often associated with it. In fact, visualisation and data engineering make up a large share of the work.

The figures show the importance of creating DS teams with complementary skills, as DS project activities are too numerous to be covered by a single professional, or a group of professionals, with a narrow skill set.

The survey results reveal that the most utilised software and tools were R/RStudio (90%), Python (50%), R Markdown (55%) and cloud computing (45%).

Respondents reported working on projects from various functions (listed from most to least prevalent): phase 1–3 clinical trials, research, real world evidence/safety/medical affairs, preclinical, operations, diagnostics, HTA/marketing and manufacturing.

While many follow internal quality control processes of varying rigour, worryingly, some (10%) shared that their outputs don’t go through any verification process.

Interestingly, 22% described having too much process and 12% too little.

This shows that data scientists work in various divisions outside of clinical trials. Often, outside the immediate clinical trials environment, a role involving any sort of statistical analysis now comes with a data scientist title. Additionally, the above highlights divisional differences in processes and a need for best practice recommendations that are available across functions.

The major challenges related to DS activities that you shared with us included: no clear vision and strategy at company and functional levels, limited access to the right data, unrealistic expectations, cross-functional collaboration and knowledge exchange, integration of colleagues with various backgrounds, and team structure/skills resources.

Return on investment in DS activities and the workforce will increase with a clear strategy in place.

Despite the current limitations, the majority of respondents believe they bring value to their stakeholders and hope for career progression within their companies.

Based on the above, our SIG identified three top priorities: improving communication between data scientists and other stakeholders, contributing to the development of best practices, and sharing practical cases.

To follow these priorities, at PSI 2021 we talked about “dos” and “don’ts” cases in programming practice and data analysis, shared the detailed survey results and presented a case study of the Novartis organisational initiative in Data Science.

We hope to keep bringing you interesting topics for discussion, and we value your comments. Please get in touch!

Written by Data Science PSI

We are a group of statisticians, computer scientists and data scientists working in Data Science functions across the Pharmaceutical Industry.