Senior Scientific Data Engineer, Data Platform

London - UK

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

Your work will change lives. Including your own.

Recursion is decoding biology to industrialize drug discovery. We are looking for a Senior Scientific Data Engineer. As part of a team, you will own a suite of business-critical data products, including our Structure-Activity Relationship data mart.

This is a high-impact role requiring a strong synthesis of robust software engineering capabilities and deep drug discovery domain expertise. You will take ownership of the data architecture responsible for ingesting, standardizing, and serving both public and proprietary datasets. These systems directly power our competitor intelligence, chemical tractability assessments, and compound design models.

Please note: This is a specialized Data Engineering position focused strictly on data infrastructure and product ownership. While your work will directly enable our machine learning and predictive modeling efforts, the responsibilities do not encompass building or training models. This opportunity is ideally suited for engineers dedicated to architecting complex scientific data systems rather than data scientists seeking modeling-focused roles.

The Systems You Will Own

You will join the Data Platform team and maintain an ecosystem of 100 ingested datasets while taking specific ownership of high-value products including:

Flagship SAR Data Mart: A unified bioactivity warehouse merging commercial and public (e.g. ChEMBL) databases with internal assay data.
Commercial Vendor Data Mart: A massive catalog of purchasable compounds used to guide our internal compound design tools and tractability assessments.
Biomedical Knowledge Graph: The critical data feeds and infrastructure that power our semantic graph and associated AI agents, linking targets, diseases, and compounds.
Chemical Synthesis Data: The foundational dataset of chemical reactions used for training retrosynthesis models and tractability prediction.
Patent Intelligence System: A pipeline transforming patent feeds and competitor data into actionable intelligence.
Compound Standardization Registry: A large-scale chemical structure warehouse ensuring consistency across billions of compounds (similar to UniChem).

What You'll Do

Pipeline Ownership at Scale: Act as a key owner for our core bioactivity pipeline, processing 75M records and managing 100 distinct data feeds. You will navigate complex logic and orchestration, including managing 4,000 lines of complex SQL with 20 transformation steps.
Scientific Data Standardization: Resolve ambiguity by reconciling heterogeneous data formats from diverse commercial and public sources. You will design and implement logic to standardize chemical structures (SMILES, InChI, tautomers), biological targets (UniProt mapping, gene families, species homology), and assay data (IC50/Ki normalization, unit conversion).
Engineer for Distributed Compute: Optimize tasks using Python and Snowpark for heavy-lifting operations such as large-scale text mining (extracting dose/concentration from unstructured text) and molecular property calculation.
Drive Data Quality: Implement rigorous data quality frameworks (DQF) to handle the nuance of biological data, ensuring our downstream models are trained on clean, semantic-aware data.
Cross-Functional Consulting: Interface directly with discovery scientists to understand their diverse data needs and translate complex scientific requirements into robust engineering solutions.
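
As a rough illustration of the assay-data standardization mentioned above (IC50/Ki normalization and unit conversion), the sketch below converts potency values reported in mixed concentration units to nanomolar and to a negative-log scale (pXC50). It is not Recursion's pipeline: the unit table and the function names `to_nanomolar` and `to_pxc50` are hypothetical, and real feeds would need many more unit spellings and censored-value handling.

```python
import math

# Hypothetical lookup: power of ten relating each unit to molar (M).
# Real vendor feeds use many more spellings; this list is illustrative only.
UNIT_EXPONENT = {
    "M": 0,
    "mM": -3,
    "uM": -6,
    "µM": -6,
    "nM": -9,
    "pM": -12,
}


def to_nanomolar(value: float, unit: str) -> float:
    """Convert a concentration to nanomolar, rejecting unknown units."""
    unit = unit.strip()
    if unit not in UNIT_EXPONENT:
        raise ValueError(f"unrecognized concentration unit: {unit!r}")
    if value <= 0:
        raise ValueError("concentration must be positive")
    # Shift the exponent by +9 because 1 M equals 1e9 nM.
    return value * 10 ** (UNIT_EXPONENT[unit] + 9)


def to_pxc50(value: float, unit: str) -> float:
    """Return -log10 of the molar concentration (e.g. pIC50 for an IC50)."""
    nanomolar = to_nanomolar(value, unit)
    # -log10(nM * 1e-9) simplifies to 9 - log10(nM).
    return 9 - math.log10(nanomolar)


if __name__ == "__main__":
    for value, unit in [(1.5, "uM"), (10, "nM"), (500, "pM")]:
        print(value, unit, to_nanomolar(value, unit), round(to_pxc50(value, unit), 2))
```

Putting every record on one unit and one log scale before merging sources makes values from different vendors directly comparable, and failing loudly on unknown units keeps ambiguous records out of downstream tables.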

The Experience You'll Need

Core Engineering:

Advanced SQL & Warehousing: Deep expertise in modern cloud data warehousing (e.g. Snowflake, BigQuery). You should be comfortable with complex window functions, CTEs, and schema design for multi-layer environments.
Python & Distributed Compute: Strong proficiency in Python for data processing. Experience with data warehouses is a huge plus, but general distributed processing experience is also valuable.
Orchestration: Experience managing complex DAGs and asynchronous task coordination (e.g. Prefect, Argo Workflows).

Domain Expertise:

Medicinal Chemistry Context: You understand how chemistry is represented in data (SMILES, scaffolds) and the nuance of bioactivity measurements (potency vs. efficacy, IC50 vs. pXC50).
Biological Context: Familiarity with gene/protein families, species homology, and target nomenclature (e.g. how similar genes appear in different species).
Assay Knowledge: Ability to distinguish between assay types (e.g. binding, functional), formats, and the units/measurements associated with them. Ideally familiar with ontologies (e.g. BioAssay Ontology, cell line taxonomies).
Data Landscape: Knowledge about public drug discovery datasets and how they can be used to support the drug discovery pipeline.

Nice-to-Haves:

Experience with chemical toolkits (e.g. OpenEye or RDKit).
Experience using text mining or LLMs for structured data extraction from scientific text.

Working Location & Compensation:

This position can be based at either our London or Milton Park office. Please note that we are a hybrid environment and ask that employees spend 50% of their time in the office.

At Recursion, we believe that every employee should be compensated fairly. Based on the skill and level of experience required, the estimated current annual base range for this role is 75,900 - 101,900. You will also be eligible for an annual bonus and equity compensation, as well as a comprehensive benefits package.

The Values We Hope You Share:

We act boldly with integrity. We are unconstrained in our thinking, take calculated risks, and push boundaries, but never at the expense of ethics, science, or trust.
We care deeply and engage directly. Caring means holding a deep sense of responsibility and respect - showing up, speaking honestly, and taking action.
We learn actively and adapt rapidly. Progress comes from doing. We experiment, test, and refine, embracing iteration over perfection.
We move with urgency because patients are waiting. Speed isn't about rushing but about moving the needle every day.
We take ownership and accountability. Through ownership and accountability, we enable trust and autonomy: leaders take accountability for decisive action, and teams own outcomes together.
We are One Recursion. True cross-functional collaboration is about trust, clarity, humility, and impact. Through sharing, we can be greater than the sum of our individual capabilities.

Our values underpin the employee experience at Recursion. They are the character and personality of the company, demonstrated through how we communicate, support one another, spend our time, make decisions, and celebrate collectively.

More About Recursion

Recursion (NASDAQ: RXRX) is a clinical-stage TechBio company leading the space by decoding biology to radically improve lives. Enabling its mission is the Recursion OS, a platform built across diverse technologies that continuously generate one of the world's largest proprietary biological and chemical datasets. Recursion leverages sophisticated machine-learning algorithms to distill from its dataset a collection of trillions of searchable relationships across biology and chemistry, unconstrained by human bias. By commanding massive experimental scale (up to millions of wet lab experiments weekly) and massive computational scale (owning and operating one of the most powerful supercomputers in the world), Recursion is uniting technology, biology, and chemistry to advance the future of medicine.

Recursion is headquartered in Salt Lake City, where it is a founding member of BioHive, the Utah life sciences industry collective. Recursion also has offices in Toronto, Montréal, New York, London, Oxford area, and the San Francisco Bay area. Learn more at or connect on X (formerly Twitter) and LinkedIn.

Recursion is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, veteran status, or any other characteristic protected under applicable federal, state, local, or provincial human rights legislation.

Accommodations are available on request for candidates taking part in all aspects of the selection process.

Recruitment & Staffing Agencies: Recursion Pharmaceuticals and its affiliate companies do not accept resumes from any source other than candidates. The submission of resumes by recruitment or staffing agencies to Recursion or its employees is strictly prohibited unless contacted directly by Recursion's internal Talent Acquisition team. Any resume submitted by an agency in the absence of a signed agreement will automatically become the property of Recursion, and Recursion will not owe any referral or other fees. Our team will communicate directly with candidates who are not represented by an agent or intermediary unless otherwise agreed to prior to interviewing for the job.

Required Experience:

Senior IC

Key Skills

Apache Hive
S3
Hadoop
Redshift
Spark
AWS
Apache Pig
NoSQL
Big Data
Data Warehouse
Kafka
Scala

About Company

Recursion

Dive into Recursion's innovative approach to decoding biology. Join our mission, explore the future of TechBio, and be part of the revolution. Discover more!