
Principal Data Engineer

Generative Group
City of London
2 weeks ago
Overview

Our client, a well-funded stealth-mode startup in the life sciences industry, is seeking a Principal Data Engineer to lead the data and infrastructure systems powering its foundation model for drug development.

Responsibilities
  • Lead data and infrastructure systems powering foundation model initiatives in drug development.
  • Own data workflows end-to-end, from extraction and transformation to clean Parquet outputs for machine learning teams.
  • Collaborate closely with wet lab teams, drawing on a practical understanding of assays and protocol development.
  • Set up cloud data infrastructure from scratch, including compute, storage, networking, and access controls.
  • Build reliable, repeatable pipelines with testing, version control, and clear documentation.
  • Maintain data quality, lineage, and monitoring; implement sound data modeling practices.
Qualifications (Requirements)
  • Principal-level data engineering experience in life sciences is essential.
  • End-to-end ownership of data workflows, from extraction to machine-learning-ready outputs (Parquet).
  • Hands-on familiarity with genomics data, including raw FASTQ files and Illumina sequencer outputs.
  • Experience with metabolomics data, particularly untargeted mass spectrometry.
  • A track record of strong collaboration with wet lab teams and a practical understanding of assays and protocol development.
  • Experience building cloud data infrastructure from scratch (compute, storage, networking, access controls).
  • Strong Python and SQL skills; proficient in data modeling, data quality, lineage, and monitoring.
  • Ability to design and maintain reliable pipelines with testing and documentation.
Preferences
  • Experience building data lakes or lakehouses and automating batch workflows (e.g., Airflow).
  • Familiarity with NGS pipelines (quality control, alignment/assembly, variant calling) and mass spectrometry data analysis.
  • Experience with Infrastructure as Code (Terraform), containerization (Docker), and CI/CD for deploying data systems.
  • Prior 0-to-1 startup experience and close collaboration with ML and biology teams.
Why Join
  • Design and build the cloud infrastructure and data pipelines that power distributed ML training and scalable biological data workflows, free of legacy constraints.
  • Work with first-of-their-kind, multi-modal datasets to support foundation model training at AlphaFold scale; this is a builder role with deep technical ownership.
  • Join as a founding member of the engineering team with significant equity and end-to-end system ownership.
  • See your work directly enable drug discoveries that will impact millions, collaborating with world-leading scientists in microbiome research and machine learning.

Location: London - 3 days onsite
Salary: £80,000 - £120,000 plus equity



