Applied Data Engineer - Build Scalable Research Pipelines

Ellison Institute of Technology Oxford
Oxford

At the Ellison Institute of Technology (EIT), we're on a mission to translate scientific discovery into real-world impact. We bring together visionary scientists, technologists, policy makers, and entrepreneurs to tackle humanity's greatest challenges in four transformative areas:

  • Health, Medical Science & Generative Biology
  • Food Security & Sustainable Agriculture
  • Climate Change & Managing CO₂
  • Artificial Intelligence & Robotics

This is ambitious work - work that demands curiosity, courage, and a relentless drive to make a difference. At EIT, you'll join a community built on excellence, innovation, tenacity, trust, and collaboration, where bold ideas become real-world breakthroughs. Together, we push boundaries, embrace complexity, and create solutions to scale ideas from lab to society. Explore more at www.eit.org

Requirements

The Role:

Forward Deployed Data Engineers are a key interface between our core data systems and our research and product engineering projects. As an early member of our rapidly growing team, you'll work side-by-side with scientists and engineers in bioinformatics, healthcare, robotics, agriculture, and frontier AI. You will disseminate data engineering best practices into the research and applied teams, ensuring that the datasets used for our AI models are accurate, reproducible, version-controlled, well-structured, and ready for training. You will help scientists turn raw information from various sources into high-quality resources, ensuring our technical foundations support the next generation of discovery.

This is a hands-on role for engineers who thrive on collaborating directly with researchers, solving problems quickly, and turning complex business logic into scalable, reliable data pipelines while contributing to the broader EIT platform. Successful candidates will be clear, respectful communicators who are comfortable bringing their own expertise into diverse groups.

Day-to-Day, You Might:

  • Partner with scientists and engineers to deliver robust, reproducible data pipelines that meet research needs across disciplines.
  • Own ingestion, storage, curation, and transformation of multimodal data, including text, images, structured/tabular data, and high I/O formats such as Arrow/NetCDF/HDF5.
  • Package and deploy code in research environments using containers (e.g. Docker).
  • Scale processing across distributed cloud warehouses/storage via container orchestration (e.g. Kubernetes), distributed compute frameworks (e.g. Spark, Ray), or High-Performance Compute (e.g. Slurm).
  • Work with sensitive data in line with security and compliance requirements (audit trails, encryption, GDPR, RBAC/ABAC).
  • Contribute to an engineering culture that values maintainability, testing, robust system design, and deep collaboration, but allows flexibility for rapid prototyping and responsiveness to changing landscapes.

What Makes You a Great Fit:

  • You have strong programming experience in Python and SQL, and value code quality, reliability (including testing, CI/CD) and observability as much as performance
  • You have experience designing, deploying, and optimising distributed data systems or data-intensive backend services
  • You think in terms of systems and longevity, not just one-off ETL scripts, and embrace end-to-end ownership from low-level performance to user interfaces
  • You're a collaborative partner to Infrastructure/Ops teams and researchers, and a clear, respectful communicator
  • You have a low-ego, team-first mindset and help grow our engineering culture by mentoring, sharing, and elevating the work of those around you

Our Forward Deployed Engineers will have the opportunity to work on diverse projects, and to expand their skillset, but will add particular value when they can match their data engineering expertise with domain knowledge relevant to a project. Does one of these domains fit you?

Health/Clinical Data Engineering:

  • In-depth knowledge and expertise in human healthcare data, clinical data, patient journey and/or biocuration
  • Feature engineering for predictive models utilising health data
  • Transformation of clinical data into common data models, and design of new models
  • Clinical data quality control and analytics
  • Differential privacy systems, PII-handling, and anonymisation/pseudonymisation

OR

Bioinformatics/Genomics Data Engineering:

  • Experience in bioinformatics and use of industry-standard tooling such as Nextflow or similar
  • Genomic/metagenomic data processing, e.g. genome assembly and annotation
  • Comfortable with relevant data formats such as FASTQ/FASTA, VCF, etc.
  • Conversion of genomic data into ML-ready formats

Benefits

We offer the following salary and benefits:

  • Enhanced holiday pay
  • Pension
  • Life Assurance
  • Income Protection
  • Private Medical Insurance
  • Hospital Cash Plan
  • Therapy Services
  • Perk Box
  • Electric Car Scheme

--

Why work for EIT:

At the Ellison Institute, we believe a collaborative, inclusive team is key to our success. We are building a supportive environment where creative risks are encouraged, and everyone feels heard. Valuing emotional intelligence, empathy, respect, and resilience, we encourage people to be curious and to have a shared commitment to excellence. Join us and make an impact!
