Digital Twin Multi-omics Data Scientist EMBL-EBI

Society of Research Software Engineering
Hinxton
2 months ago
Applications closed

We are seeking a talented Data Scientist to work on an exciting project for the development of Digital Twins for rare disease. You will work within a multidisciplinary project team, across the Open Targets, BioModels and Petsalaki research groups at the EMBL European Bioinformatics Institute (EMBL-EBI). This project is funded through the Chan Zuckerberg Initiative with a strong emphasis on making datasets and models open source where possible.

‘Digital Twins’ involve creating virtual models of real-world patients to simulate their disease trajectories and therapeutic response, which requires modelling complex multi-omics data and clinically relevant endpoints. By their very nature, rare diseases (RDs) are rare – but collectively they account for around 7,000 diseases affecting approximately 300 million individuals worldwide. Due to the heterogeneous and rare/ultra-rare nature of these diseases, having enough samples for each disease (and at a single cell molecular level) is a major challenge. This hinders the ability to study the cause and mechanisms underlying these diseases and thus is a major obstacle for diagnosis and designing treatment options for these patients.

To address this challenge, this project aims to develop ‘Digital Twins’ of rare disease patients by combining mechanistic models, GenAI and other machine learning frameworks to integrate patient-level multi-omics and clinical data and provide insights into rare diseases. The models will utilize extensive public datasets of single-cell multi-omics, including transcriptomics from diverse disease conditions, and simulations from mechanistic models. This will be applied to the challenge of limited multi-omics data for rare disease, with the aim of developing rare disease Digital Twins to provide new insights into disease mechanisms and potential treatments.

This is an exciting opportunity to make a significant contribution to our understanding of disease biology which may lead to applications such as diagnosis, drug repurposing and new treatment development.

You will be responsible for the processing and harmonisation of multi-omics data from disease and healthy patient cohorts, curating benchmarking datasets to test the models, developing and maintaining collaborations across the rare disease and project community, data stewardship and standardisation. You will work collaboratively within a multidisciplinary project team alongside modellers and scientists from EMBL-EBI, Open Targets, and CZI, as well as the wider Biocuration and rare disease community.

Main Duties and Responsibilities

The Data Scientist’s primary tasks will be to collect and process patient-derived multi-omics datasets linked with clinical phenotypes for several complex and rare diseases. You will also create healthy benchmarking datasets and validation datasets to train and test the models that are developed. You will work closely alongside the ML Modeller and Bioinformatician on the project team, ensuring data is appropriately harmonised, structured and processed through pipeline workflows for use in models. You will also help with testing of the model outcomes and have an opportunity to analyse the outputs. A strong aspect of the role will be helping to coordinate and develop collaborations with the rare disease community. Safe data stewardship will be a key responsibility, as well as ensuring data and models are made open-source and publicly available where possible.

In particular, and in addition to the above, this role will involve:

  • Scoping of publicly available single cell multi-omic datasets with phenotypes at individual level for healthy and disease tissues, with a strong focus on transcriptomic expression data
  • Evaluation of publicly available biomodels for healthy and disease tissues
  • Helping to coordinate existing collaborations across the rare disease, industry and project community, and helping to initiate and develop new collaborations
  • Collection and processing of patient-derived multi-omics datasets (single cell where possible) along with clinical phenotypes for the models, using appropriate data standards and ontologies and developing or utilising data processing workflows
  • Curation of validation/benchmarking datasets to train and test the models, and helping to test and validate the model outcomes
  • Taking overall ownership of data management for the project, with responsibility for applying for data access when necessary, coordinating data between team members, ensuring data safety and control, and ensuring data agreements are adhered to
  • Making datasets and models openly available, when possible, in standardised frameworks for use by the wider community
  • Actively collaborating with global consortia, leveraging advanced biological knowledge to harmonise data from disparate sources

The role may require some international travel to conferences or meetings.

You have

  • A Bachelor's or Master's degree (or equivalent) in medical or biological sciences
  • A higher degree (PhD) or equivalent experience
  • At least 2 years of relevant data scientist, biocuration or bioinformatics experience
  • Prior experience in working with single cell or bulk transcriptomics, and/or human genomics data analysis
  • The ability to apply scientific knowledge in the understanding of scientific research articles and data records
  • Experience of biological/clinical data curation
  • Experience of working with biological databases
  • Experience in developing and running pipelines to process and harmonise datasets
  • Experience of working in a UNIX/Linux environment, very good scripting skills (R or preferably Python)
  • Willingness to learn new skills as the project requires
  • Self-motivated and capable of working both independently and as part of a team
  • Excellent communication, interpersonal and English language skills

You may also have

  • Prior experience with eQTL or GWAS analysis
  • Prior experience with multi-omics data harmonization and integration
  • Experience with pipeline workflow management tools such as Nextflow
  • Experience in reviewing, interpreting and summarising scientific literature
  • Experience in application of automation, text-mining and/or machine-learning to biocuration
  • Experience of working with ontologies or controlled vocabularies
  • Demonstration of being an active member on a collaborative project
  • Experience working with rare or common disease patient cohort data

Benefits and Contract Information

  • Financial incentives: depending on circumstances, a monthly family/marriage allowance of £272 and a monthly child allowance of £328 per child; a non-resident allowance of up to £556 per month. Annual salary review, pension scheme, death benefit, long-term care, accident-at-work and unemployment insurances
  • Hybrid working arrangements
  • Private medical insurance for you and your immediate family (including all prescriptions and generous dental & optical cover)
  • Generous time off: 30 days annual leave per year, in addition to eight bank holidays
  • Relocation package including installation grant (as applicable)
  • Campus life: Free shuttle bus to and from work, on-site library, subsidised on-site gym and cafeteria, casual dress code, extensive sports and social club activities (on campus and remotely)
  • Family benefits: On-site nursery, child sick leave, generous parental leave, holiday clubs on campus and monthly family and child allowances
  • Contract duration: This position is a 2 year fixed term contract
  • Salary: Monthly salary starting at £3,229 (after tax but excl. pension & insurances) + benefits (total package will be dependent on family circumstances)
  • International applicants: We recruit internationally and successful candidates are offered visa exemptions. Read more on our page for international applicants.
  • Diversity and inclusion: At EMBL-EBI, we strongly believe that inclusive and diverse teams benefit from higher levels of innovation and creative thought. We encourage applications from women, LGBTQ+ individuals and individuals of all nationalities.
  • Job location: This role is based in Hinxton, near Cambridge, UK. You will be required to relocate if you are based overseas and you will receive a generous relocation package to support you.

To apply, please submit a covering letter and CV via our online system. Applications will close on 12/03/2025.
