Principal Data Scientist

TN United Kingdom
Saffron Walden
1 month ago
Applications closed

Do you want to help us improve human health and understand life on Earth? Make your mark by shaping the future to enable or deliver life-changing science to solve some of humanity’s greatest challenges.

Read on to find out what you will need to succeed in this position, including skills, qualifications, and experience.

Principal Research Data Scientist

We seek a Principal Machine Learning Research Data Scientist to join a collaborative project between the Wellcome Sanger Institute and Open Targets. This project aims to leverage datasets generated internally at the Sanger Institute, together with publicly available data from human cells, to create foundational models for biology, enhancing our understanding of life's rules and improving health for all. You will work within an interdisciplinary team of life scientists and computer/ML scientists, with a shared objective of advancing biological research through these foundational models. This role will sit within the AI/ML Faculty group led by Dr. Mohammad Lotfollahi, and the successful candidates, across different seniority levels (senior and principal), will be responsible for delivering their portfolio of scientific research projects as part of the broader team strategy.

About the Role

Your role will involve designing foundational models leveraging multi-modal readouts. This includes integrating and processing data from various sources to develop robust and versatile AI models. To achieve this, you will work with open-source software, proposing, developing, and maintaining new solutions to analyze and interpret large-scale single-cell datasets. We have access to unique data and are also in a position to generate data to train unique models. Additionally, we have substantial computational power and GPU resources to train large models efficiently.

Our teams are well-positioned to tackle this problem, with experience in both generating and analyzing datasets, including millions of cells across multiple tissues and conditions (e.g., disease, healthy). This involves a detailed understanding of the training of large-scale ML models and a track record of undertaking large data-science projects.

You will be responsible for:

- Independently managing and leading machine learning research projects and writing up outcomes as scientific publications for submission to journals or machine learning conferences (ICLR, ICML, CVPR, etc.).
- Collaborating with team members to propose, develop, and evaluate new machine learning models that enable understanding of single-cell data and its application in drug discovery.
- Working with Ph.D. students and postdocs in collaborating teams on developing solutions for interdisciplinary scientific problems in biology, providing supervision and training to junior members of the team.
- Contributing to writing scientific papers on biotechnology and biology.
- Distilling your developed solutions into open-source, easy-to-install packages with documentation that facilitates their use by downstream users, including biologists and bioinformaticians.
- Presenting your research and analysis pipelines to internal and external audiences.

About You:

You will be supported in your personal and professional development and have the opportunity to lead peer-reviewed publications around using genetics and genomics approaches to guide drug discovery and present them at national and international conferences.

- Ph.D. or M.Sc. with equivalent research experience in a relevant quantitative discipline (e.g., Computer Science, Computational Biology, Genetics, Bioinformatics, Physics, Engineering, or Applied Statistics/Mathematics).
- Previous ML work experience in a scientific/academic environment (RA roles and internships are considered work experience).
- Strong knowledge of Python, including core data science libraries such as Scikit-Learn, SciPy, TensorFlow, and PyTorch.
- Expertise in machine learning algorithms and frameworks, with experience in designing, training, and deploying ML models.
- Proficiency in handling and processing large datasets, including techniques for data cleaning, feature engineering, and data augmentation.
- Experience with high-performance computing environments, including the use of GPUs for training large-scale machine learning models.
- Experience in natural language processing (NLP) and training models based on transformer architectures, such as BERT and GPT.
- Familiarity with generative models such as diffusion models and flow matching.
- Knowledge of good software development practices and collaboration tools, including git-based version control, Python package management, and code reviews.
- Strong problem-solving skills with the ability to analyze complex data and derive actionable insights.
- Excellent communication skills, with the ability to explain complex machine learning algorithms and statistical methods to non-technical stakeholders.

In addition to the above technical skills, you will also have the following:

- Ability to quickly understand scientific, technical, and process challenges and break down complex problems into actionable steps.
- Ability to work in a frequently changing environment, with the capability to interpret management information to amend plans.
- Ability to prioritize, manage workload, and deliver agreed activities consistently on time.
- Good networking, influencing, and relationship-building skills.
- Strategic thinking and the ability to see the 'bigger picture'.
- Ability to build collaborative working relationships with internal and external stakeholders at all levels.
- Inclusivity and respect for all.

Relevant publications of the group:

- Lotfollahi, M., Naghipourfar, M., Luecken, M. D., Khajavi, M., Büttner, M., Wagenstetter, M., Avsec, Ž., Gayoso, A., Yosef, N., Interlandi, M. & others. Mapping single-cell data to reference atlases by transfer learning. Nature Biotechnology, 1–10.
- Lotfollahi, M., Wolf, F. A. & Theis, F. J. scGen predicts single-cell perturbation responses. Nature Methods 16, 715–721.
- Lotfollahi, M., Rybakov, S., Hrovatin, K., Hediyeh-Zadeh, S., Talavera-López, C., Misharin, A. V. & Theis, F. J. Biologically informed deep learning to query gene programs in single cell atlases. Nature Cell Biology.
