Senior Data Scientist - AI Evaluation

Elsevier
Bradford
3 weeks ago

Do you have hands-on experience designing reliable evaluations for LLM/NLP features? Do you enjoy turning messy product questions into clear study designs, metrics, and production-ready code?


About our Team

Elsevier’s AI Evaluation team designs, builds, and operates NLP/LLM evaluation solutions used across multiple product lines. We partner with Product, Technology, Domain SMEs, and Governance to ensure our AI features are safe, effective, and continuously improving.


About the Role

As a Senior Data Scientist III, you will design and implement end-to-end evaluation studies and pipelines for AI products. You’ll translate product requirements into statistically sound test designs and metrics, build reproducible Python/SQL pipelines, run analyses and QC, and deliver concise readouts that drive roadmap decisions and risk mitigation. You’ll collaborate closely with SMEs, contribute to our shared evaluation libraries, and produce audit-ready documentation aligned with Responsible AI and governance expectations.


Responsibilities

  • Study design & metrics: Translate product questions into hypotheses, tasks/rubrics, datasets, and success criteria; define metrics (accuracy/correctness, groundedness, reliability, safety/bias/toxicity) with acceptance thresholds.
  • Pipelines & tooling: Build and maintain Python/SQL evaluation pipelines (data prep, prompt/rubric generation, LLM-as-judge with guardrails, scoring, QC, reporting); contribute to shared packages and CI.
  • Statistical rigor: Plan for power, confidence intervals, inter-rater reliability (e.g., Cohen’s κ/ICC), calibration, and significance testing; document assumptions and limitations.
  • SME integration: Partner with SME Ops and domain leads to create clear rater guidance, run calibration, monitor IRR, and incorporate feedback loops.
  • Analytics & reporting: Create analyses that highlight regressions, safety risks, and improvement opportunities; deliver crisp write-ups and executive-level summaries.
  • Governance & compliance: Produce audit-ready artifacts (evaluation plans, datasheets/model cards, risk logs); follow privacy/security guardrails and Responsible AI practices.
  • Quality & reliability: Implement test hygiene (dataset/versioning, golden sets, seed control), observability, and failure analysis; help run post-release regression monitoring.
  • Collaboration: Work closely with Product and Engineering to scope, estimate, and land evaluation work; participate in code reviews and design sessions alongside fellow Data Scientists.

Requirements

  • Education/Experience: Master’s + 3 years, or Bachelor’s + 5 years, in CS, Data Science, Statistics, Computational Linguistics, or related field; strong track record shipping evaluation or ML analytics work.
  • Technical: Strong Python and SQL; experience with LLM/NLP evaluation, data/versioning, testing/CI, and cloud-based workflows; familiarity with prompt/rubric design and LLM-as-judge patterns.
  • Statistics: Comfortable with power analysis, CIs, hypothesis testing, inter-rater reliability, and error/slice analysis.
  • Practices: Git, code reviews, reproducibility, documentation; ability to turn ambiguous product needs into executable study plans.
  • Communication: Clear written/oral communication; ability to produce crisp dashboards and decision-ready summaries for non-technical stakeholders.
  • Mindset: Ownership, curiosity, bias-for-action, and collaborative ways of working.

Nice to have

  • Experience with evaluation of retrieval-augmented or agentic systems and/or with safety/bias/toxicity measurements.
  • Familiarity with lightweight orchestration (e.g., Airflow/Prefect) and containerization basics.
  • Exposure to healthcare or education content or working with clinician/academic SMEs.

We are committed to providing a fair and accessible hiring process. If you have a disability or other need that requires accommodation or adjustment, please let us know by completing our Applicant Request Support Form or by calling 1-855-833-5120.


Criminals may pose as recruiters asking for money or personal information. We never request money or banking details from job applicants. Learn more about spotting and avoiding scams.


Please read our Candidate Privacy Policy.


We are an equal opportunity employer: qualified applicants are considered for and treated during employment without regard to race, color, creed, religion, sex, national origin, citizenship status, disability status, protected veteran status, age, marital status, sexual orientation, gender identity, genetic information, or any other characteristic protected by law.


USA Job Seekers: EEO Know Your Rights.

