Data Engineer – ML Training Infrastructure

SpAItial AI
City of London
1 week ago

SpAItial is pioneering the development of a frontier 3D foundation model, pushing the boundaries of AI, computer vision, and spatial computing. Our mission is to redefine how industries, from robotics and AR/VR to gaming and movies, generate and interact with 3D content.


We’re seeking a Data Engineer to build the pipelines and infrastructure that fuel our large‑scale model training. As the first engineer focused on data, you’ll shape the backbone of how we handle terabytes of multimodal training data (images, video, and 3D). This role is ideal for someone who thrives at the intersection of data systems and machine learning—designing reliable, scalable, and efficient ways to get high‑quality data into cutting‑edge training runs.


Responsibilities

  • Architect and manage data infrastructure for large‑scale ML training datasets (e.g., Apache Iceberg, Parquet, Spark).
  • Build and operate ingestion pipelines for multimodal data (e.g., images, videos, 3D), including metadata generation and quality signals.
  • Design data loaders, caching, and serving strategies optimized for ML training.
  • Develop tools for dataset inspection, experiment tracking, and evaluation workflows.
  • Partner closely with ML researchers to ensure infrastructure scales with training demands.
  • Uphold code quality and best practices in testing, CI/CD, and reproducibility.

Key Qualifications

  • 3+ years of professional software/data engineering experience with production systems.
  • Proven experience in large‑scale data processing for ML training (not just analytics/BI).
  • Hands‑on experience with distributed data frameworks (e.g., Spark, Beam, Cloud SQL) and modern data formats (Parquet, Iceberg).
  • Proficiency in cloud platforms (AWS, GCP, or Azure).
  • Strong Python development skills, including testing and code quality.
  • Experience building and maintaining CI/CD pipelines.

Preferred Qualifications

  • Familiarity with ML frameworks (e.g., PyTorch, TensorFlow).
  • Experience preparing multimodal datasets (images, video, 3D) for ML pipelines.
  • Background in computer vision or 3D reconstruction (e.g., Structure-from-Motion).
  • Interest in AI‑assisted developer tools (Cursor, Windsurf, etc.).

At SpAItial, we are committed to creating a diverse and inclusive workplace. We welcome applications from people of all backgrounds, experiences, and perspectives. We are an equal opportunity employer and ensure all candidates are treated fairly throughout the recruitment process.

