

Data Engineer (OCR & Data Pipelines, Contract)

Intelance
City of London
2 days ago

Intelance is a specialist architecture and AI consultancy working with clients in regulated, high-trust environments (healthcare, pharma, life sciences, financial services). We are assembling a lean senior team to deliver an AI-assisted clinical report marking tool for a UK-based, UKAS-accredited organisation in human genetic testing.

We are looking for a Data Engineer (OCR & Pipelines) who can turn messy PDFs and documents into clean, reliable, auditable data flows for ML and downstream systems. This is a contract / freelance role (2-3 days/week) working closely with our AI Solution Architect, Lead ML Engineer, and Integration Engineer.

Tasks
  • Design and implement the end-to-end data pipeline for the project:
    • Ingest PDF/Word reports from secure storage
    • Run OCR / text extraction and layout parsing
    • Normalise, structure, and validate the data
    • Store outputs in a form ready for ML and integration.
  • Evaluate and configure OCR / document AI services (e.g. Azure Form Recognizer or similar), and wrap them in robust, retry-safe, cost-aware scripts/services.
  • Define and implement data contracts and schemas between ingestion, ML, and integration components (JSON/Parquet/relational as appropriate).
  • Build quality checks and validation rules (field presence, format, range checks, duplicate detection, basic anomaly checks).
  • Implement logging, monitoring, and lineage so every processed document can be traced from source > OCR > structured output > model input.
  • Work with the ML Engineer to ensure the pipeline exposes exactly the features and metadata needed for training, evaluation, and explainability.
  • Collaborate with the Integration Engineer to deliver clean batch or streaming feeds into the client’s assessment system (API, CSV exports, or SFTP drop-zone).
  • Follow good security and privacy practices in all pipelines: encryption, access control, least privilege, and redaction where needed.
  • Contribute to infrastructure decisions (storage layout, job orchestration, simple CI/CD for data jobs).
  • Document the pipeline clearly: architecture diagrams, table/field definitions, data dictionaries, operational runbooks.

Requirements

Must-have

  • 3-5+ years of hands-on Data Engineering experience.
  • Strong Python skills, including building and packaging data processing scripts or services.
  • Practical experience with OCR / document processing (e.g. Tesseract, Azure Form Recognizer, AWS Textract, Google Document AI, or equivalent).
  • Solid experience building ETL / ELT pipelines on a major cloud platform (ideally Azure, but AWS/GCP is fine if you’re comfortable switching).
  • Good knowledge of data modelling and file formats (JSON, CSV, Parquet, relational schemas).
  • Experience implementing data quality checks, logging, and monitoring for pipelines.
  • Understanding of security and privacy basics: encryption at rest/in transit, access control, secure handling of potentially sensitive data.
  • Comfortable working in a small, senior, remote team; able to take a loosely defined problem and design a clean, maintainable solution.
  • Available for 2-3 days per week on a contract basis, working largely remotely from the UK or a nearby European time zone.

Nice-to-have

  • Experience in healthcare, life sciences, diagnostics, or other regulated environments.
  • Familiarity with Azure Data Factory, Azure Functions, Databricks, or similar orchestration/compute tools.
  • Knowledge of basic MLOps concepts (feature stores, model input/output formats).
  • Experience with SFTP-based exchanges and batch integrations with legacy systems.

Benefits
  • Core impact role: you own the pipeline that makes the entire AI solution possible – without you, nothing moves.
  • Meaningful domain: your work supports external quality assessment in human genetic testing for labs worldwide.
  • Lean, senior team: work alongside experienced architects and ML engineers; minimal bureaucracy, direct access to decision-makers.
  • Remote-first, flexible: work from anywhere compatible with UK hours, 2-3 days/week.
  • Contract / freelance: competitive day rate, with potential extension into further phases and additional schemes if the pilot is successful.
  • Opportunity to build reusable data pipeline components that Intelance will deploy across future AI engagements.

We review every application personally. If there’s a good match, we’ll invite you to a short call to walk through the project, expectations, and next steps.



