Senior Platform Engineer (Infrastructure)

uSwitch
London
7 months ago

Description

Hybrid - 2 days per week in office (London Bridge/Tower Bridge area)

The RVU London cloud infrastructure team

We are committed to Open Source software in order to build services that help millions of customers to save money and make confident decisions. As well as helping our customers, we also give back to the community by open sourcing interesting projects that we build that might benefit others.

We’re looking for an experienced Platform/Infrastructure Engineer to join our infrastructure platform team, known internally as ‘Airship’.

Our goal as a team is to enable our development teams to deliver services quickly, reliably and securely. We do this by running multiple Kubernetes EKS and Fargate clusters in AWS, creating common tooling to aid in development tasks and running shared services such as OpenSearch, Envoy, Vault and Prometheus, to name a few. The team has also recently expanded its scope to simplify data engineering in the organisation, applying to data pipelines the same techniques we used to ease the creation of web applications and leveraging Argo Workflows and Argo Events. We have also completed a migration to GitHub Actions.

Day to day tasks will include:

  • Planning and working on our infrastructure platform: from maintenance to designing system improvements and adopting new technologies
  • Working with product engineering and data teams to design, build and improve the scalability and reliability of their systems, with an emphasis on providing the best DevEx
  • Developing tooling to help our teams work more efficiently

Requirements

The ideal candidate will have some of the following skills:

  • Extensive experience in running Kubernetes clusters in production
  • Knowledge of Golang, Helm and Terraform (some knowledge of Python is definitely a plus)
  • Production experience in Cilium and/or eBPF and networking in general
  • Extensive experience in monitoring systems and their performance
  • The ability to debug large and complex systems and to solve, in a simple way, large problems that affect a wide user base
  • Experience with image vulnerability scanning and patching strategies for large systems
  • Experience or familiarity with AWS multi-account system design and tools like Crossplane and Control Tower
  • Familiarity with Argo Workflows or similar data pipeline as a service tools
  • Familiarity working with a variety of Cloud Native projects
  • Familiarity with GitHub Actions
  • Familiarity with OpenTelemetry

Our team has been featured at a few conferences:

CNCF:   

PlatformCon:    and

 

We were also featured at the London AWS Summit 2023 for our contribution to the EKS tooling community.

We also hosted the Terraform HashiCorp User Group meetup in London in April.

Examples of some projects we have worked on:

Short-lived database credentials

Our running services previously relied on long-lived credentials to access data, and those credentials were rarely, if ever, rotated. We wanted human and pod identity to be used to grant short-lived credentials based on policies. We used Vault to build a solution to this problem, creating tooling such as / to make it as easy as possible for developers to use these credentials with their services. ()

: a service that integrates AWS IAM with Kubernetes

We have a lot of existing AWS resources whose access is limited using IAM. We used Kube2IAM initially but experienced race conditions that would hand pods credentials for a different role. We started work on a replacement and have worked with the community to see it adopted elsewhere.

: Envoy control plane for multi-cluster load balancing

Some of our more important applications needed to survive a total cluster outage. This meant we needed a way to easily route traffic to an application spread across multiple clusters, so we created Yggdrasil, a tool that configures Envoy nodes to route our traffic between clusters based on Ingress resources. ()

: more confidence in the status of your deployments

It tracks deployments as they roll out and posts useful status updates into Slack. It does this by watching the Kubernetes API for namespaces and deployments with the correct annotations. When a deployment rollout begins, and again when it completes, updates are posted to the Slack API. Any errors during the rollout are captured and included in the Slack message (see example below). This can be very useful for quickly debugging a failing deployment.

You can also check out our to see a number of blogs on what we’ve been up to.

Our commitment to you

At RVU, we are dedicated to developing valuable, inclusive, and user-friendly products and services for all. To achieve this it’s essential that our teams reflect the diverse range of people in our community. We believe in being the change we wish to see in the world, by embracing our differences and holding ourselves accountable to being open and inclusive teammates and wider community members.

Benefits

What we’ll give back to you:

We want to give you a great work environment; contribute back to both your personal and professional development; and give you great benefits to make your time at RVU even more enjoyable. Some of these benefits include:

  • Employer matching pension up to
  • Hybrid approach of in-office and remote working, and a “Work from Home” budget to help contribute towards a great work environment at home
  • Excellent maternity, paternity and adoption leave policy, for those key moments in your life
  • 25 days holiday (increasing to 30 days) + 2 days “My Time” per year
  • Up to 30 days per year “working from anywhere”
  • A healthy learning and training budget, as well as the chance to go to conferences around the world every year
  • Electric vehicle scheme
  • In-office gym
  • Free breakfast in the office daily
  • Health insurance
  • Access to the Calm and Peppy apps for physical and mental health
  • Regular events - from team socials to company-wide events with insightful external speakers, we want to make sure our colleagues continue to feel connected
