DevOps Engineer


Job details
  • microTECH Global LTD
  • London
  • 3 months ago
Applications closed

Job Title: DevOps Engineer

Job Type: Fixed Term Contract

Location: Kings Cross, London, United Kingdom

Duration: 12 Months (Extension Possible)

Budget: £70,000 - £90,000/year


100% On-Site Required // No Sponsorship Available


Our client is a global telecommunications company, and this role sits within its AI Infrastructure Team.


Brief:

We are looking for a highly skilled Senior DevOps Engineer to manage a large-scale AI development and training infrastructure.

The role involves overseeing GPU servers, Kubernetes clusters (Rancher), and storage systems to ensure seamless operations and optimized performance. You will collaborate with development teams, ensuring they have the resources and support needed to run their projects efficiently.

This is a critical technical position requiring expertise in Kubernetes, hardware management, and automation.


Responsibilities:

Kubernetes and Rancher Management: Configure, scale, and maintain Kubernetes clusters and Rancher for multi-cluster management, ensuring optimal performance and resource allocation.

GPU Resource Management: Manage GPU resources and servers, ensuring efficient resource scheduling, load balancing, and performance optimization for AI workloads.

Storage Management: Maintain and optimize large storage systems, ensuring high availability, performance, and data persistence.

DevOps and Automation: Implement CI/CD pipelines and automate infrastructure management using tools such as Terraform, Ansible, Jenkins, and GitLab CI.

Monitoring and Troubleshooting: Set up and manage monitoring and logging systems (e.g., Prometheus, Grafana, ELK) to ensure high availability and rapid issue resolution.

AI Framework Optimization: Collaborate with data scientists and AI developers to optimize AI frameworks (e.g., TensorFlow, PyTorch) for GPU and cluster environments.

Security and Access Management: Implement and manage role-based access control (RBAC) and ensure data security, encryption, and backup procedures are in place.


Key Requirements:

Proven experience in managing large-scale Kubernetes clusters and containerisation technologies (e.g., Docker).

Strong understanding of GPU resource management and optimization for AI workloads.

Expertise in managing large storage systems and implementing data persistence strategies.

Proficiency in scripting and automation (Python, Bash, Go), with experience in infrastructure as code (IaC) using Terraform, Ansible, or similar tools.

Familiarity with deep learning frameworks (e.g., TensorFlow, PyTorch) and experience optimizing them for large-scale environments.

Experience with monitoring and logging tools such as Prometheus, Grafana, and ELK.


Desirables:

Experience with Rancher or other Kubernetes management platforms.

Experience in managing hybrid cloud environments.

Red Hat Certified System Administrator (RHCSA) certification preferred.

Certified Kubernetes Administrator (CKA) certification preferred.

Mandarin speaker preferred.


Please get in touch to hear more about this incredible position.
