Senior HPC AI Engineer

NVIDIA
remote, uk
11 months ago
Applications closed


NVIDIA is looking for an experienced HPC Engineer to join the E2E software verification HPC/AI Infrastructure team. We are focused on building supercomputers and HPC clusters based on groundbreaking technologies. We are looking for an outstanding senior HPC architect to be a key player working with the most exciting computing hardware and software, contributing to the latest breakthroughs in artificial intelligence and GPU computing. You will provide insights on at-scale system design and tuning mechanisms for large-scale compute runs. You will work with the latest accelerated computing and deep learning software and hardware platforms, and with many scientific researchers, developers and customers, to craft improved workflows and develop new, leading, differentiated solutions. You will interact with HPC, OS, GPU compute and systems specialists to architect, develop and bring up large-scale performance platforms.

What you will be doing:

  • Design, implement and maintain large scale HPC/AI clusters with monitoring, logging and alerting

  • Manage Linux job/workload schedulers and orchestration tools

  • Develop and maintain continuous integration and delivery pipelines

  • Develop tooling to automate deployment and management of large-scale infrastructure environments, to automate operational monitoring and alerting, and to enable self-service consumption of resources

  • Deploy monitoring solutions for the servers, network and storage

  • Troubleshoot bottom-up, from bare metal through the operating system and software stack to the application level

  • Act as a technical resource: develop, redefine and document standard methodologies to share with internal teams

  • Support Research & Development activities and engage in POCs/POVs for future improvements

What we need to see:

  • A degree in Computer Science, Engineering, or a related field and 5+ years of experience

  • Knowledge of HPC and AI solution technologies, from CPUs and GPUs to high-speed interconnects and supporting software

  • Experience with workload scheduling and orchestration tools such as Slurm and Kubernetes (K8s)

  • Excellent knowledge of Windows and Linux (Red Hat/CentOS and Ubuntu) networking (sockets, firewalld, iptables, Wireshark, etc.) and internals, ACLs, OS-level security protection, and common protocols such as TCP, DHCP and DNS

  • Experience with multiple storage solutions such as Lustre, GPFS, ZFS and XFS. Familiarity with newer and emerging storage technologies.

  • Python programming and bash scripting experience.

  • Comfortable with automation and configuration management tools such as Jenkins, Ansible and Puppet/Chef

  • Deep knowledge of networking protocols such as InfiniBand and Ethernet

  • Deep understanding and experience with virtual systems (for example VMware, Hyper-V, KVM, or Citrix)

  • Familiarity with cloud computing platforms (e.g. AWS, Azure, Google Cloud)

Ways to stand out from the crowd:

  • Knowledge of CPU and/or GPU architecture

  • Knowledge of Kubernetes and container-related microservice technologies

  • Experience with GPU-focused hardware/software (DGX, CUDA)

  • Background with RDMA (InfiniBand or RoCE) fabrics

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, sex, gender, gender expression, sexual orientation, age, marital status, veteran status, or disability status. We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please contact us to request accommodation.

