Site Reliability Engineer - AI & ML


Job details
  • Galway
  • 1 month ago
Applications closed

Join an industry leader in Enterprise Technology Management solutions. Their SaaS solution orchestrates and automates key business processes for IT with agentless integrations, best practices, and low-code workflows. It enables enterprises to leverage their existing infrastructure systems and automate processes, thereby reducing reliance on error-prone manual tasks and tickets.

We are recruiting an experienced AI & ML Site Reliability Engineer who is passionate about AI, machine learning, and data science to support innovations in AI and Data product management.

In this role, you will:

Be responsible for architecting and maintaining infrastructure that supports machine learning (ML), artificial intelligence (AI), and data-driven solutions.

Help stand up the foundational systems that enable large-scale AI deployment, including developing and managing a big data analytics platform, developing AI architecture, implementing vector databases, building knowledge graphs, and optimizing systems for ML model deployment and inference.

Collaborate closely with data scientists, infrastructure engineers, product management teams, and UX designers to ensure our customers realize meaningful business value by streamlining workflows, to ensure scalability, and to manage the complete lifecycle of AI systems from development to production.

Qualifications

Bachelor’s degree in Computer Science, Engineering, Data Science, or a related field.

5+ years of experience in site reliability engineering, DevOps, MLOps, or a similar role.

Experience with cloud platforms such as AWS, GCP, or Azure, including AI/ML services (e.g., SageMaker, Google Colab, Vertex AI).

Proficient in deploying machine learning models such as regressions, decision trees, neural networks, and recommendation systems into production and managing model

Technical Skills: 

Experience with data processing tools such as Apache Spark, Hadoop, or Airflow for large-scale data processing. Experience with AI/ML tools and frameworks (e.g., TensorFlow, PyTorch, LangChain, Hugging Face).

Strong understanding of vector databases (e.g., Pinecone, Milvus, Chroma) and knowledge graph tools (e.g., Neo4j, RDF).

Experience with RAG (Retrieval-Augmented Generation) techniques and GraphRAG systems. Experience with containerization and orchestration technologies (e.g., Docker, Kubernetes).

Proficiency in programming languages such as Python and Bash, and experience with ML tools and libraries.

Experience implementing CI/CD for ML pipelines and working with ML version control systems (e.g., DVC, MLflow).

Experience in on-call incident response in high-uptime environments.
