MLOps Engineer (Training Scalability & Workflow Optimization)
Overview
We are seeking an MLOps Engineer to lead the scaling of machine learning training pipelines and ensure the robustness and efficiency of our end-to-end ML workflows. This role focuses on leveraging Flyte, Kubernetes (GPU optimization), Docker, and distributed training frameworks such as Ray to optimize and streamline our ML infrastructure.
Responsibilities
- Workflow Orchestration: Develop and maintain ML workflows using Flyte to manage complex ML pipelines for training, testing, and deployment.
- Training Scalability: Architect and scale large-scale ML training systems on GPU-backed Kubernetes clusters, including auto-scaling and performance tuning for multi-node/multi-GPU workloads.
- Distributed Computing: Implement distributed model training pipelines using frameworks like Ray for parallelization and resource efficiency.
- Containerization: Design, build, and optimize Docker images for ML workloads with a focus on reproducibility and security.
- Resource Optimization: Debug and optimize GPU utilization, memory, and compute bottlenecks during training and inference phases.
- Monitoring & Maintenance: Integrate monitoring for ML jobs, track resource consumption, and enforce cost-efficient resource utilization.
- Collaboration: Work closely with data scientists and ML engineers to productize and scale ML experiments.
Qualifications
- Strong proficiency with Kubernetes (GPU scheduling, Helm, cluster autoscaling).
- Hands-on experience with Flyte or similar workflow orchestration tools (Airflow, Prefect).
- Deep knowledge of distributed ML training (e.g., PyTorch DDP, Ray, Horovod).
- Expertise in Docker and container lifecycle management.
- Solid understanding of GPU hardware/software stack (CUDA, NCCL).
- Familiarity with CI/CD for ML (MLOps pipelines using tools like GitHub Actions, ArgoCD).
- Bonus: Familiarity with observability tools for ML systems (Prometheus, Grafana).
Seniority level
- Mid-Senior level
Employment type
Job function
- Engineering and Information Technology
- Industries: Business Consulting and Services, Biotechnology Research, and Engineering Services
Referrals increase your chances of interviewing at Arrayo by 2x