ABOUT PROJECT
The project focuses on building and enhancing MLOps infrastructure and pipelines for large-scale autonomous and AI-driven systems. The role involves developing, productizing, and maintaining end-to-end ML workflows, including model training, evaluation, deployment, monitoring, and automation, with a strong emphasis on cloud-native and scalable solutions.
REQUIREMENTS
Strong experience in MLOps / ML infrastructure engineering
Hands-on experience with GCP services (Vertex AI, GCS, BigQuery, Cloud Run, Workflows)
Experience building and maintaining ML training and evaluation pipelines
Strong knowledge of Docker and Kubernetes
Experience with Kubeflow
Experience with CI/CD pipelines and infrastructure automation
Proficiency in Python
Experience with model registries, metadata management, and REST APIs
Experience with monitoring and observability tools (Grafana, TensorBoard, OpenTelemetry, tracing, alerting)
Experience with Infrastructure as Code tools (Pulumi or similar)
Understanding of data monitoring concepts (data drift, distribution shifts, performance tracking)
Experience working in cloud-native, production-grade environments
RESPONSIBILITIES
Design, develop, and productize model training and evaluation pipelines using Kubeflow, Docker, and GCP services
Enhance evaluation pipelines with benchmarking datasets and monitoring tools (Grafana, TensorBoard)
Publish and manage models and training metadata in a model registry
Integrate model registry and metadata services via REST APIs
Implement data and model monitoring, including drift detection and performance tracking
Build feedback loops using cloud workflows and batch processing
Establish and maintain CI/CD pipelines for ML workflows and infrastructure
Automate infrastructure provisioning and management using Pulumi
Integrate feature stores, self-service ML endpoints, and notebook-to-pipeline workflows
Support advanced ML services such as data onboarding, import/export, and sampling workflows
Enable additional capabilities such as LLM fine-tuning, model plug-and-play architectures, and dataset management
Ensure reliability, scalability, observability, and operational readiness of MLOps systems
WHAT YOU WILL GET WITH ELEKS
Close cooperation with a customer
Challenging tasks
Competence development
Ability to influence project technologies
Team of professionals
Dynamic environment with low level of bureaucracy
ABOUT
ELEKS is a custom software development company. We deliver value to our clients through the expertise and experience we have gained as a software innovation partner since 1991.
Our 2000+ professionals, located in Delivery Centers across Eastern Europe and sales offices in Europe and North America, provide our clients with a full range of software engineering services. These include product development, QA, R&D, design, technology consulting, and dedicated teams.