Hi, I'm Kante Yin 👋

R&D Engineer • OpenSource Enthusiast • Cat Owner • Sports Fan ⚽️ 🏀 🥊


KubeCon China Shanghai

Session: Sailing Ray workloads with KubeRay and Kueue in Kubernetes

[Slides]

Compute demands for machine learning are growing rapidly nowadays. Ray, a unified computing framework, allows ML engineers to scale their workloads effortlessly without building complex computing infrastructures.

On the other hand, Kubernetes, a popular open-source container orchestration platform, can help manage a wide range of workloads with ease through KubeRay, an operator for Ray workloads.

At ByteDance, thousands of jobs are submitted to the Ray cluster created by KubeRay daily. With the capability to debug programs on long-running clusters and launch regular jobs through Ray Job custom resources, users benefit from a streamlined workflow.

Meanwhile, efficiently managing concurrent Ray jobs poses challenges such as job starvation and resource allocation. Kueue, a Kubernetes-native job queueing system offering capabilities like resource management, multi-tenant support, and fair sharing of resources, addresses these Ray job challenges in Kubernetes.

Presented together with @Basasuya from ByteDance.


Session: SIG-Scheduling Intro & Deep Dive

[Slides]

Kube-scheduler is a critical component of Kubernetes, responsible for placing each pod on the most suitable node. But how does it work, can we customize it for advanced usage, and what are the best practices in large clusters? To answer these questions, we’ll divide this session into two parts. If you’re new to kube-scheduler, you may be interested in the Intro part; if you’re an experienced user, you can join our Deep Dive.

What’s more, we’ll share some ongoing work within the SIG, including the latest progress on the sub-projects.

Presented together with @denkensk from Shopee.
