Hi, I'm Kante Yin 👋

R&D Engineer • Open-Source Enthusiast • Cat Owner • Sports Fan ⚽️ 🏀 🥊


KubeCon Hong Kong - New Pattern for Sailing Multi-Host LLM Inference

[Slides] [Project]

Inference workloads are becoming increasingly prevalent and vital in the Cloud Native world. Serving them is not easy, though: one of the biggest challenges is that large foundation models, such as Llama 3.1-405B or DeepSeek R1, cannot fit into a single node. This calls for distributed inference with model parallelism, which in turn makes serving inference workloads even more complicated.
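For context, LWS (LeaderWorkerSet, from the kubernetes-sigs/lws project) models one multi-host inference replica as a group of pods that are created and scaled together: a leader plus a set of workers. A minimal sketch of such a manifest might look like the following; the name, image, replica count, and group size are placeholders, not values from the talk.

```yaml
# Hypothetical example: multi-host inference groups managed by a
# LeaderWorkerSet. All names and sizes below are illustrative.
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: llm-inference
spec:
  replicas: 2              # number of replicated groups
  leaderWorkerTemplate:
    size: 2                # pods per group: 1 leader + 1 worker
    leaderTemplate:
      spec:
        containers:
        - name: leader
          image: example.com/inference-server:latest  # placeholder image
    workerTemplate:
      spec:
        containers:
        - name: worker
          image: example.com/inference-server:latest  # placeholder image
```

Each group is scheduled, scaled, and restarted as a unit, so a model sharded across several nodes behaves like a single logical replica.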

Read more...

KubeCon London - Sailing Multi-Host Inference with LWS

[Slides] [Project]

Inference workloads are becoming increasingly prevalent and vital in the Cloud Native world. Serving them is not easy, though: one of the biggest challenges is that large foundation models cannot fit into a single node. This calls for distributed inference with model parallelism, which in turn makes serving inference workloads even more complicated.

Read more...