KubeCon Europe London
Session: Sailing Multi-Host Inference with LWS
Inference workloads are becoming increasingly prevalent and vital in the cloud native world. Serving them, however, is not easy: one of the biggest challenges is that large foundation models cannot fit on a single node, which calls for distributed inference with model parallelism and, in turn, makes serving these workloads even more complicated.
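To make the multi-host setup concrete, here is a minimal sketch of a LeaderWorkerSet (LWS) manifest that groups a leader pod with worker pods so a sharded model can be served across several nodes. The image name, group size, and replica count are placeholders for illustration, not values from the session.

```yaml
# Hypothetical LWS manifest: each replica is a group of 1 leader + 3 workers,
# allowing a model sharded across 4 nodes to be served as one unit.
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: llm-serving        # placeholder name
spec:
  replicas: 2              # two independent serving groups
  leaderWorkerTemplate:
    size: 4                # total pods per group: 1 leader + 3 workers
    leaderTemplate:
      spec:
        containers:
          - name: leader
            image: example.com/inference-server:latest  # placeholder image
    workerTemplate:
      spec:
        containers:
          - name: worker
            image: example.com/inference-server:latest  # placeholder image
```

LWS schedules and scales each leader/worker group as a single unit, which is what makes it a fit for model-parallel inference where the shards of one model must start, stop, and fail over together.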