Hi, I'm Kante Yin 👋

Software Engineer | Open Source Advocate | Cat Keeper | Sports Enthusiast ⚽️ 🏀 🥊


The Headroom of Optimization

I've been working on system optimization these days: joining a lot of meetings, talking a lot with potential customers, and getting questioned a lot by them as well. Most of them, if not all of them, show great interest in the optimization solution. After all, who can reject the idea of optimization?

However, it's really important for us to tell who our target customers will be and what the selection criteria are. Mostly because we're a small team with very limited time, we can't work on all kinds of problems, although technically we could since we're building a generic optimization platform.

That's when I started to think about how to calculate the headroom of optimization, because it defines how great we can be: how much improvement we can bring to the customers and, of course, how much money we can make.

If we take HPO (Hyperparameter Optimization) as an example, the search space is defined by the hyperparameters, which can be the number of layers, the learning rate, the batch size, etc. The more hyperparameters we have, the larger the search space becomes. So the first dimension is the number of impacting factors, which is usually referred to as the Search Space.
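As a rough sketch (the hyperparameters and candidate values below are made up, just to illustrate the idea), the size of a grid-style search space is the product of the candidate counts, so every extra hyperparameter multiplies the space rather than adding to it:

```python
import math

# A hypothetical HPO search space: each hyperparameter with its candidate values.
search_space = {
    "num_layers":    [2, 4, 8, 16],
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3, 1e-2],
    "batch_size":    [32, 64, 128, 256],
}

# For a plain grid search, the number of configurations to evaluate is the
# product of the candidate counts for every hyperparameter.
num_configs = math.prod(len(values) for values in search_space.values())
print(f"{len(search_space)} hyperparameters -> {num_configs} configurations")
# 3 hyperparameters -> 80 configurations
```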

One interesting thing is that some search spaces have an upper bound, usually defined by the physical limits of the system, e.g. the maximum utilization of a cluster. For those, it's really easy to calculate the headroom of the optimization.
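For instance (illustrative numbers only), if a cluster currently runs at 45% utilization and its practical ceiling is around 90%, the headroom is simply the gap to that bound:

```python
# Illustrative numbers only: a cluster running at 45% utilization, with a
# practical upper bound of ~90% (the exact ceiling depends on the system,
# e.g. scheduling overhead and failure buffers).
current_utilization = 0.45
upper_bound = 0.90

# For a bounded metric, the headroom is just the gap to the bound.
absolute_headroom = upper_bound - current_utilization        # 0.45
relative_headroom = absolute_headroom / current_utilization  # 1.0, i.e. ~2x

print(f"absolute headroom: {absolute_headroom:.0%} of total capacity")
print(f"relative headroom: up to {relative_headroom:.0%} more useful work")
```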

The second dimension must be the Scale Size, which decides how much potential there is for optimization. Take the Kubernetes scheduling problem as an example: if you're running a small cluster with only a few nodes, the optimization space is very limited. Even if you improve the utilization of a 10-node cluster by 10%, that equals 1 node, which is not that impressive. But if you're running a large cluster with thousands of nodes, the same 10% improvement means 100 nodes, which is a huge difference. That said, if it's a GPU cluster, a 10% improvement is still significant, because GPUs are really expensive, which leads to the third dimension: Impact Weight.

Impact Weight means the per-unit impact of the optimization, which can be defined by the cost of the resources, the time saved, or the performance improvement. As mentioned above, optimizing a GPU is more valuable than optimizing a CPU.
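To make the scale and weight dimensions concrete, here's a tiny sketch with made-up node prices (real costs vary a lot by hardware and provider); the value of the same 10% improvement scales with both the cluster size and the per-unit cost:

```python
# Hypothetical per-node monthly costs, purely for illustration.
CPU_NODE_COST = 500      # USD / month
GPU_NODE_COST = 20_000   # USD / month, e.g. a multi-GPU machine

def monthly_value(nodes: int, node_cost: float, improvement: float) -> float:
    """Value of an efficiency improvement = scale size x impact weight."""
    return nodes * node_cost * improvement

# The same 10% improvement on very different clusters.
print(monthly_value(nodes=10,   node_cost=CPU_NODE_COST, improvement=0.10))  # 500.0
print(monthly_value(nodes=1000, node_cost=CPU_NODE_COST, improvement=0.10))  # 50000.0
print(monthly_value(nodes=100,  node_cost=GPU_NODE_COST, improvement=0.10))  # 200000.0
```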

However, optimization is not free; it comes with a cost, which is the Cost Factor. Going from 0 to 60% is usually easy, but from 60% to 80% is hard, and from 80% to 100% is extremely hard. The cost factor can be defined by the time spent, the resources consumed, or the complexity of the optimization algorithm.
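That diminishing-returns curve can be caricatured with a toy cost model (not derived from any real workload, just to show the shape): assume the effort to push a bounded metric toward its ceiling grows roughly like 1 / (1 - target), so each extra percentage point near the limit costs far more than the ones before it.

```python
# Toy cost model, purely illustrative: effort blows up as the target
# approaches the physical ceiling of the metric.
def relative_effort(target: float) -> float:
    return 1.0 / (1.0 - target)

for target in (0.60, 0.80, 0.95):
    print(f"target {target:.0%}: ~{relative_effort(target):.1f}x baseline effort")
# target 60%: ~2.5x baseline effort
# target 80%: ~5.0x baseline effort
# target 95%: ~20.0x baseline effort
```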

So generally, the headroom of optimization can be estimated with the formula:

$$ \text{Headroom} \approx \frac{(\text{Search Space}) \times (\text{Scale Size}) \times (\text{Impact Weight})}{\text{Cost Factor}} $$

where:

  • Search Space is the number of impacting factors, e.g. the number of hyperparameters. For a bounded search space, it's easy to find the upper limit, like the node utilization in a cluster.
  • Scale Size is the size of the system being optimized, e.g. number of nodes in a cluster.
  • Impact Weight is the value of each unit of improvement, e.g. cost of a GPU vs a CPU.
  • Cost Factor is the cost of the optimization, which constrains the maximum improvement we can achieve, e.g. time spent, resources consumed, or complexity of the algorithm.
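Putting the four dimensions together, here's a minimal back-of-the-envelope sketch. All numbers are made up, and I'm treating Search Space as the bounded improvement range (like the utilization gap above) rather than a raw count of factors, just to keep the units simple; the point is how the dimensions combine when comparing candidate customers:

```python
from dataclasses import dataclass

@dataclass
class OptimizationTarget:
    """Back-of-the-envelope inputs for the headroom formula above."""
    search_space: float   # bounded improvement range, e.g. utilization gap
    scale_size: float     # e.g. number of nodes
    impact_weight: float  # value per unit of improvement, e.g. USD per node per month
    cost_factor: float    # relative cost of realizing the improvement

    def headroom(self) -> float:
        return (self.search_space * self.scale_size * self.impact_weight) / self.cost_factor

# Two hypothetical customers, with made-up numbers just to compare them.
small_cpu_cluster = OptimizationTarget(0.3, 10, 500, 2.0)
large_gpu_cluster = OptimizationTarget(0.3, 1000, 20_000, 5.0)

print(f"small CPU cluster headroom: {small_cpu_cluster.headroom():,.0f}")  # 750
print(f"large GPU cluster headroom: {large_gpu_cluster.headroom():,.0f}")  # 1,200,000
# Even with a higher cost factor, the large GPU cluster is the better target.
```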

Keep rolling with the optimization.