Understanding Little’s Law


Little’s law states that the long-term average number L of customers in a stationary system is equal to the long-term average effective arrival rate λ multiplied by the average time W that a customer spends in the system.

L = λW

Where

L = Average number of customers in a stationary system
λ = Average arrival rate in the system
W = Average time a customer spends in the system

In the context of an API, this means:

L = Average number of requests the system is serving concurrently
λ = Average arrival rate of requests to the system
W = Average latency of each request

If an API endpoint takes 100 ms on average and receives 100 requests per second, then the average number of concurrent requests in the system is 10.

L = 100 * (100/1000) = 10

If the arrival rate remains constant, then latency (W) is directly proportional to the number of concurrent requests (L).
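To make that concrete, here is a minimal standalone Java sketch (illustrative only, not part of the demo project) that holds λ fixed at 100 requests per second and computes W = L/λ for a few concurrency levels, chosen to match the tests further below:

public class LittlesLawProportionality {

    public static void main(String[] args) {
        double lambda = 100.0; // fixed arrival rate, requests per second

        // W = L / λ: with λ held constant, latency grows linearly with concurrency
        for (int concurrent : new int[] {10, 20, 95}) {
            double latencySeconds = concurrent / lambda;
            System.out.printf("L = %d  ->  W = %.0f ms%n", concurrent, latencySeconds * 1000);
        }
    }
}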

Let’s see it in action.

We will create a simple Spring Boot Java 17 application using start.spring.io. It will have only a single file, shown below.

package com.example.myapi;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

import java.util.Map;

@SpringBootApplication
public class RateLimitingDemoApplication {

    public static void main(String[] args) {
        SpringApplication.run(RateLimitingDemoApplication.class, args);
    }

}


@RestController
class HomeController {

    @GetMapping
    public Map<String, String> home() throws Exception {
        // Simulate 100 ms of work per request
        Thread.sleep(100);
        return Map.of("page", "home");
    }
}
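Start the application from the project root. Assuming you generated a Maven project from start.spring.io, the bundled Maven wrapper works (a Gradle project is analogous):

./mvnw spring-boot:run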

Once the application is running, a request to http://localhost:8080 returns the following response. The request takes at least 100 ms because of the 100 ms sleep.

{
  "page": "home"
}

In the calculation above we saw that our API serves on average 10 concurrent requests when it is hit with 100 requests per second and the average latency is 100 ms.

To show this we will set server.tomcat.threads.max to 10 and server.tomcat.accept-count to 100 in application.properties, as shown below.

server.tomcat.threads.max: Maximum amount of worker threads.
server.tomcat.accept-count: Maximum queue length for incoming connection requests when all possible request processing threads are in use.
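The resulting application.properties is just these two lines:

server.tomcat.threads.max=10
server.tomcat.accept-count=100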

With these settings our API should never serve more than about 100 requests per second: at most 10 requests are processed at a time and each takes roughly 100 ms, so the maximum throughput is 10 / 0.1 s = 100 requests per second. The latency of the API, however, will start to go up as we put more load on it.

Restart the server.

We will use a CLI tool called hey to load test our service. If you are on a Mac you can use brew to install hey; for other platforms refer to the project homepage.

brew install hey

We will run hey with different combinations and show that we will always remain under 100 requests per second.

In the first test case we are making 1000 requests (-n) in total. We have 10 workers (-c) and each worker is rate limited to 10 requests per second (-q).

hey -n 1000 -c 10 -q 10 http://localhost:8080/
Summary:
  Total:    10.5063 secs
  Slowest:    0.1072 secs
  Fastest:    0.1005 secs
  Average:    0.1040 secs
  Requests/sec: 95.1808

Latency distribution:
  10% in 0.1014 secs
  25% in 0.1021 secs
  50% in 0.1035 secs
  75% in 0.1050 secs
  90% in 0.1058 secs
  95% in 0.1063 secs
  99% in 0.1066 secs

As you can see above we get 95.1808 requests per second.

L = λW

L = 95.1808 * 0.1040
L = 9.89

This shows that with this hey configuration our system was serving about 10 requests at a time. The average client-perceived latency of 104 ms was close to the 100 ms latency of the API endpoint.

Let’s run our second test case. We are again making 1000 requests in total, this time with 20 concurrent workers, each still rate limited to 10 requests per second.

hey -n 1000 -c 20 -q 10 http://localhost:8080/
Summary:
  Total:    10.3901 secs
  Slowest:    0.3104 secs
  Fastest:    0.1019 secs
  Average:    0.2043 secs
  Requests/sec: 96.2455


Latency distribution:
  10% in 0.2020 secs
  25% in 0.2040 secs
  50% in 0.2055 secs
  75% in 0.2071 secs
  90% in 0.2085 secs
  95% in 0.2093 secs
  99% in 0.2160 secs

As you can see above we get 96.2455 requests per second.

L = λW

L = 96.2455 * 0.2043
L = 19.66

This shows that with this configuration our system was serving close to 20 requests at a time. The average client-perceived latency of about 205 ms was almost double the 100 ms latency of the API endpoint.

We will run our third test case, where we set the number of concurrent workers to 100 and the per-worker rate limit to 100 queries per second.

hey -n 1000 -c 100 -q 100 http://localhost:8080/
Summary:
  Total:    10.3540 secs
  Slowest:    1.1458 secs
  Fastest:    0.1056 secs
  Average:    0.9862 secs
  Requests/sec: 96.5809

Latency distribution:
  10% in 0.9295 secs
  25% in 1.0287 secs
  50% in 1.0334 secs
  75% in 1.0385 secs
  90% in 1.0419 secs
  95% in 1.0494 secs
  99% in 1.1396 secs

As you can see, throughput is still capped at about 96 requests per second, below 100 rps.

L = λW

L = 96.5809 * 0.9862
L = 95.24

This shows that with this configuration our system was serving close to 95 requests at a time. The average client-perceived latency of about 986 ms was nearly ten times the 100 ms latency of the API endpoint.

Let’s do the final test by setting server.tomcat.threads.max=100 in application.properties and restarting the server.

hey -n 1000 -c 100 -q 100 http://localhost:8080/
Summary:
  Total:    1.0929 secs
  Slowest:    0.1248 secs
  Fastest:    0.1011 secs
  Average:    0.1077 secs
  Requests/sec: 915.0301

Latency distribution:
  10% in 0.1032 secs
  25% in 0.1049 secs
  50% in 0.1068 secs
  75% in 0.1081 secs
  90% in 0.1128 secs
  95% in 0.1205 secs
  99% in 0.1235 secs

L = λW

L = 915.03 * 0.1077
L = 98.5

In this test we are serving close to 100 requests at a time, but since the number of worker threads now matches the number of concurrent requests, we are able to process them all in parallel, keep latency close to 100 ms, and sustain a much higher throughput of about 915 requests per second.

Using Little’s Law you can set concurrency limits in your application or service, and you can reason about how latency will be impacted by load.
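As a sketch of that idea (the class and parameter names are illustrative assumptions, not from the demo), you can derive a worker-pool size from a target throughput and an expected average latency:

public class ConcurrencySizing {

    // L = λ × W: worker threads needed to sustain targetRps at avgLatencySeconds per request
    static int workerThreadsFor(double targetRps, double avgLatencySeconds) {
        return (int) Math.ceil(targetRps * avgLatencySeconds);
    }

    public static void main(String[] args) {
        // 100 req/s at 100 ms latency -> 10 threads (our server.tomcat.threads.max=10 tests)
        System.out.println(workerThreadsFor(100, 0.100));

        // ~1000 req/s at 100 ms latency -> 100 threads (the final test)
        System.out.println(workerThreadsFor(1000, 0.100));
    }
}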

In this sample service we were not consuming much CPU, since we were just doing Thread.sleep. In the real world your service will be much more involved, and you may have to apply Little’s Law to sub-components to determine the limits of the overall system.
