How does Docker use cgroups to set resource limits?


Today, I wanted to find out how Docker uses cgroups to set resource limits. In this short post, I will share what I learnt.

I will assume that you have a machine on which Docker is installed.

Docker allows you to pass resource limits as command-line options. Let’s assume that you want to limit the I/O read rate of a container to 1 MB per second. You can start a new container with the --device-read-bps option, as shown below:

$ docker run -it --device-read-bps /dev/sda:1mb centos

In the above command, we start a new centos container. The --device-read-bps option limits the read rate from the /dev/sda device to 1 MB per second.

The above command will start the container, and you will be inside the container shell.

We will create a new file inside the container and then try to read it. To create the file, we will use the dd utility as shown below. Note that /dev/zero fills the file with zero bytes rather than random content.

[root@container-id ~]# dd if=/dev/zero of=afile bs=1M count=100

The above will create a file of 100 MB.

Now, let’s read the file back.

Before doing that, start the iotop utility on the Docker host so that we can watch the read rate.

$ iotop -o

To read the file, we will again use the dd utility, as shown below.

[root@container-id ~]# dd if=/root/afile of=/dev/null

As the iotop screenshot below shows, the disk read speed was close to 1 MB per second.

[Screenshot: iotop output for the container with the read limit (iotop-cgroup)]
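
As a rough sanity check on the number above, the expected duration of the throttled read can be worked out from the file size and the limit. This is only a sketch: the function name expected_read_seconds is made up for illustration, and it assumes that the 1mb limit means 1048576 bytes per second, which matches the blkio.throttle.read_bps_device entry shown later in this post.

```shell
# Estimate how long a throttled read should take.
# Both arguments are in bytes; the result is whole seconds, rounded up.
expected_read_seconds() {
    size_bytes=$1
    rate_bytes_per_sec=$2
    echo $(( (size_bytes + rate_bytes_per_sec - 1) / rate_bytes_per_sec ))
}

# dd bs=1M count=100 writes 100 blocks of 1048576 bytes,
# and the limit is assumed to be 1048576 bytes per second.
expected_read_seconds $((100 * 1024 * 1024)) $((1024 * 1024))
```

So reading the whole file at the limited rate should take roughly 100 seconds, assuming the data is actually read from disk.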

If you do the same in an unconstrained container, you will find that the read speed is much higher.

Let’s start a new container without the limit:

$ docker run -it centos

Now, create a file again as we did above. This time we will create a file of about 5 GB.

[root@container-id ~]# dd if=/dev/zero of=afile bs=1M count=5000

Next, we will read the file using the dd command as we did previously. This time, if you look at iotop, you will find that the disk read speed is 591.89 MB per second.

[Screenshot: iotop output for the container without a read limit (iotop-without-cgroup)]

How does Docker use cgroups?

cgroups (control groups) is a Linux kernel feature to limit, police, and account for the resource usage of a set of processes. It provides mechanisms to limit and monitor system resources such as CPU time, system memory, disk bandwidth, and network bandwidth.

cgroups work by dividing resources into groups and then assigning tasks to those groups.

Docker uses cgroups to limit the system resources available to a container.
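
To see which group a process has been assigned to, you can look at /proc/<pid>/cgroup, where each line has the form hierarchy-id:controllers:path. The helper below is a minimal sketch of reading that file; the name cgroup_path_for is made up for illustration, and it assumes the per-subsystem (cgroup v1) layout used throughout this post.

```shell
# Print the cgroup path of a process for one controller (e.g. blkio).
# $1 = controller name, $2 = cgroup file (defaults to the current process).
cgroup_path_for() {
    controller=$1
    file=${2:-/proc/self/cgroup}
    awk -F: -v c="$controller" '
        {
            # The second field may list several controllers, e.g. "cpu,cpuacct".
            n = split($2, names, ",")
            for (i = 1; i <= n; i++) {
                if (names[i] == c) {
                    # Strip "id:controllers:" and print the remaining path.
                    sub(/^[^:]*:[^:]*:/, "")
                    print
                    exit
                }
            }
        }
    ' "$file"
}

# Example (run on the Docker host, replacing <pid> with a container process id):
#   cgroup_path_for blkio /proc/<pid>/cgroup
```

For a process running in a container, the printed path should end in the container id, which ties the process back to the group that Docker created for it.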

When you install Docker on a Linux box such as Ubuntu, the cgroup-related packages are installed with it and the subsystem directories are created. You can list the cgroups that exist under each subsystem with the lscgroup command; before any container is started, it shows only the root group of every subsystem.

$ lscgroup
cpuset:/
cpu:/
cpuacct:/
memory:/
devices:/
freezer:/
blkio:/
perf_event:/
hugetlb:/

If lscgroup is not installed, you can install it with the sudo apt-get install cgroup-bin command (on newer Ubuntu releases the package is named cgroup-tools).

On Ubuntu, these correspond to directories inside the /sys/fs/cgroup directory.
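
If you would rather not install an extra package, the kernel also lists its subsystems in /proc/cgroups, whose columns are subsys_name, hierarchy, num_cgroups, and enabled. The following is a small sketch that prints the enabled ones; the function name list_enabled_subsystems is an assumption made for this example.

```shell
# Print the names of the enabled cgroup subsystems.
# $1 = file to read (defaults to /proc/cgroups).
list_enabled_subsystems() {
    file=${1:-/proc/cgroups}
    # Skip the header line (it starts with '#') and keep rows whose
    # fourth column, "enabled", is 1.
    awk '!/^#/ && $4 == 1 { print $1 }' "$file"
}

# Example:
#   list_enabled_subsystems
```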

$ cd /sys/fs/cgroup/

Once inside the cgroup directory you can list its contents.

$ ls -l
total 0
drwxr-xr-x 2 root root 0 Jan  3 14:50 blkio
drwxr-xr-x 2 root root 0 Jan  3 14:50 cpu
drwxr-xr-x 2 root root 0 Jan  3 14:50 cpuacct
drwxr-xr-x 2 root root 0 Jan  3 14:50 cpuset
drwxr-xr-x 2 root root 0 Jan  3 14:50 devices
drwxr-xr-x 2 root root 0 Jan  3 14:50 freezer
drwxr-xr-x 2 root root 0 Jan  3 14:50 hugetlb
drwxr-xr-x 2 root root 0 Jan  3 14:50 memory
drwxr-xr-x 2 root root 0 Jan  3 14:50 perf_event
drwxr-xr-x 3 root root 0 Jan  3 14:45 systemd

The blkio directory is used to manage block device I/O. Similarly, the other directories are used to manage other system resources.

Let’s look at the contents of the blkio directory.

/sys/fs/cgroup/blkio$ ls -l
total 0
-r--r--r-- 1 root root 0 Jan  3 14:50 blkio.io_merged
-r--r--r-- 1 root root 0 Jan  3 14:50 blkio.io_merged_recursive
-r--r--r-- 1 root root 0 Jan  3 14:50 blkio.io_queued
-r--r--r-- 1 root root 0 Jan  3 14:50 blkio.io_queued_recursive
-r--r--r-- 1 root root 0 Jan  3 14:50 blkio.io_service_bytes
-r--r--r-- 1 root root 0 Jan  3 14:50 blkio.io_service_bytes_recursive
-r--r--r-- 1 root root 0 Jan  3 14:50 blkio.io_service_time
-r--r--r-- 1 root root 0 Jan  3 14:50 blkio.io_service_time_recursive
-r--r--r-- 1 root root 0 Jan  3 14:50 blkio.io_serviced
-r--r--r-- 1 root root 0 Jan  3 14:50 blkio.io_serviced_recursive
-r--r--r-- 1 root root 0 Jan  3 14:50 blkio.io_wait_time
-r--r--r-- 1 root root 0 Jan  3 14:50 blkio.io_wait_time_recursive
-rw-r--r-- 1 root root 0 Jan  3 14:50 blkio.leaf_weight
-rw-r--r-- 1 root root 0 Jan  3 14:50 blkio.leaf_weight_device
--w------- 1 root root 0 Jan  3 14:50 blkio.reset_stats
-r--r--r-- 1 root root 0 Jan  3 14:50 blkio.sectors
-r--r--r-- 1 root root 0 Jan  3 14:50 blkio.sectors_recursive
-r--r--r-- 1 root root 0 Jan  3 14:50 blkio.throttle.io_service_bytes
-r--r--r-- 1 root root 0 Jan  3 14:50 blkio.throttle.io_serviced
-rw-r--r-- 1 root root 0 Jan  3 14:50 blkio.throttle.read_bps_device
-rw-r--r-- 1 root root 0 Jan  3 14:50 blkio.throttle.read_iops_device
-rw-r--r-- 1 root root 0 Jan  3 14:50 blkio.throttle.write_bps_device
-rw-r--r-- 1 root root 0 Jan  3 14:50 blkio.throttle.write_iops_device
-r--r--r-- 1 root root 0 Jan  3 14:50 blkio.time
-r--r--r-- 1 root root 0 Jan  3 14:50 blkio.time_recursive
-rw-r--r-- 1 root root 0 Jan  3 14:50 blkio.weight
-rw-r--r-- 1 root root 0 Jan  3 14:50 blkio.weight_device
-rw-r--r-- 1 root root 0 Jan  3 14:50 cgroup.clone_children
--w--w--w- 1 root root 0 Jan  3 14:50 cgroup.event_control
-rw-r--r-- 1 root root 0 Jan  3 14:50 cgroup.procs
-r--r--r-- 1 root root 0 Jan  3 14:50 cgroup.sane_behavior
-rw-r--r-- 1 root root 0 Jan  3 14:50 notify_on_release
-rw-r--r-- 1 root root 0 Jan  3 14:50 release_agent
-rw-r--r-- 1 root root 0 Jan  3 14:50 tasks

The three important files from the above are:

  1. tasks: This file contains the IDs of the tasks (threads) attached to this control group.
  2. cgroup.procs: This file contains thread group IDs (that is, process IDs), which is useful if you have a multi-threaded application.
  3. cgroup.event_control: This file is used to hook into the notification API.
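
These files are also the interface for changing group membership: writing a process ID to cgroup.procs moves that process, with all of its threads, into the group. Below is a minimal sketch of that step. The function name attach_to_cgroup and the demo group in the usage comment are hypothetical, and on a real host the write needs root privileges.

```shell
# Move a process into a control group by writing its PID to cgroup.procs.
# $1 = directory of the target cgroup, $2 = process id.
attach_to_cgroup() {
    cgroup_dir=$1
    pid=$2
    echo "$pid" > "$cgroup_dir/cgroup.procs"
}

# Example (as root, with a hypothetical group created via mkdir):
#   mkdir /sys/fs/cgroup/blkio/demo
#   attach_to_cgroup /sys/fs/cgroup/blkio/demo $$
```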

When you run a new container with the docker run command, Docker creates a new child group under each of the subsystems. The child group is named docker/<container_id>.

So, when you run a new container using the command shown below

$ docker run -it --device-read-bps /dev/sda:1mb centos

Then directories are created for the container. If you list the container's directory under blkio, you will see the following:

$ ls -l blkio/docker/26dc49635757074a2119039dc74634f72e9eddff41bee9dd8f761d73d3780a5c/
total 0
-r--r--r-- 1 root root 0 Jan  3 15:10 blkio.io_merged
-r--r--r-- 1 root root 0 Jan  3 15:10 blkio.io_merged_recursive
-r--r--r-- 1 root root 0 Jan  3 15:10 blkio.io_queued
-r--r--r-- 1 root root 0 Jan  3 15:10 blkio.io_queued_recursive
-r--r--r-- 1 root root 0 Jan  3 15:10 blkio.io_service_bytes
-r--r--r-- 1 root root 0 Jan  3 15:10 blkio.io_service_bytes_recursive
-r--r--r-- 1 root root 0 Jan  3 15:10 blkio.io_service_time
-r--r--r-- 1 root root 0 Jan  3 15:10 blkio.io_service_time_recursive
-r--r--r-- 1 root root 0 Jan  3 15:10 blkio.io_serviced
-r--r--r-- 1 root root 0 Jan  3 15:10 blkio.io_serviced_recursive
-r--r--r-- 1 root root 0 Jan  3 15:10 blkio.io_wait_time
-r--r--r-- 1 root root 0 Jan  3 15:10 blkio.io_wait_time_recursive
-rw-r--r-- 1 root root 0 Jan  3 15:10 blkio.leaf_weight
-rw-r--r-- 1 root root 0 Jan  3 15:10 blkio.leaf_weight_device
--w------- 1 root root 0 Jan  3 15:10 blkio.reset_stats
-r--r--r-- 1 root root 0 Jan  3 15:10 blkio.sectors
-r--r--r-- 1 root root 0 Jan  3 15:10 blkio.sectors_recursive
-r--r--r-- 1 root root 0 Jan  3 15:10 blkio.throttle.io_service_bytes
-r--r--r-- 1 root root 0 Jan  3 15:10 blkio.throttle.io_serviced
-rw-r--r-- 1 root root 0 Jan  3 15:10 blkio.throttle.read_bps_device
-rw-r--r-- 1 root root 0 Jan  3 15:10 blkio.throttle.read_iops_device
-rw-r--r-- 1 root root 0 Jan  3 15:10 blkio.throttle.write_bps_device
-rw-r--r-- 1 root root 0 Jan  3 15:10 blkio.throttle.write_iops_device
-r--r--r-- 1 root root 0 Jan  3 15:10 blkio.time
-r--r--r-- 1 root root 0 Jan  3 15:10 blkio.time_recursive
-rw-r--r-- 1 root root 0 Jan  3 15:10 blkio.weight
-rw-r--r-- 1 root root 0 Jan  3 15:10 blkio.weight_device
-rw-r--r-- 1 root root 0 Jan  3 15:10 cgroup.clone_children
--w--w--w- 1 root root 0 Jan  3 15:10 cgroup.event_control
-rw-r--r-- 1 root root 0 Jan  3 15:10 cgroup.procs
-rw-r--r-- 1 root root 0 Jan  3 15:10 notify_on_release
-rw-r--r-- 1 root root 0 Jan  3 15:10 tasks

This has the same file structure as the blkio directory.
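
The directory name follows directly from the subsystem and the container ID, so it can be computed. The sketch below assumes the layout shown in this post (one directory per subsystem, with Docker's groups under docker/); hosts where Docker is configured with the systemd cgroup driver use a different path. The function name container_cgroup_dir and the container name mycontainer are made up for illustration.

```shell
# Build the path of a container's cgroup directory for one subsystem.
# $1 = subsystem (e.g. blkio), $2 = full container id,
# $3 = cgroup mount point (defaults to /sys/fs/cgroup).
container_cgroup_dir() {
    subsystem=$1
    container_id=$2
    cgroup_root=${3:-/sys/fs/cgroup}
    echo "$cgroup_root/$subsystem/docker/$container_id"
}

# Example (mycontainer is a hypothetical container name):
#   id=$(docker inspect --format '{{.Id}}' mycontainer)
#   ls -l "$(container_cgroup_dir blkio "$id")"
```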

The two important things to note are:

  1. If you cat the tasks file, you will see that it contains the process ID of the container.
    /sys/fs/cgroup/blkio/docker/26dc49635757074a2119039dc74634f72e9eddff41bee9dd8f761d73d3780a5c$ cat tasks
    6347

    This is the process ID of the bash process running inside the container.

    vagrant@vagrant-ubuntu-trusty-64:/sys/fs/cgroup/blkio/docker/26dc49635757074a2119039dc74634f72e9eddff41bee9dd8f761d73d3780a5c$ ps -ef | grep bash
    root      6347  6328  0 15:10 pts/0    00:00:00 /bin/bash

  2. An entry has been written to blkio.throttle.read_bps_device with the read limit for the device, in the form major:minor bytes-per-second. Here 8:0 is /dev/sda and 1048576 bytes is 1 MB.
    $ cat blkio.throttle.read_bps_device
    8:0 1048576
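
The entry above can be reproduced from the command-line argument /dev/sda:1mb. The following is only a sketch of that translation, not Docker's actual code: the function names to_bytes and throttle_entry are made up, only whole numbers with b, k, m, or g units are handled, and units are treated as powers of 1024 because that is what yields the 1048576 seen above.

```shell
# Convert a size such as "1mb" or "512k" to a number of bytes.
to_bytes() {
    # Split the input into its leading digits and its lower-cased unit.
    number=$(printf '%s' "$1" | sed 's/[^0-9].*$//')
    unit=$(printf '%s' "$1" | sed 's/^[0-9]*//' | tr 'A-Z' 'a-z')
    case $unit in
        ''|b)  echo "$number" ;;
        k|kb)  echo $((number * 1024)) ;;
        m|mb)  echo $((number * 1024 * 1024)) ;;
        g|gb)  echo $((number * 1024 * 1024 * 1024)) ;;
        *)     echo "unknown unit: $unit" >&2; return 1 ;;
    esac
}

# Build a "major:minor bytes" line for a blkio throttle file.
# $1 = major:minor of the device, $2 = human-readable rate.
throttle_entry() {
    bytes=$(to_bytes "$2") || return 1
    echo "$1 $bytes"
}

# /dev/sda has device number 8:0, so this prints "8:0 1048576".
throttle_entry 8:0 1mb
```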
    

The above shows how Docker uses cgroups to define limits on different resources. The same happens for other resources such as CPU and memory.
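
To check a few of these limits for one container in a single step, you could read the corresponding files from each subsystem. This is a sketch under the same layout assumptions as the rest of this post; the function name show_limits is hypothetical. The files memory.limit_in_bytes and cpu.shares are where limits such as those passed with the --memory and --cpu-shares options end up in this layout.

```shell
# Print selected limits for a container from its cgroup directories.
# $1 = full container id, $2 = cgroup mount point (defaults to /sys/fs/cgroup).
show_limits() {
    container_id=$1
    cgroup_root=${2:-/sys/fs/cgroup}
    for entry in memory/memory.limit_in_bytes \
                 cpu/cpu.shares \
                 blkio/blkio.throttle.read_bps_device; do
        subsystem=${entry%%/*}   # part before the slash
        file=${entry#*/}         # part after the slash
        path="$cgroup_root/$subsystem/docker/$container_id/$file"
        # Skip files that do not exist or are not readable.
        if [ -r "$path" ]; then
            printf '%s: %s\n' "$file" "$(cat "$path")"
        fi
    done
}

# Example:
#   show_limits 26dc49635757074a2119039dc74634f72e9eddff41bee9dd8f761d73d3780a5c
```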

Conclusion

In this post, we learnt how Docker uses cgroups to set resource constraints. Docker provides the plumbing and tooling that make it easy for developers to consume advanced Linux features.
