Today I watched the DockerCon 2017 talk on Container Performance Analysis, given by Brendan Gregg, Senior Performance Architect at Netflix. In it, he shares various Linux tools that can help you understand the performance of your container platform. It is a great talk for anyone trying to do performance analysis of containers. In one of his slides, he lists the 10 tools he uses to start an investigation:
- uptime to check load averages
- dmesg | tail to check kernel errors
- vmstat 1 to see overall stats by time
- mpstat -P ALL 1 to check CPU balance
- pidstat 1 to check process usage
- iostat -xz 1 to check disk I/O
- free -m to check memory usage
- sar -n DEV 1 to check network I/O
- sar -n TCP,ETCP 1 to view TCP stats
- top for overview
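To make the checklist easier to try out, here is a minimal sketch that runs the ten commands in sequence. It is my own wrapper, not something from the talk: the function names `build_checklist` and `run_checklist` are hypothetical, and I have added an explicit sample count to the interval-based tools (for example `vmstat 1 5`) so that they terminate instead of running until interrupted. It assumes a Linux host; tools that are not installed (several come from the sysstat package) are skipped.

```python
import shutil
import subprocess


def build_checklist(samples=5):
    """Return the ten commands from the slide as argv lists.

    Interval-based tools get an explicit sample count so they exit on
    their own; the versions on the slide run until interrupted.
    """
    n = str(samples)
    return [
        ["uptime"],
        ["dmesg"],  # the slide pipes this through tail; done in run_checklist
        ["vmstat", "1", n],
        ["mpstat", "-P", "ALL", "1", n],
        ["pidstat", "1", n],
        ["iostat", "-xz", "1", n],
        ["free", "-m"],
        ["sar", "-n", "DEV", "1", n],
        ["sar", "-n", "TCP,ETCP", "1", n],
        ["top", "-b", "-n", "1"],  # batch mode, single iteration
    ]


def run_checklist(samples=5, tail_lines=10):
    """Run each command in order and print its output."""
    for argv in build_checklist(samples):
        print("==> " + " ".join(argv))
        if shutil.which(argv[0]) is None:
            print("(not installed, skipping)")
            continue
        try:
            result = subprocess.run(
                argv, capture_output=True, text=True, timeout=samples + 10
            )
        except subprocess.TimeoutExpired:
            print("(timed out)")
            continue
        output = result.stdout or result.stderr
        if argv[0] == "dmesg":
            # Equivalent of `dmesg | tail`: keep only the last few lines.
            output = "\n".join(output.splitlines()[-tail_lines:])
        print(output)
```

Calling `run_checklist()` on the host prints each command followed by its output. Some of these tools (`dmesg` in particular) may need elevated privileges depending on how the host is configured.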
The talk then goes deeper into how to use different tools to understand the performance characteristics of container platforms.
Below are some of the main points that I wrote down:
- Netflix has its own container platform, Titus. It handles scheduling and container execution, and it talks to AWS EC2.
- Netflix has more than a million containers, running on 25 large EC2 instances
- They use containers for services, batch jobs, and a queued worker model
- Namespaces limit what a container can see
- Control groups (cgroups) limit what you can use: cpuset, devices, memory, block I/O
- The combination of namespaces and cgroups is what we call a container
- Container's CPU limit = 100% x container's shares / total busy shares. This lets a container use other tenants' idle CPU (aka bursting) when it is available.
- Container's minimum CPU limit = 100% x container's shares / total allocated shares. This is the minimum a container will get when all containers are busy doing their jobs.
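
The two share formulas above are easier to follow with numbers. Below is a small sketch that applies them to a made-up host; the container names and share values are hypothetical and chosen only for illustration, and `cpu_percent` is my own helper, not something from the talk.

```python
def cpu_percent(shares, busy):
    """CPU percentage each busy container can get under CPU shares.

    shares: dict mapping container name -> allocated CPU shares
    busy:   names of the containers that currently want CPU
    Implements: 100% x container's shares / total busy shares.
    """
    total_busy = sum(shares[name] for name in busy)
    return {name: 100.0 * shares[name] / total_busy for name in busy}


# Hypothetical host with three containers.
shares = {"A": 100, "B": 200, "C": 100}

# Everyone busy: total busy shares equals total allocated shares,
# so this is each container's guaranteed minimum.
minimum = cpu_percent(shares, busy=["A", "B", "C"])

# C is idle: A and B split C's unused CPU in proportion to their shares.
bursting = cpu_percent(shares, busy=["A", "B"])

# Only A is busy: it can burst to the whole host.
alone = cpu_percent(shares, busy=["A"])
```

With these numbers, container A is guaranteed 25% when everyone is busy, can use about 33% when C is idle, and can use 100% when it is the only busy container. This covers CPU shares only; other limits such as cpusets or quotas would cap the result further.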