A couple of months back I watched a video by Andy Pavlo, Associate Professor of Databases at Carnegie Mellon, where he made the point that databases should not use mmap. He went on to say that if there is only one thing you take away from his database course, it should be to never use mmap when building and designing database management systems. I had not used mmap before, so I was intrigued to understand it in more detail. I was aware that MongoDB used to use an mmap-based storage engine. It allowed them to achieve a faster time to market, but they later had to replace it with a new storage engine, WiredTiger, because of the issues they faced with mmap. MongoDB is far from the only database to use mmap: others include RavenDB, Elasticsearch, LevelDB, InfluxDB, LMDB, BoltDB, and moss (a key-value store from Couchbase).
Given that so many databases use mmap, I wanted to understand why Andy recommends against it. In this post I will list all of the reasons I could find in my research and in Andy's video. But before we do that, let's first understand mmap.
Understanding mmap
mmap stands for memory-mapped files. It is an OS-level feature that maps a file into the application's address space, making a section of that address space refer directly to the page cache entries holding the file's data. This allows an application to read the file through its address space just like an array. If the data happens to be in the page cache, the kernel is bypassed and reads & writes are served straight from memory, which makes them fast. If you miss the cache, a page fault occurs, prompting the kernel to fetch the corresponding data from the disk. This is blocking and synchronous.
mmap takes advantage of the file system cache by asking the operating system to map the needed files into virtual memory, so that the application can access that memory directly.
mmap also allows you to read files that are much bigger than the physical memory available to the system. This is achieved using virtual memory, which gives programs the illusion that they have more memory available than they actually do.
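To make the "read it like an array" idea concrete, here is a minimal sketch (the file name data.bin is just a placeholder for any existing, non-empty file):

import mmap

def read_like_array(filename):
    with open(filename, "rb") as f:
        # length=0 maps the whole file; ACCESS_READ makes the mapping read-only
        with mmap.mmap(f.fileno(), length=0, access=mmap.ACCESS_READ) as mm:
            # Slicing the mapping reads bytes straight from the page cache
            print(mm[0:16])

read_like_array('data.bin')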
Let’s do some quick experimentation with mmap to better understand it.
We will use a Python docker image to create our experimentation playground.
We will start by pulling the latest Python docker image.
docker pull python
Next, we will start a new container, giving it only 100MB of memory:
docker run -it -m 100m python /bin/bash
Now we are inside our docker container, where we can start using mmap. But before we do that, let's create a large file.
You will see a shell prompt like the one shown below.
root@ec72a7abe3c3:/#
From now on I will use # to denote the container shell prompt.
# fallocate -l 1G large.txt
This will create a 1GB file. Since fallocate only allocates space, the file's contents read back as zero bytes.
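If fallocate is not available inside the container, a tiny Python script can create an equivalent file (a convenience sketch; truncate extends the file so that it reads back as zero bytes):

with open("large.txt", "wb") as f:
    f.truncate(1024 ** 3)  # grow the file to 1GiB without writing any data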
Now we will create two programs: one that uses regular file IO and a second that uses mmap.
The regular file IO code is shown below. We will call it reader-regular.py.
def regular_io(filename):
    with open(filename, mode="r", encoding="utf8") as file_obj:
        text = file_obj.read()
        print(text.find("abc"))

regular_io('large.txt')
We open our large file, read it, and finally search for the text abc. Note that file_obj.read() loads the entire 1GB file into a Python string, far more than our container's 100MB limit.
Next, we will create our second program reader-mmap.py
import mmap

def mmap_io(filename):
    with open(filename, mode="r", encoding="utf8") as file_obj:
        with mmap.mmap(file_obj.fileno(), length=0, access=mmap.ACCESS_READ) as mmap_obj:
            print(mmap_obj.find(b"abc"))

mmap_io('large.txt')
The above code uses the mmap API to do the same thing. Passing length=0 maps the entire file.
If we run reader-regular.py, we will get an error as shown below.
# python reader-regular.py
Killed
When you look at the exit code you will see it is 137. Exit code 137 means the process was killed by signal 9 (128 + 9 = 137), i.e. SIGKILL, which the kernel's OOM killer delivers when the container exceeds its memory limit.
# echo $?
137
Next, we run our mmap code. It successfully returns -1, meaning the pattern abc was not found, which is expected since the file contains only zero bytes. The important part is that the search completed without breaching the 100MB memory limit.
# python reader-mmap.py
-1
# echo $?
0
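To see why the mmap version survives, we can check the peak resident memory of the process after the search. Here is a minimal sketch (assuming Linux, where ru_maxrss is reported in kilobytes):

import mmap
import resource

def mmap_io(filename):
    with open(filename, mode="r", encoding="utf8") as file_obj:
        with mmap.mmap(file_obj.fileno(), length=0, access=mmap.ACCESS_READ) as mmap_obj:
            print(mmap_obj.find(b"abc"))

mmap_io('large.txt')
# Stays well below the 1GB file size: the kernel reclaims clean
# page-cache pages as the container's memory limit is approached.
print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss, "KB peak RSS")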
mmap is powerful and makes it easy for database designers to leave memory management to the operating system.
Now that we understand mmap, let’s talk about its issues.
Issues with mmap
Below are the reasons I collected for avoiding mmap.
- The programming model offered by mmap is synchronous and blocking. This is easy for programmers, but you lose the flexibility to make things asynchronous and parallel. When your application process takes a page fault on an mmap-ed region, you see major IO stalls.
- When you use mmap you let the kernel take responsibility for maintaining the caches for both reads and writes. The kernel decides on page eviction when memory runs low. This is a good start, and if your application can work with it effectively then you should be fine using mmap. But applications often have better knowledge of their workloads, which they can use to make intelligent decisions the kernel can't make for you (the madvise sketch after this list shows the limited hints the kernel does accept).
- Windows does not support mmap. It has its own construct called MapViewOfFile. The higher-level APIs in languages like Python hide this behind an abstraction. There is one major difference between the implementations of memory-mapped files, as mentioned by Sublime HQ in their post on memory maps: Windows keeps a lock on the file, not allowing it to be deleted. As mentioned in their post, they handle it by releasing memory-map locks on idle. This may or may not be possible depending on your application.
- Good exception handling is key for any critical application. mmap code produces SIGBUS signals, which are difficult to centralise and complicate error-handling code. SIGBUS (bus error) is a signal raised when you try to access memory that has not been physically mapped, for example when another process truncates a file you have mapped. mmap error handling is done via global signal handlers, which might interfere with third-party libraries.
- There are hard constraints if you try to build apps for 32-bit systems. On 32-bit systems you are limited to a 4GB address space, so you might run out of address space if you try to map a large file.
- This last one is not so much an issue as advice: I read multiple times during my research that mmap works well for reads, but you should avoid writing through it.
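On the cache-management point above, the closest you can get to passing application knowledge to the kernel is the madvise hint API. Here is a minimal sketch of hinting a sequential scan (Python 3.8+ on Linux; the available constants and their effect vary by platform, so treat this as an illustration rather than a guarantee):

import mmap

with open("large.txt", "rb") as f:
    with mmap.mmap(f.fileno(), length=0, access=mmap.ACCESS_READ) as mm:
        # Tell the kernel we will scan sequentially, so it can read ahead
        # aggressively and drop pages behind us.
        mm.madvise(mmap.MADV_SEQUENTIAL)
        print(mm.find(b"abc"))

These hints are advisory only; the kernel is free to ignore them, which is exactly the lack of control described above.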