In recent weeks, I have come across FerretDB on multiple occasions, and I thought why not just get a closer look on the topic. I took a particular interest in it (FerretDB) as it is a MongoDB implementation, on top of my favourite database PostgreSQL.
While I do have high-level thoughts on how I would go about building MongoDB on top of Postgres, I wanted to confirm, validate, and learn how the FerretDB team has been doing it.
FerretDB (previously MangoDB) was founded to become the de-facto open-source substitute to MongoDB. FerretDB is an open-source proxy, converting the MongoDB 5.0+ wire protocol queries to SQL – using PostgreSQL as a database engine.

At a high level, FerretDB is a proxy, which implements MongoDB Wire Protocol that MongoDB clients speak. After establishing the connection with MongoDB clients, it translates any query sent by MongoDB clients to the SQL queries Postgres understands.
In the recent release(0.5.0) of FerretDB, it is also possible to use it as a library rather than as a proxy. FerretDB as a library helps in reducing one network hop, which leads to better performance. It is only possible for applications that are built in Go since FerretDB is implemented in Go.
Below are some of the tweets from people on this article. If you find this article useful please share and tag me @shekhargulati
There is a possibility that in future FerretDB (in proxy mode) can do intelligent caching and Postgres connection management, relieving clients from that responsibility.
To try out FerretDB you have to follow the steps as mentioned in the documentation. It uses docker-compose
to boot up Postgres and FerretDB proxy. Then, you can connect it with any MongoDB client like mongosh
.
Let’s take a deep dive into how FerretDB works by going over its code. You can get FerretDB code on Github: https://github.com/FerretDB/FerretDB
I wrote this post based on the Git commit id 22f1fdbd0b80bdcaee4ae4486296b5fb7ef54612.
1. Start up
FerretDB like most Go binary applications follows the convention to use cmd
directory to store the its entry point file cmd/ferretdb/main.go. This is the file that is called when you run the FerretDB proxy.
The code below contains only the important parts.
func main() {
// 1
flag.Parse()
//2
ctx, stop := notifyAppTermination(context.Background())
go func() {
<-ctx.Done()
logger.Info("Stopping...")
stop()
}()
//3
h, err := registry.NewHandler(*handlerF, ®istry.NewHandlerOpts{
Ctx: ctx,
Logger: logger,
PostgreSQLURL: *postgreSQLURLF,
TigrisURL: tigrisURL,
})
//4
l := clientconn.NewListener(&clientconn.NewListenerOpts{
ListenAddr: *listenAddrF,
ProxyAddr: *proxyAddrF,
Mode: clientconn.Mode(*modeF),
Handler: h,
Logger: logger,
TestConnTimeout: *testConnTimeoutF,
})
err = l.Run(ctx)
}
- Parses the command-line argument flags using Go’s flag package. The parsed arguments are later used when we create the handler and listener
- Next, the code handles the operating system specific termination events. The go routine will call the
stop
function to release resources associated with it. - Next, it registers the handler that is responsible for implementing the MongoDB interface. In the current implementation, it creates the handle for Postgres. It wraps Postgres connection pool and provides implementation for different MongoDB operations. It is possible in future to have multiple handlers. In theory, you can implement FerretDB on MySQL or Sqlite or write a fake implementation for testing. The handlerF refers to the handler with name
pg
. This is hard code in theinitFlags
function. - Finally, it creates a listener that listens on port
27017
. It waits in an infinite loop for new TPC connections on port 27017.
If you look at logs of the FerretDB server, you will find the log line indicating that FerretDB is listening on 27017
.
2022-07-17T14:11:51.692Z INFO listener clientconn/listener.go:75 Listening on [::]:27017 ...
2. Accept connections
Now that FerretDB has started, let’s look at what happens when a client (like mongosh) connects to FerretDB.
Below is the high level simplified structure of listener.go
run function
func (l *Listener) Run(ctx context.Context) error {
l.listener, _ = net.Listen("tcp", l.opts.ListenAddr)
var wg sync.WaitGroup
//1
for {
netConn, _ := l.listener.Accept()
//2
wg.Add(1)
//3
go func() {
connID := fmt.Sprintf("%s -> %s", netConn.RemoteAddr(), netConn.LocalAddr())
defer func() {
netConn.Close()
wg.Done()
}()
opts := &newConnOpts{
netConn: netConn,
mode: l.opts.Mode,
l: l.opts.Logger.Named("// " + connID + " "), // derive from the original unnamed logger
proxyAddr: l.opts.ProxyAddr,
handler: l.opts.Handler,
connMetrics: l.metrics.connMetrics,
}
conn, e := newConn(opts)
logger.Info("Connection started", zap.String("conn", connID))
//4
e = conn.run(runCtx)
if e == io.EOF {
logger.Info("Connection stopped", zap.String("conn", connID))
} else {
logger.Warn("Connection stopped", zap.String("conn", connID), zap.Error(e))
}
}()
}
// 5
logger.Info("Waiting for all connections to stop...")
wg.Wait()
return ctx.Err()
}
- As discussed in the previous section, FerretDB server creates an infinite loop (for without any arguments) and the first line of the for loop waits for the new connection. This ensures the server keeps running and waits for new connections.
- When the new connection is accepted, WaitGroup is incremented by 1. Waitgroups are used to wait for multiple goroutines to finish.
- The new connection is handled by a goroutine. It creates the connection objection. Connection wraps the Postgres handler that implements MongoDB wire protocol.
- The
conn.run
is where all the magic happens. It also runs an infinite loop and handles queries from the clients. It runs until the client disconnects or fatal error or panic is encountered. - When the for loop breaks it wait for all goroutines to finish and then return the error
If you exit from the mongosh
then you will see Connection stopped
in the logs.
2022-07-17T14:34:58.757Z INFO listener clientconn/listener.go:139 Connection stopped
As I mentioned above conn.run
is where we handle requests from the clients. So, let’s look at the conn.go
run function. The below is the high level simplified structure.
func (c *conn) run(ctx context.Context) (err error) {
//1
bufr := bufio.NewReader(c.netConn)
bufw := bufio.NewWriter(c.netConn)
//2
for {
var reqHeader *wire.MsgHeader
var reqBody wire.MsgBody
//3
reqHeader, reqBody, err = wire.ReadMessage(bufr)
if err != nil {
return
}
c.l.Debugf("Request header: %s", reqHeader)
c.l.Debugf("Request message:\n%s\n\n\n", reqBody)
//4
var resHeader *wire.MsgHeader
var resBody wire.MsgBody
var resCloseConn bool
resHeader, resBody, resCloseConn = c.route(ctx, reqHeader, reqBody)
if resHeader == nil || resBody == nil {
panic("no response to send to client")
}
//5
if err = wire.WriteMessage(bufw, resHeader, resBody); err != nil {
return
}
if err = bufw.Flush(); err != nil {
return
}
if resCloseConn {
err = errors.New("fatal error")
return
}
}
}
- It creates a new reader and writer, wrapping the existing connection
- It runs an infinite loop to build a request response cycle. Client sends the request and FerretDB server connection returns a response
- The client request received in MongoDB wire protocol format is parsed to MsgHeader and MsgBody structs
type MsgHeader struct {
MessageLength int32
RequestID int32
ResponseTo int32
OpCode OpCode
}
type MsgBody interface {
readFrom(*bufio.Reader) error
encoding.BinaryUnmarshaler
encoding.BinaryMarshaler
fmt.Stringer
msgbody() // seal for go-sumtype
}
The header and body are logged as well. Below is the request mongosh sends when it connects with the FerretDB
2022-07-17T14:13:14.497Z DEBUG // 172.21.0.2:43478 -> 172.21.0.3:27017 .// 172.21.0.2:43478 -> 172.21.0.3:27017 clientconn/conn.go:169 Request header: length: 385, id: 1, response_to: 0, opcode: OP_QUERY
2022-07-17T14:13:14.500Z DEBUG // 172.21.0.2:43478 -> 172.21.0.3:27017 .// 172.21.0.2:43478 -> 172.21.0.3:27017 clientconn/conn.go:170 Request message:
{
"Flags": 0,
"FullCollectionName": "admin.$cmd",
"NumberToReturn": -1,
"NumberToSkip": 0,
"Query": {
"$k": [
"ismaster",
"helloOk",
"client",
"compression",
"loadBalanced"
],
"ismaster": true,
"helloOk": true,
// removed client details
"compression": [
"none"
],
"loadBalanced": false
}
}
- Next, the connection
route
method uses the Postgres handler to process the request. The op code provided in the request header helps figure out how to handle the request. Most of the Ops code exceptOP_MSG
are deprecated now. They are present just in case you connect with an older client. You can read more in the MongoDB documentation here. I will explain how commands are processed in the next section. - Finally, it writes the response in the MongoDB wire protocol format.
3. Insert and Find documents
Now, we will come to the meat of the logic. We will understand how inserts and find queries are handled.
To understand this, let’s insert some data in a collection.
use mydb
db.posts.insertOne(
{
title: "Hello, FerretDB",
tags: ["hello", databases],
author : "Shekhar Gulati"
}
)
db.posts.insertOne(
{
title: "Hello, Postgres",
tags: ["hello", databases],
author : "pg"
}
)
db.posts.insertOne(
{
title: "Hello, MongoDB",
tags: ["hello", databases],
author : "mg"
}
)
Now, we have three documents in our posts
collection. Let’s understand how a find query will work.
db.posts.findOne({ author: "Shekhar Gulati" })
The header and the request body received by FerretDB is shown below.
2022-07-17T16:15:59.057Z DEBUG // 172.21.0.2:43616 -> 172.21.0.3:27017 .// 172.21.0.2:43616 -> 172.21.0.3:27017 clientconn/conn.go:169 Request header: length: 107, id: 153, response_to: 0, opcode: OP_MSG
2022-07-17T16:15:59.057Z DEBUG // 172.21.0.2:43616 -> 172.21.0.3:27017 .// 172.21.0.2:43616 -> 172.21.0.3:27017 clientconn/conn.go:170 Request message:
{
"Checksum": 0,
"FlagBits": 0,
"Sections": [
{
"Document": {
"$k": [
"find",
"filter",
"limit",
"$db"
],
"find": "posts",
"filter": {
"$k": [
"author"
],
"author": "Shekhar Gulati"
},
"limit": 1,
"$db": "mydb"
},
"Kind": 0
}
]
}
The header says that the opcode is OP_MSG
. The route function logic corresponding to OP_MSG
is shown below.
func (c *conn) route(ctx context.Context, reqHeader *wire.MsgHeader, reqBody wire.MsgBody) (resHeader *wire.MsgHeader, resBody wire.MsgBody, closeConn bool) {
var command string
var result *string
resHeader = new(wire.MsgHeader)
var err error
switch reqHeader.OpCode {
case wire.OpCodeMsg:
var document *types.Document
msg := reqBody.(*wire.OpMsg)
document, err = msg.Document()
command = document.Command()
if err == nil {
resHeader.OpCode = wire.OpCodeMsg
resBody, err = c.handleOpMsg(ctx, msg, command)
}
// remaining cases
}
b, err := resBody.MarshalBinary()
resHeader.MessageLength = int32(wire.MsgHeaderLen + len(b))
resHeader.RequestID = atomic.AddInt32(&c.lastRequestID, 1)
resHeader.ResponseTo = reqHeader.RequestID
if result == nil {
result = pointer.ToString("ok")
}
return
}
In the OpCodeMsg case, we create the Document from msg. If you look at the request message above, we have a Document object under the Sections object. Although the section is an array, msg.Document()
returns only a single document. The struct Document looks like as shown below:
type Document struct {
m map[string]any
keys []string
}
The keys array is populated with a value of $k
. So, in our case, keys will be find, filter, limit, $db
. As you can see, the keys help you navigate the document object.
The document.Command()
returns the first key. So, in our case it will be find
.
Next, we call c.handleOpMsg
. Its code is shown below.
func (c *conn) handleOpMsg(ctx context.Context, msg *wire.OpMsg, cmd string) (*wire.OpMsg, error) {
if cmd, ok := common.Commands[cmd]; ok {
if cmd.Handler != nil {
return cmd.Handler(c.h, ctx, msg)
}
}
errMsg := fmt.Sprintf("no such command: '%s'", cmd)
return nil, common.NewErrorMsg(common.ErrCommandNotFound, errMsg)
}
command is a map with key as String command name(like find) and value is an object with two fields – help text and Handler that will handle the command.
var Commands = map[string]command{
"buildinfo": {
Help: "Returns a summary of the build information.",
Handler: (handlers.Interface).MsgBuildInfo,
},
"find": {
Help: "Returns documents matched by the query.",
Handler: (handlers.Interface).MsgFind,
},
"findAndModify": {
Help: "Inserts, updates, or deletes, and returns a document matched by the query.",
Handler: (handlers.Interface).MsgFindAndModify,
},
// rest removed..
}
So, common.Commands[cmd]
gives (handlers.Interface).MsgFind
handler.
Next, code calls the cmd.Handler(c.h, ctx, msg)
. This calls the MsgFind
function in the msg_find.go
file.
The signature of MsgFind
is as shown below.
func (h *Handler) MsgFind(ctx context.Context, msg *wire.OpMsg) (*wire.OpMsg, error)
The current implementation of find works by fetching all the documents in FerretDB process memory and then applying the filtering, sorting, and limit clauses. At a high level, this is how it works.
fetchedDocs, err := h.fetch(ctx, sp)
for _, doc := range fetchedDocs {
matches, err := common.FilterDocument(doc, filter)
resDocs = append(resDocs, doc)
}
if err = common.SortDocuments(resDocs, sort); err != nil {
return nil, err
}
if resDocs, err = common.LimitDocuments(resDocs, limit); err != nil {
return nil, err
}
if err = common.ProjectDocuments(resDocs, projection); err != nil {
return nil, err
}
I was surprised to see this implementation. I was expecting the MongoDB query to translate into a SQL query. I asked this in FerretDB discussion forum and received the following response.
That’s correct. See “What’s Changed” section in that release. We will publish a blog post soon explaining that in more detail and how we will change that before 1.0.
Previously, we generated a single SQL query that extensively used json/jsonb PostgreSQL functions for each incoming MongoDB request, then converted fetched data. All the filtering was performed by PostgreSQL. Unfortunately, the semantics of those functions do not match MongoDB behaviour in edge cases like comparison or sorting of different types. That resulted in a difference in behaviour between FerretDB and MongoDB, and that is a problem we wanted to fix.
So starting from this release, we fetch more data from PostgreSQL and perform filtering on the FerretDB side. This allows us to match MongoDB behaviour in all cases. Of course, that also greatly reduces performance. We plan to address it in future releases by pushing down parts of filtering queries that can be made fully compatible with MongoDB. For example, a simple query like
db.collection.find({_id: 'some-id-value'})
can be converted to SQLWHERE
condition relatively easy and be compatible even with weird values like IEEE 754 NaNs, infinities, etc.
The above still does not cover how MongoDB databases and collections are created. They follow the same logic: instead of msg_find.go
, FerretDB calls the msg_create.go
function.
func (h *Handler) MsgCreate(ctx context.Context, msg *wire.OpMsg) (*wire.OpMsg, error) {
document, err := msg.Document()
err = h.pgPool.InTransaction(ctx, func(tx pgx.Tx) error {
if err := pgdb.CreateDatabaseIfNotExists(ctx, tx, db); err != nil {
return lazyerrors.Error(err)
}
if err := pgdb.CreateCollection(ctx, tx, db, collection); err != nil {
if errors.Is(err, pgdb.ErrAlreadyExist) {
msg := fmt.Sprintf("Collection already exists. NS: %s.%s", db, collection)
return common.NewErrorMsg(common.ErrNamespaceExists, msg)
}
}
return nil
})
// removed for clarity
}
As you can see above, the method first creates the database and then creates the collection.
The CreateDatabaseIfNotExists
method creates a Postgres schema by firing the following query.
CREATE SCHEMA IF NOT EXISTS ` + pgx.Identifier{db}.Sanitize() // CREATE SCHEMA IF NOT EXISTS mydb
The CreateCollection
method creates the collection by creating a table in the mydb
schema as shown below
sql := `CREATE TABLE IF NOT EXISTS ` + pgx.Identifier{db, table}.Sanitize() + ` (_jsonb jsonb)`
It creates a table that has only a single column of type jsonb
. So, Postgres also stores data in binary JSON format.
The find implementation will use Postgres JSON operators to query the relevant data.
Conclusion
It was fun to explore FerretDB code. It is still early days for FerretDB, but it could be a possible MongoDB alternative.