How does FerretDB work?


In recent weeks, I have come across FerretDB on multiple occasions, and I thought I would take a closer look. I took a particular interest in it because it is a MongoDB implementation built on top of my favourite database, PostgreSQL.

While I do have high-level thoughts on how I would go about building MongoDB on top of Postgres, I wanted to confirm, validate, and learn how the FerretDB team has been doing it.

FerretDB (previously MangoDB) was founded to become the de-facto open-source substitute for MongoDB. It is an open-source proxy that converts MongoDB 5.0+ wire protocol queries to SQL, using PostgreSQL as the database engine.

At a high level, FerretDB is a proxy that implements the MongoDB wire protocol that MongoDB clients speak. After a client establishes a connection, it translates every query the client sends into SQL queries that Postgres understands.

Since the recent release (0.5.0), it is also possible to use FerretDB as a library rather than as a proxy. Using it as a library removes one network hop, which leads to better performance. This is only possible for applications written in Go, since FerretDB itself is implemented in Go.


There is a possibility that, in the future, FerretDB (in proxy mode) could do intelligent caching and Postgres connection management, relieving clients of that responsibility.

To try out FerretDB, follow the steps mentioned in the documentation. They use docker-compose to boot up Postgres and the FerretDB proxy. You can then connect to it with any MongoDB client, such as mongosh.
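
For example, a quick connectivity check with the official MongoDB Go driver looks like the sketch below; it assumes the docker-compose setup exposes FerretDB on localhost:27017 (the default MongoDB port).

package main

import (
    "context"
    "fmt"
    "log"
    "time"

    "go.mongodb.org/mongo-driver/mongo"
    "go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    defer cancel()

    // Point a regular MongoDB driver at FerretDB; no FerretDB-specific code is needed.
    client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://localhost:27017"))
    if err != nil {
        log.Fatal(err)
    }
    defer client.Disconnect(ctx)

    // The ping travels over the MongoDB wire protocol and is answered by FerretDB.
    if err := client.Ping(ctx, nil); err != nil {
        log.Fatal(err)
    }
    fmt.Println("connected to FerretDB")
}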

Let’s take a deep dive into how FerretDB works by going over its code. You can get the FerretDB code on GitHub: https://github.com/FerretDB/FerretDB

I wrote this post based on the Git commit id 22f1fdbd0b80bdcaee4ae4486296b5fb7ef54612.

1. Start up

FerretDB, like most Go applications that build a binary, follows the convention of using a cmd directory for its entry point file, cmd/ferretdb/main.go. This is the file that runs when you start the FerretDB proxy.

The code below contains only the important parts.

func main() {
    // 1
    flag.Parse()

    // 2
    ctx, stop := notifyAppTermination(context.Background())
    go func() {
        <-ctx.Done()
        logger.Info("Stopping...")
        stop()
    }()

    // 3
    h, err := registry.NewHandler(*handlerF, &registry.NewHandlerOpts{
        Ctx:           ctx,
        Logger:        logger,
        PostgreSQLURL: *postgreSQLURLF,
        TigrisURL:     tigrisURL,
    })

    // 4
    l := clientconn.NewListener(&clientconn.NewListenerOpts{
        ListenAddr:      *listenAddrF,
        ProxyAddr:       *proxyAddrF,
        Mode:            clientconn.Mode(*modeF),
        Handler:         h,
        Logger:          logger,
        TestConnTimeout: *testConnTimeoutF,
    })

    err = l.Run(ctx)
}
  1. Parses the command-line flags using Go’s flag package. The parsed flags are used later when creating the handler and the listener.
  2. Next, the code handles operating-system termination signals. The goroutine calls the stop function to release the resources associated with the signal handling (see the sketch after this list).
  3. Next, it creates the handler that is responsible for implementing the MongoDB interface. In the current implementation, it creates the Postgres handler, which wraps a Postgres connection pool and provides implementations of the different MongoDB operations. It is possible in the future to have multiple handlers; in theory, you could implement FerretDB on top of MySQL or SQLite, or write a fake implementation for testing. handlerF refers to the handler named pg, which is hard-coded in the initFlags function.
  4. Finally, it creates a listener on port 27017 that waits in an infinite loop for new TCP connections.
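
For reference, step 2 follows a common Go pattern for handling termination signals. The sketch below shows one way notifyAppTermination could be built with signal.NotifyContext; this is an assumption for illustration, not necessarily FerretDB's exact implementation.

import (
    "context"
    "os"
    "os/signal"
    "syscall"
)

// A possible shape for notifyAppTermination: the returned context is cancelled
// when the process receives SIGINT or SIGTERM, and stop releases the signal handler.
func notifyAppTermination(parent context.Context) (context.Context, context.CancelFunc) {
    return signal.NotifyContext(parent, os.Interrupt, syscall.SIGTERM)
}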

If you look at the logs of the FerretDB server, you will find a log line indicating that FerretDB is listening on port 27017.

2022-07-17T14:11:51.692Z    INFO    listener    clientconn/listener.go:75   Listening on [::]:27017 ...

2. Accept connections

Now that FerretDB has started, let’s look at what happens when a client (like mongosh) connects to FerretDB.

Below is the high-level, simplified structure of the Run function in listener.go.

func (l *Listener) Run(ctx context.Context) error {
    l.listener, _ = net.Listen("tcp", l.opts.ListenAddr)

    var wg sync.WaitGroup

    // 1
    for {
        netConn, _ := l.listener.Accept()

        // 2
        wg.Add(1)

        // 3
        go func() {
            connID := fmt.Sprintf("%s -> %s", netConn.RemoteAddr(), netConn.LocalAddr())

            defer func() {
                netConn.Close()
                wg.Done()
            }()

            opts := &newConnOpts{
                netConn:     netConn,
                mode:        l.opts.Mode,
                l:           l.opts.Logger.Named("// " + connID + " "), // derive from the original unnamed logger
                proxyAddr:   l.opts.ProxyAddr,
                handler:     l.opts.Handler,
                connMetrics: l.metrics.connMetrics,
            }
            conn, e := newConn(opts)

            logger.Info("Connection started", zap.String("conn", connID))

            // 4
            e = conn.run(runCtx)
            if e == io.EOF {
                logger.Info("Connection stopped", zap.String("conn", connID))
            } else {
                logger.Warn("Connection stopped", zap.String("conn", connID), zap.Error(e))
            }
        }()
    }

    // 5
    logger.Info("Waiting for all connections to stop...")
    wg.Wait()

    return ctx.Err()
}
  1. As discussed in the previous section, the FerretDB server runs an infinite loop (a for without any condition), and the first line of the loop blocks waiting for a new connection. This keeps the server running and accepting new connections.
  2. When a new connection is accepted, the WaitGroup is incremented by 1. WaitGroups are used to wait for multiple goroutines to finish.
  3. Each new connection is handled by its own goroutine, which creates the connection object. The connection wraps the Postgres handler and speaks the MongoDB wire protocol with the client.
  4. conn.run is where all the magic happens. It also runs an infinite loop that handles queries from the client, and it runs until the client disconnects or a fatal error or panic is encountered.
  5. When the for loop breaks, the function waits for all goroutines to finish and then returns the error.

If you exit mongosh, you will see Connection stopped in the logs.

2022-07-17T14:34:58.757Z    INFO    listener    clientconn/listener.go:139  Connection stopped

As mentioned above, conn.run is where requests from clients are handled. So, let’s look at the run function in conn.go. Below is its high-level, simplified structure.

func (c *conn) run(ctx context.Context) (err error) {

    //1
    bufr := bufio.NewReader(c.netConn)
    bufw := bufio.NewWriter(c.netConn)

    //2
    for {
        var reqHeader *wire.MsgHeader
        var reqBody wire.MsgBody

        //3
        reqHeader, reqBody, err = wire.ReadMessage(bufr)
        if err != nil {
            return
        }

        c.l.Debugf("Request header: %s", reqHeader)
        c.l.Debugf("Request message:\n%s\n\n\n", reqBody)

        //4
        var resHeader *wire.MsgHeader
        var resBody wire.MsgBody
        var resCloseConn bool
        resHeader, resBody, resCloseConn = c.route(ctx, reqHeader, reqBody)

        if resHeader == nil || resBody == nil {
            panic("no response to send to client")
        }

        //5
        if err = wire.WriteMessage(bufw, resHeader, resBody); err != nil {
            return
        }

        if err = bufw.Flush(); err != nil {
            return
        }

        if resCloseConn {
            err = errors.New("fatal error")
            return
        }
    }
}
  1. It creates a new buffered reader and writer, wrapping the network connection.
  2. It runs an infinite loop implementing the request-response cycle: the client sends a request and the FerretDB connection returns a response.
  3. The client request, received in the MongoDB wire protocol format, is parsed into the MsgHeader and MsgBody structs shown below (a small decoding sketch follows this list):
   type MsgHeader struct {
       MessageLength int32
       RequestID     int32
       ResponseTo    int32
       OpCode        OpCode
   }
   type MsgBody interface {
       readFrom(*bufio.Reader) error
       encoding.BinaryUnmarshaler
       encoding.BinaryMarshaler
       fmt.Stringer

       msgbody() // seal for go-sumtype
   }

The header and body are logged as well. Below is the request mongosh sends when it connects to FerretDB.

   2022-07-17T14:13:14.497Z    DEBUG   // 172.21.0.2:43478 -> 172.21.0.3:27017 .// 172.21.0.2:43478 -> 172.21.0.3:27017    clientconn/conn.go:169  Request header: length:   385, id:    1, response_to:    0, opcode: OP_QUERY
   2022-07-17T14:13:14.500Z    DEBUG   // 172.21.0.2:43478 -> 172.21.0.3:27017 .// 172.21.0.2:43478 -> 172.21.0.3:27017    clientconn/conn.go:170  Request message:
   {
        "Flags": 0,
        "FullCollectionName": "admin.$cmd",
        "NumberToReturn": -1,
        "NumberToSkip": 0,
        "Query": {
          "$k": [
            "ismaster",
            "helloOk",
            "client",
            "compression",
            "loadBalanced"
          ],
          "ismaster": true,
          "helloOk": true,
          // removed client details
          "compression": [
            "none"
          ],
          "loadBalanced": false
        }
   }
  4. Next, the connection’s route method uses the Postgres handler to process the request. The opcode in the request header determines how the request is handled. Most opcodes except OP_MSG are deprecated now; they are supported just in case you connect with an older client. You can read more about opcodes in the MongoDB documentation. I will explain how commands are processed in the next section.
  5. Finally, it writes the response back to the client in the MongoDB wire protocol format and flushes the buffer.
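
To make step 3 concrete, here is a minimal sketch of decoding such a header into the MsgHeader struct shown above. The MongoDB wire protocol encodes the header as four little-endian int32 values; this is only an illustration of the format, not FerretDB's actual wire.ReadMessage.

import (
    "bufio"
    "encoding/binary"
    "io"
)

// readHeader decodes the 16-byte MongoDB wire protocol header: four little-endian int32s.
func readHeader(r *bufio.Reader) (*MsgHeader, error) {
    var buf [16]byte
    if _, err := io.ReadFull(r, buf[:]); err != nil {
        return nil, err
    }

    return &MsgHeader{
        MessageLength: int32(binary.LittleEndian.Uint32(buf[0:4])),
        RequestID:     int32(binary.LittleEndian.Uint32(buf[4:8])),
        ResponseTo:    int32(binary.LittleEndian.Uint32(buf[8:12])),
        OpCode:        OpCode(binary.LittleEndian.Uint32(buf[12:16])),
    }, nil
}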

3. Insert and Find documents

Now we come to the meat of the logic: how insert and find queries are handled.

To understand this, let’s insert some data in a collection.

use mydb
db.posts.insertOne(
 {
   title: "Hello, FerretDB",
   tags: ["hello", databases],
   author : "Shekhar Gulati"
 }
)

db.posts.insertOne(
 {
   title: "Hello, Postgres",
   tags: ["hello", databases],
   author : "pg"
 }
)

db.posts.insertOne(
 {
   title: "Hello, MongoDB",
   tags: ["hello", databases],
   author : "mg"
 }
)

Now, we have three documents in our posts collection. Let’s understand how a find query will work.

db.posts.findOne({ author: "Shekhar Gulati" })

The header and the request body received by FerretDB are shown below.

2022-07-17T16:15:59.057Z    DEBUG   // 172.21.0.2:43616 -> 172.21.0.3:27017 .// 172.21.0.2:43616 -> 172.21.0.3:27017    clientconn/conn.go:169  Request header: length:   107, id:  153, response_to:    0, opcode: OP_MSG
2022-07-17T16:15:59.057Z    DEBUG   // 172.21.0.2:43616 -> 172.21.0.3:27017 .// 172.21.0.2:43616 -> 172.21.0.3:27017    clientconn/conn.go:170  Request message:
{
  "Checksum": 0,
  "FlagBits": 0,
  "Sections": [
    {
      "Document": {
        "$k": [
          "find",
          "filter",
          "limit",
          "$db"
        ],
        "find": "posts",
        "filter": {
          "$k": [
            "author"
          ],
          "author": "Shekhar Gulati"
        },
        "limit": 1,
        "$db": "mydb"
      },
      "Kind": 0
    }
  ]
}

The header says that the opcode is OP_MSG. The route function logic corresponding to OP_MSG is shown below.

func (c *conn) route(ctx context.Context, reqHeader *wire.MsgHeader, reqBody wire.MsgBody) (resHeader *wire.MsgHeader, resBody wire.MsgBody, closeConn bool) { 

    var command string
    var result *string

    resHeader = new(wire.MsgHeader)
    var err error
    switch reqHeader.OpCode {
    case wire.OpCodeMsg:
        var document *types.Document
        msg := reqBody.(*wire.OpMsg)
        document, err = msg.Document()

        command = document.Command()
        if err == nil {
            resHeader.OpCode = wire.OpCodeMsg
            resBody, err = c.handleOpMsg(ctx, msg, command)
        }

    // remaining cases
    }

    b, err := resBody.MarshalBinary()

    resHeader.MessageLength = int32(wire.MsgHeaderLen + len(b))

    resHeader.RequestID = atomic.AddInt32(&c.lastRequestID, 1)
    resHeader.ResponseTo = reqHeader.RequestID

    if result == nil {
        result = pointer.ToString("ok")
    }
    return
}

In the OpCodeMsg case, we create the Document from msg. If you look at the request message above, we have a Document object under the Sections object. Although Sections is an array, msg.Document() returns only a single document. The Document struct looks as shown below:

type Document struct {
    m    map[string]any
    keys []string
}

The keys slice is populated from the value of $k. So, in our case, keys will be find, filter, limit, $db. As you can see, the keys preserve the original field order and help you navigate the document object.

document.Command() returns the first key, so in our case it will be find.
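
As an illustration (a simplified stand-in, not FerretDB's actual types.Document), an ordered document and its Command() accessor could look like this:

// Mirrors the Document struct shown above: a map for lookups plus a slice of keys
// that preserves the original field order.
type Document struct {
    m    map[string]any
    keys []string
}

// Command returns the first key, which names the MongoDB command.
func (d *Document) Command() string {
    if len(d.keys) == 0 {
        return ""
    }
    return d.keys[0]
}

func example() string {
    doc := &Document{
        m: map[string]any{
            "find":   "posts",
            "filter": map[string]any{"author": "Shekhar Gulati"},
            "limit":  int32(1),
            "$db":    "mydb",
        },
        keys: []string{"find", "filter", "limit", "$db"},
    }
    return doc.Command() // "find"
}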

Next, we call c.handleOpMsg. Its code is shown below.

func (c *conn) handleOpMsg(ctx context.Context, msg *wire.OpMsg, cmd string) (*wire.OpMsg, error) {
    if cmd, ok := common.Commands[cmd]; ok {
        if cmd.Handler != nil {
            return cmd.Handler(c.h, ctx, msg)
        }
    }

    errMsg := fmt.Sprintf("no such command: '%s'", cmd)
    return nil, common.NewErrorMsg(common.ErrCommandNotFound, errMsg)
}

Commands is a map whose keys are command names (like find) and whose values are structs with two fields: a help text and the Handler that handles the command.

var Commands = map[string]command{
    "buildinfo": {
        Help:    "Returns a summary of the build information.",
        Handler: (handlers.Interface).MsgBuildInfo,
    },
    "find": {
        Help:    "Returns documents matched by the query.",
        Handler: (handlers.Interface).MsgFind,
    },
    "findAndModify": {
        Help:    "Inserts, updates, or deletes, and returns a document matched by the query.",
        Handler: (handlers.Interface).MsgFindAndModify,
    },
    // rest removed..
}

So, common.Commands[cmd] gives us the (handlers.Interface).MsgFind handler.

Next, the code calls cmd.Handler(c.h, ctx, msg), which invokes the MsgFind function in the msg_find.go file.

The signature of MsgFind is as shown below.

func (h *Handler) MsgFind(ctx context.Context, msg *wire.OpMsg) (*wire.OpMsg, error)

The current implementation of find works by fetching all the documents of the collection into FerretDB process memory and then applying the filter, sort, limit, and projection clauses. At a high level, this is how it works.

fetchedDocs, err := h.fetch(ctx, sp)

for _, doc := range fetchedDocs {
    matches, err := common.FilterDocument(doc, filter)
    if err != nil {
        return nil, err
    }

    // only keep documents that match the filter
    if !matches {
        continue
    }

    resDocs = append(resDocs, doc)
}

if err = common.SortDocuments(resDocs, sort); err != nil {
    return nil, err
}
if resDocs, err = common.LimitDocuments(resDocs, limit); err != nil {
    return nil, err
}
if err = common.ProjectDocuments(resDocs, projection); err != nil {
    return nil, err
}
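
As a self-contained stand-in (not FerretDB's code, which has its own Document type and filter semantics), the approach boils down to decoding every stored document and filtering in Go:

// document stands in for a decoded jsonb document from the _jsonb column.
type document map[string]any

// filterEqual keeps the documents whose field equals the wanted value.
// Every fetched document is examined in memory, regardless of how many match.
func filterEqual(docs []document, field string, want any) []document {
    var res []document
    for _, doc := range docs {
        if value, ok := doc[field]; ok && value == want {
            res = append(res, doc)
        }
    }
    return res
}

// Equivalent in spirit to db.posts.find({author: "Shekhar Gulati"}).
func example(docs []document) []document {
    return filterEqual(docs, "author", "Shekhar Gulati")
}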

I was surprised to see that the filtering happens inside FerretDB; I was expecting the MongoDB query to be translated into a SQL query. I asked about this in the FerretDB discussion forum and received the following response.

That’s correct. See “What’s Changed” section in that release. We will publish a blog post soon explaining that in more detail and how we will change that before 1.0.

Previously, we generated a single SQL query that extensively used json/jsonb PostgreSQL functions for each incoming MongoDB request, then converted fetched data. All the filtering was performed by PostgreSQL. Unfortunately, the semantics of those functions do not match MongoDB behaviour in edge cases like comparison or sorting of different types. That resulted in a difference in behaviour between FerretDB and MongoDB, and that is a problem we wanted to fix.

So starting from this release, we fetch more data from PostgreSQL and perform filtering on the FerretDB side. This allows us to match MongoDB behaviour in all cases. Of course, that also greatly reduces performance. We plan to address it in future releases by pushing down parts of filtering queries that can be made fully compatible with MongoDB. For example, a simple query like db.collection.find({_id: 'some-id-value'}) can be converted to SQL WHERE condition relatively easy and be compatible even with weird values like IEEE 754 NaNs, infinities, etc.
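
To illustrate the pushdown the team describes, a query like db.collection.find({_id: 'some-id-value'}) could become a WHERE condition on the jsonb column that FerretDB uses for storage (described in the next section). This is only a hypothetical sketch of the idea, not FerretDB's generated SQL; it assumes the jackc/pgx v4 packages that the Postgres handler already uses.

// findByID pushes a simple equality filter down to Postgres instead of filtering in Go.
// pool is assumed to be the *pgxpool.Pool wrapped by the Postgres handler.
func findByID(ctx context.Context, pool *pgxpool.Pool, id string) (pgx.Rows, error) {
    sql := `SELECT _jsonb FROM ` + pgx.Identifier{"mydb", "posts"}.Sanitize() +
        ` WHERE _jsonb->>'_id' = $1`
    return pool.Query(ctx, sql, id)
}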

The above still does not cover how MongoDB databases and collections are created. They follow the same dispatch logic: instead of MsgFind in msg_find.go, FerretDB calls MsgCreate in msg_create.go.

func (h *Handler) MsgCreate(ctx context.Context, msg *wire.OpMsg) (*wire.OpMsg, error) {
    document, err := msg.Document()


    err = h.pgPool.InTransaction(ctx, func(tx pgx.Tx) error {
        if err := pgdb.CreateDatabaseIfNotExists(ctx, tx, db); err != nil {
            return lazyerrors.Error(err)
        }

        if err := pgdb.CreateCollection(ctx, tx, db, collection); err != nil {
            if errors.Is(err, pgdb.ErrAlreadyExist) {
                msg := fmt.Sprintf("Collection already exists. NS: %s.%s", db, collection)
                return common.NewErrorMsg(common.ErrNamespaceExists, msg)
            }

        }
        return nil
    })
    // removed for clarity
}

As you can see above, the method first creates the database and then creates the collection.

The CreateDatabaseIfNotExists method creates a Postgres schema by firing the following query.

 CREATE SCHEMA IF NOT EXISTS ` + pgx.Identifier{db}.Sanitize() // CREATE SCHEMA IF NOT EXISTS mydb

The CreateCollection method creates the collection by creating a table in the mydb schema, as shown below.

sql := `CREATE TABLE IF NOT EXISTS ` + pgx.Identifier{db, table}.Sanitize() + ` (_jsonb jsonb)`

It creates a table that has only a single column of type jsonb. So, Postgres also stores data in binary JSON format.

Today, the fetch step for find reads the _jsonb column as-is, and the filtering happens in FerretDB as discussed above; as the team noted, future releases will push parts of the query down to Postgres using its JSON operators.
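
As a rough sketch (an assumption based on the table layout above, not the exact query FerretDB builds), the fetch amounts to something like this, with filtering, sorting, and limiting left to the Go side:

// fetchAll reads every document of the collection from the single _jsonb column.
func fetchAll(ctx context.Context, pool *pgxpool.Pool) (pgx.Rows, error) {
    sql := `SELECT _jsonb FROM ` + pgx.Identifier{"mydb", "posts"}.Sanitize()
    return pool.Query(ctx, sql)
}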

Conclusion

It was fun to explore the FerretDB code. It is still early days for FerretDB, but it could become a viable MongoDB alternative.
