Architecture of Open Source Systems #1: Umami, an Open Source Google Analytics Alternative


As I have gained experience building software, I have realized that the most important skill I can build is understanding existing codebases. You can learn a new technology stack or framework much faster by reading an existing codebase that uses those technologies and trying to rebuild the app step by step. Today, I wanted to learn how to build a web analytics service like Google Analytics, the most widely used web analytics service on the web. I found a popular open source project called Umami.

Umami is a simple, fast, privacy-focused alternative to Google Analytics.

We will first understand the project from the outside in, and then we will build the backend of Umami step by step.

Tools Used

  • VS Code
  • Node.js
  • DBeaver for creating the ER diagram and as a database client – Link
  • Httpie
  • Git
  • MySQL

Project Details

  • Github Repo – https://github.com/mikecao/umami
  • 11,296 stars
  • 1,580 forks
  • 132 contributors
  • 47 releases on GitHub; the last one was 15 days ago
  • 1379 commits
  • The project is 1 year and 10 months old. The first commit was made on 17th July 2020
  • Close to 10,000 lines of modern JavaScript code (measured using Tokei). This includes both the backend and the frontend
  • MIT license

Technology Stack of Umami

  • Node.js 12+. Modern JavaScript codebase.
  • Next.js as the backend web framework for REST APIs
  • Prisma as the ORM
  • Postgres or MySQL as the database
  • React as the frontend framework

Next.js is a framework to build server-rendered React web applications. It takes building React-based web applications to the next level. The main reasons you would want to use Next.js are:

  • Zero config but you can easily override defaults
  • Extensible
  • Server-side rendering
  • Build both dynamic and static websites with a single framework
  • Supports all modern browsers
  • Convention over configuration
  • Code splitting
  • SEO Optimized

High level understanding of a web analytics system

At a high level this is how a web analytics system works:

  • A website administrator creates an account in the web analytics service
  • The website administrator logs into the system and registers their website, specifying details like name, URL, industry type, etc.
  • The web analytics service provides a tracking JavaScript snippet that the website administrator should add to all the pages that need to be tracked
  • When a user visits a web page with tracking enabled, the tracking script passes the data to the web analytics system
  • The web analytics system stores the data and provides website administrators a self-service dashboard they can use to track their website analytics

Running Umami Locally

Download Postgres on your machine. On Mac, I use https://postgresapp.com/downloads.html

Once Postgres is running on your machine, you can add the psql CLI to your PATH by running the following commands. These work on Mac.

sudo mkdir -p /etc/paths.d &&
echo /Applications/Postgres.app/Contents/Versions/latest/bin | sudo tee /etc/paths.d/postgresapp

Close your terminal and open it again. Now, psql should be on your PATH.

which psql
/Applications/Postgres.app/Contents/Versions/latest/bin/psql

Now, we will apply the database script to create the schema.

psql -h localhost -U postgres -d umami -f sql/schema.postgresql.sql

Install the dependencies

npm install

Now, create a new file named .env in the root of the project and add two environment variables.

DATABASE_URL=postgresql://postgres:postgres@localhost:5432/umami
HASH_SALT=random_string

Now, build the app.

npm run build

Once built, you can run the app in dev mode using

npm run dev

The application will start at http://localhost:3000/. You can log in to the website using the username admin and password umami.

Now, you can create a website and add the tracking script to your website’s pages. Then, you should start seeing data on the dashboard.

Entity Relationship Diagram

The best way to understand the data model of a web application is to look at its ER model. ER diagrams help us understand the entities and their relationships. I created the diagram below using DBeaver.

The ER model shown above can be read as follows:

  • An account can have multiple websites associated with it. An account is associated with a website administrator. This makes sense since an account owner might want to view analytics for multiple websites.
  • For a website, there will be multiple sessions.
  • During a session, there will be multiple page views.
  • Also, during a session, the user will perform actions that fire events. Examples of events include browser click events.
  • website_id is added to both the pageview and event tables so that by looking at a pageview or event record we know both the website_id and the session_id associated with it.

The Postgres schema is maintained in the schema.postgresql.sql schema file.

  • In the current model it is not possible to have a single website associated with multiple accounts. When I looked at is_admin, I thought I could let guest accounts access analytics for my websites; looking at the model, that is not possible. The difference between an admin and a non-admin user is that an admin can create/edit/delete accounts, while a non-admin can only add websites to their existing account.
  • It uses the pseudo-type serial for primary keys. You do that by writing user_id serial primary key. By using the serial pseudo-type, Postgres does the following (see the illustrative SQL expansion after this list):

  • Creates a sequence object and sets the next value generated by the sequence as the default value of the column
  • Adds a not null constraint to the id column
  • Makes the id column the owner of the sequence object. This ensures the sequence is deleted when the id column is deleted or the table is dropped.
  • The serial pseudo-type uses an int4 (4 bytes) type with values in the range 1 to 2,147,483,647 (2.147 billion rows). This is more than sufficient for personal web analytics.
  • For ids that will be public, the author has used uuid. The website table has both an int id as well as a uuid website_id. The same is true for the session table. This is done so that users are not able to guess these ids. All foreign key constraints use the integer ids, which helps save storage cost.
  • For fields of type timestamp, the author has used the default timezone. Since I am based in India, the timestamp has +0530 appended, like 2022-04-03 15:03:35.802 +0530.
  • Password is 60 characters because that’s what we need to store a bcrypt hash, as mentioned in the Stackoverflow link.
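
To make the serial expansion concrete, here is roughly what Postgres does under the hood for user_id serial primary key (an illustrative sketch; the generated sequence name may differ):

CREATE SEQUENCE account_user_id_seq;

CREATE TABLE account (
    user_id integer NOT NULL DEFAULT nextval('account_user_id_seq') PRIMARY KEY
);

-- The column owns the sequence, so dropping the column drops the sequence
ALTER SEQUENCE account_user_id_seq OWNED BY account.user_id;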

Understanding Important Scenarios

We will cover the following scenarios.

Scenario 1: Login

The above sequence diagram should explain the flow. The convention I have followed is:

Actor               | Backend/Frontend | Source code file
WebsiteAdmin        | NA               | NA
IndexPage           | Frontend         | pages/index.js
DashboardPage       | Frontend         | pages/dashboard/[[..id]].js
AuthVerifyAPI       | Backend          | pages/api/auth/verify.js
LoginPage           | Frontend         | pages/login.js
AuthLoginAPI        | Backend          | pages/api/auth/login.js
BrowserLocalStorage | NA               | NA

The frontend code is not interesting in the above login scenario. The two API endpoints /auth/verify and /auth/login do all the interesting work. Let’s look at each of them.

The code for pages/api/auth/verify.js is shown below.

import { useAuth } from 'lib/middleware';
import { ok, unauthorized } from 'lib/response';

export default async (req, res) => {
  await useAuth(req, res);

  if (req.auth) {
    return ok(res, req.auth);
  }

  return unauthorized(res);
};

It uses the useAuth custom middleware. Middleware gives you a reusable way to run code before a request is completed, and it can modify the response. In our case, the useAuth middleware checks whether the request has a header called authorization. If the authorization header contains a valid JWT token, it calls the next middleware, or the handler code resumes if there is no next middleware in the chain. Otherwise, if the authorization header is not present or the JWT token is invalid, an HTTP 401 Unauthorized response is returned.
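
A minimal sketch of what such a middleware could look like, assuming a parseSecureToken helper that returns the decoded payload for a valid token and null otherwise (names are illustrative; the real implementation lives in lib/middleware.js, and we will build our own version in Step 6):

import { parseSecureToken } from 'lib/crypto';
import { unauthorized } from 'lib/response';

export async function useAuth(req, res) {
    const header = req.headers.authorization;
    const token = header ? header.replace('Bearer ', '') : null;
    const payload = token ? await parseSecureToken(token) : null;

    if (!payload) {
        return unauthorized(res);
    }

    // Attach the decoded payload so the handler can read req.auth
    req.auth = payload;
}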

The code for pages/api/auth/login.js is also shown below.

import { checkPassword, createSecureToken } from 'lib/crypto';
import { getAccountByUsername } from 'lib/queries';
import { ok, unauthorized, badRequest } from 'lib/response';

export default async (req, res) => {
  const { username, password } = req.body;

  if (!username || !password) {
    return badRequest(res);
  }

  const account = await getAccountByUsername(username);

  if (account && (await checkPassword(password, account.password))) {
    const { user_id, username, is_admin } = account;
    const user = { user_id, username, is_admin };
    const token = await createSecureToken(user);

    return ok(res, { token, user });
  }

  return unauthorized(res);
};

The code shown above does the following:

  • It first checks if the username and password are present. If they are not, a bad request response is returned
  • It fetches the account with the given username from the database. There is an index on username in the account table
  • If the account exists and the password matches the password in the request, a token is created and returned in the response
  • The reason the password is compared in code rather than in the database is that we store a bcrypt hash in the database. bcrypt hashes differ even for the same input, because each hash embeds the unique salt that was used to create it. The bcrypt library can compare the request password with the stored hash because it reads that salt back out of the hash. So that’s why the comparison is done in code rather than in the database. See the snippet after this list.
  • createSecureToken generates the JWT token using the salt we provided in the .env file
  • Else, if the account does not exist, an unauthorized 401 response is returned
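
To see why the comparison has to happen in code, here is a small illustrative bcryptjs snippet:

import bcrypt from 'bcryptjs';

// Hashing the same password twice produces different hashes (different salts)
const hash1 = bcrypt.hashSync('umami', 10);
const hash2 = bcrypt.hashSync('umami', 10);
console.log(hash1 === hash2); // false

// compareSync extracts the salt from the hash and re-hashes the candidate
console.log(bcrypt.compareSync('umami', hash1)); // true
console.log(bcrypt.compareSync('wrong', hash1)); // false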

Scenario 2: Enabling analytics for a website

In this scenario, we will cover how to add analytics to a website. We will create an index.html and serve it via a Python static server. This will be the website we want to enable analytics for.

Create a new folder named blog and create an index.html as shown below.

<!DOCTYPE html>
<html lang="en">

<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Index</title>
</head>

<body>

    <h1>My Blogs</h1>
    <ul>
        <li><a href="/blog1.html">Blog 1</a></li>
        <li><a href="/blog2.html">Blog 2</a></li>
    </ul>
</body>

</html>

blog1.html is shown below.

<!DOCTYPE html>
<html lang="en">

<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Blog 1</title>
</head>

<body>
    <h1>Blog 1</h1>
</body>

</html>

blog2.html is shown below

<!DOCTYPE html>
<html lang="en">

<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Blog 2</title>
</head>

<body>
    <h1>Blog 2</h1>
</body>

</html>

You can run your static server using the following command.

python3 -m http.server

The website will be running at http://localhost:8000/

Let’s now talk about how we will add analytics to our website. We will again start by creating a sequence diagram.

Assumption: You are already logged into the website

The above sequence diagram should explain the flow. The convention I have followed is:

Actor        | Backend/Frontend        | Source code file
WebsiteAdmin | NA                      | NA
SettingsPage | Frontend                | pages/settings/index.js
WebsiteAPI   | Backend                 | pages/api/website/index.js
MyBlog       | Another web application | NA
User1        | NA                      | NA
CollectAPI   | Backend                 | pages/api/collect.js

Let’s try to understand how the above sequence diagram works with respect to the code.

We will start by looking at adding a website. When the website admin provides the details to onboard a website, the frontend calls the /api/website endpoint, passing in the name and domain as shown below.

The POST /api/website handler does not do much. It just takes the request data and creates a record in the website table.

import { updateWebsite, createWebsite, getWebsiteById } from 'lib/queries';
import { useAuth } from 'lib/middleware';
import { uuid, getRandomChars } from 'lib/crypto';
import { ok, unauthorized, methodNotAllowed } from 'lib/response';

export default async (req, res) => {
  await useAuth(req, res);

  const { user_id, is_admin } = req.auth;
  const { website_id, enable_share_url } = req.body;

  if (req.method === 'POST') {
    const { name, domain } = req.body;

    if (website_id) {
      // handle update

      return ok(res);
    } else {
      // handle create
      const website_uuid = uuid();
      const share_id = enable_share_url ? getRandomChars(8) : null;
      const website = await createWebsite(user_id, { website_uuid, name, domain, share_id });

      return ok(res, website);
    }
  }

  return methodNotAllowed(res);
};

The interesting part in the above code is the createWebsite method defined in lib/queries. The project uses Prisma, a type-safe ORM. createWebsite does the following:

prisma.website.create({
    data: {
        account: {
            connect: {
                user_id,
            },
        },
        ...data,
    },
});

For me, the interesting part was how the relationship between website and account is handled using connect.
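
connect tells Prisma to link the new website row to an existing account row by its unique key (user_id) instead of creating a new account. Conceptually it just sets the foreign key; a roughly equivalent call, assuming the Website model exposes the foreign key as a scalar field, would be:

prisma.website.create({
    data: {
        user_id, // set the foreign key column directly
        ...data,
    },
});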

Now, let’s look at how the tracking code is generated. It is generated on the client side using the website UUID.

<script async defer data-website-id="08042b26-2745-4ece-9ff4-bb91013b7cc5" src="http://localhost:3000/umami.js"></script>

When you add the script above to your website, it will expect a JavaScript file umami.js to be available at http://localhost:3000/umami.js. This is the tracker file. It resides in tracker/index.js.

We will now understand how tracking works. We will assume that you have added the tracking script to your website.

  • After the page loads, this script is executed. It captures important information about the page and the device/browser, like language, screen size, hostname, etc.
  • If tracking is enabled, it calls the trackView method
  • The trackView method calls the /api/collect endpoint, passing in a payload that includes the website id, url, and referrer. An illustrative payload is shown below.
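
For example, a pageview request to /api/collect carries a body shaped like the one below (values are illustrative; the field names match the payload we will see in the tracker code later):

const body = {
    type: 'pageview',
    payload: {
        website: '08042b26-2745-4ece-9ff4-bb91013b7cc5', // website UUID from the script tag
        hostname: 'localhost',
        screen: '1792x1120',
        language: 'en-US',
        url: '/blog1.html',
        referrer: '',
    },
};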

Step by step creating the umami backend

Step 1: Create Next.js project

npx create-next-app@latest

You will be asked for a project name.

Step 2: Register New Account REST API

Create a folder named account inside the pages/api directory and create an index.js file.

import { hashPassword } from 'lib/crypto';
import { badRequest, created } from 'lib/response'
import { getAccountByUsername, register } from 'services/account-service';

export default async function handler(req, res) {

    if (req.method === 'POST') {
        const { user_id, username, password, is_admin } = req.body;

        if (user_id) {
            // update user
        } else {
            // create user
            const existingUser = await getAccountByUsername(username);
            if (existingUser) {
                return badRequest(res, { message: `Account already exists with username '${username}'` });
            }

            const saved = await register({ username, password: hashPassword(password), is_admin: is_admin });
            return created(res, { userId: saved.id, username: saved.username, is_admin: saved.is_admin });
        }
    }
}

Avoiding relative imports
Create a file named jsconfig.json in the root of the project and add the following, as mentioned in this link.

{
    "compilerOptions": {
        "baseUrl": "."
    }
}

If you see an error in VS Code, you should set typescript.tsdk in VS Code as covered in this StackOverflow question.

"typescript.tsdk": "./node_modules/typescript/lib"

Now, we will create the lib/crypto.js module.

import bcrypt from 'bcryptjs';

// bcrypt cost factor (number of salt rounds), not a salt length
const SALT_ROUNDS = 10;

export function hashPassword(password) {
    return bcrypt.hashSync(password, SALT_ROUNDS);
}

There are two bcrypt libraries – bcrypt and bcryptjs. bcrypt is a Node binding over a C++ library, whereas bcryptjs is a pure JavaScript implementation. You can read more about their comparison here.

Next, we will create lib/response.js

export function created(res, data = {}) {
    return res.status(201).json(data);
}

export function ok(res, data = {}) {
    return res.status(200).json(data);
}

export function badRequest(res, errObj = { 'message': '400 bad request' }) {
    return res.status(400).json(errObj);
}

Finally, we will create services/account-service.js. We will be using Prisma to work with the database.

Let’s start by creating a MySQL database

mysql -u username -p
mysql> create database appdb;

Create a folder called prisma in the root of the app directory. Inside that directory create a file named schema.prisma.

generator client {
    provider = "prisma-client-js"
}

datasource db {
    provider = "mysql"
    url      = env("DATABASE_URL")
}

model Account {
    id         Int       @id @default(autoincrement()) @db.UnsignedInt
    username   String    @unique @db.VarChar(255)
    password   String    @db.VarChar(60)
    is_admin   Boolean   @default(false)
    created_at DateTime? @default(now()) @db.Timestamp(0)
    updated_at DateTime? @default(now()) @db.Timestamp(0)
}

In the above code snippet:

  • Password is 60 characters because that’s what we need to store a bcrypt hash, as mentioned in this link

Create a .env file with the following content.

DATABASE_URL=mysql://<username>:<password>@localhost:3306/<db_name>

Please update username, password, and db_name.

Generate and apply the migration script.

npx prisma migrate dev --name init

Next, you will generate PrismaClient.

npx prisma generate

This will generate the PrismaClient that we will use to connect to the database and perform operations.

Create a new file services/account-service.js and paste following content.

import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();

export async function getAccountByUsername(username) {
    return prisma.account.findUnique({
        where: {
            username,
        },
    }).catch(e => { throw e });
}

export async function register(data) {
    return prisma.account.create(
        {
            data,
        }
    ).catch(e => { throw e });
}

Now, run the application using

npm run dev

We will use the HTTPie client to quickly test our API. On Mac, you can install it by running the brew install httpie command.

http POST localhost:3000/api/account username=user1 password=password is_admin:=true

Please note that we have used is_admin:= rather than is_admin= to ensure we send a Boolean value. HTTPie by default assumes that fields passed with = are strings; := sends raw JSON values.

The response received from the API

HTTP/1.1 201 Created
Connection: keep-alive
Content-Length: 47
Content-Type: application/json; charset=utf-8
Date: Sat, 06 Nov 2021 14:01:47 GMT
ETag: "2f-VWAVhXweZYKrSz5nfVHx4cMaq6U"
Keep-Alive: timeout=5
Vary: Accept-Encoding
{
    "is_admin": true,
    "userId": 5,
    "username": "user1"
}

If you try to create another account with the same username, you will get an error response.

http POST localhost:3000/api/account username=user1 password=password is_admin:=true
HTTP/1.1 400 Bad Request
Connection: keep-alive
Content-Length: 58
Content-Type: application/json; charset=utf-8
Date: Sat, 06 Nov 2021 14:06:04 GMT
ETag: "3a-SoLZQJLYpR81d++M61yEHFFNOBM"
Keep-Alive: timeout=5
Vary: Accept-Encoding
{
    "message": "Account already exists with username 'user1'"
}

Step 3: Login API

We will start by creating a new API endpoint for login. Create a new file auth/login.js under the pages/api directory.

import { serialize } from "cookie";
import { checkPassword, createSecureToken } from "lib/crypto";
import { badRequest, ok, unauthorized } from "lib/response";
import { getAccountByUsername } from "services/acccount-service";

export default async function handler(req, res) {
    const { username, password } = req.body;

    if (!username || !password) {
        return badRequest(res);
    }
    const account = await getAccountByUsername(username);
    if (!account) {
        return badRequest(res, { message: 'Invalid credentials' });
    }
    const authenticated = checkPassword(password, account.password);

    if (!authenticated) {
        return unauthorized(res);
    }
    const { id: account_id, is_admin } = account;
    const token = await createSecureToken({ account_id, username, is_admin });
    console.log(`Generated token ${token}`);
    const cookie = serialize('AUTH', token, {
        path: '/',
        httpOnly: true,
        sameSite: true,
        maxAge: 1 * 24 * 60 * 60 // 1 day
    });
    res.setHeader('Set-Cookie', [cookie]);
    return ok(res, { token });

}

  • We first check if the username and password are present; if not, we return a bad request
  • Then, we check if a user exists with the given username
  • Next, we compare the password in the request with the saved hashed password. We will look at the checkPassword function later
  • If the passwords don’t match, we return an unauthorized error response
  • We create a JWT token, set the token in a cookie, set the cookie on the response, and finally return a success response

Let’s look at the checkPassword and createSecureToken functions of crypto.js

export function checkPassword(requestPassword, hashedPassword) {
    return bcrypt.compareSync(requestPassword, hashedPassword);
}

createSecureToken generates an encrypted JWT token. We will need the jose library first.

npm i jose

import crypto from 'crypto';
import { EncryptJWT } from 'jose';

const SECRET = process.env.HASH_SALT || "secret_string";
const KEY = crypto.pbkdf2Sync(SECRET, "salt", 2000, 32, "sha512");

export async function createSecureToken(payload) {
    return new EncryptJWT(payload)
        .setProtectedHeader({ alg: "dir", enc: "A256GCM" })
        .setIssuedAt()
        .setExpirationTime("2h")
        .encrypt(KEY);
}
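
As a quick, illustrative sanity check of the round trip (jwtDecrypt is the jose counterpart we will use for parsing tokens in a later step):

import { jwtDecrypt } from 'jose';

async function verifyRoundTrip() {
    // Encrypt a payload, then decrypt it with the same derived KEY
    const token = await createSecureToken({ username: 'user1' });
    const { payload } = await jwtDecrypt(token, KEY);
    console.log(payload.username); // user1
}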

Now, we can test our login endpoint using the HTTPie client

http POST localhost:3000/api/auth/login username=user1 password=password
HTTP/1.1 200 OK
Connection: keep-alive
Content-Length: 207
Content-Type: application/json; charset=utf-8
Date: Sat, 06 Nov 2021 17:49:56 GMT
ETag: "cf-LWJENPiR7Oy+NvjHoGFROEKXAj4"
Keep-Alive: timeout=5
Set-Cookie: AUTH=eyJhbGciOiJkaXIiLCJlbmMiOiJBMjU2R0NNIn0..kNZBfR6XM8hmk_tG.V9302tH7riI-NN9NJiIaFipI1SZBGQlQ_APF-X1uWH5Cb_HOmyFC5aMqR9oXP5FTnyYsxWuhO9CbVMd_SoIPDDtkX0IqvLxK5IdqJIXQ9ToOI9T2aA.TzFI8pVb5elYW9PYeyv-Mw; Max-Age=86400; Path=/; HttpOnly; SameSite=Strict
Vary: Accept-Encoding
{
    "token": "eyJhbGciOiJkaXIiLCJlbmMiOiJBMjU2R0NNIn0..kNZBfR6XM8hmk_tG.V9302tH7riI-NN9NJiIaFipI1SZBGQlQ_APF-X1uWH5Cb_HOmyFC5aMqR9oXP5FTnyYsxWuhO9CbVMd_SoIPDDtkX0IqvLxK5IdqJIXQ9ToOI9T2aA.TzFI8pVb5elYW9PYeyv-Mw"
}

Now that we have the token, let’s use it. We will create a protected endpoint /api/ping that first checks the token and then returns a success response.

Create a new API endpoint api/ping/index.js

import { parse } from 'cookie';
import { parseSecureToken } from 'lib/crypto';
import { ok, unauthorized } from 'lib/response';

export default async function handle(req, res) {
    const token = req.headers.authorization
        ? req.headers.authorization.replace('Bearer ', '')
        : parse(req.headers.cookie || '')['AUTH'];

    if (!token) {
        return unauthorized(res);
    }

    const payload = await parseSecureToken(token);

    if (!payload) {
        return unauthorized(res);
    }
    return ok(res, { "ping": `pong to ${payload.username}` });
}

In the code shown above:

  • We first look for the token in either the authorization header or the cookie
  • If the token is empty, we return an unauthorized error
  • Next, we parse the token to get the payload
  • If the payload is falsy, we return an unauthorized error
  • Else, we return a success response built using the username in the payload

Let’s look at the parseSecureToken function of crypto.js

import crypto from 'crypto';
import { jwtDecrypt } from 'jose';

// KEY is the same key we derived earlier in crypto.js
const KEY = crypto.pbkdf2Sync(SECRET, "salt", 2000, 32, "sha512");

export async function parseSecureToken(token) {
    try {
        const { payload } = await jwtDecrypt(token, KEY)
        return payload;
    } catch {
        return null;
    }
}

Let’s test the ping endpoint, first without passing the authorization header or cookie

http :3000/api/ping
HTTP/1.1 401 Unauthorized
Connection: keep-alive
Date: Sat, 06 Nov 2021 18:00:01 GMT
Keep-Alive: timeout=5
Transfer-Encoding: chunked

401 Unauthorized

As you can see above, we get a 401 Unauthorized response.

Let’s now test the success scenario

http localhost:3000/api/ping Authorization:'Bearer eyJhbGciOiJkaXIiLCJlbmMiOiJBMjU2R0NNIn0..kNZBfR6XM8hmk_tG.V9302tH7riI-NN9NJiIaFipI1SZBGQlQ_APF-X1uWH5Cb_HOmyFC5aMqR9oXP5FTnyYsxWuhO9CbVMd_SoIPDDtkX0IqvLxK5IdqJIXQ9ToOI9T2aA.TzFI8pVb5elYW9PYeyv-Mw'
HTTP/1.1 200 OK
Connection: keep-alive
Content-Length: 24
Content-Type: application/json; charset=utf-8
Date: Sat, 06 Nov 2021 18:01:19 GMT
ETag: "18-x8cMMbTgdl4eIxKza1nqKfC0Bwg"
Keep-Alive: timeout=5
Vary: Accept-Encoding
{
    "ping": "pong to user1"
}

As you can see above, we get a successful response.

Similarly, we can test using the cookie. Note the double quotes, so the shell expands $TOKEN (assuming you saved the token in that variable).

http localhost:3000/api/ping "Cookie:AUTH=$TOKEN"

Step 4: Using lowercase plural table names

I prefer table names that are all lowercase and plural. One way to do that is to have our models use those names directly, but Prisma’s convention is to use singular PascalCase model names. We can achieve both by using Prisma’s @@map attribute as shown below.

model Account {
    id         Int       @id @default(autoincrement()) @db.UnsignedInt
    username   String    @unique @db.VarChar(255)
    password   String    @db.VarChar(60)
    is_admin   Boolean   @default(false)
    created_at DateTime? @default(now()) @db.Timestamp(0)
    updated_at DateTime? @default(now()) @db.Timestamp(0)

    @@map("accounts")
}

You will have to run the migration and generate the client again. The database migration will drop and recreate the tables, so keep in mind this is a destructive operation.

npx prisma migrate dev --name v2
npx prisma generate

Now, your table names will use your convention.

Step 5: Add website to account

Now, we will write an API to add a website to an account.

We will start by updating our data model.

model Account {
    // removed for brevity
    websites   Website[]

    @@map("accounts")
}

model Website {
    id           Int       @id @default(autoincrement()) @db.UnsignedInt
    website_uuid String    @unique @db.VarChar(36)
    name         String    @db.VarChar(100)
    domain       String    @db.VarChar(500)
    account_id   Int       @db.UnsignedInt
    account      Account   @relation(fields: [account_id], references: [id])
    created_at   DateTime? @default(now()) @db.Timestamp(0)

    @@index([account_id], name: "website_account_id_idx")
    @@map("websites")
}

Next, we will create our API endpoint. Create a new file website/index.js.

import { parse } from 'cookie';
import { created, methodNotAllowed, ok, unauthorized } from "lib/response";
import { uuid, parseSecureToken } from "lib/crypto";
import { createWebsite } from "services/website-service";

export default async function handle(req, res) {
    const token = req.headers.authorization
        ? req.headers.authorization.replace('Bearer ', '')
        : parse(req.headers.cookie || '')['AUTH'];

    if (!token) {
        return unauthorized(res);
    }

    const payload = await parseSecureToken(token);
    if (!payload) {
        return unauthorized(res);
    }
    const { account_id } = payload;
    if (req.method !== 'POST') {
        return methodNotAllowed(res);
    }
    // add website to account
    const website_uuid = uuid();
    const { name, domain } = req.body;
    const website = await createWebsite(account_id, { website_uuid, name, domain });
    return created(res, website);
}

  • We first check that the token exists. If it does, we parse the token to get the payload
  • If the method is not POST, we return an error response
  • Next, we create a new website object and save it to the database
  • Finally, we return a success response

Now, let’s look at the uuid and createWebsite methods.

The uuid method in the lib/crypto is defined below. It just wraps the uuid library.

import { v4 } from 'uuid';

export function uuid() {
    return v4();
}

You will need to install uuid package

npm i uuid

Next, create a new file services/website-service.js.

import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();

export async function createWebsite(account_id, data) {
    return prisma.website.create({
        data: {
            account: {
                connect: {
                    id: account_id
                },
            },
            ...data
        },
    })
}

  • We first created an instance of PrismaClient
  • Next, we used the Prisma type-safe API to create the website. Note how the relation to the account is defined using connect as well

Let’s test the API using Httpie client

http POST :3000/api/website Authorization:'Bearer <token>' name='Shekhar Gulati Blog' domain='shekhargulati.com'
HTTP/1.1 201 CREATED
{
    "account_id": 1,
    "created_at": "2021-11-07T10:45:00.000Z",
    "domain": "shekhargulati.com",
    "id": 1,
    "name": "Shekhar Gulati Blog",
    "website_uuid": "7d1ed8df-8456-4817-92eb-8868f6754afe"
}

Step 6: Creating Middleware to reduce duplication

Link

Create a new file lib/middleware.js

import { parseSecureToken } from "lib/crypto";
import { parse } from 'cookie';
import { unauthorized } from "./response";

function runMiddleware(req, res, fn) {
    return new Promise((resolve, reject) => {
        fn(req, res, (result) => {
            if (result instanceof Error) {
                return reject(result)
            }

            return resolve(result)
        })
    })
}

export async function checkUserAuthentication(req, res) {
    const fn = async (req, res, next) => {
        const token = req.headers.authorization
            ? req.headers.authorization.replace('Bearer ', '')
            : parse(req.headers.cookie || '')['AUTH'];

        if (!token) {
            return unauthorized(res);
        }

        const payload = await parseSecureToken(token);
        if (!payload) {
            return unauthorized(res);
        }
        req.payload = payload;
        next();
    }
    await runMiddleware(req, res, fn);
}

Next, update the website/index.js

import { checkUserAuthentication } from 'lib/middleware';

export default async function handle(req, res) {
    await checkUserAuthentication(req, res);
    const { account_id } = req.payload;
    // rest is same

}

You can again test the API

http POST :3000/api/website Authorization:'Bearer <token>' name='Shekhar Gulati Blog' domain='shekhargulati.com'
HTTP/1.1 201 CREATED

Step 7: Read, Update, and Delete API for Website

Create a file named website/[id]/index.js. It will handle all three methods.

import { checkUserAuthentication } from "lib/middleware";
import { methodNotAllowed, notFound, ok } from "lib/response";
import { deleteWebsite, getWebsiteByIdAndAccountId, updateWebsite } from "services/website-service";

export default async function handle(req, res) {
    await checkUserAuthentication(req, res);
    const { account_id, is_admin } = req.payload;
    const { id } = req.query;
    const websiteId = +id;
    const website = await getWebsiteByIdAndAccountId(websiteId, account_id, is_admin);
    if (!website) {
        return notFound(res);
    }
    if (req.method === 'GET') {
        return ok(res, website);
    } else if (req.method === 'PUT') {
        const { name, domain } = req.body;
        await updateWebsite(websiteId, { name, domain });
        return ok(res);
    } else if (req.method === 'DELETE') {
        await deleteWebsite(websiteId);
        return ok(res);
    }
    return methodNotAllowed(res);
}

  • We first used our middleware function to check that the user is authenticated
  • Next, we assign account-specific information to variables
  • Next, we get the id from the query and convert it to a numeric value – Link
  • Then, we fetch the website for the given website id and logged-in user id. We also pass is_admin, because if the user is an admin we return the website for the given id without checking that the logged-in user is the owner.
  • Next, if the request method is GET, we return the website in the response
  • Else if the request method is PUT, we update the website object
  • Else if the request method is DELETE, we delete the website
  • Else, we return a method not allowed error response

Let’s now look at getWebsiteByIdAndAccountId, updateWebsite, and deleteWebsite methods we defined in website-service.js file.

export async function getWebsiteByIdAndAccountId(website_id, account_id, is_admin) {
    if (is_admin) {
        return prisma.website.findUnique({
            where: {
                id: website_id
            }
        }).catch(e => { throw e });
    }
    return prisma.website.findFirst({
        where: {
            AND: [
                {
                    id: {
                        equals: website_id,
                    }
                },
                {
                    account_id: {
                        equals: account_id
                    }
                }
            ]
        }
    }).catch(e => { throw e });
}

export async function updateWebsite(website_id, data) {
    // The unique key on the model is id, so we filter by id
    return prisma.website.update({
        where: {
            id: website_id,
        },
        data,
    }).catch(e => { throw e });
}

export async function deleteWebsite(website_id) {
    return prisma.website.delete({
        where: {
            id: website_id,
        }
    }).catch(e => { throw e });
}

All the methods are self-explanatory.

We can test our APIs using HTTPie tool.

  • Create two accounts, user1 and user2. Make user1 an admin and user2 a non-admin
  • Log in as user1 and user2 and save their tokens
  • Create two websites, one each for user1 and user2, passing in their respective tokens
  • user1 will be able to get, update, and delete both websites since it is an admin
  • user2 will be able to get, update, and delete only its own website; an example is shown below
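
For example, assuming website 1 belongs to user1 and $TOKEN2 holds user2’s token, the ownership check in getWebsiteByIdAndAccountId comes back empty and the handler returns the notFound response (ids and values here are illustrative):

http PUT :3000/api/website/1 Authorization:"Bearer $TOKEN2" name='New Name' domain='example.com'
HTTP/1.1 404 Not Found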

Step 8: Building the tracker script to track pageviews

Create a new file tracker/index.js

function startTracking(window) {
    const {
        screen: { width, height },
        navigator: { language },
        location: { hostname, pathname, search },
        document,
    } = window;

    const script = document.querySelector('script[data-website-id]');
    if (!script) return;

    const attr = key => script && script.getAttribute(key);
    const website = attr('data-website-id');

    const screen = `${width}x${height}`;
    let currentUrl = `${pathname}${search}`;
    let currentRef = document.referrer;

    const post = (url, data, callback) => {
        const req = new XMLHttpRequest();
        req.open('POST', url, true);
        req.setRequestHeader('Content-Type', 'application/json');

        req.onreadystatechange = () => {
            if (req.readyState === 4) {
                callback && callback(req.response);
            }
        };

        req.send(JSON.stringify(data));
    };

    const root = script.src.split('/').slice(0, -1).join('/');

    const collect = (type, params, uuid) => {
        const payload = {
            website: uuid,
            hostname,
            screen,
            language,
        };

        if (params) {
            Object.keys(params).forEach(key => {
                payload[key] = params[key];
            });
        }

        post(
            `${root}/api/collect`,
            {
                type,
                payload,
            },
            res => { console.log("Received response", res) },
        );
    };

    const trackView = (url = currentUrl, referrer = currentRef, uuid = website) =>
        collect(
            'pageview',
            {
                url,
                referrer,
            },
            uuid,
        );

    trackView(currentUrl, currentRef);
}

startTracking(window);

We will use rollup to build the executable script.

First, we will have to install rollup and a couple of plugins we will use.

npm install rollup --save-dev
npm install @rollup/plugin-buble --save-dev
npm install @rollup/plugin-node-resolve --save-dev
npm install rollup-plugin-terser --save-dev
npm i dotenv

Now, we will create a new file rollup.tracker.config.js in the project root.

import 'dotenv/config';
import buble from '@rollup/plugin-buble';
import resolve from '@rollup/plugin-node-resolve';
import { terser } from 'rollup-plugin-terser';

export default {
    input: 'tracker/index.js',
    output: {
        file: 'public/uptics.js',
        format: 'iife',
    },
    plugins: [resolve(), buble({ objectAssign: true }), terser({ compress: { evaluate: false } })],
};

We will add an npm script to build the tracker bundle.

 "scripts": {
        // rest of the scripts
    "build-tracker": "rollup -c rollup.tracker.config.js",
    "start": "next start",
    "lint": "next lint"
  },

Now, we can build the tracker using the following command.

npm run build-tracker

To test the tracker, we can create a simple static website. Create a new folder called blog somewhere on your disk. Create an index.html with the following content.

<!DOCTYPE html>
<html lang="en">

<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>My Blog</title>
</head>

<body>
    <h1>My Blog</h1>
    <li>
        <a href="/blog1.html">Blog 1</a>
    </li>
    <li>
        <a href="/blog2.html">Blog 2</a>
    </li>
    <li>
        <a href="/blog3.html">Blog 3</a>
    </li>
</body>

<script async defer data-website-id="a3c87f3d-2219-4440-92e2-39d618011ccc"
    src="http://localhost:3000/uptics.js"></script>

</html>

You can similarly create blog1.html, blog2.html, and blog3.html.

You can serve this blog using a simple HTTP server. If you have Python installed, then you can use the following:

python3 -m http.server

It will start the server on http://localhost:8000

When the tracker calls the collect endpoint, you will see a CORS error as shown in the error message below.

Access to XMLHttpRequest at 'http://localhost:3000/api/collect' from origin 'http://localhost:8000' has been blocked by CORS policy: Response to preflight request doesn't pass access control check: No 'Access-Control-Allow-Origin' header is present on the requested resource.

Please note /api/collect does not exist yet.

Let’s create the collect endpoint.

Create a new file collect.js in the pages/api directory.

import { enableCors } from "lib/middleware";
import { ok } from "lib/response";

export default async function handle(req, res) {
    await enableCors(req, res);
    console.log("Received request", req.body.payload);
    return ok(res);
}

Let’s look at the enableCors method in middleware.js

import Cors from 'cors';

export async function enableCors(req, res) {
    await runMiddleware(req, res, Cors({
        methods: ['GET', 'POST', 'OPTIONS'],
    }));
}

We will have to install cors package as shown below.

npm i cors

Now, when you make the request, you will see a success response, and in your server logs you will see the payload as shown below.

Received request {
  website: 'a3c87f3d-2219-4440-92e2-39d618011ccc',
  hostname: 'localhost',
  screen: '1792x1120',
  language: 'en-US',
  url: '/',
  referrer: ''
}

Step 9: Session and Page view

Now, we have reached the point where we can start collecting page views for websites that are using our tracker.

We created a basic collect.js in the previous step. We will now extend it.

The first thing we need to do is detect whether the request comes from a bot. If it does, we don’t record the visit.

We will use the isbot package to detect bots.

npm i isbot

Below, we modify the collect.js file to use the isbot package.

import { enableCors } from "lib/middleware";
import { ok } from "lib/response";
import isbot from 'isbot';

export default async function handle(req, res) {
    await enableCors(req, res);

    if (isbot(req.headers['user-agent'])) {
        return ok(res);
    }
    console.log("Received request", req.body.payload);

    // create a new session or associate the request with the user's existing session

    // if the type of the event is pageview then we save the pageview against the session

    return ok(res);
}

I have also added a couple of comments to guide us on what we need to do next.

We need to introduce two concepts – Session and PageView.

Below is the updated schema.

model Website {
    // same as before
    session  Session[]
    pageview PageView[]

    @@index([account_id], name: "website_account_id_idx")
    @@map("websites")
}

model Session {
    id           Int        @id @default(autoincrement()) @db.UnsignedInt
    session_uuid String     @unique @db.VarChar(36)
    website_id   Int        @db.UnsignedInt
    created_at   DateTime?  @default(now()) @db.Timestamp(0)
    hostname     String?    @db.VarChar(100)
    browser      String?    @db.VarChar(20)
    os           String?    @db.VarChar(20)
    device       String?    @db.VarChar(20)
    screen       String?    @db.VarChar(11)
    language     String?    @db.VarChar(35)
    country      String?    @db.VarChar(2)
    website      Website    @relation(fields: [website_id], references: [id])
    pageview     PageView[]

    @@index([created_at], name: "session_created_at_idx")
    @@index([website_id], name: "session_website_id_idx")
    @@map("sessions")
}

model PageView {
    id         Int       @id @default(autoincrement()) @db.UnsignedInt
    website_id Int       @db.UnsignedInt
    session_id Int       @db.UnsignedInt
    created_at DateTime? @default(now()) @db.Timestamp(0)
    url        String    @db.VarChar(500)
    referrer   String?   @db.VarChar(500)
    session    Session   @relation(fields: [session_id], references: [id])
    website    Website   @relation(fields: [website_id], references: [id])

    @@index([created_at], name: "pageview_created_at_idx")
    @@index([session_id], name: "pageview_session_id_idx")
    @@index([website_id, created_at], name: "pageview_website_id_created_at_idx")
    @@index([website_id], name: "pageview_website_id_idx")
    @@index([website_id, session_id, created_at], name: "pageview_website_id_session_id_created_at_idx")
    @@map("page_views")
}

We will have to run the migrate and generate commands again.

npx prisma migrate dev --name v3
npx prisma generate

Now, we will update our collect endpoint to first create a session and then create the pageview. Update collect.js to the following:

import { enableCors } from "lib/middleware";
import { badRequest, ok } from "lib/response";
import isbot from 'isbot';
import { findOrCreateNewSession } from 'lib/session';
import { validate } from "uuid";
import { savePageView } from 'services/pageview-service';

export default async function handle(req, res) {
    await enableCors(req, res);

    if (isbot(req.headers['user-agent'])) {
        return ok(res);
    }
    const { type, payload } = req.body;
    if (!payload) {
        return badRequest(res);
    }

    if (!validate(payload.website)) {
        return badRequest(res);
    }
    console.log("Received request", payload);
    // create a new session or associate the request with the user's existing session
    const { website_id, session_id } = await findOrCreateNewSession(req);

    // if the type of event is pageview then we save the pageview with the session
    if (type === 'pageview') {
        const { url, referrer } = payload;
        await savePageView(
            website_id,
            session_id,
            url,
            referrer
        )
    } else {
        return badRequest(res);
    }
    return ok(res);
}

  • We check if the request is from a bot; if yes, we return early
  • Next, we validate the request payload and the website id
  • Then, we look for an existing session for the same user, or create a new one
  • Next, if the type is pageview, we save the pageview against the session

We will need to install following libraries.

npm i detect-browser
npm i request-ip
npm i date-fns
npm i ua-parser-js

Let’s look at findOrCreateNewSession in lib/session.js

import requestIp from 'request-ip';
import { browserName, detectOS } from 'detect-browser';
import { getWebsiteByUuid } from 'services/website-service';
import { nonRandomUUid } from './crypto';
import { createSession, getSessionByUuid } from 'services/session-service';
import parser from 'ua-parser-js';


export async function findOrCreateNewSession(req) {
    const { payload } = req.body;
    const { website: website_uuid } = payload;
    const website = await getWebsiteByUuid(website_uuid);

    if (!website) {
        throw new Error(`Website not found ${website_uuid}`);
    }

    const { hostname, screen, language } = payload;
    const { userAgent, browser, os, ip, country, device } = getClientInfo(req, payload);
    const { id: website_id } = website;
    const session_uuid = nonRandomUUid(website_id, hostname, ip, userAgent, os);
    let session = await getSessionByUuid(session_uuid);
    if (session) {
        return {
            website_id,
            session_id: session.id
        }
    }

    session = await createSession(website_id, {
        session_uuid,
        hostname,
        browser,
        os,
        screen,
        language,
        country,
        device,
    });

    return {
        website_id,
        session_id: session.id
    };
}

export function getClientInfo(req, { screen }) {
    const userAgent = req.headers['user-agent'];
    const ip = requestIp.getClientIp(req);
    //TODO: Use library
    const country = 'IN';
    const browser = browserName(userAgent);
    const os = detectOS(userAgent);
    const device = getDevice(userAgent);

    return { userAgent, browser, os, ip, country, device };
}

function getDevice(userAgent) {
    const result = parser(userAgent);
    return result.device.type;
}

Important things to note:

  • The use of nonRandomUUid to derive a deterministic session id; a sketch of what it could look like follows below
  • The use of libraries to fetch session-specific information
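
The crypto module’s nonRandomUUid is not shown in this article, so here is a hedged sketch of what it could look like, assuming a name-based (v5) UUID: the same website, hostname, IP, user agent, and OS always produce the same UUID, so repeat requests from the same visitor map to the same session without storing personal data directly. The namespace choice here is an assumption.

import { v5 } from 'uuid';

export function nonRandomUUid(...args) {
    // v5 UUIDs are deterministic: the same name and namespace always
    // produce the same UUID (unlike the random v4 variant)
    return v5(args.join(''), v5.DNS);
}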

The getSessionByUuid and createSession are defined in the services/session-service.js file

import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();

export async function getSessionByUuid(session_uuid) {
    return prisma.session.findUnique({
        where: {
            session_uuid,
        },
    }).catch(e => { throw e });
}


export async function createSession(website_id, data) {
    return prisma.session.create({
        data: {
            website_id,
            ...data
        },
        select: {
            id: true
        }
    }).catch(e => { throw e });
}

Next, we will look at services/pageview-service.js for the savePageView method

import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();

export const URL_LENGTH = 500;

export async function savePageView(website_id, session_id, url, referrer) {
    return prisma.pageView.create({
        data: {
            website_id,
            session_id,
            url: url?.substr(0, URL_LENGTH),
            referrer: referrer?.substr(0, URL_LENGTH),
        },
    }).catch(e => { throw e });

}

Let’s test the collect endpoint. We can test it using the static website we created in the last step.

Just refresh the page. You should see values in the sessions and page_views tables.

mysql> select * from sessions;
+----+--------------------------------------+------------+---------------------+-----------+---------+--------+--------+-----------+----------+---------+
| id | session_uuid                         | website_id | created_at          | hostname  | browser | os     | device | screen    | language | country |
+----+--------------------------------------+------------+---------------------+-----------+---------+--------+--------+-----------+----------+---------+
|  1 | cfafb370-f063-5047-8ef3-2c5ff9953d1f |          4 | 2021-11-09 20:13:59 | localhost | chrome  | Mac OS | NULL   | 1792x1120 | en-US    | IN      |
+----+--------------------------------------+------------+---------------------+-----------+---------+--------+--------+-----------+----------+---------+
1 row in set (0.00 sec)

and

mysql> select * from page_views;
+----+------------+------------+---------------------+-------------+----------+
| id | website_id | session_id | created_at          | url         | referrer |
+----+------------+------------+---------------------+-------------+----------+
|  1 |          4 |          1 | 2021-11-09 20:14:34 | /blog1.html |          |
+----+------------+------------+---------------------+-------------+----------+
1 row in set (0.00 sec)

Step 10: Getting the country from the IP address

We will use a library, so let’s install it first

npm i geoip-lite

Next, we will update our lib/session.js

import geoip from 'geoip-lite';

export function getClientInfo(req, { screen }) {
    const userAgent = req.headers['user-agent'];
    const ip = requestIp.getClientIp(req);
    const country = getCountry(ip);
    const browser = browserName(userAgent);
    const os = detectOS(userAgent);
    const device = getDevice(userAgent);
    return { userAgent, browser, os, ip, country, device };
}

function getCountry(ipAddress) {
    const result = geoip.lookup(ipAddress)
    return result?.country;
}
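
A quick illustrative check of what geoip-lite returns (the example IP comes from the geoip-lite README; lookups of loopback/private addresses return null, which is why the country stays null on localhost):

import geoip from 'geoip-lite';

console.log(geoip.lookup('207.97.227.239')?.country); // 'US'
console.log(geoip.lookup('127.0.0.1'));               // null for loopback addresses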

To test this, you will need to open the static site in a different browser. You will notice that since we are using localhost, the country will be null.

To test it fully, we need a public IP. We can get one by using a tool like ngrok. You can download it from here.

Expose both the static server and the Next.js app ports

ngrok http 8000
ngrok http 3000

Next, you will need to change the tracker script src in your blog pages to point to the public ngrok URL of the Next.js app.

Now, when you load the page through the ngrok URL, you will see a new session created with the country populated.

mysql> select * from sessions;
+----+--------------------------------------+------------+---------------------+------------------------------+---------+--------+--------+-----------+----------+---------+
| id | session_uuid                         | website_id | created_at          | hostname                     | browser | os     | device | screen    | language | country |
+----+--------------------------------------+------------+---------------------+------------------------------+---------+--------+--------+-----------+----------+---------+
|  2 | c4c5470a-35d3-5de4-accc-16695ef58330 |          4 | 2021-11-11 20:39:12 | localhost                    | safari  | Mac OS | NULL   | 1792x1120 | en-GB    | NULL    |
|  5 | a2e15cd7-c3dd-5f0f-88ec-0872fecbf611 |          4 | 2021-11-11 20:59:14 | a267-171-78-204-224.ngrok.io | chrome  | Mac OS | NULL   | 1792x1120 | en-US    | IN      |
+----+--------------------------------------+------------+---------------------+------------------------------+---------+--------+--------+-----------+----------+---------+

As you can see above, in the first record the hostname is localhost and the country is NULL, whereas in the second record the hostname is a267-171-78-204-224.ngrok.io and the country is IN.

Step 11: Track events

First, we will update tracker/index.js to listen for click events and submit them to the backend

function startTracking(window) {
    // removed for brevity

    const handleEvents = () => {
        document.addEventListener('click', handleClickEvent);
    }

    const handleClickEvent = (event) => {
        const event_type = event.type;
        var link = event.target;
        const targetHref = getTargetHref(event);
        const getEventValue = (targetHref, element) => {
            if (targetHref) {
                return targetHref;
            } else if (element.id) {
                return 'id:' + link.id;
            } else {
                return 'tag:' + link.tagName.toLowerCase();
            }
        }
        const event_value = getEventValue(targetHref, link);
        collect(
            'event',
            {
                event_type,
                event_value,
                url: currentUrl,
            },
            website,
        );
        // Delay navigation so that the API is notified of the click.
        // targetHref is only set for clicks on external links (see getTargetHref).
        if (targetHref && (!link.target || link.target.match(/^_(self|parent|top)$/i))) {
            if (!(event.ctrlKey || event.metaKey || event.shiftKey)) {
                setTimeout(function () {
                    location.href = targetHref;
                }, 150);
                event.preventDefault();
            }
        }
    }

    trackView(currentUrl, currentRef);
    handleEvents();
}

function getTargetHref(event) {
    let link = event.target;
    const click = event.type == "click";
    while (link && (typeof link.tagName == 'undefined' || link.tagName.toLowerCase() != 'a' || !link.href)) {
        link = link.parentNode
    }

    if (link && link.href && link.host && link.host !== location.host) {
        if (click) {
            return link.href;
        }
    }
    return null;
}


startTracking(window);

Let’s look at the important points in the code shown above:

  • We subscribe to all click events using the handleEvents method
  • The handleClickEvent listener creates the event object and posts it to the collect API. We pass two main attributes to the collect endpoint – event_type and event_value. The event_type in our case will be click. The event_value depends on whether we are navigating to an external link or clicking some other element.
  • Also, when navigating to external URLs we wait 150 ms before following the link to ensure our collect API endpoint is called
  • Finally, we call the handleEvents method. An illustrative request body is shown below.
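
Putting it together, a click on an external link produces a collect request shaped like this (illustrative values; collect merges these params with the common website/hostname/screen/language fields):

const body = {
    type: 'event',
    payload: {
        website: 'a3c87f3d-2219-4440-92e2-39d618011ccc',
        hostname: 'localhost',
        screen: '1792x1120',
        language: 'en-US',
        event_type: 'click',
        event_value: 'https://example.com/', // external href, or id:/tag: fallback
        url: '/blog1.html',
    },
};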

Now, we will update our schema in prisma/schema.prisma

model Website {
    // rest removed for brevity
    events Event[]
}

model Session {
    // rest removed for brevity
    events Event[]
}

model Event {
    id          Int       @id @default(autoincrement()) @db.UnsignedInt
    website_id  Int       @db.UnsignedInt
    session_id  Int       @db.UnsignedInt
    created_at  DateTime? @default(now()) @db.Timestamp(0)
    url         String    @db.VarChar(500)
    event_type  String    @db.VarChar(50)
    event_value String    @db.VarChar(50)
    session     Session   @relation(fields: [session_id], references: [id])
    website     Website   @relation(fields: [website_id], references: [id])

    @@index([created_at], name: "event_created_at_idx")
    @@index([session_id], name: "event_session_id_idx")
    @@index([website_id], name: "event_website_id_idx")
    @@map("events")
}

Next, we will generate the migration script.

npx prisma migrate dev --name v4

It also generates the client, so we don’t have to do it manually.

Now, we will update collect.js

import { saveEvent } from "services/event-service";

export default async function handle(req, res) {
    // same as before

    // if the type of event is pageview then we save the pageview with the session
    if (type === 'pageview') {
            // same as before
    } else if (type === 'event') {
        const { url, event_type, event_value } = payload;

        await saveEvent(website_id, session_id, url, event_type, event_value);
    } else {
        return badRequest(res);
    }
    return ok(res);
}

Finally, we will look at services/event-service.js

import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();

export const URL_LENGTH = 500;

export async function saveEvent(website_id, session_id, url, event_type, event_value) {
    return prisma.event.create({
        data: {
            website_id,
            session_id,
            url: url?.substr(0, URL_LENGTH),
            event_type: event_type?.substr(0, 50),
            event_value: event_value?.substr(0, 50),
        },
    });
}

Other stuff I learnt from reading code

  • The application does not store account-specific settings in the database. If you go to http://localhost:3000/settings/profile you will find options to change the theme, timezone, default date range, and locale. But these are not persisted to the database; they are persisted in cookies only. When you change them, the cookie is updated.
  • The use of stale.yml to close issues after a period of inactivity. It requires the Stale GitHub App.
  • The use of husky for pre-commit hooks. Whenever you commit, husky runs the pre-commit hook, which runs prettier to format the code and eslint to lint it via lint-staged. This is defined in .husky/pre-commit
#!/bin/sh
. "$(dirname "$0")/_/husky.sh"

npx lint-staged
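
lint-staged itself is configured in package.json; a typical configuration (illustrative, not necessarily Umami’s exact setup) looks like this:

"lint-staged": {
    "**/*.js": ["eslint --fix", "prettier --write"]
}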
