My Notes on GitLab Postgres Schema Design

2022-07-13T21:00:55+05:30

Thanks for this in-depth review. I found many points really interesting.

2022-07-14T07:11:26+05:30

Thanks for sharing. It helped me a lot.

2022-07-15T13:50:52+05:30

Hey Shekhar, Yannis from the Database Group at GitLab here.

It is great to see your deep dive into our schema and all the details that you have pointed out!

There is much more to discuss that cannot be seen by inspecting only a snapshot of our schema: the evolution of our schema over the years, how Ruby on Rails conventions have shaped it (especially in the early years), issues faced at GitLab.com scale that force us to revisit our traditional database design thinking, and more.

I was thinking of creating a video based on your post to discuss those and many other details. But we can also do it interactively if you want 🙂

If you are interested, add an issue on GitLab.com and ping me there (https://gitlab.com/iroussos)

2022-07-15T16:27:18+05:30

Hi Yannis, thanks for your comment. In which project should I create the issue? There are many under GitLab.com: https://gitlab.com/gitlab-com

2022-07-15T15:01:53+05:30

Nice article; there is a huge market for this kind of blog! One thing though: exposing internal IDs to end users is an anti-pattern only if those end users can access objects whose IDs they have guessed. I don’t see how that is possible with the right access rules applied. I use row-level security (RLS) with each end user mapped to a database user, so guessing IDs is harmless: only rows the user has read access to are returned. I know most frameworks connect as a superuser, but that is a huge anti-pattern, bigger than guessable IDs! Can you explain how GitLab solved the superuser issue? Or is it still open?

2022-07-15T17:22:31+05:30

What is important to understand about some of the “design choices” in GitLab’s database schema (table naming, use of the `timestamp without time zone` data type, etc.) is that it is a pretty old project (10+ years) written with the Ruby on Rails framework, which has its own conventions and defaults:
– Before Ruby on Rails 5.1 (released in 2017), primary keys were created as `serial` columns by default (see https://github.com/rails/rails/pull/26266), so all older tables have 4-byte primary keys.
– Ruby on Rails uses the `timestamp without time zone` data type for datetime columns. This is fine because Rails by default sets the UTC time zone for all database connections and handles time zone conversions in application code. In that case `timestamp with time zone` and `timestamp without time zone` behave identically (both store UTC and report UTC). However, caution is needed if someone connects to the database bypassing Ruby on Rails.
– Table naming follows Ruby on Rails conventions: table names are plural and in snake case, model classes are singular and in CamelCase. Index names are generated automatically by Rails migrations.

And so on and so forth. Read more about database migrations in the Ruby on Rails guides: https://guides.rubyonrails.org/active_record_migrations.html
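
The timestamp caution above can be illustrated outside the database. In this rough sketch, a naive Python `datetime` stands in for `timestamp without time zone` and an aware one for `timestamp with time zone`. The sample instant and the +05:30 reader offset are assumptions chosen purely for illustration; this is an analogy, not how Rails or PostgreSQL implement it.

```python
from datetime import datetime, timedelta, timezone

# Assumed sample instant, written by an application that always works in UTC.
written_utc = datetime(2022, 7, 15, 11, 52, 31, tzinfo=timezone.utc)

# Analogy for "timestamp without time zone": only the wall-clock fields survive.
stored_naive = written_utc.replace(tzinfo=None)

# A reader that assumes UTC (as Rails does) recovers the original instant.
read_as_utc = stored_naive.replace(tzinfo=timezone.utc)

# A reader that assumes its own local zone (+05:30 here) gets a different instant.
local_zone = timezone(timedelta(hours=5, minutes=30))
read_as_local = stored_naive.replace(tzinfo=local_zone)

print("UTC reader matches:  ", read_as_utc == written_utc)
print("local reader matches:", read_as_local == written_utc)
print("local reader is off by:", written_utc - read_as_local)
```

As long as every client agrees that the stored wall-clock value means UTC, nothing is lost; the mismatch only appears when a client applies a different assumption.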

2023-02-08T16:22:18+05:30

Thanks, Shekhar, for such a detailed explanation. I learnt a lot from it.

Name	Storage size	Range	Approximate maximum
`serial`	4 bytes	1 to 2147483647	~2.1 billion
`bigserial`	8 bytes	1 to 9223372036854775807	~9.2 quintillion
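
The upper bounds in this table follow from the width of the signed integer behind each type. A quick way to check them in Python (the helper name `max_serial_value` is made up for this illustration):

```python
def max_serial_value(storage_bytes: int) -> int:
    """Largest value of a signed integer stored in `storage_bytes` bytes."""
    bits = storage_bytes * 8
    # One bit is used for the sign, so the maximum is 2^(bits-1) - 1.
    return 2 ** (bits - 1) - 1


SERIAL_MAX = max_serial_value(4)      # serial is backed by a 4-byte integer
BIGSERIAL_MAX = max_serial_value(8)   # bigserial is backed by an 8-byte integer

print(f"serial max:    {SERIAL_MAX:,}")
print(f"bigserial max: {BIGSERIAL_MAX:,}")
```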

Name	Description
`character varying(n)`, `varchar(n)`	variable-length with limit
`character(n)`, `char(n)`	fixed-length, blank padded
`text`	variable unlimited length


1. Using the right primary key type for a table

2. Use of internal and external ids

3. Using `text` character type with check constraints

4. Naming conventions

5. Timestamp with timezone and without timezone

6. Foreign key constraints

7. Partitioning big tables

8. Supporting LIKE search use cases with Trigrams and `gin_trgm_ops`

9. Use of `jsonb`

10. Other tidbits

Conclusion

References
