How to create a custom Spring Boot FailureAnalyzer

Today, I was working with a Spring Boot application that does local JVM cache warming on server startup. The application calls a global Redis cache and stores state that does not change often in an in-memory JVM cache. It is a common pattern that many applications use. In our case, the application does not just warm the cache; it first processes some data and then caches the result in the local JVM cache.

Junior developers often forget to start Redis or some other dependent service, and the application then fails to start on their local machine. They then spend a few minutes reading the long Java stack trace to find the problem. These stack traces can be quite long, and it is difficult to find the needle in that haystack.

Recently, I learnt about a Spring Boot feature called FailureAnalyzer. A FailureAnalyzer lets you intercept exceptions that occur during application startup and cause the startup to fail. Using a FailureAnalyzer, you can replace the exception's stack trace with a more human-readable message. The best-known example of this is when your code has cyclic dependencies: a bean A depending on bean B and vice versa, as shown below.
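Writing your own analyzer for the cache-warming scenario boils down to extending AbstractFailureAnalyzer for the exception you want to translate. Here is a minimal sketch; CacheWarmupException is a hypothetical exception that the warm-up code would throw when Redis is unreachable, so adapt it to whatever your startup code actually raises.

```java
import org.springframework.boot.diagnostics.AbstractFailureAnalyzer;
import org.springframework.boot.diagnostics.FailureAnalysis;

// Translates a (hypothetical) CacheWarmupException into a short, actionable
// message instead of a long stack trace when the application fails to start.
public class CacheWarmupFailureAnalyzer extends AbstractFailureAnalyzer<CacheWarmupException> {

    @Override
    protected FailureAnalysis analyze(Throwable rootFailure, CacheWarmupException cause) {
        return new FailureAnalysis(
                "The local JVM cache could not be warmed because Redis was not reachable: "
                        + cause.getMessage(),
                "Start Redis (for example with `docker run -p 6379:6379 redis`) and restart the application.",
                cause);
    }
}
```

Spring Boot picks the analyzer up when it is registered under the org.springframework.boot.diagnostics.FailureAnalyzer key in META-INF/spring.factories.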

Continue reading “How to create a custom Spring Boot FailureAnalyzer”

Issue #38: 10 Reads, A Handcrafted Weekly Newsletter For Software Developers

The time to read this newsletter is 135 minutes.

The greatest enemy of knowledge is not ignorance, it is the illusion of knowledge. – Stephen Hawking

  1. System design hack: Postgres is a great pub/sub & job server: 10 mins read. I have read multiple times that people are using Postgres as a job queue or as a pub/sub solution. It does require you to mess with SQL and write PL/pgSQL functions, but I think it could be a good solution if you don’t want to manage another pub/sub server (see the sketch after this list).
  2. Developers mentoring other developers: practices I’ve seen work well: 20 mins read. The article covers how we can build good mentorship programs at work.
  3. Head In The Clouds: 15 mins read. This article covers how folks at FreeAgent planned their cloud migration journey. The key points from the post are:
    1. Co-locating has been a terrific win for us over the years, providing us with a cost-effective, high performance compute platform that has allowed us to scale to over 95,000 customers with close to 5 9’s reliability.
    2. Growth often acts as a forcing function with regards to infrastructure. Head count has doubled. Customer count is growing quickly.
    3. Desire for new features is another forcing function. They wanted more datacenters to increase resilience. They were reaching hardware limitations. The ops team was under pressure, and it was challenging to find ops engineers with the right skills. They were experimenting with ML. Serverless was becoming a go-to for production. They wanted to improve deployment. And scaling the database was a challenge.
    4. Experiments were run to research moving to AWS: Granted, any infrastructure migration would be expensive, the project complex and it would come with many challenges, but the advantages and opportunities that a full cloud migration would open up in the future were undeniable.
    5. The decision was made to migrate to AWS!
    6. Early on in the R&D phase we became customers of Gruntwork.io and have relied heavily on their Infrastructure as Code library and training to accelerate the project.
  4. We built network isolation for 1,500 services to make Monzo more secure: 20 mins read. In the Security team at Monzo, one of our goals is to move towards a completely zero trust platform. This means that in theory, we’d be able to run malicious code inside our platform with no risk – the code wouldn’t be able to interact with anything dangerous without the security team granting special access.
  5. Scaling in the presence of errors—don’t ignore them: 20 mins read. The secret to error handling at scale isn’t giving up, ignoring the problem, or even just trying again; it is structuring a program for recovery, making errors stand out, and allowing other parts of the program to make decisions. Techniques like fail-fast, crash-only software, and process supervision, but also things like clever use of version numbers, and occasionally the odd bit of statelessness or idempotence. What these all have in common is that they’re all methods of recovery. Recovery is the secret to handling errors, especially at scale. Giving up early so other things have a chance, continuing on so other things can catch up, restarting from a clean state to try again, saving progress so that things do not have to be repeated. That, or put it off for a while. Buy a lot of disks, hire a few SREs, and add another graph to the dashboard.
  6. Modern Data Practice and the SQL Tradition: 15 mins read. Over the last year, I have read multiple posts suggesting we should start with the relational database route. SQL is becoming the de facto language for all things data. Most developers start looking at alternatives too early in the cycle, before understanding the pros and cons of a technology. The key points from the post are:
    1. The more I work with existing NoSQL deployments however, the more I believe that their schemaless nature has become an excuse for sloppiness and unwillingness to dwell on a project’s data model beforehand.
    2. One can now model the “known” part of his data model in a typical relational manner and dump his “raw and unstructured” data into JSON columns. No need to “denormalize all the things” just because some element of the domain is “unstructured”.
    3. The good thing with this approach is that one can have a single database for both their structured and unstructured data without sacrificing ACID-compliance.
    4. SQL and relational databases have come a long way and nowadays offer almost any function a data scientist could ask for.
    5. Relational databases usually make more sense financially too. Distributed systems like MongoDB and ElasticSearch are money-hungry beasts and can kill your technology and human resources budget; unless you are absolutely certain and have run the numbers and decided that they do really make sense for your case.
    6. Performance and stability with relational databases can be better out of the box.
  7. Hash join in MySQL 8: 10 mins read. You should read this blog if you want to learn how hash joins are implemented by databases. It will give you a good and detailed understanding of the subject.
  8. Managing a Go monorepo with Bazel: 10 mins read. I don’t think we have a clear winner yet between the monorepo and multi-repo approaches to building microservices. Big organisations like Google and Facebook prefer the monorepo approach, while organisations like Netflix recommend the multi-repo approach. This post covers how you can manage a Go monorepo using the Bazel build tool. I have not used Bazel so far, but I am seriously considering it for my personal projects.
  9. The Value in Go’s Simplicity: 10 mins read. Go is one language that I really want to spend more time on. It is a popular language used almost everywhere these days. In this blog, the author makes the case for Go’s simplicity. As the author mentions, the Go core development team has taken simplicity to another level: to keep the language simple, they have so far kept out many sought-after features, such as generics.
  10. When XML beats JSON: UI layouts: 5 mins read. UI layouts are represented as component trees. And XML is ideal for representing tree structures. It’s a match made in heaven! In fact, the most popular UI frameworks in the world (HTML and Android) use XML syntax to define layouts.
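To make the Postgres-as-a-job-server idea from the first item concrete, the usual trick is to claim work with SELECT ... FOR UPDATE SKIP LOCKED so that concurrent workers never pick up the same row. Below is a minimal sketch using plain JDBC with the PostgreSQL driver; the jobs table, its columns, and the connection details are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// A bare-bones Postgres-backed job queue worker. Rows locked by another
// worker's transaction are skipped, so workers never fight over the same job.
public class PostgresJobWorker {

    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/app", "app", "secret")) {
            conn.setAutoCommit(false);

            String claimSql = "SELECT id, payload FROM jobs "
                    + "WHERE status = 'pending' "
                    + "ORDER BY id LIMIT 1 "
                    + "FOR UPDATE SKIP LOCKED";
            try (PreparedStatement claim = conn.prepareStatement(claimSql);
                 ResultSet rs = claim.executeQuery()) {
                if (rs.next()) {
                    long id = rs.getLong("id");
                    System.out.println("Processing job " + id + ": " + rs.getString("payload"));

                    // Mark the job done inside the same transaction that locked it.
                    try (PreparedStatement done = conn.prepareStatement(
                            "UPDATE jobs SET status = 'done' WHERE id = ?")) {
                        done.setLong(1, id);
                        done.executeUpdate();
                    }
                }
            }
            conn.commit();
        }
    }
}
```

For the pub/sub half of the idea, Postgres's LISTEN/NOTIFY commands play a similar role, with the same caveat that you are trading a dedicated broker for a bit of SQL plumbing.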


Issue #37: 10 Reads, A Handcrafted Weekly Newsletter For Software Developers

The time to read this newsletter is 145 minutes.

I am not bothered by the fact that I am unknown. I am bothered when I do not know others – Confucius

  1. What nobody tells you about documentation: 20 mins read. I like the way the author divides technical documentation into four buckets.
    • Tutorials: They are learning oriented.
    • How-To Guides: They are problem oriented.
    • Reference: They are information oriented.
    • Explanation: They are understanding oriented.
  2. Test Desiderata: 5 mins read. This post is by Kent Beck, co-creator of JUnit. In this post, Kent goes beyond the FIRST properties of test cases.
    • Tests should be coupled to the behavior of code and decoupled from the structure of code. Tests that fail on both counts are all too common.
  3. Don’t Call Yourself A Programmer, And Other Career Advice: 20 mins read.
  4. Testing Cloudflare workers: 20 mins read.
  5. Automated Disaster Recovery using CloudEndure: 10 mins read.
  6. AWS Lambda vs. Azure Functions: 10 Major Differences: 15 mins read.
  7. Daily Stand-up Injection of Guilt: 10 mins read. Yegor writes, “Only weak managers need daily stand-up meetings to coordinate the team, while strong ones use more formal instruments to organize the flow of information. However, as someone noted, morning meetings are not supposed to be used by managers to coordinate anyone, but “to discuss progress, impediments and to plan.” I’m not buying it.”
  8. Why Parcel Has Become My Go-To Bundler for Development: 10 mins read. An easy to read tutorial on how to start with Parcel, a zero config bundler. It really looks simple to use and comes with useful defaults.
  9. Storing 50 million events per second in Elasticsearch: How we did it: 20 mins read. A very detailed post on how DataDome logs 50 million events per second for its customers to analyze and search over a 30-day period.
  10. How Shopify Manages Petabyte Scale MySQL Backup and Restore: 15 mins read.

Issue #36: 10 Reads, A Handcrafted Weekly Newsletter For Software Developers

The time to read this newsletter is 200 minutes.

Religion is the opium of the masses – Karl Marx

  1. A Technical Introduction to MemSQL: 20 mins read. MemSQL is a fast, commercial, ANSI SQL compliant, highly scalable HTAP database. HTAP databases are those that support both OLTP and OLAP workloads. It supports ACID transactions just like a regular relational database. It also supports document and geospatial data types. I have also written a quick post on MemSQL that you can read.

  2. It’s later than you think: 20 mins read. We all regret working too hard in the end. Give it a read; it is an awesome write-up of a heartbreaking story.

  3. Modern applications at AWS: 10 mins read. To succeed in using application development to increase agility and innovation speed, organizations must adopt five elements, in any order: microservices; purpose-built databases; automated software release pipelines; a serverless operational model; and automated, continuous security.

  4. 1 Year of Event Sourcing and CQRS: 30 mins read. This is a long read that covers DDD, CQRS, and Event Sourcing. In this post, the author covers how they implemented this architecture style and the issues they faced.

  5. The Single Most Important Internal Email in the History of Amazon: 20 mins read. This is a long read on how different organisations are organised. Some organisations are co-located and prefer a synchronous mode of communication, while others are distributed and rely on asynchronous communication. An organisation’s communication system can be one of the most important levers you have to make an impact on productivity. Be very intentional about it.

  6. Lessons from Design School for Software Engineers: 20 mins read. Great advice from an engineer at GitHub. All the lessons resonated with me.

    1. You are not your audience
    2. Constructive, objective feedback is always better than reductive, subjective feedback
    3. You are not your designs/work
    4. Iteration is key for improvement
    5. Always critique your work
  7. A Multithreaded Fork of Redis That’s 5X Faster Than Redis: 20 mins read. This is interesting: a fork of Redis, called KeyDB, that uses multi-threading to make Redis 5x faster. From the post:

    > KeyDB has a different philosophy on how the codebase should evolve. We feel that ease of use, high performance, and a “batteries included” approach is the best way to create a good user experience. While we have great respect for the Redis maintainers it is our opinion that the Redis approach focusses too much on simplicity of the code base at the expense of complexity for the user. This results in the need for external components and workarounds to solve common problems.

  8. Why we decided to go for the Big Rewrite: 20 mins read. This post goes into detail on how Channable rewrote their main data processing system. It has a lot of good advice that you can apply in your own work as well.

  9. How to Write Fast Code in Ruby on Rails: 15 mins read. This post contains general advice for writing fast and performant Ruby code. Many of the lessons apply even if you use another programming language.

  10. Cascading Cache Invalidation: 25 mins read. This is an interesting article covering a flaw in one of the best practices most people use for asset caching, i.e. content hashes in filenames combined with far-future expiry headers. The author also shares three possible solutions to the problem. The sketch after this list shows how the invalidation cascades.
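To see why the practice backfires, note that a bundle that imports another asset by its hashed filename embeds that hash in its own contents: change the dependency and its hash changes, which changes the importing file, which changes its hash too, invalidating files whose own code never changed. Here is a tiny, self-contained illustration; the file contents and the hash helper are made up purely for demonstration.

```java
// Illustrates how content-hashed filenames cascade: changing vendor.js also
// changes the hashed name of main.js, even though main.js's own code is unchanged.
public class CascadingInvalidation {

    // Stand-in for a real content hash (bundlers use md5/sha digests of the file).
    static String hash(String content) {
        return Integer.toHexString(content.hashCode());
    }

    public static void main(String[] args) {
        String vendorV1 = "console.log('vendor v1');";
        String vendorV2 = "console.log('vendor v2');"; // a one-line change

        // main.js references vendor.js by its hashed filename, so the hash of
        // vendor.js becomes part of main.js's content.
        String mainV1 = "import './vendor." + hash(vendorV1) + ".js';";
        String mainV2 = "import './vendor." + hash(vendorV2) + ".js';";

        System.out.println("main v1 -> main." + hash(mainV1) + ".js");
        System.out.println("main v2 -> main." + hash(mainV2) + ".js");
        // The two names differ, so clients must re-download main.js as well.
    }
}
```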

Video of the week

This week’s video: Intel and Rust: the Future of Systems Programming

How To Think About Different Database Data Models: Relational vs Document Data Models

This week I was talking to a developer about how to think about data models supported by different databases. One thing that I have learnt in my 15 years of building web applications is that data models play an important role in the success of any software application.

A data model is an abstract model that organises elements of data and describes how these elements relate to each other. Data models describe the structure, manipulation, and integrity aspects of the data stored in data management systems such as relational databases. For example, when you model your problem domain in a relational database, you think in terms of real-world entities and how those entities are related to each other. We usually use Entity Relationship (ER) diagrams to model tables in relational databases. In this post, we will focus on two popular data models: relational and document. We will discuss when you would use one over the other.

Data models influence two main attributes:

  1. Ease of use: A data model can make some operations easy to achieve and others difficult or impossible.
  2. Performance: A data model can be suited to faster reads but slower writes, or vice versa.
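As a concrete illustration of the first point, here is a minimal, database-free sketch (a hypothetical blog-post domain in plain Java) of the same data shaped both ways: the relational shape keeps posts and comments in separate tables linked by an id, so rendering a post page needs a join-like lookup, while the document shape nests the comments inside the post, so one read returns everything but cross-document queries get harder.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// A hypothetical blog-post domain shaped two ways, to contrast the data models.
public class DataModelSketch {

    // Relational shape: two flat "tables" linked by a foreign key (postId).
    record PostRow(long id, String title) {}
    record CommentRow(long id, long postId, String text) {}

    // Document shape: the comments live inside the post document itself.
    record PostDocument(long id, String title, List<String> comments) {}

    public static void main(String[] args) {
        List<PostRow> posts = List.of(new PostRow(1, "Relational vs Document"));
        List<CommentRow> comments = List.of(
                new CommentRow(10, 1, "Great post!"),
                new CommentRow(11, 1, "Very helpful."));

        // Relational read: a join-like lookup assembles the page from two tables.
        Map<Long, List<String>> commentsByPost = comments.stream()
                .collect(Collectors.groupingBy(CommentRow::postId,
                        Collectors.mapping(CommentRow::text, Collectors.toList())));
        posts.forEach(p -> System.out.println(
                p.title() + " " + commentsByPost.getOrDefault(p.id(), List.of())));

        // Document read: one object already holds everything the page needs,
        // but a query like "all comments by one user" now spans many documents.
        PostDocument doc = new PostDocument(1, "Relational vs Document",
                List.of("Great post!", "Very helpful."));
        System.out.println(doc.title() + " " + doc.comments());
    }
}
```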

Continue reading “How To Think About Different Database Data Models: Relational vs Document Data Models”

MemSQL Introduction: A Hybrid transactional/analytical processing database


MemSQL is a fast, commercial, ANSI SQL compliant, highly scalable HTAP database. HTAP databases are those that support both OLTP and OLAP workloads. It supports ACID transactions just like a regular relational database. It also supports document and geospatial data types.

MemSQL is fast because it stores data in memory. But that does not mean it is not durable: it maintains a copy of the data on disk as well. Transactions are committed to a transaction log on disk and later compressed into full-database snapshots. One of the main reasons new databases are designed in-memory first is that memory keeps getting cheaper; it is estimated that memory becomes roughly 40% cheaper every year.

MemSQL has tunable durability. You can make it fully durable or completely ephemeral, and durability can be synchronous or asynchronous.

MemSQL simplifies your architecture because you don’t have to write ETL jobs to move data from one data store to another. This is the biggest selling point of any HTAP database.
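Because MemSQL speaks the MySQL wire protocol, you can get a feel for the HTAP claim by running a transactional insert and an analytical aggregate against the same database from a standard MySQL client. The sketch below uses plain JDBC with the MySQL driver; the host, credentials, and the orders table are hypothetical, so treat it as a shape rather than a recipe.

```java
import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

// OLTP-style insert and OLAP-style aggregate against the same MemSQL database,
// connected through the standard MySQL JDBC driver. Table and credentials are hypothetical.
public class MemSqlHtapSketch {

    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/shop", "root", "")) {

            // OLTP: record a single order.
            try (PreparedStatement insert = conn.prepareStatement(
                    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)")) {
                insert.setLong(1, 42L);
                insert.setBigDecimal(2, new BigDecimal("19.99"));
                insert.executeUpdate();
            }

            // OLAP: aggregate over the same table, with no ETL into a separate warehouse.
            try (Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(
                         "SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id")) {
                while (rs.next()) {
                    System.out.println(rs.getLong("customer_id") + " -> " + rs.getBigDecimal("total"));
                }
            }
        }
    }
}
```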

Continue reading “MemSQL Introduction: A Hybrid transactional/analytical processing database”