May 2020 – Shekhar Gulati

Improving Spring Data JPA/Hibernate Bulk Insert Performance by more than 100 times

This week I had to work on a performance issue. Performance issues are always fun to work on. They give us an opportunity to get into the depths of the technology we are using, and they show us how much we don’t know about the tools we use every day. These days we are too quick to think about changing the database or the underlying library when faced with a performance bottleneck, when what we really need to do is learn the technology we are already using in depth.

The performance issue I am talking about was related to bulk insertion of data into the database. We were using Spring Data JPA with SQL Server. In our use case, the end user uploads an Excel file that the application code first parses, then processes, and finally stores in the database. The NFR (non-functional requirement) for this feature was that we should be able to process and store 100,000 records in less than 5 minutes.

Before my changes we were processing 10,000 records in 47 minutes. This certainly looked bad.

After making the changes that I will discuss in this post, we were able to process 10,000 records in 20 seconds. Compared with the earlier 47 minutes (2,820 seconds), that is roughly 141 times faster. We were able to process 100,000 records in less than 4 minutes, which is well below our NFR.


4 Reasons You Might Want To Build Stateful Services/Apps

These days we are all told to build stateless applications. Stateless apps are those that don’t store any state in the application process and instead fetch it from a centralised datastore (it could be a global cache or a database). The sixth factor of the twelve-factor app methodology talks about the same principle.

Execute the app as one or more stateless processes

Twelve-factor processes are stateless and share-nothing. Any data that needs to persist must be stored in a stateful backing service, typically a database.

There are advantages to building stateless applications, the primary one being the ability to scale horizontally with ease. When we build stateless applications we push the scalability problem to the database, and we expect the database to scale horizontally. This is usually addressed by putting a global cache (Redis or Memcached) in between. Scaling a cache is a relatively easy and well-understood problem. Keeping the cache in sync with updates is a hard problem. We will discuss it some other time.
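To make the "cache in between" idea concrete, here is a minimal cache-aside sketch. It is only an illustration under stated assumptions: a local ConcurrentHashMap stands in for Redis or Memcached, a plain function stands in for the database, and the class and method names (CacheAsideSketch, get, evict) are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class CacheAsideSketch {

    // Assumption: a local map stands in for a shared cache such as Redis or Memcached.
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    // Assumption: a plain function stands in for the database lookup.
    private final Function<String, String> database;
    // Counts how many reads actually reached the "database".
    final AtomicInteger databaseReads = new AtomicInteger();

    CacheAsideSketch(Function<String, String> database) {
        this.database = database;
    }

    // Read path: try the cache first, fall back to the database and remember the result.
    String get(String key) {
        return cache.computeIfAbsent(key, k -> {
            databaseReads.incrementAndGet();
            return database.apply(k);
        });
    }

    // Write path: once the database row changes, evict the cached copy so the next read reloads it.
    void evict(String key) {
        cache.remove(key);
    }

    public static void main(String[] args) {
        CacheAsideSketch app = new CacheAsideSketch(key -> "value-for-" + key);
        app.get("user:1");
        app.get("user:1"); // served from the cache, no second database read
        System.out.println("database reads: " + app.databaseReads.get());
    }
}
```

The evict method hints at why keeping the cache updated is the hard part: every code path that writes to the database has to remember to invalidate or refresh the cached copy, otherwise readers keep seeing stale data.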


Using ArchUnit To Enforce Architecture Best Practices

I have worked with multiple software development teams that, because of feature delivery pressure, do not apply best practices. This later leads to tech debt and costs more time. The last project team that I helped made two small mistakes:

They didn’t use pagination in the collection resources. They were fetching the full list of data from the database and returning it to the user as JSON. During development, when the data was small, they didn’t face any issues. But later, when the customer started testing, adding pagination to all the collection resources became a monumental task.
They were returning domain entities in the response. This meant they were transferring more data over the wire than was necessary.

So, they were breaking two best practices:

Using pagination for collection resources
Keeping domain objects separate from representation objects
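A rough sketch of what following both practices can look like in plain Java is shown below. It is an illustration, not the team's code: Customer, CustomerResponse, Page and findCustomers are hypothetical names, and an in-memory list stands in for a database query with OFFSET/LIMIT.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class PagedResourceSketch {

    // Hypothetical domain entity: carries fields that should never leave the service.
    record Customer(long id, String name, String email, String passwordHash) {}

    // Hypothetical representation object: only what the client needs to see.
    record CustomerResponse(long id, String name) {
        static CustomerResponse from(Customer customer) {
            return new CustomerResponse(customer.id(), customer.name());
        }
    }

    // One page of results plus the metadata a client needs to ask for the next page.
    record Page<T>(List<T> items, int page, int size, long totalItems) {
        boolean hasNext() {
            return (long) (page + 1) * size < totalItems;
        }
    }

    // Stand-in for a paginated database query; here it slices an in-memory list.
    static Page<CustomerResponse> findCustomers(List<Customer> table, int page, int size) {
        if (page < 0 || size <= 0) {
            throw new IllegalArgumentException("page must be >= 0 and size must be > 0");
        }
        List<CustomerResponse> items = table.stream()
                .skip((long) page * size)
                .limit(size)
                .map(CustomerResponse::from) // entity -> representation at the boundary
                .collect(Collectors.toList());
        return new Page<>(items, page, size, table.size());
    }

    public static void main(String[] args) {
        List<Customer> table = IntStream.rangeClosed(1, 25)
                .mapToObj(i -> new Customer(i, "customer-" + i, "c" + i + "@example.com", "hash-" + i))
                .collect(Collectors.toList());
        Page<CustomerResponse> last = findCustomers(table, 2, 10);
        System.out.println(last.items().size() + " items, hasNext=" + last.hasNext());
    }
}
```

Building the page and the mapping into the resource from day one keeps the response size bounded and means the JSON contract does not change when the entity gains new columns.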
