This week I had to work on a performance issue. Performance issues are always fun to work with. They give an opportunity to get into the depth of the technology we are using. We learn how much we don’t know about the technology we are using everyday. These days we too quickly think about changing the database or underlying library when faced with the performance bottleneck. When what we really need to do is learn about the technology we are using in depth.
The performance issue I am talking about was related to bulk insertion of data in to the database. We were using Spring Data JPA with SQL Server. In our use case, end user was uploading an excel file that the application code first parse, then process, and finally store in the database. The NFR for this requirement was that we should be able to process and store 100,000 records in less than 5 minutes.
Before my changes we were processing 10,000 records in 47 minutes. This certainly looked bad.
After making the changes that I will discuss in this post we were able to process 10,000 records in 20 seconds. We were able to process 100,000 in less that 4 minutes which is well below our NFR.