TensorFlow is an open source computation framework for building machine learning models. Its design makes use of lessons learnt from earlier machine learning frameworks: Torch, Theano, Caffe, and Keras. Torch is the earliest machine learning framework that made use of the term Tensor. Theano makes use of a graph data structure to store operations and compile them to high-performance code. Caffe is a high-performance framework written in C++ that makes it feasible to execute applications on different devices. Keras provides an easy-to-use API to interface with various machine learning frameworks like Theano.
If you launch Eclipse MAT and get the following error:
java.lang.IllegalStateException: The platform metadata area could not be written: /private/var/folders/9q/zhpkyd3s4y9d5t1nv_5hszww0000gp/T/AppTranslocation/DF264CA5-4EEF-4916-A3FA-881B111294E5/d/mat.app/Contents/MacOS/work/.metadata. By default the platform writes its content under the current working directory when the platform is launched. Use the -data parameter to specify a different content area for the platform.
Then you should open MemoryAnalyzer.ini in your favorite editor and add the -data argument as shown below.
The location of MemoryAnalyzer.ini depends on your platform. If you are using Mac, then it is inside the app folder like
-startup
../Eclipse/plugins/org.eclipse.equinox.launcher_1.5.0.v20180512-1130.jar
-data
/Users/shekhargulati/dev/tmp/mat
--launcher.library
../Eclipse/plugins/org.eclipse.equinox.launcher.cocoa.macosx.x86_64_1.1.700.v20180518-1200
-vmargs
-Xmx4096m
-Dorg.eclipse.swt.internal.carbon.smallFonts
-XstartOnFirstThread
Recently, I read an article on Markov chains. In the post, the author showed how we can build autocomplete functionality using them. The article piqued my interest in learning more about Markov chains, and I started looking for an example application that I could build using them. I decided to build a web application that suggests what Indian prime minister Narendra Modi will say after a word, a pair of words, or a triplet of words.
I am not a supporter of Narendra Modi’s style of leadership. I chose him because I could easily find the text of all his speeches on the web.
This post is divided into three sections:
- What is a Markov chain?
- Create the dataset for the application
- Build the application that uses a Markov chain
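To make the idea concrete before the sections above, here is a minimal Python sketch of next-word suggestion with a Markov chain. It is an illustration under assumptions, not the application's actual code: the function names `build_chain` and `suggest_next` and the toy corpus are hypothetical, and a real dataset would be the collected speeches.

```python
import random
from collections import defaultdict


def build_chain(text, order=1):
    """Map each tuple of `order` consecutive words to the words seen after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return chain


def suggest_next(chain, *state, rng=random):
    """Pick a follower of `state`; repeated followers are proportionally more likely."""
    followers = chain.get(tuple(state))
    return rng.choice(followers) if followers else None


# Hypothetical toy corpus standing in for the real speech dataset.
corpus = "my dear friends my dear countrymen my government will work"

word_chain = build_chain(corpus, order=1)   # suggestions after a single word
pair_chain = build_chain(corpus, order=2)   # suggestions after a pair of words

print(sorted(set(word_chain[("my",)])))        # distinct words seen after "my"
print(suggest_next(pair_chain, "my", "dear"))  # one of the words seen after "my dear"
```

Keeping every observed follower in a list, duplicates included, means a uniform random pick already reflects how often each follower occurred, so no explicit probability table is needed for a sketch like this.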
For the last couple of days I was playing with Istio, and I couldn’t find a working, up-to-date tutorial that could teach me how to run a basic hello world application with Istio on Kubernetes.
Istio is an open source service mesh that provides a uniform way to integrate microservices, manage traffic flow across microservices, enforce policies, and aggregate telemetry data.
In this quick tutorial you will learn how to install Istio on Minikube and then deploy a helloworld sample application on it.
The time to read this newsletter is 180 minutes.
Wealth is the ability to fully experience life. — Henry David Thoreau
- Don’t get clever with login forms: 10 mins read. This post raises a valid concern about overly clever login forms. Through a set of examples, the author explains why clever login forms end up confusing users. Another example of a clever login experience that the author does not cover is https://login.microsoftonline.com . I agree with the author’s recommendations for login pages:
  - Have a dedicated page for login
  - Expose all required fields
  - Keep all fields on one page
  - Don’t get fancy.
- Why Google Needed a Graph Serving System: 30 mins read. In this post, the author shares his story of building a distributed graph database that can answer queries involving relationships. The post goes over the various graph-based systems developed at Google and why Google failed to build a distributed graph database that does not suffer from the depth join problem. It highlights an interesting point about Google’s struggle to build innovative solutions because of its internal politics. Building a distributed graph database that does not suffer from the depth join problem is a herculean task. Dgraph, an open source database developed by the author along with others in the community, is trying to build such a system.
- You probably don’t need a single-page application: 10 mins read. I agree entirely with the author that the best way to build a web application lies somewhere in the middle, i.e. building hybrid apps. Build an SPA only for the parts where you need rich interaction and keep most other pages server rendered.
- Google wants Cloud Services Platform to Borg your datacenter: 20 mins read. This post gives insight into why Google made the move to build and open source Kubernetes. Google knew they were going to have a hard time beating AWS and Azure. So they built and released Kubernetes and hoped it would become a successful project with a big community. This means the cloud just became an implementation detail, and most big enterprises started considering Kubernetes as the software of choice for building a modern hybrid datacenter. Google’s Cloud Services Platform (CSP) will give enterprises a hardened Kubernetes, Istio, and Knative software distribution. CSP is going to be a game changer for Google Cloud. Also, many OpenShift users might consider going for CSP. Interesting times ahead!
- Four Techniques Serverless Platforms Use to Balance Performance and Cost: 30 mins read. This is the best article I have read on Serverless. It starts by helping the reader understand the architecture of a Serverless platform and then addresses the elephant in the room: the cold start problem associated with Serverless platforms. The article covers four techniques that different Serverless platforms employ to overcome the cold start issue:
  - Function resource sharing
  - Function resource pooling
  - Function prefetching
  - Function prewarming.
- Lessons from 6 software rewrite stories: 20 mins read. Another amazing read for this week. Through real examples, this post explains when it is fine to rewrite software. If you have been building software for long, you will have come across Joel Spolsky’s advice that rewriting software is the single worst strategic mistake that a software company can make. The author tells the other side of the story. The key takeaway from the post is:
  - Once you’ve learned enough that there’s a certain distance between the current version of your product and the best version of that product you can imagine, then the right approach is not to replace your software with a new version, but to build something new next to it — without throwing away what you have.
- How to build a distributed throttling system with Nginx + Lua + Redis: 15 mins read. This post covers how to build an API rate limiting system with Nginx, Lua, and Redis. The instructions in the post are clear and to the point.
- Monte Carlo Simulation with Python: 20 mins read. The post explains Monte Carlo simulation using a simple but realistic example. As per Wikipedia:
  - Monte Carlo methods are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results. Their essential idea is using randomness to solve problems that might be deterministic in principle. They are often used in physical and mathematical problems and are most useful when it is difficult or impossible to use other approaches. Monte Carlo methods are mainly used in three problem classes: optimization, numerical integration, and generating draws from a probability distribution.
- 5 Ways To Process Feedback At Work Without Triggering A Stress Response: 10 mins read. This post covers an important aspect of professional life: taking feedback. The author suggests the following:
  - Keep an open mind about receiving feedback. Focus on how your work can be improved with some extra perspective.
  - Don’t respond right away, take a few seconds to really process the feedback. You can assess rationally and logically, without undue emotion.
  - Make sure you understand the feedback. In cases where you don’t, ask questions! The feedback giver should be happy to discuss specific points deeper to help clarify their suggestions.
  - Be humble and gracious! Let them know you appreciate that they gave their time and energy to help make you more successful.
  - Don’t let constructive criticism go in one ear and out the other. Take what you hear, implement it, and follow-up.
- How to Organize your Monolith Before Breaking it into Services: 15 mins read. This post talks about an intermediary stage between a monolith and microservices: a monolith organized by domain, without the entanglement or fragility of our original codebase. I agree entirely with the author that we should start with a monolith and modularise applications by subdomain, applying DDD principles. If required in future, we can easily turn these subdomain modules into services. It is great to read posts like this, as they provide valuable information that is usually missing from most posts found on the web.
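The "repeated random sampling" idea in the Wikipedia definition of Monte Carlo methods quoted above can be sketched in a few lines of Python. This is not the example from the linked post; it swaps in the classic estimate of π as an instance of numerical integration, and the function name `estimate_pi` is made up for illustration.

```python
import random


def estimate_pi(samples, seed=None):
    """Estimate pi from the fraction of random points that land in a quarter circle."""
    rng = random.Random(seed)  # seeded generator so a run can be reproduced
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()  # uniform point in the unit square
        if x * x + y * y <= 1.0:           # inside the quarter circle of radius 1
            inside += 1
    # The quarter circle covers pi/4 of the unit square, so scale the ratio by 4.
    return 4.0 * inside / samples


print(estimate_pi(100_000, seed=42))
```

The answer is deterministic in principle, yet random sampling gets close to it, and the estimate tightens as the number of samples grows.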
The time to read this newsletter is 150 minutes.
The illiterate of the 21st century will not be those who cannot read and write, but those who cannot learn, unlearn and relearn. — Alvin Toffler
- The Hard Truth About Innovative Cultures: 20 mins read. This post answers a question that I was struggling to find the right answer to. If there is one post that you should read this week, it should be this one. From the post:
> A tolerance for failure requires an intolerance for incompetence. A willingness to experiment requires rigorous discipline. Psychological safety requires comfort with brutal candor. Collaboration must be balanced with individual accountability. And flatness requires strong leadership. Innovative cultures are paradoxical. Unless the tensions created by this paradox are carefully managed, attempts to create an innovative culture will fail.
- When AWS Autoscale Doesn’t: 15 mins read. This post by the folks at Segment shares valuable lessons on AWS autoscaling. The key points for me in the post are:
  - AWS autoscaling for ECS follows the formula new_task_count = current_task_count * ( actual_metric_value / target_metric_value ). The ratio actual_metric_value / target_metric_value limits the magnitude of a scale-out event. To overcome this, you either have to reduce the target value, which leads to over-scaling all the time, or use a custom CloudWatch metric.
  - The default cooldown for a scale-out event is 3 minutes and the cooldown for a scale-in event is 5 minutes.
- Multiply your time by asking 4 questions about the stuff on your to-do list: 10 mins read. This post won’t tell you how to magically make each day 38 hours long (we’re still working on that). But by assessing our tasks in terms of their significance, we can free up more time tomorrow.
- Dotfile madness: 10 mins read. I just counted, and my home directory has more than 30 hidden directories. The post makes a valid argument against the proliferation of dot files and dot directories. The author writes:
> Avoid creating files or directories of any kind in your user’s $HOME directory in order to store your configuration or data. This practice is bizarre at best and it is time to end it. I am sorry to say that many (if not most) programs are guilty of doing this while there are significantly better places that can be used for storing per-user program data.
- Life of a SQL query: 15 mins read. What happens when you run a SQL statement? We follow a Postgres query transformation by transformation as a query is processed and results are returned.
- Splitting Up a Codebase into Microservices and Artifacts: 10 mins read. This is the first post that you should read if you are thinking about Microservices. I like the way this post first talks about using module boundaries to split the code base. Only if module boundaries are not enough should you think about Microservices. In my opinion, you should choose Microservices 1) to scale your engineering organization, or 2) when your business problem creates a real need for a polyglot environment.
- Golang Datastructures: Trees: 20 mins read. This is an awesome read even if you don’t know Golang. This beautifully written post explains how to implement a simple DOM tree in Golang. It shows implementations of the breadth-first search and depth-first search algorithms used to build the find functionality. I thoroughly enjoyed this post.
- Deploying Python ML Models with Flask, Docker and Kubernetes: 30 mins read. This is an extensive tutorial that shows how to deploy Python Flask applications on Kubernetes. It covers how to deploy Machine Learning (ML) models into production environments by exposing them as RESTful API Microservices hosted within Docker containers, which are in turn deployed to a cloud environment.
- A Minimalistic Guide to Kata Containers: 5 mins read. This is a short post that I wrote about Kata Containers. Kata Containers provide the best of containers and virtual machines. Read the post to learn more.
- Building a Better Profanity Detection Library with scikit-learn: 15 mins read. This post covers how you can write your own profanity filter using machine learning. The author starts by giving reasons why he didn’t use existing profanity libraries and then goes over the steps required to create your own profanity detection library.
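The ECS autoscaling formula from the "When AWS Autoscale Doesn’t" item above is easier to reason about with numbers. The following Python sketch only works through the arithmetic of that formula. The function name `next_task_count`, the rounding up to whole tasks, and the sample values are assumptions for illustration, not AWS's documented behaviour.

```python
import math


def next_task_count(current_tasks, actual_metric, target_metric):
    """Apply new = current * (actual / target), rounded up to whole tasks (assumed)."""
    return math.ceil(current_tasks * (actual_metric / target_metric))


# A percentage metric such as CPU utilisation cannot exceed 100, so with a
# target of 80 the ratio is at most 100 / 80 = 1.25 per scale-out event.
print(next_task_count(40, 100.0, 80.0))  # 40 * 1.25 = 50 tasks

# Lowering the target to 50 allows a ratio of 2.0, at the cost of running
# more tasks than needed whenever load is normal.
print(next_task_count(40, 100.0, 50.0))  # 40 * 2.0 = 80 tasks
```

With a bounded metric, a sudden spike can only grow the service by a fixed fraction per event, which is why the post's two workarounds are a lower target or a custom metric that is not capped.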
Recently I discovered an interesting project called Kata Containers. It is an open source project hosted by the OpenStack Foundation. Kata Containers is the merger of Hyper.sh runV and Intel’s Clear Containers.
Kata Containers provide the isolation guarantees of a virtual machine and the speed and ease of use of containers. As shown in the image below, virtual machines in the top left provide the strictest form of isolation, but they are slow to boot and their size on disk ranges from 500MB to several GBs. On the other hand, containers in the bottom right are fast and nimble, but they don’t provide the strictest form of isolation. Kata Containers are the best of both worlds. They provide the speed of containers and the security and isolation guarantees of a virtual machine.
Containers face the shared kernel problem: if you have multiple containers on a single host and one of them gets exploited, the attacker can potentially gain access to all the other containers on that host.
Kata Containers are highly optimised virtual machines that run the end user application in a container. So in essence, there is a one-to-one mapping between container and virtual machine, as shown below. These virtual machines are lightweight and optimised, so you don’t pay the huge cost of running traditional virtual machines.
The main difference between containers and Kata Containers is that containers rely on software virtualisation provided by the kernel, whereas Kata Containers rely on hardware virtualisation. Containers for different workloads share the same OS kernel, which leads to security and privacy concerns. Kata Containers address this need to securely run disparate workloads. They are fast to boot because the virtual machines use a trimmed-down version of the OS that is only responsible for booting the VM and handing over control to the container.
Kata Containers are an OCI-compatible runtime, which means you can use them with container orchestration platforms like Kubernetes. The image below shows how Kata Containers work with Kubernetes.