Day 16: Goose Extractor – An Article Extractor That Just Works

Today, for my 30-day challenge, I decided to learn how to do article extraction using the Python programming language. I have been interested in article extraction for a few months, ever since I wanted to write a Prismatic clone. Prismatic creates a news feed based on a user's interests. Extracting an article's main content, images, and other meta information is a very common requirement for content discovery websites like Prismatic. In this blog post, we will learn how to use a Python package called goose-extractor to accomplish this task. We will first cover some basics, and then we will develop a simple Flask application that uses the Goose Extractor API. Read the full article here
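To give a feel for what extracting the main content, image, and meta information looks like, here is a minimal sketch. It assumes goose-extractor is installed and that its `Goose().extract(url=...)` call returns an article object exposing `title`, `meta_description`, `cleaned_text`, and `top_image.src`. The helper names `extract_article` and `summarize` are my own, made up for illustration, and are not part of the package.

```python
from types import SimpleNamespace


def summarize(article):
    """Collect the fields a content discovery site usually wants into a dict."""
    top_image = getattr(article, "top_image", None)
    return {
        "title": article.title,
        "description": article.meta_description,
        "text": article.cleaned_text,
        # Not every page has a usable image, so fall back to None.
        "image": top_image.src if top_image else None,
    }


def extract_article(url):
    """Download `url` with Goose and return the summarized fields."""
    # Imported here so the rest of the module loads without the package.
    # Assumption: goose-extractor is installed (it targets Python 2; on
    # Python 3 the goose3 fork provides a Goose class in the `goose3` module).
    from goose import Goose

    return summarize(Goose().extract(url=url))


# A stand-in object with the same attribute names, so the shape of the
# result can be seen without touching the network.
fake_article = SimpleNamespace(
    title="Example title",
    meta_description="Example description",
    cleaned_text="Body text of the article.",
    top_image=SimpleNamespace(src="http://example.com/lead.jpg"),
)
summary = summarize(fake_article)
```

With a real page you would call `extract_article("http://...")` instead of building the stand-in object by hand. A Flask view could return the same dictionary as JSON.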


