Streaming Twitter Data with Tweepy

With hundreds of millions of tweets published every day, Twitter is one of the best free streaming data sources. There is an enormous wealth of data to gather and insights to discover.
Today, we will use a powerful Python library called tweepy to access tweets in real time.
Getting Started
The first thing you need to do is create a Twitter account if you don’t already have one.
Then, head over to their developer section. You will have to create a Twitter developer account and then you can create apps.

Once you have created your app, you can access the credentials:

Never share these anywhere!
Make sure to install the tweepy and dataset packages, for example with pip install tweepy dataset.
Tweepy Class
In your project folder, create two Python files. The first is called config.py:
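A minimal sketch of what config.py might contain, assuming you would rather read the four credentials from environment variables than paste them into the file. The variable names (CONSUMER_KEY and so on) and the TWITTER_* environment variable names are illustrative choices, not names required by tweepy.

```python
# config.py -- hypothetical layout; all names below are illustrative.
import os

# Reading from the environment keeps the secrets out of version control.
# Each value falls back to an empty string when the variable is not set.
CONSUMER_KEY = os.environ.get("TWITTER_CONSUMER_KEY", "")
CONSUMER_SECRET = os.environ.get("TWITTER_CONSUMER_SECRET", "")
ACCESS_TOKEN = os.environ.get("TWITTER_ACCESS_TOKEN", "")
ACCESS_TOKEN_SECRET = os.environ.get("TWITTER_ACCESS_TOKEN_SECRET", "")
```

With this layout, the streaming script can import the values from config, and the file itself is safe to commit because it holds no secrets.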
The other is called stream.py:
You can follow the comments in the code to see what is going on.
The basic gist is that we instantiate our custom TweetStreamListener, then create a tweepy Stream and pass it the authentication credentials and the listener.
We extract the relevant data from any tweet matching the words “javascript”, “ruby”, or “python”.
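To make the extraction step concrete, here is a rough sketch that works on a tweet represented as a plain dictionary. It assumes the classic v1.1 tweet JSON layout (id_str, created_at, text, and a nested user object). The function name extract_fields and the choice of fields are illustrative, not taken from the original stream.py.

```python
def extract_fields(tweet):
    """Pull a flat row of commonly used fields out of a tweet dictionary.

    Missing keys become None (or "" for the text) instead of raising.
    """
    user = tweet.get("user") or {}
    return {
        "id_str": tweet.get("id_str"),
        "created_at": tweet.get("created_at"),
        "text": tweet.get("text", ""),
        "user_name": user.get("screen_name"),
        "followers": user.get("followers_count"),
    }


# A hand-written sample shaped like a v1.1 tweet payload (made-up values).
sample = {
    "id_str": "1",
    "created_at": "Mon Jan 01 00:00:00 +0000 2018",
    "text": "Learning python today",
    "user": {"screen_name": "pythonista", "followers_count": 42},
}

row = extract_fields(sample)
print(row["user_name"], "->", row["text"])
```

Flattening the nested payload early keeps the storage code simple, since each tweet becomes one row with a fixed set of columns.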
The tweets are stored in a local SQLite database. This can be used for analytics down the line. Here’s an example text mining project.
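The storage step might look like the following sketch. To stay dependency-free it uses Python's built-in sqlite3 module in place of the dataset package, and an in-memory database in place of a file on disk. The table name, column names, and the open_db and save_tweet helpers are all hypothetical.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open a SQLite database and make sure the tweets table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tweets (
               id_str TEXT PRIMARY KEY,
               created_at TEXT,
               user_name TEXT,
               text TEXT
           )"""
    )
    return conn


def save_tweet(conn, row):
    """Insert one tweet row; a tweet whose id_str is already stored is skipped."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO tweets (id_str, created_at, user_name, text) "
            "VALUES (:id_str, :created_at, :user_name, :text)",
            row,
        )


conn = open_db()
example_row = {
    "id_str": "1",
    "created_at": "Mon Jan 01 00:00:00 +0000 2018",
    "user_name": "pythonista",
    "text": "Learning python today",
}
save_tweet(conn, example_row)
save_tweet(conn, example_row)  # same id_str, so this second insert is ignored
stored = conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]
print(stored)
```

Using the tweet ID as the primary key means a reconnecting stream that redelivers a tweet will not create duplicate rows.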
Gotchas
- The stream includes retweets, so you’ll need custom logic to filter them out.
- Terms match individual words, not exact phrases: if you put in “ruby programming”, it will match any tweet containing both “ruby” and “programming”, in any order. If you are looking specifically for the two words together, you’ll have to do string parsing or regex matching.
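Both gotchas can be handled with a little post-filtering. The sketch below assumes tweets arrive as dictionaries in the v1.1 JSON layout, where a retweet carries a retweeted_status key. The helper names is_retweet and contains_phrase are illustrative.

```python
import re


def is_retweet(tweet):
    """Return True for native retweets and old-style "RT @user" retweets."""
    return "retweeted_status" in tweet or tweet.get("text", "").startswith("RT @")


def contains_phrase(text, phrase):
    """Return True if the words of `phrase` appear together and in order.

    Matching is case-insensitive, allows any whitespace between the words,
    and does not match inside a longer word.
    """
    words = [re.escape(word) for word in phrase.split()]
    if not words:
        return False
    pattern = r"(?<!\w)" + r"\s+".join(words) + r"(?!\w)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


# Made-up sample tweets to exercise both filters.
tweets = [
    {"text": "I love Ruby programming!"},
    {"text": "programming in ruby is fun"},
    {"text": "RT @bob: I love Ruby programming!"},
]

kept = [
    t["text"]
    for t in tweets
    if not is_retweet(t) and contains_phrase(t["text"], "ruby programming")
]
print(kept)
```

Only the first sample survives: the second contains both words but not as a phrase, and the third is a retweet.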