Real-Time Data Sources for Data Engineering Projects
The best way to learn data engineering is to build data processing pipelines that handle big data. The hard part is that free big data is not easy to find.
Below is a list of streaming data sets that are open for public consumption. I’ll keep updating as I find more. Please share other APIs in the comments!
Social Media
Social media is one of the best sources of user-generated content and of text to process.
Twitter — You can use their official API or a third-party library to access a part of their stream. For Python, go with Tweepy.
Reddit — Normally you would have to call the REST API at a regular interval to pull down posts and comments for certain subreddits. However, the team at Pusher has built a Realtime API; read more about it here.
Meetup — They provide extensive developer endpoints, plus three WebSockets. You can stream event comments, photos, and RSVPs.
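If a source only offers a REST API, the "call it at a regular interval" approach mentioned for Reddit can be sketched roughly as below. This is a minimal illustration, not any provider's official client: `poll_new_items`, `make_fake_fetch`, and the `id` field are hypothetical names, and the fake fetch function stands in for a real HTTP request so the example runs without network access or API keys.

```python
import time
from typing import Callable, Dict, Iterable, Iterator, List


def poll_new_items(
    fetch: Callable[[], Iterable[Dict]],
    interval_seconds: float,
    max_polls: int,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Dict]:
    """Call fetch() up to max_polls times and yield only unseen items.

    Assumes every item is a dict with a unique "id" key (an assumption
    for this sketch; real APIs name their identifiers differently).
    """
    seen_ids = set()
    for poll_number in range(max_polls):
        for item in fetch():
            if item["id"] not in seen_ids:
                seen_ids.add(item["id"])
                yield item
        # Wait before the next request, except after the final poll.
        if poll_number < max_polls - 1:
            sleep(interval_seconds)


def make_fake_fetch(pages: List[List[Dict]]) -> Callable[[], List[Dict]]:
    """Hypothetical stand-in for a REST call: returns one page per call."""
    remaining = list(pages)

    def fetch() -> List[Dict]:
        return remaining.pop(0) if remaining else []

    return fetch
```

To try it, pass a function that wraps your real request in place of the fake one. Consecutive pages usually overlap, which is why the loop remembers the ids it has already yielded. For a long-running job you would want to bound the size of `seen_ids`, for example by keeping only recent ids.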
Finance
Money never sleeps. There are people buying and selling financial instruments every second of the day.
Stocks — IEX has an amazing developer platform. Beyond basic stock information, you can also get data on the order book, cryptocurrencies, sector performance, and much more.
Blockchain — The official Blockchain organization provides an API to get notifications on blocks and transactions for Bitcoin.
Crypto — CryptoCompare offers a free API with a streaming component that allows you to get all currency pairs and pricing info on demand.
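Most streaming feeds like the ones above deliver a sequence of small JSON messages, so a first pipeline stage often just parses each message and keeps the latest value per key. Here is a rough sketch of that idea. The message shape (`pair` and `price` fields) and the function name `latest_prices` are assumptions made for illustration; none of the providers listed here necessarily use this format, so check their documentation for the real schema.

```python
import json
from typing import Dict, Iterable


def latest_prices(raw_messages: Iterable[str]) -> Dict[str, float]:
    """Reduce a stream of JSON text messages to the latest price per pair.

    Assumes (hypothetically) that price updates look like
    {"pair": "BTC-USD", "price": 101.5}. Anything else is skipped.
    """
    latest: Dict[str, float] = {}
    for raw in raw_messages:
        try:
            message = json.loads(raw)
        except json.JSONDecodeError:
            continue  # Skip malformed frames instead of stopping the stream.
        if not isinstance(message, dict):
            continue
        pair = message.get("pair")
        price = message.get("price")
        if pair is None or price is None:
            continue  # For example a heartbeat or subscription message.
        try:
            latest[pair] = float(price)
        except (TypeError, ValueError):
            continue  # Price was present but not numeric.
    return latest
```

In a real pipeline, `raw_messages` would be whatever your WebSocket client yields. Keeping the parsing logic separate from the connection code makes it easy to test against recorded messages.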