Real-Time Data Sources for Data Engineering Projects

Ty Shaikh
K2 Data Science & Engineering
Jan 10, 2019


The best way to learn data engineering is to build data processing pipelines that handle big data. Free big data, however, is not easy to find.

Below is a list of streaming data sets that are open for public consumption. I’ll keep updating the list as I find more. Please share other APIs in the comments!

Social Media

Social media is one of the best sources of user-generated content and a natural fit for text processing.

Twitter — You can use their official API or a third-party library to access a part of their stream. For Python, go with Tweepy.

Reddit — Normally you would have to poll the REST API at a regular interval to pull down posts and comments for certain subreddits. However, the team at Pusher has built a Realtime API for Reddit; see their write-up for details.

Meetup — They provide extensive developer endpoints, plus 3 WebSockets. You can stream event comments, photos and RSVPs.
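
The polling approach mentioned for Reddit above can be sketched roughly as follows. This is a minimal illustration and not Reddit's or Pusher's API: `fetch_latest` is a hypothetical callable standing in for whatever REST call returns the newest items, and each item is assumed to be a dict with a unique `id` field. The canned batches are made up so the sketch runs without network access.

```python
import time
from typing import Callable, Dict, Iterable, Iterator


def poll_new_items(
    fetch_latest: Callable[[], Iterable[Dict]],
    interval_seconds: float = 5.0,
    max_polls: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Dict]:
    """Yield each item once, calling `fetch_latest` at a fixed interval."""
    seen_ids = set()  # ids already emitted, so overlapping responses are not repeated
    for poll_number in range(max_polls):
        for item in fetch_latest():
            if item["id"] not in seen_ids:
                seen_ids.add(item["id"])
                yield item
        if poll_number < max_polls - 1:
            sleep(interval_seconds)  # wait before the next request


def make_fake_fetcher(batches):
    """Hypothetical stand-in for a REST call: returns one canned batch per call."""
    remaining = iter(batches)
    return lambda: next(remaining, [])


# Made-up responses; consecutive polls overlap, as a "newest posts" endpoint would.
fake_batches = [
    [{"id": "a1", "title": "first"}],
    [{"id": "a1", "title": "first"}, {"id": "b2", "title": "second"}],
    [{"id": "b2", "title": "second"}, {"id": "c3", "title": "third"}],
]
new_items = list(
    poll_new_items(make_fake_fetcher(fake_batches), sleep=lambda _: None)
)
print([item["id"] for item in new_items])  # ['a1', 'b2', 'c3']
```

In a long-running pipeline the `seen_ids` set would grow without bound, so you would likely cap it or track only the newest id instead. A push-based source such as a WebSocket avoids this bookkeeping because each item is delivered once.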

Finance

Money never sleeps. There are people buying and selling financial instruments every second of the day.

Stocks — IEX has an amazing developer platform. Not only can you get stock information, but you can get information on the order book, cryptocurrencies, sector performance, and much more.

Blockchain — The official Blockchain organization provides an API to get notifications on blocks and transactions for Bitcoin.

Crypto — CryptoCompare offers a free API with a streaming component that allows you to get all currency pairs and pricing info on demand.
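
Once a price feed is connected, a common first pipeline step is to fold the incoming messages into a "latest price per pair" view. The sketch below assumes each message arrives as JSON text with `pair` and `price` fields. Those field names and the sample values are invented for illustration; each provider defines its own message format, so check its documentation before adapting this.

```python
import json
from typing import Dict, Iterable


def latest_prices(raw_messages: Iterable[str]) -> Dict[str, float]:
    """Fold a stream of JSON price messages into the latest price per pair."""
    latest: Dict[str, float] = {}
    for raw in raw_messages:
        try:
            message = json.loads(raw)
            pair = message["pair"]            # assumed field name
            price = float(message["price"])   # assumed field name
        except (json.JSONDecodeError, KeyError, TypeError, ValueError):
            continue  # skip malformed or unexpected messages
        latest[pair] = price  # later messages overwrite earlier ones
    return latest


# Made-up sample feed, including one malformed message.
sample_feed = [
    '{"pair": "BTC-USD", "price": "3600.5"}',
    '{"pair": "ETH-USD", "price": 120.25}',
    "not json",
    '{"pair": "BTC-USD", "price": 3610.0}',
]
prices = latest_prices(sample_feed)
print(prices)  # {'BTC-USD': 3610.0, 'ETH-USD': 120.25}
```

Because the function accepts any iterable of strings, the same logic can be fed from a list in a test or from a generator that yields messages off a live connection.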

