K2 Data Science & Engineering

Latest news about the curriculum and alumni

Popular Questions about the Data Engineering Career Path

Ty Shaikh
4 min read · Jan 8, 2019

A look at the most common questions.

What is data engineering?

Data engineers primarily focus on the following areas.

Build and maintain the organization’s data pipeline systems

Data pipelines encompass the journey and processes that data undergoes within a company. Data engineers are responsible for creating those pipelines.

Creating a data pipeline may sound easy or trivial, but at big data scale it means bringing together 10–30 different big data technologies. More importantly, a data engineer understands those technologies and frameworks in depth, chooses the right tools for the job, and knows how to combine them into pipelines that support the company’s business processes.
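To make the idea of a pipeline concrete, here is a minimal sketch in plain Python of the three stages most pipelines share: ingest, process, and store. Everything in it is an assumption made for illustration: the sample data, the column names, and the function names are hypothetical, and a production pipeline would replace each stage with dedicated tooling such as the ingestion, processing, and storage systems discussed later in this article.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw input standing in for an upstream source system.
RAW_EVENTS = """user_id,event,amount
1,purchase,19.99
2,view,
1,purchase,5.01
3,purchase,12.50
"""


def extract(text):
    """Ingest: yield one dict per CSV row."""
    yield from csv.DictReader(io.StringIO(text))


def transform(rows):
    """Process: keep purchases and convert the amount to integer cents."""
    for row in rows:
        if row["event"] == "purchase":
            yield int(row["user_id"]), round(float(row["amount"]) * 100)


def load(records):
    """Store: aggregate total cents per user in an in-memory dict."""
    totals = defaultdict(int)
    for user_id, cents in records:
        totals[user_id] += cents
    return dict(totals)


def run_pipeline(text):
    """Chain the stages; generators stream one row at a time."""
    return load(transform(extract(text)))


print(run_pipeline(RAW_EVENTS))
```

Because each stage is a generator feeding the next, rows flow through one at a time instead of being loaded into memory all at once, which is the same streaming idea that larger frameworks apply across many machines.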

Clean and wrangle data into a usable state

Data engineers make sure the data the organization uses is clean, reliable, and prepped for whatever use cases may present themselves. They wrangle data into a state that data scientists can then query.

Data wrangling is about taking a messy or unrefined source of data and turning it into something useful. You begin by seeking out raw data sources and determining their value: How good are they as data sets? How relevant are they to your goal? Is there a better source? Once you’ve parsed and cleaned the data so that the data sets are usable, you can utilize tools and methods (like Python scripts) to help you analyze them and present your findings in a report. This allows you to take data no one would bother looking at and make it both clear and actionable.

Source: O’Reilly — Data Engineering: A Quick and Simple Definition
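The wrangling step described above can be sketched with a short Python script. This is only an illustration under stated assumptions: the field names (name, email, age), the sample records, and the cleaning rules are hypothetical, and real data sets need rules chosen for their own quirks.

```python
def clean_records(raw_rows):
    """Normalize messy rows and drop the ones that cannot be repaired."""
    seen_emails = set()
    cleaned = []
    for row in raw_rows:
        # Collapse repeated whitespace and standardize capitalization.
        name = " ".join(row.get("name", "").split()).title()
        email = row.get("email", "").strip().lower()
        try:
            age = int(str(row.get("age", "")).strip())
        except ValueError:
            age = None  # keep the row, but mark the age as unknown
        if not name or "@" not in email:
            continue  # unusable without a name and a plausible email
        if email in seen_emails:
            continue  # duplicate of a row we already kept
        seen_emails.add(email)
        cleaned.append({"name": name, "email": email, "age": age})
    return cleaned


# Hypothetical messy input: stray whitespace, mixed case, a duplicate,
# a non-numeric age, and rows missing required fields.
MESSY = [
    {"name": "  ada   lovelace ", "email": "ADA@example.com ", "age": "36"},
    {"name": "Ada Lovelace", "email": "ada@example.com", "age": "36"},
    {"name": "grace hopper", "email": "grace@example.com", "age": "n/a"},
    {"name": "", "email": "nobody@example.com", "age": "50"},
    {"name": "No Email", "email": "", "age": "41"},
]

for record in clean_records(MESSY):
    print(record)
```

Of the five input rows, two survive: the duplicate and the rows missing a name or email are dropped, while the row with an unparseable age is kept with the age set to None so that the decision about what to do with it is left to the analyst.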

What is the typical background?

Data engineering is a specialization of software engineering. The majority of data engineers working in the field used to be software engineers.

Because demand is high, people in other roles, such as DBAs, system admins, data analysts, and data scientists, are also moving into data engineering.

Source: Stitch Data — The State of Data Engineering

What skills do you need?

Data engineers should have the following skills and knowledge:

  • Linux and the command line.
  • Experience programming in at least Python or Scala/Java.
  • SQL.
  • Some understanding of distributed systems in general and how they are different from traditional storage and processing systems.
  • Deep understanding of the ecosystem, including ingestion (e.g. Kafka, Kinesis), processing frameworks (e.g. Spark, Flink) and storage engines (e.g. S3, HDFS, HBase, Kudu). Know the strengths and weaknesses of each tool and what it’s best used for.
  • Know how to access and process data.

A holistic understanding of data is also important. That can mean thinking and acting like an engineer and sometimes that can mean thinking more like a traditional product manager.

What are salaries like?

Salaries vary greatly between Big Tech, Fortune 500 companies, other large tech companies, and startups. Here’s a ballpark of what to expect:

Entry-Level: $80–100k

Mid-Career (2–3 Yrs): $120–160k

Senior (4–6 Yrs): $180–240k

Lead/Head (8–12 Yrs): $200–400k

These figures cover cash compensation only. Stock grants can range from a percentage of salary to multiples of it, depending on the individual company’s performance.

Source: Paysa — Data Engineers

How do I become one?

Traditionally, data engineering is not an entry-level role. Most employers want individuals with some professional software engineering experience. However, many companies are now growing so fast that they are willing to hire fresh CS or bootcamp graduates.

If you are in school, I recommend getting a software or data engineering internship at a tech company that works with big data.

If you have work experience as a software engineer, Insight Data Engineering is excellent for those in the US and ASI’s Fellowship is the equivalent for UK/Europe.

If you can’t quit your day job, we are developing a data engineering course. It will be released mid-2019.

Where can I learn more?

Check out the following articles on Medium to learn more about the field:

An unofficial manifesto for the field of data engineering. It hits on the main challenges that data engineers face.

A gentle introduction to some of the common tasks data engineers tackle with code examples as well.

A step-by-step look at how Boxever settled on their data processing pipeline. This gives great insight on how data engineers make decisions about technology based on the project requirements.
