Data Engineering Interview Preparation Guide

Ajay Kumar Nagaraj
May 27, 2021

I recently interviewed for a Data Engineering role at a FAANG company, so I thought I would give back to the community by sharing my experience and how to prepare.

What does FAANG look for in a Data Engineer?

Data engineers are knowledgeable in a variety of strategies for ingesting, modeling, processing, and persisting data. They have expertise in building scalable data infrastructure, and they understand distributed systems concepts from both a storage and a compute perspective. Data engineers are experts in SQL and have a strong understanding of ETL and data modeling. They are also proficient in one or more scripting or programming languages.

They ensure the accuracy and availability of data for their customers, and they understand how technical decisions can impact their business's analytics and reporting. Data engineers work with data at high volume and velocity, often using the latest AWS and open-source technologies.

How to prepare for a data engineering interview?

  1. Be prepared to discuss the technologies listed on your resume. For example, if you list Python or big data, expect technical questions about your experience with them. It is helpful to review the job description before your interview and map your qualifications to the job's specific requirements and responsibilities. Also connect with your recruiter to make sure you are being matched with a job that is the right fit.
  2. Be comfortable with writing SQL fluently and thinking about edge cases. Understand different types of joins and how condition filters affect the joins. Be familiar with ways of simplifying a complex query and optimizing performance. Practice writing queries that are correct and free of syntax errors without submitting them to an interpreter.
  3. Be ready to write syntactically correct code in your preferred language. Expect to utilize common data structures and algorithms, and to compare and contrast their usage in various applications.
  4. Be prepared to understand and identify underlying business problems and choose the right technologies when providing your solution. Given an ambiguous business scenario, be able to propose a data model and end-to-end data architecture that will solve the needs.
  5. Understand the differences and trade-offs between types of databases and when each is useful in building a system. Know what different styles of schema design exist and when to use each.
  6. Be able to demonstrate your understanding of how to tune database components to meet reporting needs and to transform data more quickly. Be comfortable explaining how to optimize and debug ETL jobs in your most frequently used environment. Be prepared to demonstrate your knowledge of distributed computing from a storage and compute perspective, and how you can use distributed computing to meet high-performance standards.
  7. Understand what technologies are used for ETL. Be able to design an ETL pipeline in both serverless and persistent compute and discuss the tradeoffs of each for the problem statement provided.
  8. Understand how to build/optimize logical data models and data pipelines for a given data set.
  9. Be familiar with concepts such as “Workflow as code” and “Infrastructure as code”.
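
To make point 2 concrete, here is a minimal sketch of how the placement of a filter changes the result of a LEFT JOIN. The `customers` and `orders` tables and their rows are made up for illustration, and the queries run through Python's built-in `sqlite3` module only so that the example is self-contained. Other databases should behave the same way for this query, but that is worth verifying in your own environment.

```python
import sqlite3

# Hypothetical tables, created in memory purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES (10, 1, 'shipped'), (11, 2, 'cancelled');
""")

# Filter inside ON: every customer is kept, and customers without a
# shipped order get NULL for the order columns.
filter_in_on = conn.execute("""
    SELECT c.name, o.id
    FROM customers c
    LEFT JOIN orders o
      ON o.customer_id = c.id AND o.status = 'shipped'
    ORDER BY c.id
""").fetchall()

# Filter inside WHERE: it runs after the join, so rows where o.status is
# NULL or not 'shipped' are dropped and the query acts like an INNER JOIN.
filter_in_where = conn.execute("""
    SELECT c.name, o.id
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    WHERE o.status = 'shipped'
    ORDER BY c.id
""").fetchall()

print(filter_in_on)     # [('Ann', 10), ('Bob', None), ('Cy', None)]
print(filter_in_where)  # [('Ann', 10)]
```

This is the kind of edge case worth talking through out loud in an interview: ask whether customers with no matching orders should appear in the output before deciding where the condition goes.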

Resources

SQL

  1. T-SQL Querying (Chapters 1 and 3 are a must)
  2. T-SQL Window Functions (Chapters 1, 2, and 6 are a must)
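
As a taste of what window functions look like, here is a small sketch that computes a running total and a rank per region. The `sales` table and its rows are invented for illustration. It runs on SQLite through Python's `sqlite3` module (window functions need SQLite 3.25 or newer) rather than on SQL Server, so treat it as practice for the concept, not as a T-SQL reference.

```python
import sqlite3

# Hypothetical sales data, created in memory purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, day INTEGER, amount INTEGER);
    INSERT INTO sales VALUES
        ('east', 1, 100), ('east', 2, 50), ('east', 3, 70),
        ('west', 1, 30),  ('west', 2, 90);
""")

# PARTITION BY restarts the calculation for each region.
# ORDER BY inside OVER defines the row order within each partition.
rows = conn.execute("""
    SELECT region, day, amount,
           SUM(amount) OVER (PARTITION BY region ORDER BY day) AS running_total,
           RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS amount_rank
    FROM sales
    ORDER BY region, day
""").fetchall()

for row in rows:
    print(row)
# ('east', 1, 100, 100, 1)
# ('east', 2, 50, 150, 3)
# ('east', 3, 70, 220, 2)
# ('west', 1, 30, 30, 2)
# ('west', 2, 90, 120, 1)
```

Unlike GROUP BY, the window functions keep one output row per input row, which is why the detail columns and the aggregates can sit side by side.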

Data Structures and Algorithms

  1. The Algorithm Design Manual (Part II is a must)

Databases

  1. Seven Databases in Seven Weeks

Distributed Systems

  1. Designing Data-Intensive Applications

ETL

  1. Data Pipelines Pocket Reference
  2. https://www.mdpi.com/2076-3417/11/1/191
  3. Data Pipelines with Apache Airflow
  4. Terraform
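
To illustrate the "workflow as code" idea behind tools like Airflow, here is a toy sketch in plain Python. It does not use Airflow's API. The task functions, the `TASKS` and `DEPENDENCIES` names, and the sample rows are all hypothetical, and the only library involved is `graphlib` from the standard library (Python 3.9+). The point is that the pipeline's steps and their dependencies live in version-controlled code, and a scheduler derives the run order from them.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def extract(_inputs):
    # Stand-in for reading from a source system.
    return [{"user": "a", "amount": "10"}, {"user": "b", "amount": "5"}]

def transform(inputs):
    # Cast the amount field from text to integer.
    return [{**row, "amount": int(row["amount"])} for row in inputs["extract"]]

def load(inputs):
    # Stand-in for writing to a warehouse; returns the total amount loaded.
    return sum(row["amount"] for row in inputs["transform"])

TASKS = {"extract": extract, "transform": transform, "load": load}

# Each task maps to the set of tasks that must finish before it starts.
DEPENDENCIES = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}

def run_pipeline():
    results = {}
    # static_order() yields tasks so that dependencies always come first.
    order = list(TopologicalSorter(DEPENDENCIES).static_order())
    for name in order:
        upstream = {dep: results[dep] for dep in DEPENDENCIES[name]}
        results[name] = TASKS[name](upstream)
    return order, results

order, results = run_pipeline()
print(order)            # ['extract', 'transform', 'load']
print(results["load"])  # 15
```

A real orchestrator adds scheduling, retries, and monitoring on top, but the dependency graph declared in code is the core of the concept.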

Data Modeling (Very Important. Focus more on OLAP)

OLTP Database

  1. PostgreSQL
  2. SQL Cookbook
  3. Use The Index, Luke
  4. Normalization

OLAP Database

  1. Redshift Fundamentals
  2. Redshift Deep Dive
  3. Tuning Query Performance
  4. Redshift Best Practices
  5. The Data Warehouse Toolkit

Reverse Interview

  1. https://github.com/viraptor/reverse-interview

It took me 2 months to prepare. Generally at FAANG companies, your values as an engineer matter even more than your technical skills. Expect questions like: How much impact did you create, as a percentage? What was a time when you needed to disagree with your manager to achieve a long-term goal?

Hope it helps. Enjoy!

Ajay Kumar Nagaraj

I believe software is the most effective way to touch others’ lives in our day and time.