Data engineer tech stack


[Figure: Big data engineering tech stack]

Big data engineers deal with data-related tasks such as fetching data from different sources, including relational 
databases (MySQL, Postgres), NoSQL databases (MongoDB), and application events from Kafka. 
Their work involves pulling data from these sources, transforming it, and converting it into a queryable 
format so that the organization's analysts can query the data and build different types of reports and 
dashboards for analytics purposes.
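
As a rough sketch of that flow, the PySpark snippet below reads a table from Postgres over JDBC, aggregates it, and writes a partitioned Parquet table that analysts can query. The hostnames, table names, credentials, and bucket path are placeholders, and the Postgres JDBC driver must be available to Spark.

```python
# Minimal PySpark sketch of the flow described above: extract from a relational
# source, transform, and write to a queryable columnar format.
# All connection details, table names, and paths are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-etl").getOrCreate()

# Extract: pull a table from Postgres over JDBC (driver jar must be on the classpath).
orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://db-host:5432/shop")  # placeholder host/db
    .option("dbtable", "public.orders")                    # placeholder table
    .option("user", "etl_user")
    .option("password", "etl_password")
    .load()
)

# Transform: keep completed orders and aggregate daily revenue.
daily_revenue = (
    orders.filter(F.col("status") == "COMPLETED")
    .groupBy(F.to_date("created_at").alias("order_date"))
    .agg(F.sum("amount").alias("revenue"))
)

# Load: write a partitioned Parquet table that analysts can query (e.g., via Hive or Presto).
daily_revenue.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3a://analytics-bucket/warehouse/daily_revenue"        # placeholder bucket
)
```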

To master data engineering tasks, we need hands-on experience with certain technologies.

Relational databases
1. MySQL
2. Postgres

Non-relational databases
1. MongoDB

Any one or more of the languages below
1. Python
2. R
3. Scala
4. Java

Analytical Engine
1. Apache Spark
2. Hadoop
3. Sqoop

Data lake table formats
1. Apache Hudi
2. Delta Lake
3. Apache Iceberg

Query Engine
1. Hive
2. Presto

Messaging Queue
1. Kafka

Distributed storage
1. Hadoop Distributed File System ( HDFS )
2. Amazon Simple Storage Service (Amazon S3)
3. Blob Storage ( Azure )
4. Google Cloud Storage ( GCS )

Workflow management platform
1. Airflow
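
A workflow platform such as Airflow schedules and chains the extract, transform, and load steps described earlier. The snippet below is a minimal, assumed DAG sketch (Airflow 2.4+ style); the dag_id, schedule, and task bodies are placeholders.

```python
# Minimal Airflow DAG sketch chaining extract -> transform -> load.
# Task bodies are placeholders; in practice they would trigger the Spark job,
# Sqoop, or other tools from the stack.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    print("pull data from MySQL/Postgres/Kafka")


def transform():
    print("clean and aggregate the extracted data")


def load():
    print("write Parquet/Hudi/Delta tables for analysts")


with DAG(
    dag_id="daily_analytics_pipeline",  # placeholder name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                  # assumes Airflow 2.4+ ("schedule" argument)
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    t_extract >> t_transform >> t_load
```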

Miscellaneous
1. Git and GitHub
2. Docker & Docker Compose
3. Kubernetes ( K8s)
4. Metabase

Cloud Technologies
1. Amazon Web Services EMR ( AWS EMR ) 
2. AWS EC2

Change Data Capture (CDC) tools
1. Debezium
2. GoldenGate (Oracle)
3. Striim
4. Bottled Water (by Confluent), which only supports PostgreSQL
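
To give a flavour of how CDC is typically wired up, here is an assumed sketch that registers a Debezium Postgres source connector through the Kafka Connect REST API; hostnames, credentials, and some configuration keys (which vary by Debezium version) are placeholders only.

```python
# Illustrative only: register a Debezium Postgres source connector via the
# Kafka Connect REST API. Hostnames, credentials, and exact config keys are
# placeholders and depend on the Debezium/Connect versions in use.
import requests

connector = {
    "name": "shop-postgres-cdc",  # placeholder connector name
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "db-host",  # placeholder
        "database.port": "5432",
        "database.user": "cdc_user",
        "database.password": "cdc_password",
        "database.dbname": "shop",
        "topic.prefix": "shop",  # Debezium 2.x; older versions use database.server.name
        "table.include.list": "public.orders",
    },
}

# Post the connector definition to the Kafka Connect worker (placeholder endpoint).
resp = requests.post("http://kafka-connect-host:8083/connectors", json=connector)
resp.raise_for_status()
print("connector registered:", resp.json()["name"])
```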

There are many rich and great resources for data engineering. You can visit the Data Engineering Handbook by EcZachly.
