Big data engineering tech stack
Big data engineers deal with data-related tasks such as fetching data from different sources: relational databases (MySQL, Postgres), NoSQL databases (MongoDB), and application events from Kafka. Their work involves extracting data from these sources, transforming it, and converting it into a queryable format so that the organization's analysts can query the data and build reports and dashboards for analytics. To master data engineering tasks, we need hands-on experience with certain technologies.

Relational databases
1. MySQL
2. Postgres

Non-relational databases
1. MongoDB

One or more of the following languages
1. Python
2. R
3. Scala
4. Java

Analytical engines
1. Apache Spark
2. Hadoop
3. Sqoop

Data transformation frameworks
1. Apache Hudi
2. Delta Lake
3. Iceberg

Query engines
1. Hive
2. Presto

Messaging queues
1. Kafka

Distributed storage
1. Hadoop Distributed File System (HDFS)
2. Amazon Simple Storage Service (Amazon S3)
3. Azure Blob Storage
4. Google Cloud Storage (GCS)

Workflow management platforms
1. Airflow

Miscellaneous
1. Git and GitHub
2. Docker and Docker Compose
3. Kubernetes (K8s)
4. Metabase

Cloud technologies
1. Amazon Web Services EMR (AWS EMR)
2. AWS EC2

Change Data Capture (CDC) tools
1. Debezium
2. GoldenGate (Oracle)
3. Striim
4. Bottled Water (by Confluent; supports PostgreSQL only)

There are many rich resources for data engineering; for example, you can visit the Data Engineering Handbook by EcZachly.
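The extract-transform-load flow described above can be sketched with Python's standard-library sqlite3 module standing in for a relational source; the table and column names here are hypothetical, a minimal illustration rather than a production pipeline:

```python
import sqlite3

# In-memory database standing in for a relational source (e.g. MySQL/Postgres).
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER, amount_cents INTEGER, country TEXT)")
src.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1250, "US"), (2, 900, "DE"), (3, 400, "US")],
)

# Extract: pull raw rows from the source.
rows = src.execute("SELECT id, amount_cents, country FROM orders").fetchall()

# Transform: convert cents to dollars and aggregate revenue per country.
totals = {}
for _id, cents, country in rows:
    totals[country] = totals.get(country, 0.0) + cents / 100.0

# Load: write the transformed data into a queryable analytics table.
dst = sqlite3.connect(":memory:")
dst.execute("CREATE TABLE revenue_by_country (country TEXT, revenue REAL)")
dst.executemany("INSERT INTO revenue_by_country VALUES (?, ?)", totals.items())

# An analyst can now query the derived table directly.
result = dict(dst.execute("SELECT country, revenue FROM revenue_by_country"))
print(result)  # {'US': 16.5, 'DE': 9.0}
```

In a real stack, the extract step would be a Spark JDBC read or a Sqoop job, and the load target would be a table in Hive or on S3 rather than SQLite.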
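CDC tools such as Debezium stream row-level insert, update, and delete events out of a database's transaction log. The core idea can be illustrated with a toy snapshot diff; this is a sketch of the event model only, not how log-based CDC actually reads the binlog or WAL:

```python
def capture_changes(before, after):
    """Compare two snapshots of a table (dicts keyed by primary key)
    and emit insert/update/delete events, as a CDC pipeline would."""
    events = []
    for key, row in after.items():
        if key not in before:
            events.append({"op": "insert", "key": key, "row": row})
        elif before[key] != row:
            events.append({"op": "update", "key": key,
                           "before": before[key], "after": row})
    for key, row in before.items():
        if key not in after:
            events.append({"op": "delete", "key": key, "row": row})
    return events

# Hypothetical snapshots of a users table before and after some writes.
before = {1: {"name": "alice"}, 2: {"name": "bob"}}
after = {1: {"name": "alice"}, 2: {"name": "bobby"}, 3: {"name": "carol"}}

events = capture_changes(before, after)
print(events)  # one update (key 2) and one insert (key 3)
```

Events like these are typically published to Kafka, which is why CDC tools and a messaging queue usually appear together in the stack.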
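A workflow manager like Airflow models a pipeline as a DAG of tasks and runs each task only after its upstream dependencies finish. The ordering logic can be sketched with the standard library's graphlib (Python 3.9+); the task names below are hypothetical:

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline tasks mapped to their upstream dependencies,
# mirroring how Airflow orders tasks in a DAG.
dag = {
    "extract_mysql": set(),
    "extract_kafka": set(),
    "transform": {"extract_mysql", "extract_kafka"},
    "load_warehouse": {"transform"},
    "refresh_dashboard": {"load_warehouse"},
}

# A valid execution order: every task appears after its dependencies.
order = list(TopologicalSorter(dag).static_order())
print(order)
```

Airflow adds scheduling, retries, and monitoring on top of this ordering, which is why it appears in the stack even though the DAG concept itself is simple.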