DataMass Gdańsk Summit 2022
phone: +48 570 272 723
e-mail: kamil.piotrowski@evention.pl

 

Build reliable data pipelines using Modern Data Stack in the cloud

 

In this one-day workshop, you will learn how to create modern data transformation pipelines managed by dbt and orchestrated with Apache Airflow. You will discover how to improve your pipelines' quality and your data team's workflow by introducing a set of tools aimed at standardizing good practices within the team: version control, testing, monitoring, change-data-capture, and easy scheduling. We will work through typical data transformation problems you can encounter on the journey to delivering fresh & reliable data, and see how modern tooling can help solve them. All hands-on exercises will be carried out in the GCP environment.

Target Audience
Data analysts, analytics engineers & data engineers who are interested in learning how to build and deploy data transformation workflows faster than ever before. Anyone who would like to leverage their SQL skills and start building data pipelines more easily.

Requirements

  • SQL fluency: ability to write data transforming queries
  • Basic understanding of ETL processes
  • Basic experience with a command-line interface
  • Laptop with a stable internet connection (participants will connect to Jupyter Notebooks pre-created on Google Cloud Platform)

Participant’s ROI

  • Concise and practical knowledge of applying dbt to solve typical data pipeline problems in a modern way: managing run sequence, handling data quality issues, monitoring, and scheduling transformations with Apache Airflow
  • Hands-on coding experience under the supervision of Data Engineers experienced in maintaining dbt pipelines
  • Tips on real-world applications and best practices

Training Materials
During the workshop, participants will follow a shared step-by-step guide showing how to augment a data team's workflow with dbt. A Jupyter Notebook environment will be supplied for each participant, and pre-generated BigQuery datasets will be provided so that everyone can work through the example real-life use case.

 

Time Box

1-day event

Agenda

Session #1 - Introduction to Modern Data Stack

  • What is the Modern Data Stack? An introduction
  • Key components of MDS
  • Core concepts of dbt
    • Data models
    • Seeds, sources
    • Tests
    • Documentation
  • Hands-on exercises
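To give a feel for the "Tests" concept above: dbt's built-in generic tests (such as not_null and unique) are queries that return the rows violating a rule, and a test passes when zero rows come back. The pure-Python sketch below illustrates that logic only — the function names and sample data are illustrative, not dbt's actual API.

```python
# Illustrative sketch of dbt-style data tests: a test returns the
# offending rows, and passes when the result is empty.

def not_null_failures(rows, column):
    """Rows where `column` is NULL -- the logic of dbt's not_null test."""
    return [r for r in rows if r.get(column) is None]

def unique_failures(rows, column):
    """Values of `column` appearing more than once -- dbt's unique test."""
    seen, dupes = set(), set()
    for r in rows:
        value = r.get(column)
        if value in seen:
            dupes.add(value)
        seen.add(value)
    return sorted(dupes)

orders = [
    {"order_id": 1, "customer_id": 10},
    {"order_id": 2, "customer_id": None},
    {"order_id": 2, "customer_id": 11},
]

print(not_null_failures(orders, "customer_id"))  # one failing row
print(unique_failures(orders, "order_id"))       # [2]
```

In dbt itself, these checks are declared in a model's YAML schema file and executed with `dbt test`.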

Session #2 - Simple end-to-end data pipeline

  • Data discovery (data search, usage statistics, data lineage)
  • Data profiling & exploration
  • Transforming data using SQL with dbt
  • Data consumption with BI tools
  • Hands-on exercises

Session #3 - Data pipeline - scheduling, deployment & advanced features

  • Apache Airflow as a workflow scheduler
  • Data testing & data observability
  • Exploring transformed data with Data Studio
  • Hands-on exercises
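The core idea behind using Apache Airflow as a workflow scheduler is that pipeline steps form a DAG and run in dependency order, so tests never run before the transformations they check. Here is a minimal pure-Python sketch of that ordering (the workshop uses a real Airflow DAG, e.g. BashOperator tasks invoking dbt; the task names below are illustrative):

```python
# Minimal sketch of dependency-ordered execution, the idea behind an
# Airflow DAG: each task lists its upstream dependencies, and the
# scheduler only starts a task once all of them have finished.
from graphlib import TopologicalSorter

# task -> set of upstream tasks that must finish first
dag = {
    "dbt_seed": set(),
    "dbt_run": {"dbt_seed"},
    "dbt_test": {"dbt_run"},
    "refresh_dashboard": {"dbt_test"},
}

order = list(TopologicalSorter(dag).static_order())
print(order)  # ['dbt_seed', 'dbt_run', 'dbt_test', 'refresh_dashboard']
```

Airflow adds what this sketch leaves out: scheduling runs on an interval, retrying failed tasks, and surfacing run history in a UI.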

Instructors:

Data Analyst / Analytics Engineer
GetInData
Data Engineer
GetInData