DataMass Gdańsk Summit 2022 is coming

Build reliable data pipelines using Modern Data Stack in the cloud


 

In this one-day workshop, you will learn how to create modern data transformation pipelines managed by dbt and orchestrated with Apache Airflow. You will discover how to improve the quality of your pipelines and your data team’s workflow by introducing a set of tools that standardize good practices: version control, testing, monitoring, change data capture, and easy scheduling. We will work through typical data transformation problems you can encounter on the way to delivering fresh and reliable data, and show how modern tooling can help solve them. All hands-on exercises will be carried out in the GCP environment.
 

Target Audience
Data analysts, analytics engineers & data engineers who are interested in learning how to build and deploy data transformation workflows faster. Anyone who would like to leverage their SQL skills to build data pipelines more easily.

 
Requirements

  • SQL fluency: ability to write data transforming queries
  • Basic understanding of ETL processes
  • Basic experience with a command-line interface
  • Laptop with a stable internet connection (participants will connect to Jupyter Notebooks pre-created on Google Cloud Platform)

 
Participant’s ROI

  • Concise and practical knowledge of applying dbt to solve typical problems with data pipelines in a modern way: managing run sequence, data quality issues, monitoring, and scheduling transformations with Apache Airflow
  • Hands-on coding experience under the supervision of Data Engineers experienced in maintaining dbt pipelines
  • Tips about real-world applications and best practices

 
Training Materials
During the workshop, participants will follow a shared step-by-step guide that shows how the dbt tool can augment a data team’s workflow. A Jupyter Notebook environment will be supplied for each participant. Pre-generated BigQuery datasets will be provided so that all participants can work through an example real-life use case.

 

Time Box

1-day event
 

Agenda
Session #1 - Introduction to Modern Data Stack

  • Extracting data with modern data ingestion tools
  • Transforming data using SQL with dbt
  • Data discovery (metadata search, usage statistics)
  • Data consumption with BI tools & notebooks
  • Impact on data teams’ workflow and the role of Analytics Engineer

Session #2 - Core concepts of dbt

  • Data models
  • Seeds, sources
  • Tests
  • Snapshots
  • Documentation, maintenance, and data lineage
  • Hands-on exercises

Session #3 - Scheduling, deployment, workflow, BI Tools

  • Apache Airflow as a workflow scheduler
  • dbt Cloud for bootstrapping a dbt project
  • Exploring transformed data with Data Studio
  • Hands-on exercises

 
Instructor:

Data Engineer
GetInData