DataMass Gdańsk Summit 2023
phone: +48 570 611 811

Build reliable data pipelines using Modern Data Stack in the cloud

In this one-day workshop, you will learn how to create modern data transformation pipelines managed by dbt and orchestrated with Apache Airflow. You will discover how you can improve your pipelines’ quality and the workflow of your data team by introducing a set of tools aimed at standardizing the way good practices are incorporated within the data team: version control, testing, monitoring, change-data-capture, and easy scheduling. We will work through typical data transformation problems you can encounter on the journey to deliver fresh & reliable data, and see how modern tooling can help solve them. All hands-on exercises will be carried out in a public cloud environment (e.g. GCP or AWS).

During the workshop, participants will follow a shared step-by-step guide that shows how to augment a data team’s workflow with dbt. A Jupyter Notebook environment will be supplied for each participant, along with pre-generated datasets so that everyone can work through the example real-life use case scenario.

Target Audience
Data analysts, analytics engineers & data engineers who are interested in learning how to build and deploy data transformation workflows faster than ever before. Anyone who would like to leverage their SQL skills and start building data pipelines more easily.

What do you get after the training?

  • Concise and practical knowledge of applying dbt to solve typical data pipeline problems in a modern way: managing run order, handling data quality issues, monitoring, and scheduling transformations with Apache Airflow
  • Hands-on coding experience under the supervision of Data Engineers experienced in maintaining dbt pipelines
  • Tips about real-world applications and best practices.


Prerequisites

  • SQL fluency: ability to write data transforming queries
  • Basic understanding of ETL processes
  • Basic experience with a command-line interface
  • Laptop with a stable internet connection (participants will connect to Jupyter Notebooks pre-created in a cloud environment)

Session #1 - Introduction to Modern Data Stack

  • What is the Modern Data Stack? Intro
  • Key components of MDS
  • Core concepts of dbt
    • Data models
    • Seeds, sources
    • Tests
    • Documentation
  • Hands-on exercises
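To make the core concepts concrete, here is a minimal sketch of a dbt data model with a source and schema tests. The source table `raw.orders`, the model name, and the column names are illustrative assumptions, not the workshop's actual dataset:

```sql
-- models/staging/stg_orders.sql
-- Hypothetical staging model: reads from an assumed source table raw.orders
with source as (
    select * from {{ source('raw', 'orders') }}
)
select
    id as order_id,
    customer_id,
    cast(ordered_at as date) as order_date,
    amount
from source
```

```yaml
# models/staging/schema.yml
# Declares the source and attaches generic tests to the model's key column
version: 2
sources:
  - name: raw
    tables:
      - name: orders
models:
  - name: stg_orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
```

Running `dbt run` builds the model in the warehouse, and `dbt test` checks the declared `unique` and `not_null` constraints against the materialized data.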

Session #2 - Simple end-to-end data pipeline

  • Data discovery (data search, usage statistics, data lineage)
  • Data profiling & exploration
  • Transforming data using SQL with dbt
  • Data consumption with BI tools
  • Hands-on exercises

Session #3 - Data pipeline - scheduling, deployment & advanced features

  • Apache Airflow as a workflow scheduler
  • Data testing & data observability
  • Exploring transformed data with Data Studio
  • Hands-on exercises
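The scheduling pattern covered in this session can be sketched as an Airflow DAG that triggers dbt on a daily cadence. The project path, DAG id, and schedule below are illustrative assumptions, not the workshop's actual configuration:

```python
# dags/dbt_daily.py
# Minimal sketch (Airflow 2.x): run dbt models daily, then run dbt tests.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="dbt_daily",
    start_date=datetime(2023, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="dbt run --project-dir /opt/dbt/project",
    )
    dbt_test = BashOperator(
        task_id="dbt_test",
        bash_command="dbt test --project-dir /opt/dbt/project",
    )
    # Build models first; only test once the run succeeds
    dbt_run >> dbt_test
```

Chaining `dbt run >> dbt test` means a failed test halts the pipeline before stale or broken data reaches downstream consumers.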


  • 9.00 - 10.30
  • 10.30 - 11.00 - break
  • 11.00 - 13.00
  • 13.00 - 13.45 - lunch
  • 13.45 - 15.45
  • 15.45 - 16.00 - break
  • 16.00 - 17.00

Maximum number of attendees

Time Box
9.00 - 17.00 | 8h

Session leader:

Data Analyst / Analytics Engineer
Data Engineer