DataMass Gdańsk Summit 2023

(Near) Real-time data processing in the cloud using Spark Structured Streaming and SparkML


This workshop covers real-time data analysis with Spark Streaming. We'll look at how Spark Streaming works and how it can be used in machine learning systems, and we will build machine learning models for classification and clustering. The main application, on which we will spend most of our time, is network traffic analysis for detecting threats in computer networks.


Target Audience
Data analysts and data scientists interested in real-time data processing with Spark and its application to machine learning systems.

Participants should have some experience with Python and basic knowledge of cloud computing and machine learning concepts. You will need a laptop with internet access, as we will work in the Databricks cloud environment.

Participant’s ROI
Practical knowledge of building data processing systems using Spark Streaming.
Practical experience building machine learning models for classification and clustering.
Application of the learned techniques to network packet analysis in order to improve network security.

Training Materials
All participants will receive training materials as PDF files: slides covering the theory and an exercise manual describing every exercise in detail. During the workshop, the exercises are carried out on the Databricks platform.

Time Box

This is a one-day event (9:00 AM - 4:00 PM). We will schedule breaks between sessions.


  • Session # 1 - Introduction to real-time data analysis using Spark Streaming.
    Practical exercises.
  • Session # 2 - Introduction to Spark ML and building models for classification and clustering.
    Practical exercises.
  • Session # 3 - Application of learned techniques to solve practical problems: network packet analysis to detect threats in the network.
    Practical exercises.
  • Session # 4 - Summary and discussion.


Head of Data Science