DataMass Gdańsk Summit 2022
phone: +48 570 272 723
e-mail: kamil.piotrowski@evention.pl

(Near) Real-time data processing in the cloud using Spark Structured Streaming and SparkML


 

This workshop covers real-time data analysis using Spark Streaming. We'll look at how Spark Streaming works and how it can be used in machine learning systems, and we will build machine learning models for classification and clustering. We will spend most of our time on the main application: analyzing network traffic to detect threats in computer networks.

 

Target Audience
Data analysts and data scientists interested in real-time data processing with Spark and its application to machine learning systems.

 
Requirements
Some experience with Python and basic knowledge of cloud computing and machine learning concepts. You need a laptop with internet access. We will work in the Databricks cloud environment.

 
Participant’s ROI
  • Practical knowledge of building data processing systems using Spark Streaming.
  • Practical experience building machine learning models for classification and clustering.
  • Applying the learned techniques to network packet analysis in order to improve network security.
 
Training Materials
All participants will receive training materials as PDF files: slides covering the theory and an exercise manual with a detailed description of every exercise. During the workshop, the exercises can be performed on the Databricks platform.
 

Time Box

This is a one-day event (9:00 AM - 4:00 PM). We will schedule breaks between sessions.
 

Agenda

  • Session # 1 - Introduction to real-time data analysis using Spark Streaming.
    Practical exercises.
  • Session # 2 - Introduction to Spark ML and building models for classification and clustering.
    Practical exercises.
  • Session # 3 - Application of learned techniques to solve practical problems: network packet analysis to detect threats in the network.
    Practical exercises.
  • Session # 4 - Summary and discussion.

 
Instructor:

Head of Data Science
DataMass