29.09 - Workshop Day
30.09 - Conference Day
Fourth edition of the conference
– First time together with Evention and GetInData
Conference for people who use the cloud in their daily work to solve Big Data,
Data Science, Machine Learning and AI problems
A community of dedicated experts will help us share knowledge and exchange
experience in shaping scalable and distributed computing solutions
DataMass Summit is not just another conference – it is an event created with passion. The Summit is aimed at people who use the cloud in their daily work to solve Big Data, Data Science, Machine Learning and AI problems. The main idea of the conference is to promote knowledge and experience in designing and implementing tools for solving difficult and interesting challenges.
This year we are back, after a long break forced by the pandemic, in the classic on-site format. And we have great news for you! We have joined forces with GetInData and Evention (kudos to Adam Kawa for making it possible), who for years have been organizing the Big Data Technology Warsaw Summit, the largest Big Data conference in Poland. So for you, our event will be even more interesting and even stronger in terms of content.
We planned two days: the first was devoted to technical workshops conducted by practitioners, and the second included conference presentations by renowned experts in Big Data, Data Science, Machine Learning and AI, all in the context of cloud solutions.
Do come and join us!
Selection committee DataMass Gdansk Summit 2022:
During DATAMASS there will be 3 parallel workshops (paid separately), each for a maximum of 25 participants
When/Where? September 29, Conference Center of Museum of The Second World War, Gdańsk
From 9 a.m. to 4 p.m. (including lunch and coffee breaks), onsite
DETAILS:
In this one-day workshop, you will learn how to create modern data transformation pipelines managed by dbt and orchestrated with Apache Airflow.
SESSION LEADER:
DETAILS:
The subject of this workshop is real-time data analysis using Spark Streaming. We'll cover how Spark Streaming works and how it can be used in machine learning systems.
SESSION LEADERS:
DETAILS:
In this one-day workshop, you will learn how to operationalize Machine Learning models using popular open-source tools, like Kedro and Kubeflow, and deploy them using cloud computing.
SESSION LEADERS:
ZORBA RoofTop
Szafarnia 10 street, 80-755 Gdansk
The DATAMASS conference is not only about workshops and substantive presentations. It's also a great time for networking. That is why we invite you to an EVENING PARTY, during which, with good Greek food and drinks, we will have the opportunity to get to know each other and talk to the group of participants, speakers and conference partners.
The perfect place for socializing will be the ZORBA RoofTop restaurant in Gdansk - a concept full of energy, music, good food and Greek aroma.
In this talk we'll present the Data Engineering Platform Datadog has built across multiple clouds. We'll discuss why and how we built it, how it's used, and the challenges we've faced.
If you move your analytics to the cloud and you're a bank, there are a few things to consider. We'll show you how we are doing this and why cloud and MLOps are the only way to go.
We love to analyze data. Often, we have to go to many different places to do one analysis that will help us make the correct business decision. Many sources exist because we match technology to varying types of data, different processing speeds, and costs. In addition, our organizations are changing rapidly and need more computing power and data storage space. How do you keep up with all this? Let's try to build an innovative, scalable data platform.
With the increasing adoption of cloud-native technologies, companies are struggling to migrate their legacy systems due to insufficient resources, lack of time and lack of knowledge. See how our metadata-based platform, ADELE, helps customers overcome these obstacles with automated metadata harvesting and solution re-platforming capabilities.
With the growing demand for real-time data in business, organizations need solutions that seamlessly extract data from many sources and ingest it into the cloud for further processing. Creating and managing data ingestion pipelines is a time-consuming and resource-intensive task. In this presentation you will learn about typical challenges and how they can be solved with DataLark, a modern data management platform developed by LeverX that combines batch and streaming data processing with data transformation capabilities.
We will talk about lessons learned from advanced analytics in traditional enterprise:
- how to talk with C-level executives about the cloud
- how to ingest data for further processing
- how to run processing on hundreds of cores and spend only a few thousand a month
- which tools to use for data engineering in Google Cloud
- how to present results to the users
During the talk we will present the approach to data engineering at PepsiCo eCommerce, which scales to hundreds of DAGs/workflows and terabytes of data processed daily. This approach has progressed iteratively over the past 3 years, is used by 100+ engineers at PepsiCo and leverages 3 major cloud providers and several SaaS vendors. In particular, we will discuss the following points: data acquisition, modeling, transformation, data quality and lineage, and orchestration and scheduling.
For a modern publisher, data is like... water. This is the story of how we moved our whole waterworks to the cloud. A story about changing our intakes, pipelines, pumps and sinks to a completely new environment. A story about using cloud-native services to optimize product development while keeping costs under control. We migrated a Hadoop cluster of over 120 data nodes and a massive data stream processing infrastructure to native data services provided by AWS. Join me if you want to hear how such a task can be performed in a way other than just lift-and-shift.
In this session, you will learn how Automated ML can be combined with transfer learning to boost data scientist productivity when building computer vision models trained on medical image data. We will see new capabilities in Azure Machine Learning’s AutoML related to image classification, object detection and segmentation.
Important considerations when designing an IoT data pipe.
Data without metadata is noise without value.
How the nature of input data impacts the solution design.
Maximizing the throughput vs saving the battery life.
How to leverage the pre-processing and post-processing in data migration projects.
- Pros & cons of a multicloud solution
- How to do online inference of Deep Learning models?
- Why should we use CDK in our MLOps solution?
- Should we avoid manual steps in the automatic Machine Learning pipeline?
- Where does the Machine Learning pipeline end?
Apache Flink is a distributed stream processing engine with well-known large users including Alibaba, Amazon, Netflix and Uber, among others. Running Flink workloads in cloud environments is gaining popularity, not only for proofs of concept but also for large-scale production environments.
This talk discusses a cloud-provider-agnostic approach for running Flink workloads using Kubernetes for the deployment and management of Flink jobs, Kafka as a message broker and the object store provided by your cloud for long-term storage. The talk also demonstrates running Flink jobs in a cloud environment via the Flink Kubernetes Operator.
Have you ever wondered how AI technologies are used in the network area?
What kind of machine learning models do we use in anomaly detection?
How do we work with pipelines and Vertex AI?
Orange, as a telco operator, has a very large and extensive telecommunication network based on servers, routers, and wires or antennas. Each day millions of bytes flow through the network to enable customers to call somebody, use the internet or send a file. How do you maintain such a network? We will take you on a unique tour through our Predictive Network Maintenance project, starting from raw data and ending with an automated solution.
During this talk you'll learn how Volt.io together with GetInData and DataEdge architected a batteries-included Modern Data Platform using a combination of state-of-the-art managed cloud services, R&D plugins and software engineering best practices to provide a scalable and self-service environment for analytics engineers, data analysts and business users.
There are four distinct ways to deploy a machine learning model in AWS. Amazon SageMaker contains Batch Transform, Asynchronous Inference, Serverless Inference and Real-time Inference. Each method has its own use case and its own limits. In our talk we compare them, we outline their pros and cons and we help you decide which one is the right fit for you.
Data Mesh enables every person in an organization to read and consume data produced by its software, greatly improving the cycle of discovering the data, learning about it, and envisioning new ways of utilizing it. All of that without the overhead of fragile ETL processes, monolithic data warehouses, or even highly sophisticated data lakes. Join me to hear about what Data Mesh is and how it can be implemented with Trino. Let's talk about how to finally enable people to easily discover, read and reason about data in your organization.
Not long ago, we had a chance to participate in a machine learning competition on the Kaggle platform. Usually, the goal of competing is to win, but hey - there’s only one winner among thousands of participants, so we tried to be smarter than that. We set our own goals, just in case we somehow didn’t manage to be the best. And guess what - we weren’t the best, but we learned a ton about different data analysis and machine learning approaches, useful MLOps and cloud tools, and data science project team management. Accomplishing these research goals led us to a rough piece of machinery that combined Google Cloud Platform, Kedro, MLflow and various analytical algorithms to solve a specific business problem. After we realized that with just a little bit of polishing and structuring we could forge it into a really robust framework for tackling a wide range of other, generic problems, we rolled up our sleeves and got to work. In this presentation we would like to show you our idea for such a framework, which allows you to take a data sample from your client even before making an official commercial offer, pick some bricks that match the use case, adjust them a little, quickly prototype the solution and get back to your client with an empirically proven estimate of the analytical potential hidden in their data.
Engineering risk assessment is a process of analysing potential threats and vulnerabilities to enterprise IT systems to establish what loss an organization might expect to incur if certain events happen. Its objective is to help achieve optimal security at a reasonable cost. This is especially important when introducing new software, such as cloud solutions, into an organization's infrastructure.
During this short talk, I will present measurement methods that are used to indicate risk in IT infrastructure.
Based on examples in the areas of cloud security, data management & governance, and incident management, I will show the challenges of establishing a risk profile.
This is a story about a customer's implementation of a complex event processing system running in a very bad setup on Storm, and how we managed to build a clean design on Dataflow while maintaining the core requirements.
In this talk, you will learn about the road to world-class observability of distributed and scalable systems. You will:
- Learn about various stages of monitoring maturity
- Discover at what stage you are and learn how to get to the next level
- Learn about the monitoring prioritization pyramid
- Get best practices and tools recommendations to help you on your observability improvement journey
- Understand how to scale your observability in sync with your company
Cloud data warehousing technologies, such as BigQuery, have allowed companies to scale their analytics operations. But at what point does it make sense to buy versus build? BigQuery has been a tremendous asset to Shopify, but we have had to reassess our relationship. We will walk through a case study of how Shopify is balancing the ability to move fast with our needs for cost optimization, transparency, access controls, and customization.
Designing an experiment shows us how many constraints and limitations we have to deal with. Every business requires a different way of setting up experiments, and therefore different techniques to test our solutions. Standard approaches in statistics, such as regression analysis, are concerned with quantifying how changes in X are associated with changes in Y. Unlike methods that are concerned with associations only, causal inference approaches can answer the question of why Y changes.
DataMass strives to provide the best service possible with every contact!
BECOME A DATAMASS GDANSK SUMMIT PARTNER!
m: +48 509 622 541
e: mariola.rauzer@evention.pl
m: +48 604 112 883
e: dominika.opoka@evention.pl
CONTACT FOR ORGANIZATIONAL MATTERS
Kamil Piotrowski
+48 570 272 723
kamil.piotrowski@evention.pl
CONTACT FOR PARTICIPANTS
Weronika Warpas
+48 570 611 811
weronika.warpas@evention.pl
ADDRESS:
Conference Center of Museum of The Second World War
pl. Bartoszewskiego 1
80-862 Gdańsk