WORKSHOP DAY ONSITE
OCTOBER 4, 2023
CONFERENCE DAY
OCTOBER 5, 2023
WORKSHOP DAY ONLINE
OCTOBER 6, 2023
04.10.2023 - Workshop Day - onsite
05.10.2023 - Conference Day - onsite
06.10.2023 - Workshop Day - online
CLOUD AGAINST DATA
Fifth edition of the conference
– Second time together with Evention and GetInData
Conference for people who use the cloud in their daily work to solve Big Data,
Data Science, Machine Learning and AI problems
A community of dedicated experts helps us share knowledge and exchange experience in shaping scalable and distributed computing solutions
DataMass Summit is not just another conference – it is an event created with passion. The Summit is aimed at people who use the cloud in their daily work to solve Data Engineering, Big Data, Data Science, Machine Learning and AI problems. The main idea of the conference is to promote knowledge and experience in designing and implementing tools for solving difficult and interesting challenges.
This year it is already the 5th edition, and the 2nd together with our wonderful partners GetInData and Evention, who for years have been organizing the Big Data Technology Warsaw Summit, the largest Big Data conference in Poland. In our own opinion, last year's event was great, but this one will be even better!
This year we have planned three days. The first day is devoted to technical workshops conducted by practitioners from our community. The second day features conference presentations by renowned experts in Data Engineering, Big Data, Data Science, Machine Learning and AI, all in the context of cloud solutions. The third day is also devoted to workshops, this time prepared by our partners.
Do come and join us!
Check the previous edition of DataMass Gdansk Summit
Selection Committee DataMass Gdansk Summit 2023:
During DATAMASS there will be 3 parallel workshops (paid separately), each for a maximum of 25 participants
When/Where? September 29, Conference Center of Museum of The Second World War, Gdańsk
From 9 a.m. to 4 p.m. (including lunch and coffee breaks), onsite
DETAILS:
Join us for an immersive one-day workshop on constructing robust data pipelines using the Modern Data Stack in the cloud. It is tailored for data analysts, analytics engineers, and data engineers who want to learn how to build and deploy data transformation workflows faster than ever before.
Throughout the day, engage in hands-on exercises addressing common data transformation issues, facilitated in a collaborative cloud setup (like GCP or AWS).
You'll follow structured guidelines, exploring real-life use cases with the assistance of pre-generated datasets, enhancing your SQL capabilities and understanding of data pipeline frameworks.
SESSION LEADER:
DETAILS:
Join us for a one-day workshop on Generative AI and large language models. This event aims to provide participants with in-depth knowledge of the latest advancements in natural language processing, computer vision, and machine learning techniques for Gen AI.
The workshop will explore real-life applications of large language models using cutting-edge models such as GPT, PaLM, and open-source LLMs. Participants will also learn how to use industry-standard LLMs with APIs, fine-tune models on their data, and deploy private LLM-based assistants.
Upon completing the workshop, attendees will gain a comprehensive understanding of integrating Generative AI into data solutions.
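A minimal sketch, assuming the Hugging Face transformers library and a small open-source model, of the kind of text-generation call the workshop builds on (an illustration, not the workshop materials):

from transformers import pipeline

# Any open-source causal language model from the Hub works here; gpt2 is just small.
generator = pipeline("text-generation", model="gpt2")

prompt = "A private LLM-based assistant can help a data team by"
# Deterministic, short continuation; parameters are illustrative.
outputs = generator(prompt, max_new_tokens=40, do_sample=False)
print(outputs[0]["generated_text"])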
SESSION LEADER:
We invite you to a place where the exquisite flavors of Polish cuisine in a modern version reign supreme. Everything with a hint of history that surrounds us in the Museum of the Second World War and a magical view of the vibrant modern Gdańsk.
Some argue that education, as opposed to other industries, hasn’t been yet fundamentally impacted by AI. Generative AI is poised to change that. Companies & institutions are embedding large language models across educational products to transform the teaching & learning experience. However, implementing these new capabilities comes with several technical, ethical, and legal challenges.
In this talk, we will explore how these challenges are being addressed at Pearson, the world’s learning company. Pearson has been pioneering the use of trustworthy AI systems for millions of users worldwide for nearly three decades.
We will shed light on practical issues such as the validity, reliability, and fairness of assessment based on pre-generative and generative AI. We will delve into the operational and reputational risks associated with using pre-trained models that may contain built-in bias, or implementing black-box models that may lack the transparency and interpretability required by customers or regulations such as the upcoming EU AI Act.
Discussing these challenges will allow us to get to a better understanding of the careful balance between technological advancement and ethical considerations needed for the future of education.
Every day DNV processes the positions of around 200 000 vessels around the world to gather historical data that can be used for analytics and research.
This results in a large historical data set reaching 30 billion rows. Processing is done using Spark in a Databricks environment on Microsoft Azure.
Handling such a volume of data brings all kinds of interesting challenges and opportunities that we will touch upon during this session.
- handling and storing large volumes of geospatial data
- optimization challenges and techniques for geospatial data in Spark
- looking into data quality issues
- opportunities unlocked by having such a data set: trading patterns, carbon footprint, route prediction, ...
We will try to present the technical aspects of working with geospatial data in Spark, but also give some business context to the interesting maritime domain.
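As a hedged illustration of one common Spark technique for such data (not DNV's actual pipeline), positions can be bucketed into coarse grid cells derived from latitude and longitude, so that spatial joins and per-area aggregations touch only a few partitions; column names and paths below are assumptions:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("vessel-positions").getOrCreate()

# Hypothetical input: one row per reported position with lat/lon columns.
positions = spark.read.parquet("/data/positions/")

# Derive a 0.1-degree grid cell as a cheap spatial bucket.
bucketed = positions.withColumn(
    "cell",
    F.concat_ws(
        "_",
        F.floor(F.col("lat") * 10).cast("string"),
        F.floor(F.col("lon") * 10).cast("string"),
    ),
)

# Writing partitioned by cell keeps later per-area queries from scanning everything.
bucketed.write.mode("overwrite").partitionBy("cell").parquet("/data/positions_bucketed/")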
2023 is Generative AI's breakout year, with most organizations already adopting Generative AI in their products and services. Nevertheless, the findings show that these are still early days for managing Generative AI–related risks and challenges. In this panel we will talk with our experts about the adoption of Generative AI projects at their companies, the technologies they are using, the skills and competencies they develop in their teams, and the challenges, barriers and risks they see and mitigate.
We all know the hype around large language models like ChatGPT. What if we could bring it securely to our companies?
Bring practical value to daily business processes by combining company data with Generative AI.
Discover how AskYourData can revolutionize access to internal resources. Stop searching through presentations, PDFs and text documents. Simply ask.
eMAG is an online marketplace and e-commerce leader in Eastern Europe headquartered in Romania. As in every modern data-driven company, most of the decisions are based on data provided by the Business Intelligence / Big Data team.
Discover how our team collaborated with Cloudera to revolutionize eMAG's data landscape. How did Big Data technologies enhance our data-driven decision-making processes? What challenges did we face? What lessons did we learn?
An important challenge faced by numerous data-mature organizations is the reliable and efficient deployment of a substantial number of machine learning models.
At FreeNow, we have tackled this issue by developing a pipeline that enables us to deploy over 70 machine learning models.
To accomplish this, we leverage a range of open-source tools, including GitLab CI, MLFlow, BentoML, Docker, and Kubernetes.
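As a rough sketch of the MLflow part of such a setup (hypothetical names, not FreeNow's code), a training job can log and register a model so that a downstream CI job, for example one that builds a BentoML/Docker image, can pick it up by name:

import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy data and model standing in for a real training job.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
model = LogisticRegression(max_iter=200).fit(X, y)

with mlflow.start_run(run_name="demo-training-run"):
    mlflow.log_metric("train_accuracy", model.score(X, y))
    # Registering under a model name lets the deployment pipeline fetch it later.
    mlflow.sklearn.log_model(model, artifact_path="model",
                             registered_model_name="demo-model")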
While almost every digital product claims to be AI powered these days, some do better than others in the market. In this talk we will explore a few schools of thought on development of "AI powered products" through the lens of the user - the human - you and me, who just want to do the best work of their lives.
4 Key Learning Points for Audience:
1. Getting the AI product to be "technically correct" is expensive in multiple ways.
2. People (users) do not like being proven incorrect/wrong.
3. While the power of the product is the tech (artificial intelligence), the perceived value lies in the integrated intelligence, and that’s what makes people love your AI powered product.
4. Being cloud native helps you shorten the path from being right to being loved.
Foundation models as part of the MLOps process. Why and how are they created? How to manage them? What can they give us in the organization? What are transfer learning and fine-tuning? We will try to answer these and other questions in a short presentation.
- Scientists have developed a non-invasive brain decoding system capable of converting stories heard by participants into textual form using their MRI scans.
- In the future, this technology has the potential to help individuals who are mentally aware but physically unable to speak, facilitating communication.
- The model's accuracy continually improves and already has the ability to grasp the overall meaning of a sentence that the participant is hearing.
The selection of managed and cloud-native machine learning services on which you can run your data science pipelines and deploy your trained models is broad. Unfortunately, there is no single way of interacting with platforms like Amazon SageMaker Pipelines, Google Vertex AI Pipelines, Microsoft AzureML Pipelines, Kubeflow Pipelines – or, recently, SQL and Snowpark within the Snowflake data warehouse. In this presentation you will learn how a production-grade MLOps Platform built by GetInData and powered by battle-tested, cloud-native and/or open-source technologies such as Kedro, MLflow, and Terraform can make your data scientists' lives easier and more productive - regardless of which cloud provider you use.
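To make the Kedro part concrete, here is a minimal, hypothetical pipeline definition of the kind such a platform standardizes (node functions and dataset names are illustrative, not GetInData's code):

from kedro.pipeline import Pipeline, node


def preprocess(raw_df):
    # Placeholder cleaning step.
    return raw_df.dropna()


def train_model(features):
    # Placeholder training step; a real node would return a fitted model.
    return {"rows_used": len(features)}


def create_pipeline() -> Pipeline:
    return Pipeline(
        [
            node(preprocess, inputs="raw_data", outputs="features", name="preprocess"),
            node(train_model, inputs="features", outputs="model_metadata", name="train"),
        ]
    )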
Recommender systems are by all means a quite mature topic in the AI/ML community. Nonetheless, they are of a paramount importance for business, bringing direct value for marketing, sales, customer satisfaction to name a few. In the talk I will review the modern approaches to recommender systems from both scientific and business world and point out which algorithms are useful for a particular business problem. Finally, I will present how we approach a less typical recommendation problem we have in Amazon Ring.
At Allegro, we face the challenge of deploying and monitoring machine learning (ML) models built by research engineers and data scientists without extensive application deployment experience. To address this, we leverage the existing DevOps expertise and tools and extend them to support not only microservices but also ML models. In this presentation, we introduce Allegro's ML platform, which enables multiple teams to deploy custom models with low latency, robustness, scalability, and observability. We explore key components, including a custom Feature Store, Dataset Model Repository, and Online Prediction system.
- A story of how we've seized the opportunity to overhaul how we model the data completely.
- I'll share how we deploy dbt workloads in AWS and Databricks.
- I'll outline the decisions and challenges we've faced.
This presentation aims to illuminate the transformative potential of data in enhancing psychotherapy practices, specifically focusing on the optimization of Motivational Interviewing (MI) using Natural Language Processing (NLP). The discussion encompasses the scope of CareOps and delves into its application within "session intelligence", data usage & tagging, the utility of large language models (LLMs), and real-life implications of these advancements on therapy dynamics.
Outline:
1. Introduction to CareOps: A brief overview of CareOps with a specific focus on session intelligence and the role of data in enhancing the psychotherapeutic process.
2. Motivational Interviewing (MI) Explained: Understand the essence of MI and its significance within the field of psychotherapy.
3. The Intersection of MI and NLP: Explore the types of data used, and the various LLMs we experimented with to optimize our classification (and text generation) capabilities, contributing to a better understanding of complex psychotherapeutic concepts; I will delve into which of the processes, language model architectures, input types and linguistic contexts seem to work best, and the various "why's" - both from technical standpoint and the theoretical one.
4. NLP and LLM Enhancements: Delve into the various NLP "tricks" implemented on the models and the complex process analysts undertake in data tagging to improve abstract understanding in psychotherapy.
5. Impacts and Outcomes: Discuss the effectiveness of these models in capturing intricate therapy dynamics, and their potential in real-time improvement of therapeutic outcomes and behaviors in real-world settings.
During our presentation we would like to share how our organisation is using the power of event-driven architecture in a Data Mesh implementation. Everything is based on a modern, cloud-based tech stack with Kafka (MSK) at its core. Our company consists of many distributed systems. Data isolation and the lack of a common understanding of the Data Domains crucial for our business are the challenges we are solving thanks to Data Mesh principles. Our Data Platform is fully self-service: with CI/CD based on SBT and Terraform we create all the necessary components, allowing Data Owners to easily produce their data (and document its schema and meaning) on one end, and Data Consumers (Operations, Data Science, Analytics) to integrate on the other. We are happy to show how you can automatically create an e2e Data Mesh flow with Kafka topics, schema registration and the necessary connectors populating the analytical layer.
#Kafka #EventDrivenArchitecture #EventSourcing #MetadataDrivenAutomation #DataMesh #DataGovernance #DataDemocratisation #AWS #CI/CD
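As a hedged sketch of the producer side of such a flow (topic name, event fields and brokers are made up for illustration, and the real setup uses MSK with a schema registry), a data product can publish events to its Kafka topic like this:

import json
from kafka import KafkaProducer  # kafka-python client

producer = KafkaProducer(
    bootstrap_servers=["localhost:9092"],
    value_serializer=lambda event: json.dumps(event).encode("utf-8"),
)

event = {
    "event_type": "order_created",
    "order_id": "12345",
    "amount": 99.90,
}

# Keying by order_id keeps all events for one order in the same partition.
producer.send("orders.data-product.v1", key=b"12345", value=event)
producer.flush()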
Onboard the LLM hype-train! 🚂 Are you curious about what LLMs (Large Language Models) can do for you? One of their use cases is to enrich datasets by converting text into structured data. This can have huge benefits. Useful facts that were previously hidden inside a large piece of text can now be unveiled, allowing your customers to more accurately filter and query your data. Let's explore how this can affect house-search.
In this talk, we will discover together 🚀:
- The current state of LLMs
- Use case: a housing dataset with mixed structured/unstructured data
- Choosing an LLM (Microsoft Azure / Google (GCP) / open-source offerings)
- Prompt engineering
- Validating LLM output (LangChain, JSON) (see the sketch after this list)
- Conclusion & results
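A minimal sketch of the extraction-and-validation idea from the outline above: prompt the model for JSON only and check the result before it enters the dataset. The call_llm function below is a stand-in for whichever Azure, GCP or open-source model is chosen; field names are hypothetical:

import json

PROMPT_TEMPLATE = """Extract the following fields from the house listing below.
Answer with JSON only, using keys: bedrooms (int), garden (bool), energy_label (str).

Listing:
{listing}
"""

REQUIRED_KEYS = {"bedrooms", "garden", "energy_label"}


def call_llm(prompt: str) -> str:
    # Stand-in for a real provider call; replace with your LLM client of choice.
    raise NotImplementedError


def extract_structured(listing: str) -> dict:
    raw = call_llm(PROMPT_TEMPLATE.format(listing=listing))
    data = json.loads(raw)  # fails fast if the model did not return valid JSON
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"LLM output is missing keys: {missing}")
    return data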
Explore the convergence of two data realms—SQL and Python, the data scientist and the data engineer—within the landscape of BigQuery. In this presentation (and demo!), we'll delve into a real-life use case unveiling the simplicity of polyglot data pipelines while maintaining the crucial aspects of data governance, version control, lineage, and quality tests.
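A small sketch, assuming the google-cloud-bigquery client and placeholder project and table names, of the "SQL from Python" half of such a polyglot pipeline: run a query in BigQuery and hand the result to downstream Python steps:

from google.cloud import bigquery

client = bigquery.Client(project="example-project")  # hypothetical project id

query = """
    SELECT user_id, COUNT(*) AS events
    FROM `example-project.analytics.events`
    GROUP BY user_id
"""

# The query runs in BigQuery (SQL); the result lands in pandas for Python steps.
df = client.query(query).to_dataframe()
print(df.head())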
Why is business analytics done by a business user so difficult and yet so important?
How does AI and NLP facilitate analytics and what are the implications?
An example of business analytics based on Snowflake + dbt + ThoughtSpot
Data is the oil of all modern companies; however, consuming metadata coming from different corners of the company and sharing it with colleagues is still a challenge, as part of the Data Factory and Data Mesh concepts under the Data Governance Universe umbrella.
The aim of the presentation is to show how, using modern technologies, Data Products may be created and data transparency assured with the highest quality from a business and regulatory perspective.
This presentation shows a conceptual roadmap towards truly large-scale and real-time AI in the cloud. We show our approach towards an AI copilot for data scientists working with behavioral event data. Current problems, mainly connected with scalability and technical complexity of modeling tools, lead to suboptimal use of data (e.g. usage of static, stale feature stores which are rarely updated). We propose a next-gen approach where the hardest parts of the data modeling pipeline are automated and fresh data is used continuously, due to adequate design of both algorithms and cloud integration tools.
Presentation outline:
1. How to use AI to predict your customers' behavior from event stream data – real-life approaches and problems, including current hot topics such as uplift, churn, propensity modeling and recommender systems.
2. Next steps on the road towards large-scale behavioral modeling: How to create AI that really works on massive datasets in the cloud (on the example of Synerise Monad platform). Smart and scalable foundation models that train and make predictions in the cloud.
3. Introduction to Cleora and EMDE, our open-source tools for real time multimodal modeling. These algorithms use graph theory and manifold learning to transform streams of event data into behavioral profiles of customers.
When Amazon launched Alexa in 2014, we made a promise to our customers that it would get better every day. To deliver on that promise, we need to measure the impact of new products and features on Alexa usage, to understand what drives customer satisfaction. Doing this through random experiments is not always feasible or desirable. To help with that, we developed a customer matching model that allows our teams to create synthetic control groups from the existing user base. We use a concept similar in spirit to the widely used propensity score matching (PSM) method; however, instead of estimating propensity for treatment, we predict future engagement with Alexa during the next week. In our talk we will tell you how our model manages to turn a broad spectrum of customer features into actionable predictions. We will also tell you how we moved from a solution that required tangible engineering skills and partially local execution to a cloud-based self-service tool. We also built a robust data pipeline that improved reliability and scalability and allowed us to improve the accuracy of our predictions with more advanced and compute-intensive machine learning techniques. Come and see how real-life marketing needs can be addressed with data science and cloud engineering!
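A hedged sketch of the matching idea described above (not Amazon's actual model): score every customer on predicted next-week engagement, then pair each treated customer with the closest-scoring untreated customer to form a synthetic control group. The data here is synthetic and purely illustrative:

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 5))      # customer features
engagement = rng.poisson(3, size=1000)     # next-week usage (label)
treated = rng.random(1000) < 0.1           # customers exposed to the new feature

# Predict future engagement for everyone.
model = GradientBoostingRegressor().fit(features, engagement)
scores = model.predict(features).reshape(-1, 1)

# Match each treated customer to the untreated customer with the closest score.
untreated_idx = np.where(~treated)[0]
nn = NearestNeighbors(n_neighbors=1).fit(scores[untreated_idx])
_, matches = nn.kneighbors(scores[treated])
control_idx = untreated_idx[matches.ravel()]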
During this talk, I’d like to share my experience and learnings from building a cloud-based ML platform for developing and operating ML models and products at scale.
We will discuss integrating native, cloud-based solutions with the existing company’s ecosystem and leveraging the best of two worlds. And all that while shortening time to value, saving infrastructure costs, and reducing technology (and carbon!) footprint.
This presentation provides a brief summary of data analysis issues specific to the automotive industry, as well as solutions capable of handling them. In particular, the following topics will be presented:
* Analysis of sales data where each product has its unique identity and configuration
* ML aided prediction of optimal product portfolio
* Connected cars - cars in IoT world
In addition, the AWS-based VGP data lake will be briefly described as the technical tool capable of solving the above business challenges.
This workshop will introduce you to the latest Google Cloud Platform services in the field of artificial intelligence. You'll learn about Document AI - a solution that enables the extraction, analysis, and processing of various types of documents (text, forms, documentation, invoices, prescriptions, and more). Additionally, you'll explore the capabilities of Generative AI for creating your own conversation models using the Vertex AI PaLM API. We'll also present the capabilities of Duet AI - a solution that can write emails for you, prepare presentations, summarize meetings, and even write code for your applications!
This workshop is primarily aimed at technical practitioners (programmers, application developers, ML engineers, data analysts) but also at business owners who want to learn how to take their organization to the next level. In short - for anyone who believes that artificial intelligence can make a real difference in their daily work.
During DATAMASS there will be 3 parallel workshops (paid separately), each for a maximum of 25 participants
When/Where? September 29, Conference Center of Museum of The Second World War, Gdańsk
From 9 a.m. to 4 p.m. (including lunch and coffee breaks), onsite
DETAILS:
In this one-day workshop, you will learn how to create modern data transformation pipelines managed by dbt and orchestrated with Apache Airflow.
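As a small sketch of the pattern the workshop covers (paths and the schedule are placeholders, not the workshop materials), an Airflow DAG can run and test dbt models on a schedule:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="dbt_daily_transformations",
    start_date=datetime(2023, 10, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="dbt run --project-dir /opt/dbt/project",   # hypothetical path
    )
    dbt_test = BashOperator(
        task_id="dbt_test",
        bash_command="dbt test --project-dir /opt/dbt/project",
    )

    dbt_run >> dbt_test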
SESSION LEADER:
DETAILS:
The subject of this workshop is real-time data analysis using Spark Streaming. We'll cover how Spark streaming works and how it can be used in machine learning systems.
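A minimal sketch of the Structured Streaming style of processing the workshop deals with, using a local socket source and console sink as placeholders for real sources such as Kafka:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("streaming-demo").getOrCreate()

# Read an unbounded stream of text lines (a socket here; Kafka in real systems).
lines = (spark.readStream
         .format("socket")
         .option("host", "localhost")
         .option("port", 9999)
         .load())

# Classic running word count as the streaming transformation.
counts = (lines
          .select(F.explode(F.split(F.col("value"), " ")).alias("word"))
          .groupBy("word")
          .count())

query = counts.writeStream.outputMode("complete").format("console").start()
query.awaitTermination()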
SESSION LEADERS:
DETAILS:
In this one-day workshop, you will learn how to operationalize Machine Learning models using popular open-source tools like Kedro and Kubeflow, and deploy them using cloud computing.
SESSION LEADERS:
DataMass strives to provide the best service possible with every contact!
BECOME A DATAMASS GDANSK SUMMIT PARTNER!
m: +48 509 622 541
e: mariola.rauzer@evention.pl
m: +48 604 112 883
e: dominika.opoka@evention.pl
ADDRESS:
Conference Center of Museum of The Second World War
pl. Bartoszewskiego 1
80-862 Gdańsk