DataMass Gdańsk Summit 2018
What is DataMass Gdańsk Summit?
It’s not just another conference. It’s an event created with passion – and targeted at those who use Big Data in practice in their everyday work. The main idea behind the conference is to promote knowledge and experience in designing and implementing tools for the analysis of big data volumes.
We connect the people who care
We believe that building a community of dedicated experts will help us share knowledge and exchange experience in shaping scalable and distributed computing solutions.
We value real-life expertise – is there any better way to learn than from actual specialists? Our speakers, true practitioners from top data-driven companies, will show you what they have learned and discovered about the Big Data world.
Answering your needs
Playing an active role in the Big Data world, we have seen that Poland needed a large technical conference focused on the exchange of knowledge and experience in this field. The aim of DataMass Gdańsk Summit is to create synergy between businesses that design and implement enterprise-class solutions and the experience and knowledge of the academic community. The Data Science meetup in Gdańsk is tangible proof that this is possible.
Big Data, Big Responsibility | 9:30-10:00
Jakub Szamałek, Author & Video Game Writer

We are at the cusp of a technological revolution brought about by Big Data and Artificial Intelligence (AI). The novel ways of gathering, analyzing and interpreting data offer great advances in productivity and will likely reshape the global economy. When properly harnessed, Big Data can be a force for good; poignant examples include healthcare and crisis management.

However, there is a darker side to this technological advancement. Big Data can also be used to control and manipulate people on an unprecedented scale, whether for pecuniary or political goals. It is already exacerbating economic inequalities, at both a local and a global scale, by greatly multiplying the wealth of the already affluent and affecting global employment patterns. Furthermore, it can entrench existing biases and solidify unjust power structures. Data scientists and AI engineers are set to reshape the world. This is a boon and a privilege, but also a tremendous responsibility. This talk aims to highlight the potential pitfalls of the ongoing technological revolution and encourage attendees to actively grapple with the ethical dilemmas it poses.
Modern Stream Processing Engines Compared — Kafka Streams vs. Spark Structured Streaming | 10:10-10:50
Jacek Laskowski, IT Consultant

There is quite a bit to learn about any stream processing engine, but at a reasonably high level they are actually very similar and have a lot in common. They all offer not only a high-level stream processing API for describing distributed streaming dataflows, but also a low-level API for more sophisticated streaming topologies. Each engine translates the dataflow description into its internal runtime representation; that is where the differences lie, and that is where we will be looking.

This talk compares two modern stream processing engines: Kafka Streams and Spark Structured Streaming. We will talk about their internals and how the engines manage stateless and stateful streams. You will learn about their similarities and differences, which should shed more light on the question of when to use which engine.
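The stateless/stateful distinction mentioned in the abstract can be illustrated with a minimal, framework-free sketch in plain Python. This is not the Kafka Streams or Spark API; all function names here are invented for illustration only:

```python
# Sketch of the two operator kinds a stream processing engine manages.
# A stateless operator transforms each record independently; a stateful
# operator keeps state across records (which a real engine must also
# partition by key and checkpoint for fault tolerance).

def stateless_uppercase(stream):
    """Stateless: each record is mapped on its own, no memory needed."""
    for record in stream:
        yield record.upper()

def stateful_word_count(stream):
    """Stateful: a running per-key count survives across records."""
    counts = {}
    for word in stream:
        counts[word] = counts.get(word, 0) + 1
        yield word, counts[word]  # emit the updated count downstream

events = ["spark", "kafka", "kafka", "spark", "kafka"]
print(list(stateless_uppercase(events)))
print(list(stateful_word_count(events)))  # last record: ("kafka", 3)
```

How the engines shard, store and recover that `counts` state is exactly where the internal runtime representations of Kafka Streams and Spark Structured Streaming differ.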
Where streaming meets batch: Data Integrations in the real world | 11:00-11:30
Emily Sommer, Etsy

Most third-party APIs still only support batch updates, so how do you leverage the capabilities of streaming internal data? We have used some great new features of Kafka Streams to limit the load of hydration requests on our internal databases, while writing our own streaming batch implementation to send updates to external partners (Google, Facebook) only at rates they support. This helps us deliver more up-to-date data to our third parties with less strain on our production databases.
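One way to bridge a record-at-a-time stream to a batch-only partner API can be sketched in plain Python. This is an illustrative assumption about the pattern, not Etsy's implementation; the batch size and flush logic are invented for the example:

```python
# Sketch: adapting a stream of (key, value) updates to a batch-only API.
# Updates are buffered per key, so a later update overwrites an earlier
# one and each flush sends only the freshest value, in chunks no larger
# than what the downstream partner accepts.

def batch_updates(stream, max_batch_size):
    """Collapse a stream of (key, value) updates into capped batches."""
    pending = {}
    for key, value in stream:
        pending[key] = value          # keep only the latest value per key
        if len(pending) >= max_batch_size:
            yield dict(pending)       # hand one full batch to the batch API
            pending.clear()
    if pending:                       # flush whatever remains at the end
        yield dict(pending)

updates = [("sku1", 10), ("sku2", 5), ("sku1", 7), ("sku3", 2)]
print(list(batch_updates(updates, max_batch_size=2)))
# → [{'sku1': 10, 'sku2': 5}, {'sku1': 7, 'sku3': 2}]
```

A production version would also flush on a timer, not only on size, so a quiet stream still delivers its pending updates.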
Introducing Cloudera Data Platform (CDP), the industry's first enterprise data cloud | 11:40-12:10
Balazs Gaspar, Cloudera

Cloudera Data Platform (CDP) combines the best of Hortonworks' and Cloudera's technologies to deliver the industry's first enterprise data cloud. CDP delivers powerful self-service analytics across hybrid and multi-cloud environments, along with the sophisticated and granular security and governance policies that IT and data leaders demand.
Serverless – the next big thing in data processing | 12:20-12:40
Tomasz Sosiński, Scalac

For a number of years, the microservices architecture has been growing into a silver bullet for all the pains of the IT industry (some still believe it is one), but it has never become fully utilized in the data processing domain. Now, however, a new buzzword is emerging from the ashes of the expectations attached to microservices; its name is Serverless. Is it yet another iteration of the same pattern, or the solution that brings enlightenment to the whole IT field? Is it better suited to data processing than old, rusty microservices? This talk explores serverless architecture in the data-oriented world and, hopefully, answers some of these burning questions.
Big Data simulation and analysis of numerical solutions of PDEs | 12:50-13:20
Roman Khotyachuk, NORCE Norwegian Research Centre AS

In this work, the d3f software is used for numerically solving the PDEs describing the Elder problem. The author adapted this software to the conditions of a Spark cluster, which allowed massively parallel runs of d3f as well as efficient analysis of the resulting data using Big Data technologies. Some important run metrics and estimates are calculated.

The author explored the data within a broad range of Rayleigh numbers (Ra) with different grid levels, time steps and simulation times. Ra sub-ranges containing the bifurcation points are explored in more detail, the conditional probabilities of the steady states are estimated, and a method for automatic recognition of steady states is described. The author also presents an approach for building a predictive model for the Elder problem.
Lunch | 13:30-14:30
3 paramount drivers of AI projects | 14:40-15:10
Kamil Folkert, PhD, 3Soft

Artificial Intelligence offers a wide spectrum of possibilities that can be used not only in commercial projects but also in scientific initiatives and in building new, innovative products. But what are the drivers? Kamil Folkert, PhD, will talk about the most important factors in AI projects.
Optimizing Speech-Input Length for Speaker-Independent Depression Classification | 15:30-15:50
Tomasz Rutowski, Ellipsis Health

Machine learning models for speech-based depression classification offer promise for health care applications. Despite growing work on depression classification, little is understood about how the length of speech input impacts model performance. We analyze results for speaker-independent depression classification using a corpus of over 1400 hours of speech from a human-machine health screening application. We examine performance as a function of response input length for two NLP systems that differ in overall performance. Results for both systems show that performance depends on natural length, elapsed length, and ordering of the response within a session. The systems share a minimum length threshold, but differ in a response saturation threshold, with the latter higher for the better system. At saturation it is better to pose a new question to the speaker than to continue the current response. These and additional reported results suggest how applications can be better designed to both elicit and process optimal input lengths for depression classification.
Smart Cities Data – Discovering the true potential of IoT street observations with Azure | 16:00-16:30
Mateusz Zając, Kainos

Instrumenting the city is vital to preserving and enhancing urban living across the globe, and investment in Smart Cities is set to leap forward in the next few years, to an estimated $600bn globally by 2027. As we collect, share and monetize data from the urban environment, there are fascinating technical challenges involved in building the associated data pipelines, from performance techniques to data anonymization. Kainos is investing in innovation in this area, and this session will walk through some of our findings.
AI’s not as black as it is painted | 16:30-17:00
Michał Zyliński

As with any new technology, artificial intelligence brings a new set of challenges. Fortunately, at least some of those fears can and should be mitigated already. During this session I will try to debunk some of the common threats and share (technical) solutions to others. Humanity’s future is bright!
In the service of history: AI in archivistics | 10:10-10:50
Adrian Boguszewski & Natalia Ziemba-Jankowska, Linux Polska

1839 is the date generally accepted as the birth year of practical photography. Since then, mankind has produced about 10 quadrillion photos, including 1 quadrillion in the last year alone. This huge amount of unlabeled and undescribed data is a problem if we want to obtain important information quickly and efficiently. Old photos are extremely valuable because they contain a lot of data about the past, yet some expertise and experience are needed to describe such images properly. What if we put all this knowledge into neural networks? Can AI become a friend of the 21st-century archivist? Let’s talk about automatic image tagging and face recognition in old photos.
How to expand a Data Team from 8 to 22 members in one year and not go crazy | 11:00-11:30
Piotr Rosiak, AirHelp

What is the role of analytics in start-up success? How do you build a proper data-driven culture in a dynamic environment? What are the different setups for data teams?

Together, we explored and formed different analytical roles: analyst, business analyst, data scientist, data engineer and data quality specialist. In this presentation, we are going to share the lessons learned from this exciting journey of building a Data Science team: all the challenges we faced, as well as all the achievements.
How to efficiently use a huge satellite imagery dataset with Machine Learning | 11:40-12:10
Damian Rodziewicz, Appsilon

This talk is about the new possibilities arising from analyzing satellite imagery. Satellite data changes the game, as it allows us to travel in time and reach information otherwise unavailable to business. Combined with advances in image recognition and computing power, satellite data analysis offers possibilities to automate or streamline processes and design better products. Satellite data is huge and non-obvious; thanks to currently available technologies, you can access it, build forecasts and observe events that were undetectable before. I will show you what deep learning on satellite images makes possible, and how our data science department has been successfully working with satellite data to build decision support systems for business.
How Big Data, AI & Blockchain can have a positive impact on emerging economies by adding validation and value to their democratic processes | 12:20-12:40
Ammar Akhtar, Final Rentals

Countries like India, Pakistan and similar Asian economies have a lot of talent, land and resources, but they still do not perform to expectations on a global scale. The reason is a lack of education and healthcare and, last but not least, corruption. These countries can flourish and prosper by adopting Blockchain and AI in their governmental, democratic and administrative procedures. How is this possible, and what could the roadmap be? And how can companies that offer these services benefit from it?
Introducing Data Science teams to Big Data environments | 12:50-13:20
Konrad Słoniewski, Atos

Every Data Scientist should know the theoretical foundations behind statistical models, their advantages and disadvantages, and the ways of using them to solve various types of problems. Analyzing data on a single computation unit is not a problem for a typical analyst. Big Data environments, however, have their own traps: models trained and run in distributed environments must meet a much wider range of criteria. In this talk I will present the path a typical Data Scientist follows to learn and understand Big Data environments and how to use them efficiently. I will compare tools such as Spark, Dask, TensorFlow and Ray, and show how knowing them can help an ordinary Data Scientist become a Big Data Scientist. This talk is dedicated both to Big Data engineers who are responsible for running statistical models prepared by Data Science teams and to Data Scientists themselves who want to understand how to build solutions that operate efficiently in distributed environments.
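The gap between single-machine analysis and distributed execution can be sketched with stdlib Python only: a groupby-style sum computed in one pass over all the data, versus the same answer assembled from per-partition partial aggregates, which is the shape engines like Spark or Dask give such a query. A toy illustration under stated assumptions, not any engine's real code:

```python
from collections import Counter

def local_sum_by_key(rows):
    """Single-node style: one pass over the whole dataset."""
    totals = Counter()
    for key, value in rows:
        totals[key] += value
    return dict(totals)

def distributed_sum_by_key(partitions):
    """Distributed style: aggregate each partition independently
    (the per-worker map/combine step), then merge the partial
    results (the reduce step)."""
    partials = [local_sum_by_key(part) for part in partitions]
    merged = Counter()
    for partial in partials:      # Counter.update adds counts together
        merged.update(partial)
    return dict(merged)

rows = [("a", 1), ("b", 2), ("a", 3), ("b", 4)]
partitions = [rows[:2], rows[2:]]
assert local_sum_by_key(rows) == distributed_sum_by_key(partitions)
print(distributed_sum_by_key(partitions))  # → {'a': 4, 'b': 6}
```

The traps the abstract alludes to live in the second function: only aggregations that decompose into mergeable partials can be pushed down to workers this way, which is why distributed models must meet criteria a laptop analysis never hits.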
Lunch | 13:30-14:30
Robot technologies that will improve the quality of life over the next 5 years | 14:40-15:10
Anton Holovachenko, UniExpo

Robotics and exoskeletons can improve our lives over the next 10 years. How does robotics already help us today? Which robotic technologies will improve the quality of life over the next 5 years? And how might the world of robotics and exoskeletons change in 10 years, with a positive impact on people’s lives?
A story of a model | 15:30-15:50
Dmytro Tkanov, SciForce/EyeAI

In this talk we will follow a machine learning (ML) model’s lifecycle, using a chessboard position recognition system as an example. We will discuss common tradeoffs and pitfalls encountered during the data collection, training, deployment and live running phases. The talk will mostly focus on a custom object detection model and computer vision techniques; occasionally we will peek into other domains, e.g. spoken language processing, for complementary examples.

The narration will be centered on the Python/TensorFlow ecosystem, yet the described choices are general enough to be applicable to other frameworks.

This talk may be interesting for business owners considering the risks of incorporating deep learning tech into their product, and for ML engineers looking for model training tips.
Improving Demand Forecasting with Artificial Intelligence: a Practical Case Study | 16:00-16:30
Alexey Shaternikov, DSLAb

Despite a wealth of well-established forecasting approaches, retailers lose billions of dollars worldwide each year to overstocks and out-of-stocks. AI, however, can strike a balance between the two and help predict demand as accurately as possible. This presentation reveals a case study of an AI-based demand forecasting solution for Lenta, one of the world’s biggest retailers and the second-largest retail chain in Russia. We will share the results, challenges and pitfalls of introducing an automated forecasting system into existing business processes.
Predictive Maintenance, or AI in industry | 16:30-17:00
Łukasz Grala & Natalia Szóstak, TIDK

Along with the automation of production, we have gained a new, valuable source of data. Collecting, storing and analyzing data is slowly becoming the new standard; this phenomenon has been hailed as the fourth industrial revolution. Undoubtedly, it is one of the signs of global progress. However, production automation also brings new challenges. Due to the complexity and scale of production processes, artificial intelligence becomes indispensable for analyzing the multitude and variety of data. Predicting failures is one of the key tasks the industry faces today. Predictive Maintenance helps estimate when maintenance should be performed, based on the actual condition of the equipment. AI has a lot to offer in this area.
We operate as a company based on trust, which we achieve through communication and experienced support.

Want to become a speaker or support the conference? We are open to cooperation, so just drop us a line! Our entire team will ensure you receive the best support and information possible.
Centrum Stocznia Gdańska
pedestrian entry:
Wałowa Street 27a
car entry:
Lisia Grobla Street