BUSINESS CASE

A datalake in 3 months

Challenge: a custom-made datalake

Thanks to its hypergrowth, this future French unicorn in the tourism sector saw its data volumes grow to 1.5 TB in 2019. Analysts needed more than a week to deliver and present BI reports to decision boards, with no certainty that the numbers shown were accurate. We implemented a datalake that delivers aggregated data collections matching identified business needs, so that analysts can now deliver reports in 4 hours.

The startup's selling point is custom-made, unique trips for travellers. To deliver them, it relies on large data volumes from various sources: exchanges between agents and travellers, payment data, user data, and website session data.

300,000

Travellers

160

Destinations

2009

Launch

Our impact

Divided by 8

BI report production time

7

Aggregated sources

Quotes

Sicara is a driving force on technical issues: they helped us choose the technology stack that fit our business ambitions and shared their ETL best practices. Furthermore, Sicara's team integrated perfectly into our in-house data and business teams and helped us implement lean methodology.

Johann S. Lead Data Engineer

We developed a custom-made datalake in 3 months

Delivery of raw data and aggregated data collections

We developed a datalake that delivers both historical data and up-to-date data in real time. This data is either stored raw or aggregated from 7 different sources, in response to pre-identified business needs. First, product and marketing teams use raw data (website user-session data, client recommendations, sales data) to enhance the user experience. Second, BI teams rely on data aggregated according to specific business rules, which lets them deliver BI reports to the executive team and internal teams in less than 4 hours (compared to 1 week beforehand).
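
To make "aggregated according to specific business rules" concrete, here is a minimal, hypothetical sketch in Python with psycopg2 of what one such aggregation job could look like. The table and column names (raw_sales, bi_daily_sales) and the daily granularity are illustrative assumptions, not details from the actual project.

# Hypothetical sketch of one business-rule aggregation feeding a BI collection.
# Assumes a unique constraint on (day, destination) in bi_daily_sales.
import psycopg2

AGGREGATION_SQL = """
    INSERT INTO bi_daily_sales (day, destination, bookings, revenue)
    SELECT
        date_trunc('day', booked_at) AS day,
        destination,
        count(*)                     AS bookings,
        sum(amount)                  AS revenue
    FROM raw_sales
    WHERE booked_at >= current_date - interval '1 day'
    GROUP BY 1, 2
    ON CONFLICT (day, destination) DO UPDATE
        SET bookings = EXCLUDED.bookings,
            revenue  = EXCLUDED.revenue;
"""

def refresh_daily_sales(dsn: str) -> None:
    """Recompute yesterday's aggregates so BI reports stay current."""
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(AGGREGATION_SQL)

if __name__ == "__main__":
    refresh_daily_sales("postgresql://localhost/datalake")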

How we built the datalake

A custom-made ETL

We implemented an EL-ETL pipeline that organized 1 TB of historical data: more than 100 billion documents aggregated into 22 collections. Once aggregated, these collections are provisioned in real time by a dual system of 20 RabbitMQ workers that manage incoming data and 21 RabbitMQ workers that update it. Furthermore, the data engineering team reworked the PostgreSQL architecture to make it fluid and scalable, so it can adapt to the startup's ever-changing needs.
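
For illustration, here is a minimal sketch of what one such real-time RabbitMQ worker could look like, using pika and psycopg2. The queue, table, and field names are placeholders, not the project's actual schema.

# Hypothetical sketch of a RabbitMQ worker that provisions an aggregated
# collection in real time. Assumes a unique constraint on traveller_id.
import json

import pika
import psycopg2

conn = psycopg2.connect("postgresql://localhost/datalake")

def handle_message(channel, method, properties, body):
    """Upsert one incoming document into its aggregated collection."""
    doc = json.loads(body)
    with conn.cursor() as cur:
        cur.execute(
            """
            INSERT INTO traveller_sessions (traveller_id, payload)
            VALUES (%s, %s)
            ON CONFLICT (traveller_id)
            DO UPDATE SET payload = EXCLUDED.payload
            """,
            (doc["traveller_id"], json.dumps(doc)),
        )
    conn.commit()
    # Acknowledge only after the upsert is committed, so no message is lost.
    channel.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="traveller_sessions", durable=True)
channel.basic_consume(queue="traveller_sessions", on_message_callback=handle_message)
channel.start_consuming()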

Our team specialized in ETL

Total integration with the customer

To set up the datalake, we integrated a team of 4 Sicara data engineers into the Evaneos data team. We also supported the executive committee (CoDir) on data strategy, with a Product Manager from Sicara piloting the project on site.


Related articles written by Sicara data scientists

Automate AWS Tasks Thanks to Airflow Hooks

This article is a step-by-step tutorial showing how to upload a file to an S3 bucket with an Airflow ETL (Extract, Transform, Load) pipeline.
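
As a taste of the pattern that article walks through, here is a hedged sketch of an S3 upload through Airflow's S3Hook, meant to run inside a PythonOperator task. The connection id, bucket, and file paths are placeholders.

from airflow.providers.amazon.aws.hooks.s3 import S3Hook

def upload_report_to_s3() -> None:
    # S3Hook reads credentials from the Airflow connection "aws_default".
    hook = S3Hook(aws_conn_id="aws_default")
    hook.load_file(
        filename="/tmp/report.csv",        # local file produced by an upstream task
        key="reports/report.csv",          # destination key inside the bucket
        bucket_name="my-datalake-bucket",  # placeholder bucket name
        replace=True,                      # overwrite if the key already exists
    )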

How Apache Airflow Distributes Jobs on Celery Workers

The life of a distributed task instance

How to Get Certified in Spark by Databricks?

This article aims to prepare you for the Databricks Spark Developer Certification: register, train and succeed, based on my recent experience.