Data Pipeline for Near Real Time Analytics at Large Scale – Part 2
In my previous post, I explained the various components of a data pipeline. In this post, I'll walk you through the use case we solved by implementing a high-throughput data pipeline for user segmentation. This project was built at the Info Edge Hackathon 2019, where a couple of my teammates and I created an end-to-end tool for user segmentation, exploration and targeting in near real time. With this project we tried to solve common marketing analytics problems in the industry.
Idea and Motivation
Data analytics is the foundation of marketing. Collecting, tracking and analysing data, and applying the results to differentiate how customers engage with your product, is a vital challenge that every product owner wishes to overcome. People use multiple tools for analytics, RoI (Return on Investment) calculation and targeting across their properties. We started with a very simple idea: allow marketers or product managers to visualise their user base across key dimensions, create custom funnels while exploring users, and target them with personalized messages on the go. In a nutshell, we tried to solve the problem of Who, When, What and Where.
Solution and Architecture
The solution comprised three parts:
- Tracker script: supports sending events and profiles, and can be integrated into any website.
- Data pipeline: collector, events consumer and aggregation services (see the sketch after this list).
- Dashboard for user segmentation and exploration: this interface lets you view users' behaviour across some key dimensions, slice and dice them further across other dimensions in real time, and target users across multiple communication channels (currently e-mail and SMS). Done manually, this job might take hours or even days of data processing.
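To make the event flow concrete, below is a minimal sketch of how the collector side of such a pipeline could look: an HTTP endpoint that accepts events from the tracker script and publishes them to Kafka for the events consumer to pick up. The endpoint path, topic name and payload fields are assumptions for illustration, not the exact names used in the project.

```python
# Minimal collector sketch, assuming a Kafka-backed pipeline.
# The /collect endpoint, "tracked_events" topic and payload fields
# are hypothetical placeholders for illustration only.
import json

from flask import Flask, request, jsonify
from kafka import KafkaProducer

app = Flask(__name__)

# Serialise each event dict to JSON before publishing to Kafka.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

@app.route("/collect", methods=["POST"])
def collect():
    event = request.get_json(force=True)
    # Basic validation: every tracked event needs a user id and an event name.
    if not event or "user_id" not in event or "event_name" not in event:
        return jsonify({"status": "rejected"}), 400
    # Hand the event off to the events consumer via Kafka.
    producer.send("tracked_events", value=event)
    return jsonify({"status": "accepted"}), 202

if __name__ == "__main__":
    app.run(port=8080)
```

The collector stays deliberately thin so it can absorb a high write throughput; the heavier work happens downstream in the events consumer and aggregation services.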
High Level Use Case Diagram of the Solution
Architecture Diagram
Using the dashboard for analytics, exploration and targeting
We are going to walk through exploration and targeting for the following flow:
- Step 1: Apply the engagement dimension to all users.
- Step 2: Explore users with medium-high engagement on the month-wise acquisition dimension.
- Step 3: Apply a custom funnel to users who were acquired in May 2019. The funnel applied has three levels:
- Level 1: Users who have started a session
- Level 2: Users who have performed click_event1
- Level 3: Users who have performed custom_event2
- Step 4: Send an SMS to users who dropped off at level 2 (a minimal sketch of this funnel computation follows the list).
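To illustrate Steps 3 and 4, here is a minimal sketch of the funnel computation, assuming each user's events for the selected segment (acquired in May 2019, medium-high engagement) have already been aggregated into a set of event names. The users_events structure, the session_start event name and the toy data are hypothetical, and "dropped off at level 2" is interpreted here as users who completed level 2 but never reached level 3.

```python
# Funnel sketch for Steps 3 and 4: find users who completed level 2
# (click_event1) but not level 3 (custom_event2) and queue an SMS for them.
from typing import Dict, Set, List

# "session_start" is an assumed event name; click_event1 / custom_event2 come
# from the funnel definition above.
FUNNEL_LEVELS: List[str] = ["session_start", "click_event1", "custom_event2"]

def users_at_level(users_events: Dict[str, Set[str]], level: int) -> Set[str]:
    """Users who performed every funnel event up to and including `level` (0-indexed)."""
    required = FUNNEL_LEVELS[: level + 1]
    return {
        user_id
        for user_id, events in users_events.items()
        if all(event in events for event in required)
    }

def level2_dropouts(users_events: Dict[str, Set[str]]) -> Set[str]:
    """Users who completed level 2 (click_event1) but never did level 3 (custom_event2)."""
    return users_at_level(users_events, 1) - users_at_level(users_events, 2)

if __name__ == "__main__":
    # Toy data: "u2" stops after click_event1, so only "u2" should get the SMS.
    sample = {
        "u1": {"session_start", "click_event1", "custom_event2"},
        "u2": {"session_start", "click_event1"},
        "u3": {"session_start"},
    }
    for user_id in sorted(level2_dropouts(sample)):
        print(f"queue SMS for {user_id}")  # placeholder for the real SMS channel
```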
Step-by-Step Demo of the Solution We Built
Step 1:
Step 2:
Step 3:
Step 4:
The fact that we were able to build this in 24 hours was in itself an achievement for me. During those 24 hours there were moments of joy, anger and frustration. But that's what a hackathon is about: you propose a solution and grind yourself to build it.
I hope this also helps you figure out use cases that you can solve for yourself and for your organization. Please drop a comment if you have any doubts or want to collaborate on a cool data engineering project.
Happy Coding…!