Data Pipeline for Near Real Time Analytics at Large Scale – Part 2

In my previous post I explained the various components of a data pipeline. In this post I’ll walk you through the use case we solved by implementing a high-throughput data pipeline for user segmentation. This project was built at the Info Edge Hackathon 2019. A couple of my teammates and I built an end-to-end tool for user segmentation, exploration and targeting in near real time. With this project we tried to solve marketing analytics problems faced in the industry.

Idea and Motivation

Data analytics is the foundation of marketing. Collecting and tracking data, analysing it and applying the results to understand how customers engage with your product is a vital challenge that every product owner wishes to overcome. People use multiple tools for analytics, ROI (Return on Investment) calculation and targeting for their properties. We started with a very simple idea: allow marketers or product managers to visualise their user base across key dimensions, create custom funnels while exploring users, and target them with personalized messages on the go. In a nutshell, we tried to solve the problem of Who, When, What and Where.

Diagrammatic explanation of Who, When, What and Where

Solution and Architecture

The solution comprised three parts:

  • Tracker script: This tracker supports sending events and profiles, and can be integrated on any website.
  • Data Pipeline: Collector, Events Consumer and Aggregation services.
  • Dashboard for user segmentation and exploration: This interface lets you view user behaviour across some key dimensions, slice and dice it further across other dimensions in real time, and target users across multiple communication channels (currently e-mail and SMS). Done manually, this job might take a few hours or even days of data processing.
High Level Use Case Diagram of the solution built
Architecture diagram of the Data Pipeline built for user segmentation and exploration
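
To make the three pipeline stages concrete, here is a minimal in-memory sketch in Python of how a Collector, an Events Consumer and an Aggregation service could hand events to each other. It is not the code we wrote at the hackathon: the payload fields (`user_id`, `event`, `timestamp`), the class names and the in-process buffer standing in for a real message queue are all assumptions made for illustration.

```python
from collections import defaultdict, deque

# Assumed payload shape sent by the tracker script (hypothetical field names).
REQUIRED_FIELDS = {"user_id", "event", "timestamp"}


class Collector:
    """Validates tracker payloads and buffers them for downstream consumers."""

    def __init__(self):
        self.buffer = deque()  # stand-in for a real message queue

    def collect(self, payload):
        # Drop payloads that are missing any required field.
        if not REQUIRED_FIELDS.issubset(payload):
            return False
        self.buffer.append(payload)
        return True


class Aggregator:
    """Keeps running per-user, per-event counts for the dashboard to query."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, payload):
        self.counts[payload["user_id"]][payload["event"]] += 1


def consume(collector, aggregator):
    """Events consumer: drain the buffer and feed each event to the aggregator."""
    processed = 0
    while collector.buffer:
        aggregator.update(collector.buffer.popleft())
        processed += 1
    return processed
```

In a real deployment each stage would run as its own service, with a durable queue between the collector and the consumer, so that a slow aggregation step does not block event ingestion.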

Using the dashboard for analytics, exploration and targeting

We are going to show exploration and targeting for the following flow:

  • Step 1: Apply the Engagement dimension on all users.
  • Step 2: Explore users with Medium High Engagement on the month-wise acquisition dimension.
  • Step 3: Apply a custom funnel on users who were acquired in May 2019. The funnel applied has 3 levels.
    • Level 1: Users who have started a session
    • Level 2: Users who have done click_event1
    • Level 3: Users who have done custom_event2
  • Step 4: Send an SMS to users who dropped out at level 2
Step-by-step demo of the solution built

Step 1:

Users explored on Engagement metrics
Apply the Engagement dimension on all users.

Step 2:

Medium High Engagement users viewed in Monthly Acquisition dimension
Explore users with Medium High Engagement on the month-wise acquisition dimension.
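
Steps 1 and 2 boil down to bucketing users on one dimension and then counting one bucket across a second dimension. The Python sketch below illustrates that idea only. The engagement thresholds, the `sessions` and `first_seen` field names and the bucket labels other than "Medium High" are hypothetical, since the actual definitions used by the dashboard are not covered in this post.

```python
from collections import Counter
from datetime import date


def engagement_bucket(sessions_last_30_days):
    """Map a session count to an engagement bucket (thresholds are assumed)."""
    if sessions_last_30_days >= 20:
        return "High"
    if sessions_last_30_days >= 10:
        return "Medium High"
    if sessions_last_30_days >= 3:
        return "Medium"
    return "Low"


def acquisition_month(first_seen):
    """Month-wise acquisition dimension, e.g. date(2019, 5, 14) -> '2019-05'."""
    return first_seen.strftime("%Y-%m")


def slice_by_acquisition(users, bucket):
    """Count users in one engagement bucket, grouped by acquisition month."""
    return Counter(
        acquisition_month(u["first_seen"])
        for u in users
        if engagement_bucket(u["sessions"]) == bucket
    )


# Illustrative user profiles, not real data.
users = [
    {"id": "u1", "sessions": 12, "first_seen": date(2019, 5, 3)},
    {"id": "u2", "sessions": 15, "first_seen": date(2019, 5, 20)},
    {"id": "u3", "sessions": 11, "first_seen": date(2019, 4, 9)},
    {"id": "u4", "sessions": 1, "first_seen": date(2019, 5, 1)},
]
by_month = slice_by_acquisition(users, "Medium High")
```

Here `by_month` maps each acquisition month to the number of Medium High users acquired in it, which is the kind of breakdown the dashboard shows for this step.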

Step 3:

Created a custom funnel with three events
Apply a custom funnel on users who were acquired in May 2019. The funnel applied has 3 levels.
  • Level 1: Users who have started a session
  • Level 2: Users who have done click_event1
  • Level 3: Users who have done custom_event2

Funnel View of where your users are stuck
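
A funnel like the one above can be computed by walking each user's events in time order and advancing them one level whenever the next expected event shows up. The Python sketch below is one simple way to do that, not the dashboard's actual implementation. The event names `click_event1` and `custom_event2` come from the funnel above, while `session_start` and the `(user_id, event_name, timestamp)` tuple layout are assumptions. It also treats "stuck at level 2" as users who reached level 2 but never reached level 3.

```python
FUNNEL_STEPS = ["session_start", "click_event1", "custom_event2"]


def funnel_progress(events, steps):
    """Return {user_id: number of funnel levels reached, in order}.

    events is an iterable of (user_id, event_name, timestamp) tuples.
    """
    progress = {}
    for user_id, name, _ts in sorted(events, key=lambda e: e[2]):
        reached = progress.setdefault(user_id, 0)
        # Advance only when the event matches the next expected step.
        if reached < len(steps) and name == steps[reached]:
            progress[user_id] = reached + 1
    return progress


def level_counts(progress, steps):
    """Number of users who reached at least each level of the funnel."""
    return [
        sum(1 for reached in progress.values() if reached >= level)
        for level in range(1, len(steps) + 1)
    ]


def stuck_at(progress, level):
    """Users who reached exactly this level and went no further."""
    return sorted(u for u, reached in progress.items() if reached == level)


# Illustrative events, not real data.
events = [
    ("u1", "session_start", 1),
    ("u1", "click_event1", 2),
    ("u1", "custom_event2", 3),
    ("u2", "session_start", 1),
    ("u2", "click_event1", 5),
    ("u3", "session_start", 2),
]
progress = funnel_progress(events, FUNNEL_STEPS)
targets = stuck_at(progress, 2)
```

The `targets` list is what Step 4 would hand to the messaging channel: the users who did click_event1 but never did custom_event2.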

Step 4:

Targeting users via various communication channels

Sending the SMS to users who were stuck at level 2 in the funnel

SMS received by the end users

The fact that we were able to build this in 24 hours was in itself an achievement to me. During these 24 hours there were moments of joy, anger and frustration. But that’s what a hackathon is about: you propose a solution and grind to build it.

I hope this will also help you figure out use cases that you can solve for yourself and for your organization. Please drop a comment if you have any questions or want to collaborate on a cool Data Engineering project.

Happy Coding…!
