10 Real World Data Science Case Studies Projects with Example
Top 10 Data Science Case Studies Projects with Examples and Solutions in Python to inspire your data science learning in 2023.
Data science has been a trending buzzword in recent times. With wide applications in sectors like healthcare, education, retail, transportation, media, and banking, data science is at the core of pretty much every industry out there. The possibilities are endless: fraud analysis in the finance sector, personalized recommendations in eCommerce, and much more. We have developed ten exciting data science case studies to explain how data science is leveraged across various industries to make smarter decisions and develop innovative, personalized products tailored to specific customers.
Table of Contents
- Data Science Case Studies in Retail
- Data Science Case Study Examples in the Entertainment Industry
- Data Analytics Case Study Examples in the Travel Industry
- Case Studies for Data Analytics in Social Media
- Real-World Data Science Projects in Healthcare
- Data Analytics Case Studies in Oil and Gas
- What Is a Case Study in Data Science?
- How Do You Prepare a Data Science Case Study?
- 10 Most Interesting Data Science Case Studies with Examples
So, without much ado, let's get started with data science business case studies !
1) Walmart
With humble beginnings as a simple discount retailer, Walmart today operates 10,500 stores and clubs in 24 countries along with eCommerce websites, employing around 2.2 million people around the globe. For the fiscal year ended January 31, 2021, Walmart's total revenue was $559 billion, a growth of $35 billion driven by the expansion of its eCommerce business. Walmart is a data-driven company that works on the principle of 'Everyday Low Cost' for its consumers. To achieve this goal, it depends heavily on its data science and analytics department for research and development, also known as Walmart Labs. Walmart is home to the world's largest private cloud, which can manage 2.5 petabytes of data every hour. To analyze this humongous amount of data, Walmart has created 'Data Café,' a state-of-the-art analytics hub located within its Bentonville, Arkansas headquarters. The Walmart Labs team heavily invests in building and managing technologies like cloud, data, DevOps, infrastructure, and security.
Walmart is experiencing massive digital growth as the world's largest retailer . Walmart has been leveraging Big data and advances in data science to build solutions to enhance, optimize and customize the shopping experience and serve their customers in a better way. At Walmart Labs, data scientists are focused on creating data-driven solutions that power the efficiency and effectiveness of complex supply chain management processes. Here are some of the applications of data science at Walmart:
i) Personalized Customer Shopping Experience
Walmart analyses customer preferences and shopping patterns to optimize the stocking and display of merchandise in its stores. Big data analysis also helps them understand new-item sales, decide which products to discontinue, and evaluate brand performance.
ii) Order Sourcing and On-Time Delivery Promise
Millions of customers view items on Walmart.com, and Walmart provides each customer a real-time estimated delivery date for the items purchased. Walmart runs a backend algorithm that estimates this based on the distance between the customer and the fulfillment center, inventory levels, and shipping methods available. The supply chain management system determines the optimum fulfillment center based on distance and inventory levels for every order. It also has to decide on the shipping method to minimize transportation costs while meeting the promised delivery date.
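The sourcing logic described above amounts to a feasibility-plus-cost filter: keep only the fulfillment centers that have stock and can meet the promised date, then pick the cheapest. The sketch below illustrates that idea; the class fields, center names, and costs are all invented for this example and are not Walmart's actual system.

```python
from dataclasses import dataclass

@dataclass
class FulfillmentCenter:
    name: str
    in_stock: bool
    transit_days: int   # fastest available shipping method
    ship_cost: float    # cost of that method (illustrative numbers)

def pick_center(centers, promised_days):
    """Pick the cheapest center that has inventory and can still meet
    the promised delivery date; return None if no center qualifies."""
    feasible = [c for c in centers
                if c.in_stock and c.transit_days <= promised_days]
    if not feasible:
        return None
    return min(feasible, key=lambda c: c.ship_cost)

centers = [
    FulfillmentCenter("Dallas", True, 2, 4.10),
    FulfillmentCenter("Reno", True, 4, 2.80),
    FulfillmentCenter("Atlanta", False, 2, 3.50),
]
best = pick_center(centers, promised_days=3)
print(best.name)  # Dallas: Reno is cheaper but too slow, Atlanta lacks stock
```

With a looser 5-day promise, the cheaper Reno center wins instead, which is exactly the cost-versus-deadline trade-off the paragraph describes.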
iii) Packing Optimization
Packing optimization, also known as box recommendation, is a daily task in the shipping of items for retail and eCommerce businesses. Whenever the items of an order, or of multiple orders placed by the same customer, are picked from the shelf and are ready for packing, Walmart's recommender system determines the best-sized box that holds all the ordered items with the least in-box space wasted, within a fixed amount of time. This is the Bin Packing Problem, a classic NP-hard problem familiar to data scientists.
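Because bin packing is NP-hard, practical systems lean on fast heuristics. A common baseline is first-fit decreasing: sort items largest-first and place each into the first box with room. The sketch below is a generic illustration of that heuristic, not Walmart's actual recommender.

```python
def first_fit_decreasing(item_volumes, box_capacity):
    """First-fit-decreasing heuristic for bin packing: sort items
    largest-first, place each into the first open box with room,
    and open a new box otherwise. Fast but not guaranteed optimal."""
    boxes = []  # remaining capacity of each open box
    for vol in sorted(item_volumes, reverse=True):
        for i, remaining in enumerate(boxes):
            if vol <= remaining:
                boxes[i] -= vol
                break
        else:
            boxes.append(box_capacity - vol)
    return len(boxes)

# Six items with a total volume of 20 fit into two boxes of capacity 10.
print(first_fit_decreasing([4, 8, 1, 4, 2, 1], box_capacity=10))  # 2
```

Production systems would add dimensions, weight limits, and fragility constraints on top of this one-dimensional core.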
Here is a link to a sales prediction data science case study to help you understand the applications of data science in the real world. The Walmart Sales Forecasting Project uses historical sales data for 45 Walmart stores located in different regions. Each store contains many departments, and you must build a model to project the sales for each department in each store. This data science case study aims to create a predictive model to predict the sales of each product. You can also try your hand at the Inventory Demand Forecasting Data Science Project to develop a machine learning model that forecasts inventory demand accurately based on historical sales data.
2) Amazon
Amazon is an American multinational technology company headquartered in Seattle, USA. It started as an online bookseller, but today it focuses on eCommerce, cloud computing, digital streaming, and artificial intelligence. It hosts an estimated 1,000,000,000 gigabytes of data across more than 1,400,000 servers. Through its constant innovation in data science and big data, Amazon stays ahead in understanding its customers. Here are a few data analytics case study examples at Amazon:
i) Recommendation Systems
Data science models help Amazon understand customers' needs and recommend products before a customer even searches for them; these models use collaborative filtering. Amazon draws on data from 152 million customer purchases to help users decide which products to buy, and the company generates 35% of its annual sales through its recommendation-based systems (RBS).
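Item-based collaborative filtering, one common flavor of the technique mentioned above, recommends items whose rating patterns across users resemble those of items a customer already likes. The toy ratings below are invented for illustration; Amazon's production system is far more sophisticated, but the core similarity computation looks like this.

```python
import math

# Hypothetical user-item ratings (1-5 stars).
ratings = {
    "alice": {"book": 5, "lamp": 3, "kettle": 4},
    "bob":   {"book": 4, "lamp": 1, "kettle": 5},
    "carol": {"book": 1, "lamp": 5},
}

def item_vector(item):
    """Collect every user's rating of one item."""
    return {u: r[item] for u, r in ratings.items() if item in r}

def cosine(a, b):
    """Cosine similarity between two sparse rating vectors."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

# Which item is most similar to "book"? Alice and Bob rate both "book"
# and "kettle" highly, so "kettle" should win over "lamp".
sims = {i: cosine(item_vector("book"), item_vector(i))
        for i in ["lamp", "kettle"]}
print(max(sims, key=sims.get))  # kettle
```

A recommender would then surface the highest-similarity items the user has not yet purchased.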
Here is a Recommender System Project to help you build a recommendation system using collaborative filtering.
ii) Retail Price Optimization
Amazon product prices are optimized based on a predictive model that determines the best price so that users do not refuse to buy an item because of its price. The model carefully determines the optimal price by considering the customers' likelihood of purchasing the product and how the price will affect their future buying patterns. The price of a product is determined according to your activity on the website, competitors' pricing, product availability, item preferences, order history, expected profit margin, and other factors.
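At its simplest, price optimization means searching for the price that maximizes expected revenue under a demand model. The linear demand curve and its coefficients below are made up for the sketch; real retailers estimate demand from the signals listed above.

```python
def expected_revenue(price, base_demand=1000, sensitivity=40.0):
    """Revenue = price x demand, under an illustrative linear demand
    curve where demand falls as price rises (coefficients invented)."""
    demand = max(0.0, base_demand - sensitivity * price)
    return price * demand

def best_price(prices):
    """Grid search: return the candidate price with highest revenue."""
    return max(prices, key=expected_revenue)

candidates = [p / 2 for p in range(10, 50)]  # $5.00 .. $24.50 in 50c steps
p = best_price(candidates)
print(f"optimal price = ${p:.2f}")  # $12.50 for this demand curve
```

For this curve, revenue p(1000 - 40p) peaks analytically at p = 12.5, which the grid search recovers; a production system would re-fit the demand model continuously.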
Check Out this Retail Price Optimization Project to build a Dynamic Pricing Model.
iii) Fraud Detection
Being a significant eCommerce business, Amazon remains at high risk of retail fraud. As a preemptive measure, the company collects historical and real-time data for every order. It uses Machine learning algorithms to find transactions with a higher probability of being fraudulent. This proactive measure has helped the company restrict clients with an excessive number of returns of products.
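A first intuition for fraud detection is an additive risk score over order signals like those the paragraph mentions (mismatched addresses, account age, return history). The signals, weights, and thresholds below are invented; Amazon's real systems use learned models, not hand-set rules.

```python
def fraud_risk_score(order):
    """Toy additive risk score (0-100) over a few common fraud
    signals. All weights and cut-offs are illustrative only."""
    score = 0
    if order["ship_country"] != order["billing_country"]:
        score += 40   # shipping and billing countries disagree
    if order["account_age_days"] < 7:
        score += 30   # very new account
    if order["return_rate"] > 0.5:
        score += 20   # excessive product returns
    if order["order_value"] > 1000:
        score += 10   # unusually large order
    return score

order = {"ship_country": "US", "billing_country": "DE",
         "account_age_days": 2, "return_rate": 0.1, "order_value": 1500}
print(fraud_risk_score(order))  # 40 + 30 + 10 = 80
```

A learned classifier effectively discovers such weights (and far subtler interactions) from labeled historical transactions instead of having them set by hand.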
You can look at this Credit Card Fraud Detection Project to implement a fraud detection model to classify fraudulent credit card transactions.
Let us explore data analytics case study examples in the entertainment industry.
3) Netflix
Netflix started as a DVD rental service in 1997 and has since expanded into the streaming business. Headquartered in Los Gatos, California, Netflix is the largest content streaming company in the world. Currently, Netflix has over 208 million paid subscribers worldwide, and with streaming supported on thousands of smart devices, around 3 billion hours of content are watched on Netflix every month. The secret to this massive growth and popularity is Netflix's advanced use of data analytics and recommendation systems to provide personalized and relevant content recommendations to its users. Netflix collects data from over 100 billion events every day. Here are a few examples of data analysis case studies applied at Netflix:
i) Personalized Recommendation System
Netflix uses over 1,300 recommendation clusters based on consumer viewing preferences to provide a personalized experience. The data Netflix collects from its users includes viewing time, platform searches for keywords, and metadata related to content abandonment, such as when content was paused, rewound, or rewatched. Using this data, Netflix can predict what a viewer is likely to watch and give each user a personalized watchlist. Some of the algorithms used by the Netflix recommendation system are the Personalized Video Ranker, the Trending Now ranker, and the Continue Watching ranker.
ii) Content Development using Data Analytics
Netflix uses data science to analyze the behavior and patterns of its users to recognize themes and categories that the masses prefer to watch. This data is used to produce shows like The Umbrella Academy, Orange Is the New Black, and The Queen's Gambit. Such shows might seem like huge risks, but they were greenlit largely on the strength of data analytics, which assured Netflix that they would succeed with its audience. Data analytics is helping Netflix come up with content that its viewers want to watch even before they know they want to watch it.
iii) Marketing Analytics for Campaigns
Netflix uses data analytics to find the right time to launch shows and ad campaigns for maximum impact on the target audience. Marketing analytics also helps produce different trailers and thumbnails for different groups of viewers. For example, the House of Cards Season 5 trailer featuring a giant American flag was launched during the American presidential elections, as it would resonate well with the audience.
Here is a Customer Segmentation Project using association rule mining to understand the primary grouping of customers based on various parameters.
4) Spotify
In a world where purchasing music is a thing of the past and streaming music is the current trend, Spotify has emerged as one of the most popular streaming platforms. With 320 million monthly users, around 4 billion playlists, and approximately 2 million podcasts, Spotify leads the pack among well-known streaming platforms like Apple Music, Wynk, Songza, Amazon Music, etc. The success of Spotify has depended mainly on data analytics. By analyzing massive volumes of listener data, Spotify provides real-time, personalized services to its listeners. Most of Spotify's revenue comes from paid premium subscriptions. Here are some examples of the data analytics Spotify uses to provide enhanced services to its listeners:
i) Personalization of Content using Recommendation Systems
Spotify uses BaRT (Bandits for Recommendations as Treatments) to generate music recommendations for its listeners in real time. BaRT treats any song a user listens to for less than 30 seconds as a negative signal, and the model is retrained every day to provide updated recommendations. A patent granted to Spotify covers an AI application that identifies a user's musical tastes based on audio signals and attributes such as gender, age, and accent to make better music recommendations.
Spotify creates daily playlists for its listeners, based on the taste profiles called 'Daily Mixes,' which have songs the user has added to their playlists or created by the artists that the user has included in their playlists. It also includes new artists and songs that the user might be unfamiliar with but might improve the playlist. Similar to it is the weekly 'Release Radar' playlists that have newly released artists' songs that the listener follows or has liked before.
ii) Targeted Marketing through Customer Segmentation
Besides enhancing personalized song recommendations, Spotify uses this massive dataset for targeted ad campaigns and personalized service recommendations for its users. Spotify uses ML models to analyze listener behavior and group listeners based on music preferences, age, gender, ethnicity, etc. These insights help them create ad campaigns for a specific target audience. One of their well-known ad campaigns was the meme-inspired ads for potential target customers, which was a huge success globally.
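The grouping idea can be illustrated with a toy rule-based segmentation. The listener records, age bands, and genres below are invented; Spotify's actual segments are learned by ML models rather than hand-written rules like these.

```python
from collections import defaultdict

# Hypothetical listener records for illustration only.
listeners = [
    {"id": 1, "age": 17, "top_genre": "pop"},
    {"id": 2, "age": 24, "top_genre": "pop"},
    {"id": 3, "age": 45, "top_genre": "jazz"},
    {"id": 4, "age": 19, "top_genre": "hip-hop"},
]

def segment(user):
    """Assign each listener to an (age band, genre) segment."""
    band = "under-25" if user["age"] < 25 else "25-plus"
    return (band, user["top_genre"])

segments = defaultdict(list)
for u in listeners:
    segments[segment(u)].append(u["id"])

print(dict(segments))
```

Each resulting segment (e.g. under-25 pop listeners) could then be targeted with its own ad creative, which is the point of the campaign work described above.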
iii) CNNs for Classification of Songs and Audio Tracks
Spotify builds audio models to evaluate songs and tracks, which helps develop better playlists and recommendations for its users. These allow Spotify to filter new tracks based on their lyrics and rhythms and recommend them to users who like similar tracks (collaborative filtering). Spotify also uses NLP (natural language processing) to scan articles and blogs and analyze the words used to describe songs and artists. These analytical insights help group and identify similar artists and songs and can be leveraged to build playlists.
Here is a Music Recommender System Project for you to start learning. We have listed another music recommendations dataset for you to use in your projects: Dataset1 . You can use this dataset of Spotify metadata to classify songs based on artists, mood, and liveliness. Plot histograms and heatmaps to get a better understanding of the dataset, then use classification algorithms like logistic regression and SVM, along with principal component analysis, to generate valuable insights from it.
Below you will find case studies for data analytics in the travel and tourism industry.
5) Airbnb
Airbnb was born in 2007 in San Francisco and has since grown to 4 million hosts and 5.6 million listings worldwide, welcoming more than 1 billion guest arrivals in almost every country across the globe. Airbnb is active in every country on the planet except Iran, Sudan, Syria, and North Korea, which is around 97.95% of the world. Treating data as the voice of its customers, Airbnb uses its large volume of customer reviews and host inputs to understand trends across communities, rate user experiences, and make informed decisions to build a better business model. The data scientists at Airbnb are developing exciting new solutions to boost the business and create a perfect match between guests and hosts for a supreme customer experience. Airbnb's data servers serve approximately 10 million requests a day and process around one million search queries.
i) Recommendation Systems and Search Ranking Algorithms
Airbnb helps people find 'local experiences' in a place with the help of search algorithms that make searches and listings precise. Airbnb uses a 'listing quality score' to rank homes based on proximity to the searched location and previous guest reviews. Airbnb uses deep neural networks to build models that take the guest's earlier stays and area information into account to find a perfect match. The search algorithms are optimized based on guest and host preferences, rankings, pricing, and availability to understand users' needs and provide the best match possible.
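A listing quality score of the kind described above can be pictured as a weighted blend of proximity and review quality. The field names, weights, and listings below are invented for the sketch; Airbnb's real ranking models are learned neural networks over many more signals.

```python
def listing_score(listing, search_lat, search_lon):
    """Blend proximity to the searched point with review quality
    (weights and normalization are illustrative only)."""
    dist = ((listing["lat"] - search_lat) ** 2 +
            (listing["lon"] - search_lon) ** 2) ** 0.5
    proximity = 1 / (1 + dist)            # closer -> nearer to 1
    reviews = listing["avg_review"] / 5   # normalize 0..1
    return 0.6 * proximity + 0.4 * reviews

listings = [
    {"name": "Cosy loft", "lat": 0.0, "lon": 0.1, "avg_review": 4.0},
    {"name": "Far villa", "lat": 2.0, "lon": 2.0, "avg_review": 5.0},
]
ranked = sorted(listings,
                key=lambda l: listing_score(l, 0.0, 0.0), reverse=True)
print(ranked[0]["name"])  # Cosy loft: nearby beats a distant 5-star villa
```

A learned ranker would fit such trade-offs (and guest-specific preferences) from booking data instead of fixing the weights by hand.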
ii) Natural Language Processing for Review Analysis
Airbnb characterizes data as the voice of its customers. Customer and host reviews give a direct insight into the experience, but star ratings alone are not a good way to understand it quantitatively. Hence, Airbnb uses natural language processing to understand reviews and the sentiments behind them. The NLP models are developed using convolutional neural networks.
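A minimal baseline for review sentiment is a lexicon-based scorer: count positive and negative words. The tiny word lists below are invented; as the paragraph notes, Airbnb's production models are CNNs that learn these cues from data rather than relying on hand-picked lists.

```python
POSITIVE = {"great", "clean", "friendly", "perfect", "lovely"}
NEGATIVE = {"dirty", "noisy", "rude", "broken", "awful"}

def review_sentiment(text):
    """Score +1 per positive word, -1 per negative word, then map
    the total to a label. Purely illustrative baseline."""
    words = text.lower().replace(".", " ").replace(",", " ").split()
    score = (sum(w in POSITIVE for w in words)
             - sum(w in NEGATIVE for w in words))
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(review_sentiment("Great location, clean rooms, friendly host."))  # positive
print(review_sentiment("The room was noisy and the shower was broken."))  # negative
```

The obvious failure modes (negation, sarcasm, unseen vocabulary) are exactly what neural models are brought in to handle.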
Practice this Sentiment Analysis Project for analyzing product reviews to understand the basic concepts of natural language processing.
iii) Smart Pricing using Predictive Analytics
The Airbnb host community uses the service as a supplementary income. The vacation homes and guest houses rented to customers help raise local community earnings, as Airbnb guests stay 2.4 times longer and spend approximately 2.3 times as much money as a hotel guest. The profits have a significant positive impact on the local neighborhood community. Airbnb uses predictive analytics to predict the prices of listings and help hosts set a competitive and optimal price. The overall profitability of an Airbnb host depends on factors like the time invested by the host and responsiveness to changing demands across seasons. The factors that impact real-time smart pricing are the location of the listing, proximity to transport options, the season, and the amenities available in the neighborhood of the listing.
Here is a Price Prediction Project to help you understand the concept of predictive analysis which is widely common in case studies for data analytics.
6) Uber
Uber is the biggest global taxi service provider. As of December 2018, Uber had 91 million monthly active consumers and 3.8 million drivers, completing 14 million trips each day. Uber uses data analytics and big-data-driven technologies to optimize its business processes and provide enhanced customer service. The data science team at Uber has constantly been exploring futuristic technologies to provide better service. Machine learning and data analytics help Uber make data-driven decisions that enable benefits like ride-sharing, dynamic price surges, better customer support, and demand forecasting. Here are some of the real-world data science projects used by Uber:
i) Dynamic Pricing for Price Surges and Demand Forecasting
Uber prices change at peak hours based on demand. Uber uses surge pricing to encourage more cab drivers to sign up with the company and to meet passenger demand. When prices increase, both the driver and the passenger are informed about the surge. Uber uses a patented predictive model for price surging called 'Geosurge,' based on the demand for rides and the location.
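The core surge idea is a multiplier that grows with the demand-to-supply ratio. The rule and cap below are invented to illustrate the principle; Uber's patented Geosurge model is location-aware and far more elaborate.

```python
def surge_multiplier(ride_requests, available_drivers,
                     base=1.0, cap=3.0):
    """Toy surge rule: the multiplier tracks the demand/supply ratio,
    never drops below the base fare, and is capped. Illustrative only."""
    if available_drivers == 0:
        return cap
    ratio = ride_requests / available_drivers
    return round(min(cap, max(base, ratio)), 2)

print(surge_multiplier(90, 60))   # 1.5x: demand outstrips supply
print(surge_multiplier(40, 80))   # 1.0x: supply exceeds demand, no surge
print(surge_multiplier(500, 50))  # 3.0x: capped during extreme demand
```

The cap reflects a real design concern: unbounded multipliers would maximize short-term revenue but damage rider trust.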
ii) One-Click Chat
Uber has developed a machine learning and natural language processing solution called one-click chat (OCC) for coordination between drivers and riders. This feature anticipates responses to commonly asked questions, making it easy for drivers to respond to customer messages with the click of just one button. One-click chat is built on Uber's machine learning platform, Michelangelo, to perform NLP on rider chat messages and generate appropriate responses.
iii) Customer Retention
Failure to meet customer demand for cabs could lead users to opt for other services. Uber uses machine learning models to bridge this demand-supply gap. By using prediction models to predict the demand in any location, Uber retains its customers. Uber also uses a tier-based reward system, which segments customers into different levels based on usage: the higher the level a user achieves, the better the perks. Uber also provides personalized destination suggestions based on the user's history and frequently traveled destinations.
You can take a look at this Python Chatbot Project and build a simple chatbot application to better understand the techniques used for natural language processing. You can also practice building a demand forecasting model with this project using time series analysis, or look at this project, which uses time series forecasting and clustering on a dataset containing geospatial data to forecast customer demand for Ola rides.
7) LinkedIn
LinkedIn is the largest professional social networking site with nearly 800 million members in more than 200 countries worldwide. Almost 40% of the users access LinkedIn daily, clocking around 1 billion interactions per month. The data science team at LinkedIn works with this massive pool of data to generate insights to build strategies, apply algorithms and statistical inferences to optimize engineering solutions, and help the company achieve its goals. Here are some of the real world data science projects at LinkedIn:
i) LinkedIn Recruiter: Search Algorithms and Recommendation Systems
LinkedIn Recruiter helps recruiters build and manage a talent pool to optimize the chances of hiring candidates successfully. This sophisticated product works on search and recommendation engines. LinkedIn Recruiter handles complex queries and filters on a constantly growing, large dataset, and the results delivered have to be relevant and specific. The initial search model was based on linear regression but was eventually upgraded to gradient boosted decision trees to capture non-linear correlations in the dataset. In addition to these models, LinkedIn Recruiter also uses a Generalized Linear Mixed model to improve prediction quality and give personalized results.
ii) Recommendation Systems Personalized for News Feed
The LinkedIn news feed is the heart and soul of the professional community. A member's newsfeed is a place to discover conversations among connections, career news, posts, suggestions, photos, and videos. Every time a member visits LinkedIn, machine learning algorithms identify the best exchanges to be displayed on the feed by sorting through posts and ranking the most relevant results on top. The algorithms help LinkedIn understand member preferences and help provide personalized news feeds. The algorithms used include logistic regression, gradient boosted decision trees and neural networks for recommendation systems.
iii) CNNs to Detect Inappropriate Content
Providing a professional space where people can trust and express themselves professionally in a safe community has been a critical goal at LinkedIn. LinkedIn has heavily invested in building solutions to detect fake accounts and abusive behavior on its platform. Any form of spam, harassment, or inappropriate content is immediately flagged and taken down; these can range from profanity to advertisements for illegal services. LinkedIn uses a machine learning model based on convolutional neural networks. This classifier trains on a dataset of accounts labeled as either "inappropriate" or "appropriate." The inappropriate list consists of accounts containing "blocklisted" phrases or words, along with a small portion of manually reviewed accounts reported by the user community.
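The blocklist step that seeds those training labels can be sketched as a simple phrase match. The phrases below are invented examples; as the paragraph notes, a match like this would only generate candidate labels for LinkedIn's CNN classifier, not final moderation decisions.

```python
# Invented example phrases; a real blocklist is curated and much larger.
BLOCKLIST = {"free money", "guaranteed followers", "escort"}

def flag_profile(text):
    """First-pass filter: flag any profile text containing a
    blocklisted phrase for downstream review/labeling."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in BLOCKLIST)

print(flag_profile("Data engineer. Get FREE MONEY fast!"))  # True
print(flag_profile("Data engineer at Acme Corp."))          # False
```

The learned classifier then generalizes beyond exact phrases, catching obfuscated spellings and context the blocklist cannot.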
Here is a Text Classification Project to help you understand NLP basics for text classification. You can find a news recommendation system dataset to help you build a personalized news recommender system. You can also use this dataset to build a classifier using logistic regression, Naive Bayes, or Neural networks to classify toxic comments.
8) Pfizer
Pfizer is a multinational pharmaceutical company headquartered in New York, USA, and one of the largest pharmaceutical companies globally, known for developing a wide range of medicines and vaccines in disciplines like immunology, oncology, cardiology, and neurology. Pfizer became a household name in 2020 when its COVID-19 vaccine became the first to receive FDA emergency use authorization. In early November 2021, the CDC approved the Pfizer vaccine for kids aged 5 to 11. Pfizer has been using machine learning and artificial intelligence to develop drugs and streamline trials, which played a massive role in developing and deploying the COVID-19 vaccine. Here are a few data analytics case studies from Pfizer:
i) Identifying Patients for Clinical Trials
Artificial intelligence and machine learning are used to streamline and optimize clinical trials and increase their efficiency. Natural language processing and exploratory data analysis of patient records can help identify suitable patients for clinical trials, such as those with distinct symptoms. They can also help examine interactions with potential trial members' specific biomarkers and predict drug interactions and side effects, which helps avoid complications. Pfizer's AI implementation helped rapidly identify signals within the noise of millions of data points across its 44,000-candidate COVID-19 clinical trial.
ii) Supply Chain and Manufacturing
Data science and machine learning techniques help pharmaceutical companies better forecast demand for vaccines and drugs and distribute them efficiently. Machine learning models can help identify efficient supply systems by automating and optimizing the production steps. These will help supply drugs customized to small pools of patients in specific gene pools. Pfizer uses Machine learning to predict the maintenance cost of equipment used. Predictive maintenance using AI is the next big step for Pharmaceutical companies to reduce costs.
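Demand forecasting of the kind mentioned above often starts from a naive baseline before seasonality and supply constraints are layered on. The shipment figures below are hypothetical; this moving-average sketch only illustrates the baseline idea, not Pfizer's actual models.

```python
def moving_average_forecast(history, window=3):
    """Naive baseline: forecast the next period as the mean of the
    last `window` observations. Real pharmaceutical demand models add
    seasonality, epidemiological signals, and supply constraints."""
    recent = history[-window:]
    return sum(recent) / len(recent)

monthly_doses = [120, 130, 125, 160, 170, 165]  # hypothetical shipments
print(moving_average_forecast(monthly_doses))   # (160 + 170 + 165) / 3 = 165.0
```

Baselines like this matter in practice: a sophisticated model that cannot beat a moving average is not worth deploying.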
iii) Drug Development
Computer simulations of proteins, tests of their interactions, and yield analysis help researchers develop and test drugs more efficiently. In 2016, Watson Health and Pfizer announced a collaboration to utilize IBM Watson for Drug Discovery to help accelerate Pfizer's research in immuno-oncology, an approach to cancer treatment that uses the body's immune system to help fight cancer. Deep learning models have recently been used for bioactivity and synthesis prediction for drugs and vaccines, in addition to molecular design. Deep learning has been a revolutionary technique for drug discovery, as it factors in everything from new applications of medications to possible toxic reactions, which can save millions in drug trials.
You can create a Machine learning model to predict molecular activity to help design medicine using this dataset . You may build a CNN or a Deep neural network for this data analyst case study project.
9) Shell Data Analyst Case Study Project
Shell is a global group of energy and petrochemical companies with over 80,000 employees in around 70 countries. Shell uses advanced technologies and innovations to help build a sustainable energy future, and is going through a significant transition: it aims to become a clean energy company by 2050 as the world needs more, and cleaner, energy solutions. This requires substantial changes in the way energy is produced and used. Digital technologies, including AI and machine learning, play an essential role in this transformation, enabling more efficient exploration and energy production, more reliable manufacturing, more nimble trading, and a personalized customer experience. Using AI across the organization will help achieve this goal and stay competitive in the market. Here are a few data analytics case studies in the petrochemical industry:
i) Precision Drilling
Shell is involved in the entire oil and gas supply chain, from extracting hydrocarbons to refining the fuel to retailing it to customers. Recently, Shell has adopted reinforcement learning to control the drilling equipment used in extraction. Reinforcement learning works on a reward system based on the outcomes of the AI model's actions. The algorithm is designed to guide the drills as they move through the subsurface, based on historical data from drilling records, including information such as the size of drill bits, temperatures, pressures, and knowledge of seismic activity. This model helps the human operator understand the environment better, leading to better and faster results with less damage to the machinery used.
ii) Efficient Charging Terminals
Due to climate change, governments have encouraged people to switch to electric vehicles to reduce carbon dioxide emissions. However, the lack of public charging terminals has deterred people from switching to electric cars. Shell uses AI to monitor and predict the demand for terminals to provide an efficient supply. Multiple vehicles charging from a single terminal may create a considerable grid load, and demand predictions can help make this process more efficient.
iii) Monitoring Service and Charging Stations
Another Shell initiative, trialed in Thailand and Singapore, is the use of computer vision cameras that watch out for potentially hazardous activities, like lighting a cigarette in the vicinity of the pumps while refueling. The model processes the content of the captured images and labels and classifies them, and the algorithm can then alert the staff and hence reduce the risk of fires. The model could be further trained to detect rash driving or theft in the future.
Here is a project to help you understand multiclass image classification. You can use the Hourly Energy Consumption Dataset to build an energy consumption prediction model. You can use time series with XGBoost to develop your model.
10) Zomato Case Study on Data Analytics
Zomato was founded in 2010 and is currently one of the most well-known food tech companies. Zomato offers services like restaurant discovery, home delivery, online table reservation, online payments for dining, etc. Zomato partners with restaurants to provide tools to acquire more customers while also providing delivery services and easy procurement of ingredients and kitchen supplies. Currently, Zomato has over 200,000 restaurant partners and around 100,000 delivery partners, and has completed over 100 million delivery orders to date. Zomato uses ML and AI to boost its business growth, drawing on the massive amount of data collected over the years from food orders and user consumption patterns. Here are a few examples of data analyst case study projects developed by the data scientists at Zomato:
i) Personalized Recommendation System for Homepage
Zomato uses data analytics to create personalized homepages for its users. Zomato uses data science to provide order personalization, like giving recommendations to the customers for specific cuisines, locations, prices, brands, etc. Restaurant recommendations are made based on a customer's past purchases, browsing history, and what other similar customers in the vicinity are ordering. This personalized recommendation system has led to a 15% improvement in order conversions and click-through rates for Zomato.
You can use the Restaurant Recommendation Dataset to build a restaurant recommendation system to predict what restaurants customers are most likely to order from, given the customer location, restaurant information, and customer order history.
ii) Analyzing Customer Sentiment
Zomato uses natural language processing and machine learning to understand customer sentiment from social media posts and customer reviews. These help the company gauge the inclination of its customer base towards the brand. Deep learning models analyze the sentiment of brand mentions on social networking sites like Twitter, Instagram, LinkedIn, and Facebook. These analytics give the company insights that help build the brand and understand the target audience.
iii) Predicting Food Preparation Time (FPT)
Food preparation time is an essential variable in the estimated delivery time of an order placed through Zomato. It depends on numerous factors, like the number of dishes ordered, the time of day, footfall in the restaurant, the day of the week, etc. Accurate prediction of the food preparation time enables a better estimate of the delivery time, making delivery partners less likely to breach it. Zomato uses a bidirectional LSTM-based deep learning model that considers all these features and predicts the food preparation time for each order in real time.
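The feature set described above can be pictured with a hand-set linear approximation. All the coefficients below are invented for illustration; Zomato's real model is a bidirectional LSTM trained on many more signals, not fixed weights like these.

```python
def estimate_prep_minutes(n_dishes, hour_of_day, restaurant_load):
    """Toy linear estimate of food-preparation time in minutes.
    Coefficients are invented; a production model learns them."""
    base = 10.0                   # fixed kitchen overhead
    per_dish = 4.0 * n_dishes     # each dish adds prep time
    peak = 6.0 if hour_of_day in (13, 14, 20, 21) else 0.0  # lunch/dinner rush
    load = 8.0 * restaurant_load  # current kitchen occupancy, 0..1
    return base + per_dish + peak + load

# 3 dishes, 8 p.m. dinner rush, kitchen half-busy:
print(estimate_prep_minutes(n_dishes=3, hour_of_day=20, restaurant_load=0.5))
# 10 + 12 + 6 + 4 = 32.0 minutes
```

A sequence model like an LSTM improves on this by also using the order in which recent orders hit the kitchen, which a static formula cannot capture.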
Data scientists are companies' secret weapons when it comes to analyzing customer sentiment and behavior and leveraging it to drive conversion, loyalty, and profits. These 10 data science case study projects with examples and solutions show you how various organizations use data science technologies to succeed and stay at the top of their field! To summarize, data science has not only accelerated the performance of companies but has also made it possible to manage and sustain that performance with ease.
FAQs on Data Analysis Case Studies
A case study in data science is an in-depth analysis of a real-world problem using data-driven approaches. It involves collecting, cleaning, and analyzing data to extract insights and solve challenges, offering practical insights into how data science techniques can address complex issues across various industries.
To create a data science case study, identify a relevant problem, define objectives, and gather suitable data. Clean and preprocess data, perform exploratory data analysis, and apply appropriate algorithms for analysis. Summarize findings, visualize results, and provide actionable recommendations, showcasing the problem-solving potential of data science techniques.
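For instance, the clean, explore, and summarize steps described above might look like this minimal pandas sketch (the data is synthetic and purely illustrative):

```python
import pandas as pd

# Tiny synthetic dataset with the kinds of gaps real data has.
df = pd.DataFrame({
    "region": ["north", "south", "south", "north", None],
    "sales":  [120.0, 95.0, 87.0, None, 60.0],
})

df = df.dropna()                                # clean: drop incomplete rows
summary = df.groupby("region")["sales"].mean()  # explore: average sales by region
print(summary)
```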
About the Author
ProjectPro is the only online platform designed to help professionals gain practical, hands-on experience in big data, data engineering, data science, and machine learning related technologies, with over 270 reusable project templates in data science and big data, each with step-by-step walkthroughs.
© 2024 Iconiq Inc.
Google Data Analytics Certificate case study of Fitbit data for Bella Beat
MirAnalysis/Google-Data-Analytics-Case-Study
Introduction:
Welcome to my capstone project for the Google Data Analytics Certificate! This study showcases the skills learned during the course, including SQL and Tableau. I will be analyzing Fitbit data to make a recommendation to Bellabeat using the data analysis process.
Business Task:
Bellabeat, a wellness and tech company whose mission is to empower women to reach their full potential, requests help with marketing their products. The company offers smart devices such as the Leaf, Ivy, and Time. These items can track health data such as activity, sleep, menstrual cycles, heart rate, and hydration. In this scenario, the Bellabeat marketing team requests recommendations based on competitor data. Bellabeat's competitor, Fitbit, will be analyzed to reveal user trends in the wellness device market. The findings will offer insights into areas of growth opportunity for Bellabeat going forward.
Data Sources
The data source, "Fitbit Fitness Tracker Data," was found on Kaggle, a data science and coding website, where it was uploaded by the data scientist Möbius. The datasets were sourced from a survey of Amazon Mechanical Turk workers for a study that collected Fitbit tracking data. The original study states that 30 participants were surveyed; however, 33 can be found in the data. No demographic information such as age, height, or sex was provided. The exact Fitbit models are not specified, but it is noted that variation across the datasets is potentially due to varying device models and user tracking preferences. My analysis focuses on data from 4-12-2016 to 5-12-2016. The data includes a total of 33 users across 4 datasets, tracking physical activity, step counts, sleep time, and weight information.
"Daily Activity Merged" includes daily activity logs for 33 users. This set compiles 3 activity types, their distances, and the minutes spent performing them. The 3 activity types are: light, fairly active, and very active. The distance columns are not defined, but based on the step data provided they appear to be in kilometers. Minutes spent without activity are categorized as sedentary time. This set also includes steps taken and calories burned.
"Hourly Steps Merged" includes the same 33 user Ids but expands the daily steps into hourly increments in 24-hour format. As mentioned previously, there was a variance between the total steps calculated in this set and the daily logs in the "Daily Activity Merged" set above, likely due to device usage. Because of this variance, I used the step information in this set only for my analysis of steps per time of day.
"Sleep Day Merged" details 24 user Ids, their minutes asleep, and minutes in bed but not asleep. Fitbit's website states that the watch tracks heart rate and movement patterns to determine if the user is awake or asleep. Fitbit also states that the "Awake" category includes when users are somewhere in a sleep cycle but are restless and wake up briefly.
"Weight Log Info Merged" includes only 8 user Ids, weight (kg and lbs), BMI, and whether the data was logged manually or automatically. The set also included a "Fat" column, but it was populated in only 2 cells.
The Cleaning Process
For this project I used Microsoft Excel and SQL for data cleaning. I started the cleaning process by checking all of my datasets for the same issues: blank spaces, duplicates, and inconsistencies. The following is my changelog for the cleaning process in Excel:
Shared Changes Across All Tables
- Removed blank spaces using conditional formatting
- Verified User Id column entries were uniform (10 characters) in length using LEN function (i.e. =LEN(A2))
- Added underscores between words in column names
- Added column “Day” using date function ( i.e. =TEXT(B2, "dddd"))
- Changed “DateTime” columns into two separate columns, “Date” and “Time” using INT function (i.e. =INT(A2), =A2 - INT(A2))
Changes to "Daily Activity Merged"

- Changed column name “activitydate” to “Date”
- Changed column name “totalsteps” to “steps”
- Removed "Tracker Distance", "Logged_Activities_Distance", "Very_Active_Distance", "Moderately_Active_Distance", "Light_Active_Distance", and "Sedentary_Active_Distance" columns

Changes to "Sleep Day Merged"

- Changed column name “sleepday” to “Date”
- Subtracted "Time Asleep" from column "Total Time In Bed" and created new column "Time Awake" from the results
- Removed column "Total Sleep Records"

Changes to "Weight Log Info Merged"

- Changed column name “Is Manual Report” to “Report_Type”
- Changed “Report_Type” responses from True/False to Manual/Automatic respectively
- Removed column “Fat”
- Removed column “LogId”
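A few of the steps above can be sketched in pandas instead of Excel. The rows below are invented stand-ins for the Fitbit "Sleep Day" export, not real values from the dataset:

```python
import pandas as pd

# Hypothetical rows mirroring the "Sleep Day Merged" export (values illustrative).
sleep = pd.DataFrame({
    "Id": [1503960366, 1644430081],
    "SleepDay": pd.to_datetime(["2016-04-12 00:00:00", "2016-04-13 00:00:00"]),
    "Total_Minutes_Asleep": [327, 340],
    "Total_Time_In_Bed": [346, 367],
})

# Verify Id values are uniformly 10 characters (the LEN check from Excel).
assert sleep["Id"].astype(str).str.len().eq(10).all()

# Split the datetime into Date / Day columns and derive Time_Awake.
sleep["Date"] = sleep["SleepDay"].dt.date
sleep["Day"] = sleep["SleepDay"].dt.day_name()
sleep["Time_Awake"] = sleep["Total_Time_In_Bed"] - sleep["Total_Minutes_Asleep"]

print(sleep[["Day", "Time_Awake"]])
```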
Data Manipulation and Analysis
I then uploaded my 4 tables into BigQuery SQL Console to begin my data manipulation. Each phase of manipulation was guided by a question in search of a trend.
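As a sketch of one such question-driven query ("at what time of day are users most active?"), here is a pandas equivalent on invented stand-in rows; the actual analysis ran as SQL in BigQuery, and these Ids and step counts are illustrative:

```python
import pandas as pd

# Illustrative stand-in for the "Hourly Steps Merged" table.
hourly = pd.DataFrame({
    "Id":    [1, 1, 2, 2, 1, 2],
    "Time":  ["08:00", "18:00", "08:00", "18:00", "12:00", "12:00"],
    "Steps": [500, 1200, 700, 900, 300, 450],
})

# Average steps per hour across users, highest first.
avg_by_hour = hourly.groupby("Time")["Steps"].mean().sort_values(ascending=False)
print(avg_by_hour.head(1))  # the most active hour in this sample
```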
Continue to:
- SQL_Queries for all queries
- Data Table Link for access to view all tables resulting for queries
- Analysis for analysis of data
- Visualizations for data graphs
- Recommendations for my answer to the business task
- Sources Cited for resource credits
8 case studies and real-world examples of how Big Data has helped companies keep on top of the competition
Fast, data-informed decision-making can drive business success. Faced with high customer expectations, marketing challenges, and global competition, many organizations look to data analytics and business intelligence for a competitive advantage.
Serving personalized ads based on browsing history, giving all employees contextual access to KPI data, and centralizing data from across the business into one digital ecosystem so processes can be more thoroughly reviewed are all examples of business intelligence.
Organizations invest in data science because it promises to bring competitive advantages.
Data is transforming into an actionable asset, and new tools are using that reality to move the needle with ML. As a result, organizations are on the brink of mobilizing data to not only predict the future but also to increase the likelihood of certain outcomes through prescriptive analytics.
Here are some case studies that show some ways BI is making a difference for companies around the world:
1) Starbucks:
With 90 million transactions a week in 25,000 stores worldwide, the coffee giant is in many ways on the cutting edge of using big data and artificial intelligence to direct marketing, sales, and business decisions.
Through its popular loyalty card program and mobile application, Starbucks owns individual purchase data from millions of customers. Using this information and BI tools, the company predicts purchases and sends individual offers of what customers will likely prefer via their app and email. This system draws existing customers into its stores more frequently and increases sales volumes.
The same intel that helps Starbucks suggest new products to try also helps the company send personalized offers and discounts that go far beyond a special birthday discount. Additionally, a customized email goes out to any customer who hasn’t visited a Starbucks recently with enticing offers—built from that individual’s purchase history—to re-engage them.
2) Netflix:
The online entertainment company’s 151 million subscribers give it a massive BI advantage.
Netflix has digitized its interactions with its 151 million subscribers. It collects data from each of its users and with the help of data analytics understands the behavior of subscribers and their watching patterns. It then leverages that information to recommend movies and TV shows customized as per the subscriber’s choice and preferences.
As per Netflix, around 80% of the viewer’s activity is triggered by personalized algorithmic recommendations. Where Netflix gains an edge over its peers is that by collecting different data points, it creates detailed profiles of its subscribers which helps them engage with them better.
The recommendation system of Netflix contributes to more than 80% of the content streamed by its subscribers, which has helped Netflix earn a whopping $1 billion via customer retention. For this reason, Netflix doesn’t have to invest too much in advertising and marketing its shows; it already has a precise estimate of how many people will be interested in watching a show.
3) Coca-Cola:
Coca-Cola is the world’s largest beverage company, with over 500 soft drink brands sold in more than 200 countries. Given the size of its operations, Coca-Cola generates a substantial amount of data across its value chain – including sourcing, production, distribution, sales, and customer feedback – which it can leverage to drive successful business decisions.
Coca-Cola has been investing extensively in research and development, especially in AI, to better leverage the mountain of data it collects from customers all around the world. This initiative has helped it better understand consumer trends in terms of price, flavors, packaging, and consumers’ preference for healthier options in certain regions.
With 35 million Twitter followers and a whopping 105 million Facebook fans, Coca-Cola benefits from its social media data. Using AI-powered image-recognition technology, they can track when photographs of its drinks are posted online. This data, paired with the power of BI, gives the company important insights into who is drinking their beverages, where they are and why they mention the brand online. The information helps serve consumers more targeted advertising, which is four times more likely than a regular ad to result in a click.
Coca-Cola is increasingly betting on BI, data analytics, and AI to drive its strategic business decisions. From its innovative Freestyle fountain machine to finding new ways to engage with customers, Coca-Cola is well-equipped to remain at the top of the competition in the future. In a digital world that is increasingly dynamic, with changing customer behavior, Coca-Cola is relying on Big Data to gain and maintain its competitive advantage.
4) American Express GBT
The American Express Global Business Travel company, popularly known as Amex GBT, is an American multinational travel and meetings programs management corporation which operates in over 120 countries and has over 14,000 employees.
Challenges:
Scalability – Creating a single portal for around 945 separate data files from internal and customer systems using the current BI tool would require over 6 months to complete. The earlier tool was used for internal purposes and scaling the solution to such a large population while keeping the costs optimum was a major challenge
Performance – Their existing system had limitations shifting to Cloud. The amount of time and manual effort required was immense
Data Governance – Maintaining user data security and privacy was of utmost importance for Amex GBT
The company was looking to protect and increase its market share by differentiating its core services and was seeking a resource to manage and drive their online travel program capabilities forward. Amex GBT decided to make a strategic investment in creating smart analytics around their booking software.
The solution equipped users to view their travel ROI across three categories: cost, time, and value. Each category has individual KPIs that are measured to evaluate the performance of a travel plan.
Results:

Reduced travel expenses by 30%.
Time to Value – Initially it took a week for new users to be on-boarded onto the platform. With Premier Insights that time had now been reduced to a single day and the process had become much simpler and more effective.
Savings on Spends – The product notifies users of any available booking offers that can help them save on their expenditure. It recommends users of possible saving potential such as flight timings, date of the booking, date of travel, etc.
Adoption – Ease of use of the product, quick scale-up, real-time implementation of reports, and interactive dashboards of Premier Insights increased the global online adoption for Amex GBT
5) Airline Solutions Company: BI Accelerates Business Insights
Airline Solutions provides booking tools, revenue management, web, and mobile itinerary tools, as well as other technology, for airlines, hotels and other companies in the travel industry.
Challenge: The travel industry is remarkably dynamic and fast paced. And the airline solution provider’s clients needed advanced tools that could provide real-time data on customer behavior and actions.
Solution: They developed an enterprise travel data warehouse (ETDW) to hold its enormous amounts of data. Its executive dashboards provide near real-time insights in user-friendly environments, with a 360-degree overview of business health, reservations, operational performance, and ticketing.
Results: The scalable infrastructure, graphic user interface, data aggregation and ability to work collaboratively have led to more revenue and increased client satisfaction.
6) A specialty US Retail Provider: Leveraging prescriptive analytics
Challenge/Objective: A specialty US retail provider wanted to modernize its data platform to help the business make real-time decisions while also leveraging prescriptive analytics. They wanted to discover the true value of the data being generated from their multiple systems and understand the patterns (both known and unknown) of sales, operations, and omni-channel retail performance.
Solution: We helped build a modern data solution that consolidated their data in a data lake and data warehouse, making it easier to extract value in real time. We integrated our solution with their OMS, CRM, Google Analytics, Salesforce, and inventory management system. The data was modeled so that it could be fed into machine learning algorithms and leveraged easily in the future.
Results: The customer had visibility into their data from day 1, which is something they had wanted for some time. In addition, they were able to build more reports, dashboards, and charts to understand and interpret the data. In some cases, they were able to get real-time visibility and analysis of in-store purchases based on geography!
7) Logistics startup with an objective to become the “Uber of the Trucking Sector” with the help of data analytics
Challenge: A startup specializing in analyzing vehicle and driver performance by collecting data from sensors within the vehicle (a.k.a. vehicle telemetry) and order patterns, with the objective of becoming the “Uber of the Trucking Sector.”
Solution: We developed a customized backend of the client’s trucking platform so that they could monetize empty return trips of transporters by creating a marketplace for them. The approach used a combination of AWS Data Lake, AWS microservices, machine learning and analytics.
Results:

- Reduced fuel costs
- Optimized reloads
- More accurate driver/truck schedule planning
- Smarter routing
- Fewer empty return trips
- Deeper analysis of driver patterns, breaks, routes, etc.
8) Challenge/Objective: A niche segment customer competing against market behemoths looking to become a “Niche Segment Leader”
Solution: We developed a customized analytics platform that can ingest CRM, OMS, Ecommerce, and Inventory data and produce real time and batch driven analytics and AI platform. The approach used a combination of AWS microservices, machine learning and analytics.
Results:

- Reduced customer churn
- Optimized order fulfillment
- More accurate demand schedule planning
- Improved product recommendations
- Improved last-mile delivery
How can we help you harness the power of data?
At Systems Plus our BI and analytics specialists help you leverage data to understand trends and derive insights by streamlining the searching, merging, and querying of data. From improving your CX and employee performance to predicting new revenue streams, our BI and analytics expertise helps you make data-driven decisions for saving costs and taking your growth to the next level.
Inflow: eCommerce Marketing Agency
GA4 Case Study: Tracking Data for eCommerce & Non-eCommerce Sites
Over the last year, Inflow’s digital analytics team has been working hard to migrate our clients to Google Analytics 4 in preparation for the sunsetting of Universal Analytics.
To date, we’ve successfully configured the setup for more than 60 websites, both eCommerce and non-eCommerce. By replicating (or improving upon!) their existing data tracking in UA as closely as possible, we’ve provided our clients valuable historical data within GA4, giving them a leg up when the transition officially occurred on July 1, 2023.
In today’s blog, we’ll explore the work we did for two such clients — KEH Camera and Worldwide Business Research — including the unique challenges and solutions we discovered along the way.
Keep reading for the full details, or contact our team to have them audit (and recommend improvements for) your GA4 configuration today.
The eCommerce Site: KEH Camera
KEH is a reCommerce business that resells professional, collectible, and everyday camera gear. They’ve been an Inflow client since 2019 for a variety of services, including paid social, search engine optimization, and more.
The Challenge
Unlike traditional eCommerce brands, KEH has two sides to their business: Shop (for customers buying products) and Sell (for customers selling their products to the brand).
While KEH was able to successfully track both of these audiences separately through Enhanced eCommerce in Universal Analytics, that functionality no longer exists in the new version of Google Analytics — forcing the brand to get creative with their new configuration and attribution, especially when it came to existing custom events used to track “Sell” conversions in UA.
In short, KEH needed a new data-collection solution in GA4 that would segment out purchases from both Shop and Sell (as well as the user data for each audience) to better inform their digital marketing strategy.
Before our partnership, the KEH team had used an Enhanced eCommerce converter to replicate their UA data layer for GA4. While it mostly worked, it wasn’t as clean of an installation as our team could provide and would have eventually needed to be revisited when the complete transition to GA4 was made.
The Solution
Using our Google Analytics 4 setup process as a foundation, we took the concept of enhanced eCommerce forward into GA4-style events, values, and more with a custom configuration for KEH.
We started by using GA4’s eCommerce setup to track both Shop and Sell activity from the website. With customized purchase and eCommerce events, we were able to pass in where each transaction was coming from (Shop or Sell) to not only track website actions but also user-level actions (with a similar custom setup on the user side of the analytics platform).
Combined, these configurations would give KEH plentiful options to segment their data, either at the event or user level. In turn, they could better understand their customer journey — where different audiences were browsing on their site, where purchases were coming from, and more.
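One plausible way to tag each transaction with its business side is a custom event parameter. The sketch below builds a GA4 Measurement Protocol-style `purchase` payload with an illustrative `business_unit` parameter; the parameter name and values are assumptions for illustration, not KEH's actual schema:

```python
import json

def purchase_event(transaction_id, value, business_unit):
    """Build a GA4 Measurement Protocol-style payload; `business_unit`
    is an illustrative custom parameter, not KEH's actual schema."""
    return {
        "client_id": "555.123",  # placeholder client id
        "events": [{
            "name": "purchase",
            "params": {
                "transaction_id": transaction_id,
                "currency": "USD",
                "value": value,
                "business_unit": business_unit,  # e.g. "shop" or "sell"
            },
        }],
    }

payload = purchase_event("T-1001", 249.99, "sell")
print(json.dumps(payload, indent=2))
# Sending would be an HTTP POST to
# https://www.google-analytics.com/mp/collect?measurement_id=...&api_secret=...
```

In practice the same custom parameter would be pushed from the site's data layer via Google Tag Manager rather than posted server-side, but the segmentation idea is identical.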
To push our tracking live onto KEH’s site, however, we needed one more step: a custom data layer.
While many eCommerce platforms have plugins that assist with GTM data layers, few can handle the complexity of a site like KEH — or the ability to parallel-track UA and GA4, as we’re recommending for our clients until next year’s deadline.
So, we worked with KEH’s web team to create and implement a custom data layer that would set their GA4 tracking into motion.
Get our free eCommerce data layer in our GA4 tracking toolkit today.
The Results
Even with its Shop and Sell complexities, KEH’s GA4 tracking is performing as expected by our team.
In our reporting dashboards, eCommerce purchases compare closely across UA and GA4, with users sitting at a typical 10% discrepancy due to the difference in the platforms’ configuration (user- vs. event-based). These results are common across all of our GA4 clients, eCommerce and non-eCommerce.
Fortunately, KEH had already included BigQuery in their overarching web analytics strategy, making the data warehousing required by GA4 much simpler to implement.
As a reminder, Google Analytics 4 only stores 14 months of historical data within its platform. For your site to have access to more historical data, you’ll need an integration with BigQuery — which will store your site’s data and allow you to compare longer periods in applications like Google Data Studio.
BigQuery is also technically the most “accurate” source of GA4 data.
Although Google Signals data does not come through to BigQuery, we’ve been successfully using the integration so far for KEH’s needs.
The non-eCommerce Site: Worldwide Business Research
Worldwide Business Research is a company that plans and hosts more than 100 annual worldwide conferences (both in-person and virtual). They also execute the marketing needed for those events, including email marketing, digital advertising, and more.
As a partner to our current client IQPC , WBR reached out to Inflow for GA4 migration services earlier this year.
Note: To avoid confusion with GA4 “events,” we’ve capitalized Event in reference to WBR’s conferences in the Google Analytics case study below.
As a non-eCommerce site, WBR needed to track data across three global offices and hundreds of subdomains.
In Universal Analytics, WBR had relied heavily on views for each of their Events/subdomains. However, with views no longer existing in GA4, WBR needed a solution to get Event-level data from each subdomain.
Their goal: Streamline an entire office’s tracking while keeping the ability to segment out data by Event/conference.
In case that wasn’t enough, the company also needed to change their Google Ads tracking to meet GA4’s capabilities. (Previously, they had imported conversions from UA views, which, as mentioned, no longer exist in Google Analytics 4.)
In short, WBR needed a completely custom architecture recommendation for their Google Analytics 4 configuration and tracking.
To consolidate WBR’s data-tracking and reporting options, we recommended setting up one GA4 property per office, with different segments for the Events/conferences hosted by each location passed into GA4 from Google Tag Manager (GTM).
In other words, individual Event data could be viewed by applying segments (comparisons, filters, audiences, etc.) to their reports or through filtered Data Studio dashboards. Any unsegmented reports would be a comprehensive report of all the office’s Events.
That way, WBR could more clearly distinguish the KPIs for each Event they hosted across the globe with much less effort than before.
Using our confidence in and knowledge of GA4 capabilities, combined with custom event setup to track Google Ads, we implemented WBR’s new architecture smoothly — giving the brand deeper insights into its Event performance without the multi-property headache of the past.
An added bonus: By setting up one property per office, we avoided the need to set up BigQuery and Google Ads tracking for every single Event as done in UA.
Like most clients, WBR continues to report most of their data out of Universal Analytics. But, by completing this setup far before next year’s deadline, we’ve given WBR’s marketing team more flexibility in not only learning their new GA4 setup but also how to best report out of it for their future marketing needs.
In addition to the configuration described above, we also created a custom Data Studio template for the “segments” of each Event — avoiding any need for WBR’s team to dig around in GA4 (and get more confused than before) while giving them every tool needed to evaluate each Event’s performance and make appropriate business decisions.
Still Need to Set Up Your GA4?
When it comes to the new Google Analytics 4, the clock is ticking.
To get as much historical data as possible for future comparison, now is the time to start configuring your analytics data tracking in the platform.
If you need help making it happen — or would like an expert to evaluate your current setup — Inflow is always here to help.
Request a free GA4 migration proposal now to learn how we can help get your site set for future data-tracking success.
About The Author
Mike Belasco
Mike Belasco has been an entrepreneur and digital marketer since 2003. Mike founded Inflow (previously known as seOverflow) in 2007 and led Inflow to five Denver’s Fastest-Growing Private Company awards and three Inc. 5000 awards. In 2009, he also founded ConversionIQ, which was subsequently acquired by Inflow. After 20 years of serving as Inflow’s Founding CEO, in 2023 Mike completed a sale of Inflow. He now takes on entrepreneurial adventures and continues to be a raving fan of the Inflow team while consulting as a Strategic Advisor.
Data Monetization and Consumer Tracking
In late September 2014, Facebook, the world’s largest social network, announced the launch of Atlas, an ad serving and measurement platform that would allow advertisers to target ads to real people using Facebook’s unique user IDs rather than information based on often inaccurate and unreliable cookies. The concept, called people-based marketing, would provide marketers with more accurate demographic, reach, and frequency information across the Internet and in-app, while preserving privacy by anonymizing users. At the time, the launch of Atlas was heralded as one of the most dramatic steps toward solving for cross-device reporting and cross-channel (particularly online and offline) issues (see Exhibit 1 for a description of cross-device and cross-channel components). This case serves to explain consumer tracking within the context of data monetization.
Learning Objective
Students are provided with a primer on consumer tracking as the foundation for further discussion on monetization.
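The core idea of people-based marketing, matching ad impressions to a stable, anonymized person identifier rather than a per-device cookie, can be sketched in a few lines. A salted hash of a logged-in account ID yields the same token on every device, so one person counts once for reach. This is an illustrative sketch only; the salt, account IDs, and data layout below are hypothetical and not Facebook's actual implementation.

```python
import hashlib

def anonymize_user_id(account_id: str, salt: str) -> str:
    """Derive a stable, anonymized token from a logged-in account ID.

    The same account seen on phone and desktop hashes to the same token,
    enabling cross-device reach/frequency counts without exposing raw IDs.
    """
    return hashlib.sha256((salt + account_id).encode("utf-8")).hexdigest()

# Ad impressions logged on two devices for the same (hypothetical) account
impressions = [
    ("user-42", "phone"),
    ("user-42", "desktop"),
    ("user-99", "desktop"),
]

SALT = "rotate-me-periodically"  # hypothetical salt value
frequency: dict[str, int] = {}
for account_id, _device in impressions:
    token = anonymize_user_id(account_id, SALT)
    frequency[token] = frequency.get(token, 0) + 1

# Reach = unique people; frequency = impressions per person
print("reach:", len(frequency))                   # 2 unique people
print("max frequency:", max(frequency.values()))  # 2 impressions for one person
```

Cookie-based counting would have reported three distinct "users" here; the person-level token collapses the two devices into one.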
Major U.S. Airline
A Case Study
Applications: Fast data tracking for flights, passengers, and messages
Server configuration: Multiple server farms in several locations run ScaleOut StateServer® Pro on more than 100 physical and virtual servers to manage data for hundreds of connected web and application servers.
Reason for Deployment: Needed low latency data storage and scalable data access for mission-critical applications serving hundreds of thousands of global passengers, flights, and operations functions.
Results: Removed data access bottlenecks to provide real-time data when it is needed for improved customer satisfaction and operational efficiency, developed a trusted technology partnership, and lowered cost by replacing another technology solution with ScaleOut Software.
A major U.S. airline has been a ScaleOut Software customer for more than a decade, using ScaleOut StateServer Pro with its in-memory computing and caching technologies to help manage the airline’s critical global flight tracking, passenger, baggage, and operations data with fast and highly available access to all stored data at any time.
Over the years, ScaleOut has become a trusted technology partner for the airline. It has helped to build a customized data access layer on top of ScaleOut StateServer Pro to meet the airline’s specific data access and management needs, to deliver high-value solutions with substantial cost efficiencies, and most recently to navigate additional complications spurred on by COVID-19’s impacts on the travel industry.
At the airline, ScaleOut Software primarily supports a team that is responsible for sourcing and persisting real-time data into enterprise data stores and for delivering events to application clients for business operations around the globe. Having fast and reliable access to the data stored in ScaleOut Software’s in-memory data grid is critical to managing passenger information, flight and baggage tracking, and operations control.
This team functions at the core of the airline’s nervous system and sends server-based information to data centers and systems around the world. Whether for passenger data or flight tracking and positioning information, the airline depends on ScaleOut Software to provide fast access to its data from any location without issues or outages.
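The access pattern an in-memory data grid provides can be illustrated with a toy single-process cache. This is not ScaleOut's actual API; a real grid partitions and replicates entries across many servers, while this sketch only shows the put/get-with-expiry pattern the airline relies on for fast, always-fresh reads.

```python
import time
from typing import Any, Optional

class InMemoryCache:
    """Minimal single-process sketch of an in-memory key-value cache.

    Illustrates store/read/expire semantics only; a production data grid
    adds partitioning, replication, and cross-server availability.
    """

    def __init__(self) -> None:
        # key -> (value, absolute expiry time or None for no expiry)
        self._store: dict[str, tuple[Any, Optional[float]]] = {}

    def put(self, key: str, value: Any, ttl_seconds: Optional[float] = None) -> None:
        expires = time.monotonic() + ttl_seconds if ttl_seconds is not None else None
        self._store[key] = (value, expires)

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires = entry
        if expires is not None and time.monotonic() > expires:
            del self._store[key]  # lazily evict expired entries
            return None
        return value

cache = InMemoryCache()
# Hypothetical flight-status entry; stale data expires after 60 seconds
cache.put("flight:UA123", {"status": "boarding", "gate": "C7"}, ttl_seconds=60)
print(cache.get("flight:UA123"))  # fresh entry is returned
print(cache.get("flight:XX999"))  # unknown key returns None
```

The time-to-live keeps readers from acting on stale operational data: once an entry expires, callers fall through to the authoritative store and re-populate the cache.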
“We really needed a fast system to store our data, keep it persisted, and make sure it doesn’t get lost or damaged.” – Senior Software Developer
Since deploying ScaleOut StateServer Pro, the airline has increased the performance of its applications and gained a reliable tool for caching critical .NET objects and their associations across locations that meets the need to process hundreds of thousands of data points each day.
“The product has been awesome, stable. And that’s exactly what we want, you know? We don’t have to do anything to it. So, it’s perfect. It just runs and runs.” – Senior Software Developer
Due to ScaleOut’s reliability and ease of use, the airline has consolidated from three caching technologies to two and standardized on ScaleOut StateServer Pro. In addition to improving software engineering workflows, this change provides additional business and cost-saving efficiencies.
“When looking at competing technologies, ScaleOut Software is a much better value and it delivers consistently. They have also been a really great partner to us, taking our feedback seriously and helping to keep our data operations running smoothly.” – Resource Development Manager
This major U.S. airline values having a true partnership with ScaleOut Software and its development, sales, and leadership teams. Whether providing additional coding support to complete a software transformation project, customizing its product for the airline’s needs, or offering flexibility to mitigate COVID-19-driven challenges, ScaleOut Software has been on call to help.
Author: Kayley King
Case Study: Tracking and Tracing Drugs in the Pharmaceutical Supply Chain
Failures or lack of visibility in the many-tiered pharmaceutical supply chain have multiple repercussions. Drug shortages have adverse economic and clinical effects on patients — they are more likely to face increased out-of-pocket costs, higher rates of drug errors, and, yes, higher mortality. Hospitals and health systems allocate over 8.6 million additional labor hours to […]
The US had about 150 to 300 drug shortages every quarter from 2014 to 2019.
For drug managers, maintaining excess inventory to try to avoid shortages brings significant costs in storing pharmaceuticals — and waste when they are not used. They also struggle to predict where a particular drug is likely to be needed at a particular time.
The average pharma company holds 180 days of finished-goods inventory and could free up $25 billion if it reduced that to a target of 80 to 100 days. With increased competition from generics and rival brands, cutting supply chain costs lets companies redirect money to competitive ends such as funding product development.
Compliance is another issue. The serialization requirements of the FDA’s Drug Supply Chain Security Act oblige manufacturers, re-packagers, wholesale distributors, and pharmacies to be capable of lot-level product tracing and to provide the applicable transaction information, history, and statements.
To protect patients and prevent falsified medicines from entering the supply chain, the EU’s Falsified Medicines Directive was passed to increase the security of the manufacturing and delivery of medicines across Europe. The main focus is on counterfeit and falsified drugs that can be ineffective or even dangerous.
By 2023 in the US, lot-level tracing will move to unit-level serialization. Russia’s serialization rules give pharma companies until this year to achieve complete unit- and batch-level traceability. Brazil’s track and trace regulations go into effect in May 2022. In South Korea and India, companies must uniquely serialize drug products. Saudi Arabia’s Vision 2030 plan includes adopting technology for tracking all human registered drugs manufactured in Saudi Arabia and those imported from abroad. China has published regulations providing for the development of a new national drug traceability system by 2022.
Regulations that require manufacturers to add serial numbers to medications give them more data than before, providing visibility into the status of drugs wherever they are in the supply chain. But getting this right requires that partners across the supply chain participate in the tracking.
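The difference between lot-level tracing and unit-level serialization can be shown with a toy model: every saleable unit carries its own unique serial alongside the shared product code and lot number, and chain-of-custody events accumulate against that serial. The GTIN value, actor names, and event vocabulary below are made up for illustration and do not follow any specific regulation's data format.

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class SerializedUnit:
    """One saleable drug unit and its chain-of-custody history."""
    gtin: str                      # product-level code (hypothetical GTIN)
    lot: str                       # shared by every unit in the batch
    serial: str = field(default_factory=lambda: uuid.uuid4().hex[:12])
    history: list = field(default_factory=list)

    def record_event(self, actor: str, action: str) -> None:
        # Append a custody event; a real system would also capture
        # timestamps and the transaction documents regulators require.
        self.history.append({"actor": actor, "action": action})

# Serialize a lot at unit level: each unit gets its own serial number
lot_units = [SerializedUnit(gtin="00312345678906", lot="A2024") for _ in range(3)]
print("unique serials:", len({u.serial for u in lot_units}))  # 3

unit = lot_units[0]
unit.record_event("manufacturer", "commissioned")
unit.record_event("wholesaler", "received")
print(unit.serial, [e["action"] for e in unit.history])
```

With only lot-level tracing, all three units above would be indistinguishable; unit-level serials are what let a recall or a counterfeit check target one specific package.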
Track and Trace in Action
Global pharmaceutical company Merck KGaA Healthcare is working on this issue. It maintains about 150 days of drug inventory, which is expensive to keep in-house and particularly wasteful when it comes to personalized drug therapies with short shelf lives. Its supply-and-demand forecasts are 85 percent accurate today.
One class of drugs it manufactures is immuno-oncology therapies, and it was looking for a way to improve forecasting for these potentially life-changing and life-saving drugs. These are personalized medications, expensive and valuable, so it is imperative to ensure the drugs make it to the right place at the right time. It all starts with drawing blood from the patient and sending it to the lab, where a therapy is formulated based on the patient’s DNA. The drug created from this must travel along the supply chain in temperature-controlled environments, and it must reach the patient within a specified time frame for treatment.
Merck KGaA Healthcare has piloted a project with TraceLink, using the vendor’s Digital Network Platform to improve supply-and-demand forecasting and reduce shortages of critical immuno-oncology drugs. Serialization on its own is still fairly new, and even as it matures, TraceLink’s platform focuses on further enhancing the supply chain process. Not only does it generate serial numbers, but it also provides a centralized hub where third-party participants in drug companies’ supply chains can share relevant information, such as manifests and product master data, with each other. Bringing everyone together on the same platform is a more efficient way of trading this information than having each drug company create point-to-point connections from its internal systems to the systems of every company it needs to share data with.
Contract drug manufacturers, which in many cases make generics for multiple drug companies, use the platform, as do big brand-name drug companies and smaller ones that are sometimes the creators of blockbuster drugs. It is hard for all of them to track those many relationships.
“With the network, you integrate once and interoperate all down the supply chain to increase visibility and lower the bar to sharing data,” said John Hogan, TraceLink Senior Vice President of Engineering. “The network changes the process of integrating between individual supply chain and inventory systems, which is difficult for pharmaceutical companies and is not necessarily their strong point.”
The company has defined canonical data formats for internal management; it maps data from pharma supply chain partners (logistics companies, dispensers, and wholesalers) into that format and then back out into the format another member of the network might need.
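The canonical-format idea can be sketched as two mapping steps: translate each partner's field names into one shared schema on the way in, and back out into another partner's names on the way out, so the hub maintains one mapping per partner instead of one per partner pair. The partner names and field mappings below are hypothetical, not TraceLink's actual schema.

```python
# Partner systems label the same shipment fields differently; a canonical
# schema lets the hub translate once per partner instead of once per pair.
CANONICAL_FIELDS = ("product_code", "lot_number", "quantity")

# Per-partner field mappings (partner field -> canonical field); names are hypothetical
INBOUND_MAPS = {
    "logistics_co": {"sku": "product_code", "batch": "lot_number", "qty": "quantity"},
    "wholesaler":   {"ndc": "product_code", "lot": "lot_number", "units": "quantity"},
}

def to_canonical(partner: str, record: dict) -> dict:
    """Translate a partner-specific record into the canonical schema."""
    mapping = INBOUND_MAPS[partner]
    return {mapping[k]: v for k, v in record.items() if k in mapping}

def from_canonical(partner: str, record: dict) -> dict:
    """Translate a canonical record into a partner's own field names."""
    reverse = {v: k for k, v in INBOUND_MAPS[partner].items()}
    return {reverse[k]: v for k, v in record.items()}

# A logistics record enters the hub and is re-emitted in the wholesaler's format
inbound = {"sku": "0031-2345", "batch": "A2024", "qty": 120}
canonical = to_canonical("logistics_co", inbound)
outbound = from_canonical("wholesaler", canonical)
print(outbound)  # {'ndc': '0031-2345', 'lot': 'A2024', 'units': 120}
```

With N partners, this hub-and-spoke design needs N mappings; direct point-to-point integration would need on the order of N² translations, which is the efficiency argument the article makes.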
Compliance was the first problem TraceLink tackled, but it realized the data holds huge value for many other purposes. “If you know how particular products traveled along the supply chain, you have unique insight that can be used for dealing with recalls or obstacles in the supply chain,” he said. “You can make things more efficient and avoid having those problems repeat themselves.”
It is also providing APIs that other businesses can use when they see other use cases for the Digital Network Platform that leverage its core construction — for instance, to create new user experiences or to provide a different preferred view into the same information using built-in machine learning algorithms. “In the future, you can imagine use cases where people involved in clinical trials might want to let their information be shared to prove the efficacy of those trials,” Hogan said.
Over the next five years, the pharma track and trace solutions market is expected to surpass $2.38 billion. Other vendors in the pharma track and trace space include rfxcel, Adents, Acsis, Frequentz, Optel Group, Arvato Systems, E2open, Retail Solutions, UpNet, iControl, and Nulogy.