NYC Data Science Academy

Customer Lifetime Value Product Recommendation for Retail

Project GitHub | LinkedIn: Niki, Moritz, Hao-Wei, Matthew, Oren

The skills we demoed here can be learned by taking the Data Science with Machine Learning bootcamp at NYC Data Science Academy.

Introduction

On any given day, countless transactions are made in the retail space. Each transaction generates data that merchants can use to improve their sales and make important business decisions. As part of our capstone, we consulted for two retail clients, exploring and identifying trends in their customer behavior by building visualizations and predictive models. We have split the blog into two parts to represent our exploration and modeling for the respective clients and datasets.

Part 1. Predictive Customer Lifetime Value and Product Recommendation for Retail

1. Exploratory Data Analysis (EDA)

Upon receiving the data for the first client, we found that the product items were listed in a semi-structured format. Some item names followed a “product name – color” pattern, but many did not, which made it difficult to separate the product from the color. To simplify things, we decomposed all the item names into individual words in a corpus, which let us identify the top generic items and colors sold by analyzing word-count frequency.

We wanted to analyze sales and item trends by year, so the frequencies were standardized to allow a meaningful comparison across years. As our graphs showed, necklaces and earrings are hot sellers, and gold-colored jewelry is in demand.
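The word-count approach can be sketched roughly as follows. This is a minimal illustration, not the code we ran: the function name `word_share_by_year`, the sample item names, and the choice to standardize by dividing by each year's total word count are all assumptions.

```python
import re
from collections import Counter, defaultdict

def word_share_by_year(line_items):
    """Return {year: {word: share of that year's words}} for item names."""
    counts = defaultdict(Counter)
    for year, item_name in line_items:
        # Lowercase and keep only runs of letters, so
        # "Hoop Earring - Gold" becomes ["hoop", "earring", "gold"].
        words = re.findall(r"[a-z]+", item_name.lower())
        counts[year].update(words)
    shares = {}
    for year, counter in counts.items():
        total = sum(counter.values())
        # Dividing by the yearly total makes years with different
        # sales volumes comparable.
        shares[year] = {word: n / total for word, n in counter.items()}
    return shares

# Hypothetical line items: (year, item name)
sample_items = [
    (2016, "Hoop Earring - Gold"),
    (2016, "Charm Necklace"),
    (2017, "Stud Earring - Gold"),
    (2017, "Layered Necklace - Gold"),
    (2017, "Gold Bangle"),
]
shares = word_share_by_year(sample_items)
```

Comparing `shares[2016]["gold"]` with `shares[2017]["gold"]` then shows whether a word became relatively more common, independent of how many items were sold in each year.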

For retailers, November tends to be a month of high sales volume because of the holiday season and Black Friday deals. We wanted to see whether any specific November trends could help the business decide when to increase its advertising budget and promotional efforts. We found that the first week of November had the weakest sales volume in every year. We therefore advised the company to consider spending more marketing dollars on that first week, for example on an early holiday promotion.

2. Modeling

2.1 RFM Segmentation, Analysis and Model

Since we did not have a target variable to predict, we had to get creative during our modeling phase. After doing some research, we decided to first do an RFM analysis (recency, frequency, monetary). The goal of RFM analysis is to utilize data regarding the recency (how recently a customer has purchased), frequency (the number of repeat purchases of a customer), and the monetary value of the orders to determine how valuable a customer is, as well as how many times a customer will return over the course of the next x time periods.

In our case, we were specifically interested in the CLV (customer lifetime value) and the number of times a customer will return. We used these results to perform a customer segmentation by creating the additional variable “Target_Group.” Let’s begin with data preparation.

To perform RFM analysis, we first had to transform our transaction data. Luckily, the “lifetimes” package in Python provides a function to do so. After the transformation, our data frame contained one summary row per customer.

The ‘T’ column in this data frame represents the age of each customer, that is, the time between the customer’s first purchase and the end of the observation period. Equipped with our prepared data, we first investigated the characteristics of our client’s best customers with regard to frequency and recency. We plotted the result as a heatmap that turns more yellow the more likely a customer is to return within the next period of time. (As part of the data preparation, you have to specify the unit of time on which to base the analysis; we set this parameter to ‘M’ for months.)
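To make the frequency, recency and T columns concrete, here is a minimal standard-library sketch of how such a per-customer summary can be computed. The definitions follow the common RFM convention and are assumptions for illustration, not the “lifetimes” implementation; the function name `rfm_summary`, the sample transactions, and the use of days instead of months are hypothetical.

```python
from collections import defaultdict
from datetime import date

def rfm_summary(transactions, observation_end):
    """Summarize (customer_id, purchase_date, amount) rows per customer.

    frequency:      number of repeat purchase days (purchase days minus one)
    recency:        days between the first and the last purchase
    T:              days between the first purchase and observation_end
    monetary_value: mean spend per repeat purchase day (0.0 if none)
    """
    daily_spend = defaultdict(lambda: defaultdict(float))
    for customer_id, purchase_date, amount in transactions:
        daily_spend[customer_id][purchase_date] += amount

    summary = {}
    for customer_id, spend_by_day in daily_spend.items():
        days = sorted(spend_by_day)
        first, last = days[0], days[-1]
        repeat_days = days[1:]  # everything after the first purchase day
        frequency = len(repeat_days)
        monetary = (sum(spend_by_day[d] for d in repeat_days) / frequency
                    if frequency else 0.0)
        summary[customer_id] = {
            "frequency": frequency,
            "recency": (last - first).days,
            "T": (observation_end - first).days,
            "monetary_value": monetary,
        }
    return summary

# Hypothetical transactions: (customer, purchase date, order amount)
sample = [
    ("A", date(2016, 10, 1), 120.0),
    ("A", date(2016, 11, 15), 80.0),
    ("A", date(2017, 1, 10), 100.0),
    ("B", date(2016, 12, 5), 50.0),
]
rfm = rfm_summary(sample, observation_end=date(2017, 3, 1))
```

In this sketch customer B has a frequency and recency of zero because they purchased only once, while their T still grows with the length of the observation window.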

This heatmap shows that the customers most likely to return have a historical frequency of around 16, meaning they have come back to purchase 16 times, and a recency of a little over 30, meaning that these customers had an age of a little over 30 months when they last purchased.

Before segmenting our client’s customers, we wanted to make sure the model was making accurate predictions. As in other machine learning approaches, to guard against overfitting we divided our data into a calibration set and a holdout set. We then fit the model on the calibration set and made predictions for the holdout set, which the model had not yet seen.
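The calibration/holdout idea can be sketched as a date-based split followed by a comparison of predicted and actual holdout purchases. This is only an illustration under stated assumptions: the function names, the cut-off date, the sample transactions, and the predicted values (standing in for the output of a fitted model) are hypothetical.

```python
from collections import Counter
from datetime import date

def split_calibration_holdout(transactions, calibration_end):
    """Split (customer_id, purchase_date) rows at a cut-off date."""
    calibration = [t for t in transactions if t[1] <= calibration_end]
    holdout = [t for t in transactions if t[1] > calibration_end]
    return calibration, holdout

def mean_absolute_error(predicted, actual):
    """Average absolute gap between predicted and actual purchase counts."""
    gaps = [abs(predicted[c] - actual.get(c, 0)) for c in predicted]
    return sum(gaps) / len(gaps)

# Hypothetical transactions: (customer, purchase date)
transactions = [
    ("A", date(2017, 1, 10)),
    ("A", date(2017, 3, 5)),
    ("B", date(2017, 2, 20)),
    ("A", date(2017, 8, 1)),
    ("A", date(2017, 10, 12)),
    ("B", date(2017, 9, 9)),
]
calibration, holdout = split_calibration_holdout(transactions, date(2017, 6, 30))

# Purchases that actually happened in the holdout period, per customer.
actual_holdout_purchases = Counter(customer for customer, _ in holdout)
# Hypothetical predictions from a model fitted on the calibration set only.
predicted_holdout_purchases = {"A": 1.5, "B": 0.5}
error = mean_absolute_error(predicted_holdout_purchases, actual_holdout_purchases)
```

Because the model never sees the holdout rows during fitting, a small error here is evidence that it generalizes rather than memorizes.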

Since we only had data for a relatively short period of time, we used the last six months of our data to test our model. As the resulting plot showed, although our model did not fit the actual purchases perfectly, it captured the trends and significant turning points over the course of those six months.

While this information is valuable, it did not yet satisfy our goal of producing actionable insights that could be implemented immediately to create business value. To get there, we looked at the number of times a customer is predicted to return within the next month, which can be loosely interpreted as the probability of the customer returning in the next month.

Based on these insights, customers most likely to return can be targeted specifically with ad/marketing campaigns. By doing so, the amount of money spent on marketing can be reduced, and the return on these expenses can be increased.

We also included a more generalized version of this technique in our final product that, after specifying a time range and selecting a customer, returned the number of expected repeat purchases by this specific customer.

The other aspect of RFM analysis that we were interested in as a basis for our clustering was the CLV. Calculating the CLV with the “lifetimes” package is straightforward once the data is prepared correctly. To use the DCF (discounted cash flow) method, we needed to add a column with the monetary value.
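As a rough illustration of the discounted cash flow idea, the sketch below discounts a customer's expected monthly spend back to today. It assumes a constant expected number of purchases per month, which is a simplification; a fitted model would supply period-specific expectations. The function name and all input values are hypothetical.

```python
def discounted_clv(expected_purchases_per_month, average_order_value,
                   monthly_discount_rate, months):
    """Sum expected monthly cash flows, each discounted back to today."""
    clv = 0.0
    for month in range(1, months + 1):
        cash_flow = expected_purchases_per_month * average_order_value
        # A cash flow received `month` months from now is worth less today.
        clv += cash_flow / (1 + monthly_discount_rate) ** month
    return clv

# Hypothetical customer: 2 expected orders a month at 50 each, no discounting.
undiscounted = discounted_clv(2.0, 50.0, 0.0, 6)
# Same customer with a 1% monthly discount rate.
discounted = discounted_clv(2.0, 50.0, 0.01, 6)
```

With a positive discount rate the result is always lower than the undiscounted sum, which is why the choice of rate and time horizon matters when ranking customers by CLV.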

Then, it was just a matter of fitting the model and making the computation. We then sorted our data in descending order to identify the most valuable customers for our clients.

We also wanted to let our clients know the probability that a particular customer would make returning purchases. In other words, we needed to understand the probability that the customer is still “alive”, or active, in the customer lifecycle. The “lifetimes” Python library includes tools for this type of analysis. Take, for instance, the following customer, who made their initial purchase back in October of 2016 and hasn’t made another purchase for a few months. The likelihood of the customer being a recurring customer drops until they make their second purchase.

For the client, this information can signal when to send customer-targeted promotions, namely whenever the customer’s aliveness probability drops below a certain threshold. In the life cycle plot, the dashed lines represent purchases and the solid line describes the probability of this customer being alive at each date.

From a business perspective, it may also be helpful to segment customers based on their buying patterns. There are many ways to do this; the route we took was to take each customer’s CLV, map it to its percentile, and bin the percentiles into “Low Priority”, “Bronze”, “Silver” and “Gold”. With this approach, Company A can easily see who its most important customers are and create new strategies to move lower-tier customers into the higher tiers.
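The percentile binning can be sketched in a few lines. The cut-offs at the 25th, 50th and 75th percentiles are an assumption for illustration (the actual boundaries we used are not stated here), as are the function name and the sample CLV values.

```python
from bisect import bisect_right

TIER_BOUNDS = [0.25, 0.50, 0.75]  # hypothetical percentile cut-offs
TIER_NAMES = ["Low Priority", "Bronze", "Silver", "Gold"]

def assign_tiers(clv_by_customer):
    """Rank customers by CLV and bin their percentile rank into tiers."""
    ranked = sorted(clv_by_customer, key=clv_by_customer.get)
    n = len(ranked)
    tiers = {}
    for rank, customer in enumerate(ranked):
        percentile = rank / n  # share of customers ranked below this one
        tiers[customer] = TIER_NAMES[bisect_right(TIER_BOUNDS, percentile)]
    return tiers

# Hypothetical CLV values per customer
sample_clv = {"C1": 120.0, "C2": 950.0, "C3": 40.0, "C4": 400.0}
tiers = assign_tiers(sample_clv)
```

Customers with identical CLV values are ordered arbitrarily in this sketch; a production version would need an explicit rule for ties.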

One approach that we recommended was a tier-specific rewards program with periodic progress emails letting customers know how close they are to reaching the next tier, in an attempt to encourage higher purchasing volume.

2.2 Association Rules and the Apriori Algorithm

Another model we built was a product recommendation system that pushes potentially interesting products to customers. One of the bigger costs in the retail business is the cost of unsold inventory sitting in the warehouse. To address this problem, we designed a recommendation system based on association rules with an added feature: the business can input items that it wants to move from its inventory, and the recommendation system will prioritize those items if they are associated with items that a customer has purchased or intends to purchase.

The system was based on the Apriori algorithm, an association rule learning algorithm for frequent itemset mining over transactional databases. The goal is to find item combinations that occur frequently in transactions and derive “rules” for deciding which products to recommend.

Each rule has three parameters: support, confidence, and lift. Generally speaking, it is desirable to use the rules with high support. These rules will be applicable to a large number of transactions and are more interesting and profitable to evaluate from a business standpoint.
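The three measures can be computed directly from a list of transactions. In the sketch below, support is the share of transactions containing both sides of the rule, confidence is the share of LHS transactions that also contain the RHS, and lift is confidence divided by the overall share of transactions containing the RHS. The function name and the sample baskets are hypothetical, and the sketch assumes each side of the rule occurs in at least one transaction.

```python
def rule_metrics(transactions, lhs, rhs):
    """Return (support, confidence, lift) for the rule lhs -> rhs."""
    lhs, rhs = set(lhs), set(rhs)
    baskets = [set(t) for t in transactions]
    n = len(baskets)
    n_lhs = sum(1 for basket in baskets if lhs <= basket)
    n_rhs = sum(1 for basket in baskets if rhs <= basket)
    n_both = sum(1 for basket in baskets if (lhs | rhs) <= basket)
    support = n_both / n
    confidence = n_both / n_lhs        # assumes the LHS occurs at least once
    lift = confidence / (n_rhs / n)    # assumes the RHS occurs at least once
    return support, confidence, lift

# Hypothetical baskets
baskets = [
    ["earring", "necklace"],
    ["earring", "necklace", "bracelet"],
    ["earring"],
    ["bracelet"],
]
support, confidence, lift = rule_metrics(baskets, ["earring"], ["necklace"])
```

A lift above 1 means the RHS appears more often alongside the LHS than it does overall, which is what makes a rule useful for recommendations.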

The biggest challenge we encountered in rule mining stemmed from the nature of the client’s customers. Because the customers are businesses rather than individual retail customers, the rules that were generated were not informative. For example, jewelry retailers will buy earrings across multiple colors in bulk to appeal to different customers, whereas an individual retail customer will purchase just one or two colors of any particular item.

The result is that a rule generated will be that customers who buy “Gold Earring” will also buy “Silver Earring,” which is not insightful (we want to generate rules between different products).

To generate more meaningful rules, we created a new feature, “vendor : category”. As the name suggests, this feature stores the vendor and category of each item. By using this feature instead of the individual line items in each transaction, we were able to reduce the computational burden and also resolve the problem discussed above, thereby obtaining more interesting rules.

The following figure is an example of the new rules created using the R package “arules”. Compared with its peer Python packages, it produced more complete results. R also provides a robust visualization package for Apriori results, “arulesViz”, which is not available in Python. The figure below shows a sample of three rules out of a total of 2138 rules. Arrows pointing from an item to a rule vertex indicate LHS items, and arrows from a rule to an item indicate the RHS. The size of a rule vertex indicates the support of the rule, and its color indicates lift: larger size and darker color mean higher support and lift, respectively.

Graph-based plot with items and rules as vertices (3 rules)

Grouped Matrix-based plot (2138 rules)

The next step is to push products based on the rules. Here is how our model works. First, once a customer generates an order, each item in the order is transformed into its “vendor : category” value to create the LHS “vendor : category” list. Second, the model looks up the rules library and generates the RHS “vendor : category” list with the highest support for that LHS. Third, it finds all items under the RHS “vendor : category” list, ranks them by frequency, and recommends the three most frequent items.

As mentioned at the beginning of this section, our model has the unique feature of enabling businesses to direct attention to items they want to offload. This is done by modifying the third step. Suppose the business has created a list of items that make up a lot of the inventory in the warehouse. If any of these items appear among the items under the RHS “vendor : category” list, they override the frequency ranking and are pushed to customers.
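The three steps and the inventory override can be sketched as follows. This is one possible reading of the procedure, not our production code: the data structures, the function name `recommend`, the choice to use the single matching rule with the highest support, and all sample values are assumptions.

```python
def recommend(order_items, item_group, rules, item_frequency,
              priority_items=(), top_n=3):
    """Recommend items via "vendor : category" rules, priority items first."""
    # Step 1: map the ordered items to their "vendor : category" groups (LHS).
    lhs_groups = {item_group[item] for item in order_items}
    # Step 2: among rules whose LHS is covered by the order,
    # take the one with the highest support.
    matching = [r for r in rules if set(r["lhs"]) <= lhs_groups]
    if not matching:
        return []
    best_rule = max(matching, key=lambda r: r["support"])
    rhs_groups = set(best_rule["rhs"])
    # Step 3: collect items under the RHS groups that are not already ordered.
    candidates = [item for item, group in item_group.items()
                  if group in rhs_groups and item not in order_items]
    # Priority (overstocked) items go first, then by descending frequency.
    priority = set(priority_items)
    candidates.sort(key=lambda item: (item not in priority,
                                      -item_frequency.get(item, 0)))
    return candidates[:top_n]

# Hypothetical catalogue, rule library and sales frequencies
item_group = {
    "Gold Hoop Earring": "VendorA : Earrings",
    "Silver Stud Earring": "VendorA : Earrings",
    "Charm Necklace": "VendorB : Necklaces",
    "Pendant Necklace": "VendorB : Necklaces",
    "Layered Necklace": "VendorB : Necklaces",
    "Bead Necklace": "VendorB : Necklaces",
}
rules = [{"lhs": ["VendorA : Earrings"], "rhs": ["VendorB : Necklaces"],
          "support": 0.12}]
item_frequency = {"Charm Necklace": 40, "Pendant Necklace": 25,
                  "Layered Necklace": 10, "Bead Necklace": 3}

order = ["Gold Hoop Earring"]
default_picks = recommend(order, item_group, rules, item_frequency)
overstock_picks = recommend(order, item_group, rules, item_frequency,
                            priority_items=["Bead Necklace"])
```

In the second call the rarely sold "Bead Necklace" jumps to the top of the list because the business flagged it, while the remaining slots are still filled by frequency.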

Through the CLV analysis and the improved recommendation system, we aimed to help the business better target its customer segments and improve its inventory turnover ratio. We learned a great deal from working with this business, in particular that applying our data science skill sets in a real-world environment is often quite different from the classroom setting. In part 2, we will continue our journey with yet another retail company and hope to uncover hidden sales patterns through a Shiny dashboard as well as forecast future sales with a time series analysis.

About Authors

Jo Wen (Iris) Chen

Esther Chang

Raymond Liang

Nutchaphol Chaivorapongsa

NYC Data Science Academy teaches data science, trains companies and their employees to better profit from data, excels at big data project consulting, and connects trained Data Scientists to our industry.

NYC Data Science Academy is licensed by New York State Education Department.

jamiefosterscience logo

10 Unique Data Science Capstone Project Ideas

A capstone project is a culminating assignment that allows students to demonstrate the skills and knowledge they’ve acquired throughout their degree program. For data science students, it’s a chance to tackle a substantial real-world data problem.

If you’re short on time, here’s a quick answer to your question: Some great data science capstone ideas include analyzing health trends, building a predictive movie recommendation system, optimizing traffic patterns, forecasting cryptocurrency prices, and more.

In this comprehensive guide, we will explore 10 unique capstone project ideas for data science students. We’ll overview potential data sources, analysis methods, and practical applications for each idea.

Whether you want to work with social media datasets, geospatial data, or anything in between, you’re sure to find an interesting capstone topic.

Project Idea #1: Analyzing Health Trends

When it comes to data science capstone projects, analyzing health trends is an intriguing idea that can have a significant impact on public health. By leveraging data from various sources, data scientists can uncover valuable insights that can help improve healthcare outcomes and inform policy decisions.

Data Sources

There are several data sources that can be used to analyze health trends. One of the most common sources is electronic health records (EHRs), which contain a wealth of information about patient demographics, medical history, and treatment outcomes.

Other sources include health surveys, wearable devices, social media, and even environmental data.

Analysis Approaches

When analyzing health trends, data scientists can employ a variety of analysis approaches. Descriptive analysis can provide a snapshot of current health trends, such as the prevalence of certain diseases or the distribution of risk factors.

Predictive analysis can be used to forecast future health outcomes, such as predicting disease outbreaks or identifying individuals at high risk for certain conditions. Machine learning algorithms can be trained to identify patterns and make accurate predictions based on large datasets.

Applications

The applications of analyzing health trends are vast and far-reaching. By understanding patterns and trends in health data, policymakers can make informed decisions about resource allocation and public health initiatives.

Healthcare providers can use these insights to develop personalized treatment plans and interventions. Researchers can uncover new insights into disease progression and identify potential targets for intervention.

Ultimately, analyzing health trends has the potential to improve overall population health and reduce healthcare costs.

Project Idea #2: Movie Recommendation System

Data Sources

When developing a movie recommendation system, there are several data sources that can be used to gather information about movies and user preferences. One popular data source is the MovieLens dataset, which contains a large collection of movie ratings provided by users.

Another source is IMDb, a trusted website that provides comprehensive information about movies, including user ratings and reviews. Additionally, streaming platforms like Netflix and Amazon Prime also provide access to user ratings and viewing history, which can be valuable for building an accurate recommendation system.

Analysis Approaches

There are several analysis approaches that can be employed to build a movie recommendation system. One common approach is collaborative filtering, which uses user ratings and preferences to identify patterns and make recommendations based on similar users’ preferences.

Another approach is content-based filtering, which analyzes the characteristics of movies (such as genre, director, and actors) to recommend similar movies to users. Hybrid approaches that combine both collaborative and content-based filtering techniques are also popular, as they can provide more accurate and diverse recommendations.
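As a small illustration of collaborative filtering, the sketch below scores unseen movies for one user by weighting other users' ratings with a cosine similarity. It is a toy user-based variant under stated assumptions: the function names, the ratings, and the decision to treat unrated movies as zeros in the similarity are all hypothetical choices, not a reference implementation.

```python
from math import sqrt

def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity of two rating dicts; unrated movies count as 0."""
    shared = set(ratings_a) & set(ratings_b)
    if not shared:
        return 0.0
    dot = sum(ratings_a[m] * ratings_b[m] for m in shared)
    norm_a = sqrt(sum(r * r for r in ratings_a.values()))
    norm_b = sqrt(sum(r * r for r in ratings_b.values()))
    return dot / (norm_a * norm_b)

def recommend_movies(target, ratings, top_n=2):
    """Rank movies the target has not rated by similarity-weighted rating."""
    scores, weights = {}, {}
    for user, user_ratings in ratings.items():
        if user == target:
            continue
        similarity = cosine_similarity(ratings[target], user_ratings)
        if similarity <= 0:
            continue
        for movie, rating in user_ratings.items():
            if movie not in ratings[target]:
                scores[movie] = scores.get(movie, 0.0) + similarity * rating
                weights[movie] = weights.get(movie, 0.0) + similarity
    predicted = {movie: scores[movie] / weights[movie] for movie in scores}
    return sorted(predicted, key=predicted.get, reverse=True)[:top_n]

# Hypothetical ratings on a 1-5 scale
ratings = {
    "ann": {"Alien": 5, "Heat": 4},
    "bob": {"Alien": 5, "Heat": 4, "Up": 2, "Jaws": 5},
    "cat": {"Alien": 1, "Up": 5, "Jaws": 2},
}
picks = recommend_movies("ann", ratings)
```

Because "bob" rates the shared movies much like "ann" does, his opinions dominate the weighted average, so his favorite unseen movie ranks first for her.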

Applications

A movie recommendation system has numerous applications in the entertainment industry. One application is to enhance the user experience on streaming platforms by providing personalized movie recommendations based on individual preferences.

This can help users discover new movies they might enjoy and improve overall satisfaction with the platform. Additionally, movie recommendation systems can be used by movie production companies to analyze user preferences and trends, aiding in the decision-making process for creating new movies.

Finally, movie recommendation systems can also be utilized by movie critics and reviewers to identify movies that are likely to be well-received by audiences.

For more information on movie recommendation systems, you can visit https://www.kaggle.com/rounakbanik/movie-recommender-systems or https://www.researchgate.net/publication/221364567_A_new_movie_recommendation_system_for_large-scale_data .

Project Idea #3: Optimizing Traffic Patterns

Data Sources

When it comes to optimizing traffic patterns, there are several data sources that can be utilized. One of the most prominent sources is real-time traffic data collected from various sources such as GPS devices, traffic cameras, and mobile applications.

This data provides valuable insights into the current traffic conditions, including congestion, accidents, and road closures. Additionally, historical traffic data can also be used to identify recurring patterns and trends in traffic flow.

Other data sources that can be used include weather data, which can help in understanding how weather conditions impact traffic patterns, and social media data, which can provide information about events or incidents that may affect traffic.

Analysis Approaches

Optimizing traffic patterns requires the use of advanced data analysis techniques. One approach is to use machine learning algorithms to predict traffic patterns based on historical and real-time data.

These algorithms can analyze various factors such as time of day, day of the week, weather conditions, and events to predict traffic congestion and suggest alternative routes.

Another approach is to use network analysis to identify bottlenecks and areas of congestion in the road network. By analyzing the flow of traffic and identifying areas where traffic slows down or comes to a halt, transportation authorities can make informed decisions on how to optimize traffic flow.

Applications

The optimization of traffic patterns has numerous applications and benefits. One of the main benefits is the reduction of traffic congestion, which can lead to significant time and fuel savings for commuters.

By optimizing traffic patterns, transportation authorities can also improve road safety by reducing the likelihood of accidents caused by congestion.

Additionally, optimizing traffic patterns can have positive environmental impacts by reducing greenhouse gas emissions. By minimizing the time spent idling in traffic, vehicles can operate more efficiently and emit fewer pollutants.

Furthermore, optimizing traffic patterns can have economic benefits by improving the flow of goods and services. Efficient traffic patterns can reduce delivery times and increase productivity for businesses.

Project Idea #4: Forecasting Cryptocurrency Prices

With the growing popularity of cryptocurrencies like Bitcoin and Ethereum, forecasting their prices has become an exciting and challenging task for data scientists. This project idea involves using historical data to predict future price movements and trends in the cryptocurrency market.

Data Sources

When working on this project, data scientists can gather cryptocurrency price data from various sources such as cryptocurrency exchanges, financial websites, or APIs. Websites like CoinMarketCap (https://coinmarketcap.com/) provide comprehensive data on various cryptocurrencies, including historical price data.

Additionally, platforms like CryptoCompare (https://www.cryptocompare.com/) offer real-time and historical data for different cryptocurrencies.

Analysis Approaches

To forecast cryptocurrency prices, data scientists can employ various analysis approaches. Some common techniques include:

  • Time Series Analysis: This approach involves analyzing historical price data to identify patterns, trends, and seasonality in cryptocurrency prices. Techniques like moving averages, autoregressive integrated moving average (ARIMA), or exponential smoothing can be used to make predictions.
  • Machine Learning: Machine learning algorithms, such as random forests, support vector machines, or neural networks, can be trained on historical cryptocurrency data to predict future price movements. These algorithms can consider multiple variables, such as trading volume, market sentiment, or external factors, to make accurate predictions.
  • Sentiment Analysis: This approach involves analyzing social media sentiment and news articles related to cryptocurrencies to gauge market sentiment. By considering the collective sentiment, data scientists can predict how positive or negative sentiment can impact cryptocurrency prices.
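Two of the simplest time series techniques named above, a moving average and simple exponential smoothing, can be sketched as one-step-ahead forecasts. The function names, the window size, the smoothing factor and the price series are hypothetical, and neither method should be expected to forecast volatile prices reliably on its own.

```python
def moving_average_forecast(prices, window):
    """Forecast the next price as the mean of the last `window` prices."""
    recent = prices[-window:]
    return sum(recent) / len(recent)

def exponential_smoothing_forecast(prices, alpha):
    """Forecast the next price with simple exponential smoothing.

    alpha close to 1 follows recent prices; alpha close to 0 smooths more.
    """
    level = prices[0]
    for price in prices[1:]:
        level = alpha * price + (1 - alpha) * level
    return level

# Hypothetical daily closing prices
closing_prices = [100.0, 102.0, 101.0, 105.0, 107.0]
ma_forecast = moving_average_forecast(closing_prices, window=3)
es_forecast = exponential_smoothing_forecast(closing_prices, alpha=0.5)
```

Baselines like these are useful mainly as a yardstick: a more complex ARIMA or machine learning model is only worth its cost if it beats them on held-out data.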

Applications

Forecasting cryptocurrency prices can have several practical applications:

  • Investment Decision Making: Accurate price forecasts can help investors make informed decisions when buying or selling cryptocurrencies. By considering the predicted price movements, investors can optimize their investment strategies and potentially maximize their returns.
  • Trading Strategies: Traders can use price forecasts to develop trading strategies, such as trend following or mean reversion. By leveraging predicted price movements, traders can make profitable trades in the volatile cryptocurrency market.
  • Risk Management: Cryptocurrency price forecasts can help individuals and organizations manage their risk exposure. By understanding potential price fluctuations, risk management strategies can be implemented to mitigate losses.

Project Idea #5: Predicting Flight Delays

One interesting and practical data science capstone project idea is to create a model that can predict flight delays. Flight delays can cause a lot of inconvenience for passengers and can have a significant impact on travel plans.

By developing a predictive model, airlines and travelers can be better prepared for potential delays and take appropriate actions.

Data Sources

To create a flight delay prediction model, you would need to gather relevant data from various sources. Some potential data sources include:

  • Flight data from airlines or aviation organizations
  • Weather data from meteorological agencies
  • Historical flight delay data from airports

By combining these different data sources, you can build a comprehensive dataset that captures the factors contributing to flight delays.

Analysis Approaches

Once you have collected the necessary data, you can employ different analysis approaches to predict flight delays. Some common approaches include:

  • Machine learning algorithms such as decision trees, random forests, or neural networks
  • Time series analysis to identify patterns and trends in flight delay data
  • Feature engineering to extract relevant features from the dataset

By applying these analysis techniques, you can develop a model that can accurately predict flight delays based on the available data.

Applications

The applications of a flight delay prediction model are numerous. Airlines can use the model to optimize their operations, improve scheduling, and minimize disruptions caused by delays. Travelers can benefit from the model by being alerted in advance about potential delays and making necessary adjustments to their travel plans.

Additionally, airports can use the model to improve resource allocation and manage passenger flow during periods of high delay probability. Overall, a flight delay prediction model can significantly enhance the efficiency and customer satisfaction in the aviation industry.

Project Idea #6: Fighting Fake News

With the rise of social media and the easy access to information, the spread of fake news has become a significant concern. Data science can play a crucial role in combating this issue by developing innovative solutions.

Here are some aspects to consider when working on a project that aims to fight fake news.

Data Sources

When it comes to fighting fake news, having reliable data sources is essential. There are several trustworthy platforms that provide access to credible news articles and fact-checking databases. Websites like Snopes and FactCheck.org are good starting points for obtaining accurate information.

Additionally, social media platforms such as Twitter and Facebook can be valuable sources for analyzing the spread of misinformation.

Analysis Approaches

One approach to analyzing fake news is by utilizing natural language processing (NLP) techniques. NLP can help identify patterns and linguistic cues that indicate the presence of misleading information.

Sentiment analysis can also be employed to determine the emotional tone of news articles or social media posts, which can be an indicator of potential bias or misinformation.

Another approach is network analysis, which focuses on understanding how information spreads through social networks. By analyzing the connections between users and the content they share, it becomes possible to identify patterns of misinformation dissemination.

Network analysis can also help in identifying influential sources and detecting coordinated efforts to spread fake news.

Applications

The applications of a project aiming to fight fake news are numerous. One possible application is the development of a browser extension or a mobile application that provides users with real-time fact-checking information.

This tool could flag potentially misleading articles or social media posts and provide users with accurate information to help them make informed decisions.

Another application could be the creation of an algorithm that automatically identifies fake news articles and separates them from reliable sources. This algorithm could be integrated into news aggregation platforms to help users distinguish between credible and non-credible information.

Project Idea #7: Analyzing Social Media Sentiment

Data Sources

Social media platforms have become a treasure trove of valuable data for businesses and researchers alike. When analyzing social media sentiment, there are several data sources that can be tapped into. The most popular ones include:

  • Twitter: With its vast user base and real-time nature, Twitter is often the go-to platform for sentiment analysis. Researchers can gather tweets containing specific keywords or hashtags to analyze the sentiment of a particular topic.
  • Facebook: Facebook offers rich data for sentiment analysis, including posts, comments, and reactions. Analyzing the sentiment of Facebook posts can provide valuable insights into user opinions and preferences.
  • Instagram: Instagram’s visual nature makes it an interesting platform for sentiment analysis. By analyzing the comments and captions on Instagram posts, researchers can gain insights into the sentiment associated with different images or topics.
  • Reddit: Reddit is a popular platform for discussions on various topics. By analyzing the sentiment of comments and posts on specific subreddits, researchers can gain insights into the sentiment of different communities.

These are just a few examples of the data sources that can be used for analyzing social media sentiment. Depending on the research goals, other platforms such as LinkedIn, YouTube, and TikTok can also be explored.

Analysis Approaches

When it comes to analyzing social media sentiment, there are various approaches that can be employed. Some commonly used analysis techniques include:

  • Lexicon-based analysis: This approach involves using predefined sentiment lexicons to assign sentiment scores to words or phrases in social media posts. By aggregating these scores, researchers can determine the overall sentiment of a post or a collection of posts.
  • Machine learning: Machine learning algorithms can be trained to classify social media posts into positive, negative, or neutral sentiment categories. These algorithms learn from labeled data and can make predictions on new, unlabeled data.
  • Deep learning: Deep learning techniques, such as recurrent neural networks (RNNs) or convolutional neural networks (CNNs), can be used to capture the complex patterns and dependencies in social media data. These models can learn to extract sentiment information from textual or visual content.
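The lexicon-based approach is the easiest of the three to sketch. The tiny lexicon, the function names and the example posts below are hypothetical; real lexicons contain thousands of scored entries and usually handle negation and intensity, which this sketch ignores.

```python
import re

# Hypothetical miniature sentiment lexicon: word -> score
SENTIMENT_LEXICON = {
    "love": 1, "great": 1, "good": 1,
    "bad": -1, "terrible": -1, "hate": -1,
}

def score_post(text):
    """Sum the lexicon scores of all words in a post."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(SENTIMENT_LEXICON.get(word, 0) for word in words)

def label_post(text):
    """Map the aggregate score to a sentiment label."""
    score = score_post(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

labels = [label_post(post) for post in [
    "I love this great phone",
    "Terrible service, I hate it",
    "It arrived on Tuesday",
]]
```

A post such as "not good" would be scored as positive here, which illustrates why the machine learning and deep learning approaches are often preferred when labeled data is available.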

It is important to note that the choice of analysis approach depends on the specific research objectives, available resources, and the nature of the social media data being analyzed.

Applications

Analyzing social media sentiment has a wide range of applications across different industries. Here are a few examples:

  • Brand reputation management: By analyzing social media sentiment, businesses can monitor and manage their brand reputation. They can identify potential issues, respond to customer feedback, and take proactive measures to maintain a positive image.
  • Market research: Social media sentiment analysis can provide valuable insights into consumer opinions and preferences. Businesses can use this information to understand market trends, identify customer needs, and develop targeted marketing strategies.
  • Customer feedback analysis: Social media sentiment analysis can help businesses understand customer satisfaction levels and identify areas for improvement. By analyzing sentiment in customer feedback, companies can make data-driven decisions to enhance their products or services.
  • Public opinion analysis: Researchers can analyze social media sentiment to study public opinion on various topics, such as political events, social issues, or product launches. This information can be used to understand public sentiment, predict trends, and inform decision-making.

These are just a few examples of how analyzing social media sentiment can be applied in real-world scenarios. The insights gained from sentiment analysis can help businesses and researchers make informed decisions, improve customer experience, and drive innovation.

Project Idea #8: Improving Online Ad Targeting

Improving online ad targeting involves analyzing various data sources to gain insights into users’ preferences and behaviors. These data sources may include:

  • Website analytics: Gathering data from websites to understand user engagement, page views, and click-through rates.
  • Demographic data: Utilizing information such as age, gender, location, and income to create targeted ad campaigns.
  • Social media data: Extracting data from platforms like Facebook, Twitter, and Instagram to understand users’ interests and online behavior.
  • Search engine data: Analyzing search queries and user behavior on search engines to identify intent and preferences.

By combining and analyzing these diverse data sources, data scientists can gain a comprehensive understanding of users and their ad preferences.

To improve online ad targeting, data scientists can employ various analysis approaches:

  • Segmentation analysis: Dividing users into distinct groups based on shared characteristics and preferences.
  • Collaborative filtering: Recommending ads based on users with similar preferences and behaviors.
  • Predictive modeling: Developing algorithms to predict users’ likelihood of engaging with specific ads.
  • Machine learning: Utilizing algorithms that can continuously learn from user interactions to optimize ad targeting.

These analysis approaches help data scientists uncover patterns and insights that can enhance the effectiveness of online ad campaigns.
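To make collaborative filtering concrete, here is a minimal user-based sketch: it scores ads a user has not yet clicked by the clicks of similar users, weighted by cosine similarity. The click data, user names, and function names are hypothetical, and production ad systems use much richer signals than raw click counts.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse click vectors (dicts of ad -> clicks)."""
    shared = set(a) & set(b)
    dot = sum(a[ad] * b[ad] for ad in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend_ads(target_user, clicks, top_n=2):
    """Rank unseen ads by the similarity-weighted clicks of other users."""
    scores = {}
    for other_user, other_clicks in clicks.items():
        if other_user == target_user:
            continue
        similarity = cosine_similarity(clicks[target_user], other_clicks)
        if similarity <= 0:
            continue  # users with nothing in common contribute no evidence
        for ad, count in other_clicks.items():
            if ad not in clicks[target_user]:
                scores[ad] = scores.get(ad, 0.0) + similarity * count
    ranked = sorted(scores, key=lambda ad: (-scores[ad], ad))
    return ranked[:top_n]


# Hypothetical click counts per user and ad category.
clicks = {
    "user_a": {"running_shoes": 3, "fitness_tracker": 1},
    "user_b": {"running_shoes": 2, "fitness_tracker": 2, "protein_bars": 4},
    "user_c": {"office_chair": 5, "desk_lamp": 2},
}
print(recommend_ads("user_a", clicks))
```

Here `user_a` shares interests with `user_b` but not `user_c`, so only ads clicked by `user_b` are candidates.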

Improved online ad targeting has numerous applications:

  • Increased ad revenue: By delivering more relevant ads to users, advertisers can expect higher click-through rates and conversions.
  • Better user experience: Users are more likely to engage with ads that align with their interests, leading to a more positive browsing experience.
  • Reduced ad fatigue: By targeting ads more effectively, users are less likely to feel overwhelmed by irrelevant or repetitive advertisements.
  • Maximized ad budget: Advertisers can optimize their budget by focusing on the most promising target audiences.

Project Idea #9: Enhancing Customer Segmentation

Enhancing customer segmentation involves gathering relevant data from various sources to gain insights into customer behavior, preferences, and demographics. Some common data sources include:

  • Customer transaction data
  • Customer surveys and feedback
  • Social media data
  • Website analytics
  • Customer support interactions

By combining data from these sources, businesses can create a comprehensive profile of their customers and identify patterns and trends that will help in improving their segmentation strategies.

There are several analysis approaches that can be used to enhance customer segmentation:

  • Clustering: Using clustering algorithms to group customers based on similar characteristics or behaviors.
  • Classification: Building predictive models to assign customers to different segments based on their attributes.
  • Association Rule Mining: Identifying relationships and patterns in customer data to uncover hidden insights.
  • Sentiment Analysis: Analyzing customer feedback and social media data to understand customer sentiment and preferences.

These analysis approaches can be used individually or in combination to enhance customer segmentation and create more targeted marketing strategies.
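The clustering approach can be sketched with a plain k-means loop. The customer features below (orders per year and average order value) are made up, and the centroids are seeded from the first k points so the result is reproducible; real projects typically scale features first and use a library implementation with better initialization.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means on tuples of numbers; the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, point in enumerate(points):
            distances = [
                sum((x - c) ** 2 for x, c in zip(point, centroid))
                for centroid in centroids
            ]
            assignments[i] = distances.index(min(distances))
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignments, centroids


# Hypothetical customers: (orders per year, average order value in dollars).
customers = [(2, 20.0), (12, 150.0), (3, 25.0), (1, 15.0), (11, 160.0), (13, 140.0)]
assignments, centroids = kmeans(customers, k=2)
print(assignments)
print(centroids)
```

In this toy data the two segments separate cleanly into occasional low-spend customers and frequent high-spend customers.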

Enhancing customer segmentation can have numerous applications across industries:

  • Personalized marketing campaigns: By understanding customer preferences and behaviors, businesses can tailor their marketing messages to individual customers, increasing the likelihood of engagement and conversion.
  • Product recommendations: By segmenting customers based on their purchase history and preferences, businesses can provide personalized product recommendations, leading to higher customer satisfaction and sales.
  • Customer retention: By identifying at-risk customers and understanding their needs, businesses can implement targeted retention strategies to reduce churn and improve customer loyalty.
  • Market segmentation: By identifying distinct customer segments, businesses can develop tailored product offerings and marketing strategies for each segment, maximizing the effectiveness of their marketing efforts.

Project Idea #10: Building a Chatbot

A chatbot is a computer program that uses artificial intelligence to simulate human conversation. It can interact with users in natural language through text or voice. Building a chatbot can be an exciting and challenging data science capstone project that requires a combination of natural language processing, machine learning, and programming skills.

When building a chatbot, data sources play a crucial role in training and improving its performance. There are various data sources that can be used:

  • Chat logs: Analyzing existing chat logs can help in understanding common user queries, responses, and patterns. This data can be used to train the chatbot on how to respond to different types of questions and scenarios.
  • Knowledge bases: Integrating a knowledge base can provide the chatbot with a wide range of information and facts. This can be useful in answering specific questions or providing detailed explanations on certain topics.
  • APIs: Utilizing APIs from different platforms can enhance the chatbot’s capabilities. For example, integrating a weather API can allow the chatbot to provide real-time weather information based on user queries.

There are several analysis approaches that can be used to build an efficient and effective chatbot:

  • Natural Language Processing (NLP): NLP techniques enable the chatbot to understand and interpret user queries. This involves tasks such as tokenization, part-of-speech tagging, named entity recognition, and sentiment analysis.
  • Intent recognition: Identifying the intent behind user queries is crucial for providing accurate responses. Machine learning algorithms can be trained to classify user intents based on the input text.
  • Contextual understanding: Chatbots need to understand the context of the conversation to provide relevant and meaningful responses. Techniques such as sequence-to-sequence models or attention mechanisms can be used to capture contextual information.
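As a simple stand-in for a trained intent classifier, the sketch below matches a query to the intent whose example utterance shares the most words with it (Jaccard overlap). This is a nearest-example matcher rather than a learned model, and the intents and example utterances are invented for illustration.

```python
def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    return {word.strip(".,!?") for word in text.lower().split()}


def classify_intent(query, examples, fallback="unknown"):
    """Pick the intent whose example utterance overlaps most with the query."""
    query_tokens = tokenize(query)
    best_intent, best_score = fallback, 0.0
    for intent, utterances in examples.items():
        for utterance in utterances:
            tokens = tokenize(utterance)
            union = query_tokens | tokens
            score = len(query_tokens & tokens) / len(union) if union else 0.0
            if score > best_score:
                best_intent, best_score = intent, score
    return best_intent


# Hypothetical intents with a few example utterances each.
examples = {
    "check_weather": ["what is the weather today", "will it rain tomorrow"],
    "track_order": ["where is my order", "track my package"],
    "book_appointment": ["schedule an appointment", "book a visit with the doctor"],
}
print(classify_intent("Where is my package?", examples))
```

A query with no words in common with any example falls back to `"unknown"`, which is where a real chatbot might hand off to a human agent.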

Chatbots have a wide range of applications in various industries:

  • Customer support: Chatbots can be used to handle customer queries and provide instant support. They can assist with common troubleshooting issues, answer frequently asked questions, and escalate complex queries to human agents when necessary.
  • E-commerce: Chatbots can enhance the shopping experience by assisting users in finding products, providing recommendations, and answering product-related queries.
  • Healthcare: Chatbots can be deployed in healthcare settings to provide preliminary medical advice, answer general health-related questions, and assist with appointment scheduling.

Building a chatbot as a data science capstone project not only showcases your technical skills but also allows you to explore the exciting field of artificial intelligence and natural language processing. It can be a great opportunity to create a practical and useful tool that can benefit users in various domains.

Completing an in-depth capstone project is the perfect way for data science students to demonstrate their technical skills and business acumen. This guide outlined 10 unique project ideas spanning industries like healthcare, transportation, finance, and more.

By identifying the ideal data sources, analysis techniques, and practical applications for their chosen project, students can produce an impressive capstone that solves real-world problems and showcases their abilities.


Capstone Projects

The capstone project experience.

In the final two quarters of the program, students gain real world experience working in small groups on a data science challenge facing a company or not-for-profit. At the conclusion of the capstone project, sponsoring organizations are invited to attend a formal Capstone Event where students showcase their work. Capstone projects typically span a wide range of interests, including energy, agriculture, retail, urban planning, healthcare, marketing, and education.

Examples of Previous Capstone Sponsors

  • Applied Physics Lab, UW
  • Civil & Environmental Engineering, WSU
  • Equal Opportunity Schools
  • The Hershey Company
  • Jacksonville Zoo and Gardens
  • Kids on 45th
  • Seattle Children’s Hospital
  • Urban Planning, UW
  • Virginia Mason

Capstone Archives

Capstone projects take a variety of forms. These include, but are not limited to, dashboard development, data analysis, pipeline building, and machine learning models. The scope and goal of each project are developed to satisfy sponsor needs and student interests.

2024 Cohort

In 2024, sixteen teams presented capstone posters at our MSDS co-working space. These projects included audio signal analysis (Bats!, SonarSquad, Hydrophonatics), pipeline development (Ocastra, Virufy), dashboards (DataNuggets, EqualOpportunitySchools, Koalified), image analysis (Diateam, TreeMusketeers, PixelPioneers), large language model tools (EquityEngine, MetaMinds, SCubed, Trojans), and data collection and analysis (Virginia Mason). Many of these projects combined data collection, analysis, modeling, and dashboard development.

Please find PDF versions of all posters here. (These files are enclosed in a zip folder for your convenience.)

Gather Interactive Archives


Due to the pandemic, our Capstone 2021 was held entirely online on the Gather.Town platform, to which we added galleries of our 2020 and 2022 Capstone projects to create an archive you can digitally wander and browse.

Gather presents a map-based, interactive platform where you can wander among projects, see media like posters, infographics, and video, and do video/audio chat with others who are logged into the space. You can read some basics about using this platform at the Gather site. One of the other benefits of Gather is that it created a persistent archive of our Capstone 2020-2022 projects, which you can view and digitally wander among here:

https://tinyurl.com/msdsfair


© 2024 University of Washington | Seattle, WA



McGill MMA Experiential Module

McGill MMA (EXP)

Real-world Exposure through the Experiential Module


Need help with your business analytics needs? Learn how MMA can help you or your partners at no cost.


A core part of the Master of Management in Analytics (MMA) program, the EXP Analytics Consulting module has McGill MMA students work alongside industry professionals to solve a significant data and analytics problem, with the aim of boosting the client’s top or bottom line.

Be part of a Data Science team of 4 student specialists

  • Business Strategist: What is the problem? How do we solve it?
  • Data Analyst/Modeler: Identify core data needs; define formulas/algorithms
  • Data Engineer/Coder: Automate data sourcing; integrate solution components
  • Visualization/UI Designer: Design front end for best user adoption; articulate user experience

With the McGill MMA (EXP) Analytics projects, you get full structural integrity to drive a strong result.

MMA (EXP) Cycle

Build a Data Driven Solution over the Program Long Tenure

MMA (EXP) Program Structure

All students undertake a technical consulting role by working in teams with real companies and attempting to solve a live data-driven problem.

  • Produce a robust analytic solution over 10 months
  • Practice using real data and market-leading software
  • Benefit from industry mentorship and faculty coaching
  • Gain unparalleled training for the job market

Capstone projects

The Master of Management in Analytics’ experiential capstone projects are year-long opportunities where student teams get to work in the private and public sectors to solve pressing issues of the day using data analytics. Here are a few examples of our students' real-world projects.

Professional Services-Consulting

Forensic Analytics Anomaly Detection

The world’s governments and large companies spend trillions of dollars on procurement each year. That is a lot of transactions – and an enormous amount of data. Identifying fraudulent transactions is no easy task, but it can help protect a company’s reputation and avoid costly fines from regulators. To do that, large organizations often work with auditors at large professional services firms like KPMG, which specializes in this type of work. With such large and complex data sets, automation is key to efficiency. Master of Management in Analytics students have been working with the Big 4 accounting firm to apply analytics techniques to detect anomalies in these data sets and help identify suspicious transactions.
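The project's actual methods are not described here, but one common starting point for anomaly detection on transaction amounts is a robust outlier score. The sketch below flags values whose modified z-score, based on the median and the median absolute deviation (MAD), exceeds a threshold. The transaction amounts are invented, and 3.5 is a conventional cutoff rather than anything taken from the project.

```python
import statistics


def flag_anomalies(amounts, threshold=3.5):
    """Return indices whose modified z-score (median/MAD based) exceeds the threshold."""
    median = statistics.median(amounts)
    mad = statistics.median(abs(x - median) for x in amounts)
    if mad == 0:
        return []  # no spread to measure against, so nothing can be scored
    flagged = []
    for index, amount in enumerate(amounts):
        modified_z = 0.6745 * (amount - median) / mad
        if abs(modified_z) > threshold:
            flagged.append(index)
    return flagged


# Made-up procurement amounts; one is far outside the usual range.
amounts = [1020.0, 980.0, 1005.0, 995.0, 1010.0, 990.0, 25000.0, 1000.0]
print(flag_anomalies(amounts))
```

Using the median and MAD rather than the mean and standard deviation keeps a single extreme transaction from masking itself by inflating the scale.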

Public Sector-Provincial Govt Communications & Media Buying

Public Media Topic Modeling & Media Channel Optimization

Governments represent the voters who elected them – but the halls of power can be pretty far removed from the everyday experiences of ordinary people. Topic modelling can help bridge this gap, and Master of Management in Analytics students have used the technique to help the Government of Ontario understand what their voters care about most. Topic modelling analyzes a set of textual documents to search for specific topics and how frequently they are being discussed. Governments can use that information to shape communications strategies and ensure they are focused on what people care about the most.

Retail: Consumer Goods Digital Retail: Beauty

Product Recommendation & Bundling Engine

Loblaws began as a single grocery store in Toronto, but it grew into a retail giant with stores across the country. Today, Loblaws is Canada’s largest retail chain by revenue, and it sells a lot more than groceries. The company has branched out into clothing, household items, pharmacy, and beauty products, and its PC Optimum loyalty rewards program is one way the company nudges its customers to buy additional products from the company. Master of Management in Analytics students have worked with Loblaws to develop a product recommendation and bundling engine that will help them identify which products would complement other purchases.
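How the students' engine works is not described, but the basic idea of finding products that complement each other can be sketched by counting how often pairs of products appear in the same basket. The baskets, product names, and the `min_support` threshold below are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support=0.3):
    """Return product pairs bought together in at least min_support of baskets."""
    pair_counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(set(basket)), 2):
            pair_counts[pair] += 1
    total = len(baskets)
    return {
        pair: count / total
        for pair, count in pair_counts.items()
        if count / total >= min_support
    }


def complements(product, baskets, min_support=0.3):
    """List products frequently bought with the given product, strongest first."""
    scored = []
    for (first, second), support in frequent_pairs(baskets, min_support).items():
        if product == first:
            scored.append((support, second))
        elif product == second:
            scored.append((support, first))
    return [name for support, name in sorted(scored, key=lambda item: (-item[0], item[1]))]


# Made-up beauty baskets, for illustration only.
baskets = [
    ["shampoo", "conditioner", "lip balm"],
    ["shampoo", "conditioner"],
    ["moisturizer", "sunscreen"],
    ["shampoo", "conditioner", "sunscreen"],
    ["moisturizer", "sunscreen", "lip balm"],
]
print(complements("shampoo", baskets))
```

Pair support is the simplest such measure; association rule mining extends it with confidence and lift to account for products that are popular on their own.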

MMA Experiential Learning Spotlight


The experiential learning project offers students the chance to gain valuable hands-on experience and develop their skills in analytics while making an impact on their professional growth. From real-world experience to mentorship from industry leaders, learn more about the benefits of the MMA program.

MMA Experiential Coaches

Complementing the Academic side of the learning, Professional Coaches play an integral role in helping navigate students through expectations of the MMA industry projects as well as that of the course. They ensure that students keep the project on point, guide them through aspects of the deliverables that are unique to each client and mentor on effective and successful collaboration with client teams. Leveraging their industry expertise, they also act as advisers on solution development and challenge students to get to the edge of their abilities.


Dino Stamatiou, MSc Director of Business Intelligence |  Tempo Software

Dino guides MMA students as an Analytics Consulting Coach to share his experiences in having designed, developed, and delivered mission-critical decision support and analytics solutions for leading corporations across various industries and ranging from Fortune 100s to technology startups. His passion lies in building high performing teams, tackling real-world challenges using data, and elevating the analytical capabilities of his clients and stakeholders.


Kenneth Richardson, MBA Consultant, Strategist | AI/Data/Finance/Sales

Ken guides MMA students as an Analytics Consulting Coach to share his experiences in being passionate about distilling disparate sources of information into key points of relevant business or academic knowledge in order to help coach people or organizations to operate more successfully.


Dimitris Lianoudakis, MSc Founder and Principal |  LP Group Payments Consulting

Dimitris guides MMA students as an Analytics Consulting Coach to share his experiences in having built data science teams from the ground up with a focus on learning and development and a proven track record of delivering something of value to the customer. Dimitris has worked in the intersection of payments and data for over 10 years across multiple countries. He's held positions in Business Intelligence, Product, Payments, and various senior leadership roles which he leverages to quickly adapt to a changing market.

If you are interested in becoming an MMA coach please feel free to connect with us.


McGill Desautels Faculty of Management


Capstone Projects

M.S. in Data Science students are required to complete a capstone project. Capstone projects challenge students to acquire and analyze data to solve real-world problems. Project teams consist of two to four students and a faculty advisor. Teams select their capstone project at the beginning of the year and work on the project over the course of two semesters. 

Most projects are sponsored by an organization—academic, commercial, non-profit, and government—seeking valuable recommendations to address strategic and operational issues. Depending on the needs of the sponsor, teams may develop web-based applications that can support ongoing decision-making. The capstone project concludes with a paper and presentation.

Key takeaways:

  • Synthesizing the concepts you have learned throughout the program in various courses (this requires that the question posed by the project be complex enough to require the application of appropriate analytical approaches learned in the program and that the available data be of sufficient size to qualify as ‘big’)
  • Experience working with ‘raw’ data exposing you to the data pipeline process you are likely to encounter in the ‘real world’  
  • Demonstrating oral and written communication skills through a formal paper and presentation of project outcomes  
  • Acquisition of team building skills on a long-term, complex, data science project 
  • Addressing an actual client’s need by building a data product that can be shared with the client

Capstone projects have been sponsored by a variety of organizations and industries, including: Capital One, City of Charlottesville, Deloitte Consulting LLP, Metropolitan Museum of Art, MITRE Corporation, a multinational banking firm, The Public Library of Science, S&P Global Market Intelligence, UVA Brain Institute, UVA Center for Diabetes Technology, UVA Health System, U.S. Army Research Laboratory, Virginia Department of Health, Virginia Department of Motor Vehicles, Virginia Office of the Governor, Wikipedia, and more. 

Sponsor a Capstone Project  

View previous examples of capstone projects  and check out answers to frequently asked questions. 

What does the process look like?

  • The School of Data Science periodically puts out a Call for Proposals. Prospective project sponsors submit official proposals, vetted by the Associate Director for Research Development, Capstone Director, and faculty.
  • Sponsors present their projects to students at “Pitch Day” near the start of the Fall term, where students have the opportunity to ask questions.
  • Students individually rank their top project choices. An algorithm sorts students into capstone groups of approximately 3 to 4 students per group.
  • Adjustments are made by hand as necessary to finalize groups.
  • Each group is assigned a faculty mentor, who will meet groups each week in a seminar-style format.
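The matching algorithm itself is not described on this page. Purely as an illustration of how ranked choices can be turned into capped groups, the sketch below uses a greedy rule: each student gets their highest-ranked project that still has room, and anyone left over is returned for manual placement. The student names, project names, and the greedy rule are all assumptions, not the School's method.

```python
def assign_groups(rankings, max_group_size=4):
    """Greedy sketch: give each student their highest-ranked project with room left."""
    groups = {}
    unassigned = []
    for student, ranked_projects in rankings.items():
        for project in ranked_projects:
            members = groups.setdefault(project, [])
            if len(members) < max_group_size:
                members.append(student)
                break
        else:
            # Every ranked project was full; leave this student for manual adjustment.
            unassigned.append(student)
    return groups, unassigned


# Hypothetical rankings, most preferred project first.
rankings = {
    "ana": ["project_a", "project_b"],
    "ben": ["project_a", "project_b"],
    "cy": ["project_a", "project_b"],
    "dee": ["project_b", "project_a"],
    "eli": ["project_a", "project_b"],
}
groups, unassigned = assign_groups(rankings, max_group_size=2)
print(groups)
print(unassigned)
```

A greedy pass like this favors students processed earlier, which is one reason a real process would follow it with adjustments by hand.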

What is the seminar approach to mentoring capstones?

We utilize a seminar approach to managing capstones to provide faculty mentorship and streamlined logistics. This approach involves one mentor supervising three to four loosely related projects and meeting with these groups on a regular basis. Project teams often encounter similar roadblocks and issues so meeting together to share information and report on progress toward key milestones is highly beneficial.

Do all capstone projects have corporate sponsors?

Not necessarily. Generally, each group works with a sponsor from outside the School of Data Science. Some sponsors are corporations, some are nonprofit and governmental organizations, and some are other departments at UVA.

One of the challenges we continue to encounter when curating capstone projects with external sponsors is appropriately scoping and defining a question that is of sufficient depth for our students, obtaining data of sufficient size, obtaining access to the data in sufficient time for adequate analysis to be performed and navigating a myriad of legal issues (including conflicts of interest). While we continue to strive to use sponsored projects and work to solve these issues, we also look for ways to leverage openly available data to solve interesting societal problems which allow students to apply the skills learned throughout the program. While not all capstones have sponsors, all capstones have clients. That is, the work is being done for someone who cares and has investment in the outcome. 

Why do we have to work in groups?

Because data science is a team sport!

All capstone projects are completed as group work. While this requires additional coordination, this collaborative component of the program reflects the way companies expect their employees to work. Building this skill is one of our core learning objectives for the program.

I didn’t get my first choice of capstone project from the algorithm matching. What can I do?

Remember that the point of the capstone projects isn’t the subject matter; it’s the data science. Professional data scientists may find themselves in positions in which they work on topics assigned to them, but they use methods they enjoy and still learn much through the process. That said, there are many ways to tackle a subject, and we are more than happy to work with you to find an approach to the work that most aligns with your interests.

Your ability to influence which project you work on is in the ranking process after “pitch day” and in encouraging your company or department to submit a proposal during the Call for Proposal process. At a minimum it takes several months to work with a sponsor to adequately scope a project, confirm access to the data and put the appropriate legal agreements into place. Before you ever see a project presented on pitch day, a lot of work has taken place to get it to that point!

Can I work on a project for my current employer?

Each spring, we put forward a public call for capstone projects. You are encouraged to share this call widely with your community, including your employer, non-profit organizations, or any entity that might have a big data problem that we can help solve. As a reminder, capstone projects are group projects so the project would require sufficient student interest after ‘pitch day’. In addition, you (the student) cannot serve as the project sponsor (someone else within your employer organization must serve in that capacity).

If my project doesn’t have a corporate sponsor, am I losing out on a career opportunity?

The capstone project will provide you with the opportunity to do relevant, high-quality work which can be included on a resume and discussed during job interviews. The project paper and your code on GitHub will provide more career opportunities than the sponsor of the project. Although it does happen from time to time, it is rare that capstones lead to a direct job offer with the capstone sponsor's company. Capstone projects are just one networking opportunity available to you in the program.

Capstone Project Reflections From Alumni  

Theo Braimoh, MSDS Online Graduate and Admissions Student Ambassador

"For my Capstone project, I used Python to train machine learning models for visual analysis – also known as computer vision. Computer vision helped my Capstone team analyze the ergonomic posture of workers at risk of developing musculoskeletal injuries. We automated the process, and hope our work further protects the health and safety of people working in the United States.” — Theophilus Braimoh, MSDS Online Program 2023, Admissions Student Ambassador

Haley Egan, MSDS Online 2023 and Admissions Student Ambassador

“My Capstone experience with the ALMA Observatory and NRAO was a pivotal chapter in my UVA Master’s in Data Science journey. It fostered profound growth in my data science expertise and instilled a confidence that I'm ready to make meaningful contributions in the professional realm.” — Haley Egan, MSDS Online Program 2023, Admissions Student Ambassador

Mina Kim, MSDS/PhD 2023

“Our Capstone projects gave us the opportunity to gain new domain knowledge and answer big data questions beyond the classroom setting.” — Mina Kim, MSDS Residential Program 2023, Ph.D. in Psychology Candidate

Capstone Project Reflections From Sponsors  

“For us, the level of expertise, and special expertise, of the capstone students gives us ‘extra legs’ and an extra push to move a project forward. The team was asked to provide a replicable prototype air quality sensor that connected to the Cville Things Network, a free and community supported IoT network in Charlottesville. Their final product was a fantastic example that included clear circuit diagrams for replication by citizen scientists.” — Lucas Ames, Founder, Smart Cville
“Working with students on an exploratory project allowed us to focus on the data part of the problem rather than the business part, while testing with little risk. If our hypothesis falls flat, we gain valuable information; if it is validated or exceeded, we gain valuable information and are a few steps closer to a new product offering than when we started.” — Ellen Loeshelle, Senior Director of Product Management, Clarabridge

Data Science Capstone Project Examines COVID's Impact on Alcohol-Related Health Incidents at UVA

Student Capstone Project Looks To Improve Electrolarynx Speech-to-Text

Master’s Students Strengthen Ability of LLM to Recommend Scholarly Works

My MSDS Capstone Project: Predicting California’s Hydroclimate

David Diaz addresses the audience during his group's capstone project presentation. (Photo by Alyssa Brown)

Data Science Master’s Students Tackle Diverse, Real-World Challenges in Capstone Projects

Data Science: Capstone

Show what you’ve learned from the Professional Certificate Program in Data Science.

  • Introductory

Associated Schools

Harvard T.H. Chan School of Public Health

What you'll learn.

How to apply the knowledge base and skills learned throughout the series to a real-world problem

How to work independently on a data analysis project

Course description

To become an expert data scientist you need practice and experience. By completing this capstone project you will get an opportunity to apply the knowledge and skills in R data analysis that you have gained throughout the series. This final project will test your skills in data visualization, probability, inference and modeling, data wrangling, data organization, regression, and machine learning.

Unlike the rest of our Professional Certificate Program in Data Science, in this course, you will receive much less guidance from the instructors. When you complete the project you will have a data product to show off to potential employers or educational programs, a strong indicator of your expertise in the field of data science.

Instructors

Rafael Irizarry

You may also like.

Data Science: Probability

Learn probability theory — essential for a data scientist — using a case study on the financial crisis of 2007–2008.

Data Science: Inference and Modeling

Learn inference and modeling: two of the most widely used statistical tools in data analysis.

High-Dimensional Data Analysis

A focus on several techniques that are widely used in the analysis of high-dimensional data.

I wanted to show companies how data analysis can help improve their profit.

ghadikq/Capstone_Project_Online_Retail

Capstone Project Online Retail

The goal is to understand the data better and extract insights that help decision-makers improve the company's marketing and increase sales. The project also shows how adding a recommender engine can help increase the company's sales.

RESEARCH QUESTIONS

  • COUNTRY - How many customers come from each country, and does profit change by country?
  • QUANTITY - How does the quantity trend change over time?
  • PROFIT - How did profit vary by month and day over the year?
  • PRODUCTS - What are the most sold products in the store?

This analysis uses the Online_Retail_II dataset provided by UCI.

The dataset contains transactions for a UK-based and registered online shop. The company mainly sells unique all-occasion giftware. Many of the company's customers are wholesalers.

The dataset contains the following variables:

You can access the dataset directly from the data folder in this repository.
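
As a rough illustration of how the research questions above could be approached, here is a minimal sketch in plain Python. It is not the code from this repository (which is written in R Markdown). The sample rows are made up, the column names (`Quantity`, `Price`, `InvoiceDate`, `Customer ID`, `Country`, `Description`) are assumed to match the dataset, and `Quantity * Price` is used as a stand-in for profit, since true profit would also require cost data.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample rows for illustration only; not taken from the real dataset.
ROWS = [
    {"Description": "MUG", "Quantity": 6, "InvoiceDate": "2010-12-01",
     "Price": 2.50, "Customer ID": 1, "Country": "United Kingdom"},
    {"Description": "CANDLE", "Quantity": 8, "InvoiceDate": "2010-12-01",
     "Price": 1.25, "Customer ID": 2, "Country": "France"},
    {"Description": "MUG", "Quantity": 4, "InvoiceDate": "2011-01-15",
     "Price": 2.50, "Customer ID": 1, "Country": "United Kingdom"},
    {"Description": "BAG", "Quantity": 3, "InvoiceDate": "2011-01-20",
     "Price": 4.00, "Customer ID": 3, "Country": "United Kingdom"},
]


def summarize(rows):
    """Aggregate the rows along the four research questions."""
    customers = defaultdict(set)           # COUNTRY: distinct customers per country
    revenue_by_country = defaultdict(float)
    quantity_by_month = defaultdict(int)   # QUANTITY: units sold per month
    revenue_by_month = defaultdict(float)  # PROFIT: revenue proxy per month
    quantity_by_product = defaultdict(int) # PRODUCTS: units sold per product

    for row in rows:
        revenue = row["Quantity"] * row["Price"]  # assumed proxy for profit
        month = datetime.strptime(row["InvoiceDate"], "%Y-%m-%d").strftime("%Y-%m")
        customers[row["Country"]].add(row["Customer ID"])
        revenue_by_country[row["Country"]] += revenue
        quantity_by_month[month] += row["Quantity"]
        revenue_by_month[month] += revenue
        quantity_by_product[row["Description"]] += row["Quantity"]

    return {
        "customers_by_country": {c: len(ids) for c, ids in customers.items()},
        "revenue_by_country": dict(revenue_by_country),
        "quantity_by_month": dict(quantity_by_month),
        "revenue_by_month": dict(revenue_by_month),
        "top_product": max(quantity_by_product, key=quantity_by_product.get),
    }


summary = summarize(ROWS)
for name, value in summary.items():
    print(name, value)
```

With the real data, the same grouping logic would run over the rows read from the file in the data folder. Returns and cancellations, if present, would need separate handling before interpreting the totals.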

Repository content

  • Online Retail R Markdown.
  • Online Retail html.
  • Online Retail ppt slide for presentation.

VIDEO

  1. Advanced Data Science Capstone Sara Iaccheo

  2. Aldie Adrian

  3. Discusing Capstone project chapter 2 review of related literature part 2

  4. IBM Coursera Advanced Data Science Capstone

  5. Data Science Capstone Project Spotlight: Language Detection App

  6. Univariate Time Series Forecasting Analysis ARIMA Prophet Data Science Capstone Project R Software

COMMENTS

  1. Simplilearn Capstone Project: Retail

    I worked on this capstone project as the final assessment for the PGP in Data Science course from Simplilearn-Purdue University. My job was to analyze transactional data for an online retail company and create a customer segmentation so that the company can run effective marketing campaigns.

  2. Capstone_Project_2_Retail_Sales_Prediction

    AlmaBetter Capstone Project - Machine Learning. Project type: Regression. Sales forecasting is an approach retailers use to anticipate future sales by analyzing past sales, identifying trends, and projecting data into the future.

  3. Retail Data Analytics. A Data Science Portfolio Project using…

    In this article I present a full data science portfolio project: Retail Data Analytics using Amazon Web Services and different machine learning algorithms. The full code, including a project proposal and a final project report, can be found in my GitHub repository.

  4. Simplilearn AI Capstone-Retail with accuracy 95%

    Explore and run machine learning code with Kaggle Notebooks | Using data from [Private Datasource]

  5. Customer Lifetime Value Product Recommendation for Retail

    Part 1. Predictive Customer Lifetime Value and Product Recommendation for Retail. 1.Exploratory Data Analysis (EDA) Upon receiving the data for the first client, we realized that the product items listed were in a semi-structured format. That is, some of the item names were in a "product name - color" format, although there were many ...

  6. Capstone Project on Retail and Marketing Analysis

    Kaggle is the world's largest data science community with powerful tools and resources to help you achieve your data science goals.

  7. A Machine Learning project for retail data analytics as part of the

    The notebook 1_Data_Exploration.ipynb contains some code for the data analysis of the dataset. The notebook 2_Create_Train_and_Test_Data.ipynb contains the code for merging all data together and creating the final csv files for training and testing. The folder Documentation contains the Proposal for this Project.

  8. 10 Unique Data Science Capstone Project Ideas

    Project Idea #10: Building a Chatbot. A chatbot is a computer program that uses artificial intelligence to simulate human conversation. It can interact with users in a natural language through text or voice. Building a chatbot can be an exciting and challenging data science capstone project.

  9. Capstone Projects

    The Capstone Project Experience. In the final two quarters of the program, students gain real world experience working in small groups on a data science challenge facing a company or not-for-profit. At the conclusion of the capstone project, sponsoring organizations are invited to attend a formal Capstone Event where students showcase their work.

  10. Machine Learning for Sales Forecasting: A Capstone Project with

    Capstone projects are specifically designed to encourage students to think critically, solve challenging data science problems, and develop analytical skills. Two groups of students built an end-to-end data science solution using Azure Machine Learning to accurately forecast sales.

  11. Simplilearn-Data-Science-Capstone-Project-03-Retail

    This is a transnational data set which contains all the transactions that occurred between 01/12/2010 and 09/12/2011 for a UK-based and registered non-store online retail. The company mainly sells unique and all-occasion gifts.

  12. Master of Management in Analytics (MMA)

    Capstone projects. The Master of Management in Analytics' experiential capstone projects are year-long opportunities where student teams work in the private and public sectors to solve pressing issues of the day using data analytics. Here are a few examples of our students' real-world projects.

  13. kc2019/DS_Capstone_Retail: Data Science Capstone Retail Project

    Data Science Capstone Retail Project. Contribute to kc2019/DS_Capstone_Retail development by creating an account on GitHub.

  14. Capstone Projects

    Capstone Projects. M.S. in Data Science students are required to complete a capstone project. Capstone projects challenge students to acquire and analyze data to solve real-world problems. Project teams consist of two to four students and a faculty advisor. Teams select their capstone project at the beginning of the year and work on the project ...

  15. Data Science: Capstone

    By completing this capstone project you will get an opportunity to apply the knowledge and skills in R data analysis that you have gained throughout the series. This final project will test your skills in data visualization, probability, inference and modeling, data wrangling, data organization, regression, and machine learning.

  16. rohitlog/Data-Science-capstone-Retail-project

  17. Capstone Project

    Hello and welcome. I'm Vijayraj K Poojary and together with Snehil and Vinay, we will do our Capstone project presentation on "OList Marketing and Retail Ana...

  18. ghadikq/Capstone_Project_Online_Retail

    This analysis uses the Online_Retail_II dataset provided by UCI. The dataset contains transactions for a UK-based and registered online shop. The company mainly sells unique all-occasion giftware.