monamur7/Bellabeat_Case_Study_R_Tableau

BELLABEAT DATA ANALYSIS CASE STUDY


Author: Monalisa Murmu
Date: May 22, 2024

How Can a Wellness Technology Company Play It Smart?

INTRODUCTION

Bellabeat is a high-tech company that manufactures health-focused products for women. Founded in 2013 by Urška Sršen and Sando Mur, Bellabeat has grown rapidly and quickly positioned itself as a wellness-tech company for women.

By 2016, Bellabeat had opened offices around the world and launched multiple products, including an app (Bellabeat), a wellness tracker (Leaf), a wellness smartwatch (Time), a smart water bottle (Spring) and a subscription-based membership program (Bellabeat) that provides users 24/7 access to fully personalised guidance on having a healthy lifestyle.

Urška Sršen, Bellabeat's cofounder and Chief Creative Officer, believes that analyzing smart device fitness data could help unlock new growth opportunities for the company, which has the potential to become a larger player in the global smart device market. She has therefore asked the marketing analytics team to focus on one Bellabeat product and analyze smart device usage data to gain insight into how people already use their smart devices.

With the analysis report, she would like high-level recommendations for how these trends can inform Bellabeat marketing strategy.

ASK

Business Task

Analyze usage data from the FitBit Fitness Tracker dataset to uncover insights into user behaviour and preferences, and provide data-driven recommendations for improving product features, marketing strategies, and user engagement.

  • Primary stakeholders:
    • Urška Sršen, Bellabeat’s cofounder & Chief Creative Officer
    • Sando Mur, Bellabeat’s cofounder & Mathematician
  • Secondary stakeholders:
    • Bellabeat marketing analytics team, a team of data analysts

PREPARE

Data Source: FitBit Fitness Tracker Data (CC0: Public Domain, dataset made available through Mobius)

The dataset consists of 18 CSV files, of which 5 are used in this analysis. The data is assessed below against the ROCCC criteria:

  1. Reliability: These datasets were generated by 30 FitBit respondents who consented to the submission of personal fitness tracker data to a distributed survey via Amazon Mechanical Turk.

  2. Original: The data is from 30 eligible Fitbit users who consented to the submission of personal tracker data, which includes information about daily activity, steps, and heart rate that can be used to explore users’ habits.

  3. Comprehensive: The data is stored in long format, with each ID spanning multiple rows, and includes minute-level output for physical activity, heart rate, and sleep monitoring. The small sample size introduces bias, since a larger sample would be more representative of the population, and the lack of demographic information such as gender, age, and location prevents us from confirming that the data accurately represents the population.

  4. Current: The dataset was collected between March 2016 and May 2016, which makes it outdated for a current analysis, as users' habits may be different now.

  5. Cited: Furberg, R., Brinton, J., Keating, M., & Ortiz, A. (2016). Crowd-sourced Fitbit datasets 03.12.2016-05.12.2016 [Data set]. Zenodo. https://doi.org/10.5281/zenodo.53894

⚠️ Limitations in the dataset:

  1. By a common rule of thumb based on the central limit theorem, a sample of at least 30 is usually considered enough for the sampling distribution of the mean to be approximately normal, which allows a t-test to be used. A larger sample would still give more reliable insights, with a smaller margin of error and higher confidence in the results.

  2. Checking unique user IDs with n_distinct() showed 33 users in the daily activity and hourly steps data, 24 in sleep, 14 in heart rate seconds and only 8 in weight, rather than the stated 30. The timeframe, stated as 03.12.2016 to 05.12.2016, also turned out to cover only 31 days on verification. These inconsistencies mean the data does not fully pass the integrity and credibility test.

  3. Of the 8 users with weight data, 5 entered their weight manually, while 3 used a connected wifi device (e.g., a wifi scale) to record it.

  4. Most data is recorded from Tuesday to Thursday. The sleep data mirrors this trend, which raises questions about whether the data is comprehensive enough for accurate analysis.
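
The 31-day check in limitation 2 can be reproduced by measuring the span between the first and last recorded dates. The following is a minimal sketch on made-up rows, not the author's code; it assumes the dates are stored as month/day/year strings in a column named ActivityDate, as in the weekday step of this case study.

```r
# Toy stand-in for daily_activity (values are invented for illustration).
toy_activity <- data.frame(
  Id = c(1, 1, 2, 2, 3),
  ActivityDate = c("4/12/2016", "5/12/2016", "4/12/2016", "4/13/2016", "5/1/2016")
)

dates <- as.Date(toy_activity$ActivityDate, format = "%m/%d/%Y")

# Length of the window covered by the data, counting both end dates
window_days <- as.integer(diff(range(dates))) + 1L

# Number of distinct users in the table
user_count <- length(unique(toy_activity$Id))

window_days
user_count
```

With a first date of 4/12/2016 and a last date of 5/12/2016, the window is 31 days long.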

⬆️ Back to Top

PROCESS

Why R and Tableau?

We chose R in RStudio for the data analysis because it supports in-depth statistical analysis, data manipulation, and the generation of complex insights from the data.
For visualization we use Tableau, a powerful tool for creating interactive and visually appealing dashboards, which makes it easier to explore the findings and present them to both internal and external stakeholders.
Together, they provide a comprehensive approach to analyzing and visualizing data.

1. The following CSV files were used for analysis:

dailyActivity_merged.csv
sleepDay_merged.csv
weightLogInfo_merged.csv
hourlySteps_merged.csv
heartrate_seconds_merged.csv

2. Examine the three main tables (daily_activity, sleep_day and weight), check them for NA and duplicate values, and remove the duplicates:

dim(daily_activity)
dim(sleep_day)
dim(weight)

sum(is.na(daily_activity))
sum(is.na(sleep_day))                 # NA values are left in place.
sum(is.na(weight))                    # The 65 NA values all belong to the "Fat" column, on various dates.

sum(duplicated(daily_activity))
sum(duplicated(sleep_day))
sum(duplicated(weight))              

sleep_day <- sleep_day[!duplicated(sleep_day), ]      # Eliminate the 3 duplicate values in the table `sleep_day`
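
To see what these checks report, here is a small self-contained sketch on an invented table. The column names mirror the sleep table but the values are made up, and `colSums(is.na(...))` is used in place of a single total so that the column holding the missing values is visible.

```r
# Invented rows: row 3 is an exact copy of row 2, and one value is missing.
toy_sleep <- data.frame(
  Id = c(1, 1, 1, 2),
  SleepDay = c("4/12/2016", "4/13/2016", "4/13/2016", "4/12/2016"),
  TotalMinutesAsleep = c(327, 384, 384, NA)
)

na_per_column <- colSums(is.na(toy_sleep))   # NA count for each column
dup_rows <- sum(duplicated(toy_sleep))       # rows identical to an earlier row

# Drop the repeated rows, keeping the first occurrence of each
toy_sleep_clean <- toy_sleep[!duplicated(toy_sleep), ]

na_per_column
dup_rows
nrow(toy_sleep_clean)
```

Note that `duplicated()` only flags rows that match an earlier row in every column, so two sleep records for the same user and day with different values would not be removed.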

3. Add a new column for the weekdays:

library(dplyr)   # provides %>%, mutate() and n_distinct()
daily_activity <- daily_activity %>% mutate(Weekday = weekdays(as.Date(ActivityDate, "%m/%d/%Y")))

4. Check whether each table really contains 30 unique users, using n_distinct():

n_distinct(daily_activity$Id)
n_distinct(sleep_day$Id)
n_distinct(weight$Id)

daily_activity contains data from 33 unique users, sleep_day from 24 and weight from only 8.
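
Beyond counting users per table, it can help to see which users are missing from the smaller tables. The sketch below uses base R on invented ID vectors (the names `activity_ids`, `sleep_ids` and `weight_ids` are hypothetical stand-ins for the Id columns of the three tables).

```r
# Invented Id columns; real IDs are long numeric identifiers.
activity_ids <- c(101, 102, 103, 104)
sleep_ids    <- c(101, 103, 103, 104)
weight_ids   <- c(103)

# Distinct users per table (base R equivalent of n_distinct())
id_counts <- c(
  daily_activity = length(unique(activity_ids)),
  sleep_day      = length(unique(sleep_ids)),
  weight         = length(unique(weight_ids))
)

# Users who logged activity but never logged sleep
no_sleep <- setdiff(activity_ids, sleep_ids)

# Share of the activity users that each table covers
coverage <- round(id_counts / id_counts[["daily_activity"]], 2)

id_counts
no_sleep
coverage
```

A coverage table like this makes it easy to state how much of the user base any sleep or weight finding actually rests on.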

5. Check how the data was recorded in the weight table:

weight %>% 
  filter(IsManualReport == "True") %>%                 # keep manual entries (use IsManualReport == TRUE if the column was imported as logical)
  group_by(Id) %>% 
  summarise("Manual Weight Report" = n())              # one row per user: 5 users reported their weight manually,
                                                       # so the other 3 of the 8 used a connected device (wifi scale)
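
As a rough illustration of the manual versus device split, the following base R sketch classifies each user on invented data. It normalises the flag first, because `read.csv()` converts a column of "True"/"False" strings to logical values, in which case comparing against the string "True" would match nothing.

```r
# Invented weight log: users 1 and 3 typed their weight in, user 2 used a scale.
toy_weight <- data.frame(
  Id = c(1, 1, 2, 3, 3, 3),
  IsManualReport = c("True", "True", "False", "True", "True", "True")
)

# Works whether the column holds text ("True") or logical values (TRUE)
is_manual <- toupper(as.character(toy_weight$IsManualReport)) == "TRUE"

# A user counts as "manual" if any of their entries was entered by hand
manual_by_user <- tapply(is_manual, toy_weight$Id, any)

manual_users <- sum(manual_by_user)
device_users <- sum(!manual_by_user)

manual_users
device_users
```

Treating a user as manual when any entry is manual is one possible convention; a user who mixes both methods would need a separate category in a stricter analysis.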

This raises the question of how often users record their data.
To find out, we used ggplot() to plot a bar graph, which shows that users mostly record their data from Tuesday to Thursday. Although Mondays and Fridays are also weekdays, they have noticeably fewer records.

[Figure: bar graph of the number of records per weekday]
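
The counts behind such a bar graph can be sketched as follows. This is an illustration on invented dates rather than the author's plotting code; it derives the weekday from the ISO weekday number so that the result does not depend on the system locale, and it stores the result as a factor so the days sort Monday to Sunday instead of alphabetically.

```r
# Invented record dates in the month/day/year format used by the dataset.
toy_dates <- as.Date(c("4/12/2016", "4/13/2016", "4/19/2016", "4/16/2016"),
                     format = "%m/%d/%Y")

day_names <- c("Monday", "Tuesday", "Wednesday", "Thursday",
               "Friday", "Saturday", "Sunday")

# "%u" gives the ISO weekday number (1 = Monday), unlike weekdays(),
# whose names change with the locale
weekday <- factor(day_names[as.integer(format(toy_dates, "%u"))],
                  levels = day_names)

# Records per weekday, including days with zero records
records_per_weekday <- table(weekday)
records_per_weekday
```

Passing `records_per_weekday` to `barplot()` would draw the same kind of chart in base R.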

6. Merge the three tables into a single data frame:

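# Note: joining on Id alone pairs every activity row with every sleep and weight row
# of the same user, so rows are repeated in the merged data frame.
merged_v1 <- merge(daily_activity, sleep_day, by = c("Id"), all=TRUE)
merged_data <- merge(merged_v1, weight, by = c("Id"), all=TRUE)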
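
A small experiment shows how the choice of join key affects the merged table. The sketch below is an assumption-laden illustration: both toy tables already share an ActivityDate column in the same format, whereas the real sleep and weight tables would first need their date columns parsed and renamed to match.

```r
# Invented data for a single user: three activity days, two sleep logs.
toy_activity <- data.frame(
  Id = c(1, 1, 1),
  ActivityDate = c("4/12/2016", "4/13/2016", "4/14/2016"),
  TotalSteps = c(9000, 12000, 4000)
)
toy_sleep <- data.frame(
  Id = c(1, 1),
  ActivityDate = c("4/12/2016", "4/13/2016"),
  TotalMinutesAsleep = c(400, 420)
)

# Joining on Id alone pairs every activity row with every sleep row
by_id <- merge(toy_activity, toy_sleep, by = "Id", all = TRUE)

# Joining on Id and date keeps one row per user-day
by_id_date <- merge(toy_activity, toy_sleep,
                    by = c("Id", "ActivityDate"), all = TRUE)

nrow(by_id)        # 3 activity days x 2 sleep logs
nrow(by_id_date)   # one row per activity day, NA where no sleep was logged
```

With an Id-only join, users who logged more sleep or weight records contribute more rows, so averages computed on the merged table are weighted towards them.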

⬆️ Back to Top

ANALYZE

1. Distribution of Active Minutes:

[Figures: boxplot and pie chart of the distribution of active minutes]

The boxplot and pie chart show a clear distribution across the four categories of active minutes:

  • 81.3% sedentary (12 to 20 hours a day),
  • 15.8% lightly active (2 to 4.5 hours a day),
  • 1.11% fairly active and,
  • 1.74% very active.

The American Heart Association and the World Health Organization recommend at least 150 minutes of moderate-intensity activity or 75 minutes of vigorous activity, or a combination of both, each week.

Spread over seven days, this amounts to a daily goal of about 21.4 minutes (150 / 7) of fairly active time or 10.7 minutes (75 / 7) of very active time.

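active_users <- daily_activity %>%
  filter(FairlyActiveMinutes >= 21.4 | VeryActiveMinutes >= 10.7) %>%   # days that met either daily goal
  count(Id)                                                              # number of qualifying days per user

n_distinct(active_users$Id)                                              # users with at least one qualifying day

In our data, 30 of the 33 users met the fairly active or very active daily goal on at least one day.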
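
The daily thresholds and the user count can be worked through on a small example. The sketch below uses invented minutes and base R; it also shows a stricter, hypothetical alternative (users who met a goal on more than half of their days), which is not part of the original analysis.

```r
# Weekly recommendations spread evenly over seven days
fairly_goal <- 150 / 7   # about 21.4 minutes a day
very_goal   <- 75 / 7    # about 10.7 minutes a day

# Invented activity rows: two days for each of three users.
toy_activity <- data.frame(
  Id = c(1, 1, 2, 2, 3, 3),
  FairlyActiveMinutes = c(25, 5, 0, 0, 10, 12),
  VeryActiveMinutes   = c(0, 0, 11, 30, 2, 3)
)

met_goal <- toy_activity$FairlyActiveMinutes >= fairly_goal |
  toy_activity$VeryActiveMinutes >= very_goal

# Users with at least one qualifying day
users_any_day <- length(unique(toy_activity$Id[met_goal]))

# Stricter alternative: share of each user's days that met a goal
share_by_user <- tapply(met_goal, toy_activity$Id, mean)
users_most_days <- sum(share_by_user > 0.5)

users_any_day
users_most_days
```

The two counts can differ a lot, so it is worth stating which definition of an "active user" a headline number relies on.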

2. Total Steps taken:

[Figure: animated line graph of total steps per day]

This animated line graph shows that users walked a similar number of steps throughout the month analyzed, with some pronounced peaks and troughs, which indicates an unsteady trend.

[Figures: bar graphs of steps by hour of day and by weekday]

The bar graphs above show that users take the most steps between 5 PM and 7 PM and between 12 PM and 2 PM, and that Tuesday is the weekday with the most steps.
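
An hourly profile like this can be built by extracting the hour from each timestamp and averaging the steps per hour. The sketch below is only an illustration: the column names ActivityHour and StepTotal and the "m/d/yyyy h:mm:ss AM/PM" timestamp format are assumptions about the hourly steps file, and the values are invented.

```r
# Invented hourly rows in the assumed timestamp format.
toy_hourly <- data.frame(
  ActivityHour = c("4/12/2016 12:00:00 PM", "4/12/2016 6:00:00 PM",
                   "4/13/2016 12:00:00 PM", "4/13/2016 6:00:00 PM",
                   "4/13/2016 3:00:00 AM"),
  StepTotal = c(500, 900, 700, 1100, 10)
)

# Convert the 12-hour clock to a 0-23 hour without relying on locale-specific
# AM/PM parsing
hour12 <- as.integer(sub("^.* ([0-9]+):.*$", "\\1", toy_hourly$ActivityHour))
is_pm  <- grepl("PM$", toy_hourly$ActivityHour)
toy_hourly$Hour <- hour12 %% 12 + ifelse(is_pm, 12, 0)

# Average steps for each hour of the day
mean_steps_by_hour <- aggregate(StepTotal ~ Hour, data = toy_hourly, FUN = mean)
busiest_hour <- mean_steps_by_hour$Hour[which.max(mean_steps_by_hour$StepTotal)]

mean_steps_by_hour
busiest_hour
```

The `%% 12` step maps 12 AM to hour 0 and 12 PM to hour 12, which is the usual pitfall when converting 12-hour timestamps by hand.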

3. Statistical Summary:

Check for Mean, Median, Min, Max of the dataset(s)

merged_data %>%
  dplyr::select(TotalSteps,
                TotalDistance,
                VeryActiveMinutes,
                FairlyActiveMinutes,
                LightlyActiveMinutes,
                SedentaryMinutes,
                Calories,
                TotalMinutesAsleep,
                TotalTimeInBed,
                WeightPounds,
                BMI
  ) %>%
  summary()

On average, users take 9,373 steps and burn 2,103 calories a day. The average weight is 139.6 pounds (63 kg), with an average BMI of 24.42. Users spend an average of 12 hours a day sedentary, 4 hours lightly active and just 41 minutes fairly to very active. They sleep for about 7 hours a day on average, and the average heart rate is 77 bpm.
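
For readers who want the same statistics as a compact table, the sketch below computes mean, median, minimum and maximum per column while skipping missing values, and shows the pound-to-kilogram conversion used above. The data frame is invented; only the 139.6 pound figure comes from the analysis.

```r
# Invented columns with some missing values, as in the merged data.
toy <- data.frame(
  TotalSteps   = c(4000, 9000, 12000, NA),
  WeightPounds = c(NA, 139.6, 150.0, 128.2)
)

# One column of statistics per variable, ignoring NA values
stats <- sapply(toy, function(x) c(
  mean   = mean(x, na.rm = TRUE),
  median = median(x, na.rm = TRUE),
  min    = min(x, na.rm = TRUE),
  max    = max(x, na.rm = TRUE)
))

# Pounds to kilograms (1 lb = 0.45359237 kg)
lb_to_kg <- function(lb) lb * 0.45359237

stats
lb_to_kg(139.6)   # about 63.3 kg
```

Unlike `summary()`, this returns a numeric matrix, which is easier to round, export or feed into a report table.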

4. Interesting Findings:

library(ggplot2)   # provides ggplot(), geom_point(), stat_smooth()
ggplot(data=daily_activity, aes(x=TotalSteps, y = Calories, color=SedentaryMinutes)) + 
  geom_point() + 
  labs(title="Calories Vs. Total Steps by Sedentary Minutes", 
       caption= "Google Data Analytics Capstone",
       subtitle = "Bellabeat data analysis case study - Period analyzed: 31 days - Users qty: 33", x="Total Steps") +
  theme(plot.title = element_text(size = 15), 
        plot.subtitle = element_text(size = 11)) +
  stat_smooth(method=lm) +                             # straight-line (linear model) trend
  scale_color_gradient(low="cornsilk", high="navy")

[Figure: scatter plot of calories against total steps, colored by sedentary minutes]

The plot above shows that some users with high sedentary time take around 10,000 steps and burn between 1,500 and 3,000 calories.
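
The visual relationship between steps and calories can also be summarised numerically. The sketch below, on invented values, computes the Pearson correlation and fits the same kind of straight line that `stat_smooth(method = lm)` draws; it is an illustration, not a result from the dataset.

```r
# Invented daily totals for five user-days.
toy_daily <- data.frame(
  TotalSteps = c(2000, 5000, 8000, 11000, 14000),
  Calories   = c(1600, 1900, 2300, 2500, 3000)
)

# Strength of the linear relationship between steps and calories
r <- cor(toy_daily$TotalSteps, toy_daily$Calories)

# Linear model: calories as a function of steps
fit <- lm(Calories ~ TotalSteps, data = toy_daily)

# Slope expressed as extra calories per additional 1,000 steps
calories_per_1000_steps <- unname(coef(fit)["TotalSteps"]) * 1000

r
calories_per_1000_steps
```

Reporting the slope alongside the plot gives stakeholders a single figure to remember, while the correlation indicates how tightly the points follow the line.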

According to the Lancet Public Health study on daily steps and all-cause mortality, there is a decrease in the risk of mortality among adults aged 60 years and older with increasing number of steps per day until 6000–8000 steps per day and among adults younger than 60 years until 8000–10,000 steps per day.

[Figures: two graphs of daily steps and calories]

In the two graphs above, most of the data is concentrated on users who take about 2,500 to 15,000 steps a day and burn 1,500 to 3,000 calories a day. These users spent between 8 and 11.5 hours sedentary, 5 hours lightly active, and 1 to 2 hours fairly or very active.

⬆️ Back to Top

SHARE

ACT

Conclusion based on our analysis:

  • Sedentary behavior accounts for a substantial 81% of users' tracked daily minutes. On average, users spend 12 hours a day sedentary, 4 hours lightly active and just 41 minutes fairly to very active.

  • 54% of users who logged their sleep data spent 55 minutes awake in bed before falling asleep.

  • Most steps are taken between 12 PM and 2 PM and between 5 PM and 7 PM. Sedentary users take fewer steps and burn 1,500 to 3,000 calories, while more active users take more steps but burn a similar amount of calories.

Marketing recommendations to expand globally:

  • To enhance data accuracy, we recommend that users utilize a WiFi-connected scale instead of manually entering their weight.

  • Help users reduce sedentary time by suggesting alternative activities and delivering articles on the health benefits of shorter sedentary periods through pop-up notifications.

  • The marketing team should stress the health benefits of exercise, highlighting how the watch helps users track progress, set daily goals, and receive activity reminders. By promoting its "Stay Active, Stay Healthy" feature, users will be encouraged to complete their daily activity rings and maintain a healthy lifestyle.

References:

  1. American Heart Association. "What Exercise Is Right for Me?" Go Red Get Fit, 2024.

  2. World Health Organization (WHO). "Physical Activity." Be Active, 2024.

  3. Paluch, Amanda E., et al. "Daily steps and all-cause mortality: a meta-analysis of 15 international cohorts." The Lancet Public Health, vol. 7, no. 3, 2022, pp. e219-e228.

  4. Banach, M., Lewek, J., Surma, S., Penson, P. E., Sahebkar, A., Martin, S. S., Bajraktari, G., Henein, M. Y., Reiner, Ž., Bielecka-Dąbrowa, A., Bytyçi, I., et al. "The association between daily step count and all-cause and cardiovascular mortality: a meta-analysis." European Journal of Preventive Cardiology, vol. 30, no. 18, Dec. 2023, pp. 1975-1985.

⬆️ Jump to Top