Cluster Analysis
Project Overview
This project is based on a survey concerning mobility and the environment. The survey was conducted to find out respondents’ relative preferences for attributes/features when buying a new car.
The goal is to understand the market segments in terms of car preferences, with a specific focus on environmentally friendly cars. Specifically, we aim to gain insights into:
How many different segments are there in the market?
What are the most important car attributes for these segments?
We also want to assess the market potential for eco-friendly vehicles.
Outline
Part 1: Data Collection, Data Preparation, and Exploration
Variable Description
Data Cleaning
Visualization
Part 2: Segmentation Analysis (Clustering)
Method 1: Hierarchical Clustering
Method 2: K-Means
Method 3: Model-Based Clustering
Part 3: Segment Profiling
Characterize each segment in terms of demographics and preferences.
Part 4: Validating Cluster Solutions
Part 5: Conclusions
Part 1: Data Description
The dataset contains the responses of a sample of 420 newspaper readers. It consists of the following variables:
1) Respondent ID
2) Gender: 1 = female, 2 = male, 3 = other/unknown
3) Age: continuous variable
4) Education: 1 = high school profession-oriented degree, 2 = high school theory-oriented degree, 3 = higher education non-university degree, 4 = university degree, 5 = other
5) Area: 1 = metropolitan, 2 = urban, 3 = suburban, 4 = countryside
6) Mileage: How far should you be able to drive with a full gas tank/battery? Scale 1-7, 1 = low mileage is ok, 7 = high mileage is desired
7) Power: How much power should the car have? Scale 1-7, 1 = low power is ok, 7 = high power is desired
8) Design: How fashionable should the car be? Scale 1-7, 1 = low design is ok, 7 = high design is desired
9) Comfort: How comfortable should the car be? Scale 1-7, 1 = low comfort is ok, 7 = high comfort is desired
10) Entertainment: How developed should the in-car entertainment facilities be? Scale 1-7, 1 = low entertainment is ok, 7 = high entertainment is desired
11) Environment: How environmentally friendly should the car be? Scale 1-7, 1 = low environmental-friendliness is ok, 7 = high environmental-friendliness is desired
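The coding scheme above can be checked mechanically during data cleaning. The following is a minimal Python sketch, not the project's own code (the original analysis was done in R). The lowercase field names, the `validate` helper, and the example record are assumptions made for illustration.

```python
# Coding scheme transcribed from the variable description above.
# Field names are hypothetical; the real dataset may label columns differently.
CATEGORICAL = {
    "gender": {1: "female", 2: "male", 3: "other/unknown"},
    "education": {
        1: "high school profession-oriented degree",
        2: "high school theory-oriented degree",
        3: "higher education non-university degree",
        4: "university degree",
        5: "other",
    },
    "area": {1: "metropolitan", 2: "urban", 3: "suburban", 4: "countryside"},
}

# The six preference items, each rated on a 1-7 scale.
ACTIVE = ["mileage", "power", "design", "comfort", "entertainment", "environment"]


def validate(row):
    """Return a list of problems found in one respondent record (a dict)."""
    problems = []
    for name, codes in CATEGORICAL.items():
        if row.get(name) not in codes:
            problems.append(f"{name}: unknown code {row.get(name)!r}")
    for name in ACTIVE:
        value = row.get(name)
        if not isinstance(value, int) or not 1 <= value <= 7:
            problems.append(f"{name}: {value!r} is outside the 1-7 scale")
    return problems


# Invented example record (not survey data); "design" is deliberately invalid.
example = {
    "gender": 1, "education": 4, "area": 2,
    "mileage": 5, "power": 3, "design": 8,
    "comfort": 6, "entertainment": 2, "environment": 7,
}
print(validate(example))
```

Records that return a non-empty list would be candidates for correction or removal before clustering.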
Part 2: Segmentation Analysis
Key Findings:
Based on the relative preferences of the 420 sampled newspaper readers for several car attributes, we can distinguish three segments. This conclusion rests on the following:
Segmentation Clarity: Hierarchical clustering of the 420 newspaper readers' car preferences distinguishes three well-defined segments, corroborated by the analysis of segment membership and dendrogram patterns.
Optimal Cluster Selection: Moving from four to three segments incurs minimal loss of detail, yielding a more manageable and cost-effective segmentation strategy.
Differentiation Validation: K-means clustering with three clusters differentiates the segments more distinctly than solutions with other numbers of clusters, reinforcing the segmentation approach.
Method: We used three clustering methods (hierarchical clustering, K-means, and model-based clustering) to arrive at a proposed cluster solution.
Research purpose: Identify customer segments according to their preferences for car attributes.
Active Variables: Car attributes, namely Mileage, Power, Design, Comfort, Entertainment, and Environment.
Passive Variables: Customer demographics, namely Gender, Age, Education, and Area.
According to the "Cluster Dendrogram", we can divide people into 3 segments based on their preferences.
In addition, the "elbow method" indicates that 3 segments is a more persuasive and economical choice than other numbers of clusters.
The "Ward" method of clustering is a useful way to group people with similar preferences as it generates clusters by trying to minimize the within-cluster variance.
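To make the variance-minimizing idea concrete, here is a minimal Python sketch of the merge cost that Ward's criterion evaluates at each step: the increase in total within-cluster sum of squares caused by joining two clusters. It is an illustration rather than the project's R code, and the two-dimensional points are invented.

```python
def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def within_ss(points):
    """Sum of squared distances from each point to the cluster centroid."""
    c = centroid(points)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in points)


def ward_cost(a, b):
    """Increase in total within-cluster SS if clusters a and b are merged."""
    ca, cb = centroid(a), centroid(b)
    dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * dist2


# Invented (power, comfort) ratings for three tiny clusters.
a = [(1, 2), (2, 2)]
b = [(6, 6), (7, 6)]
c = [(2, 3)]

# Ward's method would merge the pair with the smallest cost first.
print("merge a+c:", ward_cost(a, c))
print("merge a+b:", ward_cost(a, b))

# The closed-form cost equals the direct before/after difference in SS.
direct = within_ss(a + b) - within_ss(a) - within_ss(b)
print("direct a+b:", direct)
```

Merging the two nearby clusters is far cheaper than merging the two distant ones, which is why similar respondents end up grouped together low in the dendrogram.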
The figure shows a cluster plot created with clusplot() for the 3-group solution from kmeans().
The 3 groups are modestly differentiated overall but clearly differentiated on certain key variables.
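For readers who want to see what a k-means fit and the elbow comparison involve, the following Python sketch runs Lloyd's algorithm for several values of k and reports the total within-cluster sum of squares for each. It is not the project's R code. One simplification is an assumption of this sketch: the starting centers are chosen by a deterministic farthest-point rule so that the result is reproducible, whereas R's kmeans() uses random starts. The data points are invented.

```python
def sq_dist(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def mean_point(points):
    return tuple(sum(col) / len(points) for col in zip(*points))


def init_centers(points, k):
    """Deterministic start: first point, then repeatedly the farthest point."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    return centers


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [
        min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
        for p in points
    ]


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (labels, centers, total within-cluster SS)."""
    centers = init_centers(points, k)
    for _ in range(max_iter):
        labels = assign(points, centers)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(mean_point(members) if members else centers[j])
        if updated == centers:
            break
        centers = updated
    labels = assign(points, centers)
    wss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, wss


# Invented (power, environment) ratings forming three loose groups.
points = [
    (1, 6), (2, 7), (1, 7), (2, 6),
    (6, 2), (7, 1), (6, 1), (7, 2),
    (4, 4), (4, 5), (5, 4), (5, 5),
]

# Elbow method: look for the k after which the drop in WSS flattens out.
wss_by_k = {k: kmeans(points, k)[2] for k in range(1, 5)}
for k, wss in wss_by_k.items():
    print(k, round(wss, 2))
```

On this toy data the within-cluster sum of squares falls sharply up to k = 3 and only slightly afterwards, which is the "elbow" pattern used to justify a cluster count.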
The BIC criterion suggests that the two best EEV models are those with 5 and 4 clusters.
However, after comparing this result with the other methods, we choose 3 clusters based on the dendrogram.
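The BIC comparison trades goodness of fit against model complexity. The short Python sketch below shows that trade-off using the "larger is better" sign convention of R's mclust package (assumed here because EEV is an mclust model name; the source does not name the package). The log-likelihoods and parameter counts are invented for illustration and are not the project's output; only the sample size of 420 comes from the text.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """BIC in the 'larger is better' form: 2*logL - n_params*log(n_obs)."""
    return 2 * log_likelihood - n_params * math.log(n_obs)


# Hypothetical fits: (number of clusters, log-likelihood, free parameters).
candidates = [(3, -3650.0, 50), (4, -3590.0, 65), (5, -3540.0, 80)]
n_obs = 420  # sample size stated in the data description

scores = {k: bic(ll, p, n_obs) for k, ll, p in candidates}
ranking = sorted(scores, key=scores.get, reverse=True)

for k in ranking:
    print(k, round(scores[k], 1))
```

Because each extra cluster adds parameters, a model with more clusters only wins if its log-likelihood improves by more than the penalty. A BIC ranking is one input among several, which is why it can be overruled by the dendrogram and elbow evidence.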
Part 3: Segment Profiling
3.1 Segment - (socio-demographics)
Key Findings: The demographics of the people in each customer segment (gender, age, education, and area) are shown below. The table summarizes statistics computed in R. The bar graph below shows the size and proportion of customers for each variable in each segment.
3.2 Segment - (segment preferences)
Key Findings: The result from hierarchical clustering (Ward method) is used to derive the customer profile of each segment. The boxplot is combined with the summary statistics to give an overview of how strongly each active variable characterizes a specific segment.
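Profiling a segment comes down to summarizing each variable within that segment: shares for the demographic (passive) variables and averages for the preference (active) variables. The Python sketch below shows both summaries on a handful of invented records. It is not the project's R code, and the record layout and helper names are assumptions.

```python
from collections import Counter, defaultdict

# Column positions in each record (hypothetical layout).
SEGMENT, GENDER, AREA, ENVIRONMENT = range(4)

# Invented records: (segment, gender code, area code, environment rating).
records = [
    (1, 2, 3, 3), (1, 2, 4, 2), (1, 1, 3, 4),
    (2, 1, 2, 5), (2, 2, 2, 4),
    (3, 1, 1, 7), (3, 1, 1, 6), (3, 2, 1, 6),
]


def shares_by_segment(records, column):
    """Proportion of each code of a categorical variable within each segment."""
    counts = defaultdict(Counter)
    for rec in records:
        counts[rec[SEGMENT]][rec[column]] += 1
    return {
        seg: {code: n / sum(tally.values()) for code, n in tally.items()}
        for seg, tally in counts.items()
    }


def mean_by_segment(records, column):
    """Average of a numeric variable within each segment."""
    values = defaultdict(list)
    for rec in records:
        values[rec[SEGMENT]].append(rec[column])
    return {seg: sum(v) / len(v) for seg, v in values.items()}


print(shares_by_segment(records, GENDER))
print(mean_by_segment(records, ENVIRONMENT))
```

The same two helpers can be applied to every passive and active variable to build the kind of table and bar chart described in this part.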
Part 4: Validating Cluster Solutions
4.1 Test for Significance of Active Variables
Key Findings: For each active variable, we test for differences between clusters using ANOVA (H0: the means are the same across clusters). The results show a significant p-value (p < 0.05) for all active variables (mileage, power, design, comfort, entertainment, and environment), so we reject H0: the segment means are not all equal. The groups differ significantly with regard to each active variable. The table below shows where the differences lie.
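The ANOVA test described in 4.1 compares variation between segment means with variation inside the segments. As a rough illustration, the Python sketch below computes the one-way ANOVA F statistic by hand for one variable. It is not the project's R code and the ratings are invented. The p-value step is left out: it would come from the F distribution with the returned degrees of freedom.

```python
def anova_f(groups):
    """One-way ANOVA: return (F statistic, df between, df within)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means)
    )
    # Variation of individual ratings around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented ratings of one active variable for three segments.
ratings = [
    [2, 3, 3, 4],
    [4, 5, 5, 6],
    [6, 7, 7, 8],
]

f_stat, df_b, df_w = anova_f(ratings)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F value means the segment means are far apart relative to the spread within segments, which is what rejecting H0 reflects.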
4.2 Test for Difference between Groups
Key Findings:
Segments 1, 2, and 3 are significantly different from each other in their average levels of mileage, comfort, and entertainment.
Segment 2 differs from the other groups (1 and 3) in its average level of power.
Segment 3 differs from the other groups (1 and 2) in its average level of design.
Segment 1 differs from the other groups (2 and 3) in its average level of environment.
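Pairwise statements like these come from comparing two segments at a time. The source does not say which post-hoc test produced them, so the Python sketch below swaps in a different, named technique purely for illustration: an exact two-sided permutation test on the difference in means, with a Bonferroni-adjusted threshold. The ratings are invented, and exact enumeration is only feasible for tiny groups like these.

```python
from itertools import combinations


def mean(xs):
    return sum(xs) / len(xs)


def exact_permutation_p(a, b):
    """Two-sided exact permutation p-value for a difference in means."""
    pooled = a + b
    total = sum(pooled)
    observed = abs(mean(a) - mean(b))
    hits = count = 0
    # Enumerate every way of relabeling len(a) of the pooled values as group a.
    for idx in combinations(range(len(pooled)), len(a)):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / len(a) - (total - sum_a) / len(b))
        count += 1
        if diff >= observed - 1e-12:
            hits += 1
    return hits / count


# Invented ratings of one active variable, keyed by segment number.
segments = {
    1: [2, 3, 3, 4, 3],
    2: [4, 5, 5, 6, 5],
    3: [6, 7, 7, 8, 7],
}

pairs = list(combinations(segments, 2))
alpha = 0.05 / len(pairs)  # Bonferroni adjustment for three comparisons
p_values = {
    (i, j): exact_permutation_p(segments[i], segments[j]) for i, j in pairs
}

for pair, p in p_values.items():
    print(pair, round(p, 4), "different" if p < alpha else "not shown to differ")
```

Repeating this for each active variable yields a table of which segment pairs differ on which attribute, in the same spirit as the findings listed above.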
Implications of the Results for the Market Potential of Environmentally Friendly Cars
The results show that, among the three customer segments identified from car attribute preferences, segment 3 contains the most likely buyers of environmentally friendly vehicles. Segment 3 is the smallest of the three groups and differs significantly from the others in both demographics and preferences. Consumers in this group are primarily female (72%) and mostly live in metropolitan areas (76%). More than 70% of them are younger than 44, and they have a higher level of education than people in the other segments. When buying a new car, their top three purchase criteria are 1) design, 2) entertainment, and 3) environment. Therefore, a company or brand that wants to promote an environmentally friendly car should first research this group's design preferences and the entertainment functions they need. The "eco-friendly" attribute can then be used to differentiate the company's car from other brands and attract this group of customers.
As most people in this target segment live in the city, the company can make them more aware of environmental problems and encourage a positive attitude toward eco-friendly products. Moreover, people in this group are highly educated and already have some knowledge and awareness of environmental issues. The company can therefore create a green marketing campaign and incentive programs that raise awareness, deepen this group's understanding, and move them toward a purchase decision.