
Cluster Analysis

Project Overview

This project is based on a survey concerning mobility and the environment. The survey was conducted to find out respondents’ relative preferences for car attributes when buying a new car.

The goal is to understand the market segments in terms of car preferences, with a specific focus on environmentally friendly cars. Specifically, we aim to gain insights into:

  • How many distinct market segments exist?

  • What are the most important car attributes for each segment?

Additionally, we want to assess the market potential for eco-friendly vehicles.

Outline

Part 1: Data Collection, Data Preparation, and Exploration

  • Variable Description

  • Data Cleaning

  • Visualization

Part 2: Segmentation Analysis (Clustering)

  • Method 1: Hierarchical Clustering

  • Method 2: K-Means

  • Method 3: Model-Based Clustering

Part 3: Segment Profiling

  • Characterize each segment in terms of demographics and preferences.

Part 4: Validating Cluster Solutions

Part 5: Conclusions

Part 1: Data Description

The dataset contains the responses of a sample of 420 newspaper readers and includes the following variables:

1) Respondent ID

2) Gender: 1 = female, 2 = male, 3 = other/unknown

3) Age: continuous variable

4) Education: 1 = high school, profession-oriented degree; 2 = high school, theory-oriented degree; 3 = higher education, non-university degree; 4 = university degree; 5 = other

5) Area: 1 = metropolitan, 2 = urban, 3 = suburban, 4 = countryside

6) Mileage: How far should you be able to drive with a full gas tank/battery? Scale 1-7, 1 = low mileage is ok, 7 = high mileage is desired

7) Power: How much power should the car have? Scale 1-7, 1 = low power is ok, 7 = high power is desired

8) Design: How fashionable should the car be? Scale 1-7, 1 = low design is ok, 7 = high design is desired

9) Comfort: How comfortable should the car be? Scale 1-7, 1 = low comfort is ok, 7 = high comfort is desired

10) Entertainment: How developed should the in-car entertainment facilities be? Scale 1-7, 1 = low entertainment is ok, 7 = high entertainment is desired

11) Environment: How environmentally friendly should the car be? Scale 1-7, 1 = low environmental-friendliness is ok, 7 = high environmental-friendliness is desired
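The Data Cleaning step in the outline is not spelled out in the text. As an illustration only, here is a minimal Python sketch (the original analysis was done in R). It keeps respondents whose six active ratings are valid integers on the 1-7 scale and separates the active variables from the passive ones. The column names and the dict-per-respondent layout are assumptions, not the notebook's actual data format.

```python
# Hypothetical column names that mirror the variable list above.
ACTIVE = ["mileage", "power", "design", "comfort", "entertainment", "environment"]
PASSIVE = ["gender", "age", "education", "area"]


def clean(rows):
    """Keep respondents whose six active ratings are all integers in 1..7.

    `rows` is assumed to be a list of dicts keyed by the names above.
    Returns (active, passive): rating vectors (floats) for clustering and
    demographic dicts for profiling, in the same row order.
    """
    active, passive = [], []
    for row in rows:
        ratings = [row.get(name) for name in ACTIVE]
        # Reject missing values (None), non-integers, and off-scale codes.
        if all(isinstance(r, int) and 1 <= r <= 7 for r in ratings):
            active.append([float(r) for r in ratings])
            passive.append({name: row.get(name) for name in PASSIVE})
    return active, passive


# Toy rows, not survey data.
rows = [
    {"mileage": 5, "power": 3, "design": 6, "comfort": 4, "entertainment": 2,
     "environment": 7, "gender": 1, "age": 30, "education": 4, "area": 1},
    {"mileage": 9, "power": 3, "design": 6, "comfort": 4, "entertainment": 2,
     "environment": None, "gender": 2, "age": 51, "education": 1, "area": 4},
]
active, passive = clean(rows)
```

The second toy row is dropped because mileage 9 is off the scale and environment is missing. Only the active vectors are used for clustering. The passive records are kept aside for segment profiling.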

Key Findings:

Based on the preferences of 420 newspaper readers for several car attributes, we distinguish three segments:

  1. Segmentation Clarity: Hierarchical clustering of the respondents' car preferences yields three well-defined segments, a result supported by the segment memberships and the dendrogram structure.

  2. Optimal Cluster Selection: Moving from four to three segments loses little detail and yields a more manageable and cost-effective segmentation strategy.

  3. Differentiation Validation: K-means clustering confirms that three clusters separate the segments more distinctly than other numbers of clusters, reinforcing the segmentation approach.
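Comparing k-means solutions across cluster counts is essentially the elbow method. You fit k-means for several values of k and look for the point where the total within-cluster sum of squares (WSS) stops dropping sharply. Below is a minimal Python sketch of Lloyd's algorithm with several random starts. It is not the notebook's code. R's kmeans() uses the Hartigan-Wong algorithm by default (Lloyd is available via algorithm = "Lloyd") and handles random restarts through nstart. The function names here are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two rating vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's k-means started from k randomly chosen points.

    Returns (labels, centers, wss), where wss is the total
    within-cluster sum of squares.
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    wss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, wss


def elbow(points, max_k=6, n_starts=25):
    """Lowest WSS over several random starts, for k = 1..max_k."""
    return {k: min(kmeans(points, k, seed=s)[2] for s in range(n_starts))
            for k in range(1, max_k + 1)}


# Toy 2-D data with three well-separated groups (not survey data).
points = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8], [1, 8], [1, 9], [2, 8]]
curve = elbow(points, max_k=4)
```

With real data, the points would be the six-dimensional active-variable vectors. Reading the bend in the WSS curve is still a judgment call, as it is in the document. The sketch only produces the curve.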

Part 2: Segmentation Analysis

Method: We used three clustering methods (hierarchical clustering, k-means, and model-based clustering) to arrive at a proposed cluster solution.

Research purpose: Identify customer segments according to their preferences for car attributes.

Active Variables: Car attributes (Mileage, Power, Design, Comfort, Entertainment, Environment).

Passive Variables: Customer demographics (Gender, Age, Education, Area).

  • According to the cluster dendrogram, respondents can be divided into 3 segments based on their preferences.

  • In addition, the elbow method indicates that 3 segments are a more convincing and economical choice than other numbers of clusters.

  • Ward's method is a useful way to group people with similar preferences: at each step it merges the pair of clusters whose merger causes the smallest increase in within-cluster variance.

  • The figure shows a cluster plot created with clusplot() for the 3-group solution from kmeans().

  • The 3 groups are modestly differentiated overall but clearly differentiated on certain key variables.

  • The BIC criterion suggests that the two best model-based solutions are EEV models with 5 and 4 clusters.

  • However, after comparing these results with the other methods, we chose 3 clusters based on the dendrogram.
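Ward's criterion can be made concrete. Merging clusters A and B increases the total within-cluster sum of squares by n_A n_B / (n_A + n_B) times the squared distance between their centroids. Ward's method always merges the pair with the smallest increase. The naive Python sketch below implements that rule and stops at k clusters instead of drawing a dendrogram. It is an illustration, not the notebook's code. In R, hclust() with method = "ward.D2" on Euclidean distances is intended to implement this criterion, but the document does not say which variant was used. The function name ward_cluster is hypothetical.

```python
def ward_cluster(points, k):
    """Naive agglomerative clustering with Ward's criterion, stopped at k clusters.

    Returns a sorted list of clusters, each a sorted list of point indices.
    Runs in roughly cubic time, so it is only meant for small illustrations.
    """
    clusters = [{"idx": [i], "mean": [float(x) for x in p]} for i, p in enumerate(points)]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca, cb = clusters[a], clusters[b]
                na, nb = len(ca["idx"]), len(cb["idx"])
                d2 = sum((x - y) ** 2 for x, y in zip(ca["mean"], cb["mean"]))
                # Increase in within-cluster sum of squares if a and b merge.
                cost = na * nb / (na + nb) * d2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        ca, cb = clusters[a], clusters[b]
        na, nb = len(ca["idx"]), len(cb["idx"])
        merged = {
            "idx": sorted(ca["idx"] + cb["idx"]),
            # Centroid of the merged cluster is the size-weighted mean.
            "mean": [(na * x + nb * y) / (na + nb) for x, y in zip(ca["mean"], cb["mean"])],
        }
        del clusters[b]  # b > a, so delete b first to keep index a valid
        del clusters[a]
        clusters.append(merged)
    return sorted(c["idx"] for c in clusters)


# Toy 2-D data with three well-separated groups (not survey data).
points = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8], [1, 8], [1, 9], [2, 8]]
clusters = ward_cluster(points, 3)
```

Cutting a dendrogram at a given height corresponds to stopping the merges at that many clusters, which is what the k argument does here.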

Part 3: Segment Profiling

3.1 Segment - (socio-demographics)

Key Findings: The demographics of each customer segment with respect to gender, age, education, and area are shown below. The table summarizes statistics computed in R. The bar graph below shows the size and proportion of customers in each category of each variable, by segment.
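The socio-demographic profile of each segment comes down to within-segment shares of each category of a passive variable. Percentages such as the 72% female share reported for segment 3 are of this kind. Below is a minimal Python sketch, assuming the cluster labels and one passive variable are given as parallel lists. segment_shares is a hypothetical helper, not the notebook's R code.

```python
from collections import Counter, defaultdict


def segment_shares(labels, values):
    """Within-segment share of each category of one passive variable.

    `labels` holds each respondent's segment and `values` the same
    respondents' codes for one passive variable (e.g. gender: 1 = female,
    2 = male). Returns {segment: {category: share}}; shares sum to 1.
    """
    counts = defaultdict(Counter)
    for seg, val in zip(labels, values):
        counts[seg][val] += 1
    shares = {}
    for seg, seg_counts in sorted(counts.items()):
        total = sum(seg_counts.values())
        shares[seg] = {val: n / total for val, n in sorted(seg_counts.items())}
    return shares


# Toy data, not survey results: segment labels and gender codes.
labels = [1, 1, 1, 2, 2]
gender = [1, 1, 2, 2, 2]
shares = segment_shares(labels, gender)
sizes = Counter(labels)  # segment sizes, as in the bar graph
```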

3.2 Segment - (segment preferences)

Key Findings: The results of hierarchical clustering (Ward method) are used to derive the customer profile of each segment. The boxplots are combined with the statistical results to give an overview of how strongly each active variable characterizes a specific segment.
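A segment's preference profile can be summarized by the mean rating of each active variable within the segment, which is the numeric counterpart of the boxplots. The Python sketch below also lists each segment's three highest-rated attributes. This is one simple way to arrive at a "top 3" list like the one reported for segment 3, but it is an assumption that the notebook ranked attributes this way. The names are hypothetical.

```python
from collections import Counter

# Hypothetical ordering of the active variables in each rating vector.
ACTIVE = ["mileage", "power", "design", "comfort", "entertainment", "environment"]


def segment_profile(labels, active):
    """Mean rating per active variable and the 3 highest-rated attributes per segment."""
    sizes = Counter(labels)
    sums = {}
    for seg, row in zip(labels, active):
        acc = sums.setdefault(seg, [0.0] * len(ACTIVE))
        for j, rating in enumerate(row):
            acc[j] += rating
    profile = {}
    for seg in sorted(sums):
        means = {name: sums[seg][j] / sizes[seg] for j, name in enumerate(ACTIVE)}
        # Rank attributes by mean rating within this segment, highest first.
        top3 = sorted(means, key=means.get, reverse=True)[:3]
        profile[seg] = {"means": means, "top3": top3}
    return profile


# Toy data, not survey results.
labels = [1, 1, 2]
active = [[7, 1, 2, 3, 4, 5], [5, 1, 2, 3, 4, 5], [1, 2, 7, 3, 6, 5]]
profile = segment_profile(labels, active)
```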

Part 4: Validating Cluster Solutions

4.1 Test for Significance of Active Variables

Key Findings: For each active variable, we test for differences between clusters using a one-way ANOVA (H0: the mean is the same in all clusters). Every active variable (mileage, power, design, comfort, entertainment, and environment) shows a significant p-value (p < 0.05). We therefore reject H0: for each variable, not all cluster means are equal, so the groups differ significantly on every active variable. The table below shows where the differences lie.
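For reference, the one-way ANOVA run for each active variable compares between-cluster and within-cluster variation through the F statistic F = (SS_between / (k - 1)) / (SS_within / (N - k)), where k is the number of clusters and N the number of respondents. Below is a minimal Python sketch that computes F and its degrees of freedom. The p-value comes from the upper tail of the F distribution, for example pf(F, df1, df2, lower.tail = FALSE) in R or scipy.stats.f.sf in Python. The sketch omits that step so it stays dependency-free. The function name is hypothetical.

```python
def one_way_anova_f(groups):
    """F statistic and degrees of freedom for a one-way ANOVA.

    `groups` is a list of lists: one variable's ratings split by cluster.
    H0: all cluster means are equal.
    """
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy ratings of one active variable in three clusters (not survey data).
f_stat, df1, df2 = one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

Here SS_between = 54 and SS_within = 6, so F = (54 / 2) / (6 / 6) = 27 with 2 and 6 degrees of freedom.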

4.2 Test for Difference between Groups

Key Findings:

  • Segments 1, 2, and 3 all differ significantly from one another in their average mileage, comfort, and entertainment ratings.

  • Segment 2 differs from the other groups (1 and 3) in its average power rating.

  • Segment 3 differs from the other groups (1 and 2) in its average design rating.

  • Segment 1 differs from the other groups (2 and 3) in its average environment rating.
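Pairwise findings like these come from a post-hoc comparison. The document does not say which test produced the table; R's TukeyHSD() and pairwise.t.test() are common choices. As an illustration only, this Python sketch computes the pooled-variance t statistic for every pair of segments. This is the statistic pairwise.t.test() uses by default (pool.sd = TRUE) before it adjusts p-values for multiple comparisons (Holm by default). The helper name is hypothetical, and the sketch omits p-values.

```python
from itertools import combinations
from math import sqrt


def pairwise_pooled_t(groups):
    """Pooled-variance t statistic for every pair of segments.

    Uses the ANOVA's within-group mean square as the common variance
    (df = N - k). Keys are 1-based segment pairs, e.g. (1, 2).
    p-values (t distribution with N - k df) and the multiple-comparison
    adjustment are omitted.
    """
    n_total, k = sum(len(g) for g in groups), len(groups)
    means = [sum(g) / len(g) for g in groups]
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    ms_within = ss_within / (n_total - k)
    t_stats = {}
    for i, j in combinations(range(k), 2):
        se = sqrt(ms_within * (1 / len(groups[i]) + 1 / len(groups[j])))
        t_stats[(i + 1, j + 1)] = (means[i] - means[j]) / se
    return t_stats


# Toy ratings of one active variable in three segments (not survey data).
t_stats = pairwise_pooled_t([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```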

Implications of the Results for the Market Potential of Environmentally Friendly Cars

The results show that, of the three customer segments defined by car-attribute preferences, segment 3 holds the most promising potential buyers of environmentally friendly vehicles. Segment 3 is the smallest of the three groups and differs significantly from the others in both demographics and preferences. Most consumers in this group are female (72%) and live in metropolitan areas (76%). More than 70% of them are younger than 44, and they have a higher level of education than people in the other segments. When buying a new car, their top three purchase criteria are 1) design, 2) entertainment, and 3) environment. A company or brand that wants to promote environmentally friendly cars should therefore research this group's design preferences and the entertainment functions they need. The eco-friendly attribute can then be used to differentiate the company's cars from other brands and attract this group of customers.

Because most people in this target segment live in cities, the company can raise their awareness of environmental problems and encourage a positive attitude toward eco-friendly products. Moreover, people in this group are highly educated and already have some knowledge and awareness of environmental issues. The company can therefore create a green marketing campaign and incentive programs that raise awareness, deepen this group's understanding, and move them toward a purchase decision.

Part 5: Conclusions