CSAL - Finals (Clustering)

CLUSTERING

DATA PREPARATION

We begin by loading the Iris dataset and performing exploratory data analysis to understand the structure and distribution of the data. The Iris dataset is a well-known benchmark containing 150 iris flowers, 50 from each of three species, each described by four measurements: sepal length, sepal width, petal length, and petal width.
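
The loading and exploration step might look like the following minimal sketch. It assumes the copy of Iris bundled with scikit-learn and that pandas is installed; the variable names `iris` and `df` are illustrative and not taken from the original notebook.

```python
from sklearn.datasets import load_iris

# Load Iris as a pandas DataFrame: four feature columns plus a 'target' column.
iris = load_iris(as_frame=True)
df = iris.frame

# Basic exploratory checks: size, summary statistics, class balance, missing values.
print(df.shape)
print(df.describe())
print(df["target"].value_counts())
print(df.isna().sum())
```

The `target` column holds the species labels. It is useful for exploration but is not passed to the clustering algorithm, since clustering is unsupervised.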

FEATURE ENGINEERING

The features are standardized with StandardScaler so that each has a mean of 0 and a standard deviation of 1. This matters for K-Means because it assigns points to clusters by Euclidean distance, so features measured on larger scales would otherwise dominate the result.
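
As a rough sketch of the standardization step, assuming the four raw Iris measurements are used as features (the names `X` and `X_scaled` are illustrative):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

# Raw feature matrix: 150 rows, 4 measurement columns.
X = load_iris().data

# Fit the scaler on the data and transform it in one step.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Each column should now have mean ~0 and standard deviation ~1.
print(np.round(X_scaled.mean(axis=0), 6))
print(np.round(X_scaled.std(axis=0), 6))
```

StandardScaler uses the population standard deviation, which matches the default of `np.std`, so the printed standard deviations come out at 1 up to floating-point error.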

MODELING

K-Means clustering is applied with 3 clusters, a choice based on the known number of species in the Iris dataset. We then visualize the clusters to see how well the algorithm has separated the data.
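
One way to sketch the modeling step is shown below. The `random_state=42` and `n_init=10` settings, the PCA projection to two dimensions for plotting, and the helper `plot_clusters` are assumptions made for illustration; the original notebook may visualize the clusters differently.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Standardize the features, as in the feature engineering step.
X_scaled = StandardScaler().fit_transform(load_iris().data)

# Fit K-Means with 3 clusters. n_init is set explicitly because its default
# differs between scikit-learn versions; random_state makes the run repeatable.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X_scaled)

# Project the 4-D data to 2-D purely for plotting (assumed, not from the source).
coords = PCA(n_components=2).fit_transform(X_scaled)


def plot_clusters(coords, labels):
    """Scatter plot of the 2-D projection, colored by cluster label."""
    # Imported here so the clustering code above runs without matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(6, 4))
    scatter = ax.scatter(coords[:, 0], coords[:, 1], c=labels, cmap="viridis", s=30)
    ax.set_xlabel("PC 1")
    ax.set_ylabel("PC 2")
    ax.set_title("K-Means clusters on Iris (PCA projection)")
    ax.legend(*scatter.legend_elements(), title="Cluster")
    return fig
```

Calling `plot_clusters(coords, labels)` in a notebook cell draws the figure. Cluster numbers are arbitrary, so cluster 0 does not necessarily correspond to species 0.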

MODEL EVALUATION

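The silhouette score is calculated to evaluate the clustering performance. It ranges from -1 to 1: values near 1 indicate compact, well-separated clusters, values near 0 indicate overlapping clusters, and negative values suggest points assigned to the wrong cluster.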
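
The evaluation can be sketched as follows, again assuming standardized features and a 3-cluster K-Means fit with illustrative settings (`n_init=10`, `random_state=42`):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Recreate the standardized features and cluster labels.
X_scaled = StandardScaler().fit_transform(load_iris().data)
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X_scaled)

# Mean silhouette coefficient over all samples (Euclidean distance by default).
score = silhouette_score(X_scaled, labels)
print(f"Silhouette score: {score:.3f}")
```

The score is computed on the same standardized features that were clustered, so it measures cluster geometry only and does not use the species labels.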

REFLECTION

Through this exercise, I gained practical experience in implementing machine learning models in Python. I learned how to preprocess data, engineer features, and build and evaluate models. The clustering task enhanced my understanding of unsupervised learning, while the Association Rule Mining task introduced me to a different aspect of data mining and pattern discovery. These projects reinforced concepts covered in the course, such as data preprocessing and model evaluation techniques. Overall, this project helped me broaden my understanding of machine learning and its applications in real-world scenarios.

CODE IMPLEMENTATION

JUPYTER NOTEBOOK
