Unsupervised Learning

Performed k-means clustering on Spotify’s dataset, which contains 13 features for each song: acousticness, danceability, duration_ms, energy, instrumentalness, key, liveness, loudness, mode, speechiness, tempo, time_signature, and valence.

Step 1: Computed the first 2 Principal Components (PCs).

A pipeline was fitted that first standardizes each metric to its z-score and then performs PCA. The factor loadings of each metric on the first 2 PCs are as follows:
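A minimal sketch of such a pipeline, assuming scikit-learn's `StandardScaler` and `PCA` (the original code is not shown, so the step names and the `songs` DataFrame are hypothetical). Random numbers stand in for the real song table, so the printed values are not the loadings from the actual analysis. Loadings here are the component weights scaled by the square root of the explained variance; some write-ups report the raw `components_` instead.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

FEATURES = [
    "acousticness", "danceability", "duration_ms", "energy",
    "instrumentalness", "key", "liveness", "loudness", "mode",
    "speechiness", "tempo", "time_signature", "valence",
]

# Placeholder data (assumption): random values standing in for the song table.
rng = np.random.default_rng(0)
songs = pd.DataFrame(rng.normal(size=(200, len(FEATURES))), columns=FEATURES)

# Standardize every metric to a z-score, then project onto 2 components.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=2)),
])
scores = pipe.fit_transform(songs)  # shape: (n_songs, 2)

# Loadings: one row per metric, one column per PC.
pca = pipe.named_steps["pca"]
loadings = pd.DataFrame(
    pca.components_.T * np.sqrt(pca.explained_variance_),
    index=FEATURES,
    columns=["PC1", "PC2"],
)
print(loadings.round(3))
```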

Step 2: Visualized various songs with respect to the PCs.
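One way such a plot could be drawn is a scatter of each song's PC 1 score against its PC 2 score. This is only a sketch with matplotlib on random placeholder data; the output file name `pca_scatter.png` is hypothetical and the original figure may have looked different.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Placeholder data (assumption): 200 songs x 13 metrics of random values.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 13))

# Standardize, then keep the first 2 principal component scores.
scores = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))

fig, ax = plt.subplots(figsize=(6, 5))
ax.scatter(scores[:, 0], scores[:, 1], s=10, alpha=0.6)
ax.set_xlabel("PC 1")
ax.set_ylabel("PC 2")
ax.set_title("Songs projected onto the first 2 PCs")
fig.savefig("pca_scatter.png", dpi=150)
```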

Step 3: Created pandas DataFrames of the top 3 metrics for each PC.

For PC 1:

For PC 2:
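A minimal sketch of how such top-3 DataFrames could be built, assuming "top" means the largest absolute component weight (the original ranking criterion is not stated). The helper `top_metrics` is hypothetical and the data is random placeholder data, so the metrics it picks are not the ones from the actual analysis.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

FEATURES = [
    "acousticness", "danceability", "duration_ms", "energy",
    "instrumentalness", "key", "liveness", "loudness", "mode",
    "speechiness", "tempo", "time_signature", "valence",
]

# Placeholder data (assumption): random values standing in for the song table.
rng = np.random.default_rng(0)
X = StandardScaler().fit_transform(rng.normal(size=(200, len(FEATURES))))
pca = PCA(n_components=2).fit(X)

# One row per metric, one column per PC.
weights = pd.DataFrame(pca.components_.T, index=FEATURES, columns=["PC1", "PC2"])

def top_metrics(pc: str, n: int = 3) -> pd.DataFrame:
    """Return the n metrics with the largest absolute weight on one PC."""
    order = weights[pc].abs().sort_values(ascending=False).index[:n]
    return weights.loc[order, [pc]].rename_axis("metric").reset_index()

top_pc1 = top_metrics("PC1")
top_pc2 = top_metrics("PC2")
print(top_pc1)
print(top_pc2)
```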

Step 4: Ran k-means on the first 10 PCs with 5 clusters.
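This step might look roughly like the following, assuming the clustering was chained onto the same standardize-then-PCA pipeline with scikit-learn's `KMeans`. The `n_init` and `random_state` values are illustrative choices, not settings from the original run, and the data is again a random placeholder.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder data (assumption): 200 songs x 13 metrics of random values.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 13))

# Standardize, reduce to 10 PCs, then cluster the PC scores into 5 groups.
model = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=10)),
    ("kmeans", KMeans(n_clusters=5, n_init=10, random_state=0)),
])
labels = model.fit_predict(X)  # one cluster id (0 to 4) per song

# Cluster centers live in the 10-dimensional PC space.
centers = model.named_steps["kmeans"].cluster_centers_
print(np.bincount(labels, minlength=5))
```

Clustering on PC scores rather than raw metrics keeps every input on a comparable scale, which matters because k-means relies on Euclidean distance.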

Generalization Performance