
Music Analysis

About

I listen to a lot of music and always run out of new songs pretty fast. This project was born as a solution to that.
I collected my personal music data using the Spotipy API, which gave me each song's audio features. From these I analysed my listening history and built criteria describing my personal music preferences. Applying those criteria to a much bigger dataset (about half a million rows), I predicted 2,000+ new songs that I would probably like.
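For context, the collection step looked roughly like this. This is only a minimal sketch: it assumes Spotipy is authorised with your own client credentials, pulls saved tracks with the user-library-read scope, and renames the feature keys to match the column names used below; the playlists and scopes I actually used may differ.

import spotipy
from spotipy.oauth2 import SpotifyOAuth
import pandas as pd

# Placeholder credentials: supply your own client ID, secret and redirect URI.
sp = spotipy.Spotify(auth_manager=SpotifyOAuth(
    client_id='YOUR_CLIENT_ID',
    client_secret='YOUR_CLIENT_SECRET',
    redirect_uri='http://localhost:8888/callback',
    scope='user-library-read'))

# Page through saved tracks and fetch the audio features for each one.
rows = []
results = sp.current_user_saved_tracks(limit=50)
while results:
    ids = [item['track']['id'] for item in results['items']]
    for item, feat in zip(results['items'], sp.audio_features(ids)):
        if feat is None:   # a few tracks have no audio features
            continue
        rows.append({'Artist': item['track']['artists'][0]['name'],
                     'Danceability': feat['danceability'],
                     'Energy': feat['energy'],
                     'Loudness': feat['loudness'],
                     'Speechiness': feat['speechiness'],
                     'Acousticness': feat['acousticness'],
                     'Instrumentalness': feat['instrumentalness'],
                     'Liveness': feat['liveness'],
                     'Tempo': feat['tempo'],
                     'Key': feat['key'],
                     'Duration_ms': feat['duration_ms']})
    results = sp.next(results)

df1 = pd.DataFrame(rows)   # this becomes the personal dataset (df1) analysed below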

Go to:

Dataset

Project Code

Libraries 

[Image: library imports]
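The import cell only survives as an image, so here is a likely equivalent, inferred from the library list at the end of this page and the calls used in the code below:

import copy
import statistics as stat

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats import norm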

Head

[Image: first rows of df1]
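The head image shows the first rows of df1. A minimal sketch of this step, assuming the larger public dataset is loaded from a CSV (the filename here is hypothetical) while df1 holds the personal data collected above:

import pandas as pd

# Hypothetical filename for the ~500k-row public dataset used later for predictions.
df = pd.read_csv('spotify_dataset.csv')

print(df1.head())   # personal data: Artist, Key and the audio features
print(df.head())    # the candidate pool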

Body

Input

# Quick look at the personal dataset: first rows, most common artist, and counts
print(df1.head(2))

print(df1['Artist'].mode())

print(df1['Artist'].value_counts()['BTS'])

total_count = df1['Artist'].str.count('Post').sum()  # occurrences of 'Post' across artist names

print(total_count)

print(len(df1))


# Build my average "taste profile": mean of each numeric feature, mode for Key
avg = {}
valueColumns = ['Danceability', 'Energy', 'Loudness', 'Speechiness', 'Acousticness',
                'Instrumentalness', 'Liveness', 'Tempo', 'Duration_ms']

avg_values = df1[valueColumns].mean()
for value, column in zip(avg_values, valueColumns):
    avg[column] = value
avg['Key'] = df1['Key'].mode()[0]   # Key is categorical, so take the mode
#avg['Artist'] = df1['Artist'].mode()
print('avg: ')
print(avg)

 

Output

[Output image: the averaged feature values]

Input

#Analysing dataset attributes: strip plot of each feature against the song index
for i in valueColumns:
    print(i, ":")
    sns.catplot(x=df1.index, y=df1[i], data=df1)
    plt.show()

Output

Input

#Analysing dataset attributes: bell curve (normal pdf) fitted to each feature
for i in valueColumns:
    print(i, ':')

    # range of x-values spanning the feature
    x = np.arange(min(df1[i]), max(df1[i]), 0.001)

    # y-values of the normal pdf using the feature's own mean and standard deviation
    y = norm.pdf(x, df1[i].mean(), stat.stdev(df1[i]))

    # define plot
    fig, ax = plt.subplots(figsize=(5, 5))
    ax.plot(x, y)

    # choose plot style and display the bell curve
    plt.style.use('fivethirtyeight')
    plt.show()

Output

Input

#Printing artist ranking: the most frequent artists in my data, in order
artist_counts = df1['Artist'].value_counts()
for artist in artist_counts.index[:10]:
    print(artist)

#Using my current data to find songs I might like: the acceptable range for
#each feature is mean +/- one standard deviation
crit_max = copy.deepcopy(avg)
for i in crit_max.keys():
    crit_max[i] = crit_max[i] + stat.stdev(df1[i])
print("crit_max:")
print(crit_max)
print('')

crit_min = copy.deepcopy(avg)
for i in crit_min.keys():
    crit_min[i] = crit_min[i] - stat.stdev(df1[i])
print('crit_min:')
print(crit_min)


#Plotting correlation between variables

def Pearson_correlation(X, Y):
    # Pearson r computed from the definition: cov(X, Y) / (std(X) * std(Y))
    if len(X) == len(Y):
        Sum_xy = sum((X - X.mean()) * (Y - Y.mean()))
        Sum_x_squared = sum((X - X.mean()) ** 2)
        Sum_y_squared = sum((Y - Y.mean()) ** 2)
        corr = Sum_xy / np.sqrt(Sum_x_squared * Sum_y_squared)
    return corr

ideal_correlation = {}
for idx_i, i in enumerate(valueColumns):
    for z in valueColumns[idx_i + 1:]:
        x = copy.deepcopy(pd.Series(df1[i].values))
        y = copy.deepcopy(pd.Series(df1[z].values))

        correlation = y.corr(x)   # pandas' built-in Pearson, kept as a cross-check

        # scatter plot with a fitted straight line
        plt.scatter(x, y)
        plt.plot(np.unique(x),
                 np.poly1d(np.polyfit(x, y, 1))(np.unique(x)),
                 color='red')
        plt.xlabel(i)
        plt.ylabel(z)
        plt.show()

        cor = Pearson_correlation(x, y)
        print(cor)
        ideal_correlation[i + '-' + z] = cor

 

Output

[Output image: sample correlation scatter plots]

*Only some of the graphs are shown

These are not all the correlations, only a sample. The highest was about 0.7, between loudness and danceability, which makes sense; the lowest was almost 0, between duration and liveness, which also makes sense.
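To see which pairs stand out without scrolling through every plot, the stored correlations can also be sorted. A small sketch, assuming the ideal_correlation dictionary filled in the loop above:

# Rank the feature pairs by absolute correlation strength.
for pair, cor in sorted(ideal_correlation.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f'{pair}: {cor:.2f}')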

Input

#Working on prediction: first pass, filtering each feature column independently
#(note that rows are not aligned across columns here)
df_predict = pd.DataFrame()
for i in valueColumns:
    condition = (df[i] > crit_min[i]) & (df[i] < crit_max[i])
    df_predict[i] = df.loc[condition, i].reset_index(drop=True)
print(df_predict)
print(len(df_predict))
print(len(df))



#Second pass: keep only the rows that satisfy the criteria for every feature at once
condition = pd.Series(True, index=df.index)

for i in valueColumns:
    condition &= (df[i] > crit_min[i]) & (df[i] < crit_max[i])

df_predict = df[condition].reset_index(drop=True)

# Remove rows from df_predict that are present in df1 (songs I already listen to)
df_predict = df_predict.merge(df1, how='left', indicator=True).query('_merge == "left_only"').drop(columns=['_merge']).reset_index(drop=True)

print(df_predict.head())

#Exporting df_predict to a csv

df_predict.to_csv('predictedsongs.csv',index=False)

print('df_predict: ')
print(len(df_predict))
print(len(df))

Output

This applies the criteria to the larger dataset to predict new songs, and finally writes them to a new CSV file.

 

This creates a list of about 5,500 songs.

 

[Output image: preview of the predicted songs]
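As a quick sanity check (not part of the original notebook), the exported file can be read back and sampled to queue up a batch of candidates:

import pandas as pd

# Read the exported predictions back and draw a random batch to listen through.
predicted = pd.read_csv('predictedsongs.csv')
print(len(predicted))                        # roughly 5,500 candidate songs
print(predicted.sample(20, random_state=1))  # a small listening queue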

Skills Demonstrated in This Project

Data Analysis

  • Data Cleaning & Preprocessing: pandas (read_csv, filtering, .mean(), .mode(), .value_counts())

  • Statistical Analysis: numpy, statistics (mean, stdev, describe)

  • Correlation & Similarity Analysis: Custom Pearson correlation function, .corr(), deepcopy
     

Data Visualization

  • Distribution & Trend Analysis: seaborn.catplot, matplotlib.pyplot

  • Scatter Plots with Trend Lines: plt.scatter, np.polyfit

  • Probability Distribution: scipy.stats.norm.pdf
     

Libraries Used

  • pandas, numpy, seaborn, matplotlib, scipy.stats, statistics, copy

Conclusions

This project was all about using data analytics to explore my music preferences and predict songs that might be a good match. By breaking down song features like danceability, energy, and loudness, I identified trends and patterns. Using statistical methods like averages, standard deviations, and correlation analysis, I built a model to find songs that closely align with my taste. I love data because it solves real problems like this one. Beyond that, this project helped me test and develop valuable skills in data wrangling, predictive modeling, and statistical analysis.

 

There’s plenty of room for improvement, such as integrating machine learning models to refine the predictions and incorporating more behavioral data.

 

Overall, this was a fun and insightful dive into how data analytics can turn raw information into meaningful, actionable insights. In Part II, I analyze another dataset, find songs both parties would like, and finally build a visual dashboard.

