In Python, how can I generate clustering results for every possible combination of variables?
There are many variables, so I have to automate this, but I don't know how.
I added "#HAS TO BE AUTOMATED" at the end of each line of code where I think the automation needs to happen.
I'd appreciate any advice.
Sample data:
   id  age  bp     sg  al  su  rbc
0   0   48  80  1.020   1   0    1
1   1    7  50  1.020   4   0    1
id: not needed for modeling
age, bp, sg, al, su: numeric
rbc: categorical
code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from kmodes import kprototypes
dataset = pd.read_csv(..)
df=dataset.copy()
#drop unnecessary columns
df.drop(columns=["id"],inplace=True)
#normalize
columns_to_normalize = ['age','bp','sg','al', 'su'] #HAS TO BE AUTOMATED
df[columns_to_normalize] = df[columns_to_normalize].apply(lambda x: (x - x.mean()) / np.std(x))
#obtain array of values
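#use an object array so numeric and string columns can coexist
data_array=df.values.astype(object)
#specify data types
#numeric columns are at positions 0..4 (age, bp, sg, al, su), so the slice must be 0:5
data_array[:, 0:5] = data_array[:, 0:5].astype(float) #HAS TO BE AUTOMATED
#categorical column rbc is at position 5
data_array[:, 5] = data_array[:, 5].astype(str) #HAS TO BE AUTOMATED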
#create untrained model
untrained_model = kprototypes.KPrototypes(n_clusters=2,max_iter=20)
#predict clusters
clusters = untrained_model.fit_predict(data_array, categorical=[5])
dataset["Cluster labels"]=clusters
print("The clustered data is:")
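A minimal sketch of how the marked steps might be automated, assuming the column roles listed above. The names `NUMERIC`, `CATEGORICAL`, and `variable_subsets` are hypothetical and not part of kmodes. The sketch only enumerates the variable subsets and works out where the categorical columns sit in each one; subsets without at least one numeric and one categorical variable are skipped, on the assumption that k-prototypes needs both kinds.

```python
from itertools import combinations

# Assumed column roles, taken from the sample data in the question.
NUMERIC = ["age", "bp", "sg", "al", "su"]
CATEGORICAL = ["rbc"]


def variable_subsets(numeric, categorical, min_size=2):
    """Yield (columns, categorical_positions) for every usable subset.

    A subset is kept only if it mixes numeric and categorical variables
    (assumption: k-prototypes is meant for mixed data).
    """
    all_columns = list(numeric) + list(categorical)
    for size in range(min_size, len(all_columns) + 1):
        for cols in combinations(all_columns, size):
            # Positions of the categorical columns inside this subset
            cat_idx = [i for i, c in enumerate(cols) if c in categorical]
            if cat_idx and len(cat_idx) < len(cols):
                yield list(cols), cat_idx


subsets = list(variable_subsets(NUMERIC, CATEGORICAL))
for cols, cat_idx in subsets[:3]:
    print(cols, cat_idx)
```

Each `(cols, cat_idx)` pair could then replace the hard-coded values: normalize only the numeric columns that appear in `cols`, take `df[cols].values`, and pass `categorical=cat_idx` to `fit_predict`. With many variables the number of subsets grows exponentially, so a cap on subset size may be needed.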