Automation: how to feed every combination of variables into a clustering model in Python


In Python, how can I generate clustering results for every possible combination of variables?

Since there are many variables, I have to automate this, but I don't know how.

I added "#HAS TO BE AUTOMATED" at the end of each line of code where I think the automation needs to happen.

I appreciate your advice.

sample data:
    id  age  bp  sg     al  su  rbc
0   0   48   80  1.020  1   0   1
1   1   7    50  1.020  4   0   1


id: not needed for modeling
age, bp, sg, al, su: numeric
rbc: categorical

code:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from kmodes import kprototypes

dataset = pd.read_csv(..)
df=dataset.copy()


#drop unnecessary columns
df.drop(columns=["id"],inplace=True)

#normalize
columns_to_normalize     = ['age','bp','sg','al', 'su'] #HAS TO BE AUTOMATED
df[columns_to_normalize] = df[columns_to_normalize].apply(lambda x: (x - x.mean()) / np.std(x))


#obtain array of values
data_array=df.values


#specify data types (after dropping id, the five numeric columns sit at positions 0-4 and rbc at 5)
data_array[:, 0:5] = data_array[:, 0:5].astype(float) #HAS TO BE AUTOMATED
data_array[:, 5] = data_array[:, 5].astype(str) #HAS TO BE AUTOMATED


#create untrained model
untrained_model = kprototypes.KPrototypes(n_clusters=2,max_iter=20)


#predict clusters
clusters = untrained_model.fit_predict(data_array, categorical=[5])

dataset["Cluster labels"]=clusters
print("The clustered data is:")
print(dataset)
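
One possible way to automate the three marked lines is to enumerate the column subsets with itertools and derive the normalization list and the categorical positions from each subset instead of hard-coding them. The sketch below is only an illustration under some assumptions: the helper names (nonempty_subsets, mixed_subsets, cluster_every_subset) are made up for this example, and it assumes that KPrototypes needs at least one numeric and at least one categorical column, so only mixed subsets are generated.

```python
from itertools import combinations, product


def nonempty_subsets(columns):
    """Return every non-empty combination of the given column names."""
    return [list(combo)
            for size in range(1, len(columns) + 1)
            for combo in combinations(columns, size)]


def mixed_subsets(numeric_columns, categorical_columns):
    """Pair each non-empty numeric subset with each non-empty categorical subset."""
    return list(product(nonempty_subsets(numeric_columns),
                        nonempty_subsets(categorical_columns)))


def cluster_every_subset(df, numeric_columns, categorical_columns,
                         n_clusters=2, max_iter=20):
    """Run k-prototypes once per mixed subset; return {column tuple: cluster labels}."""
    # Imported here so the two helpers above can be used without kmodes installed.
    from kmodes import kprototypes

    results = {}
    for num, cat in mixed_subsets(numeric_columns, categorical_columns):
        sub = df[num + cat].copy()
        # z-score the numeric columns (population std, like np.std);
        # a constant column would produce NaN here and should be excluded beforehand
        sub[num] = (sub[num] - sub[num].mean()) / sub[num].std(ddof=0)
        sub[cat] = sub[cat].astype(str)
        # numeric columns come first, so the categorical positions follow them
        cat_positions = list(range(len(num), len(num) + len(cat)))
        model = kprototypes.KPrototypes(n_clusters=n_clusters, max_iter=max_iter)
        results[tuple(num + cat)] = model.fit_predict(sub.to_numpy(),
                                                      categorical=cat_positions)
    return results
```

With the columns from the question, the call would look like results = cluster_every_subset(df, ['age', 'bp', 'sg', 'al', 'su'], ['rbc']), which fits 31 models (every non-empty numeric subset combined with rbc). The number of subsets grows as (2^n - 1) * (2^m - 1) for n numeric and m categorical columns, so with many variables it may be necessary to cap the subset size.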
