Conversation
bda602_hw1/hw1.py
Outdated
| # y = species |
| from sklearn.preprocessing import StandardScaler |
I think it would be better to put all the import statements at the top of the file, in one place.
bda602_hw1/hw1.py
Outdated
| datafile ="http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data" |
| #c=pd.read_csv(datafile) |
| #datafile2 = "/mnt/C:/Users/thoma/OneDrive/Documents/bda602/hw1/bezdekIris.data" |
| #print(datafile) |
This prints the URL, not the DataFrame; it should be print(c). It's not important, just a note.
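To illustrate the difference, here is a minimal sketch. The inline sample rows and the column names are assumptions that stand in for the remote iris.data file so the snippet runs offline; only `datafile` and `c` come from the code under review.

```python
import io

import pandas as pd

# Assumption: a few rows in the same comma-separated, header-less layout as
# iris.data, inlined here so the example does not need network access.
sample_csv = (
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
)

# Assumption: column names chosen for illustration only.
columns = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]

datafile = "http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"

# In the real script this would be pd.read_csv(datafile, header=None, names=columns).
c = pd.read_csv(io.StringIO(sample_csv), header=None, names=columns)

print(datafile)  # prints only the URL string
print(c)         # prints the DataFrame contents
```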
bda602_hw1/hw1.py
Outdated
| print(iris_data.head()) |
| # def petal_func(columnname): |
iris_data.describe() is a much more convenient way to get all the summary statistics at once.
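For example, a small sketch of what that could look like. The three-row frame and its column names are assumptions used to keep the snippet self-contained; in hw1.py the call would run on the real `iris_data`.

```python
import pandas as pd

# Assumption: a tiny stand-in for iris_data with illustrative column names.
iris_data = pd.DataFrame(
    {
        "sepal_length": [5.0, 6.0, 7.0],
        "petal_length": [1.5, 4.5, 6.0],
        "species": ["Iris-setosa", "Iris-versicolor", "Iris-virginica"],
    }
)

# describe() reports count, mean, std, min, quartiles and max for every
# numeric column in a single call; the non-numeric species column is skipped.
summary = iris_data.describe()
print(summary)
```

This replaces a hand-written per-column helper with one call, and individual values can still be read back with `summary.loc["mean", "sepal_length"]`.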
bda602_hw1/hw1.py
Outdated
| from sklearn.model_selection import train_test_split |
| species = iris_data['species'] |
| data_new = iris_data.drop(columns=['species']) |
| xtrain, xtest, ytrain, ytest = train_test_split(data_new, test_size=0.2) |
I am not sure whether this runs for you, but it wouldn't run for me. I noticed that you created species but never used it. When only data_new is passed, train_test_split returns two objects, so unpacking into four names fails. I changed this line to xtrain, xtest, ytrain, ytest = train_test_split(data_new, species, test_size=0.2) and it works now.
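A minimal sketch of the corrected call, assuming a ten-row stand-in for `iris_data` with made-up values and column names so it runs without downloading anything:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Assumption: a small stand-in for iris_data; values are illustrative only.
iris_data = pd.DataFrame(
    {
        "sepal_length": [5.1, 4.9, 4.7, 7.0, 6.4, 6.9, 6.3, 5.8, 7.1, 6.5],
        "petal_length": [1.4, 1.4, 1.3, 4.7, 4.5, 4.9, 6.0, 5.1, 5.9, 5.8],
        "species": ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 4,
    }
)

species = iris_data["species"]                   # target labels (y)
data_new = iris_data.drop(columns=["species"])   # feature columns (X)

# Passing both X and y yields four outputs: X_train, X_test, y_train, y_test.
xtrain, xtest, ytrain, ytest = train_test_split(data_new, species, test_size=0.2)

print(xtrain.shape, xtest.shape, ytrain.shape, ytest.shape)
```

The features and labels are shuffled together, so each row in xtrain keeps the same index as its label in ytrain.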
bda602_hw1/hw1.py
Outdated
| print(X_train) |
| # from sklearn.ensemble import RandomForestClassifier |
| # xtrain, xtest = train_test_split(data_new, test_size=0.2,random_state=123) |
Was there any reason you split the data into train and test sets a second time?
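If the goal was a reproducible split, one option is to split once with a fixed random_state and reuse the result for every later step. This is only a sketch: the ten-row frame and the names `features` and `labels` are assumptions, and the StandardScaler step is just one example of a later stage that could reuse the split.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumption: illustrative stand-in data, not the real iris values.
features = pd.DataFrame(
    {
        "sepal_length": [5.1, 4.9, 4.7, 7.0, 6.4, 6.9, 6.3, 5.8, 7.1, 6.5],
        "petal_length": [1.4, 1.4, 1.3, 4.7, 4.5, 4.9, 6.0, 5.1, 5.9, 5.8],
    }
)
labels = pd.Series(["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 4)

# Split once; a fixed random_state makes the partition reproducible.
xtrain, xtest, ytrain, ytest = train_test_split(
    features, labels, test_size=0.2, random_state=123
)

# Reuse the same split downstream: fit the scaler on the training rows only,
# then apply the fitted scaler to the test rows.
scaler = StandardScaler()
xtrain_scaled = scaler.fit_transform(xtrain)
xtest_scaled = scaler.transform(xtest)

print(xtrain_scaled.shape, xtest_scaled.shape)
```

Splitting again without the labels, or with a different seed, would produce train and test sets that no longer line up with ytrain and ytest.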
Bita, take a look at this.