Knn #21
base: master
Conversation
…tructures into decision-tree
| """Calcute the distance between two rows.""" | ||
| dist = 0.0 | ||
| for i in range(len(row1) - 1): | ||
| dist += (row1[i] - row2[i]) ** 2 |
You're missing out on using the power of Numpy (or pandas) here to broadcast mathematical operations. If row1 and row2 are numpy arrays, then you could just have
return sqrt(np.sum((row1 - row2)**2))
Written this way to account for the difference in length between rows: test data is submitted without a "classification" column, while the data already present has that column.
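For reference, a minimal sketch (not the code in this PR) of how a NumPy-based distance could still account for the extra "classification" column, by slicing both rows down to the shorter length before broadcasting:

import numpy as np

def _distance(row1, row2):
    """Euclidean distance over the shared feature columns.

    Sketch only: slicing to the shorter length ignores a trailing
    "classification" column on the stored row.
    """
    n = min(len(row1), len(row2))
    a = np.asarray(row1[:n], dtype=float)
    b = np.asarray(row2[:n], dtype=float)
    return np.sqrt(np.sum((a - b) ** 2))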
src/knn.py (Outdated)
def predict(self, test_data, tk=None):
    """Given data, categorize the data by its k nearest neighbors."""
    if tk is None:
What's tk?
src/knn.py (Outdated)
for row in self.data.iterrows():
    distances.append((row[1][-1], self._distance(row[1], test_data)))
distances.sort(key=lambda x: x[1])
# import pdb; pdb.set_trace()
Corpse code: leftover debugging line that should be removed.
src/knn.py (Outdated)
if my_class:
    return my_class
else:
    return self.predict(test_data, tk - 1)
Confused as to why this has to be recursive
Written for the case in which the classification is a "tie" between two classes. In that case, the classify function returns None, so predict is run again with a decreased k value. This is based on my interpretation of the algorithm in the class notes; that doesn't mean my interpretation is correct, though.
https://codefellows.github.io/sea-python-401d5/lectures/k_nearest_neighbors.html?highlight=nearest
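A minimal sketch of that tie-break idea (hypothetical helper names, not the exact code in this PR): classify the k nearest neighbors by majority vote, and if the top two classes tie, retry with k - 1.

from collections import Counter

def classify(neighbors):
    """Return the majority class among the neighbors, or None on a tie.

    `neighbors` is a list of (class_label, distance) tuples sorted by distance.
    """
    counts = Counter(label for label, _ in neighbors)
    (top, top_count), *rest = counts.most_common()
    if rest and rest[0][1] == top_count:
        return None  # the two most common classes are tied
    return top

def predict(distances, k):
    """Classify by the k nearest neighbors; on a tie, retry with k - 1."""
    result = classify(distances[:k])
    if result is not None or k <= 1:
        return result
    return predict(distances, k - 1)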