@pasaunders
Collaborator

No description provided.

"""Calculate the distance between two rows."""
dist = 0.0
for i in range(len(row1) - 1):
    dist += (row1[i] - row2[i]) ** 2

You're missing out on the power of NumPy (or pandas) to broadcast mathematical operations here. If row1 and row2 are NumPy arrays, you could just write:

return sqrt(np.sum((row1 - row2)**2))

Owner

It is written this way to account for the difference in length between rows: test data is submitted without a "classification" column, while the stored data includes one.
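For what it's worth, the two points above are not mutually exclusive. Here is a minimal sketch, assuming the class label is always the last element of the stored row and the test row holds only features. The name `euclidean_distance` is hypothetical and not part of this PR.

```python
import numpy as np


def euclidean_distance(stored_row, test_row):
    """Distance between a stored row (label last) and an unlabeled test row."""
    # Assumption: the label is the final element, so drop it before converting.
    features = np.asarray(list(stored_row)[:-1], dtype=float)
    # Trim the test row to the same number of features, mirroring the loop bound
    # range(len(row1) - 1) in the original code.
    target = np.asarray(list(test_row), dtype=float)[: len(features)]
    # Broadcasting does the element-wise subtraction and squaring in one step.
    return float(np.sqrt(np.sum((features - target) ** 2)))
```

Calling `euclidean_distance([0, 0, "a"], [3, 4])` compares only the two feature values and ignores the label.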

src/knn.py Outdated

def predict(self, test_data, tk=None):
    """Given data, categorize the data by its k nearest neighbors."""
    if tk is None:

What's tk?

src/knn.py Outdated
for row in self.data.iterrows():
    distances.append((row[1][-1], self._distance(row[1], test_data)))
distances.sort(key=lambda x: x[1])
# import pdb; pdb.set_trace()

Corpse code: this commented-out debugger line should be removed.

src/knn.py Outdated
if my_class:
    return my_class
else:
    self.predict(test_data, tk - 1)

Confused as to why this has to be recursive

Owner

@CCallahanIV Feb 22, 2017

It was written for the case in which the classification is a "tie" between two classes. In that case, the classify function returns None, so predict is run again with a decreased k value. This is based on my interpretation of the algorithm in the class notes. That does not mean I interpreted it correctly, though.

https://codefellows.github.io/sea-python-401d5/lectures/k_nearest_neighbors.html?highlight=nearest
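The tie-breaking rule described above (retry with a smaller k whenever the vote is tied) does not strictly need recursion. Here is a minimal sketch of a loop-based version, assuming the neighbor labels are already sorted by distance. Both `classify` (a stand-in for the function mentioned above, whose real implementation is not shown in this thread) and `predict_label` are hypothetical.

```python
from collections import Counter


def classify(labels):
    """Return the majority label, or None if the vote is empty or tied."""
    counts = Counter(labels).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # The top two classes have the same number of votes.
    return counts[0][0]


def predict_label(sorted_labels, k):
    """Vote among the k nearest labels, shrinking k until the tie breaks."""
    while k >= 1:
        winner = classify(sorted_labels[:k])
        if winner is not None:
            return winner
        k -= 1  # Tie: drop the farthest neighbor and vote again.
    return None
```

If the recursive form is kept instead, the recursive call's result has to be returned (`return self.predict(test_data, tk - 1)`); otherwise the caller receives None whenever a tie occurs.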

4 participants