What steps will reproduce the problem?
1. Load a corpus file that contains a term with a colon in it, such as the following:
<a href="http: /www.pamil-visions.net/author/laura/" title="posts by laura
spencer">
What is the expected output?
In the line
frequency = int(tokens[1].strip())
frequency should be a number
What do you see instead?
ValueError: invalid literal for int() with base 10:
'/www.pamil-visions.net/author/laura/" title="posts by laura spencer">'
On what operating system?
Windows Vista
I think to correct this you can do the following:
# Reads "term:frequency" from each subsequent line in the file.
for line in corpus_file:
    tokens = line.rpartition(":")
    term = tokens[0].strip()
    frequency = int(tokens[2].strip())
    self.term_num_docs[term] = frequency
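To show why splitting on the last colon helps, here is a minimal standalone sketch. The helper name parse_corpus_line and the trailing ":3" count are made up for illustration; only the term itself comes from this report. It assumes each corpus line has the form "term:frequency" with the count after the final colon.

```python
def parse_corpus_line(line):
    """Parse one "term:frequency" line, splitting on the last colon only.

    Hypothetical helper for illustration; not part of the library.
    """
    term, sep, count = line.rpartition(":")
    if not sep:
        # rpartition returns ('', '', line) when no colon is present.
        raise ValueError("missing ':' separator: %r" % line)
    return term.strip(), int(count.strip())


# The term is taken from this report; the trailing ":3" count is made up.
line = ('<a href="http: /www.pamil-visions.net/author/laura/" '
        'title="posts by laura spencer">:3\n')

# split(":") cuts at the colon inside the URL, so tokens[1] is not a number.
tokens = line.split(":")
try:
    int(tokens[1].strip())
    split_failed = False
except ValueError:
    split_failed = True

term, frequency = parse_corpus_line(line)
print(split_failed, frequency)
```

Because rpartition splits only at the final colon, any colons inside the term stay in the term. A line with no colon at all still needs explicit handling, which is why the sketch raises a ValueError in that case.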
Original issue reported on code.google.com by jsaucedo@gmail.com on 23 Aug 2009 at 7:13