Error in the "Reads "term:frequency" from each subsequent line in the file" part of the code #1

@GoogleCodeExporter

What steps will reproduce the problem?
1. Load a corpus file in which a term itself contains a colon, such as the following:
   <a href="http: /www.pamil-visions.net/author/laura/" title="posts by laura spencer">

What is the expected output?
In the line
      frequency = int(tokens[1].strip())
frequency should be parsed as an integer.
What do you see instead?
ValueError: invalid literal for int() with base 10:
'/www.pamil-visions.net/author/laura/" title="posts by laura spencer">'
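A minimal reproduction sketch, assuming the original loader split each line with `line.split(":")` (the report only quotes the `int(tokens[1].strip())` line, so the split call is an assumption). The sample line is hypothetical: the term from the report followed by an illustrative frequency of 3.

```python
# Hypothetical corpus line: the term from the report plus an illustrative count.
line = '<a href="http: /www.pamil-visions.net/author/laura/" title="posts by laura spencer">:3\n'

# Assumed original parsing: split on every colon in the line.
tokens = line.split(":")
print(tokens)  # three pieces, because the term itself contains a colon

error = None
try:
    # tokens[1] is the tail of the URL, not the frequency.
    frequency = int(tokens[1].strip())
except ValueError as err:
    error = err
    print("ValueError:", err)
```

Splitting on every colon breaks the term at `http:`, so `tokens[1]` holds the rest of the URL and `int()` fails with the same message as in the report.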

On what operating system?
Windows Vista

I think this can be fixed by splitting each line on its last colon only, with rpartition:
      # Reads "term:frequency" from each subsequent line in the file.
      for line in corpus_file:
        # Split on the last colon only, so terms that contain ':' stay intact.
        tokens = line.rpartition(":")
        term = tokens[0].strip()
        frequency = int(tokens[2].strip())
        self.term_num_docs[term] = frequency
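The same idea as a standalone helper, sketched for illustration only: `parse_term_frequency` is a hypothetical name, not part of the original code, and the sample lines (including the counts) are made up. It splits on the last colon with `str.rpartition` and, as an extra assumption beyond the proposed fix, rejects lines that contain no colon at all.

```python
def parse_term_frequency(line):
    """Parse one "term:frequency" line, splitting on the LAST colon only.

    Terms may contain colons (e.g. URLs), but the frequency never does,
    so rpartition keeps the whole term intact.
    """
    term, sep, count = line.rpartition(":")
    if not sep:
        # No colon at all: rpartition returned ('', '', line).
        raise ValueError(f"missing ':' separator in line: {line!r}")
    return term.strip(), int(count.strip())


# Hypothetical corpus lines; the first is the term from the report with an
# illustrative count of 3.
lines = [
    '<a href="http: /www.pamil-visions.net/author/laura/" title="posts by laura spencer">:3\n',
    "python:12\n",
]

term_num_docs = {}
for line in lines:
    term, frequency = parse_term_frequency(line)
    term_num_docs[term] = frequency
print(term_num_docs)
```

Without the separator check, `rpartition` returns `('', '', line)` for a line with no colon, so the loop above would silently store a purely numeric line under an empty term instead of reporting it.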

Original issue reported on code.google.com by jsaucedo@gmail.com on 23 Aug 2009 at 7:13
