Hi, I tried running your code to see the output, but it didn't work.
The errors:
The variable 'reduce' is not defined.
In the two inverted functions for documents, the dictionary was not iterable.
Do you have any idea why it's not working?
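On the first error, one likely cause (assuming the code is being run under Python 3, which I have not confirmed): `reduce` is no longer a builtin there and has to be imported from `functools`. A minimal sketch of what I mean, with made-up posting sets purely for illustration:

```python
from functools import reduce  # in Python 3, reduce lives in functools

# Hypothetical posting sets: the document ids that contain each query word.
postings = [{1, 2, 3}, {2, 3, 4}, {3, 5}]

# Intersect all the sets to find the documents that contain every word.
common = reduce(lambda a, b: a & b, postings)
print(common)  # {3}
```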
I've put my code below, as I am trying to build an inverted index similar to yours and also want to run a query by inputting words to search for. I also want to try having the program open a text file of queries. Any advice on how I can move forward?
```python
from collections import Counter
import re  # regular expressions make it easy to remove unwanted characters
import math
import buildindex

# .lower() returns a copy with all uppercase characters replaced by lowercase ones.
with open('docs.txt', 'r') as file:
    tex = file.read().lower()

# Replace anything that is not a lowercase letter, a space, or an apostrophe with a space.
# For some reason, even though the text is in lower case, the code doesn't work unless I redo that condition.
text = re.sub(r"[^a-z ']+", " ", tex)
words = text.split()  # split() already returns a list of words
Count = Counter(words)  # counts how many times each word occurs
Total = sum(Count.values())  # total number of words used

print("Words in dictionary: ")
dictionary = {}
for i in words:
    if i in dictionary:
        dictionary[i] += 1
    else:
        dictionary[i] = 1
print(len(dictionary))  # number of distinct words (repeated words count once)
print(dictionary)

# Take input
query = input(" Query : ")
```
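To make it clearer what I am aiming for, here is a rough sketch of the kind of inverted index and query lookup I have in mind. It is only a guess at a workable shape, not your method: the names `tokenize`, `build_index`, and `search` are made up for this example, and treating each string in `docs` as one document is my own assumption.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and keep only runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def build_index(docs):
    """Map each word to the set of document numbers (starting at 0) that contain it."""
    index = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        for word in tokenize(doc):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)


# Made-up documents, one string per document (an assumption for this sketch).
docs = ["The cat sat on the mat", "The dog sat", "A cat and a dog"]
index = build_index(docs)
print(search(index, "cat sat"))  # [0]
print(search(index, "dog"))      # [1, 2]
```

For typed queries I would presumably call `search(index, input(" Query : "))`, and for a query file I would loop over its lines and call `search` on each one.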