6 changes: 3 additions & 3 deletions docs/CLI.rst
Original file line number Diff line number Diff line change
@@ -21,7 +21,7 @@ command ``polyglot``
-h, --help show this help message and exit
--lang LANG Language to be processed
--delimiter DELIMITER
Delimiter that seperates documents, records or even sentences.
Delimiter that separates documents, records or even sentences.
--workers WORKERS Number of parallel processes.
-l LOG, --log LOG log verbosity level
--debug drop a debugger if an exception is raised.
@@ -43,7 +43,7 @@ command ``polyglot``

Notice that most of the operations are language specific. For example,
tokenization rules and part of speech taggers differ between languages.
Therefore, it is important that the lanaguage of the input is detected
Therefore, it is important that the language of the input is detected
or given. The ``--lang`` option allows you to tell polyglot which
language the input is written in.

@@ -186,7 +186,7 @@ option ``workers``.
Building Pipelines
------------------

The previous subcommand ``count`` assumed that the words are separted by
The previous subcommand ``count`` assumed that the words are separated by
spaces. Given that we never tokenized the text file, that may result in
suboptimal word counting. Let us take a closer look at the tail of the
word counts
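The CLI.rst hunk above notes that the ``count`` subcommand assumes space-separated words, which undercounts when punctuation clings to tokens. A minimal stdlib sketch of the difference (not polyglot's implementation; the sample text is invented for illustration):

```python
import re
from collections import Counter

text = "The cat sat. The cat, the dog, and the bird!"

# Naive: split on whitespace -- punctuation stays glued to words,
# so "cat," and "cat" are counted as different tokens.
naive = Counter(text.lower().split())

# Tokenized: extract word characters only before counting.
tokenized = Counter(re.findall(r"\w+", text.lower()))

print(naive["cat"])      # misses the occurrence spelled "cat,"
print(tokenized["cat"])  # counts both occurrences
```

This is exactly why piping text through a tokenizer before ``count`` gives better tails in the frequency table.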
2 changes: 1 addition & 1 deletion docs/Detection.rst
@@ -46,7 +46,7 @@ Mixed Text
"""

If the text contains snippets from different languages, the detector is
able to find the most probable langauges used in the text. For each
able to find the most probable languages used in the text. For each
language, we can query the model confidence level:

.. code:: python
4 changes: 2 additions & 2 deletions docs/Download.rst
@@ -85,7 +85,7 @@ its name and the target language.

Package name format is ``task_name.language_code``

Langauge Collections
Language Collections
^^^^^^^^^^^^^^^^^^^^

Packages are grouped by language. For example, if we want to download
@@ -152,7 +152,7 @@ Therefore, we can just run:



Langauge & Task Support
Language & Task Support
-----------------------

We can query our download manager for which tasks are supported by
2 changes: 1 addition & 1 deletion docs/Embeddings.rst
@@ -29,7 +29,7 @@ Nearest Neighbors
-----------------

A common way to investigate the space capture by the embeddings is to
query for the nearest neightbors of any word.
query for the nearest neighbors of any word.

.. code:: python

2 changes: 1 addition & 1 deletion docs/NamedEntityRecognition.rst
@@ -96,7 +96,7 @@ We can query all entities mentioned in a text.



Or, we can query entites per sentence
Or, we can query entities per sentence

.. code:: python

2 changes: 1 addition & 1 deletion docs/Sentiment.rst
@@ -111,7 +111,7 @@ is mentioned in text as the following:
text = Text(blob)

First, we need split the text into sentneces, this will limit the words
tha affect the sentiment of an entity to the words mentioned in the
that affect the sentiment of an entity to the words mentioned in the
sentnece.

.. code:: python
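The Sentiment.rst hunk above describes limiting the words that affect an entity's sentiment to the entity's own sentence. A toy sketch of that idea, assuming a hand-rolled polarity lexicon and naive regex sentence splitting (polyglot ships real per-language polarity word lists and a proper tokenizer):

```python
import re

# Hypothetical mini-lexicon for illustration only.
POLARITY = {"great": 1, "loves": 1, "terrible": -1, "hates": -1}

def entity_sentiment(text, entity):
    """Average polarity of words in sentences that mention `entity`."""
    scores = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = re.findall(r"\w+", sentence.lower())
        if entity.lower() in words:
            scores.extend(POLARITY.get(w, 0) for w in words)
    return sum(scores) / len(scores) if scores else 0.0

blob = "Alice loves the new design. Bob hates the terrible font."
print(entity_sentiment(blob, "Alice"))  # positive
print(entity_sentiment(blob, "Bob"))    # negative
```

Splitting first is what keeps Bob's negative words from bleeding into Alice's score.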
6 changes: 3 additions & 3 deletions notebooks/CLI.ipynb
@@ -34,7 +34,7 @@
" -h, --help show this help message and exit\r\n",
" --lang LANG Language to be processed\r\n",
" --delimiter DELIMITER\r\n",
" Delimiter that seperates documents, records or even sentences.\r\n",
" Delimiter that separates documents, records or even sentences.\r\n",
" --workers WORKERS Number of parallel processes.\r\n",
" -l LOG, --log LOG log verbosity level\r\n",
" --debug drop a debugger if an exception is raised.\r\n",
@@ -65,7 +65,7 @@
"source": [
"Notice that most of the operations are language specific.\n",
"For example, tokenization rules and part of speech taggers differ between languages.\n",
"Therefore, it is important that the lanaguage of the input is detected or given.\n",
"Therefore, it is important that the language of the input is detected or given.\n",
"The `--lang` option allows you to tell polyglot which language the input is written in."
]
},
@@ -284,7 +284,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The previous subcommand `count` assumed that the words are separted by spaces.\n",
"The previous subcommand `count` assumed that the words are separated by spaces.\n",
"Given that we never tokenized the text file, that may result in suboptimal word counting.\n",
"Let us take a closer look at the tail of the word counts"
]
2 changes: 1 addition & 1 deletion notebooks/Detection.ipynb
@@ -93,7 +93,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"If the text contains snippets from different languages, the detector is able to find the most probable langauges used in the text.\n",
"If the text contains snippets from different languages, the detector is able to find the most probable languages used in the text.\n",
"For each language, we can query the model confidence level:"
]
},
4 changes: 2 additions & 2 deletions notebooks/Download.ipynb
@@ -150,7 +150,7 @@
"\n",
"Package name format is `task_name.language_code`\n",
"\n",
"#### Langauge Collections\n",
"#### Language Collections\n",
"\n",
"Packages are grouped by language. For example, if we want to download all the models that are specific to Arabic, the arabic collection of models name is **LANG:** followed by the language code of Arabic which is `ar`.\n",
"\n",
@@ -238,7 +238,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Langauge & Task Support"
"## Language & Task Support"
]
},
{
2 changes: 1 addition & 1 deletion notebooks/Embeddings.ipynb
@@ -65,7 +65,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"A common way to investigate the space capture by the embeddings is to query for the nearest neightbors of any word."
"A common way to investigate the space capture by the embeddings is to query for the nearest neighbors of any word."
]
},
{
2 changes: 1 addition & 1 deletion notebooks/NamedEntityRecognition.ipynb
@@ -173,7 +173,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Or, we can query entites per sentence"
"Or, we can query entities per sentence"
]
},
{
2 changes: 1 addition & 1 deletion notebooks/Sentiment.ipynb
@@ -168,7 +168,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"First, we need split the text into sentneces, this will limit the words tha affect the sentiment of an entity to the words mentioned in the sentnece."
"First, we need split the text into sentences, this will limit the words that affect the sentiment of an entity to the words mentioned in the sentence."
]
},
{
6 changes: 3 additions & 3 deletions polyglot/downloader.py
@@ -193,7 +193,7 @@ def __init__(self, id, url, name=None, subdir='',
"""The task this package is serving."""

self.language = language
"""The langauge code this package belongs to."""
"""The language code this package belongs to."""

self.attrs = attrs
"""Extra attributes generated by Google Cloud Storage."""
@@ -396,7 +396,7 @@ class Downloader(object):
task."""

#/////////////////////////////////////////////////////////////////
# Cosntructor
# Constructor
#/////////////////////////////////////////////////////////////////

def __init__(self, server_index_url=None, source=None, download_dir=None):
@@ -808,7 +808,7 @@ def _update_index(self, url=None):
"""A helper function that ensures that self._index is
up-to-date. If the index is older than self.INDEX_TIMEOUT,
then download it again."""
# Check if the index is aleady up-to-date. If so, do nothing.
# Check if the index is already up-to-date. If so, do nothing.
if not (self._index is None or url is not None or
time.time()-self._index_timestamp > self.INDEX_TIMEOUT):
return
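The ``_update_index`` hunk above checks whether the cached index is still fresh before re-downloading it. The pattern is generic timestamp-based caching, sketched here in stdlib Python (the class, callable, and timeout value are illustrative, not polyglot's actual API):

```python
import time

class CachedIndex:
    """Re-fetch an index only when it is missing or stale."""

    INDEX_TIMEOUT = 60 * 60 * 24  # illustrative: one day, in seconds

    def __init__(self, fetch):
        self._fetch = fetch       # any zero-argument callable
        self._index = None
        self._timestamp = 0.0

    def get(self, force=False):
        stale = time.time() - self._timestamp > self.INDEX_TIMEOUT
        if force or self._index is None or stale:
            self._index = self._fetch()
            self._timestamp = time.time()
        return self._index

calls = []
cache = CachedIndex(lambda: calls.append(1) or {"packages": []})
cache.get()
cache.get()           # second call hits the cache
print(len(calls))     # fetched only once
```

Passing ``url`` (or here, ``force=True``) bypasses the freshness check, matching the early-return condition in the hunk.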
2 changes: 1 addition & 1 deletion polyglot/mapping/base.py
@@ -18,7 +18,7 @@
from ..utils import _open

def count(lines):
""" Counts the word frequences in a list of sentences.
""" Counts the word frequencies in a list of sentences.

Note:
This is a helper function for parallel execution of `Vocabulary.from_text`
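The docstring fixed above belongs to a helper written so word-frequency counting can run in parallel over chunks of sentences and then be merged. The merge step can be sketched with ``Counter``, which supports addition; chunking is shown sequentially here for clarity, where ``--workers`` would map it over processes (the corpus is invented for illustration):

```python
from collections import Counter
from functools import reduce

def count(lines):
    """Count whitespace-token frequencies in a list of sentences."""
    return Counter(word for line in lines for word in line.split())

corpus = ["a b a", "b c", "a c c"]
chunks = [corpus[:2], corpus[2:]]            # pretend each chunk goes to a worker
partials = [count(chunk) for chunk in chunks]
totals = reduce(lambda x, y: x + y, partials)
print(totals["a"], totals["c"])
```

Because ``Counter`` addition is associative, partial counts can be merged in any order as workers finish.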
4 changes: 2 additions & 2 deletions polyglot/mapping/embeddings.py
@@ -201,7 +201,7 @@ def _from_word2vec_text(fname):
except TypeError as e:
parts = line.strip().split()
except Exception as e:
logger.warning("We ignored line number {} because of erros in parsing"
logger.warning("We ignored line number {} because of errors in parsing"
"\n{}".format(line_no, e))
continue
# We differ from Gensim implementation.
@@ -263,7 +263,7 @@ def _from_glove(fname):
except TypeError as e:
parts = line.strip().split()
except Exception as e:
logger.warning("We ignored line number {} because of erros in parsing"
logger.warning("We ignored line number {} because of errors in parsing"
"\n{}".format(line_no, e))
continue
# We deduce layer1_size because GloVe files have no header.
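Both hunks above show the same pattern: wrap per-line parsing in try/except and log a warning rather than abort the whole load. A self-contained sketch of that pattern over a toy ``word v1 v2 ...`` format (the parser and sample lines are illustrative, not polyglot's loader):

```python
import logging

logger = logging.getLogger("embeddings_sketch")

def parse_embedding_lines(lines):
    """Parse `word v1 v2 ...` lines, skipping malformed ones with a warning."""
    vectors = {}
    for line_no, line in enumerate(lines):
        try:
            parts = line.strip().split()
            word, values = parts[0], [float(x) for x in parts[1:]]
            if not values:
                raise ValueError("no vector components")
            vectors[word] = values
        except Exception as e:
            logger.warning("We ignored line number %d because of errors in "
                           "parsing\n%s", line_no, e)
            continue
    return vectors

vecs = parse_embedding_lines(["cat 0.1 0.2", "broken x y", "dog 0.3 0.4"])
print(sorted(vecs))  # the malformed line is skipped
```

Skipping bad lines keeps a multi-gigabyte embedding load from failing on a single corrupt record.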
4 changes: 2 additions & 2 deletions polyglot/mixins.py
@@ -119,7 +119,7 @@ def find(self, sub, start=0, end=sys.maxsize):

def rfind(self, sub, start=0, end=sys.maxsize):
'''Behaves like the built-in str.rfind() method. Returns an integer,
the index of he last (right-most) occurence of the substring argument
the index of the last (right-most) occurrence of the substring argument
sub in the sub-sequence given by [start:end].
'''
return self._strkey().rfind(sub, start, end)
@@ -189,7 +189,7 @@ def join(self, iterable):
return self.__class__(self._strkey().join(iterable))

def replace(self, old, new, count=sys.maxsize):
"""Return a new blob object with all the occurence of `old` replaced
"""Return a new blob object with all the occurrence of `old` replaced
by `new`.
"""
return self.__class__(self._strkey().replace(old, new, count))
2 changes: 1 addition & 1 deletion polyglot/tag/base.py
@@ -60,7 +60,7 @@ def _load_network(self):
raise NotImplementedError()

def annotate(self, sent):
"""Annotate a squence of words with entity tags.
"""Annotate a sequence of words with entity tags.

Args:
sent: sequence of strings/words.
4 changes: 2 additions & 2 deletions polyglot/transliteration/base.py
@@ -19,8 +19,8 @@ class Transliterator(object):
def __init__(self, source_lang="en", target_lang="en"):
"""
Args:
source_lang (string): language code of the input langauge.
target_lang (string): language code of the generated output langauge.
source_lang (string): language code of the input language.
target_lang (string): language code of the generated output language.
"""
self.source_lang = source_lang
self.target_lang = target_lang