6 changes: 3 additions & 3 deletions docs/CLI.rst
Original file line number Diff line number Diff line change
@@ -21,7 +21,7 @@ command ``polyglot``
-h, --help show this help message and exit
--lang LANG Language to be processed
--delimiter DELIMITER
Delimiter that seperates documents, records or even sentences.
Delimiter that separates documents, records or even sentences.
--workers WORKERS Number of parallel processes.
-l LOG, --log LOG log verbosity level
--debug drop a debugger if an exception is raised.
@@ -43,7 +43,7 @@ command ``polyglot``

Notice that most of the operations are language specific. For example,
tokenization rules and part of speech taggers differ between languages.
Therefore, it is important that the lanaguage of the input is detected
Therefore, it is important that the language of the input is detected
or given. The ``--lang`` option allows you to tell polyglot which
language the input is written in.

@@ -186,7 +186,7 @@ option ``workers``.
Building Pipelines
------------------

The previous subcommand ``count`` assumed that the words are separted by
The previous subcommand ``count`` assumed that the words are separated by
spaces. Given that we never tokenized the text file, that may result in
suboptimal word counting. Let us take a closer look at the tail of the
word counts
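The CLI.rst hunk above notes that the ``count`` subcommand assumes space-separated words, which undercounts when punctuation clings to tokens. A minimal stdlib sketch of the difference (not polyglot's implementation; the sample text is invented for illustration):

```python
import re
from collections import Counter

text = "The cat sat. The cat, the dog, and the bird!"

# Naive: split on whitespace -- punctuation stays glued to words,
# so "cat," and "cat" are counted as different tokens.
naive = Counter(text.lower().split())

# Tokenized: extract word characters only before counting.
tokenized = Counter(re.findall(r"\w+", text.lower()))

print(naive["cat"])      # misses the occurrence spelled "cat,"
print(tokenized["cat"])  # counts both occurrences
```

This is exactly why piping text through a tokenizer before ``count`` gives better tails in the frequency table.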
2 changes: 1 addition & 1 deletion docs/Detection.rst
@@ -46,7 +46,7 @@ Mixed Text
"""

If the text contains snippets from different languages, the detector is
able to find the most probable langauges used in the text. For each
able to find the most probable languages used in the text. For each
language, we can query the model confidence level:

.. code:: python
4 changes: 2 additions & 2 deletions docs/Download.rst
@@ -85,7 +85,7 @@ its name and the target language.

Package name format is ``task_name.language_code``

Langauge Collections
Language Collections
^^^^^^^^^^^^^^^^^^^^

Packages are grouped by language. For example, if we want to download
@@ -152,7 +152,7 @@ Therefore, we can just run:



Langauge & Task Support
Language & Task Support
-----------------------

We can query our download manager for which tasks are supported by
2 changes: 1 addition & 1 deletion docs/Embeddings.rst
@@ -29,7 +29,7 @@ Nearest Neighbors
-----------------

A common way to investigate the space capture by the embeddings is to
query for the nearest neightbors of any word.
query for the nearest neighbors of any word.

.. code:: python

2 changes: 1 addition & 1 deletion docs/NamedEntityRecognition.rst
@@ -96,7 +96,7 @@ We can query all entities mentioned in a text.



Or, we can query entites per sentence
Or, we can query entities per sentence

.. code:: python

2 changes: 1 addition & 1 deletion docs/Sentiment.rst
@@ -111,7 +111,7 @@ is mentioned in text as the following:
text = Text(blob)

First, we need split the text into sentneces, this will limit the words
tha affect the sentiment of an entity to the words mentioned in the
that affect the sentiment of an entity to the words mentioned in the
sentnece.

.. code:: python
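The Sentiment.rst hunk above describes limiting the words that affect an entity's sentiment to the entity's own sentence. A toy sketch of that idea, assuming a hand-rolled polarity lexicon and naive regex sentence splitting (polyglot ships real per-language polarity word lists and a proper tokenizer):

```python
import re

# Hypothetical mini-lexicon for illustration only.
POLARITY = {"great": 1, "loves": 1, "terrible": -1, "hates": -1}

def entity_sentiment(text, entity):
    """Average polarity of words in sentences that mention `entity`."""
    scores = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = re.findall(r"\w+", sentence.lower())
        if entity.lower() in words:
            scores.extend(POLARITY.get(w, 0) for w in words)
    return sum(scores) / len(scores) if scores else 0.0

blob = "Alice loves the new design. Bob hates the terrible font."
print(entity_sentiment(blob, "Alice"))  # positive
print(entity_sentiment(blob, "Bob"))    # negative
```

Splitting first is what keeps Bob's negative words from bleeding into Alice's score.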
6 changes: 3 additions & 3 deletions notebooks/CLI.ipynb
@@ -34,7 +34,7 @@
" -h, --help show this help message and exit\r\n",
" --lang LANG Language to be processed\r\n",
" --delimiter DELIMITER\r\n",
" Delimiter that seperates documents, records or even sentences.\r\n",
" Delimiter that separates documents, records or even sentences.\r\n",
" --workers WORKERS Number of parallel processes.\r\n",
" -l LOG, --log LOG log verbosity level\r\n",
" --debug drop a debugger if an exception is raised.\r\n",
@@ -65,7 +65,7 @@
"source": [
"Notice that most of the operations are language specific.\n",
"For example, tokenization rules and part of speech taggers differ between languages.\n",
"Therefore, it is important that the lanaguage of the input is detected or given.\n",
"Therefore, it is important that the language of the input is detected or given.\n",
"The `--lang` option allows you to tell polyglot which language the input is written in."
]
},
@@ -284,7 +284,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The previous subcommand `count` assumed that the words are separted by spaces.\n",
"The previous subcommand `count` assumed that the words are separated by spaces.\n",
"Given that we never tokenized the text file, that may result in suboptimal word counting.\n",
"Let us take a closer look at the tail of the word counts"
]
2 changes: 1 addition & 1 deletion notebooks/Detection.ipynb
@@ -93,7 +93,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"If the text contains snippets from different languages, the detector is able to find the most probable langauges used in the text.\n",
"If the text contains snippets from different languages, the detector is able to find the most probable languages used in the text.\n",
"For each language, we can query the model confidence level:"
]
},
4 changes: 2 additions & 2 deletions notebooks/Download.ipynb
@@ -150,7 +150,7 @@
"\n",
"Package name format is `task_name.language_code`\n",
"\n",
"#### Langauge Collections\n",
"#### Language Collections\n",
"\n",
"Packages are grouped by language. For example, if we want to download all the models that are specific to Arabic, the arabic collection of models name is **LANG:** followed by the language code of Arabic which is `ar`.\n",
"\n",
@@ -238,7 +238,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Langauge & Task Support"
"## Language & Task Support"
]
},
{
2 changes: 1 addition & 1 deletion notebooks/Embeddings.ipynb
@@ -65,7 +65,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"A common way to investigate the space capture by the embeddings is to query for the nearest neightbors of any word."
"A common way to investigate the space capture by the embeddings is to query for the nearest neighbors of any word."
]
},
{
2 changes: 1 addition & 1 deletion notebooks/NamedEntityRecognition.ipynb
@@ -173,7 +173,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Or, we can query entites per sentence"
"Or, we can query entities per sentence"
]
},
{
2 changes: 1 addition & 1 deletion notebooks/Sentiment.ipynb
@@ -168,7 +168,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"First, we need split the text into sentneces, this will limit the words tha affect the sentiment of an entity to the words mentioned in the sentnece."
"First, we need split the text into sentences, this will limit the words that affect the sentiment of an entity to the words mentioned in the sentence."
]
},
{
6 changes: 3 additions & 3 deletions polyglot/downloader.py
@@ -193,7 +193,7 @@ def __init__(self, id, url, name=None, subdir='',
"""The task this package is serving."""

self.language = language
"""The langauge code this package belongs to."""
"""The language code this package belongs to."""

self.attrs = attrs
"""Extra attributes generated by Google Cloud Storage."""
@@ -396,7 +396,7 @@ class Downloader(object):
task."""

#/////////////////////////////////////////////////////////////////
# Cosntructor
# Constructor
#/////////////////////////////////////////////////////////////////

def __init__(self, server_index_url=None, source=None, download_dir=None):
@@ -808,7 +808,7 @@ def _update_index(self, url=None):
"""A helper function that ensures that self._index is
up-to-date. If the index is older than self.INDEX_TIMEOUT,
then download it again."""
# Check if the index is aleady up-to-date. If so, do nothing.
# Check if the index is already up-to-date. If so, do nothing.
if not (self._index is None or url is not None or
time.time()-self._index_timestamp > self.INDEX_TIMEOUT):
return
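The ``_update_index`` hunk above checks whether the cached index is still fresh before re-downloading it. The pattern is generic timestamp-based caching, sketched here in stdlib Python (the class, callable, and timeout value are illustrative, not polyglot's actual API):

```python
import time

class CachedIndex:
    """Re-fetch an index only when it is missing or stale."""

    INDEX_TIMEOUT = 60 * 60 * 24  # illustrative: one day, in seconds

    def __init__(self, fetch):
        self._fetch = fetch       # any zero-argument callable
        self._index = None
        self._timestamp = 0.0

    def get(self, force=False):
        stale = time.time() - self._timestamp > self.INDEX_TIMEOUT
        if force or self._index is None or stale:
            self._index = self._fetch()
            self._timestamp = time.time()
        return self._index

calls = []
cache = CachedIndex(lambda: calls.append(1) or {"packages": []})
cache.get()
cache.get()           # second call hits the cache
print(len(calls))     # fetched only once
```

Passing ``url`` (or here, ``force=True``) bypasses the freshness check, matching the early-return condition in the hunk.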
2 changes: 1 addition & 1 deletion polyglot/mapping/base.py
@@ -18,7 +18,7 @@
from ..utils import _open

def count(lines):
""" Counts the word frequences in a list of sentences.
""" Counts the word frequencies in a list of sentences.

Note:
This is a helper function for parallel execution of `Vocabulary.from_text`
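The docstring fixed above belongs to a helper written so word-frequency counting can run in parallel over chunks of sentences and then be merged. The merge step can be sketched with ``Counter``, which supports addition; chunking is shown sequentially here for clarity, where ``--workers`` would map it over processes (the corpus is invented for illustration):

```python
from collections import Counter
from functools import reduce

def count(lines):
    """Count whitespace-token frequencies in a list of sentences."""
    return Counter(word for line in lines for word in line.split())

corpus = ["a b a", "b c", "a c c"]
chunks = [corpus[:2], corpus[2:]]            # pretend each chunk goes to a worker
partials = [count(chunk) for chunk in chunks]
totals = reduce(lambda x, y: x + y, partials)
print(totals["a"], totals["c"])
```

Because ``Counter`` addition is associative, partial counts can be merged in any order as workers finish.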
4 changes: 2 additions & 2 deletions polyglot/mapping/embeddings.py
@@ -201,7 +201,7 @@ def _from_word2vec_text(fname):
except TypeError as e:
parts = line.strip().split()
except Exception as e:
logger.warning("We ignored line number {} because of erros in parsing"
logger.warning("We ignored line number {} because of errors in parsing"
"\n{}".format(line_no, e))
continue
# We differ from Gensim implementation.
@@ -263,7 +263,7 @@ def _from_glove(fname):
except TypeError as e:
parts = line.strip().split()
except Exception as e:
logger.warning("We ignored line number {} because of erros in parsing"
logger.warning("We ignored line number {} because of errors in parsing"
"\n{}".format(line_no, e))
continue
# We deduce layer1_size because GloVe files have no header.
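Both hunks above show the same pattern: wrap per-line parsing in try/except and log a warning rather than abort the whole load. A self-contained sketch of that pattern over a toy ``word v1 v2 ...`` format (the parser and sample lines are illustrative, not polyglot's loader):

```python
import logging

logger = logging.getLogger("embeddings_sketch")

def parse_embedding_lines(lines):
    """Parse `word v1 v2 ...` lines, skipping malformed ones with a warning."""
    vectors = {}
    for line_no, line in enumerate(lines):
        try:
            parts = line.strip().split()
            word, values = parts[0], [float(x) for x in parts[1:]]
            if not values:
                raise ValueError("no vector components")
            vectors[word] = values
        except Exception as e:
            logger.warning("We ignored line number %d because of errors in "
                           "parsing\n%s", line_no, e)
            continue
    return vectors

vecs = parse_embedding_lines(["cat 0.1 0.2", "broken x y", "dog 0.3 0.4"])
print(sorted(vecs))  # the malformed line is skipped
```

Skipping bad lines keeps a multi-gigabyte embedding load from failing on a single corrupt record.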
4 changes: 2 additions & 2 deletions polyglot/mixins.py
@@ -119,7 +119,7 @@ def find(self, sub, start=0, end=sys.maxsize):

def rfind(self, sub, start=0, end=sys.maxsize):
'''Behaves like the built-in str.rfind() method. Returns an integer,
the index of he last (right-most) occurence of the substring argument
the index of the last (right-most) occurrence of the substring argument
sub in the sub-sequence given by [start:end].
'''
return self._strkey().rfind(sub, start, end)
@@ -189,7 +189,7 @@ def join(self, iterable):
return self.__class__(self._strkey().join(iterable))

def replace(self, old, new, count=sys.maxsize):
"""Return a new blob object with all the occurence of `old` replaced
"""Return a new blob object with all the occurrence of `old` replaced
by `new`.
"""
return self.__class__(self._strkey().replace(old, new, count))
2 changes: 1 addition & 1 deletion polyglot/tag/base.py
@@ -60,7 +60,7 @@ def _load_network(self):
raise NotImplementedError()

def annotate(self, sent):
"""Annotate a squence of words with entity tags.
"""Annotate a sequence of words with entity tags.

Args:
sent: sequence of strings/words.
4 changes: 2 additions & 2 deletions polyglot/transliteration/base.py
@@ -19,8 +19,8 @@ class Transliterator(object):
def __init__(self, source_lang="en", target_lang="en"):
"""
Args:
source_lang (string): language code of the input langauge.
target_lang (string): language code of the generated output langauge.
source_lang (string): language code of the input language.
target_lang (string): language code of the generated output language.
"""
self.source_lang = source_lang
self.target_lang = target_lang