0 points

Removing Punctuation · Issue #80 · mammothb/symspellpy


dashboard

Fri Oct 09 2020 18:57:48 GMT+0000 (UTC)

Posted by @mzeid #python

import pkg_resources, string
from symspellpy import SymSpell, Verbosity

spell = SymSpell(max_dictionary_edit_distance=2, prefix_length=7)
dictionary_path = pkg_resources.resource_filename('symspellpy', 'frequency_dictionary_en_82_765.txt')
spell.load_dictionary(dictionary_path, term_index=0, count_index=1)

def correct(w):
  word = w
  o = spell.lookup(w,
    Verbosity.CLOSEST,
    max_edit_distance=2,
    transfer_casing=True)
  if not o: return w
  word = o[0].term
  if w[0].isupper():
    word = word[0].upper() + ''.join(word[1:])
  # find start punctuation
  start_idx = 0
  start_punct = ''
  while w[start_idx] in string.punctuation:
    start_punct += w[start_idx]
    if start_idx + 1 < len(w):
      start_idx += 1
    else:
      break
  # find end punctuation
  end_idx = 1
  end_punct = ''
  while w[-end_idx] in string.punctuation:
    end_punct += w[-end_idx]
    if end_idx - 1 > 0:
      end_idx -= 1
    else:
      break
  return start_punct + word + end_punct

s = '''Now that we have carried our geographical analogy quite far, we return to the uestion of isomorphisms between brains. You might well wonder why this whole uestion of brain isomorphisms has been stressed so much. What does it matter if two rains are isomorphic, or quasi-isomorphic, or not isomorphic at all? The answer is that e have an intuitive sense that, although other people differ from us in important ways, hey are still "the same" as we are in some deep and important ways. It would be nstructive to be able to pinpoint what this invariant core of human intelligence is, and hen to be able to describe the kinds of "embellishments" which can be added to it, aking each one of us a unique embodiment of this abstract and mysterious quality alled "intelligence".'''
cleaned = ' '.join([correct(w) for w in s.split()])
print(cleaned)
content_copy Copy

I needed to leave punctuation, and essentially just re-added it to my words after correcting their spelling. Remove/Restore punctuation.

https://github.com/mammothb/symspellpy/issues/80