Building a simple multilingual spell-checker in Python

And serving it as a REST API

An example of spell-checking functionality from Google.

Introduction

This post describes how to build a simple multilingual spell-checker service in Python.

Approach

Given an input sentence, the service first determines its language, together with an associated probability.
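The detection step can be sketched roughly as follows. This is a minimal illustration, not the post's own listing: it assumes the langdetect package (its `detect_langs` function returns candidates with `lang` and `prob` attributes), and the helper names `pick_language` and `detect_language` are hypothetical. The 0.9 threshold mirrors the default minimum probability used later in the post.

```python
from typing import List, Optional, Tuple

# A candidate is a (language_code, probability) pair, e.g. ("en", 0.97).
Candidate = Tuple[str, float]


def pick_language(candidates: List[Candidate],
                  min_probability: float = 0.9) -> Optional[Candidate]:
    """Return the most probable candidate, or None if it is below the threshold."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c[1])
    return best if best[1] >= min_probability else None


def detect_language(text: str, min_probability: float = 0.9) -> Optional[Candidate]:
    """Detect the language of `text` with langdetect (hypothetical wrapper)."""
    # Imported lazily so pick_language can be used without the package installed.
    from langdetect import detect_langs  # pip install langdetect

    candidates = [(c.lang, c.prob) for c in detect_langs(text)]
    return pick_language(candidates, min_probability)
```

Returning `None` when the best candidate is too uncertain lets the caller reject the input instead of spell-checking it against the wrong dictionary.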

Process and set up

1. Language detection

sudo apt-get install libenchant-dev
sudo apt-get install hunspell-it hunspell-es hunspell-de-de hunspell-fr

Implementation

We begin with the definition of the objects we are going to use:

  • SpellCheck: the suggested sentence and its similarity score.
  • Response: the Language and SpellCheck objects returned for a given input text.
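One possible shape for these objects, sketched with standard-library dataclasses. The field names (`code`, `probability`, `suggestion`, `similarity`, `language`, `spell_check`) are assumptions made for illustration and may differ from the ones in the project.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Language:
    code: str            # language code, e.g. "it"
    probability: float   # confidence of the detection


@dataclass
class SpellCheck:
    suggestion: str      # the sentence with suggested corrections applied
    similarity: float    # similarity between the input and the suggestion


@dataclass
class Response:
    # Either field may be None, e.g. when the language is rejected.
    language: Optional[Language]
    spell_check: Optional[SpellCheck]


# asdict() converts nested dataclasses to plain dicts, which is convenient
# when the response has to be serialised to JSON.
example = Response(Language("it", 0.99), SpellCheck("ciao mondo", 0.9))
example_as_dict = asdict(example)
```

Dataclasses keep the model file short and make the later JSON serialisation a one-liner.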

Then we define three functions:

  1. map_language_to_dict: maps the identified language to the corresponding Hunspell dictionary.
  2. spellcheck: given the input text and a Hunspell dictionary, identifies misspelled words and provides an alternative suggestion together with a measure of similarity.
  3. process_input_text: a wrapper around the previous functions. It takes an input text and a minimum probability (default: 0.9) below which the identified language is rejected, and returns a Response object.
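The first two functions could look roughly like the sketch below. Several things here are assumptions rather than the post's actual code: the dictionary tags in `LANGUAGE_TO_DICT`, the choice of the first suggestion as the replacement, the word-matching regular expression, and the use of `difflib.SequenceMatcher` as a stand-in similarity measure (the post does not say which measure it uses). The dictionary argument only needs `check` and `suggest` methods, which pyenchant's `enchant.Dict` provides.

```python
import re
from difflib import SequenceMatcher
from typing import Optional, Tuple

# Assumed mapping from detected language codes to dictionary tags.
LANGUAGE_TO_DICT = {
    "en": "en_GB", "it": "it_IT", "es": "es_ES", "de": "de_DE", "fr": "fr_FR",
}

# Runs of letters, optionally joined by apostrophes (e.g. "don't").
WORD = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def map_language_to_dict(language_code: str) -> Optional[str]:
    """Return the dictionary tag for a language code, or None if unsupported."""
    return LANGUAGE_TO_DICT.get(language_code)


def spellcheck(text: str, dictionary) -> Tuple[str, float]:
    """Replace each misspelled word with the dictionary's first suggestion.

    `dictionary` is any object with check(word) -> bool and
    suggest(word) -> list of str. Returns (suggested_text, similarity).
    """
    def fix(match):
        word = match.group(0)
        if dictionary.check(word):
            return word
        suggestions = dictionary.suggest(word)
        # Keep the original word when the dictionary has nothing to offer.
        return suggestions[0] if suggestions else word

    suggested = WORD.sub(fix, text)
    similarity = SequenceMatcher(None, text, suggested).ratio()
    return suggested, similarity
```

With pyenchant installed, a call might look like `spellcheck(text, enchant.Dict(map_language_to_dict("it")))`; the returned pair would then be wrapped in a SpellCheck object.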

Let’s see how it works

We can qualitatively assess the behaviour of the process by analyzing some sentences from different languages, with different spelling mistakes:

Caveats

1. Language detection on short/equivocal sentences

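from langdetect import DetectorFactory
DetectorFactory.seed = 0  # fix the seed so that language detection is deterministic

import enchant
d = enchant.DictWithPWL("en_GB", "custom_list.txt")  # en_GB dictionary plus a personal word list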

Providing a REST API

Finally, we make the service available as a web service using the Flask framework:

  1. Save the first code snippet (with the object definitions) as “model.py” inside the project folder.
  2. Save the second code snippet (with the functions) as “suggester.py” inside the same folder, and import the objects from the model file.
  3. Create the file “app.py” as follows:
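A minimal sketch of what such an app.py might contain, assuming Flask is installed. The route path `/spellcheck`, the `text` field of the JSON body, and the `create_app` factory are all assumptions made for illustration. The processing function is passed in as an argument so that the web layer does not depend on how suggester.py is written.

```python
from dataclasses import asdict, is_dataclass
from typing import Any, Callable

from flask import Flask, jsonify, request


def create_app(process: Callable[[str], Any]) -> Flask:
    """Build a Flask app that exposes `process` over HTTP (hypothetical factory)."""
    app = Flask(__name__)

    @app.route("/spellcheck", methods=["POST"])
    def spellcheck_endpoint():
        # silent=True yields None instead of an error page on malformed JSON.
        payload = request.get_json(silent=True)
        text = payload.get("text") if isinstance(payload, dict) else None
        if not isinstance(text, str) or not text.strip():
            return jsonify({"error": "JSON body must contain a non-empty 'text' string"}), 400

        result = process(text)
        # Dataclass results are converted to plain dicts before serialisation.
        body = asdict(result) if is_dataclass(result) else result
        return jsonify(body)

    return app
```

To serve the spell-checker, import process_input_text from suggester.py and call `create_app(process_input_text).run()`; a client can then POST a body such as `{"text": "..."}` to `/spellcheck`.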

Links

The project is available on GitHub, together with a notebook that can be run on Google Colab.
