Language model
A language model is a probability distribution over sequences of words: it describes how likely a particular sequence is to occur. Language models are now used in many Natural Language Processing applications, such as machine translation, speech recognition, part-of-speech tagging and parsing.
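To make the idea of a probability over word sequences concrete, here is a minimal sketch of a bigram model estimated from raw counts. It is an illustration only, not the method used for the Sketch Engine models: the three-sentence corpus, the function names and the `<s>` / `</s>` boundary markers are all assumptions made for this example, and no smoothing is applied.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Count contexts and bigrams in a list of whitespace-tokenised sentences."""
    contexts = Counter()
    bigrams = Counter()
    for sentence in sentences:
        # <s> and </s> are illustrative sentence-boundary markers
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        contexts.update(tokens[:-1])              # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))   # adjacent token pairs
    return contexts, bigrams


def sequence_probability(sentence, contexts, bigrams):
    """Multiply the relative frequencies P(word | previous word) over the sentence."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    prob = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        if contexts[prev] == 0:
            return 0.0  # unseen context: no estimate without smoothing
        prob *= bigrams[(prev, cur)] / contexts[prev]
    return prob


# Toy corpus invented for this example
corpus = ["the cat sat", "the cat ran", "the dog sat"]
contexts, bigrams = train_bigram_model(corpus)

p_seen = sequence_probability("the cat sat", contexts, bigrams)
p_unseen = sequence_probability("the dog ran", contexts, bigrams)
print(p_seen, p_unseen)
```

A sentence that occurs in the toy corpus gets a non-zero probability, while "the dog ran" gets zero because the pair "dog ran" was never observed. Practical models add smoothing or use neural networks to avoid such zeros.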
Word embeddings
The Sketch Engine team has prepared word embeddings: language models trained with fastText on the multi-billion-word corpora available in Sketch Engine. In a nutshell, an embedding represents each word as a vector, so that relations between words are expressed numerically as distances (lengths) and directions.
Try word embeddings
Practical applications of word embeddings include building a thesaurus and solving word analogies (finding word pairs related in the same way, e.g. king is to man as queen is to woman). For example, the query king -man +woman in our embeddings viewer returns queen, princess, …
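The arithmetic behind such an analogy query can be sketched with a few hand-made vectors. Everything in this example is an assumption for illustration: the two-dimensional vectors (axes loosely standing for "royalty" and "gender") are invented rather than taken from the fastText models, and the `analogy` helper is hypothetical, not part of Sketch Engine or fastText.

```python
import math

# Invented 2-D toy vectors; real embeddings have hundreds of dimensions
TOY_VECTORS = {
    "king": (1.0, 1.0),
    "queen": (1.0, -1.0),
    "prince": (0.8, 1.0),
    "princess": (0.8, -1.0),
    "man": (0.0, 1.0),
    "woman": (0.0, -1.0),
    "apple": (-1.0, 0.0),
}


def cosine(a, b):
    """Cosine similarity: compares the directions of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def analogy(positive, negative, vectors, topn=2):
    """Add the positive word vectors, subtract the negative ones,
    and return the words whose vectors are closest to the result."""
    dims = len(next(iter(vectors.values())))
    target = [0.0] * dims
    for word in positive:
        target = [t + v for t, v in zip(target, vectors[word])]
    for word in negative:
        target = [t - v for t, v in zip(target, vectors[word])]
    query_words = set(positive) | set(negative)
    scored = [
        (cosine(target, vec), word)
        for word, vec in vectors.items()
        if word not in query_words  # do not return the query words themselves
    ]
    scored.sort(reverse=True)
    return [word for _, word in scored[:topn]]


# king - man + woman
result = analogy(["king", "woman"], ["man"], TOY_VECTORS)
print(result)  # ['queen', 'princess']
```

With these toy vectors the closest word is queen, followed by princess, which mirrors the viewer output described above. Excluding the query words from the candidates is a common convention, since the input words are otherwise often their own nearest neighbours.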
Language models for download are available for:
- English (Modern English, Early Modern English)
- Arabic
- Chinese
- Czech
- Danish
- French
- German
- Italian
- Korean
- Portuguese
- Russian
- Spanish (American, European)
Each model is built on either the lemma or the word form attribute.