whatlanggo alternatives and similar packages
Based on the "Natural Language Processing" category.
Alternatively, view whatlanggo alternatives based on common mentions on social networks and blogs.
- prose: :book: A Golang library for text processing, including tokenization, part-of-speech tagging, and named-entity extraction.
- gse: Go efficient multilingual NLP and text segmentation; supports English, Chinese, Japanese, and others.
- spaGO: Self-contained Machine Learning and Natural Language Processing library in Go.
- kagome: Self-contained Japanese Morphological Analyzer written in pure Go.
- nlp: [UNMAINTAINED] Extract values from strings and fill your structs with nlp.
- universal-translator: :speech_balloon: i18n Translator for Go/Golang using CLDR data + pluralization rules.
- locales: :earth_americas: A set of locales generated from the CLDR Project which can be used independently or within an i18n package; built for use with, but not exclusive to, https://github.com/go-playground/universal-translator.
- RAKE.go: A Go port of the Rapid Automatic Keyword Extraction (RAKE) algorithm.
- go-nlp: Utilities for working with discrete probability distributions and other tools useful for doing NLP work.
- segment: A Go library for performing Unicode Text Segmentation as described in Unicode Standard Annex #29.
- textcat: A Go package for n-gram based text categorization, with support for UTF-8 and raw text.
- go-localize: i18n (internationalization and localization) engine written in Go, used for translating locale strings.
- stemmer: Stemmer packages for the Go programming language. Includes English, German, and Dutch stemmers.
- petrovich: Golang port of Petrovich, an inflector for Russian anthroponyms.
- paicehusk: Golang implementation of the Paice/Husk stemming algorithm.
- go-tinydate: A tiny date object in Go. Tinydate uses only 4 bytes of memory.
- golibstemmer: Go bindings for the Snowball libstemmer library, including Porter 2.
- gotokenizer: A tokenizer based on dictionary and bigram language models for Go. (Currently only supports Chinese segmentation.)
- spreak: Flexible translation and humanization library for Go, based on the concepts behind gettext.
- gosentiwordnet: 💬 Sentiment analyzer library using SentiWordNet in Go.
README
Whatlanggo
Natural language detection for Go.
Features
- Supports 84 languages
- 100% written in Go
- No external dependencies
- Fast
- Recognizes not only a language, but also a script (Latin, Cyrillic, etc.)
Getting started
Installation:
```shell
go get -u github.com/abadojack/whatlanggo
```
Simple usage example:
```go
package main

import (
	"fmt"

	"github.com/abadojack/whatlanggo"
)

func main() {
	info := whatlanggo.Detect("Foje funkcias kaj foje ne funkcias")
	fmt.Println("Language:", info.Lang.String(), " Script:", whatlanggo.Scripts[info.Script], " Confidence: ", info.Confidence)
}
```
Blacklisting and whitelisting
```go
package main

import (
	"fmt"

	"github.com/abadojack/whatlanggo"
)

func main() {
	// Blacklist
	options := whatlanggo.Options{
		Blacklist: map[whatlanggo.Lang]bool{
			whatlanggo.Ydd: true,
		},
	}
	info := whatlanggo.DetectWithOptions("האקדמיה ללשון העברית", options)
	fmt.Println("Language:", info.Lang.String(), "Script:", whatlanggo.Scripts[info.Script])

	// Whitelist
	options1 := whatlanggo.Options{
		Whitelist: map[whatlanggo.Lang]bool{
			whatlanggo.Epo: true,
			whatlanggo.Ukr: true,
		},
	}
	info = whatlanggo.DetectWithOptions("Mi ne scias", options1)
	fmt.Println("Language:", info.Lang.String(), " Script:", whatlanggo.Scripts[info.Script])
}
```
For more details, please check the documentation.
Requirements
Go 1.8 or higher
How does it work?
How does the language recognition work?
The algorithm is based on trigram language models, a particular case of n-grams. To understand the idea, please check the original paper by Cavnar and Trenkle, "N-Gram-Based Text Categorization" (1994).
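To make the trigram idea concrete, here is a minimal sketch in Go of extracting a trigram frequency profile and comparing two texts by shared trigrams. This is an illustration of the Cavnar and Trenkle approach, not whatlanggo's actual implementation; the function names are invented for this example, and a real detector compares rank-ordered profiles against precomputed per-language models.

```go
package main

import (
	"fmt"
	"strings"
)

// trigrams returns the frequency of each three-rune sequence in text.
func trigrams(text string) map[string]int {
	runes := []rune(strings.ToLower(text))
	counts := make(map[string]int)
	for i := 0; i+3 <= len(runes); i++ {
		counts[string(runes[i:i+3])]++
	}
	return counts
}

// similarity counts trigrams that two texts have in common; the text is
// attributed to whichever language model it shares the most trigrams with.
func similarity(a, b string) int {
	ta, tb := trigrams(a), trigrams(b)
	shared := 0
	for g := range ta {
		if _, ok := tb[g]; ok {
			shared++
		}
	}
	return shared
}

func main() {
	esperanto := "foje funkcias kaj foje ne funkcias"
	fmt.Println("unique trigrams:", len(trigrams(esperanto)))
	fmt.Println("esperanto-ish text scores higher:",
		similarity("funkcias bone", esperanto) > similarity("привет мир", esperanto))
}
```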
How is IsReliable calculated?
It is based on the following factors:
- How many unique trigrams are in the given text
- How big the difference is between the first and the second (not returned) detected languages. This metric is called rate in the code base.

Therefore, it can be represented as a 2D space with a threshold function that splits it into "Reliable" and "Not reliable" areas. This threshold function is a hyperbola.
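The hyperbola-shaped threshold can be sketched as follows. The constants and the function name here are made up for demonstration; whatlanggo uses its own tuned threshold. The idea is simply that the fewer unique trigrams a text has, the larger the gap (rate) between the top two language scores must be before the result counts as reliable.

```go
package main

import "fmt"

// isReliable is an illustrative reliability check: the threshold on the
// score gap falls hyperbolically as the number of unique trigrams grows.
func isReliable(uniqueTrigrams int, rate float64) bool {
	if uniqueTrigrams == 0 {
		return false
	}
	// Hyperbola: threshold = 1 / uniqueTrigrams (constant chosen for demo only).
	threshold := 1.0 / float64(uniqueTrigrams)
	return rate > threshold
}

func main() {
	fmt.Println(isReliable(3, 0.5))    // short text, big gap between languages
	fmt.Println(isReliable(100, 0.05)) // long text tolerates a small gap
	fmt.Println(isReliable(3, 0.1))    // short text, small gap: not reliable
}
```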
For more details, please check the blog article "Introduction to Rust Whatlang Library and Natural Language Identification Algorithms".
License
Derivation
whatlanggo is a derivative of Franc (JavaScript, MIT) by Titus Wormer.
Acknowledgements
Thanks to greyblake (Potapov Sergey) for creating whatlang-rs from where I got the idea and algorithms.
*Note that all licence references and agreements mentioned in the whatlanggo README section above
are relevant to that project's source code only.