snowball alternatives and similar packages
Based on the "Natural Language Processing" category.
prose
A Go library for text processing, including tokenization, part-of-speech tagging, and named-entity extraction.
gse
Efficient multilingual NLP and text segmentation in Go; supports English, Chinese, Japanese, and others.
universal-translator
An i18n translator for Go using CLDR data and pluralization rules.
locales
A set of locales generated from the CLDR Project that can be used independently or within an i18n package; built for use with, but not exclusive to, https://github.com/go-playground/universal-translator
segment
A Go library for performing Unicode text segmentation as described in Unicode Standard Annex #29.
go-nlp
Utilities for working with discrete probability distributions and other tools useful for NLP work.
go-localize
An i18n (internationalization and localization) engine written in Go, used for translating locale strings.
gotokenizer
A tokenizer for Go based on a dictionary and bigram language models. (Currently supports only Chinese segmentation.)
README
Description
Snowball stemmer port (cgo wrapper) for Go. Provides word stem extraction functionality. For more detailed info see http://snowball.tartarus.org/
Installing
go get github.com/goodsign/snowball
go test github.com/goodsign/snowball (Must PASS)
Done! Use it in your Go files (import "github.com/goodsign/snowball").
Usage
stemmer, err := snowball.NewWordStemmer(algorithm, encoding)
if err != nil {
	// ...handle error...
}
defer stemmer.Close()

wordStem, err := stemmer.Stem(word)
if err != nil {
	// ...handle error...
}
// Use wordStem.
Usage notes
According to Snowball documentation:
Creating a stemmer is a relatively expensive operation - the expected
usage pattern is that a new stemmer is created when needed, used
to stem many words, and deleted after some time.
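The create-once, stem-many pattern described above can be sketched as follows. This is a minimal illustration, not the binding's code: the wordStemmer interface and the fakeStemmer type are hypothetical stand-ins so the example runs without cgo, the string parameter and return types are an assumption, and the "stemming" here just trims a trailing "s".

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// wordStemmer is a hypothetical interface that mirrors the Stem and Close
// calls shown in the Usage section. The string types are an assumption.
type wordStemmer interface {
	Stem(word string) (string, error)
	Close()
}

// fakeStemmer is a stand-in for a real stemmer. It is NOT the Snowball
// algorithm; it only trims a trailing "s" so the example is runnable.
type fakeStemmer struct{ closed bool }

func (f *fakeStemmer) Stem(word string) (string, error) {
	if f.closed {
		return "", errors.New("stemmer is closed")
	}
	return strings.TrimSuffix(word, "s"), nil
}

func (f *fakeStemmer) Close() { f.closed = true }

// stemAll stems every word with a single, already-created stemmer,
// so the creation cost is paid once rather than once per word.
func stemAll(s wordStemmer, words []string) ([]string, error) {
	stems := make([]string, 0, len(words))
	for _, w := range words {
		stem, err := s.Stem(w)
		if err != nil {
			return nil, fmt.Errorf("stemming %q: %w", w, err)
		}
		stems = append(stems, stem)
	}
	return stems, nil
}

func main() {
	s := &fakeStemmer{} // create once
	defer s.Close()     // release when the batch is done

	stems, err := stemAll(s, []string{"cats", "dogs", "bird"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(stems)
}
```

With the real package, the stemmer passed to a helper like stemAll would come from NewWordStemmer instead of the stand-in type.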
Algorithms & encodings
The file modules.txt lists the main algorithms for each language, each available in UTF-8 and, where one is listed, in that language's most commonly used encoding.
Language    Encodings         Algorithms
danish      UTF_8,ISO_8859_1  danish,da,dan
dutch       UTF_8,ISO_8859_1  dutch,nl,dut,nld
english     UTF_8,ISO_8859_1  english,en,eng
finnish     UTF_8,ISO_8859_1  finnish,fi,fin
french      UTF_8,ISO_8859_1  french,fr,fre,fra
german      UTF_8,ISO_8859_1  german,de,ger,deu
hungarian   UTF_8,ISO_8859_1  hungarian,hu,hun
italian     UTF_8,ISO_8859_1  italian,it,ita
norwegian   UTF_8,ISO_8859_1  norwegian,no,nor
portuguese  UTF_8,ISO_8859_1  portuguese,pt,por
romanian    UTF_8,ISO_8859_2  romanian,ro,rum,ron
russian     UTF_8,KOI8_R      russian,ru,rus
spanish     UTF_8,ISO_8859_1  spanish,es,esl,spa
swedish     UTF_8,ISO_8859_1  swedish,sv,swe
turkish     UTF_8             turkish,tr,tur
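To check an algorithm name and encoding pair before creating a stemmer, the table can be encoded as data. The sketch below is illustrative only: the language type and the lookup and supports helpers are hypothetical and not part of the binding, and the rows are copied from the table above.

```go
package main

import "fmt"

// language holds one row of the table above.
type language struct {
	name      string
	encodings []string
	aliases   []string
}

// latin1 is the encoding pair shared by most rows of the table.
var latin1 = []string{"UTF_8", "ISO_8859_1"}

var languages = []language{
	{"danish", latin1, []string{"danish", "da", "dan"}},
	{"dutch", latin1, []string{"dutch", "nl", "dut", "nld"}},
	{"english", latin1, []string{"english", "en", "eng"}},
	{"finnish", latin1, []string{"finnish", "fi", "fin"}},
	{"french", latin1, []string{"french", "fr", "fre", "fra"}},
	{"german", latin1, []string{"german", "de", "ger", "deu"}},
	{"hungarian", latin1, []string{"hungarian", "hu", "hun"}},
	{"italian", latin1, []string{"italian", "it", "ita"}},
	{"norwegian", latin1, []string{"norwegian", "no", "nor"}},
	{"portuguese", latin1, []string{"portuguese", "pt", "por"}},
	{"romanian", []string{"UTF_8", "ISO_8859_2"}, []string{"romanian", "ro", "rum", "ron"}},
	{"russian", []string{"UTF_8", "KOI8_R"}, []string{"russian", "ru", "rus"}},
	{"spanish", latin1, []string{"spanish", "es", "esl", "spa"}},
	{"swedish", latin1, []string{"swedish", "sv", "swe"}},
	{"turkish", []string{"UTF_8"}, []string{"turkish", "tr", "tur"}},
}

// lookup returns the table row whose algorithm names include alias.
func lookup(alias string) (language, bool) {
	for _, l := range languages {
		for _, a := range l.aliases {
			if a == alias {
				return l, true
			}
		}
	}
	return language{}, false
}

// supports reports whether the table lists encoding for the given alias.
func supports(alias, encoding string) bool {
	l, ok := lookup(alias)
	if !ok {
		return false
	}
	for _, e := range l.encodings {
		if e == encoding {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(supports("ru", "KOI8_R"))     // true
	fmt.Println(supports("tr", "ISO_8859_1")) // false: only UTF_8 is listed for Turkish
	fmt.Println(supports("klingon", "UTF_8")) // false: not in the table
}
```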
Thread-safety
The original Snowball documentation says:
Stemmers are re-entrant, but not threadsafe. In other words, if
you wish to access the same stemmer object from multiple threads,
you must ensure that all access is protected by a mutex or similar
device.
This Go wrapper therefore guards each stem operation with a sync.Mutex, so a single stemmer can safely be shared between goroutines.
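The locking approach can be sketched like this. It is an illustration of the general pattern rather than the wrapper's actual source: unsafeStemmer and safeStemmer are hypothetical types, and the stand-in stemming logic only trims a trailing "s".

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// unsafeStemmer stands in for a stemmer that is re-entrant but not
// threadsafe: it reuses one internal buffer across calls.
type unsafeStemmer struct {
	buf []byte
}

// stem is NOT the Snowball algorithm; it trims a trailing "s" for illustration.
func (u *unsafeStemmer) stem(word string) string {
	u.buf = append(u.buf[:0], word...)
	return strings.TrimSuffix(string(u.buf), "s")
}

// safeStemmer serializes access to the inner stemmer with a mutex.
type safeStemmer struct {
	mu    sync.Mutex
	inner *unsafeStemmer
}

// Stem holds the lock for the whole operation, so concurrent callers
// never touch the shared buffer at the same time.
func (s *safeStemmer) Stem(word string) string {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.inner.stem(word)
}

// stemConcurrently stems each word in its own goroutine using one shared stemmer.
func stemConcurrently(s *safeStemmer, words []string) []string {
	out := make([]string, len(words))
	var wg sync.WaitGroup
	for i, w := range words {
		wg.Add(1)
		go func(i int, w string) {
			defer wg.Done()
			out[i] = s.Stem(w) // each goroutine writes only its own slot
		}(i, w)
	}
	wg.Wait()
	return out
}

func main() {
	s := &safeStemmer{inner: &unsafeStemmer{}}
	fmt.Println(stemConcurrently(s, []string{"cats", "dogs", "birds"}))
}
```

Locking per operation keeps the API simple, at the cost of serializing all callers of one stemmer; heavily parallel workloads could instead give each goroutine its own stemmer.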
Snowball Licence
The Snowball library is released under the BSD Licence
Licence
The goodsign/snowball binding is released under the BSD Licence