porter alternatives and similar packages
Based on the "Natural Language Processing" category.
prose
A library for text processing that supports tokenization, part-of-speech tagging, named-entity extraction, and more.
gojieba
This is a Go implementation of jieba, which is a Chinese word splitting algorithm.
gse
Go efficient text segmentation; supports English, Chinese, Japanese, and other languages.
spaGO
Self-contained Machine Learning and Natural Language Processing library in Go.
whatlanggo
A natural language detection package for Go. Supports 84 languages and 24 scripts (writing systems, e.g. Latin, Cyrillic, etc.).
locales
A set of locales generated from the CLDR Project which can be used independently or within an i18n package; these were built for use with, but not exclusive to, https://github.com/go-playground/universal-translator
universal-translator
i18n Translator for Go/Golang using CLDR data + pluralization rules
RAKE.go
A Go port of the Rapid Automatic Keyword Extraction Algorithm (RAKE)
go-nlp
Utilities for working with discrete probability distributions and other tools useful for doing NLP work.
segment
A Go library for performing Unicode Text Segmentation as described in Unicode Standard Annex #29
MMSEGO
This is a Go implementation of MMSEG, which is a Chinese word splitting algorithm.
textcat
A Go package for n-gram based text categorization, with support for UTF-8 and raw text
stemmer
Stemmer packages for the Go programming language. Includes English and German stemmers.
petrovich
Petrovich is a library which inflects Russian names to a given grammatical case.
go-localize
Simple and easy to use i18n (Internationalization and localization) engine
snowball
Snowball stemmer port (cgo wrapper) for Go. Provides the native Snowball word stem extraction functionality.
golibstemmer
Go bindings for the snowball libstemmer library, including Porter 2
libtextcat
Cgo binding for libtextcat C library. Guaranteed compatibility with version 2.2.
icu
Cgo binding for icu4c C library detection and conversion functions. Guaranteed compatibility with version 50.1.
go-tinydate
A tiny date object in Go. Tinydate uses only 4 bytes of memory.
gotokenizer
A tokenizer based on the dictionary and Bigram language models for Golang. (Currently supports only Chinese segmentation.)
gosentiwordnet
Sentiment analyzer using the SentiWordNet lexicon in Go.
go-eco
Similarity, dissimilarity and distance matrices; diversity, equitability and inequality measures; species richness estimators; coenocline models.
detectlanguage
Language Detection API Go Client. Supports batch requests, short phrase or single word language detection.
README
Porter Stemmer for Go
This is a fairly straightforward port of Martin Porter's C implementation of the Porter stemming algorithm. The C version this port is based on is available for download here: http://tartarus.org/~martin/PorterStemmer/c_thread_safe.txt
The original algorithm is described in the paper:
M.F. Porter, 1980, An algorithm for suffix stripping, Program, 14(3), pp. 130-137.
While the internal implementation is nearly identical to the original C code, the Go interface is much simpler. The stemmer can be called as follows:
import "porter"
...
stemmed := porter.Stem(word_to_stem)
Installing
go get github.com/a2800276/porter
To use the stemmer when installed this way, import:
import "github.com/a2800276/porter"
Limitations
While the implementation is fairly robust, this is a work in progress. In particular, a new interface will likely be provided to prevent excessive conversions between strings and []byte. Currently, on calling Stem, the string argument is converted to a byte slice which the algorithm works on and is converted back into a string before returning.
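The round trip described above can be sketched as follows. This is only an illustration of the conversion pattern, not the package's code: stemBytes and stemString are hypothetical names, and stemBytes is a trivial stand-in that trims a trailing "s" rather than running the Porter algorithm.

```go
package main

import (
	"bytes"
	"fmt"
)

// stemBytes is a hypothetical stand-in for a byte-slice stemming routine.
// It only trims a trailing "s"; it is NOT the Porter algorithm.
func stemBytes(word []byte) []byte {
	return bytes.TrimSuffix(word, []byte("s"))
}

// stemString mirrors the string-based call pattern: the string is copied
// into a byte slice, processed, and copied back into a new string.
func stemString(word string) string {
	return string(stemBytes([]byte(word)))
}

func main() {
	// String in, string out: two conversions per call.
	fmt.Println(stemString("cats"))

	// A caller that already holds bytes could skip the first conversion
	// and decide for itself whether a string is needed afterwards.
	buf := []byte("dogs")
	fmt.Println(string(stemBytes(buf)))
}
```

A byte-slice entry point of this shape would let callers that already read words as []byte (for example from a file or network buffer) skip the conversions entirely.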
Also, the implementation is not particularly robust at handling Unicode input: currently, bytes with the high bit set are simply ignored. It is up to the caller to make sure the string contains only ASCII characters. Since the algorithm itself operates on English words only, this doesn't restrict the functionality, but it is a nuisance.
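Since the ASCII check is left to the caller, a small guard like the one below may help. It is a minimal sketch: isASCII is a hypothetical helper written for this example, not part of the porter package.

```go
package main

import "fmt"

// isASCII reports whether s consists solely of 7-bit ASCII bytes.
// (Hypothetical helper; not provided by the porter package.)
func isASCII(s string) bool {
	for i := 0; i < len(s); i++ {
		if s[i] >= 0x80 { // high bit set, so not an ASCII byte
			return false
		}
	}
	return true
}

func main() {
	for _, w := range []string{"running", "naïve"} {
		if isASCII(w) {
			fmt.Printf("%q is safe to pass to the stemmer\n", w)
		} else {
			fmt.Printf("%q contains non-ASCII bytes; skip or transliterate it first\n", w)
		}
	}
}
```

Checking bytes rather than runes is enough here, because every non-ASCII character in UTF-8 is encoded using bytes with the high bit set.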
TODO:
- byte slice API to avoid round-tripping to string and back