gotokenizer alternatives and similar packages

Based on the "Natural Language Processing" category.

prose

8.7 1.9 gotokenizer VS prose

DISCONTINUED. A Golang library for text processing, including tokenization, part-of-speech tagging, and named-entity extraction.
go-i18n

8.5 7.1 gotokenizer VS go-i18n

Translate your Go program into multiple languages.

gojieba

8.4 1.9 gotokenizer VS gojieba

Golang version of the "Jieba" (结巴) Chinese word segmenter
gse

8.4 4.4 gotokenizer VS gse

Efficient multilingual NLP and text segmentation in Go; supports English, Chinese, Japanese and others.
go-pinyin

7.9 4.5 gotokenizer VS go-pinyin

Convert Chinese characters (Hanzi) to pinyin
spaGO

7.9 0.0 gotokenizer VS spaGO

DISCONTINUED. Self-contained Machine Learning and Natural Language Processing library in Go
when

7.6 5.1 gotokenizer VS when

A natural language date/time parser with pluggable rules
kagome

7.0 6.4 gotokenizer VS kagome

Self-contained Japanese Morphological Analyzer written in pure Go
whatlanggo

6.8 0.0 gotokenizer VS whatlanggo

Natural language detection library for Go
nlp

6.3 0.0 gotokenizer VS nlp

DISCONTINUED. [UNMAINTAINED] Extract values from strings and fill your structs with nlp.
sentences

6.2 4.5 gotokenizer VS sentences

A multilingual command line sentence tokenizer in Golang
universal-translator

6.1 0.0 gotokenizer VS universal-translator

i18n Translator for Go/Golang using CLDR data + pluralization rules
locales

5.9 0.0 gotokenizer VS locales

A set of locales generated from the CLDR Project which can be used independently or within an i18n package; these were built for use with, but are not exclusive to, https://github.com/go-playground/universal-translator
getlang

4.9 0.0 gotokenizer VS getlang

Natural language detection package in pure Go
go-unidecode

4.5 3.1 gotokenizer VS go-unidecode

ASCII transliterations of Unicode text.
RAKE.go

4.5 0.0 gotokenizer VS RAKE.go

A Go port of the Rapid Automatic Keyword Extraction algorithm (RAKE)
segment

4.3 0.0 gotokenizer VS segment

A Go library for performing Unicode Text Segmentation as described in Unicode Standard Annex #29
go-nlp

4.3 0.0 gotokenizer VS go-nlp

DISCONTINUED. Utilities for working with discrete probability distributions and other tools useful for doing NLP work.
gounidecode

4.2 0.0 gotokenizer VS gounidecode

Unicode transliterator for #golang
go-stem

3.9 0.0 gotokenizer VS go-stem

Word Stemming in Go
textcat

3.8 0.0 gotokenizer VS textcat

A Go package for n-gram based text categorization, with support for utf-8 and raw text
MMSEGO

3.6 0.0 gotokenizer VS MMSEGO

Chinese word splitting algorithm MMSEG in GO
address

3.3 6.5 gotokenizer VS address

Address handling for Go.
go-localize

3.3 0.0 gotokenizer VS go-localize

i18n (Internationalization and localization) engine written in Go, used for translating locale strings.
go2vec

3.2 0.0 gotokenizer VS go2vec

Read and use word2vec vectors in Go
stemmer

3.1 0.0 gotokenizer VS stemmer

Stemmer packages for Go programming language. Includes English, German and Dutch stemmers.
porter2

2.9 0.0 gotokenizer VS porter2

High Performance Porter2 Stemmer
petrovich

2.9 3.8 gotokenizer VS petrovich

Golang port of Petrovich - an inflector for Russian anthroponyms.
dpar

2.8 3.2 gotokenizer VS dpar

Neural network transition-based dependency parser (in Rust)
iuliia-go

2.8 1.8 gotokenizer VS iuliia-go

Transliterate Cyrillic → Latin in every possible way
govader

2.7 0.0 gotokenizer VS govader

VADER sentiment analysis in Go
go-mystem

2.6 0.0 gotokenizer VS go-mystem

CGo bindings to Yandex.Mystem
go-tinydate

2.5 0.0 gotokenizer VS go-tinydate

A tiny date object in Go. Tinydate uses only 4 bytes of memory
spreak

2.4 6.4 gotokenizer VS spreak

Flexible translation and humanization library for Go, based on the concepts behind gettext.
snowball

2.4 0.0 L1 gotokenizer VS snowball

Cgo binding for Snowball C library
paicehusk

2.4 0.0 gotokenizer VS paicehusk

Golang implementation of the Paice/Husk Stemming Algorithm
detectlanguage

2.0 0.0 gotokenizer VS detectlanguage

Detect Language API Go Client
golibstemmer

2.0 0.0 gotokenizer VS golibstemmer

Go bindings for the snowball libstemmer library including porter 2
icu

1.8 0.0 gotokenizer VS icu

Cgo binding for icu4c library
libtextcat

1.8 0.0 gotokenizer VS libtextcat

Cgo binding for libtextcat C library
t

1.8 3.5 gotokenizer VS t

t: translation util for go, using GNU gettext
shamoji

1.3 0.0 gotokenizer VS shamoji

The shamoji (杓文字) is a word filtering package
porter

1.2 0.0 gotokenizer VS porter

porter stemmer
gosentiwordnet

0.9 0.0 gotokenizer VS gosentiwordnet

💬 Sentiment analyzer library using SentiWordnet in Go
go-eco

0.5 0.0 gotokenizer VS go-eco

Automatically exported from code.google.com/p/go-eco
govader-backend

0.5 2.6 gotokenizer VS govader-backend

Sentiment Analysis Microservice
spelling-corrector

0.3 0.0 gotokenizer VS spelling-corrector

Spelling corrector for Spanish language

* Code Quality Rankings and insights are calculated and provided by Lumnify.
They vary from L1 to L5 with "L5" being the highest.

README

gotokenizer

A tokenizer for Go based on a dictionary and Bigram language models. (Currently it supports only Chinese segmentation.)

Motivation

I wanted a simple tokenizer with no unnecessary overhead that uses only the standard library, follows good practices, and has well-tested code.

Features

Support Maximum Matching Method
Support Minimum Matching Method
Support Reverse Maximum Matching
Support Reverse Minimum Matching
Support Bidirectional Maximum Matching
Support Bidirectional Minimum Matching
Support using Stop Tokens
Support Custom word Filter
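
To illustrate the idea behind the first feature, here is a minimal sketch of forward maximum matching. It is not gotokenizer's implementation: forwardMaxMatch, the tiny in-memory dictionary, and the maxLen parameter are all assumptions made for this example. At each position the sketch takes the longest dictionary word of at most maxLen characters and falls back to a single character when nothing matches.

```go
package main

import "fmt"

// forwardMaxMatch is a hypothetical helper, not part of gotokenizer's API.
// It scans the text left to right and, at each position, emits the longest
// dictionary word of at most maxLen runes. A rune that starts no dictionary
// word is emitted as a single-rune token.
func forwardMaxMatch(text string, dict map[string]bool, maxLen int) []string {
	if maxLen < 1 {
		maxLen = 1 // guard against a window that never advances
	}
	runes := []rune(text) // work on runes so multi-byte characters stay whole
	var tokens []string
	for i := 0; i < len(runes); {
		end := i + maxLen
		if end > len(runes) {
			end = len(runes)
		}
		// Shrink the candidate window until it matches or is one rune long.
		for end > i+1 && !dict[string(runes[i:end])] {
			end--
		}
		tokens = append(tokens, string(runes[i:end]))
		i = end
	}
	return tokens
}

func main() {
	// Assumed toy dictionary; a real one would be loaded from a file.
	dict := map[string]bool{"研究": true, "研究生": true, "生命": true, "起源": true}
	fmt.Println(forwardMaxMatch("研究生命起源", dict, 3))
}
```

With this toy dictionary the greedy left-to-right scan picks 研究生 first and leaves 命 as a single character, which is the kind of ambiguity the reverse and bidirectional variants are meant to reduce.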

Installation

go get -u github.com/xujiajun/gotokenizer

Usage

package main

import (
    "fmt"

    "github.com/xujiajun/gotokenizer"
)

func main() {
    text := "gotokenizer是一款基于字典和Bigram模型纯go语言编写的分词器，支持6种分词算法。支持stopToken过滤和自定义word过滤功能。"

    dictPath := "/Users/xujiajun/go/src/github.com/xujiajun/gotokenizer/data/zh/dict.txt"
    // NewMaxMatch uses NumAndLetterWordFilter as its default word filter
    mm := gotokenizer.NewMaxMatch(dictPath)
    // load dict
    mm.LoadDict()

    fmt.Println(mm.Get(text)) //[gotokenizer 是 一款 基于 字典 和 Bigram 模型 纯 go 语言 编写 的 分词器 ， 支持 6 种 分词 算法 。 支持 stopToken 过滤 和 自定义 word 过滤 功能 。] <nil>

    // enable stop-token filtering
    mm.EnabledFilterStopToken = true
    mm.StopTokens = gotokenizer.NewStopTokens()
    stopTokenDicPath := "/Users/xujiajun/go/src/github.com/xujiajun/gotokenizer/data/zh/stop_tokens.txt"
    mm.StopTokens.Load(stopTokenDicPath)

    fmt.Println(mm.Get(text)) //[gotokenizer 一款 字典 Bigram 模型 go 语言 编写 分词器 支持 6 种 分词 算法 支持 stopToken 过滤 自定义 word 过滤 功能] <nil>
    fmt.Println(mm.GetFrequency(text)) //map[6:1 种:1 算法:1 过滤:2 支持:2 Bigram:1 模型:1 编写:1 gotokenizer:1 go:1 分词器:1 分词:1 word:1 功能:1 一款:1 语言:1 stopToken:1 自定义:1 字典:1] <nil>

}
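
The last two outputs above show tokens with stop tokens removed and a per-token frequency map. As a rough sketch of what those two steps amount to, and not a copy of the library's code, the example below uses two hypothetical helpers, filterStopTokens and countTokens, with a hand-written stop-token set in place of the stop_tokens.txt file.

```go
package main

import "fmt"

// filterStopTokens is a hypothetical helper, not part of gotokenizer's API.
// It returns the tokens that are not present in the stop set, in order.
func filterStopTokens(tokens []string, stop map[string]bool) []string {
	kept := make([]string, 0, len(tokens))
	for _, t := range tokens {
		if !stop[t] {
			kept = append(kept, t)
		}
	}
	return kept
}

// countTokens is a hypothetical helper that counts how often each token occurs.
func countTokens(tokens []string) map[string]int {
	freq := make(map[string]int)
	for _, t := range tokens {
		freq[t]++
	}
	return freq
}

func main() {
	tokens := []string{"支持", "stopToken", "过滤", "和", "自定义", "word", "过滤", "功能", "。"}
	// Assumed stop set for illustration only.
	stop := map[string]bool{"和": true, "。": true}

	kept := filterStopTokens(tokens, stop)
	fmt.Println(kept)
	fmt.Println(countTokens(kept)["过滤"])
}
```

Go does not guarantee map iteration order, which is why the frequency map printed in the usage example may list its keys in a different order from run to run on older Go versions.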