Popularity: 8.4 (Stable)

Activity: 1.9

Stars 2,322

Watchers 68

Forks 299

Last Commit almost 1 year ago

Programming language: Go

License: MIT License

Tags: Natural Language Processing

Latest version: v1.2.0

gojieba alternatives and similar packages

Based on the "Natural Language Processing" category.

prose

8.7 1.9 gojieba VS prose

DISCONTINUED. A Golang library for text processing, including tokenization, part-of-speech tagging, and named-entity extraction.
go-i18n

8.5 7.1 gojieba VS go-i18n

Translate your Go program into multiple languages.

gse

8.4 4.4 gojieba VS gse

Go efficient multilingual NLP and text segmentation; support English, Chinese, Japanese and others.
go-pinyin

7.9 4.5 gojieba VS go-pinyin

Convert Chinese characters (Hanzi) to Pinyin
spaGO

7.9 0.0 gojieba VS spaGO

DISCONTINUED. Self-contained Machine Learning and Natural Language Processing library in Go
when

7.6 5.1 gojieba VS when

A natural language date/time parser with pluggable rules
kagome

7.0 6.4 gojieba VS kagome

Self-contained Japanese Morphological Analyzer written in pure Go
whatlanggo

6.8 0.0 gojieba VS whatlanggo

Natural language detection library for Go
nlp

6.3 0.0 gojieba VS nlp

DISCONTINUED. [UNMAINTAINED] Extract values from strings and fill your structs with nlp.
sentences

6.2 4.5 gojieba VS sentences

A multilingual command line sentence tokenizer in Golang
universal-translator

6.1 0.0 gojieba VS universal-translator

i18n Translator for Go/Golang using CLDR data + pluralization rules
locales

5.9 0.0 gojieba VS locales

A set of locales generated from the CLDR Project which can be used independently or within an i18n package; these were built for use with, but not exclusive to https://github.com/go-playground/universal-translator
getlang

4.9 0.0 gojieba VS getlang

Natural language detection package in pure Go
go-unidecode

4.5 3.1 gojieba VS go-unidecode

ASCII transliterations of Unicode text.
RAKE.go

4.5 0.0 gojieba VS RAKE.go

A Go port of the Rapid Automatic Keyword Extraction algorithm (RAKE)
segment

4.3 0.0 gojieba VS segment

A Go library for performing Unicode Text Segmentation as described in Unicode Standard Annex #29
go-nlp

4.3 0.0 gojieba VS go-nlp

DISCONTINUED. Utilities for working with discrete probability distributions and other tools useful for doing NLP work.
gounidecode

4.2 0.0 gojieba VS gounidecode

Unicode transliterator for #golang
go-stem

3.9 0.0 gojieba VS go-stem

Word Stemming in Go
textcat

3.8 0.0 gojieba VS textcat

A Go package for n-gram based text categorization, with support for utf-8 and raw text
MMSEGO

3.6 0.0 gojieba VS MMSEGO

Chinese word splitting algorithm MMSEG in GO
go-localize

3.3 0.0 gojieba VS go-localize

i18n (Internationalization and localization) engine written in Go, used for translating locale strings.
address

3.3 6.5 gojieba VS address

Address handling for Go.
go2vec

3.2 0.0 gojieba VS go2vec

Read and use word2vec vectors in Go
stemmer

3.1 0.0 gojieba VS stemmer

Stemmer packages for Go programming language. Includes English, German and Dutch stemmers.
petrovich

2.9 3.8 gojieba VS petrovich

Golang port of Petrovich - an inflector for Russian anthroponyms.
porter2

2.9 0.0 gojieba VS porter2

High Performance Porter2 Stemmer
iuliia-go

2.8 1.8 gojieba VS iuliia-go

Transliterate Cyrillic → Latin in every possible way
dpar

2.8 3.2 gojieba VS dpar

Neural network transition-based dependency parser (in Rust)
govader

2.7 0.0 gojieba VS govader

vader sentiment analysis in go
go-mystem

2.6 0.0 gojieba VS go-mystem

CGo bindings to Yandex.Mystem
go-tinydate

2.5 0.0 gojieba VS go-tinydate

A tiny date object in Go. Tinydate uses only 4 bytes of memory
spreak

2.4 6.4 gojieba VS spreak

Flexible translation and humanization library for Go, based on the concepts behind gettext.
paicehusk

2.4 0.0 gojieba VS paicehusk

Golang implementation of the Paice/Husk Stemming Algorithm
snowball

2.4 0.0 L1 gojieba VS snowball

Cgo binding for Snowball C library
gotokenizer

2.0 0.0 gojieba VS gotokenizer

A tokenizer based on the dictionary and Bigram language models for Go. (Currently supports only Chinese segmentation)
golibstemmer

2.0 0.0 gojieba VS golibstemmer

Go bindings for the snowball libstemmer library including porter 2
detectlanguage

2.0 0.0 gojieba VS detectlanguage

Detect Language API Go Client
icu

1.8 0.0 gojieba VS icu

Cgo binding for icu4c library
libtextcat

1.8 0.0 gojieba VS libtextcat

Cgo binding for libtextcat C library
t

1.8 3.5 gojieba VS t

t: translation util for go, using GNU gettext
shamoji

1.3 0.0 gojieba VS shamoji

The shamoji (杓文字) is a word filtering package
porter

1.2 0.0 gojieba VS porter

porter stemmer
gosentiwordnet

0.9 0.0 gojieba VS gosentiwordnet

💬 Sentiment analyzer library using SentiWordnet in Go
go-eco

0.5 0.0 gojieba VS go-eco

Automatically exported from code.google.com/p/go-eco
govader-backend

0.5 2.6 gojieba VS govader-backend

Sentimental Analysis Microservice
spelling-corrector

0.3 0.0 gojieba VS spelling-corrector

Spelling corrector for Spanish language

* Code Quality Rankings and insights are calculated and provided by Lumnify.
They vary from L1 to L5 with "L5" being the highest.

README

GoJieba [English](README_EN.md)

GoJieba is the Go version of the "Jieba" (结巴) Chinese word segmentation library.

Introduction

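Supports multiple segmentation modes, including maximum probability mode, HMM new-word discovery mode, search engine mode, and full mode.
The core algorithm is implemented in C++ underneath, which keeps performance high.
The dictionary path is configurable: NewJieba(...string) and NewExtractor(...string) take variadic parameters, and when no arguments are passed the default dictionaries are used (the recommended approach).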
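
To make the difference between two of the modes listed above concrete, here is a minimal pure-Go sketch of full mode and maximum probability mode. It does not use or reproduce gojieba's C++ implementation: `toyDict`, its frequencies, the fallback weight for unknown characters, and the helper names `cutAll` and `cutMaxProb` are all made up for illustration, and HMM new-word discovery is not covered.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// toyDict maps words to made-up frequencies. It is NOT gojieba's dictionary.
var toyDict = map[string]float64{
	"我": 50, "来到": 30, "北京": 40, "清华": 10,
	"清华大学": 20, "华大": 1, "大学": 40,
}

// cutAll sketches full mode: it lists every dictionary word that starts
// at every character position, shortest first.
func cutAll(s string) []string {
	r := []rune(s)
	var out []string
	for i := range r {
		for j := i + 1; j <= len(r); j++ {
			if _, ok := toyDict[string(r[i:j])]; ok {
				out = append(out, string(r[i:j]))
			}
		}
	}
	return out
}

// cutMaxProb sketches maximum probability mode: among all ways to split s
// into words, it keeps the one with the highest sum of log-probabilities,
// computed by dynamic programming from right to left.
func cutMaxProb(s string) []string {
	r := []rune(s)
	n := len(r)
	total := 0.0
	for _, f := range toyDict {
		total += f
	}
	best := make([]float64, n+1) // best[i]: best score for the suffix r[i:]
	next := make([]int, n+1)     // next[i]: end index of the word chosen at i
	for i := n - 1; i >= 0; i-- {
		best[i] = math.Inf(-1)
		for j := i + 1; j <= n; j++ {
			f, ok := toyDict[string(r[i:j])]
			if !ok {
				if j > i+1 {
					continue // unknown multi-character spans are not words
				}
				f = 0.5 // assumed fallback weight for an unknown single character
			}
			if score := math.Log(f/total) + best[j]; score > best[i] {
				best[i], next[i] = score, j
			}
		}
	}
	var out []string
	for i := 0; i < n; i = next[i] {
		out = append(out, string(r[i:next[i]]))
	}
	return out
}

func main() {
	s := "我来到北京清华大学"
	fmt.Println("full mode:", strings.Join(cutAll(s), "/"))
	fmt.Println("max probability:", strings.Join(cutMaxProb(s), "/"))
}
```

With this toy dictionary, full mode yields 我/来到/北京/清华/清华大学/华大/大学 and the maximum probability path yields 我/来到/北京/清华大学, because the single word 清华大学 scores higher than the split 清华 + 大学.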

Usage

go get github.com/yanyiwu/gojieba

Segmentation example

package main

import (
    "fmt"
    "strings"

    "github.com/yanyiwu/gojieba"
)

func main() {
    var s string
    var words []string
    useHMM := true
    // No arguments: use the default dictionaries.
    x := gojieba.NewJieba()
    defer x.Free() // free the segmenter when done

    // Full mode: list every dictionary word found in the sentence.
    s = "我来到北京清华大学"
    words = x.CutAll(s)
    fmt.Println(s)
    fmt.Println("Full mode:", strings.Join(words, "/"))

    // Precise mode: a single segmentation of the sentence.
    words = x.Cut(s, useHMM)
    fmt.Println(s)
    fmt.Println("Precise mode:", strings.Join(words, "/"))
    s = "比特币"
    words = x.Cut(s, useHMM)
    fmt.Println(s)
    fmt.Println("Precise mode:", strings.Join(words, "/"))

    // Add a word at runtime, then cut the same input again.
    x.AddWord("比特币")
    s = "比特币"
    words = x.Cut(s, useHMM)
    fmt.Println(s)
    fmt.Println("Precise mode after adding the word:", strings.Join(words, "/"))

    // With HMM enabled, words missing from the dictionary can be recognized.
    s = "他来到了网易杭研大厦"
    words = x.Cut(s, useHMM)
    fmt.Println(s)
    fmt.Println("New-word recognition:", strings.Join(words, "/"))

    // Search engine mode: long words are also split into shorter ones.
    s = "小明硕士毕业于中国科学院计算所，后在日本京都大学深造"
    words = x.CutForSearch(s, useHMM)
    fmt.Println(s)
    fmt.Println("Search engine mode:", strings.Join(words, "/"))

    // Part-of-speech tagging: each result has the form word/tag.
    s = "长春市长春药店"
    words = x.Tag(s)
    fmt.Println(s)
    fmt.Println("POS tagging:", strings.Join(words, ","))

    s = "区块链"
    words = x.Tag(s)
    fmt.Println(s)
    fmt.Println("POS tagging:", strings.Join(words, ","))

    s = "长江大桥"
    words = x.CutForSearch(s, !useHMM)
    fmt.Println(s)
    fmt.Println("Search engine mode:", strings.Join(words, "/"))

    // Tokenize returns each word together with its start and end offsets.
    wordinfos := x.Tokenize(s, gojieba.SearchMode, !useHMM)
    fmt.Println(s)
    fmt.Println("Tokenize (search engine mode):", wordinfos)

    wordinfos = x.Tokenize(s, gojieba.DefaultMode, !useHMM)
    fmt.Println(s)
    fmt.Println("Tokenize (default mode):", wordinfos)

    // Keyword extraction: the top 5 keywords with their weights.
    keywords := x.ExtractWithWeight(s, 5)
    fmt.Println("Extract:", keywords)
}

我来到北京清华大学
Full mode: 我/来到/北京/清华/清华大学/华大/大学
我来到北京清华大学
Precise mode: 我/来到/北京/清华大学
比特币
Precise mode: 比特/币
比特币
Precise mode after adding the word: 比特币
他来到了网易杭研大厦
New-word recognition: 他/来到/了/网易/杭研/大厦
小明硕士毕业于中国科学院计算所，后在日本京都大学深造
Search engine mode: 小明/硕士/毕业/于/中国/科学/学院/科学院/中国科学院/计算/计算所/，/后/在/日本/京都/大学/日本京都大学/深造
长春市长春药店
POS tagging: 长春市/ns,长春/ns,药店/n
区块链
POS tagging: 区块链/nz
长江大桥
Search engine mode: 长江/大桥/长江大桥
长江大桥
Tokenize (search engine mode): [{长江 0 6} {大桥 6 12} {长江大桥 0 12}]
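
The offsets in the Tokenize output above appear to be byte offsets into the UTF-8 string rather than character counts: each of these Chinese characters occupies 3 bytes, so the two-character word 长江 spans 0 to 6. The pure-Go sketch below shows one plausible way such search-engine-mode tokens could be derived from an already-segmented word. It is an illustration under stated assumptions, not gojieba's implementation: the `token` type, the `toyDict` dictionary, the `searchTokens` helper, and the rule "emit 2- and 3-character dictionary sub-words, then the word itself" are all hypothetical.

```go
package main

import "fmt"

// token mirrors the shape of the printed Tokenize results: the text plus
// start and end byte offsets. The type and field names are illustrative.
type token struct {
	Str        string
	Start, End int
}

// toyDict is a tiny stand-in dictionary, not gojieba's real one.
var toyDict = map[string]bool{"长江": true, "大桥": true, "长江大桥": true}

// searchTokens expands one segmented word that begins at byte offset base.
// Dictionary sub-words of 2 and then 3 characters come first, followed by
// the word itself.
func searchTokens(word string, base int) []token {
	// starts[k] is the byte offset of the k-th character within word,
	// with len(word) appended as a sentinel.
	var starts []int
	for i := range word {
		starts = append(starts, i)
	}
	starts = append(starts, len(word))
	n := len(starts) - 1 // number of characters

	var out []token
	for _, size := range []int{2, 3} {
		if n <= size {
			continue // only words longer than size are split
		}
		for i := 0; i+size <= n; i++ {
			sub := word[starts[i]:starts[i+size]]
			if toyDict[sub] {
				out = append(out, token{sub, base + starts[i], base + starts[i+size]})
			}
		}
	}
	return append(out, token{word, base, base + len(word)})
}

func main() {
	// Prints [{长江 0 6} {大桥 6 12} {长江大桥 0 12}]
	fmt.Println(searchTokens("长江大桥", 0))
}
```

If character positions are needed instead of byte positions, the standard library call `utf8.RuneCountInString(s[:start])` converts a byte offset into a character index.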

See the examples in [jieba_test](jieba_test.go) and [extractor_test](extractor_test.go).

Benchmark

Performance evaluation of the Jieba Chinese word segmentation family

Unittest

go test ./...

Benchmark

go test -bench "Jieba" -test.benchtime 10s
go test -bench "Extractor" -test.benchtime 10s

Contributors

Code Contributors

This project exists thanks to all the people who contribute.

Contact

Email: [email protected]

*Note that all licence references and agreements mentioned in the gojieba README section above are relevant to that project's source code only.
