gojieba alternatives and similar packages
Based on the "Natural Language Processing" category.
prose
A library for text processing that supports tokenization, part-of-speech tagging, named-entity extraction, and more.
whatlanggo
A natural language detection package for Go. Supports 84 languages and 24 scripts (writing systems, e.g. Latin, Cyrillic).
locales
🌎 A set of locales generated from the CLDR Project that can be used independently or within an i18n package; built for use with, but not exclusive to, https://github.com/go-playground/universal-translator
go-nlp
Utilities for working with discrete probability distributions and other tools useful for doing NLP work.
segment
A Go library for performing Unicode Text Segmentation as described in Unicode Standard Annex #29.
snowball
Snowball stemmer port (cgo wrapper) for Go. Provides word stem extraction using the native Snowball library.
icu
Cgo binding for the icu4c C library's detection and conversion functions. Guaranteed compatibility with version 50.1.
gotokenizer
A tokenizer for Go based on a dictionary and bigram language models. (Currently supports only Chinese segmentation.)
porter
A fairly straightforward port of Martin Porter's C implementation of the Porter stemming algorithm.
go-eco
Similarity, dissimilarity and distance matrices; diversity, equitability and inequality measures; species richness estimators; coenocline models.
detectlanguage
Language Detection API Go Client. Supports batch requests and language detection for short phrases or single words.
README
GoJieba [English](README_EN.md)
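GoJieba is the Go version of the "Jieba" Chinese word segmentation library.
Introduction
- Supports several segmentation modes, including maximum-probability mode, HMM new-word discovery mode, search-engine mode, and full mode.
- The core algorithm is implemented in C++ underneath, so performance is high.
- Dictionary paths are configurable: NewJieba(...string) and NewExtractor(...string) take variadic parameters, and when no arguments are passed the default dictionaries are used (the recommended approach).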
Usage
go get github.com/yanyiwu/gojieba
Segmentation example
package main

import (
	"fmt"
	"strings"

	"github.com/yanyiwu/gojieba"
)

func main() {
	var s string
	var words []string
	useHMM := true

	// With no arguments, NewJieba loads the default dictionaries.
	x := gojieba.NewJieba()
	defer x.Free()

	// Full mode.
	s = "我来到北京清华大学"
	words = x.CutAll(s)
	fmt.Println(s)
	fmt.Println("Full mode:", strings.Join(words, "/"))

	// Precise mode, with HMM enabled.
	words = x.Cut(s, useHMM)
	fmt.Println(s)
	fmt.Println("Precise mode:", strings.Join(words, "/"))

	// 比特币 (Bitcoin) is split in two until it is added as a word.
	s = "比特币"
	words = x.Cut(s, useHMM)
	fmt.Println(s)
	fmt.Println("Precise mode:", strings.Join(words, "/"))

	x.AddWord("比特币")
	s = "比特币"
	words = x.Cut(s, useHMM)
	fmt.Println(s)
	fmt.Println("Precise mode after adding the word:", strings.Join(words, "/"))

	// New-word recognition through the HMM.
	s = "他来到了网易杭研大厦"
	words = x.Cut(s, useHMM)
	fmt.Println(s)
	fmt.Println("New word recognition:", strings.Join(words, "/"))

	// Search engine mode.
	s = "小明硕士毕业于中国科学院计算所,后在日本京都大学深造"
	words = x.CutForSearch(s, useHMM)
	fmt.Println(s)
	fmt.Println("Search engine mode:", strings.Join(words, "/"))

	// Part-of-speech tagging; each item has the form word/tag.
	s = "长春市长春药店"
	words = x.Tag(s)
	fmt.Println(s)
	fmt.Println("POS tagging:", strings.Join(words, ","))

	s = "区块链"
	words = x.Tag(s)
	fmt.Println(s)
	fmt.Println("POS tagging:", strings.Join(words, ","))

	// Search engine mode with HMM disabled.
	s = "长江大桥"
	words = x.CutForSearch(s, !useHMM)
	fmt.Println(s)
	fmt.Println("Search engine mode:", strings.Join(words, "/"))

	// Tokenize also returns the position of each word in the input.
	wordinfos := x.Tokenize(s, gojieba.SearchMode, !useHMM)
	fmt.Println(s)
	fmt.Println("Tokenize (search engine mode):", wordinfos)

	wordinfos = x.Tokenize(s, gojieba.DefaultMode, !useHMM)
	fmt.Println(s)
	fmt.Println("Tokenize (default mode):", wordinfos)

	// Keyword extraction: the top 5 keywords with their weights.
	keywords := x.ExtractWithWeight(s, 5)
	fmt.Println("Extract:", keywords)
}

Output:
我来到北京清华大学
Full mode: 我/来到/北京/清华/清华大学/华大/大学
我来到北京清华大学
Precise mode: 我/来到/北京/清华大学
比特币
Precise mode: 比特/币
比特币
Precise mode after adding the word: 比特币
他来到了网易杭研大厦
New word recognition: 他/来到/了/网易/杭研/大厦
小明硕士毕业于中国科学院计算所,后在日本京都大学深造
Search engine mode: 小明/硕士/毕业/于/中国/科学/学院/科学院/中国科学院/计算/计算所/,/后/在/日本/京都/大学/日本京都大学/深造
长春市长春药店
POS tagging: 长春市/ns,长春/ns,药店/n
区块链
POS tagging: 区块链/nz
长江大桥
Search engine mode: 长江/大桥/长江大桥
长江大桥
Tokenize: [{长江 0 6} {大桥 6 12} {长江大桥 0 12}]
See example in [jieba_test](jieba_test.go), [extractor_test](extractor_test.go)
Benchmark
Unit tests
go test ./...
Benchmark
go test -bench "Jieba" -test.benchtime 10s
go test -bench "Extractor" -test.benchtime 10s
Contributors
Code Contributors
This project exists thanks to all the people who contribute.
Contact
- Email:
[email protected]