Popularity

8.4

Stable

Activity

4.4

Stars 2,462

Watchers 66

Forks 210

Last Commit 2 months ago

Programming language: Go

License: Apache License 2.0

Tags: Natural Language Processing

Latest version: v0.63.2

gse alternatives and similar packages

Based on the "Natural Language Processing" category.

prose

8.7 1.9 gse VS prose

DISCONTINUED. A Golang library for text processing, including tokenization, part-of-speech tagging, and named-entity extraction.
go-i18n

8.5 7.1 gse VS go-i18n

Translate your Go program into multiple languages.

gojieba

8.4 1.9 gse VS gojieba

Golang version of the "Jieba" Chinese word segmenter
go-pinyin

7.9 4.5 gse VS go-pinyin

Convert Chinese characters to Pinyin
spaGO

7.9 0.0 gse VS spaGO

DISCONTINUED. Self-contained Machine Learning and Natural Language Processing library in Go
when

7.6 5.1 gse VS when

A natural language date/time parser with pluggable rules
kagome

7.0 6.4 gse VS kagome

Self-contained Japanese Morphological Analyzer written in pure Go
whatlanggo

6.8 0.0 gse VS whatlanggo

Natural language detection library for Go
nlp

6.3 0.0 gse VS nlp

DISCONTINUED. [UNMAINTAINED] Extract values from strings and fill your structs with nlp.
sentences

6.2 4.5 gse VS sentences

A multilingual command line sentence tokenizer in Golang
universal-translator

6.1 0.0 gse VS universal-translator

i18n Translator for Go/Golang using CLDR data + pluralization rules
locales

5.9 0.0 gse VS locales

A set of locales generated from the CLDR Project which can be used independently or within an i18n package; these were built for use with, but not exclusive to https://github.com/go-playground/universal-translator
getlang

4.9 0.0 gse VS getlang

Natural language detection package in pure Go
RAKE.go

4.5 0.0 gse VS RAKE.go

A Go port of the Rapid Automatic Keyword Extraction algorithm (RAKE)
go-unidecode

4.5 3.1 gse VS go-unidecode

ASCII transliterations of Unicode text.
go-nlp

4.3 0.0 gse VS go-nlp

DISCONTINUED. Utilities for working with discrete probability distributions and other tools useful for doing NLP work.
segment

4.3 0.0 gse VS segment

A Go library for performing Unicode Text Segmentation as described in Unicode Standard Annex #29
gounidecode

4.2 0.0 gse VS gounidecode

Unicode transliterator for #golang
go-stem

3.9 0.0 gse VS go-stem

Word Stemming in Go
textcat

3.8 0.0 gse VS textcat

A Go package for n-gram based text categorization, with support for utf-8 and raw text
MMSEGO

3.6 0.0 gse VS MMSEGO

Chinese word splitting algorithm MMSEG in GO
address

3.3 6.5 gse VS address

Address handling for Go.
go-localize

3.3 0.0 gse VS go-localize

i18n (Internationalization and localization) engine written in Go, used for translating locale strings.
go2vec

3.2 0.0 gse VS go2vec

Read and use word2vec vectors in Go
stemmer

3.1 0.0 gse VS stemmer

Stemmer packages for Go programming language. Includes English, German and Dutch stemmers.
petrovich

2.9 3.8 gse VS petrovich

Golang port of Petrovich - an inflector for Russian anthroponyms.
porter2

2.9 0.0 gse VS porter2

High Performance Porter2 Stemmer
iuliia-go

2.8 1.8 gse VS iuliia-go

Transliterate Cyrillic → Latin in every possible way
dpar

2.8 3.2 gse VS dpar

Neural network transition-based dependency parser (in Rust)
govader

2.7 0.0 gse VS govader

vader sentiment analysis in go
go-mystem

2.6 0.0 gse VS go-mystem

CGo bindings to Yandex.Mystem
go-tinydate

2.5 0.0 gse VS go-tinydate

A tiny date object in Go. Tinydate uses only 4 bytes of memory
spreak

2.4 6.4 gse VS spreak

Flexible translation and humanization library for Go, based on the concepts behind gettext.
snowball

2.4 0.0 L1 gse VS snowball

Cgo binding for Snowball C library
paicehusk

2.4 0.0 gse VS paicehusk

Golang implementation of the Paice/Husk Stemming Algorithm
detectlanguage

2.0 0.0 gse VS detectlanguage

Detect Language API Go Client
gotokenizer

2.0 0.0 gse VS gotokenizer

A tokenizer based on the dictionary and Bigram language models for Go. (Currently supports only Chinese segmentation)
golibstemmer

2.0 0.0 gse VS golibstemmer

Go bindings for the snowball libstemmer library including porter 2
libtextcat

1.8 0.0 gse VS libtextcat

Cgo binding for libtextcat C library
icu

1.8 0.0 gse VS icu

Cgo binding for icu4c library
t

1.8 3.5 gse VS t

t: translation util for go, using GNU gettext
shamoji

1.3 0.0 gse VS shamoji

The shamoji (杓文字) is a word filtering package
porter

1.2 0.0 gse VS porter

porter stemmer
gosentiwordnet

0.9 0.0 gse VS gosentiwordnet

💬 Sentiment analyzer library using SentiWordnet in Go
govader-backend

0.5 2.6 gse VS govader-backend

Sentimental Analysis Microservice
go-eco

0.5 0.0 gse VS go-eco

Automatically exported from code.google.com/p/go-eco
spelling-corrector

0.3 0.0 gse VS spelling-corrector

Spelling corrector for Spanish language

* Code Quality Rankings and insights are calculated and provided by Lumnify.
They vary from L1 to L5 with "L5" being the highest.

README

gse

Efficient multilingual NLP and text segmentation in Go; supports English, Chinese, Japanese and other languages. It also integrates with Elasticsearch and bleve.

Gse is an implementation of jieba in Go, and aims to add NLP support and more features.

Features:

Multiple word segmentation modes: common, search engine, full, precise, and HMM
User and embedded dictionaries, part-of-speech (POS) tagging, segment info analysis, and stop-word and trim handling
Multilingual support: English, Chinese, Japanese and others
Traditional Chinese support
HMM text cutting using the Viterbi algorithm
NLP via TensorFlow (in progress)
Named Entity Recognition (in progress)
Integration with Elasticsearch and bleve
Can run as a JSON-RPC service
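
To make the HMM mode a little more concrete: an HMM segmenter typically labels each character as B (begin), M (middle), E (end) or S (single) and picks the most likely label sequence with the Viterbi algorithm. The following is a minimal sketch of that idea only. The probabilities, the `hmmCut` helper and every other name are invented for illustration and are not gse's model or API.

```go
package main

import (
	"fmt"
	"math"
)

// Character states: Begin, Middle, End of a multi-character word, or Single.
const (
	B = iota
	M
	E
	S
	numStates
)

// Toy probabilities, invented for this example only (not gse's trained model).
var (
	startP = [numStates]float64{B: 0.6, S: 0.4}
	transP = [numStates][numStates]float64{
		B: {M: 0.5, E: 0.5},
		M: {M: 0.5, E: 0.5},
		E: {B: 0.5, S: 0.5},
		S: {B: 0.5, S: 0.5},
	}
	emitP = map[rune][numStates]float64{
		'你': {B: 0.7, M: 0.1, E: 0.1, S: 0.1},
		'好': {B: 0.1, M: 0.1, E: 0.7, S: 0.1},
		'世': {B: 0.7, M: 0.1, E: 0.1, S: 0.1},
		'界': {B: 0.1, M: 0.1, E: 0.7, S: 0.1},
	}
)

// emit returns the log-probability of rune r in state s (uniform if unknown).
func emit(s int, r rune) float64 {
	if p, ok := emitP[r]; ok {
		return math.Log(p[s])
	}
	return math.Log(1.0 / numStates)
}

// viterbi returns the most likely state sequence for obs, working in log space.
// Impossible transitions have probability 0, whose logarithm is -Inf.
func viterbi(obs []rune) []int {
	n := len(obs)
	if n == 0 {
		return nil
	}
	score := make([][numStates]float64, n)
	back := make([][numStates]int, n)
	for s := 0; s < numStates; s++ {
		score[0][s] = math.Log(startP[s]) + emit(s, obs[0])
	}
	for t := 1; t < n; t++ {
		for s := 0; s < numStates; s++ {
			best, arg := math.Inf(-1), 0
			for p := 0; p < numStates; p++ {
				if v := score[t-1][p] + math.Log(transP[p][s]); v > best {
					best, arg = v, p
				}
			}
			score[t][s] = best + emit(s, obs[t])
			back[t][s] = arg
		}
	}
	// The text must end on a word boundary, so only E or S may be last.
	last := E
	if score[n-1][S] > score[n-1][E] {
		last = S
	}
	path := make([]int, n)
	path[n-1] = last
	for t := n - 1; t > 0; t-- {
		path[t-1] = back[t][path[t]]
	}
	return path
}

// hmmCut splits text after every character labelled E or S.
func hmmCut(text string) []string {
	obs := []rune(text)
	var words []string
	start := 0
	for i, s := range viterbi(obs) {
		if s == E || s == S {
			words = append(words, string(obs[start:i+1]))
			start = i + 1
		}
	}
	return words
}

func main() {
	fmt.Println(hmmCut("你好世界")) // prints [你好 世界]
}
```

A real model would estimate the start, transition and emission tables from a labelled corpus rather than hard-coding them.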

Algorithm:

The dictionary is implemented with a double-array trie.
The segmenter uses a shortest-path algorithm (based on word frequency and dynamic programming), plus DAG- and HMM-based word segmentation.
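
As a rough illustration of the shortest-path idea, the sketch below scores every dictionary word by its log relative frequency and uses dynamic programming to pick the split with the highest total score. It is an assumption-laden toy, not gse's implementation: a plain Go map stands in for the double-array trie, and `toyDict`, `segment` and the frequencies are made up for the example.

```go
package main

import (
	"fmt"
	"math"
)

// toyDict maps words to invented frequencies (hypothetical data).
var toyDict = map[string]float64{
	"研究":  100,
	"研究生": 20,
	"生命":  80,
	"命":   5,
	"起源":  30,
}

// segment returns the split of text whose words have the highest summed
// log relative frequency. Words longer than maxWordLen runes are not tried.
func segment(text string, freq map[string]float64, maxWordLen int) []string {
	runes := []rune(text)
	n := len(runes)

	total := 0.0
	for _, f := range freq {
		total += f
	}
	if total == 0 {
		total = 1
	}
	// Characters missing from the dictionary get a small pseudo-frequency.
	unknown := math.Log(0.5 / total)

	best := make([]float64, n+1) // best[i]: best score for runes[i:]
	next := make([]int, n+1)     // next[i]: end index of the word starting at i
	for i := n - 1; i >= 0; i-- {
		best[i] = unknown + best[i+1]
		next[i] = i + 1
		for j := i + 1; j <= n && j-i <= maxWordLen; j++ {
			f, ok := freq[string(runes[i:j])]
			if !ok {
				continue
			}
			if s := math.Log(f/total) + best[j]; s > best[i] {
				best[i], next[i] = s, j
			}
		}
	}

	var words []string
	for i := 0; i < n; i = next[i] {
		words = append(words, string(runes[i:next[i]]))
	}
	return words
}

func main() {
	fmt.Println(segment("研究生命起源", toyDict, 3)) // prints [研究 生命 起源]
}
```

With these toy frequencies the split 研究 / 生命 / 起源 beats 研究生 / 命 / 起源 because the rare single-character word 命 carries a large penalty.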

Text segmentation speed:

Single thread: 9.2 MB/s
Concurrent goroutines: 26.8 MB/s
HMM text segmentation, single thread: 3.2 MB/s (2-core, 4-thread MacBook Pro)
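
The concurrent figure suggests spreading texts across goroutines. One possible pattern is a small worker pool, sketched below. The `cutAll` and `stubCut` names are hypothetical, and `stubCut` (which just splits on whitespace) stands in for a real segmenter call. The sketch also assumes the segmenter is safe to share between goroutines once its dictionary is loaded, which should be checked against the gse documentation.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// stubCut is a placeholder for a real segmenter call.
func stubCut(s string) []string {
	return strings.Fields(s)
}

// cutAll applies cut to every text using the given number of worker
// goroutines and returns the results in input order.
func cutAll(texts []string, workers int, cut func(string) []string) [][]string {
	if workers < 1 {
		workers = 1
	}
	results := make([][]string, len(texts))
	jobs := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each index is handled by exactly one worker, so no lock is needed.
				results[i] = cut(texts[i])
			}
		}()
	}
	for i := range texts {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	texts := []string{"Hello world", "Winter is coming"}
	for i, words := range cutAll(texts, 4, stubCut) {
		fmt.Println(i, words)
	}
}
```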

Binding:

gse-bind: bindings for JavaScript and other languages.

Install / update

With Go module support (Go 1.11+), just import:

import "github.com/go-ego/gse"

Otherwise, to install the gse package, run the command:

go get -u github.com/go-ego/gse

Use

package main

import (
    "fmt"
    "regexp"

    "github.com/go-ego/gse"
    "github.com/go-ego/gse/hmm/pos"
)

var (
    text = "Hello world, Helloworld. Winter is coming! こんにちは世界, 你好世界."

    new, _ = gse.New("zh,testdata/test_dict3.txt", "alpha")

    seg gse.Segmenter
    posSeg pos.Segmenter
)

func main() {
    // Loading the default dictionary
    seg.LoadDict()
    // Loading the default dictionary with embed
    // seg.LoadDictEmbed()
    // 
    // Loading the Simplified Chinese dictionary
    // seg.LoadDict("zh_s")
    // seg.LoadDictEmbed("zh_s")
    //
    // Loading the Traditional Chinese dictionary
    // seg.LoadDict("zh_t")
    // 
    // Loading the Japanese dictionary
    // seg.LoadDict("jp")
    // 
    // Load the dictionary
    // seg.LoadDict("your gopath"+"/src/github.com/go-ego/gse/data/dict/dictionary.txt")

    cut()

    segCut()
}

func cut() {
    hmm := new.Cut(text, true)
    fmt.Println("cut use hmm: ", hmm)

    hmm = new.CutSearch(text, true)
    fmt.Println("cut search use hmm: ", hmm)
    fmt.Println("analyze: ", new.Analyze(hmm, text))

    hmm = new.CutAll(text)
    fmt.Println("cut all: ", hmm)

    reg := regexp.MustCompile(`(\d+年|\d+月|\d+日|[\p{Latin}]+|[\p{Hangul}]+|\d+\.\d+|[a-zA-Z0-9]+)`)
    text1 := `헬로월드 헬로 서울, 2021年09月10日, 3.14`
    hmm = seg.CutDAG(text1, reg)
    fmt.Println("Cut with hmm and regexp: ", hmm, hmm[0], hmm[6])
}

func analyzeAndTrim(cut []string) {
    a := seg.Analyze(cut, "")
    fmt.Println("analyze the segment: ", a)

    cut = seg.Trim(cut)
    fmt.Println("cut all: ", cut)

    fmt.Println(seg.String(text, true))
    fmt.Println(seg.Slice(text, true))
}

func cutPos() {
    po := seg.Pos(text, true)
    fmt.Println("pos: ", po)
    po = seg.TrimPos(po)
    fmt.Println("trim pos: ", po)

    pos.WithGse(seg)
    po = posSeg.Cut(text, true)
    fmt.Println("pos: ", po)

    po = posSeg.TrimWithPos(po, "zg")
    fmt.Println("trim pos: ", po)
}

func segCut() {
    // Text Segmentation
    tb := []byte(text)
    fmt.Println(seg.String(text, true))

    segments := seg.Segment(tb)
    // Handle word segmentation results, search mode
    fmt.Println(gse.ToString(segments, true))
}
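
To see what the regular expression passed to CutDAG above matches on its own, the standard-library sketch below runs the same pattern over the same sample text. It only shows the spans the pattern itself selects; how gse combines them with dictionary lookup is not shown, and the `regexTokens` helper is a name made up for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// tokenRe is the pattern from the example above: dates with 年/月/日 suffixes,
// Latin and Hangul runs, decimal numbers, and ASCII alphanumerics.
var tokenRe = regexp.MustCompile(`(\d+年|\d+月|\d+日|[\p{Latin}]+|[\p{Hangul}]+|\d+\.\d+|[a-zA-Z0-9]+)`)

// regexTokens returns every non-overlapping match of tokenRe in s.
func regexTokens(s string) []string {
	return tokenRe.FindAllString(s, -1)
}

func main() {
	for i, tok := range regexTokens(`헬로월드 헬로 서울, 2021年09月10日, 3.14`) {
		fmt.Printf("%d: %q\n", i, tok)
	}
	// Prints seven tokens:
	// "헬로월드", "헬로", "서울", "2021年", "09月", "10日", "3.14"
}
```

Alternatives are tried left to right, so `\d+年` claims "2021年" before the generic `[a-zA-Z0-9]+` branch can match the digits alone.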

Look at a custom dictionary example

package main

import (
    "fmt"
    _ "embed"

    "github.com/go-ego/gse"
)

//go:embed test_dict3.txt
var testDict string

func main() {
    // var seg gse.Segmenter
    // seg.LoadDict("zh, testdata/test_dict.txt, testdata/test_dict1.txt")
    // seg.LoadStop()
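    seg, err := gse.NewEmbed("zh, word 20 n"+testDict, "en")
    if err != nil {
        // Stop early if the embedded dictionary cannot be loaded
        fmt.Println("load embedded dictionary error: ", err)
        return
    }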
    // seg.LoadDictEmbed()
    seg.LoadStopEmbed()

    text1 := "Hello world, こんにちは世界, 你好世界!"
    s1 := seg.Cut(text1, true)
    fmt.Println(s1)
    fmt.Println("trim: ", seg.Trim(s1))
    fmt.Println("stop: ", seg.Stop(s1))
    fmt.Println(seg.String(text1, true))

    segments := seg.Segment([]byte(text1))
    fmt.Println(gse.ToString(segments))
}

Look at a Chinese example

Look at a Japanese example

Elasticsearch

How to use it with elasticsearch?

go-gse-elastic

Authors

License

Gse is primarily distributed under the terms of "both the MIT license and the Apache License (Version 2.0)". See LICENSE-APACHE, LICENSE-MIT.

Thanks to sego and jieba (jiebago).
