kagome alternatives and similar packages
Based on the "Natural Language Processing" category.
- prose: A library for text processing that supports tokenization, part-of-speech tagging, named-entity extraction, and more.
- gojieba: A Go implementation of jieba, a Chinese word segmentation algorithm.
- gse: Efficient text segmentation for Go; supports English, Chinese, Japanese, and other languages.
- spaGO: Self-contained machine learning and natural language processing library in Go.
- whatlanggo: A natural language detection package for Go. Supports 84 languages and 24 scripts (writing systems, e.g. Latin, Cyrillic, etc.).
- locales: A set of locales generated from the CLDR Project which can be used independently or within an i18n package; built for use with, but not exclusive to, https://github.com/go-playground/universal-translator.
- universal-translator: An i18n translator for Go/Golang using CLDR data and pluralization rules.
- go-nlp: Utilities for working with discrete probability distributions and other tools useful for doing NLP work.
- RAKE.go: A Go port of the Rapid Automatic Keyword Extraction (RAKE) algorithm.
- segment: A Go library for performing Unicode text segmentation, as described in Unicode Standard Annex #29.
- textcat: A Go package for n-gram based text categorization, with support for UTF-8 and raw text.
- MMSEGO: A Go implementation of MMSEG, a Chinese word segmentation algorithm.
- stemmer: Stemmer packages for the Go programming language. Includes English and German stemmers.
- petrovich: A library that inflects Russian names to a given grammatical case.
- snowball: Snowball stemmer port (cgo wrapper) for Go. Provides word stem extraction via the native Snowball implementation.
- go-localize: Simple and easy to use i18n (internationalization and localization) engine.
- golibstemmer: Go bindings for the Snowball libstemmer library, including Porter 2.
- libtextcat: Cgo binding for the libtextcat C library. Guaranteed compatibility with version 2.2.
- icu: Cgo binding for the icu4c C library's detection and conversion functions. Guaranteed compatibility with version 50.1.
- go-tinydate: A tiny date object in Go that uses only 4 bytes of memory.
- gotokenizer: A tokenizer based on dictionary and bigram language models for Go (currently only supports Chinese segmentation).
- porter: A fairly straightforward port of Martin Porter's C implementation of the Porter stemming algorithm.
- gosentiwordnet: Sentiment analyzer using the SentiWordNet lexicon in Go.
- go-eco: Similarity, dissimilarity and distance matrices; diversity, equitability and inequality measures; species richness estimators; coenocline models.
- detectlanguage: Language Detection API Go client. Supports batch requests and short phrase or single word language detection.
README
Kagome v2
Kagome is an open source Japanese morphological analyzer written in pure Go. Dictionary/statistical models such as MeCab-IPADIC and UniDic (unidic-mecab) can be embedded in binaries.
Improvements from v1:
- Dictionaries are maintained in a separate repository, and only the dictionaries you need are embedded in the binary.
- Brushed up and added several APIs.
Dictionaries
dict | source | package |
---|---|---|
MeCab IPADIC | mecab-ipadic-2.7.0-20070801 | github.com/ikawaha/kagome-dict/ipa |
UniDIC | unidic-mecab-2.1.2_src | github.com/ikawaha/kagome-dict/uni |
Experimental Features
dict | source | package |
---|---|---|
mecab-ipadic-NEologd | mecab-ipadic-neologd | github.com/ikawaha/kagome-ipa-neologd |
Korean MeCab | mecab-ko-dic-2.1.1-20180720 | github.com/ikawaha/kagome-dict-ko |
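Each dictionary is published as its own Go module, so only the dictionaries you actually import get embedded in your binary. Below is a minimal sketch of switching the system dictionary to UniDic, assuming the uni package exposes a Dict() constructor analogous to the ipa.Dict() used in the programming example later in this README:

```go
package main

import (
	"fmt"

	"github.com/ikawaha/kagome-dict/uni" // embeds UniDic instead of MeCab IPADIC
	"github.com/ikawaha/kagome/v2/tokenizer"
)

func main() {
	// Only the UniDic data is compiled into this binary, because it is the
	// only dictionary module imported.
	t, err := tokenizer.New(uni.Dict(), tokenizer.OmitBosEos())
	if err != nil {
		panic(err)
	}
	fmt.Println(t.Wakati("すもももももももものうち"))
}
```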
Segmentation mode for search
Kagome provides segmentation modes for search, similar to Kuromoji.
- Normal: Regular segmentation
- Search: Use a heuristic to do additional segmentation useful for search
- Extended: Similar to search mode, but also uni-gram unknown words
Untokenized | Normal | Search | Extended |
---|---|---|---|
関西国際空港 | 関西国際空港 | 関西 国際 空港 | 関西 国際 空港 |
日本経済新聞 | 日本経済新聞 | 日本 経済 新聞 | 日本 経済 新聞 |
シニアソフトウェアエンジニア | シニアソフトウェアエンジニア | シニア ソフトウェア エンジニア | シニア ソフトウェア エンジニア |
デジカメを買った | デジカメ を 買っ た | デジカメ を 買っ た | デジ カメ を 買っ た |
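Programmatically, the mode is chosen per call. The sketch below assumes the v2 tokenizer's Analyze method and its Normal/Search/Extended mode constants (check the tokenizer package documentation if the names differ in your version):

```go
package main

import (
	"fmt"
	"strings"

	"github.com/ikawaha/kagome-dict/ipa"
	"github.com/ikawaha/kagome/v2/tokenizer"
)

func main() {
	t, err := tokenizer.New(ipa.Dict(), tokenizer.OmitBosEos())
	if err != nil {
		panic(err)
	}
	// Search mode splits long compounds, e.g. 関西国際空港 -> 関西 / 国際 / 空港.
	var surfaces []string
	for _, token := range t.Analyze("関西国際空港", tokenizer.Search) {
		surfaces = append(surfaces, token.Surface)
	}
	fmt.Println(strings.Join(surfaces, " "))
}
```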
Programming example
```go
package main

import (
	"fmt"
	"strings"

	"github.com/ikawaha/kagome-dict/ipa"
	"github.com/ikawaha/kagome/v2/tokenizer"
)

func main() {
	t, err := tokenizer.New(ipa.Dict(), tokenizer.OmitBosEos())
	if err != nil {
		panic(err)
	}
	// wakati
	fmt.Println("---wakati---")
	seg := t.Wakati("すもももももももものうち")
	fmt.Println(seg)
	// tokenize
	fmt.Println("---tokenize---")
	tokens := t.Tokenize("すもももももももものうち")
	for _, token := range tokens {
		features := strings.Join(token.Features(), ",")
		fmt.Printf("%s\t%v\n", token.Surface, features)
	}
}
```
output:
---wakati---
[すもも も もも も もも の うち]
---tokenize---
すもも	名詞,一般,*,*,*,*,すもも,スモモ,スモモ
も	助詞,係助詞,*,*,*,*,も,モ,モ
もも	名詞,一般,*,*,*,*,もも,モモ,モモ
も	助詞,係助詞,*,*,*,*,も,モ,モ
もも	名詞,一般,*,*,*,*,もも,モモ,モモ
の	助詞,連体化,*,*,*,*,の,ノ,ノ
うち	名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ
Commands
Install
Go
env GO111MODULE=on go get -u github.com/ikawaha/kagome/v2
Homebrew tap
brew install ikawaha/kagome/kagome
Usage
$ kagome -h
Japanese Morphological Analyzer -- github.com/ikawaha/kagome/v2
usage: kagome <command>
The commands are:
[tokenize] - command line tokenize (*default)
server - run tokenize server
lattice - lattice viewer
version - show version
tokenize [-file input_file] [-dict dic_file] [-udict userdic_file] [-sysdict (ipa|uni)] [-simple false] [-mode (normal|search|extended)]
-dict string
dict
-file string
input file
-mode string
tokenize mode (normal|search|extended) (default "normal")
-simple
display abbreviated dictionary contents
-sysdict string
system dict type (ipa|uni) (default "ipa")
-udict string
user dict
Tokenize command
% kagome
すもももももももものうち
すもも	名詞,一般,*,*,*,*,すもも,スモモ,スモモ
も	助詞,係助詞,*,*,*,*,も,モ,モ
もも	名詞,一般,*,*,*,*,もも,モモ,モモ
も	助詞,係助詞,*,*,*,*,も,モ,モ
もも	名詞,一般,*,*,*,*,もも,モモ,モモ
の	助詞,連体化,*,*,*,*,の,ノ,ノ
うち	名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ
EOS
Server command
API
Start a server and try to access the "/tokenize" endpoint.
% kagome server &
% curl -XPUT localhost:6060/tokenize -d'{"sentence":"すもももももももものうち", "mode":"normal"}' | jq .
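The endpoint can also be called from Go. Here is a rough client sketch using only the standard library; the request body mirrors the curl example above, and the response is printed as-is rather than assuming a particular JSON schema:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

func main() {
	// Same payload as the curl example: a sentence and a tokenize mode.
	payload := strings.NewReader(`{"sentence":"すもももももももものうち", "mode":"normal"}`)
	req, err := http.NewRequest(http.MethodPut, "http://localhost:6060/tokenize", payload)
	if err != nil {
		panic(err)
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body)) // JSON containing the tokenization result
}
```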
Web App
Start a server and access http://localhost:6060.
(To draw a lattice, the demo application uses Graphviz. You need Graphviz installed.)
% kagome server &
Lattice command
A debug tool for the tokenization process; it outputs a lattice in Graphviz dot format.
% kagome lattice 私は鰻 | dot -Tpng -o lattice.png
Docker
Licence
MIT