Popularity

7.0

Stable

Activity

6.4

Stars 789

Watchers 23

Forks 52

Last Commit 7 days ago

Programming language: Go

License: MIT License

Tags: Natural Language Processing

Latest version: v2.3.4

kagome alternatives and similar packages

Based on the "Natural Language Processing" category.

prose

8.7 1.9 kagome VS prose

DISCONTINUED. :book: A Golang library for text processing, including tokenization, part-of-speech tagging, and named-entity extraction.
go-i18n

8.5 7.1 kagome VS go-i18n

Translate your Go program into multiple languages.


gse

8.4 4.4 kagome VS gse

Go efficient multilingual NLP and text segmentation; support English, Chinese, Japanese and others.
gojieba

8.4 1.9 kagome VS gojieba

Go version of the "Jieba" Chinese word segmenter
go-pinyin

7.9 4.5 kagome VS go-pinyin

Convert Chinese characters (Hanzi) to Pinyin
spaGO

7.9 0.0 kagome VS spaGO

DISCONTINUED. Self-contained Machine Learning and Natural Language Processing library in Go
when

7.6 5.1 kagome VS when

A natural language date/time parser with pluggable rules
whatlanggo

6.8 0.0 kagome VS whatlanggo

Natural language detection library for Go
nlp

6.3 0.0 kagome VS nlp

DISCONTINUED. [UNMAINTAINED] Extract values from strings and fill your structs with nlp.
sentences

6.2 4.5 kagome VS sentences

A multilingual command line sentence tokenizer in Golang
universal-translator

6.1 0.0 kagome VS universal-translator

:speech_balloon: i18n Translator for Go/Golang using CLDR data + pluralization rules
locales

5.9 0.0 kagome VS locales

:earth_americas: a set of locales generated from the CLDR Project which can be used independently or within an i18n package; these were built for use with, but not exclusive to https://github.com/go-playground/universal-translator
getlang

4.9 0.0 kagome VS getlang

Natural language detection package in pure Go
RAKE.go

4.5 0.0 kagome VS RAKE.go

A Go port of the Rapid Automatic Keyword Extraction algorithm (RAKE)
go-unidecode

4.5 3.1 kagome VS go-unidecode

ASCII transliterations of Unicode text.
go-nlp

4.3 0.0 kagome VS go-nlp

DISCONTINUED. Utilities for working with discrete probability distributions and other tools useful for doing NLP work.
segment

4.3 0.0 kagome VS segment

A Go library for performing Unicode Text Segmentation as described in Unicode Standard Annex #29
gounidecode

4.2 0.0 kagome VS gounidecode

Unicode transliterator for #golang
go-stem

3.9 0.0 kagome VS go-stem

Word Stemming in Go
textcat

3.8 0.0 kagome VS textcat

A Go package for n-gram based text categorization, with support for utf-8 and raw text
MMSEGO

3.6 0.0 kagome VS MMSEGO

Chinese word splitting algorithm MMSEG in GO
address

3.3 6.5 kagome VS address

Address handling for Go.
go-localize

3.3 0.0 kagome VS go-localize

i18n (Internationalization and localization) engine written in Go, used for translating locale strings.
go2vec

3.2 0.0 kagome VS go2vec

Read and use word2vec vectors in Go
stemmer

3.1 0.0 kagome VS stemmer

Stemmer packages for Go programming language. Includes English, German and Dutch stemmers.
petrovich

2.9 3.8 kagome VS petrovich

Golang port of Petrovich - an inflector for Russian anthroponyms.
porter2

2.9 0.0 kagome VS porter2

High Performance Porter2 Stemmer
iuliia-go

2.8 1.8 kagome VS iuliia-go

Transliterate Cyrillic → Latin in every possible way
dpar

2.8 3.2 kagome VS dpar

Neural network transition-based dependency parser (in Rust)
govader

2.7 0.0 kagome VS govader

vader sentiment analysis in go
go-mystem

2.6 0.0 kagome VS go-mystem

CGo bindings to Yandex.Mystem
go-tinydate

2.5 0.0 kagome VS go-tinydate

A tiny date object in Go. Tinydate uses only 4 bytes of memory
spreak

2.4 6.4 kagome VS spreak

Flexible translation and humanization library for Go, based on the concepts behind gettext.
snowball

2.4 0.0 L1 kagome VS snowball

Cgo binding for Snowball C library
paicehusk

2.4 0.0 kagome VS paicehusk

Golang implementation of the Paice/Husk Stemming Algorithm
detectlanguage

2.0 0.0 kagome VS detectlanguage

Detect Language API Go Client
gotokenizer

2.0 0.0 kagome VS gotokenizer

A tokenizer based on the dictionary and Bigram language models for Go. (Now only support chinese segmentation)
golibstemmer

2.0 0.0 kagome VS golibstemmer

Go bindings for the snowball libstemmer library including porter 2
libtextcat

1.8 0.0 kagome VS libtextcat

Cgo binding for libtextcat C library
icu

1.8 0.0 kagome VS icu

Cgo binding for icu4c library
t

1.8 3.5 kagome VS t

t: translation util for go, using GNU gettext
shamoji

1.3 0.0 kagome VS shamoji

The shamoji (杓文字) is a word filtering package
porter

1.2 0.0 kagome VS porter

porter stemmer
gosentiwordnet

0.9 0.0 kagome VS gosentiwordnet

💬 Sentiment analyzer library using SentiWordnet in Go
govader-backend

0.5 2.6 kagome VS govader-backend

Sentimental Analysis Microservice
go-eco

0.5 0.0 kagome VS go-eco

Automatically exported from code.google.com/p/go-eco
spelling-corrector

0.3 0.0 kagome VS spelling-corrector

Spelling corrector for Spanish language

* Code Quality Rankings and insights are calculated and provided by Lumnify.
They vary from L1 to L5 with "L5" being the highest.


README

Kagome v2

Kagome is an open source Japanese morphological analyzer written in pure Go. Dictionaries and statistical models such as MeCab-IPADIC and UniDic (unidic-mecab) can be embedded in the binary.

Improvements over v1:

Dictionaries are maintained in a separate repository, and only the dictionaries you need are embedded in the binary.
Several APIs have been refined and new ones added.

Dictionaries

dict	source	package
MeCab IPADIC	mecab-ipadic-2.7.0-20070801	github.com/ikawaha/kagome-dict/ipa
UniDIC	unidic-mecab-2.1.2_src	github.com/ikawaha/kagome-dict/uni

Experimental Features

dict	source	package
mecab-ipadic-NEologd	mecab-ipadic-neologd	github.com/ikawaha/kagome-ipa-neologd
Korean MeCab	mecab-ko-dic-2.1.1-20180720	github.com/ikawaha/kagome-dict-ko

Segmentation mode for search

Like Kuromoji, Kagome provides segmentation modes aimed at search.

Normal: Regular segmentation
Search: Uses a heuristic to do additional segmentation useful for search
Extended: Similar to search mode, but also splits unknown words into unigrams

Untokenized	Normal	Search	Extended
関西国際空港	関西国際空港	関西　国際　空港	関西　国際　空港
日本経済新聞	日本経済新聞	日本　経済　新聞	日本　経済　新聞
シニアソフトウェアエンジニア	シニアソフトウェアエンジニア	シニア　ソフトウェア　エンジニア	シニア　ソフトウェア　エンジニア
デジカメを買った	デジカメ　を　買っ　た	デジカメ　を　買っ　た	デ　ジ　カ　メ　を　買っ　た

Programming example

package main

import (
    "fmt"
    "strings"

    "github.com/ikawaha/kagome-dict/ipa"
    "github.com/ikawaha/kagome/v2/tokenizer"
)

func main() {
    t, err := tokenizer.New(ipa.Dict(), tokenizer.OmitBosEos())
    if err != nil {
        panic(err)
    }
    // wakati
    fmt.Println("---wakati---")
    seg := t.Wakati("すもももももももものうち")
    fmt.Println(seg)

    // tokenize
    fmt.Println("---tokenize---")
    tokens := t.Tokenize("すもももももももものうち")
    for _, token := range tokens {
        features := strings.Join(token.Features(), ",")
        fmt.Printf("%s\t%v\n", token.Surface, features)
    }
}

output:

---wakati---
[すもも も もも も もも の うち]
---tokenize---
すもも   名詞,一般,*,*,*,*,すもも,スモモ,スモモ
も 助詞,係助詞,*,*,*,*,も,モ,モ
もも  名詞,一般,*,*,*,*,もも,モモ,モモ
も 助詞,係助詞,*,*,*,*,も,モ,モ
もも  名詞,一般,*,*,*,*,もも,モモ,モモ
の 助詞,連体化,*,*,*,*,の,ノ,ノ
うち  名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ

Reference

Commands

Install

go install github.com/ikawaha/kagome/v2@latest

Homebrew tap

brew install ikawaha/kagome/kagome

Usage

$ kagome -h
Japanese Morphological Analyzer -- github.com/ikawaha/kagome/v2
usage: kagome <command>
The commands are:
   [tokenize] - command line tokenize (*default)
   server - run tokenize server
   lattice - lattice viewer
   sentence - tiny sentence splitter
   version - show version

tokenize [-file input_file] [-dict dic_file] [-userdict userdic_file] [-sysdict (ipa|uni)] [-simple false] [-mode (normal|search|extended)] [-split] [-json]
  -dict string
        dict
  -file string
        input file
  -json
        outputs in JSON format
  -mode string
        tokenize mode (normal|search|extended) (default "normal")
  -simple
        display abbreviated dictionary contents
  -split
        use tiny sentence splitter
  -sysdict string
        system dict type (ipa|uni) (default "ipa")
  -udict string
        user dict

Tokenize command

% # interactive mode
% kagome
すもももももももものうち
すもも   名詞,一般,*,*,*,*,すもも,スモモ,スモモ
も 助詞,係助詞,*,*,*,*,も,モ,モ
もも  名詞,一般,*,*,*,*,もも,モモ,モモ
も 助詞,係助詞,*,*,*,*,も,モ,モ
もも  名詞,一般,*,*,*,*,もも,モモ,モモ
の 助詞,連体化,*,*,*,*,の,ノ,ノ
うち  名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ
EOS

% # piped standard input
% echo "すもももももももものうち" | kagome
すもも  名詞,一般,*,*,*,*,すもも,スモモ,スモモ
も      助詞,係助詞,*,*,*,*,も,モ,モ
もも    名詞,一般,*,*,*,*,もも,モモ,モモ
も      助詞,係助詞,*,*,*,*,も,モ,モ
もも    名詞,一般,*,*,*,*,もも,モモ,モモ
の      助詞,連体化,*,*,*,*,の,ノ,ノ
うち    名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ
EOS

% # JSON output
% echo "猫" | kagome -json | jq .
[
  {
    "id": 286994,
    "start": 0,
    "end": 1,
    "surface": "猫",
    "class": "KNOWN",
    "pos": [
      "名詞",
      "一般",
      "*",
      "*"
    ],
    "base_form": "猫",
    "reading": "ネコ",
    "pronunciation": "ネコ",
    "features": [
      "名詞",
      "一般",
      "*",
      "*",
      "*",
      "*",
      "猫",
      "ネコ",
      "ネコ"
    ]
  }
]

% echo "私ははにわよわわわんわん" | kagome -json | jq -r '.[].pronunciation'
ワタシ
ワ
ハニワ
ヨ
ワ
ワ
ワンワン
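
The JSON output can also be consumed from Go instead of jq. The following is a minimal sketch using only the standard library. The Token struct and the helper names decodeTokens and pronunciations are hypothetical, written for this example; the field names are taken from the sample output above, not from the kagome package itself.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Token mirrors the fields visible in the sample `kagome -json` output.
// It is an illustrative type, not one exported by kagome.
type Token struct {
	ID            int      `json:"id"`
	Start         int      `json:"start"`
	End           int      `json:"end"`
	Surface       string   `json:"surface"`
	Class         string   `json:"class"`
	POS           []string `json:"pos"`
	BaseForm      string   `json:"base_form"`
	Reading       string   `json:"reading"`
	Pronunciation string   `json:"pronunciation"`
	Features      []string `json:"features"`
}

// decodeTokens parses a JSON array of tokens as printed by `kagome -json`.
func decodeTokens(data []byte) ([]Token, error) {
	var tokens []Token
	if err := json.Unmarshal(data, &tokens); err != nil {
		return nil, fmt.Errorf("decode tokens: %w", err)
	}
	return tokens, nil
}

// pronunciations returns the pronunciation of each token in input order.
func pronunciations(tokens []Token) []string {
	out := make([]string, 0, len(tokens))
	for _, tok := range tokens {
		out = append(out, tok.Pronunciation)
	}
	return out
}

func main() {
	// The sample below is the "猫" output shown above, on a single line.
	sample := []byte(`[{"id":286994,"start":0,"end":1,"surface":"猫","class":"KNOWN","pos":["名詞","一般","*","*"],"base_form":"猫","reading":"ネコ","pronunciation":"ネコ","features":["名詞","一般","*","*","*","*","猫","ネコ","ネコ"]}]`)

	tokens, err := decodeTokens(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(strings.Join(pronunciations(tokens), " "))
}
```

In practice the bytes would come from the standard output of `kagome -json` rather than from an inline literal.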

Server command

API

Start a server and try to access the "/tokenize" endpoint.

% kagome server &
% curl -XPUT localhost:6060/tokenize -d'{"sentence":"すもももももももものうち", "mode":"normal"}' | jq .
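
The same request can be issued from a Go program. This is a sketch under stated assumptions: the server is already running on localhost:6060 as in the curl example, the helper names tokenizeBody, newTokenizeRequest and tokenize are hypothetical, and the Content-Type header is an assumption (the curl command above does not set one). The response schema is not described here, so the body is printed as-is.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// tokenizeRequest matches the JSON payload used in the curl example.
type tokenizeRequest struct {
	Sentence string `json:"sentence"`
	Mode     string `json:"mode"`
}

// tokenizeBody builds the JSON payload for the /tokenize endpoint.
func tokenizeBody(sentence, mode string) ([]byte, error) {
	return json.Marshal(tokenizeRequest{Sentence: sentence, Mode: mode})
}

// newTokenizeRequest builds a PUT request against baseURL + "/tokenize".
func newTokenizeRequest(baseURL, sentence, mode string) (*http.Request, error) {
	body, err := tokenizeBody(sentence, mode)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPut, baseURL+"/tokenize", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	// Assumption: the server accepts a JSON content type.
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

// tokenize sends the request and returns the raw response body.
func tokenize(client *http.Client, baseURL, sentence, mode string) ([]byte, error) {
	req, err := newTokenizeRequest(baseURL, sentence, mode)
	if err != nil {
		return nil, err
	}
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

func main() {
	out, err := tokenize(http.DefaultClient, "http://localhost:6060", "すもももももももものうち", "normal")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println(string(out))
}
```

Splitting request construction from sending keeps the payload easy to inspect without a running server.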

Web App

Start a server and access http://localhost:6060. (The demo application uses Graphviz to draw the lattice, so Graphviz must be installed.)

% kagome server &

Lattice command

A debugging tool for the tokenization process that outputs the lattice in Graphviz dot format.

% kagome lattice 私は鰻 | dot -Tpng -o lattice.png

Docker

Building to WebAssembly

You can see how the kagome WebAssembly build works on the demo site. The source code can be found in ./sample/wasm.

Licence

MIT

kagome

Self-contained Japanese Morphological Analyzer written in pure Go