prose alternatives and similar packages

Based on the "Natural Language Processing" category. Alternatively, view prose alternatives based on common mentions on social networks and blogs.

- gse: Go efficient multilingual NLP and text segmentation; supports English, Chinese, Japanese, and others.
- universal-translator: :speech_balloon: i18n translator for Go/Golang using CLDR data + pluralization rules.
- locales: :earth_americas: a set of locales generated from the CLDR Project which can be used independently or within an i18n package; built for use with, but not exclusive to, https://github.com/go-playground/universal-translator.
- go-nlp: DISCONTINUED. Utilities for working with discrete probability distributions and other tools useful for doing NLP work.
- segment: A Go library for performing Unicode Text Segmentation as described in Unicode Standard Annex #29.
- go-localize: i18n (internationalization and localization) engine written in Go, used for translating locale strings.
- gotokenizer: A tokenizer based on dictionary and bigram language models for Go. (Currently supports only Chinese segmentation.)

README
prose

prose is a natural language processing library (English only, at the moment) in pure Go. It supports tokenization, segmentation, part-of-speech tagging, and named-entity extraction.

You can find a more detailed summary of the library's performance here: Introducing prose v2.0.0: Bringing NLP to Go.
Installation
$ go get github.com/jdkato/prose/v2
Usage
Contents
Overview
	package main

	import (
		"fmt"
		"log"

		"github.com/jdkato/prose/v2"
	)

	func main() {
		// Create a new document with the default configuration:
		doc, err := prose.NewDocument("Go is an open-source programming language created at Google.")
		if err != nil {
			log.Fatal(err)
		}

		// Iterate over the doc's tokens:
		for _, tok := range doc.Tokens() {
			fmt.Println(tok.Text, tok.Tag, tok.Label)
			// Go NNP B-GPE
			// is VBZ O
			// an DT O
			// ...
		}

		// Iterate over the doc's named-entities:
		for _, ent := range doc.Entities() {
			fmt.Println(ent.Text, ent.Label)
			// Go GPE
			// Google GPE
		}

		// Iterate over the doc's sentences:
		for _, sent := range doc.Sentences() {
			fmt.Println(sent.Text)
			// Go is an open-source programming language created at Google.
		}
	}
The document-creation process adheres to the following sequence of steps:

	tokenization -> POS tagging -> NE extraction
	      \
	       segmentation
Each step may be disabled (assuming later steps aren't required) by passing the appropriate functional option. To disable named-entity extraction, for example, you'd do the following:
	doc, err := prose.NewDocument(
		"Go is an open-source programming language created at Google.",
		prose.WithExtraction(false))
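The `prose.With…` options follow Go's functional-options pattern. The generic, self-contained sketch below shows how that pattern works; the `docConfig` type and option names here are illustrative stand-ins, not prose's actual internals.

```go
package main

import "fmt"

// docConfig mirrors the kind of settings such options toggle
// (hypothetical; not prose's real configuration type).
type docConfig struct {
	Extraction   bool
	Segmentation bool
}

// An option is a function that mutates the configuration.
type option func(*docConfig)

// withExtraction mimics the shape of prose.WithExtraction.
func withExtraction(on bool) option {
	return func(c *docConfig) { c.Extraction = on }
}

// newConfig applies options over sensible defaults.
func newConfig(opts ...option) docConfig {
	c := docConfig{Extraction: true, Segmentation: true} // everything on by default
	for _, opt := range opts {
		opt(&c)
	}
	return c
}

func main() {
	c := newConfig(withExtraction(false))
	fmt.Println(c.Extraction, c.Segmentation) // false true
}
```

Because unset options fall back to defaults, callers only name the behavior they want to change.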
Tokenizing

prose includes a tokenizer capable of processing modern text, including the non-word character spans shown below.

| Type            | Example                         |
|-----------------|---------------------------------|
| Email addresses | [email protected]               |
| Hashtags        | #trending                       |
| Mentions        | @jdkato                         |
| URLs            | https://github.com/jdkato/prose |
| Emoticons       | :-), >:(, o_0, etc.             |
	package main

	import (
		"fmt"
		"log"

		"github.com/jdkato/prose/v2"
	)

	func main() {
		// Create a new document with the default configuration:
		doc, err := prose.NewDocument("@jdkato, go to http://example.com thanks :).")
		if err != nil {
			log.Fatal(err)
		}

		// Iterate over the doc's tokens:
		for _, tok := range doc.Tokens() {
			fmt.Println(tok.Text, tok.Tag)
			// @jdkato NN
			// , ,
			// go VB
			// to TO
			// http://example.com NN
			// thanks NNS
			// :) SYM
			// . .
		}
	}
Segmenting

prose includes one of the most accurate sentence segmenters available, according to the Golden Rules created by the developers of the pragmatic_segmenter.

| Name                | Language | License   | GRS (English)  | GRS (Other) | Speed†  |
|---------------------|----------|-----------|----------------|-------------|---------|
| Pragmatic Segmenter | Ruby     | MIT       | 98.08% (51/52) | 100.00%     | 3.84 s  |
| prose               | Go       | MIT       | 75.00% (39/52) | N/A         | 0.96 s  |
| TactfulTokenizer    | Ruby     | GNU GPLv3 | 65.38% (34/52) | 48.57%      | 46.32 s |
| OpenNLP             | Java     | APLv2     | 59.62% (31/52) | 45.71%      | 1.27 s  |
| Stanford CoreNLP    | Java     | GNU GPLv3 | 59.62% (31/52) | 31.43%      | 0.92 s  |
| Splitta             | Python   | APLv2     | 55.77% (29/52) | 37.14%      | N/A     |
| Punkt               | Python   | APLv2     | 46.15% (24/52) | 48.57%      | 1.79 s  |
| SRX English         | Ruby     | GNU GPLv3 | 30.77% (16/52) | 28.57%      | 6.19 s  |
| Scapel              | Ruby     | GNU GPLv3 | 28.85% (15/52) | 20.00%      | 0.13 s  |
† The original tests were performed using a MacBook Pro 3.7 GHz Quad-Core Intel Xeon E5 running 10.9.5, while prose was timed using a MacBook Pro 2.9 GHz Intel Core i7 running 10.13.3.
	package main

	import (
		"fmt"
		"strings"

		"github.com/jdkato/prose/v2"
	)

	func main() {
		// Create a new document with the default configuration:
		doc, _ := prose.NewDocument(strings.Join([]string{
			"I can see Mt. Fuji from here.",
			"St. Michael's Church is on 5th st. near the light."}, " "))

		// Iterate over the doc's sentences:
		sents := doc.Sentences()
		fmt.Println(len(sents)) // 2
		for _, sent := range sents {
			fmt.Println(sent.Text)
			// I can see Mt. Fuji from here.
			// St. Michael's Church is on 5th st. near the light.
		}
	}
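The abbreviations in that example ("Mt.", "St.", "st.") show why segmentation is harder than it looks. A naive splitter that breaks at every period-plus-space, sketched below, produces five fragments instead of the two real sentences; this is exactly the class of error the Golden Rules measure.

```go
package main

import (
	"fmt"
	"strings"
)

// naiveSplit breaks text at every ". " — the baseline a real
// sentence segmenter must improve on.
func naiveSplit(text string) []string {
	return strings.Split(text, ". ")
}

func main() {
	text := "I can see Mt. Fuji from here. " +
		"St. Michael's Church is on 5th st. near the light."
	parts := naiveSplit(text)
	fmt.Println(len(parts)) // 5 fragments, not the 2 real sentences
	for _, p := range parts {
		fmt.Println(p)
	}
}
```

Every abbreviation's period is wrongly treated as a sentence boundary, which is why rule- and model-based segmenters exist.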
Tagging

prose includes a tagger based on Textblob's "fast and accurate" POS tagger. Below is a comparison of its performance against NLTK's implementation of the same tagger on the Treebank corpus:

| Library | Accuracy | 5-Run Average (sec) |
|---------|----------|---------------------|
| NLTK    | 0.893    | 7.224               |
| prose   | 0.961    | 2.538               |

(See scripts/test_model.py for more information.)
The full list of supported POS tags is given below.

| TAG  | DESCRIPTION                               |
|------|-------------------------------------------|
| (    | left round bracket                        |
| )    | right round bracket                       |
| ,    | comma                                     |
| :    | colon                                     |
| .    | period                                    |
| ''   | closing quotation mark                    |
| ``   | opening quotation mark                    |
| #    | number sign                               |
| $    | currency                                  |
| CC   | conjunction, coordinating                 |
| CD   | cardinal number                           |
| DT   | determiner                                |
| EX   | existential there                         |
| FW   | foreign word                              |
| IN   | conjunction, subordinating or preposition |
| JJ   | adjective                                 |
| JJR  | adjective, comparative                    |
| JJS  | adjective, superlative                    |
| LS   | list item marker                          |
| MD   | verb, modal auxiliary                     |
| NN   | noun, singular or mass                    |
| NNP  | noun, proper singular                     |
| NNPS | noun, proper plural                       |
| NNS  | noun, plural                              |
| PDT  | predeterminer                             |
| POS  | possessive ending                         |
| PRP  | pronoun, personal                         |
| PRP$ | pronoun, possessive                       |
| RB   | adverb                                    |
| RBR  | adverb, comparative                       |
| RBS  | adverb, superlative                       |
| RP   | adverb, particle                          |
| SYM  | symbol                                    |
| TO   | infinitival to                            |
| UH   | interjection                              |
| VB   | verb, base form                           |
| VBD  | verb, past tense                          |
| VBG  | verb, gerund or present participle        |
| VBN  | verb, past participle                     |
| VBP  | verb, non-3rd person singular present     |
| VBZ  | verb, 3rd person singular present         |
| WDT  | wh-determiner                             |
| WP   | wh-pronoun, personal                      |
| WP$  | wh-pronoun, possessive                    |
| WRB  | wh-adverb                                 |
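As a small illustration of putting the tag set to work, the sketch below filters a token stream down to its nouns by matching the "NN" tag prefix (covering NN, NNS, NNP, and NNPS). The local `token` struct is a stand-in for prose's token type; the sample tags come from the overview example's output.

```go
package main

import (
	"fmt"
	"strings"
)

// token is a minimal stand-in pairing a word with its POS tag.
type token struct {
	Text string
	Tag  string
}

// nouns keeps tokens whose tag starts with "NN"
// (NN, NNS, NNP, NNPS per the table above).
func nouns(toks []token) []string {
	var out []string
	for _, t := range toks {
		if strings.HasPrefix(t.Tag, "NN") {
			out = append(out, t.Text)
		}
	}
	return out
}

func main() {
	// Tags taken from the overview example's output.
	toks := []token{
		{"Go", "NNP"}, {"is", "VBZ"}, {"an", "DT"},
		{"open-source", "JJ"}, {"programming", "NN"},
		{"language", "NN"}, {"created", "VBN"},
		{"at", "IN"}, {"Google", "NNP"}, {".", "."},
	}
	fmt.Println(nouns(toks)) // [Go programming language Google]
}
```

The same prefix trick works for other tag families, e.g. "VB" for all verb forms or "JJ" for all adjectives.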
NER

prose v2.0.0 includes a much improved version of v1.0.0's chunk package, which can identify people (PERSON) and geographical/political entities (GPE) by default.
	package main

	import (
		"fmt"

		"github.com/jdkato/prose/v2"
	)

	func main() {
		doc, _ := prose.NewDocument("Lebron James plays basketball in Los Angeles.")
		for _, ent := range doc.Entities() {
			fmt.Println(ent.Text, ent.Label)
			// Lebron James PERSON
			// Los Angeles GPE
		}
	}
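The B-/I-/O labels visible on tokens in the overview example (e.g. "Go NNP B-GPE") follow the common IOB chunking scheme: "B-" begins an entity, "I-" continues it, and "O" marks tokens outside any entity. A minimal decoder that groups such labels into entity spans could look like the self-contained sketch below (this is an illustration of the scheme, not prose's internal implementation).

```go
package main

import (
	"fmt"
	"strings"
)

// labeled pairs a token with its IOB label, e.g. "B-GPE", "I-GPE", "O".
type labeled struct {
	Text  string
	Label string
}

// decodeIOB groups B-/I- runs into "<span> <LABEL>" strings.
func decodeIOB(toks []labeled) []string {
	var ents []string
	var cur []string
	var curLabel string
	flush := func() {
		if len(cur) > 0 {
			ents = append(ents, strings.Join(cur, " ")+" "+curLabel)
			cur = nil
		}
	}
	for _, t := range toks {
		switch {
		case strings.HasPrefix(t.Label, "B-"):
			flush() // close any open span, start a new one
			curLabel = strings.TrimPrefix(t.Label, "B-")
			cur = []string{t.Text}
		case strings.HasPrefix(t.Label, "I-") && len(cur) > 0:
			cur = append(cur, t.Text) // continue the open span
		default: // "O" or a stray I- with no open span
			flush()
		}
	}
	flush()
	return ents
}

func main() {
	toks := []labeled{
		{"Lebron", "B-PERSON"}, {"James", "I-PERSON"},
		{"plays", "O"}, {"basketball", "O"}, {"in", "O"},
		{"Los", "B-GPE"}, {"Angeles", "I-GPE"}, {".", "O"},
	}
	fmt.Println(decodeIOB(toks)) // [Lebron James PERSON Los Angeles GPE]
}
```

This is how per-token labels like those in `doc.Tokens()` relate to the span-level results returned by `doc.Entities()`.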
However, in an attempt to make this feature more useful, we've made it straightforward to train your own models for specific use cases. See Prodigy + prose: Radically efficient machine teaching in Go for a tutorial.