prose alternatives and similar packages

Based on the "Natural Language Processing" category. Alternatively, view prose alternatives based on common mentions on social networks and blogs.

- gse: Go efficient multilingual NLP and text segmentation; supports English, Chinese, Japanese, and others.
- universal-translator: :speech_balloon: i18n translator for Go/Golang using CLDR data + pluralization rules.
- locales: :earth_americas: a set of locales generated from the CLDR Project which can be used independently or within an i18n package; built for use with, but not exclusive to, https://github.com/go-playground/universal-translator.
- go-nlp: DISCONTINUED. Utilities for working with discrete probability distributions and other tools useful for doing NLP work.
- segment: A Go library for performing Unicode Text Segmentation as described in Unicode Standard Annex #29.
- go-localize: i18n (internationalization and localization) engine written in Go, used for translating locale strings.
- gotokenizer: A tokenizer based on dictionary and bigram language models for Go. (Currently supports only Chinese segmentation.)

README
prose

prose is a natural language processing library (English only, at the moment) in pure Go. It supports tokenization, segmentation, part-of-speech tagging, and named-entity extraction.

You can find a more detailed summary of the library's performance here: Introducing prose v2.0.0: Bringing NLP to Go.
Installation
$ go get github.com/jdkato/prose/v2
Usage
Contents
Overview
	package main

	import (
		"fmt"
		"log"

		"github.com/jdkato/prose/v2"
	)

	func main() {
		// Create a new document with the default configuration:
		doc, err := prose.NewDocument("Go is an open-source programming language created at Google.")
		if err != nil {
			log.Fatal(err)
		}

		// Iterate over the doc's tokens:
		for _, tok := range doc.Tokens() {
			fmt.Println(tok.Text, tok.Tag, tok.Label)
			// Go NNP B-GPE
			// is VBZ O
			// an DT O
			// ...
		}

		// Iterate over the doc's named-entities:
		for _, ent := range doc.Entities() {
			fmt.Println(ent.Text, ent.Label)
			// Go GPE
			// Google GPE
		}

		// Iterate over the doc's sentences:
		for _, sent := range doc.Sentences() {
			fmt.Println(sent.Text)
			// Go is an open-source programming language created at Google.
		}
	}
The document-creation process adheres to the following sequence of steps:

	tokenization -> POS tagging -> NE extraction
	      \
	       segmentation
Each step may be disabled (assuming later steps aren't required) by passing the appropriate functional option. To disable named-entity extraction, for example, you'd do the following:
	doc, err := prose.NewDocument(
		"Go is an open-source programming language created at Google.",
		prose.WithExtraction(false))
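The `prose.With…` options follow Go's functional-options pattern. The generic, self-contained sketch below shows how that pattern works; the `docConfig` type and option names here are illustrative stand-ins, not prose's actual internals.

```go
package main

import "fmt"

// docConfig mirrors the kind of settings such options toggle
// (hypothetical; not prose's real configuration type).
type docConfig struct {
	Extraction   bool
	Segmentation bool
}

// An option is a function that mutates the configuration.
type option func(*docConfig)

// withExtraction mimics the shape of prose.WithExtraction.
func withExtraction(on bool) option {
	return func(c *docConfig) { c.Extraction = on }
}

// newConfig applies options over sensible defaults.
func newConfig(opts ...option) docConfig {
	c := docConfig{Extraction: true, Segmentation: true} // everything on by default
	for _, opt := range opts {
		opt(&c)
	}
	return c
}

func main() {
	c := newConfig(withExtraction(false))
	fmt.Println(c.Extraction, c.Segmentation) // false true
}
```

Because unset options fall back to defaults, callers only name the behavior they want to change.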
Tokenizing

prose includes a tokenizer capable of processing modern text, including the non-word character spans shown below.

| Type            | Example                         |
|-----------------|---------------------------------|
| Email addresses | [email protected]               |
| Hashtags        | #trending                       |
| Mentions        | @jdkato                         |
| URLs            | https://github.com/jdkato/prose |
| Emoticons       | :-), >:(, o_0, etc.             |
	package main

	import (
		"fmt"
		"log"

		"github.com/jdkato/prose/v2"
	)

	func main() {
		// Create a new document with the default configuration:
		doc, err := prose.NewDocument("@jdkato, go to http://example.com thanks :).")
		if err != nil {
			log.Fatal(err)
		}

		// Iterate over the doc's tokens:
		for _, tok := range doc.Tokens() {
			fmt.Println(tok.Text, tok.Tag)
			// @jdkato NN
			// , ,
			// go VB
			// to TO
			// http://example.com NN
			// thanks NNS
			// :) SYM
			// . .
		}
	}
Segmenting

prose includes one of the most accurate sentence segmenters available, according to the Golden Rules created by the developers of the pragmatic_segmenter.

| Name                | Language | License   | GRS (English)  | GRS (Other) | Speed†  |
|---------------------|----------|-----------|----------------|-------------|---------|
| Pragmatic Segmenter | Ruby     | MIT       | 98.08% (51/52) | 100.00%     | 3.84 s  |
| prose               | Go       | MIT       | 75.00% (39/52) | N/A         | 0.96 s  |
| TactfulTokenizer    | Ruby     | GNU GPLv3 | 65.38% (34/52) | 48.57%      | 46.32 s |
| OpenNLP             | Java     | APLv2     | 59.62% (31/52) | 45.71%      | 1.27 s  |
| Stanford CoreNLP    | Java     | GNU GPLv3 | 59.62% (31/52) | 31.43%      | 0.92 s  |
| Splitta             | Python   | APLv2     | 55.77% (29/52) | 37.14%      | N/A     |
| Punkt               | Python   | APLv2     | 46.15% (24/52) | 48.57%      | 1.79 s  |
| SRX English         | Ruby     | GNU GPLv3 | 30.77% (16/52) | 28.57%      | 6.19 s  |
| Scapel              | Ruby     | GNU GPLv3 | 28.85% (15/52) | 20.00%      | 0.13 s  |
† The original tests were performed using a MacBook Pro 3.7 GHz Quad-Core Intel Xeon E5 running 10.9.5, while prose was timed using a MacBook Pro 2.9 GHz Intel Core i7 running 10.13.3.
	package main

	import (
		"fmt"
		"strings"

		"github.com/jdkato/prose/v2"
	)

	func main() {
		// Create a new document with the default configuration:
		doc, _ := prose.NewDocument(strings.Join([]string{
			"I can see Mt. Fuji from here.",
			"St. Michael's Church is on 5th st. near the light."}, " "))

		// Iterate over the doc's sentences:
		sents := doc.Sentences()
		fmt.Println(len(sents)) // 2
		for _, sent := range sents {
			fmt.Println(sent.Text)
			// I can see Mt. Fuji from here.
			// St. Michael's Church is on 5th st. near the light.
		}
	}
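The abbreviations in that example ("Mt.", "St.", "st.") show why segmentation is harder than it looks. A naive splitter that breaks at every period-plus-space, sketched below, produces five fragments instead of the two real sentences; this is exactly the class of error the Golden Rules measure.

```go
package main

import (
	"fmt"
	"strings"
)

// naiveSplit breaks text at every ". " — the baseline a real
// sentence segmenter must improve on.
func naiveSplit(text string) []string {
	return strings.Split(text, ". ")
}

func main() {
	text := "I can see Mt. Fuji from here. " +
		"St. Michael's Church is on 5th st. near the light."
	parts := naiveSplit(text)
	fmt.Println(len(parts)) // 5 fragments, not the 2 real sentences
	for _, p := range parts {
		fmt.Println(p)
	}
}
```

Every abbreviation's period is wrongly treated as a sentence boundary, which is why rule- and model-based segmenters exist.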
Tagging

prose includes a tagger based on Textblob's "fast and accurate" POS tagger. Below is a comparison of its performance against NLTK's implementation of the same tagger on the Treebank corpus:

| Library | Accuracy | 5-Run Average (sec) |
|---------|----------|---------------------|
| NLTK    | 0.893    | 7.224               |
| prose   | 0.961    | 2.538               |

(See scripts/test_model.py for more information.)
The full list of supported POS tags is given below.

| TAG  | DESCRIPTION                               |
|------|-------------------------------------------|
| (    | left round bracket                        |
| )    | right round bracket                       |
| ,    | comma                                     |
| :    | colon                                     |
| .    | period                                    |
| ''   | closing quotation mark                    |
| ``   | opening quotation mark                    |
| #    | number sign                               |
| $    | currency                                  |
| CC   | conjunction, coordinating                 |
| CD   | cardinal number                           |
| DT   | determiner                                |
| EX   | existential there                         |
| FW   | foreign word                              |
| IN   | conjunction, subordinating or preposition |
| JJ   | adjective                                 |
| JJR  | adjective, comparative                    |
| JJS  | adjective, superlative                    |
| LS   | list item marker                          |
| MD   | verb, modal auxiliary                     |
| NN   | noun, singular or mass                    |
| NNP  | noun, proper singular                     |
| NNPS | noun, proper plural                       |
| NNS  | noun, plural                              |
| PDT  | predeterminer                             |
| POS  | possessive ending                         |
| PRP  | pronoun, personal                         |
| PRP$ | pronoun, possessive                       |
| RB   | adverb                                    |
| RBR  | adverb, comparative                       |
| RBS  | adverb, superlative                       |
| RP   | adverb, particle                          |
| SYM  | symbol                                    |
| TO   | infinitival to                            |
| UH   | interjection                              |
| VB   | verb, base form                           |
| VBD  | verb, past tense                          |
| VBG  | verb, gerund or present participle        |
| VBN  | verb, past participle                     |
| VBP  | verb, non-3rd person singular present     |
| VBZ  | verb, 3rd person singular present         |
| WDT  | wh-determiner                             |
| WP   | wh-pronoun, personal                      |
| WP$  | wh-pronoun, possessive                    |
| WRB  | wh-adverb                                 |
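As a small illustration of putting the tag set to work, the sketch below filters a token stream down to its nouns by matching the "NN" tag prefix (covering NN, NNS, NNP, and NNPS). The local `token` struct is a stand-in for prose's token type; the sample tags come from the overview example's output.

```go
package main

import (
	"fmt"
	"strings"
)

// token is a minimal stand-in pairing a word with its POS tag.
type token struct {
	Text string
	Tag  string
}

// nouns keeps tokens whose tag starts with "NN"
// (NN, NNS, NNP, NNPS per the table above).
func nouns(toks []token) []string {
	var out []string
	for _, t := range toks {
		if strings.HasPrefix(t.Tag, "NN") {
			out = append(out, t.Text)
		}
	}
	return out
}

func main() {
	// Tags taken from the overview example's output.
	toks := []token{
		{"Go", "NNP"}, {"is", "VBZ"}, {"an", "DT"},
		{"open-source", "JJ"}, {"programming", "NN"},
		{"language", "NN"}, {"created", "VBN"},
		{"at", "IN"}, {"Google", "NNP"}, {".", "."},
	}
	fmt.Println(nouns(toks)) // [Go programming language Google]
}
```

The same prefix trick works for other tag families, e.g. "VB" for all verb forms or "JJ" for all adjectives.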
NER

prose v2.0.0 includes a much improved version of v1.0.0's chunk package, which can identify people (PERSON) and geographical/political entities (GPE) by default.
	package main

	import (
		"fmt"

		"github.com/jdkato/prose/v2"
	)

	func main() {
		doc, _ := prose.NewDocument("Lebron James plays basketball in Los Angeles.")
		for _, ent := range doc.Entities() {
			fmt.Println(ent.Text, ent.Label)
			// Lebron James PERSON
			// Los Angeles GPE
		}
	}
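The B-/I-/O labels visible on tokens in the overview example (e.g. "Go NNP B-GPE") follow the common IOB chunking scheme: "B-" begins an entity, "I-" continues it, and "O" marks tokens outside any entity. A minimal decoder that groups such labels into entity spans could look like the self-contained sketch below (this is an illustration of the scheme, not prose's internal implementation).

```go
package main

import (
	"fmt"
	"strings"
)

// labeled pairs a token with its IOB label, e.g. "B-GPE", "I-GPE", "O".
type labeled struct {
	Text  string
	Label string
}

// decodeIOB groups B-/I- runs into "<span> <LABEL>" strings.
func decodeIOB(toks []labeled) []string {
	var ents []string
	var cur []string
	var curLabel string
	flush := func() {
		if len(cur) > 0 {
			ents = append(ents, strings.Join(cur, " ")+" "+curLabel)
			cur = nil
		}
	}
	for _, t := range toks {
		switch {
		case strings.HasPrefix(t.Label, "B-"):
			flush() // close any open span, start a new one
			curLabel = strings.TrimPrefix(t.Label, "B-")
			cur = []string{t.Text}
		case strings.HasPrefix(t.Label, "I-") && len(cur) > 0:
			cur = append(cur, t.Text) // continue the open span
		default: // "O" or a stray I- with no open span
			flush()
		}
	}
	flush()
	return ents
}

func main() {
	toks := []labeled{
		{"Lebron", "B-PERSON"}, {"James", "I-PERSON"},
		{"plays", "O"}, {"basketball", "O"}, {"in", "O"},
		{"Los", "B-GPE"}, {"Angeles", "I-GPE"}, {".", "O"},
	}
	fmt.Println(decodeIOB(toks)) // [Lebron James PERSON Los Angeles GPE]
}
```

This is how per-token labels like those in `doc.Tokens()` relate to the span-level results returned by `doc.Entities()`.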
However, in an attempt to make this feature more useful, we've made it straightforward to train your own models for specific use cases. See Prodigy + prose: Radically efficient machine teaching in Go for a tutorial.