Popularity

6.2

Stable

Activity

4.5

Stars 419

Watchers 15

Forks 38

Last Commit about 2 months ago

Programming language: Go

License: MIT License

Tags: Natural Language Processing

Latest version: v1.0.6

sentences alternatives and similar packages

Based on the "Natural Language Processing" category.

prose

8.7 1.9 sentences VS prose

DISCONTINUED. :book: A Golang library for text processing, including tokenization, part-of-speech tagging, and named-entity extraction.
go-i18n

8.5 7.1 sentences VS go-i18n

Translate your Go program into multiple languages.

gojieba

8.4 1.9 sentences VS gojieba

Golang version of the "Jieba" Chinese word segmentation library
gse

8.4 4.4 sentences VS gse

Go efficient multilingual NLP and text segmentation; support English, Chinese, Japanese and others.
go-pinyin

7.9 4.5 sentences VS go-pinyin

Convert Chinese characters to pinyin
spaGO

7.9 0.0 sentences VS spaGO

DISCONTINUED. Self-contained Machine Learning and Natural Language Processing library in Go
when

7.6 5.1 sentences VS when

A natural language date/time parser with pluggable rules
kagome

7.0 6.4 sentences VS kagome

Self-contained Japanese Morphological Analyzer written in pure Go
whatlanggo

6.8 0.0 sentences VS whatlanggo

Natural language detection library for Go
nlp

6.3 0.0 sentences VS nlp

DISCONTINUED. [UNMAINTAINED] Extract values from strings and fill your structs with nlp.
universal-translator

6.1 0.0 sentences VS universal-translator

:speech_balloon: i18n Translator for Go/Golang using CLDR data + pluralization rules
locales

5.9 0.0 sentences VS locales

:earth_americas: a set of locales generated from the CLDR Project which can be used independently or within an i18n package; these were built for use with, but not exclusive to https://github.com/go-playground/universal-translator
getlang

4.9 0.0 sentences VS getlang

Natural language detection package in pure Go
RAKE.go

4.5 0.0 sentences VS RAKE.go

A Go port of the Rapid Automatic Keyword Extraction algorithm (RAKE)
go-unidecode

4.5 3.1 sentences VS go-unidecode

ASCII transliterations of Unicode text.
go-nlp

4.3 0.0 sentences VS go-nlp

DISCONTINUED. Utilities for working with discrete probability distributions and other tools useful for doing NLP work.
segment

4.3 0.0 sentences VS segment

A Go library for performing Unicode Text Segmentation as described in Unicode Standard Annex #29
gounidecode

4.2 0.0 sentences VS gounidecode

Unicode transliterator for #golang
go-stem

3.9 0.0 sentences VS go-stem

Word Stemming in Go
textcat

3.8 0.0 sentences VS textcat

A Go package for n-gram based text categorization, with support for utf-8 and raw text
MMSEGO

3.6 0.0 sentences VS MMSEGO

Chinese word splitting algorithm MMSEG in GO
go-localize

3.3 0.0 sentences VS go-localize

i18n (Internationalization and localization) engine written in Go, used for translating locale strings.
address

3.3 6.5 sentences VS address

Address handling for Go.
go2vec

3.2 0.0 sentences VS go2vec

Read and use word2vec vectors in Go
stemmer

3.1 0.0 sentences VS stemmer

Stemmer packages for Go programming language. Includes English, German and Dutch stemmers.
petrovich

2.9 3.8 sentences VS petrovich

Golang port of Petrovich - an inflector for Russian anthroponyms.
porter2

2.9 0.0 sentences VS porter2

High Performance Porter2 Stemmer
iuliia-go

2.8 1.8 sentences VS iuliia-go

Transliterate Cyrillic → Latin in every possible way
dpar

2.8 3.2 sentences VS dpar

Neural network transition-based dependency parser (in Rust)
govader

2.7 0.0 sentences VS govader

vader sentiment analysis in go
go-mystem

2.6 0.0 sentences VS go-mystem

CGo bindings to Yandex.Mystem
go-tinydate

2.5 0.0 sentences VS go-tinydate

A tiny date object in Go. Tinydate uses only 4 bytes of memory
spreak

2.4 6.4 sentences VS spreak

Flexible translation and humanization library for Go, based on the concepts behind gettext.
paicehusk

2.4 0.0 sentences VS paicehusk

Golang implementation of the Paice/Husk Stemming Algorithm
snowball

2.4 0.0 L1 sentences VS snowball

Cgo binding for Snowball C library
gotokenizer

2.0 0.0 sentences VS gotokenizer

A tokenizer based on the dictionary and Bigram language models for Go. (Currently supports only Chinese segmentation.)
golibstemmer

2.0 0.0 sentences VS golibstemmer

Go bindings for the snowball libstemmer library including porter 2
detectlanguage

2.0 0.0 sentences VS detectlanguage

Detect Language API Go Client
icu

1.8 0.0 sentences VS icu

Cgo binding for icu4c library
libtextcat

1.8 0.0 sentences VS libtextcat

Cgo binding for libtextcat C library
t

1.8 3.5 sentences VS t

t: translation util for go, using GNU gettext
shamoji

1.3 0.0 sentences VS shamoji

The shamoji (杓文字) is a word filtering package
porter

1.2 0.0 sentences VS porter

porter stemmer
gosentiwordnet

0.9 0.0 sentences VS gosentiwordnet

💬 Sentiment analyzer library using SentiWordnet in Go
go-eco

0.5 0.0 sentences VS go-eco

Automatically exported from code.google.com/p/go-eco
govader-backend

0.5 2.6 sentences VS govader-backend

Sentimental Analysis Microservice
spelling-corrector

0.3 0.0 sentences VS spelling-corrector

Spelling corrector for Spanish language

* Code Quality Rankings and insights are calculated and provided by Lumnify.
They vary from L1 to L5 with "L5" being the highest.


README

MIT

Sentences - A command line sentence tokenizer

This command line utility will convert a blob of text into a list of sentences.

Demo
Docs

Install

go get gopkg.in/neurosnap/sentences.v1
go install gopkg.in/neurosnap/sentences.v1/_cmd/sentences

Binaries

Linux

[Linux 386](https://storage.cloud.google.com/go-sentences/sentences_linux-386.tar.gz)
Linux AMD64

Mac

Windows

Command

[Command line](sentences.gif?raw=true)

Get it

go get gopkg.in/neurosnap/sentences.v1

Use it

package main

import (
    "fmt"
    "log"

    "gopkg.in/neurosnap/sentences.v1"
    "gopkg.in/neurosnap/sentences.v1/data"
)

func main() {
    text := `A perennial also-ran, Stallings won his seat when longtime lawmaker David Holmes
    died 11 days after the filing deadline. Suddenly, Stallings was a shoo-in, not
    the long shot. In short order, the Legislature attempted to pass a law allowing
    former U.S. Rep. Carolyn Cheeks Kilpatrick to file; Stallings challenged the
    law in court and won. Kilpatrick mounted a write-in campaign, but Stallings won.`

    // Compiling language-specific data into a binary file can be accomplished
    // by using `make <lang>` and then loading the `json` data:
    b, err := data.Asset("data/english.json")
    if err != nil {
        log.Fatal(err)
    }

    // load the training data
    training, err := sentences.LoadTraining(b)
    if err != nil {
        log.Fatal(err)
    }

    // create the default sentence tokenizer
    tokenizer := sentences.NewSentenceTokenizer(training)

    // named "sents" so the variable does not shadow the sentences package
    sents := tokenizer.Tokenize(text)
    for _, s := range sents {
        fmt.Println(s.Text)
    }
}

English

This package attempts to fix some problems I noticed with English sentence tokenization.

package main

import (
    "fmt"

    "gopkg.in/neurosnap/sentences.v1/english"
)

func main() {
    text := "Hi there. Does this really work?"

    tokenizer, err := english.NewSentenceTokenizer(nil)
    if err != nil {
        panic(err)
    }

    sentences := tokenizer.Tokenize(text)
    for _, s := range sentences {
        fmt.Println(s.Text)
    }
}

Contributing

I need help maintaining this library. If you are interested in contributing, please start by looking at the golden-rules branch, which tests the Golden Rules for English sentence tokenization created by the Pragmatic Segmenter library.

Open an issue for a particular failing test and submit a PR that fixes it.

I'm happy to help anyone willing to contribute.

Customizable

Sentences was built around composability; most major components of this package can be extended.

Eager to make ad hoc changes but don't know how to start? Have a look at github.com/neurosnap/sentences/english for a solid example.

Notice

I have not tested this tokenizer in any language other than English. By default, the command line utility loads English. I welcome anyone willing to test the other languages and submit updates as needed.

A primary goal for this package is to be multilingual, so I'm willing to help in any way possible.

This library is a port of NLTK's Punkt tokenizer.

A Punkt Tokenizer

An unsupervised multilingual sentence boundary detection library for Golang. The Punkt system accomplishes this by training the tokenizer on text in the given language. Once the likelihoods of abbreviations, collocations, and sentence starters are determined, finding sentence boundaries becomes easier.

Many problems arise when tokenizing text into sentences, the primary one being abbreviations. The Punkt system attempts to determine whether a word is an abbreviation, the end of a sentence, or both by training on text in the given language. It incorporates both token-based and type-based analysis of the text through two different phases of annotation.
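To make the abbreviation problem concrete, here is a minimal, self-contained sketch. It is not this library's algorithm and uses none of its API: the function names (`splitNaive`, `splitWithAbbrevs`) and the hand-written abbreviation set are assumptions made for illustration. Punkt learns its abbreviation list from training text, whereas this toy version is given one directly.

```go
package main

import (
	"fmt"
	"strings"
)

// splitNaive ends a sentence after every whitespace-separated token
// that ends in '.', '?' or '!'.
func splitNaive(text string) []string {
	return split(text, nil)
}

// splitWithAbbrevs behaves like splitNaive, but does not break after a
// token listed in abbrevs (lowercased, without the trailing period).
// The abbreviation set is supplied by hand here; it is an assumption of
// this sketch, not something learned from data.
func splitWithAbbrevs(text string, abbrevs map[string]bool) []string {
	return split(text, abbrevs)
}

func split(text string, abbrevs map[string]bool) []string {
	var out, cur []string
	for _, tok := range strings.Fields(text) {
		cur = append(cur, tok)
		if endsSentence(tok, abbrevs) {
			out = append(out, strings.Join(cur, " "))
			cur = nil
		}
	}
	if len(cur) > 0 {
		// Keep trailing text that has no closing punctuation.
		out = append(out, strings.Join(cur, " "))
	}
	return out
}

func endsSentence(tok string, abbrevs map[string]bool) bool {
	if strings.HasSuffix(tok, "?") || strings.HasSuffix(tok, "!") {
		return true
	}
	if !strings.HasSuffix(tok, ".") {
		return false
	}
	word := strings.ToLower(strings.TrimSuffix(tok, "."))
	return !abbrevs[word] // reading from a nil map is safe in Go
}

func main() {
	text := "Former U.S. Rep. Carolyn Kilpatrick filed. Stallings won."

	fmt.Println("naive:")
	for _, s := range splitNaive(text) {
		fmt.Printf("  %q\n", s)
	}

	abbrevs := map[string]bool{"u.s": true, "rep": true}
	fmt.Println("with abbreviations:")
	for _, s := range splitWithAbbrevs(text, abbrevs) {
		fmt.Printf("  %q\n", s)
	}
}
```

The naive splitter breaks the text into four fragments because it treats "U.S." and "Rep." as sentence ends, while the abbreviation-aware version returns the two intended sentences. The toy version still cannot handle an abbreviation that also ends a sentence, which is the "or both" case described above.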

Unsupervised multilingual sentence boundary detection

Performance

Using the Brown Corpus, an annotated collection of American English text, we compare this package with other libraries across multiple programming languages.

Library	Avg Speed (s, 10 runs)	Accuracy (%)
Sentences	1.96	98.95
NLTK	5.22	99.21
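The source does not say how these timings were collected. As a rough sketch of how an "average over 10 runs" figure could be measured in Go, the hypothetical helper below (`averageDuration`, not part of this package) times a function repeatedly and reports the mean wall-clock duration. The workload in `main` is a placeholder, not a call into the tokenizer.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// averageDuration runs fn the given number of times and returns the
// mean wall-clock duration of a single run. It returns 0 when runs is
// not positive.
func averageDuration(runs int, fn func()) time.Duration {
	if runs <= 0 {
		return 0
	}
	var total time.Duration
	for i := 0; i < runs; i++ {
		start := time.Now()
		fn()
		total += time.Since(start)
	}
	return total / time.Duration(runs)
}

func main() {
	// Placeholder corpus and workload; substitute a real tokenizer call
	// to benchmark it.
	corpus := strings.Repeat("Hi there. Does this really work? ", 10000)

	avg := averageDuration(10, func() {
		_ = strings.Count(corpus, ". ")
	})
	fmt.Printf("average over 10 runs: %v\n", avg)
}
```

For more rigorous measurements, Go's built-in `testing` package benchmarks (`go test -bench`) handle warm-up and iteration counts automatically.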
