icu alternatives and similar packages
Based on the "Natural Language Processing" category.
prose: DISCONTINUED. A Golang library for text processing, including tokenization, part-of-speech tagging, and named-entity extraction.
gse: Go efficient multilingual NLP and text segmentation; supports English, Chinese, Japanese and others.
universal-translator: i18n Translator for Go/Golang using CLDR data + pluralization rules.
locales: a set of locales generated from the CLDR Project which can be used independently or within an i18n package; these were built for use with, but not exclusive to, https://github.com/go-playground/universal-translator
go-nlp: DISCONTINUED. Utilities for working with discrete probability distributions and other tools useful for doing NLP work.
segment: A Go library for performing Unicode Text Segmentation as described in Unicode Standard Annex #29.
go-localize: i18n (Internationalization and localization) engine written in Go, used for translating locale strings.
gotokenizer: A tokenizer based on the dictionary and Bigram language models for Go. (Currently supports only Chinese segmentation.)
README
About
Cgo binding for the charset detection and conversion functions of the icu4c C library. Compatibility is guaranteed with icu4c version 50.1.
Installation
Installation takes a few steps. They may differ slightly on your target system (for example, some may need extra permissions), so adapt them as needed.
Install build-essential
Make sure you have build-essential installed. Otherwise the icu4c build will fail at the configuration stage.
Installation example using apt-get (Ubuntu):
sudo apt-get install build-essential
Install pkg-config
Make sure you have pkg-config installed.
Installation example using apt-get (Ubuntu):
sudo apt-get install pkg-config
Get icu4c C library code
Download the original icu4c archive from the icu download section and unpack it.
Example (for version 50.1):
wget http://download.icu-project.org/files/icu4c/50.1/icu4c-50_1-src.tgz
tar -zxvf icu4c-50_1-src.tgz
mv -i ./icu ~/where-you-store-libs
NOTE: If this link does not work or the download fails, a stable snapshot of version 50.1 is saved in GitHub Downloads.
Build and install icu4c C library
From the directory where you unpacked icu4c, run:
cd source
./configure
make
sudo make install
sudo ldconfig
Install Go wrapper
go get github.com/goodsign/icu
go test github.com/goodsign/icu    # must PASS
Installation notes
Make sure that you have your local library paths set correctly and that installation was successful. Otherwise, go build or go test may fail.
icu4c installs its libraries into your local library directory (e.g. /usr/local/lib). This path must be registered with your system (by running ldconfig, exporting LD_LIBRARY_PATH, etc.); otherwise linking will fail.
icu4c installs its header files to local include folders (e.g. /usr/local/include/unicode) so there is no need to have additional .h files with this package, but the system must be properly set up to detect .h files in those directories.
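When go build or go test fails at link or load time, one quick thing to rule out is whether the library directory is listed in LD_LIBRARY_PATH. The sketch below is only a diagnostic aid and is not part of this package: the helper name inSearchPath and the /usr/local/lib prefix are assumptions for illustration. It does not detect directories registered through ldconfig.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// inSearchPath reports whether dir appears in a path list such as the
// value of LD_LIBRARY_PATH. Entries are compared after cleaning, so a
// trailing slash does not matter. (Hypothetical helper for illustration.)
func inSearchPath(dir, pathList string) bool {
	want := filepath.Clean(dir)
	for _, entry := range filepath.SplitList(pathList) {
		if entry != "" && filepath.Clean(entry) == want {
			return true
		}
	}
	return false
}

func main() {
	// Assumed install location; adjust if icu4c was configured with
	// a different prefix.
	const libDir = "/usr/local/lib"

	if inSearchPath(libDir, os.Getenv("LD_LIBRARY_PATH")) {
		fmt.Println(libDir, "is listed in LD_LIBRARY_PATH")
	} else {
		fmt.Println(libDir, "is not in LD_LIBRARY_PATH; it may still be registered via ldconfig")
	}
}
```

A negative result here is not an error by itself, because a directory registered with ldconfig does not need to appear in LD_LIBRARY_PATH.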
Usage
Note: see the icu documentation for the encoding identifiers that detection returns.
Detector
// Create detector
detector, err := NewCharsetDetector()
if err != nil {
    // ... Handle error ...
}
defer detector.Close()

// Guess encoding
encMatches, err := detector.GuessCharset(encodedText)
if err != nil {
    // ... Handle error ...
}

// Get charset with max confidence (goes first)
maxenc := encMatches[0].Charset

// Use maxenc.
// ...
Converter
...
// Create converter
converter := NewCharsetConverter(DefaultMaxTextSize)

// Convert to utf-8
converted, err := converter.ConvertToUtf8(encodedText, maxenc)
if err != nil {
    // ... Handle error ...
}
Usage notes
- See the doc comments on the NewCharsetConverter func for details on the max text size parameter.
- You will often use the detector and converter as a pair. The 'converter' usage example therefore continues the 'detector' example and uses the 'maxenc' result from it.
More info
For more information on icu, refer to the original website, which contains links to theory and other details.
icu4c Licence
ICU is released under a nonrestrictive open source license that is suitable for use with both commercial software and with other open source or free software.
Licence
The goodsign/icu binding is released under the BSD Licence.