Go-edlib : Edit distance and string comparison library
Golang string comparison and edit distance algorithms library featuring: Levenshtein, LCS, Hamming, Damerau-Levenshtein (OSA and adjacent transpositions algorithms), Jaro-Winkler, Cosine, etc.
Table of Contents
- Requirements
- Introduction
- Features
- Installation
- Benchmarks
- Documentation
- Examples
- Author
- Contributing
- License
Requirements
- Go (v1.13+)
Introduction
Golang open-source library which includes most (and soon all) edit-distance and string comparison algorithms, with some extras! Designed to be fully compatible with Unicode characters! This library has 100% test coverage 😁
Features
- Levenshtein ✨
- LCS (Longest common subsequence) with edit distance, backtrack and diff functions ✨
- Hamming ✨
- Damerau-Levenshtein, with the following variants:
  - OSA (Optimal string alignment) ✨
  - Adjacent transpositions ✨
- Jaro & Jaro-Winkler similarity algorithms ✨
- Cosine Similarity algorithm to compare strings ✨
- Computed similarity percentage functions based on all available edit distance algorithms in this lib ✨
- Fuzzy search functions based on edit distance with unique or multiple strings output ✨
- Unicode compatibility! 🥳
- And many more to come!
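As a rough illustration of what Unicode compatibility means for string comparison, the sketch below uses only the Go standard library and is not taken from go-edlib. `hammingRunes` is a hypothetical helper: it compares runes rather than bytes, so a multi-byte character counts as a single position.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// hammingRunes is a hypothetical helper (not part of go-edlib). It counts
// the positions at which two strings with the same number of runes differ.
func hammingRunes(a, b string) (int, error) {
	ra, rb := []rune(a), []rune(b)
	if len(ra) != len(rb) {
		return 0, errors.New("strings must have the same number of runes")
	}
	dist := 0
	for i := range ra {
		if ra[i] != rb[i] {
			dist++
		}
	}
	return dist, nil
}

func main() {
	s := "h\u00e9llo" // "héllo": the accented letter takes two bytes in UTF-8
	fmt.Println(len(s), utf8.RuneCountInString(s))

	// Only one rune differs, even though the byte sequences differ too.
	d, err := hammingRunes("h\u00e9llo", "h\u00e8llo")
	fmt.Println(d, err)
}
```

Comparing byte by byte would still work for this pair, but length checks and position counts based on `len(s)` drift as soon as one string contains multi-byte characters and the other does not.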
Benchmarks
You can check an interactive Google chart with a few benchmark cases for all similarity algorithms in this library, run through the StringsSimilarity function, here
However, if you want or need more details, you can also view the raw benchmark output here, which also includes memory allocations and test case output (similarity results and errors).
If you are on Linux and want to run them on your setup, you can run the ./tests/benchmark.sh script.
Installation
Open a shell in your project folder and run:
go get github.com/hbollon/go-edlib
And import it into your project:
import (
"github.com/hbollon/go-edlib"
)
Run tests
If you are on Linux and want to run all unit tests, just run the ./tests/tests.sh script.
For Windows users, you can run:
go test ./... # Add desired parameters to this command if you want
Documentation
You can find the full documentation here: Documentation
Examples
Calculate the string similarity index between two strings
You can use the StringsSimilarity(str1, str2, algorithm) function.
The algorithm parameter must be one of the following constants:
// Algorithm identifiers
const (
Levenshtein Algorithm = iota
DamerauLevenshtein
OSADamerauLevenshtein
Lcs
Hamming
Jaro
JaroWinkler
Cosine
)
Example with Levenshtein:
res, err := edlib.StringsSimilarity("string1", "string2", edlib.Levenshtein)
if err != nil {
fmt.Println(err)
} else {
fmt.Printf("Similarity: %f", res)
}
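A similarity index can be derived from an edit distance by normalizing it against the string lengths. The sketch below shows one common normalization, `1 - distance / longest length`; this is an assumption for illustration, not necessarily the formula go-edlib uses, and `similarityFromDistance` is a hypothetical name.

```go
package main

import "fmt"

// similarityFromDistance is a hypothetical helper, not go-edlib's API.
// It maps an edit distance onto the range [0, 1] by dividing it by the
// length (in runes) of the longer of the two strings.
func similarityFromDistance(dist, lenA, lenB int) float32 {
	longest := lenA
	if lenB > longest {
		longest = lenB
	}
	if longest == 0 {
		return 1 // two empty strings are treated as identical
	}
	return 1 - float32(dist)/float32(longest)
}

func main() {
	// "kitten" (6 runes) and "sitting" (7 runes) have Levenshtein distance 3.
	fmt.Printf("Similarity: %.3f\n", similarityFromDistance(3, 6, 7))
}
```

With this normalization, identical strings score 1 and strings that share nothing score 0, which is what makes a threshold such as 0.7 comparable across strings of different lengths.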
Execute fuzzy search based on string similarity algorithm
1. Most matching unique result without threshold
You can use the FuzzySearch(str, strList, algorithm) function.
strList := []string{"test", "tester", "tests", "testers", "testing", "tsting", "sting"}
res, err := edlib.FuzzySearch("testnig", strList, edlib.Levenshtein)
if err != nil {
fmt.Println(err)
} else {
fmt.Printf("Result: %s", res)
}
Result: testing
2. Most matching unique result with threshold
You can use the FuzzySearchThreshold(str, strList, minSimilarity, algorithm) function.
strList := []string{"test", "tester", "tests", "testers", "testing", "tsting", "sting"}
res, err := edlib.FuzzySearchThreshold("testnig", strList, 0.7, edlib.Levenshtein)
if err != nil {
fmt.Println(err)
} else {
fmt.Printf("Result for 'testnig': %s", res)
}
res, err = edlib.FuzzySearchThreshold("hello", strList, 0.7, edlib.Levenshtein)
if err != nil {
fmt.Println(err)
} else {
fmt.Printf("Result for 'hello': %s", res)
}
Result for 'testnig': testing
Result for 'hello':
3. Most matching result set without threshold
You can use the FuzzySearchSet(str, strList, resultQuantity, algorithm) function.
strList := []string{"test", "tester", "tests", "testers", "testing", "tsting", "sting"}
res, err := edlib.FuzzySearchSet("testnig", strList, 3, edlib.Levenshtein)
if err != nil {
fmt.Println(err)
} else {
fmt.Printf("Results: %s", strings.Join(res, ", "))
}
Results: testing, test, tester
4. Most matching result set with threshold
You can use the FuzzySearchSetThreshold(str, strList, resultQuantity, minSimilarity, algorithm) function.
strList := []string{"test", "tester", "tests", "testers", "testing", "tsting", "sting"}
res, err := edlib.FuzzySearchSetThreshold("testnig", strList, 3, 0.5, edlib.Levenshtein)
if err != nil {
fmt.Println(err)
} else {
fmt.Printf("Result for 'testnig' with '0.5' threshold: %s", strings.Join(res, " "))
}
res, err = edlib.FuzzySearchSetThreshold("testnig", strList, 3, 0.7, edlib.Levenshtein)
if err != nil {
fmt.Println(err)
} else {
fmt.Printf("Result for 'testnig' with '0.7' threshold: %s", strings.Join(res, " "))
}
Result for 'testnig' with '0.5' threshold: testing test tester
Result for 'testnig' with '0.7' threshold: testing
Get raw edit distance (Levenshtein, LCS, Damerau–Levenshtein, Hamming)
You can use one of the following functions to get an edit distance between two strings:
- LevenshteinDistance(str1, str2)
- DamerauLevenshteinDistance(str1, str2)
- OSADamerauLevenshteinDistance(str1, str2)
- LCSEditDistance(str1, str2)
- HammingDistance(str1, str2)
Example with Levenshtein distance:
res := edlib.LevenshteinDistance("kitten", "sitting")
fmt.Printf("Result: %d", res)
Result: 3
LCS, LCS Backtrack and LCS Diff
1. Compute LCS (Longest Common Subsequence) between two strings
You can use the LCS(str1, str2) function.
lcs := edlib.LCS("ABCD", "ACBAD")
fmt.Printf("Length of their LCS: %d", lcs)
Length of their LCS: 3
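For readers who want to see where that number comes from, the LCS length can be computed with a small dynamic-programming table. This is a standard-library sketch with hypothetical names (`lcsLength`, `lcsEditDistance`), not go-edlib's code. It also shows the usual relation between LCS length and an insert/delete-only edit distance; treating that as what LCSEditDistance returns is an assumption.

```go
package main

import "fmt"

// lcsLength returns the length of the longest common subsequence of a
// and b, compared rune by rune, using two rows of the DP table.
func lcsLength(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for i := 1; i <= len(ra); i++ {
		for j := 1; j <= len(rb); j++ {
			switch {
			case ra[i-1] == rb[j-1]:
				curr[j] = prev[j-1] + 1 // extend a common subsequence
			case prev[j] >= curr[j-1]:
				curr[j] = prev[j]
			default:
				curr[j] = curr[j-1]
			}
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

// lcsEditDistance counts the insertions and deletions needed to turn a
// into b: every rune outside the LCS is either deleted or inserted.
func lcsEditDistance(a, b string) int {
	return len([]rune(a)) + len([]rune(b)) - 2*lcsLength(a, b)
}

func main() {
	fmt.Printf("Length of their LCS: %d\n", lcsLength("ABCD", "ACBAD"))
	fmt.Printf("Insert/delete distance: %d\n", lcsEditDistance("ABCD", "ACBAD"))
}
```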
2. Backtrack their LCS
You can use the LCSBacktrack(str1, str2) function.
res, err := edlib.LCSBacktrack("ABCD", "ACBAD")
if err != nil {
fmt.Println(err)
} else {
fmt.Printf("LCS: %s", res)
}
LCS: ABD
3. Backtrack all their LCS
You can use the LCSBacktrackAll(str1, str2) function.
res, err := edlib.LCSBacktrackAll("ABCD", "ACBAD")
if err != nil {
fmt.Println(err)
} else {
fmt.Printf("LCS: %s", strings.Join(res, ", "))
}
LCS: ABD, ACD
4. Get LCS Diff between two strings
You can use the LCSDiff(str1, str2) function.
res, err := edlib.LCSDiff("computer", "houseboat")
if err != nil {
fmt.Println(err)
} else {
fmt.Printf("LCS Diff: \n%s\n%s", res[0], res[1])
}
LCS Diff:
h c o m p u s e b o a t e r
+ -   - -   + + + + +   - -
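A diff like the one above can be produced by walking an LCS table and labelling each rune as kept, added or removed. The sketch below is one way to do that with the standard library only; `op` and `lcsDiff` are hypothetical names, and the order in which additions and removals are interleaved may differ from go-edlib's output.

```go
package main

import (
	"fmt"
	"strings"
)

// op is one diff entry: a rune and a marker (' ' kept, '+' added, '-' removed).
type op struct {
	r    rune
	kind byte
}

// lcsDiff is a hypothetical helper (not go-edlib's API) that turns a into b.
func lcsDiff(a, b string) []op {
	ra, rb := []rune(a), []rune(b)

	// t[i][j] is the LCS length of the suffixes ra[i:] and rb[j:].
	t := make([][]int, len(ra)+1)
	for i := range t {
		t[i] = make([]int, len(rb)+1)
	}
	for i := len(ra) - 1; i >= 0; i-- {
		for j := len(rb) - 1; j >= 0; j-- {
			switch {
			case ra[i] == rb[j]:
				t[i][j] = t[i+1][j+1] + 1
			case t[i+1][j] >= t[i][j+1]:
				t[i][j] = t[i+1][j]
			default:
				t[i][j] = t[i][j+1]
			}
		}
	}

	// Walk both strings forward, always moving in a direction that
	// preserves the remaining LCS length.
	var ops []op
	i, j := 0, 0
	for i < len(ra) && j < len(rb) {
		switch {
		case ra[i] == rb[j]:
			ops = append(ops, op{ra[i], ' '})
			i++
			j++
		case t[i+1][j] >= t[i][j+1]:
			ops = append(ops, op{ra[i], '-'})
			i++
		default:
			ops = append(ops, op{rb[j], '+'})
			j++
		}
	}
	for ; i < len(ra); i++ {
		ops = append(ops, op{ra[i], '-'})
	}
	for ; j < len(rb); j++ {
		ops = append(ops, op{rb[j], '+'})
	}
	return ops
}

func main() {
	var chars, marks strings.Builder
	for _, o := range lcsDiff("computer", "houseboat") {
		chars.WriteRune(o.r)
		chars.WriteByte(' ')
		marks.WriteByte(o.kind)
		marks.WriteByte(' ')
	}
	fmt.Printf("LCS Diff:\n%s\n%s\n", chars.String(), marks.String())
}
```

Reading only the kept and removed runes gives back the first string, and reading only the kept and added runes gives the second, which is a handy sanity check for any diff of this shape.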
Author
👤 Hugo Bollon
- Github: @hbollon
- LinkedIn: @Hugo Bollon
- Portfolio: hugobollon.me
🤝 Contributing
Contributions, issues and feature requests are welcome! Feel free to check the issues page.
Show your support
Give a ⭐️ if this project helped you!
📝 License
Copyright © 2020 Hugo Bollon. This project is licensed under the MIT License.