Description
This is a simple library to extract text from plaintext, .docx, .odt, .pdf and .rtf files.
cat alternatives and similar packages
Based on the "Specific Formats" category.
bluemonday
bluemonday: a fast golang HTML sanitizer (inspired by the OWASP Java HTML Sanitizer) to scrub user generated content of XSS
mxj
Decode / encode XML to/from map[string]interface{} (or JSON); extract values with dot-notation paths and wildcards. Replaces x2j and j2x packages.
html-to-markdown
⚙️ Convert HTML to Markdown. Even works with entire websites and can be extended through rules.
omniparser
omniparser: a native Golang ETL streaming parser and transform library for CSV, JSON, XML, EDI, text, etc.
go-pkg-rss
This package reads RSS and Atom feeds and provides a caching mechanism that adheres to the feed specs.
goq
A declarative struct-tag-based HTML unmarshaling or scraping package for Go built on top of the goquery library
xquery
XQuery lets you extract data from HTML/XML documents using XPath expressions.
github_flavored_markdown
GitHub Flavored Markdown renderer with fenced code block highlighting, clickable header anchor links.
go-pkg-xmlx
Extension to the standard Go XML package. Maintains a node tree that allows forward/backwards browsing and exposes some simple single/multi-node search functions.
pagser
Pagser is a simple, extensible, configurable library that parses and deserializes HTML pages into structs, based on goquery and struct tags, for Go crawlers
csvplus
csvplus extends the standard Go encoding/csv package with fluent interface, lazy stream operations, indices and joins.
codetree
Parses indented code and returns a tree structure.
jsoncolor
Colorized JSON output for Go https://godoc.org/github.com/nwidger/jsoncolor
README
cat
This is a simple library to extract text from plaintext, .docx, .odt, .pdf and .rtf files.
Install
go get -u github.com/lu4p/cat
Basic Usage
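package main

import (
	"fmt"
	"log"

	"github.com/lu4p/cat"
)

func main() {
	// Extract the text; report a failure instead of discarding the error.
	txt, err := cat.File("filename")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(txt)
}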