Description
Package csvplus extends the standard Go encoding/csv package with a fluent interface, lazy stream-processing operations, indices and joins.
The library is primarily designed for ETL-like processes. It is most useful where the more advanced searching/joining capabilities of a fully-featured SQL database are not required, but at the same time the data transformations needed still include SQL-like operations.
csvplus alternatives and similar packages
Based on the "Specific Formats" category.
- sh: a shell parser, formatter, and interpreter with bash support; includes shfmt
- go-humanize: Go Humans! (formatters for units to human-friendly sizes)
- bluemonday: a fast golang HTML sanitizer (inspired by the OWASP Java HTML Sanitizer) to scrub user-generated content of XSS
- mxj: decode/encode XML to/from map[string]interface{} (or JSON); extract values with dot-notation paths and wildcards; replaces the x2j and j2x packages
- omniparser: a native Golang ETL streaming parser and transform library for CSV, JSON, XML, EDI, text, etc.
- html-to-markdown: ⚙️ convert HTML to Markdown; even works with entire websites and can be extended through rules
- go-pkg-rss: reads RSS and Atom feeds and provides a caching mechanism that adheres to the feed specs
- goribot: a simple golang spider/scraping framework; build a spider in 3 lines
- goq: a declarative struct-tag-based HTML unmarshaling or scraping package for Go, built on top of the goquery library
- xquery: extract data from HTML/XML documents using XPath expressions
- github_flavored_markdown: GitHub Flavored Markdown renderer with fenced code block highlighting and clickable header anchor links
- go-pkg-xmlx: extension to the standard Go XML package; maintains a node tree that allows forward/backward browsing and exposes simple single/multi-node search functions
- pagser: a simple, extensible, configurable parser that deserializes HTML pages into structs, based on goquery and struct tags, for golang crawlers
- gonameparts: takes a full name and splits it into individual name parts
- go-wildcard: fast and light wildcard pattern matching; a fork from the Minio project
- codetree: 🌲 parses indented code and returns a tree structure
- jsoncolor: colorized JSON output for Go (https://godoc.org/github.com/nwidger/jsoncolor)
README
csvplus
Package csvplus extends the standard Go encoding/csv package with a fluent interface, lazy stream-processing operations, indices and joins.
The library is primarily designed for ETL-like processes. It is most useful where the more advanced searching/joining capabilities of a fully-featured SQL database are not required, but at the same time the data transformations needed still include SQL-like operations.
License: BSD
Examples
Simple sequential processing:
people := csvplus.FromFile("people.csv").SelectColumns("name", "surname", "id")

err := csvplus.Take(people).
	Filter(csvplus.Like(csvplus.Row{"name": "Amelia"})).
	Map(func(row csvplus.Row) csvplus.Row { row["name"] = "Julia"; return row }).
	ToCsvFile("out.csv", "name", "surname")

if err != nil {
	return err
}
A more involved example:
customers := csvplus.FromFile("people.csv").SelectColumns("id", "name", "surname")

custIndex, err := csvplus.Take(customers).UniqueIndexOn("id")
if err != nil {
	return err
}

products := csvplus.FromFile("stock.csv").SelectColumns("prod_id", "product", "price")

prodIndex, err := csvplus.Take(products).UniqueIndexOn("prod_id")
if err != nil {
	return err
}

orders := csvplus.FromFile("orders.csv").SelectColumns("cust_id", "prod_id", "qty", "ts")
iter := csvplus.Take(orders).Join(custIndex, "cust_id").Join(prodIndex)

return iter(func(row csvplus.Row) error {
	// prints lines like:
	// John Doe bought 38 oranges for £0.03 each on 2016-09-14T08:48:22+01:00
	_, e := fmt.Printf("%s %s bought %s %ss for £%s each on %s\n",
		row["name"], row["surname"], row["qty"], row["product"], row["price"], row["ts"])
	return e
})
Design principles
The package functionality is built around operations on the following entities:
- type Row
- type DataSource
- type Index
Type Row
Row represents one row from a DataSource. It is a map from column names to the string values under those columns on the current row. The package expects a unique name to be assigned to every column at the source. Compared to using integer indices, this is more convenient when complex transformations get applied to each row during processing.
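For illustration, here is a minimal sketch of working with a Row directly. It assumes the map-like behaviour shown in the examples above; the import path is the project's canonical one, but verify it for your setup:

package main

import (
	"fmt"
	"strings"

	"github.com/maxim2266/csvplus" // assumed canonical import path
)

func main() {
	// A Row maps column names to the string values in those columns,
	// so cells are addressed by name rather than by position.
	row := csvplus.Row{"id": "42", "name": "Amelia", "surname": "Pond"}

	// Transformations read and update cells by column name.
	row["name"] = strings.ToUpper(row["name"])

	fmt.Println(row["id"], row["name"], row["surname"]) // 42 AMELIA Pond
}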
Type DataSource
DataSource represents any source of zero or more rows, such as a .csv file. It is a function that, when invoked, feeds the given callback with the data from its source, one Row at a time.
The type also has a number of operations defined on it that allow easy composition of operations on the DataSource, forming a so-called fluent interface.
All these operations are 'lazy', i.e. they are not performed immediately; instead, each of them returns a new DataSource.
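To make the callback model concrete, here is a sketch of a data source written by hand. The function shape below is an assumption inferred from the iterator call in the join example above (iter(func(row csvplus.Row) error { ... })); the exact type name and signature should be taken from the package documentation.

// sliceSource returns a hand-rolled data source that serves rows from a
// slice. The callback-based shape is an assumption, not the package's
// verbatim type definition.
func sliceSource(rows []csvplus.Row) func(fn func(csvplus.Row) error) error {
	return func(fn func(csvplus.Row) error) error {
		for _, row := range rows {
			if err := fn(row); err != nil {
				return err // a non-nil error from the callback stops the iteration
			}
		}
		return nil
	}
}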
There are also a number of convenience operations that actually invoke the DataSource function to produce a specific type of output (a usage sketch follows the list):
- IndexOn to build an index on the specified column(s);
- UniqueIndexOn to build a unique index on the specified column(s);
- ToCsv to serialise the DataSource to the given io.Writer in .csv format;
- ToCsvFile to store the DataSource in the specified file in .csv format;
- ToJSON to serialise the DataSource to the given io.Writer in JSON format;
- ToJSONFile to store the DataSource in the specified file in JSON format;
- ToRows to convert the DataSource to a slice of Rows.
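For example, using only operations already shown in this README, the pipeline below stays lazy until the terminal ToRows call. The file and column names are illustrative, and ToRows is assumed to return a slice of rows together with an error:

// Nothing is read from people.csv until ToRows is invoked.
people := csvplus.FromFile("people.csv").SelectColumns("id", "name")

rows, err := csvplus.Take(people).
	Filter(csvplus.Like(csvplus.Row{"name": "Amelia"})).
	ToRows()
if err != nil {
	return err
}

fmt.Println("matched", len(rows), "rows")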
Type Index
Index is a sorted collection of rows. The sorting is performed on the columns specified when the index is created, and iteration over an index yields a sorted sequence of rows. An Index can be joined with a DataSource. The type has operations for finding rows and creating sub-indices in O(log(n)) time; another useful operation is resolving duplicates. Building an index takes O(n*log(n)) time. It should be noted that building an Index requires the entire dataset to be read into memory, so some care should be taken when indexing huge datasets. An index can also be stored to, or loaded from, a disk file.
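As an illustration of the memory/reuse trade-off, the sketch below builds a unique index once and then reuses it for a join, using only calls already shown in the examples above. The row-lookup and sub-index methods mentioned in the previous paragraph are not shown here; their names should be taken from the package documentation.

// Building the index reads all of stock.csv into memory: O(n*log(n)).
products := csvplus.FromFile("stock.csv").SelectColumns("prod_id", "product", "price")

prodIndex, err := csvplus.Take(products).UniqueIndexOn("prod_id")
if err != nil {
	return err
}

// The same index can now back any number of joins without being rebuilt.
orders := csvplus.FromFile("orders.csv").SelectColumns("prod_id", "qty")
iter := csvplus.Take(orders).Join(prodIndex)

return iter(func(row csvplus.Row) error {
	fmt.Println(row["product"], row["qty"])
	return nil
})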
For more details see the documentation.
Project status
The project is in a usable state usually called "beta". Tested on Linux Mint 18.3 using Go version 1.10.2.
*Note that all licence references and agreements mentioned in the csvplus README section above
are relevant to that project's source code only.