omniparser alternatives and similar packages
Based on the "Specific Formats" category.
- sh: A shell parser, formatter, and interpreter with bash support; includes shfmt
- go-humanize: Go Humans! (formatters for units to human friendly sizes)
- bluemonday: a fast golang HTML sanitizer (inspired by the OWASP Java HTML Sanitizer) to scrub user generated content of XSS
- commonregex: A collection of common regular expressions for Go
- mxj: Decode / encode XML to/from map[string]interface{} (or JSON); extract values with dot-notation paths and wildcards. Replaces x2j and j2x packages.
- xpath: XPath package for Golang, supports HTML, XML, JSON document query.
- html-to-markdown: Convert HTML to Markdown. Even works with entire websites and can be extended through rules.
- go-pkg-rss: This package reads RSS and Atom feeds and provides a caching mechanism that adheres to the feed specs.
- goribot: A simple golang spider/scraping framework; build a spider in 3 lines.
- goq: A declarative struct-tag-based HTML unmarshaling or scraping package for Go built on top of the goquery library
- xquery: XQuery lets you extract data from HTML/XML documents using XPath expressions.
- github_flavored_markdown: GitHub Flavored Markdown renderer with fenced code block highlighting, clickable header anchor links.
- gospider: Lightweight Golang spider framework
- go-pkg-xmlx: Extension to the standard Go XML package. Maintains a node tree that allows forward/backwards browsing and exposes some simple single/multi-node search functions.
- go-fixedwidth: Encoding and decoding for fixed-width formatted data
- pagser: A simple, extensible, configurable package that parses and deserializes HTML pages into structs, based on goquery and struct tags, for Golang crawlers
- csvplus: csvplus extends the standard Go encoding/csv package with fluent interface, lazy stream operations, indices and joins.
- gonameparts: Takes a full name and splits it into individual name parts
- codetree: Parses indented code and returns a tree structure.
- jsoncolor: Colorized JSON output for Go https://godoc.org/github.com/nwidger/jsoncolor
README
omniparser
Omniparser is a native Golang ETL parser that ingests input data of various formats (CSV, txt, fixed length/width, XML, EDI/X12/EDIFACT, JSON, and custom formats) in streaming fashion and transforms data into desired JSON output based on a schema written in JSON.
Please kindly consider sponsoring the project to fund future development and issue resolutions: https://github.com/sponsors/jf-tech
Min Golang Version: 1.14
Documentation
Docs:
- [Getting Started](./doc/gettingstarted.md): a tutorial for writing your first omniparser schema.
- [IDR](./doc/idr.md): in-memory data representation of ingested data for omniparser.
- [XPath Based Record Filtering and Data Extraction](./doc/xpath.md): xpath queries are essential to omniparser schema writing. Learn the concept and tricks in depth.
- [All About Transforms](./doc/transforms.md): everything about `transform_declarations`.
- [Use of `custom_func`, Especially `javascript`](./doc/use_of_custom_funcs.md): an in-depth look at how `custom_func` is used, especially the almighty `javascript` (and `javascript_with_context`).
- [CSV Schema in Depth](./doc/csv2_in_depth.md): everything about schemas for CSV input.
- [Fixed-Length Schema in Depth](./doc/fixedlength2_in_depth.md): everything about schemas for fixed-length (e.g. TXT) input
- [JSON/XML Schema in Depth](./doc/json_xml_in_depth.md): everything about schemas for JSON or XML input.
- [EDI Schema in Depth](./doc/edi_in_depth.md): everything about schemas for EDI input.
- [Programmability](./doc/programmability.md): Advanced techniques for using omniparser (or some of its components) in your code.
References:
- [Custom Functions](./doc/customfuncs.md): a complete reference of all built-in custom functions.
Examples:
- [CSV Examples](extensions/omniv21/samples/csv2)
- [Fixed-Length Examples](extensions/omniv21/samples/fixedlength2)
- [JSON Examples](extensions/omniv21/samples/json)
- [XML Examples](extensions/omniv21/samples/xml)
- [EDI Examples](extensions/omniv21/samples/edi)
- [Custom File Format](extensions/omniv21/samples/customfileformats/jsonlog)
- [Custom Funcs](extensions/omniv21/samples/customfuncs)
In the example folders above you will find pairs of input files and their schema files. Then in the `.snapshots` subdirectory, you'll find their corresponding output files.
Online Playground
Use https://omniparser.herokuapp.com/ to try out schemas and inputs, your own or the existing samples, and see how ingestion and transforms work. You may need to wait a few seconds for the Heroku instance to wake up.
[](./cli/cmd/web/playground-demo.gif)
Why
- No good ETL transform/parser library exists in Golang.
- Even looking into Java and other languages, choices aren't many and all have limitations:
  - Many of the parsers/transforms don't support streaming read, loading entire input into memory - not acceptable in some situations.
Requirements
- Golang 1.14 or later.
Recent Major Feature Additions/Changes
- 2022/09: v1.0.4 released: added `csv2` file format that supersedes the original `csv` format with support of hierarchical and nested records.
- 2022/09: v1.0.3 released: added `fixedlength2` file format that supersedes the original `fixed-length` format with support of hierarchical and nested envelopes.
- 1.0.0 Released!
- Added `Transform.RawRecord()` for caller of omniparser to access the raw ingested record.
- Deprecated `custom_parse` in favor of `custom_func` (`custom_parse` is still usable for back-compatibility, it is just removed from all public docs and samples).
- Added `NonValidatingReader` EDI segment reader.
- Added fixed-length file format support in omniv21 handler.
- Added EDI file format support in omniv21 handler.
- Major restructure/refactoring
  - Upgrade omni schema version to `omni.2.1` due to a number of incompatible schema changes:
    - `'result_type'` -> `'type'`
    - `'ignore_error_and_return_empty_str'` -> `'ignore_error'`
    - `'keep_leading_trailing_space'` -> `'no_trim'`
  - Changed how we handle custom functions: previously, custom functions took only string parameters and returned strings; now all types are supported for both input and output parameters.
  - Changed the way we package custom functions for extensions: previously we collected custom functions from all extensions and then passed all of them to the extension that is used; this felt odd, so now only the custom functions included in a particular extension are used in that extension.
  - Deprecated/removed most of the custom functions in favor of using 'javascript'.
  - A number of package renamings.
- Added CSV file format support in omniv2 handler.
- Introduced IDR node cache for allocation recycling.
- Introduced [IDR](./doc/idr.md) for in-memory data representation.
- Added trie based high performance `times.SmartParse`.
- Command line interface (one-off `transform` cmd or long-running http `server` mode).
- `javascript` engine integration as a custom_func.
- JSON stream parser.
- Extensibility:
  - Ability to provide custom functions.
  - Ability to provide custom schema handler.
  - Ability to customize the built-in omniv2 schema handler's parsing code.
  - Ability to provide a new file format support to built-in omniv2 schema handler.
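The three key renames listed for the `omni.2.1` schema upgrade can be applied mechanically to a JSON document. The sketch below is a hypothetical helper, not something shipped with omniparser: `renamedKeys`, `renameKeys`, and `migrate` are illustrative names, and the schema fragment in `main` is made up. It only renames those three keys wherever they appear as object keys; it does not handle any of the other incompatible changes mentioned above, so treat it as a starting point rather than a full upgrade path.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// renamedKeys holds only the three renames named in the changelog.
var renamedKeys = map[string]string{
	"result_type":                       "type",
	"ignore_error_and_return_empty_str": "ignore_error",
	"keep_leading_trailing_space":       "no_trim",
}

// renameKeys walks a decoded JSON value and returns a copy in which every
// object key found in renamedKeys is replaced by its new name.
func renameKeys(v interface{}) interface{} {
	switch t := v.(type) {
	case map[string]interface{}:
		out := make(map[string]interface{}, len(t))
		for k, val := range t {
			if nk, ok := renamedKeys[k]; ok {
				k = nk
			}
			out[k] = renameKeys(val)
		}
		return out
	case []interface{}:
		out := make([]interface{}, len(t))
		for i, val := range t {
			out[i] = renameKeys(val)
		}
		return out
	default:
		return v // strings, numbers, booleans, null are left untouched
	}
}

// migrate decodes a JSON document, renames the keys, and re-encodes it.
// Note: re-encoding sorts object keys alphabetically, so the original key
// order is not preserved.
func migrate(schema []byte) ([]byte, error) {
	var doc interface{}
	if err := json.Unmarshal(schema, &doc); err != nil {
		return nil, err
	}
	return json.Marshal(renameKeys(doc))
}

func main() {
	// A made-up fragment, used only to exercise the renames.
	old := []byte(`{"id": {"xpath": "ID", "result_type": "int", "keep_leading_trailing_space": true}}`)
	out, err := migrate(old)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```

Renaming by key name alone is deliberately naive: if a document uses one of these names for an unrelated purpose, it will be renamed too, so review the result by hand.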