All Versions: 62
Avg Release Cycle: 6 days

Changelog History

  • v0.60.0 Changes

    • 💥 BREAKING: graduated the ContentOnly option (-content in CLI mode); from now on it is ENABLED by default.
  • v0.59.0 Changes

    • 👉 uses different segmentation logic, based on the github.com/go-ego/gse segmenter, for the Chinese & Japanese languages;
    • 👌 improved HTML parser logic: optimised the way it collects the contents of a document and improved the logic for splitting text into sentences;
    • falls back to English stop words when language detection is not reliable;
    • ➕ added lang option to the CLI for specifying the language of the document;
    • ⬆️ bumped github.com/zoomio/stopwords to 0.11.0.
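
The stop-words fallback described above can be pictured with a minimal sketch. The `detection` type, its `Reliable` flag and the `stopWordsLang` helper are hypothetical names made up for illustration; they are not taken from the library, and the real reliability check may well differ.

```go
package main

// detection is a hypothetical language-detection result.
// Neither the type nor its fields come from the library.
type detection struct {
	Lang     string // detected language code, e.g. "de"
	Reliable bool   // whether the detector trusts its own guess
}

// fallbackLang is the language whose stop words are used
// when detection cannot be trusted (English, per the changelog).
const fallbackLang = "en"

// stopWordsLang picks the language used for stop-word filtering:
// the detected language when it is reliable, English otherwise.
func stopWordsLang(d detection) string {
	if !d.Reliable || d.Lang == "" {
		return fallbackLang
	}
	return d.Lang
}
```

Under these assumptions, `stopWordsLang(detection{Lang: "de", Reliable: false})` selects the English stop words, while a reliable detection keeps its own language.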
  • v0.58.0 Changes

    • stopped ignoring <h1> elements that are equal to the <title>; they are now included.
  • v0.57.0 Changes

    • ⬆️ Bumped github.com/zoomio/inout to 0.12.0;
    • 🛠 Fixed the -q option, or Query in the code (HTTP/HTML mode only), so it now actually works and retrieves the contents of the DOM element matching the query;
    • Introduced the -r option, or WaitFor in the code (HTTP/HTML mode only), to wait for a certain DOM element to be ready before getting the HTML;
    • Introduced the -u option, or WaitUntil in the code (HTTP/HTML mode only), to wait for a certain delay before getting the HTML;
    • Introduced the -i option, or Screenshot in the code (HTTP/HTML mode only), to capture a full screenshot of the HTML at the given path.
  • v0.56.1 Changes

    • ➕ Added macOS (darwin) ARM64 release.
  • v0.56.0 Changes

    • ⬆️ Bumped Go to 1.18;
    • 💥 BREAKING: renamed ParseHTML, ParseMD & ParseText to ProcessHTML, ProcessMD & ProcessText respectively;
    • 💥 BREAKING: renamed extension.Result to extension.ExtResult;
    • 🆕 New option AllTagWeights for enabling parsing through everything;
    • 🆕 New option ExcludeTagsString for excluding some of the tags;
    • 📜 ParseHTML & ParseMD are made public to open up parsing capabilities.
  • v0.55.0 Changes

    • 👌 improved handling of words containing the "`" or "'" symbols.
  • v0.54.0 Changes

    • 💥 BREAKING FROM 0.53.0: changed the config.StopWords option signature to expect a slice of strings instead of *stopwords.Register;
    • ⬆️ bumped github.com/zoomio/stopwords to 0.10.0.
  • v0.53.0 Changes

    • ➕ added an Option called StopWords to allow custom stop-words setup; also made the Domains variable public.
  • v0.52.0 Changes

    • ➕ added URL sanitization for texts, so that things like http, https & www are excluded from them.
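
As a rough illustration of the URL sanitization mentioned above, the sketch below strips the scheme and the `www.` prefix from a piece of text. The pattern and the `sanitizeURLs` helper are assumptions made for this example; the library's actual rules are not shown in this changelog and may be different.

```go
package main

import (
	"regexp"
	"strings"
)

// urlNoise matches the URL fragments named in the changelog entry:
// an "http://" or "https://" scheme and a "www." prefix.
// The exact pattern is an assumption, not the library's own rule.
var urlNoise = regexp.MustCompile(`(?i)\bhttps?://|\bwww\.`)

// sanitizeURLs is a hypothetical helper that removes those fragments
// and collapses any whitespace left behind.
func sanitizeURLs(text string) string {
	cleaned := urlNoise.ReplaceAllString(text, "")
	return strings.Join(strings.Fields(cleaned), " ")
}
```

With this sketch, `sanitizeURLs("visit https://www.example.com today")` yields `visit example.com today`, so the URL noise does not end up among the extracted words.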