All Versions
62
Latest Version
Avg Release Cycle
6 days
Latest Release
-
Changelog History
-
v0.60.0 Changes
- BREAKING: graduated the `ContentOnly` option (`-content` in the CLI mode); from now on it is ENABLED by default.
-
v0.59.0 Changes
- use different segmentation logic, based on the `github.com/go-ego/gse` segmenter, for Chinese & Japanese languages;
- improved HTML parser logic: optimised the way it collects the contents of a document and improved the logic for splitting into sentences;
- fall back to the English language for the stop words in cases when language detection is not reliable;
- added the `lang` option to the CLI to be able to provide the language of the document;
- bumped `github.com/zoomio/stopwords` to `0.11.0`.
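The stop-words fallback described above can be pictured with a minimal sketch. The `detection` type and the `stopWordsLang` helper are hypothetical names introduced only for illustration; they are not taken from the library, and its actual detection logic may differ.

```go
package main

import "fmt"

// detection is a hypothetical language-detection result (not a library type).
type detection struct {
	Lang     string // ISO 639-1 code, e.g. "ja"
	Reliable bool   // whether the detector trusts its own guess
}

// stopWordsLang is a hypothetical helper: it picks the language whose
// stop words should be applied, falling back to English when the
// detection is unreliable or empty.
func stopWordsLang(d detection) string {
	if !d.Reliable || d.Lang == "" {
		return "en"
	}
	return d.Lang
}

func main() {
	fmt.Println(stopWordsLang(detection{Lang: "ja", Reliable: true}))
	fmt.Println(stopWordsLang(detection{Lang: "ja", Reliable: false}))
}
```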
-
v0.58.0 Changes
- stopped ignoring `<h1>` elements in cases when they are equal to the `<title>`; they are now included.
-
v0.57.0 Changes
- Bumped `github.com/zoomio/inout` to `0.12.0`;
- Fixed the `-q` option, or `Query` in the code (HTTP/HTML mode only), so now it actually works and retrieves the contents of the DOM element for the query;
- Introduced the `-r` option, or `WaitFor` (HTTP/HTML mode only), to allow waiting for a certain DOM element to be ready before getting the HTML;
- Introduced the `-u` option, or `WaitUntil` (HTTP/HTML mode only), to allow waiting for a certain delay before getting the HTML;
- Introduced the `-i` option, or `Screenshot` (HTTP/HTML mode only), to capture a full screenshot of the HTML at the given path.
-
v0.56.1 Changes
- Added macOS (darwin) ARM64 release.
-
v0.56.0 Changes
- Bumped Go to 1.18;
- BREAKING: renamed `ParseHTML`, `ParseMD` & `ParseText` to `ProcessHTML`, `ProcessMD` & `ProcessText` respectively;
- BREAKING: renamed `extension.Result` to `extension.ExtResult`;
- New option `AllTagWeights` for enabling parsing through everything;
- New option `ExcludeTagsString` for prohibiting some of the tags;
- `ParseHTML` & `ParseMD` are made public to open up parsing capabilities.
-
v0.55.0 Changes
- improved handling of words containing the "`" or "'" symbols.
-
v0.54.0 Changes
- BREAKING FROM 0.53.0: changed the `config.StopWords` option signature to expect a slice of strings instead of `*stopwords.Register`;
- bumped `github.com/zoomio/stopwords` to `0.10.0`.
-
v0.53.0 Changes
- added an `Option` called `StopWords` to allow for custom stop-words setup; also made the `Domains` variable public.
-
v0.52.0 Changes
- added URL sanitization in the texts, so that things like http, https & www are excluded from them.
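As a rough illustration of what this kind of URL sanitization might look like, here is a minimal sketch. The `sanitizeURLs` helper and the `urlNoise` pattern are assumptions made for this example; the library's real implementation is not shown in the changelog and may work differently.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// urlNoise matches "http://" and "https://" scheme prefixes and a "www."
// label. Hypothetical pattern for illustration, not the library's own.
var urlNoise = regexp.MustCompile(`(?i)\bhttps?://|\bwww\.`)

// sanitizeURLs is a hypothetical helper that removes URL noise from text
// and collapses the whitespace left behind.
func sanitizeURLs(text string) string {
	cleaned := urlNoise.ReplaceAllString(text, " ")
	return strings.Join(strings.Fields(cleaned), " ")
}

func main() {
	fmt.Println(sanitizeURLs("see https://www.example.com for details"))
}
```

With this sketch the host name itself stays in the text; only the scheme and the `www.` label are dropped, so they cannot end up as tags.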