Description
Package feed implements a flexible, robust and efficient RSS/Atom parser.
xml alternatives and similar packages
Based on the "Text Processing" category.
goldmark
:trophy: A markdown parser written in Go. Easy to extend, standard (CommonMark) compliant, well structured.
bluemonday
bluemonday: a fast golang HTML sanitizer (inspired by the OWASP Java HTML Sanitizer) to scrub user generated content of XSS
html-to-markdown
Convert HTML to Markdown. Even works with entire websites and can be extended through rules.
omniparser
omniparser: a native Golang ETL streaming parser and transform library for CSV, JSON, XML, EDI, text, etc.
mxj
Decode / encode XML to/from map[string]interface{} (or JSON); extract values with dot-notation paths and wildcards. Replaces x2j and j2x packages.
go-pkg-rss
DISCONTINUED. This package reads RSS and Atom feeds and provides a caching mechanism that adheres to the feed specs.
go-edlib
String comparison and edit distance algorithms library, featuring: Levenshtein, LCS, Hamming, Damerau-Levenshtein (OSA and Adjacent transpositions algorithms), Jaro-Winkler, Cosine, etc.
goq
A declarative struct-tag-based HTML unmarshaling or scraping package for Go built on top of the goquery library
gospider
DISCONTINUED. Lightweight Golang spider framework.
github_flavored_markdown
GitHub Flavored Markdown renderer with fenced code block highlighting, clickable header anchor links.
go-pkg-xmlx
DISCONTINUED. Extension to the standard Go XML package. Maintains a node tree that allows forward/backwards browsing and exposes some simple single/multi-node search functions.
pagser
Pagser is a simple, extensible, configurable parse and deserialize html page to struct based on goquery and struct tags for golang crawler
README
Feed Parser (RSS, Atom)
Package feed implements a flexible, robust and efficient RSS/Atom parser.
If you just want some bytes quickly parsed into an object without caring about the underlying feed type, start with: Simple Use
If you want to take a deeper dive into how you can customize the parser behavior:
- Extending BasicFeed
- Robustness and recovery from bad input
- Parse with specification compliancy checking
- RSS and Atom extensions
Installation & Use
Get the pkg
go get github.com/jloup/xml
Use it in code
import "github.com/jloup/xml/feed"
Simple Use : feed.Parse(io.Reader, feed.DefaultOptions)
Example:
f, err := os.Open("feed.txt")
if err != nil {
	fmt.Printf("Cannot open feed: %s\n", err)
	return
}
defer f.Close()

// Parse accepts either an RSS or an Atom document
myfeed, err := feed.Parse(f, feed.DefaultOptions)
if err != nil {
	fmt.Printf("Cannot parse feed: %s\n", err)
	return
}

fmt.Printf("FEED '%s'\n", myfeed.Title)
for i, entry := range myfeed.Entries {
	fmt.Printf("\t#%v '%s' (%s)\n\t\t%s\n\n", i, entry.Title,
		entry.Link,
		entry.Summary)
}
Output:
FEED 'Me, Myself and I'
	#0 'Breakfast' (http://example.org/2005/04/02/breakfast)
		eggs and bacon, yup !

	#1 'Dinner' (http://example.org/2005/04/02/dinner)
		got soap delivered !
feed.Parse returns a BasicFeed, whose fields are:
// RSS channel or Atom feed
type BasicFeed struct {
	Title   string
	Id      string // Atom:feed:id | RSS:channel:link
	Date    time.Time
	Image   string // Atom:feed:logo:iri | RSS:channel:image:url
	Entries []BasicEntryBlock
}

type BasicEntryBlock struct {
	Title   string
	Link    string
	Date    time.Time // Atom:entry:updated | RSS:item:pubDate
	Id      string    // Atom:entry:id | RSS:item:guid
	Summary string
}
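The field comments above describe how a single struct is filled from either format. The following is a minimal sketch of that idea using only the standard encoding/xml package. It is not how package feed is implemented, and the names basicFeed, atomFeed, rssDoc and parseBasic are hypothetical, introduced only for illustration.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// basicFeed is a hypothetical, reduced stand-in for feed.BasicFeed.
type basicFeed struct {
	Title string
	Id    string // Atom:feed:id | RSS:channel:link
}

// atomFeed captures the Atom elements this sketch needs.
type atomFeed struct {
	Title string `xml:"title"`
	Id    string `xml:"id"`
}

// rssDoc captures the RSS elements this sketch needs.
type rssDoc struct {
	Channel struct {
		Title string `xml:"title"`
		Link  string `xml:"link"`
	} `xml:"channel"`
}

// parseBasic inspects the root element name to pick the format, then maps
// the format-specific fields onto the common struct.
func parseBasic(data []byte) (basicFeed, error) {
	var root struct {
		XMLName xml.Name
	}
	if err := xml.Unmarshal(data, &root); err != nil {
		return basicFeed{}, err
	}
	switch root.XMLName.Local {
	case "feed": // Atom
		var a atomFeed
		if err := xml.Unmarshal(data, &a); err != nil {
			return basicFeed{}, err
		}
		return basicFeed{Title: a.Title, Id: a.Id}, nil
	case "rss": // RSS 2.0
		var r rssDoc
		if err := xml.Unmarshal(data, &r); err != nil {
			return basicFeed{}, err
		}
		return basicFeed{Title: r.Channel.Title, Id: r.Channel.Link}, nil
	default:
		return basicFeed{}, fmt.Errorf("unknown root element %q", root.XMLName.Local)
	}
}

func main() {
	docs := [][]byte{
		[]byte(`<feed xmlns="http://www.w3.org/2005/Atom"><title>A</title><id>urn:a</id></feed>`),
		[]byte(`<rss version="2.0"><channel><title>R</title><link>http://example.org/</link></channel></rss>`),
	}
	for _, doc := range docs {
		f, err := parseBasic(doc)
		if err != nil {
			fmt.Println("cannot parse:", err)
			continue
		}
		fmt.Printf("title=%s id=%s\n", f.Title, f.Id)
	}
}
```

The caller never needs to know which format it received; only parseBasic branches on the root element.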
Extending BasicFeed
BasicFeed is a deliberately minimal struct that implements the feed.UserFeed interface. To access more of the values extracted from a feed, pass your own implementation of feed.UserFeed to feed.ParseCustom.
type UserFeed interface {
	PopulateFromAtomFeed(f *atom.Feed) // see github.com/jloup/xml/feed/atom
	PopulateFromAtomEntry(e *atom.Entry)
	PopulateFromRssChannel(c *rss.Channel) // see github.com/jloup/xml/feed/rss
	PopulateFromRssItem(i *rss.Item)
}

func ParseCustom(r io.Reader, feed UserFeed, options ParseOptions) error
To avoid starting from scratch, you can embed feed.BasicEntryBlock and feed.BasicFeedBlock in your own structs and call their Populate methods from yours.
Example:
type MyFeed struct {
	feed.BasicFeedBlock
	Generator string
	Entries   []feed.BasicEntryBlock
}

func (m *MyFeed) PopulateFromAtomFeed(f *atom.Feed) {
	// fill the basic fields first, then add our own
	m.BasicFeedBlock.PopulateFromAtomFeed(f)
	m.Generator = fmt.Sprintf("%s V%s", f.Generator.Uri.String(),
		f.Generator.Version.String())
}

func (m *MyFeed) PopulateFromRssChannel(c *rss.Channel) {
	m.BasicFeedBlock.PopulateFromRssChannel(c)
	m.Generator = c.Generator.String()
}

func (m *MyFeed) PopulateFromAtomEntry(e *atom.Entry) {
	newEntry := feed.BasicEntryBlock{}
	newEntry.PopulateFromAtomEntry(e)
	m.Entries = append(m.Entries, newEntry)
}

func (m *MyFeed) PopulateFromRssItem(i *rss.Item) {
	newEntry := feed.BasicEntryBlock{}
	newEntry.PopulateFromRssItem(i)
	m.Entries = append(m.Entries, newEntry)
}

func main() {
	f, err := os.Open("feed.txt")
	if err != nil {
		fmt.Printf("Cannot open feed: %s\n", err)
		return
	}
	defer f.Close()

	myfeed := &MyFeed{}
	err = feed.ParseCustom(f, myfeed, feed.DefaultOptions)
	if err != nil {
		fmt.Printf("Cannot parse feed: %s\n", err)
		return
	}

	fmt.Printf("FEED '%s' generated with %s\n", myfeed.Title, myfeed.Generator)
	for i, entry := range myfeed.Entries {
		fmt.Printf("\t#%v '%s' (%s)\n", i, entry.Title, entry.Link)
	}
}
Output:
FEED 'Me, Myself and I' generated with http://www.atomgenerator.com/ V1.0
	#0 'Breakfast' (http://example.org/2005/04/02/breakfast)
	#1 'Dinner' (http://example.org/2005/04/02/dinner)
Robustness and recovery from bad input
Feeds are widely used, and it is quite common for a single invalid character or a missing opening or closing tag to invalidate the whole feed. The standard encoding/xml package is quite pedantic (as it should be) about its XML input.
To produce an output feed at all costs, you can set the number of times the parser may recover from invalid input via the XMLTokenErrorRetry field in ParseOptions. The strategy is simple: if the XML decoder returns an XMLTokenError while parsing, the faulty token is removed from the input and the parser retries building a feed from what remains. This is useful when, for example, invalid HTML or XML is present in an Atom content tag.
Example:
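f, err := os.Open("testdata/invalid_atom.xml")
if err != nil {
	fmt.Printf("Cannot open feed: %s\n", err)
	return
}
defer f.Close()

opt := feed.DefaultOptions
// allow the parser to drop one faulty token and retry
opt.XMLTokenErrorRetry = 1

_, err = feed.Parse(f, opt)
if err != nil {
	fmt.Printf("Cannot parse feed: %s\n", err)
} else {
	fmt.Println("no error")
}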
Output:
no error
With XMLTokenErrorRetry set to 0, it would have produced the following error:
Cannot parse feed: [XMLTokenError] XML syntax error on line 574: illegal character code U+000C
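To make the remove-and-retry strategy concrete, here is a rough sketch built only on the standard encoding/xml package. It is an assumption about how such a recovery loop could look, not the code used by package feed; firstTokenError and dropFaultyTokens are hypothetical names. The sketch records the decoder offset before each token, and when a token fails it cuts the bytes between that offset and the position where the decoder stopped.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"io"
)

// firstTokenError scans data token by token. If a token cannot be read, it
// returns the byte range the decoder consumed for that token and the error.
// A nil error means the whole document tokenized cleanly.
func firstTokenError(data []byte) (start, end int64, err error) {
	d := xml.NewDecoder(bytes.NewReader(data))
	for {
		start = d.InputOffset()
		_, err = d.Token()
		if err == io.EOF {
			return 0, 0, nil
		}
		if err != nil {
			return start, d.InputOffset(), err
		}
	}
}

// dropFaultyTokens removes at most retries faulty tokens from data and
// returns the cleaned input, or the last error once retries are exhausted.
func dropFaultyTokens(data []byte, retries int) ([]byte, error) {
	for {
		start, end, err := firstTokenError(data)
		if err == nil {
			return data, nil
		}
		if retries <= 0 || end <= start {
			return nil, err
		}
		// Build a copy of the input without the faulty byte range.
		cleaned := make([]byte, 0, len(data))
		cleaned = append(cleaned, data[:start]...)
		cleaned = append(cleaned, data[end:]...)
		data = cleaned
		retries--
	}
}

func main() {
	// The form feed character (U+000C) is not allowed in XML 1.0.
	input := []byte("<feed><title>ok</title><content>bad\x0ctext</content></feed>")

	if _, err := dropFaultyTokens(input, 0); err != nil {
		fmt.Println("0 retries:", err)
	}
	cleaned, err := dropFaultyTokens(input, 1)
	if err != nil {
		fmt.Println("1 retry:", err)
		return
	}
	fmt.Printf("1 retry: %s\n", cleaned)
}
```

Note the trade-off: the whole text token containing the bad character is discarded, so content is lost in exchange for a parseable document.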
Parse with specification compliancy checking
RSS and Atom feeds should conform to a specification (a complex one in the case of Atom). By default, the Parse functions are not too restrictive about input feeds. To validate feeds, you can pass a custom FlagChecker to ParseOptions. If you really know what you are doing, you can enable or disable individual spec checks.
The error flags for each standard are listed in the package documentation:
- RSS : github.com/jloup/xml/feed/rss
- Atom : github.com/jloup/xml/feed/atom
Example:
// the input feed is not compliant to spec
f, err := os.Open("feed.txt")
if err != nil {
	fmt.Printf("Cannot open feed: %s\n", err)
	return
}
defer f.Close()

// the input feed should be 100% compliant to spec...
flags := xmlutils.NewErrorChecker(xmlutils.EnableAllError)
//... but it is OK if Atom entry does not have <updated> field
flags.DisableErrorChecking("entry", atom.MissingDate)

options := feed.ParseOptions{extension.Manager{}, &flags}
myfeed, err := feed.Parse(f, options)
if err != nil {
	fmt.Printf("Cannot parse feed:\n%s\n", err)
	return
}

fmt.Printf("FEED '%s'\n", myfeed.Title)
Output:
Cannot parse feed:
in 'feed':
[MissingId]
feed's id should exist
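The enable-all-then-disable-some pattern used above can be pictured with a small standalone sketch. Everything in it (errorFlag, violation, checker and its methods) is hypothetical and written for illustration only; it does not reproduce the xmlutils API. It simply keeps a set of disabled (element, flag) pairs and filters reported violations against it.

```go
package main

import "fmt"

// errorFlag identifies one kind of spec violation (hypothetical values).
type errorFlag int

const (
	missingID errorFlag = iota
	missingDate
)

// String gives each flag a readable name for error messages.
func (f errorFlag) String() string {
	switch f {
	case missingID:
		return "MissingId"
	case missingDate:
		return "MissingDate"
	}
	return "Unknown"
}

// violation is a flag raised while checking a given element.
type violation struct {
	element string
	flag    errorFlag
}

// checker starts with every check enabled and remembers the disabled ones.
type checker struct {
	disabled map[violation]bool
}

func newChecker() *checker {
	return &checker{disabled: map[violation]bool{}}
}

// disable turns off one check for one element.
func (c *checker) disable(element string, f errorFlag) {
	c.disabled[violation{element, f}] = true
}

// enabled reports whether a check is still active for an element.
func (c *checker) enabled(element string, f errorFlag) bool {
	return !c.disabled[violation{element, f}]
}

// filter keeps only the violations whose check is still enabled.
func (c *checker) filter(found []violation) []violation {
	var kept []violation
	for _, v := range found {
		if c.enabled(v.element, v.flag) {
			kept = append(kept, v)
		}
	}
	return kept
}

func main() {
	c := newChecker()
	c.disable("entry", missingDate) // tolerate entries without a date

	found := []violation{{"feed", missingID}, {"entry", missingDate}}
	for _, v := range c.filter(found) {
		fmt.Printf("in '%s': [%s]\n", v.element, v.flag)
	}
}
```

Keying the set on the (element, flag) pair is what lets a check stay active for the feed while being relaxed for entries.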
RSS and Atom extensions
Both formats allow third-party extensions. Some extensions have been implemented as examples, e.g. RSS dc:creator (github.com/jloup/xml/feed/rss/extension/dc).
Example:
type ExtendedFeed struct {
	feed.BasicFeedBlock
	Entries []ExtendedEntry
}

type ExtendedEntry struct {
	feed.BasicEntryBlock
	Creator string // <dc:creator> only present in RSS feeds
	Entries []feed.BasicEntryBlock
}

func (f *ExtendedFeed) PopulateFromAtomEntry(e *atom.Entry) {
	newEntry := ExtendedEntry{}
	newEntry.PopulateFromAtomEntry(e)
	f.Entries = append(f.Entries, newEntry)
}

func (f *ExtendedFeed) PopulateFromRssItem(i *rss.Item) {
	newEntry := ExtendedEntry{}
	newEntry.PopulateFromRssItem(i)

	creator, ok := dc.GetCreator(i)
	// we must check the item actually has a dc:creator element
	if ok {
		newEntry.Creator = creator.String()
	}
	f.Entries = append(f.Entries, newEntry)
}

func main() {
	f, err := os.Open("rss.txt")
	if err != nil {
		fmt.Printf("Cannot open feed: %s\n", err)
		return
	}
	defer f.Close()

	// Manager is in github.com/jloup/xml/feed/extension
	manager := extension.Manager{}
	// we add the dc extension to it
	// dc extension is in "github.com/jloup/xml/feed/rss/extension/dc"
	dc.AddToManager(&manager)

	opt := feed.DefaultOptions
	// we pass our custom extension Manager to ParseOptions
	opt.ExtensionManager = manager

	myfeed := &ExtendedFeed{}
	err = feed.ParseCustom(f, myfeed, opt)
	if err != nil {
		fmt.Printf("Cannot parse feed: %s\n", err)
		return
	}

	fmt.Printf("FEED '%s'\n", myfeed.Title)
	for i, entry := range myfeed.Entries {
		fmt.Printf("\t#%v '%s' by %s (%s)\n", i, entry.Title,
			entry.Creator,
			entry.Link)
	}
}
Output:
FEED 'Me, Myself and I'
	#0 'Breakfast' by Peter J. (http://example.org/2005/04/02/breakfast)
	#1 'Dinner' by Peter J. (http://example.org/2005/04/02/dinner)
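For readers unfamiliar with what a namespaced extension element such as dc:creator looks like on the wire, the sketch below reads it with the standard encoding/xml package alone. It is independent of the extension Manager shown above and does not reflect its implementation; rssItem, rssDoc and parseItems are hypothetical names. The struct tag pairs the Dublin Core namespace URI with the local element name.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// rssItem holds a core RSS element plus the Dublin Core creator extension.
type rssItem struct {
	Title string `xml:"title"`
	// The tag is "<namespace URI> <local name>"; the dc: prefix itself is
	// irrelevant, only the namespace it is bound to matters.
	Creator string `xml:"http://purl.org/dc/elements/1.1/ creator"`
}

// rssDoc collects every item found under rss > channel.
type rssDoc struct {
	Items []rssItem `xml:"channel>item"`
}

// parseItems decodes the items of an RSS document. Creator stays empty for
// items that carry no dc:creator element.
func parseItems(data []byte) ([]rssItem, error) {
	var doc rssDoc
	if err := xml.Unmarshal(data, &doc); err != nil {
		return nil, err
	}
	return doc.Items, nil
}

func main() {
	input := []byte(`<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
<item><title>Breakfast</title><dc:creator>Peter J.</dc:creator></item>
<item><title>Dinner</title></item>
</channel>
</rss>`)

	items, err := parseItems(input)
	if err != nil {
		fmt.Println("cannot parse:", err)
		return
	}
	for i, it := range items {
		fmt.Printf("#%d %q creator=%q\n", i, it.Title, it.Creator)
	}
}
```

Unlike dc.GetCreator, this sketch cannot tell an absent element from an empty one; a pointer field would be one way to keep that distinction.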