Programming language: Go
License: MIT License
Tags: Goroutines

README

neilotoole/errgroup

neilotoole/errgroup is a drop-in alternative to Go's wonderful sync/errgroup, but limited to N worker goroutines. This is useful when interacting with rate-limited APIs, databases, and the like.

Overview

In effect, neilotoole/errgroup is sync/errgroup but with a worker pool of N goroutines. The exported API is identical except for an additional function, WithContextN, which allows the caller to specify the maximum number of worker goroutines (numG) and the capacity of the queue channel (qSize) used to hold work before it is picked up by a worker goroutine. The zero Group, and the Group returned by WithContext, have numG and qSize equal to runtime.NumCPU().

Usage

The exported API of this package mirrors the sync/errgroup package. The only change needed is the import path of the package, from:

import (
  "golang.org/x/sync/errgroup"
)

to

import (
  "github.com/neilotoole/errgroup"
)

Then use in the normal manner. See the godoc for more.

g, ctx := errgroup.WithContext(ctx)
g.Go(func() error {
    // do something
    return nil
})

err := g.Wait()
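
For illustration, here's a minimal self-contained sketch of the pattern above; the task count and task bodies are placeholders, not taken from the project:

package main

import (
    "context"
    "fmt"

    "github.com/neilotoole/errgroup"
)

func main() {
    g, ctx := errgroup.WithContext(context.Background())

    // Launch one task per item; the group caps concurrent workers
    // at runtime.NumCPU() by default.
    for i := 0; i < 10; i++ {
        i := i // capture the loop variable (pre-Go 1.22 semantics)
        g.Go(func() error {
            select {
            case <-ctx.Done():
                return ctx.Err() // another task failed; bail out
            default:
                fmt.Println("task", i)
                return nil
            }
        })
    }

    // Wait blocks until all tasks complete and returns the first error, if any.
    if err := g.Wait(); err != nil {
        fmt.Println("error:", err)
    }
}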

Many users will have no need to tweak the numG and qSize params. However, benchmarking may suggest particular values for your workload. For that you'll need WithContextN:

numG, qSize := 8, 4
g, ctx := errgroup.WithContextN(ctx, numG, qSize)
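
One way to find suitable values is a table-driven benchmark along these lines. This is a hypothetical sketch: the package name, the configurations, and the busywork task are all placeholders standing in for your real workload:

package yourpkg // hypothetical; put this in a _test.go file

import (
    "context"
    "fmt"
    "testing"

    "github.com/neilotoole/errgroup"
)

// BenchmarkWithContextN compares a few {numG, qSize} configurations.
func BenchmarkWithContextN(b *testing.B) {
    configs := []struct{ numG, qSize int }{
        {4, 4}, {16, 4}, {16, 16}, {32, 4},
    }

    for _, c := range configs {
        b.Run(fmt.Sprintf("numG_%d_qSize_%d", c.numG, c.qSize), func(b *testing.B) {
            for n := 0; n < b.N; n++ {
                g, _ := errgroup.WithContextN(context.Background(), c.numG, c.qSize)
                for i := 0; i < 50; i++ {
                    g.Go(func() error {
                        sum := 0 // placeholder CPU-bound work
                        for j := 0; j < 1000; j++ {
                            sum += j
                        }
                        _ = sum
                        return nil
                    })
                }
                if err := g.Wait(); err != nil {
                    b.Fatal(err)
                }
            }
        })
    }
}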

Performance

The motivation for creating neilotoole/errgroup was to provide rate-limiting while maintaining the lovely sync/errgroup semantics. Sacrificing some performance vs sync/errgroup was assumed. However, benchmarking suggests that this implementation can be more effective than sync/errgroup when tuned for a specific workload.

Below is a selection of benchmark results. How to read this: a workload is X tasks of complexity Y. Each workload is executed by:

  • sync/errgroup, listed as sync_errgroup
  • a non-parallel implementation (sequential)
  • various {numG, qSize} configurations of neilotoole/errgroup, listed as errgroupn_{numG}_{qSize}

BenchmarkGroup_Short/complexity_5/tasks_50/errgroupn_default_16_16-16              25574         46867 ns/op         688 B/op         12 allocs/op
BenchmarkGroup_Short/complexity_5/tasks_50/errgroupn_4_4-16                        24908         48926 ns/op         592 B/op         12 allocs/op
BenchmarkGroup_Short/complexity_5/tasks_50/errgroupn_16_4-16                       24895         48313 ns/op         592 B/op         12 allocs/op
BenchmarkGroup_Short/complexity_5/tasks_50/errgroupn_32_4-16                       24853         48284 ns/op         592 B/op         12 allocs/op
BenchmarkGroup_Short/complexity_5/tasks_50/sync_errgroup-16                        18784         65826 ns/op        1858 B/op         55 allocs/op
BenchmarkGroup_Short/complexity_5/tasks_50/sequential-16                           10000        111483 ns/op           0 B/op          0 allocs/op

BenchmarkGroup_Short/complexity_20/tasks_50/errgroupn_default_16_16-16              3745        325993 ns/op        1168 B/op         27 allocs/op
BenchmarkGroup_Short/complexity_20/tasks_50/errgroupn_4_4-16                        5186        227034 ns/op        1072 B/op         27 allocs/op
BenchmarkGroup_Short/complexity_20/tasks_50/errgroupn_16_4-16                       3970        312816 ns/op        1076 B/op         27 allocs/op
BenchmarkGroup_Short/complexity_20/tasks_50/errgroupn_32_4-16                       3715        320757 ns/op        1073 B/op         27 allocs/op
BenchmarkGroup_Short/complexity_20/tasks_50/sync_errgroup-16                        2739        432093 ns/op        1862 B/op         55 allocs/op
BenchmarkGroup_Short/complexity_20/tasks_50/sequential-16                           2306        520947 ns/op           0 B/op          0 allocs/op

BenchmarkGroup_Short/complexity_40/tasks_250/errgroupn_default_16_16-16              354       3602666 ns/op        1822 B/op         47 allocs/op
BenchmarkGroup_Short/complexity_40/tasks_250/errgroupn_4_4-16                        420       2468605 ns/op        1712 B/op         47 allocs/op
BenchmarkGroup_Short/complexity_40/tasks_250/errgroupn_16_4-16                       334       3581349 ns/op        1716 B/op         47 allocs/op
BenchmarkGroup_Short/complexity_40/tasks_250/errgroupn_32_4-16                       310       3890316 ns/op        1712 B/op         47 allocs/op
BenchmarkGroup_Short/complexity_40/tasks_250/sync_errgroup-16                        253       4740462 ns/op        8303 B/op        255 allocs/op
BenchmarkGroup_Short/complexity_40/tasks_250/sequential-16                           200       5924693 ns/op           0 B/op          0 allocs/op

The overall impression is that neilotoole/errgroup can provide higher throughput than sync/errgroup for these (CPU-intensive) workloads, sometimes significantly so. As always, these benchmark results should not be taken as gospel: your results may vary.

Design Note

Why require an explicit qSize limit?

If enough calls to Group.Go cause qCh to become full, the Go method will block until a worker goroutine drains work from qCh. This is in contrast to sync/errgroup's Go method, which never blocks. While neilotoole/errgroup aims to be as behaviorally similar a "drop-in" alternative to sync/errgroup as possible, this blocking behavior is a conscious deviation.
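
As a contrived illustration of that deviation (the numG and qSize values and the sleep are made up for demonstration):

package main

import (
    "context"
    "time"

    "github.com/neilotoole/errgroup"
)

func main() {
    // numG=1, qSize=1: one worker goroutine, one queued slot.
    g, _ := errgroup.WithContextN(context.Background(), 1, 1)

    slow := func() error {
        time.Sleep(time.Second)
        return nil
    }

    g.Go(slow) // picked up by the worker
    g.Go(slow) // waits in qCh
    g.Go(slow) // blocks the caller until qCh has space; sync/errgroup's Go would return immediately
    _ = g.Wait()
}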

Noting that the capacity of qCh is controlled by qSize, an alternative implementation could probably be built that uses a (growable) slice to buffer functions passed to Go when qCh is full. Consideration of this potential design led to this issue regarding unlimited-capacity channels, perhaps better characterized in this particular case as "growable-capacity channels". If such a feature existed in the language, this implementation might have taken advantage of it, at least in a first-pass release (benchmarking notwithstanding). However, benchmarking seems to suggest that a relatively small qSize benefits some workloads, so the explicit qSize requirement may be the better design choice regardless.


*Note that all licence references and agreements mentioned in the neilotoole/errgroup README section above are relevant to that project's source code only.