Programming language: Go
License: BSD 2-clause "Simplified" License
Latest version: v1.0.0-alpha

If you like the project, please ★ star this repository to show your support! 🤩

Currently, the main branch contains version v1.0.0-alpha.0, which differs substantially from version v0.7.0. For NLP-related features, check out the v0.7.0 release branch. The CHANGELOG details the major changes.

A Machine Learning library written in pure Go designed to support relevant neural architectures in Natural Language Processing.

Spago is self-contained: it uses its own lightweight computational graph for both training and inference, which makes it easy to understand from start to finish.

It provides:

  • Automatic differentiation via dynamic define-by-run execution
  • Gradient descent optimizers (Adam, RAdam, RMS-Prop, AdaGrad, SGD)
  • Feed-forward layers (Linear, Highway, Convolution...)
  • Recurrent layers (LSTM, GRU, BiLSTM...)
  • Attention layers (Self-Attention, Multi-Head Attention...)
  • Memory-efficient Word Embeddings (with badger key–value store)
  • Gob compatible neural models for serialization
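As an illustration of the last item, Go's standard encoding/gob package can round-trip any struct with exported fields. The sketch below uses a hypothetical TinyModel type and RoundTrip helper, neither of which is part of Spago; it only shows the mechanism that gob-compatible models rely on.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// TinyModel is a hypothetical stand-in for a neural model; it is not a Spago type.
type TinyModel struct {
	Weights []float64
	Bias    float64
}

// RoundTrip encodes m with gob into a buffer and decodes it into a fresh value.
func RoundTrip(m TinyModel) (TinyModel, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(m); err != nil {
		return TinyModel{}, err
	}
	var out TinyModel
	if err := gob.NewDecoder(&buf).Decode(&out); err != nil {
		return TinyModel{}, err
	}
	return out, nil
}

func main() {
	restored, err := RoundTrip(TinyModel{Weights: []float64{0.4, -0.8}, Bias: -0.2})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", restored)
}
```

In a real program the buffer would typically be replaced by a file or network stream.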



Clone this repo or get the library:

go get -u github.com/nlpodyssey/spago


The core module of Spago relies only on testify (github.com/stretchr/testify) for unit testing. In other words, it has "zero dependencies", and we are committed to keeping it that way as much as possible.

Spago uses a multi-module workspace to ensure that additional dependencies are downloaded only when specific features (e.g. persistent embeddings) are used.

Getting Started

A good place to start is by looking at the implementation of built-in neural models, such as the LSTM. Except for a few linear algebra operations written in assembly for optimal performance (a bit of copying from Gonum), it's straightforward Go code, so you don't have to worry. In fact, SpaGO could have been written by you :)

The behavior of a neural model is characterized by a combination of parameters and equations. Mathematical expressions must be defined using the auto-grad ag package in order to take advantage of automatic differentiation.

In this sense, we can say the computational graph is at the center of the Spago machine learning framework.
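To make the idea of a define-by-run graph concrete, here is a minimal sketch in plain Go. The Node type and the Var, Add, Mul, and Backward helpers are simplified, hypothetical stand-ins that borrow names from the ag package; they are not Spago's implementation.

```go
package main

import "fmt"

// Node is a hypothetical scalar graph node; Spago's real ag.Node is richer.
type Node struct {
	Value    float64
	Grad     float64
	backward func(g float64) // routes an incoming gradient to the operands
}

// Var creates a leaf node holding a value.
func Var(v float64) *Node { return &Node{Value: v} }

// Add computes a+b immediately and records how to route gradients back.
func Add(a, b *Node) *Node {
	out := &Node{Value: a.Value + b.Value}
	out.backward = func(g float64) {
		a.accumulate(g)
		b.accumulate(g)
	}
	return out
}

// Mul computes a*b immediately; each operand receives g times the other value.
func Mul(a, b *Node) *Node {
	out := &Node{Value: a.Value * b.Value}
	out.backward = func(g float64) {
		a.accumulate(g * b.Value)
		b.accumulate(g * a.Value)
	}
	return out
}

// accumulate adds g to the node's gradient and keeps propagating through operators.
func (n *Node) accumulate(g float64) {
	n.Grad += g
	if n.backward != nil {
		n.backward(g)
	}
}

// Backward seeds the output gradient and walks the recorded graph.
func Backward(out *Node, seed float64) { out.accumulate(seed) }

func main() {
	w, x, b := Var(0.4), Var(-0.8), Var(-0.2)
	y := Add(Mul(w, x), b) // the graph is built while the expression runs
	Backward(y, 1.0)
	fmt.Println(y.Value, w.Grad, x.Grad, b.Grad)
}
```

Each operator computes its value as soon as it is called and stores a closure that routes gradients to its operands. This per-path propagation is fine for small expressions; a real implementation would likely visit nodes in topological order so that shared subgraphs are not traversed repeatedly.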

Example 1

Here is an example of how to calculate the sum of two variables:

package main

import (
  "fmt"

  "github.com/nlpodyssey/spago/ag"
  "github.com/nlpodyssey/spago/mat"
)

type T = float32

func main() {
  // create a new node of type variable with a scalar
  a := ag.Var(mat.NewScalar(T(2.0))).WithGrad(true)
  // create another node of type variable with a scalar
  b := ag.Var(mat.NewScalar(T(5.0))).WithGrad(true)
  // create an addition operator (the calculation is actually performed here)
  c := ag.Add(a, b)

  // print the result
  fmt.Printf("c = %v (float%d)\n", c.Value(), c.Value().Scalar().BitSize())

  // back-propagate from c, using 0.5 as the output gradient
  ag.Backward(c, mat.NewScalar(T(0.5)))
  fmt.Printf("ga = %v\n", a.Grad())
  fmt.Printf("gb = %v\n", b.Grad())
}

Output:

c = [7] (float32)
ga = [0.5]
gb = [0.5]
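
The gradient values follow from the chain rule. The call to ag.Backward seeds the output c with the gradient 0.5, and because $c = a + b$ has unit partial derivatives, that seed reaches both inputs unchanged:

$$
\frac{\partial c}{\partial a} = \frac{\partial c}{\partial b} = 1
\quad\Longrightarrow\quad
g_a = 0.5 \cdot \frac{\partial c}{\partial a} = 0.5,
\qquad
g_b = 0.5 \cdot \frac{\partial c}{\partial b} = 0.5
$$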

Example 2

Here is a simple implementation of the perceptron formula:

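package main

import (
  "log"
  "os"

  . "github.com/nlpodyssey/spago/ag"
  "github.com/nlpodyssey/spago/ag/encoding"
  "github.com/nlpodyssey/spago/ag/encoding/dot"
  "github.com/nlpodyssey/spago/mat"
)

func main() {
  x := Var(mat.NewScalar(-0.8)).WithName("x")
  w := Var(mat.NewScalar(0.4)).WithName("w")
  b := Var(mat.NewScalar(-0.2)).WithName("b")

  y := Sigmoid(Add(Mul(w, x), b))

  // write the graph in DOT format to standard output
  err := dot.Encode(encoding.NewGraph(y), os.Stdout)
  if err != nil {
    log.Fatal(err)
  }
}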

In this case, we are interested in rendering the resulting graph with Graphviz:

go run main.go | dot -Tpng -o g.png

Example 3

As a next step, let's take a look at how to create a linear regression model ($y = wx + b$) and how to train it.

The following algorithm will try to learn the correct values for weight and bias.

By the end of training, our equation will approximate the objective function $y = 3x + 1$.

package main

import (
    "fmt"

    "github.com/nlpodyssey/spago/ag"
    "github.com/nlpodyssey/spago/gd"
    "github.com/nlpodyssey/spago/gd/sgd"
    "github.com/nlpodyssey/spago/initializers"
    "github.com/nlpodyssey/spago/losses"
    "github.com/nlpodyssey/spago/mat"
    "github.com/nlpodyssey/spago/mat/float"
    "github.com/nlpodyssey/spago/mat/rand"
    "github.com/nlpodyssey/spago/nn"
)

const (
    epochs   = 100  // number of epochs
    examples = 1000 // number of examples
)

type Linear struct {
    nn.Module
    W nn.Param `spago:"type:weights"`
    B nn.Param `spago:"type:biases"`
}

func NewLinear[T float.DType](in, out int) *Linear {
    return &Linear{
        W: nn.NewParam(mat.NewEmptyDense[T](out, in)),
        B: nn.NewParam(mat.NewEmptyVecDense[T](out)),
    }
}

func (m *Linear) InitWithRandomWeights(seed uint64) *Linear {
    initializers.XavierUniform(m.W.Value(), 1.0, rand.NewLockedRand(seed))
    return m
}

func (m *Linear) Forward(x ag.Node) ag.Node {
    return ag.Add(ag.Mul(m.W, x), m.B)
}

func main() {
    m := NewLinear[float64](1, 1).InitWithRandomWeights(42)

    optimizer := gd.NewOptimizer(m, sgd.New[float64](sgd.NewConfig(0.001, 0.9, true)))

    normalize := func(x float64) float64 { return x / float64(examples) }
    objective := func(x float64) float64 { return 3*x + 1 }
    criterion := losses.MSE

    learn := func(input, expected float64) float64 {
        x, target := ag.Scalar(input), ag.Scalar(expected)
        y := m.Forward(x)
        loss := criterion(y, target, true)
        defer ag.Backward(loss) // compute the gradients once the loss value has been read
        return loss.Value().Scalar().F64()
    }

    for epoch := 0; epoch < epochs; epoch++ {
        for i := 0; i < examples; i++ {
            x := normalize(float64(i))
            loss := learn(x, objective(x))
            if i%100 == 0 {
                fmt.Println(loss)
            }
        }
        optimizer.Do()
    }

    fmt.Printf("\nW: %.2f | B: %.2f\n\n", m.W.Value().Scalar().F64(), m.B.Value().Scalar().F64())
}

Output:


W: 3.00 | B: 1.00
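
To see what the training loop is doing mathematically, the same line can be fitted without the library. This sketch swaps the momentum-based SGD optimizer for plain full-batch gradient descent with hand-derived MSE gradients; the Fit helper, the learning rate of 0.5, and the 500 epochs are assumptions chosen for illustration.

```go
package main

import "fmt"

// Fit runs full-batch gradient descent on y = w*x + b with a mean squared error loss.
// The learning rate and epoch count are illustrative assumptions, not Spago defaults.
func Fit(xs, ys []float64, lr float64, epochs int) (w, b float64) {
	n := float64(len(xs))
	for epoch := 0; epoch < epochs; epoch++ {
		var gw, gb float64
		for i, x := range xs {
			diff := w*x + b - ys[i] // prediction error for this example
			gw += 2 * diff * x / n  // contribution to d(MSE)/dw
			gb += 2 * diff / n      // contribution to d(MSE)/db
		}
		w -= lr * gw
		b -= lr * gb
	}
	return w, b
}

func main() {
	const examples = 1000
	xs := make([]float64, examples)
	ys := make([]float64, examples)
	for i := range xs {
		xs[i] = float64(i) / examples // same normalization as above
		ys[i] = 3*xs[i] + 1           // objective function
	}
	w, b := Fit(xs, ys, 0.5, 500)
	fmt.Printf("W: %.2f | B: %.2f\n", w, b)
}
```

Because the inputs are normalized to the range [0, 1), a fairly large learning rate stays stable here; unnormalized inputs would need a much smaller one.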


Performance

Goroutines play a very important role in making Spago efficient; in fact, Forward operations are executed concurrently (up to GOMAXPROCS). As soon as an Operator is created (usually by calling one of the functions in the ag package, such as Add, Prod, etc.), the related Function's Forward procedure runs on a new goroutine. Nevertheless, it is always safe to ask for the Operator's Value(): if it is called too soon, the call blocks until the result is computed and then returns the value.
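
The blocking behavior described above can be sketched with a sync.WaitGroup. The Operator type below is a hypothetical simplification, not Spago's actual code, and unlike the library it does not cap concurrency at GOMAXPROCS.

```go
package main

import (
	"fmt"
	"sync"
)

// Operator is a hypothetical node whose result is computed on its own goroutine.
type Operator struct {
	wg    sync.WaitGroup
	value float64
}

// NewOperator starts forward on a new goroutine and returns immediately.
func NewOperator(forward func() float64) *Operator {
	op := &Operator{}
	op.wg.Add(1)
	go func() {
		defer op.wg.Done()
		op.value = forward()
	}()
	return op
}

// Value blocks until the forward computation has finished, then returns the result.
func (o *Operator) Value() float64 {
	o.wg.Wait()
	return o.value
}

// Add builds an operator that waits for its operands inside its own goroutine.
func Add(a, b *Operator) *Operator {
	return NewOperator(func() float64 { return a.Value() + b.Value() })
}

func main() {
	a := NewOperator(func() float64 { return 2 })
	b := NewOperator(func() float64 { return 5 })
	c := Add(a, b) // returns before the sum is necessarily computed
	fmt.Println(c.Value())
}
```

Callers never need to synchronize explicitly: the wait is hidden inside Value(), so dependent operators simply block in their own goroutines until their inputs are ready.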

Known Limits

Sadly, at the moment, Spago is not GPU friendly by design.

Projects using SpaGo

Below is a list of projects that use Spago:


Contributing

We're glad you're thinking about contributing to Spago! If you think something is missing or could be improved, please open issues and pull requests. If you'd like to help this project grow, we'd love to have you!

To start contributing, check the Contributing Guidelines.


Contact

We encourage you to open an issue, since public discussion helps the community grow.

If you really want to write to us privately, please email Matteo Grella with your questions or comments.


Acknowledgments

Spago is part of the open-source NLP Odyssey initiative started by members of the EXOP team (now part of Crisis24). I would therefore like to thank EXOP GmbH, which fully supports development by promoting the project and giving it increasing importance.


Sponsors

We appreciate contributions of all kinds. We especially want to thank Spago's fiscal sponsors, who contribute to ongoing project maintenance.

See our Open Collective page if you too are interested in becoming a sponsor.
