Popularity: 4.2 (Growing)
Activity: 5.2
Stars: 113
Watchers: 3
Forks: 8
Last commit: about 1 month ago
Programming language: Go
License: MIT License
Tags: Machine Learning

go-featureprocessing alternatives and similar packages

Based on the "Machine Learning" category.

GoLearn (popularity 9.6, activity 0.0): Machine Learning for Go
gorse (popularity 9.4, activity 6.7): Gorse open source recommender system engine
Gorgonia (popularity 9.1, activity 2.8): Gorgonia is a library that helps facilitate machine learning in Go.
gosseract (popularity 8.4, activity 6.7): Go package for OCR (Optical Character Recognition), by using Tesseract C++ library
m2cgen (popularity 8.4, activity 0.0): Transform ML models into a native code (Java, C, Python, Go, JavaScript, Visual Basic, C#, R, PowerShell, PHP, Dart, Haskell, Ruby, F#, Rust) with zero dependencies
tfgo (popularity 8.3, activity 1.5): Tensorflow + Go, the gopher way
envd (popularity 8.0, activity 8.8): 🏕️ Reproducible development environment
goml (popularity 7.9, activity 0.0): On-line Machine Learning in Go (and so much more)
gago (popularity 7.3, activity 0.0): Evolutionary optimization library for Go (genetic algorithm, particle swarm optimization, differential evolution)
bayesian (popularity 7.2, activity 2.3): Naive Bayesian Classification for Golang.
CloudForest (popularity 7.1, activity 0.0): Ensembles of decision trees in go/golang.
ocrserver (popularity 7.0, activity 0.0): A simple OCR API server, seriously easy to be deployed by Docker, on Heroku as well
onnx-go (popularity 6.8, activity 2.3): onnx-go gives the ability to import a pre-trained neural network within Go without being linked to a framework or library.
gobrain (popularity 6.7, activity 0.0): Neural Networks written in go
go-deep (popularity 6.6, activity 3.5): Artificial Neural Network
sklearn (popularity 6.0, activity 0.0): bits of sklearn ported to Go #golang
regommend (popularity 5.9, activity 0.0): Recommendation engine for Go
go-galib (popularity 5.5, activity 0.0): Genetic Algorithms library written in Go / golang
Goptuna (popularity 5.4, activity 6.4): A hyperparameter optimization framework, inspired by Optuna.
goga (popularity 5.3, activity 0.0): Golang Genetic Algorithm
goRecommend (popularity 5.3, activity 0.0): Collaborative Filtering (CF) Algorithms in Go!
shield (popularity 5.2, activity 0.0): Bayesian text classifier with flexible tokenizers and storage backends for Go
go-fann (popularity 4.6, activity 0.0): Go bindings for FANN, library for artificial neural networks
goscore (popularity 4.4, activity 0.0): Go Scoring API for PMML
neat (popularity 4.2, activity 0.0): DISCONTINUED. Plug-and-play, parallel Go framework for NeuroEvolution of Augmenting Topologies (NEAT).
fonet (popularity 4.1, activity 0.0): fonet is a deep neural network package for Go.
libsvm (popularity 4.0, activity 0.0): libsvm go version
NEUGO (popularity 3.8, activity 0.0): DISCONTINUED. NEUGO: Neural Networks in Go
go-pr (popularity 3.8, activity 0.0): Pattern recognition package in Go lang.
GoMind (popularity 3.7, activity 0.0): A simplistic Neural Network Library in Go
neural-go (popularity 3.7, activity 0.0): A multilayer perceptron network implemented in Go, with training via backpropagation.
Varis (popularity 3.5, activity 0.0): Golang Neural Network
golinear (popularity 3.4, activity 0.0): DISCONTINUED. liblinear bindings for Go
go-cluster (popularity 3.2, activity 0.0): k-modes and k-prototypes clustering algorithms implementation in Go
EAGO (popularity 2.9, activity 0.0): DISCONTINUED. EAGO: Evolutionary Algorithms in Go
evoli (popularity 2.9, activity 0.0): Genetic Algorithm and Particle Swarm Optimization
godist (popularity 2.8, activity 0.0): Probability distributions and associated methods in Go
randomforest (popularity 2.8, activity 2.6): Random Forest implementation in golang
ddt (popularity 2.3, activity 0.0): Golang Dynamic Decision Tree
probab (popularity 2.0, activity 0.0): Automatically exported from code.google.com/p/probab
mlgo (popularity 1.2, activity 0.0): Automatically exported from code.google.com/p/mlgo

README

go-featureprocessing

Fast, simple sklearn-like feature processing for Go

[x] Does not cross cgo boundary
[x] No memory allocation
[x] No reflection
[x] Convenient serialization
[x] Generated code has 100% test coverage and benchmarks
[x] Fitting
[x] UTF-8
[x] Parallel batch transform
[x] Faster than sklearn in batch mode

//go:generate go run github.com/nikolaydubina/go-featureprocessing/cmd/generate -struct=Employee

type Employee struct {
    Age         int     `feature:"identity"`
    Salary      float64 `feature:"minmax"`
    Kids        int     `feature:"maxabs"`
    Weight      float64 `feature:"standard"`
    Height      float64 `feature:"quantile"`
    City        string  `feature:"onehot"`
    Car         string  `feature:"ordinal"`
    Income      float64 `feature:"kbins"`
    Description string  `feature:"tfidf"`
    SecretValue float64
}

The code above generates a new transformer struct (EmployeeFeatureTransformer, used below), as well as benchmarks and tests that use google/gofuzz.
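The generated code itself is not shown in this README. As a rough, hypothetical sketch of the idea, here is hand-written code for a two-field struct with simplified scaler types (Point, PointFeatureTransformer, minMaxScaler and oneHotEncoder are illustrative names, not the library's). The point is that a generated transformer is plain field-by-field code with no reflection, writing into a caller-provided slice:

```go
package main

import "fmt"

// Point is a hypothetical input struct using the README's tag convention.
type Point struct {
	X    float64 `feature:"minmax"`
	City string  `feature:"onehot"`
}

// minMaxScaler is a simplified stand-in for a min-max scaler (hypothetical).
type minMaxScaler struct{ Min, Max float64 }

// Transform scales x into [0, 1]; clipping out-of-range values is an assumption.
func (s minMaxScaler) Transform(x float64) float64 {
	if s.Max == s.Min {
		return 0
	}
	v := (x - s.Min) / (s.Max - s.Min)
	if v < 0 {
		return 0
	}
	if v > 1 {
		return 1
	}
	return v
}

// oneHotEncoder maps each known category to one output position (hypothetical).
type oneHotEncoder struct{ Mapping map[string]uint }

// TransformInplace zeroes dst and sets 1 at the category's index, if known.
func (e oneHotEncoder) TransformInplace(dst []float64, v string) {
	for i := range dst {
		dst[i] = 0
	}
	if idx, ok := e.Mapping[v]; ok && int(idx) < len(dst) {
		dst[idx] = 1
	}
}

// PointFeatureTransformer holds one encoder per tagged field, as generated code might.
type PointFeatureTransformer struct {
	X    minMaxScaler
	City oneHotEncoder
}

// NumFeatures is 1 for X plus one column per known city.
func (t *PointFeatureTransformer) NumFeatures() int { return 1 + len(t.City.Mapping) }

// TransformInplace fills dst (length NumFeatures()) without allocating.
func (t *PointFeatureTransformer) TransformInplace(dst []float64, p *Point) {
	dst[0] = t.X.Transform(p.X)
	t.City.TransformInplace(dst[1:], p.City)
}

// Transform allocates a single output slice and delegates to TransformInplace.
func (t *PointFeatureTransformer) Transform(p *Point) []float64 {
	dst := make([]float64, t.NumFeatures())
	t.TransformInplace(dst, p)
	return dst
}

func main() {
	t := PointFeatureTransformer{
		X:    minMaxScaler{Min: 0, Max: 10},
		City: oneHotEncoder{Mapping: map[string]uint{"Pangyo": 0, "Seoul": 1}},
	}
	fmt.Println(t.Transform(&Point{X: 5, City: "Seoul"})) // [0.5 0 1]
}
```

Splitting Transform into an allocating wrapper around an in-place variant mirrors the Transform and Transform_Inplace benchmark names further down.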

employee := Employee{
   Age:         22,
   Salary:      1000.0,
   Kids:        2,
   Weight:      85.1,
   Height:      160.0,
   City:        "Pangyo",
   Car:         "Tesla",
   Income:      9000.1,
   SecretValue: 42,
   Description: "large text fields is not a problem neither, tf-idf can help here too! more advanced NLP will be added later!",
}

var fp EmployeeFeatureTransformer

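// ioutil.ReadAll takes an io.Reader, not a path; os.ReadFile reads a whole file
config, err := os.ReadFile("employee_feature_processor.json")
if err != nil {
   log.Fatal(err)
}
if err := json.Unmarshal(config, &fp); err != nil {
   log.Fatal(err)
}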

features := fp.Transform(&employee)
// []float64{22, 1, 0.5, 1.0039999999999998, 1, 1, 0, 0, 0, 1, 5, 0.7674945674619879, 0.4532946552278861, 0.4532946552278861}

names := fp.FeatureNames()
// []string{"Age", "Salary", "Kids", "Weight", "Height", "City_Pangyo", "City_Seoul", "City_Daejeon", "City_Busan", "Car", "Income", "Description_text", "Description_problem", "Description_help"}

You can also fit the transformer on data:

fp := EmployeeFeatureTransformer{}
fp.Fit([]Employee{...})

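config, err := json.Marshal(fp) // serialize the fitted transformer
if err != nil {
   log.Fatal(err)
}
if err := os.WriteFile("employee_feature_processor.json", config, 0644); err != nil {
   log.Fatal(err)
}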
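Fitting amounts to computing each encoder's parameters from the training data. A minimal sketch for the min-max and standard scalers, using hypothetical fitMinMax and fitStandard functions and simplified types named after the JSON keys; it assumes the population standard deviation (divide by n), which this README does not specify:

```go
package main

import (
	"fmt"
	"math"
)

// Simplified, hypothetical scaler types mirroring the JSON parameter names.
type MinMaxScaler struct{ Min, Max float64 }
type StandardScaler struct{ Mean, STD float64 }

// fitMinMax records the smallest and largest observed values.
func fitMinMax(vals []float64) MinMaxScaler {
	if len(vals) == 0 {
		return MinMaxScaler{}
	}
	s := MinMaxScaler{Min: vals[0], Max: vals[0]}
	for _, v := range vals[1:] {
		s.Min = math.Min(s.Min, v)
		s.Max = math.Max(s.Max, v)
	}
	return s
}

// fitStandard computes the mean and the population standard deviation (assumption).
func fitStandard(vals []float64) StandardScaler {
	if len(vals) == 0 {
		return StandardScaler{}
	}
	var sum float64
	for _, v := range vals {
		sum += v
	}
	mean := sum / float64(len(vals))
	var sq float64
	for _, v := range vals {
		d := v - mean
		sq += d * d
	}
	return StandardScaler{Mean: mean, STD: math.Sqrt(sq / float64(len(vals)))}
}

func main() {
	fmt.Println(fitMinMax([]float64{500, 700, 900}))            // {500 900}
	fmt.Println(fitStandard([]float64{2, 4, 4, 4, 5, 5, 7, 9})) // {5 2}
}
```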

The transformer can be serialized and deserialized with the Go standard library (encoding/json). The serialized form is easy to read, update, and integrate with other tools.

{
   "Age_identity": {},
   "Salary_minmax": {"Min": 500, "Max": 900},
   "Kids_maxabs": {"Max": 4},
   "Weight_standard": {"Mean": 60, "STD": 25},
   "Height_quantile": {"Quantiles": [20, 100, 110, 120, 150]},
   "City_onehot": {"Mapping": {"Pangyo": 0, "Seoul": 1, "Daejeon": 2, "Busan": 3}},
   "Car_ordinal": {"Mapping": {"BMW": 90000, "Tesla": 1}},
   "Income_kbins": {"Quantiles": [1000, 1100, 2000, 3000, 10000]},
   "Description_tfidf": {
      "Mapping": {"help": 2, "problem": 1, "text": 0},
      "Separator": " ",
      "DocCount": [1, 2, 2],
      "NumDocuments": 2,
      "Normalizer": {}
   }
}

Or you can manually initialize it.

fp := EmployeeFeatureTransformer{
   Salary: MinMaxScaler{Min: 500, Max: 900},
   Kids:   MaxAbsScaler{Max: 4},
   Weight: StandardScaler{Mean: 60, STD: 25},
   Height: QuantileScaler{Quantiles: []float64{20, 100, 110, 120, 150}},
   City:   OneHotEncoder{Mapping: map[string]uint{"Pangyo": 0, "Seoul": 1, "Daejeon": 2, "Busan": 3}},
   Car:    OrdinalEncoder{Mapping: map[string]uint{"Tesla": 1, "BMW": 90000}},
   Income: KBinsDiscretizer{QuantileScaler: QuantileScaler{Quantiles: []float64{1000, 1100, 2000, 3000, 10000}}},
   Description: TFIDFVectorizer{
      NumDocuments:    2,
      DocCount:        []uint{1, 2, 2},
      CountVectorizer: CountVectorizer{Mapping: map[string]uint{"text": 0, "problem": 1, "help": 2}, Separator: " "},
   },
}

Benchmarks

For typical use, this struct encoder processes a single sample in about 100 ns. How fast do you need to be? Here are some reference numbers:

                       0 - C++ FlatBuffers decode
                     ...
                   200ps - 4.6GHz single cycle time
                1ns      - L1 cache latency
               10ns      - L2/L3 cache SRAM latency
               20ns      - DDR4 CAS, first byte from memory latency
               20ns      - C++ raw hardcoded structs access
               80ns      - C++ FlatBuffers decode/traverse/dealloc
 ---------->  100ns      - go-featureprocessing typical processing
              150ns      - PCIe bus latency
              171ns      - Go cgo call boundary, 2015
              200ns      - some High Frequency Trading FPGA claims
              800ns      - Go Protocol Buffers Marshal
              837ns      - Go json-iterator/go json decode
           1µs           - Go Protocol Buffers Unmarshal
           1µs           - High Frequency Trading FPGA
           3µs           - Go JSON Marshal
           7µs           - Go JSON Unmarshal
           9µs           - Go XML Marshal
          10µs           - PCIe/NVLink startup time
          17µs           - Python JSON encode or decode times
          30µs           - UNIX domain socket, eventfd, fifo pipes latency
          30µs           - Go XML Unmarshal
         100µs           - Redis intrinsic latency
         100µs           - AWS DynamoDB + DAX
         100µs           - KDB+ queries
         100µs           - High Frequency Trading direct market access range
         200µs           - 1GB/s network air latency
         200µs           - Go garbage collector latency 2018
         500µs           - NGINX/Kong added latency
     10ms                - AWS DynamoDB
     10ms                - WIFI6 "air" latency
     15ms                - AWS Sagemaker latency
     30ms                - 5G "air" latency
    100ms                - typical roundtrip from mobile to backend
    200ms                - AWS RDS MySQL/PostgreSQL or AWS Aurora
 10s                     - AWS Cloudfront 1MB transfer time

This is significantly faster than sklearn, or than calling sklearn from Go, for a small number of samples, and it performs similarly to or faster than sklearn for a large number of samples. [bench_log](docs/bench_log.png) [bench_lin](docs/bench_lin.png)

For full benchmarks, see /docs/benchmarks. An extract for a typical struct:

goos: darwin
goarch: amd64
pkg: github.com/nikolaydubina/go-featureprocessing/cmd/generate/tests
BenchmarkEmployeeFeatureTransformer_Transform-8                                     62135674            206 ns/op          208 B/op        1 allocs/op
BenchmarkEmployeeFeatureTransformer_Transform_Inplace-8                             89993084            123 ns/op            0 B/op        0 allocs/op
BenchmarkEmployeeFeatureTransformer_TransformAll_10elems-8                           5921253           1881 ns/op         2048 B/op        1 allocs/op
BenchmarkEmployeeFeatureTransformer_TransformAll_100elems-8                           528890          20532 ns/op        21760 B/op        1 allocs/op
BenchmarkEmployeeFeatureTransformer_TransformAll_1000elems-8                           53524         238542 ns/op       221185 B/op        1 allocs/op
BenchmarkEmployeeFeatureTransformer_TransformAll_10000elems-8                           4879        2267683 ns/op      2007048 B/op        1 allocs/op
BenchmarkEmployeeFeatureTransformer_TransformAll_100000elems-8                           475       23257147 ns/op     20004876 B/op        1 allocs/op
BenchmarkEmployeeFeatureTransformer_TransformAll_1000000elems-8                           46      284763749 ns/op    192004098 B/op        1 allocs/op
BenchmarkEmployeeFeatureTransformer_TransformAll_10elems_8workers-8                  1552704           7362 ns/op         2064 B/op        2 allocs/op
BenchmarkEmployeeFeatureTransformer_TransformAll_100elems_8workers-8                  412455          29814 ns/op        21776 B/op        2 allocs/op
BenchmarkEmployeeFeatureTransformer_TransformAll_1000elems_8workers-8                  63822         177183 ns/op       213008 B/op        2 allocs/op
BenchmarkEmployeeFeatureTransformer_TransformAll_10000elems_8workers-8                  8704        1505994 ns/op      2162707 B/op        2 allocs/op
BenchmarkEmployeeFeatureTransformer_TransformAll_100000elems_8workers-8                  800       15840396 ns/op     21602323 B/op        2 allocs/op
BenchmarkEmployeeFeatureTransformer_TransformAll_1000000elems_8workers-8                  72      139700740 ns/op    192004112 B/op        2 allocs/op
BenchmarkEmployeeFeatureTransformer_TransformAll_5000000elems_8workers-8                   9     1720488586 ns/op       1040007184 B/op        2 allocs/op
BenchmarkEmployeeFeatureTransformer_TransformAll_15000000elems_8workers-8                  1    14009776007 ns/op       3240001552 B/op        2 allocs/op
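The single-threaded TransformAll rows report one allocation per batch, which suggests one flat output slice for all rows. A hypothetical sketch of parallel batch transform along those lines: allocate the flat output once, then let each worker fill a disjoint range of rows in place, so workers never write to the same memory. transformInplace, transformAll and numFeatures are illustrative stand-ins, not the generated API, and this sketch does not try to reproduce the benchmark's allocation counts (goroutines and closures allocate too):

```go
package main

import (
	"fmt"
	"sync"
)

const numFeatures = 2 // hypothetical output width per sample

// transformInplace is a stand-in for a generated per-sample transform.
func transformInplace(dst []float64, x float64) {
	dst[0] = x
	dst[1] = x * x
}

// transformAll writes all rows into one flat slice using up to numWorkers goroutines.
func transformAll(xs []float64, numWorkers int) []float64 {
	out := make([]float64, len(xs)*numFeatures)
	if numWorkers < 1 {
		numWorkers = 1
	}
	chunk := (len(xs) + numWorkers - 1) / numWorkers // rows per worker, rounded up
	var wg sync.WaitGroup
	for start := 0; start < len(xs); start += chunk {
		end := start + chunk
		if end > len(xs) {
			end = len(xs)
		}
		wg.Add(1)
		go func(start, end int) {
			defer wg.Done()
			for i := start; i < end; i++ {
				// each row owns out[i*numFeatures : (i+1)*numFeatures]
				transformInplace(out[i*numFeatures:(i+1)*numFeatures], xs[i])
			}
		}(start, end)
	}
	wg.Wait()
	return out
}

func main() {
	fmt.Println(transformAll([]float64{1, 2, 3}, 2)) // [1 1 2 4 3 9]
}
```

Splitting by contiguous row ranges keeps each worker's writes in its own region of the slice; for small batches the goroutine overhead dominates, which matches the 10-element 8-worker row being slower than the single-threaded one.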

[beta] Reflection-based version

If you can't use the go:generate version, you can try the reflection-based version. Note that the reflection version introduces overhead that is particularly noticeable when your struct has many fields: expect roughly a 2x slowdown for a struct with large composite transformers, and roughly a 20x slowdown for a struct with 32 fields. Some features, such as serialization and deserialization, are not supported yet.
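To see where the reflection overhead comes from, here is a minimal, hypothetical sketch of a reflection-based transformer: on every call it walks the struct's fields with the reflect package and reads the feature tag, work that generated code does once at build time. Only the identity and maxabs tags are handled, and scaler parameters are passed in a map; none of this is the structtransformer package's actual API:

```go
package main

import (
	"fmt"
	"reflect"
)

// Sample uses the same tag convention as the README.
type Sample struct {
	Age    int     `feature:"identity"`
	Kids   int     `feature:"maxabs"`
	Secret float64 // untagged fields are skipped
}

// toFloat converts supported numeric kinds to float64.
func toFloat(v reflect.Value) (float64, bool) {
	switch v.Kind() {
	case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
		return float64(v.Int()), true
	case reflect.Float32, reflect.Float64:
		return v.Float(), true
	}
	return 0, false
}

// transform inspects field tags on every call; maxAbs holds per-field parameters (hypothetical).
func transform(s interface{}, maxAbs map[string]float64) []float64 {
	v := reflect.ValueOf(s)
	if v.Kind() == reflect.Ptr {
		v = v.Elem()
	}
	t := v.Type()
	var out []float64
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		x, ok := toFloat(v.Field(i))
		if !ok {
			continue
		}
		switch f.Tag.Get("feature") {
		case "identity":
			out = append(out, x)
		case "maxabs":
			if m := maxAbs[f.Name]; m != 0 {
				out = append(out, x/m)
			} else {
				out = append(out, 0)
			}
		}
	}
	return out
}

func main() {
	s := &Sample{Age: 22, Kids: 2, Secret: 42}
	fmt.Println(transform(s, map[string]float64{"Kids": 4})) // [22 0.5]
}
```

Every field costs a type lookup, a tag parse, a kind switch and a map lookup on each call, so the overhead grows with the number of fields, in line with the 32-field numbers below.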

Benchmarks:

goos: darwin
goarch: amd64

// reflection
pkg: github.com/nikolaydubina/go-featureprocessing/structtransformer
BenchmarkStructTransformerTransform_32fields-4                           1732573              2079 ns/op             512 B/op          2 allocs/op

// non-reflection
pkg: github.com/nikolaydubina/go-featureprocessing/cmd/generate/tests
BenchmarkWith32FieldsFeatureTransformer_Transform-8                     31678317           116 ns/op         256 B/op          1 allocs/op
BenchmarkWith32FieldsFeatureTransformer_Transform_Inplace-8             80729049            43 ns/op           0 B/op          0 allocs/op
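In both benchmark sets, Transform costs one allocation per call and Transform_Inplace costs none: the former returns a fresh output slice, while the latter writes into a slice the caller provides and can reuse. A small, self-contained sketch that measures this with the standard testing.AllocsPerRun; transform and transformInplace are hypothetical stand-ins for the generated methods:

```go
package main

import (
	"fmt"
	"testing"
)

const width = 32 // hypothetical number of output features

var sink []float64 // package-level sink forces the returned slice onto the heap

// transformInplace fills dst with values derived from x (placeholder logic).
func transformInplace(dst []float64, x float64) {
	for i := range dst {
		dst[i] = x * float64(i)
	}
}

// transform allocates a new output slice on every call.
func transform(x float64) []float64 {
	dst := make([]float64, width)
	transformInplace(dst, x)
	return dst
}

func main() {
	buf := make([]float64, width) // allocated once, reused across calls
	allocating := testing.AllocsPerRun(100, func() { sink = transform(1.5) })
	inplace := testing.AllocsPerRun(100, func() { transformInplace(buf, 1.5) })
	fmt.Println(allocating, inplace) // 1 0
}
```

In a hot loop, reusing one buffer with the in-place variant avoids per-sample garbage, which is where the 116 ns vs 43 ns gap above comes from.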

Profiling

Profiling the benchmarks for the struct with 32 fields shows that the reflection version takes much longer and spends its time in what looks like reflection-related code. Meanwhile, the go:generate version is fast enough that its cost is comparable to that of the testing routines themselves, and it spends 50% of its time allocating the single output slice. That is a good sign, since it means memory access is the bottleneck. Run make profile to generate the profiles. The flame graphs were produced from pprof output with https://www.speedscope.app/.

gencode: [gencode](docs/codegen_transform_cpu_profile.png) [gencode_selected](docs/codegen_transform_cpu_profile_selected.png)

reflect: [reflect](docs/reflect_transform_cpu_profile.png)