Description
Another cross-platform, efficient, practical and pretty CSV/TSV toolkit
Yes, you could just use spreadsheet software like MS Excel to do most of the job.
However, it is all done by clicking and typing, which is not automated and is time-consuming to repeat, especially when we want to apply similar operations to different datasets or for different purposes.
Hope it is helpful to you.
README
csvtk - a cross-platform, efficient and practical CSV/TSV toolkit
- Documents: http://bioinf.shenwei.me/csvtk (Usage and Tutorial). Chinese introduction
- Source code: https://github.com/shenwei356/csvtk
- Latest version:
Introduction
Similar to the FASTA/Q formats in bioinformatics, CSV/TSV are basic and ubiquitous file formats in both bioinformatics and data science.
People usually use spreadsheet software like MS Excel to process table data. However, this is all done by clicking and typing, which is not automated and is time-consuming to repeat, especially when you want to apply similar operations to different datasets or for different purposes.
You can also accomplish some CSV/TSV manipulations using shell commands, but more code is needed to handle the header line. Shell commands do not support selecting columns with column names either.
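To make the point about plain shell commands concrete, here is a minimal sketch of the extra work the header line requires. The sample file, its columns, and the helper names `sort_keep_header` and `select_by_name` are hypothetical and only serve as illustration. The sketch also assumes that fields contain no quoted commas or embedded newlines, which a real CSV parser would have to handle.

```shell
#!/bin/sh
# Hypothetical sample data in a temporary directory (illustrative only).
tmp=$(mktemp -d)
cat > "$tmp/people.csv" <<'EOF'
id,name,age
1,alice,30
2,bob,25
3,carol,41
EOF

# Sort by a numeric column while keeping the header on top.
# The header is split off by hand; otherwise sort would treat it as data.
sort_keep_header() {
    file=$1
    col=$2   # 1-based column index; plain sort cannot take a column name
    head -n 1 "$file"
    tail -n +2 "$file" | sort -t, -k"$col","$col"n
}

# Select a column by name: look up its index in the header line first,
# then print that field for every line (header included).
select_by_name() {
    file=$1
    name=$2
    awk -F, -v name="$name" '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == name) idx = i }
        idx     { print $idx }
    ' "$file"
}

sort_keep_header "$tmp/people.csv" 3
select_by_name "$tmp/people.csv" name
```

Both helpers need to know about the header explicitly, and neither survives quoted fields, which is the kind of bookkeeping a dedicated CSV tool takes over.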
`csvtk` is convenient for rapid data investigation and also easy to integrate into analysis pipelines. It could save you lots of time in (not) writing Python/R scripts.
Table of Contents
<!-- START doctoc generated TOC please keep comment here to allow auto update --> <!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
- Features
- Subcommands
- Installation
- Command-line completion
- Compared to `csvkit`
- Examples
- Acknowledgements
- Contact
- License
- Starchart
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
Features
- Cross-platform (Linux/Windows/Mac OS X/OpenBSD/FreeBSD)
- Lightweight and out-of-the-box, no dependencies, no compilation, no configuration
- Fast, multiple-CPUs supported (some commands)
- Practical functions provided by N subcommands
- Supports STDIN and gzipped input/output files, easy to use in pipes
- Most of the subcommands support unselecting fields and fuzzy fields, e.g. `-f "-id,-name"` for all fields except "id" and "name", `-F -f "a.*"` for all fields with prefix "a.".
- Supports some common plots (see usage)
- Seamless support for data with a meta line (e.g., `sep=,`) declaring the separator, as used by MS Excel
Subcommands
49 subcommands in total.
Information
- `headers`: prints headers
- `dim`: dimensions of CSV file
- `nrow`: print number of records
- `ncol`: print number of columns
- `summary`: summary statistics of selected numeric or text fields (groupby group fields)
- `watch`: online monitoring and histogram of selected field
- `corr`: calculate Pearson correlation between numeric columns
Format conversion
- `pretty`: converts CSV to readable aligned table
- `csv2tab`: converts CSV to tabular format
- `tab2csv`: converts tabular format to CSV
- `space2tab`: converts space delimited format to TSV
- `transpose`: transposes CSV data
- `csv2md`: converts CSV to markdown format
- `csv2rst`: convert CSV to reStructuredText format
- `csv2json`: converts CSV to JSON format
- `csv2xlsx`: convert CSV/TSV files to XLSX file
- `xlsx2csv`: converts XLSX to CSV format
Set operations
- `head`: prints first N records
- `concat`: concatenates CSV/TSV files by rows
- `sample`: sampling by proportion
- `cut`: select and arrange fields
- `grep`: greps data by selected fields with patterns/regular expressions
- `uniq`: unique data without sorting
- `freq`: frequencies of selected fields
- `inter`: intersection of multiple files
- `filter`: filters rows by values of selected fields with arithmetic expression
- `filter2`: filters rows by awk-like arithmetic/string expressions
- `join`: join files by selected fields (inner, left and outer join)
- `split`: splits CSV/TSV into multiple files according to column values
- `splitxlsx`: splits XLSX sheet into multiple sheets according to column values
- `comb`: compute combinations of items at every row
Edit
- `add-header`: add column names
- `del-header`: delete column names
- `rename`: renames column names with new names
- `rename2`: renames column names by regular expression
- `replace`: replaces data of selected fields by regular expression
- `round`: round float to n decimal places
- `mutate`: creates new columns from selected fields by regular expression
- `mutate2`: creates new column from selected fields by awk-like arithmetic/string expressions
- `sep`: separate column into multiple columns
- `gather`: gathers columns into key-value pairs
- `unfold`: unfold multiple values in cells of a field
- `fold`: fold multiple values of a field into cells of groups
- `fmtdate`: format date of selected fields
Ordering
- `sort`: sorts by selected fields
Plotting
Misc
- `cat`: stream file and report progress
- `version`: print version information and check for update
- `genautocomplete`: generate shell autocompletion script (bash|zsh|fish|powershell)
Installation
`csvtk` is implemented in the Go programming language; executable binary files for most popular operating systems are freely available on the release page.
Method 1: Download binaries (latest stable/dev version)
Just download the compressed executable file for your operating system, and decompress it with the `tar -zxvf *.tar.gz` command or other tools. Then:
- For Linux-like systems
    - If you have root privilege, simply copy it to `/usr/local/bin`: `sudo cp csvtk /usr/local/bin/`
    - Or copy it to any directory listed in the environment variable `PATH`: `mkdir -p $HOME/bin/; cp csvtk $HOME/bin/`
- For Windows, just copy `csvtk.exe` to `C:\WINDOWS\system32`.
Method 2: Install via conda (latest stable version)

conda install -c bioconda csvtk
Method 3: Install via homebrew
brew install csvtk
Method 4: For Go developer (latest stable/dev version)
go get -u github.com/shenwei356/csvtk/csvtk
Method 5: For ArchLinux AUR users (may not be the latest)
yaourt -S csvtk
Command-line completion
Bash:
# generate completion shell
csvtk genautocomplete --shell bash
# configure if never did.
# install bash-completion if the "complete" command is not found.
echo "for bcfile in ~/.bash_completion.d/* ; do source \$bcfile; done" >> ~/.bash_completion
echo "source ~/.bash_completion" >> ~/.bashrc
Zsh:
# generate completion shell
csvtk genautocomplete --shell zsh --file ~/.zfunc/_csvtk
# configure if never did
echo 'fpath=( ~/.zfunc "${fpath[@]}" )' >> ~/.zshrc
echo "autoload -U compinit; compinit" >> ~/.zshrc
fish:
csvtk genautocomplete --shell fish --file ~/.config/fish/completions/csvtk.fish
Compared to csvkit
csvkit (attention: this table has not been updated for 2 years)
Features | csvtk | csvkit | Note |
---|---|---|---|
Read Gzip | Yes | Yes | read gzip files |
Fields ranges | Yes | Yes | e.g. -f 1-4,6 |
Unselect fields | Yes | -- | e.g. -1 for excluding first column |
Fuzzy fields | Yes | -- | e.g. ab* for columns with name prefix "ab" |
Reorder fields | Yes | Yes | it means -f 1,2 is different from -f 2,1 |
Rename columns | Yes | -- | rename with new name(s) or from existing names |
Sort by multiple keys | Yes | Yes | bash sort like operations |
Sort by number | Yes | -- | e.g. -k 1:n |
Multiple sort | Yes | -- | e.g. -k 2:r -k 1:nr |
Pretty output | Yes | Yes | convert CSV to readable aligned table |
Unique data | Yes | -- | unique data of selected fields |
Frequency | Yes | -- | frequencies of selected fields |
Sampling | Yes | -- | sampling by proportion |
Mutate fields | Yes | -- | create new columns from selected fields |
Replace | Yes | -- | replace data of selected fields |
Similar tools:
- csvkit - A suite of utilities for converting to and working with CSV, the king of tabular file formats. http://csvkit.rtfd.org/
- xsv - A fast CSV toolkit written in Rust.
- miller - Miller is like sed, awk, cut, join, and sort for name-indexed data such as CSV and tabular JSON http://johnkerl.org/miller
- tsv-utils - Command line utilities for tab-separated value files written in the D programming language.
Examples
Attention
- The CSV parser requires all lines to have the same number of fields/columns. Even lines containing only spaces will cause an error. Use `-I/--ignore-illegal-row` to skip these lines if necessary.
- By default, csvtk assumes your files have a header row; if not, switch flag `-H` on.
- Column names should be unique.
- By default, lines starting with `#` will be ignored; if the header row starts with `#`, please assign flag `-C` another rare symbol, e.g. `'$'`.
- By default, csvtk handles CSV files; use flag `-t` for tab-delimited files.
- If `"` exists in tab-delimited files, use flag `-l`.
- Do not mix field (column) numbers and names.
Examples
Pretty result

    $ csvtk pretty names.csv
    id   first_name   last_name   username
    11   Rob          Pike        rob
    2    Ken          Thompson    ken
    4    Robert       Griesemer   gri
    1    Robert       Thompson    abc
    NA   Robert       Abel        123

Summary of selected numeric fields, supporting "group-by"

    $ cat testdata/digitals2.csv \
        | csvtk summary --ignore-non-digits --fields f4:sum,f5:sum --groups f1,f2 \
        | csvtk pretty
    f1    f2     f4:sum   f5:sum
    bar   xyz    7.00     106.00
    bar   xyz2   4.00     4.00
    foo   bar    6.00     3.00
    foo   bar2   4.50     5.00

- Select fields/columns (`cut`)
    - By index: `csvtk cut -f 1,2`
    - By names: `csvtk cut -f first_name,username`
    - **Unselect**: `csvtk cut -f -1,-2` or `csvtk cut -f -first_name`
    - **Fuzzy fields**: `csvtk cut -F -f "*_name,username"`
    - Field ranges: `csvtk cut -f 2-4` for columns 2,3,4 or `csvtk cut -f -3--1` for discarding columns 1,2,3
    - All fields: `csvtk cut -F -f "*"`
- Search by selected fields (`grep`) (matched parts will be highlighted in red)
    - By exact matching: `csvtk grep -f first_name -p Robert -p Rob`
    - By regular expression: `csvtk grep -f first_name -r -p Rob`
    - By pattern list: `csvtk grep -f first_name -P name_list.txt`
    - Remove rows containing missing data (NA): `csvtk grep -F -f "*" -r -p "^$" -v`
- Rename column names (`rename` and `rename2`)
    - Setting new names: `csvtk rename -f A,B -n a,b` or `csvtk rename -f 1-3 -n a,b,c`
    - Replacing with original names by regular expression: `cat ../testdata/c.csv | ./csvtk rename2 -F -f "*" -p "(.*)" -r 'prefix_$1'` for adding a prefix to all column names.
- Edit data with regular expression (`replace`)
    - Remove Chinese characters: `csvtk replace -F -f "*_name" -p "\p{Han}+" -r ""`
- Create new column from selected fields by regular expression (`mutate`)
    - By default, copy a column: `csvtk mutate -f id`
    - Extract prefix of data as group name (get "A" from "A.1" as group name): `csvtk mutate -f sample -n group -p "^(.+?)\."`
- Sort by multiple keys (`sort`)
    - By single column: `csvtk sort -k 1` or `csvtk sort -k last_name`
    - By multiple columns: `csvtk sort -k 1,2` or `csvtk sort -k 1 -k 2` or `csvtk sort -k last_name,age`
    - Sort by number: `csvtk sort -k 1:n` or `csvtk sort -k 1:nr` for reverse number
    - Complex sort: `csvtk sort -k region -k age:n -k id:nr`
    - In natural order: `csvtk sort -k chr:N`
- Join multiple files by keys (`join`)
    - All files have same key column: `csvtk join -f id file1.csv file2.csv`
    - Files have different key columns: `csvtk join -f "username;username;name" names.csv phone.csv adress.csv -k`
- Filter by numbers (`filter`)
    - Single field: `csvtk filter -f "id>0"`
    - **Multiple fields**: `csvtk filter -f "1-3>0"`
    - Using `--any` to print a record if any of the fields satisfies the condition: `csvtk filter -f "1-3>0" --any`
    - **Fuzzy fields**: `csvtk filter -F -f "A*!=0"`
- Filter rows by awk-like arithmetic/string expressions (`filter2`)
    - Using field index: `csvtk filter2 -f '$3>0'`
    - Using column names: `csvtk filter2 -f '$id > 0'`
    - Both arithmetic and string expressions: `csvtk filter2 -f '$id > 3 || $username=="ken"'`
    - More complicated: `csvtk filter2 -H -t -f '$1 > 2 && $2 % 2 == 0'`
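To try the commands above, it may help to have a small input file at hand. The sketch below recreates the table from the "Pretty result" example as plain CSV (the quoting used in the project's own test file is not shown here and is left out) and then runs a few of the listed commands. The `workdir` variable is a hypothetical name, and the csvtk calls are skipped when the binary is not on `PATH`.

```shell
#!/bin/sh
# Recreate the table from the "Pretty result" example as a CSV file.
workdir=$(mktemp -d)
cat > "$workdir/names.csv" <<'EOF'
id,first_name,last_name,username
11,Rob,Pike,rob
2,Ken,Thompson,ken
4,Robert,Griesemer,gri
1,Robert,Thompson,abc
NA,Robert,Abel,123
EOF

# Run a few of the commands listed above, but only if csvtk is installed.
if command -v csvtk >/dev/null 2>&1; then
    csvtk cut -f first_name,username "$workdir/names.csv"
    csvtk grep -f first_name -p Rob "$workdir/names.csv"
    csvtk sort -k last_name "$workdir/names.csv"
else
    echo "csvtk not found in PATH; see the Installation section" >&2
fi
```

Working in a temporary directory keeps the experiment away from real data; the file can be piped into any of the other subcommands in the same way.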
Plotting
- plot histogram with data of the second column:
csvtk -t plot hist testdata/grouped_data.tsv.gz -f 2 | display
[histogram.png](testdata/figures/histogram.png)
- plot boxplot with data of the "GC Content" (third) column,
group information is the "Group" column.
csvtk -t plot box testdata/grouped_data.tsv.gz -g "Group" \
-f "GC Content" --width 3 | display

- plot horiz boxplot with data of the "Length" (second) column,
group information is the "Group" column.
csvtk -t plot box testdata/grouped_data.tsv.gz -g "Group" -f "Length" \
--height 3 --width 5 --horiz --title "Horiz box plot" | display

- plot line plot with X-Y data
csvtk -t plot line testdata/xy.tsv -x X -y Y -g Group | display

- plot scatter plot with X-Y data
csvtk -t plot line testdata/xy.tsv -x X -y Y -g Group --scatter | display

Acknowledgements
We are grateful to Zhiluo Deng and Li Peng for suggesting features and reporting bugs.
Thanks to Albert Vilella for feature suggestions, which make csvtk feature-rich.
Contact
Create an issue to report bugs, propose new functions or ask for help.
Or leave a comment.
License
Starchart