Popularity: 5.2 (Growing)
Activity: 5.5
Description: unix-way web crawler
Programming language: Go
License: MIT License
crawley alternatives and similar packages
Based on the "Command Line" category.
Alternatively, view crawley alternatives based on common mentions on social networks and blogs.
- Rich Interactive Widgets for Terminal UIs - Terminal UI library with rich, interactive widgets, written in Golang
- tcell - Tcell is an alternate terminal package, similar in some ways to termbox, but better in others.
- survey - DISCONTINUED. A golang library for building interactive and accessible prompts with full support for windows and posix terminals.
- pterm - ✨ #PTerm is a modern Go module to easily beautify console output. Featuring charts, progressbars, tables, trees, text input, select menus and much more 🚀 It's completely configurable and 100% cross-platform compatible.
- cointop - DISCONTINUED. A fast and lightweight interactive terminal based UI application for tracking cryptocurrencies 🚀
- The Platinum Searcher - A code search tool similar to ack and the_silver_searcher(ag). It supports multiple platforms and encodings.
- asciigraph - Go package to make lightweight ASCII line graphs ╭┈╯ in command line apps with no other dependencies.
- CLI Color - 🎨 Terminal color rendering library, supporting 8/16 colors, 256 colors and RGB color output, with Print/Sprintf-style methods, compatible with Windows.
- go-size-analyzer - A tool for analyzing the size of compiled Go binaries, offering cross-platform support, detailed breakdowns, and multiple output formats.
README
crawley
Crawls web pages and prints any link it can find.
features
- fast HTML SAX-parser (powered by golang.org/x/net/html)
- small (below 1500 SLOC), idiomatic, 100% test-covered codebase
- grabs most useful resource URLs (pics, videos, audios, forms, etc.)
- found URLs are streamed to stdout and guaranteed to be unique (with fragments omitted)
- scan depth (limited by starting host and path, 0 by default) can be configured
- can crawl rules and sitemaps from robots.txt
- brute mode - scans HTML comments for URLs (this can lead to bogus results)
- makes use of the HTTP_PROXY / HTTPS_PROXY environment values and handles proxy auth
- directory-only scan mode (aka fast-scan)
- user-defined cookies, in curl-compatible format, i.e. -cookie "ONE=1; TWO=2" -cookie "ITS=ME" -cookie @cookie-file (a combined example follows this list)
- user-defined headers, same as curl: -header "ONE: 1" -header "TWO: 2" -header @headers-file
- tag filter - allows specifying tags to crawl for (single: -tag a -tag form, multiple: -tag a,form, or mixed)
- URL ignore - allows ignoring URLs that contain given substrings (i.e. -ignore logout)
- js parser - extracts API endpoints from js files; this is done by regexp, so results can be messy
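As a rough sketch of how the cookie, header, tag-filter and ignore options above can be combined in a single run (the URL, cookie and header values here are placeholders, not taken from the project):
# placeholder values throughout; adjust to your target
crawley -cookie "SESSION=abc123" -header "X-Scan: test" -tag a,form -ignore logout -depth 2 http://some-test.site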
examples
# print all links from first page:
crawley http://some-test.site
# print all js files and api endpoints:
crawley -depth -1 -tag script -js http://some-test.site
# print all endpoints from js:
crawley -js http://some-test.site/app.js
# download all png images from site:
crawley -depth -1 -tag img http://some-test.site | grep '\.png$' | wget -i -
# fast directory traversal:
crawley -headless -delay 0 -depth -1 -dirs only http://some-test.site
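The proxy support mentioned in the features list has no example of its own; a minimal sketch, assuming a proxy reachable at proxy.local:3128 (a placeholder host and credentials) and using only the documented HTTPS_PROXY variable and -proxy-auth flag, might look like:
# route requests through an authenticated proxy (placeholder host and credentials):
HTTPS_PROXY=http://proxy.local:3128 crawley -proxy-auth "user:password" -depth 1 http://some-test.site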
installation
- binaries for Linux, FreeBSD, macOS and Windows - just download and run.
- archlinux: you can use your favourite AUR helper to install it, e.g. paru -S crawley-bin.
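If you have a Go toolchain, installing from source should also be possible; the module path below is an assumption (the repository path is not stated in this README), so verify it against the project's page before use:
# assumed module path - check the project's repository first
go install github.com/s0rg/crawley/cmd/crawley@latest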
usage
crawley [flags] url
possible flags:
-brute
scan html comments
-cookie value
extra cookies for request, can be used multiple times, accept files with '@'-prefix
-delay duration
per-request delay (0 - disable) (default 150ms)
-depth int
scan depth (-1 - unlimited)
-dirs string
policy for non-resource urls: show / hide / only (default "show")
-header value
extra headers for request, can be used multiple times, accept files with '@'-prefix
-headless
disable pre-flight HEAD requests
-help
show this help (flags and their defaults)
-ignore value
patterns (in urls) to be ignored in crawl process
-js
scan js files for endpoints
-proxy-auth string
credentials for proxy: user:password
-robots string
policy for robots.txt: ignore / crawl / respect (default "ignore")
-silent
suppress info and error messages in stderr
-skip-ssl
skip ssl verification
-tag value
tags filter, single or comma-separated tag names allowed
-user-agent string
user-agent string
-version
show version
-workers int
number of workers (default - number of CPU cores)
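To tie several of the flags above together, here is an illustrative sketch of a quieter, politer full-depth crawl (the target URL is a placeholder and the flag values are examples only, using flags documented above):
# crawl robots.txt rules and sitemaps, slow down requests, limit workers, keep stderr quiet
crawley -robots crawl -delay 500ms -workers 2 -silent -depth -1 http://some-test.site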
Note that all licence references and agreements mentioned in the crawley README section above are relevant to that project's source code only.