`freqcount` is a streaming word-frequency CLI and Python package. It counts words in files,
directories, gzip files, or stdin, then formats the results for humans or downstream tools.

From this folder:

    python -m pip install -e .

Then run:

    freq README.md

During development you can also run without installing:

    $env:PYTHONPATH="src"
    python -m freqcount count README.md

Usage examples:

    freq file.txt
    freq count file.txt -n 50 --format table
    freq count . --recursive --glob "*.md"
    freq rare file.txt -n 20
    freq word file.txt --word python
    freq compare a.txt b.txt
    freq diff a.txt b.txt
    freq merge chapter1.txt chapter2.txt chapter3.txt
    freq ngram file.txt --ngram 2 -n 20
    Get-Content file.txt | freq -
    freq count archive.txt.gz --format json

Tokenizers:

| Name | Behavior |
|---|---|
| `whitespace` | Fast `str.split()` baseline. |
| `regex` | Unicode-aware `\w+` tokens. |
| `smart` | Keeps contractions, URLs, and emails together. |
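
To make the differences between the three tokenizer modes concrete, here is a minimal sketch of how each behavior could be approximated with the standard library. The function names and the `SMART_RE` pattern are illustrative assumptions, not freqcount's actual implementation; the real `smart` tokenizer may handle more cases.

```python
import re

# Hypothetical stand-ins for the three tokenizer modes; not freqcount's own code.
WORD_RE = re.compile(r"\w+")  # \w is Unicode-aware for str patterns in Python 3

SMART_RE = re.compile(
    r"""
    https?://[^\s<>"']+               # URLs kept whole
    | [\w.+-]+@[\w-]+(?:\.[\w-]+)+    # email addresses kept whole
    | \w+(?:['’]\w+)*                 # words, including internal apostrophes
    """,
    re.VERBOSE,
)


def tokenize_whitespace(text: str) -> list[str]:
    """Split on runs of whitespace; punctuation stays attached to words."""
    return text.split()


def tokenize_regex(text: str) -> list[str]:
    """Return runs of word characters; apostrophes and dots split tokens."""
    return WORD_RE.findall(text)


def tokenize_smart(text: str) -> list[str]:
    """Keep contractions, URLs, and emails as single tokens."""
    return SMART_RE.findall(text)


sample = "Don't email me@example.com, see https://example.com/a?b=1 now"
print(tokenize_whitespace(sample))
print(tokenize_regex(sample))
print(tokenize_smart(sample))
```

In this sketch a URL followed directly by punctuation would keep that punctuation, which a production tokenizer would likely trim.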

Output formats:

| Format | Use |
|---|---|
| `plain` | Human-readable terminal output. |
| `table` | Aligned columns with bars. |
| `json` | Machine-readable output for scripts. |
| `csv` | Spreadsheet-friendly rows. |
| `markdown` | Paste-ready Markdown table. |
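
As a rough illustration of what the machine-readable and paste-ready formats involve, the sketch below renders the same `(word, count)` rows as JSON, CSV, and a Markdown table. The function names, field names (`word`, `count`), and column headers are assumptions made for this example; freqcount's real output layout may differ.

```python
import csv
import io
import json

Row = tuple[str, int]


def to_json(rows: list[Row]) -> str:
    """Serialize rows as a JSON array of objects (field names are assumed)."""
    return json.dumps([{"word": word, "count": count} for word, count in rows])


def to_csv(rows: list[Row]) -> str:
    """Serialize rows as CSV with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["word", "count"])
    writer.writerows(rows)
    return buffer.getvalue()


def to_markdown(rows: list[Row]) -> str:
    """Serialize rows as a Markdown pipe table."""
    lines = ["| Word | Count |", "|---|---|"]
    lines += [f"| {word} | {count} |" for word, count in rows]
    return "\n".join(lines)


rows = [("the", 3), ("cat", 1)]
print(to_json(rows))
print(to_csv(rows))
print(to_markdown(rows))
```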

Bundled stopword lists ship for en, es, fr, and de.

    freq count notes.txt --language en
    freq count notes.txt --no-stopwords
    freq count notes.txt --stopwords domain.txt --add-stopwords foo,bar
    freq stopwords add project codex
    freq stopwords remove project
    freq stopwords import custom-stopwords.txt
    freq stopwords list

User stopwords are stored at `~/.freqcount/stopwords/user.txt`.
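
For readers curious how stopword filtering fits into counting, here is a minimal sketch. It assumes a stopword file holds one word per line and that comparison is case-insensitive; neither detail is confirmed above, and `load_stopwords` and `count_without_stopwords` are hypothetical helper names rather than part of the package's API.

```python
from collections import Counter
from collections.abc import Iterable
from pathlib import Path


def load_stopwords(path: Path) -> set[str]:
    """Read one stopword per line (assumed format); a missing file yields an empty set."""
    if not path.exists():
        return set()
    lines = path.read_text(encoding="utf-8").splitlines()
    return {line.strip().casefold() for line in lines if line.strip()}


def count_without_stopwords(tokens: Iterable[str], stopwords: set[str]) -> Counter:
    """Count casefolded tokens, skipping any that appear in the stopword set."""
    folded = (token.casefold() for token in tokens)
    return Counter(token for token in folded if token not in stopwords)


user_file = Path.home() / ".freqcount" / "stopwords" / "user.txt"
stopwords = load_stopwords(user_file) | {"the", "a"}
print(count_without_stopwords("The cat saw a cat".split(), stopwords).most_common())
```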

Optional config lives at `~/.freqcount/config.toml`.

    tokenizer = "smart"
    formatter = "plain"
    language = "en"
    top_n = 20
    chunk_size = 8192
    normalizers = ["casefold"]

CLI flags override config values, and config values override hardcoded defaults.
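
The precedence rule above (CLI flags, then config, then defaults) can be sketched with `collections.ChainMap`, which looks keys up in the first mapping that contains them. The `DEFAULTS` values and the `resolve_settings` helper are assumptions for illustration only; the package's real defaults and resolution code are not shown here.

```python
from collections import ChainMap
from typing import Any

# Hypothetical hardcoded defaults; freqcount's actual values may differ.
DEFAULTS: dict[str, Any] = {
    "tokenizer": "regex",
    "formatter": "plain",
    "language": "en",
    "top_n": 10,
    "chunk_size": 8192,
    "normalizers": ["casefold"],
}


def resolve_settings(cli_flags: dict[str, Any], config: dict[str, Any]) -> dict[str, Any]:
    """Merge settings so CLI flags beat config values, which beat defaults."""
    # Flags left unset on the command line (None) must not mask lower layers.
    given = {key: value for key, value in cli_flags.items() if value is not None}
    return dict(ChainMap(given, config, DEFAULTS))


# Config values as they might look after parsing config.toml.
config = {"tokenizer": "smart", "top_n": 20}
settings = resolve_settings({"top_n": 50, "formatter": None}, config)
print(settings["top_n"], settings["tokenizer"], settings["formatter"])
```

Here `top_n` comes from the CLI flag, `tokenizer` from the config, and `formatter` from the defaults.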

Every counting run records a compact JSONL entry at `~/.freqcount/history.jsonl`.

    freq history --last 10

Development setup and checks:

    python -m pip install -e ".[dev]"
    python -m pytest
    python -m pytest --cov=src/freqcount --cov-report=term-missing
    python -m ruff check .
    python -m mypy src

Tests are organized by module to keep intent clear:

- `tests/test_cli_inprocess.py` - direct CLI handler and helper coverage.
- `tests/test_cli_subprocess_module.py` - end-to-end CLI subprocess checks.
- `tests/test_counter_module.py` - counting engine and token-removal behavior.
- `tests/test_domain_module.py` - `FrequencyTable` and `SortOrder` algebra/formatting.
- `tests/test_filters_tokenizers_cli_module.py` - filters, tokenizers, and CLI branch paths.
- `tests/test_formatters_runtime_module.py` - formatters, ANSI/config, and `__main__`.
- `tests/test_sources_module.py` - sources, decoding, and filesystem traversal.
- `tests/test_stopwords_history_module.py` - stopword repositories and run history persistence.
- `tests/test_properties_module.py` - property-based invariants (when Hypothesis is installed).

Current baseline in this repository:

- `pytest -q`: 33 passing tests.
- `pytest --cov=src/freqcount --cov-report=term-missing`: 100% line coverage.