
PrincetonAfeez/Frequency-Counter

Frequency Counter

freqcount is a streaming word-frequency CLI and Python package. It counts words in text read from files, directories, gzip files, or stdin, then formats the results for humans or downstream tools. The command-line entry point is freq.

Install

From this folder:

python -m pip install -e .

Then run:

freq README.md

During development you can also run without installing (PowerShell syntax shown):

$env:PYTHONPATH="src"
python -m freqcount count README.md

Examples

freq file.txt
freq count file.txt -n 50 --format table
freq count . --recursive --glob "*.md"
freq rare file.txt -n 20
freq word file.txt --word python
freq compare a.txt b.txt
freq diff a.txt b.txt
freq merge chapter1.txt chapter2.txt chapter3.txt
freq ngram file.txt --ngram 2 -n 20
Get-Content file.txt | freq -
freq count archive.txt.gz --format json
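
The gzip and stdin examples above depend on reading input incrementally. The following is a minimal sketch of chunked counting, not the freqcount implementation: iter_words and count_stream are hypothetical names, and the whitespace splitting and casefolding are assumptions chosen to keep the example short.

```python
import io
from collections import Counter
from typing import IO, Iterator


def iter_words(stream: IO[str], chunk_size: int = 8192) -> Iterator[str]:
    """Yield whitespace-separated words without loading the whole stream."""
    tail = ""  # partial word carried over from the previous chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        text = tail + chunk
        words = text.split()
        # If the text ends mid-word, hold the last piece back for the next read.
        if words and not text[-1].isspace():
            tail = words.pop()
        else:
            tail = ""
        yield from words
    if tail:
        yield tail


def count_stream(stream: IO[str], chunk_size: int = 8192) -> Counter:
    """Count casefolded words from any text stream (hypothetical helper)."""
    return Counter(word.casefold() for word in iter_words(stream, chunk_size))


if __name__ == "__main__":
    # A tiny chunk size forces words to straddle chunk boundaries.
    print(count_stream(io.StringIO("the cat The dog"), chunk_size=3))
```

The same function accepts sys.stdin or the handle returned by gzip.open(path, "rt", encoding="utf-8"), since both behave as text streams.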

Tokenizers

Name        Behavior
whitespace  Fast str.split() baseline.
regex       Unicode-aware \w+ tokens.
smart       Keeps contractions, URLs, and emails together.
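
To make the differences between the three tokenizers concrete, here is a rough sketch of how each behavior could be approximated. The function names and both regular expressions are assumptions for illustration and are not taken from the freqcount source; the smart pattern in particular is deliberately simple and will keep trailing punctuation on URLs.

```python
import re

# Assumed patterns, written for this example only.
WORD_RE = re.compile(r"\w+")  # \w is Unicode-aware for str patterns in Python 3
SMART_RE = re.compile(
    r"https?://\S+"                    # URLs, up to the next whitespace
    r"|[\w.+-]+@[\w-]+(?:\.[\w-]+)+"   # email addresses
    r"|\w+(?:['’]\w+)*"                # words, keeping inner apostrophes
)


def tokenize_whitespace(text: str) -> list:
    """Split on runs of whitespace; punctuation stays attached."""
    return text.split()


def tokenize_regex(text: str) -> list:
    """Return every run of word characters."""
    return WORD_RE.findall(text)


def tokenize_smart(text: str) -> list:
    """Keep contractions, URLs, and emails as single tokens."""
    return SMART_RE.findall(text)


if __name__ == "__main__":
    sample = "Don't email me@example.com or visit https://example.com/docs today."
    for fn in (tokenize_whitespace, tokenize_regex, tokenize_smart):
        print(fn.__name__, fn(sample))
```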

Output Formats

Format    Use
plain     Human-readable terminal output.
table     Aligned columns with bars.
json      Machine-readable output for scripts.
csv       Spreadsheet-friendly rows.
markdown  Paste-ready Markdown table.
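
As an illustration of what a few of these formats involve, the sketch below renders a Counter as JSON, CSV, and a bar table. The function names, the "word" and "count" field names, and the bar style are all assumptions; the actual freqcount output layout may differ.

```python
import csv
import io
import json
from collections import Counter


def format_json(counts: Counter, n: int = 20) -> str:
    """Serialize the top n entries as a JSON array (assumed schema)."""
    rows = [{"word": w, "count": c} for w, c in counts.most_common(n)]
    return json.dumps(rows, ensure_ascii=False)


def format_csv(counts: Counter, n: int = 20) -> str:
    """Write a header row followed by one row per word."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["word", "count"])
    writer.writerows(counts.most_common(n))
    return buf.getvalue()


def format_table(counts: Counter, n: int = 20, width: int = 20) -> str:
    """Align columns and draw a bar scaled to the most frequent word."""
    rows = counts.most_common(n)
    if not rows:
        return ""
    longest = max(len(word) for word, _ in rows)
    peak = rows[0][1]
    lines = []
    for word, count in rows:
        bar = "#" * max(1, round(width * count / peak))
        lines.append(f"{word.ljust(longest)}  {count:>6}  {bar}")
    return "\n".join(lines)


if __name__ == "__main__":
    demo = Counter({"the": 4, "cat": 2, "dog": 1})
    print(format_table(demo))
    print(format_csv(demo), end="")
    print(format_json(demo))
```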

Stopwords

Bundled stopword lists ship for en, es, fr, and de.

freq count notes.txt --language en
freq count notes.txt --no-stopwords
freq count notes.txt --stopwords domain.txt --add-stopwords foo,bar
freq stopwords add project codex
freq stopwords remove project
freq stopwords import custom-stopwords.txt
freq stopwords list

User stopwords are stored at ~/.freqcount/stopwords/user.txt.
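
Stopword filtering can be sketched as follows. This assumes the stopword file holds one word per line and that lines starting with # are comments; neither detail is confirmed by this README, and load_stopwords and count_without_stopwords are hypothetical helper names.

```python
from collections import Counter
from pathlib import Path
from typing import Iterable, Set


def load_stopwords(path: Path) -> Set[str]:
    """Read one stopword per line, skipping blanks and # comments (assumed format)."""
    if not path.exists():
        return set()
    words = set()
    for line in path.read_text(encoding="utf-8").splitlines():
        word = line.strip()
        if word and not word.startswith("#"):
            words.add(word.casefold())
    return words


def count_without_stopwords(tokens: Iterable[str], stopwords: Set[str]) -> Counter:
    """Casefold each token and count only those not in the stopword set."""
    folded = (token.casefold() for token in tokens)
    return Counter(token for token in folded if token not in stopwords)


if __name__ == "__main__":
    user_file = Path.home() / ".freqcount" / "stopwords" / "user.txt"
    stop = load_stopwords(user_file) | {"the", "a"}
    print(count_without_stopwords("The cat saw a cat".split(), stop))
```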

Config

Optional config lives at ~/.freqcount/config.toml.

tokenizer = "smart"
formatter = "plain"
language = "en"
top_n = 20
chunk_size = 8192
normalizers = ["casefold"]

CLI flags override config values, and config values override hardcoded defaults.
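
The precedence rule above maps naturally onto a layered lookup. The sketch below uses collections.ChainMap to express it; resolve_settings and the DEFAULTS values are hypothetical and only illustrate the ordering, not the tool's real defaults.

```python
from collections import ChainMap
from typing import Any, Dict, Optional

# Hypothetical defaults for illustration only.
DEFAULTS: Dict[str, Any] = {"tokenizer": "regex", "formatter": "plain", "top_n": 20}


def resolve_settings(
    cli_flags: Dict[str, Any],
    config_file: Dict[str, Any],
    defaults: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Merge settings so that CLI flags beat config values, which beat defaults."""
    base = DEFAULTS if defaults is None else defaults
    # Drop flags the user did not pass so they do not mask lower layers.
    passed = {key: value for key, value in cli_flags.items() if value is not None}
    # ChainMap returns the value from the first mapping that has the key.
    return dict(ChainMap(passed, config_file, base))


if __name__ == "__main__":
    flags = {"top_n": 50, "formatter": None}
    config = {"formatter": "table", "top_n": 10}
    print(resolve_settings(flags, config))
```

On Python 3.11 or later, the config dictionary could be produced by the standard-library tomllib module from a file opened in binary mode.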

History

Every counting run records a compact JSONL entry at ~/.freqcount/history.jsonl.

freq history --last 10
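
A JSONL history file is append-only, with one JSON object per line, so recording a run and reading the most recent entries are both cheap. The sketch below shows one way to do this; append_entry, last_entries, and the entry fields are hypothetical, since this README does not document the entry schema.

```python
import json
from collections import deque
from pathlib import Path
from typing import Any, Dict, List


def append_entry(path: Path, entry: Dict[str, Any]) -> None:
    """Append one run record as a single JSON line."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry, ensure_ascii=False) + "\n")


def last_entries(path: Path, n: int) -> List[Dict[str, Any]]:
    """Return the last n records, oldest first, keeping only n lines in memory."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as handle:
        tail = deque((line for line in handle if line.strip()), maxlen=n)
    return [json.loads(line) for line in tail]


if __name__ == "__main__":
    demo_path = Path("demo-history.jsonl")  # stand-in for ~/.freqcount/history.jsonl
    append_entry(demo_path, {"source": "README.md", "total_tokens": 412})  # assumed fields
    print(last_entries(demo_path, 10))
```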

Development

python -m pip install -e ".[dev]"
python -m pytest
python -m pytest --cov=src/freqcount --cov-report=term-missing
python -m ruff check .
python -m mypy src

Test Suite

Tests are organized by module to keep intent clear:

  • tests/test_cli_inprocess.py - direct CLI handler and helper coverage.
  • tests/test_cli_subprocess_module.py - end-to-end CLI subprocess checks.
  • tests/test_counter_module.py - counting engine and token-removal behavior.
  • tests/test_domain_module.py - FrequencyTable and SortOrder algebra/formatting.
  • tests/test_filters_tokenizers_cli_module.py - filters, tokenizers, and CLI branch paths.
  • tests/test_formatters_runtime_module.py - formatters, ANSI/config, and __main__.
  • tests/test_sources_module.py - sources, decoding, and filesystem traversal.
  • tests/test_stopwords_history_module.py - stopword repositories and run history persistence.
  • tests/test_properties_module.py - property-based invariants (when Hypothesis is installed).

Current baseline in this repository:

  • pytest -q: 33 passing tests.
  • pytest --cov=src/freqcount --cov-report=term-missing: 100% line coverage.
