
Make whisper an optional extra with faster-whisper by default#1877

Open
Dreamsorcerer wants to merge 6 commits into dev from sam/move-whisper

Conversation

Collaborator

@Dreamsorcerer Dreamsorcerer commented Apr 17, 2026

Problem

Whisper requires downloading a 150MB model and depends on torch (with GBs of CUDA downloads).

Solution

Provide faster-whisper by default (a ~2MB package) and use it as the fallback backend when openai-whisper is not installed.
This avoids the 150MB model download and moves us one step closer to not depending on torch for a base install.
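The backend selection this implies can be sketched as a preference-ordered probe. The helper below is illustrative, not the PR's actual code: it takes the set of importable modules as an argument so the decision logic is visible in isolation (in the real module this would be a `try`/`except ImportError` around the imports themselves).

```python
def pick_backend(available: set[str]) -> str:
    """Choose a transcription backend, preferring openai-whisper when present.

    `available` stands in for an import probe; names are illustrative.
    """
    if "whisper" in available:
        return "openai-whisper"
    if "faster_whisper" in available:
        return "faster-whisper"
    raise ImportError(
        "No whisper backend found; install dimos[agents] or dimos[whisper]"
    )
```

With both packages installed, openai-whisper wins; a default install resolves to faster-whisper.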

Breaking Changes

Users must now install dimos[whisper] to get the full openai-whisper backend.

Test

python -c "
from dimos.stream.audio.pipelines import stt
node = stt()
node.emit_text().subscribe(on_next=lambda t: print(f'USER: {t}'))
from dimos.stream.audio.utils import keepalive
keepalive()
"

@Dreamsorcerer Dreamsorcerer marked this pull request as ready for review April 17, 2026 15:54
@Dreamsorcerer
Collaborator Author

STT seems to work pretty well with faster-whisper anyway.

@greptile-apps
Contributor

greptile-apps bot commented Apr 17, 2026

Greptile Summary

This PR makes openai-whisper an optional extra (dimos[whisper]) and adds faster-whisper as the default audio transcription backend in dimos[agents], significantly reducing the default install footprint. The WhisperNode class now auto-detects which backend is available at import time, preferring openai-whisper if present and falling back to faster-whisper otherwise.

  • The UserWarning on lines 36–40 fires for every default install (faster-whisper is in agents), misleading users into thinking their setup is degraded when it is the intended configuration.
  • faster-whisper in pyproject.toml has no lower version bound, but device="auto" requires >=1.0.0, which can cause a TypeError at runtime on older installs.

Confidence Score: 4/5

Safe to merge after fixing the misleading UserWarning that fires for all default users.

One P1 finding: the UserWarning implies a degraded fallback state for every default install, which will confuse users. The missing version constraint on faster-whisper (P2) could also cause a runtime TypeError with older versions. Both are straightforward one-line fixes.

dimos/stream/audio/stt/node_whisper.py (misleading warning), pyproject.toml (missing version constraint)

Important Files Changed

Filename Overview
dimos/stream/audio/stt/node_whisper.py Adds faster-whisper fallback when openai-whisper is absent; misleading UserWarning fires for all default installs, and caller's modelopts dict is mutated via pop().
pyproject.toml Moves openai-whisper to a new optional [whisper] extra and adds faster-whisper to [agents]; no version constraint on faster-whisper despite using device="auto" (requires >=1.0.0).
uv.lock Lockfile updated to reflect new faster-whisper dependency; no manual review needed.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[import WhisperNode] --> B{try: import whisper}
    B -- success --> C[_USE_FASTER_WHISPER = False\nopenai-whisper backend]
    B -- ImportError --> D{try: from faster_whisper\nimport WhisperModel}
    D -- success --> E[UserWarning fired\n_USE_FASTER_WHISPER = True\nfaster-whisper backend]
    D -- ImportError --> F[Raise ImportError\nNo backend found]

    C --> G[WhisperNode.__init__]
    E --> G
    G --> H{_USE_FASTER_WHISPER?}
    H -- True --> I[pop fp16 → compute_type\nWhisperModel device=auto]
    H -- False --> J[whisper.load_model]

    I --> K[transcribe → segments iterator\njoin seg.text]
    J --> L[transcribe → dict\nresult text]
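The two return shapes at the bottom of the flowchart can be normalized behind one helper. The API shapes below follow the review's description (faster-whisper yields a segments iterator, openai-whisper returns a dict with a "text" key), but the function itself is an illustrative sketch, not the PR's code:

```python
def extract_text(model, audio, use_faster_whisper: bool) -> str:
    """Normalize both backends' transcribe() return shapes to a plain string."""
    if use_faster_whisper:
        # faster-whisper returns (segments_iterator, info)
        segments, _info = model.transcribe(audio)
        return "".join(seg.text for seg in segments)
    # openai-whisper returns a dict containing the full transcript under "text"
    return model.transcribe(audio)["text"]
```

Because the model is duck-typed here, the same call site serves either backend once the flag is set at import time.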

Reviews (1): Last reviewed commit: "Add warning"

Comment on lines +36 to +41
warnings.warn(
    "openai-whisper not installed, falling back to faster-whisper. "
    "Install openai-whisper for the full backend: pip install dimos[whisper]",
    stacklevel=2,
)
_USE_FASTER_WHISPER = True

P1 Warning fires for all default users

faster-whisper is now bundled in the agents extra, meaning it is the intended default backend. Every user who installs the standard package will have only faster-whisper available, so this UserWarning fires for 100% of default users on every first import. The message "falling back to faster-whisper" implies a degraded or misconfigured setup, which will confuse users who have done nothing wrong.

Consider removing the warning entirely (faster-whisper is the intended default, not a fallback), or downgrade it to logging.debug, or reword it so it only surfaces if the user somehow expected openai-whisper:

Suggested change
warnings.warn(
    "openai-whisper not installed, falling back to faster-whisper. "
    "Install openai-whisper for the full backend: pip install dimos[whisper]",
    stacklevel=2,
)
_USE_FASTER_WHISPER = True
warnings.warn(
    "openai-whisper is not installed; using faster-whisper. "
    "Install openai-whisper via `pip install dimos[whisper]` to use it instead.",
    stacklevel=2,
)
_USE_FASTER_WHISPER = True

Or simply omit the warning since this is the documented default path.
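The logging.debug alternative mentioned above could look like the sketch below. The logger name and helper are hypothetical; the point is that the default install path is recorded without alarming the user:

```python
import logging

# Hypothetical logger name; the real module would use logging.getLogger(__name__)
logger = logging.getLogger("dimos.stream.audio.stt.node_whisper")

def note_backend(using_faster: bool) -> str:
    """Record the chosen backend at DEBUG level instead of raising a UserWarning.

    The default install only ships faster-whisper, so a warning would fire for
    every user; a debug log keeps the information without implying a problem.
    """
    backend = "faster-whisper" if using_faster else "openai-whisper"
    logger.debug("using %s transcription backend", backend)
    return backend
```

Users who want the detail can enable DEBUG logging; everyone else sees nothing.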

Comment on lines +67 to +71
if _USE_FASTER_WHISPER:
    fp16 = modelopts.pop("fp16", False)
    compute_type = "float16" if fp16 else "int8"
    self.modelopts = modelopts
    self.model = WhisperModel(model, device="auto", compute_type=compute_type)

P2 Caller's dict mutated by .pop()

modelopts.pop("fp16", False) mutates the dict that was passed in. If the caller stored a reference to their options dict and inspects it after construction, the fp16 key will be silently gone. A shallow copy before popping avoids this:

Suggested change
if _USE_FASTER_WHISPER:
    fp16 = modelopts.pop("fp16", False)
    compute_type = "float16" if fp16 else "int8"
    self.modelopts = modelopts
    self.model = WhisperModel(model, device="auto", compute_type=compute_type)
if _USE_FASTER_WHISPER:
    opts = dict(modelopts)
    fp16 = opts.pop("fp16", False)
    compute_type = "float16" if fp16 else "int8"
    self.modelopts = opts
    self.model = WhisperModel(model, device="auto", compute_type=compute_type)
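The copy-then-pop pattern can be shown in isolation. The helper below is a minimal stand-in for the node's option handling (the function name is made up for illustration); it demonstrates that the caller's dict keeps its `fp16` key:

```python
def split_compute_opts(modelopts: dict) -> tuple[str, dict]:
    """Derive compute_type from fp16 without mutating the caller's dict."""
    opts = dict(modelopts)  # shallow copy, so the caller's dict is untouched
    fp16 = opts.pop("fp16", False)
    compute_type = "float16" if fp16 else "int8"
    return compute_type, opts

caller_opts = {"fp16": True, "language": "en"}
compute_type, remaining = split_compute_opts(caller_opts)
# caller_opts still contains "fp16"; only the copy had it popped
```

Without the `dict(modelopts)` copy, `caller_opts` would silently lose its `fp16` key after construction.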

Comment thread pyproject.toml Outdated
"openai",
"openai-whisper",
"sounddevice",
"faster-whisper",

P2 No version constraint on faster-whisper

faster-whisper is listed without a minimum version. The device="auto" argument passed to WhisperModel (line 71 of node_whisper.py) was introduced in faster-whisper>=1.0.0. Without a lower bound, users on older versions will get a runtime TypeError. Adding a floor constraint prevents this:

Suggested change
"faster-whisper",
"faster-whisper>=1.0.0",

Comment thread pyproject.toml
]

whisper = [
"dimos[agents]",
Contributor

Why include dimos[agents] here?

Collaborator Author

Does it make sense to install without agents? I was just assuming that we'd want all the other dependencies here.
