feat: add tool calling support to m serve #850

markstur wants to merge 11 commits into generative-computing:main
Conversation
The PR description has been updated. Please fill out the template for your PR to be reviewed.

@markstur Do you want review comments yet or still WIP?

Comments would be great! It is a draft because I need to do more review/testing myself on the generated code. I don't want to waste your time, but early comments would be very welcome.
psschwei
left a comment
Code Review: feat: add tool calling support to m serve
Good feature PR — the core plumbing is correct and the OpenAI-compatible response format looks right. A couple of bugs to fix before merge, plus some improvements.
Summary
The implementation correctly wires tool calling through the serve endpoint: tools maps to ModelOption.TOOLS, tool_choice passes through as-is, and the response extracts tool calls from ModelOutputThunk into the OpenAI format. The Pydantic models mirror the OpenAI types well, and tests cover the main paths.
Two bugs need fixing (see inline comments):

- Empty `tool_calls` dict produces incorrect `finish_reason: "tool_calls"` with an empty array
- Client example's multi-turn loop duplicates the assistant message for each tool call

Other improvements (see inline comments):

- Unused loop variable `tool_name`
- `eval()` in example code with `# noqa` suppressing the security lint for copy-pasters
- Missing test for the empty dict edge case
- `hasattr` check is always true for `ModelOutputThunk` — defensive but masks upstream bugs
What's working well

- Pydantic models (`ToolCallFunction`, `ChatCompletionMessageToolCall`) closely match OpenAI types
- `_build_model_options` change is clean — `tools` removed from exclusion set, mapped to `ModelOption.TOOLS`
- 8 well-structured tests covering single/multiple tool calls, finish reasons, model_options passthrough, complex args, usage info, and backward compat
- Existing test updated consistently from "excluded" to "passed"
planetf1
left a comment
Two additional items not yet covered in existing review comments.
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
…y dict
Fixed the bug where an empty tool_calls dict ({}) incorrectly produced finish_reason="tool_calls" with an empty array instead of finish_reason="stop" with tool_calls=None.
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
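The empty-dict fix described above can be sketched in isolation (a minimal standalone sketch; `finish_info` and the dict shape are illustrative, not the PR's actual helper):

```python
# Minimal sketch of the corrected finish_reason logic (illustrative names):
# an empty tool_calls dict must mean "no tools were called".
def finish_info(raw_tool_calls):
    if isinstance(raw_tool_calls, dict) and raw_tool_calls:
        return "tool_calls", list(raw_tool_calls.values())
    return "stop", None  # empty dict or None: plain completion

print(finish_info({}))  # ('stop', None)
```

The key detail is that a non-empty check accompanies the `isinstance` check, so `{}` falls through to the plain-completion branch.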
…xample

Issue: The assistant message was being added inside the loop for each tool call, causing duplication when multiple tool calls were present.

Fix: Moved the assistant message append outside the loop (before processing tool calls), so it's only added once. Now the loop only adds tool responses.

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
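The corrected loop structure can be sketched like this (message shapes assumed, not the example's exact code): the assistant message is appended once, before the loop, and the loop adds only tool responses.

```python
# Sketch of the fixed multi-turn client loop (hypothetical helper):
def apply_tool_calls(messages, assistant_msg, tool_calls, run_tool):
    messages.append(assistant_msg)  # once, outside the loop
    for call in tool_calls:
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": run_tool(call),
        })
    return messages

msgs = apply_tool_calls(
    [],
    {"role": "assistant", "tool_calls": ["..."]},
    [{"id": "call_1"}, {"id": "call_2"}],
    lambda call: "tool result",
)
print(len(msgs))  # 3: one assistant message, two tool responses
```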
The dict key tool_name is never used — the function name comes from model_tool_call.name. Using .values() instead.

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Replaced hasattr() with direct __dict__ membership tests to correctly distinguish:

1. Typed instances (ModelOutputThunk[float](...)) - have __orig_class__ in their instance dict
2. Untyped instances (ModelOutputThunk(...)) - do NOT have __orig_class__ in their instance dict

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
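The distinction can be reproduced with any generic class — Python records `__orig_class__` on the instance only when the class is called with a type parameter. A sketch, with `Thunk` standing in for `ModelOutputThunk`:

```python
from typing import Generic, TypeVar

T = TypeVar("T")

class Thunk(Generic[T]):  # stand-in for ModelOutputThunk
    def __init__(self, value: T) -> None:
        self.value = value

typed = Thunk[float](1.0)   # parameterized call records __orig_class__
untyped = Thunk(1.0)        # bare call does not

print("__orig_class__" in typed.__dict__)    # True
print("__orig_class__" in untyped.__dict__)  # False
```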
Security issue resolved in `m_serve_example_tool_calling.py`:

**Changes made:**

- Replaced `CalculatorTool` (which used unsafe `eval()` with `# noqa: S307`) with `GetStockPriceTool`
- New tool demonstrates API-calling pattern with mock stock prices (AAPL, GOOGL, MSFT, TSLA)
- Updated all references: `calculator_tool` → `stock_price_tool`
- Maintains the same tool calling demonstration with two tools (weather + stock price)

**Why this is better:**

- Eliminates security risk entirely (no `eval()` or suppressed lints)
- Still demonstrates multiple tools effectively
- Uses safe, realistic API-calling pattern that users can copy
- No dangerous code that could be copy-pasted into production

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
The pass-through behavior was not clear enough, so this adds it to ModelOptions, where important options are known. Most of these are sentinels which are removed, but this one will be like TEMPERATURE, which is passed through to the backends. No behavior change, but it gives a handy constant and a place to look for these. This does not address all the other possible pass-through args.

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
- switch server example to OpenAIBackend
- align tool-calling example with tested Granite model setup
- narrow advertised tools when `tool_choice` selects a specific function
- enable `tool_calls=True` in the serve path
- replace calculator example with stock-price tool
- examples 1/2 as tool-call-only demos
- example 4 as the full tool execution round-trip
- improve client diagnostics for empty/no-tool responses

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Assisted-by: IBM Bob
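Narrowing advertised tools by `tool_choice` can be sketched as follows (illustrative helper, not the PR's exact code; the `tool_choice` dict shape follows the OpenAI format):

```python
# Sketch: when tool_choice names a specific function, advertise only it;
# string values like "auto"/"none" (or no choice) leave tools untouched.
def narrow_tools(tools, tool_choice):
    if isinstance(tool_choice, dict):
        wanted = tool_choice.get("function", {}).get("name")
        return [t for t in tools if t["function"]["name"] == wanted]
    return tools

tools = [{"function": {"name": "get_weather"}},
         {"function": {"name": "get_stock_price"}}]
choice = {"type": "function", "function": {"name": "get_stock_price"}}
print(narrow_tools(tools, choice))  # only get_stock_price remains
```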
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Fixed all these.
planetf1
left a comment
Most issues fine - I just have one concern -- the finish reason I don't think is what you intended for streaming?
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com> Assisted-by: IBM Bob
ChatCompletionChunkChoice(
    index=0,
    delta=ChatCompletionChunkDelta(tool_calls=tool_calls),
    finish_reason=None,
delta.tool_calls is missing the required index field — breaks compatibility with real OpenAI SDK clients.
The OpenAI streaming spec requires each item in delta.tool_calls to carry index: int. Clients including the openai Python SDK, LangChain, and LiteLLM key their delta-reassembly state machine on this field — without it they silently drop tool calls, coalesce them incorrectly, or raise a TypeError depending on version.
The root cause is that ChatCompletionChunkDelta.tool_calls (models.py:173) reuses ChatCompletionMessageToolCall — a complete (non-streaming) model that has no index field. The streaming delta shape needs a distinct model where index is required and id, type, and function are optional (since args can arrive across multiple chunks).
The bundled client_streaming_tool_calling.py masks this because it reads delta.tool_calls verbatim rather than going through SDK delta reassembly — so the example passes but any spec-compliant consumer breaks.
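One possible shape for the streaming-specific delta model, mirroring the openai SDK's `ChoiceDeltaToolCall` (shown here as plain dataclasses for illustration; the real fix would be Pydantic models in models.py, and all names are assumptions):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ChoiceDeltaToolCallFunction:
    name: Optional[str] = None
    arguments: Optional[str] = None  # may arrive in fragments across chunks

@dataclass
class ChoiceDeltaToolCall:
    index: int                       # required: clients key reassembly on it
    id: Optional[str] = None         # present only on a call's first chunk
    type: Optional[str] = None
    function: Optional[ChoiceDeltaToolCallFunction] = None
```

The point is the asymmetry with the non-streaming model: `index` is the only required field, and everything else is optional because it can be spread across chunks.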
# Extract tool calls from the ModelOutputThunk if available
tool_calls_list = build_tool_calls(output)
tool_calls = (
build_tool_calls is called here unconditionally, then discarded on the streaming branch.
When request.stream is True (line 212), execution returns via StreamingResponse before tool_calls is ever used — and streaming.py:108 calls build_tool_calls(output) again, generating entirely different call_<uuid> IDs. The IDs minted here are never sent to any client.
This is harmless today, but any logging or tracing that captures tool_calls_list from this path will record IDs that never appear in the actual response, making debugging painful. Consider moving this block into the non-streaming branch (after the if request.stream: guard), or computing once and passing the result into stream_chat_completion_chunks.
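To see why the two invocations can never agree, note that any helper minting per-call UUIDs produces fresh IDs each time it runs (a standalone sketch; `mint_tool_call_ids` is hypothetical, standing in for the ID-minting part of `build_tool_calls`):

```python
import uuid

# Each invocation mints fresh ids, so ids computed on the non-streaming
# path never match the ones the streaming path actually sends.
def mint_tool_call_ids(n):
    return [f"call_{uuid.uuid4().hex[:24]}" for _ in range(n)]

first = mint_tool_call_ids(2)
second = mint_tool_call_ids(2)
print(first == second)  # False: two invocations, two sets of ids
```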
class TestToolCalling:
    """Tests for tool calling functionality."""
The streaming tool-calling path in streaming.py (lines 107–145) has no test coverage here.
All 8 tests in this class call make_chat_endpoint directly with non-streaming requests. The new tool-call chunk emission, finish_reason logic, and build_tool_calls call in streaming.py are entirely untested — which is also how the missing index field (see other comment) shipped without being caught.
A streaming test using FastAPI's TestClient would catch chunk-shape regressions. For example:
@pytest.mark.asyncio
async def test_tool_calls_streaming(self, mock_module, sample_tool_request):
    sample_tool_request.stream = True
    # ... assert chunks contain delta.tool_calls with index field

"description": "Get the current weather in a given location",
"parameters": {
    "RootModel": {
        "type": "object",
This "RootModel" key is non-standard and will cause any real OpenAI-compatible client to fail.
The standard OpenAI tool format sends parameters as a bare JSON Schema object:
"parameters": {
    "type": "object",
    "properties": { ... },
    "required": [...]
}

This server requires the non-standard `{"parameters": {"RootModel": {...}}}` envelope because `FunctionParameters` in cli/serve/models.py defines `RootModel` as a literal field name rather than accepting the schema directly. Any client using the OpenAI SDK, LangChain, or the standard tool format will get a Pydantic validation error.
This is a pre-existing issue in models.py, but this PR introduces the first public examples that document and normalise the non-standard shape. Worth opening a tracked issue to fix FunctionParameters to accept a bare dict[str, Any] before these examples become the established pattern.
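If backward compatibility with the shipped examples matters, one option for the tracked fix is a small normalisation shim (hypothetical, not part of the PR): accept the standard bare JSON Schema while tolerating the legacy envelope during a deprecation window.

```python
from typing import Any

# Sketch of a migration shim (illustrative name): unwrap the legacy
# {"RootModel": {...}} envelope, pass bare JSON Schema objects through.
def normalize_parameters(params: dict[str, Any]) -> dict[str, Any]:
    if set(params) == {"RootModel"}:
        return params["RootModel"]  # unwrap legacy envelope
    return params  # already a bare JSON Schema object

std = {"type": "object", "properties": {}, "required": []}
print(normalize_parameters(std) == std)                 # True
print(normalize_parameters({"RootModel": std}) == std)  # True
```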
# Serialize the arguments to JSON string
args_json = json.dumps(model_tool_call.args)
json.dumps with no fallback encoder will raise TypeError if model_tool_call.args contains a non-JSON-serialisable value — datetime, Decimal, Path, numpy scalar, etc.
Tool args come from model output and backend parsing, so there is no contract guaranteeing they are pure-JSON. This TypeError bubbles up through the generic except Exception handler in app.py and returns an opaque 500 to the client.
A simple guard:
args_json = json.dumps(model_tool_call.args, default=str)

Or catch TypeError here and raise a more descriptive error so the 500 message at least identifies the offending tool.
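A quick self-contained demonstration of the `default=str` guard (sample `args` values are made up for illustration):

```python
import json
from datetime import datetime
from decimal import Decimal

# Tool args parsed from model output are not guaranteed pure-JSON;
# default=str degrades unknown types to strings instead of raising
# TypeError mid-response.
args = {"when": datetime(2024, 1, 2, 3, 4, 5), "price": Decimal("9.99")}
args_json = json.dumps(args, default=str)
print(args_json)  # {"when": "2024-01-02 03:04:05", "price": "9.99"}
```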
Misc PR
Type of PR
Description
Successfully added tool calling support to the `m serve` CLI with proper type annotations. Here's what was implemented:

Changes Made

1. Updated Models (`cli/serve/models.py`)
   - `ToolCallFunction` model for function details in tool calls
   - `ChatCompletionMessageToolCall` model for tool call structure
   - `ChatCompletionMessage` updated to include optional `tool_calls` field
   - `Choice.finish_reason` updated to support `"tool_calls"` value

2. Modified Server Logic (`cli/serve/app.py`)
   - `json` and `Literal` imports for proper typing
   - `_build_model_options()` updated to pass through `tools` (mapped to `ModelOption.TOOLS`) and `tool_choice` parameters
   - `make_chat_endpoint()` updated to extract tool calls from `ModelOutputThunk.tool_calls` with proper type checking (`isinstance(dict)`), generate unique IDs in format `call_<24-char-hex>`, and set `finish_reason` with proper `Literal` type annotation

3. Comprehensive Tests (`test/cli/test_serve_tool_calling.py`)

4. Updated Existing Test (`test/cli/test_serve.py`)
   - Renamed `test_tool_params_excluded_from_model_options` to `test_tool_params_passed_to_model_options`

5. Example Code
   - `docs/examples/m_serve/m_serve_example_tool_calling.py`: Complete server example with `GetWeatherTool` and `GetStockPriceTool` implementations
   - `docs/examples/m_serve/client_tool_calling.py`: Client demonstrating how to call the tool-enabled server with various scenarios

Key Features
✅ OpenAI-Compatible: Follows OpenAI's tool calling API format
✅ Type-Safe: Proper `Literal` type annotations for `finish_reason`
✅ Robust Type Checking: Uses `isinstance(dict)` to avoid Mock object issues
✅ Automatic Tool Call Detection: Extracts tool calls from `ModelOutputThunk`
✅ Proper Finish Reasons: Returns `"tool_calls"` when tools are invoked, `"stop"` otherwise
✅ Unique Tool Call IDs: Generates unique IDs in format `call_<24-char-hex>`
✅ JSON Serialization: Properly serializes tool arguments to JSON strings
✅ Backward Compatible: Works with existing code that doesn't use tools
✅ Fully Tested: All 43 serve tests pass, including 8 new tool-specific tests
✅ Type Checked: Passes mypy type checking
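The `call_<24-char-hex>` ID format listed above can be sketched in one line (the PR's exact generator may differ):

```python
import secrets

# 12 random bytes -> 24 lowercase hex characters after the "call_" prefix
call_id = f"call_{secrets.token_hex(12)}"
print(len(call_id))  # 29: 5-char "call_" prefix + 24 hex characters
```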
Usage
Start server with tool support:
Call with tools from client:
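The original client snippet was not captured in this page. A plausible request payload, following the tool shapes shown in the review comments above (the model name, endpoint URL, and the non-standard `"RootModel"` envelope are taken from those discussions; treat all of them as illustrative):

```python
# Illustrative chat-completions payload for the tool-enabled server.
payload = {
    "model": "granite",  # assumed model name
    "messages": [
        {"role": "user", "content": "What's the weather in Boston?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather in a given location",
                # Note: this server currently expects the non-standard
                # {"RootModel": {...}} envelope discussed in the review.
                "parameters": {
                    "RootModel": {
                        "type": "object",
                        "properties": {"location": {"type": "string"}},
                        "required": ["location"],
                    }
                },
            },
        }
    ],
}
# e.g. POST to http://localhost:8000/v1/chat/completions with json=payload
```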
The implementation properly handles tool calls from Mellea's `ModelOutputThunk` and formats them according to OpenAI's API specification with full type safety.

Testing