5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,11 @@ All notable changes to the [Nucleus Python Client](https://github.com/scaleapi/n
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.18.3](https://github.com/scaleapi/nucleus-python-client/releases/tag/v0.18.3) - 2026-05-18

### Added
- `DatasetItem.phash` field exposing the 64-character "0/1" perceptual-hash string when populated by the Nucleus backend. Available on every SDK method that yields a `DatasetItem` (e.g. `items_and_annotation_generator`, `items_generator`, `query_items`, `dataset.items`, `iloc`/`refloc`/`loc`).
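A rough sketch of how the new field might be consumed, assuming two items whose `phash` values are both populated. The helper name `phash_hamming_distance` is hypothetical and not part of the SDK; it only relies on the 64-character "0/1" format described in the entry above.

```python
def phash_hamming_distance(a: str, b: str) -> int:
    """Count the bit positions at which two "0/1" pHash strings differ.

    Hypothetical helper for illustration; not provided by the SDK.
    """
    if len(a) != len(b):
        raise ValueError("pHash strings must have the same length")
    if (set(a) | set(b)) - {"0", "1"}:
        raise ValueError("pHash strings may only contain '0' and '1'")
    return sum(x != y for x, y in zip(a, b))


# Two made-up hashes that differ only in their last four bits.
hash_a = "0" * 64
hash_b = "0" * 60 + "1111"
distance = phash_hamming_distance(hash_a, hash_b)
```

Because `phash` is optional, callers would need to skip items where it is `None` before comparing. A small distance suggests visually similar images; any threshold is a judgment call for the caller.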

## [0.18.2](https://github.com/scaleapi/nucleus-python-client/releases/tag/v0.18.2) - 2026-05-08

### Added
52 changes: 52 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,52 @@
# CLAUDE.md

Notes for Claude Code when working in this repo (the Nucleus Python SDK).

## What this repo is

The official Python client for Nucleus. Wraps the `/v1/nucleus` REST endpoints on `scaleapi`. Distributed on PyPI as `scale-nucleus`.

- Sources live under `nucleus/`.
- Backend lives in the `scaleapi` repo at `server/src/routes/v1/select.ts` and `server/src/lib/select/api/`.
- The default API base URL is `NUCLEUS_ENDPOINT = "https://api.scale.com/v1/nucleus"` (`nucleus/constants.py`). Override it via the `endpoint=` kwarg or the `NUCLEUS_ENDPOINT` env var (e.g. to point at a FedRAMP deployment).
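
The override order can be pictured with a minimal standalone sketch. `resolve_endpoint` is a hypothetical helper, not SDK code, and it assumes an explicit `endpoint=` value wins over the environment variable, which in turn wins over the default; check `nucleus/__init__.py` for the actual behaviour.

```python
import os
from typing import Optional

# Default quoted in the note above (nucleus/constants.py).
DEFAULT_ENDPOINT = "https://api.scale.com/v1/nucleus"


def resolve_endpoint(endpoint: Optional[str] = None) -> str:
    """Hypothetical helper: explicit kwarg, then env var, then default.

    The precedence shown here is an assumption, not confirmed by the repo.
    """
    if endpoint is not None:
        return endpoint
    return os.environ.get("NUCLEUS_ENDPOINT", DEFAULT_ENDPOINT)
```

With neither override set, `resolve_endpoint()` returns the default URL; exporting `NUCLEUS_ENDPOINT` changes the result without touching call sites.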

## Release workflow

Releases are version-numbered with [Semantic Versioning](https://semver.org/) and tracked in `CHANGELOG.md` using the [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) format.

When making a user-facing change, the convention (see PRs #459, #455) is:

1. Bump `version = "..."` in `pyproject.toml` under `[tool.poetry]`. This is the single version source — there is no `__version__` in `nucleus/__init__.py`.
   - Patch bump for additive, backwards-compatible changes (new fields, new methods).
   - Minor bump for new features that change behaviour or remove deprecated paths.
   - Major bump for breaking changes (Python version drops, sentinel removal, etc.).
2. Prepend a `## [X.Y.Z](https://github.com/scaleapi/nucleus-python-client/releases/tag/vX.Y.Z) - YYYY-MM-DD` section to `CHANGELOG.md` with `### Added` / `### Changed` / `### Fixed` / `### Removed` subsections as appropriate.
3. Commit the version bump + CHANGELOG entry alongside the code change in the same PR.

Pure refactors / doc-only PRs (#456) sometimes skip the version bump. When in doubt, bump.
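
As an illustration of steps 1 and 2, the sketch below computes the next version string and the matching CHANGELOG heading. Both function names are hypothetical; nothing like them exists in the repo, and the real workflow is a manual edit of `pyproject.toml` and `CHANGELOG.md`.

```python
from datetime import date

RELEASES_URL = "https://github.com/scaleapi/nucleus-python-client/releases/tag"


def bump_version(version: str, part: str) -> str:
    """Return the next X.Y.Z version for a 'major', 'minor', or 'patch' bump."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


def changelog_heading(version: str, release_date: date) -> str:
    """Build the '## [X.Y.Z](...) - YYYY-MM-DD' heading used in CHANGELOG.md."""
    return f"## [{version}]({RELEASES_URL}/v{version}) - {release_date.isoformat()}"


next_version = bump_version("0.18.2", "patch")
heading = changelog_heading(next_version, date(2026, 5, 18))
```

Here the patch bump mirrors this PR: an additive field takes `0.18.2` to `0.18.3`.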

## Branch and PR conventions

- Branch naming: `<author>/<kebab-description>` (e.g. `vinayparakala/expose-phash-on-dataset-item`).
- PR title commonly starts with the Linear ticket: `[DE-XXXX] <description>` — see `git log --oneline -20`.
- PRs land via squash merge.

## Architecture pointers

- `nucleus/__init__.py` — `NucleusClient`, top-level operations.
- `nucleus/dataset.py` — `Dataset` class. Most user-facing methods live here (item upload/fetch, generators, queries, slices, autotags, exports). Generators page through the backend via `nucleus/utils.py:paginate_generator`.
- `nucleus/dataset_item.py` — `DatasetItem` dataclass. **`DatasetItem.from_json` is the single deserialization entry point** for items coming back from the API — every SDK method that returns a `DatasetItem` (generators, queries, `iloc`/`refloc`/`loc`, the `items` property) routes through it. To expose a new server-side field on items, add it to the dataclass + `from_json` and you're done on the SDK side.
- `nucleus/utils.py` — `convert_export_payload` and `format_dataset_item_response` are the shared shapers used by the export and single-item endpoints. They wrap raw JSON into typed objects via the respective `from_json` classmethods.
- `nucleus/constants.py` — All API payload keys are constants here. When adding a new field, add a `*_KEY` constant first and reference it from `from_json` / `to_payload` rather than inlining the string.
- `nucleus/annotation.py`, `nucleus/prediction.py` — Annotation and prediction types. Each has its own `from_json` / `to_payload`.
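
The "constant first, then dataclass field, then `from_json`" pattern described above can be sketched with a stripped-down stand-in. `MiniItem` is hypothetical and far smaller than the real `DatasetItem`; the value of `REFERENCE_ID_KEY` is assumed here, while `PHASH_KEY = "phash"` matches `nucleus/constants.py` in this PR.

```python
from dataclasses import dataclass
from typing import Optional

# Payload keys live in one place instead of being inlined as strings.
REFERENCE_ID_KEY = "reference_id"  # assumed value, for illustration only
PHASH_KEY = "phash"


@dataclass
class MiniItem:
    """Hypothetical stand-in for DatasetItem with only two fields."""

    reference_id: str
    phash: Optional[str] = None  # stays None when the backend omits the key

    @classmethod
    def from_json(cls, payload: dict) -> "MiniItem":
        # Single deserialization entry point: new server-side fields are read here.
        return cls(
            reference_id=payload[REFERENCE_ID_KEY],
            phash=payload.get(PHASH_KEY),
        )


with_hash = MiniItem.from_json({"reference_id": "img-1", "phash": "01" * 32})
without_hash = MiniItem.from_json({"reference_id": "img-2"})
```

Using `payload.get` keeps older payloads that lack the new key deserializable, which is what makes this kind of change additive.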

## Testing

Run the suite from the repo root:

```bash
poetry install
poetry run pytest tests
```

Many tests require a real `NUCLEUS_API_KEY` and hit the live API; use `pytest -k <name>` to run only the tests you need. Pre-commit hooks (`.pre-commit-config.yaml`) run black, ruff, and isort.
1 change: 1 addition & 0 deletions nucleus/constants.py
@@ -124,6 +124,7 @@
OBJECT_IDS_KEY = "object_ids"
P1_KEY = "p1"
P2_KEY = "p2"
PHASH_KEY = "phash"
POINTS_KEY = "points"
POINTCLOUD_KEY = "pointcloud"
POINTCLOUD_LOCATION_KEY = "pointcloud_location"
6 changes: 6 additions & 0 deletions nucleus/dataset_item.py
@@ -17,6 +17,7 @@
    INDEX_ID_KEY,
    METADATA_KEY,
    ORIGINAL_IMAGE_URL_KEY,
    PHASH_KEY,
    POINTCLOUD_URL_KEY,
    PROCESSED_URL_KEY,
    REFERENCE_ID_KEY,
@@ -123,6 +124,10 @@ class DatasetItem:  # pylint: disable=R0902
    embedding_info: Optional[DatasetItemEmbeddingInfo] = None
    width: Optional[int] = None
    height: Optional[int] = None
    # Perceptual hash of the underlying image as a 64-character "0/1" binary
    # string. Populated by the Nucleus backend on items that have been pHash
    # backfilled; None for pointcloud items or items without a backfilled hash.
    phash: Optional[str] = None

    def __post_init__(self):
        assert self.reference_id is not None, "reference_id is required."
@@ -178,6 +183,7 @@ def from_json(cls, payload: dict):
            pointcloud_location=pointcloud_url,
            reference_id=payload.get(REFERENCE_ID_KEY),
            metadata=payload.get(METADATA_KEY, {}),
            phash=payload.get(PHASH_KEY),
        )

    def local_file_exists(self):
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -25,7 +25,7 @@ ignore = ["E501", "E741", "E731", "F401"] # Easy ignore for getting it running

[tool.poetry]
name = "scale-nucleus"
-version = "0.18.2"
+version = "0.18.3"
description = "The official Python client library for Nucleus, the Data Platform for AI"
license = "MIT"
authors = ["Scale AI Nucleus Team <nucleusapi@scaleapi.com>"]