Hydra SDK

Python interface to the Dominir data layer. Rust-native scan engine, zero-copy Arrow interchange, ontology-validated writes, and in-memory graph traversal.

python

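from hydra_sdk import HydraStore, InstanceFrame

# Inside Dominir the workspace is auto-detected, so no arguments are needed.
store = HydraStore()

# Read: scan every Person instance (returns an InstanceFrame), then convert to Pandas.
frame = store.scan(object_key="Person")
df = frame.to_pandas()

# Write: the petiole dict is validated against the ontology before it is stored.
report = store.write_instance(
    "person_Instance",
    {"hasGivenName": "Alice", "hasFamilyName": "Smith"},
)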

Exports

python

from hydra_sdk import (
    HydraStore,        # main entry point
    InstanceFrame,     # scan result wrapper
    HydraGraph,        # in-memory graph
    TraverseResult,    # traversal output
    hoist,             # extract leaf slots to Arrow columns
    to_chart_df,       # extract leaf slots to Pandas DataFrame
    HydraMissingField, # required petiole absent
    HydraTypeMismatch, # cardinality violation
    HydraLockTimeout,  # write lock contention
    HydraInvalidID,    # bad instance_id format
)

HydraStore

Constructor

python

HydraStore(
    repo_root:     str | Path | None = None,   # workspace path (auto-detected)
    default_store: str | None        = None,   # scope to one store
    batch_id:      str               = "00",   # 2-char provenance prefix for IDs
)

Inside Dominir, call HydraStore() with no arguments; the workspace path is auto-detected. Pass repo_root only when using the SDK from outside Dominir.

Properties

Property	Type	Description
`default_store`	`str | None`	Get/set the workspace scope. Writable.
`store_ids`	`list[str]`	All store IDs in `storage/fragments/`.
`model_names`	`list[str]`	All model names from the ontology.

Reading Data

`store.scan()`

python

store.scan(
    store_id:    str | None = None,
    *,
    object_key:  str | None = None,
    instance_id: str | None = None,
    of_branch:   str | None = None,
    value_str:   str | None = None,
) -> InstanceFrame

All filters are AND-combined. Filtering executes in Rust during the scan — never post-hoc in Python.

Parameter	Matches
`object_key`	Schema type. Accepts raw keys (`person_Instance`) or model names (`Person`).
`instance_id`	Exact instance ID (`Pe-00-a3bX9k`).
`of_branch`	Leaf branch name (`Gender`, `GivenName`).
`value_str`	Exact leaf value string.
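
The AND semantics can be modeled in plain Python. The sketch below only illustrates how the four filters combine; it is not how the Rust engine is implemented. `matches` is a hypothetical helper, `object_key` is compared against the model name only, and the sketch assumes `of_branch` and `value_str` may be satisfied by different leaves, which this reference does not specify.

```python
def matches(row, *, object_key=None, instance_id=None, of_branch=None, value_str=None):
    """Return True when the row passes every filter that is not None."""
    leaves = list(row["leaf_values"].values())
    checks = [
        object_key is None or row["instance_type"] == object_key,
        instance_id is None or row["instance_id"] == instance_id,
        # Assumption: a branch filter passes if any leaf is on that branch.
        of_branch is None or any(leaf["ofBranch"] == of_branch for leaf in leaves),
        # Assumption: a value filter passes if any leaf holds that exact string.
        value_str is None or any(value_str in leaf["values"] for leaf in leaves),
    ]
    return all(checks)


# Hand-written sample row in the inflated (to_dicts-style) shape.
row = {
    "instance_type": "Person",
    "instance_id": "Pe-00-a3bX9k",
    "leaf_values": {
        "hasGivenName": {"ofBranch": "GivenName", "values": ["Jennifer"]},
        "hasGender": {"ofBranch": "Gender", "values": ["Female"]},
    },
}

print(matches(row, object_key="Person", of_branch="Gender"))
print(matches(row, object_key="Person", value_str="Male"))
```

An omitted filter places no constraint on the row, so calling `matches(row)` with no filters accepts everything.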

Store scoping:

`store_id`	`default_store`	Scans
`"Demo"`	any	Demo only
`None`	`"Demo"`	Demo only
`None`	`None`	All stores
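
The scoping table reduces to a short precedence rule. The function below is a sketch of that rule only; `resolve_scan_scope` and the `all_store_ids` argument are hypothetical names, not part of the SDK.

```python
from __future__ import annotations


def resolve_scan_scope(
    store_id: str | None,
    default_store: str | None,
    all_store_ids: list[str],
) -> list[str]:
    """Return the store IDs a scan() call would cover under the table above."""
    if store_id is not None:
        return [store_id]        # explicit argument wins over default_store
    if default_store is not None:
        return [default_store]   # otherwise fall back to the workspace scope
    return list(all_store_ids)   # no scope at all: every store


stores = ["Demo", "Archive"]  # made-up store IDs
print(resolve_scan_scope("Demo", "Archive", stores))
print(resolve_scan_scope(None, "Demo", stores))
print(resolve_scan_scope(None, None, stores))
```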

`store.scan_all()`

Same filters as scan(). Ignores default_store — always scans every store.

`store.scan_json()`

Same as scan() but returns list[dict] instead of InstanceFrame.

Writing Data

`store.write_instance()`

python

store.write_instance(
    object_key:  str,
    petioles:    dict[str, Any],
    *,
    instance_id: str | None = None,   # auto-generated HydraID if omitted
    store_id:    str | None = None,   # falls back to default_store → "default"
    version:     int        = 1,
    created_by:  str | None = None,
) -> dict

petioles is a flat {slot_name: value} dict:

python

{"hasGivenName": "Alice", "hasFamilyName": "Smith", "hasDOB": "1990-05-12"}

Returns a ValidationReport, a plain dict with these keys:

Key	Type	Description
`success`	`bool`	`True` if no hard errors.
`canonical_count`	`int`	Petioles written to the primary index.
`shadow_petioles`	`list[str]`	Slots not in the ontology (written with `shadow::` prefix).
`errors`	`list[dict]`	Validation errors. Each has a `type` key.
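
Because the report is a plain dict, it can be condensed into a one-line log message with ordinary dict access. The helper below is a sketch; `summarize_report` is a hypothetical name, and the sample reports are hand-written to match the keys above rather than captured from the SDK.

```python
def summarize_report(report: dict) -> str:
    """Condense a ValidationReport-shaped dict into one line of text."""
    if report["success"]:
        line = f"ok: {report['canonical_count']} canonical petiole(s)"
        shadows = report.get("shadow_petioles", [])
        if shadows:
            line += f", shadow: {', '.join(shadows)}"
        return line
    # On failure, list the distinct error types in a stable order.
    kinds = sorted({err["type"] for err in report.get("errors", [])})
    return f"failed: {', '.join(kinds)}"


ok_report = {
    "success": True,
    "canonical_count": 1,
    "shadow_petioles": ["confidence_score"],
    "errors": [],
}
bad_report = {
    "success": False,
    "canonical_count": 0,
    "shadow_petioles": [],
    "errors": [{"type": "MissingRequiredField", "petiole": "hasFamilyName"}],
}

print(summarize_report(ok_report))
print(summarize_report(bad_report))
```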

Raises:

Exception	When
`HydraMissingField`	Required petiole absent or unknown `object_key`.
`HydraTypeMismatch`	Cardinality constraint violated.
`HydraLockTimeout`	Write lock not acquired within 5 seconds.
`HydraInvalidID`	Manual `instance_id` doesn't match HydraID format.

`store.validate_write()`

python

store.validate_write(object_key: str, petioles: dict[str, Any]) -> dict

Dry-run validation. Returns the same ValidationReport as write_instance() but performs no disk I/O.

python

report = store.validate_write("person_Instance", {"hasGivenName": "Alice"})
# report["errors"] → [{"type": "MissingRequiredField", "petiole": "hasFamilyName"}]

InstanceFrame

Thin wrapper around pyarrow.Table. All filtering returns a new frame (chainable).

Member	Returns	Description
`table`	`pa.Table`	Underlying Arrow table (zero-copy).
`types`	`list[str]`	Unique instance types in the frame.
`to_pandas()`	`DataFrame`	Convert to Pandas. Auto-parses JSON columns.
`to_dicts()`	`list[dict]`	Rows as plain Python dicts.
`filter_type(t)`	`InstanceFrame`	Keep rows where `instance_type == t`.
`filter_id(id)`	`InstanceFrame`	Keep rows where `instance_id == id`.
`len(frame)`	`int`	Row count.

Columns

Column	Type	Content
`instance_type`	`str`	Model name (`Person`, `Case`).
`instance_id`	`str`	Unique ID (`Pe-00-FQiher`).
`version`	`int64`	Monotonic version counter.
`timestamp`	`str` (nullable)	ISO-8601 write timestamp.
`leaf_values`	`str` (JSON)	`{slot: {ofBranch, values: [...]}}`
`petiole_order`	`str` (JSON)	Ordered list of slot names.

In the raw Arrow table, leaf_values and petiole_order are UTF-8 strings containing JSON. Only to_pandas() and to_dicts() inflate them into Python objects.

leaf_values structure

Each instance's leaf_values is a nested dict:

json

{
  "hasGivenName": {"ofBranch": "GivenName", "values": ["Jennifer"]},
  "hasGender":    {"ofBranch": "Gender",    "values": ["Female"]},
  "hasDOB":       {"ofBranch": "Date",      "values": ["1990-05-12"]}
}
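
When working with the raw Arrow table rather than to_pandas() or to_dicts(), the two JSON columns have to be decoded by hand. The sketch below shows one way to do that with the standard `json` module and then walk the slots in `petiole_order`. The helper names `inflate` and `slots_in_order` are hypothetical, and the sample row is hand-built to mimic the raw string columns.

```python
import json

# Hand-built row that mimics the raw Arrow columns: both fields are JSON strings.
raw_row = {
    "instance_id": "Pe-00-FQiher",
    "leaf_values": json.dumps({
        "hasGivenName": {"ofBranch": "GivenName", "values": ["Jennifer"]},
        "hasGender": {"ofBranch": "Gender", "values": ["Female"]},
    }),
    "petiole_order": json.dumps(["hasGender", "hasGivenName"]),
}


def inflate(row: dict) -> dict:
    """Return a copy of the row with the two JSON columns decoded."""
    out = dict(row)
    out["leaf_values"] = json.loads(row["leaf_values"])
    out["petiole_order"] = json.loads(row["petiole_order"])
    return out


def slots_in_order(row: dict):
    """Yield (slot, branch, values) following petiole_order."""
    for slot in row["petiole_order"]:
        leaf = row["leaf_values"].get(slot)
        if leaf is not None:  # skip slots listed in the order but absent from the dict
            yield slot, leaf["ofBranch"], leaf["values"]


for slot, branch, values in slots_in_order(inflate(raw_row)):
    print(slot, branch, values)
```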

Tidy Helpers

`hoist()`

python

from hydra_sdk import hoist

table = hoist(frame, ["hasGivenName", "hasGender"], keep_ids=True)
# → pa.Table with columns: instance_type, instance_id, hasGivenName, hasGender

Extracts the first value from each slot. Returns a pyarrow.Table (zero-copy where possible).
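
The "first value per slot" rule can be pictured with a pure-Python equivalent that works on inflated rows. This is a conceptual sketch, not the SDK's Arrow-based implementation: `hoist_first_values` is a hypothetical name, and filling absent or empty slots with None is an assumption.

```python
def hoist_first_values(rows: list[dict], slots: list[str]) -> dict[str, list]:
    """Build one column per slot, holding the first value of that slot for each row."""
    columns = {slot: [] for slot in slots}
    for row in rows:
        for slot in slots:
            values = row["leaf_values"].get(slot, {}).get("values", [])
            # Assumption: a slot that is absent or empty becomes None.
            columns[slot].append(values[0] if values else None)
    return columns


# Hand-written rows in the inflated (to_dicts-style) shape.
rows = [
    {"leaf_values": {
        "hasGivenName": {"ofBranch": "GivenName", "values": ["Jennifer", "Jen"]},
        "hasGender": {"ofBranch": "Gender", "values": ["Female"]},
    }},
    {"leaf_values": {
        "hasGivenName": {"ofBranch": "GivenName", "values": ["Bob"]},
    }},
]

print(hoist_first_values(rows, ["hasGivenName", "hasGender"]))
```

Only the first entry of a multi-valued slot survives, so this shape suits charting and tabular summaries rather than lossless export.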

`to_chart_df()`

python

from hydra_sdk import to_chart_df

df = to_chart_df(frame, ["hasGivenName", "hasGender"])
# → pandas.DataFrame ready for alt.Chart(df) or plt.bar(...)

Equivalent to hoist(...).to_pandas().

HydraGraph

In-memory graph with BFS/DFS traversal. Nodes are instances keyed as ModelName-InstanceId. Edges are derived from cross-reference leaf values in the ontology.

Building

python

graph = store.load_graph(store_id="Demo")
# <HydraGraph nodes=2490 edges=3812>

`graph.traverse()`

python

graph.traverse(
    start_key: str,
    *,
    algo:      str = "bfs",   # "bfs" or "dfs"
    max_depth: int = 3,       # 0–25
) -> TraverseResult

Root key formats:

Format	Example	Seeds
`ModelName-InstanceId`	`"Person-Pe-00-abc"`	Single instance at depth 0.
`leaf::Branch::Value`	`"leaf::Gender::Female"`	All instances with that value at depth 1.
`type::ModelName`	`"type::Person"`	All instances of that type at depth 0.
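
The three root key formats can be told apart by prefix. The parser below is a sketch for client-side checks before calling traverse(); `parse_root_key` is a hypothetical helper, and it assumes model names contain no hyphen, so the first hyphen separates the model name from the instance ID.

```python
from __future__ import annotations


def parse_root_key(key: str) -> tuple[str, ...]:
    """Classify a root key as ('leaf', branch, value), ('type', model) or ('instance', model, id)."""
    if key.startswith("leaf::"):
        # maxsplit=2 keeps any further '::' inside the value part.
        _, branch, value = key.split("::", 2)
        return ("leaf", branch, value)
    if key.startswith("type::"):
        return ("type", key[len("type::"):])
    # Assumption: the model name has no hyphen, so split at the first one.
    model, _, instance_id = key.partition("-")
    return ("instance", model, instance_id)


print(parse_root_key("Person-Pe-00-abc"))
print(parse_root_key("leaf::Gender::Female"))
print(parse_root_key("type::Person"))
```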

TraverseResult

Field	Type	Description
`algo`	`str`	`"bfs"` or `"dfs"`.
`start_key`	`str`	The root key used.
`max_depth`	`int`	Depth limit applied.
`levels`	`dict[str, int]`	Node key → depth from root.
`order`	`list[str]`	Keys in visit order.
`edges`	`list[tuple[str, str]]`	Undirected `(source, target)` pairs.
`instances`	`list[dict]`	Full instance data for each node.
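
To make the `levels`, `order`, and `edges` fields concrete, here is a minimal depth-limited BFS over a plain adjacency dict. It is a textbook sketch, not the SDK's traversal code: the node keys are made up, and it records only the edges along which a node was first discovered, whereas the SDK may report more.

```python
from collections import deque


def bfs(adjacency: dict, start_key: str, max_depth: int = 3) -> dict:
    """Depth-limited breadth-first search returning levels, visit order, and discovery edges."""
    levels = {start_key: 0}
    order = []
    edges = []
    queue = deque([start_key])
    while queue:
        node = queue.popleft()
        order.append(node)
        if levels[node] == max_depth:
            continue  # do not expand past the depth limit
        for neighbor in adjacency.get(node, []):
            if neighbor not in levels:
                levels[neighbor] = levels[node] + 1
                edges.append((node, neighbor))
                queue.append(neighbor)
    return {"levels": levels, "order": order, "edges": edges}


# Made-up undirected graph: each edge is listed in both directions.
adjacency = {
    "Person-A": ["Case-X"],
    "Case-X": ["Person-A", "Person-B"],
    "Person-B": ["Case-X"],
}

result = bfs(adjacency, "Person-A", max_depth=2)
for key in result["order"]:
    print(f"L{result['levels'][key]}: {key}")
```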

Schema Introspection

Method	Returns
`store.model_names`	All model names from the ontology (`list[str]`).
`store.validate_write(object_key, petioles)`	Validation report with errors for missing/invalid slots.

For the full slot-level schema, call the hydra_schema agent tool with object_key="person_Instance".

HydraID

Structured 12-character identifier: Pe-NB-FQiher.

Pe-NB-FQiher
│   │   └── 6-char base62 suffix (timestamp + entropy, 35 bits)
│   └────── 2-char batch prefix  (write source provenance)
└────────── 2-char type prefix   (from ontology id_prefix)

Batch ID	Source
`00`	Default / unspecified
`NB`	Notebook
`AI`	Agent
`AP`	API / backend
`MG`	Data migration
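
The layout above is regular enough to check on the client before passing a manual `instance_id`. The sketch below assumes all three parts are alphanumeric, which this reference states only for the base62 suffix; the SDK's own validation may be stricter, for example by checking the type prefix against the ontology. `parse_hydra_id` is a hypothetical helper.

```python
from __future__ import annotations

import re

# Assumption: 2 + 2 + 6 alphanumeric characters joined by hyphens (12 characters total).
HYDRA_ID = re.compile(r"([A-Za-z0-9]{2})-([A-Za-z0-9]{2})-([A-Za-z0-9]{6})")


def parse_hydra_id(value: str) -> dict | None:
    """Split a HydraID into its three parts, or return None if the shape is wrong."""
    match = HYDRA_ID.fullmatch(value)
    if match is None:
        return None
    type_prefix, batch_id, suffix = match.groups()
    return {"type_prefix": type_prefix, "batch_id": batch_id, "suffix": suffix}


print(parse_hydra_id("Pe-NB-FQiher"))
print(parse_hydra_id("Pe-NB-short"))

# Six base62 characters give 62**6 combinations, enough to hold 35 bits but not 36.
print(2**35 <= 62**6 < 2**36)
```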

Exceptions

Exception	Raised when
`HydraMissingField`	Required petiole absent, or `object_key` unknown.
`HydraTypeMismatch`	Cardinality constraint violated.
`HydraLockTimeout`	Write lock not acquired within 5 s (another writer active).
`HydraInvalidID`	Manual `instance_id` not in HydraID format.

All are raised from Rust before any disk mutation.
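
Since HydraLockTimeout signals contention rather than bad input, a bounded retry is a reasonable response. The wrapper below is one possible sketch: `write_with_retry` and its parameters are hypothetical, and a stand-in exception class is defined so the snippet runs even where the SDK is not installed.

```python
import time

try:
    from hydra_sdk import HydraLockTimeout
except ImportError:
    class HydraLockTimeout(Exception):
        """Stand-in so the sketch runs without the SDK installed."""


def write_with_retry(write_fn, *, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call write_fn(), retrying with exponential backoff on HydraLockTimeout."""
    for attempt in range(attempts):
        try:
            return write_fn()
        except HydraLockTimeout:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the timeout to the caller
            sleep(base_delay * 2 ** attempt)


# Demo with a fake writer that is "busy" on its first two calls.
calls = {"n": 0}

def flaky_write():
    calls["n"] += 1
    if calls["n"] < 3:
        raise HydraLockTimeout("store busy")
    return {"success": True}

print(write_with_retry(flaky_write, base_delay=0.01))
```

With a real store, `write_fn` could be a lambda wrapping `store.write_instance(...)`. Each attempt may already wait up to the 5 s lock timeout, so a small attempt count keeps the worst case bounded.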

Shadow Petioles

Slots not in the ontology are still written, but under a shadow::ModelName branch (for example shadow::Person), and their names are listed in the report's shadow_petioles.

python

report = store.write_instance(
    "person_Instance",
    {
        "hasGivenName":     "Alice",      # canonical
        "confidence_score": 0.92,         # shadow — not in ontology
    },
)
report["shadow_petioles"]   # ["confidence_score"]
# Retrieve later:
store.scan(of_branch="shadow::Person")
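
To predict which slots will end up as shadow petioles before writing, the petiole dict can be partitioned against a known slot list. The sketch below assumes the caller already has that list (for example from the hydra_schema tool); `split_petioles` and the `ontology_slots` set are hypothetical and hand-written here.

```python
def split_petioles(petioles: dict, ontology_slots: set, model_name: str):
    """Partition petioles into canonical and shadow, and name the shadow branch."""
    canonical = {k: v for k, v in petioles.items() if k in ontology_slots}
    shadow = {k: v for k, v in petioles.items() if k not in ontology_slots}
    return canonical, shadow, f"shadow::{model_name}"


# Hand-written slot list; a real one would come from the ontology.
person_slots = {"hasGivenName", "hasFamilyName", "hasGender", "hasDOB"}

canonical, shadow, branch = split_petioles(
    {"hasGivenName": "Alice", "confidence_score": 0.92},
    person_slots,
    "Person",
)
print(canonical)
print(shadow)
print(branch)
```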

Agent Mode

When running under an agent context, store is a read-only proxy. Write attempts raise PermissionError. Agents write via the hydra_write tool, which goes through the policy engine for human approval.
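
The read-only behavior can be pictured as a thin proxy that forwards reads and blocks writes. The classes below are purely illustrative and are not the SDK's implementation: `ReadOnlyStore` and `FakeStore` are hypothetical names, and only `write_instance` is treated as a write method.

```python
class ReadOnlyStore:
    """Illustrative proxy: forwards reads, blocks the listed write methods."""

    _BLOCKED = frozenset({"write_instance"})

    def __init__(self, inner):
        self._inner = inner

    def __getattr__(self, name):
        # Only reached for attributes not found on the proxy itself.
        if name in self._BLOCKED:
            def blocked(*args, **kwargs):
                raise PermissionError(f"{name}() is disabled in agent mode")
            return blocked
        return getattr(self._inner, name)


class FakeStore:
    """Hypothetical stand-in for a real store."""

    def scan(self, **filters):
        return [{"instance_id": "Pe-00-a3bX9k"}]

    def write_instance(self, object_key, petioles):
        return {"success": True}


proxy = ReadOnlyStore(FakeStore())
print(proxy.scan(object_key="Person"))
try:
    proxy.write_instance("person_Instance", {"hasGivenName": "Alice"})
except PermissionError as exc:
    print(f"blocked: {exc}")
```

Agent code that may run in either mode can catch PermissionError around writes and route them to the hydra_write tool instead.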

Recipes

Scan and inspect

python

frame = store.scan(object_key="Person")
for row in frame.to_dicts()[:3]:
    name = row["leaf_values"].get("hasGivenName", {}).get("values", [])
    print(f"{row['instance_id']}: {name}")

Flatten leaves to a DataFrame

python

from hydra_sdk import to_chart_df

df = to_chart_df(
    store.scan(object_key="Person"),
    ["hasGivenName", "hasFamilyName", "hasGender"],
)

Arrow-level compute (zero-copy)

python

import pyarrow.compute as pc

frame = store.scan(object_key="Person")
mask = pc.starts_with(frame.table["timestamp"], "2024")
recent = frame.table.filter(mask)

Write with upsert (version bump)

python

store.write_instance("person_Instance",
    {"hasGivenName": "Alice", "hasFamilyName": "Smith"},
    instance_id="Pe-NB-abc123", version=1)

store.write_instance("person_Instance",
    {"hasGivenName": "Alice", "hasFamilyName": "Jones"},
    instance_id="Pe-NB-abc123", version=2)   # higher version wins
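
The "higher version wins" rule amounts to keeping, per instance ID, the record with the largest version. The function below sketches that resolution in plain Python; it does not show how the store resolves versions internally, `latest_versions` is a hypothetical name, and keeping the earlier record on a version tie is an assumption.

```python
def latest_versions(records: list[dict]) -> dict[str, dict]:
    """Keep the highest-version record for each instance_id."""
    latest = {}
    for rec in records:
        current = latest.get(rec["instance_id"])
        # Assumption: on a tie the record seen first is kept.
        if current is None or rec["version"] > current["version"]:
            latest[rec["instance_id"]] = rec
    return latest


records = [
    {"instance_id": "Pe-NB-abc123", "version": 1, "hasFamilyName": "Smith"},
    {"instance_id": "Pe-NB-abc123", "version": 2, "hasFamilyName": "Jones"},
]

winner = latest_versions(records)["Pe-NB-abc123"]
print(winner["version"], winner["hasFamilyName"])
```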

Dry-run validation before batch write

python

records = [
    {"hasGivenName": "Bob", "hasFamilyName": "Jones"},
    {"hasGivenName": "Carol"},
]

for rec in records:
    report = store.validate_write("person_Instance", rec)
    if report["success"]:
        store.write_instance("person_Instance", rec)
    else:
        print(f"Invalid: {report['errors']}")

Graph traversal

python

graph = store.load_graph(store_id="Demo")

result = graph.traverse("Person-Pe-00-abc", algo="bfs", max_depth=3)
for key in result.order[:5]:
    print(f"  L{result.levels[key]}: {key}")

# From a leaf value
result = graph.traverse("leaf::Gender::Female", max_depth=2)

# From a type root
result = graph.traverse("type::Person", algo="dfs", max_depth=1)

Error handling

python

from hydra_sdk import HydraMissingField, HydraTypeMismatch, HydraLockTimeout

try:
    store.write_instance("person_Instance", {"hasGivenName": "Alice"})
except HydraMissingField as e:
    print(f"Missing: {e}")
except HydraTypeMismatch as e:
    print(f"Cardinality: {e}")
except HydraLockTimeout:
    print("Store busy — retry.")
