Hydra SDK

Python interface to the Dominir data layer. Rust-native scan engine, zero-copy Arrow interchange, ontology-validated writes, and in-memory graph traversal.

python
from hydra_sdk import HydraStore, InstanceFrame

store = HydraStore()

frame = store.scan(object_key="Person")
df = frame.to_pandas()

report = store.write_instance(
    "person_Instance",
    {"hasGivenName": "Alice", "hasFamilyName": "Smith"},
)

Exports

python
from hydra_sdk import (
    HydraStore,        # main entry point
    InstanceFrame,     # scan result wrapper
    HydraGraph,        # in-memory graph
    TraverseResult,    # traversal output
    hoist,             # extract leaf slots to Arrow columns
    to_chart_df,       # extract leaf slots to Pandas DataFrame
    HydraMissingField, # required petiole absent
    HydraTypeMismatch, # cardinality violation
    HydraLockTimeout,  # write lock contention
    HydraInvalidID,    # bad instance_id format
)

HydraStore

Constructor

python
HydraStore(
    repo_root:     str | Path | None = None,   # workspace path (auto-detected)
    default_store: str | None        = None,   # scope to one store
    batch_id:      str               = "00",   # 2-char provenance prefix for IDs
)

Inside Dominir no arguments are needed, because the workspace path is auto-detected. Pass repo_root only when using the SDK from outside a Dominir workspace.

Properties

| Property | Type | Description |
| --- | --- | --- |
| default_store | str \| None | Get/set the workspace scope. Writable. |
| store_ids | list[str] | All store IDs in storage/fragments/. |
| model_names | list[str] | All model names from the ontology. |

Reading Data

store.scan()

python
store.scan(
    store_id:    str | None = None,
    *,
    object_key:  str | None = None,
    instance_id: str | None = None,
    of_branch:   str | None = None,
    value_str:   str | None = None,
) -> InstanceFrame

All filters are AND-combined. Filtering executes in Rust during the scan — never post-hoc in Python.

| Parameter | Matches |
| --- | --- |
| object_key | Schema type. Accepts raw keys (person_Instance) or model names (Person). |
| instance_id | Exact instance ID (Pe-00-a3bX9k). |
| of_branch | Leaf branch name (Gender, GivenName). |
| value_str | Exact leaf value string. |

Store scoping:

| store_id | default_store | Scans |
| --- | --- | --- |
| "Demo" | any | Demo only |
| None | "Demo" | Demo only |
| None | None | All stores |
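
The scoping table reduces to a simple precedence rule: an explicit store_id wins, then default_store, and otherwise every store is scanned. A minimal sketch of that rule in plain Python follows. resolve_scan_scope is a hypothetical helper written for illustration and is not part of hydra_sdk.

```python
from typing import Optional


def resolve_scan_scope(store_id: Optional[str], default_store: Optional[str]) -> Optional[str]:
    """Return the store that scan() would target, or None for "all stores".

    Hypothetical helper mirroring the scoping table; not an SDK function.
    """
    if store_id is not None:
        return store_id       # an explicit argument always wins
    return default_store      # may be None, meaning every store is scanned


print(resolve_scan_scope("Demo", "Other"))
print(resolve_scan_scope(None, "Demo"))
print(resolve_scan_scope(None, None))
```

Note that scan_all() bypasses this rule entirely, so it has no equivalent of the first two rows.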

store.scan_all()

Same filters as scan(). Ignores default_store — always scans every store.

store.scan_json()

Same as scan() but returns list[dict] instead of InstanceFrame.

Writing Data

store.write_instance()

python
store.write_instance(
    object_key:  str,
    petioles:    dict[str, Any],
    *,
    instance_id: str | None = None,   # auto-generated HydraID if omitted
    store_id:    str | None = None,   # falls back to default_store → "default"
    version:     int        = 1,
    created_by:  str | None = None,
) -> dict

petioles is a flat {slot_name: value} dict:

python
{"hasGivenName": "Alice", "hasFamilyName": "Smith", "hasDOB": "1990-05-12"}

Returns a validation report as a dict with the following keys:

| Key | Type | Description |
| --- | --- | --- |
| success | bool | True if no hard errors. |
| canonical_count | int | Petioles written to the primary index. |
| shadow_petioles | list[str] | Slots not in the ontology (written with shadow:: prefix). |
| errors | list[dict] | Validation errors. Each has a type key. |
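
Because the report is a plain dict, it can be inspected with ordinary dictionary access. The sketch below condenses a report into a one-line summary suitable for logging. summarize_report is a hypothetical helper, and the sample dict is hand-written from the documented keys rather than captured from the SDK.

```python
from typing import Any, Dict


def summarize_report(report: Dict[str, Any]) -> str:
    """Condense a validation report dict into a single log-friendly line."""
    status = "ok" if report.get("success") else "failed"
    # Every error dict is documented to carry a "type" key; fall back defensively.
    error_types = [err.get("type", "unknown") for err in report.get("errors", [])]
    shadows = report.get("shadow_petioles", [])
    return (
        f"{status}: {report.get('canonical_count', 0)} canonical, "
        f"{len(shadows)} shadow, errors={error_types}"
    )


# Hand-written sample using the documented keys (not real SDK output).
sample = {
    "success": False,
    "canonical_count": 0,
    "shadow_petioles": [],
    "errors": [{"type": "MissingRequiredField", "petiole": "hasFamilyName"}],
}
print(summarize_report(sample))
```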

Raises:

| Exception | When |
| --- | --- |
| HydraMissingField | Required petiole absent or unknown object_key. |
| HydraTypeMismatch | Cardinality constraint violated. |
| HydraLockTimeout | Write lock not acquired within 5 seconds. |
| HydraInvalidID | Manual instance_id doesn't match HydraID format. |

store.validate_write()

python
store.validate_write(object_key: str, petioles: dict[str, Any]) -> dict

Dry-run validation. Returns the same report dict as write_instance() but performs no disk I/O.

python
report = store.validate_write("person_Instance", {"hasGivenName": "Alice"})
# report["errors"] → [{"type": "MissingRequiredField", "petiole": "hasFamilyName"}]

InstanceFrame

Thin wrapper around pyarrow.Table. All filtering returns a new frame (chainable).

| Member | Returns | Description |
| --- | --- | --- |
| table | pa.Table | Underlying Arrow table (zero-copy). |
| types | list[str] | Unique instance types in the frame. |
| to_pandas() | DataFrame | Convert to Pandas. Auto-parses JSON columns. |
| to_dicts() | list[dict] | Rows as plain Python dicts. |
| filter_type(t) | InstanceFrame | Keep rows where instance_type == t. |
| filter_id(id) | InstanceFrame | Keep rows where instance_id == id. |
| len(frame) | int | Row count. |

Columns

| Column | Type | Content |
| --- | --- | --- |
| instance_type | str | Model name (Person, Case). |
| instance_id | str | Unique ID (Pe-00-FQiher). |
| version | int64 | Monotonic version counter. |
| timestamp | str (nullable) | ISO-8601 write timestamp. |
| leaf_values | str (JSON) | {slot: {ofBranch, values: [...]}} |
| petiole_order | str (JSON) | Ordered list of slot names. |

In the raw Arrow table, leaf_values and petiole_order are UTF-8 strings containing JSON. Only to_pandas() and to_dicts() inflate them into Python objects.

leaf_values structure

Each instance's leaf_values is a nested dict:

json
{
  "hasGivenName": {"ofBranch": "GivenName", "values": ["Jennifer"]},
  "hasGender":    {"ofBranch": "Gender",    "values": ["Female"]},
  "hasDOB":       {"ofBranch": "Date",      "values": ["1990-05-12"]}
}
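
In the raw Arrow table both JSON columns arrive as text, so working with them outside to_pandas() or to_dicts() means decoding them first. The sketch below does that with the standard library only, using hand-written strings that stand in for one row's leaf_values and petiole_order cells. iter_leaves is a hypothetical helper, not an SDK function.

```python
import json
from typing import List, Tuple

# Hand-written stand-ins for one row's raw cell contents.
raw_leaf_values = (
    '{"hasGivenName": {"ofBranch": "GivenName", "values": ["Jennifer"]},'
    ' "hasGender": {"ofBranch": "Gender", "values": ["Female"]},'
    ' "hasDOB": {"ofBranch": "Date", "values": ["1990-05-12"]}}'
)
raw_petiole_order = '["hasGivenName", "hasGender", "hasDOB"]'


def iter_leaves(leaf_json: str, order_json: str) -> List[Tuple[str, str, List[str]]]:
    """Decode both cells and return (slot, branch, values) in petiole order."""
    leaves = json.loads(leaf_json)
    order = json.loads(order_json)
    return [
        (slot, leaves[slot]["ofBranch"], leaves[slot]["values"])
        for slot in order
        if slot in leaves  # skip slots listed in the order but absent from the leaves
    ]


for slot, branch, values in iter_leaves(raw_leaf_values, raw_petiole_order):
    print(f"{slot} ({branch}): {values}")
```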

Tidy Helpers

hoist()

python
from hydra_sdk import hoist

table = hoist(frame, ["hasGivenName", "hasGender"], keep_ids=True)
# → pa.Table with columns: instance_type, instance_id, hasGivenName, hasGender

Extracts the first value from each slot. Returns a pyarrow.Table (zero-copy where possible).
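
Conceptually, hoisting turns the nested leaf_values of each row into one flat column per requested slot. The pure-Python sketch below approximates that behaviour on rows shaped like the output of to_dicts(). hoist_rows is a hypothetical stand-in written for illustration, and filling absent slots with None is an assumption, since the text above does not say how hoist() treats missing slots.

```python
from typing import Any, Dict, List


def hoist_rows(rows: List[Dict[str, Any]], slots: List[str], keep_ids: bool = True) -> Dict[str, list]:
    """Build one flat column per slot, taking the first value of each slot."""
    columns: Dict[str, list] = {}
    if keep_ids:
        columns["instance_type"] = [row["instance_type"] for row in rows]
        columns["instance_id"] = [row["instance_id"] for row in rows]
    for slot in slots:
        column = []
        for row in rows:
            values = row["leaf_values"].get(slot, {}).get("values", [])
            column.append(values[0] if values else None)  # assumption: None when absent
        columns[slot] = column
    return columns


# Hand-written rows in the shape that to_dicts() is described as returning.
rows = [
    {
        "instance_type": "Person",
        "instance_id": "Pe-00-FQiher",
        "leaf_values": {
            "hasGivenName": {"ofBranch": "GivenName", "values": ["Jennifer"]},
            "hasGender": {"ofBranch": "Gender", "values": ["Female"]},
        },
    },
    {
        "instance_type": "Person",
        "instance_id": "Pe-00-a3bX9k",
        "leaf_values": {
            "hasGivenName": {"ofBranch": "GivenName", "values": ["Alice"]},
        },
    },
]
columns = hoist_rows(rows, ["hasGivenName", "hasGender"])
print(columns)
```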

to_chart_df()

python
from hydra_sdk import to_chart_df

df = to_chart_df(frame, ["hasGivenName", "hasGender"])
# → pandas.DataFrame ready for alt.Chart(df) or plt.bar(...)

Equivalent to hoist(...).to_pandas().

HydraGraph

In-memory graph with BFS/DFS traversal. Nodes are instances keyed as ModelName-InstanceId. Edges are derived from cross-reference leaf values in the ontology.

Building

python
graph = store.load_graph(store_id="Demo")
# <HydraGraph nodes=2490 edges=3812>

graph.traverse()

python
graph.traverse(
    start_key: str,
    *,
    algo:      str = "bfs",   # "bfs" or "dfs"
    max_depth: int = 3,       # 0–25
) -> TraverseResult

start_key formats (the root of the traversal):

| Format | Example | Seeds |
| --- | --- | --- |
| ModelName-InstanceId | "Person-Pe-00-abc" | Single instance at depth 0. |
| leaf::Branch::Value | "leaf::Gender::Female" | All instances with that value at depth 1. |
| type::ModelName | "type::Person" | All instances of that type at depth 0. |
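
The three formats can be told apart by their prefix. The sketch below classifies a start key and splits it into its parts. parse_start_key is a hypothetical helper, and it assumes that model names contain no hyphen, so the first hyphen separates the model name from the instance ID. The SDK's own parsing may differ.

```python
from typing import Dict


def parse_start_key(key: str) -> Dict[str, str]:
    """Classify a traversal start key as "leaf", "type", or "instance"."""
    if key.startswith("leaf::"):
        parts = key.split("::", 2)  # maxsplit keeps any "::" inside the value intact
        if len(parts) != 3 or not parts[1] or not parts[2]:
            raise ValueError(f"malformed leaf key: {key!r}")
        return {"kind": "leaf", "branch": parts[1], "value": parts[2]}
    if key.startswith("type::"):
        model = key[len("type::"):]
        if not model:
            raise ValueError(f"malformed type key: {key!r}")
        return {"kind": "type", "model": model}
    # Assumption: model names have no hyphen, so split on the first one only.
    model, sep, instance_id = key.partition("-")
    if not sep or not model or not instance_id:
        raise ValueError(f"malformed instance key: {key!r}")
    return {"kind": "instance", "model": model, "instance_id": instance_id}


print(parse_start_key("Person-Pe-00-abc"))
print(parse_start_key("leaf::Gender::Female"))
print(parse_start_key("type::Person"))
```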

TraverseResult

| Field | Type | Description |
| --- | --- | --- |
| algo | str | "bfs" or "dfs". |
| start_key | str | The root key used. |
| max_depth | int | Depth limit applied. |
| levels | dict[str, int] | Node key → depth from root. |
| order | list[str] | Keys in visit order. |
| edges | list[tuple[str, str]] | Undirected (source, target) pairs. |
| instances | list[dict] | Full instance data for each node. |
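
To make the relationship between levels, order, and edges concrete, here is a small depth-limited BFS over a toy adjacency map. It is an illustrative sketch only, not the SDK's traversal code: the node keys and the adjacency dict are made up, and recording only the edges along which a node was first discovered is an assumption.

```python
from collections import deque
from typing import Dict, List, Tuple


def bfs(adjacency: Dict[str, List[str]], start_key: str, max_depth: int = 3):
    """Depth-limited BFS returning (levels, order, edges) for the visited nodes."""
    levels = {start_key: 0}          # node key -> depth from the root
    order = [start_key]              # keys in visit order
    edges: List[Tuple[str, str]] = []
    queue = deque([start_key])
    while queue:
        node = queue.popleft()
        if levels[node] >= max_depth:
            continue                 # do not expand nodes at the depth limit
        for neighbour in adjacency.get(node, []):
            if neighbour not in levels:
                levels[neighbour] = levels[node] + 1
                order.append(neighbour)
                edges.append((node, neighbour))
                queue.append(neighbour)
    return levels, order, edges


# Made-up graph; keys follow the ModelName-InstanceId convention for readability.
toy = {
    "Person-Pe-00-aaaaaa": ["Case-Ca-00-bbbbbb", "Person-Pe-00-cccccc"],
    "Case-Ca-00-bbbbbb": ["Person-Pe-00-aaaaaa", "Person-Pe-00-dddddd"],
    "Person-Pe-00-cccccc": ["Person-Pe-00-aaaaaa"],
    "Person-Pe-00-dddddd": ["Case-Ca-00-bbbbbb"],
}
levels, order, edges = bfs(toy, "Person-Pe-00-aaaaaa", max_depth=1)
print(levels)
print(edges)
```

With max_depth=1 the node two hops away is never reached, which is why it is absent from all three outputs.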

Schema Introspection

| Method | Returns |
| --- | --- |
| store.model_names | All model names from the ontology (list[str]). |
| store.validate_write(object_key, petioles) | Validation report with errors for missing/invalid slots. |

For the full slot-level schema, call the hydra_schema agent tool with object_key="person_Instance".

HydraID

Structured 12-character identifier: Pe-NB-FQiher.

Pe-NB-FQiher
│  │  └── 6-char base62 suffix (timestamp + entropy, 35 bits)
│  └───── 2-char batch prefix (write source provenance)
└──────── 2-char type prefix (from ontology id_prefix)

| Batch ID | Source |
| --- | --- |
| 00 | Default / unspecified |
| NB | Notebook |
| AI | Agent |
| AP | API / backend |
| MG | Data migration |
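
The layout above is regular enough to check with a regular expression before passing a manual instance_id to write_instance(). The sketch below splits an ID into its three parts and also confirms the 35-bit figure for a 6-character base62 suffix. parse_hydra_id is a hypothetical helper, and allowing any ASCII letter or digit in the two prefixes is an assumption; the SDK's own validation may be stricter.

```python
import math
import re
from typing import Dict

# Assumption: both prefixes and the suffix draw from [A-Za-z0-9].
HYDRA_ID_RE = re.compile(
    r"(?P<type_prefix>[A-Za-z0-9]{2})-(?P<batch_id>[A-Za-z0-9]{2})-(?P<suffix>[A-Za-z0-9]{6})"
)


def parse_hydra_id(value: str) -> Dict[str, str]:
    """Split a HydraID into type prefix, batch ID, and suffix, or raise ValueError."""
    match = HYDRA_ID_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"not a HydraID: {value!r}")
    return match.groupdict()


parts = parse_hydra_id("Pe-NB-FQiher")
print(parts)

# Six base62 characters carry 6 * log2(62) bits, a little over 35.
suffix_bits = 6 * math.log2(62)
print(f"{suffix_bits:.1f} bits")
```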

Exceptions

| Exception | Raised when |
| --- | --- |
| HydraMissingField | Required petiole absent, or object_key unknown. |
| HydraTypeMismatch | Cardinality constraint violated. |
| HydraLockTimeout | Write lock not acquired within 5 s (another writer active). |
| HydraInvalidID | Manual instance_id not in HydraID format. |

All are raised from Rust before any disk mutation.

Shadow Petioles

Slots not in the ontology are stored with a shadow::ModelName branch prefix.

python
report = store.write_instance(
    "person_Instance",
    {
        "hasGivenName":     "Alice",      # canonical
        "hasFamilyName":    "Smith",      # canonical (required)
        "confidence_score": 0.92,         # shadow: not in ontology
    },
)
report["shadow_petioles"]   # ["confidence_score"]
# Retrieve later:
store.scan(of_branch="shadow::Person")

Agent Mode

When running under an agent context, store is a read-only proxy. Write attempts raise PermissionError. Agents write via the hydra_write tool, which goes through the policy engine for human approval.
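
The behaviour described here can be pictured as a wrapper that forwards reads and refuses writes. The sketch below is a guess at the shape of such a proxy, written only to show what calling code should expect. ReadOnlyProxy and FakeStore are hypothetical, and blocking only write_instance is an assumption; the real proxy may block more methods or be built differently.

```python
class ReadOnlyProxy:
    """Forward attribute access to the wrapped object but refuse listed write methods."""

    def __init__(self, target, blocked=("write_instance",)):
        self._target = target
        self._blocked = frozenset(blocked)

    def __getattr__(self, name):
        # Called only when normal lookup fails, so _target and _blocked resolve directly.
        if name in self._blocked:
            def denied(*args, **kwargs):
                raise PermissionError(f"{name}() is disabled in agent mode")
            return denied
        return getattr(self._target, name)


class FakeStore:
    """Stand-in for a store so the sketch runs without the SDK."""

    def scan(self, **filters):
        return [{"instance_id": "Pe-00-FQiher"}]

    def write_instance(self, object_key, petioles):
        return {"success": True}


proxy = ReadOnlyProxy(FakeStore())
rows = proxy.scan(object_key="Person")  # reads pass through unchanged

try:
    proxy.write_instance("person_Instance", {"hasGivenName": "Alice"})
    write_blocked = False
except PermissionError as exc:
    write_blocked = True
    print(f"Blocked: {exc}")
```

Code that may run in both modes can catch PermissionError around writes, as above, rather than checking for agent mode up front.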

Recipes

Scan and inspect

python
frame = store.scan(object_key="Person")
for row in frame.to_dicts()[:3]:
    name = row["leaf_values"].get("hasGivenName", {}).get("values", [])
    print(f"{row['instance_id']}: {name}")

Flatten leaves to a DataFrame

python
from hydra_sdk import to_chart_df

df = to_chart_df(
    store.scan(object_key="Person"),
    ["hasGivenName", "hasFamilyName", "hasGender"],
)

Arrow-level compute (zero-copy)

python
import pyarrow.compute as pc

frame = store.scan(object_key="Person")
mask = pc.starts_with(frame.table["timestamp"], "2024")
recent = frame.table.filter(mask)

Write with upsert (version bump)

python
store.write_instance("person_Instance",
    {"hasGivenName": "Alice", "hasFamilyName": "Smith"},
    instance_id="Pe-NB-abc123", version=1)

store.write_instance("person_Instance",
    {"hasGivenName": "Alice", "hasFamilyName": "Jones"},
    instance_id="Pe-NB-abc123", version=2)   # higher version wins
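
The "higher version wins" rule can be pictured as keeping, for each instance_id, only the row with the largest version. The sketch below applies that rule to plain dicts. latest_versions is a hypothetical helper and the rows are hand-written, so it illustrates the rule rather than how the store resolves versions internally.

```python
from typing import Any, Dict, List


def latest_versions(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep one row per instance_id: the one with the highest version."""
    latest: Dict[str, Dict[str, Any]] = {}
    for row in rows:
        current = latest.get(row["instance_id"])
        if current is None or row["version"] > current["version"]:
            latest[row["instance_id"]] = row
    return list(latest.values())


# Hand-written rows mirroring the two writes above.
history = [
    {"instance_id": "Pe-NB-abc123", "version": 1, "family_name": "Smith"},
    {"instance_id": "Pe-NB-abc123", "version": 2, "family_name": "Jones"},
]
winners = latest_versions(history)
print(winners)
```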

Dry-run validation before batch write

python
records = [
    {"hasGivenName": "Bob", "hasFamilyName": "Jones"},
    {"hasGivenName": "Carol"},
]

for rec in records:
    report = store.validate_write("person_Instance", rec)
    if report["success"]:
        store.write_instance("person_Instance", rec)
    else:
        print(f"Invalid: {report['errors']}")

Graph traversal

python
graph = store.load_graph(store_id="Demo")

result = graph.traverse("Person-Pe-00-abc", algo="bfs", max_depth=3)
for key in result.order[:5]:
    print(f"  L{result.levels[key]}: {key}")

# From a leaf value
result = graph.traverse("leaf::Gender::Female", max_depth=2)

# From a type root
result = graph.traverse("type::Person", algo="dfs", max_depth=1)

Error handling

python
from hydra_sdk import HydraMissingField, HydraTypeMismatch, HydraLockTimeout

try:
    store.write_instance("person_Instance", {"hasGivenName": "Alice"})
except HydraMissingField as e:
    print(f"Missing: {e}")
except HydraTypeMismatch as e:
    print(f"Cardinality: {e}")
except HydraLockTimeout:
    print("Store busy — retry.")
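
The "retry" advice for a busy store can be wrapped in a small helper with exponential backoff. The sketch below is generic and runs without the SDK: retry_on is a hypothetical helper, FakeLockTimeout stands in for HydraLockTimeout, and the attempt count and delays are arbitrary choices rather than SDK recommendations.

```python
import time
from typing import Callable, Type, TypeVar

T = TypeVar("T")


def retry_on(
    exc_type: Type[BaseException],
    action: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.1,
) -> T:
    """Call action(), retrying on exc_type with exponentially growing pauses."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exc_type:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")


class FakeLockTimeout(Exception):
    """Stand-in for HydraLockTimeout so the sketch runs without the SDK."""


calls = {"n": 0}


def flaky_write():
    # Fails twice, then succeeds, to exercise the retry path.
    calls["n"] += 1
    if calls["n"] < 3:
        raise FakeLockTimeout("store busy")
    return {"success": True}


result = retry_on(FakeLockTimeout, flaky_write, attempts=5, base_delay=0.0)
print(result, "after", calls["n"], "calls")
```

With the real SDK the same helper would presumably be used as retry_on(HydraLockTimeout, lambda: store.write_instance(...)). Since each failed attempt has already waited up to 5 seconds for the lock, a small attempt count is usually enough.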