2026-03-25: Ray Control Plane Bootstrap
Repro pin: merlin@2903e28b · iree_bar@ddf4685ae1
Status: Active
This log tracks the first Merlin-owned Ray control-plane slice.
The goal of this workstream is not "use Ray somewhere". The goal is:
- keep TargetGen as the planning source of truth,
- make Ray the run-submission and execution control plane,
- make future MCP and agent interoperability sit on top of that.
What landed first
The first bootstrap slice adds:
- a new merlin ray CLI surface,
- file-backed cluster/run/artifact/resource metadata under build/generated/ray/,
- targetgen execute --engine ray,
- a new architecture note for the Ray control plane.
Why start this way
Merlin already had a useful local TargetGen executor. Replacing it outright with a distributed system would have created two moving targets at once.
The bootstrap decision was:
- keep the local executor,
- submit that executor as a Ray Job,
- let Merlin own the run records even before the full Serve/MCP layer exists.
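As a rough illustration of "submit that executor as a Ray Job", the sketch below wraps a local executor command and hands it to Ray's job submission SDK. `JobSubmissionClient` and `submit_job` are part of Ray's public API; everything else is an assumption made for illustration: the helper names, the exact executor command line, the `merlin_run_id` metadata key, and the dashboard address (Ray's default for a local cluster).

```python
import shlex
from typing import Optional


def build_entrypoint(capability: str, overlay: Optional[str] = None) -> str:
    """Build the shell command that runs the existing local executor.

    The command line here is a guess at the local invocation, not Merlin's
    confirmed interface.
    """
    argv = ["python", "tools/merlin.py", "targetgen", "execute", capability]
    if overlay:
        argv += ["--overlay", overlay]
    return shlex.join(argv)


def submit_as_ray_job(
    entrypoint: str,
    run_id: str,
    address: str = "http://127.0.0.1:8265",  # assumed local dashboard address
    working_dir: str = ".",
) -> str:
    """Submit the entrypoint as a Ray Job and return Ray's submission id."""
    # Imported lazily so this module still loads when Ray is not installed.
    from ray.job_submission import JobSubmissionClient

    client = JobSubmissionClient(address)
    return client.submit_job(
        entrypoint=entrypoint,
        # Ray uploads this directory so the job can find tools/merlin.py.
        runtime_env={"working_dir": working_dir},
        # Hypothetical key tying the Ray job back to the Merlin run record.
        metadata={"merlin_run_id": run_id},
    )
```

Keeping the entrypoint a plain command string means the local executor stays the single execution contract; Ray only decides where and when that command runs.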
That gives us a clean migration path:
- one planner,
- one execution contract,
- one future distributed control plane.
Local bootstrap flow
When Ray is installed:
conda run -n merlin-dev uv run tools/merlin.py ray cluster start-local
conda run -n merlin-dev uv run tools/merlin.py targetgen execute \
target_specs/examples/nvidia_vulkan_ada/capability.yaml \
--overlay target_specs/examples/nvidia_vulkan_ada/overlays/desktop_local.yaml \
--engine ray
If Ray is not installed yet, the run is still materialized as a Merlin run record with a blocked status and an actionable message. That behavior is intentional because it keeps the interface stable during bring-up.
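The blocked-run behavior could look roughly like the sketch below: detect whether Ray is importable, then write the run files either way so callers always get a record back. The field names (`status`, `message`, `engine`, `created_at`), the `submitted` status value, and the run-id scheme are assumptions for illustration, not Merlin's actual schema.

```python
import importlib.util
import json
import uuid
from datetime import datetime, timezone
from pathlib import Path


def ray_installed() -> bool:
    """Return True when the ray package is importable in this environment."""
    return importlib.util.find_spec("ray") is not None


def materialize_run(root: Path, request: dict, have_ray: bool) -> dict:
    """Write run_request.json and run_record.json, then return the record."""
    run_id = uuid.uuid4().hex[:12]  # hypothetical id scheme
    run_dir = root / "runs" / run_id
    run_dir.mkdir(parents=True)

    record = {
        "run_id": run_id,
        "engine": "ray",
        "created_at": datetime.now(timezone.utc).isoformat(),
        "status": "submitted" if have_ray else "blocked",
    }
    if not have_ray:
        # The record carries an actionable message instead of raising.
        record["message"] = (
            "Ray is not installed in this environment; "
            "install it and re-run with --engine ray."
        )

    (run_dir / "run_request.json").write_text(json.dumps(request, indent=2))
    (run_dir / "run_record.json").write_text(json.dumps(record, indent=2))
    return record
```

A caller would pass `have_ray=ray_installed()`; taking it as a parameter keeps the blocked path easy to exercise on machines that do have Ray.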
Current run metadata
The current run root is build/generated/ray/.
Current records:
- cluster/bootstrap.json
- runs/<run_id>/run_request.json
- runs/<run_id>/run_record.json
- runs/<run_id>/artifacts.json
- resources/leases/<lease_id>.json
These are meant to be easy to inspect while the Ray Serve API and MCP gateway are still under construction.
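Because the records are plain JSON files, a few lines of Python are enough to summarize them while the APIs are pending. This is a minimal sketch that assumes each run_record.json holds a top-level `status` field; that field name and the `list_runs` helper are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, List

RUN_ROOT = Path("build/generated/ray")


def list_runs(root: Path = RUN_ROOT) -> List[Dict[str, str]]:
    """Return run id and status for every runs/<run_id>/run_record.json."""
    summaries = []
    for path in sorted((root / "runs").glob("*/run_record.json")):
        data = json.loads(path.read_text())
        summaries.append(
            {
                "run_id": path.parent.name,
                # "status" is an assumed field name; fall back if absent.
                "status": str(data.get("status", "unknown")),
            }
        )
    return summaries


if __name__ == "__main__":
    for run in list_runs():
        print(run["run_id"], run["status"])
```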
Test commands
# Start a local Ray cluster and write the bootstrap record
./merlin ray cluster start-local
# Submit a TargetGen run through the Ray engine
./merlin targetgen execute \
target_specs/examples/nvidia_vulkan_ada/capability.yaml \
--overlay target_specs/examples/nvidia_vulkan_ada/overlays/desktop_local.yaml \
--engine ray
# Verify the run record landed
ls build/generated/ray/runs/
cat build/generated/ray/runs/<run_id>/run_record.json
If Ray itself is not installed, the second command still materializes a run
record with status blocked and an actionable message. That is intentional
during bring-up.
Immediate follow-ups
- Add Ray Serve apps for run, artifact, and review APIs.
- Add the MCP gateway on top of those APIs.
- Move board, FireSim, and simulator coordination behind broker actors.
- Add real Ray-backed status and log handling once the runtime is present in the default Merlin environment.