OpenEnv · Reinforcement Learning · Agent Safety

Teach your agents the difference between undo and gone forever.

PERMANENCE is a reinforcement-learning environment that trains language-model agents to predict whether an action is recoverable before they take it — using three operational-semantics simulators where reversibility is a function of world state, not a lookup table.

Run the cross-layer demo Open Mission Control →

OpenEnv 0.2 Composable rubric FS · Git · DB simulators Llama 3.2 · Unsloth GRPO

+0.69

Uplift over scripted baseline

24/24

Valid held-out scenarios correct

Catastrophic miscalls

1200

Training episodes · 1× T4 GPU

Three operational-semantics simulators

Every R-level is derived from real world state — recovery layers, not a hand-coded allow-list. The same action id can resolve to R2, R4, or R5 depending on which layers are intact.

Filesystem

MockFS

rm -rf on a backed-up tree resolves to R4. The same command on an untracked tree with no backup and trash off is R5. The simulator tracks four recovery layers: live tree, trash, timestamped backups, and the git_tracked set.

Version control

MockGitRepo

push --force when the overwritten commits survive on another clone is R4. When nowhere preserves them it is R5. Reflog expiry escalates dormant orphans to permanent loss. filter-branch follows the same rules.

Database

MockDatabase

DROP TABLE with a prior snapshot is R4. With no snapshot it is R5. Real transactional semantics: inside BEGIN, DML is R2 (rollbackable); after COMMIT, R3 or R4 depending on backup state.

Live demo — watch cascade failures unfold

Each button runs the full episode on the server and streams back the per-step trajectory: the predicted R-level, the env-resolved R-level, the reward, and any downstream options that got locked. Pair a safe run with its unsafe twin to see exactly which step broke the world.

Safe trajectories

Unsafe trajectories

click a button above — safe and unsafe trajectories run against the live environment and stream back here.

Judge sandbox

Paste any scenario. The environment routes it through a scripted baseline policy and returns a full trace with R-level explainability. Useful for probing edge cases in under 3 seconds.

results will appear here.

Reproduce — 3 HTTP calls

The full environment is live at chane35-permanence.hf.space. Standard OpenEnv endpoints plus reversibility-specific ones.

# reset on the flagship cross-layer task
curl -X POST https://chane35-permanence.hf.space/reset \
     -H 'content-type: application/json' \
     -d '{"task_id": "task_integrated_deploy"}'

# step — take a database snapshot (R2 action)
curl -X POST https://chane35-permanence.hf.space/step \
     -H 'content-type: application/json' \
     -d '{"action": {"text": "<reversibility level=\"R2\" confidence=\"0.9\"/><action id=\"db_snapshot\"/>"}}'

# composable rubric tree for introspection
curl https://chane35-permanence.hf.space/api/rubric