OpenEnv documentation

Concepts

You are viewing main version, which requires installation from source. If you'd like regular pip install, checkout the latest stable version (v0.4.0).
Hugging Face's logo
Join the Hugging Face community

and get access to the augmented documentation experience

to get started

Concepts

OpenEnv follows a client-server model inspired by Gymnasium’s simple API. Agents send structured actions to isolated environments and receive observations, rewards, and episode status in return.

+-----------------+     HTTP/WebSocket     +-----------------+
|   Your Agent    | <--------------------> |   Environment   |
|   (Client)      |    step/reset/state    |    (Server)     |
+-----------------+                        +-----------------+

Key Abstractions

Environment

An Environment is an isolated execution context where your agent can take actions and receive observations. Environments usually run as servers and expose a standard API.

Action

An Action is a structured command that your agent sends to the environment. Each environment defines its own action schema.

from coding_env import CodeAction

action = CodeAction(code="print('Hello!')")

Observation

An Observation is the response from the environment after taking an action. It contains the current state visible to your agent.

result = client.step(action)
print(result.observation.stdout)  # "Hello!"

StepResult

A StepResult bundles together everything returned from a step:

  • observation: what the agent can see
  • reward: numeric reward signal for training
  • done: whether the episode has ended
  • metadata: additional metadata returned alongside the observation

Rubric

A Rubric is a composable unit of reward computation that lives inside the environment. Rubrics can be combined with WeightedSum, Gate, and Sequential; use LLM judges for subjective criteria; and handle delayed rewards with TrajectoryRubric. See the Rubrics tutorial for the full API.

Client

A Client is how you connect to and interact with an environment. OpenEnv provides both async and sync clients.

from openenv import AutoEnv

env = AutoEnv.from_env("coding")

async with env as client:
    result = await client.reset()
    result = await client.step(action)

with env.sync() as client:
    result = client.reset()
    result = client.step(action)

The Step Loop

with env.sync() as client:
    result = client.reset()

    while not result.terminated:
        obs = result.observation
        action = decide_action(obs)
        result = client.step(action)
        learn(result.reward)

Connection Methods

Method Use Case Example
HTTP URL Remote servers, Hugging Face Spaces EnvClient(base_url="https://...")
Docker Local development EnvClient.from_docker_image("env:latest")
Cloud / custom runtime Run the server on a cloud sandbox EnvClient.from_docker_image("env:latest", provider=DaytonaProvider())
Auto-discovery Installed packages or known environments AutoEnv.from_env("echo")

See the Runtime Providers guide for the available providers and how to pick one.

Environment Anatomy

Every OpenEnv environment consists of:

my_env/
β”œβ”€β”€ openenv.yaml          # Manifest file
β”œβ”€β”€ my_env/
β”‚   β”œβ”€β”€ __init__.py
β”‚   β”œβ”€β”€ client.py         # Client classes
β”‚   β”œβ”€β”€ server.py         # Server/Environment
β”‚   └── models.py         # Pydantic models
β”œβ”€β”€ Dockerfile            # Container definition
β”œβ”€β”€ pyproject.toml        # Package metadata
└── README.md             # Documentation

The Manifest (openenv.yaml)

name: my_env
version: 0.1.0
description: My custom environment

client:
  class_name: MyEnvClient
  module: my_env.client

action:
  class_name: MyAction
  module: my_env.models

observation:
  class_name: MyObservation
  module: my_env.models

default_image: my-env:latest
spec_version: 1

Models (Pydantic)

Custom Action, Observation, and State types subclass the base classes from openenv.core.env_server.types β€” not pydantic.BaseModel directly. The base Observation already carries done and reward fields, which step() populates; Action and State add metadata plumbing used by the server.

from openenv.core.env_server.types import Action, Observation, State


class MyAction(Action):
    command: str
    args: list[str] = []


class MyObservation(Observation):
    output: str
    success: bool


class MyState(State):
    history: list[str] = []

Environment Class

Environments subclass the abstract Environment[ActT, ObsT, StateT] base and implement reset, step, and the state property. Reward and termination are carried on the returned observation β€” they are not a tuple return value.

from openenv.core.env_server.interfaces import Environment


class MyEnvironment(Environment[MyAction, MyObservation, MyState]):
    def reset(self, seed=None, episode_id=None, **kwargs) -> MyObservation:
        ...

    def step(self, action: MyAction, timeout_s=None, **kwargs) -> MyObservation:
        ...

    @property
    def state(self) -> MyState:
        ...

Server (FastAPI)

Use create_app from openenv.core.env_server to wrap the environment as a FastAPI application. Pass the environment class (used as a factory so each WebSocket session gets its own instance) along with the action and observation types:

from openenv.core.env_server import create_app

app = create_app(
    MyEnvironment,
    MyAction,
    MyObservation,
    env_name="my_env",
)

This is what the environment’s server/app.py entry point typically does β€” see envs/echo_env/server/app.py for a minimal real example.

Rewards via the Rubric

Rewards are computed inside the environment, not by external code. The base Environment accepts an optional rubric on __init__ β€” pass it to super().__init__(rubric=...), call self._reset_rubric() from reset, and self._apply_rubric(action, observation) from step (or _apply_rubric_async from step_async). The Rubrics tutorial covers the composable API end-to-end.

Next Steps

Update on GitHub