Run Lifecycle

This page covers the part of the SDK that every Runicorn user should understand well: creating a run, logging metrics, recording summaries, and finishing cleanly.


rn.init(...)

Create a new run:

Signature:

def init(
    path: str | None = None,
    storage: str | None = None,
    run_id: str | None = None,
    alias: str | None = None,
    capture_env: bool = False,
    snapshot_code: bool = False,
    workspace_root: str | None = None,
    snapshot_format: str = "zip",
    force_snapshot: bool = False,
    capture_console: bool = False,
    tqdm_mode: str = "smart",
) -> Run | NoOpRun

import runicorn as rn

run = rn.init(
    path="cv/resnet50/baseline",
    storage="E:\\RunicornData",
    alias="trial-01",
    capture_env=False,
    snapshot_code=False,
    workspace_root=None,
    snapshot_format="zip",
    force_snapshot=False,
    capture_console=False,
    tqdm_mode="smart",
)
| Parameter | Type | Optional | Default | Recommended |
| --- | --- | --- | --- | --- |
| path | str \| None | Yes | None | Yes |
| storage | str \| None | Yes | None | Situational |
| run_id | str \| None | Yes | auto-generated | No |
| alias | str \| None | Yes | None | Usually |
| capture_env | bool | Yes | False | Situational |
| snapshot_code | bool | Yes | False | Usually |
| workspace_root | str \| None | Yes | None | Situational |
| snapshot_format | str | Yes | "zip" | No |
| force_snapshot | bool | Yes | False | Rarely |
| capture_console | bool | Yes | False | Usually |
| tqdm_mode | str | Yes | "smart" | Yes when using console capture |

Parameter notes

  • path: Hierarchical run path such as cv/resnet50/baseline. None becomes "default".
  • storage: Override the resolved storage root for this run only.
  • run_id: Explicit run ID. Mostly useful for controlled imports, tests, or special recovery flows.
  • alias: Short human-friendly label shown in the UI.
  • capture_env: Capture environment information such as runtime and system context.
  • snapshot_code: Archive the workspace at run start so the run records the code state it used.
  • workspace_root: Explicit root directory for code snapshotting and output scanning.
  • snapshot_format: Snapshot archive format. Currently only "zip" is supported.
  • force_snapshot: Override snapshot size or file-count safety checks for unusually large workspaces.
  • capture_console: Capture stdout/stderr into logs.txt so logs appear in the viewer even if you use print().
  • tqdm_mode: Controls tqdm handling: smart, all, or none.

Path rules

path is normalized and validated:

  • None becomes "default"
  • "/" becomes the root path
  • backslashes are normalized to /
  • invalid characters and traversal-like values are rejected

Examples:

rn.init(path="cv/detection/yolo")
rn.init(path="research/ablation/seed-42")
rn.init(path="/")  # root-level runs

run.log(...)

Log scalar metrics during training:

Signature:

def log(
    data: dict | None = None,
    *,
    step: int | None = None,
    stage: str | None = None,
    **kwargs,
) -> None

| Parameter | Type | Optional | Default | Recommended |
| --- | --- | --- | --- | --- |
| data | dict \| None | Yes | None | Usually |
| step | int \| None | Yes | auto-increment | Situational |
| stage | str \| None | Yes | None | Situational |
| **kwargs | extra scalar fields | Yes | none | Situational |

Parameter notes

  • data: Main metric payload as a dict, such as {"loss": 0.42, "acc": 0.88}.
  • step: Explicit step value. If omitted, Runicorn increments the internal global_step.
  • stage: Optional stage label. A common pattern is epoch-style values such as "epoch_1", "epoch_2", and so on.
  • **kwargs: Additional metrics. Useful when you prefer run.log(loss=..., acc=...).
run.log({"loss": 0.42, "acc": 0.88}, step=1, stage="epoch_1")
run.log(val_loss=0.39, val_acc=0.90, step=2, stage="epoch_2")

What run.log() does

  • accepts either a dict, keyword metrics, or both
  • records a global_step
  • records a time field automatically
  • optionally records stage

Step behavior

If you do not pass step, Runicorn auto-increments an internal global step counter:

run.log({"loss": 0.5})  # global_step = 1
run.log({"loss": 0.4})  # global_step = 2

If you do pass step, Runicorn uses that explicit value:

run.log({"loss": 0.5}, step=100)

Stage behavior

Use stage when you want the UI to separate metric records into coarse buckets. In many training scripts, people simply use epoch-style labels:

run.log({"loss": 0.8}, step=50, stage="epoch_1")
run.log({"loss": 0.4}, step=100, stage="epoch_2")
run.log({"val_loss": 0.35}, step=150, stage="epoch_3")

How to handle train vs eval without step misalignment

In practice, many users do not rely on stage alone to distinguish train and eval. A clearer pattern is:

  • use different metric names such as train_loss, train_acc, val_loss, val_acc
  • keep train and eval aligned to the same training step or epoch boundary
  • avoid bumping step by one just because you entered evaluation

For example, if you evaluate once per epoch or every 10,000 steps, it is usually better to treat that eval result as belonging to the same step boundary you just finished, instead of logging it at step + 1.

One practical pattern is to keep the latest eval value on the training timeline:

last_val_acc = 0.0

for step in range(1, total_steps + 1):
    train_loss, train_acc = train_step()

    if step % 10000 == 0:
        last_val_acc = evaluate()

    run.log({
        "train_loss": train_loss,
        "train_acc": train_acc,
        "val_acc": last_val_acc,
    }, step=step, stage=f"epoch_{current_epoch}")

Why people do this:

  • train and eval metrics stay on the same step axis
  • the chart remains aligned at epoch or evaluation boundaries
  • you avoid the visual offset caused by logging eval at step + 1

If your team prefers it, using an initial placeholder such as val_acc = 0.0 until the first eval finishes is a valid pragmatic choice for alignment-focused charts.


run.set_primary_metric(...)

Tell Runicorn which metric should be treated as the main score:

Signature:

def set_primary_metric(metric_name: str, mode: str = "max") -> None

| Parameter | Type | Optional | Default | Recommended |
| --- | --- | --- | --- | --- |
| metric_name | str | No | none | Yes |
| mode | str | Yes | "max" | Yes |

Parameter notes

  • metric_name: The metric to surface in the experiments table and compare flows.
  • mode: Use "max" when higher is better, "min" when lower is better.
run.set_primary_metric("val_acc", mode="max")

Or for loss:

run.set_primary_metric("val_loss", mode="min")

The best value is tracked as you log metrics and written into the summary on finish.
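The best-value tracking can be sketched as follows. This is an illustrative sketch of the idea, not the SDK's internal code:

class BestMetricTracker:
    """Sketch of primary-metric tracking (illustrative only)."""
    def __init__(self, metric_name, mode="max"):
        self.metric_name = metric_name
        self.mode = mode
        self.best = None

    def update(self, metrics):
        # called once per logged metric payload
        value = metrics.get(self.metric_name)
        if value is None:
            return
        if self.best is None:
            self.best = value
        elif self.mode == "max":
            self.best = max(self.best, value)
        else:
            self.best = min(self.best, value)

With mode="max", logging val_acc values 0.8 then 0.9 leaves 0.9 as the best value written into the summary on finish.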


run.log_text(...)

Write text lines into the run log:

Signature:

def log_text(text: str) -> None

| Parameter | Type | Optional | Default | Recommended |
| --- | --- | --- | --- | --- |
| text | str | No | none | Situational |

Parameter notes

  • text: Free-form log text. Multi-line strings are supported and timestamps are prepended to non-empty lines.
run.log_text("starting epoch 3")
run.log_text("checkpoint saved")

run.log_text(...) writes directly to the run's logs.txt.

Important behavior:

  • it does write to logs.txt
  • it does not automatically print to your terminal
  • if you want both terminal output and Runicorn logging, call both print(...) and run.log_text(...), or enable capture_console=True

Runicorn prepends timestamps to non-empty lines before writing them.
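The timestamp behavior can be sketched like this. The timestamp format here is an assumption for illustration; the SDK may use a different one:

from datetime import datetime

def format_log_text(text, now=None):
    """Sketch of timestamp prepending for log_text (illustrative only)."""
    now = now or datetime.now()
    stamp = now.strftime("%Y-%m-%d %H:%M:%S")
    lines = []
    for line in text.splitlines():
        # timestamps are prepended to non-empty lines only
        lines.append(f"[{stamp}] {line}" if line.strip() else line)
    return "\n".join(lines)

A multi-line string keeps its blank lines untouched while each non-empty line gains a timestamp prefix.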

run.log_text(...) vs print(...) when capture_console=True

| Call | Terminal output | logs.txt | Typical use |
| --- | --- | --- | --- |
| print(...) | Yes | Yes | Console-first progress messages that you also want captured |
| run.log_text(...) | No | Yes | Explicit Runicorn log entries even when you do not want terminal output |

This is helpful when:

  • you want explicit, Runicorn-owned log entries
  • you want short checkpoint/progress notes
  • you want messages to appear in the logs tab even without console capture

Where to view the recorded text

In the Web UI, open the run detail page and switch to the Logs tab. That tab reads the run log stream from logs.txt, so run.log_text(...) entries appear there alongside other captured log output.


run.summary(...)

Store run-level summary data.

Use it for the final result or current overall state of a run, not for per-step history.

Signature:

def summary(update: dict) -> None

| Parameter | Type | Optional | Default | Recommended |
| --- | --- | --- | --- | --- |
| update | dict | No | none | Yes |

Parameter notes

  • update: Summary fields to merge into summary.json, such as final scores, notes, selected checkpoints, or any metadata you want surfaced at the run level.

Typical example:

run.summary({
    "final_acc": 0.91,
    "best_epoch": 8,
    "notes": "best seed so far",
})

Most users call run.summary(...) near the end of training, after an important evaluation, or whenever they want to update a run-level field such as best_ckpt, notes, or final_acc.

Summary updates merge into the existing summary file, so multiple calls are fine:

run.summary({"final_acc": 0.91})
run.summary({"notes": "best seed so far"})

Use run.log(...) for timeline data and run.summary(...) for run-level conclusions.

In the Web UI, summary fields appear in the run detail page as run-level metadata.
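The merge semantics can be sketched as a read-update-write on the summary file. This is an illustrative sketch of the documented behavior, not the SDK's actual code:

import json
from pathlib import Path

def update_summary(summary_path, update):
    """Sketch of merge-style summary updates (illustrative only)."""
    path = Path(summary_path)
    existing = json.loads(path.read_text()) if path.exists() else {}
    existing.update(update)  # later calls merge into the file, not replace it
    path.write_text(json.dumps(existing, indent=2))
    return existing

Calling it twice with {"final_acc": 0.91} and {"notes": "best seed so far"} leaves both fields in the file.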


run.finish(...)

Finish the run explicitly:

Signature:

def finish(status: str = "finished") -> None

| Parameter | Type | Optional | Default | Recommended |
| --- | --- | --- | --- | --- |
| status | str | Yes | "finished" | Yes |

Parameter notes

  • status: Final run status. In practice, use finished, failed, or interrupted.

# Most common
run.finish()

# Explicit status
run.finish(status="finished")

What happens if you omit the parameter

If you call run.finish() without arguments, Runicorn uses the default status:

  • finished

So in normal scripts, omitting the parameter is completely fine.

When status is chosen automatically

There are two important automatic cases:

  • if you use with rn.init(...) as run:, Runicorn marks the run as finished on normal exit and failed if the block raises an exception
  • if a run is still marked as running but the process later disappears unexpectedly, the viewer's background status checker can mark it as failed

When status is not inferred automatically

If you call run.finish(...) yourself, Runicorn does not inspect your training state and guess whether the run was successful. It uses:

  • the explicit status you passed, or
  • finished if you passed nothing

Recommended final statuses when calling it yourself:

  • finished
  • failed
  • interrupted

In 0.7.0, finish behavior is safer than in older versions:

  • post-finish writes are blocked
  • output watchers are stopped
  • best-metric summary fields are flushed
  • status is normalized to valid stored values
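The status normalization step can be sketched as follows. This is one plausible sketch: the set of valid stored values and the fallback for unknown input are assumptions here, not confirmed by the source:

VALID_STATUSES = {"finished", "failed", "interrupted"}  # assumed valid set

def normalize_status(status):
    """Sketch of status normalization (illustrative; the SDK's rules may differ)."""
    status = (status or "finished").strip().lower()
    # assumption: unknown values fall back to the default status
    return status if status in VALID_STATUSES else "finished"

Under these assumptions, "FAILED" is stored as "failed" and a missing status becomes "finished".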

Context manager style

This is a common and safer pattern when you want automatic cleanup:

import runicorn as rn

with rn.init(path="training/safe") as run:
    run.log({"loss": 0.1}, step=1)

Benefits:

  • success exits become finished
  • exceptions become failed
  • you are less likely to forget cleanup
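The context-manager semantics boil down to standard Python __exit__ handling. This sketch is illustrative, not the SDK's actual class:

class RunContext:
    """Sketch of the context-manager semantics (illustrative only)."""
    def __init__(self):
        self.status = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # normal exit -> finished; an exception in the block -> failed
        self.status = "failed" if exc_type is not None else "finished"
        return False  # do not swallow the exception

Because __exit__ returns False, the original exception still propagates after the run is marked failed.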

Disable tracking without changing code shape

import runicorn as rn

rn.set_enabled(False)
run = rn.init(path="debug")
run.log({"loss": 0.1})   # no-op
run.finish()             # no-op

This returns a no-op run object, which is useful for inference scripts, debugging, or temporarily disabling tracking in shared code.
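Conceptually, a no-op run is just an object whose lifecycle methods accept any arguments and do nothing. A minimal sketch (illustrative; the SDK's NoOpRun may expose more methods):

class NoOpRun:
    """Sketch of a no-op run object (illustrative only)."""
    def log(self, *args, **kwargs):
        pass

    def log_text(self, *args, **kwargs):
        pass

    def summary(self, *args, **kwargs):
        pass

    def finish(self, *args, **kwargs):
        pass

This shape is why calling code does not need any if-enabled branches: the same run.log(...) and run.finish() calls work whether tracking is on or off.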


Full lifecycle example

import math
import torch
import torch.nn as nn
import torch.optim as optim
from PIL import Image, ImageDraw
import runicorn as rn

run = rn.init(
    path="cv/resnet50/baseline",
    alias="seed-42",
    capture_console=True,
)

model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)
optimizer = optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

run.set_primary_metric("val_acc", mode="max")
run.log_config(extra={"optimizer": "adam", "lr": 1e-3, "epochs": 3})

latest_val_acc = 0.0

for epoch in range(1, 4):
    model.train()
    epoch_acc_sum = 0.0
    epoch_steps = 0

    dataloader = [
        (torch.randn(32, 128), torch.randint(0, 10, (32,)))
        for _ in range(5)
    ]

    for batch_idx, (inputs, targets) in enumerate(dataloader, start=1):
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()

        curve_step = (epoch - 1) * len(dataloader) + batch_idx
        simulated_train_loss = 1.6 * math.exp(-0.18 * curve_step) + 0.04 * torch.rand(1).item()
        batch_acc = min(0.95, 0.35 + 0.08 * math.log1p(curve_step) + 0.02 * torch.rand(1).item())
        epoch_acc_sum += batch_acc
        epoch_steps += 1

        print(f"train_loss: {simulated_train_loss:.4f}")

        run.log({
            "train_loss": simulated_train_loss,
            "train_acc": batch_acc,
            "val_acc": latest_val_acc,
        }, stage=f"epoch_{epoch}")

    train_acc = epoch_acc_sum / epoch_steps
    latest_val_acc = 0.70 + 0.05 * epoch

    preview = Image.new("RGB", (320, 160), "white")
    draw = ImageDraw.Draw(preview)
    draw.rounded_rectangle((10, 10, 310, 150), radius=12, outline="navy", width=3)
    draw.text((24, 28), f"Epoch {epoch}", fill="black")
    draw.text((24, 68), f"train_acc={train_acc:.3f}", fill="green")
    draw.text((24, 102), f"val_acc={latest_val_acc:.3f}", fill="purple")
    run.log_image(
        "epoch_preview",
        preview,
        caption=f"Metrics snapshot for epoch {epoch}",
    )

run.summary({
    "epochs": 3,
    "final_train_acc": train_acc,
    "final_val_acc": latest_val_acc,
    "final_note": "baseline complete",
})

run.finish()

In this pattern, step is auto-incremented by Runicorn, and val_acc is carried as the latest known evaluation value. That keeps train and validation metrics on one aligned step axis instead of introducing a separate extra step just for evaluation.


Next steps