Run Lifecycle
This page covers the part of the SDK that every Runicorn user should understand well: creating a run, logging metrics, recording summaries, and finishing cleanly.
rn.init(...)
Create a new run:
Signature:
```python
def init(
    path: str | None = None,
    storage: str | None = None,
    run_id: str | None = None,
    alias: str | None = None,
    capture_env: bool = False,
    snapshot_code: bool = False,
    workspace_root: str | None = None,
    snapshot_format: str = "zip",
    force_snapshot: bool = False,
    capture_console: bool = False,
    tqdm_mode: str = "smart",
) -> Run | NoOpRun
```
```python
import runicorn as rn

run = rn.init(
    path="cv/resnet50/baseline",
    storage="E:\\RunicornData",
    alias="trial-01",
    capture_env=False,
    snapshot_code=False,
    workspace_root=None,
    snapshot_format="zip",
    force_snapshot=False,
    capture_console=False,
    tqdm_mode="smart",
)
```
| Parameter | Type | Optional | Default | Recommended |
|---|---|---|---|---|
| path | str \| None | Yes | None | Yes |
| storage | str \| None | Yes | None | Situational |
| run_id | str \| None | Yes | auto-generated | No |
| alias | str \| None | Yes | None | Usually |
| capture_env | bool | Yes | False | Situational |
| snapshot_code | bool | Yes | False | Usually |
| workspace_root | str \| None | Yes | None | Situational |
| snapshot_format | str | Yes | "zip" | No |
| force_snapshot | bool | Yes | False | Rarely |
| capture_console | bool | Yes | False | Usually |
| tqdm_mode | str | Yes | "smart" | Yes when using console capture |
Parameter notes
- `path`: Hierarchical run path such as `cv/resnet50/baseline`. `None` becomes `"default"`.
- `storage`: Override the resolved storage root for this run only.
- `run_id`: Explicit run ID. Mostly useful for controlled imports, tests, or special recovery flows.
- `alias`: Short human-friendly label shown in the UI.
- `capture_env`: Capture environment information such as runtime and system context.
- `snapshot_code`: Archive the workspace at run start so the run records the code state it used.
- `workspace_root`: Explicit root directory for code snapshotting and output scanning.
- `snapshot_format`: Snapshot archive format. Currently only `"zip"` is supported.
- `force_snapshot`: Override snapshot size or file-count safety checks for unusually large workspaces.
- `capture_console`: Capture stdout/stderr into `logs.txt` so logs appear in the viewer even if you use `print()`.
- `tqdm_mode`: Controls tqdm handling: `"smart"`, `"all"`, or `"none"`.
Path rules
`path` is normalized and validated:
- `None` becomes `"default"`
- `"/"` becomes the root path
- backslashes are normalized to `/`
- invalid characters and traversal-like values are rejected
Examples:
```python
rn.init(path="cv/detection/yolo")
rn.init(path="research/ablation/seed-42")
rn.init(path="/")  # root-level runs
```
run.log(...)
Log scalar metrics during training:
Signature:
| Parameter | Type | Optional | Default | Recommended |
|---|---|---|---|---|
| data | dict \| None | Yes | None | Usually |
| step | int \| None | Yes | auto-increment | Situational |
| stage | str \| None | Yes | None | Situational |
| **kwargs | extra scalar fields | Yes | none | Situational |
Parameter notes
- `data`: Main metric payload as a dict, such as `{"loss": 0.42, "acc": 0.88}`.
- `step`: Explicit step value. If omitted, Runicorn increments the internal `global_step`.
- `stage`: Optional stage label. In many projects, a common pattern is values like `"epoch_1"`, `"epoch_2"`, and so on.
- `**kwargs`: Additional metrics. Useful when you prefer `run.log(loss=..., acc=...)`.
```python
run.log({"loss": 0.42, "acc": 0.88}, step=1, stage="epoch_1")
run.log(val_loss=0.39, val_acc=0.90, step=2, stage="epoch_2")
```
What run.log() does
- accepts either a dict, keyword metrics, or both
- records a `global_step`
- records a `time` field automatically
- optionally records `stage`
Step behavior
If you do not pass `step`, Runicorn auto-increments an internal global step counter. If you do pass `step`, Runicorn uses that explicit value.
Stage behavior
Use `stage` when you want the UI to separate metric records into coarse buckets. In many training scripts, people simply use epoch-style labels:
```python
run.log({"loss": 0.8}, step=50, stage="epoch_1")
run.log({"loss": 0.4}, step=100, stage="epoch_2")
run.log({"val_loss": 0.35}, step=150, stage="epoch_3")
```
How to handle train vs eval without step misalignment
In practice, many users do not rely on stage alone to distinguish train and eval. A clearer pattern is:
- use different metric names such as `train_loss`, `train_acc`, `val_loss`, `val_acc`
- keep train and eval aligned to the same training step or epoch boundary
- avoid bumping `step` by one just because you entered evaluation
For example, if you evaluate once per epoch or every 10,000 steps, it is usually better to treat that eval result as belonging to the same step boundary you just finished, instead of logging it at step + 1.
One practical pattern is to keep the latest eval value on the training timeline:
```python
last_val_acc = 0.0
for step in range(1, total_steps + 1):
    train_loss, train_acc = train_step()
    if step % 10000 == 0:
        last_val_acc = evaluate()
    run.log({
        "train_loss": train_loss,
        "train_acc": train_acc,
        "val_acc": last_val_acc,
    }, step=step, stage=f"epoch_{current_epoch}")
```
Why people do this:
- train and eval metrics stay on the same step axis
- the chart remains aligned at epoch or evaluation boundaries
- you avoid the visual offset caused by logging eval at `step + 1`
If your team prefers it, using an initial placeholder such as `val_acc = 0.0` until the first eval finishes is a valid pragmatic choice for alignment-focused charts.
run.set_primary_metric(...)
Tell Runicorn which metric should be treated as the main score:
| Parameter | Type | Optional | Default | Recommended |
|---|---|---|---|---|
| metric_name | str | No | none | Yes |
| mode | str | Yes | "max" | Yes |
Parameter notes
- `metric_name`: The metric to surface in the experiments table and compare flows.
- `mode`: Use `"max"` when higher is better, `"min"` when lower is better.

For a loss-style metric, pass `mode="min"` instead of the default `"max"`.
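For example, using metric names that appear elsewhere on this page (the two calls are alternatives, not a sequence):

```python
# Accuracy-style metric: higher is better
run.set_primary_metric("val_acc", mode="max")

# Loss-style metric: lower is better
run.set_primary_metric("val_loss", mode="min")
```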
The best value is tracked as you log metrics and written into the summary on finish.
run.log_text(...)
Write text lines into the run log:
| Parameter | Type | Optional | Default | Recommended |
|---|---|---|---|---|
| text | str | No | none | Situational |
Parameter notes
- `text`: Free-form log text. Multi-line strings are supported and timestamps are prepended to non-empty lines.
run.log_text(...) writes directly to the run's logs.txt.
Important behavior:
- it does write to `logs.txt`
- it does not automatically print to your terminal
- if you want both terminal output and Runicorn logging, call both `print(...)` and `run.log_text(...)`, or enable `capture_console=True`
Runicorn prepends timestamps to non-empty lines before writing them.
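A short example (the message text is illustrative):

```python
run.log_text("epoch 3: checkpoint saved")
run.log_text("note: lr reduced after plateau")
```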
run.log_text(...) vs print(...) when capture_console=True
| Call | Terminal output | logs.txt | Typical use |
|---|---|---|---|
| print(...) | Yes | Yes | Console-first progress messages that you also want captured |
| run.log_text(...) | No | Yes | Explicit Runicorn log entries even when you do not want terminal output |
This is helpful when:
- you want explicit, Runicorn-owned log entries
- you want short checkpoint/progress notes
- you want messages to appear in the logs tab even without console capture
Where to view the recorded text
In the Web UI, open the run detail page and switch to the Logs tab. That tab reads the run log stream from logs.txt, so run.log_text(...) entries appear there alongside other captured log output.
run.summary(...)
Store run-level summary data.
Use it for the final result or current overall state of a run, not for per-step history.
| Parameter | Type | Optional | Default | Recommended |
|---|---|---|---|---|
| update | dict | No | none | Yes |
Parameter notes
- `update`: Summary fields to merge into `summary.json`, such as final scores, notes, selected checkpoints, or any metadata you want surfaced at the run level.
Typical example:
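A sketch with illustrative field names and values:

```python
run.summary({
    "final_acc": 0.912,
    "best_ckpt": "ckpt_epoch_3.pt",
    "notes": "baseline with default augmentation",
})
```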
Most users call run.summary(...) near the end of training, after an important evaluation, or whenever they want to update a run-level field such as best_ckpt, notes, or final_acc.
Summary updates merge into the existing summary file, so multiple calls are fine:
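For instance (values illustrative; the second call adds fields without discarding the first):

```python
run.summary({"final_acc": 0.912})
run.summary({"notes": "baseline complete"})  # merged into the same summary.json
```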
Use run.log(...) for timeline data and run.summary(...) for run-level conclusions.
In the Web UI, summary fields appear in the run detail page as run-level metadata.
run.finish(...)
Finish the run explicitly:
| Parameter | Type | Optional | Default | Recommended |
|---|---|---|---|---|
| status | str | Yes | "finished" | Yes |
Parameter notes
- `status`: Final run status. In practice, use `finished`, `failed`, or `interrupted`.
What happens if you omit the parameter
If you call `run.finish()` without arguments, Runicorn uses the default status, `finished`.
So in normal scripts, omitting the parameter is completely fine.
When status is chosen automatically
There are two important automatic cases:
- if you use `with rn.init(...) as run:`, Runicorn marks the run as `finished` on normal exit and `failed` if the block raises an exception
- if a run is still marked as `running` but the process later disappears unexpectedly, the viewer's background status checker can mark it as `failed`
When status is not inferred automatically
If you call run.finish(...) yourself, Runicorn does not inspect your training state and guess whether the run was successful. It uses:
- the explicit `status` you passed, or
- `finished` if you passed nothing

Recommended final statuses when calling it yourself:
- `finished`
- `failed`
- `interrupted`
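For example (statuses taken from the list above; `train()` is a placeholder for your training code):

```python
try:
    train()  # placeholder for your training loop
    run.finish()  # defaults to status="finished"
except KeyboardInterrupt:
    run.finish(status="interrupted")
except Exception:
    run.finish(status="failed")
    raise
```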
In 0.7.0, finish behavior is safer than in older versions:
- post-finish writes are blocked
- output watchers are stopped
- best-metric summary fields are flushed
- status is normalized to valid stored values
Context manager style
This is a common and safer pattern when you want automatic cleanup:
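A minimal sketch (the path reuses the example from earlier on this page):

```python
import runicorn as rn

with rn.init(path="cv/resnet50/baseline") as run:
    run.log({"loss": 0.5})
    # normal exit -> status "finished"; an uncaught exception -> "failed"
```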
Benefits:
- success exits become `finished`
- exceptions become `failed`
- you are less likely to forget cleanup
Disable tracking without changing code shape
```python
import runicorn as rn

rn.set_enabled(False)

run = rn.init(path="debug")
run.log({"loss": 0.1})  # no-op
run.finish()            # no-op
```
This returns a no-op run object, which is useful for inference scripts, debugging, or temporarily disabling tracking in shared code.
Full lifecycle example
```python
import math

import torch
import torch.nn as nn
import torch.optim as optim
from PIL import Image, ImageDraw

import runicorn as rn

run = rn.init(
    path="cv/resnet50/baseline",
    alias="seed-42",
    capture_console=True,
)

model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)
optimizer = optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

run.set_primary_metric("val_acc", mode="max")
run.log_config(extra={"optimizer": "adam", "lr": 1e-3, "epochs": 3})

latest_val_acc = 0.0
for epoch in range(1, 4):
    model.train()
    epoch_acc_sum = 0.0
    epoch_steps = 0
    dataloader = [
        (torch.randn(32, 128), torch.randint(0, 10, (32,)))
        for _ in range(5)
    ]
    for batch_idx, (inputs, targets) in enumerate(dataloader, start=1):
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()

        # Simulated metrics so the example produces smooth curves
        curve_step = (epoch - 1) * len(dataloader) + batch_idx
        simulated_train_loss = 1.6 * math.exp(-0.18 * curve_step) + 0.04 * torch.rand(1).item()
        batch_acc = min(0.95, 0.35 + 0.08 * math.log1p(curve_step) + 0.02 * torch.rand(1).item())
        epoch_acc_sum += batch_acc
        epoch_steps += 1

        print(f"train_loss: {simulated_train_loss:.4f}")
        run.log({
            "train_loss": simulated_train_loss,
            "train_acc": batch_acc,
            "val_acc": latest_val_acc,
        }, stage=f"epoch_{epoch}")

    train_acc = epoch_acc_sum / epoch_steps
    latest_val_acc = 0.70 + 0.05 * epoch

    preview = Image.new("RGB", (320, 160), "white")
    draw = ImageDraw.Draw(preview)
    draw.rounded_rectangle((10, 10, 310, 150), radius=12, outline="navy", width=3)
    draw.text((24, 28), f"Epoch {epoch}", fill="black")
    draw.text((24, 68), f"train_acc={train_acc:.3f}", fill="green")
    draw.text((24, 102), f"val_acc={latest_val_acc:.3f}", fill="purple")
    run.log_image(
        "epoch_preview",
        preview,
        caption=f"Metrics snapshot for epoch {epoch}",
    )

run.summary({
    "epochs": 3,
    "final_train_acc": train_acc,
    "final_val_acc": latest_val_acc,
    "final_note": "baseline complete",
})
run.finish()
```
In this pattern, step is auto-incremented by Runicorn, and val_acc is carried as the latest known evaluation value. That keeps train and validation metrics on one aligned step axis instead of introducing a separate extra step just for evaluation.