Add W&B to a Python library - Weights & Biases Documentation

This guide explains how to integrate W&B into a Python library so that your users can track experiments, monitor system metrics, and manage models when they use your code. It’s intended for library authors and maintainers whose codebase (such as a training framework, SDK, or reusable training code) is more involved than a single Python training script or Jupyter notebook.

If you’re new to W&B, review the core guides (for example, Experiment Tracking) before continuing.

The following sections walk through the major integration decisions in order: how users install W&B, how they authenticate, how to start and configure runs, how to log metrics and artifacts, and how to support distributed training and hyperparameter sweeps.

Decide how users install W&B

Before you start, decide whether W&B should be a required dependency or an optional feature of your library. This choice affects how you import wandb, how you document installation, and how you handle environments where wandb isn’t present.

Require W&B as a dependency

If W&B is central to your library’s functionality, add the W&B Python SDK (wandb) to your dependencies so that it’s installed automatically alongside your library:

torch==1.8.0 
...
wandb==0.13.*

Make W&B optional on installation

If W&B is an optional feature, allow your library to run without it installed so that users who don’t need experiment tracking can still use your code. You can either import wandb conditionally in Python or declare it as an optional dependency in pyproject.toml.

Python

Detect whether wandb is available so that you can raise a clear error if a user enables W&B features without installing it:

try:
    import wandb
    _WANDB_AVAILABLE = True
except ImportError:
    _WANDB_AVAILABLE = False
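The availability check above only records whether the import succeeded. A minimal sketch of the "raise a clear error" half could look like the following. The helper names `wandb_available` and `require_wandb` are hypothetical and not part of the wandb SDK, and the error text is only a suggestion.

```python
import importlib
import importlib.util


def wandb_available() -> bool:
    """Return True if the wandb package can be found (hypothetical helper)."""
    return importlib.util.find_spec("wandb") is not None


def require_wandb():
    """Return the wandb module, or raise an actionable error (hypothetical helper)."""
    if not wandb_available():
        raise ImportError(
            "W&B logging was requested but the 'wandb' package is not installed. "
            "Install it with 'pip install wandb' or disable W&B logging."
        )
    # Import lazily so that the library itself imports fine without wandb.
    return importlib.import_module("wandb")
```

Calling `require_wandb()` only at the point where a user turns on W&B features keeps the rest of the library usable when wandb is absent.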

pyproject.toml

Declare wandb as an optional dependency in your pyproject.toml file:

[project]
name = "my_awesome_lib"
version = "0.1.0"
dependencies = [
    "torch",
    "scikit-learn"
]

[project.optional-dependencies]
dev = [
    "wandb"
]

Authenticate users

W&B uses API keys to authenticate users and machines. Before users can log runs from your library, they must generate an API key and make it available to the wandb client.

Create an API key

An API key authenticates a client or machine to W&B. Generate an API key from your user profile so that you can use it for the login steps that follow.

To create an API key:

Click your user profile icon in the upper right corner.
Select User Settings, then scroll to the API Keys section.

Alternatively, go directly to User Settings to create an API key. Copy the newly created API key immediately and save it in a secure location such as a password manager.

Install and log in to W&B

After you have an API key, install the wandb library locally and log in so that subsequent runs can authenticate to W&B. Choose the tab that matches your environment.

Command Line

Set the WANDB_API_KEY environment variable to your API key:
```
export WANDB_API_KEY=[YOUR-API-KEY]
```
Install the wandb library and log in:
```
pip install wandb

wandb login
```

Python

Navigate to your terminal and install the Python SDK:
```
pip install wandb
```
Log in to W&B from your Python script or notebook. W&B prompts you to enter your API key.
```
import wandb
wandb.login()
```

Python notebook

Copy and paste the following code snippet into a cell in your Jupyter notebook and run it. W&B prompts you to enter your API key.

!pip install wandb

import wandb
wandb.login()

Start a run

After you set up authentication, the next step is to start a W&B run so that your library has somewhere to log metrics, configs, and artifacts. A run represents a single unit of computation, such as a training experiment. Most libraries create one run per training job. For more information about runs, see W&B Runs. Initialize a run with wandb.init() and specify a name for your project and your team entity (team name). If you don’t specify a project, W&B stores your run in a default project called “uncategorized”:

with wandb.init(project="[PROJECT-NAME]", entity="[ENTITY]") as run:
    ...

W&B recommends that you use a context manager to ensure that your run is properly closed, even if an error occurs. If you don’t use a context manager, you must call run.finish() to close the run and log all the data to W&B. Closing the run guarantees that all metrics, configs, and artifacts are uploaded before the process exits.
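If your library's control flow makes a context manager impractical, the explicit alternative described above can be sketched as follows. The helper `run_with_cleanup` and the `train_fn` callback are hypothetical names used for illustration. The only assumption about `run` is that it is the object returned by wandb.init(), whose finish() method accepts an optional exit_code.

```python
def run_with_cleanup(run, train_fn):
    """Call train_fn(run) and always close the run (hypothetical helper).

    `run` is expected to be the object returned by wandb.init().
    """
    try:
        result = train_fn(run)
    except Exception:
        # Mark the run as failed, then re-raise the original error.
        run.finish(exit_code=1)
        raise
    # Normal exit: close the run so that all data is uploaded.
    run.finish()
    return result
```

This mirrors what the context manager does for you, so prefer the `with` form wherever it fits.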

When to call wandb.init()

Call wandb.init() as early as possible. W&B captures stdout, stderr, and error messages, which makes debugging easier. Wrap your entire training loop in a wandb.init() context manager to ensure that all relevant information is captured in the run. This includes any error messages, which can be crucial for debugging.

Set `wandb` as an optional dependency

If you want to make wandb optional at runtime, so that users can run your library without producing W&B runs, use one of the following approaches:

Define a wandb flag.
Set wandb to be disabled in wandb.init().
Set wandb to be offline. This still runs wandb, but doesn’t communicate back to W&B over the internet.

Define a wandb flag such as:

Python

trainer = my_trainer(..., use_wandb=True)

Bash

python train.py ... --use-wandb
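As a rough illustration of how such a flag might be honored inside a library, the sketch below only imports and calls wandb when the flag is set. The `Trainer` class, its `fit` method, and the `losses` argument (a stand-in for a real training loop) are hypothetical. Logging `global_step` alongside the loss is an illustrative choice.

```python
class Trainer:
    """Hypothetical trainer that only touches wandb when use_wandb is True."""

    def __init__(self, project="my-project", use_wandb=False):
        self.project = project
        self.use_wandb = use_wandb

    def fit(self, losses):
        """Iterate over precomputed losses (a stand-in for real training)."""
        run = None
        if self.use_wandb:
            import wandb  # deferred so the library imports without wandb installed

            run = wandb.init(project=self.project)
        try:
            for step, loss in enumerate(losses):
                if run is not None:
                    run.log({"train/loss": loss, "global_step": step})
            return len(losses)
        finally:
            # Close the run even if the loop raised.
            if run is not None:
                run.finish()
```

With `Trainer(use_wandb=False)`, no W&B code path runs at all, which also covers users who have not installed wandb.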

Set wandb to be disabled in wandb.init():

Python

wandb.init(mode="disabled")

Bash

export WANDB_MODE=disabled

or

wandb disabled

Set wandb to be offline:

Environment Variable

export WANDB_MODE=offline

or

import os
os.environ['WANDB_MODE'] = 'offline'

Bash

wandb offline

Define a run config

Provide a configuration dictionary when you initialize your run to record the hyperparameters and other metadata associated with that run. Logging a config makes runs easier to compare, filter, and reproduce later. In the W&B App, you can compare runs based on their config parameters, filter them in the Runs table, and group runs together. For example, if the batch size (batch_size) is defined as a config parameter, it appears as a column in the Runs table, which lets users filter and compare runs based on their batch size.

Typical config parameter values include:

Model name, version, architecture parameters, and hyperparameters.
Dataset name, version, number of training or validation examples.
Training parameters such as learning rate, batch size, and optimizer.

The following code snippet shows how to log a config:

config = {"batch_size": 32}  # add other hyperparameters as needed
with wandb.init(..., config=config) as run:
    ...
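Libraries often already hold their settings in a structured object. A minimal sketch, assuming your library stores hyperparameters in a dataclass (the `TrainingArgs` class, its fields, and `build_run_config` are hypothetical), converts that object into the plain dictionary that the config argument expects:

```python
from dataclasses import asdict, dataclass


@dataclass
class TrainingArgs:
    """Hypothetical container for a library's training settings."""

    model_name: str = "resnet18"
    dataset_name: str = "flowers"
    learning_rate: float = 1e-3
    batch_size: int = 32
    optimizer: str = "adam"


def build_run_config(args: TrainingArgs) -> dict:
    """Flatten the settings into a plain dict suitable for wandb.init(config=...)."""
    return asdict(args)


config = build_run_config(TrainingArgs(batch_size=64))
# Pass the result when starting the run, for example:
# with wandb.init(project="[PROJECT-NAME]", config=config) as run: ...
```

Keeping the conversion in one function gives you a single place to drop values that should not be logged, such as file paths or credentials.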

Update the run config

Some configuration values, such as model parameter counts, might not be known when you call wandb.init(). If values aren’t available at initialization time, update the config later with wandb.Run.config.update. For example, you might want to add a model’s parameters after you instantiate the model:

with wandb.init(...) as run:
    model = MyModel(...)
    run.config.update({"model_parameters": 3500})

For more information, see Configure experiments.

Log metrics and data

After you start and configure a run, you can begin logging metrics and other data so that W&B records them against the run.

Log metrics

To log scalar metrics such as loss or accuracy, create a dictionary where each key is the name of a metric. Pass this dictionary object to wandb.Run.log() to log it to W&B:

NUM_EPOCHS = 10

# run is the object returned by wandb.init()
for epoch in range(NUM_EPOCHS):
    for inputs, ground_truth in data:
        prediction = model(inputs)
        loss = loss_fn(prediction, ground_truth)
        metrics = {"loss": loss}
        run.log(metrics)

Use metric name prefixes to group related metrics in the W&B App. Common prefixes include train/ and val/ for training and validation metrics, respectively, but you can use any prefix that makes sense for your use case. This creates separate sections in your project’s workspace for your training and validation metrics, or other metric types you’d like to separate:

with wandb.init(...) as run:
    metrics = {
        "train/loss": 0.4,
        "train/learning_rate": 0.4,
        "val/loss": 0.5, 
        "val/accuracy": 0.7
    }
    run.log(metrics)
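If your library computes metrics with bare names such as `loss`, one possible way to apply the prefixes consistently is a small helper like the one below. The name `prefix_metrics` is hypothetical, and converting values with `float()` is an assumption that each metric is a scalar.

```python
def prefix_metrics(metrics, prefix):
    """Return a copy of metrics with keys renamed to '<prefix>/<name>' (hypothetical helper)."""
    return {f"{prefix}/{name}": float(value) for name, value in metrics.items()}


train_metrics = prefix_metrics({"loss": 0.4, "learning_rate": 0.001}, "train")
val_metrics = prefix_metrics({"loss": 0.5, "accuracy": 0.7}, "val")
# Merge both dictionaries and pass the result to run.log() in a single call.
combined = {**train_metrics, **val_metrics}
```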

For more information, see wandb.Run.log().

Control the x-axis

By default, the wandb SDK increments an internal step counter each time you call wandb.Run.log(). If you call wandb.Run.log() more than once for the same training step, this counter no longer matches the step in your training loop. To avoid this situation, define your x-axis step explicitly with wandb.Run.define_metric(), one time, immediately after you call wandb.init():

with wandb.init(...) as run:
    run.define_metric("*", step_metric="global_step")

The glob pattern, *, means that every metric uses global_step as the x-axis in your charts. If you only want certain metrics logged against global_step, you can specify them instead:

run.define_metric("train/loss", step_metric="global_step")

Now, log your step metric, global_step, together with your metrics each time you call wandb.Run.log():

for step, (input, ground_truth) in enumerate(data):
    ...
    run.log({"global_step": step, "train/loss": 0.1})
    run.log({"global_step": step, "eval/loss": 0.2})

If you don’t have access to the independent step variable (for example, global_step isn’t available during your validation loop), wandb automatically uses the previously logged value for global_step. In this case, ensure you log an initial value for the metric so that it’s defined when it’s needed.
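One way a library might centralize this pattern is a thin wrapper that remembers the latest step and attaches it to every logged dictionary. In the sketch below, `StepLogger` is a hypothetical class, and `RecordingRun` is a stand-in for a real run so that the example executes without a W&B account. With a real run, only `define_metric()` and `log()` are called.

```python
class StepLogger:
    """Hypothetical wrapper that attaches global_step to every logged dict."""

    def __init__(self, run):
        self.run = run
        self.global_step = 0
        # One-time setup: plot every metric against global_step.
        run.define_metric("*", step_metric="global_step")

    def log(self, metrics, step=None):
        # Remember the latest step so later calls can reuse it.
        if step is not None:
            self.global_step = step
        self.run.log({**metrics, "global_step": self.global_step})


class RecordingRun:
    """Stand-in for a wandb run that records calls instead of uploading."""

    def __init__(self):
        self.defined = []
        self.logged = []

    def define_metric(self, name, step_metric=None):
        self.defined.append((name, step_metric))

    def log(self, data):
        self.logged.append(data)


run = RecordingRun()
logger = StepLogger(run)
logger.log({"train/loss": 0.1}, step=5)
logger.log({"eval/loss": 0.2})  # no step given, so step 5 is reused
```

Here the validation code never needs to know the training step, because the wrapper supplies the last value explicitly.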

Log media and structured data

In addition to scalars, you can log images, tables, text, audio, video, and more. Logging media alongside metrics helps users inspect qualitative model behavior over time. Some considerations when logging data include:

How often should the data be logged? Should logging be optional?
What type of data would be helpful to visualize?
- For images, you can log sample predictions and segmentation masks to see the evolution over time.
- For text, you can log tables of sample predictions for later exploration.

For more information, see Log objects and media.

Support distributed training

If your library can run training across multiple processes or machines, decide how W&B should behave in that setting so that logs are coherent and not duplicated. You can adapt either of the following workflows:

Log only from the main process (recommended).
Log from every process and group runs using a shared group name.
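A minimal sketch of the first workflow is shown below. It assumes your launcher exports a RANK environment variable for each process, as torchrun does, and treats a missing variable as a single-process job. The helpers `is_main_process` and `maybe_init_run` are hypothetical names.

```python
import os


def is_main_process(environ=None):
    """Return True on rank 0, or when RANK is unset (single-process job)."""
    env = os.environ if environ is None else environ
    return int(env.get("RANK", "0")) == 0


def maybe_init_run(init_fn, environ=None, **init_kwargs):
    """Call init_fn (for example wandb.init) only on the main process.

    Returns None on every other process, so callers must check before logging.
    """
    if is_main_process(environ):
        return init_fn(**init_kwargs)
    return None
```

For the second workflow, each process would call wandb.init() itself and pass the same `group` value so that the runs appear together in the W&B App.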

For more information, see Log distributed training experiments.

Track models and datasets with artifacts

In addition to metrics, you can persist the models and datasets your library produces or consumes so that users can reproduce and compare runs. Use W&B Artifacts to track and version models and datasets. Artifacts provide storage and versioning for machine learning assets, and they automatically track lineage to show how data and models are related.

Stored Datasets and Model Checkpoints in W&B

Consider the following when integrating artifacts into your library:

Whether to log model checkpoints or datasets as artifacts, and whether to make this optional.
Artifact input references (for example, entity/project/artifact).
Logging frequency of model checkpoints or datasets. For example, every epoch or every 500 steps.

Log model checkpoints

Logging model checkpoints as artifacts lets users recover, version, and share trained weights. A common approach is to log checkpoints as artifacts using the unique run ID that W&B generates as part of the artifact name.

metadata = {"eval/accuracy": 0.8, "train/steps": 800}

artifact = wandb.Artifact(
    name=f"model-{run.id}",
    metadata=metadata,
    type="model",
)
artifact.add_dir("output_model")  # local directory where the model weights are stored

aliases = ["best", "epoch_10"]
run.log_artifact(artifact, aliases=aliases)

The preceding snippet logs a model checkpoint as an artifact with metadata such as evaluation accuracy and training steps. The artifact’s name includes the unique run ID, and it’s tagged with custom aliases for quick reference.
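To decide when to log a checkpoint and which aliases to attach, a library could use small helpers such as the ones below. Both function names are hypothetical, the default of 500 steps comes from the example frequency mentioned earlier, and the alias format follows the `epoch_10` style used in the snippet above.

```python
def should_checkpoint(step, every_n_steps=500):
    """Return True when a checkpoint is due; a value of 0 disables checkpointing."""
    return every_n_steps > 0 and step > 0 and step % every_n_steps == 0


def checkpoint_aliases(epoch, is_best):
    """Build the alias list for a checkpoint artifact."""
    aliases = [f"epoch_{epoch}"]
    if is_best:
        aliases.append("best")
    return aliases
```

The result of `checkpoint_aliases()` can be passed directly as the `aliases` argument of `run.log_artifact()`.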

Log input artifacts

To capture lineage between data and models, log the datasets or pretrained models that a run consumes as inputs:

dataset = wandb.Artifact(name="flowers", type="dataset")
dataset.add_file("flowers.npy")
run.use_artifact(dataset)

The preceding snippet creates an artifact for a dataset called “flowers” and adds a file to it. The run.use_artifact() call associates the artifact with the current run so that W&B can track the lineage of the dataset used in the run.

Download artifacts

After you log artifacts, your library (or its users) can download previously logged artifacts from W&B to use in training or inference code. The right approach depends on whether you already have an active run. If you have a run context, use wandb.Run.use_artifact() to reference an artifact in W&B and then call wandb.Artifact.download() to download it to a local directory. Using use_artifact() also records the artifact as an input to the current run, preserving lineage.

with wandb.init(...) as run:
    artifact = run.use_artifact("user/project/artifact:latest")
    local_path = artifact.download()

Use the W&B Public API to reference and download an artifact without initializing a run. This is useful in scenarios such as distributed environments or when you perform inference, where you might not want to create a new run.

import wandb
artifact = wandb.Api().artifact("user/project/artifact:latest")
local_path = artifact.download()
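The two approaches above can be combined behind one entry point. The sketch below is a hypothetical helper, not part of the wandb SDK. It uses `run.use_artifact()` when a run is supplied and otherwise falls back to the Public API.

```python
def download_artifact(name, run=None):
    """Download an artifact and return its local path (hypothetical helper).

    name: a reference such as "entity/project/artifact:latest".
    run:  an active run from wandb.init(), or None to skip lineage tracking.
    """
    if run is not None:
        # Records the artifact as an input to the run, preserving lineage.
        artifact = run.use_artifact(name)
    else:
        import wandb  # deferred so that importing this module does not require wandb

        artifact = wandb.Api().artifact(name)
    return artifact.download()
```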

For more information, see Download and use artifacts.

Tune hyperparameters

If your library supports hyperparameter tuning, you can integrate W&B Sweeps to manage and visualize experiments. Sweeps coordinate multiple runs across a defined search space and surface the results in the W&B App so users can compare configurations side by side.
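As a rough sketch of what an integration might look like, the example below defines a sweep configuration as a plain dictionary and wraps the `wandb.sweep()` and `wandb.agent()` calls in a helper. The search space, the metric name, and the `launch_sweep` helper are illustrative assumptions. `train_fn` stands for your library's training entry point, which is expected to call wandb.init() itself and read the chosen values from the run config.

```python
# Illustrative search space; adjust the method, metric, and parameters to your library.
sweep_config = {
    "method": "random",
    "metric": {"name": "val/loss", "goal": "minimize"},
    "parameters": {
        "batch_size": {"values": [16, 32, 64]},
        "learning_rate": {"min": 0.0001, "max": 0.1},
    },
}


def launch_sweep(train_fn, project, count=5):
    """Register the sweep and run `count` trials in this process (hypothetical helper)."""
    import wandb  # deferred so that importing this module does not require wandb

    sweep_id = wandb.sweep(sweep=sweep_config, project=project)
    wandb.agent(sweep_id, function=train_fn, count=count)
    return sweep_id
```

The metric name in the sweep configuration must match a key that the training function logs, such as `val/loss` here.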
