fastai - Weights & Biases Documentation

You can integrate fastai with W&B using the `WandbCallback` class to track experiments, log metrics, and visualize model performance during training. This page shows how to set up authentication, add the callback to your training loop, and configure logging for both single-process and distributed training. Check out these interactive docs with examples for more details.

Sign up and create an API key

An API key authenticates your machine to W&B. You can generate an API key from your user profile.

For a more streamlined approach, create an API key by going directly to User Settings. Copy the newly created API key immediately and save it in a secure location such as a password manager.

Click your user profile icon in the upper right corner.
Select User Settings, then scroll to the API Keys section.

Install the `wandb` library and log in

To install the wandb library locally and log in:

Command Line

Set the WANDB_API_KEY environment variable to your API key.
```
export WANDB_API_KEY=[YOUR-API-KEY]
```
Install the wandb library and log in.
```
pip install wandb

wandb login
```

Python

Install the wandb library from your terminal.
```
pip install wandb
```
Then log in from Python.
```
import wandb
wandb.login()
```

Python notebook

```
!pip install wandb

import wandb
wandb.login()
```
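In non-interactive environments such as CI jobs, it can help to fail early when the key is missing instead of waiting on a login prompt. The following is a minimal sketch under that assumption. `read_api_key` and `login_from_env` are hypothetical helpers, not part of the W&B API; the only W&B call used is `wandb.login(key=...)`.

```python
import os


def read_api_key(env=None):
    """Return the key stored in WANDB_API_KEY, or None if it is unset or blank."""
    env = os.environ if env is None else env
    key = env.get("WANDB_API_KEY", "").strip()
    return key or None


def login_from_env():
    """Hypothetical helper: log in with the key found in the environment."""
    key = read_api_key()
    if key is None:
        raise RuntimeError("WANDB_API_KEY is not set")
    # Imported lazily so the check above also works where wandb is not installed.
    import wandb

    return wandb.login(key=key)
```

Calling `login_from_env()` at the top of a training script raises a clear error before any data is loaded if the key is missing.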

Add the `WandbCallback` to the `learner` or `fit` method

To start logging your fastai training runs to W&B, attach the WandbCallback to either a single fit call or the learner itself.

```
import wandb
from fastai.callback.wandb import *

# start logging a wandb run
wandb.init(project="my_project")

# To log only during one training phase
learn.fit(..., cbs=WandbCallback())

# To log continuously for all training phases
learn = learner(..., cbs=WandbCallback())
```

If you use version 1 of fastai, refer to the fastai v1 docs.

WandbCallback arguments

Use the following arguments to control what WandbCallback logs during training:

| Argument | Description |
| --- | --- |
| `log` | What to log from the model: `gradients`, `parameters`, `all`, or `None` (default). Losses and metrics are always logged. |
| `log_preds` | Whether to log prediction samples (defaults to `True`). |
| `log_preds_every_epoch` | Whether to log predictions every epoch instead of only at the end of training (defaults to `False`). |
| `log_model` | Whether to log the model (defaults to `False`). This also requires `SaveModelCallback`. |
| `model_name` | Name of the file to save. Overrides the name set by `SaveModelCallback`. |
| `log_dataset` | `False` (default). `True` logs the folder referenced by `learn.dls.path`. You can also pass a path explicitly to choose which folder to log. Note: the subfolder “models” is always ignored. |
| `dataset_name` | Name of the logged dataset (defaults to the folder name). |
| `valid_dl` | `DataLoaders` containing items used for prediction samples (defaults to random items from `learn.dls.valid`). |
| `n_preds` | Number of logged predictions (defaults to 36). |
| `seed` | Seed used to select the random samples. |
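These arguments are passed as ordinary keyword arguments. As a rough sketch, the hypothetical helper `callback_kwargs` below collects two illustrative presets using only the argument names and defaults listed above. The helper names and the `debug` flag are assumptions made for this example, not part of fastai or W&B.

```python
def callback_kwargs(debug=False):
    """Hypothetical presets built from the WandbCallback arguments listed above."""
    kwargs = {
        "log": None,         # losses and metrics are logged regardless
        "log_preds": True,   # log prediction samples
        "n_preds": 36,       # number of logged predictions
        "log_model": False,  # do not upload the model
    }
    if debug:
        # Also record gradients and parameters, and log predictions every epoch.
        kwargs.update(log="all", log_preds_every_epoch=True)
    return kwargs


def make_callback(debug=False):
    # Imported lazily so the presets can be inspected without fastai installed.
    from fastai.callback.wandb import WandbCallback

    return WandbCallback(**callback_kwargs(debug))
```

With this in place, `learn.fit(..., cbs=make_callback(debug=True))` would attach a more verbose callback for a single training phase.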

For custom workflows, you can manually log your datasets and models:

```
log_dataset(path, name=None, metadata={})
log_model(path, name=None, metadata={})
```

Note: any subfolder “models” is ignored.
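To preview what a dataset folder contributes before logging it, you can list its contents yourself. The sketch below does not call fastai or W&B. It assumes the exclusion applies to a directory named `models` directly under the logged path, and `entries_to_log` is a hypothetical helper written for this illustration.

```python
from pathlib import Path


def entries_to_log(path):
    """List top-level entries under `path`, skipping a folder named "models"."""
    root = Path(path)
    if not root.is_dir():
        raise ValueError(f"path must be a valid directory: {root}")
    return sorted(
        p.name for p in root.iterdir() if not (p.is_dir() and p.name == "models")
    )
```

Running `entries_to_log(learn.dls.path)` before training gives a quick check that checkpoints saved under `models` are not mixed in with the raw data.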

Distributed training

fastai supports distributed training by using the context manager distrib_ctx. W&B supports this automatically and enables you to track your multi-GPU experiments without additional configuration. The following sections describe how to integrate W&B with distributed training and how to limit logging to the main process. Review this minimal example:

Script

```
import wandb
from fastai.vision.all import *
from fastai.distributed import *
from fastai.callback.wandb import WandbCallback

wandb.require(experiment="service")
# Download the dataset on rank 0 first so the other processes reuse it.
path = rank0_first(lambda: untar_data(URLs.PETS) / "images")

def train():
    dls = ImageDataLoaders.from_name_func(
        path,
        get_image_files(path),
        valid_pct=0.2,
        label_func=lambda x: x[0].isupper(),
        item_tfms=Resize(224),
    )
    # Pass the project name by keyword; every process starts its own run.
    wandb.init(project="fastai_ddp", entity="capecape")
    cb = WandbCallback()
    learn = vision_learner(dls, resnet34, metrics=error_rate, cbs=cb).to_fp16()
    with learn.distrib_ctx(sync_bn=False):
        learn.fit(1)

if __name__ == "__main__":
    train()
```

Then, in your terminal, execute:

```
torchrun --nproc_per_node 2 train.py
```

This command assumes a machine with 2 GPUs. Set `--nproc_per_node` to the number of GPUs you want to use.

Python notebook

You can also run distributed training directly inside a notebook.

```
import wandb
from fastai.vision.all import *

from accelerate import notebook_launcher
from fastai.distributed import *
from fastai.callback.wandb import WandbCallback

wandb.require(experiment="service")
path = untar_data(URLs.PETS) / "images"

def train():
    dls = ImageDataLoaders.from_name_func(
        path,
        get_image_files(path),
        valid_pct=0.2,
        label_func=lambda x: x[0].isupper(),
        item_tfms=Resize(224),
    )
    # Pass the project name by keyword; every process starts its own run.
    wandb.init(project="fastai_ddp", entity="capecape")
    cb = WandbCallback()
    learn = vision_learner(dls, resnet34, metrics=error_rate, cbs=cb).to_fp16()
    with learn.distrib_ctx(in_notebook=True, sync_bn=False):
        learn.fit(1)

# Launch one process per GPU from within the notebook.
notebook_launcher(train, num_processes=2)
```

Log only on the main process

In the preceding examples, wandb launches one run per process, so training on two GPUs leaves you with two runs. This can be confusing, and you may prefer to log only on the main process. To do so, detect which process you are in and call wandb.init() only on the main process (rank 0), so that the other processes do not create runs.

Script

```
import wandb
from fastai.vision.all import *
from fastai.distributed import *
from fastai.callback.wandb import WandbCallback

wandb.require(experiment="service")
path = rank0_first(lambda: untar_data(URLs.PETS) / "images")

def train():
    cb = []  # no callbacks on non-main processes
    dls = ImageDataLoaders.from_name_func(
        path,
        get_image_files(path),
        valid_pct=0.2,
        label_func=lambda x: x[0].isupper(),
        item_tfms=Resize(224),
    )
    # Create the run and attach the callback on the main process only.
    if rank_distrib() == 0:
        run = wandb.init(project="fastai_ddp", entity="capecape")
        cb = WandbCallback()
    learn = vision_learner(dls, resnet34, metrics=error_rate, cbs=cb).to_fp16()
    with learn.distrib_ctx(sync_bn=False):
        learn.fit(1)

if __name__ == "__main__":
    train()
```

In your terminal, call:

```
torchrun --nproc_per_node 2 train.py
```

Python notebook

```
import wandb
from fastai.vision.all import *

from accelerate import notebook_launcher
from fastai.distributed import *
from fastai.callback.wandb import WandbCallback

wandb.require(experiment="service")
path = untar_data(URLs.PETS) / "images"

def train():
    cb = []  # no callbacks on non-main processes
    dls = ImageDataLoaders.from_name_func(
        path,
        get_image_files(path),
        valid_pct=0.2,
        label_func=lambda x: x[0].isupper(),
        item_tfms=Resize(224),
    )
    # Create the run and attach the callback on the main process only.
    if rank_distrib() == 0:
        run = wandb.init(project="fastai_ddp", entity="capecape")
        cb = WandbCallback()
    learn = vision_learner(dls, resnet34, metrics=error_rate, cbs=cb).to_fp16()
    with learn.distrib_ctx(in_notebook=True, sync_bn=False):
        learn.fit(1)

notebook_launcher(train, num_processes=2)
```
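The rank check used in these examples can also be reasoned about without fastai. The sketch below assumes the launcher exports a `RANK` environment variable for each process, as `torchrun` does. `is_main_process` is a hypothetical helper for illustration and is not a replacement for `rank_distrib()`.

```python
import os


def is_main_process(env=None):
    """Return True when RANK is 0 or unset (for example, a single-process run)."""
    env = os.environ if env is None else env
    return int(env.get("RANK", 0)) == 0
```

Guarding `wandb.init()` with a check like this keeps single-process runs working unchanged, because an unset `RANK` is treated as rank 0.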

Examples

For end-to-end demonstrations of the fastai integration, see the following references:

Visualize, track, and compare fastai models: A documented walkthrough.
Image segmentation on CamVid: A sample use case of the integration.
