You can integrate fastai with W&B using the WandbCallback class to track experiments, log metrics, and visualize model performance during training. This page shows how to set up authentication, add the callback to your training loop, and configure logging for both single-process and distributed training. For more details, see the interactive docs with examples.

Sign up and create an API key

An API key authenticates your machine to W&B. To generate an API key from your user profile:
  1. Click your user profile icon in the upper right corner.
  2. Select User Settings, then scroll to the API Keys section.
Alternatively, go directly to User Settings to create an API key. Copy the newly created API key immediately and save it in a secure location such as a password manager.

Install the wandb library and log in

To install the wandb library locally and log in:
  1. Set the WANDB_API_KEY environment variable to your API key.
    export WANDB_API_KEY=[YOUR-API-KEY]
    
  2. Install the wandb library and log in.
    pip install wandb
    
    wandb login
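If you prefer to log in from Python rather than from the CLI, the following is a minimal sketch. It assumes the key is stored in WANDB_API_KEY as in step 1. The helper names read_api_key and login_from_env are hypothetical; wandb.login(key=...) is the library call that performs the login.

```python
import os


def read_api_key(env=None):
    """Return the API key from WANDB_API_KEY, or raise if it is missing."""
    env = os.environ if env is None else env
    key = env.get("WANDB_API_KEY", "").strip()
    # Reject the placeholder from the shell example as well as an empty value.
    if not key or key == "[YOUR-API-KEY]":
        raise RuntimeError("Set WANDB_API_KEY to your W&B API key first.")
    return key


def login_from_env(env=None):
    """Log in to W&B with the key from the environment (hypothetical helper)."""
    import wandb  # imported lazily so the key check works without wandb installed

    return wandb.login(key=read_api_key(env))
```

Calling login_from_env() at the top of a training script fails early with a clear message if the variable was never exported.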
    

Add the WandbCallback to the learner or fit method

To start logging your fastai training runs to W&B, attach the WandbCallback to either a single fit call or the learner itself.
import wandb
from fastai.callback.wandb import *

# start logging a wandb run
wandb.init(project="my_project")

# To log only during one training phase
learn.fit(..., cbs=WandbCallback())

# To log continuously for all training phases
learn = Learner(..., cbs=WandbCallback())
If you use version 1 of fastai, refer to the fastai v1 docs.

WandbCallback arguments

Use the following arguments to control what WandbCallback logs during training:
| Argument | Description |
| --- | --- |
| log | What to log from the model: gradients, parameters, all, or None (default). Losses and metrics are always logged. |
| log_preds | Whether to log prediction samples (default: True). |
| log_preds_every_epoch | Whether to log predictions every epoch or only at the end (default: False). |
| log_model | Whether to log the model (default: False). This also requires SaveModelCallback. |
| model_name | The name of the file to save. Overrides SaveModelCallback. |
| log_dataset | False (default): the dataset is not logged. True: logs the folder referenced by learn.dls.path. A path: logs the folder at that explicit path. Note: the subfolder “models” is always ignored. |
| dataset_name | Name of the logged dataset (default: folder name). |
| valid_dl | DataLoaders containing items used for prediction samples (default: random items from learn.dls.valid). |
| n_preds | Number of logged predictions (default: 36). |
| seed | Seed used to select the random samples. |
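As an illustration of how these arguments fit together, here is a minimal sketch that validates a few of them and pairs log_model=True with SaveModelCallback. The helpers wandb_callback_kwargs and build_callbacks are hypothetical names, not part of fastai or wandb. The sketch assumes wandb.init() has already been called before training starts.

```python
VALID_LOG_VALUES = (None, "gradients", "parameters", "all")


def wandb_callback_kwargs(log=None, log_preds=True, log_model=False, n_preds=36):
    """Validate a few WandbCallback arguments and return them as a dict."""
    if log not in VALID_LOG_VALUES:
        raise ValueError(f"log must be one of {VALID_LOG_VALUES}, got {log!r}")
    if n_preds < 1:
        raise ValueError("n_preds must be a positive integer")
    return {
        "log": log,
        "log_preds": log_preds,
        "log_model": log_model,
        "n_preds": n_preds,
    }


def build_callbacks(**overrides):
    """Return callbacks to pass as cbs=... (imports fastai lazily)."""
    from fastai.callback.tracker import SaveModelCallback
    from fastai.callback.wandb import WandbCallback

    kwargs = wandb_callback_kwargs(**overrides)
    cbs = [WandbCallback(**kwargs)]
    # log_model=True relies on SaveModelCallback to write the checkpoint.
    if kwargs["log_model"]:
        cbs.append(SaveModelCallback())
    return cbs
```

For example, learn.fit(1, cbs=build_callbacks(log="all", log_model=True)) would log gradients, parameters, and the saved model for that training phase.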
For custom workflows, you can manually log your datasets and models:
  • log_dataset(path, name=None, metadata={})
  • log_model(path, name=None, metadata={})
Note: any subfolder “models” is ignored.
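The manual logging functions can be wrapped as in the following sketch. It assumes an active run created with wandb.init(). The helper names entries_to_log and log_dataset_with_summary are hypothetical, and the preview only approximates the “models” rule by skipping a top-level folder with that name. The metadata key is illustrative rather than required.

```python
from pathlib import Path


def entries_to_log(path):
    """List top-level entries under path, skipping a folder named "models"."""
    root = Path(path)
    return sorted(
        p.name for p in root.iterdir() if not (p.is_dir() and p.name == "models")
    )


def log_dataset_with_summary(path, name=None):
    """Log a dataset folder with a small metadata dict (imports fastai lazily)."""
    from fastai.callback.wandb import log_dataset

    entries = entries_to_log(path)
    # metadata is free-form; this key is an illustrative choice.
    log_dataset(path, name=name, metadata={"top_level_entries": len(entries)})
    return entries
```

Checking entries_to_log(path) before logging is a quick way to confirm which files and folders you expect to appear in the logged dataset.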

Distributed training

fastai supports distributed training by using the context manager distrib_ctx. W&B supports this automatically and enables you to track your multi-GPU experiments without additional configuration. The following sections describe how to integrate W&B with distributed training and how to limit logging to the main process. Review this minimal example:
import wandb
from fastai.vision.all import *
from fastai.distributed import *
from fastai.callback.wandb import WandbCallback

wandb.require(experiment="service")
path = rank0_first(lambda: untar_data(URLs.PETS) / "images")

def train():
    dls = ImageDataLoaders.from_name_func(
        path,
        get_image_files(path),
        valid_pct=0.2,
        label_func=lambda x: x[0].isupper(),
        item_tfms=Resize(224),
    )
    wandb.init(project="fastai_ddp", entity="capecape")
    cb = WandbCallback()
    learn = vision_learner(dls, resnet34, metrics=error_rate, cbs=cb).to_fp16()
    with learn.distrib_ctx(sync_bn=False):
        learn.fit(1)

if __name__ == "__main__":
    train()
Then, in your terminal, execute:
torchrun --nproc_per_node 2 train.py
In this case, --nproc_per_node 2 starts two processes, one for each of the machine's 2 GPUs.
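With torchrun starting one process per GPU, every process that calls wandb.init() creates its own run. One way to keep those runs together is to give them a shared group, as in this minimal sketch. It assumes torchrun has set the RANK environment variable; per_process_init_kwargs is a hypothetical helper, and the naming scheme is only an example. The project, group, and name arguments of wandb.init() are existing parameters.

```python
import os


def per_process_init_kwargs(project, group, env=None):
    """Build wandb.init() keyword arguments for the current process."""
    env = os.environ if env is None else env
    # torchrun exports RANK for each process; default to 0 outside torchrun.
    rank = int(env.get("RANK", 0))
    return {"project": project, "group": group, "name": f"{group}-rank{rank}"}
```

Inside train(), this could be used as wandb.init(**per_process_init_kwargs("fastai_ddp", "ddp-experiment")).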

Log only on the main process

In the preceding example, wandb launches one run per process, so training on two GPUs ends with two runs. This can be confusing, and you may want to log only on the main process. To do so, detect which process you are in, then call wandb.init() and create the WandbCallback only on the main process (rank 0).
import wandb
from fastai.vision.all import *
from fastai.distributed import *
from fastai.callback.wandb import WandbCallback

wandb.require(experiment="service")
path = rank0_first(lambda: untar_data(URLs.PETS) / "images")

def train():
    cb = []
    dls = ImageDataLoaders.from_name_func(
        path,
        get_image_files(path),
        valid_pct=0.2,
        label_func=lambda x: x[0].isupper(),
        item_tfms=Resize(224),
    )
    if rank_distrib() == 0:
        run = wandb.init(project="fastai_ddp", entity="capecape")
        cb = WandbCallback()
    learn = vision_learner(dls, resnet34, metrics=error_rate, cbs=cb).to_fp16()
    with learn.distrib_ctx(sync_bn=False):
        learn.fit(1)

if __name__ == "__main__":
    train()
In your terminal, call:
torchrun --nproc_per_node 2 train.py
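The rank check in the example above can also be factored into a small helper, sketched below. It reads the RANK environment variable that torchrun sets, on the assumption that this matches what rank_distrib() reports. The names is_main_process and main_process_callbacks are hypothetical.

```python
import os


def is_main_process(env=None):
    """Return True on rank 0, or when RANK is unset (single-process run)."""
    env = os.environ if env is None else env
    return int(env.get("RANK", 0)) == 0


def main_process_callbacks(project, env=None):
    """Start a run and return [WandbCallback()] on rank 0, else an empty list."""
    if not is_main_process(env):
        return []
    # Imported lazily so non-main processes never touch wandb.
    import wandb
    from fastai.callback.wandb import WandbCallback

    wandb.init(project=project)
    return [WandbCallback()]
```

The returned list can be passed directly as cbs=main_process_callbacks("fastai_ddp") when building the learner, so every process runs the same code path.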

Examples

For end-to-end demonstrations of the fastai integration, see the following references: