spaCy - Weights & Biases Documentation

spaCy is an NLP library that provides fast, accurate models. As of spaCy v3, you can use W&B with spacy train to track your spaCy model’s training metrics and to save and version your models and datasets. All it takes is a few added lines in your configuration.

This page is for spaCy users who want to use W&B to monitor training runs, compare experiments, and version the models and datasets produced by spacy train.

Sign up and create an API key

An API key authenticates your machine to W&B. You can generate an API key from your user profile.

For a more streamlined approach, create an API key by going directly to User Settings. Copy the newly created API key immediately and save it in a secure location such as a password manager.

1. Click your user profile icon in the upper right corner.
2. Select User Settings, then scroll to the API Keys section.

Install the `wandb` library and log in

To install the wandb library locally and log in:

Command Line

Set the WANDB_API_KEY environment variable to your API key.
```
export WANDB_API_KEY=[YOUR-API-KEY]
```
Install the wandb library and log in.
```
pip install wandb

wandb login
```

Python

```
pip install wandb
```
```
import wandb
wandb.login()
```

Python notebook

```
!pip install wandb

import wandb
wandb.login()
```
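If you script the login step, it can help to fail early when the key is missing. The following is a minimal sketch, not part of the W&B API: `read_api_key` and `login_from_environment` are hypothetical helper names. It assumes the key is stored in `WANDB_API_KEY`, as in the Command Line steps, and passes it to `wandb.login` through its `key` argument.

```python
import os


def read_api_key(environ=os.environ):
    """Return the API key from WANDB_API_KEY, or raise if it is missing or blank."""
    key = environ.get("WANDB_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "WANDB_API_KEY is not set. Create an API key in User Settings "
            "and export it before training."
        )
    return key


def login_from_environment(environ=os.environ):
    """Log in to W&B with the key found in the environment (hypothetical helper)."""
    # Imported lazily so read_api_key can be used without wandb installed.
    import wandb

    return wandb.login(key=read_api_key(environ))
```

`wandb.login()` already reads `WANDB_API_KEY` on its own, so this wrapper only adds an explicit error message for the case where the variable was never set.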

Add the `WandbLogger` to your spaCy config file

spaCy config files specify all aspects of training, not only logging: GPU allocation, optimizer choice, dataset paths, and more. At a minimum, under [training.logger] you need to provide the key @loggers with the value "spacy.WandbLogger.v3", plus a project_name.

For more on how spaCy training config files work and on other options you can pass in to customize training, see spaCy’s documentation.

```
[training.logger]
@loggers = "spacy.WandbLogger.v3"
project_name = "my_spacy_project"
remove_config_values = ["paths.train", "paths.dev", "corpora.train.path", "corpora.dev.path"]
log_dataset_dir = "./corpus"
model_log_interval = 1000
```
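If you generate config files from a script, the logger block above can be rendered from a Python dictionary. This is a rough sketch rather than a spaCy or W&B utility: `render_logger_block` is a hypothetical helper, and it assumes that config values are written in JSON style (double-quoted strings, bracketed lists, bare integers), which matches the example above.

```python
import json


def render_logger_block(project_name, **options):
    """Render a [training.logger] block as text (hypothetical helper)."""
    settings = {
        "@loggers": "spacy.WandbLogger.v3",
        "project_name": project_name,
    }
    settings.update(options)
    lines = ["[training.logger]"]
    for key, value in settings.items():
        # json.dumps yields the quoting used in the example config above.
        lines.append(f"{key} = {json.dumps(value)}")
    return "\n".join(lines)


block = render_logger_block(
    "my_spacy_project",
    remove_config_values=[
        "paths.train",
        "paths.dev",
        "corpora.train.path",
        "corpora.dev.path",
    ],
    log_dataset_dir="./corpus",
    model_log_interval=1000,
)
print(block)
```

The printed text can be pasted into, or appended to, an existing config file that does not already define a [training.logger] section.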

The following table describes the WandbLogger configuration options:

| Name | Description |
| --- | --- |
| `project_name` | `str`. The name of the W&B project. W&B creates the project automatically if it doesn’t exist yet. |
| `remove_config_values` | `List[str]`. A list of values to exclude from the config before W&B uploads it. `[]` by default. |
| `model_log_interval` | `Optional[int]`. `None` by default. If set, enables model versioning with artifacts. Pass in the number of steps to wait between logging model checkpoints. |
| `log_dataset_dir` | `Optional[str]`. If you pass a path, W&B uploads the dataset as an artifact at the beginning of training. `None` by default. |
| `entity` | `Optional[str]`. If passed, W&B creates the run in the specified entity. |
| `run_name` | `Optional[str]`. If specified, W&B creates the run with the specified name. |
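To make two of these options more concrete, here is an illustrative sketch of what they describe. It is not the logger's actual implementation. `remove_dotted_keys` and `should_log_checkpoint` are hypothetical names, and the sketch assumes that `remove_config_values` entries are dot-separated paths into the nested config and that checkpoints are logged on steps that are positive multiples of `model_log_interval`.

```python
from copy import deepcopy


def remove_dotted_keys(config, dotted_keys):
    """Return a copy of `config` without the entries named in dot notation."""
    cleaned = deepcopy(config)
    for dotted in dotted_keys:
        *parents, leaf = dotted.split(".")
        node = cleaned
        for name in parents:
            node = node.get(name)
            if not isinstance(node, dict):
                break  # path does not exist, nothing to remove
        else:
            node.pop(leaf, None)
    return cleaned


def should_log_checkpoint(step, model_log_interval):
    """Assumed rule: log on positive steps that are multiples of the interval."""
    if not model_log_interval:
        return False  # None (the default) disables model versioning
    return step > 0 and step % model_log_interval == 0


config = {
    "paths": {"train": "./train", "dev": "./dev"},
    "corpora": {
        "train": {"path": "${paths.train}"},
        "dev": {"path": "${paths.dev}"},
    },
    "training": {"seed": 0},
}
cleaned = remove_dotted_keys(
    config,
    ["paths.train", "paths.dev", "corpora.train.path", "corpora.dev.path"],
)
print(cleaned)
```

Removing path values this way keeps machine-specific file locations out of the uploaded config while leaving the rest of the settings intact.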

Start training

With the WandbLogger added to your spaCy training config, you can run spacy train as usual and W&B captures the run automatically.

Command Line or Python

```
python -m spacy train \
    config.cfg \
    --output ./output \
    --paths.train ./train \
    --paths.dev ./dev
```

Python notebook

```
!python -m spacy train \
    config.cfg \
    --output ./output \
    --paths.train ./train \
    --paths.dev ./dev
```
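If you launch training from a Python script, one option is to assemble the same command and run it as a subprocess. The sketch below assumes this approach. `build_train_command` and `run_training` are hypothetical helper names, and the flags are the ones shown in the command above.

```python
import subprocess
import sys


def build_train_command(config_path, output_dir, overrides=None):
    """Build the argument list for `python -m spacy train` (hypothetical helper)."""
    command = [
        sys.executable, "-m", "spacy", "train",
        config_path,
        "--output", output_dir,
    ]
    # Config overrides such as paths.train are passed as --<dotted.key> <value>.
    for key, value in (overrides or {}).items():
        command.extend([f"--{key}", str(value)])
    return command


def run_training(config_path, output_dir, overrides=None):
    """Run spaCy training and raise CalledProcessError if it fails."""
    command = build_train_command(config_path, output_dir, overrides)
    return subprocess.run(command, check=True)


# Example call, equivalent to the command shown above:
# run_training(
#     "config.cfg",
#     "./output",
#     {"paths.train": "./train", "paths.dev": "./dev"},
# )
```

Using `sys.executable` keeps the subprocess in the same Python environment as the calling script, so it sees the same spaCy and wandb installation.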

When training begins, spaCy outputs a link to your training run’s W&B page, which takes you to this run’s experiment tracking dashboard in the W&B web UI.
