spaCy is an NLP library that provides fast, accurate models. As of spaCy v3, you can use W&B with spacy train to track your spaCy model’s training metrics and to save and version your models and datasets. All it takes is a few added lines in your configuration. This page is for spaCy users who want to use W&B to monitor training runs, compare experiments, and version the models and datasets produced by spacy train.

Sign up and create an API key

An API key authenticates your machine to W&B. To generate an API key from your user profile:
  1. Click your user profile icon in the upper right corner.
  2. Select User Settings, then scroll to the API Keys section.
Alternatively, go directly to User Settings and create the API key there. Copy the newly created API key immediately and save it in a secure location such as a password manager.

Install the wandb library and log in

To install the wandb library locally and log in:
  1. Set the WANDB_API_KEY environment variable to your API key.
    export WANDB_API_KEY=[YOUR-API-KEY]
    
  2. Install the wandb library and log in.
    pip install wandb
    
    wandb login
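Before starting a long training run, it can help to confirm that the key from step 1 is visible to the process that will launch training. The following is a minimal sketch using only the Python standard library; the helper name `wandb_key_status` is hypothetical and is not part of the wandb library.

```python
import os


def wandb_key_status(env=None):
    """Report whether WANDB_API_KEY is present and non-empty.

    `env` defaults to the current process environment; passing a plain
    dict makes the check easy to try out without touching real settings.
    """
    env = os.environ if env is None else env
    key = env.get("WANDB_API_KEY", "").strip()
    return "set" if key else "missing"


if __name__ == "__main__":
    # Prints "set" or "missing"; the key itself is never printed.
    print(f"WANDB_API_KEY is {wandb_key_status()}")
```

The check only looks at the environment variable. It does not contact W&B, so it cannot tell whether the key is valid.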
    

Add the WandbLogger to your spaCy config file

spaCy config files specify all aspects of training (GPU allocation, optimizer choice, dataset paths, and more), not only logging. At minimum, under [training.logger], set the @loggers key to "spacy.WandbLogger.v3" and provide a project_name.
For more on how spaCy training config files work and on other options you can pass in to customize training, see spaCy’s documentation.
[training.logger]
@loggers = "spacy.WandbLogger.v3"
project_name = "my_spacy_project"
remove_config_values = ["paths.train", "paths.dev", "corpora.train.path", "corpora.dev.path"]
log_dataset_dir = "./corpus"
model_log_interval = 1000
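As a quick sanity check on the block above, the following sketch reads it with Python's standard library and reports whether the two minimal keys, @loggers and project_name, are present. This is only an approximation: spaCy reads config files with its own parser, which supports syntax that this sketch ignores. The function names `read_logger_settings` and `missing_required` are hypothetical.

```python
import configparser
import json

# The logger block from this page, embedded as a string for illustration.
LOGGER_BLOCK = """
[training.logger]
@loggers = "spacy.WandbLogger.v3"
project_name = "my_spacy_project"
remove_config_values = ["paths.train", "paths.dev", "corpora.train.path", "corpora.dev.path"]
log_dataset_dir = "./corpus"
model_log_interval = 1000
"""


def read_logger_settings(text):
    """Parse the [training.logger] section into a dict of Python values."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key names exactly as written
    parser.read_string(text)
    section = parser["training.logger"]
    # Assumption: every value in this block is JSON-compatible, which holds
    # for the strings, list, and integer shown here.
    return {key: json.loads(value) for key, value in section.items()}


def missing_required(settings):
    """Return the minimal keys that are absent from the settings."""
    required = ("@loggers", "project_name")
    return [name for name in required if name not in settings]


settings = read_logger_settings(LOGGER_BLOCK)
print(missing_required(settings))  # prints []
```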
The following table describes the WandbLogger configuration options:

| Name | Description |
| --- | --- |
| project_name | str. The name of the W&B project. W&B creates the project automatically if it doesn’t exist yet. |
| remove_config_values | List[str]. A list of values to exclude from the config before W&B uploads it. [] by default. |
| model_log_interval | Optional int. None by default. If set, enables model versioning with artifacts. Pass in the number of steps to wait between logging model checkpoints. |
| log_dataset_dir | Optional str. None by default. If you pass a path, W&B uploads the dataset as an artifact at the beginning of training. |
| entity | Optional str. If passed, W&B creates the run in the specified entity. |
| run_name | Optional str. If specified, W&B creates the run with the specified name. |
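To make remove_config_values more concrete, the sketch below shows one way that excluding dotted keys such as paths.train from a nested config could work. It illustrates the idea only and is not the logger's actual implementation. The helper names are hypothetical, and silently ignoring keys that are not present is an assumption of this sketch.

```python
def flatten(config, prefix=""):
    """Turn a nested dict into {"a.b.c": value} form (empty dicts vanish)."""
    flat = {}
    for key, value in config.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, dotted))
        else:
            flat[dotted] = value
    return flat


def unflatten(flat):
    """Rebuild a nested dict from dotted keys."""
    nested = {}
    for dotted, value in flat.items():
        *parents, leaf = dotted.split(".")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


def drop_values(config, dotted_keys):
    """Return a copy of config without the listed dotted keys."""
    flat = flatten(config)
    for key in dotted_keys:
        flat.pop(key, None)  # assumption: unknown keys are ignored
    return unflatten(flat)


# Hypothetical config with local paths that should not be uploaded.
config = {
    "paths": {"train": "/home/me/train.spacy", "dev": "/home/me/dev.spacy"},
    "training": {"max_steps": 20000},
}
cleaned = drop_values(config, ["paths.train", "paths.dev"])
print(cleaned)  # prints {'training': {'max_steps': 20000}}
```

A typical reason to exclude values like these is that local file paths are rarely useful to collaborators and may reveal details about your machine.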

Start training

With the WandbLogger added to your spaCy training config, you can run spacy train as usual and W&B captures the run automatically.
python -m spacy train \
    config.cfg \
    --output ./output \
    --paths.train ./train \
    --paths.dev ./dev
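If you launch training from a Python script instead of a shell, the same command can be assembled as an argument list and passed to subprocess. The sketch below assumes spaCy is installed in the current interpreter's environment. The helper names `build_train_command` and `run_training` are hypothetical, and the paths are the same placeholders used in the command above.

```python
import subprocess
import sys


def build_train_command(config_path, output_dir, overrides):
    """Build the argument list for `python -m spacy train`."""
    command = [sys.executable, "-m", "spacy", "train", config_path,
               "--output", output_dir]
    # Each override becomes a "--section.key value" pair, as in the CLI call.
    for key, value in overrides.items():
        command += [f"--{key}", str(value)]
    return command


def run_training(command):
    """Run the training command and raise if it exits with an error."""
    subprocess.run(command, check=True)


command = build_train_command(
    "config.cfg",
    "./output",
    {"paths.train": "./train", "paths.dev": "./dev"},
)
print(" ".join(command))
# Call run_training(command) to start the run.
```

Building the argument list separately from running it makes it easy to log or inspect the exact command before a long training job starts.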
When training begins, spaCy outputs a link to your training run’s W&B page, which takes you to this run’s experiment tracking dashboard in the W&B web UI.