Hugging Face Transformers - Weights & Biases Documentation

The Hugging Face Transformers library makes NLP models like BERT and training techniques like mixed precision and gradient checkpointing easy to use. The W&B integration adds experiment tracking and model versioning on top of it. This guide shows you how to connect the Hugging Face Trainer to W&B so that your training runs automatically log metrics, model checkpoints, and evaluation outputs to a centralized dashboard. By the end, you’ll be able to compare runs, save and reload model checkpoints from W&B Artifacts, and customize logging for your own workflows. This guide assumes you’re already familiar with training models using the Hugging Face Transformers Trainer.

Quick start

import os

os.environ["WANDB_PROJECT"] = "[MY-PROJECT-NAME]"  # name your W&B project
os.environ["WANDB_LOG_MODEL"] = "checkpoint"  # log all model checkpoints

from transformers import TrainingArguments, Trainer

args = TrainingArguments(..., report_to="wandb")  # turn on W&B logging
trainer = Trainer(..., args=args)

If you’d rather dive straight into working code, check out this Google Colab.

Get started: track experiments

This section walks you through authenticating to W&B, installing the client library, naming your project, and turning on logging in your Trainer so that your first training run shows up in the W&B Dashboard.

Sign up and create an API key

An API key authenticates your machine to W&B. You can generate an API key from your user profile:

Click your user profile icon in the upper right corner.
Select User Settings, then scroll to the API Keys section.

For a more streamlined approach, create an API key by going directly to User Settings. Copy the newly created API key immediately and save it in a secure location such as a password manager.

Install the `wandb` library and log in

To install the wandb library locally and log in:

Command Line

Set the WANDB_API_KEY environment variable to your API key.
```
export WANDB_API_KEY=[YOUR-API-KEY]
```
Install the wandb library and log in.
```
pip install wandb

wandb login
```

Python

pip install wandb

import wandb
wandb.login()

Python notebook

!pip install wandb

import wandb
wandb.login()

If you’re using W&B for the first time, check out the quickstart.

Name the project

A W&B Project stores all of the charts, data, and models logged from related runs. Naming your project helps you organize your work and keep all the information about a single project in one place. To add a run to a project, set the WANDB_PROJECT environment variable to the name of your project. The WandbCallback picks up this project name environment variable and uses it when setting up your run.

Command Line

WANDB_PROJECT=amazon_sentiment_analysis

Python

import os
os.environ["WANDB_PROJECT"] = "amazon_sentiment_analysis"

Python notebook

%env WANDB_PROJECT=amazon_sentiment_analysis

Make sure you set the project name before you initialize the Trainer.

If you don’t specify a project name, the project name defaults to huggingface.
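The fallback described above can be pictured with a small sketch that uses only the standard library. The helper name `resolve_project_name` is hypothetical and is not part of `wandb` or Transformers; it only mirrors the documented behavior of reading WANDB_PROJECT and falling back to huggingface, and it can be handy for confirming which project a run will land in before you start training.

```python
import os


def resolve_project_name(env=None, default="huggingface"):
    """Return the project name a run would use under the documented fallback.

    Hypothetical helper for illustration; not a wandb or Transformers API.
    """
    # Use the real process environment unless a mapping is passed in.
    env = os.environ if env is None else env
    # An unset variable falls back to the default project name.
    return env.get("WANDB_PROJECT", default)


# Check the project name before you initialize the Trainer.
print(resolve_project_name({"WANDB_PROJECT": "amazon_sentiment_analysis"}))
print(resolve_project_name({}))
```

Passing a plain dictionary instead of `os.environ` keeps the check free of side effects, which is the main design choice in this sketch.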

Log your training runs to W&B

When you define your Trainer training arguments, either inside your code or from the command line, set report_to to "wandb" to enable logging with W&B. Without this setting, the Trainer doesn’t send any data to W&B. The logging_steps argument in TrainingArguments controls how often training metrics are pushed to W&B during training. You can also give a name to the training run in W&B using the run_name argument. That’s it. Your models now log losses, evaluation metrics, model topology, and gradients to W&B while they train.

Command Line

# Run your Python script with logging to W&B enabled.
# --run_name sets the name of the W&B run (optional).
# Append any other command line arguments to the command.
python run_glue.py \
  --report_to wandb \
  --run_name bert-base-high-lr

Python

from transformers import TrainingArguments, Trainer

args = TrainingArguments(
    # other args and kwargs here
    report_to="wandb",  # enable logging to W&B
    run_name="bert-base-high-lr",  # name of the W&B run (optional)
    logging_steps=1,  # how often to log to W&B
)

trainer = Trainer(
    # other args and kwargs here
    args=args,  # your training args
)

trainer.train()  # start training and logging to W&B

Using TensorFlow? Swap the PyTorch Trainer for the TensorFlow TFTrainer.

Turn on model checkpointing

In addition to logging metrics, you can save the trained model weights themselves to W&B so they can be versioned, downloaded, and shared across your team. With Artifacts, you can store up to 100 GB of models and datasets for free and then use the W&B Registry. With Registry, you can register models to explore and evaluate them, prepare them for staging, or deploy them in your production environment. To log your Hugging Face model checkpoints to Artifacts, set the WANDB_LOG_MODEL environment variable to one of:

checkpoint: Upload a checkpoint every args.save_steps from the TrainingArguments.
end: Upload the model at the end of training, if load_best_model_at_end is also set.
false: Don’t upload the model.
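Because the setting is read from an environment variable, a mistyped value is easy to miss. The following is a minimal sketch of a pre-flight check, assuming you want to accept only the three values listed above. The names `ALLOWED_LOG_MODEL_VALUES` and `check_log_model_setting` are hypothetical, and the check may be stricter than the integration itself.

```python
import os

# The three values listed above. Hypothetical constant, not a wandb API.
ALLOWED_LOG_MODEL_VALUES = {"checkpoint", "end", "false"}


def check_log_model_setting(env=None):
    """Return the WANDB_LOG_MODEL value, raising if it is not a listed value.

    Hypothetical helper for illustration; not a wandb or Transformers API.
    """
    env = os.environ if env is None else env
    # Treat an unset variable as "false" (no model upload).
    value = env.get("WANDB_LOG_MODEL", "false")
    if value not in ALLOWED_LOG_MODEL_VALUES:
        raise ValueError(
            f"Unexpected WANDB_LOG_MODEL value {value!r}; "
            f"expected one of {sorted(ALLOWED_LOG_MODEL_VALUES)}"
        )
    return value


print(check_log_model_setting({"WANDB_LOG_MODEL": "checkpoint"}))
```

A check like this catches values that carry stray quotation marks or whitespace, which can happen when the variable is set from a notebook or a shell script.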

Command Line

WANDB_LOG_MODEL="checkpoint"

Python

import os

os.environ["WANDB_LOG_MODEL"] = "checkpoint"

Python notebook

%env WANDB_LOG_MODEL=checkpoint

Any Transformers Trainer you initialize from now on uploads models to your W&B project. The model checkpoints you log are viewable through the Artifacts UI, and include the full model lineage. See an example model checkpoint in the Artifacts UI.

By default, your model saves to W&B Artifacts as model-{run_id} when WANDB_LOG_MODEL is set to end or checkpoint-{run_id} when WANDB_LOG_MODEL is set to checkpoint. However, if you pass a run_name in your TrainingArguments, the model saves as model-{run_name} or checkpoint-{run_name}.
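The naming rule above can be written out as a short sketch, which is useful when you need to build the Artifact name to download later. The function `expected_artifact_name` is a hypothetical helper that restates the rule from this section; it is not part of `wandb` or Transformers.

```python
def expected_artifact_name(log_model, run_id, run_name=None):
    """Build the Artifact name described above for a given logging mode.

    Hypothetical helper for illustration; not a wandb or Transformers API.
    """
    # "end" saves a model-... Artifact, "checkpoint" saves a checkpoint-... one.
    prefixes = {"end": "model", "checkpoint": "checkpoint"}
    if log_model not in prefixes:
        raise ValueError(f"No Artifact is logged for WANDB_LOG_MODEL={log_model!r}")
    # A run_name, when given, replaces the run ID in the Artifact name.
    suffix = run_name if run_name else run_id
    return f"{prefixes[log_model]}-{suffix}"


print(expected_artifact_name("end", "abc123"))
print(expected_artifact_name("checkpoint", "abc123", run_name="bert-base-high-lr"))
```

The run ID `abc123` is a placeholder; use the ID shown for your run in the W&B workspace.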

W&B Registry

After you log your checkpoints to Artifacts, you can register your best model checkpoints and centralize them across your team with Registry. With Registry, you can organize your best models by task, manage the lifecycles of models, track and audit the entire ML lifecycle, and automate downstream actions. To link a model Artifact, refer to Registry.

Visualize evaluation outputs during training

Visualizing your model outputs during training or evaluation is often essential to understand how your model trains. Inspecting concrete predictions alongside loss curves helps you spot quality issues that aggregate metrics can hide. Using the callbacks system in the Transformers Trainer, you can log more helpful data to W&B Tables. This includes your models’ text generation outputs or other predictions. For a full guide on how to log evaluation outputs while training to a W&B Table like the following, see Log and view evaluation samples during training.

Figure: A W&B Table showing evaluation outputs.

Finish your W&B run (notebook only)

If your training is encapsulated in a Python script, the W&B run ends when your script finishes. If you’re using a Jupyter or Google Colab notebook, call run.finish() to signal that training is complete.

import wandb

run = wandb.init()
trainer.train()  # start training and logging to W&B

# post-training analysis, testing, other logged code

run.finish()
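If training raises an exception in a notebook, the call to run.finish() is never reached and the run stays open. One way to guard against that is a try/finally wrapper, sketched below. The function `train_and_finish` is a hypothetical helper; it only assumes that the run object has a finish() method and the trainer object has a train() method, as in the example above.

```python
def train_and_finish(run, trainer):
    """Run training and always close the W&B run, even if training raises.

    Hypothetical helper for illustration; not a wandb or Transformers API.
    """
    try:
        # Start training and logging to W&B.
        return trainer.train()
    finally:
        # Runs on success and on failure, so the run is always marked complete.
        run.finish()


# In a notebook, assuming `wandb` is imported and `trainer` is already built:
# train_and_finish(wandb.init(), trainer)
```

If you run post-training analysis that also logs to the same run, move that code inside the try block so it executes before finish().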

Visualize your results

After you log your training results, you can explore them in the W&B Dashboard. You can compare runs, zoom in on findings, and explore your data with interactive visualizations. At this point you have a working integration: your Trainer logs metrics to a named project, optionally saves checkpoints to Artifacts, and surfaces evaluation outputs in the W&B Dashboard.

Advanced features and FAQs

The following sections cover common follow-up tasks, such as saving the best model, resuming training from a checkpoint, customizing logging callbacks, and configuring W&B behavior through environment variables.

Save the best model

If you pass TrainingArguments with load_best_model_at_end=True to your Trainer, W&B saves the best performing model checkpoint to Artifacts. If you save your model checkpoints as Artifacts, you can promote them to the Registry. In Registry, you can:

Organize your best model versions by ML task.
Centralize models and share them with your team.
Stage models for production or bookmark them for further evaluation.
Trigger downstream CI/CD processes.

Load a saved model

If you saved your model to W&B Artifacts with WANDB_LOG_MODEL, you can download your model weights for more training or to run inference. Load them back into the same Hugging Face architecture that you used before.

import wandb
from transformers import AutoModelForSequenceClassification

# Create a new run
with wandb.init(project="amazon_sentiment_analysis") as run:
    # Pass the name and version of Artifact
    my_model_name = "model-bert-base-high-lr:latest"
    my_model_artifact = run.use_artifact(my_model_name)

    # Download model weights to a folder and return the path
    model_dir = my_model_artifact.download()

    # Load your Hugging Face model from that folder using the same model class.
    # num_labels must match the value used when the model was trained.
    model = AutoModelForSequenceClassification.from_pretrained(
        model_dir, num_labels=num_labels
    )

    # Do additional training, or run inference

Resume training from a checkpoint

If you set WANDB_LOG_MODEL='checkpoint', you can resume training from a logged checkpoint. Download the checkpoint Artifact, then pass the downloaded directory as the resume_from_checkpoint argument to trainer.train().

import os

import wandb
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

last_run_id = "xxxxxxxx"  # fetch the run_id from your wandb workspace

# Resume the wandb run from the run_id
with wandb.init(
    project=os.environ["WANDB_PROJECT"],
    id=last_run_id,
    resume="must",
) as run:
    # Connect an Artifact to the run
    my_checkpoint_name = f"checkpoint-{last_run_id}:latest"
    my_checkpoint_artifact = run.use_artifact(my_checkpoint_name)

    # Download checkpoint to a folder and return the path
    checkpoint_dir = my_checkpoint_artifact.download()

    # Reinitialize your model and trainer.
    # num_labels must match the value used in the original run.
    model = AutoModelForSequenceClassification.from_pretrained(
        "[MODEL-NAME]", num_labels=num_labels
    )
    # Add your training arguments here.
    training_args = TrainingArguments()

    trainer = Trainer(model=model, args=training_args)

    # Make sure to use the checkpoint directory to resume training from the checkpoint
    trainer.train(resume_from_checkpoint=checkpoint_dir)

Log and view evaluation samples during training

The WandbCallback in the Transformers library handles logging to W&B through the Transformers Trainer. You can customize this callback to log model predictions, confusion matrices, or other custom data. To do so, subclass WandbCallback and add functionality that uses additional methods from the Trainer class. The following is the general pattern to add this new callback to the Hugging Face Trainer, followed by a code-complete example to log evaluation outputs to a W&B Table:

# Instantiate the Trainer as normal
trainer = Trainer()

# Instantiate the new logging callback, passing it the Trainer object
evals_callback = WandbEvalsCallback(trainer, tokenizer, ...)

# Add the callback to the Trainer
trainer.add_callback(evals_callback)

# Begin Trainer training as normal
trainer.train()

View evaluation samples during training

The following section shows how to customize the WandbCallback to run model predictions and log evaluation samples to a W&B Table during training. This runs every eval_steps using the on_evaluate method of the Trainer callback. The decode_predictions function decodes the predictions and labels from the model output using the tokenizer. Then, the code creates a pandas DataFrame from the predictions and labels and adds an epoch column to the DataFrame. Finally, the code creates a wandb.Table from the DataFrame and logs it to W&B. You can control the frequency of logging by logging the predictions every freq epochs.

Unlike the regular WandbCallback, this custom callback needs to be added to the trainer after the Trainer is instantiated, not during initialization of the Trainer. This is because the Trainer instance is passed to the callback during initialization.

from transformers.integrations import WandbCallback
import pandas as pd


def decode_predictions(tokenizer, predictions):
    labels = tokenizer.batch_decode(predictions.label_ids)
    logits = predictions.predictions.argmax(axis=-1)
    prediction_text = tokenizer.batch_decode(logits)
    return {"labels": labels, "predictions": prediction_text}


class WandbPredictionProgressCallback(WandbCallback):
    """Custom WandbCallback to log model predictions during training.

    This callback logs model predictions and labels to a wandb.Table when
    evaluation runs, every `freq` epochs. It lets you visualize the
    model predictions as the training progresses.

    Attributes:
        trainer (Trainer): The Hugging Face Trainer instance.
        tokenizer (AutoTokenizer): The tokenizer associated with the model.
        sample_dataset (Dataset): A subset of the validation dataset
          for generating predictions.
        num_samples (int, optional): Number of samples to select from
          the validation dataset for generating predictions. Defaults to 100.
        freq (int, optional): Frequency of logging. Defaults to 2.
    """

    def __init__(self, trainer, tokenizer, val_dataset, num_samples=100, freq=2):
        """Initializes the WandbPredictionProgressCallback instance.

        Args:
            trainer (Trainer): The Hugging Face Trainer instance.
            tokenizer (AutoTokenizer): The tokenizer associated
              with the model.
            val_dataset (Dataset): The validation dataset.
            num_samples (int, optional): Number of samples to select from
              the validation dataset for generating predictions.
              Defaults to 100.
            freq (int, optional): Frequency of logging. Defaults to 2.
        """
        super().__init__()
        self.trainer = trainer
        self.tokenizer = tokenizer
        self.sample_dataset = val_dataset.select(range(num_samples))
        self.freq = freq

    def on_evaluate(self, args, state, control, **kwargs):
        super().on_evaluate(args, state, control, **kwargs)
        # control the frequency of logging by logging the predictions
        # every `freq` epochs
        if state.epoch % self.freq == 0:
            # generate predictions
            predictions = self.trainer.predict(self.sample_dataset)
            # decode predictions and labels
            predictions = decode_predictions(self.tokenizer, predictions)
            # add predictions to a wandb.Table
            predictions_df = pd.DataFrame(predictions)
            predictions_df["epoch"] = state.epoch
            records_table = self._wandb.Table(dataframe=predictions_df)
            # log the table to wandb
            self._wandb.log({"sample_predictions": records_table})


# First, instantiate the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=lm_datasets["train"],
    eval_dataset=lm_datasets["validation"],
)

# Instantiate the WandbPredictionProgressCallback
progress_callback = WandbPredictionProgressCallback(
    trainer=trainer,
    tokenizer=tokenizer,
    val_dataset=lm_datasets["validation"],
    num_samples=10,
    freq=2,
)

# Add the callback to the trainer
trainer.add_callback(progress_callback)
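The frequency check inside on_evaluate is easier to reason about in isolation. The sketch below restates the `state.epoch % self.freq == 0` condition as a standalone function, assuming the epoch value is a float as it is in the Trainer state. The name `should_log_predictions` is hypothetical and exists only for this illustration.

```python
def should_log_predictions(epoch, freq):
    """Mirror the `state.epoch % self.freq == 0` check from the callback.

    Hypothetical helper for illustration; not a wandb or Transformers API.
    """
    return epoch % freq == 0


# With freq=2, only evaluations that land exactly on an even epoch log a table.
for epoch in (0.5, 1.0, 2.0, 4.0):
    print(epoch, should_log_predictions(epoch, freq=2))
```

Because the comparison is exact, an evaluation that runs partway through an epoch does not satisfy the condition. If you evaluate by steps rather than by epochs, you may prefer a step-based condition instead.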

For a more detailed example, see this Colab.

Additional W&B settings

You can further configure what is logged with Trainer by setting environment variables. For a full list of W&B environment variables, see the environment variables reference.

| Environment Variable | Usage |
| --- | --- |
| `WANDB_PROJECT` | Give your project a name (`huggingface` by default). |
| `WANDB_LOG_MODEL` | Log the model checkpoint as a W&B Artifact (`false` by default). `false`: No model checkpointing. `checkpoint`: Upload a checkpoint every `args.save_steps` (set in the Trainer’s `TrainingArguments`). `end`: Upload the final model checkpoint at the end of training. |
| `WANDB_WATCH` | Set whether to log your model’s gradients, parameters, or neither (`false` by default). `false`: No gradient or parameter logging. `gradients`: Log histograms of the gradients. `all`: Log histograms of gradients and parameters. |
| `WANDB_DISABLED` | Set to `true` to turn off logging entirely (`false` by default). |
| `WANDB_QUIET` | Set to `true` to limit statements logged to standard output to critical statements only (`false` by default). |
| `WANDB_SILENT` | Set to `true` to silence the output printed by `wandb` (`false` by default). |

Command Line

WANDB_WATCH=all
WANDB_SILENT=true

Notebook

%env WANDB_WATCH=all
%env WANDB_SILENT=true
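In a Python script, you may prefer to set several of these variables in one place before you create the Trainer. The following is a minimal sketch under that assumption. The function `apply_wandb_settings` and its parameters are hypothetical; only the environment variable names come from the table above.

```python
import os


def apply_wandb_settings(project, log_model="false", watch="false",
                         silent=False, env=None):
    """Write a group of W&B settings into an environment mapping.

    Hypothetical helper for illustration; not a wandb or Transformers API.
    """
    # Use the real process environment unless a mapping is passed in.
    env = os.environ if env is None else env
    env["WANDB_PROJECT"] = project
    env["WANDB_LOG_MODEL"] = log_model
    env["WANDB_WATCH"] = watch
    # Environment variables are strings, so convert the boolean explicitly.
    env["WANDB_SILENT"] = "true" if silent else "false"
    return env


# Preview the settings in a plain dict without touching os.environ.
settings = apply_wandb_settings(
    "amazon_sentiment_analysis", log_model="checkpoint", watch="all",
    silent=True, env={},
)
print(settings)
```

Call the helper without the `env` argument to write to `os.environ`, and do so before you initialize the Trainer so the settings are in place when the run is set up.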

Customize `wandb.init()`

The WandbCallback that Trainer uses calls wandb.init() under the hood when Trainer is initialized. Alternatively, you can set up your runs manually by calling wandb.init() before the Trainer is initialized. This gives you full control over your W&B run configuration. The following is an example of what you might pass to init. For wandb.init() details, see the wandb.init() reference.

import wandb

wandb.init(
    project="amazon_sentiment_analysis",
    name="bert-base-high-lr",
    tags=["baseline", "high-lr"],
    group="bert",
)

Additional resources

The following are six Transformers and W&B related articles for further reading.

Get help or request features

For any issues, questions, or feature requests for the Hugging Face W&B integration, post in this thread on the Hugging Face forums or open an issue on the Hugging Face Transformers GitHub repo.
