This guide shows you how to deploy a model artifact from W&B to an NVIDIA NeMo Inference Microservice (NIM) so you can serve the model for scalable inference. To do this, use W&B Launch. W&B Launch converts model artifacts to NVIDIA NeMo Model format and deploys them to a running NIM/Triton server. This lets you take a tracked W&B model directly to a production-ready endpoint without manual conversion. W&B Launch accepts the following compatible model types:Documentation Index
Fetch the complete documentation index at: https://wb-21fd5541-style-guide-models-integrations-20260527-015516.mintlify.app/llms.txt
Use this file to discover all available pages before exploring further.
Deployment time varies by model and machine type. The base
Llama2-7b config takes about 1 minute on Google Cloud’s a2-ultragpu-1g.Quickstart
Follow these steps to create a launch queue, register the deployment job, run an agent, and submit the deployment.-
Create a launch queue if you don’t have one already. The queue defines how the job runs on your GPU machine. See the following example queue configuration.

-
Create this job in your project. This registers the deployment job code with your W&B project so Launch can run it.
-
Launch an agent on your GPU machine. The agent polls the queue and executes the deployment job when you submit it.
-
Submit the deployment launch job with your desired configurations from the Launch UI. You can also submit through the CLI.

-
You can track the deployment process in the Launch UI.

-
After the deployment completes, the NIM/Triton endpoint serves the model and is ready for inference requests. To test the model,
curlthe endpoint. The model name is alwaysensemble.