Instructions for using nvidia/Nemotron-Content-Safety-Reasoning-4B with libraries, inference providers, notebooks, and local apps.

Libraries

How to use nvidia/Nemotron-Content-Safety-Reasoning-4B with Transformers:

```
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="nvidia/Nemotron-Content-Safety-Reasoning-4B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("nvidia/Nemotron-Content-Safety-Reasoning-4B", dtype="auto")
```

Local Apps

vLLM

How to use nvidia/Nemotron-Content-Safety-Reasoning-4B with vLLM:

Install from pip and serve model

```
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "nvidia/Nemotron-Content-Safety-Reasoning-4B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nvidia/Nemotron-Content-Safety-Reasoning-4B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
```
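
The same request can also be sent from Python. The following is a minimal sketch using only the standard library; it assumes the vLLM server from the commands above is listening on `localhost:8000`, and the helper names `build_chat_request` and `chat` are illustrative, not part of vLLM or any SDK.

```python
import json
import urllib.request

# Assumption: a server started with `vllm serve` is reachable at this address.
BASE_URL = "http://localhost:8000/v1"
MODEL = "nvidia/Nemotron-Content-Safety-Reasoning-4B"


def build_chat_request(content, base_url=BASE_URL, model=MODEL):
    """Build (without sending) a POST request for the chat completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(content, timeout=60.0):
    """Send the request and return the text of the first choice."""
    request = build_chat_request(content)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("What is the capital of France?")` sends the same payload as the curl call. For the SGLang commands on this page, passing `base_url="http://localhost:30000/v1"` to `build_chat_request` should target that server instead.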

Use Docker

```
docker model run hf.co/nvidia/Nemotron-Content-Safety-Reasoning-4B
```

SGLang

How to use nvidia/Nemotron-Content-Safety-Reasoning-4B with SGLang:

Install from pip and serve model

```
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "nvidia/Nemotron-Content-Safety-Reasoning-4B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nvidia/Nemotron-Content-Safety-Reasoning-4B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
```

Use Docker images

```
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "nvidia/Nemotron-Content-Safety-Reasoning-4B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nvidia/Nemotron-Content-Safety-Reasoning-4B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
```

Docker Model Runner
How to use nvidia/Nemotron-Content-Safety-Reasoning-4B with Docker Model Runner:
```
docker model run hf.co/nvidia/Nemotron-Content-Safety-Reasoning-4B
```

multilingual support?

by AirAgentSDE - opened Jan 30

Discussion

AirAgentSDE

Jan 30

Hi, thanks for your great work!
I am considering deploying it with NeMo Guardrails for my application. I noticed that the training data is primarily English. Have you tested its performance on multilingual datasets?

trebedea

NVIDIA org Feb 18

Hi @AirAgentSDE,

It's great that you find our model useful for your app. We are planning a subsequent release that is multilingual and multimodal, but the current version is mainly intended for English.
I cannot recommend using it for non-English languages without proper testing, as the model has only been trained on English text.

Can you share your use case and language?
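
As a rough illustration of what "proper testing" on another language could look like, here is a minimal sketch that scores a classifier per language on a small labeled set. Everything in it is an assumption for illustration: `classify` is a hypothetical callable wrapping whatever call returns the model's verdict, and the label strings are whatever your own test set uses. It does not reflect how the model was evaluated.

```python
from collections import defaultdict


def per_language_accuracy(samples, classify):
    """Compute accuracy per language.

    samples:  iterable of (language, text, expected_label) tuples.
    classify: hypothetical callable mapping text to a predicted label.
    Returns a dict mapping each language to its fraction of correct predictions.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for language, text, expected in samples:
        total[language] += 1
        if classify(text) == expected:
            correct[language] += 1
    return {language: correct[language] / total[language] for language in total}
```

Comparing the score for the target language against the English score on translated versions of the same prompts gives a first signal of how much quality is lost outside English.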

AirAgentSDE

Feb 26

I am currently developing an LLM guard for enterprise applications, with plans to extend it later to an agent guard. Given our diverse use cases, I am interested in a model that can adapt to custom policies; for now that is gpt-oss-safeguard:20b/120b. However, due to latency concerns, a smaller model would be preferable for speed-sensitive scenarios.
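
To put numbers on the latency concern, a small timing harness can compare candidate guard models on the same prompts. This is only a sketch: `guard` is a hypothetical callable that sends one prompt to whichever model is being measured, and the reported statistics are a design choice, not a standard benchmark.

```python
import statistics
import time


def measure_latency(guard, prompts):
    """Time guard(prompt) for each prompt and return summary stats in milliseconds.

    guard is a hypothetical callable; its return value is ignored here.
    """
    timings_ms = []
    for prompt in prompts:
        start = time.perf_counter()
        guard(prompt)
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    if not timings_ms:
        raise ValueError("prompts must contain at least one item")
    return {
        "n": len(timings_ms),
        "median_ms": statistics.median(timings_ms),
        "max_ms": max(timings_ms),
    }
```

Running it once per candidate model with identical prompts keeps the comparison fair; end-to-end timings like these include network and server queueing time, not just model inference.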

AirAgentSDE

20 days ago

Additionally, the primary language in our use case is Simplified Chinese.
