Instructions to use Youssofal/MiniMax-M2.7-abliterated-BF16 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use Youssofal/MiniMax-M2.7-abliterated-BF16 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Youssofal/MiniMax-M2.7-abliterated-BF16", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Youssofal/MiniMax-M2.7-abliterated-BF16", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Youssofal/MiniMax-M2.7-abliterated-BF16", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use Youssofal/MiniMax-M2.7-abliterated-BF16 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Youssofal/MiniMax-M2.7-abliterated-BF16"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Youssofal/MiniMax-M2.7-abliterated-BF16",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/Youssofal/MiniMax-M2.7-abliterated-BF16

SGLang

How to use Youssofal/MiniMax-M2.7-abliterated-BF16 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Youssofal/MiniMax-M2.7-abliterated-BF16" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Youssofal/MiniMax-M2.7-abliterated-BF16",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Youssofal/MiniMax-M2.7-abliterated-BF16" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Youssofal/MiniMax-M2.7-abliterated-BF16",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use Youssofal/MiniMax-M2.7-abliterated-BF16 with Docker Model Runner:
```
docker model run hf.co/Youssofal/MiniMax-M2.7-abliterated-BF16
```

Abliteration Text Issues

by LittleNicky55 - opened Apr 23

Discussion

LittleNicky55

Apr 23

More of an FYI than anything: the model works well but sometimes confuses 0 and O, probably an artifact of the ARA method. I've tried to rebuild some of the tensors manually but haven't been able to isolate the issue myself.

Youssofal

Owner Apr 23

I'll look into it, thanks for letting me know.

LittleNicky55

Apr 24

The 0/O thing is actually part of a broader pattern where the model confuses visually similar characters: 0 & O, 1 & I, 5 & S. It shows up constantly in anything with numbers, ports, IPs, identifiers.

Some examples:

'8080' comes out as '8O8O' in the thinking blocks
'ed25519' turned into 'ed2S5I9'
Port '10022' got truncated to '1022' (lost a digit entirely)
Shows up in both thinking and output, though the model sometimes self-corrects in output
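
For anyone who wants to count these swaps in their own logs, here is a minimal sketch. It assumes you have the expected string and the model's string side by side; `find_confusions` and the `CONFUSABLE` table are hypothetical helpers written for this post, not part of any library.

```python
from __future__ import annotations

# Look-alike pairs taken from the examples above; extend as needed.
CONFUSABLE = {"0": "O", "1": "I", "5": "S"}
# Unordered pairs, so O -> 0 is caught as well as 0 -> O.
PAIRS = {frozenset(pair) for pair in CONFUSABLE.items()}


def find_confusions(expected: str, actual: str) -> list[tuple[int, str, str]]:
    """Return (index, expected_char, actual_char) for each look-alike swap.

    Only handles equal-length strings. A dropped or inserted character
    (such as '10022' -> '1022') raises ValueError instead.
    """
    if len(expected) != len(actual):
        raise ValueError("length mismatch: a character was dropped or inserted")
    return [
        (i, e, a)
        for i, (e, a) in enumerate(zip(expected, actual))
        if e != a and frozenset((e, a)) in PAIRS
    ]


print(find_confusions("8080", "8O8O"))        # [(1, '0', 'O'), (3, '0', 'O')]
print(find_confusions("ed25519", "ed2S5I9"))  # [(3, '5', 'S'), (5, '1', 'I')]
```

The truncation case is deliberately reported as an error, since it is a different failure from a one-for-one swap.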

I did a bunch of A/B testing to isolate the cause. The base MiniMax M2.7 has zero confusion at temp=1.0, while your BF16 upload and my FP8 requant both show the same confusion. So it's definitely not a quantization artifact; it's coming from the ARA process itself.

MiniMax uses a byte-level tokenizer where every digit is its own token (0 is token 48, O is token 79). The abliteration seems to have pushed their internal representations close enough together that sampling picks the wrong one some percentage of the time. Lowering temperature to 0.6 helps but doesn't fix it.
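
A toy illustration of why lowering the temperature helps without fixing it. The logit values below are made up for illustration, not measured from the model, and `softmax_with_temperature` is a hypothetical helper. The only checked fact is that the IDs quoted above coincide with the ASCII code points of the two characters.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Softmax over logits / temperature, shifted by the max for stability."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# The token IDs quoted above match the ASCII code points.
assert ord("0") == 48 and ord("O") == 79

# Assumed logits: the correct '0' is only slightly ahead of 'O'.
logits = {"0": 2.0, "O": 1.0}

for temperature in (1.0, 0.6):
    probs = softmax_with_temperature(list(logits.values()), temperature)
    p_wrong = probs[1]  # probability of sampling 'O' instead of '0'
    print(f"temp={temperature}: P('O') = {p_wrong:.3f}")
```

With these assumed logits the chance of picking the wrong character drops from roughly 27% to roughly 16% at temp=0.6, but it stays well above zero as long as the two logits remain close.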

I tried restoring all 256 expert w2 weights from the base model, but that didn't help. The o_proj modifications contribute too. So it's not isolated to one set of tensors; the damage is spread across the tensors that ARA touched in layers 30-51.
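
A minimal sketch of this kind of selective restore, assuming both checkpoints are available as name-to-tensor mappings. The tensor key names and the `restore_from_base` helper are hypothetical and only show the shape of the experiment; the real MiniMax key names may differ, and plain strings stand in for tensors here.

```python
import re

# Matches "...layers.<N>.<rest>" and captures the layer index and the suffix.
LAYER_KEY = re.compile(r"layers\.(\d+)\.(.+)$")


def restore_from_base(ablated, base, layers, wants):
    """Copy `ablated`, replacing selected tensors with their `base` versions.

    ablated, base: mappings from tensor name to tensor (any value type).
    layers: container of layer indices to touch.
    wants: predicate on the part of the name after the layer index.
    Returns (merged mapping, sorted list of restored names).
    """
    merged = dict(ablated)
    restored = []
    for name in ablated:
        match = LAYER_KEY.search(name)
        if not match or name not in base:
            continue
        layer, suffix = int(match.group(1)), match.group(2)
        if layer in layers and wants(suffix):
            merged[name] = base[name]
            restored.append(name)
    return merged, sorted(restored)


# Toy stand-ins for real checkpoints; the key names are hypothetical.
base = {
    "model.layers.30.experts.0.w2.weight": "base-w2",
    "model.layers.30.self_attn.o_proj.weight": "base-o",
    "model.layers.5.experts.0.w2.weight": "base-early",
}
ablated = {name: "ablated" for name in base}

merged, restored = restore_from_base(
    ablated,
    base,
    layers=range(30, 52),  # layers 30-51 inclusive
    wants=lambda s: s.endswith("w2.weight") or s.endswith("o_proj.weight"),
)
print(restored)
```

Changing the `wants` predicate to select one tensor group at a time is one way to bisect which group contributes most.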

hfmefi69

28 days ago

hi Youssofal,
thanks for your work and dedication. Are there any plans to redo the abliteration? Many people are eager to try a new version of it.

Thanks
