Instructions to use skyblanket/GLM-5-abliterated with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use skyblanket/GLM-5-abliterated with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="skyblanket/GLM-5-abliterated")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("skyblanket/GLM-5-abliterated")
model = AutoModelForCausalLM.from_pretrained("skyblanket/GLM-5-abliterated")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use skyblanket/GLM-5-abliterated with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "skyblanket/GLM-5-abliterated"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "skyblanket/GLM-5-abliterated",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Use Docker
```bash
docker model run hf.co/skyblanket/GLM-5-abliterated
```
- SGLang
How to use skyblanket/GLM-5-abliterated with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "skyblanket/GLM-5-abliterated" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "skyblanket/GLM-5-abliterated",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Use Docker images
```bash
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "skyblanket/GLM-5-abliterated" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "skyblanket/GLM-5-abliterated",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
- Docker Model Runner
How to use skyblanket/GLM-5-abliterated with Docker Model Runner:
```bash
docker model run hf.co/skyblanket/GLM-5-abliterated
```
Is this abliterated or derestricted?
Is this vanilla abliteration, or have you also applied norm preservation and the biprojection update?
The latter usually results in better quality.
Vanilla, but it still has issues. Are you able to run inference? The weight-direction orthogonalization is done.
My plan was to extract a LoRA from the difference between this model and the vanilla one, through SVD of the weight differences (example: mergekit LoRA extraction).
This way it is possible to launch it coupled with unsloth dynamic 2-bit quants in llama.cpp, as LoRAs can be converted to GGUF files. The problem is the huge disk space needed to compute the difference: I cannot rent a server with a large disk or delete half of my SSD.
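For illustration, here is a minimal NumPy sketch of the SVD step for a single weight matrix. The function name `extract_lora` and the toy matrices are made up for this example, and it does not reproduce how mergekit actually implements extraction. It also assumes the abliteration edit is a single-direction projection `W' = W - d dᵀ W`, in which case the difference has rank 1 and a rank-1 LoRA recovers it almost exactly.

```python
import numpy as np

def extract_lora(w_base, w_tuned, rank):
    """Return (lora_a, lora_b) such that lora_b @ lora_a approximates w_tuned - w_base."""
    delta = w_tuned - w_base
    # Thin SVD: delta = u @ diag(s) @ vt
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    r = min(rank, s.shape[0])
    # Split the singular values evenly between the two factors.
    root = np.sqrt(s[:r])
    lora_b = u[:, :r] * root            # shape (out_features, r)
    lora_a = root[:, None] * vt[:r]     # shape (r, in_features)
    return lora_a, lora_b

# Toy stand-ins for one layer of the vanilla and abliterated models.
rng = np.random.default_rng(0)
w_base = rng.standard_normal((64, 48))

# Assumed edit: project one unit direction d out of the layer's output space.
d = rng.standard_normal(64)
d /= np.linalg.norm(d)
w_tuned = w_base - np.outer(d, d @ w_base)

lora_a, lora_b = extract_lora(w_base, w_tuned, rank=1)
rel_err = np.linalg.norm(w_base + lora_b @ lora_a - w_tuned) / np.linalg.norm(w_tuned)
print(f"relative reconstruction error: {rel_err:.2e}")
```

If the real edit touches more than one direction per matrix, the rank would need to be raised accordingly, and the reconstruction error is a quick way to check whether the chosen rank is enough.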
Hmm, technically, if the weights in each shard correspond exactly to those in the matching shard of the other model, this extraction can be done in a streaming fashion!
Download shard 1 -> download shard 1* -> subtract the shard 1 weights from the shard 1* weights -> extract a LoRA for each weight difference through SVD -> discard the downloaded shards -> repeat for the next shard until all are processed -> save the LoRA -> convert the LoRA to .gguf -> launch the 2-bit unsloth quant with the LoRA -> test the model
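The shard-by-shard loop could look roughly like the sketch below. Everything here is hypothetical: `stream_extract`, `load_base` and `load_tuned` are illustrative names, and the shards are faked as in-memory dicts of NumPy arrays. In a real run the two loaders would download one safetensors shard each and read it from disk, and the shard would be deleted afterwards. The sketch fails loudly if the tensor names or shapes in a shard pair do not line up, which is the assumption the whole plan rests on.

```python
import numpy as np

def svd_lora(delta, rank):
    """Truncated SVD of one weight difference, returned as (A, B) with B @ A ~ delta."""
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    r = min(rank, len(s))
    root = np.sqrt(s[:r])
    return root[:, None] * vt[:r], u[:, :r] * root

def stream_extract(shard_names, load_base, load_tuned, rank):
    """Process one shard pair at a time so only one pair is ever held in memory."""
    lora = {}
    for name in shard_names:
        base, tuned = load_base(name), load_tuned(name)
        if base.keys() != tuned.keys():
            raise ValueError(f"{name}: tensor names differ between the two models")
        for key in base:
            if base[key].shape != tuned[key].shape:
                raise ValueError(f"{name}/{key}: shape mismatch")
            delta = tuned[key] - base[key]
            # Skip non-matrix tensors (norms, biases) and untouched weights.
            if delta.ndim != 2 or not delta.any():
                continue
            lora[key] = svd_lora(delta, rank)
        del base, tuned  # a real run would also delete the shard files here
    return lora

# Fake two-shard checkpoints standing in for the real downloads.
rng = np.random.default_rng(0)
base_shards = {
    "shard-1": {"layer0.w": rng.standard_normal((8, 6)), "layer0.norm": np.ones(6)},
    "shard-2": {"layer1.w": rng.standard_normal((8, 6))},
}
d = np.ones(8) / np.sqrt(8)  # assumed single ablated direction
tuned_shards = {
    name: {k: (v - np.outer(d, d @ v) if v.ndim == 2 else v.copy()) for k, v in shard.items()}
    for name, shard in base_shards.items()
}

lora = stream_extract(base_shards, base_shards.__getitem__, tuned_shards.__getitem__, rank=1)
print(sorted(lora))
```

Peak disk usage in this scheme is two shards plus the accumulated LoRA factors, rather than two full checkpoints.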
Yeah, seems like a solid plan. Though it may need some debugging and reliable failsafe coding 🥴