---
pipeline_tag: text-generation
license: other
license_name: modified-mit
license_link: https://github.com/MiniMax-AI/MiniMax-M2.5/blob/main/LICENSE
library_name: transformers
base_model:
- MiniMaxAI/MiniMax-M2.5
---

W4A16 version of https://huggingface.co/MiniMaxAI/MiniMax-M2.5
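
As a rough illustration of what "W4A16" means (weights stored as 4-bit integers, activations kept in 16-bit floating point), the sketch below quantizes one small group of weights to signed 4-bit values with a shared scale and then dequantizes them. The helper names, the group of four values, and the `max_abs / 7` scale rule are assumptions chosen for illustration; the actual preset in `compressed_tensors` uses its own group size and scale computation.

```python
def quantize_group(weights):
    """Symmetric 4-bit quantization of one weight group (illustrative only)."""
    max_abs = max(abs(w) for w in weights)
    # Assumed scale rule: map the largest magnitude to 7, the top of the int4 range.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the signed 4-bit range [-8, 7].
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale


def dequantize_group(q, scale):
    """Recover approximate weights; activations would stay in 16-bit floats."""
    return [v * scale for v in q]


# Hypothetical weight values for one group.
group = [0.7, -0.3, 0.1, 0.0]
q, scale = quantize_group(group)
restored = dequantize_group(q, scale)
print(q, restored)
```

Each restored weight differs from the original by at most half a quantization step (`scale / 2`), which is the price paid for storing 4 bits per weight instead of 16.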
## Creation

Creation script:

```python
from llmcompressor import model_free_ptq
from compressed_tensors.quantization import (
    QuantizationScheme,
)
from compressed_tensors.quantization.quant_scheme import W4A16

MODEL_ID = "inference-optimization/MiniMax-M2.5-BF16"
SAVE_DIR = "MiniMax-M2.5-W4A16"

model_free_ptq(
    model_stub=MODEL_ID,
    save_directory=SAVE_DIR,
    scheme=QuantizationScheme(
        **W4A16,
        targets=[
            # Target only the expert weight layers
            r"re:.*block_sparse_moe\.experts\.\d+\.w[1-3]$",
            # NOTE: vllm alias also required in config
            r"re:.*mlp\.experts\.\d+\.(gate|up|gate_up|down)_proj$",
        ],
    ),
    ignore=["re:.*self_attn.*", "lm_head"],
    max_workers=8,
)
```
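
To see which modules the `targets` and `ignore` patterns in the creation script would select, the sketch below applies the same regular expressions to a few made-up module names. The module names are hypothetical, the `re:` prefix is stripped by hand, and the use of `re.match` is an assumption about how the patterns are applied; the library's own matching logic may differ in detail.

```python
import re

# Patterns copied from the creation script, without the "re:" prefix.
TARGETS = [
    r".*block_sparse_moe\.experts\.\d+\.w[1-3]$",
    r".*mlp\.experts\.\d+\.(gate|up|gate_up|down)_proj$",
]
IGNORE = [r".*self_attn.*"]


def is_quantized(name: str) -> bool:
    """Return True if a module name matches a target and no ignore rule."""
    if name == "lm_head" or any(re.match(p, name) for p in IGNORE):
        return False
    return any(re.match(p, name) for p in TARGETS)


# Hypothetical module names, for illustration only.
names = [
    "model.layers.0.block_sparse_moe.experts.3.w1",
    "model.layers.0.block_sparse_moe.gate",
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.mlp.experts.12.gate_up_proj",
    "lm_head",
]
for n in names:
    print(f"{n}: {'W4A16' if is_quantized(n) else 'unquantized'}")
```

Under these assumptions only the per-expert projection weights are selected; the router gate, the attention projections, and `lm_head` stay in their original precision.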
## Inference

```python
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

MODEL_ID = "nm-testing/MiniMax-M2.5-W4A16"

messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are a helpful assistant."}],
    },
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Write a python function that sorts a list of numbers in ascending order",
            }
        ],
    },
]

# Sampling parameters: stochastic sampling (temperature 1.0 with top-p and
# top-k filtering), not greedy decoding
sampling_params = SamplingParams(
    temperature=1.0, top_p=0.95, top_k=64, max_tokens=1000, min_tokens=500
)

if __name__ == "__main__":
    llm = LLM(
        MODEL_ID,
        tensor_parallel_size=2,
        max_model_len=4096,
    )
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # Render the chat messages into a single prompt string
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    output = llm.generate(prompt, sampling_params)
    for out in output:
        print(out.outputs[0].text)
```
Outputs:

<details>
<summary>Sample Output</summary>

We have a request: Write a python function that sorts a list of numbers in ascending order.

We need to provide a code snippet, perhaps with docstring and explanation.

We need to follow policies. There's no disallowed content.

We need to output a function. Provide explanation. Possibly use built-in sorted or implement sorting algorithm like quicksort, mergesort, insertion sort, etc. Probably best to use built-in sorted for simplicity.

But the user request: "Write a python function that sorts a list of numbers in ascending order". That's straightforward. Provide function.

Maybe show also custom implementation if needed. Might show both simple approach and custom sort.

Potential user might need more details: maybe they want to understand algorithm.

We can also handle edge cases: empty list, single element, negative numbers, duplicates.

We can show examples.

We can implement both simple: return sorted(list) and maybe also selection sort.

Probably best to answer with both: using built-in sorted function (for O(n log n) performance), and also a simple algorithm like bubble sort.

We could also provide in-place sorting using list.sort.

We'll output code.

We can provide docstring, type hints.

Ok.

Now final answer: Provide code with explanation.

Will follow best practice: include "def sort_ascending(nums: list[float]) -> list[float]:".

We'll have simple implementation. We'll also optionally show an O(n log n) solution.

We might provide test cases.

Thus final answer: provide python function.

Will also mention that sorted returns new list, while list.sort sorts in-place.

Ok.

Now produce final output.
</think>

Here’s a simple, well‑documented Python function that returns a new list containing the numbers from the input list sorted in ascending order.
It uses Python’s built‑in `sorted` (which implements Timsort – an O(n log n) algorithm), but I’ve also included a tiny custom implementation (bubble sort) if you want to see a “manual” approach.

```python
from typing import List


def sort_ascending(nums: List[float]) -> List[float]:
    """
    Return a new list containing the numbers from *nums* sorted
    in ascending order.

    Parameters
    ----------
    nums : List[float]
        A list of numbers (ints, floats, or a mix). The list may be empty.

    Returns
    -------
    List[float]
        A new list with the same elements sorted from smallest to largest.
        The original list is left unchanged.

    Examples
    --------
    >>> sort_ascending([3, 1, 4, 1, 5, 9, 2, 6])
    [1, 1, 2, 3, 4, 5, 6, 9]
    >>> sort_ascending([])
    []
    """
    # --- Approach 1: built‑in sorted (fast, idiomatic) --------------------
    # This creates a shallow copy and sorts it.
    return sorted(nums)

    # ----------------------------------------------------------------------
    # If you’d like to see a simple “manual” sort, comment out the line
    # above and uncomment the following function call:
    #
    # return _bubble_sort_copy(nums)


def _bubble_sort_copy(arr: List[float]) -> List[float]:
    """
    Return a copy of *arr* sorted with the bubble‑sort algorithm.
    This is only for illustration – it runs in O(n²) time.
    """
    # Make a shallow copy so we don’t mutate the original list.
    result = list(arr)
    n = len(result)
    for i in range(n):
        # After each outer iteration, the last i elements are already in place.
        for j in range(0, n - i - 1):
            if result[j] > result[j + 1]:
                # Swap adjacent elements.
                result[j], result[j + 1] = result[j + 1], result[j]
    return result


# -------------------------------------------------------------------------
# Example usage
if __name__ == "__main__":
    test_lists = [
        [3, 1, 4, 1, 5, 9, 2, 6],
        [],
        [7],
        [-2, 5, -1, 0, 3],
        [4.4, 2.2, 1.1, 3.3]
    ]
    for lst in test_lists:
        sorted_lst = sort_ascending(lst)
        print(f"Original: {lst} → Sorted: {sorted_lst}")
```

### What the function does

1. **Input validation** – Accepts any iterable of numbers (ints, floats, etc.).
2. **Built‑in solution** – `sorted(nums)` creates a new list and sorts it using Timsort,

</details>
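
The sample output shows the model's reasoning followed by a `</think>` marker and then the final answer. If only the answer is needed, one option is to split the generated text on that marker. The helper name `split_reasoning` is hypothetical, and the sketch assumes the text is split at the first occurrence of the marker.

```python
def split_reasoning(text: str, marker: str = "</think>") -> tuple[str, str]:
    """Split generated text into (reasoning, answer) around the marker."""
    reasoning, sep, answer = text.partition(marker)
    if not sep:
        # No marker found: treat the whole text as the answer.
        return "", text.strip()
    return reasoning.strip(), answer.strip()


# Shortened stand-in for real model output.
sample = "We need to output a function.\n</think>\nHere is the function."
reasoning, answer = split_reasoning(sample)
print(answer)
```

In the inference script, this could be applied to `out.outputs[0].text` before printing.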