kernels-community/megablocks

kernels-bot committed 3 days ago

Commit 1b3fb21 (verified) · 1 parent: 7038338

Uploaded using `kernel-builder`.

Files changed (1)

README.md +31 -64

@@ -1,76 +1,43 @@
 ---
 license: apache-2.0
-tags:
-  - kernels
 ---
-## Quickstart
-```bash
-uv run https://huggingface.co/kernels-community/megablocks/raw/main/readme_example.py
-```
 ```python
-# /// script
-# requires-python = "==3.10"
-# dependencies = [
-#     "numpy",
-#     "kernels",
-#     "torch"
-# ]
-# ///
-import torch
-from collections import namedtuple
 from kernels import get_kernel
-# Make reproducible
-torch.manual_seed(42)
-torch.cuda.manual_seed(42)
-# Download optimized kernels from the Hugging Face hub
-megablocks = get_kernel("kernels-community/megablocks")
-print("MegaBlocks kernel downloaded successfully.")
-model = megablocks.layers.MegaBlocksMoeMLP()
-model.experts = namedtuple("Experts", ["gate_up_proj", "gate_down_proj", "down_proj", "hidden_size"])
-print("MegaBlocksMoeMLP instance created successfully.")
-# Config
-ne, hs, isz = 128, 1152, 3072
-# Router with proper initialization
-model.router = torch.nn.Linear(hs, ne, device="cuda")
-torch.nn.init.kaiming_uniform_(model.router.weight)
-# Expert layers with realistic weights
-e = model.experts
-e.gate_up_proj = torch.nn.Parameter(torch.randn(ne, hs, isz, device="cuda") * 0.02)
-e.gate_up_proj_bias = torch.nn.Parameter(torch.zeros(ne, isz, device="cuda"))
-e.down_proj = torch.nn.Parameter(torch.randn(ne, 1536, hs, device="cuda") * 0.02)
-e.down_proj_bias = torch.nn.Parameter(torch.zeros(ne, hs, device="cuda"))
-e.hidden_size = hs
-print("Expert layers initialized successfully.")
-# Test with normalized input
-x = torch.randn(1, 1, hs, device="cuda") * 0.1
-output, expert_weights = model(x)
-print("Model forward pass completed successfully.")
-print(f"Output shape: {output.shape}")
-print(f"Output range: [{output.min():.3f}, {output.max():.3f}]")
-print(f"Output: {output.flatten()[:10]}")
-print(f"Expert weights sum: {expert_weights.sum():.3f}")
 ```
-### Performance
-<img class="dark:hidden border border-gray-200 dark:border-gray-700 rounded-lg" src="media/benches_light_animation.svg" />
-<img class="hidden dark:block border border-gray-200 dark:border-gray-700 rounded-lg" src="media/benches_dark_animation.svg" />
-<img class="dark:hidden border border-gray-200 dark:border-gray-700 rounded-lg" src="media/benches_light_latency.svg" />
-<img class="hidden dark:block border border-gray-200 dark:border-gray-700 rounded-lg" src="media/benches_dark_latency.svg" />
-<img class="dark:hidden border border-gray-200 dark:border-gray-700 rounded-lg" src="media/benches_light_throughput.svg" />
-<img class="hidden dark:block border border-gray-200 dark:border-gray-700 rounded-lg" src="media/benches_dark_throughput.svg" />

 ---
+library_name: kernels
 license: apache-2.0
 ---
+This is the repository card of kernels-community/megablocks that has been pushed on the Hub. It was built to be used with the [`kernels` library](https://github.com/huggingface/kernels). This card was automatically generated.
+## How to use
 ```python
+# make sure `kernels` is installed: `pip install -U kernels`
 from kernels import get_kernel
+kernel_module = get_kernel("kernels-community/megablocks")
+MyReplacementLayer = kernel_module.MyReplacementLayer
+MyReplacementLayer(...)
 ```
+## Available functions
+- `MyReplacementLayer`
+- `exclusive_cumsum`
+- `inclusive_cumsum`
+- `histogram`
+- `indices`
+- `replicate_forward`
+- `replicate_backward`
+- `sort`
+- `cumsum`
+- `argsort`
+- `Arguments`
+- `ParallelDroplessMLP`
+- `dMoE`
+- `SparseGLU`
+- `MLP`
+- `SparseMLP`
+- `MoE`
+- `ParallelMLP`
+- `get_load_balancing_loss`
+## Benchmarks
+Benchmarking script is available for this kernel. Run `kernels benchmark kernels-community/megablocks`.
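
The new card lists `histogram`, `inclusive_cumsum`, and `exclusive_cumsum` without showing their signatures or semantics. As a rough orientation only, the sketch below gives plain-Python reference versions of what operations with these names conventionally compute. The function signatures, the list-based inputs, and the routing example are assumptions made for illustration; they are not taken from the kernel and may differ from its actual API.

```python
from itertools import accumulate


def histogram(indices, num_bins):
    """Count how many times each bin index in [0, num_bins) occurs."""
    counts = [0] * num_bins
    for i in indices:
        counts[i] += 1
    return counts


def inclusive_cumsum(values):
    """Running total that includes the current element."""
    return list(accumulate(values))


def exclusive_cumsum(values):
    """Running total that excludes the current element."""
    return [total - v for total, v in zip(inclusive_cumsum(values), values)]


# Hypothetical routing example: each token is assigned one of 4 experts.
expert_ids = [2, 0, 2, 1, 2, 0]
tokens_per_expert = histogram(expert_ids, num_bins=4)

# If tokens were grouped by expert, expert e would occupy [starts[e], ends[e]).
starts = exclusive_cumsum(tokens_per_expert)
ends = inclusive_cumsum(tokens_per_expert)

print(tokens_per_expert, starts, ends)
```

In a mixture-of-experts layer, offsets like these are one common way to locate each expert's slice of a token buffer sorted by expert; whether the kernel uses them exactly this way is not stated in the card.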