Kernels
kernels-bot committed
Commit 1b3fb21 · verified · 1 parent: 7038338

Uploaded using `kernel-builder`.

Files changed (1)
  1. README.md +31 -64
README.md CHANGED
@@ -1,76 +1,43 @@
  ---
+ library_name: kernels
  license: apache-2.0
- tags:
- - kernels
  ---
 
- ## Quickstart
+ This is the repository card of kernels-community/megablocks that has been pushed on the Hub. It was built to be used with the [`kernels` library](https://github.com/huggingface/kernels). This card was automatically generated.
 
- ```bash
- uv run https://huggingface.co/kernels-community/megablocks/raw/main/readme_example.py
- ```
+ ## How to use
 
  ```python
- # /// script
- # requires-python = "==3.10"
- # dependencies = [
- # "numpy",
- # "kernels",
- # "torch"
- # ]
- # ///
-
- import torch
- from collections import namedtuple
-
+ # make sure `kernels` is installed: `pip install -U kernels`
  from kernels import get_kernel
 
- # Make reproducible
- torch.manual_seed(42)
- torch.cuda.manual_seed(42)
-
- # Download optimized kernels from the Hugging Face hub
- megablocks = get_kernel("kernels-community/megablocks")
- print("MegaBlocks kernel downloaded successfully.")
-
- model = megablocks.layers.MegaBlocksMoeMLP()
- model.experts = namedtuple("Experts", ["gate_up_proj", "gate_down_proj", "down_proj", "hidden_size"])
- print("MegaBlocksMoeMLP instance created successfully.")
-
- # Config
- ne, hs, isz = 128, 1152, 3072
+ kernel_module = get_kernel("kernels-community/megablocks")
+ MyReplacementLayer = kernel_module.MyReplacementLayer
 
- # Router with proper initialization
- model.router = torch.nn.Linear(hs, ne, device="cuda")
- torch.nn.init.kaiming_uniform_(model.router.weight)
-
- # Expert layers with realistic weights
- e = model.experts
- e.gate_up_proj = torch.nn.Parameter(torch.randn(ne, hs, isz, device="cuda") * 0.02)
- e.gate_up_proj_bias = torch.nn.Parameter(torch.zeros(ne, isz, device="cuda"))
- e.down_proj = torch.nn.Parameter(torch.randn(ne, 1536, hs, device="cuda") * 0.02)
- e.down_proj_bias = torch.nn.Parameter(torch.zeros(ne, hs, device="cuda"))
- e.hidden_size = hs
- print("Expert layers initialized successfully.")
-
- # Test with normalized input
- x = torch.randn(1, 1, hs, device="cuda") * 0.1
- output, expert_weights = model(x)
- print("Model forward pass completed successfully.")
-
- print(f"Output shape: {output.shape}")
- print(f"Output range: [{output.min():.3f}, {output.max():.3f}]")
- print(f"Output: {output.flatten()[:10]}")
- print(f"Expert weights sum: {expert_weights.sum():.3f}")
+ MyReplacementLayer(...)
  ```
 
- ### Performance
-
- <img class="dark:hidden border border-gray-200 dark:border-gray-700 rounded-lg" src="media/benches_light_animation.svg" />
- <img class="hidden dark:block border border-gray-200 dark:border-gray-700 rounded-lg" src="media/benches_dark_animation.svg" />
-
- <img class="dark:hidden border border-gray-200 dark:border-gray-700 rounded-lg" src="media/benches_light_latency.svg" />
- <img class="hidden dark:block border border-gray-200 dark:border-gray-700 rounded-lg" src="media/benches_dark_latency.svg" />
-
- <img class="dark:hidden border border-gray-200 dark:border-gray-700 rounded-lg" src="media/benches_light_throughput.svg" />
- <img class="hidden dark:block border border-gray-200 dark:border-gray-700 rounded-lg" src="media/benches_dark_throughput.svg" />
+ ## Available functions
+ - `MyReplacementLayer`
+ - `exclusive_cumsum`
+ - `inclusive_cumsum`
+ - `histogram`
+ - `indices`
+ - `replicate_forward`
+ - `replicate_backward`
+ - `sort`
+ - `cumsum`
+ - `argsort`
+ - `Arguments`
+ - `ParallelDroplessMLP`
+ - `dMoE`
+ - `SparseGLU`
+ - `MLP`
+ - `SparseMLP`
+ - `MoE`
+ - `ParallelMLP`
+ - `get_load_balancing_loss`
+
+ ## Benchmarks
+
+ Benchmarking script is available for this kernel. Run `kernels benchmark kernels-community/megablocks`.
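The new card lists `histogram`, `inclusive_cumsum`, `exclusive_cumsum`, and `argsort` without describing them. In dropless MoE dispatch, primitives like these are conventionally combined to group tokens by their assigned expert: count the tokens per expert, take prefix sums of those counts to get each expert's slice boundaries, and stably sort token indices by expert. The following is a minimal plain-Python sketch of that idea, assuming top-1 routing and the usual definitions of these operations. It is not the kernel's API: the real functions operate on GPU tensors, their signatures are not given here, and `group_by_expert` is a hypothetical helper name.

```python
from typing import List, Tuple


def histogram(expert_ids: List[int], num_experts: int) -> List[int]:
    """Count how many tokens were routed to each expert (illustrative, not the kernel op)."""
    counts = [0] * num_experts
    for e in expert_ids:
        counts[e] += 1
    return counts


def inclusive_cumsum(xs: List[int]) -> List[int]:
    """out[i] = xs[0] + ... + xs[i]."""
    out, total = [], 0
    for x in xs:
        total += x
        out.append(total)
    return out


def exclusive_cumsum(xs: List[int]) -> List[int]:
    """out[i] = xs[0] + ... + xs[i - 1], with out[0] = 0."""
    out, total = [], 0
    for x in xs:
        out.append(total)
        total += x
    return out


def group_by_expert(
    expert_ids: List[int], num_experts: int
) -> Tuple[List[int], List[int], List[int]]:
    """Hypothetical helper: return token indices sorted by expert, plus each expert's [start, end) slice."""
    # Stable argsort: tokens keep their original order within each expert.
    order = sorted(range(len(expert_ids)), key=lambda i: expert_ids[i])
    counts = histogram(expert_ids, num_experts)
    starts = exclusive_cumsum(counts)  # first position of each expert in `order`
    ends = inclusive_cumsum(counts)    # one past the last position of each expert
    return order, starts, ends


if __name__ == "__main__":
    ids = [2, 0, 2, 1, 0, 2]  # expert chosen for each of 6 tokens (top-1 routing)
    order, starts, ends = group_by_expert(ids, num_experts=4)
    for e in range(4):
        print(e, order[starts[e]:ends[e]])
```

Running it prints each expert's token indices: `0 [1, 4]`, `1 [3]`, `2 [0, 2, 5]`, `3 []`. The exclusive prefix sum gives each expert's start offset and the inclusive one its end offset, so an expert that receives no tokens simply gets an empty slice.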