fxmeng's Collections
TransMLA-base
Updated Nov 7
Base Model for TransMLA
TransMLA: Multi-head Latent Attention Is All You Need
Paper • 2502.07864 • Published Feb 11 • 58
fxmeng/transmla_pretrain_6B_tokens
Viewer • Updated Jul 5 • 5.94M • 1.55k
fxmeng/transmla_pretrain_1B_tokens
Viewer • Updated Jul 5 • 1.14M • 141
fxmeng/transmla_pretrain_100m_tokens
Viewer • Updated Jul 5 • 100k • 19
fxmeng/TransMLA-llama3-8b-8k
8B • Updated 20 days ago • 55
fxmeng/TransMLA-llama3-8b-32k
8B • Updated 20 days ago • 40