wesley-mtk

AI & ML interests

None yet

Recent Activity

liked a model 26 days ago

nvidia/gpt-oss-120b-Eagle3-short-context

upvoted an article about 1 month ago

Continuous batching from first principles

upvoted an article about 2 months ago

Ultra-Long Sequence Parallelism: Ulysses + Ring-Attention Technical Principles and Implementation


Organizations

None yet

liked a model 26 days ago

nvidia/gpt-oss-120b-Eagle3-short-context

Text Generation • Updated 17 days ago • 5.13k • 10

upvoted an article about 1 month ago

Article

Continuous batching from first principles

Nov 25 • 288

upvoted an article about 2 months ago

Article

Ultra-Long Sequence Parallelism: Ulysses + Ring-Attention Technical Principles and Implementation

Sep 16

liked a model about 2 months ago

merve/smol-vision

Image-Text-to-Text • Updated Nov 5 • 188

liked a Space about 2 months ago

The Smol Training Playbook

📚

2.72k

The secrets to building world-class LLMs

liked a model 5 months ago

Qwen/Qwen3-Coder-30B-A3B-Instruct

Text Generation • 31B • Updated 26 days ago • 849k • 835

liked a dataset 5 months ago

a-m-team/AM-DeepSeek-R1-Distilled-1.4M

Preview • Updated Mar 30 • 1.61k • 172

upvoted an article 10 months ago

Article

Open R1: Update #3

Mar 11 • 296

upvoted 2 articles 11 months ago

Article

Open-source DeepResearch – Freeing our search agents

Feb 4 • 1.31k

Article

Open-R1: Update #1

Feb 2 • 305

upvoted 2 articles 12 months ago

Article

Mastering Tensor Dimensions in Transformers

Jan 12 • 125

Article

MiniMax-01 is Now Open-Source: Scaling Lightning Attention for the AI Agent Era

Jan 15

liked a dataset about 1 year ago

Multilingual-Multimodal-NLP/McEval-Instruct

Viewer • Updated Jun 12, 2024 • 35.9k • 197 • 35

upvoted an article about 1 year ago

Article

Low Latency CPU Based Educational Value Classifier With Generic Educational Value

Jun 12, 2024

liked a dataset about 1 year ago

yuxiang630/hqcode

Viewer • Updated Aug 1, 2024 • 221k • 58 • 16

upvoted an article about 1 year ago

Article

Let's talk about LLM evaluation

May 23, 2024 • 204

liked a Space over 1 year ago

FineWeb: decanting the web for the finest text data at scale

🍷

1.24k

Generate high-quality text data for LLMs using FineWeb

liked a dataset over 1 year ago

bigcode/the-stack-v2-train-smol-ids

Viewer • Updated Apr 23, 2024 • 40.1M • 1.72k • 44

upvoted an article over 1 year ago

Article

From DeepSpeed to FSDP and Back Again with Hugging Face Accelerate

Jun 13, 2024

liked a Space over 1 year ago

Open LLM Progress Tracker

🔬

151

Visualize Open vs. Proprietary LLM Progress
