ottomate
alfredo-ottomate
3 followers · 27 following
https://ottomate.io
AI & ML interests
None yet
Recent Activity
liked a Space · 7 days ago · CohereLabs/Cohere-Transcribe-WebGPU
reacted to Parveshiiii's post with 🔥 · 7 days ago
Just did something I’ve been meaning to try for ages. In only 3 hours, on 10 billion+ tokens, I trained a custom BPE + tiktoken-style tokenizer using my new library microtok — and it hits the same token efficiency as Qwen3.

Tokenizers have always felt like black magic to me. We drop them into every LLM project, but actually training one from scratch? That always seemed way too complicated. Turns out it doesn’t have to be.

microtok makes the whole process stupidly simple — literally just 3 lines of code. No heavy setup, no GPU required. I built it on top of the Hugging Face tokenizers library so it stays clean, fast, and actually understandable.

If you’ve ever wanted to look under the hood and build your own optimized vocabulary instead of just copying someone else’s, this is the entry point you’ve been waiting for.

I wrote up the full story, threw in a ready-to-run Colab template, and dropped the trained tokenizer on Hugging Face.

Blog → https://parveshiiii.github.io/blogs/microtok/
Trained tokenizer → https://huggingface.co/Parveshiiii/microtok
GitHub repo → https://github.com/Parveshiiii/microtok
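The post doesn't show microtok's three-line API, so as an illustration of what "training a BPE tokenizer from scratch" actually involves, here is a minimal pure-Python sketch of byte-level BPE training: repeatedly find the most frequent adjacent token pair and fuse it into a new vocabulary entry. The function name and toy corpus are my own for illustration, not microtok's API; real libraries such as Hugging Face `tokenizers` implement the same idea in optimized Rust.

```python
from collections import Counter

def train_bpe(text, num_merges):
    """Learn BPE merge rules by repeatedly fusing the most frequent adjacent pair."""
    ids = list(text.encode("utf-8"))   # start from raw byte tokens (0-255)
    merges = {}                        # (left_id, right_id) -> new token id
    next_id = 256
    for _ in range(num_merges):
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break
        best = max(pairs, key=pairs.get)   # most frequent adjacent pair
        merges[best] = next_id
        # Rewrite the token stream, fusing every occurrence of the chosen pair.
        out, i = [], 0
        while i < len(ids):
            if i + 1 < len(ids) and (ids[i], ids[i + 1]) == best:
                out.append(next_id)
                i += 2
            else:
                out.append(ids[i])
                i += 1
        ids = out
        next_id += 1
    return merges, ids

# Two merges on a toy string: "aa" is fused into token 256, then "aa"+"a" into 257.
merges, ids = train_bpe("aaabdaaabac", num_merges=2)
```

Production tokenizers add pre-tokenization, special tokens, and much faster pair counting, but the greedy merge loop above is the core of what any BPE trainer learns.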
liked a model · 7 days ago · CohereLabs/cohere-transcribe-03-2026
Organizations
alfredo-ottomate's activity
upvoted an article · about 2 months ago · Transformers.js v4: Now Available on NPM! (Feb 9 · 91 upvotes)