Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Paper • 2511.04570 • Published about 1 month ago • 208
Snyhlxde/shiftedattn-10-23-7b-qwen2p5-coder-n16w16-distilln32w16-ar-1-cyclic-noise-all-1e-6 Updated Nov 6
Snyhlxde/shiftedattn-10-16-7b-qwen2p5-coder-n32w16-n16distill-data-v2-ar-1-cyclic-noise-all-1e-6 Updated Oct 23 • 1
Efficient Long-context Language Model Training by Core Attention Disaggregation Paper • 2510.18121 • Published Oct 20 • 120