metadata
license: apache-2.0
GeoCAD-LLM: CAD Sequence Generation via Multimodal LLMs with Equivariant Geometric Features🛠️
GeoCAD-LLM_4B_pc🛠️
- Base_model: Qwen3-4B-Instruct
- Max sequence length: 8,192
- Epoch: 2
- Learning rate: 1e-4
- Batch size: 128
- This model is specialized for pc-text-to-CAD; however, it also supports multimodal input.
GeoCAD-LLM Contributions🔥
- State-of-the-art performance🏆 on Text2CAD datasets (as shown in the tables below).
- Multimodal CAD Generation🌐: Both text-to-CAD and pc-text-to-CAD.
- GeoCAD-LLM directly generates CAD vector sequences as natural language.
- Novel Two-Stage Training Pipeline🧭: Stage 1 trains semantic geometry alignment, and stage 2 trains fine-grained geometry. In particular, we directly leverage E(3)-equivariant features for geometry-consistent supervision, which inherently keeps geometric features consistent regardless of input orientation.
- Point Cloud Dropout (PCD)🧶: PCD mitigates over-reliance on geometric inputs and improves multimodal generalization. It is also a critical training technique for multimodal CAD generation.
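
To make the Point Cloud Dropout idea above more concrete, here is a minimal sketch of one way it could work: during training, the point cloud input is randomly withheld so that some samples are text-only. This is an illustration under our own assumptions, not the released GeoCAD-LLM training code. The names `apply_point_cloud_dropout`, `build_training_sample`, and `drop_prob` are hypothetical, and the actual dropout rate and granularity (whole cloud versus individual points) are not specified in this card.

```python
import random
from typing import Dict, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def apply_point_cloud_dropout(
    points: Optional[Sequence[Point]],
    drop_prob: float,
    rng: random.Random,
) -> Optional[Sequence[Point]]:
    """Withhold the whole point cloud with probability `drop_prob`.

    Assumption: dropout acts on the entire point cloud, which turns a
    pc-text sample into a text-only sample. Returns None when dropped.
    """
    if not 0.0 <= drop_prob <= 1.0:
        raise ValueError("drop_prob must be in [0, 1]")
    if points is None:
        # Already a text-only sample; nothing to drop.
        return None
    return None if rng.random() < drop_prob else points


def build_training_sample(
    text: str,
    points: Optional[Sequence[Point]],
    drop_prob: float,
    rng: random.Random,
) -> Dict[str, object]:
    """Assemble one (hypothetical) training sample after applying PCD."""
    kept = apply_point_cloud_dropout(points, drop_prob, rng)
    modality = "text" if kept is None else "pc-text"
    return {"text": text, "points": kept, "modality": modality}


if __name__ == "__main__":
    rng = random.Random(0)
    cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    # drop_prob=0.5 is an arbitrary illustrative value.
    samples = [build_training_sample("a flat plate", cloud, 0.5, rng) for _ in range(8)]
    print([s["modality"] for s in samples])
```

With a scheme like this, the same model sees both text-only and pc-text samples during training, which matches the stated goal of reducing over-reliance on geometric inputs.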
Performance (text-to-CAD & pc-text-to-CAD)🔥
Qualitative Results
Please check our paper and supplementary materials.🤗
Bibtex🤗
(TODO)


