nkkbr/ViCA2

nkkbr committed on Dec 15, 2025

Commit c1f48c6 · verified · 1 parent: 4262a2d

Update README.md

Files changed (1)

README.md +22 -0

README.md

@@ -217,4 +217,26 @@ text_outputs = tokenizer.batch_decode(cont, skip_special_tokens=True)[0].strip()
 print(repr(text_outputs))
 ```
 ---

 print(repr(text_outputs))
 ```
+## Citation
+If you find our work helpful, we would appreciate it if you cite the following papers.
+```bibtex
+@misc{feng2025vica2,
+      title={Towards Visuospatial Cognition via Hierarchical Fusion of Visual Experts},
+      author={Feng, Qi},
+      howpublished={arXiv preprint arXiv:2505.12363},
+      year={2025},
+}
+```
+```bibtex
+@misc{feng2025vica,
+      title={Visuospatial Cognitive Assistant},
+      author={Feng, Qi},
+      howpublished={arXiv preprint arXiv:2505.12312},
+      year={2025},
+}
+```
 ---