Commit 5608691
Parent(s): 6b79b2a

video datasets

README.md CHANGED
@@ -156,6 +156,8 @@ We compare our model to the original Flamingo along with [OpenFlamingo](openflam
 
 We perform checkpoint selection based on validation sets of VQAv2, TextVQA, OKVQA, VizWiz, Visual Dialogue, Coco, Flickr30k, and HatefulMemes. We select the checkpoint at step 65'000 for IDEFICS-9B and at step 37'500 for IDEFICS. The models are evaluated with in-context few-shot learning where the priming instances are selected at random from a support set. We do not use any form of ensembling.
 
+Unlike Flamingo, we did not train IDEFICS on video-text pair datasets, and as such, we did not evaluate the model on video-text benchmarks as Flamingo did. We leave that evaluation for a future iteration.
+
 <img src="./assets/Figure_Evals_IDEFIX.png" width="55%">
 
 TODO: update this table
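The evaluation protocol in the diff above (in-context few-shot learning with priming instances drawn at random from a support set) can be sketched roughly as follows. This is a minimal illustration, not the repository's actual evaluation code: the helper name `build_few_shot_prompt`, the prompt template, and the toy support set are all hypothetical.

```python
import random

def build_few_shot_prompt(support_set, query, n_shots=4, seed=0):
    """Sample n_shots priming instances at random from the support set
    and prepend them to the query (hypothetical helper, for illustration)."""
    rng = random.Random(seed)  # fixed seed so the sampled shots are reproducible
    shots = rng.sample(support_set, n_shots)
    lines = [f"Question: {q} Answer: {a}" for q, a in shots]
    lines.append(f"Question: {query} Answer:")  # model completes this line
    return "\n".join(lines)

# Toy support set standing in for a benchmark's held-out examples.
support = [
    ("What color is the sky?", "blue"),
    ("How many legs does a cat have?", "four"),
    ("What is the capital of France?", "Paris"),
    ("What is the largest planet?", "Jupiter"),
    ("What season follows summer?", "fall"),
]
prompt = build_few_shot_prompt(support, "What color is grass?", n_shots=3)
```

No ensembling means each query is scored from a single prompt like this one, rather than aggregating predictions over several sampled shot sets.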