Update README.md
README.md CHANGED
@@ -150,9 +150,8 @@ model (CPTR) using an encoder-decoder transformer [[1]](#1). The source image is
 to the transformer encoder in sequence patches. Hence, one can treat the image
 captioning problem as a machine translation task.
 
-<img
-…
-width="80%" padding="100px 100px 100px 10px">
+
+
 
 Figure 1: Encoder Decoder Architecture
 
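The patch-sequence framing in the context lines above is the core idea: the image is linearized into a sequence of flattened patches so the encoder can consume it the way a translation model consumes source-language tokens. Below is a minimal sketch of that linearization, assuming a 224×224 RGB input and 16×16 patches; `to_patch_sequence` is a hypothetical helper for illustration, not code from this repository.

```python
import numpy as np

def to_patch_sequence(img: np.ndarray, patch: int = 16) -> np.ndarray:
    """Flatten an (H, W, C) image into an (N, patch*patch*C) sequence of
    patch vectors, the ViT-style linearization described above.
    Hypothetical helper for illustration only."""
    h, w, c = img.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    x = img.reshape(h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)           # group the two grid axes together
    return x.reshape(-1, patch * patch * c)  # (N patches, flattened pixels)

img = np.random.rand(224, 224, 3)  # dummy source image
seq = to_patch_sequence(img)
print(seq.shape)                   # (196, 768): 14x14 patch "tokens"
```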
@@ -183,9 +182,8 @@ The encoder side deals solely with the image part, where it is beneficial to
 exploit the relative position of the features we have. Refer to Figure 2 for
 the model architecture.
 
-<img
-…
-width="80%" padding="100px 100px 100px 10px">
+
+
 
 Figure 2: Model Architecture
 
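The hunk above motivates exploiting relative positions on the encoder side. One common way to make attention relative-position-aware is a learned bias indexed by pairwise offsets; the toy 1-D sketch below illustrates only that idea and is not the repository's actual encoder implementation.

```python
import numpy as np

def relative_position_bias(seq_len: int, seed: int = 0) -> np.ndarray:
    """Toy 1-D relative-position bias: one scalar (randomly initialised here,
    learned in practice) per pairwise offset in [-(L-1), L-1], to be added to
    the attention logits. A sketch of the idea only, not the repo's encoder."""
    table = np.random.default_rng(seed).normal(size=2 * seq_len - 1)
    idx = np.arange(seq_len)
    offsets = idx[None, :] - idx[:, None]  # (L, L) pairwise offsets
    return table[offsets + seq_len - 1]    # same bias wherever offsets match

bias = relative_position_bias(4)
print(np.allclose(np.diag(bias, 1), bias[0, 1]))  # True: depends only on offset
```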
@@ -344,9 +342,7 @@ The reason for overfitting may be due to the following reasons:
 
 4. Unsuitable hyperparameters
 
-…
-| :--: | :--: |
-| Figure 3: Loss Curve | Figure 4: Bleu-4 score curve |
+
 
 ### Inference Output
 
@@ -359,9 +355,7 @@ distribution of the lengths is positively skewed. More specifically, the
 maximum caption length generated by the model (21 tokens) accounts for 98.66%
 of the lengths in the training set. See “code/experiment.ipynb Section 1.3”.
 
-<img
-src="https://github.com/zarzouram/xformer_img_captnng/blob/main/images/report/lens.png"
-padding="100px 100px 100px 10px">
+
 
 Figure 5: Generated caption's lengths distribution
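The 98.66% figure quoted in this hunk is a simple coverage statistic: the fraction of training captions no longer than the longest caption the model generated (21 tokens). A sketch of that computation with made-up lengths follows; the real numbers come from code/experiment.ipynb Section 1.3.

```python
import numpy as np

# Made-up stand-in for the tokenised training-caption lengths analysed in
# code/experiment.ipynb Section 1.3; with the real data this yields 98.66%.
train_caption_lengths = np.array([8, 9, 10, 11, 12, 12, 13, 14, 21, 25])

max_generated = 21  # longest caption the model produced at inference time
coverage = (train_caption_lengths <= max_generated).mean() * 100
print(f"{coverage:.2f}% of training captions have <= {max_generated} tokens")
```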