AXERA-TECH
/

MobileCLIP

Image-Text-to-Text

jordan0811 committed on Sep 17

Commit 6eca370 · verified · 1 Parent(s): 46195af

Create README.md

Files changed (1)

README.md +61 -0

README.md ADDED

	@@ -0,0 +1,61 @@

+---
+license: apache-2.0
+language:
+- en
+base_model:
+- apple/MobileCLIP2-S4
+- apple/MobileCLIP2-S2
+pipeline_tag: image-text-to-text
+tags:
+- MobileCLIP
+- MobileCLIP2
+- CLIP
+- Classification
+---
+# MobileCLIP2
+The following versions of MobileCLIP2 have been converted to run on the Axera NPU using w8a16 quantization. They are compatible with Pulsar2 version 4.2.
+- MobileCLIP2-S2
+- MobileCLIP2-S4
+To learn how to convert a MobileCLIP2 model into an axmodel that runs on an Axera NPU board, see the [axera.ml-mobileclip repository](https://github.com/AXERA-TECH/axera.ml-mobileclip).
+## Supported Platforms
+- AX650
+## On-board inference time
+- MobileCLIP2-S2
+  | Stage | Time |
+  |------|------|
+  | image encoder | 19.146 ms |
+  | text encoder | 5.675 ms |
+- MobileCLIP2-S4
+  | Stage | Time |
+  |------|------|
+  | image encoder | 65.328 ms |
+  | text encoder | 12.663 ms |
+## How to use
+Download all files from this repository to the device.
+Then run the following command:
+```bash
+python3 run_axmodel.py -ie ./mobileclip2_s4_image_encoder.axmodel -te ./mobileclip2_s4_text_encoder.axmodel -i ./zebra.jpg -t "a zebra" "a dog" "two zebras"
+```
+Example model inputs and outputs:
+1. The input image:
+    ![](zebra.jpg)
+2. The candidate text descriptions to score against the image:
+    ["a zebra", "a dog", "two zebras"]
+3. The model's confidence score for each description:
+    Label probs: [[6.095444e-02 5.628616e-14 9.390456e-01]]
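
The `Label probs` row above lists one score per text prompt, in the order the prompts were passed to `-t`. As a minimal sketch of how to read that output, the snippet below pairs each prompt with its score and sorts them. It is not part of `run_axmodel.py`: the helper name `rank_labels` is hypothetical, and the values are copied by hand from the example output rather than produced by the model.

```python
# Values copied from the example above; nothing here runs the model.
labels = ["a zebra", "a dog", "two zebras"]
label_probs = [[6.095444e-02, 5.628616e-14, 9.390456e-01]]  # one row per image


def rank_labels(labels, probs):
    """Return (label, probability) pairs for one image, highest score first.

    Hypothetical helper for illustration; assumes probs[i] belongs to labels[i].
    """
    if len(labels) != len(probs):
        raise ValueError("expected exactly one probability per label")
    return sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)


ranked = rank_labels(labels, label_probs[0])
for label, prob in ranked:
    print(f"{label}: {prob:.4f}")
```

In this example the highest-scoring prompt is "two zebras" (about 0.939), followed by "a zebra" (about 0.061), while "a dog" is effectively zero. The three scores sum to approximately 1, which is consistent with a softmax over the prompts.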