Qwen3-Embedding-8B-CoreML (ANE-Optimized)
English
This repository provides a pre-converted CoreML bundle for Qwen3-Embedding-8B and a lightweight OpenAI-compatible embedding service for Apple Silicon.
Bundle Specs
| Item | Value |
|---|---|
| Base model | Qwen/Qwen3-Embedding-8B |
| Embedding dimension | 4096 |
| Profiles | b1_s128 |
| Bundle path | bundles/qwen3_ane_bundle_8b |
| Default model id | qwen3-embedding-8b-ane |
| Package size (approx.) | 14G |
Repository Structure
- qwen3_ane_embed/: runtime service code (/embeddings, /v1/embeddings)
- bundles/qwen3_ane_bundle_8b/: CoreML bundle and tokenizer
- setup_venv.sh: create Python 3.11 venv and install dependencies
- run_server.sh: start service with environment checks
- requirements-service.txt: runtime dependencies
Quick Start
./setup_venv.sh
./run_server.sh
Health check:
curl -s http://127.0.0.1:8000/health
Embedding request:
curl -s http://127.0.0.1:8000/v1/embeddings \
-H 'Content-Type: application/json' \
-d '{
"input": ["hello world", "ANE embedding service"],
"model": "qwen3-embedding-8b-ane",
"encoding_format": "float",
"dimensions": 1024
}'
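The same request can be sent from Python. The following is a minimal sketch using only the standard library, not code shipped in this repository. It assumes the service returns the usual OpenAI-style response body (a `data` list whose items carry `index` and `embedding`); the helper names `build_payload`, `parse_embeddings`, and `embed` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"   # address used in the curl examples
MODEL_ID = "qwen3-embedding-8b-ane"  # default model id from the bundle specs


def build_payload(texts, model=MODEL_ID, dimensions=None):
    """Build the JSON body for POST /v1/embeddings."""
    payload = {"input": list(texts), "model": model, "encoding_format": "float"}
    if dimensions is not None:
        payload["dimensions"] = dimensions
    return payload


def parse_embeddings(response):
    """Return the embeddings ordered by their 'index' field.

    Assumes an OpenAI-style body:
    {"data": [{"index": 0, "embedding": [...]}, ...]}
    """
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embed(texts, base_url=BASE_URL, timeout=600.0, **kwargs):
    """POST the texts to the service and return one vector per input.

    The generous timeout leaves room for the slow first request.
    """
    body = json.dumps(build_payload(texts, **kwargs)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return parse_embeddings(json.load(resp))
```

With the server running, `embed(["hello world", "ANE embedding service"], dimensions=1024)` would return two vectors, provided the response has the assumed shape.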
Notes
- Fixed-shape profile: b1_s128.
- The maximum length of a single input is 128 tokens.
- The first embedding request has a long warm-up phase (it can take minutes on first load).
- The default compute setting is cpu_and_ne (ANE-preferred, not ANE-guaranteed).
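To illustrate what a fixed-shape profile implies, the sketch below fits a list of token IDs to exactly 128 positions by truncating or padding. It is an illustration only: how the service actually handles over-length input (truncation or rejection), which side it pads on, and the padding ID are not stated here, so `fit_to_profile`, right-side padding, and `pad_id=0` are all assumptions.

```python
MAX_TOKENS = 128  # sequence length of the b1_s128 profile


def fit_to_profile(token_ids, max_tokens=MAX_TOKENS, pad_id=0):
    """Truncate or right-pad token IDs to exactly max_tokens entries.

    Returns (input_ids, attention_mask). The mask is 1 for real tokens
    and 0 for padding. pad_id=0 is a placeholder; a real tokenizer
    defines its own padding ID and padding side.
    """
    kept = list(token_ids)[:max_tokens]
    padding = max_tokens - len(kept)
    input_ids = kept + [pad_id] * padding
    attention_mask = [1] * len(kept) + [0] * padding
    return input_ids, attention_mask
```

Because the model always sees 128 positions, a short input costs about as much to embed as a full-length one, and anything beyond 128 tokens has to be shortened or split by the caller before it reaches the model.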
中文 (Chinese)
This repository provides a pre-converted CoreML bundle for Qwen3-Embedding-8B, together with a lightweight OpenAI-compatible embedding service for Apple Silicon.
Bundle Specs
| Item | Value |
|---|---|
| Base model | Qwen/Qwen3-Embedding-8B |
| Embedding dimension | 4096 |
| Profile | b1_s128 |
| Bundle path | bundles/qwen3_ane_bundle_8b |
| Default model id | qwen3-embedding-8b-ane |
| Package size (approx.) | 14G |
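Repository Structure
- qwen3_ane_embed/: service code (/embeddings, /v1/embeddings)
- bundles/qwen3_ane_bundle_8b/: CoreML bundle and tokenizer
- setup_venv.sh: creates a Python 3.11 virtual environment and installs dependencies
- run_server.sh: launch script with self-checks
- requirements-service.txt: runtime dependencies
Quick Start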
./setup_venv.sh
./run_server.sh
Health check:
curl -s http://127.0.0.1:8000/health
Embedding request:
curl -s http://127.0.0.1:8000/v1/embeddings \
-H 'Content-Type: application/json' \
-d '{
"input": ["hello world", "ANE embedding service"],
"model": "qwen3-embedding-8b-ane",
"encoding_format": "float",
"dimensions": 1024
}'
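Notes
- The current profile is the fixed-shape b1_s128.
- A single input is limited to 128 tokens.
- The first request has a noticeable warm-up that can take minutes.
- The default is cpu_and_ne, which biases scheduling toward the ANE but does not mean execution runs 100% on the ANE.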
License
Apache-2.0. Please also follow the license and usage terms of the base Qwen model.