Qwen3-Embedding-8B-CoreML (ANE-Optimized)

This repository provides a pre-converted CoreML bundle for Qwen3-Embedding-8B and a lightweight OpenAI-compatible embedding service for Apple Silicon.

Bundle Specs

Item	Value
Base model	`Qwen/Qwen3-Embedding-8B`
Embedding dimension	`4096`
Profile	`b1_s128` (fixed shape: batch size 1, sequence length 128)
Bundle path	`bundles/qwen3_ane_bundle_8b`
Default model id	`qwen3-embedding-8b-ane`
Package size (approx.)	`14G`
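The bundle outputs 4096-dimensional vectors, while the request example below asks for `"dimensions": 1024`. The upstream Qwen3-Embedding model card describes support for user-defined output dimensions (Matryoshka-style). This README does not document how the service implements `dimensions`, so the sketch below *assumes* the common approach: keep the leading components and re-normalize to unit length. The helper names `truncate_and_normalize` and `cosine` are hypothetical, not part of `qwen3_ane_embed`.

```python
import math
from typing import List, Sequence


def truncate_and_normalize(vec: Sequence[float], dims: int) -> List[float]:
    """Keep the first `dims` components and rescale them to unit L2 norm.

    Assumption: this mirrors how a reduced `dimensions` value is produced;
    the service's actual behavior is not documented here.
    """
    if not 0 < dims <= len(vec):
        raise ValueError(f"dims must be in 1..{len(vec)}, got {dims}")
    head = [float(x) for x in vec[:dims]]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # zero vector: nothing to normalize
    return [x / norm for x in head]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)
```

Re-normalizing after truncation keeps dot products and cosine similarities on the same scale as full-size unit vectors.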

Repository Structure

qwen3_ane_embed/: runtime service code (serves /embeddings and /v1/embeddings)
bundles/qwen3_ane_bundle_8b/: CoreML bundle and tokenizer
setup_venv.sh: creates a Python 3.11 virtual environment and installs dependencies
run_server.sh: starts the service after running environment checks
requirements-service.txt: runtime dependencies

Quick Start

./setup_venv.sh
./run_server.sh

Health check:

curl -s http://127.0.0.1:8000/health
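For scripts that start the server and then send requests, a small polling loop on /health avoids racing the startup. This is a minimal sketch that *assumes* an HTTP 200 from /health means the server is up (the response body format is not documented here). A healthy server can still spend minutes on the first embedding request, so keep a long timeout on that call as well. `wait_for_health` and `http_ok` are hypothetical helper names.

```python
import http.client
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if GET `url` answers with HTTP 200, False on connection/HTTP errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and timeouts are all OSError subclasses.
        return False


def wait_for_health(
    url: str = "http://127.0.0.1:8000/health",
    deadline_s: float = 600.0,
    interval_s: float = 2.0,
    check: Callable[[str], bool] = http_ok,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `url` until `check` succeeds (True) or `deadline_s` seconds pass (False)."""
    start = clock()
    while True:
        if check(url):
            return True
        if clock() - start >= deadline_s:
            return False
        sleep(interval_s)
```

`check`, `sleep` and `clock` are injectable so the loop can be exercised without a running server.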

Embedding request:

curl -s http://127.0.0.1:8000/v1/embeddings \
  -H 'Content-Type: application/json' \
  -d '{
    "input": ["hello world", "ANE embedding service"],
    "model": "qwen3-embedding-8b-ane",
    "encoding_format": "float",
    "dimensions": 1024
  }'
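The same request can be sent from Python with only the standard library. This sketch reuses the fields from the curl example and *assumes* the response follows the OpenAI embeddings shape (`data[i].embedding` with an `index` per item), which "OpenAI-compatible" suggests but this README does not spell out. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional

BASE_URL = "http://127.0.0.1:8000"  # host/port from the examples above
MODEL_ID = "qwen3-embedding-8b-ane"  # default model id from the bundle specs


def build_embedding_request(
    texts: List[str],
    dimensions: Optional[int] = None,
    base_url: str = BASE_URL,
    model: str = MODEL_ID,
) -> urllib.request.Request:
    """Build a POST to /v1/embeddings with the same JSON fields as the curl example."""
    body: Dict[str, Any] = {"input": texts, "model": model, "encoding_format": "float"}
    if dimensions is not None:
        body["dimensions"] = dimensions  # optional reduced output size
    return urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embeddings(payload: Dict[str, Any]) -> List[List[float]]:
    """Return vectors from an OpenAI-style response (assumed shape), ordered by `index`."""
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embed(texts: List[str], dimensions: Optional[int] = None, timeout: float = 900.0) -> List[List[float]]:
    """Send one request and return one vector per input text.

    The generous timeout leaves room for the slow first (warm-up) request.
    """
    req = build_embedding_request(texts, dimensions=dimensions)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_embeddings(json.load(resp))
```

Usage, with the server running: `vectors = embed(["hello world", "ANE embedding service"], dimensions=1024)`. Sorting by `index` keeps outputs aligned with inputs even if a server returns items out of order.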

Notes

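Fixed-shape profile: b1_s128 (batch size 1, sequence length 128).
Each input is limited to 128 tokens.
The first embedding request triggers a long warm-up phase that can take several minutes on first load.
The default compute-unit setting is cpu_and_ne: CoreML prefers the Apple Neural Engine (ANE) but may still run some operations on the CPU, so ANE-only execution is not guaranteed.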
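Because each input is limited to 128 tokens, longer documents need to be split on the client side. This README does not say whether the service truncates or rejects over-length inputs, so the sketch below is a client-side workaround, not service behavior. It takes `encode`/`decode` callables; with the bundled tokenizer you might pass `tok.encode(text, add_special_tokens=False)` and `tok.decode(ids)` from a Hugging Face `AutoTokenizer`, *assuming* `bundles/qwen3_ane_bundle_8b/` loads with `AutoTokenizer.from_pretrained`. `chunk_by_tokens` is a hypothetical helper.

```python
from typing import Callable, List


def chunk_by_tokens(
    text: str,
    encode: Callable[[str], List[int]],
    decode: Callable[[List[int]], str],
    max_tokens: int = 128,
) -> List[str]:
    """Split `text` into consecutive pieces of at most `max_tokens` tokens each.

    Any special tokens the service adds may also count against its limit,
    so a smaller `max_tokens` leaves headroom. Decoding and re-encoding a
    piece is not guaranteed to reproduce the exact same token count.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    ids = encode(text)
    # Slice the token ids into fixed-size windows and turn each back into text.
    return [decode(ids[i : i + max_tokens]) for i in range(0, len(ids), max_tokens)]
```

Each chunk can then be embedded separately; how chunk vectors are combined (for example, averaging) is an application choice.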

License

Apache-2.0. Please also follow the license and usage terms of the base Qwen model.


Model tree for tooktang/Qwen3-Embedding-8B-CoreML: Qwen/Qwen3-8B-Base (base model) → Qwen/Qwen3-Embedding-8B (fine-tuned) → this model (listed under Quantized).