Qwen3-Embedding-8B-CoreML (ANE-Optimized)

English

This repository provides a pre-converted CoreML bundle for Qwen3-Embedding-8B and a lightweight OpenAI-compatible embedding service for Apple Silicon.

Bundle Specs

Item                    Value
Base model              Qwen/Qwen3-Embedding-8B
Embedding dimension     4096
Profiles                b1_s128
Bundle path             bundles/qwen3_ane_bundle_8b
Default model id        qwen3-embedding-8b-ane
Package size (approx.)  14G

Repository Structure

  • qwen3_ane_embed/: runtime service code (serves /embeddings and /v1/embeddings)
  • bundles/qwen3_ane_bundle_8b/: CoreML bundle and tokenizer
  • setup_venv.sh: creates a Python 3.11 venv and installs dependencies
  • run_server.sh: starts the service after running environment checks
  • requirements-service.txt: runtime dependencies

Quick Start

./setup_venv.sh
./run_server.sh

Health check:

curl -s http://127.0.0.1:8000/health
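
The same check can be scripted when another process needs to wait for the service to come up. The following is a minimal sketch using only the Python standard library. It assumes that /health answers with HTTP 200 once the server is listening (the response body is not documented here), and the helper names is_healthy and wait_until_ready are hypothetical. A passing health check does not necessarily mean the model is already warmed up.

```python
import time
import urllib.error
import urllib.request

HEALTH_URL = "http://127.0.0.1:8000/health"  # endpoint from the Quick Start


def is_healthy(url: str = HEALTH_URL, timeout_s: float = 2.0) -> bool:
    """Return True if the health endpoint answers with HTTP 200 (assumed contract)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error: treat as "not ready yet".
        return False


def wait_until_ready(probe=is_healthy, max_wait_s: float = 60.0,
                     interval_s: float = 1.0, sleep=time.sleep) -> bool:
    """Call probe() until it returns True or max_wait_s has elapsed."""
    deadline = time.monotonic() + max_wait_s
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

Calling wait_until_ready() right after ./run_server.sh lets a client script block until the server accepts connections instead of failing on the first request.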

Embedding request:

curl -s http://127.0.0.1:8000/v1/embeddings \
  -H 'Content-Type: application/json' \
  -d '{
    "input": ["hello world", "ANE embedding service"],
    "model": "qwen3-embedding-8b-ane",
    "encoding_format": "float",
    "dimensions": 1024
  }'
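
For callers that prefer Python over curl, the request above might be built as follows. This is a sketch rather than the service's own client: the function names are hypothetical, and the response parsing assumes the usual OpenAI-style shape (a "data" list whose items carry "index" and "embedding"), which this README implies by calling the service OpenAI-compatible but does not spell out. The generous default timeout is a guess meant to cover the slow first request.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # host and port from the Quick Start


def build_embedding_request(texts, model="qwen3-embedding-8b-ane", dimensions=None):
    """Build a POST request for /v1/embeddings with the same fields as the curl example."""
    payload = {"input": list(texts), "model": model, "encoding_format": "float"}
    if dimensions is not None:
        payload["dimensions"] = dimensions
    return urllib.request.Request(
        BASE_URL + "/v1/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embeddings(body: dict):
    """Extract vectors from an OpenAI-style response (assumed shape), ordered by index."""
    items = sorted(body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embed(texts, timeout_s: float = 600.0, **kwargs):
    """Send the request and return one vector per input text."""
    req = build_embedding_request(texts, **kwargs)
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return parse_embeddings(json.load(resp))
```

With the server running, embed(["hello world", "ANE embedding service"], dimensions=1024) would send the same payload as the curl command.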

Notes

  • The bundle ships a single fixed-shape profile: b1_s128.
  • Each input is limited to 128 tokens.
  • The first embedding request triggers a long warm-up; it can take minutes on first load.
  • The default compute setting is cpu_and_ne, which prefers the ANE but does not guarantee that every operation runs on it.
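
Because of the 128-token limit, longer texts have to be shortened or split on the client side. How the service treats over-length inputs is not stated here, so the sketch below only shows the splitting step: it cuts an already tokenized input into windows of at most 128 token ids. The function name is hypothetical, the token ids are expected to come from the tokenizer shipped in the bundle, and any special tokens the service adds may reduce the usable budget below 128.

```python
MAX_TOKENS = 128  # per-input limit from the notes above


def split_into_windows(token_ids, max_tokens: int = MAX_TOKENS):
    """Split a list of token ids into consecutive windows of at most max_tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return [token_ids[i:i + max_tokens] for i in range(0, len(token_ids), max_tokens)]
```

Each window can then be decoded back to text and embedded as its own input; how to combine the resulting vectors (for example by averaging) is a design choice left to the caller.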

中文

这个仓库提供 Qwen3-Embedding-8B 的预转换 CoreML bundle,以及一个面向 Apple Silicon 的轻量 OpenAI 兼容 embedding 服务。

Bundle 规格

项目            值
基础模型        Qwen/Qwen3-Embedding-8B
向量维度        4096
Profile         b1_s128
Bundle 路径     bundles/qwen3_ane_bundle_8b
默认模型名      qwen3-embedding-8b-ane
包体积(约)    14G

目录结构

  • qwen3_ane_embed/:服务代码(/embeddings、/v1/embeddings)
  • bundles/qwen3_ane_bundle_8b/:CoreML bundle 与 tokenizer
  • setup_venv.sh:创建 Python 3.11 虚拟环境并安装依赖
  • run_server.sh:带自检的启动脚本
  • requirements-service.txt:运行时依赖

快速开始

./setup_venv.sh
./run_server.sh

健康检查:

curl -s http://127.0.0.1:8000/health

Embedding 请求:

curl -s http://127.0.0.1:8000/v1/embeddings \
  -H 'Content-Type: application/json' \
  -d '{
    "input": ["hello world", "ANE embedding service"],
    "model": "qwen3-embedding-8b-ane",
    "encoding_format": "float",
    "dimensions": 1024
  }'

说明

  • 当前是固定 shape 的 b1_s128 profile。
  • 单条输入最大 128 token。
  • 首次请求有明显预热,可能到分钟级。
  • 默认 cpu_and_ne,是偏向 ANE 调度,不等于 100% 仅 ANE 执行。

License

Apache-2.0. Please also follow the license and usage terms of the base Qwen model.
