---
license: apache-2.0
language:
- en
library_name: gliner
pipeline_tag: token-classification
tags:
- NER
- GLiNER
- information extraction
- encoder
- entity recognition
- modernbert
- bi-encoder
- scalable-ner
- zero-shot-ner
base_model:
- jhu-clsp/ettin-encoder-32m
- jhu-clsp/ettin-encoder-68m
- jhu-clsp/ettin-encoder-150m
- jhu-clsp/ettin-encoder-400m
- sentence-transformers/all-MiniLM-L6-v2
- sentence-transformers/all-MiniLM-L12-v2
- BAAI/bge-small-en-v1.5
- BAAI/bge-base-en-v1.5
---

# GLiNER-bi-Encoder: Scalable Zero-Shot Named Entity Recognition

![image](https://cdn-uploads.huggingface.co/production/uploads/6405f62ba577649430be5124/2PPPxCfKpt9eS9D1_anf8.png)

## About

GLiNER-bi-Encoder is a novel architecture for Named Entity Recognition (NER) that combines zero-shot flexibility with industrial-scale efficiency. Unlike the original GLiNER, which uses joint encoding, the bi-encoder design **decouples text and entity-type encoding**, enabling the recognition of thousands of entity types simultaneously with minimal computational overhead.

### Key Advantages

**Massive Scalability**: Handle 1000+ entity types with near-constant inference speed when using pre-computed label embeddings

**130× Faster**: Up to 130× throughput improvement compared to uni-encoder approaches at 1024 entity types

**State-of-the-Art Performance**: Achieves 61.5% Micro-F1 on the CrossNER benchmark in the zero-shot setting

**Efficient Caching**: Pre-compute and cache entity type embeddings for instant reuse across millions of documents

## Architecture

The bi-encoder architecture employs two specialized, independent transformers:
- **Text Encoder**: Processes input sequences using ModernBERT-based encoders (Ettin family)
- **Label Encoder**: Embeds entity type descriptions using specialized sentence transformers (BGE, MiniLM)

This separation removes the context-window bottleneck and enables:
- Pre-computation of entity type embeddings
- Constant memory usage for text encoding regardless of entity count
- Efficient nearest-neighbor search for entity matching

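The decoupling can be illustrated with a toy numeric sketch: span and label embeddings come from independent functions, and a span/label pair is scored with a sigmoid over their dot product. The two `encode_*` functions below are hypothetical stand-ins, not the model's actual encoders.

```python
import math

# Toy stand-ins for the two independent encoders (assumption: the real model
# uses ModernBERT / sentence-transformer encoders producing dense vectors).
def encode_span(span_tokens):          # hypothetical text-side encoder
    return [len(span_tokens), sum(len(t) for t in span_tokens) / 10.0]

def encode_label(label):               # hypothetical label-side encoder
    return [1.0, len(label) / 10.0]

def score(span_emb, label_emb):
    """Span/label match score: dot product squashed by a sigmoid."""
    dot = sum(s * l for s, l in zip(span_emb, label_emb))
    return 1.0 / (1.0 + math.exp(-dot))

# Because the encoders are independent, label embeddings are computed once
# and reused for every candidate span in every document.
label_embs = {lab: encode_label(lab) for lab in ["person", "award", "date"]}
span_emb = encode_span(["Cristiano", "Ronaldo"])
scores = {lab: score(span_emb, emb) for lab, emb in label_embs.items()}
```

The key property is that `encode_label` never sees the text, so growing the label set adds only cheap dot products, not extra transformer passes over the input.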
## Model Variants

GLiNER-bi-V2 Models:

| Model name | Params | Text Encoder | Label Encoder | Avg. CrossNER | Inference Speed (H100, ex/s) | Inference Speed (pre-computed, ex/s) |
|------------|--------|--------------|---------------|---------------|------------------------------|--------------------------------------|
| [gliner-bi-edge-v2.0](https://huggingface.co/knowledgator/gliner-bi-edge-v2.0) | 60M | [ettin-encoder-32m](https://huggingface.co/jhu-clsp/ettin-encoder-32m) | [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) | 54.0% | 13.64 | 24.62 |
| [gliner-bi-small-v2.0](https://huggingface.co/knowledgator/gliner-bi-small-v2.0) | 108M | [ettin-encoder-68m](https://huggingface.co/jhu-clsp/ettin-encoder-68m) | [all-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L12-v2) | 57.2% | 7.99 | 15.22 |
| [gliner-bi-base-v2.0](https://huggingface.co/knowledgator/gliner-bi-base-v2.0) | 194M | [ettin-encoder-150m](https://huggingface.co/jhu-clsp/ettin-encoder-150m) | [bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) | 60.3% | 5.91 | 9.51 |
| [gliner-bi-large-v2.0](https://huggingface.co/knowledgator/gliner-bi-large-v2.0) | 530M | [ettin-encoder-400m](https://huggingface.co/jhu-clsp/ettin-encoder-400m) | [bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) | 61.5% | 2.68 | 3.60 |

**Recommendation**: The **base variant (194M)** achieves 98% of large-model performance while operating 2.6× faster, making it optimal for most production scenarios.

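Both figures follow directly from the tables above (CrossNER averages and pre-computed throughput); a quick check of the arithmetic:

```python
# Figures taken from the model-variant table above.
base_f1, large_f1 = 60.3, 61.5          # Avg. CrossNER, %
base_sp, large_sp = 9.51, 3.60          # pre-computed inference speed, ex/s

relative_quality = base_f1 / large_f1   # fraction of large-model F1 retained
speedup = base_sp / large_sp            # base vs. large throughput

print(f"{relative_quality:.1%} of large F1, {speedup:.1f}x faster")
# prints: 98.0% of large F1, 2.6x faster
```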
## Installation & Usage

### Installation
```bash
pip install gliner -U
pip install "transformers>=4.48.0"
```

For flash attention support:
```bash
pip install flash-attn triton
```

### Basic Usage
```python
from gliner import GLiNER

# Load model
model = GLiNER.from_pretrained("knowledgator/gliner-bi-base-v2.0")

text = """
Cristiano Ronaldo dos Santos Aveiro (Portuguese pronunciation: [kɾiʃˈtjɐnu ʁɔˈnaldu]; born 5 February 1985) is a Portuguese professional footballer who plays as a forward for and captains both Saudi Pro League club Al Nassr and the Portugal national team. Widely regarded as one of the greatest players of all time, Ronaldo has won five Ballon d'Or awards, a record three UEFA Men's Player of the Year Awards, and four European Golden Shoes, the most by a European player.
"""

labels = ["person", "award", "date", "competitions", "teams"]

entities = model.predict_entities(text, labels, threshold=0.3)

for entity in entities:
    print(entity["text"], "=>", entity["label"])
```

**Output:**
```
Cristiano Ronaldo dos Santos Aveiro => person
5 February 1985 => date
Al Nassr => teams
Portugal national team => teams
Ballon d'Or => award
UEFA Men's Player of the Year Awards => award
European Golden Shoes => award
```

### Advanced Usage: Pre-computing Entity Embeddings

For scenarios with large, static entity taxonomies (hundreds to millions of types):
```python
from gliner import GLiNER

model = GLiNER.from_pretrained("knowledgator/gliner-bi-base-v2.0")

# Pre-compute embeddings for thousands of entity types
entity_types = ["person", "organization", "location", ...]  # Can be thousands
texts = ["Your documents here", ...]

# Encode entity types once
entity_embeddings = model.encode_labels(entity_types, batch_size=8)

# Use pre-computed embeddings for fast inference
outputs = model.batch_predict_with_embeds(texts, entity_embeddings, entity_types)
```

This approach provides:
- **130× speedup** at 1024 entity types
- **Constant inference time** regardless of entity count
- **Efficient caching** for repeated use

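The caching pattern can be sketched as a small persistent store keyed by label text, so a large taxonomy is encoded once and reused across runs. Here `toy_encode` is a hypothetical stand-in for `model.encode_labels`; the cache helper itself is a sketch, not part of the library.

```python
import os
import pickle
import tempfile
from pathlib import Path

def cached_label_embeddings(labels, encode_fn, cache_path):
    """Return embeddings for `labels`, encoding only labels not yet cached."""
    path = Path(cache_path)
    cache = pickle.loads(path.read_bytes()) if path.exists() else {}
    missing = [lab for lab in labels if lab not in cache]
    if missing:
        # Encode unseen labels once, then persist the updated cache.
        for lab, emb in zip(missing, encode_fn(missing)):
            cache[lab] = emb
        path.write_bytes(pickle.dumps(cache))
    return [cache[lab] for lab in labels]

calls = []
def toy_encode(labels):  # hypothetical stand-in for model.encode_labels
    calls.append(len(labels))
    return [[float(len(lab))] for lab in labels]

cache_file = os.path.join(tempfile.mkdtemp(), "label_embs.pkl")
embs = cached_label_embeddings(["person", "location"], toy_encode, cache_file)
embs = cached_label_embeddings(["person", "location"], toy_encode, cache_file)  # cache hit, no re-encoding
```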
### Flash Attention & Extended Context
```python
model = GLiNER.from_pretrained(
    "knowledgator/gliner-bi-base-v2.0",
    _attn_implementation='flash_attention_2',
    max_len=2048
).to('cuda:0')
```

### Zero-Shot NER Performance

Comprehensive evaluation across 19 diverse NER datasets:

| Dataset | gliner-bi-edge-v2.0 | gliner-bi-small-v2.0 | gliner-bi-base-v2.0 | gliner-bi-large-v2.0 |
|---------|---------------------|----------------------|---------------------|----------------------|
| ACE 2004 | 26.4% | 27.5% | 28.9% | 31.9% |
| ACE 2005 | 26.2% | 28.1% | 30.0% | 31.4% |
| AnatEM | 39.1% | 43.6% | 35.4% | 39.5% |
| Broad Tweet Corpus | 70.0% | 71.7% | 72.1% | 70.9% |
| CoNLL 2003 | 61.6% | 64.2% | 65.6% | 66.5% |
| FabNER | 22.4% | 23.2% | 24.3% | 22.7% |
| FindVehicle | 35.6% | 40.3% | 40.6% | 39.1% |
| GENIA_NER | 50.1% | 53.8% | 56.8% | 60.1% |
| HarveyNER | 15.0% | 10.6% | 12.6% | 14.7% |
| MultiNERD | 64.6% | 66.0% | 68.0% | 64.0% |
| Ontonotes | 31.4% | 31.9% | 33.3% | 32.5% |
| PolyglotNER | 45.1% | 46.3% | 46.6% | 46.8% |
| TweetNER7 | 36.9% | 40.9% | 40.4% | 41.7% |
| WikiANN en | 52.3% | 54.0% | 54.9% | 56.3% |
| WikiNeural | 78.0% | 79.9% | 80.0% | 76.6% |
| bc2gm | 58.1% | 59.9% | 62.7% | 61.4% |
| bc4chemd | 45.8% | 49.1% | 53.6% | 50.5% |
| bc5cdr | 68.5% | 71.5% | 73.0% | 71.7% |
| ncbi | 65.9% | 65.4% | 65.2% | 65.9% |
| **Average** | **47.0%** | **48.8%** | **49.7%** | **49.7%** |

### CrossNER Zero-Shot Benchmark

| Dataset | gliner-bi-edge-v2.0 | gliner-bi-small-v2.0 | gliner-bi-base-v2.0 | gliner-bi-large-v2.0 |
|---------|---------------------|----------------------|---------------------|----------------------|
| CrossNER_AI | 53.8% | 54.7% | 58.3% | 57.4% |
| CrossNER_literature | 56.2% | 62.6% | 65.2% | 63.2% |
| CrossNER_music | 68.2% | 72.3% | 73.4% | 74.0% |
| CrossNER_politics | 68.7% | 70.0% | 70.8% | 73.0% |
| CrossNER_science | 63.2% | 66.1% | 68.0% | 67.6% |
| mit-movie | 30.5% | 35.2% | 46.2% | 51.0% |
| mit-restaurant | 37.1% | 39.5% | 40.3% | 44.3% |
| **Average (Zero-Shot Benchmark)** | **54.0%** | **57.2%** | **60.3%** | **61.5%** |

### Inference Speed Comparison

Throughput (examples/second) by number of entity types on H100 GPU (batch_size=1):

| Model | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 128 | 256 | 512 | 1024 | **Avg** |
|-------|---|---|---|---|----|----|----|-----|-----|-----|------|---------|
| gliner-bi-edge-v2.0 | 17.0 | 27.0 | 5.05 | 22.4 | 17.5 | 13.9 | 15.2 | 12.5 | 10.8 | 5.43 | 3.23 | **13.64** |
| gliner-bi-edge-v2.0 (pre-computed) | 19.3 | 25.0 | 28.2 | 32.6 | 31.0 | 32.6 | 22.2 | 22.7 | 22.2 | 16.9 | 18.3 | **24.62** |
| gliner-bi-small-v2.0 | 12.5 | 12.8 | 5.98 | 11.6 | 10.6 | 9.43 | 6.94 | 7.35 | 5.74 | 3.33 | 1.60 | **7.99** |
| gliner-bi-small-v2.0 (pre-computed) | 14.7 | 15.9 | 14.3 | 15.3 | 15.4 | 15.4 | 15.6 | 15.3 | 15.5 | 15.7 | 14.3 | **15.22** |
| gliner-bi-base-v2.0 | 8.13 | 8.62 | 4.85 | 8.00 | 7.52 | 6.76 | 5.71 | 5.21 | 4.64 | 3.21 | 2.30 | **5.91** |
| gliner-bi-base-v2.0 (pre-computed) | 9.52 | 10.2 | 9.80 | 9.95 | 10.0 | 9.93 | 8.93 | 6.71 | 9.35 | 9.71 | 10.5 | **9.51** |
| gliner-bi-large-v2.0 | 3.52 | 2.53 | 3.87 | 3.50 | 3.66 | 3.19 | 1.90 | 2.46 | 2.39 | 1.62 | 0.87 | **2.68** |
| gliner-bi-large-v2.0 (pre-computed) | 4.37 | 4.07 | 4.53 | 4.54 | 4.47 | 3.46 | 3.85 | 3.04 | 2.82 | 1.84 | 2.64 | **3.60** |
| | | | | | | | | | | | | |
| gliner_small-v2.5 (uni-encoder) | 10.7 | 14.6 | 14.1 | 13.2 | 11.9 | 10.3 | 7.91 | 4.26 | 1.29 | 0.43 | 0.14 | **8.08** |
| gliner_medium-v2.5 (uni-encoder) | 7.81 | 8.51 | 8.39 | 7.58 | 7.12 | 5.62 | 4.18 | 2.19 | 0.68 | 0.23 | 0.07 | **4.76** |
| gliner_large-v2.5 (uni-encoder) | 2.89 | 3.28 | 3.29 | 2.90 | 2.61 | 2.33 | 1.71 | 1.12 | 0.31 | 0.09 | 0.03 | **1.87** |

**Key Insight**: The bi-encoder with pre-computed embeddings maintains near-constant speed (5.2% degradation from 1 to 1024 labels), while the uni-encoder degrades by 98.7%.

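The degradation figures can be reproduced from the throughput table (comparing the edge bi-encoder with pre-computed embeddings against the small uni-encoder):

```python
# Throughput at 1 and 1024 entity types, from the table above (ex/s).
bi_precomputed = (19.3, 18.3)   # gliner-bi-edge-v2.0 (pre-computed)
uni_small      = (10.7, 0.14)   # gliner_small-v2.5 (uni-encoder)

def degradation(t_1, t_1024):
    """Relative throughput loss going from 1 to 1024 labels."""
    return (t_1 - t_1024) / t_1

d_bi = degradation(*bi_precomputed)
d_uni = degradation(*uni_small)

print(f"bi (pre-computed): {d_bi:.1%} degradation")   # prints: bi (pre-computed): 5.2% degradation
print(f"uni-encoder:       {d_uni:.1%} degradation")  # prints: uni-encoder:       98.7% degradation
```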
## Use Cases

### Biomedical Entity Linking
Process millions of documents against UMLS (4M+ concepts), SNOMED CT, or other large medical ontologies with pre-computed embeddings.

### Enterprise Knowledge Extraction
Deploy dynamic taxonomies that evolve without model retraining. Add new entity types instantly by computing their embeddings.

### Scientific Literature Mining
Extract entities across multiple specialized domains (chemistry, biology, physics) with domain-specific label encoders.

## Entity Linking with GLiNKER

GLiNER-bi-Encoder extends naturally to entity linking through the **GLiNKER** framework, a modular DAG-based pipeline for:
- Mention extraction with GLiNER
- Candidate retrieval from knowledge bases via pre-computed embeddings
- Entity disambiguation using bi-encoder scoring

**Learn more**: [GLiNKER Repository](https://github.com/Knowledgator/GLinker)

## Model Details

### Training Data
- **Pre-training**: 8M samples (Large/Base/Small), 10M samples (Edge) from FineFineWeb, annotated with GPT-4o
- **Post-training**: 40K high-quality samples with sequences up to 2048 tokens for long-context refinement

### Training Configuration
- **Focal Loss**: α=0.7 (pre-training), α=0.8 (post-training), γ=2.0
- **Optimizer**: AdamW with differential learning rates (encoder: 1e-5, other: 3e-5)
- **Context Length**: 1024 tokens (pre-training), 2048 tokens (post-training)
- **Maximum Span Width**: 12 tokens
- **Dropout**: 0.35

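For reference, the focal loss parameters above plug into the standard binary focal loss formulation (Lin et al.); a minimal sketch with the pre-training settings, assuming that standard form is what the training code uses:

```python
import math

def binary_focal_loss(p, y, alpha=0.7, gamma=2.0):
    """Focal loss for one predicted probability p with binary target y.

    alpha weights the positive class; (1 - p_t)**gamma down-weights easy
    examples. alpha=0.7, gamma=2.0 match the pre-training configuration.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# An easy positive (p=0.95) contributes far less loss than a hard one (p=0.3),
# which keeps training focused on ambiguous span/label pairs.
easy = binary_focal_loss(0.95, 1)
hard = binary_focal_loss(0.3, 1)
```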
## Citation

If you use GLiNER-bi-Encoder in your research, please cite:
```bibtex
@misc{stepanov2024glinermultitask,
    title={GLiNER multi-task: Generalist Lightweight Model for Various Information Extraction Tasks},
    author={Ihor Stepanov and Mykhailo Shtopko},
    year={2024},
    eprint={2406.12925},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```

## Acknowledgments

We sincerely thank Urchade Zaratiana (creator of GLiNER) and Tom Aarsen (maintainer of Sentence Transformers) for their foundational work.

## Join Our Community

Connect with our community on Discord for news, support, and discussions: [Join Discord](https://discord.gg/HbW9aNJ9)

## Resources

- **Paper**: [arXiv preprint (coming soon)](https://arxiv.org)
- **GLiNKER Framework**: [GLiNKER](https://github.com/Knowledgator/GLinker)
- **Model Collection**: [HuggingFace Collection](https://hf.co/collections/knowledgator/gliner-bi-v2)

---

**Knowledgator Engineering © 2026**