OmniGene-4-MM (Stage 3 v3, LoRA + embedding)
OmniGene-4-MM is a multimodal extension of OmniGene-4 v5. It adds four vision modalities (chemical-structure images, medical / pathology imagery, charts) on top of v5's sequence and language capabilities.
This repository hosts the LoRA adapter and the extended embedding table (~1.7 GB). First load the base model dnagpt/OmniGene-4-SFT-v5-merged, then patch it with the artefacts here. A merged BF16 release is forthcoming as dnagpt/OmniGene-4-MM-merged.
Headline numbers
| Capability | Stage 3 v3 | v5 (text-only) |
|---|---|---|
| BioPAWS standard homology | 85.0 % | 99.4 % |
| BioPAWS remote homology | 69.5 % | 82.6 % |
| Vis-CheBI20 struct_recog | 1.00 | — |
| Vis-CheBI20 struct_cap | 0.96 | — |
| Cell-marker → cell-type ID (kw-overlap) | 0.95 | — |
| SMILES → physicochem descriptor (kw-overlap) | 0.91 | — |
| Protein-pair homology generation (kw-overlap) | 1.00 | — |
| Total compute | ~1.5 GPU-days (single H20) | 1.5 GPU-days |
Files
| File | Size | What it is |
|---|---|---|
| lora_weights.pt | 160 MB | LoRA adapter state-dict (r=64, α=128, on q/k/v/o, gate/up/down, router.proj) |
| embedding_weights.pt | 1.6 GB | Extended embedding table (290,172 × 2,816, BF16) |
| tokenizer.json + tokenizer_config.json | 37 MB | Tokenizer with 28,028 biological tokens |
| processor_config.json | 2 KB | Multimodal processor configuration |
| chat_template.jinja | 16 KB | Chat template |
| meta.json | 0.3 KB | Training hyperparameters |
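As a quick sanity check on the table above, the size of embedding_weights.pt follows from its shape and dtype. The sketch below assumes BF16 takes 2 bytes per element and that sizes are reported in decimal units (1 GB = 10^9 bytes); the function name is illustrative. It also computes how many rows remain after subtracting the 28,028 biological tokens, on the assumption (not stated in this card) that all of them were appended to the base vocabulary.

```python
# Figures taken from the Files table; everything else is an assumption.
VOCAB_ROWS = 290_172      # rows in the extended embedding table
HIDDEN_DIM = 2_816        # embedding width
BIO_TOKENS = 28_028       # biological tokens in the tokenizer
BYTES_PER_BF16 = 2        # bfloat16 is a 16-bit format


def embedding_size_bytes(rows: int, dim: int,
                         bytes_per_elem: int = BYTES_PER_BF16) -> int:
    """Raw tensor payload in bytes (ignores the small file header)."""
    return rows * dim * bytes_per_elem


size_bytes = embedding_size_bytes(VOCAB_ROWS, HIDDEN_DIM)
size_gb = size_bytes / 1e9
implied_base_vocab = VOCAB_ROWS - BIO_TOKENS

print(f"embedding payload: {size_bytes:,} bytes (~{size_gb:.2f} GB)")
print(f"rows not accounted for by biological tokens: {implied_base_vocab:,}")
```

The payload comes out at about 1.63 GB, in line with the 1.6 GB listed for the file.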
Loading
import torch
from transformers import AutoTokenizer, AutoProcessor, AutoModelForCausalLM
from peft import LoraConfig, inject_adapter_in_model
from huggingface_hub import hf_hub_download

# 1. Load base
BASE = "dnagpt/OmniGene-4-SFT-v5-merged"
ADAPTER = "dnagpt/OmniGene-4-MM-LoRA"
tok = AutoTokenizer.from_pretrained(ADAPTER)
proc = AutoProcessor.from_pretrained(ADAPTER)
model = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, device_map="auto",
)

# 2. Inject empty LoRA at the same target modules used during training
lora_cfg = LoraConfig(
    r=64, lora_alpha=128, lora_dropout=0.05, bias="none",
    target_modules=['q_proj','k_proj','v_proj','o_proj',
                    'gate_proj','up_proj','down_proj','router.proj'],
)
inject_adapter_in_model(lora_cfg, model.model.language_model, adapter_name="stage2")

# 3. Patch in trained weights
sd = model.state_dict()
lora_sd = torch.load(hf_hub_download(ADAPTER, "lora_weights.pt"), map_location="cpu")
for k, v in lora_sd.items():
    if k in sd:
        sd[k].copy_(v)
emb = torch.load(hf_hub_download(ADAPTER, "embedding_weights.pt"), map_location="cpu")
model.get_input_embeddings().weight.data.copy_(emb)
model.eval()
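The patching loop above skips any key that is not present in the model's state dict, so a naming mismatch (for example a different adapter name or module prefix) would leave the LoRA weights at their initial values without raising an error. A minimal sketch of a pre-flight check follows. It works on key names only; `partition_keys` is a hypothetical helper, and the key strings in the example are made up for illustration, not read from lora_weights.pt.

```python
from typing import Iterable, List, Tuple


def partition_keys(adapter_keys: Iterable[str],
                   model_keys: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split adapter keys into those present in the model and those missing."""
    model_key_set = set(model_keys)
    matched: List[str] = []
    missing: List[str] = []
    for key in adapter_keys:
        (matched if key in model_key_set else missing).append(key)
    return matched, missing


# Illustrative key names only (hypothetical, not read from the checkpoint).
example_model_keys = [
    "layers.0.q_proj.base_layer.weight",
    "layers.0.q_proj.lora_A.stage2.weight",
    "layers.0.q_proj.lora_B.stage2.weight",
]
example_adapter_keys = [
    "layers.0.q_proj.lora_A.stage2.weight",
    "layers.0.q_proj.lora_B.stage2.weight",
    "layers.0.q_proj.lora_A.default.weight",  # wrong adapter name on purpose
]

matched, missing = partition_keys(example_adapter_keys, example_model_keys)
print(f"matched {len(matched)} keys, {len(missing)} would be skipped")
for key in missing:
    print("  not in model:", key)
```

In the loading script, the same function could be called with the keys of the loaded LoRA state dict and of `model.state_dict()` before copying; a non-empty `missing` list is a sign that the adapter was only partially applied.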
Multi-modal usage
from PIL import Image

img = Image.open("molecule.png").convert("RGB")
msgs = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Please list the functional groups of the molecule."},
]}]
# Render the chat template to text, then let the processor pair it with the image
text = proc.apply_chat_template(msgs, add_generation_prompt=True, tokenize=False)
inp = proc(text=text, images=[img], return_tensors="pt").to(model.device)
out = model.generate(**inp, max_new_tokens=160, do_sample=False)
# Decode only the newly generated tokens (skip the prompt)
print(tok.decode(out[0][inp.input_ids.shape[1]:], skip_special_tokens=True))
Training pipeline
Three-stage LoRA fine-tuning starting from the v5 merged checkpoint:
- Stage 1 (~0.4 GPU-days): vision-only warmup, 10K steps, LR 5e-5
- Stage 2 (~1.0 GPU-days): mixed text + vision, 6K steps, LR 5e-6
- Stage 3 v3 (~0.5 GPU-days): heavy-homology with frozen embedding, 3K steps, LR 2e-5
See the scripts in the GitHub repository (https://github.com/maris205/omnigene4) to reproduce the full pipeline.
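The schedule above can be captured as plain data, which makes it easy to total the steps and the approximate compute. This is only a sketch: the `Stage` dataclass and its field names are illustrative, not the format used by the training scripts, and the GPU-day figures are the rounded estimates from the list.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Stage:
    name: str
    focus: str
    steps: int
    learning_rate: float
    gpu_days: float  # rounded estimate quoted in the list above


# Values copied from the training-pipeline list; field names are hypothetical.
STAGES: List[Stage] = [
    Stage("Stage 1", "vision-only warmup", 10_000, 5e-5, 0.4),
    Stage("Stage 2", "mixed text + vision", 6_000, 5e-6, 1.0),
    Stage("Stage 3 v3", "heavy-homology, frozen embedding", 3_000, 2e-5, 0.5),
]

total_steps = sum(s.steps for s in STAGES)
total_gpu_days = round(sum(s.gpu_days for s in STAGES), 1)

for s in STAGES:
    print(f"{s.name:<11} {s.steps:>6} steps  "
          f"lr={s.learning_rate:.0e}  ~{s.gpu_days} GPU-days")
print(f"total: {total_steps} steps, ~{total_gpu_days} GPU-days")
```

Summing the per-stage estimates gives 19,000 steps and about 1.9 GPU-days, somewhat above the ~1.5 GPU-days quoted under Headline numbers; both figures are approximate.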
Citation
@article{wang2026omnigene4,
  title  = {OmniGene-4: A Unified Bio-Language MoE Model with Router-Level
            Interpretability and Modality-Invariant Transfer},
  author = {Wang, Liang},
  year   = {2026},
  note   = {Manuscript at Patterns (Cell Press). Preprint:
            bioRxiv 10.1101/2026.01.03.697478. Code:
            https://github.com/maris205/omnigene4}
}
License
Code: MIT (see GitHub). Model weights: Apache 2.0 (inherited from Gemma-4 base).