---
license: cc-by-nc-4.0
---

# SeamlessM4T-v2 T2TT Lite Model

Extracted from `facebook/seamless-m4t-v2-large`, this lite model contains only the T2TT (Text-to-Text Translation) components.

> Original Model: [facebook/seamless-m4t-v2-large](https://huggingface.co/facebook/seamless-m4t-v2-large)
>
> Official Documentation: [SeamlessM4T-v2 Documentation](https://huggingface.co/docs/transformers/en/model_doc/seamless_m4t_v2)

Note: This package only reorganizes publicly available weights from Meta's original model for T2TT usage. No new training or fine-tuning is introduced. All rights to the model and weights belong to their original owner.

## Supported Features

- **T2TT (Text-to-Text Translation)**: Multilingual text translation
- **96 Languages**: Supports text translation between 96 languages (one way to enumerate the language codes is sketched below)
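
The language codes are the three-letter identifiers used by SeamlessM4T (e.g. `eng`, `fra`, `jpn`, `cmn`). As a rough sketch for listing them, the tokenizer registers each code as a special token of the form `__eng__`; the exact attribute layout may differ across `transformers` versions:

```python
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("jaman21/seamless-m4t-v2-t2tt")

# Language codes are stored as special tokens like "__eng__";
# strip the underscores to recover the bare code.
codes = sorted(
    t.strip("_")
    for t in processor.tokenizer.additional_special_tokens
    if t.startswith("__")
)
print(len(codes), codes[:8])
```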

## Included Components

### Model Weights

- `text_encoder`: Text encoder
- `text_decoder`: Text decoder
- `shared.weight`: Shared word embeddings
- `lang_embed`: Language embeddings
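
To verify which components made it into the checkpoint, the tensor names can be listed without loading the model. A minimal sketch, assuming the weights ship as a single `model.safetensors` file (adjust the filename for sharded checkpoints):

```python
from huggingface_hub import hf_hub_download
from safetensors import safe_open

# Download only the weight file and group tensor names by top-level prefix
path = hf_hub_download("jaman21/seamless-m4t-v2-t2tt", "model.safetensors")
with safe_open(path, framework="pt") as f:
    prefixes = sorted({name.split(".")[0] for name in f.keys()})
print(prefixes)  # expect text_encoder / text_decoder / shared / lang_embed prefixes
```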

## Model Size

- Original Model: ~8.6 GB
- Lite Model: ~5.1 GB
- Removed Weights: 1,219 tensors (`speech_encoder`, `t2u_model`, `vocoder`)
- Space Saved: ~3.5 GB

## Usage Examples

### 1. Basic T2TT: Text-to-Text Translation

```python
from transformers import SeamlessM4Tv2Model, AutoProcessor

# Load model
model = SeamlessM4Tv2Model.from_pretrained("jaman21/seamless-m4t-v2-t2tt")
processor = AutoProcessor.from_pretrained("jaman21/seamless-m4t-v2-t2tt")

# Translate text
text_inputs = processor(text="Hello, how are you?", src_lang="eng", return_tensors="pt")
output_tokens = model.generate(**text_inputs, tgt_lang="fra", generate_speech=False)
translated_text = processor.decode(output_tokens[0].tolist()[0], skip_special_tokens=True)
print(translated_text)  # "Bonjour, comment allez-vous?"
```
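
Note the double indexing when decoding: with `generate_speech=False`, `generate` returns the text token ids as the first element of its output, so `output_tokens[0]` is the `(batch, sequence_length)` id tensor and the trailing `[0]` selects the first (and here only) sequence. The same indexing appears in the official documentation example.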

### 2. Advanced Generation Strategies

```python
# Beam search for better quality (slower)
text_inputs = processor(text="The quick brown fox jumps", src_lang="eng", return_tensors="pt")
outputs = model.generate(
    **text_inputs,
    tgt_lang="jpn",
    generate_speech=False,
    num_beams=5,        # Use beam search
    max_new_tokens=256,
    early_stopping=True
)

# Sampling for more diverse output
outputs = model.generate(
    **text_inputs,
    tgt_lang="kor",
    generate_speech=False,
    do_sample=True,     # Enable sampling
    top_k=50,
    top_p=0.95,
    temperature=0.8     # Lower values are more deterministic, higher values more random
)
```
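
For translation, deterministic beam search is usually the safer default, since faithfulness to the source matters more than variety; sampling is mainly useful when you want alternative phrasings of the same input.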

### 3. Batch Processing Multiple Texts

```python
# Process multiple texts at once
texts = [
    "Hello, how are you?",
    "What is your name?",
    "Nice to meet you!"
]

text_inputs = processor(text=texts, src_lang="eng", return_tensors="pt", padding=True)
output_tokens = model.generate(**text_inputs, tgt_lang="ita", generate_speech=False)

# Decode all outputs (the token id tensor is the first element of the output)
translations = processor.batch_decode(output_tokens[0], skip_special_tokens=True)
for orig, trans in zip(texts, translations):
    print(f"{orig} -> {trans}")
```
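
`padding=True` pads the shorter inputs so the batch can be stacked into a single tensor; the processor also returns the matching `attention_mask`, which `generate` uses to ignore the padded positions.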

### 4. Control Generation Length and Quality

```python
text_inputs = processor(text="Translate this sentence", src_lang="eng", return_tensors="pt")

# Higher quality but more computationally expensive
high_quality_output = model.generate(
    **text_inputs,
    tgt_lang="rus",
    generate_speech=False,
    num_beams=5,         # Beam search
    max_new_tokens=512,  # Allow longer output
    length_penalty=1.0,  # Default length normalization
    early_stopping=True,
    use_cache=True       # Accelerate generation
)

# Faster generation speed, acceptable quality
fast_output = model.generate(
    **text_inputs,
    tgt_lang="rus",
    generate_speech=False,
    num_beams=1,         # Greedy decoding (fastest, single hypothesis)
    max_new_tokens=256,
    use_cache=True
)
```

### 5. GPU/CPU Usage

```python
import torch

# Move model to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Process inputs on the same device
text_inputs = processor(text="Hello", src_lang="eng", return_tensors="pt")
text_inputs = {k: v.to(device) for k, v in text_inputs.items()}

# Generate
with torch.inference_mode():  # More efficient than torch.no_grad()
    outputs = model.generate(**text_inputs, tgt_lang="cmn", generate_speech=False)
```
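
To reduce GPU memory further, the model can also be loaded in half precision. A minimal sketch using the standard `torch_dtype` argument of `from_pretrained` (fp16 requires a CUDA device and may change outputs marginally):

```python
import torch
from transformers import SeamlessM4Tv2Model

# Half-precision load roughly halves GPU memory versus fp32
model = SeamlessM4Tv2Model.from_pretrained(
    "jaman21/seamless-m4t-v2-t2tt", torch_dtype=torch.float16
).to("cuda")
```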

## License

Same as the original model: **CC-BY-NC-4.0**

For commercial use, please refer to Meta's licensing terms.

## References

- [SeamlessM4T-v2 Paper](https://arxiv.org/abs/2312.05187)
- [Official Model Card](https://huggingface.co/facebook/seamless-m4t-v2-large)
- [Transformers Documentation](https://huggingface.co/docs/transformers/en/model_doc/seamless_m4t_v2)
- [GitHub Repository](https://github.com/facebookresearch/seamless_communication)