OpenMed-mLiteClinical-IrishPPSN-135M-v1

OpenMed-mLiteClinical-IrishPPSN-135M-v1 is a compact DistilBERT token-classification model derived from OpenMed/OpenMed-PII-mLiteClinical-Base-135M-v1.

It keeps the base model's existing PII label set and adds one new entity:

  • B-PPSN
  • I-PPSN

The model is intended for Irish Personal Public Service Number (PPSN) detection and masking in English and Irish Gaelic text, especially in healthcare, public-service, and citizen-support flows.

Deployment Note

  • CPU-first deployment, low batch / short text: temsa/OpenMed-mLiteClinical-IrishPPSN-135M-v1-fp16 is a good option
  • Why: on the current word_aligned single-example path it preserved PPSN quality and improved end-to-end CPU throughput on the multilingual suite from about 31.27 to 45.80 examples/s while halving model size
  • Important caveat: in a batched CPU forward matrix, fp16 became slower than fp32 once sequence length and batch size grew; see the fp16 repo for the runtime matrix
  • If your CPU deployment batches aggressively, keep this fp32 repo or use the q8 repo if you need maximum throughput and can accept lower PPSN quality

For CPU guidance in these model cards:

  • short text: about <= 32 tokenizer tokens
  • 33-63 tokens: gray zone, benchmark your workload
  • >= 64 tokens: not short for this recommendation
  • low batch: batch_size = 1
  • batch_size = 2-4: gray zone, benchmark your workload
  • batch_size >= 8: not low batch for this recommendation

Practical CPU rule:

  • Prefer fp16 for batch_size = 1 and about <= 32 tokens
  • Prefer this fp32 repo once batch size or sequence length grows materially

What This Model Is

  • A full transformers checkpoint, not a LoRA adapter
  • A derivative of the OpenMed mLiteClinical 135M PII model
  • Tuned specifically to add PPSN coverage while preserving the base model's other PII behavior
  • Best used with the bundled word_aligned PPSN decoder

What This Model Is Not

  • Not a general Irish-PII upgrade across all Irish-specific identifiers
  • Not optimized for plain pipeline(..., aggregation_strategy="simple") PPSN extraction
  • Not a replacement for downstream policy logic or deterministic validation when your application requires checksum validation

Recommended Inference

Use the bundled inference_word_aligned.py script:

python3 inference_word_aligned.py \
  --ppsn-min-score 0.4 \
  --text "My PPSN is 1234567TW and I need help with my housing grant." \
  --json

Why this path:

  • PPSNs are short and often tokenized into fragments
  • the custom word_aligned decoder reconstructs cleaner full-span PPSN outputs
  • the default token-classification aggregation path can produce partial spans

Intended Use

  • De-identification / masking of PPSNs in English or Irish Gaelic text
  • Triage and QA for citizen-to-government or healthcare text flows
  • A lightweight base for further Irish identifier fine-tuning if you need more than PPSN

Limitations

  • This release is strongest on PPSN, not on other Irish-specific identifiers such as Eircode, Irish phone numbers, or Irish IBAN/BIC handling
  • Lowercase / weak-context PPSN cases improved versus the base model, but this is still not a checksum-aware system
  • You should benchmark on your own traffic before production rollout

Key Results

  • User raw PPSN regression F1: 0.8000
  • QA PPSN regression v6 validated F1: 0.6667
  • QA PPSN regression v8 F1: 0.7385
  • Irish PPSN regression F1: 0.8000
  • Irish large PPSN F1: 0.8979
  • Multilingual gov + citizen + HSE PPSN suite F1: 0.9704
  • Non-PPSN agreement vs base mLiteClinical: 1.0000

Files

  • Core model:
    • model.safetensors
    • config.json
    • tokenizer.json
    • tokenizer_config.json
    • special_tokens_map.json
    • label_meta.json
    • vocab.txt
  • Inference / QA helpers:
    • inference_word_aligned.py
    • qa_config.json
    • pyproject.toml
  • Evaluation reports:
    • eval/

Base Model

  • Upstream: OpenMed/OpenMed-PII-mLiteClinical-Base-135M-v1
  • Upstream license: Apache-2.0

License

This derivative release is distributed under Apache-2.0, consistent with the upstream base model license. See NOTICE for attribution.

Portfolio Comparison

Updated: 2026-03-16.

Use this section for the fastest public comparison across the temsa PII masking portfolio.

  • The first core table only includes public checkpoints that ship both comparable q8 accuracy and q8 CPU throughput.
  • The first PPSN table only includes public artifacts that ship comparable PPSN accuracy and CPU throughput.
  • Missing cells in the archive tables mean the older release did not ship that metric in its public bundle.
  • DiffMask rows use the reconciled clean_single_pass harness that matches the deployed runtime.
  • GlobalPointer rows use the public raw-only span-matrix release bundle and its packaged q8 ONNX artifact.
  • The same content is shipped as PORTFOLIO_COMPARISON.md inside each public model repo.

Irish Core PII: Comparable Public Checkpoints

Repo Stack Full Core F1 Q8 Core F1 Q8 Multilingual PPSN F1 Q8 Core ex/s
temsa/IrishCore-GlobalPointer-ContextPII-4L-122M-v1-rc4 4-layer GlobalPointer distilled fast student 1.0000 1.0000 0.9333 299.0
temsa/IrishCore-GlobalPointer-ContextPII-4L-122M-v1-rc3 4-layer GlobalPointer distilled fast student 1.0000 1.0000 0.9333 317.9
temsa/IrishCore-GlobalPointer-ContextPII-4L-122M-v1-rc2 4-layer GlobalPointer distilled fast student 1.0000 1.0000 0.9333 292.5
temsa/IrishCore-GlobalPointer-ContextPII-4L-122M-v1-rc1 4-layer GlobalPointer distilled fast student 1.0000 1.0000 0.9333 337.3
temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc27 GlobalPointer raw-only + context labels 1.0000 1.0000 0.9333 270.0
temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc25 GlobalPointer raw-only + context labels 1.0000 1.0000 0.9333 212.1
temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc24 GlobalPointer raw-only + context labels 1.0000 1.0000 0.9333 278.9
temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc23 GlobalPointer raw-only + context labels 1.0000 1.0000 0.9333 237.6
temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc22 GlobalPointer raw-only + context labels 1.0000 1.0000 0.9333 106.8
temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc21 GlobalPointer raw-only + context labels 1.0000 1.0000 0.9333 150.8
temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc20 GlobalPointer raw-only + context labels 1.0000 1.0000 0.9333 181.9
temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc19 GlobalPointer raw-only + context labels 1.0000 1.0000 0.9333 73.1
temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc18 GlobalPointer raw-only + context labels 1.0000 1.0000 0.9333 126.2
temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc17 GlobalPointer raw-only + context labels 1.0000 1.0000 0.9333 125.5
temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc16 GlobalPointer raw-only + context labels 1.0000 1.0000 0.9333 125.5
temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc15 GlobalPointer raw-only + context labels 1.0000 1.0000 0.9333 125.5
temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc14 GlobalPointer raw-only + context labels 1.0000 1.0000 0.9333 119.2
temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc13 GlobalPointer raw-only + context labels 1.0000 1.0000 0.9333 126.1
temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc12 GlobalPointer raw-only + context labels 1.0000 1.0000 0.9333 73.6
temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc11 GlobalPointer raw-only + context labels 1.0000 1.0000 0.9333 94.1
temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc10 GlobalPointer raw-only + context labels 1.0000 1.0000 0.9333 125.8
temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc9 GlobalPointer raw-only + context labels 1.0000 1.0000 0.9333 119.8
temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc8 GlobalPointer raw-only + context labels 1.0000 1.0000 0.9333 128.9
temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc7 GlobalPointer raw-only + context labels 1.0000 1.0000 0.9333 89.0
temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc6 GlobalPointer raw-only + context labels 1.0000 1.0000 0.9333 89.0
temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc5 GlobalPointer raw-only + context labels 1.0000 1.0000 0.9333 84.5
temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc4 GlobalPointer raw-only + context labels 0.9935 0.9935 0.9333 61.5
temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc3 GlobalPointer raw-only + context labels 0.9935 0.9935 0.9333 61.5
temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc2 GlobalPointer raw-only + context labels 0.9935 0.9935 0.9222 61.5
temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc1 GlobalPointer raw-only + context labels 0.9935 0.9935 0.9222 61.5
temsa/IrishCore-GlobalPointer-135M-v1-rc4 GlobalPointer raw-only span-matrix 1.0000 1.0000 0.9333 221.6
temsa/IrishCore-GlobalPointer-135M-v1-rc3 GlobalPointer raw-only span-matrix 1.0000 1.0000 0.9213 204.9
temsa/IrishCore-GlobalPointer-135M-v1-rc2 GlobalPointer raw-only span-matrix 0.9934 0.9934 0.9326 231.2
temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc8 Raw-only token-span 0.9737 0.9737 0.9176 46.1
temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc7 Hybrid classifier + generated scanner spec 1.0000 0.9934 1.0000 30.0
temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc6 Hybrid classifier + repair decoders 1.0000 0.9934 1.0000 29.5
temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc5 Hybrid classifier + repair decoders 0.9737 0.9669 0.9333 34.4
temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc4 Hybrid classifier + repair decoders 0.9870 0.9740 0.9600 114.2
temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc3 Hybrid classifier + repair decoders 0.9806 0.9677 0.9333 44.9
temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc2 Hybrid classifier + repair decoders 0.9554 0.9615 0.7887 119.1
temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v1 Hybrid classifier baseline 0.9530 0.9333 0.9882 103.3
temsa/IrishCore-DiffMask-135M-v1-rc6 DiffMask token-span, scanner-free 0.9801 0.9733 0.9274 130.3
temsa/IrishCore-DiffMask-135M-v1-rc5 DiffMask token-span, scanner-free 0.9733 0.9733 0.9379 249.2
temsa/IrishCore-DiffMask-135M-v1-rc4 DiffMask token-span, scanner-free 0.9733 0.9733 0.9371 29.5
temsa/IrishCore-DiffMask-135M-v1-rc3 DiffMask token-span, scanner-free 0.9664 0.9664 0.9591 30.0
temsa/IrishCore-DiffMask-135M-v1-rc2 DiffMask token-span, scanner-free 0.9664 0.9664 0.9212 247.1
temsa/IrishCore-DiffMask-135M-v1-rc1 DiffMask token-span, scanner-free 0.9801 0.9934 0.9412 251.2

Irish Core PII: Other Public Checkpoints

Repo Stack Full Core F1 Q8 Core F1 Q8 Multilingual PPSN F1 Notes
temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc1 Hybrid classifier prototype 0.9487 Predates the public q8 artifact.

Finance-boundary q8 F1 is 1.0000 for OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc6, OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc7, OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc8, and all public IrishCore-DiffMask releases from rc1 to rc6. OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc5 ships 0.8750 on that public q8 suite.

PPSN-Only: Comparable Public Artifacts

Repo Artifact Irish Large F1 Multilingual PPSN F1 User Raw F1 QA v8 F1 CPU ex/s
temsa/OpenMed-mLiteClinical-IrishPPSN-135M-v1 fp32 canonical checkpoint 0.8979 0.9704 0.8000 0.7385 57.4
temsa/OpenMed-mLiteClinical-IrishPPSN-135M-v1-fp16 fp16 CPU/GPU artifact 0.9704 0.8000 0.7385 45.8
temsa/OpenMed-mLiteClinical-IrishPPSN-135M-v1-q8 dynamic int8 CPU artifact 0.9040 132.1

PPSN-Only: Historical Public Checkpoints

Repo Main Published Metrics Notes
temsa/OpenMed-PPSN-mLiteClinical-v1 same as canonical fp32 repo: multilingual 0.9704, user raw 0.8000 Legacy alias; prefer temsa/OpenMed-mLiteClinical-IrishPPSN-135M-v1.
temsa/OpenMed-PPSN-v6-raw-rc2 irish_reg_v5 0.8750; user_raw 0.8000; qa_v8 0.7385 Raw PPSN-only research checkpoint; no packaged multilingual CPU benchmark row.
temsa/OpenMed-PPSN-v5_1 irish_large_v2 raw 0.9285; qa_v6 hybrid strict 1.0000 Hybrid PPSN-only checkpoint; predates the canonical multilingual suite packaging.
temsa/OpenMed-PPSN-v5 irish_reg_v5 raw 0.8235; irish_reg_v5 hybrid strict 1.0000 Hybrid PPSN-only checkpoint; predates the canonical multilingual suite packaging.
temsa/OpenMed-PPSN-v4 synthetic non-PPSN drift check only Predates the current PPSN eval suite; no packaged apples-to-apples multilingual CPU row.

If you need the strongest current raw-only Irish core model, start with IrishCore-GlobalPointer-135M-v1-rc4. If you need the fastest CPU-first raw-only line, compare it against IrishCore-DiffMask-135M-v1-rc6. If you need a PPSN-only artifact, compare the canonical fp32, fp16, and q8 variants of OpenMed-mLiteClinical-IrishPPSN-135M-v1 directly in the table above.

Downloads last month
662
Safetensors
Model size
0.1B params
Tensor type
F32
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for temsa/OpenMed-mLiteClinical-IrishPPSN-135M-v1

Datasets used to train temsa/OpenMed-mLiteClinical-IrishPPSN-135M-v1

Evaluation results

  • Irish large F1 on irish_ppsn_eval_large_v2
    self-reported
    0.898
  • Multilingual suite F1 on multilingual_ppsn_v1_all
    self-reported
    0.970