OpenMed-mLiteClinical-IrishPPSN-135M-v1
OpenMed-mLiteClinical-IrishPPSN-135M-v1 is a compact DistilBERT token-classification model derived from OpenMed/OpenMed-PII-mLiteClinical-Base-135M-v1.
It keeps the base model's existing PII label set and adds one new entity:
- B-PPSN
- I-PPSN
The model is intended for Irish Personal Public Service Number (PPSN) detection and masking in English and Irish Gaelic text, especially in healthcare, public-service, and citizen-support flows.
Deployment Note
- CPU-first deployment, low batch / short text: temsa/OpenMed-mLiteClinical-IrishPPSN-135M-v1-fp16 is a good option
  - Why: on the current word_aligned single-example path it preserved PPSN quality and improved end-to-end CPU throughput on the multilingual suite from about 31.27 to 45.80 examples/s while halving model size
  - Important caveat: in a batched CPU forward matrix, fp16 became slower than fp32 once sequence length and batch size grew; see the fp16 repo for the runtime matrix
- If your CPU deployment batches aggressively, keep this fp32 repo, or use the q8 repo if you need maximum throughput and can accept lower PPSN quality
For CPU guidance in these model cards:
- short text: about <= 32 tokenizer tokens
  - 33-63 tokens: gray zone, benchmark your workload
  - >= 64 tokens: not short for this recommendation
- low batch: batch_size = 1
  - batch_size = 2-4: gray zone, benchmark your workload
  - batch_size >= 8: not low batch for this recommendation
Practical CPU rule:
- Prefer fp16 for batch_size = 1 and about <= 32 tokens
- Prefer this fp32 repo once batch size or sequence length grows materially
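The thresholds above can be sketched as a small selection helper. This is only an illustration of the rule as stated in this card: the function name `recommend_artifact` and its return labels are hypothetical, and the card does not classify batch sizes 5-7, so the sketch assumes they fall in the gray zone.

```python
def recommend_artifact(batch_size: int, num_tokens: int) -> str:
    """Return 'fp16', 'fp32', or 'benchmark' for a CPU workload.

    Hypothetical helper that encodes the thresholds from this card.
    """
    if batch_size < 1 or num_tokens < 1:
        raise ValueError("batch_size and num_tokens must be positive")

    # ">= 64 tokens" is not short text; "batch_size >= 8" is not low batch.
    if num_tokens >= 64 or batch_size >= 8:
        return "fp32"

    # Short text (about <= 32 tokens) with batch_size = 1 favors fp16.
    if num_tokens <= 32 and batch_size == 1:
        return "fp16"

    # Everything in between is a gray zone (assumption: this includes
    # batch sizes 5-7, which the card does not classify explicitly).
    return "benchmark"


print(recommend_artifact(batch_size=1, num_tokens=20))   # fp16
print(recommend_artifact(batch_size=1, num_tokens=48))   # benchmark
print(recommend_artifact(batch_size=16, num_tokens=20))  # fp32
```

A "benchmark" result means the card gives no recommendation for that shape, so measure both artifacts on your own traffic.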
What This Model Is
- A full transformers checkpoint, not a LoRA adapter
- A derivative of the OpenMed mLiteClinical 135M PII model
- Tuned specifically to add PPSN coverage while preserving the base model's other PII behavior
- Best used with the bundled word_aligned PPSN decoder
What This Model Is Not
- Not a general Irish-PII upgrade across all Irish-specific identifiers
- Not optimized for plain pipeline(..., aggregation_strategy="simple") PPSN extraction
- Not a replacement for downstream policy logic or deterministic validation when your application requires checksum validation
Recommended Inference
Use the bundled inference_word_aligned.py script:
python3 inference_word_aligned.py \
--ppsn-min-score 0.4 \
--text "My PPSN is 1234567TW and I need help with my housing grant." \
--json
Why this path:
- PPSNs are short and often tokenized into fragments
- the custom word_aligned decoder reconstructs cleaner full-span PPSN outputs
- the default token-classification aggregation path can produce partial spans
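To illustrate why word alignment helps with fragmented tokens, here is a minimal sketch of one plausible approach: if any sub-token inside a whitespace-delimited word is tagged PPSN with enough confidence, the whole word becomes the span. This is not the logic of the bundled inference_word_aligned.py, which is not reproduced here. The function name, the `(start, end, label, score)` input shape, and the use of the maximum fragment score are all assumptions, and the 0.4 default simply mirrors the `--ppsn-min-score` value shown above.

```python
import re


def word_aligned_spans(text, token_preds, min_score=0.4):
    """Promote sub-token PPSN hits to whole-word character spans.

    token_preds: list of (start, end, label, score) tuples with character
    offsets, one per sub-token (hypothetical input format).
    """
    spans = []
    for match in re.finditer(r"\S+", text):
        w_start, w_end = match.span()
        # Scores of PPSN-tagged fragments that overlap this word.
        scores = [
            score
            for (start, end, label, score) in token_preds
            if label == "PPSN" and start < w_end and end > w_start
        ]
        if not scores or max(scores) < min_score:
            continue
        # Drop trailing punctuation so "1234567TW." yields "1234567TW".
        core = match.group().rstrip(".,;:!?)")
        if core:
            spans.append((w_start, w_start + len(core), max(scores)))
    return spans


text = "My PPSN is 1234567TW and I need help."
start = text.index("1234567TW")
# Toy predictions: the middle fragment was missed by the tagger.
preds = [
    (start, start + 4, "PPSN", 0.9),
    (start + 4, start + 7, "O", 0.2),
    (start + 7, start + 9, "PPSN", 0.6),
]
for s, e, score in word_aligned_spans(text, preds):
    print(text[s:e], score)  # 1234567TW 0.9
```

A fragment-level aggregation would return two partial spans for the toy input, while the word-level pass returns the full identifier.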
Intended Use
- De-identification / masking of PPSNs in English or Irish Gaelic text
- Triage and QA for citizen-to-government or healthcare text flows
- A lightweight base for further Irish identifier fine-tuning if you need more than PPSN
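For the masking use case, a minimal sketch of turning detected character spans into redacted text is shown below. The helper name `mask_spans`, the `[PPSN]` placeholder, and the `(start, end)` span format are illustrative assumptions, not part of the bundled tooling; the sketch also assumes spans do not overlap.

```python
def mask_spans(text, spans, placeholder="[PPSN]"):
    """Replace each (start, end) character span with a placeholder.

    Assumes non-overlapping spans (hypothetical helper).
    """
    masked = text
    # Work right to left so earlier offsets stay valid after each edit.
    for start, end in sorted(spans, reverse=True):
        masked = masked[:start] + placeholder + masked[end:]
    return masked


text = "My PPSN is 1234567TW and I need help with my housing grant."
start = text.index("1234567TW")
print(mask_spans(text, [(start, start + len("1234567TW"))]))
# My PPSN is [PPSN] and I need help with my housing grant.
```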
Limitations
- This release is strongest on PPSN, not on other Irish-specific identifiers such as Eircode, Irish phone numbers, or Irish IBAN/BIC handling
- Lowercase / weak-context PPSN cases improved versus the base model, but this is still not a checksum-aware system
- You should benchmark on your own traffic before production rollout
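Because the model is not checksum-aware, applications that need deterministic validation can run a check on each detected span. The sketch below follows the commonly described PPSN modulus-23 scheme: seven digits weighted 8 down to 2, an optional second letter weighted 9 with W counting as 0, and a remainder mapped to a check letter where 0 is W and 1-22 are A-V. Treat the details, especially the handling of the second letter, as assumptions to verify against official guidance; the function name is hypothetical and this is not part of the released model.

```python
import re

# Seven digits, a check letter, and an optional second letter.
_PPSN_RE = re.compile(r"^(\d{7})([A-W])([A-W]?)$")


def ppsn_checksum_ok(candidate: str) -> bool:
    """Return True if candidate passes the assumed mod-23 PPSN check."""
    match = _PPSN_RE.match(candidate.strip().upper())
    if not match:
        return False
    digits, check_letter, suffix = match.groups()

    # Weights 8, 7, 6, 5, 4, 3, 2 for the seven digits.
    total = sum(int(d) * w for d, w in zip(digits, range(8, 1, -1)))

    # Assumption: second letter contributes position * 9, with W as 0.
    if suffix and suffix != "W":
        total += (ord(suffix) - ord("A") + 1) * 9

    remainder = total % 23
    expected = "W" if remainder == 0 else chr(ord("A") + remainder - 1)
    return check_letter == expected


print(ppsn_checksum_ok("1234567TW"))  # True
print(ppsn_checksum_ok("1234567FA"))  # True
print(ppsn_checksum_ok("1234567AW"))  # False
```

Running a check like this after detection lets a pipeline separate model recall from format validity, for example by masking every detected span but flagging only checksum-valid ones for downstream matching.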
Key Results
- User raw PPSN regression F1: 0.8000
- QA PPSN regression v6 validated F1: 0.6667
- QA PPSN regression v8 F1: 0.7385
- Irish PPSN regression F1: 0.8000
- Irish large PPSN F1: 0.8979
- Multilingual gov + citizen + HSE PPSN suite F1: 0.9704
- Non-PPSN agreement vs base mLiteClinical: 1.0000
Files
- Core model:
  - model.safetensors
  - config.json
  - tokenizer.json
  - tokenizer_config.json
  - special_tokens_map.json
  - label_meta.json
  - vocab.txt
- Inference / QA helpers:
  - inference_word_aligned.py
  - qa_config.json
  - pyproject.toml
- Evaluation reports:
  - eval/
Base Model
- Upstream: OpenMed/OpenMed-PII-mLiteClinical-Base-135M-v1
- Upstream license: Apache-2.0
License
This derivative release is distributed under Apache-2.0, consistent with the upstream base model license. See NOTICE for attribution.
Portfolio Comparison
Updated: 2026-03-16.
Use this section for the fastest public comparison across the temsa PII masking portfolio.
- The first core table only includes public checkpoints that ship both comparable q8 accuracy and q8 CPU throughput.
- The first PPSN table only includes public artifacts that ship comparable PPSN accuracy and CPU throughput.
- Missing cells in the archive tables mean the older release did not ship that metric in its public bundle.
- DiffMask rows use the reconciled clean_single_pass harness that matches the deployed runtime.
- GlobalPointer rows use the public raw-only span-matrix release bundle and its packaged q8 ONNX artifact.
- The same content is shipped as PORTFOLIO_COMPARISON.md inside each public model repo.
Irish Core PII: Comparable Public Checkpoints
| Repo | Stack | Full Core F1 | Q8 Core F1 | Q8 Multilingual PPSN F1 | Q8 Core ex/s |
|---|---|---|---|---|---|
| temsa/IrishCore-GlobalPointer-ContextPII-4L-122M-v1-rc4 | 4-layer GlobalPointer distilled fast student | 1.0000 | 1.0000 | 0.9333 | 299.0 |
| temsa/IrishCore-GlobalPointer-ContextPII-4L-122M-v1-rc3 | 4-layer GlobalPointer distilled fast student | 1.0000 | 1.0000 | 0.9333 | 317.9 |
| temsa/IrishCore-GlobalPointer-ContextPII-4L-122M-v1-rc2 | 4-layer GlobalPointer distilled fast student | 1.0000 | 1.0000 | 0.9333 | 292.5 |
| temsa/IrishCore-GlobalPointer-ContextPII-4L-122M-v1-rc1 | 4-layer GlobalPointer distilled fast student | 1.0000 | 1.0000 | 0.9333 | 337.3 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc27 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 270.0 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc25 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 212.1 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc24 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 278.9 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc23 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 237.6 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc22 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 106.8 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc21 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 150.8 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc20 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 181.9 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc19 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 73.1 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc18 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 126.2 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc17 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 125.5 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc16 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 125.5 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc15 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 125.5 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc14 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 119.2 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc13 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 126.1 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc12 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 73.6 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc11 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 94.1 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc10 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 125.8 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc9 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 119.8 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc8 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 128.9 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc7 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 89.0 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc6 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 89.0 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc5 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 84.5 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc4 | GlobalPointer raw-only + context labels | 0.9935 | 0.9935 | 0.9333 | 61.5 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc3 | GlobalPointer raw-only + context labels | 0.9935 | 0.9935 | 0.9333 | 61.5 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc2 | GlobalPointer raw-only + context labels | 0.9935 | 0.9935 | 0.9222 | 61.5 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc1 | GlobalPointer raw-only + context labels | 0.9935 | 0.9935 | 0.9222 | 61.5 |
| temsa/IrishCore-GlobalPointer-135M-v1-rc4 | GlobalPointer raw-only span-matrix | 1.0000 | 1.0000 | 0.9333 | 221.6 |
| temsa/IrishCore-GlobalPointer-135M-v1-rc3 | GlobalPointer raw-only span-matrix | 1.0000 | 1.0000 | 0.9213 | 204.9 |
| temsa/IrishCore-GlobalPointer-135M-v1-rc2 | GlobalPointer raw-only span-matrix | 0.9934 | 0.9934 | 0.9326 | 231.2 |
| temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc8 | Raw-only token-span | 0.9737 | 0.9737 | 0.9176 | 46.1 |
| temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc7 | Hybrid classifier + generated scanner spec | 1.0000 | 0.9934 | 1.0000 | 30.0 |
| temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc6 | Hybrid classifier + repair decoders | 1.0000 | 0.9934 | 1.0000 | 29.5 |
| temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc5 | Hybrid classifier + repair decoders | 0.9737 | 0.9669 | 0.9333 | 34.4 |
| temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc4 | Hybrid classifier + repair decoders | 0.9870 | 0.9740 | 0.9600 | 114.2 |
| temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc3 | Hybrid classifier + repair decoders | 0.9806 | 0.9677 | 0.9333 | 44.9 |
| temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc2 | Hybrid classifier + repair decoders | 0.9554 | 0.9615 | 0.7887 | 119.1 |
| temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v1 | Hybrid classifier baseline | 0.9530 | 0.9333 | 0.9882 | 103.3 |
| temsa/IrishCore-DiffMask-135M-v1-rc6 | DiffMask token-span, scanner-free | 0.9801 | 0.9733 | 0.9274 | 130.3 |
| temsa/IrishCore-DiffMask-135M-v1-rc5 | DiffMask token-span, scanner-free | 0.9733 | 0.9733 | 0.9379 | 249.2 |
| temsa/IrishCore-DiffMask-135M-v1-rc4 | DiffMask token-span, scanner-free | 0.9733 | 0.9733 | 0.9371 | 29.5 |
| temsa/IrishCore-DiffMask-135M-v1-rc3 | DiffMask token-span, scanner-free | 0.9664 | 0.9664 | 0.9591 | 30.0 |
| temsa/IrishCore-DiffMask-135M-v1-rc2 | DiffMask token-span, scanner-free | 0.9664 | 0.9664 | 0.9212 | 247.1 |
| temsa/IrishCore-DiffMask-135M-v1-rc1 | DiffMask token-span, scanner-free | 0.9801 | 0.9934 | 0.9412 | 251.2 |
Irish Core PII: Other Public Checkpoints
| Repo | Stack | Full Core F1 | Q8 Core F1 | Q8 Multilingual PPSN F1 | Notes |
|---|---|---|---|---|---|
| temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc1 | Hybrid classifier prototype | 0.9487 | — | — | Predates the public q8 artifact. |
Finance-boundary q8 F1 is 1.0000 for OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc6, OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc7, OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc8, and all public IrishCore-DiffMask releases from rc1 to rc6. OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc5 ships 0.8750 on that public q8 suite.
PPSN-Only: Comparable Public Artifacts
| Repo | Artifact | Irish Large F1 | Multilingual PPSN F1 | User Raw F1 | QA v8 F1 | CPU ex/s |
|---|---|---|---|---|---|---|
| temsa/OpenMed-mLiteClinical-IrishPPSN-135M-v1 | fp32 canonical checkpoint | 0.8979 | 0.9704 | 0.8000 | 0.7385 | 57.4 |
| temsa/OpenMed-mLiteClinical-IrishPPSN-135M-v1-fp16 | fp16 CPU/GPU artifact | — | 0.9704 | 0.8000 | 0.7385 | 45.8 |
| temsa/OpenMed-mLiteClinical-IrishPPSN-135M-v1-q8 | dynamic int8 CPU artifact | — | 0.9040 | — | — | 132.1 |
PPSN-Only: Historical Public Checkpoints
| Repo | Main Published Metrics | Notes |
|---|---|---|
| temsa/OpenMed-PPSN-mLiteClinical-v1 | same as canonical fp32 repo: multilingual 0.9704, user raw 0.8000 | Legacy alias; prefer temsa/OpenMed-mLiteClinical-IrishPPSN-135M-v1. |
| temsa/OpenMed-PPSN-v6-raw-rc2 | irish_reg_v5 0.8750; user_raw 0.8000; qa_v8 0.7385 | Raw PPSN-only research checkpoint; no packaged multilingual CPU benchmark row. |
| temsa/OpenMed-PPSN-v5_1 | irish_large_v2 raw 0.9285; qa_v6 hybrid strict 1.0000 | Hybrid PPSN-only checkpoint; predates the canonical multilingual suite packaging. |
| temsa/OpenMed-PPSN-v5 | irish_reg_v5 raw 0.8235; irish_reg_v5 hybrid strict 1.0000 | Hybrid PPSN-only checkpoint; predates the canonical multilingual suite packaging. |
| temsa/OpenMed-PPSN-v4 | synthetic non-PPSN drift check only | Predates the current PPSN eval suite; no packaged apples-to-apples multilingual CPU row. |
If you need the strongest current raw-only Irish core model, start with IrishCore-GlobalPointer-135M-v1-rc4. If you need the fastest CPU-first raw-only line, compare it against IrishCore-DiffMask-135M-v1-rc6. If you need a PPSN-only artifact, compare the canonical fp32, fp16, and q8 variants of OpenMed-mLiteClinical-IrishPPSN-135M-v1 directly in the table above.
Evaluation results
- Irish large F1 on irish_ppsn_eval_large_v2 (self-reported): 0.898
- Multilingual suite F1 on multilingual_ppsn_v1_all (self-reported): 0.970