Abliteration Text Issues

#2
by LittleNicky55 - opened

More of an FYI than anything: the model works well but sometimes confuses 0 and O, probably an artifact of the ARA method. I've tried to rebuild some of the tensors manually but have been unable to isolate the issue myself.

I'll look into it, thanks for letting me know.

The 0/O thing is actually part of a broader pattern where the model confuses visually similar characters: 0 & O, 1 & I, 5 & S. It shows up constantly in anything with numbers, ports, IPs, identifiers.

Some examples:

  • '8080' comes out as '8O8O' in the thinking blocks
  • 'ed25519' turned into 'ed2S5I9'
  • Port '10022' got truncated to '1022' (lost a digit entirely)
  • Shows up in both thinking and output, though the model sometimes self-corrects in output
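For anyone who wants to count these in their own transcripts, here is a minimal sketch of a checker. It assumes you know the string the model was supposed to produce; `CONFUSABLE_PAIRS` and `lookalike_swaps` are hypothetical names made up for illustration, not part of any library, and the pair list only covers the three pairs mentioned above.

```python
# Hypothetical helper (not from any library): flags look-alike substitutions
# between an expected string and what the model actually produced.
CONFUSABLE_PAIRS = [("0", "O"), ("1", "I"), ("5", "S")]

# Build a symmetric lookup so either direction of a swap is detected.
_LOOKALIKE = {}
for a, b in CONFUSABLE_PAIRS:
    _LOOKALIKE[a] = b
    _LOOKALIKE[b] = a


def lookalike_swaps(expected, actual):
    """Return (index, expected_char, actual_char) for each look-alike swap.

    Returns None when the lengths differ (e.g. a dropped digit), because a
    position-by-position comparison is not meaningful in that case.
    """
    if len(expected) != len(actual):
        return None
    swaps = []
    for i, (e, a) in enumerate(zip(expected, actual)):
        if e != a and _LOOKALIKE.get(e) == a:
            swaps.append((i, e, a))
    return swaps


print(lookalike_swaps("8080", "8O8O"))
print(lookalike_swaps("ed25519", "ed2S5I9"))
print(lookalike_swaps("10022", "1022"))
```

The dropped-digit case is reported as `None` on purpose, so it can be tallied separately from the character swaps.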

I did a bunch of A/B testing to isolate the cause. The base MiniMax M2.7 shows zero confusion at temp=1.0, while your BF16 upload and my FP8 requant both show the same confusion. So it's not a quantization artifact; it's coming from the ARA process itself.

MiniMax uses a byte-level tokenizer where every digit is its own token (0 is token 48, O is token 79). The abliteration seems to have pushed their internal representations close enough together that sampling picks the wrong one some percentage of the time. Lowering temperature to 0.6 helps but doesn't fix it.
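To illustrate why a lower temperature helps without fixing it, here is a minimal sketch of a two-token softmax. The logit gap of 2.0 is a made-up number for illustration, not something measured from the model, and `wrong_token_prob` is a hypothetical helper name.

```python
import math

# ASCII byte values for the two characters, matching the IDs quoted above.
assert ord("0") == 48 and ord("O") == 79


def wrong_token_prob(logit_gap, temperature):
    """Probability of sampling the lower-logit token in a two-token softmax.

    logit_gap is (correct logit - wrong logit); temperature divides the logits
    before the softmax, so a lower temperature widens the effective gap.
    """
    return 1.0 / (1.0 + math.exp(logit_gap / temperature))


GAP = 2.0  # hypothetical gap between the '0' and 'O' logits, for illustration only
for t in (1.0, 0.6):
    print(f"temp={t}: P(wrong token) = {wrong_token_prob(GAP, t):.3f}")
```

Under this toy assumption the wrong-token probability shrinks as temperature drops but never reaches zero, which matches what I'm seeing: fewer swaps at 0.6, not none.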

I tried restoring all 256 expert w2 weights from the base model, but that didn't help. The o_proj modifications contribute too, so the problem isn't isolated to one set of tensors; the damage is spread across the tensors that ARA touched in layers 30-51.
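In case it helps anyone repeat this kind of bisection, here is a rough sketch of selectively restoring tensors by name. It uses plain dicts as stand-ins for state dicts, and the tensor names in the demo are hypothetical; the real checkpoint's key layout may differ, so the patterns would need adjusting.

```python
import re


def restore_from_base(modified, base, pattern, layers):
    """Return a copy of `modified` with matching tensors taken from `base`.

    A tensor is restored when its name contains 'layers.<n>.' with n in
    `layers`, matches `pattern`, and also exists in `base`.
    """
    layer_re = re.compile(r"layers\.(\d+)\.")
    name_re = re.compile(pattern)
    restored = dict(modified)
    changed = []
    for name in modified:
        m = layer_re.search(name)
        if m and int(m.group(1)) in layers and name_re.search(name) and name in base:
            restored[name] = base[name]
            changed.append(name)
    return restored, changed


# Toy stand-ins: strings instead of tensors, hypothetical key names.
base = {
    "model.layers.30.experts.0.w2.weight": "base_w2",
    "model.layers.30.self_attn.o_proj.weight": "base_o",
    "model.layers.5.experts.0.w2.weight": "base_w2_early",
}
modified = {name: "ara_" + value for name, value in base.items()}

restored, changed = restore_from_base(modified, base, r"\.w2\.", range(30, 52))
print(changed)
```

Swapping the pattern (for example to one matching `o_proj`) lets you restore one tensor family at a time and re-run the same prompts after each step.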

Hi Youssofal,
Thanks for your work and dedication. Are there any plans to redo the abliteration? Many people are eager to try a new version of it.

Thanks
