https://gitlab.com/Azuro721/trueperfect-ai (Link to Main Repo)
TruePerfect Sampler Settings
Turn any LLM into a perfected version of itself using only one universal, one-for-all set of sampler parameters.
This is the result of 6+ months of work; through trial and error, I managed to achieve perfect settings for any LLM.
Requirements:
Any Q5_K_M/Q6_K model; lower quantizations will not work correctly, and there's no way to fix the resulting issues, only to patch them temporarily by adjusting specific parameters.
A CPU backend is strongly recommended for the most consistent results!
Smart Context should be disabled.
ContextShift should be disabled.
Flash Attention should be disabled.
General and simplified info about specific sampler parameters:
Temperature:
Helps increase creativity and the use of "smarter" words; has a strong dependence on Top-K (i.e. will produce very random and nonsensical results with the wrong Top-K).
Top-K:
Helps expand choices, diversity (variety), level of detail, progression (per turn/output), and the number of simultaneous actions per turn; has a strong dependence on Temperature. Will produce very "rapidly-switching", incoherent, and nonsensical output with the wrong Temperature.
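To make the Temperature/Top-K coupling concrete, here is a minimal NumPy sketch of a generic Top-K + Temperature sampling step (my own illustration, not code from the repo; real backends apply these filters in a configurable sampler order):

```python
import numpy as np

def sample_top_k_temperature(logits, top_k=134, temperature=2.4, rng=None):
    """Generic Top-K + Temperature step (order varies per backend)."""
    rng = rng or np.random.default_rng()
    # Top-K: keep only the top_k highest-scoring tokens.
    keep = np.argsort(logits)[-top_k:]
    filtered = np.full_like(logits, -np.inf)
    filtered[keep] = logits[keep]
    # Temperature: scale logits before softmax. A higher temperature
    # flattens the distribution across the survivors, which is why a
    # high Temperature needs a matched, larger Top-K to stay coherent.
    scaled = (filtered - filtered[keep].max()) / temperature
    probs = np.exp(scaled)
    probs /= probs.sum()
    return rng.choice(len(logits), p=probs)
```

The key interaction: Top-K fixes how many candidates survive, while Temperature decides how evenly probability is spread across them, so raising one without the other changes the output character drastically.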
Repetition Penalty:
Helps with repetitions on "lower-end" models and controls creativity (less with lower values); very strict and needs extremely precise adjustment to achieve stable results.
Lower values (~1.07 and less) will produce more and more detailed output, with attention to smaller details, simultaneous events, and correct anatomical details (like paws = paw-pads, claws, etc.; not hands); tends to pick other details correctly as well, like species, body type, etc.
Higher values (~1.105 and more) will produce shorter outputs with a more creative way of phrasing and more "surprising" events, choices, and shifts; combine with a higher Top-K to achieve the maximum level of detail, output length (per turn), and attention, and to slow down the overall progression.
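For intuition, below is a rough sketch of the classic (CTRL-style) repetition penalty rule, including the Repetition Range covered further down; this is a generic illustration, not KoboldCpp's exact implementation:

```python
import numpy as np

def apply_repetition_penalty(logits, generated_tokens, penalty=1.12082,
                             rep_range=64):
    """Penalize tokens that appeared in the last `rep_range` positions."""
    out = logits.copy()
    for tok in set(generated_tokens[-rep_range:]):
        # Shrink positive logits, push negative ones further down.
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out
```

Because the penalty multiplies/divides raw logits, even the small gap between 1.02612 and 1.12082 shifts which borderline tokens survive, which matches how sensitive the guide says this parameter is.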
Top-P:
Helps achieve very coherent and stable outputs; controls the way of phrasing, the logic of actions and choices, attention to smaller details, consistency, and the number of simultaneous actions per turn (higher values will try to switch things, in most cases without letting them finish); needs specific, accurate values to achieve perfect coherence, smooth transitions, accurate descriptions, smooth flow, and expected (less random) choices.
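A minimal sketch of the nucleus (Top-P) cutoff itself, as a generic illustration:

```python
import numpy as np

def top_p_filter(probs, top_p=0.915):
    """Keep the smallest set of tokens whose cumulative probability
    reaches top_p; zero out the rest and renormalize."""
    order = np.argsort(probs)[::-1]              # most likely first
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, top_p) + 1
    mask = np.zeros_like(probs, dtype=bool)
    mask[order[:cutoff]] = True
    filtered = np.where(mask, probs, 0.0)
    return filtered / filtered.sum()
```

Since the cutoff is on cumulative mass, small moves like 0.905 vs 0.915 directly change how many lower-probability "detail" tokens stay in play.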
Top-A:
Helps correct deviations related to Top-K. The right value will produce very stable, coherent, and "rich" outputs, with correct choices, consistency, actions, and an overall sense of logic; needs very precise adjustment.
The wrong value will produce random, "rapidly-switching", and often nonsensical outputs, with a complete mix-up of details and sense of logic.
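Top-A's mechanism (the formula used in KoboldAI-family backends, sketched here as an illustration) explains this behavior: the cutoff scales with the square of the top token's probability, so it prunes hard only when the model is already confident and barely touches flat, uncertain distributions:

```python
import numpy as np

def top_a_filter(probs, top_a=0.07):
    """Drop tokens below top_a * (max probability)^2, then renormalize."""
    threshold = top_a * probs.max() ** 2
    filtered = np.where(probs >= threshold, probs, 0.0)
    return filtered / filtered.sum()
```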
TFS:
Helps achieve a perfect sense of logic with correct details and overall response structure; will affect the way of phrasing, logic, consistency, and overall sense. Needs extremely precise adjustment to achieve perfect results.
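TFS is the least intuitive of these, so here is a rough sketch of the published Tail Free Sampling algorithm (my paraphrase, not the repo's code): it looks at the second derivative of the sorted probability curve to find where the flat "tail" of low-information tokens begins, and the z value (0.8413 / 0.9551 in this guide) controls how much of that curvature mass is kept:

```python
import numpy as np

def tfs_filter(probs, z=0.8413):
    """Tail Free Sampling sketch: keep tokens up to the point where the
    normalized |second derivative| of the sorted distribution has
    accumulated z of its mass; drop the flat tail."""
    order = np.argsort(probs)[::-1]
    sorted_probs = probs[order]
    d2 = np.abs(np.diff(sorted_probs, n=2))   # discrete 2nd derivative
    if d2.sum() == 0:                         # flat curve: keep everything
        return probs
    cum = np.cumsum(d2 / d2.sum())
    keep = np.searchsorted(cum, z) + 2        # +2: diff() shrank the length
    mask = np.zeros_like(probs, dtype=bool)
    mask[order[:max(keep, 1)]] = True
    filtered = np.where(mask, probs, 0.0)
    return filtered / filtered.sum()
```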
Repetition Range:
Affects the overall repetition check; higher values (~70+) might help with the amount of generated text per turn, attention to further details, and slowing down the progression, but will increase the chance of repetitive output.
Seed:
Helps achieve much better consistency, with more attention and subtler, more expected changes.
A "softer" way of Repetition Penalty; very unstable and inconsistent.Values less than 1.0 only makes things incoherent and inconsistent.
Values higher than ~1.11 won't make any changes with strict settings, only slow down the generation.
Min-P:
Helps increase coherence and logic, but only in specific cases. I personally don't recommend using it with strict settings, as it will cut off diversity and choices, making the text more deterministic and less "exciting".
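For reference, the Min-P rule itself is simple (generic sketch; the 0.05 default here is illustrative, not a recommendation from this guide):

```python
import numpy as np

def min_p_filter(probs, min_p=0.05):
    """Min-P: drop tokens below min_p times the top token's probability."""
    filtered = np.where(probs >= min_p * probs.max(), probs, 0.0)
    return filtered / filtered.sum()
```

Everything under the scaled threshold is removed outright, which is exactly the cutoff of diversity warned about above.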
Typical:
More "intelligent" cutoff method; might output more "surprising" and varied outputs and "tails".Disable and don't use with strict settings, as it will cutoff more (unlikely) speciic details with lower values, which might be important to overall choices and logic of actions.
Presence Penalty:
Avoid in any case. Negatively affects choices with respect to context and instructions; whether used with strict settings or not, it will cause issues even at very low values (0.01 or lower).
Smoothing Factor:
Avoid in any case. Increases output stability with very randomized settings, but will cause incorrect choices later on, and will cause issues with strict settings, like nonsensical choices, actions, events, etc. (even at low values like 0.002).
Mirostat:
Helps steady the temperature (might replace it) by dynamically adjusting the effective temperature based on more "surprising" tokens. Tau adjusts the diversity: higher is more diverse and creative; lower is more deterministic.
Eta controls the frequency of temperature updates: smaller is stable and slow; higher is faster but less coherent.
I never managed to get results stable enough over a very long stretch of progression, so I personally avoid it and use strict settings instead.
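The tau/eta loop described above corresponds to Mirostat v2; here is a sketch of one step of the published algorithm (generic defaults shown, not settings from this guide):

```python
import numpy as np

def mirostat_v2_step(probs, mu, tau=5.0, eta=0.1, rng=None):
    """One Mirostat v2 step: sample only from tokens whose surprise
    (-log2 p) is below the running target mu, then nudge mu toward tau."""
    rng = rng or np.random.default_rng()
    surprise = -np.log2(probs + 1e-12)
    allowed = surprise <= mu
    if not allowed.any():                 # always keep the best token
        allowed[np.argmax(probs)] = True
    filtered = np.where(allowed, probs, 0.0)
    filtered /= filtered.sum()
    token = rng.choice(len(probs), p=filtered)
    mu -= eta * (surprise[token] - tau)   # error feedback toward tau
    return token, mu

# mu is typically initialized to 2 * tau before generation begins.
```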
Smoothing Curve (new):
Dynamically adjusts the penalty, temperature, and probability to avoid sudden changes; use values of 1 and lower. Has a stronger effect with a higher Repetition Penalty.
Avoid with strict settings.
Values below 0.96 are not recommended.
Adaptive-P (new):
Can't work with high Temperature; avoid in any case.
DRY:
Avoid in any case. Helps steady any repetitions, but I never managed to get stable enough outputs.
XTC:
Threshold: cuts off the high-probability (most likely) tokens, which mostly helps with a lower Temperature.
Probability: the chance that the Threshold cutoff actually gets applied.
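Mechanically, XTC works roughly like this (a sketch of the algorithm as proposed for llama.cpp-family backends; treat it as an illustration): with chance `probability`, every token at or above `threshold` except the least likely of them is removed, deliberately steering generation away from the most predictable continuation:

```python
import numpy as np

def xtc_filter(probs, threshold=0.1, probability=0.5, rng=None):
    """XTC sketch: with chance `probability`, remove all tokens at or
    above `threshold` except the least likely one among them."""
    rng = rng or np.random.default_rng()
    above = np.where(probs >= threshold)[0]
    # Needs at least two qualifying tokens, and the roll must succeed.
    if len(above) < 2 or rng.random() >= probability:
        return probs
    survivor = above[np.argmin(probs[above])]   # least likely of the top set
    filtered = probs.copy()
    filtered[above] = 0.0
    filtered[survivor] = probs[survivor]
    return filtered / filtered.sum()
```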
Ready-to-use AIO general settings:
V1 **-CREATIVE-BALANCED-**
Works very well, with very good creativity, level of detail (might skip some due to the higher Top-A compared to V2), emotional connections, "surprising" outcomes, and descriptions.
Optional Adjustments
If Repetition Penalty 1.12082 outputs overly descriptive results, try improving the descriptions in your character cards or altering the instructions; that will fix most such issues. Otherwise (if nothing helps), use 1.121.
In some cases, it tends to confuse things like character names, species (mostly from trained data), and pronouns, and to misattribute the User's actions (mostly due to low-probability token picks). In such cases, lower the Repetition Penalty to 1.02612; this will noticeably reduce the frequency of such occurrences and alter the outputs in a more focused and less creative way, with attention to finer details and more subtle transitions.
In some specific models (most likely), TFS 0.9551 might improve things even further, with more attention, creativity, and better performance overall; do not use it if you notice overly long descriptions (extremely long after each section).
V2 **-INSANE-DETAIL/ATTENTION-**
OVERKILL.
RECOMMENDATIONS: use only if you want an insane level of detail and attention; V1 is preferred for general role play and good reactions, emotional connections, creativity, and shorter outputs with well-preserved details. Use V2 only for very complex cases, or if you want an extreme level of detail with very high attention and slow transitions during certain events, as well as attention to very fine descriptions.
Maintains an insane amount of detail, attention, accuracy, and length: focused outputs, "surprising" outcomes and descriptions (noticeably less in some models compared to Top-A 0.07, but still generally good).
V2 will be blander and less descriptive than V1 in favor of maximum stability and of fixing general incoherence with character names (mostly from trained data) and improper pronouns (complete removal of the very-low-probability token picks that produce rare misspellings and related issues).
ASSISTANT MODE
TFS: 0.9551, Repetition Penalty: 1.02612
Insane for maximum accuracy on ASSISTANT-related tasks (personal assistant); will be less creative in favor of attention.
If any issues occur (too detailed, with incoherence):
Decrease TFS to 0.8413 at the cost of lower accuracy and level of detail, but with noticeably better stability.
Optional Adjustments (might degrade stability and accuracy)
These might provide very good results in specific models, but are generally unstable.
Repetition Penalty 1.12082 / 1.121
Will improve creativity with a slight reduction in accuracy.
Temperature 4.8 with Top-K 134
Will make outputs more lively, creative, and "surprising", but might also increase the chance of instability, such as overly heavy description with sudden incoherence and overall degradation as the token count grows.
Temperature 4.8 with Top-K 278 (284 / 296)
Will make outputs more lively, creative, "surprising", descriptive, and attentive, but tends to have a higher chance of instability compared to Top-K 134, with even faster degradation.
TFS 0.9551 (Special)
Will improve attention to detail and overall performance in many respects. Special because it tends to have a better chance of working correctly across different models.
Lower the Repetition Penalty to get even more insane attention and level of detail, at the cost of a bit of creativity and "surprising" outcomes (especially with 1.02612).
Repetition Penalty 1.02612 is preferred as the lowest point; it will output very attentive and detailed descriptions, along with the other effects described earlier.
Use Top-K 134 for faster transitions and attention to more "surprising" moments (works (mostly) only with Top-A 0.07).
To preserve versatility, I describe complete and specific sampler values below under Additional fine-tuning:, aiming for perfection in any case.
There I describe additional values for special cases, as well as dependencies between different sampler parameters (like Temperature+Top-K).
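As a practical reference, here is a minimal sketch of sending these settings to a local KoboldCpp instance through its KoboldAI-compatible HTTP API; the URL, prompt, and max_length are placeholders, the field names should be verified against your backend's documentation, and the sampler values are the base ones from the sections below:

```python
import requests

payload = {
    "prompt": "### Instruction:\nDescribe the scene.\n### Response:\n",
    "max_length": 512,        # placeholder output budget
    "temperature": 4.8,       # must be paired with a matching Top-K
    "top_k": 278,             # acceptable: 134 / 206 / 278
    "top_p": 0.915,           # base value
    "top_a": 0.07,            # V1 base; 0.0001 for V2
    "tfs": 0.8413,            # base; 0.9551 for assistant mode
    "rep_pen": 1.12082,       # V1 base; 1.02612 for V2/assistant
    "rep_pen_range": 64,      # base Repetition Range
    "rep_pen_slope": 1.12,    # base Repetition Slope
    "sampler_seed": 253991,   # main fixed seed
}

resp = requests.post("http://localhost:5001/api/v1/generate", json=payload)
print(resp.json()["results"][0]["text"])
```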
Additional fine-tuning:
Temperature+Top-K (only higher values):
Temperature is tied to Top-K, and to achieve the perfect pairing of these specific parameters, both need to be adjusted together.
For example, Temperature 2.4 needs at least Top-K 134, increased in steps of 72 from there.
Acceptable values of Top-K for Temperature 2.4/4.8: 134, 206.
Different values will cause inconsistency, instability and other issues.
Acceptable values of Temperature for Top-K: 2.4, 4.8.
(1.2) is not prioritized, mostly because in most cases it will output unsatisfactory (e.g. bland and boring) results.
A lower Temperature will output less "exciting" and creative results (less emotion and variety, more predictability per output), and might trigger repetitions, which can mostly be fixed by raising Top-K.
A higher Top-K will expand attention to smaller details, preserve attention to multiple simultaneous events, and can also fix smaller text-related issues (with quotation marks, asterisks, hyphens, etc.).
Top-K 278, as described earlier, might cause overly descriptive results, which will most likely lead to incoherence.
Top-K 206 is the more attentive one, which fits assistant tasks better, as it takes away some creativity; but it tends to be more repetitive and might lead to incoherence.
Top-K 134 is the balanced middle option, with better creativity, a good level of detail, and fine transitions. Recommended for in-character actions and strong roleplay scenarios. NOT suitable for Top-A 0.0001.
Further experimentation with Top-K might not be possible, mostly due to the logical limit of all settings combined.
Repetition Penalty:
Base value: 1.12082, which will output more creative, emotional, varied, smart, and "exciting" results, but tends to have issues with asterisks and quotation marks; similar to 1.02612, but with more creativity, fewer descriptions, and a faster pace, though prone to issues if the input has logical inconsistencies or lots of typos.
1.105 (not fine-tuned): a specific value I found during experimentation. Will output less "exciting" results, but fares noticeably better than 1.05.
1.05 (not fine-tuned, not recommended): the base value widely used across various LLMs. Might output focused results with average creativity (better than 1.02612), but prone to being less stable than 1.105.
1.02612: a very specific one; will output very descriptive, attentive, and expanded results. Will try to pay attention to noticeably more things than the other variants. Will preserve character details and much more as events go by. Great as an assistant. Great for very complex instructions, very complex character cards, and complex scenes. Great attention to multiple characters.
1.15 (not fine-tuned): tends to be more creative with shorter descriptions; might be better with Top-A 0.0001 and might be incoherent (might perform well on specific models).
1.23 (not fine-tuned): tends to be even more creative with slightly shorter descriptions; might be better with Top-A 0.0001 and has a lower chance of staying coherent, but might perform well in rare cases (with specific models).
Other values (might output unstable results):
Feel free to experiment with these variants, and share any good results (if stable enough to be used for at least ~6K tokens).
1.02665: similar to 1.02695, but with slightly altered creativity, interesting development of events, and more realistic responses from other characters; no issues so far; closest to 1.02612; hasn't been tested thoroughly.
1.02695: creative and more stable, with interesting development of events, good "surprising" moments, and realistic responses from other characters; no issues in initial tests; hasn't been tested thoroughly.
1.0276/1.0277/1.0278: 1.0276 might produce repetitive results; 1.0277 is similar to 1.0285/1.0286, but with better stability and steady creativity; 1.0278 is more descriptive, but might be unstable with character details and fixed character types (like Pokemon).
1.0283 (decent): will be more direct and provide more realistic, violent scenes (if necessary), especially in uncensored models; good creativity, pays good attention to basic details, good progression of events, realistic responses from characters based on events, but noticeably shorter output, and might be very "chatty".
1.0285/1.0286: will provide very interesting and creative responses, but will mess up some character details (mostly the ones already in the LLM's training data). Some runs will output inconsistencies in specific character parts, like body type, skin type, etc., and will also miss certain details and skip some important parts; be sure to include those and select the best one.
Might output quite stable results if used with Top-A 0.0001.
Top-P:
Base value: 0.915, which will output very attentive, consistent, and stable results. Recommended for all cases.
0.905: pays more attention to specific details, with slightly less emotion, and is very close to being repetitive.
0.95/0.97: very creative and unpredictable, but generally less attentive; might perform well on higher-quality models (12B+).
Top-A:
Base value: 0.07, which will output very consistent, generally stable results, with smooth transitions and relatively good attention to most details. Recommended for all cases.
0.0001: will output an insane amount of detail, attention, accuracy, and the other things described in V2.
Other values (experimental):
0.043725: more attention to anatomy, but unstable and tends to be unpredictable. Works better with Repetition Penalty 1.02612 or **V2**, but degrades.
0.2025: more creative, descriptive, "exciting", and emotional, but tends to skip some details. Less accurate and has lots of issues with high Temperature; not suitable for general use, only for getting initial creative inputs.
Repetition Range:
Base value: 64, which is the lowest value that will output overall better, more consistent, and more descriptive results. Recommended for all cases.
128: will output more descriptive results, but tends to be repetitive. Not recommended, but can be used to adjust initial responses.
Seed:
Use a fixed seed to improve consistency a lot. Feel free to use these fixed values:
253991 **main one**
372205 - second one
309090 - third one
680079
637001
608575
132458
Repetition Slope:
Base value: 1.12, which will help max out most things, like consistency, level of detail, etc. Recommended for all cases. Higher values won't affect outputs at all and might only cause slowdowns.
All values below 1 are unstable and will cause very random issues.
TFS:
Base value: 0.8413, which will output very smart, attentive, and smooth outputs. Recommended for all cases.
0.9551: will output extremely descriptive and attentive results; the best one for assistant use; might cause issues in some LLMs, as described earlier.
(Additional) recommendations for assistant mode+other tips:
Disable "Inject ChatNames" to get the best results for assistant-related tasks.Usage of "Separate End Tags" might improve the cases with repetitions, make responses smarter and overall better for some models.