
Can You Tune Your Way Out of Average?

Temperature and top-p adjust token sampling variance, not style. LLM parameters can't move you out of the AI mean. Here's what actually does.

AI Writing · Style Profile · Writing Science

A quantum physicist emailed us with a precise question. She had read our breakdown of how Writing Style Profiles work, and wanted to know: couldn't you achieve the same result by tuning temperature, top-p, and frequency penalty? She was asking whether the knobs already available in the API were doing the same thing we were doing, just less explicitly.

It is a good question. It deserves a real answer.


What the Parameters Actually Do

Temperature, top-p, and frequency penalty all operate at the same layer: token sampling. After the model has computed a probability distribution over its entire vocabulary, these parameters reshape that distribution before a token is drawn.

Temperature scales the logits. High temperature flattens the distribution. More tokens become plausible, outputs feel looser, more surprising. Low temperature sharpens it. The model doubles down on its most probable next token, outputs feel tighter, more predictable.

Top-p truncates the distribution. Only the smallest set of tokens whose combined probability reaches p stays eligible; everything below the cutoff is dropped. It is a different way of narrowing the field.

Frequency penalty discounts tokens that have already appeared. It pushes the model away from repetition.

All three are variance controls. None of them move the center of the distribution. They adjust the spread around whatever the model already thinks is the most likely output. They do not change what "most likely" means.
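
To make that layer concrete, here is a minimal sketch of the sampling step for a toy four-token vocabulary. It is not any vendor's actual sampler, and the logit values are invented, but it shows all three knobs reshaping the same distribution while the most probable token never changes.

```python
# A toy sketch of the sampling layer: temperature, top-p, and frequency
# penalty all reshape one next-token distribution after the model has
# already produced its logits. Values are illustrative, not from any API.
import numpy as np

def adjust_distribution(logits, temperature=1.0, top_p=1.0,
                        frequency_penalty=0.0, counts=None):
    logits = np.asarray(logits, dtype=float)

    # Frequency penalty: discount tokens by how often they already appeared.
    if counts is not None:
        logits = logits - frequency_penalty * np.asarray(counts, dtype=float)

    # Temperature: divide logits before softmax. T > 1 flattens, T < 1 sharpens.
    probs = np.exp(logits / temperature)
    probs /= probs.sum()

    # Top-p: keep the smallest set of tokens whose cumulative probability
    # reaches top_p, drop the rest, renormalize.
    order = np.argsort(probs)[::-1]
    keep = np.searchsorted(np.cumsum(probs[order]), top_p) + 1
    mask = np.zeros_like(probs)
    mask[order[:keep]] = 1.0
    probs = probs * mask
    return probs / probs.sum()

logits = [2.0, 1.5, 0.3, -1.0]   # four-token toy vocabulary
for t in (0.5, 1.0, 2.0):
    p = adjust_distribution(logits, temperature=t, top_p=0.95)
    print(f"T={t}: {p.round(3)}  most likely token: {p.argmax()}")
# The spread changes with temperature; the most likely token never does.
```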


The AI Baseline

We analyzed 320 samples across five major models (Claude, GPT, Gemini), scoring each on six style dimensions. The AI baseline that emerges:

Dimension             AI Baseline (avg)
Sentence Complexity   65
Vocabulary Range      48
Expressiveness        76
Formality             58
Consistency           53
Conciseness           42

These numbers describe a specific kind of writing. Reasonably complex sentences. Narrower vocabulary than you might expect. High expressiveness. Medium formality. Mediocre conciseness.

Now compare two famous author profiles on the same axes:

Dimension             Kafka   Hemingway   AI Baseline
Formality             80      66          58
Sentence Complexity   77      42          65
Expressiveness        27      25          76
Consistency           60      58          53
Conciseness           32      59          42

The AI baseline sits in a narrow middle band. Across five models, variation between models is roughly 12 points on any axis. Kafka and Hemingway each sit roughly 50 points from the AI baseline on Expressiveness alone. And they differ from each other by nearly 30 points on Conciseness.

Human writers span more than 60 points across these dimensions. AI models cluster within 12.
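
If you want to check those gaps yourself, here is the arithmetic on the numbers from the two tables above. The scores are copied straight from the tables; nothing else is assumed.

```python
# Per-dimension distance from the AI baseline, using the scores above.
baseline = {"Formality": 58, "Sentence Complexity": 65, "Expressiveness": 76,
            "Consistency": 53, "Conciseness": 42}
authors = {
    "Kafka":     {"Formality": 80, "Sentence Complexity": 77, "Expressiveness": 27,
                  "Consistency": 60, "Conciseness": 32},
    "Hemingway": {"Formality": 66, "Sentence Complexity": 42, "Expressiveness": 25,
                  "Consistency": 58, "Conciseness": 59},
}

for name, scores in authors.items():
    gaps = {dim: abs(scores[dim] - baseline[dim]) for dim in baseline}
    print(name, gaps)
# Kafka is 49 points from the baseline on Expressiveness, Hemingway is 51,
# and the two authors sit 27 points apart on Conciseness.
```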


Why the Cluster Exists

This is not a technical failure. It is the intended outcome of how these models were trained.

RLHF (reinforcement learning from human feedback) systematically rewards certain writing qualities: clarity, helpfulness, completeness, engagement. Those rewards pull every model toward the same attractor basin. High expressiveness (76) is rewarded because engaged, expansive responses get positive feedback. Low conciseness (42) is rewarded because thorough responses tend to score better than terse ones.

These are not random fluctuations you can tune out of. They are the structural outcome of the training signal. The AI baseline is not a starting point. It is a gravitational center.

When you lower temperature, the model's outputs become more consistent. More reliably centered on the most probable next token. That means more reliably within the attractor basin, not further from it. Lower temperature makes the AI more average, not less.

When you raise frequency penalty, the model avoids repeating tokens it has already used. That sounds like it might expand vocabulary. But it pushes away from any specialized vocabulary the model has learned to associate with a particular domain. The vocabulary a physicist uses. The vocabulary a legal analyst uses. The pared-down vocabulary Hemingway built by stripping out adjectives. Frequency penalty does not know those vocabularies are meaningful. It treats all repetition equally.
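
A small illustration of that blindness, with made-up counts and an arbitrary penalty value. The discount is proportional to raw repetition, so the domain terms a physicist leans on take larger hits than most filler words.

```python
# Frequency penalty subtracts (penalty x count) from every token's logit.
# Counts and the penalty value here are invented for illustration.
counts = {"the": 14, "entanglement": 6, "decoherence": 5, "basically": 3}
penalty = 0.8

for token, count in counts.items():
    print(f"{token!r}: logit shifted by {-penalty * count:+.1f}")
# After 'the', the biggest discounts land on 'entanglement' and 'decoherence',
# exactly the words that make the passage sound like a physicist wrote it.
```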


The Physics of It

Our physicist's intuition was correct in one direction: the parameters are doing something real. But they are adjusting variance, not mean.

Imagine a model's output as a sample from a multidimensional distribution. One axis for Formality, one for Conciseness, one for Expressiveness, and so on. The center of that distribution is fixed by training. The parameters adjust how tightly you sample around that center.

A Writing Style Profile does something structurally different. It shifts the center. It redefines what "most likely" means for each axis, independently, based on actual examples of how a specific human writes. The distribution moves. The mean changes.

You cannot replicate that by adjusting variance around the original mean. You would need infinite temperature to reach Hemingway's Conciseness score from an AI baseline. And at infinite temperature, you have random output, not Hemingway.
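
Here is a toy, one-dimensional version of that argument on the Conciseness axis. The means come from the tables above; the Gaussian shape and the spread are assumptions made purely for illustration, not a claim about how models represent style.

```python
# Variance vs. mean on one style axis (Conciseness), as a toy Gaussian model.
import numpy as np

rng = np.random.default_rng(0)
ai_mean = 42            # AI baseline Conciseness from the table above
hemingway_mean = 59     # Hemingway's Conciseness from the table above
spread = 6              # assumed spread, purely illustrative

for t in (0.5, 1.0, 2.0):
    # Temperature-like knobs scale the spread around the trained mean;
    # they never move the mean itself.
    draws = rng.normal(ai_mean, spread * t, 10_000)
    at_least_hemingway = (draws >= hemingway_mean).mean()
    print(f"T={t}: mean={draws.mean():.1f}, P(score >= 59)={at_least_hemingway:.1%}")

# A style profile is a different operation: it moves the mean itself.
profile = rng.normal(hemingway_mean, spread, 10_000)
print(f"profile: mean={profile.mean():.1f}")
```

Cranking the spread does let the occasional draw land near Hemingway's score, but only by making every draw noisier. The center never moves, and the center is what determines what a typical output sounds like.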


What This Means in Practice

If you are using an AI writing tool and feeling like the output is somehow always fine — clear, complete, a bit bloated, not quite how you would say it — you are experiencing the attractor basin. The tool is working as designed. It is producing text that is probably better than average in several ways. It is just not your text.

The parameters available to you (temperature, top-p, presence penalty, frequency penalty) can make outputs more creative or less, more verbose or less, more repetitive or less. They cannot make the AI write the way you write, or the way Kafka wrote, or the way any specific person with a specific history and set of constraints wrote.

That requires a different kind of intervention. One that operates on the training-equivalent level, not the sampling level. One that shows the model what your mean looks like across each dimension, and holds it there.

Tuning the knobs is useful. It is just not the same thing as teaching the model to sound like you.


This post was inspired by a question from a reader asking whether API parameters could achieve what a Writing Style Profile achieves. The short answer was no. This is the longer one.
