Treating SEO as a coverage problem in vector space

June 3, 2026

Most automated blogs fail by repeating themselves. The solution is to treat the topic space as something to cover, not a well to keep drawing from.

I have a soft spot for automation that runs unattended for months on end, the kind you set up once and then leave alone. So at some point I wanted to see how far that stretched if I pointed it at a blog: don't write the posts, build the thing that writes them, and let it run.

That is a content farm. I am aware. But the thing that makes content farms bad isn't that a machine wrote the words, because a machine writing a passable post is a solved problem now. What makes them bad is that, over time, they repeat themselves.

This is not a guide to building one. It is a write-up of how mine works, and the one idea in it I would actually steal.

Why blog generators usually suck

If you let a model generate "a blog post about X" over and over, it drifts toward the center of whatever it knows. Ask for ten posts on personal finance and you get budgeting, then budgeting again, then an emergency fund, then budgeting with a slightly different title. The posts pile up on top of each other, and in SEO terms they cannibalize: a handful of your own pages all compete for the same search, and none of them wins cleanly.

The obvious fix is to work harder at the prompt. A longer list of topics, a few example angles, a human deciding what to write next. That works right up until you remember the entire point was to not be the human deciding what to write next.

Topics are a space, and you want to cover it

Stop thinking of your topics as a list and start thinking of them as points in a space. Every possible post sits somewhere, and posts that mean similar things sit close together. That isn't a metaphor I'm reaching for. It is just what an embedding is: run a piece of text through an embedding model and you get back a vector, a point in anywhere from a few hundred to a few thousand dimensions depending on the model, placed so that similar text lands nearby.
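To make "nearby" concrete, here is a minimal sketch of one common way to measure it, cosine distance. The three-dimensional vectors are made-up stand-ins for real embeddings, and cosine distance is an assumption on my part as the metric; any embedding model and any sensible distance would illustrate the same thing.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity: near 0 for similar directions, near 1 for unrelated ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Toy stand-ins for embeddings (hypothetical values, not real model output).
budgeting_basics = [0.9, 0.1, 0.0]
budgeting_tips = [0.8, 0.2, 0.0]
roth_ira_guide = [0.1, 0.2, 0.9]

near = cosine_distance(budgeting_basics, budgeting_tips)   # two budgeting posts
far = cosine_distance(budgeting_basics, roth_ira_guide)    # budgeting vs. Roth IRA
```

With these toy numbers the two budgeting posts come out far closer to each other than either is to the Roth IRA post, which is the whole property the rest of the system leans on.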

Once your topics are points in a space, good coverage has a shape you can actually see. You want the points spread out, not clumped. A site that covers its niche well is one whose posts are scattered across the whole space; a content farm eating itself is one whose posts are crammed into a single corner. So the generator's job stops being "write a good post" and becomes something more mechanical, where it simply drops new points into the parts of the space that are still empty.

Each dot is one published idea; each ring is the keep-out radius that rejects near-duplicates. Left: ideas crammed into one region, rings overlapping, all fighting for the same queries. Right: ideas spread to tile the space.

Forcing the spread

The mechanism is almost stupidly simple, which is exactly why I trust it. Every time the system comes up with an idea, it embeds it and checks how close that point is to every idea it has already used. If it lands too close to something that exists, the idea is thrown out and the system tries again. Only ideas that land far enough into open space are allowed to live.

That's the whole trick: a minimum-distance rule on the embeddings. Picture each existing idea sitting inside a small keep-out zone. A new idea that lands inside one is a near-duplicate, so it dies. An idea that lands in open territory gets kept and grows a keep-out zone of its own. Over time the space tiles itself, and the system gets pushed, in a fairly literal sense, out into the topics it hasn't touched yet. No curated list, no prompt stuffed with "be sure to also cover." Just let geometry do the work.
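A minimal sketch of that rule, under a few assumptions: cosine distance as the metric, a threshold of 0.2 picked purely for illustration, and toy vectors in place of real embeddings. The names `try_accept` and `MIN_DISTANCE` are hypothetical, not taken from the actual workflow.

```python
import math

MIN_DISTANCE = 0.2  # hypothetical keep-out radius; the real value is tuned by feel

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))

def try_accept(candidate, accepted, min_distance=MIN_DISTANCE):
    """Keep the candidate only if it is outside every existing keep-out zone."""
    for existing in accepted:
        if cosine_distance(candidate, existing) < min_distance:
            return False  # near-duplicate: reject, caller should re-roll
    accepted.append(candidate)  # survivor gets a keep-out zone of its own
    return True

# Toy embeddings (hypothetical values) for three candidate ideas, in order.
candidates = [
    [0.9, 0.1, 0.0],  # budgeting basics
    [0.8, 0.2, 0.0],  # budgeting again, slightly different title
    [0.1, 0.2, 0.9],  # something about Roth IRAs
]

accepted = []
results = [try_accept(vec, accepted) for vec in candidates]
```

The second candidate lands inside the first one's keep-out zone and is rejected; the third is in open territory and is kept. The linear scan over `accepted` is fine for a blog-sized archive; a vector index would only matter at much larger scale.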

Where the ideas come from

So, where do the candidate ideas come from in the first place? Not from the model, at least not for the first move. I specifically did not want the model choosing topics, because choosing topics is the exact thing that drifts toward the center.

Instead there's a fixed set of dials: subject, angle, format, target keyword, intended reader. A plain random number generator picks a combination. Say it rolls personal finance, contrarian take, listicle, "roth ira," nervous first-timer. Only then does a model take that combination and turn it into a real, specific idea. The randomness proposes, the model makes it coherent, and the distance check decides whether it gets to exist. Three steps, and the model only touches the middle one.
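The first of those three steps can be sketched in a few lines. The dial names come from the description above; the option lists are hypothetical apart from the one example combination, and `build_idea_prompt` is an assumed helper showing roughly what gets handed to the model, not the actual prompt.

```python
import random

# Dial names are from the post; most option values are made up for illustration.
DIALS = {
    "subject": ["personal finance", "investing", "debt"],
    "angle": ["contrarian take", "beginner explainer", "case study"],
    "format": ["listicle", "how-to", "q&a"],
    "keyword": ["roth ira", "emergency fund", "credit score"],
    "reader": ["nervous first-timer", "busy parent", "recent graduate"],
}

def roll_dials(rng):
    """Pick one option per dial, uniformly at random. No model involved."""
    return {name: rng.choice(options) for name, options in DIALS.items()}

def build_idea_prompt(combo):
    """Format the rolled combination as input for the idea-writing model."""
    settings = "\n".join(f"{name}: {value}" for name, value in combo.items())
    return "Turn this combination into one specific blog post idea.\n" + settings

rng = random.Random(42)  # seeded only so the sketch is repeatable
combo = roll_dials(rng)
prompt = build_idea_prompt(combo)
```

Keeping the roll outside the model is the design choice that matters here: the random generator has no favorite topics, so it cannot drift toward the center the way a model asked to "pick a topic" does.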

Three writers, one editor

An idea that survives still needs someone to write it. I didn't want a single house voice, partly because one voice across an entire site reads as exactly what it is, and partly because real publications have more than one writer. So there are three authors, each with a name, a public bio, a specific voice, and a domain they're good at. The point of those attributes is that the three genuinely write differently, instead of being one model in three different hats.

Routing an idea to the right author is the one decision I hand to a model on purpose. An editor step reads the idea and the three bios and assigns the piece to whoever owns that territory. It's a simple classification, the kind models are great at, and it keeps each author writing the kind of thing its bio promised.
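As a rough sketch of that editor step: build a prompt from the idea and the bios, ask a model for a name, and refuse anything that isn't one of the known authors. The author names and bios below are invented, and `ask_model` is a hypothetical callable standing in for whatever model call the workflow actually makes; a fake one is used here so the example runs offline.

```python
# Hypothetical authors and bios; the real ones are not given in the post.
AUTHORS = {
    "dana": "Covers budgeting and saving basics in a calm, practical voice.",
    "eli": "Writes skeptical, numbers-heavy pieces on investing and retirement accounts.",
    "priya": "Explains debt, credit, and loans for people dealing with them right now.",
}

def build_routing_prompt(idea, authors):
    bios = "\n".join(f"- {name}: {bio}" for name, bio in authors.items())
    return (
        "Assign this post idea to the best-suited author.\n"
        f"Idea: {idea}\n"
        f"Authors:\n{bios}\n"
        "Reply with the author's name only."
    )

def route(idea, authors, ask_model):
    """Return the chosen author's key, rejecting replies that name no known author."""
    reply = ask_model(build_routing_prompt(idea, authors)).strip().lower()
    if reply not in authors:
        raise ValueError(f"model returned an unknown author: {reply!r}")
    return reply

def fake_model(prompt):
    # Stand-in for a real model call; always answers with the same name.
    return " Eli\n"

assigned = route("Why a Roth IRA is not always the right first account", AUTHORS, fake_model)
```

The validation at the end is the part worth keeping in any real version: a classification step is only safe to automate if an off-list answer stops the flow instead of silently publishing under nobody.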

Art and publishing

Blogs feel a bit lifeless without a hero image, so the generator makes those too. Every post gets a cover image generated through Recraft's API. I kept these abstract on purpose and leaned on Recraft's art styles instead of asking for a literal illustration of the topic. Abstract art reads as a deliberate choice; a literal AI illustration of "five budgeting tips" reads as machine-made filler. (This is a taste call, not a technical one, but it does more for whether a page feels cheap than any of the technical parts.)

After that it's plumbing. The finished post, its metadata, and the image get pushed to the CMS over its API, and the page is live. The whole thing, from a random roll of the dials to a published page, runs in n8n as a single flow, so "make a post" is one trigger.

The whole flow, from a random combination of dials to a published page. The one edge that loops back is the distance check, sending too-similar ideas to be re-rolled before anything gets written.
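In code form, the shape of that flow is a short loop with one back edge. This is only a sketch of the control flow, not the n8n workflow itself: every step is passed in as a function, the stubs below are toys, and the `max_attempts` cap is my assumption about how to keep the re-roll loop from spinning forever once the space fills up.

```python
def make_post(roll, ideate, is_novel, write, publish, max_attempts=10):
    """Roll, ideate, and re-roll until an idea passes the distance check, then publish."""
    for _ in range(max_attempts):
        idea = ideate(roll())
        if is_novel(idea):
            return publish(write(idea))
        # Too similar to something already used: loop back and roll again.
    return None  # gave up; nothing was written or published

# Toy stubs so the sketch runs; the real steps call models and APIs.
rolls = iter(["budgeting", "budgeting", "roth ira"])
already_used = {"idea about budgeting"}
published = []

def roll():
    return next(rolls)

def ideate(combo):
    return f"idea about {combo}"

def is_novel(idea):
    return idea not in already_used

def write(idea):
    return f"post: {idea}"

def publish(post):
    published.append(post)
    return post

result = make_post(roll, ideate, is_novel, write, publish)
```

Note that the check sits before `write`, so a rejected idea costs one cheap embedding comparison and never a full article.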

Where it falls short

The most obvious shortcoming is that a minimum-distance rule only keeps ideas apart, and far apart in embedding space is not the same as worth writing. Nothing here checks whether a topic has any search demand, so the dials will happily commission thorough coverage of a corner nobody is looking in. The threshold for "too close" is a number I tuned by feel, and where you put it is a trade: set it too low and near-duplicates slip through, set it too high and the only ideas that pass are the ones that have drifted into nonsense.

Summary

Most of the argument about automated content is stuck on whether a machine should be allowed to write the words. That one is settled and a little boring. The more interesting question is whether a machine can cover a subject the way a real publication does, broadly and without repeating itself, and that turns out to be a geometry problem more than a writing one. Treat your topics as a space, refuse to put two points in the same spot, and the thing spreads out on its own.