AlphaEvolve-inspired feature engineering

Published 2025-05-20

AlphaEvolve, described as “a coding agent for scientific and algorithmic discovery” by DeepMind, is an evolutionary search process that uses LLMs to generate mutations. It broadly follows these three steps:

  1. Population generation
  2. Fitness evaluation
  3. Survivor selection

The population is seeded for generation 0, for example with hand-written baselines. For subsequent generations, an LLM is prompted to ‘mutate’ a provided target (one of the baselines for generation 1, or an already-mutated candidate thereafter) and to return the mutation. A given target, or ‘parent’, may be mutated multiple times to produce multiple children. The fitness of each child is then evaluated with some evaluation function, and the top k survivors become the parents of the next generation.
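Stripped of detail, the loop can be sketched in a few lines of Python. The three callables below (seed_population, llm_mutate, evaluate_fitness) are placeholders for whatever seeding, mutation, and scoring logic a given run uses, and drawing survivors from the whole history rather than only the latest children is one possible design choice (it is the one my own run, described below, takes):

    def evolve(seed_population, llm_mutate, evaluate_fitness,
               n_generations=20, children_per_parent=4, k_survivors=7):
        # Generation 0: score the hand-written baseline candidates
        scored = [(evaluate_fitness(c), c) for c in seed_population()]

        for _ in range(n_generations):
            # Survivor selection: keep the k fittest candidates seen so far as parents
            parents = [c for _, c in sorted(scored, key=lambda s: s[0], reverse=True)[:k_survivors]]

            # Population generation: the LLM mutates each parent into several children
            children = [llm_mutate(p) for p in parents for _ in range(children_per_parent)]

            # Fitness evaluation: score the children and add them to the pool
            scored += [(evaluate_fitness(c), c) for c in children]

        return max(scored, key=lambda s: s[0])[1]  # best candidate found across the whole run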

When I encountered this paper, I became interested in applying this method to feature engineering, particularly because an LLM-driven search has a practically unbounded search space, whereas AutoFE systems are limited to predefined primitives. In addition, an LLM can in principle reason about why some features worked better than others, although I have yet to implement this.

My goal with this process was to produce a single beneficial feature. The ‘target’ was therefore something like:

    import numpy as np, pandas as pd

    def build_feature(df):
        # Placeholder body; the LLM rewrites this to produce candidate features
        feat = pd.Series(np.zeros(len(df)), index=df.index, name="evolved_feature")
        return feat

Each LLM mutator received something like the above. The generation process looked something like this:

  1. Population generation
    1. Seeding the population — for generation 0, I simply returned an existing, hand-written feature outright
    2. Subsequent generations — each parent was mutated into 4 children, each produced by prompting the LLM with context about the data and mutation instructions (a sketch of that prompt follows this list)
  2. Fitness evaluation
    • I trained a small CatBoost model on the preprocessed training data plus the mutated feature; fitness was the model’s resulting validation score (also sketched below)
  3. Survivor selection
    • The top 7 candidates from the entire run so far, not just the top 7 children of the current generation, were selected as parents of the following generation
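For step 1.2, the mutation prompt is roughly along these lines; the wording below is a reconstruction rather than a verbatim prompt, and call_llm is a stand-in for whichever LLM client is actually used:

    def mutate(parent_source: str, dataset_description: str) -> str:
        # Hypothetical prompt; the wording and the call_llm helper are illustrative
        prompt = (
            "You are evolving a single feature for a tabular ML model.\n\n"
            f"Dataset context:\n{dataset_description}\n\n"
            "Current parent feature (a Python function taking a DataFrame and "
            f"returning a Series):\n{parent_source}\n\n"
            "Rewrite build_feature with one meaningful change that could improve "
            "predictive power. Return only the complete Python source of the new "
            "build_feature function."
        )
        return call_llm(prompt)  # returns the child's source code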
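The fitness evaluation in step 2 works roughly as follows; the CatBoost hyperparameters, the plain train/validation split, and AUC as the score are illustrative assumptions rather than the exact setup:

    from catboost import CatBoostClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    def fitness(build_feature, X, y):
        # Append the candidate feature to the preprocessed training data
        X = X.copy()
        X["evolved_feature"] = build_feature(X)

        X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2, random_state=0)

        # A small, fast model keeps each evaluation cheap across many candidates
        model = CatBoostClassifier(iterations=200, depth=6, verbose=False)
        model.fit(X_tr, y_tr)

        # Higher is fitter; assumes a binary classification task scored with AUC
        return roc_auc_score(y_va, model.predict_proba(X_va)[:, 1])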

Running this process for 20 generations consistently yielded features that improved on what I came up with on my own. One of the strengths of this approach is that LLMs have some level of domain knowledge, so they are able to generate intelligent, domain-appropriate features when given some context about the data.