
DAgger — agent fine-tuning through in-game correction

UGA Update

June 22, 2026 • hackwhiz ✓

DAgger (Dataset Aggregation) is a fine-tuning method integrated into the Universal Game AI platform. It addresses a common limitation of demonstration-based training: the model learns to reproduce correct behavior but receives no examples of recovery after a mistake.

In standard behavioral cloning, the user records demonstrations of successful play. After training, the model reproduces that behavior until it enters a state not represented in the training set. There it has no examples to guide it, so each wrong action pushes it further from familiar states and errors accumulate. This limitation is general to systems trained only on successful demonstrations, not specific to a particular model.

DAgger addresses this by collecting corrections at the moments when the model deviates from expected behavior.

How it works

The DAgger process is built on interaction between the user and an already trained agent during autonomous play:

The user launches the trained model, which acts on its own.
While the agent behaves as expected, the user observes without intervening.
When the agent makes an error or is about to take an incorrect action, the user starts providing input. The system automatically transfers control to the user, who performs the required actions and corrects the agent's behavior.
After a period without user input, control returns to the model and the agent resumes autonomous play.

The cycle repeats throughout the session. The user does not re-record a full level run; corrections are applied only at problematic segments.
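The handoff described above can be sketched as a small arbiter that decides, frame by frame, who controls the game. This is only an illustration under stated assumptions: the class name `ControlArbiter`, the `idle_timeout` parameter, and its one-second default are hypothetical and are not taken from the platform.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ControlArbiter:
    """Hypothetical helper: picks the controller ("user" or "model") per frame."""

    idle_timeout: float = 1.0  # assumed: seconds of user silence before handing back
    last_user_input: Optional[float] = None

    def owner(self, now: float, user_input_active: bool) -> str:
        # Any user input immediately transfers control to the user.
        if user_input_active:
            self.last_user_input = now
            return "user"
        # The user keeps control until the idle timeout has elapsed.
        if (
            self.last_user_input is not None
            and now - self.last_user_input < self.idle_timeout
        ):
            return "user"
        # Otherwise the agent plays autonomously.
        return "model"
```

A game loop would call `owner(now, user_input_active)` once per frame and route either the user's or the model's action to the game accordingly.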

Training data formation

During a DAgger session, the system does not save the entire playthrough. Segments where the model behaves correctly are not added to the training set, since that information is already present in the initial demonstrations.

Only frames where the user intervened are recorded, together with a short preceding segment that captures the game state immediately before the correction. Each such fragment contains a “problematic state — correct action to exit it” pair, matching the examples missing from the initial dataset.
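One way to capture "intervention frames plus a short preceding segment" is a fixed-size buffer of recent autonomous frames that is flushed into a new segment when the user takes over. The sketch below assumes a hypothetical `CorrectionRecorder` class and a `pre_roll` length of 30 frames; the actual length of the preceding segment is not specified here.

```python
from collections import deque
from typing import Any, Deque, List


class CorrectionRecorder:
    """Hypothetical recorder: keeps only correction segments with a short pre-roll."""

    def __init__(self, pre_roll: int = 30) -> None:
        # Ring buffer of the most recent autonomous frames (assumed length).
        self._recent: Deque[Any] = deque(maxlen=pre_roll)
        self._recording = False
        self.segments: List[List[Any]] = []

    def on_frame(self, frame: Any, user_in_control: bool) -> None:
        if user_in_control:
            if not self._recording:
                # Takeover: start a segment with the buffered context frames.
                self.segments.append(list(self._recent))
                self._recent.clear()
                self._recording = True
            self.segments[-1].append(frame)
        else:
            # Autonomous play is only buffered, never saved directly.
            self._recording = False
            self._recent.append(frame)
```

Frames from long stretches of correct autonomous play fall out of the buffer and are discarded, so only the context right before each takeover survives.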

Input from the model and the user is separated explicitly. During autonomous play the model generates its own control commands. After control is taken over, the system records only user actions, without mixing in prior agent actions. This keeps the collected corrections clean: the training targets are the user's actions, never the agent's mistaken ones.
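A minimal sketch of this separation, assuming each recorded frame carries a tag naming who produced its action. The `Frame` type, its fields, and `training_pairs` are hypothetical names for illustration. Here the preceding agent-controlled frames stay in the segment as context but never become training targets; whether the platform also feeds them to the model as input history is not stated.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Frame:
    observation: str       # placeholder for the game state or screen frame
    action: Optional[str]  # action executed on this frame, if any
    source: str            # "model" or "user"


def training_pairs(segment: List[Frame]) -> List[Tuple[str, str]]:
    """Return (observation, action) pairs labeled by user actions only."""
    return [
        (f.observation, f.action)
        for f in segment
        if f.source == "user" and f.action is not None
    ]
```

With this filter, a segment of one agent frame followed by two user frames yields exactly two training pairs, both labeled with what the user did.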

Advantages over additional demonstration recording

An alternative approach is to manually re-record demonstrations in areas where the model performs poorly. That method has two systemic drawbacks. First, the user has to guess the problem areas in advance, with no guarantee that they match the agent’s actual failure points. Second, during manual recording the user typically does not enter the states the agent reaches on its own during autonomous play.

DAgger inverts this logic: problematic states are discovered by the model during autonomous play, and the user provides an example of correct exit from each such state. Training examples are formed at actual failure points, not in assumed scenarios.

Model fine-tuning

After a session, collected corrections can be used to continue training the current model without restarting from scratch. New data is added to the initial demonstrations with increased weight, reflecting its higher value for closing specific gaps in agent behavior.
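"Increased weight" can be realized in several ways, for example by scaling the loss per sample or by oversampling corrections. The sketch below shows the oversampling variant using only the standard library. The function names and the `correction_weight` value of 3.0 are assumptions made for illustration, not the platform's settings.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[str, str]  # (observation, action); placeholder types


def aggregate(
    demos: Sequence[Sample],
    corrections: Sequence[Sample],
    correction_weight: float = 3.0,  # assumed value, not from the platform
) -> Tuple[List[Sample], List[float]]:
    """Merge both datasets and assign a sampling weight to every sample."""
    samples = list(demos) + list(corrections)
    weights = [1.0] * len(demos) + [correction_weight] * len(corrections)
    return samples, weights


def correction_share(weights: Sequence[float], n_corrections: int) -> float:
    """Expected fraction of a batch that comes from corrections."""
    total = sum(weights)
    if n_corrections == 0 or total == 0:
        return 0.0
    return sum(weights[-n_corrections:]) / total


def sample_batch(
    samples: Sequence[Sample],
    weights: Sequence[float],
    batch_size: int,
    rng: random.Random,
) -> List[Sample]:
    """Draw a training batch with replacement, favoring corrections."""
    return rng.choices(samples, weights=weights, k=batch_size)
```

For instance, with nine demonstration samples and one correction at weight 3.0, the correction makes up 10% of the data but is expected to fill 25% of each batch.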

The process is iterative: each cycle of autonomous play, correction, and fine-tuning improves passage through problematic segments. The primary indicator that the method is working is that the user needs to intervene less often from one session to the next.
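Intervention frequency can be tracked with two simple per-session numbers: the share of frames played under user control and the number of separate takeovers. The sketch assumes a session log with one "model" or "user" entry per frame; that log format and the function names are hypothetical.

```python
from typing import Sequence


def intervention_rate(owners: Sequence[str]) -> float:
    """Fraction of frames in a session that were controlled by the user."""
    if not owners:
        return 0.0
    return sum(owner == "user" for owner in owners) / len(owners)


def count_takeovers(owners: Sequence[str]) -> int:
    """Number of times control switched from the model to the user."""
    takeovers = 0
    previous = "model"
    for owner in owners:
        if owner == "user" and previous != "user":
            takeovers += 1
        previous = owner
    return takeovers
```

Comparing these values across sessions on the same level gives a rough picture of whether each fine-tuning round reduced the need for corrections.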

Key capabilities

autonomous agent play with targeted user intervention on errors;
recording of correction segments only, without saving correct autonomous play;
separation of model and user actions in training data;
example collection at actual agent failure points;
continued training from the current checkpoint with increased weight on corrections.

Demonstration-based training shapes the model’s notion of correct behavior. DAgger complements this with examples of recovery after error — a scenario that is difficult to cover through manual recording of successful runs alone.