One of the main challenges we faced while developing the first version of Wyverno Aelyne was inference speed. Although the model generated high-quality DSP effect settings and accurately interpreted complex text prompts, processing time remained a significant limitation for practical use. On average, generating a result took about 92 seconds, and some requests took as long as 250 seconds.

For this reason, after analyzing the experience gained from the first release, we undertook a large-scale optimization of the entire system. The result is Wyverno Aelyne v1.5, a new version of the model that delivers approximately five times faster inference while maintaining the quality of the generated results.

Motivation for a New Architecture

The first version of Aelyne was built on top of the Qwen2.5 3B Instruct large language model, which was further fine-tuned on a proprietary dataset consisting of a large collection of DSP effect chains and their corresponding textual descriptions. This approach allowed the model to effectively interpret user prompts, understand the behavior of audio processing effects, and generate high-quality parameter configurations.

However, using a large language model also comes with significant drawbacks. A model with three billion parameters requires substantial computational resources during both training and inference. For the specific task of generating DSP effect parameters, the trade-off between speed and quality was worse than we wanted.

After conducting a series of experiments, we decided to completely abandon the use of a general-purpose LLM architecture and instead develop a specialized model tailored specifically for this task.

The New Architecture

Wyverno Aelyne v1.5 is built on top of a TRM (Tiny Recursive Model) architecture. Unlike its predecessor, the new model was designed specifically for analyzing audio signals and generating DSP effect parameters.

Despite its significantly smaller size, the model retains all the key capabilities of the previous generation. It can accurately interpret prompts, understand the behavior of digital audio effects, and take existing effect chains into account. As a result, the system is capable not only of generating entirely new processing chains from scratch, but also of modifying previously generated settings according to updated user requirements.
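To illustrate what "recursive" means in this context, the toy sketch below reuses two small layers over several refinement steps instead of stacking many distinct layers. It is not Aelyne's actual implementation: the layer sizes, random weights, step counts, input vector, and all function names are assumptions chosen purely for illustration.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """Random weight matrix (n_out rows, n_in columns); stands in for trained weights."""
    scale = 1.0 / math.sqrt(n_in)
    return [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]

def apply_layer(weights, vec):
    """One dense layer followed by a tanh activation."""
    return [math.tanh(sum(w * v for w, v in zip(row, vec))) for row in weights]

def recursive_refine(x, y, z, w_latent, w_answer, inner_steps=3, outer_steps=2):
    """Refine an answer y by applying the same small layers repeatedly."""
    for _ in range(outer_steps):
        for _ in range(inner_steps):
            # Update the latent state from the input, the current answer, and itself.
            z = apply_layer(w_latent, x + y + z)
        # Update the answer from the previous answer and the refined latent state.
        y = apply_layer(w_answer, y + z)
    return y, z

rng = random.Random(0)
dim = 4                                   # hypothetical vector width
w_latent = make_layer(3 * dim, dim, rng)  # consumes x, y, and z concatenated
w_answer = make_layer(2 * dim, dim, rng)  # consumes y and z concatenated

x = [0.5, -0.2, 0.1, 0.9]  # hypothetical encoded prompt
y0 = [0.0] * dim           # initial answer, e.g. normalized effect parameters
z0 = [0.0] * dim           # initial latent state

y, z = recursive_refine(x, y0, z0, w_latent, w_answer)
n_weights = sum(len(row) for row in w_latent) + sum(len(row) for row in w_answer)
print(f"{n_weights} weights applied over {2 * 3 + 2} layer evaluations")
```

The point of the sketch is that effective depth comes from repeated application rather than from additional parameters, which is one way a small model can stay compact.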

One of the most notable improvements is the model size. While the first version relied on approximately 3 billion parameters, the new architecture contains only 11 million parameters. This represents a reduction of more than 270× while maintaining a high level of performance and output quality.
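The reduction factor follows directly from the two parameter counts, as the short check below shows. The storage estimate assumes 16-bit weights (2 bytes per parameter), which is an illustrative assumption and not a stated property of either model.

```python
V1_PARAMS = 3_000_000_000   # approximate size of the first version, per the text
V15_PARAMS = 11_000_000     # size of v1.5, per the text
BYTES_PER_PARAM = 2         # assumption: 16-bit weights

reduction = V1_PARAMS / V15_PARAMS
v1_gb = V1_PARAMS * BYTES_PER_PARAM / 1e9
v15_mb = V15_PARAMS * BYTES_PER_PARAM / 1e6

print(f"Parameter reduction: {reduction:.1f}x")   # Parameter reduction: 272.7x
print(f"v1 weights:   ~{v1_gb:.1f} GB")           # v1 weights:   ~6.0 GB
print(f"v1.5 weights: ~{v15_mb:.1f} MB")          # v1.5 weights: ~22.0 MB
```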

Expanded Training Dataset

Special attention was also given to data preparation during development. Real-world usage of the first version revealed a number of scenarios that required greater diversity within the training dataset.

As a result, we assembled a new dataset that significantly exceeds the original one in scale. While the first version was trained on approximately 13,000 examples, Aelyne v1.5 was trained on more than 400,000 examples, roughly a 30-fold increase. This allows the model to generalize more effectively, perform more consistently across different types of audio content, and generate more accurate DSP parameter configurations.

Results

These changes produced a significant speedup. On average, generating a new effect chain now takes approximately 16 seconds, compared to 92 seconds in the previous version (about 5.8× faster). For workflows that modify existing settings, processing time fell from roughly 250 seconds to around 48 seconds (about 5.2× faster).
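The speedup for each workflow can be checked from the timings quoted above. The dictionary layout and names below are only illustrative; the numbers are the ones reported in this section.

```python
# (old seconds, new seconds) for each workflow, as reported in the text
timings = {
    "new effect chain": (92, 16),
    "modify existing settings": (250, 48),
}

speedups = {name: old / new for name, (old, new) in timings.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}x faster")
# new effect chain: 5.75x faster
# modify existing settings: 5.21x faster
```

Both workflows land slightly above the "approximately five times" figure quoted for the release as a whole.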

Beyond the speed improvements, the new version also demonstrates higher-quality parameter selection and more stable behavior on complex prompts. The combination of a specialized architecture, a substantially larger dataset, and an optimized training pipeline has produced a system that is faster, smaller, and more effective than its predecessor.

Future Development

The release of Wyverno Aelyne v1.5 marks an important milestone in the evolution of the project, but it is far from the final destination. Development continues, and future work will focus on further improving generation quality, reducing inference time, and expanding the capabilities of the system.

We are proud of the progress achieved so far and believe that this new architecture provides a solid foundation for future generations of Aelyne.