Wyverno is proud to introduce its next-generation model, Wyverno SST v2, now available in Base and Max versions.


We’ve done extensive work to upgrade the architecture, optimize performance, and improve training quality to make audio processing even more accurate, faster, and more natural. Below is a closer look at the key improvements in this new generation.

Updated Model Architecture

In the previous generation, Wyverno SST v1 primarily focused on time-domain audio analysis. The model handled sound dynamics well, but the spectral content was not processed as deeply as it could have been.

With Wyverno SST v2, we completely redesigned the audio analysis pipeline and split the model into two specialized blocks:

Time-Domain Analysis

The first block, as before, is responsible for analyzing the temporal characteristics of the signal. This allows the model to accurately detect:

  • sound attack — how quickly the signal rises from silence to its peak;

  • decay and natural dynamics;

  • spatial characteristics of the sound;

  • rhythmic structure;

  • tempo and timing transitions.

This helps preserve the natural feel of the performance and retain even the finest sonic details.
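The announcement does not say how the time-domain block measures attack. As a rough illustration of the quantity itself, here is a minimal sketch, assuming the common engineering definition of attack as the 10% to 90% rise time of an amplitude envelope, with the envelope taken as a simple moving peak. `envelope` and `attack_time` are hypothetical helpers for illustration only, not part of Wyverno SST.

```python
import math


def envelope(samples, window):
    """Moving-peak envelope: max absolute amplitude over a trailing window of samples."""
    env = []
    for i in range(len(samples)):
        start = max(0, i - window + 1)
        env.append(max(abs(s) for s in samples[start:i + 1]))
    return env


def attack_time(samples, sample_rate, window=32, low=0.1, high=0.9):
    """Seconds for the envelope to rise from `low` to `high` times its peak (rise time)."""
    env = envelope(samples, window)
    peak = max(env)
    if peak == 0:
        return 0.0  # silent input has no attack
    i_low = next(i for i, v in enumerate(env) if v >= low * peak)
    i_high = next(i for i, v in enumerate(env) if v >= high * peak)
    return (i_high - i_low) / sample_rate


# Hypothetical test signal: a 1 kHz tone with a 0.1 s linear fade-in, then held for 0.1 s.
SR = 8000
tone = [min(n / 800, 1.0) * math.sin(2 * math.pi * 1000 * n / SR) for n in range(1600)]
print(f"attack time: {attack_time(tone, SR):.3f} s")
```

For a linear 0.1 s fade-in, the 10% to 90% portion of the ramp lasts 0.08 s, which is what this measurement should report. A learned model would extract far richer temporal features than a single rise time; this only shows what "how quickly the signal rises to its peak" means numerically.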

Frequency-Domain Analysis

The second block focuses on deep spectral analysis. Thanks to this, the model is now significantly better at:

  • distinguishing timbre and subtle vocal nuances;

  • detecting pitch more accurately;

  • identifying noise and unwanted artifacts;

  • processing harmonics and frequency balance with greater precision.
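The internals of the frequency-domain block are not described either. To make the listed spectral quantities concrete, here is a minimal sketch, assuming a plain DFT magnitude spectrum: the strongest non-DC bin serves as a crude pitch estimate, and the spectral centroid (a standard "brightness" descriptor) stands in for timbre. All function names are hypothetical. The naive O(N^2) DFT keeps the example dependency-free; real code would use an FFT, windowing, and a more robust pitch tracker. The test frame is chosen so both tones fall exactly on DFT bins.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive O(N^2) DFT magnitudes |X[k]| for bins k = 0..N/2 (real input)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def dominant_frequency(frame, sample_rate):
    """Frequency in Hz of the strongest non-DC bin: a crude pitch estimate."""
    mags = magnitude_spectrum(frame)
    k = max(range(1, len(mags)), key=lambda i: mags[i])
    return k * sample_rate / len(frame)


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency in Hz, a common brightness (timbre) descriptor."""
    mags = magnitude_spectrum(frame)
    total = sum(mags)
    if total == 0:
        return 0.0
    n = len(frame)
    return sum(k * sample_rate / n * m for k, m in enumerate(mags)) / total


# Hypothetical frame: 500 Hz fundamental plus a weaker 1 kHz harmonic, 256 samples at 8 kHz.
SR, N = 8000, 256
frame = [
    math.sin(2 * math.pi * 500 * t / SR) + 0.3 * math.sin(2 * math.pi * 1000 * t / SR)
    for t in range(N)
]
print(f"dominant: {dominant_frequency(frame, SR):.1f} Hz, "
      f"centroid: {spectral_centroid(frame, SR):.1f} Hz")
```

Adding or strengthening the 1 kHz harmonic leaves the dominant frequency at 500 Hz but pulls the centroid upward, which is the kind of timbral difference a pitch value alone cannot capture.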

The combination of these two approaches has significantly improved processing quality in tasks such as:

  • equalization;

  • noise reduction;

  • compression;

  • spatial effects;

  • overall style transfer.

As a result, processed audio now sounds cleaner, more detailed, and more natural.

Significantly Improved Training

The new architecture is only part of the story. Another major step forward was a substantial improvement in the training process.

For Wyverno SST v2, we prepared a dataset that is 5 times larger than the one used for training v1.

It includes:

  • recordings with different voice types;

  • material from various acoustic environments;

  • a wide range of speech and vocal styles;

  • different recording conditions.

This scale allows the model to generalize better, perform more reliably in real-world scenarios, and reproduce the character of the reference sound more accurately.

Faster Processing

In addition to quality improvements, one of the main achievements of Wyverno SST v2 is a major increase in processing speed.

We benchmarked both Max models on a 14-inch Apple MacBook Pro with an M4 Max chip (32-core GPU):

  • Wyverno SST v1 Max — approximately 60 seconds processing time;

  • Wyverno SST v2 Max — approximately 4 seconds.

In this test, the new version is about 15 times faster than the previous generation.
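The speedup figure is the ratio of the two approximate processing times measured on the same machine:

```latex
\text{speedup} = \frac{t_{\text{v1 Max}}}{t_{\text{v2 Max}}} \approx \frac{60\ \text{s}}{4\ \text{s}} = 15,
\qquad
1 - \frac{t_{\text{v2 Max}}}{t_{\text{v1 Max}}} \approx 1 - \frac{4}{60} \approx 93\%
```

Equivalently, processing time drops by roughly 93% in this benchmark.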

For users, this means:

  • significantly less waiting time;

  • a faster creative workflow;

  • the ability to test more variations quickly;

  • a smoother experience even in complex processing scenarios.

Additional Improvements

In Wyverno SST v2 Max, we also integrated a new GainNet module.

Its purpose is to automatically match the loudness of the processed audio to the loudness level of the reference signal. This helps:

  • avoid unwanted volume jumps;

  • maintain comfortable listening levels;

  • make the final sound more balanced and professional.
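GainNet is a learned module and its design is not described here. The goal it serves, matching the processed signal's level to the reference, can be illustrated with a much simpler static approach: measure the RMS level of both signals and apply one gain to the processed audio. This is a minimal sketch under that assumption; `rms` and `match_loudness` are hypothetical helpers, and plain RMS is only a proxy for perceived loudness, which is usually measured as K-weighted, gated LUFS following ITU-R BS.1770.

```python
import math


def rms(samples):
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0


def match_loudness(processed, reference, max_gain_db=24.0):
    """Scale `processed` so its RMS level equals that of `reference`.

    A static, RMS-based stand-in for loudness matching; it is NOT GainNet.
    The gain is clamped to +/- max_gain_db to avoid blowing up near-silent input.
    """
    p, r = rms(processed), rms(reference)
    if p == 0.0:
        return list(processed)  # nothing to scale
    limit = 10 ** (max_gain_db / 20)  # dB limit converted to a linear factor
    gain = min(max(r / p, 1 / limit), limit)
    return [s * gain for s in processed]


# Hypothetical signals: the processed audio came out much quieter than the reference.
reference = [0.5 * math.sin(2 * math.pi * n / 40) for n in range(400)]
processed = [0.1 * math.sin(2 * math.pi * n / 40) for n in range(400)]
out = match_loudness(processed, reference)
print(f"processed RMS {rms(processed):.3f} -> {rms(out):.3f} (reference {rms(reference):.3f})")
```

A single static gain cannot follow level changes within a track or account for how frequency content affects perceived loudness; presumably that is where a learned module such as GainNet would go further.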

Wyverno SST v2 is not just an update — it’s a major step forward in audio processing quality, speed, and accuracy.

Our goal was to build a model that not only understands sound better, but also helps users work faster, more comfortably, and achieve results that closely match their expectations.

Wyverno SST v2 Base and Max are now available — try the next generation of audio processing today.