Work

Violet Twilight v0.2

LLM

A fine-tuned language model optimized for local deployment on consumer hardware, providing strong performance for conversational AI applications without requiring expensive cloud infrastructure.

White-haired anime girl in monk garb with staff and green scarf, violet moon backdrop

Violet Twilight v0.2 is a model I created using a mix of custom datasets and datasets that were held in high regard by the local LLM community at the time. Its lineage comes from two LoRA fine-tunes of mistralai/Mistral-Nemo-Base-2407, followed by a SLERP merge using Mergekit. All fine-tuning was done with Axolotl.
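
For readers unfamiliar with SLERP: it interpolates along the arc between two weight tensors rather than the straight line used by plain averaging, which tends to preserve the magnitude of the merged weights. The sketch below shows the idea in Python, assuming the two checkpoints share an identical architecture; the function is illustrative, not Mergekit's actual internals.

```python
import torch

def slerp(t: float, a: torch.Tensor, b: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors.

    t=0 returns `a`, t=1 returns `b`; intermediate values follow the
    great-circle arc between the two (flattened) tensors.
    """
    a_flat, b_flat = a.flatten().float(), b.flatten().float()
    a_unit = a_flat / (a_flat.norm() + eps)
    b_unit = b_flat / (b_flat.norm() + eps)
    # Angle between the two tensors, clamped for numerical safety.
    omega = torch.acos(torch.clamp(a_unit @ b_unit, -1.0, 1.0))
    if omega.abs() < eps:
        # Near-parallel tensors: fall back to plain linear interpolation.
        out = (1 - t) * a_flat + t * b_flat
    else:
        so = torch.sin(omega)
        out = (torch.sin((1 - t) * omega) / so) * a_flat + (torch.sin(t * omega) / so) * b_flat
    return out.reshape(a.shape).to(a.dtype)

# Merging two hypothetical state dicts tensor-by-tensor:
# merged = {k: slerp(0.5, crimson[k], azure[k]) for k in crimson}
```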

The first, Crimson Dawn v0.2, was the result of first training on roleplay data without an instruct template. The goal of this approach was to shift the model away from the overbearing censorship common in models of the time, and to expose the base model to more complex and targeted roleplay scenarios. A second LoRA fine-tune was then run over the resulting model, this time using a ChatML instruct template. The data for this training was more task- and assistant-oriented; however, custom data was written to create an "overlap" in which the LLM was asked to roleplay while completing a task. The idea was to blend the roleplay data with the assistant data, producing a model theoretically capable of both while increasing overall coherence.
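
For context, ChatML delimits each conversation turn with `<|im_start|>` and `<|im_end|>` markers. Below is a minimal Python sketch of that formatting; the helper function and the "overlap" sample conversation are illustrative inventions, not the actual training data.

```python
def to_chatml(messages: list[dict]) -> str:
    """Render a conversation in the ChatML format."""
    return "\n".join(
        f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>"
        for msg in messages
    )

# A hypothetical "overlap" sample: the model roleplays while completing a task.
sample = [
    {"role": "system", "content": "You are a wandering monk who answers questions in character."},
    {"role": "user", "content": "Stay in character and explain what a LoRA fine-tune is."},
    {"role": "assistant", "content": "*leans on staff* A LoRA, traveler, adds small trainable matrices to a frozen model..."},
]
print(to_chatml(sample))
```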

The second, Azure Dusk v0.2, was trained much the same way as its sibling Crimson Dawn, with two key differences: the datasets were used in reverse order, and the ChatML instruct template was included in both training runs. Both trainings were still LoRA fine-tunes, just as before.

This model was originally born of an experiment to see which training approach would yield the stronger model. When both parent models produced solid results, the idea was floated to merge the two. The result was a model that seemed to exceed both parents in terms of utility and enjoyment.
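
To try the merged model locally, something like the following should work with the Hugging Face transformers library, assuming the checkpoint is published in Hugging Face format and ships a chat template; the repo ID below is a placeholder to substitute with the actual checkpoint.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo ID; replace with the actual published checkpoint.
repo_id = "Epiculous/Violet_Twilight-v0.2"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Introduce yourself in character."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```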