Hey! After starting the 20B MLA model on Testnet, we are looking for a second model to train on testnet, ideally in the 70B range. Rather than starting from scratch, we'd like to build on an existing base model to get something highly usable, efficiently.
My proposal is to fine-tune Qwen's 72B with Hermes 3. This model has a strong math inclination and could make a good basis for an RL'd model later on. The Hermes 3 dataset is about 390M tokens and should be relatively inexpensive to train over (probably in the $3k-$5k range with traditional compute setups). The result would be an instruct and tool-use model with strong reasoning and creative abilities, highly steerable, and inexpensive.
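For concreteness, here's a hedged back-of-envelope check of that cost figure using the standard 6·N·D FLOPs rule of thumb. The MFU, epoch count, and GPU price below are illustrative assumptions, not measured values, so treat the output as a sanity check rather than a quote:

```python
# Back-of-envelope cost estimate for fine-tuning a 72B model on ~390M tokens.
# Every constant below is an assumption for illustration, not a measurement.

PARAMS = 72e9            # Qwen 72B parameter count
TOKENS = 390e6           # approximate Hermes 3 dataset size
EPOCHS = 3               # assumed number of passes over the data
PEAK_FLOPS = 989e12      # H100 BF16 dense peak, FLOP/s
MFU = 0.15               # assumed utilization for a sharded multi-node fine-tune
PRICE_PER_GPU_HOUR = 3.00  # assumed cloud H100 price, USD

train_flops = 6 * PARAMS * TOKENS * EPOCHS        # standard 6*N*D estimate
gpu_hours = train_flops / (PEAK_FLOPS * MFU) / 3600
cost = gpu_hours * PRICE_PER_GPU_HOUR
print(f"~{gpu_hours:,.0f} GPU-hours, ~${cost:,.0f}")
```

With these assumptions it lands in the low thousands of dollars, consistent with the $3k-$5k ballpark; a higher MFU or fewer epochs would push it lower.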
We could also translate the model to MLA before fine-tuning, which has been shown to be strictly more expressive than GQA ([2502.07864] TransMLA: Multi-Head Latent Attention Is All You Need). That would add a layer of complication, but the translation of Qwen 72B to MLA could be left as an exercise for the community.
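To illustrate the linear-algebra core of why a GQA-to-MLA conversion is possible at all (this is just a sketch with made-up dimensions, not the actual TransMLA recipe): a GQA key projection, once replicated across query-head groups, is a low-rank matrix, so it factors exactly into a down-projection to a small latent plus an up-projection, which is the MLA shape:

```python
import numpy as np

# Sketch: a GQA key projection replicated across query groups is low-rank,
# so it can be written as W_down @ W_up (compress to a latent, then expand).
# Dimensions here are illustrative, not Qwen 72B's actual config.

rng = np.random.default_rng(0)
d_model, n_kv_heads, n_q_heads, head_dim = 512, 4, 8, 64

# GQA-style K projection: only n_kv_heads * head_dim distinct output dims.
W_k = rng.standard_normal((d_model, n_kv_heads * head_dim)) / np.sqrt(d_model)

# Replicate each KV head for every query head in its group, as GQA does
# implicitly at attention time. The result is 512x512 but rank <= 256.
group = n_q_heads // n_kv_heads
W_full = np.repeat(
    W_k.reshape(d_model, n_kv_heads, head_dim), group, axis=1
).reshape(d_model, n_q_heads * head_dim)

# Factor via truncated SVD at the true rank: exact reconstruction.
r = n_kv_heads * head_dim                 # latent dimension
U, S, Vt = np.linalg.svd(W_full, full_matrices=False)
W_down = U[:, :r] * S[:r]                 # d_model -> r  (latent compress)
W_up = Vt[:r, :]                          # r -> all query-head key dims

err = np.linalg.norm(W_full - W_down @ W_up) / np.linalg.norm(W_full)
print(f"relative reconstruction error: {err:.2e}")
```

The reconstruction error is at machine precision because the replicated matrix really is rank-256, which is the observation TransMLA builds on before fine-tuning the latent form for extra expressiveness.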
Looking for feedback and suggestions on this idea, but I think it will be a relatively straightforward run.