Hey! After starting the 20B MLA model on Testnet, we are looking for a second model to train on testnet, ideally in the 70B range. Rather than starting from scratch, we'd like to build on an existing base model to get something highly usable, efficiently.
My proposal is to fine-tune Qwen's 72B with Hermes 3. This model has a strong math inclination and could make a good basis for an RL'd model later on. The Hermes 3 dataset is about 390M tokens and should be relatively inexpensive to train over (probably in the $3k-$5k range with traditional compute setups). The result would be an instruct and tool-use model with strong reasoning and creative abilities, highly steerable, and inexpensive.
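For concreteness, here's a hedged back-of-envelope check of that cost figure using the standard 6·N·D FLOPs rule of thumb. The MFU, epoch count, and GPU price below are illustrative assumptions, not measured values, so treat the output as a sanity check rather than a quote:

```python
# Back-of-envelope cost estimate for fine-tuning a 72B model on ~390M tokens.
# Every constant below is an assumption for illustration, not a measurement.

PARAMS = 72e9            # Qwen 72B parameter count
TOKENS = 390e6           # approximate Hermes 3 dataset size
EPOCHS = 3               # assumed number of passes over the data
PEAK_FLOPS = 989e12      # H100 BF16 dense peak, FLOP/s
MFU = 0.15               # assumed utilization for a sharded multi-node fine-tune
PRICE_PER_GPU_HOUR = 3.00  # assumed cloud H100 price, USD

train_flops = 6 * PARAMS * TOKENS * EPOCHS        # standard 6*N*D estimate
gpu_hours = train_flops / (PEAK_FLOPS * MFU) / 3600
cost = gpu_hours * PRICE_PER_GPU_HOUR
print(f"~{gpu_hours:,.0f} GPU-hours, ~${cost:,.0f}")
```

With these assumptions it lands in the low thousands of dollars, consistent with the $3k-$5k ballpark; a higher MFU or fewer epochs would push it lower.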
We could also translate the model to MLA before fine-tuning, which has been shown to be strictly more expressive than GQA ([2502.07864] TransMLA: Multi-Head Latent Attention Is All You Need). That would add a layer of complication, but the translation of Qwen 72B to MLA could be left as an exercise for the community.
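To illustrate the linear-algebra core of why a GQA-to-MLA conversion is possible at all (this is just a sketch with made-up dimensions, not the actual TransMLA recipe): a GQA key projection, once replicated across query-head groups, is a low-rank matrix, so it factors exactly into a down-projection to a small latent plus an up-projection, which is the MLA shape:

```python
import numpy as np

# Sketch: a GQA key projection replicated across query groups is low-rank,
# so it can be written as W_down @ W_up (compress to a latent, then expand).
# Dimensions here are illustrative, not Qwen 72B's actual config.

rng = np.random.default_rng(0)
d_model, n_kv_heads, n_q_heads, head_dim = 512, 4, 8, 64

# GQA-style K projection: only n_kv_heads * head_dim distinct output dims.
W_k = rng.standard_normal((d_model, n_kv_heads * head_dim)) / np.sqrt(d_model)

# Replicate each KV head for every query head in its group, as GQA does
# implicitly at attention time. The result is 512x512 but rank <= 256.
group = n_q_heads // n_kv_heads
W_full = np.repeat(
    W_k.reshape(d_model, n_kv_heads, head_dim), group, axis=1
).reshape(d_model, n_q_heads * head_dim)

# Factor via truncated SVD at the true rank: exact reconstruction.
r = n_kv_heads * head_dim                 # latent dimension
U, S, Vt = np.linalg.svd(W_full, full_matrices=False)
W_down = U[:, :r] * S[:r]                 # d_model -> r  (latent compress)
W_up = Vt[:r, :]                          # r -> all query-head key dims

err = np.linalg.norm(W_full - W_down @ W_up) / np.linalg.norm(W_full)
print(f"relative reconstruction error: {err:.2e}")
```

The reconstruction error is at machine precision because the replicated matrix really is rank-256, which is the observation TransMLA builds on before fine-tuning the latent form for extra expressiveness.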
Looking for feedback and suggestions on this idea, but I think it will be a relatively straightforward run.