MosaicML

Published: 2023-05-08  Categories: LLM, Artificial Intelligence

Introducing MPT: a new family of open-source, commercially usable LLMs from MosaicML. Trained on 1T tokens of text and code, MPT-7B matches and, in many ways, surpasses LLaMA-7B. This release includes four models: MPT-7B (Base), MPT-7B-Instruct, MPT-7B-Chat, and MPT-7B-StoryWriter-65k+.

For full technical details on the models, datasets, and training regimes, plus links to all of the artifacts we released today, check out our blog: https://lnkd.in/gCB22qR3

Why did we do this? These models are demonstrations of our tools for training, finetuning, and serving custom LLMs. Our friends at Replit used the exact same tools to train their SOTA code generation model last week. If you’re interested in building industrial-strength custom models, please reach out: https://lnkd.in/e6XGjTPv

MPT-7B comes in four different flavors.

MPT-7B-Base is a decoder-style transformer with 6.7B parameters, designed to be finetuned and customized for your use case.

MPT-7B-Instruct is a commercially usable instruction-following model finetuned on the Dolly-15k and HH-RLHF datasets.

MPT-7B-Chat is a chatbot finetuned on Alpaca and other public dialogue datasets.

MPT-7B-StoryWriter-65k+ is finetuned on books with a 65k-token context length; it writes awesome fiction. All four checkpoints are available on the Hugging Face Hub, as sketched below.
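Here is a minimal sketch of loading one of these checkpoints with the Hugging Face `transformers` library; the prompt and generation settings are illustrative, not part of the release.

```python
# Minimal sketch: load MPT-7B-Instruct from the Hugging Face Hub.
import transformers

name = "mosaicml/mpt-7b-instruct"
tokenizer = transformers.AutoTokenizer.from_pretrained(name)
model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    trust_remote_code=True,  # MPT ships its custom model code alongside the weights
)

prompt = "Explain what a decoder-only transformer is.\n"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```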

To highlight StoryWriter: its final training stage uses a 65k-token context, 32x LLaMA's 2k and 2x GPT-4's 32k. This extreme length works out of the box with our LLM Foundry on standard GPUs.
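Because ALiBi lets the model extrapolate past its training length, the context window can even be raised at load time. A hedged sketch follows; it assumes the checkpoint's custom config exposes a `max_seq_len` field, and the target length here is illustrative.

```python
# Sketch: extend StoryWriter's context window at load time (assumed config field).
import transformers

name = "mosaicml/mpt-7b-storywriter"
config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.max_seq_len = 83968  # extrapolate beyond the 65k training context
model = transformers.AutoModelForCausalLM.from_pretrained(
    name, config=config, trust_remote_code=True
)
```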

Technical details time! How did we do this? We started with our own custom variant of the transformer architecture, modified for speed and efficiency: ALiBi in place of learned positional embeddings, FlashAttention, and other performance optimizations (no surprise from us). We then trained on 1T tokens of data using 440 A100s for 9.5 days.
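To make the ALiBi piece concrete, here is a minimal sketch of the idea (Press et al.): a per-head linear penalty is added to the attention logits instead of using positional embeddings. This is illustrative only, not MosaicML's exact implementation, and the slope formula shown is the standard one for power-of-two head counts.

```python
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    """Additive attention bias of shape (n_heads, seq_len, seq_len).

    Head h gets slope m_h = 2**(-8 * (h + 1) / n_heads); a query at position i
    attending to key j receives bias -m_h * (i - j), so attention decays
    linearly with distance and generalizes to lengths unseen during training.
    """
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)])
    pos = torch.arange(seq_len)
    distance = (pos.view(-1, 1) - pos.view(1, -1)).clamp(min=0)  # causal distances
    return -slopes.view(-1, 1, 1) * distance  # broadcast one bias matrix per head

# The bias is simply added to the attention logits before the softmax:
# scores = q @ k.transpose(-2, -1) / d**0.5 + alibi_bias(n_heads, seq_len)
```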

This is the culmination of a two-year journey at MosaicML: we built great infrastructure (MosaicML platform), tools for training (Composer, StreamingDataset), and model code/checkpoints (LLM Foundry). What’s next? Stay tuned. These tools make it easy to churn out great models 😉
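As a flavor of what that tooling looks like, here is a small sketch of the StreamingDataset workflow for pulling training shards from object storage; the bucket path and batch size are placeholders, not details from this release.

```python
# Sketch: stream pre-sharded training data from object storage during training.
from torch.utils.data import DataLoader
from streaming import StreamingDataset

dataset = StreamingDataset(
    remote="s3://my-bucket/my-tokenized-corpus",  # hypothetical shard location
    local="/tmp/mds-cache",  # local cache for downloaded shards
    shuffle=True,
)
loader = DataLoader(dataset, batch_size=8)
for batch in loader:
    ...  # feed batches to your training loop
```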

