Mistral’s new Codestral Mamba to aid longer code generation
Wednesday, July 17, 2024, 01:24 PM, from InfoWorld
French AI startup Mistral has launched a new large language model (LLM), Codestral Mamba, that can generate longer stretches of code faster than other open-source models such as CodeGemma-1.1 7B and CodeLlama 7B.
“Unlike transformer models, Mamba models offer the advantage of linear time inference and the theoretical ability to model sequences of infinite length. It allows users to engage with the model extensively with quick responses, irrespective of the input length,” the startup said in a statement. “This efficiency is especially relevant for code productivity use cases — this is why we trained this model with advanced code and reasoning capabilities, enabling it to perform on par with state-of-the-art transformer-based models,” it explained.

The company tested Codestral Mamba on in-context retrieval capabilities up to 256k tokens, twice the context length of OpenAI’s GPT-4o, and found its 7B version outperforming open-source models on several benchmarks, including HumanEval, MBPP, Spider, and CruxE. The larger 22B-parameter version of the new model also performed significantly better than CodeLlama-34B, with the exception of the CruxE benchmark.

While the 7B version is available under the Apache 2.0 license, the larger 22B version is available under a commercial license for self-deployment or a community license for testing purposes.

Codestral Mamba, according to the company, can be deployed using the mistral-inference SDK, which relies on the reference implementations from Mamba’s GitHub repository. The model can also be deployed through TensorRT-LLM, or the raw weights can be downloaded from HuggingFace, the company said, adding that, for easy testing, the new model is also available on la Plateforme.

The French startup has also released another model, dubbed Mathstral, which it says is part of its broader effort to support academic projects. Mathstral, according to the startup, builds on Mistral 7B and specializes in STEM subjects. “Mathstral is another example of the excellent performance/speed tradeoffs achieved when building models for specific purposes – a development philosophy we actively promote in la Plateforme, particularly with its new fine-tuning capabilities,” Mistral wrote in a blog post. The weights of this model are hosted on HuggingFace, and users can try Mathstral with mistral-inference and adapt it with mistral-finetune, it added.
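Since both models ship their raw weights on HuggingFace, one simple way to experiment locally is to pull the weights down and point the mistral-inference tooling at the resulting directories. The sketch below uses the huggingface_hub Python client; the repository ids and local paths are assumptions based on Mistral's naming conventions, not steps confirmed by the article.

# A minimal sketch of fetching the raw weights mentioned above.
# Assumptions: the Hugging Face repo ids below exist under the mistralai
# organization, and huggingface_hub is installed (pip install huggingface_hub).
from huggingface_hub import snapshot_download

# Download Codestral Mamba 7B weights (repo id assumed).
codestral_dir = snapshot_download(
    repo_id="mistralai/Mamba-Codestral-7B-v0.1",
    local_dir="mamba-codestral-7b",
)

# Download Mathstral 7B weights (repo id assumed).
mathstral_dir = snapshot_download(
    repo_id="mistralai/Mathstral-7B-v0.1",
    local_dir="mathstral-7b",
)

print("Codestral Mamba weights in:", codestral_dir)
print("Mathstral weights in:", mathstral_dir)

From there, the downloaded directories can be served with the mistral-inference SDK or, for Codestral Mamba, run through TensorRT-LLM, as the company describes; the exact commands depend on the SDK version, so the mistral-inference and mistral-finetune documentation should be consulted for the current invocation.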
https://www.infoworld.com/article/2518599/mistrals-new-codestral-mamba-to-aid-longer-code-generation...