
The most powerful open source AI model will bring more natural AI agents

The Allen Institute for AI aims to give developers, researchers, and startups the AI tools they need to power bots that perform useful tasks on your devices.


The most capable open-source AI model yet with visual capabilities could lead to more developers, researchers, and startups developing AI agents that can perform useful tasks on your devices.

Released today by the Allen Institute for AI (Ai2), the Open Multimodal Language Model, or Molmo, can interpret images and converse via a chat interface. This means it can make sense of a computer screen, potentially helping an AI agent perform tasks like browsing the internet, navigating file directories, and composing documents.

“With this release, many more people can deploy a multimodal model. It should be an enabler for next-generation applications,” said Ali Farhadi, CEO of Ai2, a Seattle-based research organization, and a computer scientist at the University of Washington.

So-called AI agents are being widely touted as the next big thing in AI, with OpenAI, Google, and others racing to develop them. "Agent" has become a buzzword of late, but the grand vision is for artificial intelligence to go far beyond chat and reliably perform complex, sophisticated actions on computers when given a command. This capability has yet to be realized at any scale.

The need for an open model

Some powerful AI models already have visual capabilities, such as OpenAI’s GPT-4, Anthropic’s Claude, and Google DeepMind’s Gemini. These models can be used to power some experimental AI agents, but they are hidden from view and can only be accessed through a paid application programming interface (API).

Meta has released a family of AI models called Llama under a license that limits their commercial use, but has not yet provided developers with a multimodal version. Meta is expected to announce several new products at its Connect event today, possibly including additional Llama AI models.

Ofir Press, a postdoc at Princeton University working on AI agents, suggests that “having an open-source multimodal model means that any startup or researcher with an idea can try it.”

Press says that because Molmo is open source, developers will be able to more easily fine-tune their agents for specific tasks, such as working with spreadsheets, by bringing in additional training data. Models like GPT-4 can only be fine-tuned in a limited way through their APIs, whereas a fully open model can be modified widely: “When you have an open source model like this, you have a lot more options.”

Ai2 is launching several sizes of Molmo today, including a 70-billion-parameter model and a 1-billion-parameter model small enough to run on a mobile device. A model’s parameter count refers to the number of units it contains for storing and manipulating data, and roughly corresponds to its capabilities.

Small but capable

Ai2 argues that despite its relatively small size, Molmo is as capable as other, considerably larger commercial models because it was carefully trained on high-quality data. Unlike Meta’s Llama, the new model is entirely open source, with no restrictions on its use, and Ai2 is also publishing the training data used to create the model, giving researchers more insight into how it works.

The release of powerful open models is not without risks: they can be more easily adapted for nefarious purposes. One day, for example, we could see the emergence of malicious AI agents designed to automate the hacking of computer systems.

A key challenge is getting models to perform more reliably. This may require further advances in AI’s reasoning capabilities, something OpenAI has sought to address with its latest o1 model, which demonstrates step-by-step reasoning abilities. The next step could be to equip multimodal models with these capabilities.

For now, the launch of Molmo means that AI agents are closer than ever, and could soon be useful even outside the giants that dominate the AI world.
