Meta’s FAIR Team’s New Releases Include Image-to-Text and Text-to-Music Generation Models

Insider Brief

  • Meta’s Fundamental AI Research (FAIR) team has announced the public release of five cutting-edge AI models.
  • Meta’s new AI models include image-to-text and text-to-music generation, a multi-token prediction model and a technique for detecting AI-generated speech.
  • The team added they are interested in advancing artificial intelligence through open research and global collaboration.

Meta’s Fundamental AI Research (FAIR) team has announced the public release of five cutting-edge AI models, according to a company statement.

Meta’s new AI models include image-to-text and text-to-music generation, a multi-token prediction model and a technique for detecting AI-generated speech. By sharing these models, Meta aims to inspire further iterations and foster responsible AI advancement.

The team added they are interested in advancing artificial intelligence through open research and global collaboration.

The models include:

Chameleon: Bridging Text and Images

One highlight of the release is the Chameleon model, a mixed-modal AI that processes and generates both text and images. Unlike typical large language models, which are unimodal, Chameleon can take any combination of text and images as input and produce any combination as output, enabling tasks such as captioning images or generating scenes from mixed text-and-image prompts. The model is available under a research-only license.

Multi-Token Prediction: Speeding Up AI Training

Meta has also introduced a multi-token prediction approach aimed at improving the efficiency of training large language models (LLMs). Traditional LLMs are trained to predict the next token one at a time, which requires massive amounts of training text. The new approach trains models to predict several future tokens simultaneously, improving training efficiency and speed. The pretrained models for code completion are released under a non-commercial, research-only license.
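The idea can be illustrated with a toy sketch: a shared trunk computes one hidden state, and several output heads each predict a different future token in a single forward pass. This is not Meta's implementation; all names and sizes below are illustrative assumptions.

```python
import numpy as np

# Toy multi-token prediction: instead of a single next-token head,
# N_HEADS independent heads share one hidden state and each predict
# the token at a different future offset, all in one pass.
rng = np.random.default_rng(0)
VOCAB, HIDDEN, N_HEADS = 50, 16, 4  # predict 4 future tokens at once

trunk = rng.normal(size=(VOCAB, HIDDEN))           # shared "trunk" (toy embedding)
heads = rng.normal(size=(N_HEADS, HIDDEN, VOCAB))  # one output head per offset

def predict_next_tokens(token_id: int) -> list[int]:
    """Predict the next N_HEADS tokens from one input token in a single pass."""
    h = trunk[token_id]                        # shared hidden state
    logits = np.einsum("h,khv->kv", h, heads)  # all heads evaluated at once
    return logits.argmax(axis=-1).tolist()     # one predicted token per head

preds = predict_next_tokens(7)
print(len(preds))  # one prediction per future position
```

During training, each head gets its own loss against the corresponding future token, so one pass through the trunk supervises several positions at once.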

JASCO: Enhanced Control in Music Generation

JASCO, a text-to-music model, gives users more control over AI-generated music by accepting conditioning inputs such as chords and beats. This allows the integration of both symbolic inputs and audio, improving the versatility of generated music. JASCO's output quality is comparable to existing text-to-music models while providing significantly better control.

AudioSeal: Detecting AI-Generated Speech

Meta’s AudioSeal, an audio watermarking technique, detects AI-generated speech segments within audio snippets. Its localized detection approach pinpoints watermarked segments within a longer clip and runs up to 485 times faster than previous methods, making it suitable for large-scale, real-time applications. AudioSeal is released under a commercial license to help prevent misuse of generative AI tools.
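The "localized" aspect can be sketched with a simplified correlation detector. This is not AudioSeal's actual algorithm (which uses trained neural generator/detector models); it only illustrates the principle of scoring individual frames so that watermarked segments can be located within a clip, rather than issuing one verdict for the whole file. All signal scales here are illustrative.

```python
import numpy as np

# Embed a known pseudo-random pattern into part of a signal, then score
# each frame by correlation with that pattern: frames containing the
# watermark stand out, localizing it within the clip.
rng = np.random.default_rng(1)
FRAME = 200
pattern = rng.standard_normal(FRAME) * 0.5     # the watermark signal (toy scale)

audio = rng.standard_normal(FRAME * 10) * 0.2  # 10 frames of background "audio"
audio[3 * FRAME : 6 * FRAME] += np.tile(pattern, 3)  # watermark frames 3..5

def detect_frames(signal: np.ndarray) -> list[int]:
    """Return indices of frames whose correlation with the pattern is high."""
    scores = [
        float(np.dot(signal[i * FRAME : (i + 1) * FRAME], pattern))
        for i in range(len(signal) // FRAME)
    ]
    threshold = np.dot(pattern, pattern) / 2   # half the expected match energy
    return [i for i, s in enumerate(scores) if s > threshold]

print(detect_frames(audio))  # flags only the watermarked frames
```

Because each frame is scored independently, detection cost grows linearly with clip length and segments can be flagged without analyzing the whole file jointly.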

Diversity in Text-to-Image Models

Meta has also tackled geographic and cultural diversity in text-to-image models. To address potential disparities, the company developed automatic indicators and conducted a large-scale annotation study with over 65,000 annotations. This effort aims to improve the representation and inclusivity of AI-generated images.

The geographic disparities evaluation code and annotations are now publicly available, supporting the community in enhancing diversity across generative models.

Meta’s FAIR team emphasizes that collaboration with the global AI community is crucial for responsible innovation. The release of these models exemplifies Meta’s dedication to open science and ethical AI development. Through these contributions, Meta hopes to advance the state of AI in a responsible and inclusive manner.

For more information on Meta’s AI research, visit the company’s home page.