Meta Introduces Chameleon as ChatGPT Competitor

Thursday, 27 June 2024 - 09:33 WIB

By: Arianti Widya

Image: Artificial intelligence (AI). Source: courtesy photo.

United States – Meta has introduced Chameleon, a new artificial intelligence (AI) model that is set to transform the generative AI field. Chameleon can process different types of data, such as text and images, at the same time, which makes it a powerful tool for businesses.

Chameleon is designed to perform a range of tasks, including answering questions about visuals and generating image captions.

The models can perform a broad range of multimodal tasks, achieving state-of-the-art performance on image captioning while handling text and visual data equally well.

Chameleon can generate both text-based responses and images using a single model. By comparison, other AI systems call on separate models for some tasks. ChatGPT, for example, uses DALL-E 3 to generate its images.

For example, the Chameleon models can create an image of an animal, like a bird, and answer user questions about a particular species.

Image: Meta logo. Photo: About Facebook.

The Chameleon models outperform Llama 2. They are competitive with models such as Mistral’s Mixtral 8x7B and Google’s Gemini Pro, and they even keep pace with larger-scale systems like OpenAI’s GPT-4V.

Its capabilities could power multimodal features in Meta AI, the recently released chatbot across Meta’s social media apps, including Facebook, Instagram and WhatsApp.

Meta currently uses Llama 3 to power Meta AI but could follow ChatGPT’s lead and use multiple underlying systems for different tasks, such as better answering user queries about photos on Instagram.

“Chameleon unlocks entirely new possibilities for multimodal interaction(s),” the researchers wrote.

Meta’s Chameleon follows the unveiling of another multimodal AI model, OpenAI’s GPT-4o, which is being used to power ChatGPT’s new visual capabilities.

The new Chameleon model combines architectural changes with new training techniques.

Under the hood, the Chameleon models use an architecture that largely follows Llama 2. However, Meta’s researchers tweaked the underlying transformer architecture to ensure the model performed well when handling mixed modalities.

Those changes include query-key normalization and revised placement of layer norms.
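
To give a rough sense of what query-key normalization means, the sketch below normalizes the query and key vectors before taking their dot product in attention, which keeps the attention scores in a bounded range. It is a toy illustration in plain Python, not Meta's implementation: the function names, the example vectors, and the choice of a layer norm without learned scale or bias are all assumptions made for the example.

```python
import math

def layer_norm(vec, eps=1e-5):
    """Rescale a vector to zero mean and unit variance (no learned parameters)."""
    mean = sum(vec) / len(vec)
    var = sum((x - mean) ** 2 for x in vec) / len(vec)
    return [(x - mean) / math.sqrt(var + eps) for x in vec]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys, qk_norm=True):
    """Attention weights of one query over several keys.

    With qk_norm=True, the query and every key are normalized first
    (the assumed form of query-key normalization in this sketch).
    """
    if qk_norm:
        query = layer_norm(query)
        keys = [layer_norm(k) for k in keys]
    scale = 1.0 / math.sqrt(len(query))
    logits = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    return softmax(logits)

# Hypothetical vectors with large magnitudes, chosen to exaggerate the effect.
query = [100.0, -50.0, 25.0, 0.0]
keys = [
    [90.0, -40.0, 30.0, 5.0],
    [-80.0, 60.0, -20.0, 10.0],
    [1.0, 2.0, 3.0, 4.0],
]

weights_plain = attention_weights(query, keys, qk_norm=False)
weights_normed = attention_weights(query, keys, qk_norm=True)
print(weights_plain)   # all weight collapses onto a single key
print(weights_normed)  # weight is spread across the keys
```

Without normalization, the large vector magnitudes push all of the attention onto one key. With normalization, the scores stay small enough that every key keeps some weight.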

They also used two tokenizers, which convert input data into tokens: one for text and one for visuals. The resulting tokens are then combined to form the entire input. The same process occurs in Chameleon’s outputs, enabling the model to focus on the data coming in and out.
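
The two-tokenizer idea can be sketched as follows: one tokenizer turns text into IDs, another turns image data into IDs, and both sets of IDs are placed in one sequence. Everything here is hypothetical and chosen only for illustration: the tiny vocabulary, the pixel-bucketing "image tokenizer", and the ID offset that keeps the two ranges apart are assumptions, not details of Chameleon's actual tokenizers.

```python
# Hypothetical toy vocabulary for the text tokenizer.
TEXT_VOCAB = {"<bos>": 0, "a": 1, "photo": 2, "of": 3, "bird": 4, "<unk>": 5}

IMAGE_CODEBOOK_SIZE = 4          # assumed number of discrete image codes
IMAGE_OFFSET = len(TEXT_VOCAB)   # image IDs start right after the text IDs

def tokenize_text(text):
    """Map each whitespace-separated word to its text ID."""
    return [TEXT_VOCAB.get(word, TEXT_VOCAB["<unk>"]) for word in text.lower().split()]

def tokenize_image(pixels):
    """Map each pixel intensity (0-255) to a discrete code in the shared ID space."""
    bucket = 256 // IMAGE_CODEBOOK_SIZE
    return [IMAGE_OFFSET + min(p // bucket, IMAGE_CODEBOOK_SIZE - 1) for p in pixels]

def build_input(segments):
    """Combine text and image segments into one token sequence."""
    ids = [TEXT_VOCAB["<bos>"]]
    for kind, payload in segments:
        if kind == "text":
            ids.extend(tokenize_text(payload))
        elif kind == "image":
            ids.extend(tokenize_image(payload))
        else:
            raise ValueError(f"unknown segment kind: {kind}")
    return ids

def label_modalities(ids):
    """Tag each ID as text or image based on which range it falls in."""
    return ["image" if i >= IMAGE_OFFSET else "text" for i in ids]

sequence = build_input([
    ("text", "a photo of"),
    ("image", [0, 64, 128, 255]),
    ("text", "bird"),
])
print(sequence)                    # [0, 1, 2, 3, 6, 7, 8, 9, 4]
print(label_modalities(sequence))
```

Because text and image IDs live in separate ranges of one shared sequence, the same mapping can be read in reverse on the output side to tell which generated tokens are text and which are image data.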