News

As competition in the generative AI field shifts toward multimodal models, Meta has released a preview of what could be its answer to the models released by frontier labs. Chameleon, its new family ...
The second new model that Microsoft released today, Phi-4-multimodal, is a 5.6-billion-parameter upgrade of Phi-4-mini. It can process not only text but also images, audio, and video.
Microsoft has unveiled Kosmos-1, which it describes as a multimodal large language model (MLLM) that can respond not only to language prompts but also to visual cues ...
Alibaba has created an AI model called Multi-Modality to Multi-Modality Multitask Mega-transformer (M6). The model contains 10 billion parameters and is pretrained on a dataset consisting of 1.9TB of ...
The Multimodal transformer with Unified maSKed modeling, or MUSK for short, is trained on over 50 million histopathology images and one billion text tokens from clinical reports to predict cancer ...
The model is also integrated into Sora, OpenAI's video-generation platform, further expanding its multimodal capabilities. In an announcement on X, OpenAI confirmed that GPT-4o's image generation is ...
Microsoft has introduced a new AI model that, it says, can process speech, vision, and text locally on-device using less compute capacity than previous models. Innovation in generative artificial ...
Complex model architectures, demanding runtime computations, and transformer-specific operations introduce unique challenges.
Today, we are launching Precious3GPT, the first multi-omics, multi-species, multi-tissue multimodal transformer model for aging research and drug discovery. It is trained on biomedical text data, and ...