News

Microsoft has just launched an experimental Copilot 3D tool that can transform a static image into a 3D model.
By leveraging a vision foundation model called Depth Anything V2, the method can accurately segment crops across diverse environments—field, lab, and aerial—reducing both time and cost in agricultural ...
To address these challenges, EarthMarker is proposed as the first visual prompting-based multimodal large language model (MLLM) in the remote sensing (RS) domain. EarthMarker is capable of interpreting RS imagery ...
Image-goal navigation is a critical task in autonomous visual navigation, requiring the robot to navigate to a target location specified by an image. Previous works using data-driven methods ...