34. Models that work across media
Multimodal models can read, describe, compare, and generate content across text, images, audio, and video. You will build applications for tasks that combine modalities, such as visual question answering, document reading, captioning, and image-grounded chat.