34. Build AI that handles multiple kinds of input
Combine text, images, audio, video, tables, and actions in one system. This chapter covers multimodal embeddings, vision-language models, document AI, visual question answering, and richer human-computer interaction.