What is Multimodal AI?
Multimodal AI processes multiple types of input — text, images, audio, and video — in a single model. GPT-4o and Gemini 1.5 Pro are leading examples.
Unimodal vs Multimodal
Early AI models were unimodal: a text model processed only text, and an image model processed only images. Multimodal models unify these capabilities, letting a single model reason across text, images, and audio together, much as humans naturally do.
What Current Models Can Do
GPT-4o: accepts text + image input, outputs text.
Gemini 1.5 Pro: accepts text + image + audio + video input, outputs text.
Claude 3.5 Sonnet: accepts text + image input, outputs text.
All three can read charts, diagrams, and screenshots, and can describe images in detail.
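For example, a single chat request can carry both text and an image. The sketch below assumes the official OpenAI Python SDK, an API key in the environment, and a placeholder image URL; other providers expose similar multimodal request formats.

```python
# Minimal sketch: asking GPT-4o about an image with the OpenAI Python SDK.
# The image URL is a placeholder; swap in a real chart or screenshot.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What does this chart show? Summarize the trend."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/quarterly-revenue.png"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```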
Practical Multimodal Use Cases
Debug UI from a screenshot.
Extract data from a photo of a receipt (see the sketch after this list).
Analyze a chart and explain trends.
Describe an image for accessibility.
Transcribe and summarize meeting audio.
Review architectural diagrams.
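As a concrete sketch of the receipt use case, the example below sends a local photo as a base64 data URL and asks for structured fields back. The file name, field list, and prompt are illustrative, and it again assumes the OpenAI Python SDK.

```python
# Minimal sketch: extracting fields from a receipt photo.
# "receipt.jpg" and the requested fields are placeholders.
import base64
from openai import OpenAI

client = OpenAI()

with open("receipt.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Extract the merchant, date, and total from this receipt as JSON.",
                },
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```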
Image Generation vs Image Understanding
Understanding: GPT-4o reads and describes images (vision input). Generation: DALL-E 3, Midjourney, and Stable Diffusion create images from text prompts. Some setups, such as GPT-4o with DALL-E 3 integration in ChatGPT, can both understand and generate images.
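To illustrate the generation side, here is a minimal sketch of creating an image from a text prompt with DALL-E 3 through the OpenAI Python SDK; the prompt and size are placeholders.

```python
# Minimal sketch: generating an image from text with DALL-E 3.
from openai import OpenAI

client = OpenAI()

result = client.images.generate(
    model="dall-e-3",
    prompt="A clean architectural diagram of a three-tier web application, flat style",
    size="1024x1024",
    n=1,
)

print(result.data[0].url)  # temporary URL to the generated image
```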