Only Text Quote

I think multimodal kinds of models are pretty interesting. Like, can you combine text with imagery or audio or video in interesting ways?