End-To-End Generative Pretraining For Multimodal Video Captioning

Video Captioning: Summary and Outlook (Zhihu)

The generative pretraining objective transfers effectively to multimodal video captioning and outperforms the state of the art by a margin.

