End-to-end dense video captioning with parallel decoding
T Wang, R Zhang, Z Lu, F Zheng, R Cheng, P Luo
ICCV 2021, 6847-6857, 2021
Citations: 143 | Year: 2021

Event-centric hierarchical representation for dense video captioning
T Wang, H Zheng, M Yu, Q Tian, H Hu
IEEE Transactions on Circuits and Systems for Video Technology 31 (5), 1890-1900, 2020
Citations: 63 | Year: 2020

Caption anything: Interactive image description with diverse multimodal controls
T Wang*, J Zhang*, J Fei*, Y Ge, H Zheng, Y Tang, Z Li, M Gao, S Zhao, ...
arXiv preprint arXiv:2305.02677, 2023
Citations: 38 | Year: 2023

VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix
T Wang, W Jiang, Z Lu, F Zheng, R Cheng, C Yin, P Luo
ICML 2022, 2022
Citations: 25 | Year: 2022

Set-level guidance attack: Boosting adversarial transferability of vision-language pre-training models
D Lu, Z Wang, T Wang, W Guan, H Gao, F Zheng
Proceedings of the IEEE/CVF International Conference on Computer Vision, 102-111, 2023
Citations: 11 | Year: 2023

Knowledge-aware prompt tuning for generalizable vision-language models
B Kan, T Wang, W Lu, X Zhen, W Guan, F Zheng
Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023
Citations: 10 | Year: 2023

Dense-captioning events in videos: SYSU submission to ActivityNet Challenge 2020
T Wang, H Zheng, M Yu
CVPR Workshops, 2020
Citations: 10 | Year: 2020

Image caption with endogenous–exogenous attention
T Wang, H Hu, C He
Neural Processing Letters 50, 431-443, 2019
Citations: 10 | Year: 2019

π-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation
C Wu, T Wang, Y Ge, Z Lu, R Zhou, Y Shan, P Luo
International Conference on Machine Learning, 37713-37727, 2023
Citations: 9 | Year: 2023

Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline
T Geng, T Wang, J Duan, R Cong, F Zheng
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
Citations: 9 | Year: 2023

Video understanding with large language models: A survey
Y Tang, J Bi, S Xu, L Song, S Liang, T Wang, D Zhang, J An, J Lin, R Zhu, ...
arXiv preprint arXiv:2312.17432, 2023
Citations: 7 | Year: 2023

Accelerating Vision-Language Pretraining with Free Language Modeling
T Wang, Y Ge, F Zheng, R Cheng, Y Shan, X Qie, P Luo
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
Citations: 6 | Year: 2023

Transferable decoding with visual entities for zero-shot image captioning
J Fei, T Wang, J Zhang, Z He, C Wang, F Zheng
Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023
Citations: 6 | Year: 2023

LLMVA-GEBC: Large language model with video adapter for generic event boundary captioning
Y Tang, J Zhang, X Wang, T Wang, F Zheng
arXiv preprint arXiv:2306.10354, 2023
Citations: 4 | Year: 2023

Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos
T Wang*, J Zhang*, F Zheng, W Jiang, R Cheng, P Luo
arXiv preprint arXiv:2303.06378, 2023
Citations: 4 | Year: 2023

Multi-modal segment assemblage network for ad video editing with importance-coherence reward
Y Tang, S Xu, T Wang, Q Lin, Q Lu, F Zheng
Proceedings of the Asian Conference on Computer Vision, 3519-3535, 2022
Citations: 4 | Year: 2022

Semantic-aware pretraining for dense video captioning
T Wang, Z Liu, F Zheng, Z Lu, R Cheng, P Luo
arXiv preprint arXiv:2204.07449, 2022
Citations: 3 | Year: 2022

PTVD: A Large-Scale Plot-Oriented Multimodal Dataset Based on Television Dramas
C Li, X Peng, T Wang, Y Ge, M Liu, X Xu, Y Wang, Y Shan
arXiv preprint arXiv:2306.14644, 2023
Citations: 1 | Year: 2023

Show, Tell and Rephrase: Diverse Video Captioning via Two-Stage Progressive Training
Z Liu, T Wang, J Zhang, F Zheng, W Jiang, K Lu
IEEE Transactions on Multimedia, 2022
Citations: 1 | Year: 2022

UniAV: Unified Audio-Visual Perception for Multi-Task Video Localization
T Geng, T Wang, Y Zhang, J Duan, W Guan, F Zheng
arXiv preprint arXiv:2404.03179, 2024
Year: 2024