Beyond short-term snippet: Video relation detection with spatio-temporal global context C Liu, Y Jin, K Xu, G Gong, Y Mu Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2020 | 79 | 2020 |
Unified Language-Vision Pretraining in LLM with Dynamic Discrete Visual Tokenization Y Jin, K Xu, L Chen, C Liao, J Tan, B Chen, C Lei, A Liu, C Song, X Lei, ... ICLR 2024, 2023 | 30 | 2023 |
Learning to effectively estimate the travel time for fastest route recommendation N Wu, J Wang, WX Zhao, Y Jin Proceedings of the 28th ACM International Conference on Information and …, 2019 | 24 | 2019 |
Embracing Consistency: A One-Stage Approach for Spatio-Temporal Video Grounding Y Jin, Y Li, Z Yuan, Y Mu Advances in Neural Information Processing Systems 35, 29192-29204, 2022 | 23 | 2022 |
Complex video action reasoning via learnable Markov logic network Y Jin, L Zhu, Y Mu Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 10 | 2022 |
Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization Y Jin, Z Sun, K Xu, L Chen, H Jiang, Q Huang, C Song, Y Liu, D Zhang, ... ICML 2024, 2024 | 9 | 2024 |
Learning instance-level representation for large-scale multi-modal pretraining in e-commerce Y Jin, Y Li, Z Yuan, Y Mu Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 8 | 2023 |
Zero-shot video event detection with high-order semantic concept discovery and matching Y Jin, W Jiang, Y Yang, Y Mu IEEE Transactions on Multimedia 24, 1896-1908, 2021 | 8 | 2021 |
Video action segmentation via contextually refined temporal keypoints B Jiang, Y Jin, Z Tan, Y Mu Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023 | 3 | 2023 |
Harder Tasks Need More Experts: Dynamic Routing in MoE Models Q Huang, Z An, N Zhuang, M Tao, C Zhang, Y Jin, K Xu, L Chen, S Huang, ... ACL 2024, 2024 | 1 | 2024 |
RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance Z Sun, Z Yang, Y Jin, H Chi, K Xu, L Chen, H Jiang, D Zhang, Y Song, ... arXiv preprint arXiv:2405.14677, 2024 | | 2024 |
Weakly-Supervised Spatio-Temporal Video Grounding with Variational Cross-Modal Alignment Y Jin, Y Mu | | |