Hello! I’m Chen Ju (鞠陈).

I’m an algorithm researcher in the Alibaba Search and Recommendation Department, working closely with Dr. Weilin Huang, Dr. Shuai Xiao, Dr. Xu Chen, and Dr. Zhonghua Zhai. Our vision is to develop the large-scale visual search system Pailitao (拍立淘) and general multi-modal technologies for various e-commerce applications, such as super-large-scale pre-training (on 10-billion image-text product pairs) and AIGC (GPT, VLM & Diffusion). It has now become one of the largest visual/multi-modal application scenarios in China.

Recently, I have been studying with some outstanding researchers from the State Key Laboratory of CAD&CG, Zhejiang University: Prof. Chunhua Shen, Prof. Hao Chen, and Prof. Bohan Zhuang, aiming to explore efficient/unified architectures and paradigms for next-generation VLMs.

I also lead a small group that mainly works on Efficient Data Understanding & Generation for Multi-Modal Foundation Models (Data Flywheel/Governance, Advanced Paradigms/Frameworks, Creative AIGC). We are actively recruiting research/engineering interns; please see Link, and feel free to contact me!

Before that, I explored with some outstanding researchers from WeChat Technology (微信技术架构), Tencent: Dr. Fengyun Rao, Dr. Yizhou Zhou, Dr. Guangting Wang, and Dr. Yukun Su, working to develop Chinese image-text-video-music pre-training models, namely WeMM, WeCLIP, and WeMU.

Earlier, I collaborated with some outstanding researchers from the PanGu Large Model (盘古大模型) team, Huawei: Prof. Qi Tian, Dr. Lingxi Xie, Dr. Xiaopeng Zhang, Dr. Jianlong Chang, Dr. Jiemin Fang, and Dr. Peisen Zhao, to explore VLMs for B-side industrial scenarios.

I obtained my PhD from the MediaBrain Group, Shanghai Jiao Tong University, advised by Prof. Yanfeng Wang (Dean of the School of Artificial Intelligence) and Prof. Ya Zhang (National Ten-Thousand Talents Program), while also collaborating with Prof. Weidi Xie (Overseas Excellent Young Scientists Program), Prof. Siheng Chen (Overseas Excellent Young Scientists Program), Prof. Yu Wang, and Prof. Jiangchao Yao. Before that, I obtained a Bachelor’s degree in Engineering from the University of Electronic Science and Technology of China, where I studied under Prof. Yong Liu (National Distinguished Young Scholar & Changjiang Scholar) and graduated with the honor of outstanding graduate.

Email: cju[dot]void[at]gmail[dot]com / ju_chen[at]alumni[dot]sjtu[dot]edu[dot]cn   Google Scholar: citations 950+, h-index 12, i10-index 14

🔥 News

  • [New] Our work, Advancing Myopia To Holism: Fully Contrastive Language-Image Pre-training, is out!
  • [New] Our work, Greater Together: Unimodal Group Outperforms Vanilla Multimodal Bootstrap, will be out soon!
  • [New] Our work on a universal VLM acceleration architecture, from the novel perspective of data de-redundancy, is out!
  • [New] Our work rethinking the robustness of open-vocabulary visual understanding is out!
  • [New] Our work, Wear-Any-Way: manipulable virtual try-on via sparse correspondence alignment, is out!
  • [2024.12] Two papers are accepted to ICASSP 2025, about label-efficient video understanding, and AIGC-assisted image understanding.
  • [2024.07] Two papers are accepted to ECCV 2024, about innovative acceleration of foundation models, and interactive virtual try-on.
  • [2024.06] One paper is accepted to Springer IJCV, about open-set semantic segmentation via multi-modal prototypes.
  • [2024.03] One paper is accepted to CVPR 2024, about audio-visual segmentation via unlabeled frame exploitation.
  • [2024.01] One paper is accepted to WWW 2024, about cross-domain CTR prediction via explicit feature augmentation.
  • [2023.09] One paper is accepted to NeurIPS 2023, about general semantic understanding for multi-modal large models.
  • [2023.07] One paper is accepted to ICCV 2023, about finer visual understanding from multiple diffusion models.
  • [2023.03] One paper is accepted to CVPR 2023, about effective collaboration of multiple foundation models.
  • [2022.07] One paper is accepted to ECCV 2022, about efficient adaptation for vision-language foundation models.
  • [2022.07] One paper is accepted to ACM Multimedia 2022, about cost-effective pre-training for video-audio foundation models.

💻 Research

My primary research interests lie in:

  • Vision-Language-Music Learning: Pre-training Alignment, Efficient Adaptation/Fine-Tuning, Training/Deployment Acceleration.

  • Data Governance/Flywheel & Mining: Clean/Compress/Distill/Synthesize Data, Cross-Modal Retrieval/Recommendation/Advert.

  • Creative AIGC: Generation/Fine Editing for Image/Video/Music, Conversation-Driven Understanding/Composition, RLHF Evaluation.

  • Video Understanding: Retrieval/Caption/Summary for Video Clips, Alignment/Detection/Classification for Untrimmed Long Videos.

As a young researcher, I would deeply appreciate your interest and kind citations, which mean a lot to me and my collaborators.

Also, feel free to drop me an email with any suggestions or potential collaborations.

📝 Publications

📒 Topic: Efficiently Adapt Multi-modal Foundation Models to Unify/Generalize Downstream Tasks

  1. Prompting Visual-Language Models for Efficient Video Understanding | [Project] | [Code & Data] | [Report] | [Bibtex]
    Chen Ju, Tengda Han, Kunhao Zheng, Ya Zhang and Weidi Xie
    ECCV 2022

  2. Turbo: Informativity-Driven Acceleration Plugin for Vision-Language Large Models | [Project] | [Bibtex]
    Chen Ju, Haicheng Wang, Haozhe Cheng, Xu Chen, Zhonghua Zhai, Weilin Huang, Jinsong Lan, Shuai Xiao and Bo Zheng
    ECCV 2024

  3. Collaborating Vision-Language Pre-training with Weakly-Supervised Video Understanding | [Project & Code] | [Bibtex]
    Chen Ju, Kunhao Zheng, Jinxiang Liu, Peisen Zhao, Ya Zhang, Jianlong Chang, Qi Tian and Yanfeng Wang
    CVPR 2023

  4. FOLDER: Accelerating Multi-Modal Large Language Models with Enhanced Performance
    Haicheng Wang, Zhemeng Yu, Gabriele Spadaro, Chen Ju, Victor Quétu, Enzo Tartaglione
    ArXiv preprint 2025

📒 Topic: Vision-Language-Audio Pre-trainings & Inference with Strong Generalization but Low Costs

  1. Transformation Invariance and Equivariance for Self-supervised Sound Localization | [Project & Demo] | [Code] | [Bibtex]
    Jinxiang Liu, Chen Ju, Weidi Xie and Ya Zhang
    ACM Multimedia 2022

  2. Audio-Visual Segmentation via Unlabeled Frames Exploitation | [Bibtex]
    Jinxiang Liu, Yikun Liu, Fei Zhang, Chen Ju, Yanfeng Wang and Ya Zhang
    CVPR 2024

  3. Contrast and Unity for Partially-Supervised Temporal Sentence Grounding | [Project & Code] | [Bibtex]
    Haicheng Wang, Chen Ju, Weixiong Lin, Jinxiang Liu, Chaofan Ma, Ya Zhang, Yanfeng Wang
    ICASSP 2025

  4. SAM Guided Annotation-free Audio-Visual Cross-modal Segmentation | [Project & Code] | [Bibtex]
    Jinxiang Liu, Yu Wang, Chen Ju, Chaofan Ma, Ya Zhang, Weidi Xie
    WACV 2024

  5. Advancing Myopia To Holism: Fully Contrastive Language-Image Pre-training | [Project & Code] | [Bibtex]
    Haicheng Wang, Chen Ju, Weixiong Lin, Shuai Xiao, Mengting Chen, Yixuan Huang, Chang Liu, Mingshuai Yao, Jinsong Lan, Ying Chen, Qingwen Liu and Yanfeng Wang
    ArXiv preprint 2025

  6. Greater Together: Unimodal Group Outperforms Vanilla Multimodal Bootstrap
    Weixiong Lin, Chen Ju, Haicheng Wang, Shengchao Hu, Shuai Xiao, Mengting Chen, Yuheng Jiao, Mingshuai Yao, Jinsong Lan, Ying Chen, Qingwen Liu
    ArXiv preprint 2025

📒 Topic: Understand World through Open-Vocabulary Learning, and also Rethinking Limitations

  1. Multi-modal GPT Prompts for Open-Vocabulary Video Understanding | [Project & Code] | [Bibtex]
    Chen Ju, Zeqian Li, Peisen Zhao, Ya Zhang, Xiaopeng Zhang, Qi Tian, Yanfeng Wang and Weidi Xie
    Springer IJCV

  2. Open-Vocabulary Semantic Segmentation via Attribute Decomposition-Aggregation | [Bibtex]
    Chaofan Ma, Yuhuan Yang, Chen Ju, Fei Zhang, Ya Zhang and Yanfeng Wang
    NeurIPS 2023

  3. Multi-Modal Prototypes for Open-Set Semantic Segmentation | [Bibtex]
    Yuhuan Yang, Chaofan Ma, Chen Ju, Ya Zhang and Yanfeng Wang
    Springer IJCV

  4. DENOISER: Rethinking the Robustness for Open-Vocabulary Action Recognition | [Bibtex]
    Haozhe Cheng, Chen Ju, Haicheng Wang, Jinxiang Liu, Mengting Chen, Qiang Hu, Xiaoyun Zhang and Yanfeng Wang
    ArXiv preprint 2024

📒 Topic: Innovative AIGC Creativity, Free Vision-Text-Audio Editing and Composition

  1. DiffusionSeg: Adapting Diffusion Towards Unsupervised Object Discovery | [Bibtex]
    Chaofan Ma, Yuhuan Yang, Chen Ju, Fei Zhang, Jinxiang Liu, Yu Wang, Ya Zhang and Yanfeng Wang
    ICASSP 2025

  2. Wear-Any-Way: Manipulable Virtual Try-on via Sparse Correspondence Alignment | [Project] | [Bibtex]
    Mengting Chen, Xi Chen, Zhonghua Zhai, Chen Ju, Xuewen Hong, Jinsong Lan and Shuai Xiao
    ECCV 2024

  3. Improving Human Image Animation via Semantic Representation Alignment
    Chang Liu, Mengting Chen, Yixuan Huang, Haoning Wu, Chen Ju, Shuai Xiao, Qingwen Liu, Jinsong Lan and Yanfeng Wang
    ArXiv preprint 2025

  4. AnyScene: Camera-controllable Video Background Generation
    Mingshuai Yao, Mengting Chen, Qinye Zhou, Yabo Zhang, Ming Liu, Xiaoming Li, Shaohui Liu, Chen Ju, Shuai Xiao, Qingwen Liu, Jinsong Lan, Wangmeng Zuo
    ArXiv preprint 2025

📒 Topic: Freeze Pre-trainings, Downstream Video Understanding with Limited Annotation & Supervision

  1. Divide and Conquer for Single-frame Temporal Action Localization | [Project & Demo] | [Bibtex]
    Chen Ju, Peisen Zhao, Siheng Chen, Ya Zhang, Yanfeng Wang and Qi Tian
    ICCV 2021

  2. Bottom-Up Temporal Action Localization with Mutual Regularization | [Demo] | [Code] | [Bibtex]
    Peisen Zhao, Lingxi Xie, Chen Ju, Ya Zhang, Yanfeng Wang and Qi Tian
    ECCV 2020

  3. Adaptive Mutual Supervision for Weakly-Supervised Temporal Action Localization | [Project & Demo] | [Bibtex]
    Chen Ju, Peisen Zhao, Siheng Chen, Ya Zhang, Xiaoyun Zhang and Qi Tian
    IEEE Transactions on Multimedia

  4. Audio-Aware Query-Enhanced Transformer for Audio-Visual Segmentation | [Project & Code] | [Bibtex]
    Jinxiang Liu, Chen Ju, Chaofan Ma, Yanfeng Wang, Yu Wang, Ya Zhang
    ArXiv preprint 2023

📒 Topic: MLLMs Guided Multi-Modal Information Retrieval & Sorting & Recall & Representation

  1. Enhancing Cross-domain Click-Through Rate Prediction via Explicit Feature Augmentation | [Bibtex]
    Xu Chen, Zida Cheng, Jiangchao Yao, Chen Ju, Weilin Huang, Xiaoyi Zeng and Shuai Xiao
    WWW 2024

  2. Category-Oriented Representation Learning for Image to Multi-Modal Retrieval | [Bibtex]
    Zida Cheng, Chen Ju, Xu Chen, Zhonghua Zhai, Shuai Xiao and Junchi Yan
    ArXiv preprint 2023

  3. Counterfactual Learning-Driven Representation Disentanglement for Search-Enhanced Recommendation | [Bibtex]
    Jiajun Cui, Xu Chen, Shuai Xiao, Chen Ju, Jinsong Lan, Qingwen Liu and Wei Zhang
    ArXiv preprint 2024

  4. Cell Variational Information Bottleneck Network
    Zhonghua Zhai, Chen Ju, Shuai Xiao, Jinsong Lan and Xiaoyi Zeng
    ArXiv preprint 2023

🗞️ Academics and Communications

  • PC Member & Conference Reviewer: ICML 2025, ICLR 2025, NeurIPS 2024, ECCV 2024/2022, CVPR 2025/2024/2023, AAAI 2024/2023, ICCV 2023, ACM MM 2024/2023, WACV 2025/2024
  • Journal Reviewer: IEEE T-PAMI, Springer IJCV, IEEE T-MM, IEEE TCSVT, NPL

  • I am fortunate to have met many interesting people & teams:
  1. University System.   UESTC: Yong Liu, Yadong Jiang.   PKU: Hong Liu, Jin Luo, Donglin Liu, Yong Peng.   THU: Shousheng Han, Zhengsong Wang, Zongren Dai.   SJTU: Haicheng Wang, Jinxiang Liu, Yue Hu, Chenxin Xu, Chaoqin Huang, Xiaoman Zhang, Xuehui Wang, Jiazhong Ceng, Chen Yang.   USTC: Jiaqing Gao, Yumin Xia, Qi Meng.   Oxford: Tengda Han, Charig Yang.   KU Leuven: Haien Tang, Chunzhuo Wang, Liting Yang.   NUS: Jialin Gao   OpenGVLab: Xue Yang.   Ruijin: Qinwei Xu

  2. Alibaba.   TAO Technology: Zida Cheng, Mengting Chen, Xuewen Hong, Yixuan Huang, Lianyu Du.   DAMO Academy: Chang Zhou, Xi Chen, Mosha Chen.   Alimama: Jiajie Wang, Hao Wu, Yuanzhe Gu.   T-head: Yu Fu, He Guo.   AntGroup: Tong Zhan, Qingpei Guo, Yifei Hu, Ming Yang, Jingdong Chen.

  3. Huawei.   Cloud BU: Yucheng Liu, Yaoming Wang, Shuangrui Ding, Haohang Xu.   Car BU: Maosen Li.   Consumer BG: Yongli Jia, Feilong Chen, Chenliang Hu.   ICT: Liang Zhao, Tongda Li.   2012: Yu Zhou, Guohao Gong.

  4. Baidu.   Big Search: Zhengyang Li, Suqi Chen.   Ernie Bot: Tian Wu, Jiachen Liu.   Phoenix Nest: Chenyang Li.

  5. Tencent.   WXG: Xiaoyi Jia, Honghui Lin, Yongsheng Luo, Tianyi Wang, Zhenghua Liu, Dr. Hongwei Xue, Dr. Dacheng Yin.   CDG: Tianyue Cao.   TEG: Hongfa Wang, Wei Liu.

  6. Software Company.   Meta: Kunhao Zheng.   DiDi: Zhe Xu.   ByteDance: Yichao Xiong, Zhikang Li, Kunyuan Du, Xuan Liao, Yuxuan Jiang, Shiqi Peng, Hangtian Zhao, Jian Li.   Bilibili: Luochen Lv.   ZTE: Xiao Hu.   KuaiShou: Liwei Chen, Kun Xu.   MeiTuan: Yujie Zhong, Yexun Zhang.

  7. Hardware Company.   NVIDIA: Jie Chang, Yangheng Zhao, Yingying Xue.   Intel: Yujie Pan.   Hikvision: Tengfei Hou, Wanshun Gao.   OPPO: Bo Wang, Chen Chen, Haonan Lu.   Honor: Yuanchao Du.

📄 Patents

📖 Educations

  • 2018 - 2024, PhD, Shanghai Jiao Tong University, Shanghai, China
  • 2018, Exchange Student, University of Amsterdam, Netherlands
  • 2018, Exchange Student, KU Leuven, Belgium
  • 2014 - 2018, Undergraduate, University of Electronic Science and Technology of China, Chengdu, China

🎖 Honors and Awards

  • [2024] Top Talent Programs of Technology Companies (Alibaba-Star, Huawei-Topminds, Tencent-QingYun, Baidu-AIDU, KuaiShou-Star, JD-DMT)
  • [2023] First Prize of Shanghai Technology Invention Award
  • [2022] CMIC Outstanding Scholarship at SJTU (Top 1%)
  • [2021] CMIC Outstanding Scholarship at SJTU (Top 1%)
  • [2020] CMIC Outstanding Scholarship at SJTU (Top 1%)
  • [2018] Outstanding Graduates of Sichuan Province (Top 1%)
  • [2018] Outstanding Graduates of UESTC (Top 1%)
  • [2017] First Prize in the National Undergraduate Mathematical Contest in Modeling
  • [2017] Undergraduate National Scholarship at UESTC (Top 1%)
  • [2016] Undergraduate National Scholarship at UESTC (Top 1%)
  • [2015] Undergraduate National Scholarship at UESTC (Top 1%)