Hello! I’m Chen Ju (鞠陈).

I’m an algorithm researcher (Alibaba-Star program) at Alibaba Future Life Laboratory, working closely with Dr. Bo Zheng, Dr. Weilin Huang, Dr. Shuai Xiao, Dr. Xu Chen, and Dr. Zhonghua Zhai. Our vision is to develop a large-scale multi-modal search & question-answering system (Pailitao) and general multi-modal technologies for various e-commerce applications, such as super-large-scale pre-training (10-billion-scale image-text product data) and AIGC (GPT & VLM & Diffusion). It has now become one of the largest visual/multi-modal application scenarios in China.

Recently, I have been studying with some outstanding researchers from the CAD&CG State Key Laboratory, Zhejiang University: Prof. Chunhua Shen, Prof. Hao Chen, and Prof. Bohan Zhuang, aiming to explore efficient/unified architectures and paradigms for next-generation VLMs.

Before that, I worked with some outstanding researchers from WeChat Technology Architecture, Tencent: Dr. Fengyun Rao, Dr. Yizhou Zhou, Dr. Guangting Wang, and Dr. Yukun Su, developing Chinese pre-training across image, text, video, and music, namely WeMM, WeCLIP, and WeMU.

Earlier, I collaborated with some outstanding researchers from PanGu Large Model, Huawei: Prof. Qi Tian, Dr. Lingxi Xie, Dr. Xiaopeng Zhang, Dr. Jianlong Chang, Dr. Jiemin Fang, and Dr. Peisen Zhao, to explore VLMs for B-side industrial scenarios.

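I obtained my PhD degree from MediaBrain Group, Shanghai Jiao Tong University, advised by Prof. Yanfeng Wang (Dean of the School of Artificial Intelligence) and Prof. Ya Zhang (National Ten Thousand Talents Program), also collaborating with Prof. Weidi Xie (Excellent Young Scientists Fund, Overseas), Prof. Siheng Chen (Excellent Young Scientists Fund, Overseas), Prof. Yu Wang (Excellent Young Scientists Fund, Overseas), and Prof. Jiangchao Yao. Before that, I obtained a Bachelor’s degree in Engineering from University of Electronic Science and Technology of China, where I studied under Prof. Yong Liu (National Science Fund for Distinguished Young Scholars & Changjiang Scholar) and was honored as an Outstanding Graduate.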

Email: cju[dot]void[at]gmail[dot]com / ju_chen[at]alumni[dot]sjtu[dot]edu[dot]cn
Google Scholar: Citations 1600+, H-index 18, i10-index 20

I’m also leading a small group that mainly works on Efficient Data Understanding & Generation for Multi-Modal Foundation Models (Agent Governance Flywheel, Advanced Paradigm/Framework, Creative AIGC). We are actively recruiting research/engineering interns; please see Zhihu and Xiaohongshu, and feel free to contact me!

🔥 News

Our new work, Pailitao-VL: Unified Embedding and Reranker for Real-Time Multi-Modal Industrial Search is out!
I was awarded the honor of Outstanding Star in Alibaba Future Life Laboratory.
[2026.04] One paper is accepted to IEEE Transactions on Information Systems, about long sequence modeling for recommendation.
[2026.03] One paper is accepted to ICME 2026, about scene generation for human-centric videos.
[2026.01] Two papers are accepted to WWW 2026, about intelligent transaction agent system for e-commerce.
[2025.06] One paper is accepted to ICCV 2025, about efficient inference of MLLMs.
[2025.02] Two papers are accepted to CVPR 2025, about high-quality vision-language alignment, and efficient MLLMs.
[2024.12] Two papers are accepted to ICASSP 2025, about label-efficient video understanding, and AIGC-assisted image understanding.
[2024.07] Two papers are accepted to ECCV 2024, about innovative acceleration of foundation models, and interactive virtual try-on.
[2024.06] One paper is accepted to Springer IJCV, about open-set semantic segmentation via multi-modal prototypes.
[2024.03] One paper is accepted to CVPR 2024, about audio-visual segmentation via unlabeled frame exploitation.
[2024.01] One paper is accepted to WWW 2024, about cross-domain CTR prediction via explicit feature augmentation.
[2023.09] One paper is accepted to NeurIPS 2023, about general semantic understanding for multi-modal large models.
[2023.07] One paper is accepted to ICCV 2023, about finer visual understanding from multiple diffusion models.
[2023.03] One paper is accepted to CVPR 2023, about effective collaboration of multiple foundation models.
[2022.07] One paper is accepted to ECCV 2022, about efficient adaptation for vision-language foundation models.
[2022.07] One paper is accepted to ACM Multimedia 2022, about cost-effective pre-training for video-audio foundation models.

💻 Research

My primary research interests lie in:

Multi-Modal Large Language Models: Pre-Training, Continual-Training, Post-Training, SFT, Reasoning, RL, RAG, MoE, Interleaved Learning.
Vision-Language-Music Alignment: Pre-Training, Efficient Adaptation/Fine-Tuning, Training/Deployment Acceleration.
Creative AIGC: Generation/Fine Editing for Image/Video/Music, Conversation-Driven Understanding/Composition, RLHF Evaluation.
Data Governance/Flywheel & Mining: Clean/Compress/Distill/Synthesize Data, Cross-Modal Retrieval/Recommendation/Advertising.
Video Understanding: Retrieval/Caption/Summary for Video Clips, Alignment/Detection/Classification for Untrimmed Long Videos.

As a young researcher, I would greatly appreciate your interest and kind citations; they mean a lot to me and my collaborators.

Also feel free to drop me an email for any suggestions or potential collaborations.

📝 Publications

Prompting Visual-Language Models for Efficient Video Understanding | [Project] | [Code & Data] | [Report] | [Bibtex]
Chen Ju, Tengda Han, Kunhao Zheng, Ya Zhang and Weidi Xie
ECCV 2022
Turbo: Informativity-Driven Acceleration Plugin for Vision-Language Large Models | [Project] | [Bibtex]
Chen Ju, Haicheng Wang, Haozhe Cheng, Xu Chen, Zhonghua Zhai, Weilin Huang, Jinsong Lan, Shuai Xiao and Bo Zheng
ECCV 2024
Collaborating Vision-Language Pre-training with Weakly-Supervised Video Understanding | [Project & Code] | [Bibtex]
Chen Ju, Kunhao Zheng, Jinxiang Liu, Peisen Zhao, Ya Zhang, Jianlong Chang, Qi Tian and Yanfeng Wang
CVPR 2023
FOLDER: Accelerating Multi-Modal Large Language Models with Enhanced Performance | [Project & Code] | [Bibtex]
Haicheng Wang, Zhemeng Yu, Gabriele Spadaro, Chen Ju, Victor Quétu, Shuai Xiao, Enzo Tartaglione
ICCV 2025
Explore More, Learn Better: Parallel MLLM Embeddings under Mutual Information Minimization | [Bibtex]
Zhicheng Wang, Chen Ju^✉, Xu Chen, Shuai Xiao, Jinsong Lan, Xiaoyong Zhu, Zhiguo Cao
ArXiv preprint 2025

📒 Topic: Vision-Language-Audio Pre-trainings & Inference with Strong Generalization but Low Costs

Transformation Invariance and Equivariance for Self-supervised Sound Localization | [Project & Demo] | [Code] | [Bibtex]
Jinxiang Liu, Chen Ju, Weidi Xie and Ya Zhang
ACM Multimedia 2022
Audio-Visual Segmentation via Unlabeled Frames Exploitation | [Bibtex]
Jinxiang Liu, Yikun Liu, Fei Zhang, Chen Ju, Yanfeng Wang and Ya Zhang
CVPR 2024
Contrast and Unity for Partially-Supervised Temporal Sentence Grounding | [Project & Code] | [Bibtex]
Haicheng Wang, Chen Ju^✉, Weixiong Lin, Jinxiang Liu, Chaofan Ma, Ya Zhang, Yanfeng Wang
ICASSP 2025
SAM Guided Annotation-free Audio-Visual Cross-modal Segmentation | [Project & Code] | [Bibtex]
Jinxiang Liu, Yu Wang, Chen Ju, Chaofan Ma, Ya Zhang, Weidi Xie
WACV 2024
Advancing Myopia To Holism: Fully Contrastive Language-Image Pre-training | [Project & Code] | [Bibtex]
Haicheng Wang, Chen Ju^✉, Weixiong Lin, Shuai Xiao, Mengting Chen, Yixuan Huang, Chang Liu, Mingshuai Yao, Jinsong Lan, Ying Chen, Qingwen Liu and Yanfeng Wang
CVPR 2025
Squeeze Out Tokens from Sample for Finer-Grained Data Governance | [Bibtex]
Weixiong Lin, Chen Ju^✉, Haicheng Wang, Shengchao Hu, Shuai Xiao, Mengting Chen, Yuheng Jiao, Mingshuai Yao, Jinsong Lan, Qingwen Liu, Ying Chen
ICIP 2026

📒 Topic: Understand World through Open-Vocabulary Learning, and also Rethinking Limitations

Multi-modal GPT Prompts for Open-Vocabulary Video Understanding | [Project & Code] | [Bibtex]
Chen Ju, Zeqian Li, Peisen Zhao, Ya Zhang, Xiaopeng Zhang, Qi Tian, Yanfeng Wang and Weidi Xie
Springer IJCV
Open-Vocabulary Semantic Segmentation via Attribute Decomposition-Aggregation | [Bibtex]
Chaofan Ma, Yuhuan Yang, Chen Ju, Fei Zhang, Ya Zhang and Yanfeng Wang
NeurIPS 2023
Multi-Modal Prototypes for Open-Set Semantic Segmentation | [Bibtex]
Yuhuan Yang, Chaofan Ma, Chen Ju, Ya Zhang and Yanfeng Wang
Springer IJCV
DENOISER: Rethinking the Robustness for Open-Vocabulary Action Recognition | [Bibtex]
Haozhe Cheng, Chen Ju^✉, Haicheng Wang, Jinxiang Liu, Mengting Chen, Qiang Hu, Xiaoyun Zhang and Yanfeng Wang
ICIP 2026

📒 Topic: Innovative AIGC Creativity, Free Vision-Text-Audio Editing and Composition

DiffusionSeg: Adapting Diffusion Towards Unsupervised Object Discovery | [Bibtex]
Chaofan Ma, Yuhuan Yang, Chen Ju, Fei Zhang, Jinxiang Liu, Yu Wang, Ya Zhang and Yanfeng Wang
ICASSP 2025
Wear-Any-Way: Manipulable Virtual Try-on via Sparse Correspondence Alignment | [Project] | [Bibtex]
Mengting Chen, Xi Chen, Zhonghua Zhai, Chen Ju, Xuewen Hong, Jinsong Lan and Shuai Xiao
ECCV 2024
Beyond Static Scenes: Camera-controllable Background Generation for Human Motion | [Bibtex]
Mingshuai Yao, Mengting Chen, Qinye Zhou, Yabo Zhang, Ming Liu, Xiaoming Li, Shaohui Liu, Chen Ju, Shuai Xiao, Qingwen Liu, Jinsong Lan, Wangmeng Zuo
ICME 2026
Improving Human Image Animation via Semantic Representation Alignment | [Bibtex]
Chang Liu, Mengting Chen, Yixuan Huang, Haoning Wu, Chen Ju, Shuai Xiao, Jinsong Lan, Yanfeng Wang
CVPRW 2026
Wave-Particle (Continuous–Discrete) Dualistic Visual Tokenization for Unified Understanding and Generation | [Bibtex]
Yizhu Chen, Chen Ju^✉, Zhicheng Wang, Shuai Xiao, Xu Chen, Jinsong Lan, Xiaoyong Zhu
ArXiv preprint 2025

📒 Topic: Freeze Pre-trainings, Downstream Video Understanding with Limited Annotation & Supervision

Divide and Conquer for Single-frame Temporal Action Localization | [Project & Demo] | [Bibtex]
Chen Ju, Peisen Zhao, Siheng Chen, Ya Zhang, Yanfeng Wang and Qi Tian
ICCV 2021
Bottom-Up Temporal Action Localization with Mutual Regularization | [Demo] | [Code] | [Bibtex]
Peisen Zhao, Lingxi Xie, Chen Ju, Ya Zhang, Yanfeng Wang and Qi Tian
ECCV 2020
Adaptive Mutual Supervision for Weakly-Supervised Temporal Action Localization | [Project & Demo] | [Bibtex]
Chen Ju, Peisen Zhao, Siheng Chen, Ya Zhang, Xiaoyun Zhang and Qi Tian
IEEE Transactions on Multimedia
Audio-Aware Query-Enhanced Transformer for Audio-Visual Segmentation | [Project & Code] | [Bibtex]
Jinxiang Liu, Chen Ju, Chaofan Ma, Yanfeng Wang, Yu Wang, Ya Zhang
ArXiv preprint 2023

Pailitao-VL: Unified Embedding and Reranker for Real-Time Multi-Modal Industrial Search | [Bibtex]
Lei Chen, Chen Ju^✉, Xu Chen, Zhicheng Wang, Yuheng Jiao, Hongfeng Zhan, Zhaoyang Li, Shihao Xu, Zhixiang Zhao, Tong Jia, Jinsong Lan, Xiaoyong Zhu, Bo Zheng
Technical Report 2026
Enhancing Cross-domain Click-Through Rate Prediction via Explicit Feature Augmentation | [Bibtex]
Xu Chen, Zida Cheng, Jiangchao Yao, Chen Ju, Weilin Huang, Xiaoyi Zeng and Shuai Xiao
WWW 2024
Category-Oriented Representation Learning for Image to Multi-Modal Retrieval | [Bibtex]
Zida Cheng, Chen Ju, Xu Chen, Zhonghua Zhai, Shuai Xiao and Junchi Yan
ArXiv preprint 2023
Counterfactual Learning-Driven Representation Disentanglement for Search-Enhanced Recommendation | [Bibtex]
Jiajun Cui, Xu Chen, Shuai Xiao, Chen Ju, Jinsong Lan, Qingwen Liu and Wei Zhang
IEEE Transactions on Information Systems
Evaluating Multi-Turn Bargain Skills in LLM-Based Seller Agents | [Bibtex]
Issue Yishu Wang, Kakam Chong, Xiaofeng Wang, Xu Yan, Dexin Kong, Chen Ju, Ming Chen, Shuai Xiao, Shuguang Han, Junfeng Chen
WWW 2026
Multi-Branch Cooperation Networks for Enhanced Click-Through Rate Prediction in Large-Scale E-Commerce Search | [Bibtex]
Xu Chen, Zida Cheng, Shuai Xiao, Chen Ju, Xiaoming Liu, Jinsong Lan, Xiaoyong Zhu and Bo Zheng
WWW 2026

🗞️ Academics and Communications

PC Member & Conference Reviewer: ICML 2026/2025, ICLR 2026/2025, NeurIPS 2026/2025/2024, ECCV 2026/2024/2022, CVPR 2026/2025/2024/2023, AAAI 2026/2025/2024/2023, ICCV 2025/2023, ACM MM 2026/2025/2024/2023, WACV 2025/2024
Journal Reviewer: IEEE T-PAMI, Springer IJCV, IEEE T-MM, IEEE TCSVT, NPL
I am fortunate to have met many interesting people and teams:

University System. UESTC: Yong Liu, Yadong Jiang. PKU: Hong Liu, Jin Luo, Donglin Liu, Yong Peng. THU: Shousheng Han, Zhengsong Wang, Zongren Dai. SJTU: Haicheng Wang, Jinxiang Liu, Yue Hu, Chenxin Xu, Chaoqin Huang, Xiaoman Zhang, Xuehui Wang, Jiazhong Ceng, Chen Yang. USTC: Jiaqing Gao, Yumin Xia, Qi Meng. Oxford: Tengda Han, Charig Yang. KU Leuven: Haien Tang, Chunzhuo Wang, Liting Yang. NUS: Jialin Gao. OpenGVLab: Xue Yang. Ruijin: Qinwei Xu.
Alibaba. TAO Technology: Zida Cheng, Mengting Chen, Xuewen Hong, Yixuan Huang, Lianyu Du. DAMO Academy: Chang Zhou, Xi Chen, Mosha Chen. Alimama: Jiajie Wang, Hao Wu, Yuanzhe Gu. T-head: Yu Fu, He Guo. AntGroup: Tong Zhan, Qingpei Guo, Yifei Hu, Ming Yang, Jingdong Chen.
Huawei. Cloud BU: Yucheng Liu, Yaoming Wang, Shuangrui Ding, Haohang Xu. Car BU: Maosen Li. Consumer BG: Yongli Jia, Feilong Chen, Chenliang Hu. ICT: Liang Zhao, Tongda Li. 2012: Yu Zhou, Guohao Gong.
Baidu. Big Search: Zhengyang Li, Suqi Chen. Ernie Bot: Tian Wu, Jiachen Liu. Phoenix Nest: Chenyang Li.
Tencent. WXG: Xiaoyi Jia, Honghui Lin, Yongsheng Luo, Tianyi Wang, Zhenghua Liu, Dr. Hongwei Xue, Dr. Dacheng Yin. CDG: Tianyue Cao. TEG: Hongfa Wang, Wei Liu.
Software Company. Meta: Kunhao Zheng. DiDi: Zhe Xu. ByteDance: Yichao Xiong, Zhikang Li, Kunyuan Du, Xuan Liao, Yuxuan Jiang, Shiqi Peng, Hangtian Zhao, Jian Li. Bilibili: Luochen Lv. ZTE: Xiao Hu. KuaiShou: Liwei Chen, Kun Xu. MeiTuan: Yujie Zhong, Yexun Zhang.
Hardware Company. NVIDIA: Jie Chang, Yangheng Zhao, Yingying Xue. Intel: Yujie Pan. Hikvision: Tengfei Hou, Wanshun Gao. OPPO: Bo Wang, Chen Chen, Haonan Lu. Honor: Yuanchao Du.

📄 Patents

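CN202010403823.4 《一种基于自适应采样的弱监督时序动作检测方法及系统》 (A Weakly-Supervised Temporal Action Detection Method and System Based on Adaptive Sampling)
Ya Zhang, Chen Ju, Yanfeng Wang.
CN202111190861.7 《一种单帧监督视频时序动作检测与分类方法及系统》 (A Single-Frame-Supervised Video Temporal Action Detection and Classification Method and System)
Ya Zhang, Chen Ju, Peisen Zhao, Siheng Chen, Xiaoyun Zhang, Yanfeng Wang.
CN202211056034.3 《弱监督视频时序动作检测与分类方法及系统》 (Weakly-Supervised Video Temporal Action Detection and Classification Method and System)
Ya Zhang, Chen Ju, Kunhao Zheng, Jinxiang Liu, Weidi Xie, Yanfeng Wang.
CN202211581256.7 《局部监督长视频时序文本检索方法及系统》 (Partially-Supervised Temporal Text Retrieval Method and System for Long Videos)
Ya Zhang, Chen Ju, Haicheng Wang, Jinxiang Liu, Chaofan Ma, Yanfeng Wang.
CN202310913202.4 《基于属性分解-聚合的开放词汇语义分割方法及系统》 (Open-Vocabulary Semantic Segmentation Method and System Based on Attribute Decomposition-Aggregation)
Yanfeng Wang, Chaofan Ma, Yuhuan Yang, Chen Ju, Fei Zhang, Ya Zhang.
CN202410530557.X 《虚拟对象的生成方法、计算机终端、存储介质及产品》 (Virtual Object Generation Method, Computer Terminal, Storage Medium, and Product)
Mengting Chen, Xi Chen, Zhonghua Zhai, Chen Ju, Xuewen Hong, Jinsong Lan, Shuai Xiao.
CN202411370683.X 《模型加速和数据处理方法、设备、存储介质及程序产品》 (Model Acceleration and Data Processing Method, Device, Storage Medium, and Program Product)
Chen Ju, Shuai Xiao, Haicheng Wang, Xu Chen, Mengting Chen, Jinsong Lan.
CN202510067425.2 《数据处理方法、计算设备以及电子设备》 (Data Processing Method, Computing Device, and Electronic Device)
Weixiong Lin, Chen Ju, Shuai Xiao, Haicheng Wang, Jinsong Lan.
CN202510222281.3 《图像处理方法、商品推荐方法和图像处理模型的训练方法》 (Image Processing Method, Product Recommendation Method, and Training Method for an Image Processing Model)
Haicheng Wang, Chen Ju, Shuai Xiao, Weixiong Lin, Jinsong Lan, Xiaoyong Zhu.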

📖 Education

2018 - 2024, PhD, Shanghai Jiao Tong University, Shanghai, China
2018, Exchange Student, University of Amsterdam, Netherlands
2018, Exchange Student, KU Leuven, Belgium
2014 - 2018, Undergraduate, University of Electronic Science and Technology of China, Chengdu, China

🎖 Honors and Awards

[2025] Outstanding Star of Alibaba Future Life Laboratory
[2024] Top Talent Program by Technology Companies (Alibaba-Star, Huawei-Topminds, Tencent-QingYun, BaiDu-AIDU, KuaiShou-Star, JD-DMT)
[2023] First Prize of Shanghai Technology Invention Award
[2022] CMIC Outstanding Scholarship at SJTU (Top 1%)
[2021] CMIC Outstanding Scholarship at SJTU (Top 1%)
[2020] CMIC Outstanding Scholarship at SJTU (Top 1%)
[2018] Outstanding Graduate of Sichuan Province (Top 1%)
[2018] Outstanding Graduate of UESTC (Top 1%)
[2017] First Prize in National Undergraduate Mathematical Modeling
[2017] Undergraduate National Scholarship at UESTC (Top 1%)
[2016] Undergraduate National Scholarship at UESTC (Top 1%)
[2015] Undergraduate National Scholarship at UESTC (Top 1%)
