Hello! I’m Chen Ju (鞠陈).
I’m a final-year PhD candidate at MediaBrain Group, Shanghai Jiao Tong University, advised by Prof. Yanfeng Wang (Assistant to the Director of Shanghai AI Lab) and Prof. Ya Zhang (National Ten Thousand Talents Program). I also collaborate with Prof. Weidi Xie, Prof. Siheng Chen, Prof. Yu Wang, and Prof. Jiangchao Yao. Before that, I obtained a Bachelor’s degree in Engineering from the University of Electronic Science and Technology of China, where I studied under Prof. Yong Liu (National Distinguished Young Scholar & Changjiang Scholar) and received the honor of Outstanding Graduate.
Currently, I collaborate closely with outstanding researchers from TAO Technology (Pailitao), Alibaba: Dr. Weilin Huang, Dr. Shuai Xiao, Dr. Xu Chen, and Dr. Zhonghua Zhai. Our goal is to develop large-scale visual search systems for various e-commerce applications, such as super-large-scale multi-modal learning (10-billion-scale image-text product data) and AIGC (GPT & Diffusion).
Before that, I worked with outstanding researchers from WeChat Technology (WeChat Technical Architecture), Tencent: Dr. Fengyun Rao, Dr. Yizhou Zhou, Dr. Guangting Wang, and Dr. Yukun Su, to develop Chinese pre-training for image-text-video-music, namely WeMM, WeCLIP, and WeMU.
Earlier, I collaborated with outstanding researchers from PanGu Large Model, Huawei: Prof. Qi Tian, Dr. Lingxi Xie, Dr. Xiaopeng Zhang, Dr. Jianlong Chang, Dr. Jiemin Fang, and Dr. Peisen Zhao, to explore MLLMs for B-side industrial scenarios.
I’m now leading a small group that mainly works on Efficient Data Governance (Cleaner; Organizer; Compressor; Distiller; Synthesizer; Evolver). I am actively recruiting research interns and engineering interns, so please feel free to contact me!
Email: ju_chen[at]sjtu[dot]edu[dot]cn / cju[dot]void[at]gmail[dot]com
Google Scholar: Citations 620+, H-index 10, I10-index 9
🔥 News
- [2024.04] Our new work, rethinking the robustness of open-vocabulary visual understanding, is out!
- [2024.04] Our new work, Wear-Any-Way: manipulable virtual try-on via sparse correspondence alignment, is out!
- [2024.03] Our new work, vision-audio-text alignment from large-scale self-supervised video streaming, is out!
- [2024.01] Reached 500 citations on Google Scholar.
- [2023.12] Our work, a universal VLM acceleration architecture from a novel perspective of data de-redundancy, is out!
- [2023.09] Our work, diversifying semantic attributes via LLMs for open-set visual systems, is out!
- [2023.07] Our work, aligning LLMs’ remarkable semantics for multi-modal understanding systems, is out!
- [2023.05] Our work, distilling fine-grained priors from Stable Diffusion for unsupervised object discovery, is out!
- [2023.04] Our work, multi-modal GPT prompting for vision-language foundation models, is out!
- [2023.03] Our work, collaborative distillation so that multiple pre-trained foundation models complement each other, is out!
- [2023.02] Our work, partial supervision with quadruple contrasts for cost-effective vision-language pre-training, is out!
- One paper was accepted to CVPR 2024, about audio-visual segmentation via unlabeled frame exploitation.
- One paper was accepted to WWW 2024, about cross-domain CTR prediction via explicit feature augmentation.
- One paper was accepted to NeurIPS 2023, about general semantic understanding for multi-modal large models.
- One paper was accepted to ICCV 2023, about finer visual understanding from multiple diffusion models.
- One paper was accepted to CVPR 2023, about effective collaboration of multiple foundation models.
- One paper was accepted to ECCV 2022, about efficient adaptation for vision-language foundation models.
- One paper was accepted to ACM Multimedia 2022, about cost-effective pre-training for video-audio foundation models.
💻 Research
My primary research interests lie in:
- Vision-Language-Music Learning: Multi-Modal Pre-training, Efficient Adaptation, Accelerated Deployment for Downstream Tasks.
- Data Governance & Mining: Clean & Compress & Synthesize Data, Cross-Modal Retrieval & Recommendation & Advertising.
- AIGC: Generation or Editing for Image & Video & Music, Conversation-Driven Multi-Modal Understanding and Composition.
- Video Understanding: Retrieval & Caption & Summary for Video Clips, Detection & Classification for Untrimmed Long Videos.
As a young researcher, I would greatly appreciate your interest and kind citations; they mean a lot to me and my collaborators. Also, feel free to drop me an email with any suggestions or for potential collaborations.
📝 Publications
📒 Topic: Efficiently Adapt Multi-modal Foundation Models to Unify/Generalize Downstream Tasks
- Prompting Visual-Language Models for Efficient Video Understanding | [Project] | [Code & Data] | [Report] | [Bibtex]
  Chen Ju, Tengda Han, Kunhao Zheng, Ya Zhang and Weidi Xie
  ECCV 2022
- Collaborating Vision-Language Pre-training with Weakly-Supervised Video Understanding | [Project & Code] | [Bibtex]
  Chen Ju, Kunhao Zheng, Jinxiang Liu, Peisen Zhao, Ya Zhang, Jianlong Chang, Qi Tian and Yanfeng Wang
  CVPR 2023
- Turbo: Informativity-Driven Acceleration Plugin for Vision-Language Foundation Models | [Bibtex]
  Chen Ju, Haicheng Wang, Zeqian Li, Xu Chen, Zhonghua Zhai, Weilin Huang and Shuai Xiao
  arXiv preprint 2023
📒 Topic: Vision-Language-Audio Pre-training & Inference with Strong Generalization but Low Cost
- Transformation Invariance and Equivariance for Self-supervised Sound Localization | [Project & Demo] | [Code] | [Bibtex]
  Jinxiang Liu, Chen Ju, Weidi Xie and Ya Zhang
  ACM Multimedia 2022
- Audio-Visual Segmentation via Unlabeled Frames Exploitation | [Bibtex]
  Jinxiang Liu, Yikun Liu, Fei Zhang, Chen Ju, Yanfeng Wang and Ya Zhang
  CVPR 2024
- Contrast and Unity for Partially-Supervised Temporal Sentence Grounding | [Project & Code] | [Bibtex]
  Chen Ju, Haicheng Wang, Jinxiang Liu, Chaofan Ma, Ya Zhang
  arXiv preprint 2023
- SAM Guided Annotation-free Audio-Visual Cross-modal Segmentation | [Project & Code] | [Bibtex]
  Jinxiang Liu, Yu Wang, Chen Ju, Chaofan Ma, Ya Zhang, Weidi Xie
  WACV 2024
📒 Topic: Understanding the World through Open-Vocabulary Learning, and Rethinking Its Limitations
- Multi-modal GPT Prompts for Open-Vocabulary Video Understanding | [Project & Code] | [Bibtex]
  Chen Ju, Zeqian Li, Peisen Zhao, Ya Zhang, Xiaopeng Zhang, Qi Tian, Yanfeng Wang and Weidi Xie
  Springer IJCV
- Open-Vocabulary Semantic Segmentation via Attribute Decomposition-Aggregation | [Bibtex]
  Chaofan Ma, Yuhuan Yang, Chen Ju, Fei Zhang, Ya Zhang and Yanfeng Wang
  NeurIPS 2023
- Multi-Modal Prototypes for Open-Set Semantic Segmentation | [Bibtex]
  Yuhuan Yang, Chaofan Ma, Chen Ju, Ya Zhang and Yanfeng Wang
  arXiv preprint 2023
- DENOISER: Rethinking the Robustness for Open-Vocabulary Action Recognition | [Bibtex]
  Haozhe Cheng, Chen Ju, Haicheng Wang, Jinxiang Liu, Mengting Chen, Qiang Hu, Xiaoyun Zhang and Yanfeng Wang
  arXiv preprint 2024
📒 Topic: Innovative AIGC Creativity, Free Vision-Text-Audio Editing and Composition
- DiffusionSeg: Adapting Diffusion Towards Unsupervised Object Discovery | [Bibtex]
  Chaofan Ma, Yuhuan Yang, Chen Ju, Fei Zhang, Jinxiang Liu, Yu Wang, Ya Zhang and Yanfeng Wang
  ICCV 2023
- Wear-Any-Way: Manipulable Virtual Try-on via Sparse Correspondence Alignment | [Project] | [Bibtex]
  Mengting Chen, Xi Chen, Zhonghua Zhai, Chen Ju, Xuewen Hong, Jinsong Lan and Shuai Xiao
  arXiv preprint 2024
📒 Topic: Frozen Pre-trained Models for Downstream Video Understanding with Limited Annotation & Supervision
- Divide and Conquer for Single-frame Temporal Action Localization | [Project & Demo] | [Bibtex]
  Chen Ju, Peisen Zhao, Siheng Chen, Ya Zhang, Yanfeng Wang and Qi Tian
  ICCV 2021
- Bottom-Up Temporal Action Localization with Mutual Regularization | [Demo] | [Code] | [Bibtex]
  Peisen Zhao, Lingxi Xie, Chen Ju, Ya Zhang, Yanfeng Wang and Qi Tian
  ECCV 2020
- Adaptive Mutual Supervision for Weakly-Supervised Temporal Action Localization | [Project & Demo] | [Bibtex]
  Chen Ju, Peisen Zhao, Siheng Chen, Ya Zhang, Xiaoyun Zhang and Qi Tian
  IEEE Transactions on Multimedia
- Audio-Aware Query-Enhanced Transformer for Audio-Visual Segmentation | [Project & Code] | [Bibtex]
  Jinxiang Liu, Chen Ju, Chaofan Ma, Yanfeng Wang, Yu Wang, Ya Zhang
  arXiv preprint 2023
📒 Topic: MLLM-Guided Multi-Modal Information Retrieval & Sorting & Recall & Representation
- Enhancing Cross-domain Click-Through Rate Prediction via Explicit Feature Augmentation | [Bibtex]
  Xu Chen, Zida Cheng, Jiangchao Yao, Chen Ju, Weilin Huang, Xiaoyi Zeng and Shuai Xiao
  WWW 2024
- Image to Multi-Modal Retrieval Learning for Industrial Scenarios | [Bibtex]
  Zida Cheng, Chen Ju, Xu Chen, Zhonghua Zhai, Shuai Xiao and Junchi Yan
  arXiv preprint 2023
- Cell Variational Information Bottleneck Network
  Zhonghua Zhai, Chen Ju, Shuai Xiao, Jinsong Lan and Xiaoyi Zeng
  arXiv preprint 2023
🗞️ Academics and Communications
- PC Member & Conference Reviewer: ECCV 2024/2022, CVPR 2024/2023, AAAI 2024/2023, ICCV 2023, ACM MM 2024/2023, WACV 2024
- Journal Reviewer: IEEE T-PAMI, Springer IJCV, IEEE T-MM, IEEE TCSVT, NPL
- I have been fortunate to meet many interesting people and teams:
  - University System. UESTC: Yong Liu, Yadong Jiang. PKU: Hong Liu, Jin Luo, Donglin Liu, Yong Peng. THU: Shousheng Han, Zhengsong Wang, Zongren Dai. SJTU: Haicheng Wang, Jinxiang Liu, Yue Hu, Chenxin Xu, Chaoqin Huang, Xiaoman Zhang, Xuehui Wang, Jiazhong Ceng, Chen Yang. USTC: Jiaqing Gao, Yumin Xia, Qi Meng. Oxford: Tengda Han, Charig Yang. KU Leuven: Haien Tang, Chunzhuo Wang, Liting Yang. NUS: Jialin Gao. OpenGVLab: Xue Yang. Ruijin: Qinwei Xu.
  - Alibaba. TAO Technology: Zida Cheng, Mengting Chen, Xuewen Hong, Yixuan Huang, Lianyu Du. DAMO Academy: Chang Zhou, Xi Chen, Mosha Chen. Alimama: Jiajie Wang, Hao Wu, Yuanzhe Gu. T-head: Yu Fu, He Guo. AntGroup: Tong Zhan, Qingpei Guo, Yifei Hu, Ming Yang, Jingdong Chen.
  - Huawei. Cloud BU: Yucheng Liu, Yaoming Wang, Shuangrui Ding, Haohang Xu. Car BU: Maosen Li. Consumer BG: Yongli Jia, Feilong Chen, Chenliang Hu. ICT: Liang Zhao, Tongda Li. 2012: Yu Zhou, Guohao Gong.
  - Baidu. Big Search: Zhengyang Li, Suqi Chen. Ernie Bot: Tian Wu, Jiachen Liu. Phoenix Nest: Chenyang Li.
  - Tencent. WXG: Xiaoyi Jia, Honghui Lin, Yongsheng Luo, Tianyi Wang, Zhenghua Liu, Dr. Hongwei Xue, Dr. Dacheng Yin. CDG: Tianyue Cao. TEG: Hongfa Wang, Wei Liu.
  - Software Companies. Meta: Kunhao Zheng. DiDi: Zhe Xu. ByteDance: Yichao Xiong, Zhikang Li, Kunyuan Du, Xuan Liao, Yuxuan Jiang, Shiqi Peng, Hangtian Zhao, Jian Li. Bilibili: Luochen Lv. ZTE: Xiao Hu. KuaiShou: Liwei Chen, Kun Xu. MeiTuan: Yujie Zhong, Yexun Zhang.
  - Hardware Companies. NVIDIA: Jie Chang, Yangheng Zhao. Intel: Yujie Pan, Yingying Xue. Hikvision: Tengfei Hou, Wanshun Gao. OPPO: Bo Wang, Chen Chen, Haonan Lu. Honor: Yuanchao Du.
📄 Patents
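- CN202010403823.4 《一种基于自适应采样的弱监督时序动作检测方法及系统》 (A Weakly-Supervised Temporal Action Detection Method and System Based on Adaptive Sampling)
  Ya Zhang, Chen Ju, Yanfeng Wang.
- CN202111190861.7 《一种单帧监督视频时序动作检测与分类方法及系统》 (A Single-Frame-Supervised Video Temporal Action Detection and Classification Method and System)
  Ya Zhang, Chen Ju, Peisen Zhao, Siheng Chen, Xiaoyun Zhang, Yanfeng Wang.
- CN202211056034.3 《弱监督视频时序动作检测与分类方法及系统》 (Weakly-Supervised Video Temporal Action Detection and Classification Method and System)
  Ya Zhang, Chen Ju, Kunhao Zheng, Jinxiang Liu, Weidi Xie, Yanfeng Wang.
- CN202211581256.7 《局部监督长视频时序文本检索方法及系统》 (Partially-Supervised Temporal Text Retrieval Method and System for Long Videos)
  Ya Zhang, Chen Ju, Haicheng Wang, Jinxiang Liu, Chaofan Ma, Yanfeng Wang.
- CN202310913202.4 《基于属性分解-聚合的开放词汇语义分割方法及系统》 (Open-Vocabulary Semantic Segmentation Method and System Based on Attribute Decomposition-Aggregation)
  Yanfeng Wang, Chaofan Ma, Yuhuan Yang, Chen Ju, Fei Zhang, Ya Zhang.
- CN202410913202.4 《一种基于稀疏关系对齐的可自由控制的试衣方法》 (A Freely Controllable Try-On Method Based on Sparse Correspondence Alignment)
  Mengting Chen, Xi Chen, Zhonghua Zhai, Chen Ju, Xuewen Hong, Jinsong Lan, Shuai Xiao.
- CN202410913202.4 《一种带噪多模态开放词汇视觉样本分类方法及系统》 (A Noisy Multi-Modal Open-Vocabulary Visual Sample Classification Method and System)
  Xiaoyun Zhang, Haozhe Cheng, Chen Ju, Qiang Hu, Yanfeng Wang.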
📖 Education
- 2018 - 2024, PhD candidate, Shanghai Jiao Tong University, Shanghai, China
- 2018, Exchange Student, University of Amsterdam, Netherlands
- 2018, Exchange Student, KU Leuven, Belgium
- 2014 - 2018, Undergraduate, University of Electronic Science and Technology of China, Chengdu, China
🎖 Honors and Awards
- [2024] Top Talent Program by Technology Companies
- [2023] First Prize of Shanghai Technology Invention Award
- [2022] CMIC Outstanding Scholarship at SJTU (Top 1%)
- [2021] CMIC Outstanding Scholarship at SJTU (Top 1%)
- [2020] CMIC Outstanding Scholarship at SJTU (Top 1%)
- [2018] Outstanding Graduate of Sichuan Province (Top 1%)
- [2018] Outstanding Graduate of UESTC (Top 1%)
- [2017] First Prize in National Undergraduate Mathematical Modeling
- [2017] Undergraduate National Scholarship at UESTC (Top 1%)
- [2016] Undergraduate National Scholarship at UESTC (Top 1%)
- [2015] Undergraduate National Scholarship at UESTC (Top 1%)