Xiqian Yu | Homepage

About

I am Xiqian Yu (余茜倩), a Research and Development Engineer at the Embodied AI Center, Shanghai AI Laboratory, working with Dr. Tai Wang. My research interests lie in embodied AI, vision-language-action models, and large-scale multimodal learning for embodied agents.

My recent research and work center on embodied foundation models for both navigation and manipulation. On the task and application side, I am particularly interested in streaming vision-and-language navigation and dual-system cooperation for generalizable agents. On the training and infrastructure side, I focus on co-training with heterogeneous robotic and multimodal data, large-scale data processing pipelines, and distributed training systems. Going forward, I am especially motivated by the question of how data, training infrastructure, and model architecture can be jointly optimized to improve the scalability and generalization of embodied foundation models.

Selected Publications

Cortex: A Bidirectionally Aligned Embodied Agent Framework for Long-horizon Manipulation

arXiv 2026

Jiaqi Peng^*, Xiqian Yu^*, Delin Feng^*, Yuqiang Yang, Wenzhe Cai, Jing Xiong, Ganlin Yang, Jinliang Zheng, Jiafei Cao, Xueyuan Wei, Jiangmiao Pang, Yuan Shen^†, Tai Wang^†.

Project Page Paper Code

Ground Slow, Move Fast: A Dual-System Foundation Model for Generalizable Vision-and-Language Navigation

ICLR 2026

Meng Wei, Chenyang Wan, Jiaqi Peng, Xiqian Yu, Yuqiang Yang, Delin Feng, Wenzhe Cai, Chenming Zhu, Tai Wang^†, Jiangmiao Pang^‡, Xihui Liu^‡.

Project Page Paper Code

InternVLA-N1: An Open Dual-System Vision-Language Navigation Foundation Model with Learned Latent Plans

InternVLA-N1 Team

Project Page Tech Report Code

StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling

ICRA 2026

Meng Wei^*, Chenyang Wan^*, Xiqian Yu^*, Tai Wang^*^†, Yuqiang Yang, Xiaohan Mao, Chenming Zhu, Wenzhe Cai, Hanqing Wang, Yilun Chen, Xihui Liu^‡, Jiangmiao Pang^‡.

Project Page Paper Code

NaVid-4D: Unleashing Spatial Intelligence in Egocentric RGB-D Videos for Vision-and-Language Navigation

ICRA 2025

Haoran Liu^*, Weikang Wan^*, Xiqian Yu^*, Minghan Li^*, Jiazhao Zhang, Bo Zhao, Zhibo Chen, Zhongyuan Wang, Zhizheng Zhang^‡, He Wang^‡.

Project Page Paper

GaussianSR: 3D Gaussian Super-Resolution with 2D Diffusion Priors

arXiv 2024

Xiqian Yu^*, Hanxin Zhu^*, Tianyu He, Zhibo Chen^‡.

Project Page Paper

Education

University of Science and Technology of China

Sep. 2022 - Jul. 2025

Master's Degree, Electronics and Communication Engineering

Supervisor: Prof. Zhibo Chen

Shandong University

Sep. 2016 - Jul. 2020

Bachelor's Degree, Electronic Engineering and Information Science

Internship

Shanghai AI Laboratory

Jan. 2025 - Jun. 2025

Intern, Embodied AI Center

Vision-Language-Action

Galbot

Jan. 2024 - Jul. 2024

Intern, Algorithm Center

Vision-and-Language Navigation

Skills

Programming & Frameworks

Languages & Core: Python, C/C++, CUDA
DL & Systems: PyTorch, Hugging Face (Accelerate, Transformers), DeepSpeed, Slurm clusters
Distributed Training: Multi-node multi-GPU training, NCCL optimization

Multimodal & VLA Pre-training

Multimodal Pre-training: Vision-language pre-training and co-training, multimodal representation alignment
Data Infrastructure: Large-scale multimodal data processing, heterogeneous robot and multimodal dataset organization, dataset mixture design
Training Operations: Co-training over heterogeneous robot and multimodal datasets, high-throughput data loading, distributed batch scheduling, precision alignment

VLA Post-training & Alignment

Action Adaptation: End-to-end vision-language-action fine-tuning, action expert integration
Planning & Evaluation: Long-horizon task planning, closed-loop evaluation
Model Development: VLA post-training, alignment, and training infrastructure for scalable embodied foundation models