About

I am Xiqian Yu (余茜倩), a Research and Development Engineer at the Embodied AI Center, Shanghai AI Laboratory, working with Dr. Tai Wang. My research interests lie in embodied AI, vision-language-action models, and large-scale multimodal learning for embodied agents.

My recent work centers on embodied foundation models, spanning both navigation and manipulation. On the task side, I am particularly interested in streaming vision-language navigation and dual-system cooperation for generalizable agents. On the training and infrastructure side, I focus on co-training with heterogeneous robotic and multimodal data, large-scale data processing pipelines, and distributed training systems. Looking ahead, I am motivated by how data, training infrastructure, and model architecture can be jointly optimized to improve the scalability and generalization of embodied foundation models.

Selected Publications

Ground Slow, Move Fast: A Dual-System Foundation Model for Generalizable Vision-and-Language Navigation

ICLR 2026

Meng Wei, Chenyang Wan, Jiaqi Peng, Xiqian Yu, Yuqiang Yang, Delin Feng, Wenzhe Cai, Chenming Zhu, Tai Wang, Jiangmiao Pang, Xihui Liu

InternVLA-N1: An Open Dual-System Vision-Language Navigation Foundation Model with Learned Latent Plans

InternVLA-N1 Team

StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling

ICRA 2026

Meng Wei*, Chenyang Wan*, Xiqian Yu*, Tai Wang*, Yuqiang Yang, Xiaohan Mao, Chenming Zhu, Wenzhe Cai, Hanqing Wang, Yilun Chen, Xihui Liu, Jiangmiao Pang

NaVid-4D: Unleashing Spatial Intelligence in Egocentric RGB-D Videos for Vision-and-Language Navigation

ICRA 2025

Haoran Liu*, Weikang Wan*, Xiqian Yu*, Minghan Li*, Jiazhao Zhang, Bo Zhao, Zhibo Chen, Zhongyuan Wang, Zhizheng Zhang, He Wang

GaussianSR: 3D Gaussian Super-Resolution with 2D Diffusion Priors

arXiv 2024

Xiqian Yu*, Hanxin Zhu*, Tianyu He, Zhibo Chen

Education

University of Science and Technology of China

Sep. 2022 - Jul. 2025

Master, Electronics and Communication Engineering

Supervisor: Prof. Zhibo Chen

Shandong University

Sep. 2016 - Jul. 2020

Bachelor, Electronic Engineering and Information Science

Internship

Galbot

Jan. 2024 - Jul. 2024

Intern, Algorithm Center

Vision-Language Navigation

Skills

Programming & Frameworks

  • Languages & Core: Python, C/C++, CUDA
  • DL & Systems: PyTorch, Hugging Face (Accelerate, Transformers), DeepSpeed, Slurm
  • Distributed Training: Multi-node multi-GPU training, NCCL optimization

Multimodal & VLA Pre-training

  • Multimodal Pre-training: Vision-language pre-training and co-training, multimodal representation alignment
  • Data Infrastructure: Large-scale multimodal data processing, heterogeneous robot and multimodal dataset organization, dataset mixture design
  • Training Operations: Co-training over heterogeneous robot and multimodal datasets, high-throughput data loading, distributed batch scheduling, precision alignment

VLA Post-training & Alignment

  • Action Adaptation: Vision-language-action end-to-end fine-tuning, action expert integration
  • Planning & Evaluation: Long-horizon task planning, closed-loop evaluation
  • Model Development: VLA post-training, alignment, and training infrastructure for scalable embodied foundation models