Xiang An

Xiang An (Chinese: 安翔) is a research scientist working on computer vision and multimodal large models.

He has 17 main-conference papers across ICCV (4), CVPR (3), AAAI (3), EMNLP (2), ECCV (2), ICLR (1), NeurIPS (1), and ACM MM (1).

His research spans three directions:

For a complete list of publications, see All Publications.

Publications

The following is a selection of notable publications. For a complete list, see All Publications.

Awards & Competitions

Open Source

  1. Open Source Library
    #2 contributor to the open-source 2D & 3D deep face analysis library. Author of Glint360K (the largest open-source face recognition training dataset) and Partial FC (which enables training on 10 million identities on a single machine). Also organized the Masked Face Recognition Challenge workshop at ICCV 2021.
  2. Multimodal LLM Framework
    Team leader of this fully open framework, designed to democratize multimodal training. Released mid-training and instruction data for community use, and developed an offline sampling pack for efficient training. Implemented RiceViT with native-resolution support.
  3. Vision Encoder
    Project leader of this next-generation vision encoder, which introduces codec-aligned sparsity as a foundational principle for multimodal intelligence. Achieves state-of-the-art performance on 16 image, video, and document understanding benchmarks while using substantially fewer visual tokens. Demonstrates a 4.1% average improvement over Qwen3-ViT on video understanding tasks.
  4. Image Retrieval Framework
    Lead author and maintainer of the Universal and Compact Representation Learning framework for universal image representations. Designed the novel cluster discrimination approach for representation learning. Developed the multi-label and region-based extensions, published at ECCV 2024 and ICCV 2025 (Highlight), respectively.
  5. Large Multimodal Model
    Vision module contributor to the next-generation large multimodal model. Enhanced the OCR capability of the vision module for better text recognition in images. Optimized the visual encoder for processing text-rich and document images.
  6. Educational Project
    Author and maintainer of this educational project for semantic segmentation on remote sensing and satellite imagery. Designed a simple single-file training approach for accessibility and integrated popular pretrained models. Created comprehensive tutorials and documentation for beginners.

Citation Map

City-level locations of citing authors, generated offline from Semantic Scholar and OpenAlex data.


This page is styled after Wikipedia.
