Xiang An

Xiang An (Chinese: 安翔) is a research scientist working on computer vision and multimodal large models.

He has 17 main-conference papers across ICCV (4), CVPR (3), AAAI (3), EMNLP (2), ECCV (2), ICLR (1), NeurIPS (1), and ACM MM (1).

His research spans three directions:

Distributed ML: sparse distributed algorithms for large-scale classification and contrastive learning, letting a single machine handle 100M-class comparisons. Partial FC.
Vision Encoders: next-generation ViTs for modern MLLMs. OneVision-Encoder, RiceViT.
Multimodal LLMs: fully open multimodal training frameworks. LLaVA-OneVision-1.5, LLaVA-OneVision-2.
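The sparse idea behind Partial FC, as it is commonly described, is that each training step computes the softmax over a sampled subset of class centers (every class present in the batch plus randomly drawn negatives) instead of over all classes, which keeps the per-step cost bounded as the number of classes grows. The following is a minimal single-process sketch under that assumption; the function names and the `sample_rate` parameter are illustrative, not InsightFace's API, and the multi-GPU sharding of the class centers is omitted.

```python
import random
from typing import List, Sequence, Tuple


def sample_class_centers(
    labels: Sequence[int], num_classes: int, sample_rate: float, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Choose which class centers take part in one training step.

    Every class that appears in the batch (a positive) is always kept; the
    remaining budget is filled with randomly drawn negative classes.
    Returns the sorted sampled class ids and the batch labels remapped to
    positions inside that subset, ready for a softmax over the subset only.
    """
    if not 0.0 < sample_rate <= 1.0:
        raise ValueError("sample_rate must be in (0, 1]")
    rng = random.Random(seed)
    positives = sorted(set(labels))
    budget = max(len(positives), int(sample_rate * num_classes))
    positive_set = set(positives)
    negative_pool = [c for c in range(num_classes) if c not in positive_set]
    negatives = rng.sample(negative_pool, budget - len(positives))
    sampled = sorted(positives + negatives)
    position = {c: i for i, c in enumerate(sampled)}
    return sampled, [position[y] for y in labels]


def partial_logits(
    feature: Sequence[float], centers: Sequence[Sequence[float]], sampled: Sequence[int]
) -> List[float]:
    """Dot-product logits against the sampled centers only, not all classes."""
    return [sum(f * w for f, w in zip(feature, centers[c])) for c in sampled]


labels = [3, 7, 3]
sampled, targets = sample_class_centers(labels, num_classes=20, sample_rate=0.25)
print(sampled, targets)  # 5 sampled ids, always including classes 3 and 7
```

Because positives are always kept, the cross-entropy target for every sample is guaranteed to be inside the sampled subset; only the set of competing negatives changes from step to step.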

For a complete list of publications, see All Publications.

Publications

The following is a selection of notable publications. For a complete list, see All Publications.

Awards & Competitions

ICCV 2025 Outstanding Reviewer
CVPR 2024 Outstanding Reviewer
Ranked 1st in NIST FRVT Competition, Visa Track 1:1
Nominated for 2024 China Annual Power Figure (中国年度力量人物)
Ranked 1st in the major-subject exam of the graduate entrance examination
First Place in Vehicle Re-Identification, PRCV 2019

Open Source

InsightFace

Open Source Library

Second-ranked contributor to InsightFace, an open-source 2D and 3D deep face analysis library. Author of Glint360K (the largest open-source face recognition training dataset) and Partial FC (which enables training on 10 million identities on a single machine). Also organized the masked face recognition challenge at the ICCV 2021 workshop.

LLaVA-OneVision-1.5

Multimodal LLM Framework

Team leader of this fully open framework, which aims to democratize multimodal training. Released the mid-training and instruction data for community use, and developed an offline sample-packing strategy for efficient training. Implemented RiceViT with native-resolution support.
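Sample packing generally means concatenating several variable-length training samples into one fixed-length sequence so that little compute is spent on padding; doing it offline means the grouping is computed once, before training starts. The sketch below assumes a first-fit-decreasing bin-packing heuristic over per-sample token counts; `pack_samples` and `max_len` are hypothetical names, and the strategy actually used in LLaVA-OneVision-1.5 may differ.

```python
from typing import List


def pack_samples(lengths: List[int], max_len: int) -> List[List[int]]:
    """Group sample indices into bins whose total token count is <= max_len.

    First-fit decreasing: visit samples longest first and put each one into
    the first existing bin that still has room, opening a new bin otherwise.
    Samples longer than max_len cannot be packed and raise an error.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    bins: List[List[int]] = []
    room: List[int] = []  # free tokens left in each bin
    for i in order:
        n = lengths[i]
        if n > max_len:
            raise ValueError(f"sample {i} has {n} tokens > max_len={max_len}")
        for b, free in enumerate(room):
            if n <= free:
                bins[b].append(i)
                room[b] -= n
                break
        else:  # no existing bin fits: open a new one
            bins.append([i])
            room.append(max_len - n)
    return bins


bins = pack_samples([700, 300, 500, 200, 400, 900], max_len=1000)
print(bins)  # [[5], [0, 1], [2, 4], [3]]
```

Placing long samples first leaves the short ones to fill the gaps that remain, which is why this cheap greedy heuristic usually wastes little space.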

OneVision-Encoder

Vision Encoder

Project leader of this next-generation vision encoder, which introduces codec-aligned sparsity as a foundational principle for multimodal intelligence. Achieves state-of-the-art performance on 16 image, video, and document understanding benchmarks while using substantially fewer visual tokens, and shows a 4.1% average improvement over Qwen3-ViT on video understanding tasks.
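Codec-aligned sparsity is only named here, not explained. One way to picture the intuition, assuming it resembles how video codecs spend bits mainly on regions that change between frames, is to keep every patch of a key frame and only the changed patches of later frames. The sketch below uses raw frame differencing as a stand-in for codec residuals; the function name, threshold, and patching scheme are hypothetical and are not OneVision-Encoder's implementation.

```python
from typing import List, Tuple

Frame = List[List[float]]  # one H x W grayscale frame


def select_patches(frames: List[Frame], patch: int, threshold: float) -> List[Tuple[int, int, int]]:
    """Return (frame, row, col) coordinates of the patches kept as visual tokens.

    Frame 0 is kept in full, like a codec key frame. In later frames a patch
    is kept only if its mean absolute difference from the same patch in the
    previous frame exceeds `threshold`, a crude stand-in for the residual a
    video codec would have to encode. Static regions therefore cost no tokens.
    """
    rows, cols = len(frames[0]) // patch, len(frames[0][0]) // patch
    kept: List[Tuple[int, int, int]] = []
    for t, frame in enumerate(frames):
        for r in range(rows):
            for c in range(cols):
                if t == 0:
                    kept.append((t, r, c))
                    continue
                prev = frames[t - 1]
                diff = sum(
                    abs(frame[y][x] - prev[y][x])
                    for y in range(r * patch, (r + 1) * patch)
                    for x in range(c * patch, (c + 1) * patch)
                )
                if diff / (patch * patch) > threshold:
                    kept.append((t, r, c))
    return kept


static = [[0.0] * 4 for _ in range(4)]
moved = [[1.0 if y < 2 and x < 2 else 0.0 for x in range(4)] for y in range(4)]
kept = select_patches([static, moved], patch=2, threshold=0.5)
print(f"{len(kept)} of 8 patches kept: {kept}")
```

In this toy clip only the top-left patch of the second frame changes, so the second frame contributes a single token instead of four.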

UNICOM

Image Retrieval Framework

Lead author and maintainer of UNICOM (Universal and Compact Representation Learning), a framework for learning universal image representations. Designed a novel cluster discrimination approach to representation learning. Developed the multi-label and region-based extensions, published at ECCV 2024 and ICCV 2025 (Highlight), respectively.
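Cluster discrimination, in broad terms, turns unlabeled data into a classification problem: features are clustered into pseudo-classes, and the encoder is then trained to tell those clusters apart. Below is a minimal sketch of that two-step recipe, assuming plain k-means for the pseudo-labels and a scaled cosine softmax for the discrimination loss. UNICOM's actual feature extractor, cluster count, margin, and sampling details are not reproduced here, and all names are hypothetical.

```python
import math
from typing import List, Tuple

Vec = List[float]


def _sq_dist(a: Vec, b: Vec) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pseudo_labels(features: List[Vec], k: int, iters: int = 10) -> Tuple[List[int], List[Vec]]:
    """Plain k-means: returns one pseudo-class id per sample plus the centroids.

    Centroids start from the first k samples so the sketch is deterministic;
    a cluster that ends up empty keeps its previous centroid.
    """
    centroids = [list(f) for f in features[:k]]
    labels = [0] * len(features)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: _sq_dist(f, centroids[j])) for f in features]
        for j in range(k):
            members = [f for f, y in zip(features, labels) if y == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def _unit(v: Vec) -> Vec:
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def cosine_softmax_loss(feature: Vec, centroids: List[Vec], label: int, scale: float = 16.0) -> float:
    """Cross-entropy of a scaled cosine softmax over pseudo-class centroids.

    The margin terms used by margin-based softmax losses are omitted.
    """
    f = _unit(feature)
    logits = [scale * sum(a * b for a, b in zip(f, _unit(c))) for c in centroids]
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum_exp - logits[label]


features = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
pseudo, centroids = kmeans_pseudo_labels(features, k=2)
print(pseudo)  # [0, 0, 1, 1]
```

A feature that lies close to its own pseudo-class centroid gets a small loss, while assigning it to the other cluster's label gives a large one; training the encoder on this loss is what pushes the clusters apart.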

LLaVA-NeXT

Large Multimodal Model

Contributor to the vision module of this next-generation large multimodal model. Enhanced the module's OCR capability for better recognition of text in images, and optimized the visual encoder for text-rich and document images.

Urban Seg

Educational Project

Author and maintainer of this educational project for semantic segmentation of remote sensing and satellite imagery. Designed a simple single-file training approach to keep it accessible, and integrated popular pretrained models. Wrote comprehensive tutorials and documentation for beginners.

Citation Map

City-level locations of citing authors, generated offline from Semantic Scholar and OpenAlex data.

This page is styled after Wikipedia.