This project, led by Prof. Shen Chunhua from CCST, was supported by the Major R&D program in 2023. Vision foundation models are artificial intelligence models trained on large-scale image datasets to perform various visual tasks. In its first phase, the project will focus mainly on vision foundation models, along four directions: scaling up model size and training data and developing an efficient software-hardware cooperative training platform; developing efficient models for zero-shot deployment, enabling more efficient learning and better generalization; developing real-time and embedded vision that runs efficiently on edge devices; and researching multi-modal learning to enable new applications.