Abstract
Face recognition is increasingly adopted in industries such as education, security, and personalized services. This paper introduces a face recognition system that leverages the embedding capabilities of the CLIP model. CLIP is pretrained on multimodal data (images paired with text) and produces high-dimensional feature vectors, which are stored in a vector index for subsequent queries. The system is designed for accurate real-time identification, with applications including attendance tracking, event check-ins, and security screening. The pipeline encodes known faces into high-dimensional embeddings, indexes them with FAISS, and matches unknown images against the index by L2 (Euclidean) distance. Experimental results show accuracy exceeding 90% and demonstrate that the system scales efficiently to datasets with a large number of entries. Notably, the system is more computationally efficient than traditional deep convolutional neural networks (CNNs), significantly reducing CPU load and memory consumption while maintaining competitive inference speeds. In a first round of experiments, the system achieved over 90% accuracy on live video feeds in which each identity had a single reference video used for both training and validation; on a more challenging dataset containing many low-quality classes, however, accuracy dropped to approximately 73%, underscoring the impact of dataset quality and variability on performance.
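The encode-index-match pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the 512-dimensional embeddings are random stand-ins for real CLIP image features, and the brute-force L2 search mirrors what a FAISS flat index (e.g. `IndexFlatL2`) computes.

```python
import numpy as np

def search(index: np.ndarray, query: np.ndarray, k: int = 1):
    """Brute-force L2 nearest-neighbor search over a gallery of embeddings,
    equivalent to what a FAISS IndexFlatL2 returns for a single query."""
    distances = np.linalg.norm(index - query, axis=1)  # L2 distance to each entry
    order = np.argsort(distances)[:k]
    return distances[order], order

rng = np.random.default_rng(0)
# Stand-ins for CLIP embeddings of 100 known faces (real CLIP features
# would come from encoding reference images of each identity).
gallery = rng.normal(size=(100, 512)).astype(np.float32)

# A probe image of identity 42, perturbed slightly to simulate a new capture.
probe = gallery[42] + 0.01 * rng.normal(size=512).astype(np.float32)

dist, ids = search(gallery, probe)
print(ids[0])  # → 42 (the closest gallery identity)
```

In practice, the probe embedding's nearest neighbor is accepted as a match only if its L2 distance falls below a threshold; otherwise the face is reported as unknown.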