Real-Time Face Recognition and Identity Verification with OpenCV & face-recognition

gubai

2024-12-04

Project repository: github

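I. System Requirements and Design

1. System Overview

This AI face recognition system aims to accurately recognize, verify, and analyze faces in still images or video streams. It can be applied to security surveillance, access control, attendance tracking, identity authentication, and similar areas to improve security and management efficiency.

2. Feature Overview

1). Face enrollment and image encoding

2). Face recognition and identity verification

3). A web front end providing the features above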

II. System Modules

Core environment:

Package	Version
python	3.12.4
opencv-python	4.10.0.84
face-recognition	1.3.0
Pillow	10.0.0
numpy	2.1.2
Flask (required for the web version)	3.1.0

The project layout is as follows:

Face_recognition_local
├─ camera.py
├─ fps.py
├─ haarcascade_frontalface_default.xml
├─ trained_model.pkl
├─ training.py
├─ main.py
└─ person

Face_recognition_web
├─ camera.py
├─ fps.py
├─ haarcascade_frontalface_default.xml
├─ trained_model.pkl
├─ training.py
├─ person
├─ app.py
└─ templates
   └─ index.html

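camera.py: optimized OpenCV video capture

fps.py: computes the FPS of the current video stream for display

haarcascade_frontalface_default.xml: pre-trained Haar cascade classifier provided by OpenCV, used to detect faces

trained_model.pkl: the encodings of the enrolled face images

training.py: encodes the jpg images and saves them as a .pkl file

person: folder holding the face images

main.py: main program

app.py: web back end

templates: web front-end folder

index.html: web front-end page

III. System Implementation

Core system flowchart

workline

Training the model: training.py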

import os
import face_recognition
import pickle

person_encodings = []
person_names = []
for filename in os.listdir('person'):
    if filename.endswith('.jpg'):
        # Load the image and compute its face encodings
        image = face_recognition.load_image_file(os.path.join('person', filename))
        encodings = face_recognition.face_encodings(image)
        if encodings:
            encoding = encodings[0]
            person_encodings.append(encoding)
            # Strip the digits from the file name to get the person's name
            person_name = ''.join([i for i in os.path.splitext(filename)[0] if not i.isdigit()])
            person_names.append(person_name)

# Save the encodings and names to the model file
with open('trained_model.pkl', 'wb') as f:
    pickle.dump((person_encodings, person_names), f)

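Step-by-step walkthrough:

Read the .jpg files in the person folder. File names follow the pattern name+number.jpg, for example 张三1.jpg.

# Read the images and names from the person folder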
person_encodings = []
person_names = []
for filename in os.listdir('person'):
    if filename.endswith('.jpg'):

Encode the images with face-recognition

# Load each image with face-recognition, encode it, and strip the number from the file name
        image = face_recognition.load_image_file(os.path.join('person', filename))
        encodings = face_recognition.face_encodings(image)
        if encodings:
            encoding = encodings[0]
            person_encodings.append(encoding)
            # Remove the number from the file name, keeping only the name
            person_name = ''.join([i for i in os.path.splitext(filename)[0] if not i.isdigit()])
            person_names.append(person_name)
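The file-name handling above can be isolated into a small helper. This is only a minimal sketch: the function name `name_from_filename` is hypothetical and does not appear in the original script. Like the original comprehension, it removes every digit in the stem, not just a trailing number, so a name that itself contains digits would be altered.

```python
import os


def name_from_filename(filename: str) -> str:
    """Return the person name encoded in a file name such as '张三1.jpg'.

    Hypothetical helper: mirrors the comprehension in training.py by
    dropping every digit character from the file stem.
    """
    stem = os.path.splitext(filename)[0]  # '张三1.jpg' -> '张三1'
    return ''.join(ch for ch in stem if not ch.isdigit())


print(name_from_filename('张三1.jpg'))    # 张三
print(name_from_filename('alice12.jpg'))  # alice
```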

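Put jpg images of the target faces into the person folder and run the script to produce the model file trained_model.pkl. Images in which no face is detected are skipped, and only the first face found in each image is encoded.

In testing, with 9 images per person, recognition accuracy was 65%-90%.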
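The model file is simply a pickled tuple of two parallel lists. The sketch below shows the round trip with placeholder 3-element lists instead of real 128-dimensional encodings. The helper names `save_model` and `load_model` are hypothetical. Since unpickling can execute arbitrary code, a .pkl file should only be loaded if it comes from a trusted source.

```python
import os
import pickle
import tempfile


def save_model(path, encodings, names):
    """Write the (encodings, names) tuple the same way training.py does."""
    if len(encodings) != len(names):
        raise ValueError("encodings and names must have the same length")
    with open(path, 'wb') as f:
        pickle.dump((encodings, names), f)


def load_model(path):
    """Read the tuple back; index i of both lists refers to the same image."""
    with open(path, 'rb') as f:
        encodings, names = pickle.load(f)
    return encodings, names


# Placeholder data standing in for real face encodings.
demo_encodings = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
demo_names = ['alice', 'bob']

with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, 'trained_model.pkl')
    save_model(model_path, demo_encodings, demo_names)
    loaded_encodings, loaded_names = load_model(model_path)
```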

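Main pipeline

Starting the camera: camera.py

Source: 基于cv2.VideoCapture 和 OpenCV 得到更快的 FPS之文件篇 (roughly, "Faster FPS with cv2.VideoCapture and OpenCV: video files")

Main function being optimized: cv2.VideoCapture

The .read method is a blocking operation. According to the source article, moving this blocking I/O into a separate thread and maintaining a queue of decoded frames can raise the FPS processing rate by more than 52%. The CameraThread class below applies the same idea but keeps only the most recent frame instead of a queue.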

from threading import Thread, Lock
from datetime import datetime
import time
import cv2

time_cycle = 80  # minimum duration of one capture loop in ms (at most 12.5 reads per second)

class CameraThread(Thread):
    def __init__(self, kill_event, src = 0, width = 320, height = 240):
        self.kill_event = kill_event
        
        self.stream = cv2.VideoCapture(src)
        self.stream.set(cv2.CAP_PROP_FRAME_WIDTH, width)
        self.stream.set(cv2.CAP_PROP_FRAME_HEIGHT, height)

        (self.grabbed, self.frame) = self.stream.read()
        self.read_lock = Lock()

        Thread.__init__(self, args=(kill_event,))  # args is expected to be a tuple or list

    def update(self):
        (grabbed, frame) = self.stream.read()
        self.read_lock.acquire()
        self.grabbed, self.frame = grabbed, frame
        self.read_lock.release()

    def read(self):
        self.read_lock.acquire()
        frame = self.frame.copy()
        self.read_lock.release()
        return frame

    def run(self):
        while not self.kill_event.is_set():
            start_time = datetime.now()
            self.update()

            finish_time = datetime.now()
            dt = finish_time - start_time
            ms = (dt.days * 24 * 60 * 60 + dt.seconds) * 1000 + dt.microseconds / 1000.0
            if ms < time_cycle:
                time.sleep((time_cycle - ms) / 1000.0)

The CameraThread class inherits from Thread, so it captures camera frames in a separate thread.

Initializer __init__:

def __init__(self, kill_event, src=0, width=320, height=240):
    self.kill_event = kill_event
    
    self.stream = cv2.VideoCapture(src)
    self.stream.set(cv2.CAP_PROP_FRAME_WIDTH, width)
    self.stream.set(cv2.CAP_PROP_FRAME_HEIGHT, height)

    (self.grabbed, self.frame) = self.stream.read()
    self.read_lock = Lock()

    Thread.__init__(self, args=(kill_event,))

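kill_event: event object used to stop the thread.
src: camera index, 0 by default.
width and height: width and height of the video frame.
self.stream: the video capture object.
self.stream.set: sets the frame width and height.
self.grabbed and self.frame: hold the first frame read from the camera.
self.read_lock: lock object used for thread synchronization.

Update method update: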

def update(self):
    (grabbed, frame) = self.stream.read()
    self.read_lock.acquire()
    self.grabbed, self.frame = grabbed, frame
    self.read_lock.release()

self.stream.read(): reads one frame.
self.read_lock.acquire() and self.read_lock.release(): lock and unlock around the update of self.grabbed and self.frame to keep it thread-safe.

Read method read:

def read(self):
    self.read_lock.acquire()
    frame = self.frame.copy()
    self.read_lock.release()
    return frame

self.read_lock.acquire() and self.read_lock.release(): lock and unlock around the read of self.frame to keep it thread-safe.
self.frame.copy(): returns a copy of the current frame.
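The acquire/release pairs can also be written with a `with` statement, which releases the lock even if an exception is raised in between. The sketch below is not the author's class: `LatestFrame` is a hypothetical holder, and it uses `copy.copy` so that it runs without a camera (for NumPy arrays, `frame.copy()` does the same job). It also returns None when no frame has been stored yet, a case in which `self.frame.copy()` would raise an AttributeError, for example when a read fails and yields no frame.

```python
import copy
from threading import Lock


class LatestFrame:
    """Thread-safe slot that keeps only the most recent frame."""

    def __init__(self):
        self._lock = Lock()
        self._frame = None

    def set(self, frame):
        # Ignore failed grabs so readers never see None after a good frame.
        if frame is None:
            return
        with self._lock:
            self._frame = frame

    def get(self):
        # Copy under the lock so the caller can modify its frame freely.
        with self._lock:
            return None if self._frame is None else copy.copy(self._frame)


slot = LatestFrame()
slot.set([1, 2, 3])      # a list stands in for an image array here
snapshot = slot.get()
snapshot.append(4)       # does not affect the stored frame
```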

Run method run:

def run(self):
    while not self.kill_event.is_set():
        start_time = datetime.now()
        self.update()

        finish_time = datetime.now()
        dt = finish_time - start_time
        ms = (dt.days * 24 * 60 * 60 + dt.seconds) * 1000 + dt.microseconds / 1000.0
        if ms < time_cycle:
            time.sleep((time_cycle - ms) / 1000.0)

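while not self.kill_event.is_set(): loops until kill_event is set.
start_time and finish_time: record the start and end time of each iteration.
dt: the duration of the iteration.
ms: the duration converted to milliseconds.
time.sleep((time_cycle - ms) / 1000.0): if the iteration took less than time_cycle, sleep for the remainder to limit the frame rate.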
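The millisecond conversion can be written more compactly with `timedelta.total_seconds()`. The sketch below checks that both forms agree and shows how much sleep is left in a cycle. `elapsed_ms` and `sleep_needed_ms` are hypothetical helpers, and 80 ms is the `time_cycle` value from camera.py.

```python
from datetime import timedelta


def elapsed_ms(dt: timedelta) -> float:
    """Same value as the manual days/seconds/microseconds formula."""
    return dt.total_seconds() * 1000.0


def sleep_needed_ms(elapsed: float, cycle: float = 80.0) -> float:
    """Milliseconds left to sleep so that one loop lasts at least `cycle` ms."""
    return max(0.0, cycle - elapsed)


dt = timedelta(seconds=0, microseconds=31_250)   # the update took 31.25 ms
manual = (dt.days * 24 * 60 * 60 + dt.seconds) * 1000 + dt.microseconds / 1000.0
print(elapsed_ms(dt), manual)                    # 31.25 31.25
print(sleep_needed_ms(elapsed_ms(dt)))           # 48.75 ms left in an 80 ms cycle
print(1000.0 / 80.0)                             # 12.5, the upper bound on reads per second
```

With an 80 ms cycle the capture thread reads at most 12.5 frames per second, so lowering `time_cycle` is one knob to try if the camera can deliver more.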

FPS calculation: fps.py

import time

class FPS:
    def __init__(self):
        self.prev_time = time.time()
        self.fps = 0

    def update(self):
        current_time = time.time()
        self.fps = 1 / (current_time - self.prev_time)
        self.prev_time = current_time
        return self.fps

Get the current timestamp current_time.
Compute the current frame rate self.fps:

$$
\text{fps} = \frac{1}{\text{current\_time} - \text{self.prev\_time}}
$$

Update self.prev_time to the current timestamp current_time.
Return the computed frame rate self.fps.
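Two practical caveats apply to this formula: if two calls see the same timestamp, the division raises ZeroDivisionError, and the instantaneous value jitters from frame to frame. The sketch below is one possible variant rather than the original class. `SmoothedFPS`, its `alpha` parameter, and the optional `now` argument (added so the example can run with fixed timestamps) are all assumptions.

```python
import time


class SmoothedFPS:
    """FPS counter with an exponential moving average."""

    def __init__(self, alpha=0.2, now=None):
        self.alpha = alpha                      # weight of the newest sample
        self.prev_time = time.perf_counter() if now is None else now
        self.fps = 0.0

    def update(self, now=None):
        current_time = time.perf_counter() if now is None else now
        delta = current_time - self.prev_time
        self.prev_time = current_time
        if delta <= 0:                          # avoid division by zero
            return self.fps
        instant = 1.0 / delta
        # The first sample initializes the average; later ones are blended in.
        if self.fps == 0.0:
            self.fps = instant
        else:
            self.fps = self.alpha * instant + (1 - self.alpha) * self.fps
        return self.fps


counter = SmoothedFPS(alpha=0.5, now=0.0)
print(counter.update(now=0.5))    # 2.0  (one frame in 0.5 s)
print(counter.update(now=0.75))   # 3.0  (average of 2.0 and the new 4.0)
print(counter.update(now=0.75))   # 3.0  (zero interval, value unchanged)
```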

Face detection: main.py

import cv2
import tkinter as tk
from PIL import Image, ImageTk, ImageDraw
import numpy as np
from PIL import ImageFont
from threading import Event
from camera import CameraThread
import face_recognition
import pickle
from fps import FPS

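choose_camera = 0  # camera index: 0 is usually the built-in camera, 1 an external one
min_matching_degree = 0.65  # minimum similarity (1 - face distance) required to accept a match

# --------------------Text rendering--------------------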

def cv2AddChineseText(img, text, position, textColor=(0, 255, 0), textSize=30):
    if (isinstance(img, np.ndarray)): 
        img = Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
    
    draw = ImageDraw.Draw(img)
    
    fontStyle = ImageFont.truetype("simsun.ttc", textSize, encoding="utf-8")
    
    draw.text(position, text, textColor, font=fontStyle)
    
    return cv2.cvtColor(np.asarray(img), cv2.COLOR_RGB2BGR)


# --------------------Load the model, initialize the camera and the window--------------------


# Load the OpenCV Haar cascade classifier
face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')

# GUI window
root = tk.Tk()
root.geometry('640x480')
root.title('人脸识别')  # window title: "Face Recognition"

# Label used to display the image
image_label = tk.Label(root)
image_label.pack()

# Reference to the current PhotoImage object
photo = None

# Load the model
with open('trained_model.pkl', 'rb') as f:
    person_encodings, person_names = pickle.load(f)

# Event object used to stop the thread
kill_event = Event()

# Start the camera
camera_thread = CameraThread(kill_event, src=choose_camera, width=640, height=480)
camera_thread.start()

# Initialize the FPS counter
fps_calculator = FPS()

# --------------------Open the camera and start detection--------------------


# Process one captured frame
def update_frame():
    global photo
    frame = camera_thread.read()
    frame = cv2.flip(frame, 1)
    
    # Convert BGR (OpenCV) to RGB (face_recognition)
    rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

    # Detect faces and compute their encodings
    face_locations = face_recognition.face_locations(rgb_frame)
    face_encodings = face_recognition.face_encodings(rgb_frame, face_locations)

    # Draw a box around each detected face
    for (top, right, bottom, left), face_encoding in zip(face_locations, face_encodings):
        # Check whether the face belongs to someone in the model
        matches = face_recognition.compare_faces(person_encodings, face_encoding) # compare face encodings
        face_distances = face_recognition.face_distance(person_encodings, face_encoding) # compute distances
        best_match_index = np.argmin(face_distances) # index of the smallest distance
        name = "Unknown" # unknown face by default
        matching_degree = 1 - face_distances[best_match_index] # similarity score (1 - distance)
             
        if matches[best_match_index] and matching_degree > min_matching_degree:
            name = person_names[best_match_index]

        # Draw the box and the name on the image
        cv2.rectangle(frame, (left, top), (right, bottom), (0, 255, 255), 2)
        frame = cv2AddChineseText(frame, name, (left + (right-left)//2 - 10, top - 30), (0, 255, 255), 30)

    # Compute and display the frame rate
    fps = fps_calculator.update()
    cv2.putText(frame, f"FPS: {fps:.2f}", (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

    # Convert the frame to a PIL Image
    image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    photo = ImageTk.PhotoImage(image)

    # Update the label
    image_label.configure(image=photo)
    image_label.image = photo

    # Schedule the next update in 10 ms so the Tk event loop stays responsive
    root.after(10, update_frame)

# Start the image update loop
update_frame()

# Shut down the program
def on_closing():
    # Stop the camera thread
    kill_event.set()
    camera_thread.join()
    # Release the camera and close any OpenCV windows
    camera_thread.stream.release()
    cv2.destroyAllWindows()
    # Close the Tkinter window
    root.destroy()
root.protocol("WM_DELETE_WINDOW", on_closing)
root.mainloop()
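The acceptance rule in `update_frame` can be reasoned about on its own. `face_recognition.face_distance` returns the Euclidean distance between encodings, and `compare_faces` accepts a distance of at most its `tolerance` (0.6 by default). Because `matching_degree > 0.65` is equivalent to a distance below 0.35, the threshold check is the stricter of the two and effectively decides the outcome. The sketch below reproduces that rule in pure Python with toy 2-dimensional vectors. `best_match` is a hypothetical helper, and it also handles an empty model, a case in which `np.argmin` would raise an error.

```python
import math


def best_match(known_encodings, known_names, encoding, min_matching_degree=0.65):
    """Return (name, matching_degree) for the closest known encoding.

    The name is "Unknown" when the model is empty or the best
    candidate does not exceed the similarity threshold.
    """
    if not known_encodings:
        return "Unknown", 0.0
    distances = [math.dist(known, encoding) for known in known_encodings]
    best_index = min(range(len(distances)), key=distances.__getitem__)
    matching_degree = 1 - distances[best_index]
    if matching_degree > min_matching_degree:
        return known_names[best_index], matching_degree
    return "Unknown", matching_degree


# Toy 2-D vectors standing in for 128-D face encodings.
known = [[0.0, 0.0], [1.0, 0.0]]
names = ['alice', 'bob']

print(best_match(known, names, [0.9, 0.0]))   # closest to bob, distance about 0.1
print(best_match(known, names, [0.5, 0.0]))   # distance 0.5 to both, rejected
```

Note that 1 - distance is a similarity score, not a probability, so the 0.65 threshold is best tuned empirically on the enrolled faces.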

Web implementation: app.py & index.html

app.py

app.py is the back-end service, a port of main.py to the Flask framework.

from flask import Flask, render_template, Response
import cv2
import face_recognition
import pickle
import numpy as np
from fps import FPS
from PIL import Image, ImageDraw, ImageFont

app = Flask(__name__)

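choose_camera = 1  # camera index: 0 is usually the built-in camera, 1 an external one
min_matching_degree = 0.65  # minimum similarity (1 - face distance) required to accept a match

# Load the trained model
with open('trained_model.pkl', 'rb') as f:
    person_encodings, person_names = pickle.load(f)

# Initialize the camera
cap = cv2.VideoCapture(choose_camera)

# Initialize the FPS counter
fps_calculator = FPS()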
fps_calculator = FPS()

def cv2AddChineseText(img, text, position, textColor=(0, 255, 0), textSize=30):
    if (isinstance(img, np.ndarray)): 
        img = Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
    
    draw = ImageDraw.Draw(img)
    
    fontStyle = ImageFont.truetype("simsun.ttc", textSize, encoding="utf-8")
    
    draw.text(position, text, textColor, font=fontStyle)
    
    return cv2.cvtColor(np.asarray(img), cv2.COLOR_RGB2BGR)

def generate_frames():
    while True:
        success, frame = cap.read()
        if not success:
            break
        else:
            frame = cv2.flip(frame, 1)
            rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

            # Detect faces with the face_recognition library
            face_locations = face_recognition.face_locations(rgb_frame)
            face_encodings = face_recognition.face_encodings(rgb_frame, face_locations)

            # Draw a box around each detected face
            for (top, right, bottom, left), face_encoding in zip(face_locations, face_encodings):
                matches = face_recognition.compare_faces(person_encodings, face_encoding)
                face_distances = face_recognition.face_distance(person_encodings, face_encoding)
                best_match_index = np.argmin(face_distances)
                name = "Unknown"
                matching_degree = 1 - face_distances[best_match_index]

                if matches[best_match_index] and matching_degree > min_matching_degree:
                    name = person_names[best_match_index]

                cv2.rectangle(frame, (left, top), (right, bottom), (0, 255, 255), 2)
                frame = cv2AddChineseText(frame, name, (left + (right-left)//2 - 10, top - 30), (0, 255, 255), 30)

            # Compute and display the frame rate
            fps = fps_calculator.update()
            cv2.putText(frame, f"FPS: {fps:.2f}", (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

            ret, buffer = cv2.imencode('.jpg', frame)
            frame = buffer.tobytes()

            yield (b'--frame\r\n'
                   b'Content-Type: image/jpeg\r\n\r\n' + frame + b'\r\n')

@app.route('/')
def index():
    return render_template('index.html')

@app.route('/video_feed')
def video_feed():
    return Response(generate_frames(), mimetype='multipart/x-mixed-replace; boundary=frame')

if __name__ == '__main__':
    app.run(debug=True)
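Each chunk yielded by `generate_frames` is one part of a `multipart/x-mixed-replace` response: a boundary line, a Content-Type header, a blank line, and then the JPEG bytes. The browser replaces the displayed image each time a new part arrives. The sketch below isolates just the framing step, using the hypothetical helper `mjpeg_part` and fake bytes in place of a real encoded frame.

```python
def mjpeg_part(jpeg_bytes: bytes, boundary: bytes = b'frame') -> bytes:
    """Wrap one encoded JPEG in the multipart framing used by /video_feed.

    `boundary` must match the value given in the response mimetype
    ('multipart/x-mixed-replace; boundary=frame').
    """
    return (b'--' + boundary + b'\r\n'
            b'Content-Type: image/jpeg\r\n\r\n' + jpeg_bytes + b'\r\n')


fake_jpeg = b'\xff\xd8...\xff\xd9'   # placeholder, not a valid image
part = mjpeg_part(fake_jpeg)
print(part)
```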

Front-end page index.html

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>人脸识别</title>
</head>
<body>
    <h1>人脸识别</h1>
    <img src="{{ url_for('video_feed') }}" width="640" height="480">
</body>
</html>

IV. Results

The model was trained in advance on 9 photos of the author.

CPU	i9-14900HX
Memory	32GB
GPU	RTX4060

test

Average frame rate: 2.6 FPS

Average accuracy: 70%

match

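V. Development History and Limitations

The first version did not use a face-recognition trained model for verification. Instead, it read every image in the folder and looked for the one that best matched the captured face. Because each verification had to walk through all the images in the folder, performance suffered badly (to the point of freezing). The code was therefore refactored to train a model with face-recognition and compare captured faces against it, which greatly eased the performance problem. Performance was still poor, however, because the read method of cv2.VideoCapture blocks. After some research I adopted a blogger's approach of moving this blocking I/O into a separate thread.

Although the system now runs correctly, the frame rate is still very low and needs further optimization. In addition, the classifier currently used, OpenCV's official haarcascade_frontalface_default.xml, works well for frontal faces but performs poorly on faces turned in other directions and on complex expressions. A new classifier needs to be trained to handle more complex conditions.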

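VI. Further Optimization

In practice, faces do not need to be detected in every frame, so increasing the interval between detections makes the display smoother.

Rough idea:

detection_interval = 5  # run detection once every 5 frames
frame_count = 0  # number of frames seen so far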

def detect_faces(self):
    while not self.kill_event.is_set():
        if self.frame_count % self.detection_interval == 0:
            frame = self.camera_thread.read()
            ...
        self.frame_count += 1
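One way to make this concrete is to run the expensive detector only on every N-th frame and reuse the previous result in between. In this sketch `IntervalDetector` and the `detect` callback are hypothetical stand-ins. In the real program the callback would wrap `face_recognition.face_locations`.

```python
class IntervalDetector:
    """Call `detect` on every `interval`-th frame and cache its result."""

    def __init__(self, detect, interval=5):
        if interval < 1:
            raise ValueError("interval must be at least 1")
        self.detect = detect
        self.interval = interval
        self.frame_count = 0
        self.last_result = []

    def process(self, frame):
        # Frames 0, interval, 2*interval, ... trigger a fresh detection.
        if self.frame_count % self.interval == 0:
            self.last_result = self.detect(frame)
        self.frame_count += 1
        return self.last_result


calls = []

def fake_detect(frame):
    calls.append(frame)              # record which frames were analyzed
    return [('face-in-frame', frame)]

detector = IntervalDetector(fake_detect, interval=5)
results = [detector.process(i) for i in range(12)]
print(calls)                         # [0, 5, 10]
```

The trade-off is latency: with an interval of N, a box can lag the face by up to N - 1 frames.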

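In addition, face detection can be moved into its own thread so that it does not block the main thread.

Combining these two points, create a new Python file, FaceDetect.py. The main program runs face detection in a new thread to reduce stuttering.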

# FaceDetect.py
import threading
import cv2
import face_recognition

class FaceDetector:
    def __init__(self, camera_thread, detection_interval):
        self.camera_thread = camera_thread
        self.detection_interval = detection_interval
        self.frame_count = 0
        self.face_locations = []
        self.face_encodings = []
        self.kill_event = threading.Event()
        self.lock = threading.Lock()  # lock protecting the shared results
        self.detection_thread = threading.Thread(target=self.detect_faces)
        self.detection_thread.start()

    def detect_faces(self):
        while not self.kill_event.is_set():
            if self.frame_count % self.detection_interval == 0:
                frame = self.camera_thread.read()
                rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
                face_locations = face_recognition.face_locations(rgb_frame)
                face_encodings = face_recognition.face_encodings(rgb_frame, face_locations)  # use the locations just detected, not the stale self.face_locations
                
                with self.lock:  # protect the shared data with the lock
                    self.face_locations = face_locations
                    self.face_encodings = face_encodings
            self.frame_count += 1

    def stop(self):
        self.kill_event.set()
        self.detection_thread.join()

    def get_faces(self):
        with self.lock:  # protect the shared data with the lock
            return self.face_locations, self.face_encodings

Changes to main.py

# main.py
from FaceDetect import FaceDetector  # new import

detection_interval = 20  # face-detection interval
# Added: initialize the face detector
face_detector = FaceDetector(camera_thread, detection_interval)

def update_frame():
    '''Replace the following:
    # Convert the image format
    rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

    # Detect faces
    face_locations = face_recognition.face_locations(rgb_frame)
    face_encodings = face_recognition.face_encodings(rgb_frame, face_locations)
    '''
    # Get the latest face locations and encodings from the detector thread
    face_locations, face_encodings = face_detector.get_faces()

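After this change the frame rate improved considerably, but the boxes were drawn in the wrong place, as shown in the figure below:

The face and the drawn box are exact mirror images of each other

The cause is that main.py mirrors the frame with frame = cv2.flip(frame, 1), while FaceDetect.py reads frames from camera.py without mirroring them, so the face locations it returns refer to the unmirrored frame. The fix is either to add frame = cv2.flip(frame, 1) in FaceDetect.py or to remove the mirroring from main.py.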
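A third option, not used in the original code, is to leave both frames alone and mirror the coordinates instead. For an image of width W, a horizontal flip maps a box with edges (left, right) to (W - right, W - left), while top and bottom are unchanged. The helper `mirror_box` below is hypothetical, and the formula treats the values as edge positions, so the result may differ by one pixel from flipping the pixels themselves.

```python
def mirror_box(box, frame_width):
    """Mirror a (top, right, bottom, left) box horizontally.

    The tuple order is the one returned by face_recognition.face_locations.
    """
    top, right, bottom, left = box
    return (top, frame_width - left, bottom, frame_width - right)


box = (50, 200, 150, 100)            # face on the left side of a 640-px frame
print(mirror_box(box, 640))          # (50, 540, 150, 440)
```

Mirroring four integers per face is cheaper than flipping a whole frame, which may matter when detection runs on every frame.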

# FaceDetect.py with mirroring added
import threading
import cv2
import face_recognition

class FaceDetector:
    def __init__(self, camera_thread, detection_interval):
        self.camera_thread = camera_thread
        self.detection_interval = detection_interval
        self.frame_count = 0
        self.face_locations = []
        self.face_encodings = []
        self.kill_event = threading.Event()
        self.lock = threading.Lock()  
        self.detection_thread = threading.Thread(target=self.detect_faces)
        self.detection_thread.start()

    def detect_faces(self):
        while not self.kill_event.is_set():
            if self.frame_count % self.detection_interval == 0:
                frame = self.camera_thread.read()
                frame = cv2.flip(frame, 1) # mirror the frame to match main.py
                rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
                face_locations = face_recognition.face_locations(rgb_frame)
                face_encodings = face_recognition.face_encodings(rgb_frame, face_locations)  # use the locations just detected, not the stale self.face_locations
                
                with self.lock:  
                    self.face_locations = face_locations
                    self.face_encodings = face_encodings
            self.frame_count += 1

    def stop(self):
        self.kill_event.set()
        self.detection_thread.join()

    def get_faces(self):
        with self.lock:  
            return self.face_locations, self.face_encodings
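One thing to watch in this class: `frame_count` is incremented on every pass of the `while` loop rather than once per camera frame, so between detections the thread spins without sleeping. A possible alternative, shown here as a sketch and not as the author's implementation, is to pace the loop with `Event.wait(timeout)`, which also lets `stop()` return promptly. `PacedWorker`, `work`, and `period` are hypothetical names, and `work` stands in for the read, flip, and detect step.

```python
import threading
import time


class PacedWorker:
    """Run `work()` roughly every `period` seconds until stopped."""

    def __init__(self, work, period=0.2):
        self.work = work
        self.period = period
        self.result = None
        self.lock = threading.Lock()
        self.kill_event = threading.Event()
        self.thread = threading.Thread(target=self._loop, daemon=True)
        self.thread.start()

    def _loop(self):
        while not self.kill_event.is_set():
            result = self.work()                 # slow step, done outside the lock
            with self.lock:
                self.result = result
            # Sleeps up to `period` seconds but wakes immediately on stop().
            self.kill_event.wait(self.period)

    def get(self):
        with self.lock:
            return self.result

    def stop(self):
        self.kill_event.set()
        self.thread.join()


worker = PacedWorker(lambda: time.monotonic(), period=0.01)
time.sleep(0.1)
worker.stop()
print(worker.get() is not None)   # True: at least one result was produced
```

If this pattern is adopted in main.py, the detector's `stop()` should also be called from `on_closing` so the thread exits with the window.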