Developed by TAS Lab, The Hong Kong Polytechnic University
A modern browser-based webcam application that integrates real-time AI-powered face detection, face recognition, and object detection capabilities — all running entirely in the browser with zero server-side processing.
🔗 Live Demo: https://weisongwen.github.io/iotProjectCam
📰 Project Post: https://weisongwen.github.io/talks/2026-02-18-iot-cam-project
The rapid proliferation of Internet of Things (IoT) devices and edge computing has created new opportunities for deploying artificial intelligence (AI) capabilities directly on end-user devices without reliance on cloud-based infrastructure. This project presents IoT Cam, a browser-based intelligent webcam application that integrates real-time face detection, face recognition, object detection, and facial attribute analysis, all executed entirely on-device through modern web technologies. Unlike conventional AI-powered surveillance systems that require dedicated GPU servers or cloud API calls, IoT Cam uses TensorFlow.js and client-side inference to deliver a fully functional computer vision pipeline within a standard web browser. Because no frame leaves the device, processing incurs no network round-trip latency and user data stays private.
The system architecture comprises three core AI modules. First, a face detection and analysis module built on the @vladmandic/face-api.js library employs a TinyFaceDetector for real-time face localization, a 68-point facial landmark model for geometric feature extraction, and dedicated neural networks for expression classification (seven categories), age regression, and gender estimation. Second, an in-browser face recognition module enables users to register known individuals by capturing face descriptors (128-dimensional feature vectors) and performing Euclidean distance-based matching against a locally stored database, with configurable similarity thresholds to balance precision and recall. Third, a COCO-SSD object detection module based on MobileNet v2 provides real-time classification across 80 object categories with adjustable confidence thresholds and YOLO-style visual overlays.
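The matching step of the recognition module can be illustrated with a minimal sketch. It assumes the registered faces are held as an array of `{ name, descriptor }` objects; the function names, the record shape, and the default threshold of 0.6 are illustrative assumptions, not taken from the project source.

```javascript
// Euclidean distance between two equal-length descriptors.
function euclideanDistance(a, b) {
  if (a.length !== b.length) {
    throw new Error('Descriptors must have the same length');
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// Return the closest registered face, or "unknown" when no entry lies
// within the threshold. `registered` is a hypothetical array of
// { name, descriptor } records; 0.6 is an assumed default threshold.
function findBestMatch(query, registered, threshold = 0.6) {
  let best = { name: 'unknown', distance: Infinity };
  for (const entry of registered) {
    const distance = euclideanDistance(query, entry.descriptor);
    if (distance < best.distance) {
      best = { name: entry.name, distance };
    }
  }
  if (best.distance <= threshold) {
    return best;
  }
  return { name: 'unknown', distance: best.distance };
}
```

Lowering the threshold rejects more borderline matches, which tends to raise precision at the cost of recall; raising it does the opposite.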
IoT Cam is implemented using pure HTML, CSS, and JavaScript — requiring no frameworks, build tools, or server-side components. The application utilizes the browser MediaDevices API for camera access and the MediaRecorder API for video recording, supporting resolutions from 480p to 4K. A comprehensive detection output dashboard provides session-level analytics including cumulative detection counts, average inference times, object class distributions, and facial attribute histograms, with CSV export functionality for offline analysis. Additional features include real-time image adjustment filters, visual presets, snapshot capture with a lightbox gallery, and keyboard shortcuts for efficient operation.
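As a rough illustration of the CSV export, the sketch below serializes a detection log into CSV text. The column names (`time`, `type`, `label`, `confidence`) and both function names are hypothetical; the project's actual log schema is not described here.

```javascript
// Quote a field only when it contains a comma, quote, or line break,
// doubling any embedded quotes (the usual CSV convention).
function escapeCsvField(value) {
  const text = value === null || value === undefined ? '' : String(value);
  if (/[",\r\n]/.test(text)) {
    return '"' + text.replace(/"/g, '""') + '"';
  }
  return text;
}

// Build CSV text from an array of row objects.
// `columns` fixes both the header and the field order.
function toCsv(rows, columns) {
  const header = columns.map(escapeCsvField).join(',');
  const lines = rows.map((row) =>
    columns.map((column) => escapeCsvField(row[column])).join(',')
  );
  return [header, ...lines].join('\r\n');
}
```

In a browser, the resulting string could be wrapped in a `Blob` and offered for download through `URL.createObjectURL` and an anchor element with a `download` attribute.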
Experimental evaluation demonstrates that the system achieves stable real-time performance at approximately 48 frames per second for video rendering with detection inference cycles of approximately 500 milliseconds on consumer-grade hardware. The privacy-by-design architecture ensures that no images, video frames, or biometric descriptors are transmitted to external servers, with all persistent data stored exclusively in the browser's localStorage. IoT Cam serves as both a practical prototype for privacy-preserving IoT surveillance and an educational tool for courses in computer vision, embedded AI, and IoT systems at The Hong Kong Polytechnic University.
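The gap between the rendering rate and the detection cycle suggests that inference runs less often than frames are drawn. The sketch below shows one way such a split could be arranged, under the assumption that a render loop asks a time-based gate whether detection is due; `createDetectionGate` and `createInferenceStats` are hypothetical helpers, not functions from the project.

```javascript
// Returns a function that answers "is detection due?" for a given
// timestamp, allowing at most one detection per `intervalMs`.
function createDetectionGate(intervalMs) {
  let lastRun = -Infinity;
  return function shouldDetect(nowMs) {
    if (nowMs - lastRun >= intervalMs) {
      lastRun = nowMs;
      return true;
    }
    return false;
  };
}

// Accumulates inference durations for a session-level average.
function createInferenceStats() {
  let count = 0;
  let totalMs = 0;
  return {
    record(durationMs) {
      count += 1;
      totalMs += durationMs;
    },
    get frames() {
      return count;
    },
    get averageMs() {
      return count === 0 ? 0 : totalMs / count;
    },
  };
}
```

A render callback driven by `requestAnimationFrame` could call the gate with its timestamp on every frame and start a detection pass only when it returns `true`, so drawing stays smooth while inference proceeds at its own pace.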
Keywords: IoT, edge AI, face detection, face recognition, object detection, TensorFlow.js, browser-based inference, privacy-preserving AI, computer vision
The figure below illustrates the system architecture of IoT Cam. The application runs entirely client-side within a standard web browser, receiving real-time video streams via the MediaDevices API. Input frames are processed by three core AI modules: (1) a Face Detection & Analysis Module powered by @vladmandic/face-api.js for localization, landmark extraction, expression classification, age regression, and gender estimation; (2) a Face Recognition Module that extracts 128-dimensional descriptors and performs Euclidean distance matching against a locally stored database; and (3) an Object Detection Module using COCO-SSD (MobileNet v2) for 80-category real-time classification. All detection results are rendered as real-time overlays and fed into an analytics dashboard with cumulative statistics, histograms, and CSV export. The entire pipeline adheres to a privacy-by-design principle — zero data is transmitted to cloud or external servers.
- Real-Time Face Detection & Recognition: Powered by @vladmandic/face-api.js (a maintained fork compatible with TensorFlow.js 3.x), the system detects faces in real time, identifies facial landmarks (68-point model), recognizes expressions (happy, sad, neutral, angry, surprised, etc.), and estimates age and gender.
- Face Registration & Identification: Users can register known faces by name directly in the browser. The system then identifies registered individuals in real time, displaying match confidence and highlighting unregistered persons with visual warnings. All face descriptor data is stored locally in the browser; nothing is sent to any server. Registered faces can be exported and imported as JSON files for backup and transfer.
- Object Detection (COCO-SSD / YOLO-style): Using TensorFlow.js and the COCO-SSD model (MobileNet v2), the application detects and classifies 80 object categories in real time, with adjustable confidence thresholds and color-coded bounding boxes with corner accents.
- Detection Output Dashboard: A comprehensive analytics panel provides session statistics (total faces/objects detected, average inference time, frames analyzed), object class breakdowns, face detail breakdowns (gender distribution, average age, expression histogram), and a real-time scrolling detection log with export-to-CSV functionality.
- Camera Controls & Image Processing: Supports multiple camera sources, resolution selection (480p to 4K), real-time image adjustments (brightness, contrast, saturation, hue), visual presets (Night Vision, Grayscale, Sepia, High Contrast, etc.), and mirror/flip transforms.
- Snapshot & Video Recording: Take snapshots with a built-in lightbox gallery, or record video clips (WebM format) for later review.
- Privacy by Design: All AI inference runs on-device using TensorFlow.js. No images, video frames, or face descriptors are ever transmitted to external servers. Face registration data persists only in the browser's localStorage.
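To make the adjustable confidence threshold and color-coded boxes concrete, here is a small sketch operating on prediction objects of the form `{ bbox, class, score }`, which is the shape COCO-SSD returns. The helper names and the hash-based color scheme are assumptions for illustration; the project may assign colors differently.

```javascript
// Keep only predictions at or above the chosen confidence threshold.
function filterByConfidence(predictions, minScore) {
  return predictions.filter((p) => p.score >= minScore);
}

// Tally detections per class, e.g. for a class breakdown panel.
function countByClass(predictions) {
  const counts = new Map();
  for (const p of predictions) {
    counts.set(p.class, (counts.get(p.class) || 0) + 1);
  }
  return counts;
}

// Derive a stable hue from the class name so each class always gets
// the same box color (an assumed scheme, not the project's palette).
function colorForClass(name) {
  let hue = 0;
  for (const ch of name) {
    hue = (hue * 31 + ch.codePointAt(0)) % 360;
  }
  return `hsl(${hue}, 90%, 55%)`;
}
```

The returned color string can be assigned directly to a canvas context's `strokeStyle` before drawing each bounding box.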
The screenshot below shows the IoT Cam system in action, demonstrating simultaneous person detection (81% confidence), face recognition (75% match for a registered user "weisong"), age/gender estimation (♂ 20y), and expression analysis (Neutral 100%), all running at 48 FPS in the browser.
The application is built with pure web technologies — no frameworks or build tools required:
| Technology | Purpose |
|---|---|
| TensorFlow.js 3.x | On-device AI inference |
| COCO-SSD 2.2.3 | 80-class object detection (MobileNet v2) |
| @vladmandic/face-api 1.7.14 | Face detection, landmarks, expressions, age/gender, recognition |
| MediaDevices API | Camera access (getUserMedia) |
| MediaRecorder API | Video recording |
| HTML / CSS / JavaScript | Pure web — no frameworks or build tools |
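Camera access through `getUserMedia` is driven by a constraints object. The sketch below builds one from a resolution preset; only the 480p and 4K endpoints come from the feature list, while the intermediate presets, the pixel dimensions, and the function name are assumptions.

```javascript
// Assumed preset table: the intermediate entries and exact pixel
// dimensions are illustrative, not taken from the project.
const RESOLUTION_PRESETS = {
  '480p': { width: 640, height: 480 },
  '720p': { width: 1280, height: 720 },
  '1080p': { width: 1920, height: 1080 },
  '4K': { width: 3840, height: 2160 },
};

// Build a MediaStreamConstraints object for getUserMedia.
// `ideal` lets the browser fall back to the nearest supported mode
// instead of rejecting the request outright.
function buildCameraConstraints(presetName, deviceId) {
  if (!Object.prototype.hasOwnProperty.call(RESOLUTION_PRESETS, presetName)) {
    throw new Error(`Unknown resolution preset: ${presetName}`);
  }
  const preset = RESOLUTION_PRESETS[presetName];
  const video = {
    width: { ideal: preset.width },
    height: { ideal: preset.height },
  };
  if (deviceId) {
    video.deviceId = { exact: deviceId };
  }
  return { video, audio: false };
}
```

In the browser, the result would be passed to `navigator.mediaDevices.getUserMedia(...)`, and the resolved stream assigned to a video element's `srcObject`.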
- Open `index.html` in any modern browser (Chrome, Edge, Firefox, Safari)
- Click Start Camera and allow camera permissions when prompted
- Use the sidebar controls to adjust the image and enable AI detection
- Register faces by entering a name and clicking 📸 Register Face
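Registered faces are kept in localStorage and can be exported as JSON. A minimal sketch of that round trip follows, assuming each face is a `{ name, descriptor }` record with a 128-element descriptor. The storage key, function names, and record shape are hypothetical, and `storage` stands for any object with `getItem` and `setItem`, such as `window.localStorage`.

```javascript
const STORAGE_KEY = 'iotcam.registeredFaces'; // hypothetical key name

// Typed arrays do not serialize as JSON arrays, so convert them first.
function serializeFaces(faces) {
  return JSON.stringify(
    faces.map((face) => ({
      name: face.name,
      descriptor: Array.from(face.descriptor),
    }))
  );
}

// Parse and validate exported JSON before trusting it.
function deserializeFaces(json) {
  const parsed = JSON.parse(json);
  if (!Array.isArray(parsed)) {
    throw new Error('Expected an array of faces');
  }
  return parsed.map((face) => {
    const valid =
      face !== null &&
      typeof face === 'object' &&
      typeof face.name === 'string' &&
      Array.isArray(face.descriptor) &&
      face.descriptor.length === 128;
    if (!valid) {
      throw new Error('Invalid face entry');
    }
    return { name: face.name, descriptor: Float32Array.from(face.descriptor) };
  });
}

function saveFaces(storage, faces) {
  storage.setItem(STORAGE_KEY, serializeFaces(faces));
}

function loadFaces(storage) {
  const json = storage.getItem(STORAGE_KEY);
  return json === null ? [] : deserializeFaces(json);
}
```

The same `serializeFaces` output could double as the contents of an exported backup file, with `deserializeFaces` guarding the import path against malformed input.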
Note: A webcam or camera device is required. HTTPS or localhost is needed for camera access in most browsers.
| Key | Action |
|---|---|
| Space | Start / Stop camera |
| S | Take snapshot |
| R | Start / Stop recording |
| F | Toggle fullscreen |
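One way to wire up shortcuts like these is a lookup from `KeyboardEvent.code` to an action name. This is a sketch only: the action names and the rule of ignoring keystrokes typed into text fields are assumptions, not behavior confirmed by the project.

```javascript
// Map physical key codes to hypothetical action names.
const SHORTCUTS = new Map([
  ['Space', 'toggleCamera'],
  ['KeyS', 'takeSnapshot'],
  ['KeyR', 'toggleRecording'],
  ['KeyF', 'toggleFullscreen'],
]);

// Resolve a keyboard event to an action name, or null if none applies.
function actionForKey(event) {
  // Assumed rule: do not hijack keys while the user types a name.
  const tag = event.target && event.target.tagName;
  if (tag === 'INPUT' || tag === 'TEXTAREA') {
    return null;
  }
  // Leave browser shortcuts such as Ctrl+S and Ctrl+R alone.
  if (event.ctrlKey || event.metaKey || event.altKey) {
    return null;
  }
  return SHORTCUTS.get(event.code) ?? null;
}
```

A `keydown` listener on `document` could call `actionForKey(event)` and dispatch to the matching handler, calling `event.preventDefault()` only when an action is found so that Space does not scroll the page.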
🔒 All data stays in your browser. Nothing is ever sent to any server.
| Data | Stored where | Sent externally? |
|---|---|---|
| Live video stream | Browser memory only | ❌ Never |
| Face descriptors | Browser localStorage | ❌ Never |
| Detection results | Browser canvas & DOM | ❌ Never |
| Exported JSON files | Your local disk | ❌ Never |
| AI model weights | Downloaded once from CDN | ✅ Model files only (no user data) |
This project demonstrates the feasibility of deploying sophisticated AI perception pipelines entirely within a web browser, eliminating the need for dedicated GPU servers or cloud-based inference. It serves as a prototype for privacy-preserving IoT surveillance systems and a teaching tool for courses related to computer vision, IoT, and embedded AI at PolyU.
The IoT Cam project was developed at the TAS Lab under the supervision of Dr. Weisong Wen, supporting the lab's mission of building trustworthy and accessible AI systems for autonomous applications.
© 2026 TAS Lab, The Hong Kong Polytechnic University.

