IoT Cam — Browser-Based AI-Powered Webcam Viewer

Developed by TAS Lab, The Hong Kong Polytechnic University

A modern browser-based webcam application that integrates real-time AI-powered face detection, face recognition, and object detection capabilities — all running entirely in the browser with zero server-side processing.

🔗 Live Demo: https://weisongwen.github.io/iotProjectCam
📰 Project Post: https://weisongwen.github.io/talks/2026-02-18-iot-cam-project

Abstract

The rapid proliferation of Internet of Things (IoT) devices and edge computing has created new opportunities for deploying artificial intelligence (AI) capabilities directly on end-user devices without reliance on cloud-based infrastructure. This project presents IoT Cam, a browser-based intelligent webcam application that integrates real-time face detection, face recognition, object detection, and facial attribute analysis, all executed entirely on-device through modern web technologies. Unlike conventional AI-powered surveillance systems that require dedicated GPU servers or cloud API calls, IoT Cam leverages TensorFlow.js and client-side inference to deliver a fully functional computer vision pipeline within a standard web browser. Because no frame leaves the device, processing incurs no network round-trip latency and image data stays private.

The system architecture comprises three core AI modules. First, a face detection and analysis module built on the @vladmandic/face-api.js library employs a TinyFaceDetector for real-time face localization, a 68-point facial landmark model for geometric feature extraction, and dedicated neural networks for expression classification (seven categories), age regression, and gender estimation. Second, an in-browser face recognition module lets users register known individuals by capturing their face descriptors (128-dimensional feature vectors). Faces seen later are matched against this locally stored database by Euclidean distance, with configurable similarity thresholds to balance precision and recall. Third, a COCO-SSD object detection module based on MobileNet v2 provides real-time classification across 80 object categories with adjustable confidence thresholds and YOLO-style visual overlays.

IoT Cam is implemented using pure HTML, CSS, and JavaScript — requiring no frameworks, build tools, or server-side components. The application utilizes the browser MediaDevices API for camera access and the MediaRecorder API for video recording, supporting resolutions from 480p to 4K. A comprehensive detection output dashboard provides session-level analytics including cumulative detection counts, average inference times, object class distributions, and facial attribute histograms, with CSV export functionality for offline analysis. Additional features include real-time image adjustment filters, visual presets, snapshot capture with a lightbox gallery, and keyboard shortcuts for efficient operation.
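The session-level analytics described above (cumulative counts, average inference time, object class distribution) can be illustrated with a small accumulator. This is a minimal sketch, not the code in app.js: the function names, the stats fields, and the 0.5 default score cutoff are all assumptions made for illustration. The only shape borrowed from a real library is the `{ class, score }` pair that COCO-SSD predictions carry.

```javascript
// Hypothetical session accumulator; names and fields are illustrative only.
function createSessionStats() {
  return { frames: 0, totalInferenceMs: 0, objectCounts: {} };
}

// Record one analyzed frame: its inference time and its detections.
// `detections` is assumed to look like COCO-SSD output: [{ class, score }, ...].
function recordFrame(stats, inferenceMs, detections, minScore = 0.5) {
  stats.frames += 1;
  stats.totalInferenceMs += inferenceMs;
  for (const { class: label, score } of detections) {
    if (score < minScore) continue; // skip predictions below the confidence cutoff
    stats.objectCounts[label] = (stats.objectCounts[label] || 0) + 1;
  }
  return stats;
}

// Mean inference time over all recorded frames (0 when nothing was recorded).
function averageInferenceMs(stats) {
  return stats.frames === 0 ? 0 : stats.totalInferenceMs / stats.frames;
}
```

Keeping the accumulator free of DOM access means the same object could feed both an on-screen panel and an export routine.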

Experimental evaluation demonstrates that the system achieves stable real-time video rendering at approximately 48 frames per second, while detection inference runs in cycles of approximately 500 milliseconds, on consumer-grade hardware. The privacy-by-design architecture ensures that no images, video frames, or biometric descriptors are transmitted to external servers, with all persistent data stored exclusively in the browser's localStorage. IoT Cam serves as both a practical prototype for privacy-preserving IoT surveillance and an educational tool for courses in computer vision, embedded AI, and IoT systems at The Hong Kong Polytechnic University.

Keywords: IoT, edge AI, face detection, face recognition, object detection, TensorFlow.js, browser-based inference, privacy-preserving AI, computer vision

System Overview

The figure below illustrates the system architecture of IoT Cam. The application runs entirely client-side within a standard web browser, receiving real-time video streams via the MediaDevices API. Input frames are processed by three core AI modules: (1) a Face Detection & Analysis Module powered by @vladmandic/face-api.js for localization, landmark extraction, expression classification, age regression, and gender estimation; (2) a Face Recognition Module that extracts 128-dimensional descriptors and performs Euclidean distance matching against a locally stored database; and (3) an Object Detection Module using COCO-SSD (MobileNet v2) for 80-category real-time classification. All detection results are rendered as real-time overlays and fed into an analytics dashboard with cumulative statistics, histograms, and CSV export. The entire pipeline adheres to a privacy-by-design principle — zero data is transmitted to cloud or external servers.
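The Euclidean distance matching performed by the Face Recognition Module can be sketched as a nearest-neighbour search over the registered descriptors. The helper names and the 0.6 default threshold below are assumptions for illustration; the project's actual threshold is configurable and is not stated here.

```javascript
// Euclidean (L2) distance between two descriptors of equal length.
function euclideanDistance(a, b) {
  if (a.length !== b.length) throw new Error('Descriptor lengths differ');
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// Hypothetical matcher: returns the closest registered face, or 'unknown'
// when even the closest one is farther away than the threshold.
// `registered` is assumed to be [{ name, descriptor }, ...].
function findBestMatch(query, registered, threshold = 0.6) {
  let best = { name: null, distance: Infinity };
  for (const { name, descriptor } of registered) {
    const distance = euclideanDistance(query, descriptor);
    if (distance < best.distance) best = { name, distance };
  }
  return best.distance <= threshold
    ? best
    : { name: 'unknown', distance: best.distance };
}
```

Lowering the threshold makes matches stricter (fewer false identifications, more faces reported as unknown); raising it does the opposite.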

Key Features

Real-Time Face Detection & Recognition — Powered by @vladmandic/face-api.js (a maintained fork compatible with TensorFlow.js 3.x), the system detects faces in real time, identifies facial landmarks (68-point model), recognizes expressions (happy, sad, neutral, angry, surprised, etc.), and estimates age and gender.
Face Registration & Identification — Users can register known faces by name directly in the browser. The system then identifies registered individuals in real time, displaying match confidence and highlighting unregistered persons with visual warnings. All face descriptor data is stored locally in the browser — nothing is sent to any server. Supports export/import of registered faces as JSON files for backup and transfer.
Object Detection (COCO-SSD / YOLO-style) — Using TensorFlow.js and the COCO-SSD model (MobileNet v2), the application detects and classifies 80 object categories in real time, with adjustable confidence thresholds and color-coded bounding boxes with corner accents.
Detection Output Dashboard — A comprehensive analytics panel provides session statistics (total faces/objects detected, average inference time, frames analyzed), object class breakdowns, face detail breakdowns (gender distribution, average age, expression histogram), and a real-time scrolling detection log with export-to-CSV functionality.
Camera Controls & Image Processing — Supports multiple camera sources, resolution selection (480p to 4K), real-time image adjustments (brightness, contrast, saturation, hue), visual presets (Night Vision, Grayscale, Sepia, High Contrast, etc.), and mirror/flip transforms.
Snapshot & Video Recording — Take snapshots with a built-in lightbox gallery, or record video clips (WebM format) for later review.
Privacy by Design — All AI inference runs on-device using TensorFlow.js. No images, video frames, or face descriptors are ever transmitted to external servers. Face registration data persists only in the browser's localStorage.
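The export-to-CSV feature of the detection log can be sketched as a pure function that turns log entries into CSV text. The entry fields used here (`timestamp`, `type`, `label`, `confidence`) are hypothetical, chosen only to make the example concrete; the real log columns may differ.

```javascript
// Quote a value when it contains a comma, quote, or line break.
function csvEscape(value) {
  const text = String(value ?? '');
  return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// Hypothetical exporter: one header row, then one row per log entry.
function detectionLogToCsv(entries) {
  const columns = ['timestamp', 'type', 'label', 'confidence'];
  const lines = [columns.join(',')];
  for (const entry of entries) {
    lines.push(columns.map((c) => csvEscape(entry[c])).join(','));
  }
  return lines.join('\r\n');
}
```

In a browser, the resulting string could be wrapped in a `Blob` and offered as a download through an object URL, which keeps the export entirely local.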

Results

The screenshot below shows the IoT Cam system in action, demonstrating simultaneous person detection (81% confidence), face recognition (75% match for a registered user "weisong"), age/gender estimation (♂ 20y), and expression analysis (Neutral 100%), all running at 48 FPS in the browser.

Technical Stack

The application is built with pure web technologies — no frameworks or build tools required:

| Technology | Purpose |
| --- | --- |
| TensorFlow.js 3.x | On-device AI inference |
| COCO-SSD 2.2.3 | 80-class object detection (MobileNet v2) |
| @vladmandic/face-api 1.7.14 | Face detection, landmarks, expressions, age/gender, recognition |
| MediaDevices API | Camera access (`getUserMedia`) |
| MediaRecorder API | Video recording |
| HTML / CSS / JavaScript | Pure web, no frameworks or build tools |

Getting Started

1. Open index.html in any modern browser (Chrome, Edge, Firefox, Safari).
2. Click Start Camera and allow camera permissions when prompted.
3. Use the sidebar controls to adjust the image and enable AI detection.
4. Register faces by entering a name and clicking 📸 Register Face.

Note: A webcam or camera device is required. HTTPS or localhost is needed for camera access in most browsers.
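For readers curious how a resolution choice reaches the camera, the sketch below builds a constraints object of the kind `navigator.mediaDevices.getUserMedia` accepts. The resolution table and the function name are assumptions for illustration: the application supports 480p to 4K, but the exact labels and pixel sizes it uses are not stated here.

```javascript
// Hypothetical resolution table; labels and pixel sizes are illustrative.
const RESOLUTIONS = {
  '480p': { width: 640, height: 480 },
  '720p': { width: 1280, height: 720 },
  '1080p': { width: 1920, height: 1080 },
  '4K': { width: 3840, height: 2160 },
};

// Build getUserMedia constraints for a resolution label and an optional camera.
// `ideal` lets the browser fall back to the nearest size the camera supports.
function buildVideoConstraints(label, deviceId) {
  const size = RESOLUTIONS[label];
  if (!size) throw new Error(`Unknown resolution: ${label}`);
  const video = {
    width: { ideal: size.width },
    height: { ideal: size.height },
  };
  if (deviceId) video.deviceId = { exact: deviceId };
  return { video, audio: false };
}

// In a browser (HTTPS or localhost), the result would be used like this:
//   const stream = await navigator.mediaDevices.getUserMedia(buildVideoConstraints('720p'));
//   videoElement.srcObject = stream;
```

Using `ideal` rather than `exact` for the size means a camera that cannot deliver 4K still starts at its best available resolution instead of failing.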

Keyboard Shortcuts

| Key | Action |
| --- | --- |
| `Space` | Start / Stop camera |
| `S` | Take snapshot |
| `R` | Start / Stop recording |
| `F` | Toggle fullscreen |
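The key-to-action pairs in the table above could be dispatched with a lookup like the following. This is a sketch under assumptions: the action names are made up, and skipping keystrokes typed into form fields (so that typing a name in the registration box does not trigger a snapshot) is a design choice of this example, not a documented behavior of the application.

```javascript
// Hypothetical action names; only the key-to-action pairing comes from the table.
const SHORTCUTS = {
  ' ': 'toggleCamera', // KeyboardEvent.key for the space bar is a single space
  s: 'takeSnapshot',
  r: 'toggleRecording',
  f: 'toggleFullscreen',
};

// Map a keydown-like event to an action name, or null when no shortcut applies.
function resolveShortcut(event) {
  const tag = event.target && event.target.tagName;
  if (tag === 'INPUT' || tag === 'TEXTAREA') return null; // user is typing text
  if (event.ctrlKey || event.metaKey || event.altKey) return null; // leave browser shortcuts alone
  const key = event.key.length === 1 ? event.key.toLowerCase() : event.key;
  return SHORTCUTS[key] || null;
}

// In a browser this could be wired up as:
//   document.addEventListener('keydown', (e) => { const action = resolveShortcut(e); /* ... */ });
```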

Privacy

🔒 All data stays in your browser. Nothing is ever sent to any server.

| Data | Stored where | Sent externally? |
| --- | --- | --- |
| Live video stream | Browser memory only | ❌ Never |
| Face descriptors | Browser `localStorage` | ❌ Never |
| Detection results | Browser canvas & DOM | ❌ Never |
| Exported JSON files | Your local disk | ❌ Never |
| AI model weights | Downloaded once from CDN | ✅ Model files only (no user data) |
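Storing face descriptors in `localStorage` or exporting them as JSON requires converting them to plain arrays, because typed arrays do not serialize as JSON arrays. The sketch below shows one way to round-trip them. The record shape `{ name, descriptor }` and the storage key in the comment are assumptions for illustration, not the format the application actually writes.

```javascript
// Convert registered faces to a JSON string.
// Each descriptor is assumed to be a Float32Array; Array.from makes it a plain array.
function serializeFaces(faces) {
  return JSON.stringify(
    faces.map(({ name, descriptor }) => ({ name, descriptor: Array.from(descriptor) }))
  );
}

// Parse a JSON string back into faces with Float32Array descriptors.
function deserializeFaces(json) {
  const parsed = JSON.parse(json);
  if (!Array.isArray(parsed)) throw new Error('Expected an array of faces');
  return parsed.map(({ name, descriptor }) => ({
    name,
    descriptor: new Float32Array(descriptor),
  }));
}

// In a browser (the key name is made up for this example):
//   localStorage.setItem('iotcam.faces', serializeFaces(faces));
//   const faces = deserializeFaces(localStorage.getItem('iotcam.faces') || '[]');
```

The same string can serve both purposes: persisted under a `localStorage` key, or saved to disk as the exported JSON file.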

Significance

This project demonstrates the feasibility of deploying sophisticated AI perception pipelines entirely within a web browser, eliminating the need for dedicated GPU servers or cloud-based inference. It serves as a prototype for privacy-preserving IoT surveillance systems and a teaching tool for courses related to computer vision, IoT, and embedded AI at PolyU.

The IoT Cam project was developed at the TAS Lab under the supervision of Dr. Weisong Wen, supporting the lab's mission of building trustworthy and accessible AI systems for autonomous applications.
