AirHandWriter: Real-Time Hand Tracking

Overview

AirHandWriter is a system for real-time hand tracking and pinch detection from a standard webcam feed. A neural network detects the positions of the thumb and index fingertips and classifies whether they are pinching, which enables camera-only computer interaction.

The project encompasses a full machine learning pipeline, including data collection, a high-throughput CUDA-accelerated augmentation pipeline, model training, and a Flask-based web dashboard for managing experiments.

Key Features

How It Works

The system is broken down into a data pipeline, a training process, and an inference engine.

1. Data Collection & Preparation

2. Training Process

3. Inference Process

Technical Details

Model Architecture

The core of the system is a SimpleResNet, a lightweight variant of the ResNet architecture. It is designed to be efficient enough for real-time inference while remaining deep enough to learn complex spatial features from hand images. The final layers of the network are fully connected and output either 4 continuous values for keypoint regression (the coordinates of the thumb and index fingertips) or 6 values (4 for keypoints, 2 for pinch classification).
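As a rough illustration of how such an output vector might be consumed, the sketch below splits a 4- or 6-value output into two fingertip positions and an optional pinch decision. The value ordering (thumb x, thumb y, index x, index y), the interpretation of the last two values as logits for "not pinching" and "pinching", the softmax step, and the names decode_output and pinch_threshold are all assumptions made for this example; the project's actual post-processing may differ.

```python
import math


def decode_output(values, pinch_threshold=0.5):
    """Split a raw 4- or 6-value output into keypoints and an optional pinch flag.

    Assumed layout (hypothetical, not confirmed by the project):
      values[0:2] -> thumb tip (x, y)
      values[2:4] -> index tip (x, y)
      values[4:6] -> logits for [not pinching, pinching]
    """
    if len(values) not in (4, 6):
        raise ValueError("expected 4 or 6 output values")

    result = {
        "thumb": (values[0], values[1]),
        "index": (values[2], values[3]),
        "pinch_prob": None,  # stays None for the regression-only variant
        "pinching": None,
    }

    if len(values) == 6:
        no_pinch_logit, pinch_logit = values[4], values[5]
        # Numerically stable two-class softmax.
        m = max(no_pinch_logit, pinch_logit)
        e_no = math.exp(no_pinch_logit - m)
        e_yes = math.exp(pinch_logit - m)
        prob = e_yes / (e_no + e_yes)
        result["pinch_prob"] = prob
        result["pinching"] = prob >= pinch_threshold

    return result


if __name__ == "__main__":
    print(decode_output([0.41, 0.52, 0.44, 0.50, -2.0, 2.0]))
```

Keeping the 4-value and 6-value cases in one function means downstream code can check for a missing pinch flag instead of branching on the model variant.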

CUDA-Accelerated Augmentation

To prevent the data pipeline from becoming a bottleneck during training, a custom PyTorch extension was written in C++ and CUDA (augment.cu). This module implements transformations such as rotation, translation, and resized cropping directly on the GPU. Because the augmentations run in parallel on the GPU, augmented batches do not have to be produced on the CPU and then copied to the GPU one at a time, which speeds up training.
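To make the per-pixel work concrete, here is a small CPU reference in plain Python for rotation and translation. It is not the code in augment.cu: it only sketches the kind of independent, per-output-pixel computation that a GPU kernel can run in parallel. Nearest-neighbor sampling, the constant fill value, rotation about the image center, and the names source_coord and augment_image are assumptions for this example; resized cropping is left out.

```python
import math


def source_coord(x_out, y_out, width, height, angle_deg, tx, ty):
    """Map an output pixel back to the input location it is sampled from."""
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Undo the translation, then undo the rotation about the image center.
    dx, dy = x_out - tx - cx, y_out - ty - cy
    x_src = cos_t * dx + sin_t * dy + cx
    y_src = -sin_t * dx + cos_t * dy + cy
    return x_src, y_src


def augment_image(image, angle_deg=0.0, tx=0.0, ty=0.0, fill=0):
    """Rotate and translate a 2D list-of-lists image with nearest-neighbor sampling."""
    height, width = len(image), len(image[0])
    out = [[fill] * width for _ in range(height)]
    # Each (x, y) iteration is independent of the others; in a GPU version,
    # one thread per output pixel could perform this body.
    for y in range(height):
        for x in range(width):
            xs, ys = source_coord(x, y, width, height, angle_deg, tx, ty)
            xi, yi = int(round(xs)), int(round(ys))
            if 0 <= xi < width and 0 <= yi < height:
                out[y][x] = image[yi][xi]
    return out


if __name__ == "__main__":
    img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    for row in augment_image(img, tx=1.0):
        print(row)
```

Mapping from output pixels back to input coordinates, rather than pushing input pixels forward, ensures every output pixel is written exactly once, which is why this formulation parallelizes cleanly.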

Training & Experiment Dashboard

A web-based dashboard built with Flask and Chart.js serves as the control center for the project. It allows a user to:

Future Work