We present Deep-Check, a multi-modal continuous identity verification platform that combines keystroke dynamics biometrics, facial liveness detection, and document forensics to detect fraud in remote sessions and document submission workflows. The system operates entirely client-side for biometric signal extraction, transmitting only derived feature vectors for server-side ML inference. We describe the architecture of three analytical modules — behavioural biometrics, anti-deepfake liveness, and image forensics — and report performance characteristics under controlled evaluation conditions. The platform is designed to comply with GDPR, the EU AI Act, and best practices for privacy-by-design in biometric systems.
The proliferation of generative AI tools (large language models, image synthesis, voice cloning, and deepfake video generation) has fundamentally altered the threat landscape for remote identity verification. A remote candidate can now pass a technical interview using LLM-generated code, present a synthetic face via a virtual camera, and submit AI-generated supporting documents — all while appearing entirely legitimate to a human reviewer.
Existing point-in-time identity verification solutions (document scanning at login, facial recognition at session start) are insufficient because they verify identity once and assume continuity. Deep-Check addresses this by continuously verifying behavioural consistency throughout a session, not just at its inception.
This whitepaper describes the technical design of three integrated modules:

1. **Behavioural biometrics**: continuous keystroke-dynamics analysis of typing sessions
2. **Anti-deepfake liveness**: client-side facial liveness and spoof detection
3. **Image forensics**: manipulation and AI-generation analysis of submitted documents
Keystroke dynamics are captured at the DOM event level via `keydown` and `keyup` listeners on a Monaco editor instance. Two primary timing signals are extracted:

- **Flight time**: the interval between the `keyup` of key N and the `keydown` of key N+1. The human neuromotor minimum is approximately 15ms; values below this threshold indicate synthetic input.
- **Hold time**: the interval between `keydown` and `keyup` for a single key, typically 40–120ms in natural typing.

In addition to single-key timing, bigram (digraph) timing is collected: the flight time for specific key-pair transitions (e.g., "t→h", "i→o"). These transition times are highly stable within an individual and vary significantly across individuals, making them useful for identity matching beyond aggregate statistics.
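The capture step can be sketched as a pure function over timestamped key events. All identifiers and the event shape here are illustrative, not Deep-Check's internals:

```typescript
// Illustrative: derive flight and hold times from timestamped key events.
// In the browser these would be fed from keydown/keyup listeners with
// timestamps taken via performance.now().
interface KeyEvent {
  type: "keydown" | "keyup";
  key: string;
  t: number; // milliseconds
}

interface Timings {
  holds: number[];   // keydown → keyup duration per key press
  flights: number[]; // keyup of key N → keydown of key N+1
}

function extractTimings(events: KeyEvent[]): Timings {
  const holds: number[] = [];
  const flights: number[] = [];
  const downAt = new Map<string, number>();
  let lastKeyup: number | null = null;

  for (const e of events) {
    if (e.type === "keydown") {
      if (lastKeyup !== null) flights.push(e.t - lastKeyup);
      downAt.set(e.key, e.t);
    } else {
      const d = downAt.get(e.key);
      if (d !== undefined) holds.push(e.t - d);
      downAt.delete(e.key);
      lastKeyup = e.t;
    }
  }
  return { holds, flights };
}

// Flag flight times below the ~15ms human neuromotor minimum.
// Negative flights (key rollover) are excluded.
function syntheticFlightCount(flights: number[], floorMs = 15): number {
  return flights.filter((f) => f >= 0 && f < floorMs).length;
}
```

A session typing "the" at human speed yields two flight intervals; any interval under the 15ms floor is counted as a synthetic-input indicator.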
A proprietary multi-dimensional feature vector is computed from a rolling window of keystroke events and submitted to the ML inference endpoint. The vector spans four families of biometric signals:
| Family | Signal Type | Description |
|---|---|---|
| Temporal dynamics | Flight & hold statistics | Mean, standard deviation, skewness, and kurtosis of inter-key intervals and key-hold durations. Captures the stochastic variability unique to human motor execution. |
| Entropic structure | Shannon entropy (multi-channel) | Information-theoretic measure of distributional regularity applied independently to flight and hold histograms. Synthetic input exhibits characteristically low entropy. |
| Rhythmic periodicity | Spectral analysis (FFT) | Dominant-frequency amplitude computed via Fast Fourier Transform over the keystroke time series. Automated tools produce detectable periodic patterns absent in human typing. |
| Temporal evolution | Velocity & fatigue signals | Linear trend and regression slope of typing speed over the session. Human typists exhibit measurable fatigue drift; programmatic input does not. |
| Micro-correction behaviour | Correction keystroke analysis | Statistical properties of correction keystrokes (timing, frequency, reaction latency) that reflect genuine cognitive load and error-correction cycles. |
| Bigram biometrics | Digraph pair consistency | Pair-wise inter-key interval variability across all observed character combinations. Each person exhibits a stable, unique bigram profile that is computationally expensive to replicate. |
| Burst injection detection | Sub-100ms key cluster rate | Rate of implausibly fast multi-key clusters per session volume. Paste injection, clipboard automation, and LLM-assisted input produce anomalous burst patterns. |
| Session throughput | Effective typing velocity | Derived words-per-minute with outlier sensitivity for both extremes of the human plausible range. |
Exact feature definitions, internal identifiers, and weighting coefficients are proprietary and withheld to prevent adversarial calibration. The full specification is available to authorised partners under NDA.
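Although the exact feature definitions are withheld, generic versions of two families from the table above (entropic structure and rhythmic periodicity) can be illustrated. The bin width, normalisation, and use of a plain DFT are assumptions for this sketch, not Deep-Check's specification:

```typescript
// Shannon entropy over a histogram of timing values (e.g. flight times).
// Machine-regular input concentrates into few bins and scores low.
function shannonEntropy(values: number[], binWidth = 10): number {
  const counts = new Map<number, number>();
  for (const v of values) {
    const bin = Math.floor(v / binWidth);
    counts.set(bin, (counts.get(bin) ?? 0) + 1);
  }
  let h = 0;
  for (const c of counts.values()) {
    const p = c / values.length;
    h -= p * Math.log2(p);
  }
  return h;
}

// Dominant non-DC spectral amplitude via a plain DFT. An FFT computes
// the same spectrum faster; O(n²) is acceptable for a rolling window.
function dominantAmplitude(series: number[]): number {
  const n = series.length;
  let max = 0;
  for (let k = 1; k <= Math.floor(n / 2); k++) {
    let re = 0, im = 0;
    for (let t = 0; t < n; t++) {
      re += series[t] * Math.cos((2 * Math.PI * k * t) / n);
      im -= series[t] * Math.sin((2 * Math.PI * k * t) / n);
    }
    max = Math.max(max, Math.hypot(re, im) / n);
  }
  return max; // a strong peak suggests periodic, automated input
}
```

Constant input yields zero entropy and zero non-DC amplitude; a strictly alternating series produces a pronounced spectral peak, which is the periodicity signature the table describes.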
The classification layer uses a dual-model ensemble architecture combining a supervised gradient-boosted classifier with an unsupervised anomaly detection layer trained exclusively on genuine human sessions. The ensemble design requires an adversary to simultaneously fool two independent statistical models — one optimised for class separation, one for novelty detection — substantially raising the cost of evasion attacks compared to single-model systems.
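A minimal sketch of how such a dual-model verdict might be fused. The thresholds and the OR-gate are hypothetical; the real fusion logic is not disclosed:

```typescript
// Hypothetical fusion rule: the supervised classifier emits P(synthetic)
// and the novelty detector an anomaly score in [0, 1]. Either model can
// raise the flag, so an adversary must evade both simultaneously.
function ensembleVerdict(pSynthetic: number, anomaly: number): "pass" | "flag" {
  const supervisedHit = pSynthetic >= 0.8; // placeholder threshold
  const noveltyHit = anomaly >= 0.9;       // placeholder threshold
  return supervisedHit || noveltyHit ? "flag" : "pass";
}
```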
Models are exported to ONNX format for server-side inference using onnxruntime-node, ensuring deterministic, version-controlled inference independent of client device capabilities. Model artifacts are stored outside the public HTTP path and are not directly accessible to clients.
Internal architecture details (tree count, feature weights, decision thresholds, and training data distributions) are withheld to prevent adversarial calibration.
When an enrollment profile exists for the session candidate, a Mahalanobis distance comparison is performed between the live session's feature distribution and the enrollment baseline. The identity match score is computed as:
match_score = 100 × exp(−λ × √Σ((xᵢ − μᵢ)² / σᵢ²))

The decay constant λ is calibrated from enrollment validation data and withheld.
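Under a diagonal-covariance assumption, the inner term is the Mahalanobis distance between the live feature vector and the enrollment baseline, and the formula can be computed directly. The λ value here is a placeholder, since the calibrated constant is withheld:

```typescript
// Identity match score per the formula above.
function matchScore(
  live: number[],   // xᵢ: live-session feature values
  mu: number[],     // μᵢ: enrollment baseline means
  sigma: number[],  // σᵢ: enrollment baseline standard deviations
  lambda = 0.5      // λ: placeholder; the real value is withheld
): number {
  let sum = 0;
  for (let i = 0; i < live.length; i++) {
    const d = (live[i] - mu[i]) / sigma[i]; // per-feature z-score
    sum += d * d;
  }
  return 100 * Math.exp(-lambda * Math.sqrt(sum));
}
```

A live session identical to the baseline scores exactly 100; the score decays exponentially as the distance grows.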
The Welford online algorithm is used to update the adaptive baseline during a session, allowing the system to account for fatigue and context-switching without being locked to initial typing conditions.
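Welford's update can be sketched as follows; field names are illustrative:

```typescript
// Welford's online algorithm: numerically stable running mean and
// variance, updated one observation (e.g. flight time) at a time, so
// the baseline can drift with fatigue without storing raw history.
interface RunningStats {
  n: number;
  mean: number;
  m2: number; // sum of squared deviations from the current mean
}

function welfordUpdate(s: RunningStats, x: number): RunningStats {
  const n = s.n + 1;
  const delta = x - s.mean;
  const mean = s.mean + delta / n;
  const m2 = s.m2 + delta * (x - mean);
  return { n, mean, m2 };
}

function variance(s: RunningStats): number {
  return s.n > 1 ? s.m2 / (s.n - 1) : 0; // sample variance
}
```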
Liveness detection runs entirely client-side using face-api.js with the TinyFaceDetector (Tiny YOLOv2-derived, ~190KB) and the 68-point FaceLandmark68Net models, both loaded from /public/models/. No video data is transmitted to the server.
| Signal | Method | Deepfake/Spoof Indicator |
|---|---|---|
| Blink detection | Eye Aspect Ratio (EAR) via landmarks 36–41, 42–47 | Blink rate <2/min (photo) or >50/min (artifact). GAN deepfakes often blink at unnatural rates |
| Micro-saccades | Variance of horizontal gaze ratio across 60-frame history | Real eyes have micro-jerk movements; deepfake video is unnaturally smooth (score <10 = suspicious) |
| Lighting challenge | Screen flashes white (500ms); EAR measured during flash | Real pupils constrict and gaze changes; pre-recorded video shows no response |
| Blink edge trajectory | Eyelid closure speed symmetry analysis | AI renderers often show unnatural snap-close without the natural asymmetric trajectory |
| Oculo-manual desync | Cross-correlation between cursor movement and gaze direction | Virtual camera: cursor active but gaze frozen on fixed point |
| Micro-movements | Nose tip position variance over time | Photo: zero variance. Deepfake: artificially periodic. Human: stochastic |
Conservative thresholds, multi-frame consensus requirements, and 60-second cooldowns between alerts prevent alert flooding from normal user behaviour (reading pauses, natural gaze variation, corrective blinking). All camera-based alerts require a 7-second startup grace period to account for camera initialisation artefacts.
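The blink-detection signal in the table relies on the Eye Aspect Ratio. A sketch of the standard EAR computation over one eye's six landmarks follows; this is the common Soukupová & Čech formulation, assumed here rather than confirmed as Deep-Check's exact variant:

```typescript
// EAR = (‖p2−p6‖ + ‖p3−p5‖) / (2·‖p1−p4‖), computed over the six
// landmarks of one eye (points 36–41 or 42–47 of the 68-point model).
interface Pt { x: number; y: number }

const dist = (a: Pt, b: Pt) => Math.hypot(a.x - b.x, a.y - b.y);

function eyeAspectRatio(eye: Pt[]): number {
  // eye[0..5] = p1..p6: outer corner, two upper-lid points,
  // inner corner, two lower-lid points.
  const vertical = dist(eye[1], eye[5]) + dist(eye[2], eye[4]);
  const horizontal = 2 * dist(eye[0], eye[3]);
  return vertical / horizontal;
}

// EAR collapses toward 0 as the eye closes; a frame counts toward a
// blink when EAR drops below a threshold (~0.2 is a common choice).
const isClosed = (ear: number, threshold = 0.2) => ear < threshold;
```

Counting closed-frame runs over time yields the blinks-per-minute rate that the photo (<2/min) and artifact (>50/min) thresholds are applied to.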
ELA exploits the lossy compression model of JPEG encoding. When a JPEG image is re-saved at a known quality level (Q=75), regions that have already been compressed at that quality show minimal change, while regions that were edited and re-saved at a different quality show larger discrepancies. This differential is amplified (×12) and rendered as a heatmap.
The ELA score is derived from the mean heatmap brightness (normalised to 255) and the fraction of 8×8 blocks with mean brightness above a 40-point threshold:
ela_score = 0.60 × (mean_diff / 255 × 100) + 0.40 × (suspicious_blocks / total_blocks × 100)

AI-generated images that have never been JPEG-compressed score anomalously low on ELA (the absence of compression artefacts is itself a signal). This is captured by the noise module.
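The score can be computed from the amplified heatmap directly. This sketch assumes a decoded grayscale heatmap (per-pixel brightness 0–255 after the ×12 amplification) and omits the JPEG re-encoding step itself:

```typescript
// ELA score per the formula above: a weighted blend of mean heatmap
// brightness and the fraction of 8×8 blocks above a 40-point threshold.
function elaScore(heatmap: number[][], blockThreshold = 40): number {
  const h = heatmap.length, w = heatmap[0].length;

  let sum = 0;
  for (const row of heatmap) for (const v of row) sum += v;
  const meanDiff = sum / (h * w);

  let suspicious = 0, total = 0;
  for (let by = 0; by < h; by += 8) {
    for (let bx = 0; bx < w; bx += 8) {
      let bSum = 0, bN = 0;
      for (let y = by; y < Math.min(by + 8, h); y++) {
        for (let x = bx; x < Math.min(bx + 8, w); x++) {
          bSum += heatmap[y][x];
          bN++;
        }
      }
      total++;
      if (bSum / bN > blockThreshold) suspicious++;
    }
  }
  return 0.6 * (meanDiff / 255) * 100 + 0.4 * (suspicious / total) * 100;
}
```

A uniformly dark heatmap (no re-compression discrepancy) scores 0; a uniformly saturated one scores 100.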
EXIF metadata is extracted using exifr (client-side, no server upload). Anomaly signals include:
- Editing or generation software recorded in the `Software` or `CreatorTool` fields
- A gap between `DateTimeOriginal` and `DateTime` exceeding 60 seconds
- Missing `Make` and `Model` fields (camera metadata is always present in genuine device captures)

Images generated by diffusion models (Stable Diffusion, DALL-E, Midjourney) and GAN architectures exhibit characteristically uniform noise distributions. Real photographs contain heterogeneous noise from sensor shot noise, JPEG quantisation, and scene variation. Deep-Check computes:
noise_score = 0.65 × uniformity_score + 0.35 × laplacian_flag

The three module scores are combined into an overall document risk score:

risk_score = 0.50 × ela_score + 0.30 × exif_score + 0.20 × noise_score

Risk levels: 0–29 = Clean · 30–59 = Suspicious · 60–100 = High Risk
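The aggregation is straightforward to express; module scores are assumed to be already normalised to 0–100:

```typescript
// Overall risk per the weighting above (ELA 50%, EXIF 30%, noise 20%).
function riskScore(ela: number, exif: number, noise: number): number {
  return 0.5 * ela + 0.3 * exif + 0.2 * noise;
}

// Bucket the combined score into the three published risk levels.
function riskLevel(score: number): "Clean" | "Suspicious" | "High Risk" {
  if (score < 30) return "Clean";
  if (score < 60) return "Suspicious";
  return "High Risk";
}
```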
Deep-Check is built privacy-first:

- All biometric signal extraction runs client-side; only derived feature vectors are transmitted for server-side inference
- Liveness detection runs entirely in the browser; no video data leaves the user's device
- EXIF metadata extraction is performed client-side, with no server upload
| Item | Target |
|---|---|
| Independent algorithm audit by external cybersecurity firm | Q3 2026 |
| Validation study on real-world diverse keystroke population | Q3 2026 |
| DPIA (Data Protection Impact Assessment) completion | Q2 2026 |
| EU AI Act technical documentation (Article 11) | Q3 2026 |
| ISO 27001 certification process initiation | Q1 2027 |
| ENS (Esquema Nacional de Seguridad) certification | Q2 2027 |
| Publication of peer-reviewed technical paper | Q4 2026 |
| Video deepfake detection via temporal consistency analysis | Q4 2026 |
| PDF document forensics (embedded image ELA, font analysis) | Q3 2026 |
For technical questions, partnership inquiries, or to request a Data Processing Agreement:
Deep-Check Technical Whitepaper v1.0 · February 2026 · Deep-Check Inc.
This document is provided for informational purposes. Performance metrics are derived from internal evaluation and are subject to revision following independent audit.