We present Deep-Check, a multi-modal continuous identity verification platform that combines keystroke dynamics biometrics, facial liveness detection, and document forensics to detect fraud in remote sessions and document submission workflows. The system operates entirely client-side for biometric signal extraction, transmitting only derived feature vectors for server-side ML inference. We describe the architecture of three analytical modules — behavioural biometrics, anti-deepfake liveness, and image forensics — and report performance characteristics under controlled evaluation conditions. The platform is designed to comply with GDPR, the EU AI Act, and best practices for privacy-by-design in biometric systems.
The proliferation of generative AI tools (large language models, image synthesis, voice cloning, and deepfake video generation) has fundamentally altered the threat landscape for remote identity verification. A remote candidate can now pass a technical interview using LLM-generated code, present a synthetic face via a virtual camera, and submit AI-generated supporting documents — all while appearing entirely legitimate to a human reviewer.
Existing point-in-time identity verification solutions (document scanning at login, facial recognition at session start) are insufficient because they verify identity once and assume continuity. Deep-Check addresses this by continuously verifying behavioural consistency throughout a session, not just at its inception.
This whitepaper describes the technical design of three integrated modules:

1. **Behavioural biometrics**: continuous keystroke-dynamics analysis of typing sessions
2. **Anti-deepfake liveness**: client-side facial liveness and spoof detection
3. **Image forensics**: manipulation and AI-generation analysis of submitted documents
Keystroke dynamics are captured at the DOM event level via `keydown` and `keyup` listeners on a Monaco editor instance. Two primary timing signals are extracted:

- **Flight time**: the interval between the `keyup` of key N and the `keydown` of key N+1. The human neuromotor minimum is approximately 15ms; values below this threshold indicate synthetic input.
- **Hold time**: the interval between `keydown` and `keyup` for a single key, typically 40–120ms in natural typing.

In addition to single-key timing, bigram (digraph) timing is collected: the flight time for specific key-pair transitions (e.g., "t→h", "i→o"). These transition times are highly stable within an individual and vary significantly across individuals, making them useful for identity matching beyond aggregate statistics.
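The capture step can be sketched as a pure function over timestamped key events. All identifiers and the event shape here are illustrative, not Deep-Check's internals:

```typescript
// Illustrative: derive flight and hold times from timestamped key events.
// In the browser these would be fed from keydown/keyup listeners with
// timestamps taken via performance.now().
interface KeyEvent {
  type: "keydown" | "keyup";
  key: string;
  t: number; // milliseconds
}

interface Timings {
  holds: number[];   // keydown → keyup duration per key press
  flights: number[]; // keyup of key N → keydown of key N+1
}

function extractTimings(events: KeyEvent[]): Timings {
  const holds: number[] = [];
  const flights: number[] = [];
  const downAt = new Map<string, number>();
  let lastKeyup: number | null = null;

  for (const e of events) {
    if (e.type === "keydown") {
      if (lastKeyup !== null) flights.push(e.t - lastKeyup);
      downAt.set(e.key, e.t);
    } else {
      const d = downAt.get(e.key);
      if (d !== undefined) holds.push(e.t - d);
      downAt.delete(e.key);
      lastKeyup = e.t;
    }
  }
  return { holds, flights };
}

// Flag flight times below the ~15ms human neuromotor minimum.
// Negative flights (key rollover) are excluded.
function syntheticFlightCount(flights: number[], floorMs = 15): number {
  return flights.filter((f) => f >= 0 && f < floorMs).length;
}
```

A session typing "the" at human speed yields two flight intervals; any interval under the 15ms floor is counted as a synthetic-input indicator.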
A proprietary multi-dimensional feature vector is computed from a rolling window of keystroke events and submitted to the ML inference endpoint. The vector spans four families of biometric signals:
| Family | Signal Type | Description |
|---|---|---|
| Temporal dynamics | Flight & hold statistics | Mean, standard deviation, skewness, and kurtosis of inter-key intervals and key-hold durations. Captures the stochastic variability unique to human motor execution. |
| Entropic structure | Shannon entropy (multi-channel) | Information-theoretic measure of distributional regularity applied independently to flight and hold histograms. Synthetic input exhibits characteristically low entropy. |
| Rhythmic periodicity | Spectral analysis (FFT) | Dominant-frequency amplitude computed via Fast Fourier Transform over the keystroke time series. Automated tools produce detectable periodic patterns absent in human typing. |
| Temporal evolution | Velocity & fatigue signals | Linear trend and regression slope of typing speed over the session. Human typists exhibit measurable fatigue drift; programmatic input does not. |
| Micro-correction behaviour | Correction keystroke analysis | Statistical properties of correction keystrokes (timing, frequency, reaction latency) that reflect genuine cognitive load and error-correction cycles. |
| Bigram biometrics | Digraph pair consistency | Pair-wise inter-key interval variability across all observed character combinations. Each person exhibits a stable, unique bigram profile that is computationally expensive to replicate. |
| Burst injection detection | Sub-100ms key cluster rate | Rate of implausibly fast multi-key clusters per session volume. Paste injection, clipboard automation, and LLM-assisted input produce anomalous burst patterns. |
| Session throughput | Effective typing velocity | Derived words-per-minute with outlier sensitivity for both extremes of the human plausible range. |
Exact feature definitions, internal identifiers, and weighting coefficients are proprietary and withheld to prevent adversarial calibration. The full specification is available to authorised partners under NDA.
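Although the exact feature definitions are withheld, generic versions of two families from the table above (entropic structure and rhythmic periodicity) can be illustrated. The bin width, normalisation, and use of a plain DFT are assumptions for this sketch, not Deep-Check's specification:

```typescript
// Shannon entropy over a histogram of timing values (e.g. flight times).
// Machine-regular input concentrates into few bins and scores low.
function shannonEntropy(values: number[], binWidth = 10): number {
  const counts = new Map<number, number>();
  for (const v of values) {
    const bin = Math.floor(v / binWidth);
    counts.set(bin, (counts.get(bin) ?? 0) + 1);
  }
  let h = 0;
  for (const c of counts.values()) {
    const p = c / values.length;
    h -= p * Math.log2(p);
  }
  return h;
}

// Dominant non-DC spectral amplitude via a plain DFT. An FFT computes
// the same spectrum faster; O(n²) is acceptable for a rolling window.
function dominantAmplitude(series: number[]): number {
  const n = series.length;
  let max = 0;
  for (let k = 1; k <= Math.floor(n / 2); k++) {
    let re = 0, im = 0;
    for (let t = 0; t < n; t++) {
      re += series[t] * Math.cos((2 * Math.PI * k * t) / n);
      im -= series[t] * Math.sin((2 * Math.PI * k * t) / n);
    }
    max = Math.max(max, Math.hypot(re, im) / n);
  }
  return max; // a strong peak suggests periodic, automated input
}
```

Constant input yields zero entropy and zero non-DC amplitude; a strictly alternating series produces a pronounced spectral peak, which is the periodicity signature the table describes.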
The classification layer uses a dual-model ensemble architecture combining a supervised gradient-boosted classifier with an unsupervised anomaly detection layer trained exclusively on genuine human sessions. The ensemble design requires an adversary to simultaneously fool two independent statistical models — one optimised for class separation, one for novelty detection — substantially raising the cost of evasion attacks compared to single-model systems.
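A minimal sketch of how such a dual-model verdict might be fused. The thresholds and the OR-gate are hypothetical; the real fusion logic is not disclosed:

```typescript
// Hypothetical fusion rule: the supervised classifier emits P(synthetic)
// and the novelty detector an anomaly score in [0, 1]. Either model can
// raise the flag, so an adversary must evade both simultaneously.
function ensembleVerdict(pSynthetic: number, anomaly: number): "pass" | "flag" {
  const supervisedHit = pSynthetic >= 0.8; // placeholder threshold
  const noveltyHit = anomaly >= 0.9;       // placeholder threshold
  return supervisedHit || noveltyHit ? "flag" : "pass";
}
```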
Models are exported to ONNX format for server-side inference using onnxruntime-node, ensuring deterministic, version-controlled inference independent of client device capabilities. Model artifacts are stored outside the public HTTP path and are not directly accessible to clients.
Internal architecture details (tree count, feature weights, decision thresholds, and training data distributions) are withheld to prevent adversarial calibration.
When an enrollment profile exists for the session candidate, a Mahalanobis distance comparison is performed between the live session's feature distribution and the enrollment baseline. The identity match score is computed as:
match_score = 100 × exp(−λ × √Σ((xᵢ − μᵢ)² / σᵢ²))

The decay constant λ is calibrated from enrollment validation data and withheld.
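Under a diagonal-covariance assumption, the inner term is the Mahalanobis distance between the live feature vector and the enrollment baseline, and the formula can be computed directly. The λ value here is a placeholder, since the calibrated constant is withheld:

```typescript
// Identity match score per the formula above.
function matchScore(
  live: number[],   // xᵢ: live-session feature values
  mu: number[],     // μᵢ: enrollment baseline means
  sigma: number[],  // σᵢ: enrollment baseline standard deviations
  lambda = 0.5      // λ: placeholder; the real value is withheld
): number {
  let sum = 0;
  for (let i = 0; i < live.length; i++) {
    const d = (live[i] - mu[i]) / sigma[i]; // per-feature z-score
    sum += d * d;
  }
  return 100 * Math.exp(-lambda * Math.sqrt(sum));
}
```

A live session identical to the baseline scores exactly 100; the score decays exponentially as the distance grows.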
The Welford online algorithm is used to update the adaptive baseline during a session, allowing the system to account for fatigue and context-switching without being locked to initial typing conditions.
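Welford's update can be sketched as follows; field names are illustrative:

```typescript
// Welford's online algorithm: numerically stable running mean and
// variance, updated one observation (e.g. flight time) at a time, so
// the baseline can drift with fatigue without storing raw history.
interface RunningStats {
  n: number;
  mean: number;
  m2: number; // sum of squared deviations from the current mean
}

function welfordUpdate(s: RunningStats, x: number): RunningStats {
  const n = s.n + 1;
  const delta = x - s.mean;
  const mean = s.mean + delta / n;
  const m2 = s.m2 + delta * (x - mean);
  return { n, mean, m2 };
}

function variance(s: RunningStats): number {
  return s.n > 1 ? s.m2 / (s.n - 1) : 0; // sample variance
}
```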
Liveness detection runs entirely client-side using face-api.js with the TinyFaceDetector (Tiny YOLOv2-derived, ~190KB) and the 68-point FaceLandmark68Net models, both loaded from /public/models/. No video data is transmitted to the server.
| Signal | Method | Deepfake/Spoof Indicator |
|---|---|---|
| Blink detection | Eye Aspect Ratio (EAR) via landmarks 36–41, 42–47 | Blink rate <2/min (photo) or >50/min (artifact). GAN deepfakes often blink at unnatural rates |
| Micro-saccades | Variance of horizontal gaze ratio across 60-frame history | Real eyes have micro-jerk movements; deepfake video is unnaturally smooth (score <10 = suspicious) |
| Lighting challenge | Screen flashes white (500ms); EAR measured during flash | Real pupils constrict and gaze changes; pre-recorded video shows no response |
| Blink edge trajectory | Eyelid closure speed symmetry analysis | AI renderers often show unnatural snap-close without the natural asymmetric trajectory |
| Oculo-manual desync | Cross-correlation between cursor movement and gaze direction | Virtual camera: cursor active but gaze frozen on fixed point |
| Micro-movements | Nose tip position variance over time | Photo: zero variance. Deepfake: artificially periodic. Human: stochastic |
Conservative thresholds, multi-frame consensus requirements, and 60-second cooldowns between alerts prevent alert flooding from normal user behaviour (reading pauses, natural gaze variation, corrective blinking). All camera-based alerts require a 7-second startup grace period to account for camera initialisation artefacts.
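The blink-detection signal in the table relies on the Eye Aspect Ratio. A sketch of the standard EAR computation over one eye's six landmarks follows; this is the common Soukupová & Čech formulation, assumed here rather than confirmed as Deep-Check's exact variant:

```typescript
// EAR = (‖p2−p6‖ + ‖p3−p5‖) / (2·‖p1−p4‖), computed over the six
// landmarks of one eye (points 36–41 or 42–47 of the 68-point model).
interface Pt { x: number; y: number }

const dist = (a: Pt, b: Pt) => Math.hypot(a.x - b.x, a.y - b.y);

function eyeAspectRatio(eye: Pt[]): number {
  // eye[0..5] = p1..p6: outer corner, two upper-lid points,
  // inner corner, two lower-lid points.
  const vertical = dist(eye[1], eye[5]) + dist(eye[2], eye[4]);
  const horizontal = 2 * dist(eye[0], eye[3]);
  return vertical / horizontal;
}

// EAR collapses toward 0 as the eye closes; a frame counts toward a
// blink when EAR drops below a threshold (~0.2 is a common choice).
const isClosed = (ear: number, threshold = 0.2) => ear < threshold;
```

Counting closed-frame runs over time yields the blinks-per-minute rate that the photo (<2/min) and artifact (>50/min) thresholds are applied to.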
ELA exploits the lossy compression model of JPEG encoding. When a JPEG image is re-saved at a known quality level (Q=75), regions that have already been compressed at that quality show minimal change, while regions that were edited and re-saved at a different quality show larger discrepancies. This differential is amplified (×12) and rendered as a heatmap.
The ELA score is derived from the mean heatmap brightness (normalised to 255) and the fraction of 8×8 blocks with mean brightness above a 40-point threshold:
ela_score = 0.60 × (mean_diff / 255 × 100) + 0.40 × (suspicious_blocks / total_blocks × 100)

AI-generated images that have never been JPEG-compressed score anomalously low on ELA (the absence of compression artefacts is itself a signal). This is captured by the noise module.
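The score can be computed from the amplified heatmap directly. This sketch assumes a decoded grayscale heatmap (per-pixel brightness 0–255 after the ×12 amplification) and omits the JPEG re-encoding step itself:

```typescript
// ELA score per the formula above: a weighted blend of mean heatmap
// brightness and the fraction of 8×8 blocks above a 40-point threshold.
function elaScore(heatmap: number[][], blockThreshold = 40): number {
  const h = heatmap.length, w = heatmap[0].length;

  let sum = 0;
  for (const row of heatmap) for (const v of row) sum += v;
  const meanDiff = sum / (h * w);

  let suspicious = 0, total = 0;
  for (let by = 0; by < h; by += 8) {
    for (let bx = 0; bx < w; bx += 8) {
      let bSum = 0, bN = 0;
      for (let y = by; y < Math.min(by + 8, h); y++) {
        for (let x = bx; x < Math.min(bx + 8, w); x++) {
          bSum += heatmap[y][x];
          bN++;
        }
      }
      total++;
      if (bSum / bN > blockThreshold) suspicious++;
    }
  }
  return 0.6 * (meanDiff / 255) * 100 + 0.4 * (suspicious / total) * 100;
}
```

A uniformly dark heatmap (no re-compression discrepancy) scores 0; a uniformly saturated one scores 100.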
EXIF metadata is extracted using exifr (client-side, no server upload). Anomaly signals include:
- Editing or generation software recorded in the `Software` or `CreatorTool` fields
- A gap between `DateTimeOriginal` and `DateTime` exceeding 60 seconds
- Missing `Make` and `Model` fields (camera metadata is always present in genuine device captures)

Images generated by diffusion models (Stable Diffusion, DALL-E, Midjourney) and GAN architectures exhibit characteristically uniform noise distributions. Real photographs contain heterogeneous noise from sensor shot noise, JPEG quantisation, and scene variation. Deep-Check computes:
noise_score = 0.65 × uniformity_score + 0.35 × laplacian_flag

The three module scores are combined into an overall document risk score:

risk_score = 0.50 × ela_score + 0.30 × exif_score + 0.20 × noise_score

Risk levels: 0–29 = Clean · 30–59 = Suspicious · 60–100 = High Risk
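The aggregation is straightforward to express; module scores are assumed to be already normalised to 0–100:

```typescript
// Overall risk per the weighting above (ELA 50%, EXIF 30%, noise 20%).
function riskScore(ela: number, exif: number, noise: number): number {
  return 0.5 * ela + 0.3 * exif + 0.2 * noise;
}

// Bucket the combined score into the three published risk levels.
function riskLevel(score: number): "Clean" | "Suspicious" | "High Risk" {
  if (score < 30) return "Clean";
  if (score < 60) return "Suspicious";
  return "High Risk";
}
```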
Deep-Check is built privacy-first:

- All biometric signal extraction runs client-side; only derived feature vectors are transmitted for server-side inference
- Liveness detection runs entirely in the browser; no video data leaves the user's device
- EXIF metadata extraction is performed client-side, with no server upload
| Item | Target |
|---|---|
| Independent algorithm audit by external cybersecurity firm | Q3 2026 |
| Validation study on real-world diverse keystroke population | Q3 2026 |
| DPIA (Data Protection Impact Assessment) completion | Q2 2026 |
| EU AI Act technical documentation (Article 11) | Q3 2026 |
| ISO 27001 certification process initiation | Q1 2027 |
| ENS (Esquema Nacional de Seguridad) certification | Q2 2027 |
| Publication of peer-reviewed technical paper | Q4 2026 |
| Video deepfake detection via temporal consistency analysis | Q4 2026 |
| PDF document forensics (embedded image ELA, font analysis) | Q3 2026 |
For technical questions, partnership inquiries, or to request a Data Processing Agreement:
Deep-Check Technical Whitepaper v1.0 · February 2026 · Deep-Check Inc.
This document is provided for informational purposes. Performance metrics are derived from internal evaluation and are subject to revision following independent audit.