A comprehensive typing biometrics system that identifies users based on their typing patterns using four different methods: statistical analysis, n-gram analysis, machine learning, and neural networks.
Statistical Method (Original)
- Uses mean and standard deviation of hold, flight, and down-down times
- Best for same text identification
- Fast and simple
- 6-feature vector: [mean_hold, std_hold, mean_flight, std_flight, mean_dd, std_dd]
N-Gram Analysis
- Analyzes timing patterns between character pairs (digraphs) and triplets (trigraphs)
- Works well with different texts
- More robust than statistical method for cross-text identification
- Dynamic feature extraction with fixed feature set for ML compatibility
Machine Learning
- Uses scikit-learn (KNN and SVM) on n-gram features
- Most accurate but requires more training data
- Automatically trains models when sufficient data is available
- Ensemble approach with confidence-based decision making
- Feature scaling and cross-validation for robust performance
Neural Network
- Uses a deep neural network with comprehensive feature extraction
- Most advanced method with highest accuracy potential
- Combines statistical, n-gram, and additional features
- Requires more training data but provides best performance
- Early stopping and dropout layers for regularization
- Comprehensive Data Gathering: Collects all data types simultaneously during typing
- N-Gram Extraction: Automatically extracts digraph and trigraph timing patterns
- Keystroke Sequences: Records detailed keystroke sequences for analysis
- Multi-User Support: Supports multiple users with unique usernames
Typrinting/
├── data/                      # User profile data
│   ├── test_data.json
│   └── ... (multiple user profiles)
├── typing_game/               # Data collection app
│   ├── app.py
│   ├── static/
│   │   └── typing_game.js     # Enhanced with n-gram collection
│   └── templates/
│       └── typing_game.html
├── identify_app/              # Identification app
│   ├── app.py                 # Multi-method identification backend
│   ├── models/                # Trained ML models
│   │   ├── scaler.pkl
│   │   ├── knn.pkl
│   │   ├── svm.pkl
│   │   ├── neural_network.h5
│   │   ├── nn_scaler.pkl
│   │   └── nn_label_encoder.pkl
│   ├── static/
│   │   ├── identify.js        # Enhanced with method selection
│   │   └── identify.css
│   └── templates/
│       └── identify.html      # Method selection UI
├── TODO.md                    # Implementation roadmap
├── test_implementation.py     # Test script
└── README.md
Clone the repository
git clone <repository-url>
cd Typrinting
Install dependencies
For the typing game:
cd typing_game
pip install -r requirements.txt

For the identification app:
cd ../identify_app
pip install -r requirements.txt
Collect typing data:
cd typing_game
python app.py
- Open http://localhost:8001
- Enter your username
- Type the prompts to build your profile
- All data types are collected automatically (statistical, n-gram, ML-ready)
Identify a user:
cd identify_app
python app.py
- Open http://localhost:8001
- Select your preferred identification method:
- Statistical: Best for same text
- N-Gram: Good for different texts
- Machine Learning: Most accurate (requires sufficient data)
- Neural Network: Most advanced (requires sufficient data)
- Type the prompt and get identified
Statistical Method
- Extracts 6 features: [mean_hold, std_hold, mean_flight, std_flight, mean_dd, std_dd]
- Compares the test sample to user profiles using an acceptance percentage
- Threshold-based decision making (default: 70% acceptance)
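A minimal sketch of this method, assuming each sample is a dict of millisecond timing lists as stored in the profiles; the function names and the per-feature acceptance rule are illustrative, since only the 70% acceptance threshold is specified above:

```python
import numpy as np

def extract_statistical_features(sample):
    """Build the 6-feature vector [mean_hold, std_hold, mean_flight,
    std_flight, mean_dd, std_dd] from one typing sample."""
    features = []
    for key in ("hold_times", "flight_times", "down_down_times"):
        times = np.asarray(sample.get(key, []), dtype=float)
        features.append(times.mean() if times.size else 0.0)
        features.append(times.std() if times.size else 0.0)
    return np.array(features)

def identify_statistical(test_features, profiles, threshold=0.70):
    """Return the best-matching user if their acceptance percentage clears
    the threshold. The acceptance rule below (fraction of features within
    30% of the profile value) is a stand-in for the app's own comparison."""
    best_user, best_score = None, 0.0
    for user, prof in profiles.items():
        score = float(np.mean(np.abs(test_features - prof) <= 0.3 * np.abs(prof)))
        if score > best_score:
            best_user, best_score = user, score
    return (best_user, best_score) if best_score >= threshold else (None, best_score)
```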
N-Gram Analysis
- Extracts digraphs (2-grams) and trigraphs (3-grams) from the text
- Calculates timing features (mean and std) for each n-gram
- Compares overlapping n-grams between the test sample and user profiles
- More robust for different texts
- Fixed feature set for ML compatibility
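A rough sketch of both steps, assuming the keystroke sequence is a list of (character, key-down time in ms) pairs, n-gram timings are key-down to key-down intervals, and the fixed vocabulary is built from the n-grams seen across all profiles (all names here are illustrative):

```python
import numpy as np

def extract_ngrams(keystrokes):
    """Group timings by digraph and trigraph; non-letter keys are skipped,
    matching the alphabetic-only filtering described in the configuration."""
    keys = [(c.lower(), t) for c, t in keystrokes if c.isalpha()]
    digraphs, trigraphs = {}, {}
    for i in range(len(keys) - 1):
        gram = keys[i][0] + keys[i + 1][0]
        digraphs.setdefault(gram, []).append(keys[i + 1][1] - keys[i][1])
    for i in range(len(keys) - 2):
        gram = keys[i][0] + keys[i + 1][0] + keys[i + 2][0]
        trigraphs.setdefault(gram, []).append(keys[i + 2][1] - keys[i][1])
    return {"digraphs": digraphs, "trigraphs": trigraphs}

def ngram_feature_vector(ngram_data, vocabulary):
    """Project one sample onto a fixed n-gram vocabulary so every sample has
    the same dimensionality; n-grams the sample never produced contribute zeros."""
    timings = {**ngram_data["digraphs"], **ngram_data["trigraphs"]}
    features = []
    for gram in vocabulary:
        values = timings.get(gram, [])
        features.append(float(np.mean(values)) if values else 0.0)
        features.append(float(np.std(values)) if values else 0.0)
    return np.array(features)
```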
Machine Learning
- Uses n-gram features as input to the ML models
- Trains KNN (k=3) and SVM (RBF kernel) classifiers
- Ensemble prediction with confidence scores
- Requires at least 10 samples across users
- Feature scaling with StandardScaler
- Cross-validation for robust evaluation
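A condensed sketch of this training flow with scikit-learn, using the settings listed above; how the feature matrix and labels are assembled from the profiles, and whether every user has enough samples for 5-fold CV, are assumptions:

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

def train_ml_models(X, y):
    """X: (n_samples, n_features) n-gram feature matrix; y: usernames.
    Scales the features, reports 5-fold CV accuracy (this assumes at least
    five samples per user), and fits KNN (k=3) and an RBF-kernel SVM with
    probability estimates for the ensemble vote."""
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)

    knn = KNeighborsClassifier(n_neighbors=3)
    svm = SVC(kernel="rbf", probability=True)
    for name, model in (("knn", knn), ("svm", svm)):
        scores = cross_val_score(model, X_scaled, y, cv=5)
        print(f"{name}: mean 5-fold CV accuracy {scores.mean():.2f}")
        model.fit(X_scaled, y)
    return scaler, knn, svm
```

At prediction time, the two classifiers' probability estimates can be combined into the confidence-weighted vote described above.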
Neural Network
- Uses comprehensive feature extraction (statistical + n-gram + additional features)
- Deep neural network architecture: 128 → 64 → 32 → num_classes
- Dropout layers (0.3, 0.2, 0.1) for regularization
- Categorical classification with softmax output
- Early stopping to prevent overfitting
- Adam optimizer with learning rate 0.001
- Requires at least 10 samples across users
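A minimal Keras sketch matching the architecture and training settings above; the validation split used for early stopping is an assumption:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_typist_network(num_features, num_classes):
    """Dense 128 -> 64 -> 32 -> num_classes network with dropout rates
    (0.3, 0.2, 0.1), softmax output, Adam (lr=0.001), and categorical
    crossentropy loss."""
    model = keras.Sequential([
        layers.Input(shape=(num_features,)),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.2),
        layers.Dense(32, activation="relu"),
        layers.Dropout(0.1),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Early stopping as configured below; the validation_split is illustrative.
early_stop = keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True)
# model.fit(X_train, y_onehot, epochs=100, batch_size=32,
#           validation_split=0.2, callbacks=[early_stop])
```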
| Method | Same Text | Different Text | Accuracy | Speed | Data Requirements | Training Required |
|---|---|---|---|---|---|---|
| Statistical | ✅ Excellent | ❌ Poor | Medium | Fast | Low | No |
| N-Gram | ✅ Good | ✅ Good | High | Medium | Medium | No |
| ML | ✅ Excellent | ✅ Excellent | Very High | Slow | High | Yes |
| Neural Network | ✅ Excellent | ✅ Excellent | Highest | Slowest | Highest | Yes |
Run the test script to verify all methods work:
python test_implementation.py
This will test:
- Statistical method with sample data
- N-gram feature extraction
- ML model training and prediction
- Neural network training and prediction
- All four identification methods
Identification thresholds:
- Statistical: Default 0.7 (70% acceptance)
- N-Gram: Default 0.6 (60% similarity)
- ML: Automatic confidence-based
- Neural Network: Automatic confidence-based
- Digraphs: 2-character sequences (e.g., "th", "he")
- Trigraphs: 3-character sequences (e.g., "the", "qui")
- Alphabetic only: Filters out non-letter characters
- Fixed feature set: Ensures consistent dimensions for ML
- KNN: k=3 neighbors
- SVM: RBF kernel with probability estimates
- Feature scaling: StandardScaler
- Cross-validation: 5-fold CV
- Ensemble: Majority vote with confidence weighting
- Architecture: Dense layers with dropout
- Optimizer: Adam (lr=0.001)
- Loss: Categorical crossentropy
- Early stopping: Patience=10, restore_best_weights=True
- Batch size: 32
- Epochs: 100 (with early stopping)
Planned enhancements:
- Real-time identification during typing
- Adaptive thresholds per user
- More ML algorithms (Random Forest, Ensemble Methods)
- Cross-platform compatibility
- API endpoints for integration
- Performance optimization for large datasets
- Advanced feature engineering
- Transfer learning capabilities
- Real-time model updates
Each user profile contains:
{
  "username": "user",
  "samples": [
    {
      "text": "prompt text",
      "hold_times": [...],
      "flight_times": [...],
      "down_down_times": [...],
      "ngram_data": {
        "digraphs": {"th": [...], "he": [...]},
        "trigraphs": {"the": [...], "qui": [...]}
      },
      "keystroke_sequence": [...]
    }
  ]
}

Trained models:
- KNN: k=3 neighbors
- SVM: RBF kernel with probability estimates
- Ensemble: Majority vote with confidence weighting
- Neural Network: Deep neural network with dropout layers
- Persistence: Models saved in identify_app/models/ (see the loading sketch below)
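As a rough illustration of reloading those persisted artifacts (this assumes the .pkl files were written with pickle; joblib would be a drop-in alternative):

```python
import pickle
from tensorflow import keras

def load_models(model_dir="identify_app/models"):
    """Reload the persisted scalers, label encoder, classifiers, and the
    Keras network saved by the identification app."""
    models = {}
    for name in ("scaler", "knn", "svm", "nn_scaler", "nn_label_encoder"):
        with open(f"{model_dir}/{name}.pkl", "rb") as f:
            models[name] = pickle.load(f)
    models["neural_network"] = keras.models.load_model(f"{model_dir}/neural_network.h5")
    return models
```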
- Statistical features: Mean and standard deviation of timing data
- N-gram features: Timing patterns for character sequences
- Comprehensive features: Combined statistical and n-gram features
- Fixed feature set: Ensures consistent dimensions across samples
- Fork the repository
- Create a feature branch
- Implement your changes
- Add tests
- Submit a pull request
This project is open source and available under the MIT License.