A comprehensive typing biometrics system that identifies users based on their typing patterns using four different methods: statistical analysis, n-gram analysis, machine learning, and neural networks.
Statistical Method (Original)
- Uses mean and standard deviation of hold, flight, and down-down times
- Best for same text identification
- Fast and simple
- 6-feature vector: [mean_hold, std_hold, mean_flight, std_flight, mean_dd, std_dd]
N-Gram Analysis
- Analyzes timing patterns between character pairs (digraphs) and triplets (trigraphs)
- Works well with different texts
- More robust than statistical method for cross-text identification
- Dynamic feature extraction with fixed feature set for ML compatibility
Machine Learning
- Uses scikit-learn (KNN and SVM) on n-gram features
- Most accurate but requires more training data
- Automatically trains models when sufficient data is available
- Ensemble approach with confidence-based decision making
- Feature scaling and cross-validation for robust performance
Neural Network
- Uses a deep neural network with comprehensive feature extraction
- Most advanced method with highest accuracy potential
- Combines statistical, n-gram, and additional features
- Requires more training data but provides best performance
- Early stopping and dropout layers for regularization
- Comprehensive Data Gathering: Collects all data types simultaneously during typing
- N-Gram Extraction: Automatically extracts digraph and trigraph timing patterns
- Keystroke Sequences: Records detailed keystroke sequences for analysis
- Multi-User Support: Supports multiple users with unique usernames
Typrinting/
├── data/                      # User profile data
│   ├── test_data.json
│   └── ... (multiple user profiles)
├── typing_game/               # Data collection app
│   ├── app.py
│   ├── static/
│   │   └── typing_game.js     # Enhanced with n-gram collection
│   └── templates/
│       └── typing_game.html
├── identify_app/              # Identification app
│   ├── app.py                 # Multi-method identification backend
│   ├── models/                # Trained ML models
│   │   ├── scaler.pkl
│   │   ├── knn.pkl
│   │   ├── svm.pkl
│   │   ├── neural_network.h5
│   │   ├── nn_scaler.pkl
│   │   └── nn_label_encoder.pkl
│   ├── static/
│   │   ├── identify.js        # Enhanced with method selection
│   │   └── identify.css
│   └── templates/
│       └── identify.html      # Method selection UI
├── TODO.md                    # Implementation roadmap
├── test_implementation.py     # Test script
└── README.md
Clone the repository
git clone <repository-url>
cd Typrinting
Install dependencies
For the typing game:
cd typing_game
pip install -r requirements.txt

For the identification app:
cd ../identify_app
pip install -r requirements.txt
Collect typing data:
cd typing_game
python app.py
- Open http://localhost:8001
- Enter your username
- Type the prompts to build your profile
- All data types are collected automatically (statistical, n-gram, ML-ready)
Identify a user:
cd identify_app
python app.py
- Open http://localhost:8001
- Select your preferred identification method:
- Statistical: Best for same text
- N-Gram: Good for different texts
- Machine Learning: Most accurate (requires sufficient data)
- Neural Network: Most advanced (requires sufficient data)
- Type the prompt and get identified
Statistical Method
- Extracts 6 features: [mean_hold, std_hold, mean_flight, std_flight, mean_dd, std_dd]
- Compares the test sample to user profiles using an acceptance percentage
- Threshold-based decision making (default: 70% acceptance)
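A minimal sketch of this method, assuming each sample is a dict of millisecond timing lists as stored in the profiles; the function names and the per-feature acceptance rule are illustrative, since only the 70% acceptance threshold is specified above:

```python
import numpy as np

def extract_statistical_features(sample):
    """Build the 6-feature vector [mean_hold, std_hold, mean_flight,
    std_flight, mean_dd, std_dd] from one typing sample."""
    features = []
    for key in ("hold_times", "flight_times", "down_down_times"):
        times = np.asarray(sample.get(key, []), dtype=float)
        features.append(times.mean() if times.size else 0.0)
        features.append(times.std() if times.size else 0.0)
    return np.array(features)

def identify_statistical(test_features, profiles, threshold=0.70):
    """Return the best-matching user if their acceptance percentage clears
    the threshold. The acceptance rule below (fraction of features within
    30% of the profile value) is a stand-in for the app's own comparison."""
    best_user, best_score = None, 0.0
    for user, prof in profiles.items():
        score = float(np.mean(np.abs(test_features - prof) <= 0.3 * np.abs(prof)))
        if score > best_score:
            best_user, best_score = user, score
    return (best_user, best_score) if best_score >= threshold else (None, best_score)
```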
N-Gram Analysis
- Extracts digraphs (2-grams) and trigraphs (3-grams) from the text
- Calculates timing features (mean and std) for each n-gram
- Compares overlapping n-grams between the test sample and user profiles
- More robust for different texts
- Fixed feature set for ML compatibility
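A rough sketch of both steps, assuming the keystroke sequence is a list of (character, key-down time in ms) pairs, n-gram timings are key-down to key-down intervals, and the fixed vocabulary is built from the n-grams seen across all profiles (all names here are illustrative):

```python
import numpy as np

def extract_ngrams(keystrokes):
    """Group timings by digraph and trigraph; non-letter keys are skipped,
    matching the alphabetic-only filtering described in the configuration."""
    keys = [(c.lower(), t) for c, t in keystrokes if c.isalpha()]
    digraphs, trigraphs = {}, {}
    for i in range(len(keys) - 1):
        gram = keys[i][0] + keys[i + 1][0]
        digraphs.setdefault(gram, []).append(keys[i + 1][1] - keys[i][1])
    for i in range(len(keys) - 2):
        gram = keys[i][0] + keys[i + 1][0] + keys[i + 2][0]
        trigraphs.setdefault(gram, []).append(keys[i + 2][1] - keys[i][1])
    return {"digraphs": digraphs, "trigraphs": trigraphs}

def ngram_feature_vector(ngram_data, vocabulary):
    """Project one sample onto a fixed n-gram vocabulary so every sample has
    the same dimensionality; n-grams the sample never produced contribute zeros."""
    timings = {**ngram_data["digraphs"], **ngram_data["trigraphs"]}
    features = []
    for gram in vocabulary:
        values = timings.get(gram, [])
        features.append(float(np.mean(values)) if values else 0.0)
        features.append(float(np.std(values)) if values else 0.0)
    return np.array(features)
```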
Machine Learning
- Uses n-gram features as input to the ML models
- Trains KNN (k=3) and SVM (RBF kernel) classifiers
- Ensemble prediction with confidence scores
- Requires at least 10 samples across users
- Feature scaling with StandardScaler
- Cross-validation for robust evaluation
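A condensed sketch of this training flow with scikit-learn, using the settings listed above; how the feature matrix and labels are assembled from the profiles, and whether every user has enough samples for 5-fold CV, are assumptions:

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

def train_ml_models(X, y):
    """X: (n_samples, n_features) n-gram feature matrix; y: usernames.
    Scales the features, reports 5-fold CV accuracy (this assumes at least
    five samples per user), and fits KNN (k=3) and an RBF-kernel SVM with
    probability estimates for the ensemble vote."""
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)

    knn = KNeighborsClassifier(n_neighbors=3)
    svm = SVC(kernel="rbf", probability=True)
    for name, model in (("knn", knn), ("svm", svm)):
        scores = cross_val_score(model, X_scaled, y, cv=5)
        print(f"{name}: mean 5-fold CV accuracy {scores.mean():.2f}")
        model.fit(X_scaled, y)
    return scaler, knn, svm
```

At prediction time, the two classifiers' probability estimates can be combined into the confidence-weighted vote described above.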
Neural Network
- Uses comprehensive feature extraction (statistical + n-gram + additional features)
- Deep neural network architecture: 128 → 64 → 32 → num_classes
- Dropout layers (0.3, 0.2, 0.1) for regularization
- Categorical classification with softmax output
- Early stopping to prevent overfitting
- Adam optimizer with learning rate 0.001
- Requires at least 10 samples across users
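A minimal Keras sketch matching the architecture and training settings above; the validation split used for early stopping is an assumption:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_typist_network(num_features, num_classes):
    """Dense 128 -> 64 -> 32 -> num_classes network with dropout rates
    (0.3, 0.2, 0.1), softmax output, Adam (lr=0.001), and categorical
    crossentropy loss."""
    model = keras.Sequential([
        layers.Input(shape=(num_features,)),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.2),
        layers.Dense(32, activation="relu"),
        layers.Dropout(0.1),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Early stopping as configured below; the validation_split is illustrative.
early_stop = keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True)
# model.fit(X_train, y_onehot, epochs=100, batch_size=32,
#           validation_split=0.2, callbacks=[early_stop])
```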
| Method | Same Text | Different Text | Accuracy | Speed | Data Requirements | Training Required |
|---|---|---|---|---|---|---|
| Statistical | ✅ Excellent | ❌ Poor | Medium | Fast | Low | No |
| N-Gram | ✅ Good | ✅ Good | High | Medium | Medium | No |
| ML | ✅ Excellent | ✅ Excellent | Very High | Slow | High | Yes |
| Neural Network | ✅ Excellent | ✅ Excellent | Highest | Slowest | Highest | Yes |
Run the test script to verify all methods work:
python test_implementation.py
This will test:
- Statistical method with sample data
- N-gram feature extraction
- ML model training and prediction
- Neural network training and prediction
- All four identification methods
Identification thresholds:
- Statistical: Default 0.7 (70% acceptance)
- N-Gram: Default 0.6 (60% similarity)
- ML: Automatic confidence-based
- Neural Network: Automatic confidence-based
- Digraphs: 2-character sequences (e.g., "th", "he")
- Trigraphs: 3-character sequences (e.g., "the", "qui")
- Alphabetic only: Filters out non-letter characters
- Fixed feature set: Ensures consistent dimensions for ML
- KNN: k=3 neighbors
- SVM: RBF kernel with probability estimates
- Feature scaling: StandardScaler
- Cross-validation: 5-fold CV
- Ensemble: Majority vote with confidence weighting
- Architecture: Dense layers with dropout
- Optimizer: Adam (lr=0.001)
- Loss: Categorical crossentropy
- Early stopping: Patience=10, restore_best_weights=True
- Batch size: 32
- Epochs: 100 (with early stopping)
Planned enhancements:
- Real-time identification during typing
- Adaptive thresholds per user
- More ML algorithms (Random Forest, Ensemble Methods)
- Cross-platform compatibility
- API endpoints for integration
- Performance optimization for large datasets
- Advanced feature engineering
- Transfer learning capabilities
- Real-time model updates
Each user profile contains:
{
  "username": "user",
  "samples": [
    {
      "text": "prompt text",
      "hold_times": [...],
      "flight_times": [...],
      "down_down_times": [...],
      "ngram_data": {
        "digraphs": {"th": [...], "he": [...]},
        "trigraphs": {"the": [...], "qui": [...]}
      },
      "keystroke_sequence": [...]
    }
  ]
}

Trained models:
- KNN: k=3 neighbors
- SVM: RBF kernel with probability estimates
- Ensemble: Majority vote with confidence weighting
- Neural Network: Deep neural network with dropout layers
- Persistence: Models saved in identify_app/models/ (see the loading sketch below)
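As a rough illustration of reloading those persisted artifacts (this assumes the .pkl files were written with pickle; joblib would be a drop-in alternative):

```python
import pickle
from tensorflow import keras

def load_models(model_dir="identify_app/models"):
    """Reload the persisted scalers, label encoder, classifiers, and the
    Keras network saved by the identification app."""
    models = {}
    for name in ("scaler", "knn", "svm", "nn_scaler", "nn_label_encoder"):
        with open(f"{model_dir}/{name}.pkl", "rb") as f:
            models[name] = pickle.load(f)
    models["neural_network"] = keras.models.load_model(f"{model_dir}/neural_network.h5")
    return models
```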
- Statistical features: Mean and standard deviation of timing data
- N-gram features: Timing patterns for character sequences
- Comprehensive features: Combined statistical and n-gram features
- Fixed feature set: Ensures consistent dimensions across samples
- Fork the repository
- Create a feature branch
- Implement your changes
- Add tests
- Submit a pull request
This project is open source and available under the MIT License.