icyca/Typrinting

Typrinting - Multi-Method Typing Identification System

A comprehensive typing biometrics system that identifies users based on their typing patterns using four different methods: statistical analysis, n-gram analysis, machine learning, and neural networks.

🚀 Features

Four Identification Methods

  1. Statistical Method (Original)

    • Uses mean and standard deviation of hold, flight, and down-down times
    • Best for same-text identification
    • Fast and simple
    • 6-feature vector: [mean_hold, std_hold, mean_flight, std_flight, mean_dd, std_dd]
  2. N-Gram Analysis

    • Analyzes timing patterns between character pairs (digraphs) and triplets (trigraphs)
    • Works well with different texts
    • More robust than the statistical method for cross-text identification
    • Dynamic feature extraction with fixed feature set for ML compatibility
  3. Machine Learning

    • Uses scikit-learn (KNN and SVM) on n-gram features
    • Most accurate but requires more training data
    • Automatically trains models when sufficient data is available
    • Ensemble approach with confidence-based decision making
    • Feature scaling and cross-validation for robust performance
  4. Neural Network

    • Uses deep neural network with comprehensive feature extraction
    • Most advanced method with highest accuracy potential
    • Combines statistical, n-gram, and additional features
    • Requires more training data but provides best performance
    • Early stopping and dropout layers for regularization

Data Collection

  • Comprehensive Data Gathering: Collects all data types simultaneously during typing
  • N-Gram Extraction: Automatically extracts digraph and trigraph timing patterns
  • Keystroke Sequences: Records detailed keystroke sequences for analysis
  • Multi-User Support: Supports multiple users with unique usernames

๐Ÿ“ Project Structure

Typrinting/
├── data/                    # User profile data
│   ├── test_data.json
│   └── ... (multiple user profiles)
├── typing_game/            # Data collection app
│   ├── app.py
│   ├── static/
│   │   └── typing_game.js  # Enhanced with n-gram collection
│   └── templates/
│       └── typing_game.html
├── identify_app/           # Identification app
│   ├── app.py             # Multi-method identification backend
│   ├── models/            # Trained ML models
│   │   ├── scaler.pkl
│   │   ├── knn.pkl
│   │   ├── svm.pkl
│   │   ├── neural_network.h5
│   │   ├── nn_scaler.pkl
│   │   └── nn_label_encoder.pkl
│   ├── static/
│   │   ├── identify.js    # Enhanced with method selection
│   │   └── identify.css
│   └── templates/
│       └── identify.html  # Method selection UI
├── TODO.md                # Implementation roadmap
├── test_implementation.py # Test script
└── README.md

🛠️ Installation

  1. Clone the repository

    git clone <repository-url>
    cd Typrinting
  2. Install dependencies

    # For typing game
    cd typing_game
    pip install -r requirements.txt
    
    # For identification app
    cd ../identify_app
    pip install -r requirements.txt

🎮 Usage

1. Data Collection (Typing Game)

cd typing_game
python app.py
  • Open http://localhost:8001
  • Enter your username
  • Type the prompts to build your profile
  • All data types are collected automatically (statistical, n-gram, ML-ready)

2. User Identification

cd identify_app
python app.py
  • Open http://localhost:8001
  • Select your preferred identification method:
    • Statistical: Best for same text
    • N-Gram: Good for different texts
    • Machine Learning: Most accurate (requires sufficient data)
    • Neural Network: Most advanced (requires sufficient data)
  • Type the prompt and get identified

🔬 How It Works

Statistical Method

  • Extracts 6 features: [mean_hold, std_hold, mean_flight, std_flight, mean_dd, std_dd]
  • Compares test sample to user profiles using acceptance percentage
  • Threshold-based decision making (default: 70% acceptance)
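
A minimal sketch of this method, assuming a per-feature tolerance check (the `tolerance` value and the exact acceptance rule are illustrative; the repo's comparison logic may differ):

```python
import statistics

def extract_features(hold_times, flight_times, dd_times):
    """Build the 6-feature vector [mean_hold, std_hold, mean_flight, std_flight, mean_dd, std_dd]."""
    features = []
    for series in (hold_times, flight_times, dd_times):
        features += [statistics.mean(series), statistics.stdev(series)]
    return features

def accepted(test_vec, profile_vec, tolerance=0.3, threshold=0.7):
    """Accept when at least `threshold` of the features fall within ±tolerance of the profile value."""
    hits = sum(abs(t - p) <= tolerance * abs(p) for t, p in zip(test_vec, profile_vec))
    return hits / len(test_vec) >= threshold
```

A sample compared against its own profile vector trivially passes; a sample with wildly different timings fails the 70% acceptance threshold.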

N-Gram Method

  • Extracts digraphs (2-grams) and trigraphs (3-grams) from text
  • Calculates timing features for each n-gram: mean and std
  • Compares overlapping n-grams between test and profiles
  • More robust for different texts
  • Fixed feature set for ML compatibility
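
A sketch of digraph feature extraction (the keystroke tuple format is an assumption; trigraphs work the same way over triples):

```python
import statistics
from collections import defaultdict

def digraph_features(keystrokes):
    """keystrokes: [(char, press_time), ...] in typing order.
    Groups down-down latencies by letter pair and summarizes each as (mean, std)."""
    latencies = defaultdict(list)
    for (c1, t1), (c2, t2) in zip(keystrokes, keystrokes[1:]):
        if c1.isalpha() and c2.isalpha():  # alphabetic only: skip spaces, punctuation
            latencies[(c1 + c2).lower()].append(t2 - t1)
    return {dg: (statistics.mean(ts),
                 statistics.stdev(ts) if len(ts) > 1 else 0.0)
            for dg, ts in latencies.items()}
```

Only n-grams present in both the test sample and a profile are compared, which is what makes the method usable across different texts.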

Machine Learning Method

  • Uses n-gram features as input to ML models
  • Trains KNN (k=3) and SVM (RBF kernel) classifiers
  • Ensemble prediction with confidence scores
  • Requires at least 10 samples across users
  • Feature scaling with StandardScaler
  • Cross-validation for robust evaluation
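
The pipeline above can be sketched as follows. The data, user names, and the exact ensemble rule are illustrative (the README describes a majority vote with confidence weighting; here the two models' class probabilities are simply averaged):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy feature matrix standing in for n-gram feature vectors: two users, 6 features each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.10, 0.01, (10, 6)), rng.normal(0.25, 0.01, (10, 6))])
y = np.array(["alice"] * 10 + ["bob"] * 10)

scaler = StandardScaler().fit(X)
knn = KNeighborsClassifier(n_neighbors=3).fit(scaler.transform(X), y)
svm = SVC(kernel="rbf", probability=True).fit(scaler.transform(X), y)

def identify(sample):
    """Average the class probabilities of both models and return (user, confidence)."""
    xs = scaler.transform([sample])
    probs = (knn.predict_proba(xs)[0] + svm.predict_proba(xs)[0]) / 2
    return knn.classes_[int(np.argmax(probs))], float(probs.max())
```

A test sample near the first cluster is attributed to that user with high confidence.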

Neural Network Method

  • Uses comprehensive feature extraction (statistical + n-gram + additional features)
  • Deep neural network architecture: 128→64→32→num_classes
  • Dropout layers (0.3, 0.2, 0.1) for regularization
  • Categorical classification with softmax output
  • Early stopping to prevent overfitting
  • Adam optimizer with learning rate 0.001
  • Requires at least 10 samples across users
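
The repo's network is a Keras model (saved as `neural_network.h5`); as a lightweight stand-in, the sketch below uses scikit-learn's MLPClassifier with the same 128→64→32 hidden layout, Adam at lr=0.001, and early stopping (sklearn has no dropout layers, so that part is omitted). All data and names are illustrative:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Toy "comprehensive" feature matrix: two users, 20 features per sample.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.10, 0.01, (12, 20)), rng.normal(0.25, 0.01, (12, 20))])
users = np.array(["alice"] * 12 + ["bob"] * 12)

encoder = LabelEncoder()          # mirrors nn_label_encoder.pkl
y = encoder.fit_transform(users)
scaler = StandardScaler().fit(X)  # mirrors nn_scaler.pkl

clf = MLPClassifier(hidden_layer_sizes=(128, 64, 32), solver="adam",
                    learning_rate_init=0.001, batch_size=8, max_iter=100,
                    early_stopping=True, n_iter_no_change=10,
                    random_state=0).fit(scaler.transform(X), y)

def identify_nn(sample):
    """Return (user, softmax-style confidence) for one feature vector."""
    probs = clf.predict_proba(scaler.transform([sample]))[0]
    return encoder.inverse_transform([int(np.argmax(probs))])[0], float(probs.max())
```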

📊 Method Comparison

| Method         | Same Text    | Different Text | Accuracy  | Speed   | Data Requirements | Training Required |
|----------------|--------------|----------------|-----------|---------|-------------------|-------------------|
| Statistical    | ✅ Excellent | ❌ Poor        | Medium    | Fast    | Low               | No                |
| N-Gram         | ✅ Good      | ✅ Good        | High      | Medium  | Medium            | No                |
| ML             | ✅ Excellent | ✅ Excellent   | Very High | Slow    | High              | Yes               |
| Neural Network | ✅ Excellent | ✅ Excellent   | Highest   | Slowest | Highest           | Yes               |

🧪 Testing

Run the test script to verify all methods work:

python test_implementation.py

This will test:

  • Statistical method with sample data
  • N-gram feature extraction
  • ML model training and prediction
  • Neural network training and prediction
  • All four identification methods

🔧 Configuration

Thresholds

  • Statistical: Default 0.7 (70% acceptance)
  • N-Gram: Default 0.6 (60% similarity)
  • ML: Automatic confidence-based
  • Neural Network: Automatic confidence-based

N-Gram Settings

  • Digraphs: 2-character sequences (e.g., "th", "he")
  • Trigraphs: 3-character sequences (e.g., "the", "qui")
  • Alphabetic only: Filters out non-letter characters
  • Fixed feature set: Ensures consistent dimensions for ML

ML Settings

  • KNN: k=3 neighbors
  • SVM: RBF kernel with probability estimates
  • Feature scaling: StandardScaler
  • Cross-validation: 5-fold CV
  • Ensemble: Majority vote with confidence weighting

Neural Network Settings

  • Architecture: Dense layers with dropout
  • Optimizer: Adam (lr=0.001)
  • Loss: Categorical crossentropy
  • Early stopping: Patience=10, restore_best_weights=True
  • Batch size: 32
  • Epochs: 100 (with early stopping)

🚀 Future Enhancements

  • Real-time identification during typing
  • Adaptive thresholds per user
  • More ML algorithms (Random Forest, Ensemble Methods)
  • Cross-platform compatibility
  • API endpoints for integration
  • Performance optimization for large datasets
  • Advanced feature engineering
  • Transfer learning capabilities
  • Real-time model updates

๐Ÿ“ Technical Notes

Data Format

Each user profile contains:

{
  "username": "user",
  "samples": [
    {
      "text": "prompt text",
      "hold_times": [...],
      "flight_times": [...],
      "down_down_times": [...],
      "ngram_data": {
        "digraphs": {"th": [...], "he": [...]},
        "trigraphs": {"the": [...], "qui": [...]}
      },
      "keystroke_sequence": [...]
    }
  ]
}
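
Assuming the JSON layout above, a profile can be loaded and reduced to one 6-feature statistical vector per sample (function name is illustrative):

```python
import json
import statistics

def load_profile(path):
    """Read a user profile JSON and build [mean, std] pairs for each timing series."""
    with open(path) as fh:
        profile = json.load(fh)
    vectors = []
    for sample in profile["samples"]:
        vec = []
        for key in ("hold_times", "flight_times", "down_down_times"):
            series = sample[key]
            vec += [statistics.mean(series), statistics.pstdev(series)]
        vectors.append(vec)
    return profile["username"], vectors
```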

ML Models

  • KNN: k=3 neighbors
  • SVM: RBF kernel with probability estimates
  • Ensemble: Majority vote with confidence weighting
  • Neural Network: Deep neural network with dropout layers
  • Persistence: Models saved in identify_app/models/

Feature Engineering

  • Statistical features: Mean and standard deviation of timing data
  • N-gram features: Timing patterns for character sequences
  • Comprehensive features: Combined statistical and n-gram features
  • Fixed feature set: Ensures consistent dimensions across samples

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Implement your changes
  4. Add tests
  5. Submit a pull request

📄 License

This project is open source and available under the MIT License.
