A tool for analyzing Mozilla Firefox Play Store reviews using AI classification. This project fetches Firefox app reviews from Google BigQuery, translates non-English reviews, and classifies them into predefined categories using OpenAI's API.
- Review Fetching: Retrieves Firefox app reviews from Google BigQuery
- Multi-language Support: Translates non-English reviews to English
- AI Classification: Categorizes reviews into 22+ predefined categories (Webcompat, Performance, Privacy, etc.)
- Web Dashboard: Interactive HTML dashboard for visualizing results
- Weekly Analysis: Processes reviews in weekly batches with trend analysis
- Python 3.10+
- OpenAI API key
- Google Cloud credentials (for BigQuery access)
- Docker (optional)
- Clone the repository:
  git clone <repository-url>
  cd play-store-analysis
- Install dependencies:
  pip install -r requirements.txt
- Set up authentication:
  # Set OpenAI API key
  export OPENAI_API_KEY="your-openai-api-key"
  # Authenticate with Google Cloud
  gcloud auth login
  gcloud auth application-default login
Basic usage (processes next week in sequence):
python generate.py --summaryFile results/review-summary.json
Specify date range:
python generate.py \
--startDate 2024-01-01 \
--endDate 2024-01-07 \
--summaryFile results/review-summary.json
Process from CSV file:
python generate.py \
--input data/reviews.csv \
--startDate 2024-01-01 \
--endDate 2024-01-07 \
--summaryFile results/review-summary.json
Run analysis with default settings:
docker run --rm \
-e OPENAI_API_KEY="your-api-key" \
-v ~/.config/gcloud:/root/.config/gcloud:ro \
-v "$(pwd)/results:/app/results" \
play-store-analysis
Run with custom date range:
docker run --rm \
-e OPENAI_API_KEY="your-api-key" \
-v ~/.config/gcloud:/root/.config/gcloud:ro \
-v "$(pwd)/results:/app/results" \
play-store-analysis \
conda run -n play-store-analysis python generate.py \
--startDate 2024-01-01 \
--endDate 2024-01-07 \
--summaryFile results/review-summary.json
Run with local data file:
docker run --rm \
-e OPENAI_API_KEY="your-api-key" \
-v "$(pwd)/data:/app/data" \
-v "$(pwd)/results:/app/results" \
play-store-analysis \
conda run -n play-store-analysis python generate.py \
--input data/reviews.csv \
--startDate 2024-01-01 \
--endDate 2024-01-07 \
--summaryFile results/review-summary.json
The main configuration is in config.py:
- LLM Settings: Model selection, API keys, rate limiting
- BigQuery: Project, dataset, and table configuration
- Firefox Apps: Package names to analyze
- Categories: 22 predefined review classification categories
- File Paths: Data and results directory locations
The system classifies reviews into these categories:
- Technical: Webcompat, Slow, Pageload, Memory, Battery, Crash
- Features: Tabs, Addons, Bookmarks, Sync, Translations
- Media: Audio, Video, Stuttering
- UI/UX: UI, Scrolling, Startup
- Other: Privacy, Policy, Networking, Benchmark, Satisfied
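The grouping above can be expressed as a small lookup table, which is handy when rolling per-category counts up into group-level totals. This is a minimal sketch: `CATEGORY_GROUPS`, `GROUP_OF`, and `count_by_group` are illustrative names, not identifiers taken from `config.py`, and the real project may organize categories differently.

```python
from collections import Counter

# Grouping copied from the list above (22 categories in total).
# Hypothetical structure; the real config.py may store this differently.
CATEGORY_GROUPS = {
    "Technical": ["Webcompat", "Slow", "Pageload", "Memory", "Battery", "Crash"],
    "Features": ["Tabs", "Addons", "Bookmarks", "Sync", "Translations"],
    "Media": ["Audio", "Video", "Stuttering"],
    "UI/UX": ["UI", "Scrolling", "Startup"],
    "Other": ["Privacy", "Policy", "Networking", "Benchmark", "Satisfied"],
}

# Reverse lookup: category name -> group name.
GROUP_OF = {cat: group for group, cats in CATEGORY_GROUPS.items() for cat in cats}


def count_by_group(labels):
    """Tally classified labels per group; unrecognized labels count as 'Unknown'."""
    return Counter(GROUP_OF.get(label, "Unknown") for label in labels)
```

Calling `count_by_group` on the labels assigned during one week gives a coarse view of where complaints cluster before drilling into individual categories.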
The tool generates several output files:
- results/review-summary.json: Summary statistics for all processed weeks
- results/results-YYYY-MM-DD-to-YYYY-MM-DD.json: Detailed results for each week
- data/YYYY-MM-DD-to-YYYY-MM-DD/reviews.pkl: Cached raw review data
- data/YYYY-MM-DD-to-YYYY-MM-DD/classify.pkl: Cached classified review data
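The naming scheme above can be reproduced with a few lines of Python, for example to locate the files for a given week from another script. This is a sketch under assumptions: `weekly_windows` and `output_paths` are hypothetical helpers, not functions from `generate.py`, and a week is taken to be seven inclusive days, matching the `2024-01-01` to `2024-01-07` range in the usage examples.

```python
from datetime import date, timedelta
from pathlib import Path


def weekly_windows(start, end):
    """Yield (week_start, week_end) pairs of up to 7 inclusive days covering start..end."""
    current = start
    while current <= end:
        week_end = min(current + timedelta(days=6), end)
        yield current, week_end
        current = week_end + timedelta(days=1)


def output_paths(week_start, week_end, results_dir="results", data_dir="data"):
    """Build the per-week file paths following the YYYY-MM-DD-to-YYYY-MM-DD pattern."""
    span = f"{week_start.isoformat()}-to-{week_end.isoformat()}"
    return {
        "results": Path(results_dir) / f"results-{span}.json",
        "reviews_cache": Path(data_dir) / span / "reviews.pkl",
        "classify_cache": Path(data_dir) / span / "classify.pkl",
    }
```

For instance, `output_paths(date(2024, 1, 1), date(2024, 1, 7))["results"]` resolves to `results/results-2024-01-01-to-2024-01-07.json`.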
Open index.html in a web browser to view the interactive dashboard:
- Overview: Trend charts showing positive vs negative reviews and category trends
- Week Compare: Side-by-side comparison of two weeks with difference analysis
- Reviews: Searchable table of individual reviews with filtering options
The dashboard requires the results/ directory with generated JSON files.
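If the dashboard loads its JSON with browser requests (an assumption about how index.html works), some browsers may block those requests when the page is opened directly from disk. In that case, serving the project directory over a local HTTP server is a common workaround. The sketch below uses only the Python standard library; `make_dashboard_server` is a hypothetical helper, not part of this project.

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_dashboard_server(directory=".", port=8000):
    """Build (but do not start) a local HTTP server rooted at `directory`.

    The directory should contain index.html and the results/ folder.
    """
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    # Bind to localhost only so the dashboard is not exposed on the network.
    return ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Calling `make_dashboard_server().serve_forever()` from the project root and then visiting `http://127.0.0.1:8000/index.html` should display the dashboard; stop the server with Ctrl+C.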
play-store-analysis/
├── config.py # Configuration settings
├── generate.py # Main analysis script
├── entry.sh # Docker entry script
├── index.html # Web dashboard
├── requirements.txt # Python dependencies
├── Dockerfile # Container definition
├── lib/
│ ├── openai.py # OpenAI API client
│ ├── reviews.py # BigQuery data fetching
│ ├── utils.py # Utility functions
│ └── validation.py # Data validation
├── data/ # Cached review data
└── results/ # Analysis results
- Edit the CATEGORIES dictionary in config.py
- Add the category name and description
- The AI will automatically classify reviews into the new category
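As an illustration of the steps above, the sketch below assumes `CATEGORIES` maps each category name to a plain-text description and that the descriptions are passed to the model as part of the prompt. The entries, the `Downloads` category, and `build_category_prompt` are all hypothetical; check `config.py` for the actual structure and wording.

```python
# Hypothetical excerpt; the real CATEGORIES dictionary in config.py may differ.
CATEGORIES = {
    "Crash": "The app crashes, freezes, or closes unexpectedly.",
    "Sync": "Problems with syncing data across devices.",
}

# Adding a new category is a single name -> description entry.
CATEGORIES["Downloads"] = "Problems downloading or locating downloaded files."


def build_category_prompt(categories):
    """Render the category list as instruction text for the classifier (illustrative)."""
    lines = [f"- {name}: {desc}" for name, desc in categories.items()]
    header = "Classify the review into one or more of these categories:"
    return header + "\n" + "\n".join(lines)
```

Because the prompt is derived from the dictionary in this sketch, a new entry is picked up without further code changes, which is consistent with the behavior described above. Clear, specific descriptions are likely to matter more than the category name itself.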
The tool includes built-in rate limiting for API calls:
- 0.5 second delay between requests (configurable)
- 3 retry attempts with exponential backoff
- Progress logging every 10 reviews
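The behavior listed above can be sketched as follows. This is not the project's implementation: `call_with_retry` and `classify_all` are hypothetical names, "3 retry attempts" is interpreted here as three attempts in total, and the 1-second base backoff is an assumed value. The `sleep` parameter is injectable so the logic can be exercised without real waiting.

```python
import logging
import time

logger = logging.getLogger(__name__)


def call_with_retry(fn, *, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**attempt, up to `retries` attempts total."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise  # Out of attempts: surface the last error.
            sleep(base_delay * 2 ** attempt)


def classify_all(reviews, classify_one, *, delay=0.5, log_every=10, sleep=time.sleep):
    """Classify reviews one at a time with a fixed delay between requests."""
    results = []
    for i, review in enumerate(reviews, start=1):
        results.append(call_with_retry(lambda: classify_one(review), sleep=sleep))
        if i % log_every == 0:
            logger.info("Processed %d/%d reviews", i, len(reviews))
        if i < len(reviews):
            sleep(delay)  # Fixed pause between consecutive API requests.
    return results
```

Here `classify_one` stands in for whatever function sends a single review to the OpenAI API. Raising `delay` slows the run but reduces the chance of hitting API rate limits.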
See LICENSE file for details.