This script extracts transcripts from the Podcasts app on macOS.
- Clone the repository
- Install dependencies:
npm install
Note: You need to download the desired podcast episode(s) in the Podcasts app before you can extract their transcripts.
To process all TTML files in your Apple Podcasts cache:
node extractTranscript.js [--timestamps]
This will:
- Find all TTML files in ~/Library/Group Containers/243LU875E5.groups.com.apple.podcasts/Library/Cache/Assets/TTML
- Create a ./transcripts directory
- Save each transcript as ./transcripts/<short_episode_id>.txt
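The three steps above could be sketched in Node.js roughly as follows. This is not the script's actual code: findTtmlFiles and outputPathFor are hypothetical names, only files ending in ".ttml" are matched, and taking the short ID from the PodcastContent<short_episode_id> parent directory name is an assumption based on the cache layout described in this README.

```javascript
const fs = require("fs");
const path = require("path");
const os = require("os");

// Cache location quoted in this README.
const TTML_ROOT = path.join(
  os.homedir(),
  "Library/Group Containers/243LU875E5.groups.com.apple.podcasts/Library/Cache/Assets/TTML"
);

// Recursively collect every file under `dir` whose name ends in ".ttml".
function findTtmlFiles(dir) {
  const found = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      found.push(...findTtmlFiles(full));
    } else if (entry.isFile() && entry.name.endsWith(".ttml")) {
      found.push(full);
    }
  }
  return found;
}

// Hypothetical helper (assumption): read the short episode ID from a parent
// directory named "PodcastContent<short_episode_id>"; if the parent does not
// match, fall back to the TTML file name without its extension.
function outputPathFor(ttmlFile, outDir = "./transcripts") {
  const parent = path.basename(path.dirname(ttmlFile));
  const match = /^PodcastContent(.+)$/.exec(parent);
  const shortId = match ? match[1] : path.basename(ttmlFile, ".ttml");
  return path.join(outDir, `${shortId}.txt`);
}
```

To list what would be written, you could loop over findTtmlFiles(TTML_ROOT) and log outputPathFor(file) for each entry; creating the directory itself would be fs.mkdirSync("./transcripts", { recursive: true }).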
Add --timestamps to include timestamps for each paragraph in the format [HH:MM:SS].
For example:
[00:01:23] This is what the speaker said
[00:01:25] And then they said this
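A minimal sketch of how a [HH:MM:SS] prefix like the one above might be produced. It assumes the paragraph start times in the TTML file are either plain seconds ("83.5" or "83.5s") or clock time ("00:01:23.500"); other TTML time metrics such as "ms" are not handled. Truncating fractional seconds rather than rounding is also an assumption, and parseTtmlTime and formatTimestamp are hypothetical names.

```javascript
// Hypothetical helper: convert a TTML time expression to seconds.
// Handles clock time ("HH:MM:SS.fff") and plain or "s"-suffixed seconds only.
function parseTtmlTime(value) {
  const v = value.trim();
  if (v.includes(":")) {
    // Fold each colon-separated part into a running total in seconds.
    return v.split(":").reduce((acc, part) => acc * 60 + Number(part), 0);
  }
  return Number(v.replace(/s$/, ""));
}

// Format seconds as "[HH:MM:SS]", dropping any fractional part.
function formatTimestamp(totalSeconds) {
  const s = Math.floor(totalSeconds);
  const hh = String(Math.floor(s / 3600)).padStart(2, "0");
  const mm = String(Math.floor((s % 3600) / 60)).padStart(2, "0");
  const ss = String(s % 60).padStart(2, "0");
  return `[${hh}:${mm}:${ss}]`;
}
```

For instance, formatTimestamp(parseTtmlTime("00:01:23.500")) yields "[00:01:23]", matching the first line of the example.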
To process a single TTML file:
node extractTranscript.js <input_file> <output_file> [--timestamps]
The input file is the transcript_<long_episode_id>.ttml file in the ~/Library/Group Containers/243LU875E5.groups.com.apple.podcasts/Library/Cache/Assets/TTML/PodcastContent<short_episode_id> directory.
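The two invocation forms in this README (no positional arguments, or an input and an output file, each with an optional --timestamps flag) could be told apart with a small argument parser like the one below. This is an illustrative sketch, not the script's own parsing logic; parseArgs and the returned mode values are hypothetical.

```javascript
// Hypothetical parser for:
//   node extractTranscript.js [--timestamps]
//   node extractTranscript.js <input_file> <output_file> [--timestamps]
function parseArgs(argv) {
  const timestamps = argv.includes("--timestamps");
  const positional = argv.filter((arg) => arg !== "--timestamps");
  if (positional.length === 0) {
    // No files given: process the whole Podcasts TTML cache.
    return { mode: "all", timestamps };
  }
  if (positional.length === 2) {
    const [input, output] = positional;
    return { mode: "single", input, output, timestamps };
  }
  throw new Error(
    "Usage: node extractTranscript.js [<input_file> <output_file>] [--timestamps]"
  );
}

// process.argv[0] is the node binary and process.argv[1] the script path,
// so a real entry point would call parseArgs(process.argv.slice(2)).
```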
I don't know how the Podcasts app generates these IDs.