mattdanielmurphy/apple-podcast-transcript-extractor

Apple Podcasts Transcript Extractor

This script extracts transcripts from the Podcasts app on macOS.

Installation

  1. Clone the repository
  2. Install dependencies: npm install

Usage

Note: You need to download the desired podcast episode(s) before you can extract the transcript.

Batch Mode

To process all TTML files in your Apple Podcasts cache:

node extractTranscript.js [--timestamps]

This will:

  1. Find all TTML files in ~/Library/Group Containers/243LU875E5.groups.com.apple.podcasts/Library/Cache/Assets/TTML
  2. Create a ./transcripts directory
  3. Save each transcript as ./transcripts/<short_episode_id>.txt
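The discovery step above could be sketched roughly as follows. This is not the code from extractTranscript.js; `findTtmlFiles` and `TTML_CACHE_DIR` are hypothetical names, and the sketch assumes the script simply walks the cache directory recursively and picks up every file ending in `.ttml`.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Cache location as quoted in this README, resolved against the home directory.
const TTML_CACHE_DIR = path.join(
  os.homedir(),
  "Library/Group Containers/243LU875E5.groups.com.apple.podcasts/Library/Cache/Assets/TTML"
);

// Hypothetical helper: recursively collect every *.ttml file under rootDir.
function findTtmlFiles(rootDir) {
  const found = [];
  for (const entry of fs.readdirSync(rootDir, { withFileTypes: true })) {
    const fullPath = path.join(rootDir, entry.name);
    if (entry.isDirectory()) {
      // Descend into subdirectories such as PodcastContent<short_episode_id>.
      found.push(...findTtmlFiles(fullPath));
    } else if (entry.isFile() && entry.name.toLowerCase().endsWith(".ttml")) {
      found.push(fullPath);
    }
  }
  // Sort for a stable processing order.
  return found.sort();
}
```

Calling `findTtmlFiles(TTML_CACHE_DIR)` on a Mac with downloaded episodes would return the list of files to convert; `fs.mkdirSync("./transcripts", { recursive: true })` is one way to create the output directory without failing if it already exists.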

Timestamps Option

Add --timestamps to include timestamps for each paragraph in the format [HH:MM:SS].

For example:

[00:01:23] This is what the speaker said
[00:01:25] And then they said this
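As an illustration of the [HH:MM:SS] format, here is a minimal sketch of a formatter. `formatTimestamp` is a hypothetical name, and the sketch assumes the paragraph start time is already available as a number of seconds; how the script actually reads times from the TTML file is not covered here.

```javascript
// Hypothetical helper: turn a number of seconds into "[HH:MM:SS]".
// Fractional seconds are truncated, not rounded.
function formatTimestamp(totalSeconds) {
  const whole = Math.floor(totalSeconds);
  const hours = Math.floor(whole / 3600);
  const minutes = Math.floor((whole % 3600) / 60);
  const seconds = whole % 60;
  // Zero-pad each field to two digits.
  const pad = (n) => String(n).padStart(2, "0");
  return `[${pad(hours)}:${pad(minutes)}:${pad(seconds)}]`;
}

// Example: prefix a paragraph that starts 83.4 seconds into the episode.
const line = `${formatTimestamp(83.4)} This is what the speaker said`;
```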

Single File Mode

To process a single TTML file and write the transcript to a path of your choice:

node extractTranscript.js <input_file> <output_file> [--timestamps]

Where does the input file come from?

The input file is the transcript_<long_episode_id>.ttml file found in the ~/Library/Group Containers/243LU875E5.groups.com.apple.podcasts/Library/Cache/Assets/TTML/PodcastContent<short_episode_id> directory.
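Given that naming scheme, both IDs can be read back out of a file path. The sketch below assumes the layout is exactly as described above (the .ttml file sits directly inside the PodcastContent<short_episode_id> directory); `parseEpisodeIds` is a hypothetical helper, not part of the script.

```javascript
// Hypothetical helper: pull the short and long episode IDs out of a TTML path.
// Returns null when the path does not follow the assumed layout
// .../PodcastContent<short_episode_id>/transcript_<long_episode_id>.ttml
function parseEpisodeIds(ttmlPath) {
  const match = /PodcastContent([^/]+)\/transcript_([^/]+)\.ttml$/.exec(ttmlPath);
  if (!match) return null;
  return { shortEpisodeId: match[1], longEpisodeId: match[2] };
}
```

For a path ending in `PodcastContent123/transcript_456.ttml` (made-up IDs), this returns `{ shortEpisodeId: "123", longEpisodeId: "456" }`, and the short ID gives the batch-mode output name `./transcripts/123.txt`.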

How do I get the episode IDs?

I don't know how these IDs are generated by the Podcasts app.
