HOWTO: DVD ripping with subtitles on OS X

A while ago I looked into ripping some of my DVDs to add them to my iTunes library. I expected this to be as straightforward as ripping a CD, but no. Ripping the video and audio is simple; I’d recommend HandBrake or iVI Pro for that. The problem is with subtitles. My English is good enough for me to follow most English-language movies or shows without subtitles, but they still help.

I naively thought that DVD subtitles were merely text files on the DVD. Nope, they are effectively video: subtitles on a DVD are stored as images which are overlaid on the main picture. I’m sure there were plenty of good reasons for what now seems like an asinine technical choice; for instance, it might have been too expensive to require DVD players to have the hardware needed to display Unicode text and composite it on the screen in real time. I don’t know. The end result is that ripping a DVD with its subtitles as text actually requires OCR.
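To make the OCR step concrete, here is a minimal Python sketch of turning one subtitle bitmap into text with the `tesseract` command-line tool. It assumes the subtitle image has already been extracted to a file (the extraction itself is not shown), that `tesseract` is installed and on your PATH, and that the file name `sub_0001.png` is purely hypothetical. This is not how any particular ripping tool does it internally, just an illustration of the idea.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def tesseract_command(image: Path, output_base: Path, lang: str = "eng") -> list[str]:
    """Build the argument list for: tesseract IMAGE OUTPUTBASE -l LANG."""
    return ["tesseract", str(image), str(output_base), "-l", lang]


def ocr_subtitle_image(image: Path, lang: str = "eng") -> str:
    """Run Tesseract on one subtitle bitmap and return the recognised text.

    Assumes `image` was already extracted from the DVD by some other tool.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    # Tesseract appends ".txt" to the output base name by itself.
    output_base = image.with_suffix("")
    subprocess.run(tesseract_command(image, output_base, lang), check=True)
    return Path(str(output_base) + ".txt").read_text(encoding="utf-8").strip()


if __name__ == "__main__":
    # Hypothetical file name, for illustration only.
    print(ocr_subtitle_image(Path("sub_0001.png")))
```

A real rip contains hundreds of these images, each with a start and end time, so the OCR has to be repeated for every one of them and the timings carried over to the resulting text track.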

I looked for software that does this on OS X and didn’t find any, until I recently discovered that you can do it with HandBrake and Subler. Since the process is very poorly documented (if at all) and isn’t straightforward, it’s worth a blog post.

  1. Rip your DVD using HandBrake. In the ‘Subtitles’ pane, select the ‘VobSub’ tracks you want. Those are the subtitles in their original image-based format.
  2. Launch Subler and create a new document (no, don’t open the m4v file with Subler directly).
  3. Drag and drop the m4v you generated onto that document. Subler will show a pane where you can select what to do with each track of the m4v. You can disable any sound tracks you’re not interested in, but that’s all you have to do: the default action for a VobSub track is ‘Tx3g’, which means ‘run OCR on it’.
  4. Enable all the tracks you want (in particular the ‘chapters’ and ‘subtitles’ ones, of course).
  5. Save the document.
  6. You now have an m4v file with text subtitles that QuickTime or your Apple TV will happily display.

    Side notes:

    I discovered Subler could do this because I had checked out the code and read the release notes. I can’t think of a more poorly advertised feature :-). Typical OSS problem here :).

    Subler uses Tesseract for OCR, and only packages the English language data file. For other languages, download the corresponding file from the Tesseract language data files.
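If you want to check which language files you already have, a small Python sketch like the one below can help. It assumes the `<code>.traineddata` naming used by Tesseract 3 and later (for example `eng.traineddata` or `fra.traineddata`). The directory to scan is left as a parameter, because where Subler expects these files is not something this post covers; the path in the example is hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def installed_languages(tessdata_dir: Path) -> list[str]:
    """Return the sorted language codes that have a <code>.traineddata file."""
    return sorted(p.stem for p in tessdata_dir.glob("*.traineddata"))


def missing_languages(tessdata_dir: Path, wanted: list[str]) -> list[str]:
    """Return the codes from `wanted` that have no data file in tessdata_dir."""
    have = set(installed_languages(tessdata_dir))
    return [code for code in wanted if code not in have]


if __name__ == "__main__":
    # Hypothetical location; point this at wherever your data files live.
    tessdata = Path.home() / "tessdata"
    print("installed:", installed_languages(tessdata))
    print("missing:", missing_languages(tessdata, ["eng", "fra", "deu"]))
```

Anything reported as missing is a file you would still need to download before OCR in that language can work.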

    As of version 0.25, Subler seems to choke on some 5.1 sound tracks on save; simply disable those if you encounter the problem.