Working with YouTube and Extracting Audio

Dave Taylor
Tue, 10/10/2017 – 06:48

In my last few articles, I’ve been exploring the capabilities of ImageMagick,
showing that just because you’re working on a command line
doesn’t mean you’re stuck processing only text. As I explained,
ImageMagick makes it easy to work with images, adding watermarks
and analyzing content far more accurately than with the standard Linux
file command, and much, much more.

Continuing in a similar vein, I want to look at audio and video in this
article.
Well, maybe “listen” to audio and “look” at video, but
again, I’m still focusing on the command line, so in both instances,
player/viewer apps are required.

YouTube to MP3 Audio

As someone who watches a lot of lectures online, I’m also intrigued by
the online services that can extract just the audio portion of a YouTube or
Vimeo video and save it as an MP3. Listening to a lecture while driving is
far safer than trying not to watch a video on the move, for example.

Since there are so many live concert performances online, many people also
like to use a video-to-MP3 service to add those songs to their music libraries.

Note: be leery of copyright issues with any download and conversion of content.
Just because it’s on Vimeo, YouTube or other online service,
doesn’t mean you have permission to extract the audio or even download
it and save it on your computer.

Let’s start with the most basic functionality: downloading a video from
YouTube so you can watch it on your Linux system. There are a lot of
browser plugins and even websites devoted to this task, but who wants to
risk malware or be plagued by porn site ads? Yech.

Fortunately, there’s a terrific public domain program called youtube-dl
on GitHub that covers all your needs. At its most basic, it lets you
download video content from YouTube and a variety of other online video
repositories, but as you’ll learn, it can do quite a bit more.

You can grab a copy for your system from the project’s GitHub page.

Let’s start by downloading a copy of one of my own YouTube videos.
It’s a review of the splendid 1More quad-driver headphones, and its URL
is https://www.youtube.com/watch?v=BFL1E77hTHQ.

As an aside: I have a YouTube channel where I review consumer electronics
and gadgets. You should subscribe! Find all my videos at
http://youtube.com/askdavetaylor.

YouTube has a bunch of ways it can assemble a URL, however, including using
its URL-shortener youtu.be, but fortunately, youtube-dl can handle the
variations.

Downloading a copy of the video to the current working directory is now as
simple as:


youtube-dl 'https://www.youtube.com/watch?v=BFL1E77hTHQ'

The full output of the command is a bit, um, hairy, however:


$  youtube-dl 'https://www.youtube.com/watch?v=BFL1E77hTHQ'
[youtube] BFL1E77hTHQ: Downloading webpage
[youtube] BFL1E77hTHQ: Downloading video info webpage
[youtube] BFL1E77hTHQ: Extracting video information
[youtube] BFL1E77hTHQ: Downloading MPD manifest
WARNING: Requested formats are incompatible for merge and
will be merged into mkv.
[download] Destination: 1More Quad Driver In-Ear Headphones
Reviewed-BFL1E77hTHQ.f137.mp4
[download] 100% of 118.74MiB in 02:49
[download] Destination: 1More Quad Driver In-Ear Headphones
Reviewed-BFL1E77hTHQ.f251.webm
[download] 100% of 4.81MiB in 00:03
[ffmpeg] Merging formats into "1More Quad Driver In-Ear
Headphones Reviewed-BFL1E77hTHQ.mkv"
Deleting original file 1More Quad Driver In-Ear Headphones
Reviewed-BFL1E77hTHQ.f137.mp4 (pass -k to keep)
Deleting original file 1More Quad Driver In-Ear Headphones
Reviewed-BFL1E77hTHQ.f251.webm (pass -k to keep)
$

You can wade through the output messages, but it’s the message from
companion open-source program ffmpeg that’s most important:
merging formats into ... mkv.

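In other words, this download ended up as an MKV file. The WARNING line
explains why: the video (MP4) and audio (WebM) streams that youtube-dl
picked can’t be combined in an MP4 file, so ffmpeg merged them into MKV
instead. MKV is the file extension for the increasingly popular Matroska
Multimedia Container format, and it works with a lot of video players
(including VideoLan, aka VLC, my favorite cross-platform video player).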

A quick ls reveals the result and that the default filename is taken from
the title of the video, something that might not be particularly desirable:


$ ls -lh *mkv
-rw-r--r--  1 taylor  staff   124M Jan 31 16:56 1More Quad
Driver In-Ear Headphones Reviewed-BFL1E77hTHQ.mkv

Do you prefer to specify the output name and have the output file in MP4 (MPEG4)
format instead? That’s doable:


$ youtube-dl -o 1more-review.mp4 -f mp4 \
    'https://www.youtube.com/watch?v=BFL1E77hTHQ'
[youtube] BFL1E77hTHQ: Downloading webpage
[youtube] BFL1E77hTHQ: Downloading video info webpage
[youtube] BFL1E77hTHQ: Extracting video information
[youtube] BFL1E77hTHQ: Downloading MPD manifest
[download] Destination: 1more-review.mp4
[download] 100% of 57.63MiB in 00:27

As a bonus, you get less ominous informational messages from the program
too, so it’s cleaner. And the output, sure enough, is in MP4 format:


$ ls -lh *mp4
-rw-r--r--@ 1 taylor  staff  58M Jan 31 16:57 1more-review.mp4

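As a second bonus, the download is smaller: the MP4 version is only 58M
as opposed to the 124M of the MKV-merged version. That’s most likely
because -f mp4 grabs a single ready-made MP4 stream, probably at a lower
resolution than the separate video track the first run fetched, rather
than because of any difference in encoding efficiency.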
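
To see which formats a given video offers before picking one, youtube-dl
has a -F (--list-formats) option. Here’s a minimal sketch, assuming
youtube-dl is on your PATH; the list_formats name and the YTDL override
variable are my own inventions, not part of the program:

```shell
# List the formats available for a video without downloading anything.
# YTDL is a hypothetical override, handy for testing; the default is youtube-dl.
list_formats() {
  ${YTDL:-youtube-dl} -F "$1"
}

# Example (needs network access):
# list_formats 'https://www.youtube.com/watch?v=BFL1E77hTHQ'
```

The first column of the listing is the format code, which is what the -f
option accepts.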

So how do you watch it? Most likely, do a double-click and it’ll be up and
running, as shown in Figure 1.

Figure 1. Downloaded YouTube Video Playing in Ubuntu Player

That’s easy enough, but the original goal was to be able to extract just the audio
component of a YouTube video, so let’s look at that task.

Downloading Just the Audio Track

Since I’ve already started to delve into the command-line options for
the youtube-dl program, it’s not a leap to find out that there’s
yet another command-line option that lets you save just the audio portion
of a video:


$ youtube-dl -x --audio-format mp3 \
    'https://www.youtube.com/watch?v=BFL1E77hTHQ'
[youtube] BFL1E77hTHQ: Downloading webpage
[youtube] BFL1E77hTHQ: Downloading video info webpage
[youtube] BFL1E77hTHQ: Extracting video information
[youtube] BFL1E77hTHQ: Downloading MPD manifest
[download] Destination: 1More Quad Driver In-Ear Headphones
Reviewed-BFL1E77hTHQ.webm
[download] 100% of 4.81MiB in 00:07
[ffmpeg] Destination: 1More Quad Driver In-Ear Headphones
Reviewed-BFL1E77hTHQ.mp3
Deleting original file 1More Quad Driver In-Ear Headphones
Reviewed-BFL1E77hTHQ.webm (pass -k to keep)
$ ls -lh *mp3
-rw-r--r--  1 taylor  staff   4.0M Jan 31 18:22 1More Quad
Driver In-Ear Headphones Reviewed-BFL1E77hTHQ.mp3

That’s easy enough, and the output is delightfully small: 4MB total. The problem is,
there’s the same awkward naming issue, so the addition of -o output-filename
definitely will be a win. But, really, youtube-dl makes
these tasks trivially easy, as long as you’re willing to figure out all
of its command-line options.
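
As a rough sketch of combining the two, the following pairs -x with -o.
It relies on %(ext)s, a placeholder in youtube-dl’s output template that is
filled in with the file extension, so the intermediate download and the
converted MP3 don’t fight over one name. The audio_as function and the YTDL
override variable are hypothetical names of mine:

```shell
# Extract just the audio, saved under a base name of your choosing.
# YTDL is a hypothetical override; the default is youtube-dl on the PATH.
audio_as() {
  base=$1   # output name without an extension
  url=$2
  ${YTDL:-youtube-dl} -x --audio-format mp3 -o "$base.%(ext)s" "$url"
}

# Example (needs network access and ffmpeg):
# audio_as 1more-review 'https://www.youtube.com/watch?v=BFL1E77hTHQ'
```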

Writing a Wrapper Script

Instead of worrying about the obscure command-line flag notation, let’s
just write a script that does the heavy lifting for you. I’m going to
call it ytdl for “youtube download”, and by default, it’ll accept
just a URL and output an MP4 format video file that has the same name as
the YouTube shortcut (for example, the above video would become BFL1E77hTHQ.mp4).

Add a second parameter, and that becomes the output filename. Specify the
-a flag, and it saves audio output only, in MP3 format instead.

Let’s start with a usage block if the user forgets to specify anything
or just needs a simple reminder:


if [ $# -eq 0 ] ; then
  echo "Usage: $(basename "$0") {-a} YouTubeURL {outputfile}"
  echo "   where -a extracts the audio portion in MP3 format"
  exit 1
fi

That’s easy enough. The script is also going to use some predefined combinations
of flags to make it easier to write:


youtubedl="/usr/local/bin/youtube-dl"
audioflags="-x --audio-format mp3"
videoflags="-f mp4"
flags=$videoflags       # default set of command flags
audioonly=0             # default is audio + video
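
The hard-coded path above matches my system, but yours may differ. One
possible variation, offered as a sketch rather than as part of the script,
asks the shell where youtube-dl lives and falls back to the path above only
if nothing turns up. The locate_ytdl name is hypothetical:

```shell
# Prefer whatever youtube-dl is on the PATH; otherwise fall back to
# the hard-coded location used in the article.
locate_ytdl() {
  command -v youtube-dl 2>/dev/null || echo "/usr/local/bin/youtube-dl"
}

youtubedl=$(locate_ytdl)
```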

If the user specifies the -a flag,
audioonly will be set to true (that is, 1),
and the default flags will switch from video to audio:


if [ "$1" = "-a" ] ; then
  audioonly=1
  flags=$audioflags
  shift
fi

You’ll recall that the shift command moves all the parameters
“down” one to the left, so $2 becomes $1 and so on. It’s an easy way to
process and discard parameters in a script, of course.
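
If you’d like to watch shift at work, here’s a tiny demonstration. The
show_shift function is purely illustrative and not part of the ytdl script:

```shell
# Print the positional parameters before and after a shift.
show_shift() {
  echo "before: \$1=$1 \$2=$2 count=$#"
  shift
  echo "after:  \$1=$1 count=$#"
}

show_shift -a 'https://youtu.be/BFL1E77hTHQ'
# before: $1=-a $2=https://youtu.be/BFL1E77hTHQ count=2
# after:  $1=https://youtu.be/BFL1E77hTHQ count=1
```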

The biggest block of code creates a default output filename from the
YouTube URL:


if [ $# -eq 1 ] ; then
  # no output filename specified
  outfile=$(echo "$1" | cut -d= -f2)
  if [ $audioonly -eq 1 ] ; then
    outfile="$outfile.mp3"
  else
    outfile="$outfile.mp4"
  fi
else
  outfile="$2"
fi

This isn’t the most robust code, because it assumes that the URL
specified is in a format like the examples used herein,
youtube-yadda-yadda?value=shortcode. It extracts the shortcode and simply
appends an appropriate filename suffix. There are better ways to do this,
but that’s okay, this’ll work for now. Just realize that your
output filename might be a bit weird if you have a very different type of
YouTube URL (a youtu.be short link, say, or one with extra parameters
after the shortcode) or a URL from another site.
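
For the curious, here’s one possible sturdier version, sketched with
nothing but shell pattern matching. It assumes just three URL shapes
(watch?v=ID, youtu.be/ID, and anything else, where it takes the last path
component), and the video_id name is hypothetical:

```shell
# Pull a video ID out of a few common URL shapes.
video_id() {
  url=$1
  case "$url" in
    *youtu.be/*)       id=${url##*youtu.be/} ;;   # short links: youtu.be/ID
    *'?v='*|*'&v='*)   id=${url##*[?&]v=} ;;      # watch?v=ID or ...&v=ID
    *)                 id=${url##*/} ;;           # fallback: last path component
  esac
  id=${id%%[?&#]*}     # drop any trailing parameters or fragment
  printf '%s\n' "$id"
}

video_id 'https://www.youtube.com/watch?v=BFL1E77hTHQ&t=30s'
# BFL1E77hTHQ
```

In the script, outfile=$(video_id "$1") would then stand in for the cut
pipeline.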

And, finally, the actual invocation of the
youtube-dl command:


$youtubedl $flags -o "$outfile" "$1"

That’s it! Now you can download a video as simply as:


$ ytdl 'https://www.youtube.com/watch?v=5yXDzg_QDGw' wiper.mp4

And an audio portion with:


$ ytdl -a 'https://www.youtube.com/watch?v=5yXDzg_QDGw'
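
For reference, here are the pieces gathered in one place. This is a sketch
rather than my exact file: I’ve wrapped everything in a shell function so
it’s easy to try from a terminal, swapped exit for return accordingly, and
added a hypothetical YTDL variable that overrides the youtube-dl path:

```shell
# ytdl: download a video as MP4, or just its audio as MP3 with -a.
ytdl() {
  youtubedl=${YTDL:-/usr/local/bin/youtube-dl}
  audioflags="-x --audio-format mp3"
  videoflags="-f mp4"
  flags=$videoflags       # default set of command flags
  audioonly=0             # default is audio + video

  if [ $# -eq 0 ] ; then
    echo "Usage: ytdl {-a} YouTubeURL {outputfile}" >&2
    return 1
  fi

  if [ "$1" = "-a" ] ; then
    audioonly=1
    flags=$audioflags
    shift
  fi

  if [ $# -eq 1 ] ; then
    # no output filename specified: build one from the URL
    outfile=$(echo "$1" | cut -d= -f2)
    if [ $audioonly -eq 1 ] ; then
      outfile="$outfile.mp3"
    else
      outfile="$outfile.mp4"
    fi
  else
    outfile="$2"
  fi

  # $flags is deliberately unquoted so it splits into separate arguments
  $youtubedl $flags -o "$outfile" "$1"
}
```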

Nice, eh?

I’ve way overrun my space for this column, but this is such a fun and
simple script atop a terrific, powerful program, that it’s worth it,
right? And now you know how to make YouTube work for you, rather than
vice versa!