Getting Frames from YouTube

YouTube has a wealth of data, but often you don’t feel like downloading it all to disk. How can we stream video into Python and then access the frame data?

Start with a conda environment containing Python and JupyterLab. At the time of writing, Python 3.7 was a good base version that most packages supported:

conda create --name youtube_frames python=3.7
conda activate youtube_frames
conda install jupyterlab

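Install OpenCV (I tried both the conda-forge build, which includes the C++ libraries, and the pip Python bindings; if the two conflict in your environment, keep just one):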

conda install -c conda-forge opencv
pip install opencv-python

Install YouTube tools:

pip install youtube-dl
pip install pafy
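
Before going any further, it can save time to confirm that the packages are visible from the environment Jupyter is actually using. The sketch below is not part of the original setup: check_packages is a hypothetical helper, and it assumes the import names are cv2, pafy and youtube_dl (which differ from the pip package names).

```python
import importlib.util


def check_packages(names):
    """Return a dict mapping each top-level module name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Import names assumed for the packages installed above
for name, found in check_packages(["cv2", "pafy", "youtube_dl"]).items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

Anything reported as missing usually means the notebook kernel is running in a different environment from the one where you ran the installs.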

Edit the backend_youtube_dl.py file in the conda environment directly (for me, it is in ~/anaconda3/envs/youtube_frames/lib/python3.7/site-packages/pafy) to disable the “dislike info”, which YouTube no longer appears to supply. Comment out the marked line as below.

...
        self._title = self._ydl_info['title']
        self._author = self._ydl_info['uploader']
        self._rating = self._ydl_info['average_rating']
        self._length = self._ydl_info['duration']
        self._viewcount = self._ydl_info['view_count']
        self._likes = self._ydl_info['like_count']
        # self._dislikes = self._ydl_info['dislike_count']  # <--- comment out this line
        self._username = self._ydl_info['uploader_id']
        self._category = self._ydl_info['categories'][0] if self._ydl_info['categories'] else ''
        self._bestthumb = self._ydl_info['thumbnails'][0]['url']
        self._bigthumb = g.urls['bigthumb'] % self.videoid
        self._bigthumbhd = g.urls['bigthumbhd'] % self.videoid
        self.expiry = time.time() + g.lifespan
...
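
If you would rather not drop the field outright, a more tolerant variant of the same edit is to read it with dict.get, so a missing key falls back to a default instead of raising KeyError. The snippet below only illustrates the pattern on a made-up dictionary: ydl_info and its values are hypothetical stand-ins for the metadata pafy receives, and I have not checked how other parts of pafy use the dislike count.

```python
# Hypothetical stand-in for the metadata dictionary, with no 'dislike_count' key
ydl_info = {"title": "Example video", "like_count": 42}

# Direct indexing fails when the key is missing
try:
    dislikes = ydl_info["dislike_count"]
except KeyError:
    dislikes = None
    print("dislike_count is missing")

# .get() returns a default instead of raising
likes = ydl_info.get("like_count", 0)
dislikes = ydl_info.get("dislike_count", 0)
print(likes, dislikes)
```

Applied to the pafy file, the equivalent line would be self._dislikes = self._ydl_info.get('dislike_count', 0).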

Then open up a Jupyter Notebook and try out:

import cv2
import pafy


# Choose a YouTube URL (paste from viewing a video on the web)
url = "https://www.youtube.com/watch?v=BmrUJhY9teE"

# Use pafy to get the video stream url
video = pafy.new(url)
# Have a look at available streams
print("Streams : " + str(video.allstreams))
# But for now get best stream
best = video.getbest(preftype="mp4")

# Initialise OpenCV Video Capture Object with URL
capture = cv2.VideoCapture(best.url)

# Test out viewing very slowly frame by frame
while capture.isOpened():
    # Capture frame-by-frame
    ret, frame = capture.read()

    # Stop when the stream ends or a frame cannot be read
    if not ret:
        break

    # Display the resulting frame
    cv2.imshow('Frame', frame)

    # Press Q on keyboard to exit
    if cv2.waitKey(25) & 0xFF == ord('q'):
        break

# Release the stream and close the display window
capture.release()
cv2.destroyAllWindows()

You can then edit the above loop to perform processing instead of showing the frame.
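
One way to structure that processing is to wrap the read loop in a generator, so the code that handles each frame stays separate from the code that fetches it. This is a minimal sketch rather than anything from pafy or OpenCV: iter_frames, step and max_frames are names I have made up, and the only assumption about capture is that it has a read() method returning (ret, frame), as cv2.VideoCapture does.

```python
def iter_frames(capture, step=1, max_frames=None):
    """Yield (index, frame) for every `step`-th frame read from `capture`.

    `capture` is anything with a read() method returning (ret, frame),
    such as a cv2.VideoCapture. Iteration stops at the end of the stream
    or once `max_frames` frames have been yielded.
    """
    index = 0
    yielded = 0
    while max_frames is None or yielded < max_frames:
        ret, frame = capture.read()
        if not ret:
            break
        if index % step == 0:
            yield index, frame
            yielded += 1
        index += 1


# Example use with the capture object from above (uncomment to run):
# for index, frame in iter_frames(capture, step=25, max_frames=10):
#     grey = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
#     print(index, grey.mean())
```

Skipping frames with step is handy when you only need a sample of the video, since every frame still has to be read and decoded but your own processing runs far less often.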

Figure: Best Stream

Playing Around with Streams

Not a lot of people realise that a YouTube video isn’t “one” thing: it’s legion. There are multiple video feeds of different types and resolutions, plus separate audio feeds. The pafy package can help us access these.

As above, to view the available streams, run:

print("Streams : " + str(video.allstreams))

You can access stream details using the properties of the stream object:

for i, stream in enumerate(video.videostreams):
    print(f"Stream {i} - type: {stream.extension} - resolution: {stream.dimensions} - bitrate: {stream.bitrate}")

The mp4 videos couldn’t be decoded in my setup, but the webm video worked fine. You can select a different stream by indexing into the list of streams and using the URL for that stream:

# Select a stream by its index in the list (check the printout above for the one you want)
smallest = video.videostreams[1]
capture = cv2.VideoCapture(smallest.url)

Figure: Tiny Stream
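
Picking a stream by a fixed index depends on the order YouTube happens to return, so it may be safer to choose by property. The sketch below is one possible approach, not part of pafy: pick_smallest is a hypothetical helper, and it assumes each stream exposes extension as a string and dimensions as a (width, height) tuple, as in the listing loop above.

```python
def pick_smallest(streams, extension=None):
    """Return the stream with the fewest pixels, optionally filtered by extension.

    Each stream needs `extension` (str) and `dimensions` ((width, height))
    attributes.
    """
    candidates = [s for s in streams if extension is None or s.extension == extension]
    if not candidates:
        raise ValueError(f"no streams with extension {extension!r}")
    return min(candidates, key=lambda s: s.dimensions[0] * s.dimensions[1])


# Example use with pafy (uncomment to run):
# smallest = pick_smallest(video.videostreams, extension="webm")
# capture = cv2.VideoCapture(smallest.url)
```

Filtering on "webm" mirrors what worked in my setup; swap in whichever extension your OpenCV build can decode.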

Playing with Colour Spaces

Ever wondered what a video stream looked like in YUV space?

Yes? Then try running the code below:

# Read until video is completed
while capture.isOpened():
    # Capture frame-by-frame
    ret, frame = capture.read()

    # Stop when the stream ends or a frame cannot be read
    if not ret:
        break

    # Convert to YUV
    img_yuv = cv2.cvtColor(frame, cv2.COLOR_BGR2YUV)
    # Split into YUV planes
    y, u, v = cv2.split(img_yuv)
    # Show each plane in a separate window
    cv2.imshow('y', y)
    cv2.imshow('u', u)
    cv2.imshow('v', v)

    # Press Q on keyboard to exit
    if cv2.waitKey(25) & 0xFF == ord('q'):
        break

# Release the stream and close the display windows
capture.release()
cv2.destroyAllWindows()

Figure: Colour component spaces are cool.
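
To get a feel for what the three windows are showing, it helps to convert a single pixel by hand. The sketch below uses the commonly quoted BT.601 coefficients. I am assuming these are close to what cv2.COLOR_BGR2YUV applies to 8-bit images, so treat the results as approximate and expect small rounding differences from OpenCV. bgr_to_yuv is a hypothetical helper written for this illustration.

```python
def bgr_to_yuv(b, g, r):
    """Convert one 8-bit BGR pixel to (Y, U, V), each rounded and clipped to 0..255."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luma: weighted brightness
    u = 0.492 * (b - y) + 128               # blue difference, centred on 128
    v = 0.877 * (r - y) + 128               # red difference, centred on 128

    def clip(x):
        return max(0, min(255, int(round(x))))

    return clip(y), clip(u), clip(v)


print(bgr_to_yuv(128, 128, 128))  # a mid-grey pixel
print(bgr_to_yuv(255, 0, 0))      # a pure blue pixel (B, G, R order)
```

Because U and V are offsets around 128, colourless regions of the frame come out as flat mid-grey in the 'u' and 'v' windows, while the 'y' window looks like an ordinary greyscale version of the video.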

Playing Audio Streams

This is a challenge for another day.

We have the audio stream URL from pafy. We know PyAudio can be used to access audio data and dump this to numpy. What we don’t know is how to open a web audio stream and access the data using PyAudio.

An alternative might be to use requests or urllib to retrieve chunks of audio data directly and then convert the bytes to numpy arrays.
