Making Proper Histograms with Numpy and Matplotlib

Often I find myself needing to visualise an array, such as a bunch of pixel or audio channel values. A nice way to do this is via a histogram.

When building histograms you have two options: numpy’s histogram or matplotlib’s hist. As you may expect, numpy is faster when you just need the data rather than the visualisation, while matplotlib makes it easier to get a nice bar chart.

So that I remember, here is a quick post with an example.

# First import numpy and matplotlib
import numpy as np
import matplotlib.pyplot as plt

I started with a data volume of size 256 x 256 x 8 x 300, corresponding to 300 frames of video at a resolution of 256 by 256 with 8 different image processing operations. The data values were 8-bit, i.e. 0 to 255. I wanted to visualise the distribution of pixel values within this data volume.
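The original data is not included in this post, so to make the snippets below runnable here is a minimal sketch of a stand-in. The random values, the seed and the shrunken shape (32 x 32 x 8 x 10 rather than 256 x 256 x 8 x 300) are all assumptions made for illustration; only the name data_vol and the 8-bit range come from the post.

```python
import numpy as np

# Hypothetical stand-in for the real video data: random 8-bit values in a
# smaller volume (32 x 32 pixels, 8 operations, 10 frames) to keep it light.
rng = np.random.default_rng(seed=0)
data_vol = rng.integers(0, 256, size=(32, 32, 8, 10), dtype=np.uint8)

print(data_vol.shape, data_vol.dtype)
print(data_vol.min(), data_vol.max())
```

Random data gives a roughly flat histogram, so the plots will not look like real video, but the calls behave the same way.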

Using numpy, you can pass in the whole data volume and it will be flattened inside the function. The second argument is a list of bin edges rather than bins, so one bin per integer value from 0 to 255 needs the 257 edges 0 to 256:

values, bins = np.histogram(data_vol, np.arange(0, 257))
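It is easy to be off by one here, because np.histogram reads an array argument as bin edges: N + 1 edges give N bins, and only the last bin includes its right-hand edge. A small sketch on a made-up array (the pixels values are purely illustrative) shows the effect of stopping the edges too early, and that np.bincount gives the same counts for non-negative integer data:

```python
import numpy as np

# Made-up 8-bit samples, including the extreme values 0 and 255.
pixels = np.array([0, 0, 1, 254, 255, 255, 255], dtype=np.uint8)

# 257 edges (0..256) -> 256 unit-wide bins, one per possible pixel value.
values, bins = np.histogram(pixels, bins=np.arange(0, 257))
print(len(values), len(bins))
print(values[0], values[1], values[254], values[255])

# Edges that stop at 254 give only 254 bins, and every sample equal to 255
# falls outside the range and is silently left out of the counts.
short_values, short_bins = np.histogram(pixels, bins=np.arange(0, 255))
print(len(short_values), short_values.sum(), pixels.size)

# For non-negative integers, bincount is an alternative way to the same counts.
counts = np.bincount(pixels, minlength=256)
print(np.array_equal(counts, values))
```

On a multi-dimensional array such as data_vol, np.bincount would need data_vol.ravel(), since it only accepts 1-D input.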

You can then use matplotlib’s bar chart to plot this:

# align='edge' starts each bar at its left bin edge, matching plt.hist
plt.bar(bins[:-1], values, width=1, align='edge')

Using matplotlib’s hist function, we need to flatten the data first:

# 257 edges (0 to 256) give one bin per pixel value from 0 to 255
results = plt.hist(data_vol.ravel(), bins=np.arange(0, 257))
plt.show()
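As a quick sanity check, the counts from the two routes can be compared directly. This is only a sketch: the random data_vol is a small hypothetical stand-in for the real data, and the Agg backend is chosen so that it runs without opening a window.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-in data: a small random 8-bit volume.
rng = np.random.default_rng(seed=0)
data_vol = rng.integers(0, 256, size=(16, 16, 2, 3), dtype=np.uint8)

edges = np.arange(0, 257)  # 257 edges -> 256 unit-wide bins

# numpy flattens the volume itself; matplotlib is given the flattened array.
np_counts, np_edges = np.histogram(data_vol, bins=edges)
fig, ax = plt.subplots()
mpl_counts, mpl_edges, _ = ax.hist(data_vol.ravel(), bins=edges)
plt.close(fig)

print(np.array_equal(np_counts, mpl_counts))
print(np.array_equal(np_edges, mpl_edges))
```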

The result of both approaches is the same. If we are being more professional, we can also use more of matplotlib’s functionality:

fig, ax = plt.subplots()
results = ax.hist(data_vol.ravel(), bins=np.arange(0, 257))
ax.set_title('Pixel Value Histogram')
ax.set_xlabel('Pixel Value')
ax.set_ylabel('Count')
plt.show()

Things get a little more tricky when we start changing our bin sizes. A good run through is found here. In this case, the slower Matplotlib function becomes easier:

fig, ax = plt.subplots()
results = ax.hist(
    data_vol.ravel(), 
    bins=np.linspace(0, 256, 17)  # 17 edges give 16 bins, each 16 values wide
)
ax.set_title('Pixel Value Histogram (4-bit)')
ax.set_xlabel('Pixel Value')
ax.set_ylabel('Count')
plt.show()

Using 16 bins provides us with 4-bit quantisation. You can see here we could represent a large chunk of the data with just three values (if we subtract 128: <0 = -1, 0 = 0 and >0 = 1).
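To make the link between 16 bins and 4 bits concrete, here is a minimal sketch on a made-up array. Integer division by 16 is my assumption for how the quantisation would be done, and np.sign is one literal reading of the three-value mapping above; neither is taken from the original analysis.

```python
import numpy as np

# Made-up 8-bit samples around the bin boundaries and the midpoint 128.
pixels = np.array([0, 15, 16, 127, 128, 129, 255], dtype=np.uint8)

# 4-bit quantisation: integer division by 16 maps 0..255 onto levels 0..15.
four_bit = pixels // 16
print(four_bit)

# Counting the 4-bit levels matches a 16-bin histogram of the raw values.
counts, edges = np.histogram(pixels, bins=np.linspace(0, 256, 17))
print(np.array_equal(counts, np.bincount(four_bit, minlength=16)))

# Three-value version: centre on 128, then keep only the sign.
# The cast to a signed type avoids uint8 wrap-around when subtracting.
ternary = np.sign(pixels.astype(np.int16) - 128)
print(ternary)
```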
