CBIO (CSCI) 4835/6835: Introduction to Computational Biology
This week, we're moving into image processing. In this lecture, we'll touch on some core concepts around computer vision, or the field dedicated to machine understanding of images. By the end of this lecture, you should be able to
Whenever you hear about or refer to an image analysis task, you've stepped firmly into territory occupied by computer vision, or the field of research associated with understanding images and designing algorithms to do the same.
You can probably name numerous examples of computer vision already, but just to highlight a couple:
This is all to underscore: computer vision is an extremely active area of research and application!
From the perspective of the computer, the simplest constituent of an image is a pixel.
A pixel is a picture element.
(There are many other image formats and representations, but they tend to be variations on this theme)
In either grayscale or color, the pixels are arranged in rectangular arrays, one for each color channel (1 for grayscale, 3 for RGB).
(What could these arrays possibly be in Python?)
Let's jump in and get our hands dirty! First, let's use a relevant image:
I've stored this image in the course GitHub repository under lectures/Lecture22
( https://github.com/eds-uga/cbio4835-sp17 ) if you're interested.
Here's how to load the images in Python:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
# Loads the image (just like a text file!)
img = mpimg.imread("Lecture22/image1.png")
print(type(img))
<class 'numpy.ndarray'>
Just a regular NumPy array!
Let's see if we can visualize it.
plt.imshow(img)
<matplotlib.image.AxesImage at 0x11b078b38>
This shows the whole image, all three channels.
As evidenced by the .shape
property of the NumPy array, there are three dimensions to this image:
Each slice of the third dimension is a color channel, of which there are 3: one for red, one for green, and one for blue (hence: RGB).
We can plot them separately!
# First, separate out the channels.
r = img[:, :, 0]
g = img[:, :, 1]
b = img[:, :, 2]
# Now, plot each channel separately.
f = plt.figure(figsize = (12, 6))
f.add_subplot(1, 3, 1)
plt.imshow(np.array(r), cmap = "gray")
f.add_subplot(1, 3, 2)
plt.imshow(np.array(g), cmap = "gray")
f.add_subplot(1, 3, 3)
plt.imshow(np.array(b), cmap = "gray")
<matplotlib.image.AxesImage at 0x11bd20828>
For comparison, here's the original:
plt.imshow(img)
<matplotlib.image.AxesImage at 0x11c587f98>
Image analysis of any kind is usually done on a single channel.
(Why?)
Since images are stored as NumPy arrays, all the usual NumPy functionality (besides slicing, as we saw earlier) is available to you.
print(np.max(img))
print(np.min(img))
1.0 0.0
print(np.mean(img))
print(np.median(img))
0.131524 0.0705882
print(np.median(r))
print(np.median(g))
print(np.median(b))
0.207843 0.00784314 0.0745098
Recall that our img
object was loaded from a PNG image; this is the only format type that Matplotlib natively supports (more on that later).
When you read an image into Python, it will automatically detect the format and read it into the closest approximate Python data format it can. However, you can always manually convert it once it's in Python.
For instance, we use a slightly different approach to instead read in our image as grayscale:
import scipy.ndimage as ndimg
img_gray = ndimg.imread("Lecture22/image1.png", flatten = True) # The "flatten" arg is critical
print(img_gray.shape)
(480, 640)
Note how there are only 2 dimensions now--just a height and width.
There is no need for a 3rd dimension because there's only 1 channel: luminescence, or grayscale intensity.
plt.imshow(img_gray, cmap = "gray")
<matplotlib.image.AxesImage at 0x11e39c860>
We can access individual pixels, just as you would individual elements of a matrix NumPy array (because that's all it is):
print(img_gray[100, 200])
35.139
print(img_gray[150, :])
[ 11.60599995 12.50300026 11.27900028 11.69200039 11.69200039 11.62100029 12.20800018 13.0340004 12.51799965 12.02299976 12.14799976 11.43599987 12.14799976 12.02299976 11.9630003 12.47900009 13.30500031 12.8920002 12.8210001 12.93500042 14.43000031 12.96700001 13.57600021 13.20600033 13.17800045 13.6619997 14.26000023 14.63000011 13.7329998 15.40200043 13.61900043 13.75 13.57600021 14.71800041 14.30500031 12.59300041 11.46800041 12.9090004 13.80599976 12.9090004 12.97999954 13.57800007 12.75199986 12.82299995 11.51299953 13.40999985 11.92599964 12.81200027 13.06799984 12.77999973 11.76900005 11.82900047 13.21000004 12.80799961 12.98200035 11.91100025 12.38399982 13.05300045 13.05300045 13.96100044 13.06400013 14.8579998 15.14599991 17.54899979 18.43499947 16.52700043 17.05400085 16.22800064 18.13599968 17.84799957 17.23900032 17.60899925 18.13599968 18.13599968 15.9289999 18.43499947 15.9289999 17.23900032 16.94000053 16.34199905 17.125 16.52700043 15.75500011 19.63100052 17.53800011 18.25 18.84799957 18.54899979 20.04400063 21.53899956 22.43600082 21.84900093 24.64299965 27.33399963 29.9109993 33.61299896 37.20100021 38.98400116 34.92300034 37.61399841 36.11899948 36.4070015 35.33599854 36.00500107 36.5320015 34.74900055 35.22200012 35.93399811 33.8409996 34.15100098 36.24399948 36.64599991 36.95600128 38.55400085 39.64699936 37.85300064 40.06000137 41.85400009 41.85400009 43.83300018 43.27799988 40.24499893 41.85400009 39.2879982 39.57600021 41.18500137 36.70000076 37.18399811 39.09199905 36.9280014 37.41199875 35.91699982 36.2159996 33.94900131 34.84600067 34.54700089 33.94900131 34.35100174 32.87799835 34.36199951 34.95999908 34.95999908 37.23799896 35.67200089 33.65000153 36.15599823 34.67200089 35.3730011 34.67200089 33.10599899 34.07400131 37.29199982 36.58000183 35.79700089 37.09600067 38.37400055 35.31299973 35.31299973 36.99300003 35.19900131 36.43799973 39.01499939 40.50999832 42.00500107 41.17900085 39.38499832 40.62400055 41.4070015 38.91199875 38.11800003 40.50999832 36.80799866 37.70500183 37.22100067 36.92200089 39.12900162 39.72700119 40.39599991 41.70600128 42.60300064 40.21099854 40.02600098 41.4070015 39.02600098 40.63499832 38.54199982 40.93399811 39.32500076 38.35699844 38.24300003 38.65599823 41.93399811 38.53099823 37.7480011 35.66600037 37.0359993 35.96500015 34.47000122 34.69800186 34.17100143 33.5019989 33.57300186 34.39899826 33.31700134 31.45199966 30.66900063 30.85400009 32.57699966 32.61999893 30.28199959 32.29299927 30.73800087 29.75900078 30.04100037 31.2689991 29.52899933 30.45800018 29.68600082 30.91399956 30.78300095 29.80400085 30.15699959 31.38500023 30.4279995 28.63400078 29.34600067 29.94400024 28.13899994 30.94400024 30.62800026 29.13299942 30.48600006 29.20800018 29.78899956 29.60400009 30.61499977 28.45100021 32.72299957 30.88599968 30.51600075 30.44499969 30.81500053 29.23399925 30.41900063 30.35899925 29.57600021 29.39100075 31.15699959 30.04700089 30.98699951 33.13800049 34.00699997 36.00099945 32.48400116 33.45199966 33.8370018 35.7879982 34.6629982 32.59799957 32.62599945 30.40399933 29.88999939 27.55400085 26.04400063 26.85499954 27.70899963 27.18199921 27.29599953 26.05699921 27.55200005 24.70400047 24.17700005 24.91699982 24.98800087 23.40699959 23.96199989 22.46699905 22.19599915 22.38100052 21.78300095 21.32699966 20.98500061 20.87100029 21.05599976 20.7140007 19.5890007 18.0510006 19.27499962 19.04700089 17.57999992 16.53700066 16.11300087 17.49399948 15.13000011 15.4289999 14.61400032 16.40800095 16.09799957 14.98400021 15.15799999 15.15799999 14.15799999 15.52799988 15.65299988 13.25 16.12599945 15.94099998 14.5170002 16.72400093 15.7130003 16.50699997 16.08300018 17.39299965 14.5880003 15.60999966 16.39299965 16.08300018 14.30000019 18.29000092 15.7840004 17.09399986 19.18700027 17.88800049 16.68099976 19.08399963 19.37199974 18.65999985 17.87700081 18.07299995 18.65999985 17.36100006 17.57799911 16.45299911 17.95899963 17.65999985 18.1439991 17.64900017 19.92700005 18.74200058 19.74200058 20.05200005 21.30800056 22.14500046 27.10300064 27.22800064 27. 30.58799934 32.31100082 35.47499847 35.18700027 37.56800079 39.3730011 39.85699844 39.97100067 42.54800034 43.44499969 41.35200119 45.04299927 45.5379982 45.64099884 47.14699936 47.51699829 45.95100021 40.75400162 38.36199951 31.48500061 31.59900093 25.50499916 24.72200012 21.73200035 17.91600037 15.63799953 14.14299965 13.24600029 12.94699955 13.24600029 11.63700008 12.05000019 13.05000019 13.54500008 14.44200039 14.14299965 16.83399963 16.83399963 19.4109993 21.02000046 22.2159996 30.58799934 37.45399857 45.35300064 58.69400024 67.66400146 59.87900162 54.09500122 42.54800034 32.37099838 31.18600082 31.18600082 32.85499954 33.99100113 32.08300018 34.77399826 35.67100143 35.25799942 33.39300156 35.77399826 35.48600006 35.06200027 35.07300186 31.89800072 25.31999969 22.03100014 22.74300003 22.74300003 22.02000046 20.16600037 19.2689991 18.84499931 17.88800049 19.44300079 17.57799911 16.26799965 15.01200008 14.88700008 13.81599998 12.91899967 14.21800041 13.02200031 12.73400021 13.54899979 10.56999969 10.08600044 11.99400043 10.61299992 10.68400002 11.4989996 9.82999992 11.43900013 9.94400024 9.75899982 10.17199993 9.38899994 10.81299973 9.98700047 9.50300026 9.01900005 9.43200016 11.51399994 9.73099995 8.82299995 9.31799984 9.38899994 7.66599989 8.74800014 10.41699982 9.60200024 8.69400024 9.00399971 10.08600044 9.48799992 9.90100002 10.30300045 10.18900013 10.01500034 9.10700035 10.23200035 9.86200047 8.79100037 8.90499973 10.37199974 11.19799995 10.71399975 10.34399986 10.68599987 12.52299976 12.72299957 12.86900043 12.73799992 13.49300003 14.31900024 15.48700047 15.18799973 15.30200005 14.63300037 15.38799953 15.95800018 16.52799988 17.23999977 16.6420002 16.48500061 16.44199944 17.25300026 15.25899982 15.69999981 13.70600033 14.33199978 14.21800041 14.28899956 14.61600018 13.70800018 11.95300007 12.62199974 12.69299984 10.81299973 12.97700024 12.92300034 12.29300022 13.56000042 12.54899979 13.14700031 11.83699989 12.61999989 12.54899979 11.42399979 12.3210001 12.50599957 12.98999977 12.3920002 13.09300041 12.09300041 13.47399998 11.79399967 12.16399956 12.4630003 11.73999977 12.05000019 10.85400009 11.15299988 12.64799976 11.7510004 13.61600018 12.94699955 13.72999954 12.83300018 12.83300018 14.32800007 11.93599987 14.02900028 13.91499996 12.5340004 12.5340004 14.61600018 13.01799965 14.92599964 12.12100029 9.84300041 11.22399998 8.83199978 10.92500019 11.3380003 12.42000008 12.71899986 14.02900028 13.61600018 12.83300018 12.71899986 12.83300018 15.11100006 13.50199986 13.01799965 14.02900028 15.70899963 13.61600018 14.51299953 14.99699974 15.52400017 14.81200027 15.11100006 16.30699921 16.6060009 16.6060009 17.0189991 16.00799942 15.40999985 16.6060009 17.20400047 17.80200005 20.79199982 18.39999962 18.51399994 20.37899971 22.57500076 22.88500023 22.38999939 23.36899948 25.75 24.26600075 28.25600052 28.2670002 26.04899979 25.26600075 26.16300011 24.06999969 26.34799957 26.35899925 25.75 25.75 25.875 26.04899979 25.76099968 28.25600052 27.07099915 26.47299957 28.55500031 28.2670002 31.24600029 31.54500008 30.95800018 30.46299934 26.76099968 26.46199989 28.55500031 25.45100021 26.94599915 25.46199989 25.86400032 23.65699959 22.87400055 21.38999939 20.49300003 18.87299919 19.29700089 19.47100067 17.97599983 18.27499962 18.09000015 19.2859993 18.38899994 19.17200089 18.5739994 19.77000046 21.26499939 20.66699982 19.17200089 22.15099907 23.76000023 26.34799957 29.73999977 30.8220005 29.44099998]
print(np.max(img_gray[:, 400]))
78.428
If you so desire, you can even modify the pixel values directly, again just as you would for a regular NumPy array.
Fair warning: doing this alters the image! You may want to copy the image structure first...
for i in range(img_gray.shape[0]):
for j in range(120, 130):
img_gray[i, j] = 255
plt.imshow(img_gray, cmap = "gray")
<matplotlib.image.AxesImage at 0x11e9140f0>
Another very useful way of obtaining information about an image is to view the histogram of pixel values.
You can do this regardless of whether it's a grayscale or RGB image, though in the latter case it's useful to plot the pixel values separated by channel.
First, let's re-import the image as grayscale and take a look at how the pixel values show up in a histogram:
img_gray = ndimg.imread("Lecture22/image1.png", flatten = True)
_ = plt.hist(img_gray.flatten(), bins = 25)
This tells us some very useful information--primarily, that most of the pixel values are centered around what seems like a pretty low number (20-30), so by and large the image is very dark (which we saw).
There do seem to be a few light spots on an island around 120-140, but that's it.
Let's take a look now at each channel individually.
fig = plt.figure(figsize = (16, 4))
plt.subplot(131)
_ = plt.hist(r.flatten(), bins = 25, range = (0, 1), color = 'r')
plt.subplot(132)
_ = plt.hist(g.flatten(), bins = 25, range = (0, 1), color = 'g')
plt.subplot(133)
_ = plt.hist(b.flatten(), bins = 25, range = (0, 1), color = 'b')
Recall what each channel represented:
There seems to be very little HSP27, while there is tons of actin and the quantity of DAPI falls somewhere in between.
...oh wait, did you see the scales for each one?
x = np.linspace(0, 1, 25)
plt.plot(x, np.histogram(r.flatten(), bins = 25)[0], color = 'r', label = 'Actin')
plt.plot(x, np.histogram(g.flatten(), bins = 25)[0], color = 'g', label = 'HSP27')
plt.plot(x, np.histogram(b.flatten(), bins = 25)[0], color = 'b', label = 'DAPI')
plt.legend()
<matplotlib.legend.Legend at 0x11f2b7c88>
So, yes:
While we're on the topic of histograms, there is a convenient way to try and "reshape" the pixel histograms so as to make the resulting image a bit sharper. This is called histogram equalization.
The idea is simple enough: re-map the pixel values in the image so that the corresponding histogram is perfectly flat.
Basically it tries to fill in the "valleys" and flatten the "peaks" of the pixel histograms we saw earlier--this has the effect of bringing out very dim signal and dampening oversaturated signal.
Let's see an example, using one of the image channels.
from PIL import Image, ImageOps
img_pil = Image.open("Lecture22/image1.png")
beq = ImageOps.equalize(img_pil.split()[2])
f = plt.figure(figsize = (12, 6))
f.add_subplot(1, 2, 1)
plt.imshow(b, cmap = 'gray')
f.add_subplot(1, 2, 2)
plt.imshow(np.array(beq), cmap = 'gray')
<matplotlib.image.AxesImage at 0x11f356438>
We can directly see why these two images look different (and, specifically, what histogram equalization did) by recomputing the channel histograms:
plt.plot(img_pil.split()[2].histogram(), 'b')
plt.plot(beq.histogram(), 'k')
[<matplotlib.lines.Line2D at 0x11f1b1a90>]
Autocontrast is another tool that modifies the pixel histograms to try and make the resulting images more viewable. In this case, the goal of autocontrast is to maximize (normalize) image contrast.
This function calculates a histogram of the input image, removes cutoff percent of the lightest and darkest pixels from the histogram, and remaps the image so that the darkest remaining pixel becomes black (0), and the lightest becomes white (255).
In essence, you choose some percentage cut-off (say: 50%, or 0.5), removes that fraction of pixels that are both darkest and lightest (assumes they're noise and throws them away), then remaps the remaining pixels.
Here's what it might look like:
bcon = ImageOps.autocontrast(img_pil.split()[2], 0.5)
f = plt.figure(figsize = (12, 6))
f.add_subplot(1, 2, 1)
plt.imshow(b, cmap = "gray")
f.add_subplot(1, 2, 2)
plt.imshow(np.array(bcon), cmap = "gray")
<matplotlib.image.AxesImage at 0x11f4f6b70>
In this case, we're trying to chop off pixel values at both ends of the histogram (lightest and darkest) and reshuffling the others around to make them more visible, hopefully improving contrast.
The effects on the underlying histograms look like:
plt.plot(img_pil.split()[2].histogram(), 'r')
plt.plot(bcon.histogram(), 'k')
[<matplotlib.lines.Line2D at 0x11f441978>]
It closely mimics the original histogram, but because some values at the tails were thrown away, all the other values were reshuffled--you end up with more pixels some of the middle values, which is (presumably) the signal you're interested in.
Thresholding is the process by which you define a pixel threshold--say, the value 100--and set every pixel below that value to 0, and every pixel above that value to 255.
In doing so, you binarize the image, as each pixel takes on only one of two possible values.
Remember boolean indexing?
(head on over to slide 40 of Lecture 8--January 31--if you're a little fuzzy on the details)
In short, you can create masks based on certain boolean conditions so you can modify certain parts of the array while holding the others constant.
Here's the example straight from the lecture:
x = np.random.standard_normal(size = (7, 4))
print(x)
[[ -6.49979242e-02 2.08316539e-01 -7.06011870e-01 -9.82043350e-01] [ -1.09983995e+00 -1.18645041e+00 -9.53307184e-01 6.89604810e-02] [ -2.64952299e+00 -5.97906503e-01 -4.66659443e-01 2.37222881e-01] [ -1.72373952e+00 1.03769970e+00 9.85069452e-01 -7.08066133e-01] [ 1.47547962e+00 1.34952992e+00 1.24486749e+00 1.81284437e+00] [ 6.33840445e-01 -9.44929143e-01 -1.70004374e-03 -2.09578017e+00] [ -6.91257130e-01 -6.03497503e-01 1.00038630e+00 8.05767208e-01]]
If we just want the positive numbers, we can define a mask using the condition you'd find in an if
statement:
mask = x < 0 # For every element of x, ask: is it < 0?
print(mask)
[[ True False True True] [ True True True False] [ True True True False] [ True False False True] [False False False False] [False True True True] [ True True False False]]
The mask is just a bunch of True
and False
values.
Now we can use the mask to modify the parts of the original array that correspond to True
in the mask:
x[mask] = 0.0
print(x)
[[ 0. 0.20831654 0. 0. ] [ 0. 0. 0. 0.06896048] [ 0. 0. 0. 0.23722288] [ 0. 1.0376997 0.98506945 0. ] [ 1.47547962 1.34952992 1.24486749 1.81284437] [ 0.63384045 0. 0. 0. ] [ 0. 0. 1.0003863 0.80576721]]
Back to images! Let's use a threshold on our blue channel:
b_thresh = np.array(bcon) > 120 # Every pixel greater than 120 is "True", otherwise it's "False"
f = plt.figure(figsize = (12, 6))
f.add_subplot(1, 2, 1)
plt.imshow(np.array(bcon), cmap = "gray")
f.add_subplot(1, 2, 2)
plt.imshow(b_thresh, cmap = "gray")
<matplotlib.image.AxesImage at 0x11c502048>
Any ideas how we might, say, count the number of cells?
There is an entire ecosystem of computer vision packages for Python.
Some are very general (a lot like scipy.ndimage
and PIL
) while some are very specific to certain classes of problems.
You could spend an entire career with just one or two of these packages, but very briefly I'll name a few of the most popular.
(We'll make use of some of them on Thursday)
scikit-image
¶If scipy.ndimage
or PIL
proves to be insufficient for your needs, this should be the first stop you take in looking for alternatives.
It has a wealth of general-purpose image processing routines built-in. It's actively developed and very easy to use, and integrates well with NumPy and SciPy.
It also comes with a bunch of basic tutorials and sample data to help you get your feet wet.
mahotas
¶This is another excellent general-purpose image processing library, though it has a slight preference for bio-imaging applications. After all, its author is a computational biologist!
Like scikit-image
, it's actively developed, easy to use, and integrates fully with the NumPy + SciPy scientific computing environment for Python.
This is probably your first stop if you're looking for some basic bioimaging tools; we'll make use of it Thursday, most likely.
OpenCV (for "Open Computer Vision") is the Grand Daddy of image processing packages.
You'll want to use this if computer vision is a significant part of your day-to-day career. It's not for the faint of heart, however: it's a C++ library with Python bindings, which means you have to install from source, and that can be painful depending on how (un)comfortable you are with compiling things from scratch.
(though if you use the Anaconda distribution of Python, and you connect it to the conda-forge channel, you can download pre-built OpenCV packages that WAY SIMPLIFY this process)
That said, OpenCV has everything:
The list goes on and on.
It's well-maintained, well-documented, and while it can be a little tricky to use, it has a huge community of developers and users ready to help.
Like scikit-image
, it also provides a ton of tutorials for typical use-cases, though OpenCV's definition of "typical" is a little different: they're actually pretty in-depth!
Thursday's lecture--
Stay tuned!