Transformations and interpolation

2D Transformations

Multiplying a vector by a matrix produces a new vector. When the coordinates of every pixel in an image are multiplied by the same matrix, each pixel is assigned a new coordinate and the image is transformed. Rotation, scaling, shifting and mirroring are two-dimensional transformations, and each one has its corresponding matrix.

Rotation (CW in image coordinates, where the y axis points down):

$ \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} ⋅ \begin{bmatrix} I_x \\ I_y \end{bmatrix} = \begin{bmatrix} I_x' \\ I_y' \end{bmatrix} $
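
As a minimal sketch of the rotation matrix above, the following NumPy snippet rotates a single coordinate. The helper name `rotation_matrix` and the sample point are assumptions made for illustration only. With θ = 90°, the point (1, 0) moves to approximately (0, 1); on a screen where y grows downwards, that is a clockwise turn.

```python
import numpy as np

def rotation_matrix(theta_deg):
    """Build the 2x2 rotation matrix for an angle given in degrees."""
    theta = np.radians(theta_deg)
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

# Hypothetical sample coordinate (I_x, I_y)
point = np.array([1.0, 0.0])

# Matrix-vector product gives the new coordinate (I_x', I_y')
rotated = rotation_matrix(90) @ point
print(np.round(rotated, 6))
```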

Scaling:

$ \begin{bmatrix} S_x & 0 \\ 0 & S_y \end{bmatrix} ⋅ \begin{bmatrix} I_x \\ I_y \end{bmatrix} = \begin{bmatrix} I_x' \\ I_y' \end{bmatrix} $

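Offset (written with a 3x3 matrix and an extra coordinate of 1, because a 2x2 matrix cannot express a shift):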

$ \begin{bmatrix} 1 & 0 & T_x \\ 0 & 1 & T_y \\ 0 & 0 & 1 \end{bmatrix} ⋅ \begin{bmatrix} I_x \\ I_y \\ 1 \end{bmatrix} = \begin{bmatrix} I_x' \\ I_y' \\ 1 \end{bmatrix} $
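
The offset matrix can be tried out in the same way. This is a small sketch, assuming a hypothetical helper `translation_matrix` and an arbitrary sample pixel; the third coordinate of the vector is the constant 1 from the equation above.

```python
import numpy as np

def translation_matrix(tx, ty):
    """Build the 3x3 matrix that shifts a point by (tx, ty)."""
    return np.array([[1.0, 0.0, tx],
                     [0.0, 1.0, ty],
                     [0.0, 0.0, 1.0]])

# Pixel at (10, 20), extended with a constant 1
pixel = np.array([10.0, 20.0, 1.0])

# Shift 5 pixels along x and -3 pixels along y: the result is x' = 15, y' = 17
moved = translation_matrix(5, -3) @ pixel
print(moved)
```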

Mirroring (across the Y axis):

$ \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix} ⋅ \begin{bmatrix} I_x \\ I_y \end{bmatrix} = \begin{bmatrix} I_x' \\ I_y' \end{bmatrix} $
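
Applied to pixel coordinates, the mirroring matrix alone produces negative x values, so in practice it is combined with an offset of `width - 1` to bring the pixels back into the image. The sketch below assumes a tiny 2x3 test array and writes both matrices in 3x3 form so they can be multiplied together; the variable names are illustrative only.

```python
import numpy as np

image = np.arange(6).reshape(2, 3)   # tiny test image: 2 rows, 3 columns
height, width = image.shape

# Mirroring matrix from above, extended to 3x3 form
mirror = np.array([[-1, 0, 0],
                   [ 0, 1, 0],
                   [ 0, 0, 1]])
# Offset that moves the mirrored pixels back to non-negative x coordinates
shift = np.array([[1, 0, width - 1],
                  [0, 1, 0],
                  [0, 0, 1]])
transform = shift @ mirror           # mirror first, then shift

mirrored = np.zeros_like(image)
for y in range(height):
    for x in range(width):
        new_x, new_y, _ = transform @ np.array([x, y, 1])
        mirrored[new_y, new_x] = image[y, x]

print(mirrored)
```

Mirroring maps every pixel to exactly one new integer position, so this forward mapping leaves no gaps.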

3D Transformations

Rotation around the X and Y axes can also be implemented; these are three-dimensional transformations. The result is still a two-dimensional image, because each pixel is projected onto a plane. Another 3D transformation is the perspective transformation, in which a new perspective is defined by 4 input points and 4 output points.

Perspective transformation:

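A 3x3 matrix must be found that maps the 4 selected points of the image to 4 selected new positions, thereby changing the perspective. Each product below is a vector with three components, and the new position of point i is obtained by dividing the first two components by the third: (u_i / w_i, v_i / w_i). Performing the multiplications for the 4 point pairs gives a system of 8 linear equations, which can then be solved by any known method. The matrix is only defined up to a common scale factor, so h33 can be fixed at 1, which leaves 8 unknowns.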

$ \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix} ⋅ \begin{bmatrix} x_1 \\ y_1 \\ 1 \end{bmatrix} = \begin{bmatrix} u_1 \\ v_1 \\ w_1 \end{bmatrix} $

$ \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix} ⋅ \begin{bmatrix} x_2 \\ y_2 \\ 1 \end{bmatrix} = \begin{bmatrix} u_2 \\ v_2 \\ w_2 \end{bmatrix} $

$ \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix} ⋅ \begin{bmatrix} x_3 \\ y_3 \\ 1 \end{bmatrix} = \begin{bmatrix} u_3 \\ v_3 \\ w_3 \end{bmatrix} $

$ \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix} ⋅ \begin{bmatrix} x_4 \\ y_4 \\ 1 \end{bmatrix} = \begin{bmatrix} u_4 \\ v_4 \\ w_4 \end{bmatrix} $
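
As a sketch of how the system of equations arises, the first matrix equation can be written out with $h_{33} = 1$. The symbols $X_1$ and $Y_1$ are introduced here only for illustration; they stand for the known target position of the first point, obtained by dividing by $w_1$.

$ X_1 = \frac{u_1}{w_1} = \frac{h_{11} x_1 + h_{12} y_1 + h_{13}}{h_{31} x_1 + h_{32} y_1 + 1}, \quad Y_1 = \frac{v_1}{w_1} = \frac{h_{21} x_1 + h_{22} y_1 + h_{23}}{h_{31} x_1 + h_{32} y_1 + 1} $

Multiplying both sides by the denominator and collecting the unknowns on the left gives two linear equations per point:

$ h_{11} x_1 + h_{12} y_1 + h_{13} - X_1 x_1 h_{31} - X_1 y_1 h_{32} = X_1 $

$ h_{21} x_1 + h_{22} y_1 + h_{23} - Y_1 x_1 h_{31} - Y_1 y_1 h_{32} = Y_1 $

The same two equations for points 2, 3 and 4 complete the 8 equations for the 8 unknowns.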

Interpolation

The new position of each pixel of the original image can be determined using the given matrix. The problem, however, is that the new image will be full of holes, because the transformed coordinates do not land on every pixel of the new image. The solution is to work backwards: for each pixel in the new image, the corresponding position in the original image is found. Pixel coordinates are integers, while the calculated positions usually are not. In such cases, the fastest method is to copy the value of the nearest pixel; this is nearest-neighbour interpolation. However, edges in an image transformed this way look jagged. Better quality can be obtained by using a weighted average of the 4 neighbouring pixels (bilinear), or by a method that fits splines to the 16 neighbouring pixels (bicubic).
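
The backward mapping and the two simplest interpolation methods can be sketched as follows. This is an illustration under stated assumptions, not how PIL implements resampling: the image is a 2D grayscale NumPy array, the transformation is a plain scaling, positions outside the image are clamped to the border, and the function names `sample_nearest`, `sample_bilinear` and `scale_backward` are made up for this example. The loops favour readability over speed.

```python
import numpy as np

def sample_nearest(img, x, y):
    """Copy the value of the pixel closest to the fractional position (x, y)."""
    h, w = img.shape
    col = min(max(int(round(x)), 0), w - 1)
    row = min(max(int(round(y)), 0), h - 1)
    return float(img[row, col])

def sample_bilinear(img, x, y):
    """Weighted average of the 4 pixels surrounding the position (x, y)."""
    h, w = img.shape
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0          # fractional distances act as weights
    top = (1 - fx) * img[y0, x0] + fx * img[y0, x1]
    bottom = (1 - fx) * img[y1, x0] + fx * img[y1, x1]
    return float((1 - fy) * top + fy * bottom)

def scale_backward(img, factor, sampler):
    """Scale an image by visiting every output pixel and looking up its source."""
    h, w = img.shape
    out = np.zeros((int(h * factor), int(w * factor)), dtype=float)
    for row in range(out.shape[0]):
        for col in range(out.shape[1]):
            # Inverse of the scaling matrix: divide by the scale factor
            out[row, col] = sampler(img, col / factor, row / factor)
    return out

small = np.array([[0.0, 10.0],
                  [20.0, 30.0]])
print(scale_backward(small, 2, sample_nearest))
print(scale_backward(small, 2, sample_bilinear))
```

Because every output pixel is visited exactly once, no pixel is left unfilled; the only difference between the two results is how the fractional source positions are turned into values.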

The GPU's role

The Graphics Processing Unit (GPU) takes over from the CPU the tasks associated with creating and displaying 2D and 3D graphics. It is a processor, usually with thousands of cores, that can perform a huge number of simple calculations at once. GPUs are optimized for matrix and vector multiplication, which makes them well suited to graphics operations. Machine learning, simulations and cryptography are other uses of GPUs that rely on the same kind of highly parallel arithmetic.

Help

Source code: Rotation

from PIL import Image
import matplotlib.pyplot as plt

# Open the image using PIL
image = Image.open("path-to-resources/boglarka.jpg")

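# Resampling filter options include NEAREST, BILINEAR and BICUBIC
# (the Image.Resampling enum requires Pillow 9.1 or newer)
# Use your editor's 'Go to Definition' on rotate() for more details
# A positive angle rotates the image counter-clockwise
rotated_image = image.rotate(angle=20, resample=Image.Resampling.NEAREST)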

# Display the image
plt.imshow(rotated_image)
plt.axis('off')
plt.show()

Source code: Perspective

import numpy as np
from PIL import Image
import matplotlib.pyplot as plt

# Open the image using PIL
image = Image.open("path-to-resources/notebook.JPG")

# Define input and output points (top-left, top-right, bottom-left, bottom-right)
input_points = np.float32([[452, 239], [1072, 304], [260, 583], [1052, 696]])
output_points = np.float32([[50, 50], [1438, 50], [50, 942], [1438, 942]])

def compute_perspective_matrix(inp_pts, out_pts):
    A = []
    b = []
    # Build two equations per point pair.
    # PIL reads each output pixel from a position in the input image, so (x, y)
    # is the output point and (u, v) is the input point it is taken from.
    for i in range(4):
        x, y = out_pts[i][0], out_pts[i][1]
        u, v = inp_pts[i][0], inp_pts[i][1]
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(u)
        b.append(v)
    # Solve the 8x8 system of equations (converted to float64 for accuracy)
    h = np.linalg.solve(np.array(A, dtype=np.float64), np.array(b, dtype=np.float64))
    # Add h33 = 1 as the last element and reshape into a 3x3 matrix
    # (The reshape is unnecessary in this exact application, because the matrix is flattened again below)
    H = np.append(h, 1).reshape(3, 3)
    return H

perspective_matrix = compute_perspective_matrix(input_points, output_points)

# PIL expects a flattened 8-element matrix
# Flatten the matrix and extract the first 8 elements
matrix_for_pil = perspective_matrix.flatten()[:8]

# Apply the transform to the original image
transformed_image = image.transform(
    image.size, 
    Image.Transform.PERSPECTIVE, 
    matrix_for_pil, 
    resample = Image.Resampling.BICUBIC)

# Display images
fig, ax = plt.subplots(1, 2, figsize=(12, 6))
ax[0].imshow(image)
ax[0].set_title("Original Image")
ax[0].axis("off")
ax[1].imshow(transformed_image)
ax[1].set_title("Transformed Image")
ax[1].axis("off")
plt.show()
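
A solved matrix can be checked without any image by applying it to the four points it was built from. The sketch below is self-contained and uses the same point values as the listing above; the helper names `solve_homography` and `apply_homography` are assumptions made for this example. It follows the direction Pillow uses for `Image.Transform.PERSPECTIVE`, where the coefficients take an output position and return the input position to read from.

```python
import numpy as np

def solve_homography(src_pts, dst_pts):
    """Solve the 8x8 linear system for the 3x3 matrix mapping src_pts onto dst_pts."""
    A, b = [], []
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)   # h33 is fixed at 1

def apply_homography(H, x, y):
    """Multiply by H, then divide by the third component."""
    u, v, w = H @ np.array([x, y, 1.0])
    return u / w, v / w

# Corners of the output image and the input positions they are read from
output_corners = [(50, 50), (1438, 50), (50, 942), (1438, 942)]
input_corners = [(452, 239), (1072, 304), (260, 583), (1052, 696)]

H = solve_homography(output_corners, input_corners)
for (x, y), expected in zip(output_corners, input_corners):
    mapped = apply_homography(H, x, y)
    print((x, y), "->", np.round(mapped, 3), "expected", expected)
```

If each mapped point matches its expected input corner, the matrix was solved in the direction the transform call needs.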