How to classify handwritten digits in Python




Without the help of any machine learning libraries

If you have read the previous articles in this series (links at bottom of page), you should have built a DNN class completely from base principles. Let’s see if we can use this class to create a neural network that’s able to read handwritten digits.

This is a classic example of a task that is easy for humans but incredibly difficult to program explicitly. Think about what happens when a human classifies a picture: the brain receives input in the form of photons and classifies the image based on that input.

Based on this, we can say that there must be some relationship between the photons hitting our eyes and the content of the picture. Obviously, right? Of course, what the picture looks like has something to do with what the picture is of. But modeling this mathematically requires the following proposition:

There exists some mathematical function that can tell us what is in the picture, based on the input (photons) we receive from the picture.

We will find this incredibly complicated function easily, using a neural network.

Getting and cleaning our data

In the spirit of advancing AI, the MNIST (Modified National Institute of Standards and Technology) database brings together tens of thousands of pictures of handwritten digits, each with a label. This data can be downloaded here. Below is an example of one of these digits.

This picture is labeled as a 3. *Image by author*

Now, in the case of humans, we receive input as photons entering our eyes. Similarly (sort of), to a computer this picture is just a matrix of pixel values. These pixel values range from 0 to 255 (since the image is grayscale), as shown below:

*Image by author*

These pictures are 28×28, for a total of 784 pixel values. Our goal is to map these 784 pixel values to the number that is written. So, let’s get started by looking at the data. You should have downloaded a zip folder from the Kaggle link I provided above. After you unzip the folder you will see a mnist_test and a mnist_train CSV file. For now we will only focus on the mnist_train CSV. Below I use pandas to open the file and show its contents.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
data = pd.read_csv('mnist_train.csv')
*Image by author*

As you can see, the first column of this data contains the labels (the number that the image represents), and the rest are the pixel values for the image (totaling 785 columns). Each row is an individual image. Using pandas’ iloc function, we can separate the inputs from the labels. We also normalize the inputs so they lie between 0 and 1.

labels = np.array(data.iloc[:,0])
x_train = np.array(data.iloc[:,1:])/255
# We divide by 255 so that all inputs are between 0 and 1

Our next step may seem odd, but it is essential. Right now each label is just one number, i.e. one-dimensional. This means the neural network would have to output the correct number between 0 and 9 directly. We can make the classification task easier for our neural network by increasing the dimension of the output to 10, one for each digit. For example, we change the label '0' to [1,0,0,0,0,0,0,0,0,0], '1' to [0,1,0,0,0,0,0,0,0,0], and so on. This is called 'one-hot encoding', and I do this for our labels below.

encoded_labels = []
for i in range(len(labels)):
    naked = [0,0,0,0,0,0,0,0,0,0]
    naked[labels[i]] = 1
    encoded_labels.append(naked)
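
As an aside, the same one-hot encoding can be written in a single vectorized line using NumPy’s identity matrix. This is just an equivalent alternative to the loop above:

# Row i of the 10x10 identity matrix is exactly the one-hot vector for digit i
encoded_labels = np.eye(10)[labels]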

Now we can use matplotlib to check out what these images actually look like.

# Take a look at what the images look like
random_index = np.random.randint(0,40000)
img = x_train[random_index].reshape(28,28)
plt.imshow(img, cmap = "gray")

As you can see, our neural network really has its work cut out for it with some of these pictures…

*Image by author*

Creating and training our neural network

We will be using the DNN class we created in the article prior to this one (found here). Each image has 784 inputs and 10 outputs, and I will choose one hidden layer with 1250 neurons, so our layers are [784, 1250, 10]. Inside a for loop, we simply generate a random index, then run the train function on the data and label that correspond to this index. I also print some useful information every 1000 steps; see below.
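
The DNN class itself was built from scratch in the previous article, so it isn’t reproduced here. If you don’t have it handy, here is a minimal stand-in sketch with the same train/predict interface (a single sigmoid network trained with mean-squared-error backpropagation and an assumed learning rate of 0.01); the actual class from the earlier article may differ in its internals.

import numpy as np

class DNN:
    def __init__(self, layers, lr=0.01):
        # layers, e.g. [784, 1250, 10]; small random weights, zero biases
        self.lr = lr
        self.weights = [np.random.randn(layers[i + 1], layers[i]) * 0.01
                        for i in range(len(layers) - 1)]
        self.biases = [np.zeros(layers[i + 1]) for i in range(len(layers) - 1)]

    def _sigmoid(self, z):
        return 1.0 / (1.0 + np.exp(-z))

    def predict(self, x):
        # Forward pass through every layer
        a = np.asarray(x, dtype=float)
        for W, b in zip(self.weights, self.biases):
            a = self._sigmoid(W @ a + b)
        return a

    def train(self, x, y):
        # Forward pass, storing activations for backpropagation
        activations = [np.asarray(x, dtype=float)]
        for W, b in zip(self.weights, self.biases):
            activations.append(self._sigmoid(W @ activations[-1] + b))
        y = np.asarray(y, dtype=float)
        # Mean squared error and the output-layer delta (sigmoid derivative)
        error = np.mean((activations[-1] - y) ** 2)
        delta = (activations[-1] - y) * activations[-1] * (1 - activations[-1])
        # Backpropagate and apply gradient descent layer by layer
        for i in reversed(range(len(self.weights))):
            prev_delta = (self.weights[i].T @ delta) * activations[i] * (1 - activations[i])
            self.weights[i] -= self.lr * np.outer(delta, activations[i])
            self.biases[i] -= self.lr * delta
            delta = prev_delta
        return error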

from collections import deque

model = DNN([784,1250,10])
# A deque of maxlen 1000 keeps only the last 1000 errors (the oldest item is
# dropped whenever a new one is appended at full length)
error = deque(maxlen = 1000)
for n in range(30000):
    index = np.random.randint(0,59998)
    error.append(model.train(x_train[index], encoded_labels[index]))
    if n % 1000 == 0:
        print("\nStep: ", n)
        print("Average Error: ", sum(error)/1000)
        plt.imshow(x_train[index].reshape(28,28), cmap = "gray")
        plt.show()
        print("Prediction: ", np.argmax(model.predict(x_train[index])))

Results and testing

After just 5 minutes of training on my computer, the NN is correctly identifying most of the digits. After 10 minutes the neural network seems to be fully trained, reaching an average error of around 0.2. It is important not to over-train the model, as this causes the NN to simply memorize the images, which would make it really bad at predicting things it has never seen.

Let’s use that mnist_test.csv file we downloaded to see how well our neural network performs on data it has never seen before. Below I call the predict function on each picture in the mnist_test file, then check if the prediction was correct. I then calculate the percentage of images the NN predicted correctly.

test_data = pd.read_csv('mnist_test.csv')
test_labels = np.array(test_data.iloc[:,0])
x_test = np.array(test_data.iloc[:,1:])/255

correct = 0
for i in range(len(test_data)):
    prediction = np.argmax(model.predict(x_test[i]))
    if prediction == test_labels[i]:
        correct += 1
percent_correct = correct/len(test_data) * 100
print(percent_correct, '%')
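
If you want a finer-grained view than a single overall percentage, a quick optional diagnostic is per-digit accuracy. This sketch simply reuses the same predict loop:

# Per-digit accuracy: how often each digit 0-9 is classified correctly
per_digit_correct = np.zeros(10)
per_digit_total = np.zeros(10)
for i in range(len(test_data)):
    prediction = np.argmax(model.predict(x_test[i]))
    per_digit_total[test_labels[i]] += 1
    if prediction == test_labels[i]:
        per_digit_correct[test_labels[i]] += 1
print(per_digit_correct / per_digit_total)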


I end up getting 93.5% accuracy. Feel free to try training longer, increasing the number of layers and/or neurons (one possible configuration change is sketched below), changing the activation function or loss function, or changing the learning rate to see if this can be improved. Although 93.5% is very exciting, we can actually do much better. We can make the neural network able to identify distinctive features within the picture to help it identify the image. This is the idea behind convolutional neural networks, which I will explain in detail in a future article.
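
As one example of such a tweak, you could pass a deeper layer list to the same DNN constructor. This is a hypothetical configuration I have not benchmarked, so the 93.5% figure above does not apply to it:

# Hypothetical deeper architecture: two hidden layers instead of one
model = DNN([784, 500, 300, 10])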
