Neural networks and deep learning
So far I haven't explained how I chose the values of the hyperparameters: the learning rate η, the regularization parameter λ, and so on. I've just been supplying values that work pretty well. In practice, when you use a neural network to attack a problem, it can be difficult to find good hyperparameters. Imagine, for example, that we've just been introduced to the MNIST problem and have begun working on it knowing nothing about suitable hyperparameter values.
Suppose we get lucky, and in our first experiments we choose many of the hyperparameters the way we did earlier in this chapter: 30 hidden neurons, a mini-batch size of 10, training for 30 epochs, and the cross-entropy cost. But suppose we choose the learning rate η = 10.0 and the regularization parameter λ = 1000.0. Here's what I saw on one such run:
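The run itself needs the book's network2.py, but the underlying failure mode is easy to see in isolation. Here is a hypothetical toy sketch (not the book's actual run) of why a learning rate like η = 10.0 is disastrous: on even the simplest quadratic loss, an oversized step overshoots the minimum and the iterates blow up.

```python
# Toy illustration (hypothetical, not network2.py): gradient descent on the
# one-dimensional quadratic loss L(w) = w^2, whose minimum is at w = 0.
def descend(eta, steps=10, w0=1.0):
    w = w0
    for _ in range(steps):
        w -= eta * 2 * w   # the gradient of w^2 is 2w
    return w

print(abs(descend(0.1)))   # modest eta: |w| shrinks toward the minimum
print(abs(descend(10.0)))  # eta = 10.0: each step multiplies |w| by 19
```

With η = 0.1 each step multiplies w by 0.8, so the iterate converges; with η = 10.0 each step multiplies it by −19, so the loss diverges, and something analogous happens to a network trained with a far-too-large learning rate.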
“Well, that's easy to fix,” you might say, “just reduce hyperparameters like the learning rate and the regularization parameter.” Unfortunately, you don't know a priori that those are the hyperparameters you need to adjust. Maybe the real problem is that a network with 30 hidden neurons will never work well, no matter how the other hyperparameters are chosen? Maybe we really need at least 100 hidden neurons? Or 300? Or several hidden layers? Or a different approach to encoding the output? Maybe our network is learning, but we need to train it for many more epochs? Maybe the mini-batches are too small? Maybe we'd do better switching back to the quadratic cost function? Maybe we should try a different approach to initializing the weights? And so on and so on. It's easy to get lost in hyperparameter space. And it can be especially frustrating if your network is very large, or uses huge amounts of training data, so that you can train it for hours, days, or weeks without getting any result. In that situation your confidence starts to erode. Maybe neural networks were the wrong approach to your problem? Maybe you should quit and take up beekeeping?
In this section I explain some heuristics that can be used to set the hyperparameters in a neural network. The goal is to help you develop a workflow that lets you do a good job of tuning hyperparameters. Of course, I can't cover everything about hyperparameter optimization. It's a huge subject, and it's not a problem that is ever completely solved, nor is there general agreement on the right strategies to use. There's always one more trick you can try to squeeze a bit more performance out of your network. But the heuristics in this section should give you a good starting point.
When using a neural network to attack a new problem, the first challenge is to get any non-trivial result from the network at all, that is, to do better than chance. This can be surprisingly difficult, especially when confronting a new class of problem. Let's look at some strategies you can use when facing this kind of difficulty.
Suppose, for example, that you're attacking the MNIST problem for the first time. You start out with great enthusiasm, but the complete failure of your first network discourages you a little, as in the example above. The way forward is to strip the problem down. Get rid of all the training and validation images except the images of zeros and ones.
Then try to train the network to distinguish 0s from 1s. Not only is this an inherently simpler problem than distinguishing all ten digits, it also reduces the amount of training data by 80 percent, speeding training up by a factor of five. That enables much more rapid experimentation, and gives you the chance to quickly gain insight into how to build a good network.
You can speed up experimentation still further by stripping the network down to the smallest one likely to do meaningful learning. If you believe a [784, 10] network can probably classify MNIST digits better than chance, then begin your experimentation with it. It will be much faster than training a [784, 30, 10] network, and you can build back up to the larger one later.
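To see how small [784, 10] really is: with no hidden layer the entire network is a single weight matrix and bias vector. A sketch, using the sigmoid activation and column-vector convention the book uses elsewhere (the variable names here are my own):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((10, 784))   # the one and only layer: 10 outputs, 784 inputs
b = rng.standard_normal((10, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(x):
    # The whole [784, 10] network is a single matrix multiply plus bias.
    return sigmoid(W @ x + b)

x = rng.standard_normal((784, 1))    # a stand-in for one flattened MNIST image
print(feedforward(x).shape)          # (10, 1): one activation per digit class
```

A forward or backward pass through this network costs roughly 784 × 10 multiplications, versus 784 × 30 + 30 × 10 for [784, 30, 10], which is most of why experiments run so much faster.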
You can get another speed-up in experimentation by increasing the frequency of monitoring. In the network2.py program we monitor performance at the end of each training epoch. At 50,000 images per epoch, that means waiting a fair while, about ten seconds per epoch on my laptop when training a [784, 30, 10] network, before getting feedback on how well the network is learning. Of course, ten seconds isn't very long, but if you want to trial dozens of hyperparameter choices it becomes annoying, and if you want to trial hundreds or thousands of choices it starts to get debilitating. We can get feedback much more quickly by monitoring the validation accuracy more often, say, after every 1,000 training images. Furthermore, instead of using the full 10,000-image validation set to monitor performance, we can get a much faster estimate using just 100 validation images.
All that matters is that the network sees enough images to do real learning, and enough to get a reasonably good rough estimate of performance. Of course, our program network2.py doesn't currently do this kind of monitoring. But as a kludge to achieve a similar effect for the purposes of illustration, we'll strip down our training data to just the first 1,000 MNIST images. Let's try it and see what happens (to keep the code simple, I haven't implemented the idea of restricting to just the 0 and 1 images; that, too, can be done with a little more effort).
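Since network2.py doesn't provide this monitoring, here is a hedged sketch of the idea: a generic training loop that evaluates on a small validation sample every 1,000 examples instead of once per epoch. The function names, and the trivial stand-in "model", are hypothetical; only the monitoring schedule is the point.

```python
# Sketch of frequent monitoring: check accuracy on a small validation sample
# every `monitor_every` training examples, rather than once per full epoch.
def train_with_frequent_monitoring(training_data, validation_data,
                                   update, accuracy,
                                   monitor_every=1000, val_sample=100):
    history = []
    for i, example in enumerate(training_data, start=1):
        update(example)                       # one SGD-style training step
        if i % monitor_every == 0:
            sample = validation_data[:val_sample]
            history.append(accuracy(sample))  # cheap, frequent feedback
    return history

# Toy usage: 3,000 "examples", a no-op update, and a dummy accuracy function.
hist = train_with_frequent_monitoring(
    list(range(3000)), list(range(500)),
    update=lambda ex: None, accuracy=lambda s: len(s))
print(len(hist))  # 3 monitoring points: after 1000, 2000, and 3000 examples
```

With the 1,000-image truncated training set described above, this schedule would give you a fresh accuracy reading after every pass through the data, turning hours of waiting into seconds of feedback per hyperparameter trial.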