For example, if the target at index 2 is 1, then the one-hot target will have a 1 at row 2, column 1. In general, if the target at index n is k, then the one-hot target will have a 1 at row n, column k. We avoid one-hot targets when we can, since they take up more space. For most of the examples in this book, we can make use of “sparse_categorical_crossentropy”, but “categorical_crossentropy” must be used in special cases, as we shall see. A small sketch of both options follows below.

Next, what is the “optimizer”? You can see we’ve chosen an optimizer called “adam”. This can get quite mathematical, so if you’re not into the math, just remember that “adam” is a typical default used by modern deep learning researchers today. We will use it for every example in this book. I’ve linked to other types of optimizers in the code, or you can just click here: https://keras.io/optimizers/. For the more mathematically inclined, I mentioned earlier that we use gradient descent to train the model parameters....
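To make the one-hot layout above concrete, here is a minimal sketch using tf.keras (the target values and class count are made up for illustration). It shows how integer targets map to a one-hot matrix, and why “sparse_categorical_crossentropy” lets us skip building that larger matrix:

```python
import numpy as np
from tensorflow.keras.utils import to_categorical

# A hypothetical batch of integer class targets for a 5-class problem.
targets = np.array([3, 0, 1])  # the target at index 2 is 1

# One-hot encoding: row n gets a 1 in column k when the target at index n is k.
one_hot = to_categorical(targets, num_classes=5)
print(one_hot)
# [[0. 0. 0. 1. 0.]
#  [1. 0. 0. 0. 0.]
#  [0. 1. 0. 0. 0.]]   <- row 2 has its 1 in column 1

# "sparse_categorical_crossentropy" trains directly on `targets`,
# while "categorical_crossentropy" would require `one_hot` instead.
```

Notice that the one-hot matrix is batch_size x num_classes, which is why it costs more space than the plain integer vector.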
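And here is a sketch of where the optimizer choice actually appears: passing the string “adam” to compile(). The model architecture below is a made-up placeholder, not one from the book; only the compile() call is the point:

```python
from tensorflow.keras import layers, models

# A toy model purely for illustration (sizes are arbitrary assumptions).
model = models.Sequential([
    layers.Dense(64, activation="relu", input_shape=(20,)),
    layers.Dense(5, activation="softmax"),
])

# "adam" is the default we use throughout; trying another optimizer from
# https://keras.io/optimizers/ is a one-string change here.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

Because Keras accepts the optimizer by name, swapping “adam” for, say, “sgd” or “rmsprop” requires no other changes to the model.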