We can use TensorFlow directly from R (see Appendix B for an introduction to TensorFlow), and we could use this knowledge to implement a neural network in TensorFlow directly in R. However, this can be quite cumbersome. For simple problems, it is usually faster to use a higher-level API that helps us implement the machine learning models in TensorFlow. The most common of these is Keras.
Keras is a powerful framework for building and training neural networks with just a few lines of code. As of the end of 2018, Keras and TensorFlow are fully interoperable, allowing us to take advantage of the best of both.
The goal of this lesson is to familiarize you with Keras. If you have TensorFlow installed, you can find Keras inside TensorFlow: tf.keras. However, the RStudio team has built an R package on top of tf.keras that is more convenient to use. To load the Keras package, type
library(keras3)
Attaching package: 'keras3'
The following objects are masked from 'package:tensorflow':
set_random_seed, shape
# or library(torch)
We build a small classifier to predict the three species of the iris data set. Load the necessary packages and data sets:
library(keras3)
library(tensorflow)
library(torch)
Attaching package: 'torch'
The following object is masked from 'package:keras3':
as_iterator
data(iris)
head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
For neural networks, it is beneficial to scale the predictors (scaling = centering and standardization, see ?scale). We also split our data into predictors (X) and response (Y = the three species).
X = scale(iris[,1:4])
Y = iris[,5]
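If you want to convince yourself what scale() does, here is a small optional check that reproduces the first scaled column by hand:
# Centering and standardization by hand, compared to scale():
manual = (iris$Sepal.Length - mean(iris$Sepal.Length)) / sd(iris$Sepal.Length)
all.equal(as.numeric(X[, 1]), manual)  # Should be TRUE.
round(colMeans(X), 10)                 # Column means are (numerically) zero.
apply(X, 2, sd)                        # Column standard deviations are one.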
Additionally, Keras/TensorFlow cannot handle factors, so we have to create contrasts (one-hot encoding). To do so, we have to specify the number of categories. This can be tricky for beginners because, unlike R, languages such as Python and C++ start counting at zero. Thus, when we specify 3 as the number of classes for our three species, the class labels are 0, 1 and 2 (not 1, 2, 3). Keep this in mind.
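To see what this index shift means in practice, here is a small optional sketch for the three species:
# R factors are 1-based, Keras expects 0-based class labels:
levels(iris$Species)                  # "setosa" "versicolor" "virginica"
head(as.integer(iris$Species))        # 1 1 1 ... (R, 1-based)
head(as.integer(iris$Species) - 1L)   # 0 0 0 ... (what to_categorical() expects)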
Y = keras3::to_categorical(as.integer(Y) - 1L, 3)
head(Y) # 3 columns, one for each level of the response.
[,1] [,2] [,3]
[1,] 1 0 0
[2,] 1 0 0
[3,] 1 0 0
[4,] 1 0 0
[5,] 1 0 0
[6,] 1 0 0
Having prepared the data, we now start with the typical Keras workflow.
1. Initialize a sequential model in Keras:
model = keras_model_sequential(shape(4L))
Torch users can skip this step.
A sequential Keras model is a higher-level model type within Keras: a linear stack of layers with exactly one input and one output.
2. Add hidden layers to the model (we will learn more about hidden layers during the next days).
When specifying the hidden layers, we also have to specify the shape and a so-called activation function. You can think of the activation function as the decision about what is forwarded to the next neuron (we will learn more about it later). If you want to explore this topic in even more depth, consider watching the videos presented in section @ref(basicMath).
The shape of the input is the number of predictors (here 4) and the shape of the output is the number of classes (here 3).
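As an optional illustration (not needed for the workflow), the two activation functions used below can be written in plain R:
# ReLU: pass positive values through, set negative values to zero.
relu = function(x) pmax(x, 0)
# Softmax: turn a vector of scores into probabilities that sum to one.
softmax = function(x) exp(x) / sum(exp(x))
relu(c(-2, 0.5, 3))        # 0.0 0.5 3.0
softmax(c(1, 2, 3))        # ~0.09 0.24 0.67
sum(softmax(c(1, 2, 3)))   # 1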
model |>
  layer_dense(units = 20L, activation = "relu") |>
  layer_dense(units = 20L) |>
  layer_dense(units = 20L) |>
  layer_dense(units = 3L, activation = "softmax")
The Torch syntax is very similar: we pass a sequence of layers to the nn_sequential() function. Here, we have to specify the softmax activation function as an extra layer:
model_torch = nn_sequential(
  nn_linear(4L, 20L),
  nn_linear(20L, 20L),
  nn_linear(20L, 20L),
  nn_linear(20L, 3L),
  nn_softmax(2)
)
3. Compile the model with a loss function (here: cross entropy) and an optimizer (here: Adamax).
We will learn about other options later, so for now, do not worry about the “learning_rate” (“lr” in Torch or earlier in TensorFlow) argument, cross entropy or the optimizer.
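If you are curious, categorical cross entropy for a single observation is just the negative log of the predicted probability of the true class; a minimal sketch:
# One observation of class 2 (one-hot encoded) and some predicted probabilities:
y_true = c(0, 1, 0)
y_pred = c(0.1, 0.7, 0.2)
-sum(y_true * log(y_pred))  # Cross entropy = -log(0.7), about 0.357.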
model |>
  compile(loss = keras3::loss_categorical_crossentropy,
          optimizer = keras3::optimizer_adamax(learning_rate = 0.001))
summary(model)
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type) ┃ Output Shape ┃ Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ dense (Dense) │ (None, 20) │ 100 │
├───────────────────────────────────┼──────────────────────────┼───────────────┤
│ dense_1 (Dense) │ (None, 20) │ 420 │
├───────────────────────────────────┼──────────────────────────┼───────────────┤
│ dense_2 (Dense) │ (None, 20) │ 420 │
├───────────────────────────────────┼──────────────────────────┼───────────────┤
│ dense_3 (Dense) │ (None, 3) │ 63 │
└───────────────────────────────────┴──────────────────────────┴───────────────┘
Total params: 1,003 (3.92 KB)
Trainable params: 1,003 (3.92 KB)
Non-trainable params: 0 (0.00 B)
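The parameter counts in the summary can be checked by hand: each dense layer has (number of inputs × number of units) weights plus one bias per unit. A quick sketch:
# Dense layer parameters = inputs * units + units (biases).
4*20 + 20    # 100 (input -> first hidden layer)
20*20 + 20   # 420 (hidden -> hidden)
20*3 + 3     # 63  (hidden -> output)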
plot(model)
Specify the optimizer and the parameters that will be trained (in our case, the parameters of the network):
optimizer_torch = optim_adam(params = model_torch$parameters, lr = 0.001)
4. Fit the model for 30 iterations (epochs):
library(tensorflow)
library(keras3)
model_history =
  model |>
    fit(x = X, y = apply(Y, 2, as.integer), epochs = 30L,
        batch_size = 20L, shuffle = TRUE)
Epoch 1/30
8/8 - 0s - 50ms/step - loss: 1.0022
Epoch 2/30
8/8 - 0s - 2ms/step - loss: 0.8961
Epoch 3/30
8/8 - 0s - 2ms/step - loss: 0.8120
Epoch 4/30
8/8 - 0s - 2ms/step - loss: 0.7409
Epoch 5/30
8/8 - 0s - 2ms/step - loss: 0.6819
Epoch 6/30
8/8 - 0s - 2ms/step - loss: 0.6297
Epoch 7/30
8/8 - 0s - 2ms/step - loss: 0.5861
Epoch 8/30
8/8 - 0s - 2ms/step - loss: 0.5472
Epoch 9/30
8/8 - 0s - 2ms/step - loss: 0.5175
Epoch 10/30
8/8 - 0s - 2ms/step - loss: 0.4901
Epoch 11/30
8/8 - 0s - 2ms/step - loss: 0.4663
Epoch 12/30
8/8 - 0s - 2ms/step - loss: 0.4477
Epoch 13/30
8/8 - 0s - 2ms/step - loss: 0.4307
Epoch 14/30
8/8 - 0s - 2ms/step - loss: 0.4158
Epoch 15/30
8/8 - 0s - 2ms/step - loss: 0.4030
Epoch 16/30
8/8 - 0s - 2ms/step - loss: 0.3921
Epoch 17/30
8/8 - 0s - 2ms/step - loss: 0.3816
Epoch 18/30
8/8 - 0s - 2ms/step - loss: 0.3735
Epoch 19/30
8/8 - 0s - 2ms/step - loss: 0.3641
Epoch 20/30
8/8 - 0s - 2ms/step - loss: 0.3574
Epoch 21/30
8/8 - 0s - 2ms/step - loss: 0.3498
Epoch 22/30
8/8 - 0s - 2ms/step - loss: 0.3426
Epoch 23/30
8/8 - 0s - 2ms/step - loss: 0.3367
Epoch 24/30
8/8 - 0s - 2ms/step - loss: 0.3305
Epoch 25/30
8/8 - 0s - 2ms/step - loss: 0.3248
Epoch 26/30
8/8 - 0s - 2ms/step - loss: 0.3197
Epoch 27/30
8/8 - 0s - 1ms/step - loss: 0.3144
Epoch 28/30
8/8 - 0s - 1ms/step - loss: 0.3092
Epoch 29/30
8/8 - 0s - 1ms/step - loss: 0.3038
Epoch 30/30
8/8 - 0s - 1ms/step - loss: 0.2988
In Torch, we jump directly to the training; however, here we have to write the training loop ourselves:
library(torch)
torch_manual_seed(321L)
set.seed(123)
# Calculate number of training steps.
epochs = 30
batch_size = 20
steps = round(nrow(X)/batch_size * epochs)

X_torch = torch_tensor(X)
Y_torch = torch_tensor(apply(Y, 1, which.max))

# Set model into training status.
model_torch$train()

log_losses = NULL

# Training loop.
for(i in 1:steps){
  # Get batch.
  indices = sample.int(nrow(X), batch_size)

  # Reset backpropagation.
  optimizer_torch$zero_grad()

  # Predict and calculate loss.
  pred = model_torch(X_torch[indices, ])
  loss = nnf_cross_entropy(pred, Y_torch[indices])

  # Backpropagation and weight update.
  loss$backward()
  optimizer_torch$step()

  log_losses[i] = as.numeric(loss)
}
5. Plot training history:
plot(model_history)
plot(log_losses, xlab = "steps", ylab = "loss", las = 1)
6. Create predictions:
predictions = predict(model, X) # Probabilities for each class.
5/5 - 0s - 8ms/step
Get probabilities:
head(predictions) # Quasi-probabilities for each species.
[,1] [,2] [,3]
[1,] 0.9968982 0.002193661 0.0009081917
[2,] 0.9867232 0.012216523 0.0010602337
[3,] 0.9942806 0.005097551 0.0006219447
[4,] 0.9896508 0.009552005 0.0007972129
[5,] 0.9976753 0.001555835 0.0007687954
[6,] 0.9956868 0.001622168 0.0026909732
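Because the output layer uses a softmax activation, each row of the prediction matrix sums to one; a quick optional check:
head(rowSums(predictions))  # Each row sums to (approximately) 1.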
For each plant, we want to know for which species we got the highest probability:
preds = apply(predictions, 1, which.max)
print(preds)
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[38] 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 3 2 3 2 3 2 3 2 2 3 2 3 2 3 2 2 2 2 3 2 3 2
[75] 2 3 3 3 3 2 2 2 2 3 2 3 3 2 2 2 2 3 2 2 2 2 2 2 2 2 3 3 3 3 3 3 2 3 3 3 3
[112] 3 3 2 3 3 3 3 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[149] 3 3
model_torch$eval()
preds_torch = model_torch(torch_tensor(X))
preds_torch = apply(preds_torch, 1, which.max)
print(preds_torch)
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[38] 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 2 2 2
[75] 2 2 2 2 2 2 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3
[112] 3 3 3 3 3 3 3 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[149] 3 3
7. Calculate the accuracy (how often we were correct):
mean(preds == as.integer(iris$Species))
[1] 0.8533333
mean(preds_torch == as.integer(iris$Species))
[1] 0.9733333
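Beyond the overall accuracy, a confusion table shows which species get mixed up; an optional sketch for the Keras predictions:
# Confusion table: rows = predicted class, columns = observed class.
table(predicted = preds, observed = as.integer(iris$Species))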
8. Plot the predictions to see whether we have done a good job:
oldpar = par(mfrow = c(1, 2))
plot(iris$Sepal.Length, iris$Petal.Length, col = iris$Species,
     main = "Observed")
plot(iris$Sepal.Length, iris$Petal.Length, col = preds,
     main = "Predicted")
par(oldpar) # Reset par.
So you see, building a neural network is very easy with Keras or Torch and you can already do it on your own.
We now build a regression for the airquality data set with Keras/Torch. We want to predict the variable “Ozone” (continuous).
Tasks:
Before we start, load and prepare the data set:
library(tensorflow)
library(keras3)
data = airquality
summary(data)
Ozone Solar.R Wind Temp
Min. : 1.00 Min. : 7.0 Min. : 1.700 Min. :56.00
1st Qu.: 18.00 1st Qu.:115.8 1st Qu.: 7.400 1st Qu.:72.00
Median : 31.50 Median :205.0 Median : 9.700 Median :79.00
Mean : 42.13 Mean :185.9 Mean : 9.958 Mean :77.88
3rd Qu.: 63.25 3rd Qu.:258.8 3rd Qu.:11.500 3rd Qu.:85.00
Max. :168.00 Max. :334.0 Max. :20.700 Max. :97.00
NA's :37 NA's :7
Month Day
Min. :5.000 Min. : 1.0
1st Qu.:6.000 1st Qu.: 8.0
Median :7.000 Median :16.0
Mean :6.993 Mean :15.8
3rd Qu.:8.000 3rd Qu.:23.0
Max. :9.000 Max. :31.0
data = data[complete.cases(data),] # Remove NAs.
summary(data)
Ozone Solar.R Wind Temp
Min. : 1.0 Min. : 7.0 Min. : 2.30 Min. :57.00
1st Qu.: 18.0 1st Qu.:113.5 1st Qu.: 7.40 1st Qu.:71.00
Median : 31.0 Median :207.0 Median : 9.70 Median :79.00
Mean : 42.1 Mean :184.8 Mean : 9.94 Mean :77.79
3rd Qu.: 62.0 3rd Qu.:255.5 3rd Qu.:11.50 3rd Qu.:84.50
Max. :168.0 Max. :334.0 Max. :20.70 Max. :97.00
Month Day
Min. :5.000 Min. : 1.00
1st Qu.:6.000 1st Qu.: 9.00
Median :7.000 Median :16.00
Mean :7.216 Mean :15.95
3rd Qu.:9.000 3rd Qu.:22.50
Max. :9.000 Max. :31.00
x = scale(data[,2:6])
y = data[,1]
library(tensorflow)
library(keras3)
model = keras_model_sequential(shape(5L))
model |>
  layer_dense(units = 20L, activation = "relu") |>
  ....
  layer_dense(units = 1L, activation = "linear")
model |>
  layer_dense(units = 20L, activation = "relu", input_shape = list(5L)) |>
  layer_dense(units = 20L) |>
  layer_dense(units = 20L) |>
  layer_dense(units = 1L, activation = "linear")
model |>
  compile(loss = keras3::loss_mean_squared_error,
          optimizer = optimizer_adamax(learning_rate = 0.05))
What is the “mean_squared_error” loss?
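As a hint, the mean squared error is simply the average of the squared differences between observations and predictions; a minimal sketch:
# Mean squared error by hand (hint for the question above):
mse = function(observed, predicted) mean((observed - predicted)^2)
mse(c(1, 2, 3), c(1.5, 2, 2))  # 0.4166667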
model_history =
  model |>
    fit(x = x, y = as.numeric(y), epochs = 100L,
        batch_size = 20L, shuffle = TRUE)
Epoch 1/100
6/6 - 0s - 56ms/step - loss: 2304.9736
Epoch 2/100
6/6 - 0s - 2ms/step - loss: 844.4670
Epoch 3/100
6/6 - 0s - 2ms/step - loss: 403.8222
Epoch 4/100
6/6 - 0s - 2ms/step - loss: 338.9061
Epoch 5/100
6/6 - 0s - 2ms/step - loss: 377.5107
Epoch 6/100
6/6 - 0s - 2ms/step - loss: 312.3781
Epoch 7/100
6/6 - 0s - 2ms/step - loss: 332.7539
Epoch 8/100
6/6 - 0s - 2ms/step - loss: 300.3812
Epoch 9/100
6/6 - 0s - 2ms/step - loss: 304.9411
Epoch 10/100
6/6 - 0s - 2ms/step - loss: 294.1410
Epoch 11/100
6/6 - 0s - 2ms/step - loss: 297.0824
Epoch 12/100
6/6 - 0s - 2ms/step - loss: 303.1965
Epoch 13/100
6/6 - 0s - 2ms/step - loss: 287.5248
Epoch 14/100
6/6 - 0s - 2ms/step - loss: 287.0906
Epoch 15/100
6/6 - 0s - 2ms/step - loss: 299.0218
Epoch 16/100
6/6 - 0s - 3ms/step - loss: 288.2249
Epoch 17/100
6/6 - 0s - 2ms/step - loss: 285.8900
Epoch 18/100
6/6 - 0s - 2ms/step - loss: 293.2901
Epoch 19/100
6/6 - 0s - 2ms/step - loss: 315.4283
Epoch 20/100
6/6 - 0s - 2ms/step - loss: 277.5013
Epoch 21/100
6/6 - 0s - 2ms/step - loss: 289.2515
Epoch 22/100
6/6 - 0s - 2ms/step - loss: 280.9329
Epoch 23/100
6/6 - 0s - 2ms/step - loss: 283.0792
Epoch 24/100
6/6 - 0s - 2ms/step - loss: 317.6115
Epoch 25/100
6/6 - 0s - 2ms/step - loss: 279.5954
Epoch 26/100
6/6 - 0s - 2ms/step - loss: 267.7930
Epoch 27/100
6/6 - 0s - 2ms/step - loss: 289.1937
Epoch 28/100
6/6 - 0s - 2ms/step - loss: 281.3068
Epoch 29/100
6/6 - 0s - 2ms/step - loss: 276.8955
Epoch 30/100
6/6 - 0s - 2ms/step - loss: 262.5016
Epoch 31/100
6/6 - 0s - 2ms/step - loss: 265.7221
Epoch 32/100
6/6 - 0s - 2ms/step - loss: 273.6547
Epoch 33/100
6/6 - 0s - 2ms/step - loss: 244.4558
Epoch 34/100
6/6 - 0s - 2ms/step - loss: 263.5290
Epoch 35/100
6/6 - 0s - 2ms/step - loss: 256.0312
Epoch 36/100
6/6 - 0s - 2ms/step - loss: 248.2983
Epoch 37/100
6/6 - 0s - 2ms/step - loss: 253.4030
Epoch 38/100
6/6 - 0s - 2ms/step - loss: 236.8920
Epoch 39/100
6/6 - 0s - 2ms/step - loss: 251.6100
Epoch 40/100
6/6 - 0s - 2ms/step - loss: 260.6098
Epoch 41/100
6/6 - 0s - 2ms/step - loss: 243.3047
Epoch 42/100
6/6 - 0s - 2ms/step - loss: 250.5724
Epoch 43/100
6/6 - 0s - 2ms/step - loss: 249.6459
Epoch 44/100
6/6 - 0s - 2ms/step - loss: 222.8224
Epoch 45/100
6/6 - 0s - 2ms/step - loss: 242.3164
Epoch 46/100
6/6 - 0s - 2ms/step - loss: 223.1310
Epoch 47/100
6/6 - 0s - 2ms/step - loss: 223.3789
Epoch 48/100
6/6 - 0s - 2ms/step - loss: 234.4032
Epoch 49/100
6/6 - 0s - 2ms/step - loss: 227.7735
Epoch 50/100
6/6 - 0s - 2ms/step - loss: 236.4764
Epoch 51/100
6/6 - 0s - 2ms/step - loss: 225.5554
Epoch 52/100
6/6 - 0s - 2ms/step - loss: 221.5008
Epoch 53/100
6/6 - 0s - 2ms/step - loss: 227.1257
Epoch 54/100
6/6 - 0s - 2ms/step - loss: 218.0656
Epoch 55/100
6/6 - 0s - 2ms/step - loss: 226.0332
Epoch 56/100
6/6 - 0s - 2ms/step - loss: 213.2831
Epoch 57/100
6/6 - 0s - 2ms/step - loss: 209.4318
Epoch 58/100
6/6 - 0s - 2ms/step - loss: 201.4224
Epoch 59/100
6/6 - 0s - 2ms/step - loss: 205.1103
Epoch 60/100
6/6 - 0s - 2ms/step - loss: 203.7220
Epoch 61/100
6/6 - 0s - 2ms/step - loss: 222.1807
Epoch 62/100
6/6 - 0s - 2ms/step - loss: 214.8768
Epoch 63/100
6/6 - 0s - 2ms/step - loss: 202.3848
Epoch 64/100
6/6 - 0s - 2ms/step - loss: 209.8243
Epoch 65/100
6/6 - 0s - 2ms/step - loss: 193.5286
Epoch 66/100
6/6 - 0s - 2ms/step - loss: 193.3627
Epoch 67/100
6/6 - 0s - 2ms/step - loss: 189.8934
Epoch 68/100
6/6 - 0s - 2ms/step - loss: 186.7781
Epoch 69/100
6/6 - 0s - 2ms/step - loss: 207.4391
Epoch 70/100
6/6 - 0s - 2ms/step - loss: 191.4560
Epoch 71/100
6/6 - 0s - 2ms/step - loss: 184.1445
Epoch 72/100
6/6 - 0s - 2ms/step - loss: 193.3217
Epoch 73/100
6/6 - 0s - 2ms/step - loss: 202.0850
Epoch 74/100
6/6 - 0s - 2ms/step - loss: 184.7625
Epoch 75/100
6/6 - 0s - 2ms/step - loss: 184.3798
Epoch 76/100
6/6 - 0s - 2ms/step - loss: 186.3008
Epoch 77/100
6/6 - 0s - 2ms/step - loss: 181.1955
Epoch 78/100
6/6 - 0s - 2ms/step - loss: 193.9419
Epoch 79/100
6/6 - 0s - 2ms/step - loss: 224.2767
Epoch 80/100
6/6 - 0s - 2ms/step - loss: 207.8511
Epoch 81/100
6/6 - 0s - 2ms/step - loss: 186.0640
Epoch 82/100
6/6 - 0s - 2ms/step - loss: 168.0031
Epoch 83/100
6/6 - 0s - 2ms/step - loss: 202.3606
Epoch 84/100
6/6 - 0s - 2ms/step - loss: 184.4188
Epoch 85/100
6/6 - 0s - 2ms/step - loss: 185.6397
Epoch 86/100
6/6 - 0s - 2ms/step - loss: 161.2557
Epoch 87/100
6/6 - 0s - 2ms/step - loss: 185.0522
Epoch 88/100
6/6 - 0s - 3ms/step - loss: 157.3587
Epoch 89/100
6/6 - 0s - 5ms/step - loss: 181.0723
Epoch 90/100
6/6 - 0s - 2ms/step - loss: 166.3280
Epoch 91/100
6/6 - 0s - 2ms/step - loss: 173.8368
Epoch 92/100
6/6 - 0s - 2ms/step - loss: 163.1193
Epoch 93/100
6/6 - 0s - 2ms/step - loss: 152.3261
Epoch 94/100
6/6 - 0s - 2ms/step - loss: 145.8915
Epoch 95/100
6/6 - 0s - 2ms/step - loss: 156.2227
Epoch 96/100
6/6 - 0s - 2ms/step - loss: 152.8232
Epoch 97/100
6/6 - 0s - 3ms/step - loss: 150.5020
Epoch 98/100
6/6 - 0s - 3ms/step - loss: 157.4122
Epoch 99/100
6/6 - 0s - 2ms/step - loss: 139.4589
Epoch 100/100
6/6 - 0s - 2ms/step - loss: 139.4398
plot(model_history)
pred_keras = predict(model, x)
4/4 - 0s - 17ms/step
fit = lm(Ozone ~ ., data = data)
pred_lm = predict(fit, data)
rmse_lm = mean(sqrt((y - pred_lm)^2))
rmse_keras = mean(sqrt((y - pred_keras)^2))
print(rmse_lm)
[1] 14.78897
print(rmse_keras)
[1] 8.478705
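A side note on the metric: mean(sqrt((y - pred)^2)) simplifies to the mean absolute error. If you prefer the conventional root mean squared error, it can be computed as follows (a small sketch; the values will differ from the ones printed above):
# Conventional RMSE = square root of the mean squared error.
sqrt(mean((y - pred_lm)^2))
sqrt(mean((y - as.numeric(pred_keras))^2))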
Before we start, load and prepare the data set:
library(torch)
data = airquality
summary(data)
Ozone Solar.R Wind Temp
Min. : 1.00 Min. : 7.0 Min. : 1.700 Min. :56.00
1st Qu.: 18.00 1st Qu.:115.8 1st Qu.: 7.400 1st Qu.:72.00
Median : 31.50 Median :205.0 Median : 9.700 Median :79.00
Mean : 42.13 Mean :185.9 Mean : 9.958 Mean :77.88
3rd Qu.: 63.25 3rd Qu.:258.8 3rd Qu.:11.500 3rd Qu.:85.00
Max. :168.00 Max. :334.0 Max. :20.700 Max. :97.00
NA's :37 NA's :7
Month Day
Min. :5.000 Min. : 1.0
1st Qu.:6.000 1st Qu.: 8.0
Median :7.000 Median :16.0
Mean :6.993 Mean :15.8
3rd Qu.:8.000 3rd Qu.:23.0
Max. :9.000 Max. :31.0
plot(data)
data = data[complete.cases(data),] # Remove NAs.
summary(data)
Ozone Solar.R Wind Temp
Min. : 1.0 Min. : 7.0 Min. : 2.30 Min. :57.00
1st Qu.: 18.0 1st Qu.:113.5 1st Qu.: 7.40 1st Qu.:71.00
Median : 31.0 Median :207.0 Median : 9.70 Median :79.00
Mean : 42.1 Mean :184.8 Mean : 9.94 Mean :77.79
3rd Qu.: 62.0 3rd Qu.:255.5 3rd Qu.:11.50 3rd Qu.:84.50
Max. :168.0 Max. :334.0 Max. :20.70 Max. :97.00
Month Day
Min. :5.000 Min. : 1.00
1st Qu.:6.000 1st Qu.: 9.00
Median :7.000 Median :16.00
Mean :7.216 Mean :15.95
3rd Qu.:9.000 3rd Qu.:22.50
Max. :9.000 Max. :31.00
x = scale(data[,2:6])
y = data[,1]
model_torch = nn_sequential(
  nn_linear(5L, 20L),
  ...
  nn_linear(20L, 1L)
)
library(torch)
model_torch = nn_sequential(
  nn_linear(5L, 20L),
  nn_relu(),
  nn_linear(20L, 20L),
  nn_relu(),
  nn_linear(20L, 20L),
  nn_relu(),
  nn_linear(20L, 1L)
)
We have to pass the network's parameters to the optimizer (how does this differ from Keras?):
optimizer_torch = optim_adam(params = model_torch$parameters, lr = 0.05)
In Torch, we write the training loop ourselves. Complete the training loop:
# Calculate number of training steps.
epochs = ...
batch_size = 32
steps = ...

X_torch = torch_tensor(x)
Y_torch = torch_tensor(y, ...)

# Set model into training status.
model_torch$train()

log_losses = NULL

# Training loop.
for(i in 1:steps){
  # Get batch indices.
  indices = sample.int(nrow(x), batch_size)
  X_batch = ...
  Y_batch = ...

  # Reset backpropagation.
  optimizer_torch$zero_grad()

  # Predict and calculate loss.
  pred = model_torch(X_batch)
  loss = ...

  # Backpropagation and weight update.
  loss$backward()
  optimizer_torch$step()

  log_losses[i] = as.numeric(loss)
}
# Calculate number of training steps.
epochs = 100
batch_size = 32
steps = round(nrow(x)/batch_size * epochs)

X_torch = torch_tensor(x)
Y_torch = torch_tensor(y, dtype = torch_float32())$view(list(-1, 1))

# Set model into training status.
model_torch$train()

log_losses = NULL

# Training loop.
for(i in 1:steps){
  # Get batch indices.
  indices = sample.int(nrow(x), batch_size)
  X_batch = X_torch[indices,]
  Y_batch = Y_torch[indices,]

  # Reset backpropagation.
  optimizer_torch$zero_grad()

  # Predict and calculate loss.
  pred = model_torch(X_batch)
  loss = nnf_mse_loss(pred, Y_batch)

  # Backpropagation and weight update.
  loss$backward()
  optimizer_torch$step()

  log_losses[i] = as.numeric(loss)
}
Tips:
plot(y = log_losses, x = 1:steps, xlab = "Epoch", ylab = "MSE")
pred_torch = model_torch(X_torch)
pred_torch = as.numeric(pred_torch) # Cast torch tensor to an R object.
fit = lm(Ozone ~ ., data = data)
pred_lm = predict(fit, data)
rmse_lm = mean(sqrt((y - pred_lm)^2))
rmse_torch = mean(sqrt((y - pred_torch)^2))
print(rmse_lm)
[1] 14.78897
print(rmse_torch)
[1] 6.897064
Build a Keras DNN for the titanic dataset
The next task differs for Torch and Keras users: Keras users will learn more about the inner workings of training, while Torch users will learn how to simplify and generalize the training loop.
Go through the code and try to understand it.
Similar to Torch, we will now write the training loop ourselves. The training loop consists of several steps:
library(tensorflow)
library(keras3)
data = airquality
data = data[complete.cases(data),] # Remove NAs.
x = scale(data[,2:6])
y = data[,1]
layers = tf$keras$layers
model = tf$keras$models$Sequential(
  c(
    layers$InputLayer(input_shape = list(5L)),
    layers$Dense(units = 20L, activation = tf$nn$relu),
    layers$Dense(units = 20L, activation = tf$nn$relu),
    layers$Dense(units = 20L, activation = tf$nn$relu),
    layers$Dense(units = 1L, activation = NULL) # No activation == "linear".
  )
)
epochs = 200L
optimizer = tf$keras$optimizers$Adamax(0.01)
# Stochastic gradient optimization is more efficient:
# in each optimization step, we use a random subset of the data.
get_batch = function(batch_size = 32L){
  indices = sample.int(nrow(x), size = batch_size)
  return(list(bX = x[indices,], bY = y[indices]))
}
get_batch() # Try out this function.
$bX
Solar.R Wind Temp Month Day
66 -0.10753214 -1.50086285 0.54640337 -0.1467431 -1.257115152
29 0.73720791 1.39425525 0.33653910 -1.5041165 1.499226218
8 -0.94130153 1.08506788 -1.97196786 -1.5041165 -0.912572481
122 0.57264816 -1.02302783 1.91052111 0.5319436 1.614073775
71 -0.10753214 -0.71384046 1.17599617 -0.1467431 -0.682877366
17 1.34059366 0.57912491 -1.23744292 -1.5041165 0.121055533
113 0.81400246 1.56290291 -0.08318944 0.5319436 0.580445762
117 0.58361881 -1.83815816 0.33653910 0.5319436 1.039835990
139 0.57264816 -0.85438017 0.02174269 1.2106304 0.006207976
92 0.75914921 -0.20789749 0.33653910 -0.1467431 1.728921332
79 1.09923936 -1.02302783 0.65133550 -0.1467431 0.235903090
142 0.58361881 0.10128988 -1.02757865 1.2106304 0.350750647
143 0.17770476 -0.54519280 0.44147123 1.2106304 0.465598204
136 0.58361881 -1.02302783 -0.08318944 1.2106304 -0.338334695
3 -0.39276904 0.74777257 -0.39798584 -1.5041165 -1.486810266
40 1.16506326 1.08506788 1.28092831 -0.8254298 -0.797724924
41 1.51612406 0.43858520 0.96613190 -0.8254298 -0.682877366
140 0.43002971 1.08506788 -1.13251078 1.2106304 0.121055533
31 1.03341546 -0.71384046 -0.18812157 -1.5041165 1.728921332
1 0.05702761 -0.71384046 -1.13251078 -1.5041165 -1.716505380
88 -1.12780258 0.57912491 0.86119977 -0.1467431 1.269531104
70 0.95662091 -1.19167548 1.49079258 -0.1467431 -0.797724924
82 -1.95060133 -0.85438017 -0.39798584 -0.1467431 0.580445762
20 -1.54468728 -0.06735777 -1.65717146 -1.5041165 0.465598204
149 0.08993956 -0.85438017 -0.81771438 1.2106304 1.154683547
12 0.78109051 -0.06735777 -0.92264652 -1.5041165 -0.453182252
116 0.29838191 -0.06735777 0.12667483 0.5319436 0.924988433
16 1.63680121 0.43858520 -1.44730719 -1.5041165 0.006207976
81 0.38614711 0.43858520 0.75626764 -0.1467431 0.465598204
68 1.00050351 -1.36032314 1.07106404 -0.1467431 -1.027420038
74 -0.10753214 1.39425525 0.33653910 -0.1467431 -0.338334695
137 -1.76410028 0.26993754 -0.71278225 1.2106304 -0.223487138
$bY
[1] 64 45 19 84 85 34 21 168 46 59 61 24 16 28 12 71 39 18 37
[20] 41 52 97 16 11 30 16 45 14 63 77 27 9
steps = floor(nrow(x)/32) * epochs # We need nrow(x)/32 steps for each epoch.

for(i in 1:steps){
  # Get data.
  batch = get_batch()

  # Transform it into tensors.
  bX = tf$constant(batch$bX)
  bY = tf$constant(matrix(batch$bY, ncol = 1L))

  # Automatic differentiation:
  # Record computations with respect to our model variables.
  with(tf$GradientTape() %as% tape, {
    pred = model(bX) # We record the operations for our model weights.
    loss = tf$reduce_mean(tf$keras$losses$mse(bY, pred))
  })

  # Calculate the gradients for our model$weights at the loss / backpropagation.
  gradients = tape$gradient(loss, model$weights)

  # Update our model weights with the learning rate specified above.
  optimizer$apply_gradients(purrr::transpose(list(gradients, model$weights)))

  if(!i %% 30){
    cat("Loss: ", loss$numpy(), "\n") # Print loss every 30 steps (not epochs!).
  }
}
Loss: 1884.367
Loss: 510.4832
Loss: 262.0391
Loss: 433.5691
Loss: 270.5837
Loss: 258.8594
Loss: 464.4019
Loss: 206.4962
Loss: 165.5195
Loss: 205.2222
Loss: 193.0815
Loss: 242.6305
Loss: 274.8206
Loss: 208.7632
Loss: 112.6337
Loss: 230.132
Loss: 186.9256
Loss: 176.1826
Loss: 375.6426
Loss: 286.4479
Keras and Torch use dataloaders to generate the data batches. Dataloaders are objects that return batches of data on request. Keras creates the dataloader object automatically in the fit function; in Torch, we have to write it ourselves:
library(torch)
data = airquality
data = data[complete.cases(data),] # Remove NAs.
x = scale(data[,2:6])
y = matrix(data[,1], ncol = 1L)

torch_dataset = torch::dataset(
  name = "airquality",
  initialize = function(X, Y) {
    self$X = torch::torch_tensor(as.matrix(X), dtype = torch_float32())
    self$Y = torch::torch_tensor(as.matrix(Y), dtype = torch_float32())
  },
  .getitem = function(index) {
    x = self$X[index,]
    y = self$Y[index,]
    list(x, y)
  },
  .length = function() {
    self$Y$size()[[1]]
  }
)
dataset = torch_dataset(x, y)
dataloader = torch::dataloader(dataset, batch_size = 30L, shuffle = TRUE)
Our dataloader is again an object that has to be instantiated. The instantiated object returns a list of two elements, batch x and batch y, and it stops returning batches once the data set has been completely traversed (no worries, we don't have to do all of this ourselves).
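If you want to inspect a single batch, you can draw one from the dataloader by hand; a small optional sketch using torch's dataloader_make_iter() and dataloader_next():
# Draw one batch to inspect its structure (a list of predictor and response tensors).
iter = torch::dataloader_make_iter(dataloader)
batch = torch::dataloader_next(iter)
dim(batch[[1]])  # 30 x 5 predictors.
dim(batch[[2]])  # 30 x 1 response.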
Our training loop has changed:
model_torch = nn_sequential(
  nn_linear(5L, 50L),
  nn_relu(),
  nn_linear(50L, 50L),
  nn_relu(),
  nn_linear(50L, 50L),
  nn_relu(),
  nn_linear(50L, 1L)
)

epochs = 50L
opt = optim_adam(model_torch$parameters, 0.01)

train_losses = c()
for(epoch in 1:epochs){
  train_loss = c()
  coro::loop(
    for(batch in dataloader) {
      opt$zero_grad()
      pred = model_torch(batch[[1]])
      loss = nnf_mse_loss(pred, batch[[2]])
      loss$backward()
      opt$step()
      train_loss = c(train_loss, loss$item())
    }
  )
  train_losses = c(train_losses, mean(train_loss))
  if(!epoch %% 10) cat(sprintf("Loss at epoch %d: %3f\n", epoch, mean(train_loss)))
}
Loss at epoch 10: 349.544273
Loss at epoch 20: 289.687613
Loss at epoch 30: 269.088741
Loss at epoch 40: 260.377857
Loss at epoch 50: 213.073002
plot(train_losses, type = "o", pch = 15,
col = "darkblue", lty = 1, xlab = "Epoch",
ylab = "Loss", las = 1)
Now change the code from above for the iris data set. Tip: In tf$keras$losses$… you can find various loss functions.
library(tensorflow)
library(keras3)
x = scale(iris[,1:4])
y = iris[,5]
y = keras3::to_categorical(as.integer(y) - 1L, 3)

layers = tf$keras$layers
model = tf$keras$models$Sequential(
  c(
    layers$InputLayer(input_shape = list(4L)),
    layers$Dense(units = 20L, activation = tf$nn$relu),
    layers$Dense(units = 20L, activation = tf$nn$relu),
    layers$Dense(units = 20L, activation = tf$nn$relu),
    layers$Dense(units = 3L, activation = tf$nn$softmax)
  )
)

epochs = 200L
optimizer = tf$keras$optimizers$Adamax(0.01)

# Stochastic gradient optimization is more efficient.
get_batch = function(batch_size = 32L){
  indices = sample.int(nrow(x), size = batch_size)
  return(list(bX = x[indices,], bY = y[indices,]))
}
steps = floor(nrow(x)/32) * epochs # We need nrow(x)/32 steps for each epoch.

for(i in 1:steps){
  batch = get_batch()
  bX = tf$constant(batch$bX)
  bY = tf$constant(batch$bY)

  # Automatic differentiation.
  with(tf$GradientTape() %as% tape, {
    pred = model(bX) # We record the operations for our model weights.
    loss = tf$reduce_mean(tf$keras$losses$categorical_crossentropy(bY, pred))
  })

  # Calculate the gradients for the loss at our model$weights / backpropagation.
  gradients = tape$gradient(loss, model$weights)

  # Update our model weights with the learning rate specified above.
  optimizer$apply_gradients(purrr::transpose(list(gradients, model$weights)))

  if(!i %% 30){
    cat("Loss: ", loss$numpy(), "\n") # Print loss every 30 steps (not epochs!).
  }
}
Loss: 0.009369774
Loss: 0.0006021717
Loss: 0.0012347
Loss: 0.0003470368
Loss: 0.001295264
Loss: 0.0005416216
Loss: 0.0004750416
Loss: 0.0002165258
Loss: 0.0002295185
Loss: 0.0001976096
Loss: 0.0001388085
Loss: 0.0001116272
Loss: 0.0002655677
Loss: 0.0002172911
Loss: 8.227722e-05
Loss: 0.0002270631
Loss: 0.0002010441
Loss: 7.776875e-05
Loss: 9.629007e-05
Loss: 7.308766e-05
Loss: 7.005708e-05
Loss: 0.0001642501
Loss: 0.0001119636
Loss: 5.126416e-05
Loss: 0.0001921077
Loss: 5.825447e-05
library(torch)
x = scale(iris[,1:4])
y = iris[,5]
y = as.integer(iris$Species)

torch_dataset = torch::dataset(
  name = "iris",
  initialize = function(X, Y) {
    self$X = torch::torch_tensor(as.matrix(X), dtype = torch_float32())
    self$Y = torch::torch_tensor(Y, dtype = torch_long())
  },
  .getitem = function(index) {
    x = self$X[index,]
    y = self$Y[index]
    list(x, y)
  },
  .length = function() {
    self$Y$size()[[1]]
  }
)
dataset = torch_dataset(x, y)
dataloader = torch::dataloader(dataset, batch_size = 30L, shuffle = TRUE)

model_torch = nn_sequential(
  nn_linear(4L, 50L),
  nn_relu(),
  nn_linear(50L, 50L),
  nn_relu(),
  nn_linear(50L, 50L),
  nn_relu(),
  nn_linear(50L, 3L)
)

epochs = 50L
opt = optim_adam(model_torch$parameters, 0.01)

train_losses = c()
for(epoch in 1:epochs){
  train_loss = c()
  coro::loop(
    for(batch in dataloader) {
      opt$zero_grad()
      pred = model_torch(batch[[1]])
      loss = nnf_cross_entropy(pred, batch[[2]])
      loss$backward()
      opt$step()
      train_loss = c(train_loss, loss$item())
    }
  )
  train_losses = c(train_losses, mean(train_loss))
  if(!epoch %% 10) cat(sprintf("Loss at epoch %d: %3f\n", epoch, mean(train_loss)))
}
Loss at epoch 10: 15.999315
Loss at epoch 20: 8.324385
Loss at epoch 30: 5.632413
Loss at epoch 40: 4.256267
Loss at epoch 50: 3.425689
Remarks:
If you are not yet familiar with the underlying concepts of neural networks and want to know more, we suggest reading or viewing the following sites and videos. Consider the links and videos with descriptions in parentheses as optional bonus material.
This might be useful to understand the further concepts in more depth.
(https://en.wikipedia.org/wiki/Newton%27s_method#Description; especially the animated graphic is interesting.)
Activation functions in detail (requires the above as prerequisite).
Videos about the topic:
Depending on the activation function, the network may stop getting updated even with high learning rates (the vanishing gradient problem, especially for sigmoid functions; see the small sketch at the end of this section). Updates might also overshoot (exploding gradients), or activation functions may produce many zeros (especially for ReLU, the "dying ReLU" problem).
In general, the first layers of a network tend to learn (much) more slowly than subsequent ones.
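Here is the small numerical sketch of the vanishing gradient problem mentioned above: the derivative of the sigmoid function is at most 0.25 and shrinks rapidly for large inputs, so layers behind saturated sigmoid units receive almost no gradient signal.
# Derivative of the sigmoid: sigma(x) * (1 - sigma(x)).
sigmoid = function(x) 1 / (1 + exp(-x))
sigmoid_grad = function(x) sigmoid(x) * (1 - sigmoid(x))
round(sigmoid_grad(c(0, 2, 5, 10)), 5)  # 0.25000 0.10499 0.00665 0.00005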