# Swift for TensorFlow: Training your first model

By Tomás Ruiz López. December 08, 2020. Tags: swift, machine learning, tensorflow. 9 minutes to read.

*This is the fifth article in the Swift for TensorFlow series. If you’re just joining us, make sure to begin with Getting Started with Swift with TensorFlow.*

Recall the Machine Learning model we described in our previous post:

```
import TensorFlow

struct TitanicModel: Layer {
    var hidden: Dense<Float>
    var dropout: Dropout<Float>
    var output: Dense<Float>

    init() {
        hidden = Dense<Float>(
            inputSize: 6,
            outputSize: 25,
            activation: relu)
        dropout = Dropout<Float>(probability: 0.25)
        output = Dense<Float>(
            inputSize: 25,
            outputSize: 2,
            activation: softmax)
    }

    @differentiable
    func callAsFunction(_ input: Tensor<Float>) -> Tensor<Float> {
        input.sequenced(through: hidden, dropout, output)
    }
}
```

As we pointed out, this model gets initialized with random values for the weights at each layer, so it provides little help performing our task without training. In this post, we’ll learn how to write an algorithm that trains this model. But first, let’s introduce some concepts that will help us understand what’s going on in the training process.

## Epochs

The training process does not happen in a single computation. Rather, it is an iterative and incremental process; we need to keep making adjustments to the weights in each layer until we are satisfied with the accuracy of the predictions of our model.

Each iteration is called an **epoch**, and it consists of two stages: the **forward propagation**, where we evaluate the current model with all samples from our training dataset, and the **backpropagation**, where we update the weights in our model to try to optimize the results of our model.

TensorFlow provides us with the `TrainingEpochs` class, which models an infinite sequence of epochs in our training process. Each time we ask for a new epoch, it shuffles our samples and provides a number of batches of our desired size:

```
let epochs = TrainingEpochs(samples: samples, batchSize: batchSize)
```
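To get an intuition for what an epoch of shuffled batches looks like, here is a minimal sketch in plain Swift that does not use TensorFlow at all. The `makeEpoch` helper is hypothetical, and it keeps a shorter final batch when the samples do not divide evenly, which is a simplification of ours and not a statement about how `TrainingEpochs` behaves:

```swift
// Plain-Swift sketch of one epoch of shuffled mini-batches.
// `makeEpoch` is a hypothetical helper, not part of the TensorFlow API.
func makeEpoch<Sample>(samples: [Sample], batchSize: Int) -> [[Sample]] {
    precondition(batchSize > 0, "batchSize must be positive")
    // Every epoch sees all samples, but in a different random order.
    let shuffled = samples.shuffled()
    // Walk the shuffled samples in strides of `batchSize`.
    return stride(from: 0, to: shuffled.count, by: batchSize).map { start in
        let end = min(start + batchSize, shuffled.count)
        return Array(shuffled[start..<end])
    }
}

let samples = Array(1...10)
let firstEpoch = makeEpoch(samples: samples, batchSize: 4)
let secondEpoch = makeEpoch(samples: samples, batchSize: 4)
print(firstEpoch.map { $0.count })  // [4, 4, 2]
```

Calling `makeEpoch` again produces the same samples in a different order, which is the property we want from one epoch to the next.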

## Loss function

We have mentioned above that we try to optimize the results of our model, but we need a measure of what it means for our model to perform well or badly. This measure is given by the **loss function**, and you don’t need to come up with it yourself: TensorFlow provides several good loss functions, including some designed for classification problems like the one we are dealing with. In this case, we can use the `softmaxCrossEntropy` function, but you can take a look at the API Reference to find other functions that are helpful for your particular task.
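To make the idea of a loss more tangible, the following plain-Swift sketch computes a softmax cross-entropy by hand for a single example. It is only an illustration of the underlying formula; the `softmax` and `crossEntropy` functions here are our own and operate on `[Double]`, not on tensors:

```swift
import Foundation

// Softmax turns raw scores into probabilities that sum to 1.
func softmax(_ scores: [Double]) -> [Double] {
    // Subtracting the maximum keeps exp() from overflowing.
    let maxScore = scores.max() ?? 0
    let exps = scores.map { exp($0 - maxScore) }
    let total = exps.reduce(0, +)
    return exps.map { $0 / total }
}

// Cross-entropy for one example: the negative log of the probability
// assigned to the correct class.
func crossEntropy(scores: [Double], label: Int) -> Double {
    return -log(softmax(scores)[label])
}

let confident = crossEntropy(scores: [4.0, 0.0], label: 0)  // right and sure
let undecided = crossEntropy(scores: [0.0, 0.0], label: 0)  // ln 2, about 0.693
let wrong = crossEntropy(scores: [0.0, 4.0], label: 0)      // sure but wrong
print(confident, undecided, wrong)
```

The loss is small when the model is confident about the correct class, and grows as it puts more weight on the wrong one.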

## Optimizer

Once we have a measurement of how well our model is doing with respect to what is expected, we need a way of optimizing this function. There are multiple **optimizers** in TensorFlow. What they have in common is that they use derivatives of the loss to know in which direction to adjust the weights, and then update them according to several hyperparameters, such as the learning rate.

In this particular case, we chose the **Adam** optimizer, but you can explore others like **SGD** (Stochastic Gradient Descent), **RMSProp**, or **Adadelta**. Depending on your specific problem, some of them may perform better than others.

```
let optimizer = Adam(for: model)
```
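As a rough illustration of what an optimizer does, here is plain gradient descent on a one-variable function, written in plain Swift. This is deliberately much simpler than Adam, which also keeps running averages of past gradients; the function being minimized and the `gradientDescent` helper are made up for this example:

```swift
// Minimize f(w) = (w - 3)^2 with plain gradient descent.
// The derivative f'(w) = 2 * (w - 3) tells us which way is "downhill".
func gradientDescent(start: Double, learningRate: Double, steps: Int) -> Double {
    var w = start
    for _ in 0..<steps {
        let gradient = 2 * (w - 3)
        // Move against the gradient, scaled by the learning rate.
        w -= learningRate * gradient
    }
    return w
}

let oneStep = gradientDescent(start: 0, learningRate: 0.1, steps: 1)
let trained = gradientDescent(start: 0, learningRate: 0.1, steps: 100)
print(oneStep, trained)  // the second value ends up very close to 3
```

The learning rate is one of the hyperparameters mentioned above: too small and training is slow, too large and the updates can overshoot the minimum.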

## A single iteration

Now that we have introduced the three key parts of this training stage, we can see how they fit together to train our model:

```
let (loss, gradient) = valueWithGradient(at: model) { model -> Tensor<Float> in
    let logits = model(batch.features)
    return softmaxCrossEntropy(logits: logits, labels: batch.labels)
}
optimizer.update(&model, along: gradient)
```

In the code above, `valueWithGradient` evaluates our current model on the features of the training batch we are processing, which gives us the `logits` (a Tensor with one score per class for each training example in the batch; because our output layer uses `softmax`, these scores are already probabilities). Then, `softmaxCrossEntropy` measures how far those estimates are from the expected labels in this batch.

`valueWithGradient` returns both the `loss` and the `gradient` of that loss with respect to the model, which can be used to update our current model. The optimizer we created before takes the model (as an `inout` parameter) and the gradient, and updates the weights.

## A training loop

Finally, we need to repeat this step for every batch over a number of epochs. How many epochs are necessary? As with many aspects of Machine Learning, it is subject to experimentation. If you run too few epochs, your model may not reach its best accuracy; if you run too many, your model may **overfit** the training data and not generalize well enough to other data.

```
func train(
    model: inout TitanicModel,
    samples: [TitanicBatch],
    epochCount: Int = 1000,
    batchSize: Int = 32
) {
    let epochs = TrainingEpochs(samples: samples, batchSize: batchSize)
    let optimizer = Adam(for: model)
    // `epochs` is infinite, so we only take the first `epochCount` epochs.
    for (epochIndex, epoch) in epochs.prefix(epochCount).enumerated() {
        for batchSamples in epoch {
            // Merge the samples of this batch into a single `TitanicBatch`.
            let batch = batchSamples.collated
            // Evaluate the model and obtain the loss and its gradient.
            let (loss, gradient) = valueWithGradient(at: model) { model -> Tensor<Float> in
                let logits = model(batch.features)
                return softmaxCrossEntropy(logits: logits, labels: batch.labels)
            }
            // Let the optimizer adjust the weights of the model.
            optimizer.update(&model, along: gradient)
        }
    }
}
```

It may be a good idea to collect intermediate values during the training process to print statistics about the accuracy of the model over the different epochs, or to plot the data to detect if the model starts overfitting the data.
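One possible way to organize those intermediate values is sketched below in plain Swift. The `EpochStats` type, its field names, and the `bestEpoch` helper are hypothetical, and the numbers are made up purely to show the shape of the data; they are not results from the Titanic model:

```swift
// One record per epoch. The type and its field names are illustrative.
struct EpochStats {
    let epoch: Int
    let trainLoss: Double
    let devLoss: Double
}

// Returns the epoch with the lowest dev loss. If the training loss keeps
// falling after this epoch while the dev loss rises, the model is
// probably starting to overfit.
func bestEpoch(in history: [EpochStats]) -> EpochStats? {
    return history.min(by: { $0.devLoss < $1.devLoss })
}

// Made-up numbers, only to show how the history could be used.
let history = [
    EpochStats(epoch: 0, trainLoss: 0.69, devLoss: 0.70),
    EpochStats(epoch: 1, trainLoss: 0.55, devLoss: 0.58),
    EpochStats(epoch: 2, trainLoss: 0.45, devLoss: 0.52),
    EpochStats(epoch: 3, trainLoss: 0.38, devLoss: 0.55),
    EpochStats(epoch: 4, trainLoss: 0.30, devLoss: 0.61),
]
if let best = bestEpoch(in: history) {
    print("Lowest dev loss at epoch \(best.epoch)")
}
```

In a real training loop, each record would be appended at the end of an epoch, using the loss computed on the training batches and on a separate dev set.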

## Evaluating the model

Once we are satisfied with the accuracy of our model with the training data, we need to test it against our dev/test set of examples. We can create a function to compute the accuracy of predictions versus expected labels:

```
func accuracy(
    predictions: Tensor<Int32>,
    truths: Tensor<Int32>
) -> Float {
    Tensor<Float>(predictions .== truths).mean().scalarized()
}
```

Using this function, we can evaluate how the model performs with our test set:

```
func test(model: TitanicModel, with samples: [TitanicBatch]) {
    let batch = samples.collated
    let logits = model(batch.features)
    let predictions = logits.argmax(squeezingAxis: 1)
    let acc = accuracy(predictions: predictions, truths: batch.labels)
    print("Dev batch accuracy: \(acc)")
}
```

To obtain a prediction from a model for a set of features, we just have to call it as if it were a function (in the end, it behaves as a function, doesn’t it?). The model returns the logits which, for our model, are the probabilities for each class. We can use `argmax` to select the index (the class) that has the highest probability, and then feed the predictions and the labels to the `accuracy` function we defined above.
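The same two steps can be sketched on plain Swift arrays, without tensors, to see exactly what is being computed. These `argmax` and `accuracy` functions are our own array-based stand-ins, not the TensorFlow ones, and the probabilities are made up for the example:

```swift
// Index of the largest value in one row of class probabilities.
func argmax(_ row: [Double]) -> Int {
    var best = 0
    for index in row.indices where row[index] > row[best] {
        best = index
    }
    return best
}

// Fraction of predictions that match the expected labels.
func accuracy(predictions: [Int], truths: [Int]) -> Double {
    precondition(predictions.count == truths.count, "one label per prediction")
    guard !truths.isEmpty else { return 0 }
    let hits = zip(predictions, truths).filter { pair in pair.0 == pair.1 }.count
    return Double(hits) / Double(truths.count)
}

// Three examples, two classes each (illustrative values).
let probabilities = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
let truths = [0, 1, 1]
let predictions = probabilities.map { argmax($0) }  // [0, 1, 0]
let acc = accuracy(predictions: predictions, truths: truths)
print("Accuracy: \(acc)")  // two out of three predictions are correct
```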

## Caveats

There is an important caveat when working with TensorFlow. You need to make sure you explicitly set the learning phase to training or inference, depending on what you are doing:

```
let trainSamples = readTitanic(file: "train-clean")
let devSamples = readTitanic(file: "dev-clean")
var model = TitanicModel()

// Training phase: layers such as dropout are active.
Context.local.learningPhase = .training
train(model: &model, samples: trainSamples)

// Inference phase: training-only behavior is switched off.
Context.local.learningPhase = .inference
test(model: model, with: devSamples)
```

Switching learning phases will enable or disable certain aspects of your model that are only for training, like dropout, for example.
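To see why the phase matters, consider this stand-alone sketch of dropout in plain Swift. It assumes the common "inverted dropout" formulation, where surviving values are scaled up during training; the `LearningPhase` enum and `dropout` function below are defined only for this example and are not the TensorFlow ones:

```swift
// A local stand-in for the learning phase, defined only for this sketch.
enum LearningPhase { case training, inference }

// Hypothetical stand-alone dropout using the "inverted dropout" formulation:
// values that survive are scaled by 1 / (1 - probability).
func dropout(_ values: [Double], probability: Double, phase: LearningPhase) -> [Double] {
    precondition(probability >= 0 && probability < 1, "probability must be in [0, 1)")
    switch phase {
    case .inference:
        // Nothing is dropped when making predictions.
        return values
    case .training:
        let keepScale = 1 / (1 - probability)
        // Each value is zeroed with the given probability.
        return values.map { Double.random(in: 0..<1) < probability ? 0 : $0 * keepScale }
    }
}

let input = [1.0, 2.0, 3.0, 4.0]
let whileTraining = dropout(input, probability: 0.25, phase: .training)
let whilePredicting = dropout(input, probability: 0.25, phase: .inference)
print(whileTraining, whilePredicting)
```

If the phase were left at training while evaluating, predictions would change randomly from one call to the next, which is why the switch to inference matters before testing.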

## Conclusions

In this series of posts, we have covered how to select features from a dataset that can be used for training, how to read those features from a CSV file into a Swift structure, how to describe a Machine Learning model for a classification problem, and finally how to train and validate it. This is a simple example, but it showcases how powerful TensorFlow can be, and how Swift makes writing the code really simple once you understand the concepts behind it.