Swift for TensorFlow: Describing your model

This is the fourth article in the Swift for TensorFlow series. If you’re just joining us, make sure to begin with Getting Started with Swift for TensorFlow.

Now that we have selected the features from our Titanic dataset, and written a function to load a CSV file, it’s time to describe our first Machine Learning model. Let’s look into some basic concepts.

Deep Neural Networks

The problem we are trying to solve can be summarized as writing a function that, given 6 inputs corresponding to the features we have in our dataset (traveling class, age, sex, fare, embarkation port, and traveling alone), returns a value telling us whether the person described by those features survives or not. We could inspect the dataset and write complex decision logic consisting of multiple nested if statements to compute the final result, but this would be an extremely hard task to do manually, and not trivial to reproduce if the dataset ever changes.

Alternatively, we can train a Neural Network to do this job for us. A Neural Network can be seen as a directed graph whose nodes are organized into layers. These nodes take the outputs of the previous layer as their inputs, and provide new values that feed the following layer.

Each node has some internal values, known as weights, that represent the importance of certain inputs over others for that node. Additionally, each node has an activation function, which determines whether the node triggers a value to the following layer or not. Metaphorically, you can view a node as a brain neuron that receives electrical impulses, checks if their combined intensity is above some threshold, and fires a new impulse to the neurons connected to it. Weights are randomly initialized when the network is created, and adjusted through the learning process; we will cover this in the next post.
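To make this more concrete, here is a toy sketch of what a single node computes. This is not how the library implements it internally; the function and parameter names are just illustrative:

import TensorFlow

// Illustrative sketch of a single node: a weighted sum of the inputs
// plus a bias term, passed through an activation function (ReLU here).
func nodeOutput(
  inputs: Tensor<Float>,
  weights: Tensor<Float>,
  bias: Float
) -> Tensor<Float> {
  let weightedSum = (inputs * weights).sum() + bias
  return relu(weightedSum)  // "fires" only when the weighted sum is positive
}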

The term Deep comes from the fact that, with the amounts of data and computing power we have nowadays, we can create Neural Networks with multiple layers. We typically have an input layer with as many nodes as there are features in our dataset (6 in our current example), an output layer with as many nodes as there are classes to predict (2 in our example: surviving and dying), and a variable number of hidden layers, each one with its own number of nodes. Each layer is fully connected to its previous and following layers.

Choosing the number of layers and nodes for a specific problem requires a lot of experimentation and prior experience. It is recommended to start with something simple and iterate on it until results improve. Neural Networks have many hyperparameters to fine-tune in order to obtain a model with high accuracy.

Writing a model

With that in mind, we are going to implement a very simple Neural Network. It will have an input layer, a single hidden layer, a dropout layer (don’t worry, we’ll explain this in a moment), and an output layer. This model can be written as:

import TensorFlow

// Our model: a fully connected hidden layer, a dropout layer,
// and a fully connected output layer.
struct TitanicModel: Layer {
  var hidden: Dense<Float>
  var dropout: Dropout<Float>
  var output: Dense<Float>
}

A Dense layer is a fully connected layer. A Dropout layer drops some outputs from the previous layer with a given probability. Dropout is a regularization technique used to prevent the model from relying too heavily on any specific feature, reducing overfitting. Notice also that our hidden, dropout, and output layers are declared as var instead of let; this is because they will be initialized with random values, and training will happen in-place, mutating those values.
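To get a feeling for what dropout does, here is a minimal sketch of the layer in isolation. Note that Dropout only drops values while the learning phase is set to training; the surviving values are scaled up to compensate for the dropped ones:

import TensorFlow

let dropout = Dropout<Float>(probability: 0.25)

// Dropout is only applied during training; during inference it is a no-op.
Context.local.learningPhase = .training
let values = Tensor<Float>(ones: [1, 8])
print(dropout(values))  // roughly a quarter of the values are zeroed out

Context.local.learningPhase = .inference  // restore the default phase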

In order to initialize this model, we can write an initializer that receives the number of units in the hidden layer:

init(_ hiddenUnits: Int) {
  hidden = Dense<Float>(
    inputSize: 6,
    outputSize: hiddenUnits,
    activation: relu)

  dropout = Dropout<Float>(probability: 0.25)

  output = Dense<Float>(
    inputSize: hiddenUnits,
    outputSize: 2,
    activation: softmax)
}
  • Our hidden layer has 6 inputs (the 6 features we have selected from the dataset) and hiddenUnits outputs (we will use 25). The activation function is ReLU (Rectified Linear Unit), which is typically used in hidden layers.
  • Our dropout layer drops each value with a probability of 0.25.
  • Our output layer has hiddenUnits inputs (it must match the previous layer’s outputs) and 2 outputs (the 2 classes we want to estimate). The activation function is softmax, which essentially ensures that all outputs add up to 1, giving us an estimate of the probability of each class.
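With the initializer in place, instantiating the model with 25 units in its hidden layer is a one-liner:

let model = TitanicModel(25)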

Finally, we need to describe how data flows through these layers:

@differentiable
func callAsFunction(_ input: Tensor<Float>) -> Tensor<Float> {
  input.sequenced(through: hidden, dropout, output)
}

That is, whenever we receive an input Tensor, we sequence it through the hidden, dropout, and output layers, and we get back a new Tensor containing the probabilities for each estimated class. The @differentiable annotation tells the Swift compiler that derivatives of this function need to be computed, which is what allows the weights in these layers to be updated as part of the training process.
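As a quick sanity check, we can run some made-up data through the untrained model we created above. The weights are random at this point, so the probabilities are meaningless, but each row of the output should add up to 1:

// A batch of 3 passengers with 6 made-up feature values each.
let batch = Tensor<Float>(randomNormal: [3, 6])
let probabilities = model(batch)
print(probabilities.shape)                  // [3, 2]: one pair of class probabilities per row
print(probabilities.sum(squeezingAxes: 1))  // each row adds up to ~1.0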

Conclusions

In this post, we have covered how we can write a simple Neural Network consisting of different layers, and how to connect them. If we try to use it as it is initialized, we will not get anything useful; weights are initialized randomly each time we run it. We need to train the network so that those weights get adjusted properly to fit the data, and finally use them to make predictions.
