Introduction to TLU and Perceptron

This is part of a series of implementations of relevant models in Machine Learning. The implementation pipeline starts with relevant theoretical exposition of the model followed by a detailed implementation and associated discussion.

Threshold Logic Unit

The Threshold Logic Unit (TLU) is a basic machine learning model consisting of a set of input units (with corresponding weights) and an activation function. Note that the TLU is the most basic form of artificial neuron/computational unit, and knowledge of it lays the foundation for advanced topics in machine learning and deep learning. The TLU mimics, at a high level, the functionality of a biological neuron. A typical neuron receives a multitude of inputs from afferent neurons, each associated with a weight. The weighted inputs are modulated in the receiving neuron (the efferent), which responds accordingly: it fires/produces a pulse (1) or does not fire/produces no pulse (0). The TLU achieves this via an activation function, which takes the activation a as input to generate a prediction ŷ. A threshold θ is defined, and the model produces an output of 1 if the threshold is reached, otherwise 0.

In the TLU, each input xᵢ is associated with a weight wᵢ, and the sum of the weighted inputs (the products xᵢ × wᵢ) gives the activation a: a = ∑ᴺᵢ₌₁ xᵢwᵢ. The figure below depicts a simple TLU architecture.

A simple network with a set of weighted inputs, a processing unit and an output unit. The activation is the linear sum of the inputs x₁, x₂ and the bias node, each multiplied by its corresponding weight.

While the inputs remain unchanged, the weights are randomly initialised and adjusted through a training procedure. For the TLU, training relies on pairs of examples (xᵢ, yᵢ), corresponding to an arbitrary datapoint xᵢ and its class yᵢ. This form of learning is referred to as supervised learning because both the data instance and the target are used to direct the learning process. Other forms of learning, which I will not belabour here, are unsupervised learning (which utilises the input alone to infer relevant clusters or categories) and reinforcement learning (which rewards the model for correct predictions, so the model aims to maximise rewards). A model is said to learn if it can correctly classify a previously unseen datapoint. The final output or prediction is based on the sum of the weighted inputs: ŷ = 1 if a ≥ θ, otherwise ŷ = 0.
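The forward pass described above can be sketched in a few lines. This is a minimal illustration, not the article's implementation (that follows below); the inputs, weights and threshold here are arbitrary example values.

```python
import numpy as np

# Minimal sketch of the TLU forward pass: weighted sum, then threshold.
# The example inputs, weights and threshold are illustrative only.
def tlu_output(x, w, theta):
    a = np.dot(x, w)               # activation: sum of weighted inputs
    return 1 if a >= theta else 0  # fire (1) if the threshold is reached

x = np.array([1, 0])               # example datapoint
w = np.array([0.5, 0.5])           # example weights
print(tlu_output(x, w, 0.4))       # activation 0.5 >= 0.4, so output is 1
```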

Weights adjustment in the TLU

From the perspective of a learning model, datapoints belong to groups that are demarcated by one or more decision surfaces. Thus, the goal of a learning model is to achieve a certain task, such as classification of objects, after the training regime. During training, the model identifies a set of parameters or free parameters (e.g. weights) to be used in conjunction with the input to achieve the desired goal by identifying the decision surface. It is crucial in any ML-based model to identify the free parameters that enable the model to pick out distinguishing features.

The TLU’s threshold is initialised as a scalar quantity used as a baseline or bias, i.e. a level the activation must attain before the neuron fires. For uniformity, the threshold is treated as a weight with a constant input of −1, so that the firing condition ∑xᵢwᵢ ≥ θ is transformed into ∑xᵢwᵢ + (−1)×θ ≥ 0 and integrated into the mainstream input-weight computation for training. Consequently, a learning rule is defined and repeatedly applied until the right setting for the weights vector is obtained. The adjustment to the weights vector is a function of the output for the given instance: the parameters are either increased or decreased according to the learning rule. In each training epoch, a marginal change is made to the weights. A well-trained model should be able to correctly classify new examples.
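The bias augmentation can be checked numerically: prepending a constant −1 to the input and the threshold θ to the weights leaves the firing decision unchanged. A small sketch with illustrative values:

```python
import numpy as np

# Folding the threshold into the weight vector: augment each input with a
# constant -1 so that x.w >= theta becomes x'.w' >= 0. Values are illustrative.
theta = 0.4
w = np.array([0.5, 0.5])
x = np.array([1, 0])

x_aug = np.insert(x, 0, -1)     # prepend the constant bias input -1
w_aug = np.insert(w, 0, theta)  # the threshold becomes an ordinary weight

# both formulations give the same firing decision
assert (np.dot(x, w) >= theta) == (np.dot(x_aug, w_aug) >= 0)
```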

Some technical descriptions

With the augmented threshold, the action of the TLU is given by: w⋅x ≥ 0 → ŷ = 1 and w⋅x < 0 → ŷ = 0. Because the input vector x is not affected during training (it remains unchanged), only the weight vector w is adjusted to align properly with the input vector. Using a learning rate α (0 < α < 1) to control the process, a new vector w′ is formed which is closer to the input vector x. Depending on the direction of the error, the adjustment can be an addition or a subtraction, w′ = w + αx or w′ = w − αx; since both cases are possible, a single learning rule that combines them is used instead: w′ = w + α(t − ŷ)x, where the sign of t − ŷ decides the adjustment direction (increase or decrease). Alternatively, the relationship can be expressed in several ways:

  1. In terms of the change in the weight vector: δw = w′ − w, and since w′ = w + α(t − ŷ)x, δw = α(t − ŷ)x
  2. Or in terms of the components of the weight vector: δwᵢ = α(t − ŷ)xᵢ, where i = 1 to n+1
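A single application of this rule is easy to trace by hand. The sketch below uses illustrative values (α = 0.3, zero initial weights, a bias-augmented input) and shows the weights moving away from a misclassified datapoint:

```python
import numpy as np

# One step of the learning rule dw = alpha*(t - y)*x on a bias-augmented
# input. Values are illustrative only.
alpha = 0.3
w = np.zeros(3)                     # [threshold-weight, w1, w2]
x = np.array([-1, 1, 1])            # bias input -1, then x1, x2
t = 0                               # target for this datapoint

y = 1 if np.dot(x, w) >= 0 else 0   # prediction: activation 0 >= 0, so y = 1
w = w + alpha * (t - y) * x         # misclassified: weights move away from x
print(w)                            # [ 0.3 -0.3 -0.3]
```

Since t − y = −1 here, the rule subtracts αx from the weights; had the target been 1 with a prediction of 0, it would add αx instead.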

TLU Implementation

Having established the theoretical base, the next step is to describe and implement the training phase of the model. Basically, the implementation is based on the following steps:

  1. Identify inputs and the corresponding representation
  2. Identify the free parameters in the problem
  3. Specify the learning rule
  4. Adjust the free parameters for optimisation
  5. Evaluate the model

Perceptron Learning Rule: The implementation here is based on the perceptron training rule, which is guaranteed to generate a valid weights vector that separates linearly separable data. More formally, the Perceptron Convergence Theorem states:

If two classes of vector X,Y are linearly separable, then application of the perceptron training algorithm will eventually result in a weight vector w₀ such that w₀ defines a TLU whose decision hyperplane separates X and Y — Gurney (1997), p. 43.

repeat
    for each training vector pair (x, t)
        evaluate the output y when x is input to the TLU
        if y ≠ t then
            form a new weight vector w′ according to the learning rule
        else
            do nothing
        end if
    end for
until y = t for all vectors
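This pseudocode can be transcribed directly into Python. Note that, unlike the fixed-epoch class below, this version loops until every training pair is classified correctly, so it terminates only on linearly separable data (as the convergence theorem guarantees). The function name and example values here are illustrative:

```python
import numpy as np

# Direct transcription of the perceptron training pseudocode: repeat passes
# over the data until every pair is classified correctly. Assumes linearly
# separable, bias-augmented data (first column fixed at -1).
def perceptron_train(data, targets, alpha=0.2):
    w = np.zeros(data.shape[1])
    converged = False
    while not converged:
        converged = True
        for x, t in zip(data, targets):
            y = 1 if np.dot(x, w) >= 0 else 0
            if y != t:                      # form a new weight vector
                w = w + alpha * (t - y) * x
                converged = False           # at least one mistake this pass
    return w

# bias-augmented AND data: first input fixed at -1
and_data = np.array([[-1, 0, 0], [-1, 0, 1], [-1, 1, 0], [-1, 1, 1]])
w = perceptron_train(and_data, np.array([0, 0, 0, 1]))
```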

A basic python class to implement the TLU:

# import relevant package(s)
import numpy as np

class TLU(object):
    # initialise the parameter(s) for class operationalisation
    def __init__(self, input_size):
        # one extra weight holds the threshold (its input is fixed at -1)
        self.weights = np.zeros(input_size + 1)

    def activate_tlu(self, x):
        # step activation: fire (1) if the activation reaches 0, else 0
        return 1 if x >= 0 else 0

    def predict_tlu(self, row):
        # this predicts an individual (bias-augmented) row in a given dataset
        xw = np.array(row).dot(self.weights)
        a = self.activate_tlu(xw)
        return a

    def train_tlu(self, data, targets, epochs, lrate):
        # training to identify the right setting for the weights
        for epoch in range(epochs):
            for row, t in zip(data, targets):
                row = np.insert(row, 0, -1)  # inserts the bias input -1
                pred = self.predict_tlu(row)
                error = t - pred
                # adjust the weights vector when the prediction is wrong ...
                if pred != t:
                    for r in range(len(self.weights)):
                        self.weights[r] = self.weights[r] + (lrate * error * row[r])
        return self.weights

    def __str__(self):
        return 'TLU Iteration!\n'

Instantiate the class object for training and prediction:

def tlu_pred(model, data, targets, epochs, lrate=0.2, toPrint=True):
    adj_w = model.train_tlu(data, targets, epochs, lrate)
    if toPrint:
        print(model)
        print('Main Targets:', targets)
        print('Main Inputs:', data)
        print('Adjusted weights:', adj_w)
    return adj_w

Dataset: Irrespective of the problem to solve, the input needs to be transformed into numeric values (usually real or binary). Consider the basic input vectors x₁ = [0, 0, 1, 1] and x₂ = [0, 1, 0, 1]. The free parameter to search for is the weights vector, which is randomly initialised to kick-start the learning.

# Logical AND Data:
andData = np.array([[0,0],[0,1],[1,0],[1,1]])
andTargets = np.array([0,0,0,1])
# Logical OR Data:
orData = np.array([[0,0],[0,1],[1,0],[1,1]])
orTargets = np.array([0,1,1,1])

Class instantiation:

model = TLU(input_size=2)  # class instantiation ...
# AND Data:
tlu_pred(model, andData, andTargets, epochs=11, lrate=0.3)
# OR Data:
tlu_pred(model, orData, orTargets, epochs=11, lrate=0.3)

Sample outputs using AND data:

TLU Iteration!

Main Targets: [0 0 0 1]

Main Inputs: [[0 0]
[0 1]
[1 0]
[1 1]]

Adjusted weights: [0.9 0.6 0.3]

Using the OR data:

TLU Iteration!

Main Targets: [0 1 1 1]

Main Inputs: [[0 0]
[0 1]
[1 0]
[1 1]]

Adjusted weights: [0.6 0.6 0.6]


The description in this post assumed that the right settings for the weights exist to correctly classify given linearly separable data. With a complex network, adjusting the weights is a challenge; an iterative process that utilises the input and an initial set of free parameters for training will be introduced. A future post will cover the delta rule, a basic training rule for multilayer networks. This post is inspired by Gurney (1997).

Further Resources