Original article can be found here (source): Artificial Intelligence on Medium

# Introduction to TLU and Perceptron

This is part of a series of implementations of relevant models in Machine Learning. The implementation pipeline starts with relevant theoretical exposition of the model followed by a detailed implementation and associated discussion.

## Threshold Logic Unit

The Threshold Logic Unit (TLU) is a basic machine learning model consisting of a single *input unit* (and corresponding weights) and an *activation function*. Note that the TLU is the most basic form of artificial neuron/computational unit, knowledge of which lays the foundation for advanced topics in machine learning and deep learning. The TLU mimics, at a high level, the functionality of a biological neuron. A typical neuron receives a multitude of inputs from afferent neurons, each associated with a weight. The weighted inputs are modulated in the receiving (efferent) neuron, and the neuron responds accordingly: it fires/produces a pulse (1) or does not fire/produces no pulse (0). The TLU achieves this via an activation function that takes the activation *a* as input to generate a prediction *y′*. A threshold *θ* is defined, and the model produces an output if the threshold is exceeded, otherwise no output.

In the TLU, each *input* *xᵢ* is associated with a *weight* *wᵢ*, and the sum of the weighted inputs (the products *xᵢ × wᵢ*) is computed to give the activation *a*: *a = ∑ᴺᵢ₌₁ xᵢ × wᵢ*. The figure below depicts a simple TLU architecture.

While the inputs remain unchanged, the weights are randomly initialised and are adjusted through a training technique. For the TLU, the training process relies on pairs of examples *(xᵢ, yᵢ)*, corresponding to an arbitrary datapoint *xᵢ* and its class *yᵢ*. This form of learning is referred to as *supervised learning* because both the data instance and the target are used to direct the learning process. Other forms of learning, which I will not belabour here, are *unsupervised* (utilises only the input to infer relevant clusters or categories) and *reinforcement* (rewards the model for a correct prediction, hence the model aims at maximising rewards). A model is said to learn if it can correctly classify a previously unseen datapoint. The final output or prediction is based on the sum of the weighted inputs: **y′ = 1 if a ≥ θ, otherwise y′ = 0**.
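To make the forward pass concrete, here is a minimal sketch of the weighted sum and threshold test in NumPy. The input, weight, and threshold values are arbitrary toy values chosen for illustration, not from the article:

```python
import numpy as np

# Toy values (illustrative only): three inputs, three weights, a threshold.
x = np.array([1.0, 0.0, 1.0])
w = np.array([0.4, 0.9, 0.3])
theta = 0.5

a = np.dot(x, w)            # activation: a = sum_i x_i * w_i = 0.7
y = 1 if a >= theta else 0  # threshold test: fires since 0.7 >= 0.5
print(y)                    # → 1
```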

## Weights adjustment in the TLU

From the perspective of a learning model, datapoints belong to groups that are demarcated by one or more decision surfaces. Thus, the goal of a learning model is to achieve a certain task, such as classification of objects, after the training regime. During training, the model identifies a set of parameters, or free parameters (e.g. weights), to be used in conjunction with the input to achieve the desired goal by identifying the decision surface. It is crucial in any ML-based model to identify the free parameters that enable the model to pick out distinguishing features.

The TLU’s threshold is initialised as a scalar quantity that is used as a baseline, or *bias*, i.e. a level to attain before the neuron fires. For uniformity, the threshold is treated as a weight with a constant input of *−1*, so that the firing condition *∑ xᵢ × wᵢ > θ* is transformed into **∑ xᵢ × wᵢ + (−1) × θ > 0** and integrated into the mainstream input-weight computation for training. Consequently, a learning rule is defined and repeatedly applied until the right setting for the weight vector is obtained. Adjustment of the weight vector is a function of the output for the given instance. On that basis, the parameters are adjusted, either increased or decreased, according to the learning rule. In each training epoch, a marginal change is made to the weights. A well-trained model should be able to correctly classify new examples.
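The equivalence of the two formulations can be checked numerically. This is a sketch with assumed toy values: the threshold θ is folded in as an extra weight paired with a constant input of −1, and the firing decision is unchanged:

```python
import numpy as np

# Toy values (illustrative only).
x = np.array([1.0, 0.0, 1.0])
w = np.array([0.4, 0.9, 0.3])
theta = 0.5

# Augment: prepend a constant input of -1 and pair it with the weight theta.
x_aug = np.insert(x, 0, -1.0)   # [-1, x1, x2, x3]
w_aug = np.insert(w, 0, theta)  # [theta, w1, w2, w3]

# x.w >= theta  is equivalent to  x_aug.w_aug >= 0
print(np.dot(x, w) >= theta, np.dot(x_aug, w_aug) >= 0)  # → True True
```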

## Some technical descriptions

With the augmented threshold, the action of the TLU is either positive or negative, given by: **w⋅x** ≥ 0 → y = 1, or **w⋅x** < 0 → y = 0. Because the input vector **x** is not affected during the training process (it remains unchanged), only the weight vector **w** is adjusted to align properly with the input vector. Using a learning rate **α** (0 < α < 1) to control the process, a new vector **w′** is formed which is closer to the input vector **x**. According to the decision rule, adjusting the weights can be based on adding or subtracting a multiple of the input vector; since both cases are likely, a learning rule that combines them is used instead. Thus, **w′** = **w** + α**x** or **w′** = **w** − α**x** becomes **w′** = **w** + α(t − y)**x**, where *t − y* decides the adjustment direction (increase or decrease). Alternatively, the relationship can be expressed in several ways:

- In terms of the change in the weight vector: δ**w** = **w′** − **w**, but **w′** = **w** + α(t − y)**x**, so δ**w** = α(t − y)**x**
- Or in terms of the components of the weight vector: δwᵢ = α(t − y)xᵢ, where i = 1 to n+1
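A single application of the rule can be traced by hand. The values below are assumed for illustration; the input vector already includes the constant −1 bias input described earlier:

```python
import numpy as np

alpha = 0.25                     # learning rate (toy value)
x = np.array([-1.0, 1.0, 1.0])   # augmented input: bias -1 plus two features
w = np.array([0.5, 0.1, 0.2])    # current weights: [theta, w1, w2]
t = 1                            # desired target for this example

a = np.dot(w, x)                 # activation: -0.5 + 0.1 + 0.2 = -0.2
y = 1 if a >= 0 else 0           # y = 0, so the example is misclassified
w_new = w + alpha * (t - y) * x  # delta_w = alpha*(t - y)*x moves w toward x
print(w_new)                     # ~ [0.25, 0.35, 0.45]
```

After the update, the activation for the same input is about 0.55, so this example would now be classified correctly.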

## TLU Implementation

Having established the theoretical base, the next step is to describe and implement the training phase of the model. Basically, the implementation is based on the following steps:

- Identify inputs and the corresponding representation
- Identify the free parameters in the problem
- Specify the learning rule
- Adjust the free parameters for optimisation
- Evaluate the model

**Perceptron Learning Rule**: The implementation here is based on the *perceptron training rule*, which is guaranteed to generate a valid weights vector that separates linearly separable data. More formally, the *Perceptron Convergence Theorem* states:

*If two classes of vectors X, Y are linearly separable, then application of the perceptron training algorithm will eventually result in a weight vector* **w₀** *such that* **w₀** *defines a TLU whose decision hyperplane separates X and Y* (Gurney, 1997, p. 43).

```
repeat
    for each training vector pair (x, t)
        evaluate the output y when x is input to the TLU
        if y ≠ t then
            form a new weight vector w' according to the learning rule
        else
            do nothing
        end if
    end for
until y = t for all vectors
```

A basic Python class to implement the TLU:

```python
# import relevant package(s)
import numpy as np


class TLU(object):
    # initialise the parameter(s) for class operationalisation
    def __init__(self, input_size):
        self.weights = np.zeros(input_size + 1)

    def activate_tlu(self, x):
        return 1 if x >= 0 else 0

    def predict_tlu(self, row):
        # predicts an individual row in a given dataset
        xw = np.array(row).dot(self.weights)
        a = self.activate_tlu(xw)
        return a

    def train_tlu(self, data, targets, epochs, lrate):
        # training to identify the right setting for the weights
        for epoch in range(epochs):
            for row, t in zip(data, targets):
                row = np.insert(row, 0, -1)  # inserts the bias input
                pred = self.predict_tlu(row)
                error = t - pred
                # adjust the weights vector ...
                if pred != t:
                    for r in range(len(self.weights)):
                        self.weights[r] = self.weights[r] + (lrate * error * row[r])
                else:
                    continue
        return self.weights

    def __str__(self):
        return 'TLU Iteration!\n'
```

A helper function to run training and prediction on an instantiated class object:

```python
def tlu_prediction(model, data, targets, epochs, lrate=0.2, toPrint=True):
    adj_w = model.train_tlu(data, targets, epochs, lrate)
    if toPrint:
        print(model)
    return adj_w
```

**Dataset**: Irrespective of the problem to solve, the input needs to be transformed into numeric form (usually real or binary values). Consider the basic input vectors **x₁ = [0 0 1 1]** and **x₂ = [0 1 0 1]**. The free parameter to search for is the *weight vector*, which is initialised to kick-start the learning.

```python
# Logical AND data:
andData = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
andTargets = np.array([0, 0, 0, 1])

# Logical OR data:
orData = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
orTargets = np.array([0, 1, 1, 1])
```

Class instantiation:

```python
model = TLU(input_size=2)  # class instantiation ...

# AND data:
tlu_prediction(model, andData, andTargets, epochs=11, lrate=0.3)

# OR data:
tlu_prediction(model, orData, orTargets, epochs=11, lrate=0.3)
```

Sample outputs using the `AND` data:

```
TLU Iteration!

Main Targets: [0 0 0 1]
Main Inputs: [[0 0]
 [0 1]
 [1 0]
 [1 1]]
Adjusted weights: [0.9 0.6 0.3]
```

Using the `OR` data:

```
TLU Iteration!

Main Targets: [0 1 1 1]
Main Inputs: [[0 0]
 [0 1]
 [1 0]
 [1 1]]
Adjusted weights: [0.6 0.6 0.6]
```
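As a sanity check (not part of the original training run), the adjusted weight vectors reported above can be verified against the AND and OR truth tables. The weights are scaled by 10 into integers so that the zero activation on input [1, 1] is exact; scaling by a positive constant does not change the decision:

```python
import numpy as np

# Weight layout is [theta, w1, w2], with the constant -1 bias input
# prepended to each data row, matching the TLU class above.
data = [[0, 0], [0, 1], [1, 0], [1, 1]]

def tlu_outputs(weights):
    return [1 if np.dot(np.insert(row, 0, -1), weights) >= 0 else 0
            for row in data]

print(tlu_outputs(np.array([9, 6, 3])))  # → [0, 0, 0, 1] (AND)
print(tlu_outputs(np.array([6, 6, 6])))  # → [0, 1, 1, 1] (OR)
```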

**Conclusion**

The description in this post assumed that the right setting of the weights to correctly classify a given linearly separable dataset can be found. With a complex network, adjusting the weights is a challenge. An iterative process that utilises the input and an initial set of free parameters for training will be introduced. A future post will cover the delta rule, a basic training rule for multilayer networks. This post is inspired by Gurney (1997).