Artificial Intelligence and Dedicated Architectures

As the use of neural networks spreads, the need for dedicated hardware becomes increasingly clear: each use case has different computational requirements, and the growing complexity of AI systems demands a corresponding increase in computing power from the physical platforms that host them. Hardware architecture is therefore receiving greater attention, becoming the enabling core of innovation.

The new hardware, designed specifically for AI, is expected to accelerate the training and performance of neural networks while limiting issues with size, power consumption and cost.

Using dedicated hardware for AI is not a new concept – fortunately, unlike during the first “AI winter”, today’s technology can effectively support neural networks. While general-purpose computers are technically capable of running the algorithms in question, the physical limitations of their architecture are becoming tangible and quickly turning into bottlenecks.

Now, especially after the standstill caused by Covid, companies need to embrace digital transformation and intelligent process automation, and the demand for IT services is therefore bound to grow – the introduction of better-performing hardware would thus be a powerful boost to the recovery of the whole system.

Dedicated hardware and balancing effort and performance

There is currently a vast offering of cloud-based AI solutions, driven by the significant hardware resources required to work with neural networks.

The cloud, by removing the computational burden from individual organizations, makes the “democratization” of AI possible, opening it up to small, medium-sized and even micro companies rather than only to large enterprises.

The other element facilitating the spread of AI applications is Edge Computing, which provides immediate answers exactly where the data is collected: peripheral processing, powered by AI hardware accelerators, keeps improving in performance while its cost keeps falling.

We are seeing the emergence of software-defined hardware, and so the selection of these two traditionally decoupled elements is becoming more complex – some algorithms will only be able to run on specific physical media.

Understanding the needs of the developers, and ultimately of the end users, is a critical step in the design process of new hardware architectures. Developers need flexible solutions that allow them to focus on the development process of software and to minimize time and effort by reducing time-to-market – this demand is answered by AI accelerators, FPGAs first and foremost.

What is a Hardware Accelerator for Artificial Intelligence?

Hardware acceleration has many advantages, the main one being speed. Accelerators greatly reduce the time it takes to train and run a neural network by implementing AI-specific operations that a conventional CPU cannot perform efficiently. In a conventional system the processor works sequentially, executing instructions one by one; a hardware accelerator instead speeds up a specific algorithm by allowing far greater parallelism. Most of the operations involved in training or inference of a neural network can in fact be performed in parallel, making the task more efficient in both time and energy consumption. Dedicated hardware also consumes less energy than co-processors or general-purpose processors.
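As a rough illustration (a Python sketch, not production code – the function names are illustrative), the same dense-layer computation can be written as a sequential loop of multiply-adds, the way a scalar processor walks through the work, or as a single matrix-vector product that parallel hardware executes concurrently:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))   # weight matrix of one dense layer
x = rng.standard_normal(512)          # input activations

# Sequential: each output neuron computed one multiply-add at a time,
# roughly how a scalar CPU pipeline walks through the work.
def dense_sequential(W, x):
    out = np.zeros(W.shape[0])
    for i in range(W.shape[0]):
        acc = 0.0
        for j in range(W.shape[1]):
            acc += W[i, j] * x[j]
        out[i] = acc
    return out

# Parallel: the same dot products expressed as one matrix-vector
# product, which vectorized/accelerated hardware executes concurrently.
def dense_parallel(W, x):
    return W @ x

# Both formulations compute identical results; only the execution model differs.
assert np.allclose(dense_sequential(W, x), dense_parallel(W, x))
```

The point of the sketch is that nothing in the mathematics forces sequential execution – the parallel form simply exposes the independence of the per-neuron dot products, which is exactly what accelerators exploit.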

When we talk about AI hardware, we refer to a class of accelerators (also called NPUs, Neural Processing Units): microprocessors or microchips designed to speed up machine-learning workloads – neural networks, computer vision, and algorithms for robotics, the Internet of Things and other data-driven applications.

As the need for computational resources – both for training and inference of increasingly complex neural networks – grows exponentially, the industry is looking to a new generation of chips with distinctive capabilities:

  • More computing power and efficiency: next-generation AI hardware solutions will have to be more powerful and more efficient in terms of cost and energy consumption;
  • Cloud and Edge computing: new silicon architectures will need to support deep learning, neural networks and computer vision algorithms, with specific models for Cloud and Edge applications;
  • Rapid Insights: the ability to provide AI solutions – both software and hardware – that let companies analyze customer behavior and preferences much faster, improving service through a more engaging user experience;
  • New Materials: new research is being done to move from traditional silicon to optical processing chips, with the objective of developing optical systems that are much faster than traditional CPUs or GPUs;
  • New Architectures: new types of architectures such as neuromorphic chips – that is, chips that mimic the synapses of brain cells.

Main types of Hardware Accelerator

AI algorithms thus demand better performance and optimized time and costs. As a result, traditional central processing units (CPUs) have been joined by a wide range of hardware accelerators, each with specific characteristics.


The GPU (Graphics Processing Unit) is a specialized electronic circuit designed to render 2D and 3D graphics in conjunction with a CPU.

These processors can perform a limited variety of tasks but are extremely efficient at parallelization, which is critical for deep learning.

GPUs specialize in image manipulation. Since neural networks and image manipulation share the same mathematical basis, GPUs are frequently used for machine learning applications. For this reason, GPU manufacturers have begun to incorporate hardware specific to neural networks, such as tensor cores.
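A minimal sketch of this shared mathematical basis: the same sliding-window convolution serves as a graphics filter (a box blur) and, with a learned kernel, as one channel of a convolutional-network layer. The `conv2d` helper below is an illustrative assumption, not a library function:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D cross-correlation: the core op of both image filters and CNN layers."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # same multiply-and-sum over a window in both use cases
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)  # toy 5x5 "image"

# As graphics: a 3x3 box-blur kernel smooths the image.
blur = conv2d(image, np.full((3, 3), 1 / 9))

# As deep learning: the identical operation with a (here hand-picked,
# normally learned) kernel is one channel of a convolutional layer.
feature = conv2d(image, np.array([[0., -1., 0.],
                                  [-1., 4., -1.],
                                  [0., -1., 0.]]))
```

Because both workloads reduce to the same multiply-accumulate pattern, hardware built to run one fast (the GPU) runs the other fast too.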

Unlike the CPU, the GPU works in parallel: architecturally it is composed of hundreds of cores – tiny processors that can handle thousands of instructions simultaneously.


As mentioned, the CPU needs the GPU to handle graphics processing; designing and manufacturing two separate units for this, however, is inefficient. The solution is the APU (Accelerated Processing Unit), which combines the two units (CPU and GPU) on a single chip.

The industry has realized that reducing the footprint of processing units cuts costs, frees up space for other hardware and improves efficiency. In addition, placing the two components on the same chip increases data-transfer rates and reduces power consumption.


An ASIC (Application-Specific Integrated Circuit) is a chip designed and built entirely for a specific application. Because it is highly optimized for a single function, it typically operates more efficiently than a CPU or APU. In short, ASICs are fast and consume less power than other chips. The drawback is the significant design investment required, and the difficulty of modifying the chip to quickly refine or update its functionality.

This makes ASICs best suited to high-volume production of relatively stable functionality.

In recent years, demand for ASICs has grown with their widespread use in smartphones and tablets, driven by bandwidth requirements.


The TPU (Tensor Processing Unit) is an ASIC optimized for neural networks. Google created this IC specifically for machine learning and customized it for TensorFlow, its open-source machine-learning framework.

The TPU is 15 to 30 times faster than contemporary GPUs and CPUs and much more energy efficient, offering a 30- to 80-fold improvement in performance per watt.
Incidentally, all versions of AlphaGo have been implemented on TPUs.


Vision Processing Units (VPU) are specialized architectures for Machine Vision. The fundamental difference between GPU and VPU is that the former are designed primarily for rendering, while the latter implement neural networks optimized for recognition in images and videos. They are mainly used in drones, smart cars, augmented and virtual reality.

A new generation of VPUs designed for AI applications at the Edge has recently been launched on the market. These processors are dedicated to multimedia applications, computer vision and inference at the Edge, where they perform best: up to 10 times faster than the previous generation.


FPGAs (Field-Programmable Gate Arrays) are integrated circuits designed to be configured after production to implement arbitrary logic functions in hardware.

Reconfigurable devices such as FPGAs simplify the management of evolving hardware by meeting the demand for specific hardware for deep learning.

With additional tools and frameworks such as Open Computing Language (OpenCL), FPGAs allow traditional software developers to embed custom logic in hardware – effectively creating their own hardware-specific accelerators – hence the term ‘software-defined hardware’. Several cloud providers, such as Amazon and Microsoft, are already offering FPGA cloud services, referred to as FPGA-as-a-service.

FPGAs are used to accelerate the inference process for artificial neural networks (NN).

  • Logic functions are implemented by means of configurable logic cells;
  • These logic cells are distributed in a 2D grid structure and interconnected using configurable routing resources.
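As an illustrative sketch (the `LUT` class below is hypothetical, written in Python for readability rather than a hardware-description language), a configurable logic cell can be modeled as a look-up table: “programming” the FPGA amounts to loading truth tables into the cells and wiring them together with routing resources:

```python
# A look-up table (LUT) is the configurable logic cell of an FPGA: any
# n-input boolean function is stored as its 2**n-entry truth table.
class LUT:
    def __init__(self, truth_table):
        self.table = truth_table    # output bit for each input combination

    def __call__(self, *bits):
        index = 0
        for b in bits:              # pack the input bits into a table index
            index = (index << 1) | b
        return self.table[index]

# "Configuring the fabric" = loading truth tables into the cells.
xor_cell = LUT([0, 1, 1, 0])        # 2-input XOR
and_cell = LUT([0, 0, 0, 1])        # 2-input AND

# Routing resources then wire cells together, e.g. into a half adder:
def half_adder(a, b):
    return xor_cell(a, b), and_cell(a, b)   # (sum, carry)
```

Reconfiguring the same cells with different truth tables yields entirely different hardware – the property that makes FPGAs attractive for evolving workloads like deep learning.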

FPGAs are not new: they have existed since the mid-1980s, but several factors have prevented their widespread use:

  • Difficulty of programming: they require hardware-description languages (VHDL or Verilog) that relatively few developers know;
  • The need for a complex testing and simulation phase for the hardware being developed.

Implementing an accelerator on an FPGA makes it possible to reproduce typical ASIC architectures such as the TPU or NNP in a much more flexible (though less efficient) way. If the neural-network model changes, the hardware accelerator can be replaced without changing the chip or platform.
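To make the idea concrete, here is a hedged software sketch (an assumed, simplified model, not a vendor design) of the output-stationary multiply-accumulate grid that TPU-like accelerators map onto silicon or FPGA fabric; in Python the steps run serially, while on hardware every unit fires in parallel each clock tick:

```python
import numpy as np

def mac_grid_matmul(A, B):
    """Matrix multiply structured as an output-stationary MAC grid."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    acc = np.zeros((n, m))      # one accumulator register per MAC unit
    for step in range(k):       # each clock step streams one operand slice
        # every (i, j) unit multiplies and accumulates concurrently in
        # hardware; Python merely serializes the same rank-1 update
        acc += np.outer(A[:, step], B[step, :])
    return acc

A = np.array([[1., 2.], [3., 4.]])
B = np.array([[5., 6.], [7., 8.]])
assert np.allclose(mac_grid_matmul(A, B), A @ B)
```

On an FPGA this grid is just one possible configuration: if the network model changes, the fabric can be reprogrammed with a different accelerator layout instead of replacing the chip.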

The FPGA differs from other hardware accelerators in that it has no specific functionality when produced: as a hardware container, it has to be programmed. But once the operating logic is loaded, running algorithms at the hardware level can yield orders-of-magnitude performance improvements. Unlike ASIC-based accelerators, which can take months to design and manufacture, FPGA-based accelerators can be developed in a matter of weeks.

Perhaps the greatest strength of the FPGA, however, is its ability to be reprogrammed. Its functionality can be further refined and upgraded in the field, especially in rapidly evolving areas such as machine learning. By using FPGA technology, Microsoft – for a certain type of calculation – has achieved up to 150-200x improvement in data throughput, up to 50x improvement in energy efficiency compared to a CPU, and a latency reduction of around 75 per cent.


Neuromorphic chips are processors designed to realistically mimic the interactions between synapses.

SNNs (Spiking Neural Networks) are artificial neural networks that mimic natural ones. In addition to the neuronal and synaptic state, SNNs incorporate the concept of time in their operational model, i.e. the timing and frequency of signal transmission (spikes). The operating logic is not binary but analog.
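A minimal sketch of this operating model (a simplified leaky integrate-and-fire neuron with illustrative parameter values, not a production simulator): the membrane potential integrates incoming current, decays over time, and emits a spike when it crosses a threshold, so information is carried in spike timing rather than in binary values:

```python
# Minimal leaky integrate-and-fire neuron: the membrane potential
# accumulates input current, leaks a fraction each time step, and
# fires (then resets) on crossing a threshold.
def lif_neuron(inputs, threshold=1.0, decay=0.9):
    potential = 0.0
    spikes = []
    for current in inputs:        # time is part of the model: one tick per step
        potential = potential * decay + current
        if potential >= threshold:
            spikes.append(1)      # spike, then reset the membrane
            potential = 0.0
        else:
            spikes.append(0)
    return spikes

# A constant weak input produces periodic spikes: the input's strength
# is encoded in the firing rate, not in a binary output.
print(lif_neuron([0.4] * 8))      # → [0, 0, 1, 0, 0, 1, 0, 0]
```

A stronger input would cross the threshold sooner and fire more often, which is the analog, time-based coding that neuromorphic hardware implements natively.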

The aim is to create self-learning systems. This learning capability allows neuromorphic chips to perform multiple computing tasks much faster, with remarkable energy efficiency.


Paradoxically, the more ethereal, widespread and powerful the algorithms become, the more they need concreteness: the ‘iron’ that grounds AI.
