New architecture for the computers of the future
Most computers today have separate data storage and processing units, and data is shuttled back and forth every time a computation is performed. Now researchers in the Real-PIM-System project are investigating a new computer architecture that promises to improve both speed and energy efficiency, as Associate Professor Shahar Kvatinsky explains. The majority of
computers today are based on two units: a processing unit and the memory, which can be thought of as a kind of warehouse where data is stored. Currently, data is moved between these processing and memory units in order to perform computations. “Every time you want to perform a computation, you have to retrieve the relevant data, bring it to the processor, then do the processing. Then you need to put the result back in the warehouse,” explains Shahar Kvatinsky, an Associate Professor at the Technion - Israel Institute of Technology. This architecture, described by John von Neumann in 1945, is central to today’s computers, yet relentlessly moving data between these two units limits both speed and energy efficiency, particularly in data-intensive applications like image processing and artificial intelligence. “Moving the data takes much more time and energy than the data processing itself. This is called the von Neumann bottleneck,” says Professor Kvatinsky. As the Principal Investigator of the ERC-funded Real-PIM-System project, Professor Kvatinsky now aims to develop a new computer architecture that promises to improve both performance and energy efficiency.

Fig. 1: (a) The structure of a NOR logic gate based on memristors and (b) its integration inside a memory array.

While von Neumann is a towering figure in the history of mathematics, engineering and computer science, Professor Kvatinsky and his colleagues are now looking to move beyond the architecture that he described. “We want to develop an architecture where we can send a command that data should be processed inside the memory. We will not even take the data outside the memory,” he outlines. This is not an entirely new idea, yet earlier investigators focused more on processing near the memory; by processing data within the memory itself, researchers
aim to completely eliminate the need to move the data. “The place where the data is located is also where the data is processed,” explains Professor Kvatinsky.
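To make the contrast concrete, here is a minimal Python sketch of the two approaches. The class and method names (VonNeumannMachine, InMemoryMachine, nor) are purely illustrative assumptions, and the transfer counting is a toy model of the concept, not a description of any real system.

```python
# Toy model contrasting the two architectures described above.
# All names are illustrative; this is a sketch of the concept,
# not of any real mMPU interface.

class VonNeumannMachine:
    """Data must travel memory -> processor -> memory for every operation."""
    def __init__(self, memory):
        self.memory = memory
        self.transfers = 0  # count of words moved over the bus

    def nor(self, addr_a, addr_b, addr_out):
        a = self.memory[addr_a]          # fetch operand 1
        b = self.memory[addr_b]          # fetch operand 2
        self.transfers += 2
        result = int(not (a or b))       # compute in the processor
        self.memory[addr_out] = result   # ship the result back
        self.transfers += 1
        return result

class InMemoryMachine:
    """Only the command travels to the memory; the data never leaves it."""
    def __init__(self, memory):
        self.memory = memory
        self.transfers = 0  # no data words cross a bus

    def nor(self, addr_a, addr_b, addr_out):
        # Computation happens where the data lives.
        self.memory[addr_out] = int(not (self.memory[addr_a] or self.memory[addr_b]))
        return self.memory[addr_out]

mem = [1, 0, 0]
vn, pim = VonNeumannMachine(list(mem)), InMemoryMachine(list(mem))
vn.nor(0, 1, 2); pim.nor(0, 1, 2)
print(vn.transfers, pim.transfers)  # 3 words moved vs. 0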
Memristive memory processing unit

Fig. 2: Improvements in speed and energy for different image processing tasks performed in the mMPU versus state-of-the-art hardware.

The aim here is to develop a memristive memory processing unit (mMPU) capable of both storing and processing data, with researchers looking at both the design and fabrication of the architecture. A key step in this work involves giving the memory cells – memristors – computation capabilities; conventional memory cells cannot perform these dual roles, so Professor Kvatinsky is looking towards new technologies. “With emerging memory technologies, such as resistive random access memory (RRAM) and phase change memory (PCM), we can store data and do computations using the same cells,” he says. Researchers are using a technique called memristor-aided logic (MAGIC), developed by Professor Kvatinsky and his colleagues, to give these memristors the additional ability to perform computations. “We’ve developed several techniques so that we can perform computations inside the cell structure of a memory array,” he continues. A memory array is formed by rows and columns of cells which represent data. When current flows through a memristor in one direction its resistance drops, while current in the other direction increases it. “With the basic MAGIC NOR gate we have two input memristors. The initial resistance of the input memristors is the input – high resistance is 0, low resistance is 1,” explains Professor Kvatinsky. Data is represented within the architecture by these different levels of resistance; it has been shown that through the application of an external voltage, a memory array can be used for computation. “We need only two states to support memristor-aided logic, so the memory cells can be represented by either 0 or 1,” says Professor Kvatinsky. “Usually with memory arrays we select multiple cells, for the purpose of reading/writing. This operation is limited to a specific row.”
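As an illustration of the MAGIC NOR behaviour described above, the following Python sketch models the gate at the logic level. The resistance values and the switching rule are simplified assumptions; a real memristor's dynamics are analogue and considerably richer.

```python
# Behavioural sketch of the MAGIC NOR gate described above.
# Resistance encodes logic: high resistance = 0, low resistance = 1.
# Thresholds and the switching rule are simplifying assumptions.

R_ON, R_OFF = 1e3, 1e6   # low / high resistance in ohms (illustrative values)

def magic_nor(r_in1, r_in2, r_out=R_ON):
    """Output memristor starts in the low-resistance ('1') state.
    Applying the gate voltage flips it to '0' unless both inputs are '0'."""
    in1 = r_in1 <= R_ON   # low resistance reads as logic 1
    in2 = r_in2 <= R_ON
    if in1 or in2:
        r_out = R_OFF     # output switches to high resistance -> logic 0
    return r_out

def as_bit(r):
    return 1 if r <= R_ON else 0

for a, b in [(R_OFF, R_OFF), (R_OFF, R_ON), (R_ON, R_OFF), (R_ON, R_ON)]:
    print(as_bit(a), as_bit(b), "->", as_bit(magic_nor(a, b)))
# prints the NOR truth table: only (0, 0) yields 1
```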
Parallelism

This means that multiple operations can be performed in parallel on different rows and columns within the memory array, and Professor Kvatinsky and his colleagues have also identified how to isolate and select specific cells. These attributes are relevant to certain applications, such as internet search. “If one million people try to use Google at the same time, how many are served in a second? This requires a lot of parallelism. One of the main advantages of the types of applications that we are looking at is that we can build very parallel machines,” stresses Professor Kvatinsky.
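A rough sketch of what this row-level parallelism means in practice, using a NumPy array as a stand-in for the memory array; the layout and column indices are assumptions for illustration only.

```python
# Sketch of the row-level parallelism described above: one NOR command
# applied to two columns operates on every row of the array at once.
import numpy as np

array = np.random.randint(0, 2, size=(8, 4))  # 8 rows of stored bits

def parallel_nor(arr, col_a, col_b, col_out):
    """One command: NOR column a with column b in every row simultaneously."""
    arr[:, col_out] = 1 - (arr[:, col_a] | arr[:, col_b])
    return arr

parallel_nor(array, 0, 1, 3)  # all 8 rows computed in a single step
```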
Researchers are also looking at big data and various other applications where this architecture could have an impact. “We are trying to find where our system could help to accelerate artificial intelligence (AI) and other major applications,” continues Professor Kvatinsky. “The ideal scenario would be to help accelerate well-known algorithms.” There are, however, a number of limitations to consider in terms of the potential application of this system. One is the size of the memory required for a specific operation. “If you want to run 1,000 operations in parallel, then you need the memory size to be large enough to support those operations,” points out Professor Kvatinsky. Power consumption is another major factor, and Professor Kvatinsky says it’s also important to think about the nature of an operation and whether it is sequential. “If an operation depends on the result of a previous operation, then you cannot run them simultaneously,” he explains. “Some applications are sequential – you need to perform something, and then you use the results in the next step.” A number of models have been developed within the project to evaluate the likely performance of the architecture in different
applications, which could provide important insights and help guide the future direction of research. These range from smaller models of individual elements within an architecture up to larger models of the architecture as a whole. “The model evaluates the potential throughput, the potential performance of the architecture. We can then compare it to other systems and look to assess if it will make sense to run it in certain circumstances,” says Professor Kvatinsky. With this approach, the amount of time required to perform a computation can be quantified, so it’s possible to assess
whether a specific application will benefit from the architecture. “For example, think about a demonstrator for image processing. Let’s say there are a million pixels in an image, and you want to make the image brighter,” outlines Professor Kvatinsky. This means modifying the brightness of each pixel separately, with one million operations that are all independent of each other, as the result of one calculation is not required to complete another. The situation is different with image recognition, however. “With image recognition, there is some correlation between the pixels. If you want to find the eyes in an image of a face, for example, then you need multiple pixels,” points out Professor Kvatinsky. By using these models and rigorously assessing the performance of the architecture, Professor Kvatinsky hopes
to identify those areas and applications in which it could have a significant impact. “We aim to find the most promising applications for this kind of architecture – to identify those areas where we will really improve performance and energy efficiency in comparison to conventional machines,” he explains. Technical work on the design of the architecture is also continuing, with the aim of developing and producing a full system by the conclusion of the project’s five-year funding term. The project is still at a relatively early stage, with researchers investigating questions around the design and fabrication of the system, yet Professor Kvatinsky says good progress has already been made. “We have developed a circuit for what we think of as the basic operations, and we’ve started to develop automatic flows,” he outlines.
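The image-brightening example above can be written as a single element-wise operation, which is what makes it so amenable to in-memory parallelism. The NumPy code below is an ordinary CPU sketch standing in for what the mMPU would do in place; the image size and brightness offset are arbitrary.

```python
# The brightening example in code: every pixel update is independent,
# so all one million operations could in principle run in parallel
# inside the memory. A plain NumPy sketch, not mMPU code.
import numpy as np

image = np.random.randint(0, 256, size=(1000, 1000), dtype=np.uint16)

# One element-wise command touches all 10^6 pixels; no pixel's result
# depends on another, unlike recognition tasks that correlate pixels.
brighter = np.clip(image + 40, 0, 255).astype(np.uint8)
```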
Fig. 3: (a) Standard computer architecture consists of a processor and a memory. (b) The proposed mMPU system, which can compute and store data using the same memory array.

Real-PIM-System – Memristive In-Memory Processing System

Project Objectives
- Create and design the memristive memory processing unit (mMPU) – a system that truly combines processing and storage of data using the same devices.
- Demonstrate the superiority of the mMPU over standard computers in terms of performance and energy efficiency.

Project Funding
The Real-PIM-System project is funded by an ERC Starting Grant (ERC-STG) from the European Research Council, for the amount of €1,500,000.

Contact Details
Project Coordinator, Shahar Kvatinsky, PhD, MBA
Associate Professor, Electrical Engineering
Technion - Israel Institute of Technology
Haifa 3200003, Israel
T: +972 77 887 4638
E: shahar@ee.technion.ac.il
W: https://asic2.group
W: https://kvatinsky.com

Shahar Kvatinsky, PhD, MBA is an Associate Professor of Electrical Engineering at the Technion - Israel Institute of Technology, where he also gained his PhD. He previously worked as a circuit designer at Intel before taking up a postdoctoral position at Stanford University. His current research focuses on designing circuits and architectures with emerging memory technologies.