RISC-V AI Chips Will Be Everywhere

The ET-SoC-1 packs more than 1,000 RISC-V cores onto a piece of silicon that consumes just 20 watts. [Image: Esperanto.ai]

The adoption of RISC-V, a free and open-source computer instruction set architecture first introduced in 2010, is taking off like a rocket. And much of the fuel for this rocket is coming from demand for AI and machine learning. According to the research firm Semico, the number of chips that include at least some RISC-V technology will grow 73.6 percent per year to 2027, when there will be some 25 billion AI chips produced, accounting for US $291 billion in revenue.

The increase from what was still an upstart idea just a few years ago to today is impressive, but for AI it also represents something of a sea change, says Dave Ditzel, whose company Esperanto Technologies has created the first high-performance RISC-V AI processor intended to compete against powerful GPUs in AI-recommendation systems. According to Ditzel, during the early mania for machine learning and AI, people assumed general-purpose computer architectures—x86 and Arm—would never keep up with GPUs and more purpose-built accelerator architectures.

“We set out to prove all those people wrong,” he says. “RISC-V seemed like an ideal base to solve a lot of the kinds of computation people wanted to do for artificial intelligence.”

With the company’s first silicon—a 1,092-core AI processor—in the hands of a set of early partners and a major development deal with Intel, he might soon be proved right.

Ditzel’s entire career has been defined by the theory behind RISC-V. RISC, as you may know, stands for reduced instruction set computer. It was the idea that you could make a smaller, lower-power but better-performing processor by slimming down the core set of instructions it can execute. IEEE Fellow David Patterson coined the term in a seminal paper in 1980. Ditzel, his student, was the coauthor. Ditzel went on to work on RISC processors at Bell Labs and Sun Microsystems before cofounding Transmeta, which made a low-power processor meant to compete against Intel by translating x86 code for a RISC architecture.

With Esperanto, Ditzel saw RISC-V as a way to accelerate AI with relatively low power consumption. At a basic level, a more complex instruction set architecture means you need more transistors on the silicon to make up the processor, each one leaking a bit of current when off and consuming power when it switches states. “That was what was attractive about RISC-V,” he says. “It had a simple instruction set.”

The Core

The core of RISC-V is a set of just 47 instructions. The actual number of x86 instructions is oddly difficult to enumerate, but it’s likely near 1,000. Arm’s instruction set is thought to be much smaller, but still considerably larger than RISC-V’s. But simply using a slim set of instructions wouldn’t be enough to achieve the computing power Esperanto was aiming for, says Ditzel. “Most of the RISC-V cores out there aren’t that small or that energy efficient. So it’s not just a question of us taking a RISC-V core and slapping 1,000 of them on a chip. We had to completely redesign the CPU so that it would fit into those very tough constraints.”

Notably missing from the RISC-V instruction set at the time Ditzel and his colleagues started work were the “vector” instructions needed to efficiently do the math of machine learning, such as matrix multiplication. So Esperanto engineers came up with their own. As embodied in the architecture of the processor core, the ET-Minion, these included units that do 8-bit integer vectors and both 32- and 16-bit floating-point vectors. There are also units that do more complex “tensor” instructions, and systems related to the efficient movement of data and instructions related to the arrangement of ET-Minion cores on the chip.
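Esperanto has not published the encoding of its custom vector and tensor instructions, so as a purely illustrative sketch, here is the kind of workload those units are built for: an 8-bit integer matrix multiply with wide accumulation, written in plain C. The function name and layout are mine, not Esperanto's.

```c
#include <stdint.h>

/* Illustrative only: a plain-C int8 matrix multiply with 32-bit
 * accumulation, the inner loop that ML vector/tensor units (including
 * Esperanto's custom extensions, whose details aren't public) are
 * designed to execute many elements at a time.
 * C[MxN] = A[MxK] * B[KxN], row-major. */
void matmul_int8(const int8_t *a, const int8_t *b, int32_t *c,
                 int m, int n, int k)
{
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            int32_t acc = 0;                      /* wide accumulator */
            for (int p = 0; p < k; p++)
                acc += (int32_t)a[i * k + p] * (int32_t)b[p * n + j];
            c[i * n + j] = acc;
        }
    }
}
```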

The resulting system-on-chip, ET-SoC-1, is made up of 1,088 of the ET-Minion cores along with four cores called ET-Maxions, which help govern the Minions’ work. The chip’s 24 billion transistors take up 570 square millimeters, roughly two-thirds of the die area of Nvidia’s popular A100 AI accelerator. The two chips follow very different philosophies.

The ET-SoC-1 was designed to accelerate AI in power-constrained data centers at the heart of boards that fit into the peripheral component interconnect express (PCIe) slot of already installed servers. That meant the board had only 120 watts of power available, but it would have to provide at least 100 trillion operations per second to be worthwhile. Esperanto managed more than 800 trillion in that power envelope.
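As a quick sanity check on those numbers (using only the figures quoted above), the implied efficiency works out to roughly eight times the minimum target:

```c
#include <stdio.h>

/* Back-of-the-envelope check of the figures quoted above. */
int main(void)
{
    const double board_watts   = 120.0;  /* PCIe card power budget   */
    const double required_tops = 100.0;  /* minimum to be worthwhile */
    const double achieved_tops = 800.0;  /* what Esperanto reports   */

    printf("required efficiency: %.2f TOPS/W\n", required_tops / board_watts);
    printf("achieved efficiency: %.2f TOPS/W\n", achieved_tops / board_watts);
    printf("headroom over target: %.1fx\n", achieved_tops / required_tops);
    return 0;
}
/* ~0.83 TOPS/W required vs. ~6.7 TOPS/W achieved: about 8x the target. */
```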

Most AI accelerators are built around a single chip that uses the bulk of the board’s power budget, Esperanto.ai principal architect Jayesh Iyer told technologists at the RISC-V Summit in December. “Esperanto’s approach is to use multiple low-power chips, which still fits within the power budget,” he said.

Each chip consumes 20 W when running a recommender-system benchmark neural network—less than one-tenth of what the A100 can draw—and there are six on the board. That combination of power and performance was achieved by reducing the chips’ operating voltage without the expected sacrifice in performance. (Generally, a higher operating voltage lets you run the chip’s clock faster and get more computing done.) At 0.75 volts, the nominal voltage for the ET-SoC-1’s manufacturing process, a single chip would blow way past the board’s power budget. But drop the voltage to about 0.4 V, and you can run six chips from the board’s 120 W and get better than a fourfold boost in recommender-system performance over a single higher-voltage chip. At that voltage, each ET-Minion core consumes only about 10 milliwatts.
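A first-order way to see why this works: dynamic CMOS power scales roughly with voltage squared times clock frequency, and the achievable frequency itself falls roughly in proportion to voltage. The sketch below is my own toy model, anchored only to the 20 W at 0.4 V figure above; the cubic scaling and the exact outputs are approximations, not Esperanto's data.

```c
#include <stdio.h>

/* First-order CMOS scaling sketch, not Esperanto's actual data:
 *   dynamic power  P ~ V^2 * f
 *   frequency      f ~ V        (crude approximation)
 * so P ~ V^3 and throughput ~ f ~ V.
 * Only the 20 W / 0.4 V point comes from the article; everything
 * derived from it here is illustrative. */
int main(void)
{
    const double p_ref = 20.0, v_ref = 0.40;   /* one chip at low voltage   */
    const double v_nom = 0.75;                 /* process nominal voltage   */
    const double budget = 120.0;               /* board power budget, watts */

    double scale   = v_nom / v_ref;
    double p_nom   = p_ref * scale * scale * scale;  /* ~V^3                    */
    double chips   = budget / p_ref;                 /* chips that fit at 0.4 V */
    double speedup = chips / scale;                  /* 6 slow chips vs 1 fast  */

    printf("one chip at %.2f V: ~%.0f W (budget is %.0f W)\n", v_nom, p_nom, budget);
    printf("chips that fit at %.2f V: %.0f\n", v_ref, chips);
    printf("aggregate throughput vs one nominal-voltage chip: ~%.1fx\n", speedup);
    return 0;
}
/* Prints roughly 132 W, 6 chips, ~3.2x -- in the same ballpark as the
 * better-than-fourfold gain reported above. */
```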

“Low voltage operation is the key differentiator for Esperanto’s ET-Minion [core] design,” said Iyer. It informed architectural and circuit-level decisions, he said. For instance, the core’s pipeline for the RISC-V integer instructions is built from as few logic gates per clock cycle as possible, allowing a higher clock rate at the reduced voltage. And when the core is performing long tensor computations, that pipeline is shut down to save energy.
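Esperanto hasn't detailed its gating logic, but the energy argument for shutting down an idle pipeline is easy to sketch. The numbers in this toy model are invented for illustration only:

```c
#include <stdio.h>

/* Toy energy model with hypothetical numbers: while a long tensor
 * operation runs, the scalar integer pipeline does no useful work, so
 * gating it off drops its dynamic energy to (almost) zero for those
 * cycles, leaving only leakage. */
int main(void)
{
    const double pj_pipeline_active = 5.0;    /* pJ per cycle, clocked (made up) */
    const double pj_pipeline_gated  = 0.2;    /* pJ per cycle, leakage only      */
    const long   tensor_cycles      = 100000; /* length of one tensor operation  */

    double e_no_gating = pj_pipeline_active * tensor_cycles;
    double e_gating    = pj_pipeline_gated  * tensor_cycles;

    printf("scalar-pipeline energy over the tensor op:\n");
    printf("  always clocked: %.0f pJ\n", e_no_gating);
    printf("  gated off:      %.0f pJ (%.0f%% saved)\n",
           e_gating, 100.0 * (1.0 - e_gating / e_no_gating));
    return 0;
}
```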

Other AI Processors

Other recently developed AI processors have also turned to a combination of RISC-V and their own custom machine-learning acceleration. For example, Ceremorphic, which recently came out of stealth with its Hierarchical Learning Processor, uses both a RISC-V and an Arm core along with its own custom machine-learning and floating-point arithmetic units. And Intel’s upcoming Mobileye EyeQ Ultra will have 12 RISC-V cores with its neural-network accelerators in a chip meant to provide the intelligence for Level 4 autonomous driving.

Turning to RISC-V was both a business and technical move for embedded AI processor firm Kneron. The company has been selling chips and intellectual property using Arm CPU cores and its custom accelerator infrastructure. But last November Kneron released its first RISC-V-based tech in the KL530, aimed at supporting autonomous driving with a relatively new type of neural network called a vision transformer. According to Kneron CEO Albert Liu, the RISC-V architecture makes it easier to preprocess neural-network models so they run more efficiently. However, “it also made sense in light of the potential Arm acquisition by Nvidia last year to de-risk ourselves of any possible business decisions that could impact us,” he says. That deal fell apart in February but would have put the provider of Kneron’s previous CPU core architecture in the hands of a competitor.

Future RISC-V processors will be able to tackle machine-learning-related operations using an open-source set of instructions agreed upon by the community. RISC-V International, the body that governs the codification of the core instruction set architecture and new extensions, ratified a set of just over 100 vector instructions in December 2021.

With the new vector instructions, “somebody doing their own thing in AI doesn’t have to start from scratch,” says the organization’s CTO, Mark Himelstein. “They can use the instructions that other companies are using. They can use the tools that other companies are using. And then they can innovate in the implementation or power consumption, or performance, or whatever it is their added value is.”
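To make that concrete, the loop below is ordinary C (a SAXPY, the textbook vector example); on a core implementing the ratified vector extension, a vectorizing compiler or a hand-written assembly version can stripmine it using instructions such as vsetvli, vle32.v, vfmacc.vf, and vse32.v, processing as many elements per iteration as that particular implementation's vector registers hold.

```c
#include <stddef.h>

/* SAXPY: y = a*x + y. Plain portable C, but on a core implementing the
 * ratified RISC-V vector extension this loop maps naturally onto a
 * stripmined vsetvli / vle32.v / vfmacc.vf / vse32.v sequence, with the
 * hardware deciding how many elements are processed per iteration. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

Because the active vector length is set at run time, the same code scales from small embedded cores to wide data-center designs, which is the portability Himelstein is describing.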

Even with the vector extensions, promoting machine learning remains a top priority for the RISC-V community, says Himelstein. Most of the development of ML-related extensions to RISC-V is happening in the organization’s graphics special interest group, which merged with the machine-learning group “because they wanted the same things,” he says. But other groups, such as those interested in high-performance and data-center computing, are also focusing on ML-related extensions. It's Himelstein’s job to make sure the efforts converge where they can.

Despite RISC-V’s successes, Arm remains the leader in many of the markets where new AI functions are being added, and it is likely to still be so five years from now, with RISC-V capturing about 15 percent of total revenue in the market for CPU core designs, says Semico Research principal analyst Rich Wawrzyniak. “It’s not 50 percent, but it’s not 5 percent either. And if you think about how long RISC-V has been around, that’s pretty fast growth.”
