
Simple and Linear Fast Adder

Although modern processors may seem fast and efficient, processing power remains insufficient for many workloads. In a series of peer-reviewed articles and conference presentations, I have proposed a "Simple and Linear Fast Adder" (SLFA) architecture for Arithmetic Logic Units. The Von Neumann bottleneck, which accounts for a large share of a processor's time delay and energy consumption, is bypassed by this design, which implements a Compute-In-Memory architecture without requiring heavy R&D investment in new memory and transistor types (SRAM, ReRAM, FeRAM, etc.). This breakthrough enables the faster, more energy-efficient processors that AI and ML, along with other operation-intensive applications, demand from high-performance ASICs, GPUs, and TPUs.


Other fast adders grow in complexity and area in proportion to the square of the number of bits. Our adder has constant circuit complexity, small gate depth, and scales linearly. Beyond better time and energy efficiency and reduced design, production, and material costs, the Simple and Linear Fast Adder offers a further advantage over other architectures: it enables a Compute-In-Memory architecture in which the addition of multiple inputs can be performed in place. The circuit is also scalable to achieve fast In-Memory Matrix Multiplication.
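The SLFA's actual circuit is described in the linked articles, not reproduced here. As an illustrative sketch of the general principle behind multi-input addition with shallow logic, the carry-save technique below compresses three addends into two using one full adder per bit position, so each stage has a gate depth that does not grow with the operand width; a single carry-propagating addition is needed only at the very end:

```python
# Illustrative sketch only, not the SLFA itself: carry-save stages model
# how many operands can be summed with constant-depth per-bit logic.

def carry_save_stage(a: int, b: int, c: int) -> tuple[int, int]:
    """Compress three addends into a sum word and a carry word.
    Each bit position uses one full adder; no carry ripples between
    positions, so the logic depth is independent of the word width."""
    partial_sum = a ^ b ^ c                      # XOR of the three bits
    carry = ((a & b) | (a & c) | (b & c)) << 1   # majority bit, shifted left
    return partial_sum, carry

def multi_operand_add(operands: list[int]) -> int:
    """Add many operands using carry-save stages, deferring the single
    carry-propagating addition to the final step."""
    s, c = operands[0], 0
    for x in operands[1:]:
        s, c = carry_save_stage(s, c, x)
    return s + c  # one final carry-propagate addition

print(multi_operand_add([3, 5, 7, 11]))  # 26
```

Each added operand costs one constant-depth stage, which is one way a multi-input adder can scale linearly with the number of inputs.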

This transformative IP represents a major opportunity for AI hardware, cryptography, and edge computing innovations. The SLFA's in-memory computation capabilities are ideally suited to accelerating matrix operations in AI/ML processors and cryptographic workloads, and to powering next-generation IoT devices. With its mathematically proven efficiency and direct applications across trillion-dollar industries, this patent offers a decisive competitive advantage in the race toward post-Von Neumann computing architectures. The technology's CIM design makes it particularly valuable for neuromorphic systems and analog computing applications, while its foundational IP position creates opportunities for broad portfolio expansion.

The Significance of Matrix Multiplication in Modern Technology

 

Matrix multiplication is a cornerstone operation in mathematics, computer science, and engineering, enabling the modeling and computation of complex relationships between datasets. Its efficiency directly impacts the performance of numerous cutting-edge applications, including:

  1. Artificial Intelligence (AI) and Machine Learning (ML)
    Matrix multiplication powers the core operations of neural networks, such as applying weights to inputs during forward and backward propagation. It directly affects the training speed and scalability of AI models, which are foundational in applications like natural language processing, computer vision, and recommendation systems.

  2. Computer Graphics and Gaming
    Transformations like rotation, scaling, and translation in 3D graphics rely on matrix operations. Matrix multiplication enables real-time rendering for gaming, simulations, and virtual or augmented reality environments.

  3. Cryptography and Security
    Matrix operations underpin a number of cryptographic algorithms, notably lattice-based post-quantum schemes, used for secure key exchange, encryption, and decryption. Speeding up these operations improves the efficiency of securing sensitive data, especially in real-time applications.

  4. Scientific Computing and Simulations
    In fields like physics, chemistry, and weather modeling, matrix operations are crucial for solving large-scale simulations and numerical methods. Faster matrix multiplication enables higher accuracy and allows more complex models to be processed in less time.

  5. Data Analysis and Big Data
    Techniques like principal component analysis (PCA) and machine learning models leverage matrix multiplication to analyze correlations and patterns in massive datasets, driving insights in industries such as finance, healthcare, and marketing.

  6. Signal Processing
    Digital signal processing for audio, image, and video data relies on matrix multiplication for tasks like filtering, transformations, and compression. This operation is integral to technologies like MP3 encoding, video compression, and medical imaging.

  7. Optimization Problems
    From logistics to robotics, many optimization techniques involve solving equations that rely on matrix operations. Fast and efficient matrix multiplication accelerates decision-making and problem-solving in real-time systems.

The computational cost of matrix multiplication increases rapidly with matrix size: the textbook algorithm performs on the order of n³ multiply-accumulate operations for n×n matrices. As demand for computational power continues to grow, especially in fields like AI, big data, and cryptography, advancements in matrix multiplication hardware will be essential to driving innovation and meeting future challenges. Innovations like the Fast Arithmetic Unit (FAU), which incorporates In-Memory matrix multiplication, are critical to that effort.
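To make that scaling concrete, the textbook triple-loop multiply below performs one multiply-accumulate per (i, j, k) triple, so doubling the matrix dimension multiplies the work by eight:

```python
def matmul(A, B):
    """Naive n x n matrix product: n**3 multiply-accumulate operations."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0
            for k in range(n):
                acc += A[i][k] * B[k][j]   # one multiply-accumulate
            C[i][j] = acc
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))  # [[19, 22], [43, 50]]
```

Hardware that performs these accumulations in parallel, or in memory, attacks exactly this inner loop.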

 

Compute-In-Memory Architecture: Unlocking New Possibilities

The traditional Von Neumann architecture separates memory and computation, requiring data to move back and forth between these components. This creates a significant bottleneck, especially for computationally intensive tasks like matrix multiplication. Compute-In-Memory (CIM) architecture eliminates this bottleneck by performing computations directly within memory, offering several transformative benefits:

  • Reduced Latency: By minimizing data transfer between memory and the processor, CIM significantly accelerates matrix operations.

  • Energy Efficiency: Performing In-Memory calculations reduces power consumption, making it ideal for applications requiring sustained performance, such as data centers and AI training.

  • Scalability: The architecture supports parallel processing of matrix operations, crucial for high-performance computing tasks.

  • Compact Design: CIM reduces the hardware footprint, enabling its integration into smaller devices, from mobile devices to edge computing nodes.

 

When combined with optimized hardware like the Fast Arithmetic Unit (FAU), the compute-in-memory architecture amplifies the impact of matrix multiplication by delivering unmatched computational speed and efficiency. This synergy is particularly vital in meeting the growing demands of AI, big data, and real-time systems.
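As a toy model (not the FAU itself), the sketch below counts memory-bus transfers for a dot product under the two architectures. In the Von Neumann case every operand crosses the bus to reach the ALU; in the CIM case the weights stay resident in the array, the multiply-accumulate happens in place, and only the inputs and the final scalar cross the bus:

```python
# Toy comparison of bus traffic, assuming weights are pre-loaded into the
# CIM array. The arithmetic is identical; only the data movement differs.

def von_neumann_dot(weights, inputs):
    """Every operand is fetched across the bus to the ALU."""
    transfers = 0
    acc = 0
    for w, x in zip(weights, inputs):
        transfers += 2          # fetch w and x from memory
        acc += w * x            # compute in the ALU
    transfers += 1              # write the result back
    return acc, transfers

def cim_dot(weights, inputs):
    """Weights stay resident; the array accumulates in place and only
    the final scalar result crosses the bus."""
    acc = sum(w * x for w, x in zip(weights, inputs))  # in-array MAC
    transfers = len(inputs) + 1  # stream inputs in, one result out
    return acc, transfers

w, x = [1, 2, 3, 4], [5, 6, 7, 8]
print(von_neumann_dot(w, x))  # (70, 9)
print(cim_dot(w, x))          # (70, 5)
```

The gap widens with vector length, which is why CIM pays off most for the long dot products at the heart of matrix multiplication.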


  1. Proposal

  2. Patentability Report from International Searching Authority

  3. Articles

  4. Conferences

  5. Additional Links

  • How do Graphics Cards Work? Exploring GPU Architecture (Branch Education, 28:30)

  • How Computers Calculate - the ALU: Crash Course Computer Science #5 (CrashCourse, 11:10)

  • Architecture All Access: Modern CPU Architecture Part 1 – Key Concepts (Intel Technology, 18:58)

  • Architecture All Access: Modern CPU Architecture 2 - Microarchitecture Deep Dive (Intel Technology, 25:34)

  • How a CPU Works (In One Lesson, 20:42)

  • How Amateurs created the world's most popular Processor (History of ARM Part 1) (LowSpecGamer, 18:11)