In this work, power efficient butterfly unit based FFT architecture is presented. The butterfly unit is designed using floating-point fused arithmetic units. The fused arithmetic units include two-term dot product uni...In this work, power efficient butterfly unit based FFT architecture is presented. The butterfly unit is designed using floating-point fused arithmetic units. The fused arithmetic units include two-term dot product unit and add-subtract unit. In these arithmetic units, operations are performed over complex data values. A modified fused floating-point two-term dot product and an enhanced model for the Radix-4 FFT butterfly unit are proposed. The modified fused two-term dot product is designed using Radix-16 booth multiplier. Radix-16 booth multiplier will reduce the switching activities compared to Radix-8 booth multiplier in existing system and also will reduce the area required. The proposed architecture is implemented efficiently for Radix-4 decimation in time(DIT) FFT butterfly with the two floating-point fused arithmetic units. The proposed enhanced architecture is synthesized, implemented, placed and routed on a FPGA device using Xilinx ISE tool. It is observed that the Radix-4 DIT fused floating-point FFT butterfly requires 50.17% less space and 12.16% reduced power compared to the existing methods and the proposed enhanced model requires 49.82% less space on the FPGA device compared to the proposed design. Also, reduced power consumption is addressed by utilizing the reusability technique, which results in 11.42% of power reduction of the enhanced model compared to the proposed design.展开更多
Arithmetic Logic Unit(ALU) as one of the main parts of any computing hardware plays an important role in digital computers. In quantum computers which can be realized by reversible logics and circuits, reversible ALUs...Arithmetic Logic Unit(ALU) as one of the main parts of any computing hardware plays an important role in digital computers. In quantum computers which can be realized by reversible logics and circuits, reversible ALUs should be designed. In this paper, we proposed three different designs for reversible 1-bit ALUs using our proposed 3×3 and 4×4 reversible gates called MEB3 and MEB4(Moallem Ehsanpour Bolhasani) gates, respectively. The first proposed reversible ALU consists of six logical operations. The second proposed ALU consists of eight operations, two arithmetic, and six logical operations. And finally, the third proposed ALU consists of sixteen operations, four arithmetic operations, and twelve logical operations. Our proposed ALUs can be used to construct efficient quantum computers in nanotechnology, because the proposed designs are better than the existing designs in terms of quantum cost, constant input, reversible gates used, hardware complexity, and functions generated.展开更多
A design of a high-speed multi-core processor with compact size is a trending approach in the Integrated Circuits(ICs)fabrication industries.Because whenever device size comes down into narrow,designers facing many po...A design of a high-speed multi-core processor with compact size is a trending approach in the Integrated Circuits(ICs)fabrication industries.Because whenever device size comes down into narrow,designers facing many power den-sity issues should be reduced by scaling threshold voltage and supply voltage.Initially,Complementary Metal Oxide Semiconductor(CMOS)technology sup-ports power saving up to 32 nm gate length,but further scaling causes short severe channel effects such as threshold voltage swing,mobility degradation,and more leakage power(less than 32)at gate length.Hence,it directly affects the arithmetic logic unit(ALU),which suffers a significant power density of the scaled multi-core architecture.Therefore,it losses reliability features to get overheating and increased temperature.This paper presents a novel power mini-mization technique for active 4-bit ALU operations using Fin Field Effect Tran-sistor(FinFET)at 22 nm technology.Based on this,a diode is directly connected to the load transistor,and it is active only at the saturation region as a function.Thereby,the access transistor can cutoff of the leakage current,and sleep transis-tors control theflow of leakage current corresponding to each instant ALU opera-tion.The combination of transistors(access and sleep)reduces the leakage current from micro to nano-ampere.Further,the power minimization is achieved by con-necting the number of transistors(6T and 10T)of the FinFET structure to ALU with 22 nm technology.For simulation concerns,a Tanner(T-Spice)with 22 nm technology implements the proposed design,which reduces threshold vol-tage swing,supply power,leakage current,gate length delay,etc.As a result,it is quite suitable for the ALU architecture of a high-speed multi-core processor.展开更多
Two optimization technologies, namely, bypass and carry-control optimization, were demonstrated for enhancing the performance of a bit-slice Arithmetic Logic Unit (ALU) in 2n-bit Rapid Single-Flux-Quantum (RSFQ) micro...Two optimization technologies, namely, bypass and carry-control optimization, were demonstrated for enhancing the performance of a bit-slice Arithmetic Logic Unit (ALU) in 2n-bit Rapid Single-Flux-Quantum (RSFQ) microprocessors. These technologies can not only shorten the calculation time but also solve data hazards. Among them, the proposed bypass technology is applicable to any 2n-bit ALU, whether it is bit-serial, bit-slice or bit-parallel. The high performance bit-slice ALU was implemented using the 6 kA/cm^(2) Nb/AlOx/Nb junction fabrication process from Superconducting Electronics Facility of Shanghai Institute of Microsystem and Information Technology. It consists of 1693 Josephson junctions with an area of 2.46 0.81 mm^(2). All ALU operations of the MIPS32 instruction set are implemented, including two extended instructions, i.e., addition with carry (ADDC) and subtraction with borrow (SUBB). All the ALU operations were successfully obtained in SFQ testing based on OCTOPUX and the measured DC bias current margin can reach 86% - 104%. The ALU achieves a 100 utilization rate, regardless of carry/borrow read-after-write correlations between instructions.展开更多
Realizing a high-performance and energy-efficient circuit system is one of the critical tasks for circuit designers.Conventional researchers always concentrated on the tradeoffs between the energy and the performance ...Realizing a high-performance and energy-efficient circuit system is one of the critical tasks for circuit designers.Conventional researchers always concentrated on the tradeoffs between the energy and the performance in circuit and system design based on accurate computing.However,as video/image processing and machine learning algorithms are widespread,the technique of approximate computing in these applications has become a hot topic.The errors caused by approximate computing could be tolerated by these applications with specific processing or algorithms,and large improvements in performance or power savings could be achieved with some acceptable loss in final output quality.This paper presents a survey of approximate computing from arithmetic units design to high-level applications,in which we try to give researchers a comprehensive and insightful understanding of approximate computing.We believe that approximate computing will play an important role in the circuit and system design in the future,especially with the rapid development of artificial intelligence algorithms and their related applications.展开更多
该文提出了一种应用于移动顶点处理器的高性能低功耗定点特殊函数运算单元电路。该运算单元支持嵌入式图形标准OpenGL ES 1.X的定点数据格式,并支持小数点后16位精度的倒数、均方根、倒数均方根、对数和指数等初等函数运算。初等函数采...该文提出了一种应用于移动顶点处理器的高性能低功耗定点特殊函数运算单元电路。该运算单元支持嵌入式图形标准OpenGL ES 1.X的定点数据格式,并支持小数点后16位精度的倒数、均方根、倒数均方根、对数和指数等初等函数运算。初等函数采用分段二次多项式插值方法近似计算,系数处理中引入2运算电路,相对于传统的设计在相同的精度下使整体的二次多项式查找表大小减少了29%。优化二次多项式插值算法的计算误差和截断误差,使电路的查找表大小、平方器、乘法器和加法器的面积、速度达到最优。该电路采用0.18μm的CMOS工艺实现,面积为0.112 mm2,芯片时钟频率达到300 MHz,功耗仅为12.8 mW。测试结果表明该定点特殊函数运算单元非常适合移动图形顶点处理器的初等函数计算应用。展开更多
文摘In this work, power efficient butterfly unit based FFT architecture is presented. The butterfly unit is designed using floating-point fused arithmetic units. The fused arithmetic units include two-term dot product unit and add-subtract unit. In these arithmetic units, operations are performed over complex data values. A modified fused floating-point two-term dot product and an enhanced model for the Radix-4 FFT butterfly unit are proposed. The modified fused two-term dot product is designed using Radix-16 booth multiplier. Radix-16 booth multiplier will reduce the switching activities compared to Radix-8 booth multiplier in existing system and also will reduce the area required. The proposed architecture is implemented efficiently for Radix-4 decimation in time(DIT) FFT butterfly with the two floating-point fused arithmetic units. The proposed enhanced architecture is synthesized, implemented, placed and routed on a FPGA device using Xilinx ISE tool. It is observed that the Radix-4 DIT fused floating-point FFT butterfly requires 50.17% less space and 12.16% reduced power compared to the existing methods and the proposed enhanced model requires 49.82% less space on the FPGA device compared to the proposed design. Also, reduced power consumption is addressed by utilizing the reusability technique, which results in 11.42% of power reduction of the enhanced model compared to the proposed design.
文摘Arithmetic Logic Unit(ALU) as one of the main parts of any computing hardware plays an important role in digital computers. In quantum computers which can be realized by reversible logics and circuits, reversible ALUs should be designed. In this paper, we proposed three different designs for reversible 1-bit ALUs using our proposed 3×3 and 4×4 reversible gates called MEB3 and MEB4(Moallem Ehsanpour Bolhasani) gates, respectively. The first proposed reversible ALU consists of six logical operations. The second proposed ALU consists of eight operations, two arithmetic, and six logical operations. And finally, the third proposed ALU consists of sixteen operations, four arithmetic operations, and twelve logical operations. Our proposed ALUs can be used to construct efficient quantum computers in nanotechnology, because the proposed designs are better than the existing designs in terms of quantum cost, constant input, reversible gates used, hardware complexity, and functions generated.
文摘A design of a high-speed multi-core processor with compact size is a trending approach in the Integrated Circuits(ICs)fabrication industries.Because whenever device size comes down into narrow,designers facing many power den-sity issues should be reduced by scaling threshold voltage and supply voltage.Initially,Complementary Metal Oxide Semiconductor(CMOS)technology sup-ports power saving up to 32 nm gate length,but further scaling causes short severe channel effects such as threshold voltage swing,mobility degradation,and more leakage power(less than 32)at gate length.Hence,it directly affects the arithmetic logic unit(ALU),which suffers a significant power density of the scaled multi-core architecture.Therefore,it losses reliability features to get overheating and increased temperature.This paper presents a novel power mini-mization technique for active 4-bit ALU operations using Fin Field Effect Tran-sistor(FinFET)at 22 nm technology.Based on this,a diode is directly connected to the load transistor,and it is active only at the saturation region as a function.Thereby,the access transistor can cutoff of the leakage current,and sleep transis-tors control theflow of leakage current corresponding to each instant ALU opera-tion.The combination of transistors(access and sleep)reduces the leakage current from micro to nano-ampere.Further,the power minimization is achieved by con-necting the number of transistors(6T and 10T)of the FinFET structure to ALU with 22 nm technology.For simulation concerns,a Tanner(T-Spice)with 22 nm technology implements the proposed design,which reduces threshold vol-tage swing,supply power,leakage current,gate length delay,etc.As a result,it is quite suitable for the ALU architecture of a high-speed multi-core processor.
基金Strategic Priority Research Program of Chinese Academy of Sciences,under Grant XDA18000000.
文摘Two optimization technologies, namely, bypass and carry-control optimization, were demonstrated for enhancing the performance of a bit-slice Arithmetic Logic Unit (ALU) in 2n-bit Rapid Single-Flux-Quantum (RSFQ) microprocessors. These technologies can not only shorten the calculation time but also solve data hazards. Among them, the proposed bypass technology is applicable to any 2n-bit ALU, whether it is bit-serial, bit-slice or bit-parallel. The high performance bit-slice ALU was implemented using the 6 kA/cm^(2) Nb/AlOx/Nb junction fabrication process from Superconducting Electronics Facility of Shanghai Institute of Microsystem and Information Technology. It consists of 1693 Josephson junctions with an area of 2.46 0.81 mm^(2). All ALU operations of the MIPS32 instruction set are implemented, including two extended instructions, i.e., addition with carry (ADDC) and subtraction with borrow (SUBB). All the ALU operations were successfully obtained in SFQ testing based on OCTOPUX and the measured DC bias current margin can reach 86% - 104%. The ALU achieves a 100 utilization rate, regardless of carry/borrow read-after-write correlations between instructions.
基金supported by the Fundamental Research Funds for the Central Universities of China under Grant No.BLX202015Beijing Municipal Natural Science Foundation under Grant No.6222038the National Natural Science Foundation of China under Grant No.92164203.
文摘Realizing a high-performance and energy-efficient circuit system is one of the critical tasks for circuit designers.Conventional researchers always concentrated on the tradeoffs between the energy and the performance in circuit and system design based on accurate computing.However,as video/image processing and machine learning algorithms are widespread,the technique of approximate computing in these applications has become a hot topic.The errors caused by approximate computing could be tolerated by these applications with specific processing or algorithms,and large improvements in performance or power savings could be achieved with some acceptable loss in final output quality.This paper presents a survey of approximate computing from arithmetic units design to high-level applications,in which we try to give researchers a comprehensive and insightful understanding of approximate computing.We believe that approximate computing will play an important role in the circuit and system design in the future,especially with the rapid development of artificial intelligence algorithms and their related applications.