Swiss Federal Institute of Technology (EPFL), Microelectronic Systems Laboratory
CH-1015 Lausanne, Switzerland
http://lsm.epfl.ch
tel: +41 21 693 6955, fax: +41 21 693 6959

3D STACKED MULTI-CORE PROCESSOR PLATFORM WITH IMPROVED TESTABILITY
DATE 2012

P. Giovannini, G. Beanato, A. Cevrero, P. Athanasopoulos, Y. Leblebici

First homogeneous architecture for a 3D integrated multi-core processor formed by identical known-good-die (KGD) chips

3D MODULAR MULTI-CORE ARCHITECTURE

• Increased core count.
• Reduced die area and form factor.
• Improved core-to-core communication.
• No additional design effort.
• Re-usability of the platform.
• Simple pre-bond testing.
• On-chip TSV yield measurements.

Processing Element:
• LEON III 32-bit RISC processor from Gaisler;
• 8 KB I-Cache, 32 KB RAM/ROM.

Peripheral Subsystem with a 32 KB shared data memory regulated by a semaphore system.
Network-on-Chip (Switch and Network Interfaces, NIs).
One individual clock domain per layer (PLLs).
Data synchronization at the interface (Dual Clock FIFOs).
Layer identification signal generated post manufacturing.
Platform adaptable for stand-alone 2D-CMP, homogeneous 3D-CMP and further heterogeneous stacking on demand.

ABSTRACT

An innovative modular 3D stacked multi-processor architecture is presented. The platform is composed of identical stacked dies connected together by TSVs. Each die features four 32-bit processors and associated memory modules, interconnected by a 3D NoC capable of routing packets in the vertical direction. Homogeneous integration minimizes design effort and manufacturing costs while ensuring high flexibility and re-configurability. By selecting the appropriate number of layers, the platform can target different market segments, and it can be used as a stand-alone chip or in a 3D stacked configuration.
Fully functional samples have been fabricated using a conventional UMC 90 nm CMOS process and stacked using a via-last Cu-TSV process developed in-house at EPFL-LSM. Initial results show a target operating frequency of 400 MHz, supporting a vertical data bandwidth of 3.2 Gbps.

COPPER TSVs AND REDUNDANCY LOGIC

TSV redundancy:
• 2 TSVs for data signals;
• 3 TSVs for Clock, Layer-ID, Reset.

TSV macro:
• 4 pads for each signal;
• ESD protection;
• Logic for boot test and yield statistics collection.

Overall number of TSVs    120
Total power TSVs          54
Total signal TSVs         66
TSV size                  40 × 50 μm
TSV capacitance           1 pF
TSV resistance            0.7 Ω

Copper TSV matrix fabricated on the test chip (in-house process).

3D TESTABILITY

Pre- and post-bonding testability is ensured by the homogeneous architecture:
• Private JTAG modules for each Processing Element and Peripheral;
• JTAG multiplexer interface for debug signal management.

Parallel JTAG access to all cores of a layer before stacking, directly through the I/O pads.
Serial JTAG testing (boundary scan chain) of all cores of the 3D system that are no longer directly accessible (bottom layers).

Testing protocol definition:
• Check core ID-Codes
• Check private ROM content
• R/W private and shared RAM
• Binary download & execution
• Establish scan chain of multiple cores

TEST CHIP AND EXPERIMENTAL RESULTS

Process technology         UMC 90 nm CMOS
Die size                   4 mm²
Core footprint             800 × 1650 μm
Max. operating frequency   400 MHz
Vertical data bandwidth    3.2 Gbps

Complete 3D system validation on FPGA emulation.
Pre-bond verification on single fabricated dies, using:
• OpenOCD as software debugger;
• A complex PCB and FPGA setup integrated on a probe station.
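
The reported figures can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and rests on assumptions that the poster does not state: that the vertical link moves data once per clock cycle, and that Clock, Layer-ID and Reset are three single-wire signals. The function names are hypothetical. Under those assumptions, 3.2 Gbps at 400 MHz corresponds to 8 bits per cycle, and the TSV counts in the table add up (54 power + 66 signal = 120).

```python
"""Sanity checks on the figures reported for the vertical (TSV) interface."""

# Values taken from the poster.
CLOCK_HZ = 400_000_000         # max. operating frequency
BANDWIDTH_BPS = 3_200_000_000  # vertical data bandwidth
TOTAL_TSVS = 120
POWER_TSVS = 54
SIGNAL_TSVS = 66


def bits_per_cycle(bandwidth_bps: int, clock_hz: int) -> int:
    """Bits moved per clock cycle, ASSUMING one transfer per cycle."""
    bits, remainder = divmod(bandwidth_bps, clock_hz)
    if remainder:
        raise ValueError("bandwidth is not a whole number of bits per cycle")
    return bits


def physical_tsvs(logical_signals: int, copies: int) -> int:
    """TSVs needed when every logical signal is replicated `copies` times."""
    return logical_signals * copies


if __name__ == "__main__":
    width = bits_per_cycle(BANDWIDTH_BPS, CLOCK_HZ)
    print("bits per cycle (assumed single data rate):", width)
    # 2 TSVs per data signal, as stated on the poster.
    print("TSVs for that many data bits:", physical_tsvs(width, 2))
    # 3 TSVs each for Clock, Layer-ID and Reset (ASSUMED to be 3 signals).
    print("TSVs for clock, layer-ID, reset:", physical_tsvs(3, 3))
    print("TSV table consistent:", POWER_TSVS + SIGNAL_TSVS == TOTAL_TSVS)
```

The inferred 8-bit width is a consequence of the single-data-rate assumption, not a figure from the poster; a different signalling scheme would give a different width for the same bandwidth.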