Multi-Criteria Performance Estimation for Systems on Chip (SoC) Jean Luc Dekeyser
Motivations: product trends. Diverse functionality, fast, small, cheap... Time-to-prototype. Time-to-market. Flexibility (maintainability). Low power and heat dissipation (battery life). Production cost. Rapid adaptation to new standards. Reliability, security... Today's SoCs must meet all of these requirements.
ARM PrimeXsys Wireless platform: Standard SoC Kernel based on ARM926EJ-S Source: ©ARM
Triscend A7 CSoC ARM7TDMI + FPGA Source: ©Triscend
ASIP: reconfigurable microprocessor Tensilica Xtensa Source: ©Tensilica
Motivations: target platforms. Which architecture is suitable for our application? [Figure: a design space made of computation templates (DSP, µE, Cipher, SDRAM, RISC, FPGA, LookUp), communication templates, and scheduling/arbitration policies (proportional share, WFQ, static, dynamic, fixed priority, EDF, TDMA, FCFS). Two candidate platforms, Architecture #1 and Architecture #2, combine these elements differently.]
Design space exploration: Application, Architecture, Mapping, Analysis. This methodology can be applied at different levels of abstraction.
Multi-criteria system analysis. Execution time (frequency); energy consumption (or power); silicon area (transistors); cost. These criteria can be estimated at different levels of abstraction. Academic and industrial tools exist for estimating each criterion.
Multi-criteria system analysis. Application/architecture matching: multi-objective optimization. Find a set of trade-offs: time, power, size, cost...
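A set of trade-offs in a multi-objective problem is usually taken to be the Pareto front: the candidates that no other candidate beats on every criterion at once. The sketch below is only an illustration of that idea. The `Candidate` type, the architecture names and all numbers are hypothetical, and every criterion is assumed to be minimized.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """One application/architecture mapping with estimated criteria (all minimized)."""
    name: str
    time_ms: float
    power_mw: float
    area_mm2: float
    cost: float


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    a_vals, b_vals = a[1:], b[1:]  # skip the name field
    no_worse = all(x <= y for x, y in zip(a_vals, b_vals))
    better = any(x < y for x, y in zip(a_vals, b_vals))
    return no_worse and better


def pareto_front(candidates):
    """Keep the candidates that are not dominated by any other candidate."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical estimates, for illustration only.
candidates = [
    Candidate("arch1", 10.0, 500.0, 40.0, 12.0),
    Candidate("arch2", 14.0, 300.0, 35.0, 10.0),
    Candidate("arch3", 15.0, 520.0, 45.0, 13.0),  # worse than arch1 on every criterion
    Candidate("arch4", 8.0, 900.0, 60.0, 20.0),
]
front = pareto_front(candidates)
print([c.name for c in front])
```

Here arch3 is discarded because arch1 is better on all four criteria; the three remaining candidates are genuine trade-offs, and choosing among them is a design decision rather than a computation.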
Semiconductor Industry Roadmap
Evolution of the transistor count
Power density. Power density too high to keep junctions at low temp. [Figure: power density (W/cm², logarithmic scale from 1 to 10000) versus year (1970 to 2010) for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium® and P6 processors, with Hot Plate, Nuclear Reactor and Rocket Nozzle as reference levels.] Courtesy, Intel
Transistor count vs. productivity. 1981: a leading-edge chip required 100 designer-months (10,000 transistors at 100 transistors/month). 2002: a leading-edge chip requires 30,000 designer-months (150,000,000 transistors at 5,000 transistors/month). Designer cost increased from $1M to $300M. [Figure: logic transistors per chip (in millions) and productivity (K transistors per staff-month) versus year, 1981 to 2009, showing the gap between IC capacity and productivity.]
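The figures on this slide can be checked with a few divisions. The sketch below reproduces them; the cost per designer-month is not stated on the slide and is derived here as an assumption from the 1981 numbers ($1M over 100 designer-months).

```python
def designer_months(transistors: float, transistors_per_month: float) -> float:
    """Design effort = chip size divided by designer productivity."""
    return transistors / transistors_per_month


effort_1981 = designer_months(10_000, 100)           # 100 designer-months
effort_2002 = designer_months(150_000_000, 5_000)    # 30,000 designer-months

# Assumption: a constant cost per designer-month, implied by $1M for the 1981 chip.
cost_per_month = 1_000_000 / effort_1981
cost_2002 = effort_2002 * cost_per_month

capacity_growth = 150_000_000 / 10_000   # chip size grew 15,000x
productivity_growth = 5_000 / 100        # productivity grew only 50x

print(effort_1981, effort_2002, cost_2002)
print(capacity_growth / productivity_growth)  # growth in design effort
```

The last ratio is the productivity gap in one number: capacity grew 300 times faster than productivity, so effort and cost grew by the same factor.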
Power vs. transistor count. [Figure: ITRS 2003 forecasts by year (2003, 2005, 2007, 2009, 2012, 2015, 2018) for power (W), transistor size (nm) and frequency (GHz).]
Abstraction levels for simulation (accuracy increases and simulation speed-up decreases from the functional level down to RTL)
Abstraction level: objectives
Functional: application.
TLM (Transaction Level Modeling), Communicating Processes (CP): system description as communicating processes; data exchange between functions.
TLM, Programmer View (PV): defined architecture; functional verification; communication through channels.
Cycle accurate and/or bit accurate, Cycle Accurate*: architecture, pipeline, ...
Cycle accurate and/or bit accurate, Bit Accurate (CABA)*: communication protocol.
RTL (Register Transfer Level): implementation details such as functional units and logic gates.
Performance estimation techniques. Emulation: a real platform exists, fully or partially reconfigurable. Example: FPGA platforms (ALTERA, XILINX...). Performance is measured directly: execution time, consumption, area... Simulation: the platform does not exist yet, so the system is described instead. Different levels: RTL (Register Transfer Level), CABA (Cycle Accurate Bit Accurate), TLM (Transaction Level Modeling) and functional level. Different description languages: VHDL, SystemC, Verilog...
Emulation flow. [Figure: an asm or C program, or a reconfiguration (VHDL), is loaded onto the emulation platform; analysis then yields a time measurement and an energy computation.]
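The emulation flow ends with a time measurement and an energy computation. The slide does not say how the energy is obtained, so the sketch below makes its own assumptions: the supply voltage is constant, the supply current is sampled at a fixed period while the program runs, and energy is the rectangle-rule integral of P = V·I. The function name and all sample values are hypothetical.

```python
def energy_joules(current_samples_a, supply_voltage_v, sample_period_s):
    """Integrate P = V * I over uniformly spaced current samples (rectangle rule)."""
    return supply_voltage_v * sum(current_samples_a) * sample_period_s


# Hypothetical measurement: four current samples (amperes) taken every millisecond.
samples = [0.10, 0.12, 0.15, 0.11]
sample_period = 0.001   # seconds
supply_voltage = 3.3    # volts, assumed constant

exec_time = len(samples) * sample_period            # measured execution time
energy = energy_joules(samples, supply_voltage, sample_period)
average_power = energy / exec_time

print(exec_time, energy, average_power)
```

A real measurement would use far more samples and would have to account for the accuracy of the current probe; the structure of the calculation stays the same.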
Implementation levels of a component. [Figure: hierarchy of levels SYSTEM, MODULE, GATE, CIRCUIT (Vin, Vout), DEVICE (S, D, G, n+), annotated with Accuracy and Speed up arrows.]
Physical-level simulation. Physical-level models reflect the actual circuit layout and include geometric information. They cannot be simulated directly: behavior is deduced either by correlating the layout model with a behavioral description at a higher level or by extracting circuits from the layout. Wire lengths and capacitances are frequently extracted from the layout and back-annotated to descriptions at higher levels, which gives more precise delay and power estimations.
Physical-level simulation: example. [Figure: layout with signals din, powlo, powhi, dout.] © Mosis (http://www.mosis.org/Technical/Designsupport/polyflowC.html); Tool: Cadence
Transistor-level simulation using an analog simulator (SPICE). Input: models (transistor, gates, macro) and a textual netlist (schematic, extracted layout, behavioral). Output: circuit response (waveforms, patterns) in the time domain and the frequency domain; power analysis.
Transistor-level simulation: example
Gate-level simulation. Models contain gates as the basic components. They provide accurate information about signal transition probabilities and can therefore also be used for power estimation. Delay calculations can be more precise than at the RTL. There is typically no information about the length of wires, so delays are still estimates. The term is sometimes also used for models of Boolean functions, which contain no physical gates and consider only the behavior of the gates. Such models should be called “Boolean function models”.
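To make the link between transition probabilities and power concrete, the sketch below simulates a tiny gate netlist over a sequence of input vectors, counts how often each net toggles, and plugs the result into the usual dynamic power estimate P = ½ · V² · f · Σ(C · p), where p is the per-cycle toggle probability of a net. The netlist, the per-net capacitance, the voltage and the frequency are all hypothetical values chosen for illustration.

```python
from itertools import product

# Hypothetical netlist in topological order: (output net, gate function, input nets).
NETLIST = [
    ("n1", lambda a, b: a & b, ("a", "b")),     # AND gate
    ("y", lambda n1, c: n1 ^ c, ("n1", "c")),   # XOR gate
]


def evaluate(inputs):
    """Compute the value of every net for one input vector."""
    nets = dict(inputs)
    for out, gate, ins in NETLIST:
        nets[out] = gate(*(nets[i] for i in ins))
    return nets


def transition_probabilities(vectors):
    """Fraction of consecutive vector pairs on which each net changes value."""
    states = [evaluate(v) for v in vectors]
    toggles = {net: 0 for net in states[0]}
    for prev, cur in zip(states, states[1:]):
        for net in toggles:
            toggles[net] += prev[net] != cur[net]
    return {net: count / (len(states) - 1) for net, count in toggles.items()}


def dynamic_power(probs, cap_f, vdd, freq_hz):
    """P = 0.5 * Vdd^2 * f * sum(C * p), with the same capacitance assumed on every net."""
    return 0.5 * vdd ** 2 * freq_hz * sum(cap_f * p for p in probs.values())


# Stimulus: the eight input combinations in counting order (an arbitrary choice).
vectors = [dict(zip("abc", bits)) for bits in product((0, 1), repeat=3)]
probs = transition_probabilities(vectors)
power = dynamic_power(probs, cap_f=10e-15, vdd=1.2, freq_hz=100e6)
print(probs)
print(power)
```

The estimate is only as good as the stimulus and the capacitances: with no wire lengths available at this level, the capacitance per net is itself an estimate.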
Gate-level simulation: example. Source: http://geda.seul.org/screenshots/screenshot-schem2.png
RTL-level simulation. At this level, we model all the components at the register-transfer level, including arithmetic/logic units (ALUs), registers, memories, muxes and decoders. Models at this level are always cycle-true. Automatic synthesis from such models is not a major challenge.
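"Cycle-true" means that the model's state after N steps matches the hardware's registers after N clock cycles. The sketch below illustrates this with a hypothetical 8-bit accumulator built from the components listed above (a register, an ALU used as an adder, and a mux). It is written in Python for illustration only; an actual RTL model would be written in a hardware description language such as VHDL or Verilog.

```python
class Accumulator:
    """Cycle-true model: one call to clock() is exactly one clock cycle."""

    WIDTH_MASK = 0xFF  # the register is assumed to be 8 bits wide

    def __init__(self):
        self.acc = 0  # register contents after reset

    def clock(self, data_in: int, clear: bool) -> int:
        # Combinational part: the ALU adds, then a mux selects the next value.
        alu_out = (self.acc + data_in) & self.WIDTH_MASK
        next_acc = 0 if clear else alu_out
        # Clock edge: the register captures the value computed during this cycle.
        self.acc = next_acc
        return self.acc


# Four clock cycles of hypothetical stimulus: (data_in, clear).
stimulus = [(5, False), (250, False), (3, False), (0, True)]
dut = Accumulator()
trace = [dut.clock(data, clear) for data, clear in stimulus]
print(trace)
```

The third cycle wraps around because the adder result is truncated to the register width, which is the kind of bit-level detail a functional model would typically hide.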
RTL-level simulation: example. [Figure: processor datapath with a controller, PC, instruction register IR, memory, register file Reg, ALU with alu_control, sign_extend, a <<2 shifter and a register T. Control signals: PCSource, TargetWrite, ALUOp, ALUSelA, ALUSelB, RegWrite, RegDest, MemToReg, IRWrite, MemRead, MemWrite, PCWrite, PCWriteC, IorD. Instruction fields: 31:26, 25:21, 20:16, 15:11, 15:0, 25:0.]