
7. Conclusions

7.2 Future Development

There is room for improvement and further research in various areas of this work.

First, further mechanisms could be added to remove the external overheads. One of them is the adoption of a FIFO-like control memory.
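The text does not say how such a control memory would be organised. The following sketch is purely illustrative. It assumes that the processor writes configuration words into the control memory ahead of time and that the accelerator consumes them in order. Under that assumption, a FIFO-like control memory behaves like a ring buffer with a read pointer and an occupancy count. The class name `ControlFifo` and its `push`/`pop` interface are hypothetical and do not describe CREMA's actual control interface.

```python
class ControlFifo:
    """Hypothetical FIFO-like control memory (illustration only).

    The host pushes configuration words in advance; the accelerator pops
    them in order, one per processing phase.
    """

    def __init__(self, depth):
        if depth <= 0:
            raise ValueError("depth must be positive")
        self._mem = [0] * depth  # fixed-size storage, as in hardware
        self._head = 0           # index of the next word to read
        self._count = 0          # number of valid words stored

    def push(self, word):
        # Host side: returns False (back-pressure) when the FIFO is full.
        if self._count == len(self._mem):
            return False
        tail = (self._head + self._count) % len(self._mem)
        self._mem[tail] = word
        self._count += 1
        return True

    def pop(self):
        # Accelerator side: returns None when no configuration is pending.
        if self._count == 0:
            return None
        word = self._mem[self._head]
        self._head = (self._head + 1) % len(self._mem)
        self._count -= 1
        return word
```

Because `push` returns `False` only when the buffer is full, the host can keep queuing configurations while the accelerator is busy and stalls only on back-pressure. Whether this would actually remove the overheads depends on the real CREMA control path.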

CREMA presents interesting research opportunities. Increasing the flexibility of the template could significantly extend its application range. For example, adding support for Galois-field arithmetic would allow CREMA to be used for encryption algorithms. The possibility of using different data widths for different PEs should also be explored. All of these modifications would act on the template, without increasing the implementation cost of any specific instantiation of CREMA.
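Galois-field arithmetic differs from the integer arithmetic a PE normally performs. Addition is a bitwise XOR, and multiplication is a carry-less product reduced modulo an irreducible polynomial. The following Python sketch shows multiplication in GF(2^8), using the AES polynomial x^8 + x^4 + x^3 + x + 1 (0x11B) as an illustrative choice. It only shows the operation a PE would have to support and is not a description of any CREMA PE. The function name `gf_mul` is hypothetical.

```python
AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1, irreducible over GF(2)


def gf_mul(a, b, poly=AES_POLY, width=8):
    """Multiply a and b in GF(2^width), reducing modulo poly.

    Shift-and-add ("Russian peasant") method: only XORs and shifts are
    needed, which is what makes the operation cheap to add to a datapath.
    """
    result = 0
    top = 1 << width
    while b:
        if b & 1:
            result ^= a   # addition in GF(2^width) is XOR
        b >>= 1
        a <<= 1           # multiply a by x
        if a & top:       # degree reached width: reduce modulo poly
            a ^= poly
    return result
```

For example, `gf_mul(0x57, 0x83)` returns `0xC1`, which matches the worked example in the AES specification (FIPS-197).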

The programming flow can also be improved by adding automated features that assist the application developer, thus reducing development time. In particular, the programming flow could be extended to a wider range of architectures based on the same principle (a processor tightly coupled with an accelerator).

Finally, the mapping adaptiveness used in CREMA could be applied in a wider scope, extending it to fine-grain architectures. Considering, for example, an FPGA, one could think of customizing its architecture to the application requirements. This means that we would no longer have a homogeneous array of LUT-based logic blocks.

Each block could have different characteristics (LUT size, interconnections), fixed according to the application implemented on it. Because of its reduced flexibility, such a customized FPGA could no longer be used for prototyping or as a fast development platform. However, it could serve as a dedicated accelerator supporting a limited number of applications (e.g. FFT, FIR filtering or correlations), and the customer would decide in the field which of the provided functionalities to use. Such a dedicated FPGA would have a narrower market than a standard FPGA chip and could therefore be somewhat more expensive, although its development costs could still be lower than those of an ASIC implementation. In addition, it could offer better performance and energy efficiency than a general-purpose FPGA. Future research is needed to evaluate the real potential of this approach.
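The following sketch makes the idea of blocks with different LUT sizes concrete. It models a k-input LUT as a 2^k-bit truth table, which is the standard way a LUT is configured. The example is for illustration only: it assumes a customized fabric where one block needs only a 2-input XOR and another needs a 3-input majority function. The class `Lut` is hypothetical.

```python
class Lut:
    """A k-input look-up table configured by a 2**k-bit truth table."""

    def __init__(self, k, truth_table):
        if k < 1:
            raise ValueError("k must be at least 1")
        if not 0 <= truth_table < (1 << (1 << k)):
            raise ValueError("truth table must fit in 2**k bits")
        self.k = k
        self.truth_table = truth_table

    def eval(self, *inputs):
        if len(inputs) != self.k:
            raise ValueError("expected %d inputs" % self.k)
        index = 0
        for i, bit in enumerate(inputs):
            index |= (bit & 1) << i  # input i drives address bit i
        return (self.truth_table >> index) & 1

    def config_bits(self):
        # Configuration memory needed by this block: one bit per entry.
        return 1 << self.k


# Hypothetical application-specific blocks.
xor2 = Lut(2, 0b0110)      # output 1 when exactly one input is 1
maj3 = Lut(3, 0b11101000)  # output 1 when at least two inputs are 1
```

With these sizes the two blocks need 4 + 8 = 12 configuration bits. Two 4-input LUTs in a homogeneous fabric would need two 16-bit tables, 32 bits in total. Configuration memory is only one of the costs involved, so this comparison does not by itself quantify the area or energy savings.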
