CEVA, the leading licensor of signal processing IP for smarter, connected devices, has introduced a new DSP-based offering bringing deep learning and Artificial Intelligence (AI) capabilities to low-power embedded systems. A comprehensive, scalable, integrated hardware and software silicon IP platform that is centered around a new imaging and vision DSP – the CEVA-XM6 – allows developers to efficiently harness the power of neural networks and machine vision for smartphones, autonomous vehicles, surveillance, robots, drones and other camera-enabled smart devices.
A video introducing the CEVA-XM6 can be viewed on the CEVA YouTube Channel.
Compared to the previous generation CEVA-XM4 intelligent vision DSP, the new CEVA-XM6-based vision platform delivers up to 8x higher performance for neural network workloads and up to 3x performance improvement across all computer vision kernels. Key enhancements introduced in the new architecture include new vector and scalar processing units and substantial enhancements to the instruction set, memory bandwidth and direct memory access (DMA).
The new vision platform further extends CEVA’s performance advantage over leading GPU-based architectures when implementing neural networks. Compared to a leading GPU-based embedded system for computer vision and deep learning, CEVA’s latest imaging and vision platform delivers more than 25x better performance per watt and 4x faster processing for convolutional neural networks (CNNs) such as AlexNet and GoogLeNet.
Ilan Yona, vice president and general manager of the Vision Business Unit at CEVA, commented, “As computer vision and deep learning technologies become mainstream, there is a need to bridge the gap between the deep neural networks that are being generated by power-consuming GPU engines and the ability to deploy these in power- and performance-constrained embedded applications. Our new vision platform excels in this regard, providing developers with the most comprehensive set of technologies to rapidly address these embedded use-cases.”
The vision platform integrates a wealth of software and hardware IPs that provide time-to-market and power advantages for deploying machine vision and deep learning in embedded systems. Alongside the CEVA-XM6 DSP itself, the platform includes function-specific accelerators for CNN and image de-warp (for all types of image transformations), CEVA’s highly-acclaimed CDNN2 neural network software framework, OpenCV, OpenCL and OpenVX APIs, CEVA-CV computer vision library and a set of optimised, widely used algorithms.
Commenting on the launch, Jeff Bier, founder, Embedded Vision Alliance, explained, “Designers of a wide range of end-products are eager to incorporate visual intelligence into their designs. Often, the vision and deep learning algorithms used by these developers demand very high processing performance in a low-cost, low-power, programmable form. I applaud CEVA for its long-term commitment to delivering processors and software tools focused on meeting these needs.”
Built on the strong foundations of the CEVA-XM4 and CEVA-MM3101 processors with over twenty-five design wins to date, the CEVA-XM6 introduces a range of architectural innovations and enhancements that deliver breakthrough performance for neural networks and advanced computer vision processing. These include:
· Innovative vector processor unit (VPU) architecture – ensuring above 95 per cent MAC utilisation, which is unmatched in the industry today.
· Enhanced Parallel Scatter-Gather Memory Load Mechanism – further improving the performance of vision algorithms, including SLAM and depth mapping.
· Sliding Window 2.0 – This patented mechanism takes advantage of pixel overlap in image processing and helps to achieve higher utilisation for a wider variety of neural networks and cope with the increasing complexity of these networks.
· Optional 32-way SIMD vector floating-point unit that includes the IEEE half precision standard (FP16) and major non-linear operations enhancements.
· Other improvements include an enhanced 3D data processing scheme for accelerated CNN performance, a 50 per cent improvement in control code performance versus the CEVA-XM4, a new scalar unit that further reduces code size, and multi-core and system-integration support.
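The data-reuse opportunity that a sliding-window scheme exploits can be illustrated with a little arithmetic: at stride 1, each new convolution window shares most of its pixels with its horizontal neighbour, so those pixels need not be re-fetched. The sketch below (an illustrative calculation, not CEVA's implementation; the function name is hypothetical) quantifies that overlap:

```python
def window_overlap_fraction(k, stride=1):
    """Fraction of pixels shared between two horizontally adjacent
    k x k convolution windows at the given stride."""
    shared_cols = max(k - stride, 0)
    return (k * shared_cols) / (k * k)

# A 3x3 window at stride 1 shares 2 of its 3 columns with its neighbour,
# so two thirds of its pixels are reusable; a 5x5 window reuses four fifths.
print(window_overlap_fraction(3))
print(window_overlap_fraction(5))
```

The larger the kernel and the smaller the stride, the greater the reuse, which is why exploiting overlap pays off more as networks grow in complexity.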
In addition to the CEVA-XM6 DSP, other key components of the vision platform include:
· CDNN accelerator – the 16-bit CDNN accelerator delivers 512 MACs/cycle, ensuring best-in-class performance to handle today’s most complex neural networks. The CDNN accelerator also serves to free up the 256 MAC units in the CEVA-XM6 DSP, allowing additional computer vision tasks to run in parallel. This flexible approach makes the CEVA-XM6, working together with the CDNN accelerator, an optimal architecture to support new imaging algorithms, network structures and the changing layer types seen in the quickly evolving deep learning space.
· Image de-warp accelerator – For wide angle camera applications, such as 360 degree cameras, the image de-warp accelerator supports the ARM Frame Buffer Compression (AFBC) protocol, for best system interoperability.
· Accelerator-aware complementary software – runs on the CEVA-XM6 DSP for efficient accelerator utilisation, allowing developers to further differentiate their product designs.
· CDNN2 software framework – optimised to work in conjunction with the CEVA-XM6 and the accelerators, the framework lets developers easily generate and port their proprietary neural networks to the CEVA-XM6. This enables a significant acceleration in performance utilising the latest, most sophisticated network topologies and layers, including support for Caffe and TensorFlow, Google’s software library for machine learning.
· ISO 26262 active safety compliance and safety package deliverables – supports the needs of next generation ADAS and automated driving solutions for automotive use cases.
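The core operation behind an image de-warp engine is inverse remapping: for every output pixel, a mapping table says where in the distorted source to sample, usually at non-integer coordinates resolved by bilinear interpolation. The sketch below is a minimal NumPy illustration of that principle only; it says nothing about how CEVA's accelerator is built, and the function name is hypothetical:

```python
import numpy as np

def remap_bilinear(src, map_x, map_y):
    """Sample src at the (generally non-integer) coordinates given by
    map_x/map_y using bilinear interpolation -- the core operation of
    an inverse-mapping de-warp."""
    h, w = src.shape
    x0 = np.clip(np.floor(map_x).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(map_y).astype(int), 0, h - 2)
    fx = np.clip(map_x - x0, 0.0, 1.0)   # fractional offsets within the cell
    fy = np.clip(map_y - y0, 0.0, 1.0)
    top = src[y0, x0] * (1 - fx) + src[y0, x0 + 1] * fx
    bot = src[y0 + 1, x0] * (1 - fx) + src[y0 + 1, x0 + 1] * fx
    return top * (1 - fy) + bot * fy

# Sanity check: an identity mapping returns the image unchanged.
img = np.arange(16, dtype=np.float64).reshape(4, 4)
ys, xs = np.mgrid[0:4, 0:4].astype(np.float64)
assert np.allclose(remap_bilinear(img, xs, ys), img)
```

Lens-distortion correction, fisheye flattening and 360-degree projection all reduce to choosing different `map_x`/`map_y` tables for the same remap kernel, which is why one accelerator can cover "all types of image transformations".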
CEVA’s CDNN2 software framework is optimised for both the CEVA-XM6 and the CDNN accelerator and fully supports 16-bit fixed-point precision, ensuring less than 1 per cent accuracy degradation when running a network that was trained in a 32-bit floating-point environment. This is critical when transitioning neural networks from R&D into cost- and power-efficient solutions that target high volume automotive and consumer applications.
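To see why 16-bit fixed point can track a 32-bit float network so closely, consider the rounding error it introduces. The sketch below (a generic illustration, assuming a Q3.12 signed format with round-to-nearest and saturation; the function name is hypothetical and this is not the CDNN2 quantisation scheme) shows that the per-weight error is bounded by half a least-significant bit:

```python
import numpy as np

def quantize_q(x, frac_bits=12, total_bits=16):
    """Round to signed fixed point with `frac_bits` fractional bits,
    saturating to the representable integer range (Q3.12 by default)."""
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    q = np.clip(np.round(x * scale), lo, hi)
    return q / scale

# Typical trained CNN weights are small in magnitude, so none saturate
# and every weight lands within half an LSB (0.5 / 2**12) of its
# floating-point value.
w = np.random.default_rng(0).normal(scale=0.1, size=10000)
err = np.abs(quantize_q(w) - w)
print(err.max())
```

With errors this small relative to typical weight magnitudes, the network's outputs shift only slightly, which is consistent with the sub-1-per-cent accuracy degradation figure quoted above.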