Integrated Device Technology (IDT) has announced the development of a compute architecture intended to handle the immense data demands of online gaming, high-performance computing and analytics through high-density, low-latency clusters of connected mobile processors. In conjunction with Orange Silicon Valley, IDT has co-developed a massive, highly scalable, low-latency cluster of low-power NVIDIA Tegra K1 mobile processors, using IDT’s RapidIO interconnect technology to connect multiple nodes at up to 16 Gbps.
The architecture can scale to more than 2,000 nodes in a rack and enables ultra-high Gflop density and energy efficiency not achievable with PCI Express or Ethernet technologies.
The result is significantly increased computing horsepower built into very little board space. With up to 23 Tflops per 1U server, or greater than 800 Tflops of computing per rack, the cluster architecture enables approximately twice the computing density of the world’s top supercomputer, Tianhe-2 in China. It achieves this density by leveraging distributed switching and interconnect along with mobile-grade GPU technology, balancing I/O and compute per node in a best-in-class real estate footprint.
“By integrating a large volume of low-power GPUs in a server rack at scale, this industry first creates a clear path to massive cloud-based clusters for analytics and gaming,” explained Jag Bolaria of the Linley Group. “This achievement means developing large clusters with low latency and massive scalability is finally possible. This architecture delivers, in an energy- and latency-efficient manner, remarkable computing horsepower in addressing the challenge of co-locating analytics in the approximately 2 million base stations deployed annually in wireless networks.”
The new architecture pairs the computing cores in each node with a 16 Gbps data rate for a better computing-to-throughput balance, addressing one of the key limitations in the industry today. The compute-to-I/O ratio will continue to improve with 40 Gbps IDT RapidIO 10xN technology.
The architecture allows for 60 nodes on a 19-inch 1U board, with more than 2,000 nodes in a rack. Any node can communicate with any other node with only 400 ns of fabric latency. Memory-to-memory latency is less than two microseconds. Each node consists of a Tsi721 PCIe-to-RapidIO NIC and a Tegra K1 mobile processor, giving 384 Gflops per 16 Gbps of data rate, or 24 floating-point operations per bit of I/O. This balance will be valuable at the rack level in data centers and at the individual analytics server level for wireless access networks.
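The per-node and per-board figures quoted above follow from simple arithmetic, which the short Python sketch below reproduces. The constants come from the announcement; the function names are illustrative only, and the 40 Gbps case assumes per-node compute stays at 384 Gflops, which the announcement does not state.

```python
# Figures quoted in the announcement
GFLOPS_PER_NODE = 384      # Tegra K1 compute per node
LINK_GBPS = 16             # RapidIO data rate per node
NODES_PER_1U = 60          # nodes on one 19-inch 1U board
NEXT_GEN_LINK_GBPS = 40    # RapidIO 10xN data rate


def flops_per_bit(gflops: float, gbps: float) -> float:
    """Floating-point operations available per bit of I/O."""
    return gflops / gbps


def board_tflops(nodes: int, gflops_per_node: float) -> float:
    """Aggregate compute of one board, in Tflops."""
    return nodes * gflops_per_node / 1000


if __name__ == "__main__":
    # Compute-to-I/O balance at the current 16 Gbps link rate
    print(flops_per_bit(GFLOPS_PER_NODE, LINK_GBPS))
    # Aggregate compute of a fully populated 1U board
    print(board_tflops(NODES_PER_1U, GFLOPS_PER_NODE))
    # Assumption: same 384 Gflops node behind a 40 Gbps link
    print(flops_per_bit(GFLOPS_PER_NODE, NEXT_GEN_LINK_GBPS))
```

At 16 Gbps this gives 24 operations per bit and roughly 23 Tflops per 1U board, matching the quoted numbers. Under the stated assumption, a 40 Gbps link would bring the ratio down to about 9.6 operations per bit, meaning more I/O bandwidth for each unit of compute.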
The cluster was achieved with NVIDIA’s Jetson TK1 development kit, which is powered by the revolutionary NVIDIA Tegra K1 mobile processor. Built on the same NVIDIA Kepler GPU architecture that powers the world’s fastest supercomputers, Tegra K1 delivers 192 fully programmable CUDA cores for advanced graphics and compute performance.
“Leading innovators in the ‘Big Data’ arena are increasingly discovering the benefits RapidIO interconnect can bring to their applications,” said Sean Fan, vice president and general manager of IDT’s Interface and Connectivity Division. “Our work with Orange Silicon Valley – connecting massive numbers of low-power NVIDIA mobile processors via RapidIO – demonstrates a breakthrough approach to addressing the trade-offs between total computing, power and balanced networking interconnect to feed the processors.”