Each SM also contains a single RT core and eight tensor cores: dedicated silicon blocks designed to accelerate real-time ray tracing and AI workloads, respectively. This means the complete TU102 chip contains 72 RT cores and 576 tensor cores.
- Professional Graphics
- NVIDIA RTX Graphics and AI for the Fourth Industrial Revolution
- RTX 8000 & 6000 Passive
- RTX Powered Workstations
- Vital stats
- Nvidia Turing release date
- The real Turing
- Inside Nvidia Turing GPU: Shading and memory improvements
- Turing’s new shading technologies
- Nvidia Ampere vs Turing GPU Architecture
- Nvidia to Use Turing Again?…
- When Will We Know More?
- AMD Navi vs NVIDIA Turing GPU Architectures: SM vs CU
- CUDA Cores vs Stream Processors: Super-scalar & Vector
- Turing vs Navi: Graphics and Compute Pipeline
- Of Warps and Waves
- Benchmarking: The RTX 3080 Delivers a TKO
- Verdict: One Giant Leap for Card-Kind
Professional Graphics
Designed, built and tested by NVIDIA, NVIDIA® RTX™ desktop products are the choice of millions of creative and technical users. Featuring the world’s most powerful GPUs, large memory capacity, 8K display outputs, advanced real-time photorealistic rendering, AI workflows, VR environments and more, NVIDIA RTX is designed to accelerate a range of professional workflows. Optimized, stable drivers, ISV certifications with over 100 professional applications and IT management tools are just some of the advantages of NVIDIA RTX.
NVIDIA RTX Graphics and AI for the Fourth Industrial Revolution
Inventing, modeling, simulating, testing and improving products are Industry 4.0 imperatives. From identifying design flaws earlier in the process using VR, to modifying, simulating and optimizing designs faster with more CUDA cores for higher computing performance, to using AI-powered tools to discover design alternatives, and even creating cinema-quality marketing and sales materials directly from CAD files, NVIDIA RTX™ powers the product design workflow throughout the fourth industrial revolution.
24-36 Month Lifecycle
Supply Chain Management
Long-life Program
Product availability extended by an additional 5 years
Designed and Certified for Servers
Game Development
(DirectX 12 Ultimate)
RTX 8000 & 6000 Passive
RTX power in the data center
The demand for visualization, rendering, data analytics, and simulation continues to grow as companies deal with larger, more complex workloads than ever before.
RTX Powered Workstations
RTX-enabled workstations feature the new NVIDIA Turing ™ GPU architecture that delivers real-time ray tracing, artificial intelligence, and advanced graphics capabilities.
This is not a review of the GeForce RTX 2080 or GeForce RTX 2080 Ti; it is a study of the Turing architecture itself. Head over to our exhaustive GeForce RTX 2080 and 2080 Ti review for a complete, benchmark-based evaluation of their performance and promise.
Vital stats
Nvidia Turing release date
The first Turing cards were announced at SIGGRAPH in August 2018 under the Quadro RTX name, and the GeForce RTX cards were unveiled at Gamescom shortly thereafter. The consumer cards reached shelves first, however, with the RTX 2080 and RTX 2080 Ti going on sale in September.
Nvidia Turing specification
There are three separate GPUs in the first wave of Turing chips. The top chip, the full TU102, has 72 SMs with 4,608 CUDA cores, plus 576 AI-oriented tensor cores and 72 RT ray-tracing cores. Below it sit the TU104 and TU106 GPUs, with 3,072 and 2,304 CUDA cores respectively. All three support GDDR6 memory.
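Those headline figures are internally consistent; here is a quick, host-only sketch deriving the per-SM counts from the totals quoted above (the values come straight from the spec list, nothing else is assumed):

```cuda
#include <cstdio>

// Derive per-SM counts for the full TU102 from the totals quoted above.
int main() {
    const int sm_count     = 72;    // SMs in a full TU102
    const int cuda_cores   = 4608;
    const int tensor_cores = 576;
    const int rt_cores     = 72;

    printf("CUDA cores per SM:   %d\n", cuda_cores / sm_count);   // 64
    printf("Tensor cores per SM: %d\n", tensor_cores / sm_count); // 8
    printf("RT cores per SM:     %d\n", rt_cores / sm_count);     // 1
    return 0;
}
```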
Architecture of Nvidia Turing
Turing GPUs are designed with an emphasis on computation-based rendering: they combine traditional rasterization hardware with dedicated silicon for AI and ray tracing. The individual Turing SMs have also been redesigned, offering a 50% performance improvement over the Pascal SM design.
The price of the Nvidia Turing
If you want the absolute top of the Turing line-up, the Quadro RTX 8000 is for you, at roughly $10,000. That makes the $1,199 GeForce RTX 2080 Ti Founders Edition look like a bargain. The cheapest of the announced Turing GPUs, the RTX 2070, costs $499 for a reference card.
NVIDIA Turing performance
We now have a clear picture of how the Turing GPUs handle traditional rendering from the RTX 2080 and RTX 2080 Ti benchmarks presented in our reviews of those two cards. But the potential of real-time ray tracing and AI acceleration is yet to be fully understood.
Nvidia Turing release date
You’ll be able to get your hands on Nvidia’s Turing-based graphics cards for the first time when the RTX 2080 Ti and RTX 2080 are released on September 20, 2018, although general availability of the top card has been delayed by a week. Professional Quadro RTX cards with the full Turing TU102 and TU104 GPUs will be available in Q4 2018.
The real Turing
Nvidia’s Turing architecture is named after the famous British mathematician and computer scientist Alan Turing. Turing is known worldwide for breaking the German Enigma machine’s encrypted code during World War II, for foundational work on the machines that led to computing as we know it today, and for the Turing Test, which assesses whether a machine can exhibit intelligent behaviour.
The new GPU architecture was first announced at SIGGRAPH, where the Quadro RTX cards made their debut, but it was at Gamescom on August 20 that Jen-Hsun Huang unveiled the first GeForce RTX graphics cards on stage at the pre-show event.
A third Turing graphics card, the RTX 2070, has also been announced. It is due to launch later than the two flagship GPUs, with an October 2018 window already confirmed. We expect it to arrive later in the month, possibly October 20, if Nvidia sticks to its recent announcement-to-release cadence.
Interestingly, Nvidia is launching its Turing consumer GPUs with its own overclocked Founders Edition cards, while expecting its board partners to release reference-clocked versions on the same day. This is the first time in recent memory that Nvidia and its partners have brought a new generation of graphics cards to market simultaneously.
All of this makes Nvidia’s $1,199 (£1,099) asking price for the Founders Edition GeForce RTX 2080 Ti look rather sane. Well, almost. Reference-clocked RTX 2080 Ti versions at the $999 suggested retail price may eventually arrive, but that won’t happen until supply increases and demand declines post-launch.
Inside Nvidia Turing GPU: Shading and memory improvements
Let’s cover the improvements to these familiar components before we dive into the exotic new tensor and RT cores.
Nvidia says the GeForce RTX 2080 can be about 50 percent faster than the GTX 1080 in traditional games. Many of the comparisons involve HDR-enabled games, which lose performance on the current GTX 10-series cards. Nvidia also claims the GeForce RTX 2080 can be over twice as fast as the GTX 1080 in games that support its DLSS technology (more on DLSS later), and can exceed 60 frames per second in several triple-A games at 4K resolution with HDR enabled.
The true performance of Nvidia’s high-end RTX duo remains to be seen. (Update: Read our GeForce RTX 2080 and 2080 Ti review.) Nvidia made no mention of GeForce RTX 2080 Ti performance in traditional games, and we still have no idea how the GeForce RTX 2080 compares to the older GTX 1080 Ti in non-HDR games. Nvidia’s frame-rate claims for the 4K/60 HDR games listed above also don’t specify which graphics settings were used.
If the RTX 2080 really does at least match the GTX 1080 Ti despite having nearly 20 percent fewer CUDA cores, it’s clear those CUDA cores have been upgraded.
Turing’s streaming multiprocessors (SMs) haven’t just been updated; they’ve been redesigned. In addition to the tensor and RT cores, Nvidia has added a new integer (INT32) pipeline alongside the floating-point (FP32) pipeline traditionally used for shader processing.
When Nvidia looked at how real-world games behave, it found that for every 100 floating-point instructions executed, an average of 36 (and as many as 50) integer instructions were also issued, stalling the floating-point pipeline. The new integer pipeline handles these extra instructions independently of, and simultaneously with, the FP32 pipeline. According to Jonah Alben, Nvidia’s vice president of GPU engineering, performing these two tasks simultaneously results in a huge speed increase.
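To make that FP/INT mix concrete, here is a minimal, hypothetical CUDA kernel (the kernel name and values are illustrative, not Nvidia’s example): the index and address arithmetic runs on the INT32 pipe while the shading-style math runs on the FP32 pipe, and on Turing the two can issue concurrently rather than the integer work stalling the floating-point stream.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel illustrating the FP/INT instruction mix described above:
// index and address arithmetic uses the INT32 pipe, the shading-style math
// uses the FP32 pipe. On Turing the two pipes can issue concurrently, so the
// integer work no longer stalls the floating-point stream.
__global__ void scale_strided(float* out, const float* in,
                              int stride, int offset, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // INT32: thread index math
    if (i < n) {
        int src = i * stride + offset;              // INT32: address math
        out[i] = in[src] * 0.5f + 1.0f;             // FP32: shader-style math
    }
}
```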
Nvidia has also changed the way caching works in its streaming multiprocessors. Each SM now feeds a unified pool of L1 and shared memory, which in turn is backed by an L2 cache twice as large as before. The changes mean Turing has almost three times more available L1 capacity than the Pascal GPUs in the GTX 10 series, with twice the bandwidth and lower latency.
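As a rough illustration of what that unified pool serves, here is a standard shared-memory reduction sketch (illustrative code, not tied to any specific Turing feature): both the explicit `__shared__` tile and ordinary cached loads draw from the same enlarged L1/shared pool described above.

```cuda
#include <cuda_runtime.h>

// Minimal sketch: a block-wide sum staged through shared memory. On Turing,
// shared memory and L1 are carved from the same unified on-chip pool, so
// kernels like this see the larger capacity and lower latency described above.
__global__ void block_sum(const float* in, float* out, int n) {
    __shared__ float tile[256];                       // one slot per thread

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;       // stage into shared memory
    __syncthreads();

    // Tree reduction within the block (blockDim.x assumed to be 256).
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s)
            tile[threadIdx.x] += tile[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        out[blockIdx.x] = tile[0];                    // one partial sum per block
}
```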
Add all of that up, and Nvidia says the Turing GPU performs traditional shading as much as 50 percent better than Pascal. That’s a huge architectural improvement, although the actual benefit will vary from game to game, as shown in the slide above.
But gaming isn’t limited by shader performance alone; memory bandwidth directly affects frame rates too. Turing refines Pascal’s superb memory compression technology and pairs it with Micron’s next-generation GDDR6 memory, which makes its first appearance on a GPU here. GDDR6 runs at 14Gbps while being 20 percent more power efficient than GDDR5X, and Nvidia’s optimized Turing memory interface produces 40 percent less crosstalk than its predecessor.
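The bandwidth implication of that 14Gbps figure is simple arithmetic; a host-only sketch is below. The bus widths are assumptions for illustration (352-bit and 256-bit are typical high-end configurations), as they are not quoted here.

```cuda
#include <cstdio>

// Back-of-the-envelope memory bandwidth from the 14 Gbps GDDR6 figure above.
// Bus widths are assumed for illustration; they are not quoted in the article.
int main() {
    const double gbps_per_pin = 14.0;                     // GDDR6 data rate
    const int bus_bits[] = {352, 256};
    for (int bits : bus_bits) {
        double gbytes_per_s = gbps_per_pin * bits / 8.0;  // Gb/s -> GB/s
        printf("%d-bit bus: %.0f GB/s\n", bits, gbytes_per_s);
    }
    return 0;
}
```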
Turing’s new shading technologies
As with most major GPU architecture premieres, Nvidia has also introduced some new shading technologies that developers can use to improve performance, visual effects, or both.
Mesh shading helps offload the CPU in visually complex scenes with tens or hundreds of thousands of objects. It consists of two new shader stages. Task shaders cull objects to determine which elements of the scene actually need rendering. The mesh shader then determines the level of detail at which the visible objects should be rendered: distant objects need far less detail, while closer objects should appear as sharp as possible.
Nvidia showed off mesh shading with an impressive, playable demo in which you flew a spaceship through a huge field of 300,000 asteroids. The demo ran at around 50 frames per second despite the gigantic object count, because mesh shading reduced the number of triangles drawn at any point to around 13,000, out of a maximum of 3 trillion potential triangles. Intriguing stuff.
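Task and mesh shaders live in the graphics pipeline (DirectX/Vulkan), not CUDA, but the decision they make can be sketched as ordinary GPU code. The conceptual, hypothetical kernel below performs the two steps described above: cull objects that can’t be seen, then pick a level of detail from distance. All names and thresholds are illustrative, not Nvidia’s demo code.

```cuda
#include <cuda_runtime.h>

struct Asteroid { float x, y, z, radius; };

// Conceptual sketch (not the actual task/mesh shader API): one thread per
// object culls anything beyond a crude view distance, then picks a level of
// detail based on distance to the camera. Thresholds are illustrative only.
__global__ void cull_and_pick_lod(const Asteroid* objs, int n,
                                  float cam_x, float cam_y, float cam_z,
                                  float far_plane, int* lod_out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float dx = objs[i].x - cam_x;
    float dy = objs[i].y - cam_y;
    float dz = objs[i].z - cam_z;
    float dist = sqrtf(dx * dx + dy * dy + dz * dz);

    if (dist - objs[i].radius > far_plane) {   // "task shader" step: cull
        lod_out[i] = -1;                       // not rendered at all
    } else if (dist < 100.0f) {                // "mesh shader" step: pick LOD
        lod_out[i] = 0;                        // full detail up close
    } else if (dist < 500.0f) {
        lod_out[i] = 1;                        // medium detail
    } else {
        lod_out[i] = 2;                        // coarse detail far away
    }
}
```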
Variable-rate shading is essentially a supercharged version of the multi-resolution shading Nvidia has supported for years. Human eyes see full detail only at the focus of their vision; peripheral or fast-moving objects are not as sharp. Variable-rate shading exploits this by shading primary objects at full rate and secondary objects at a lower rate, which can improve performance.
One potential application of this approach is motion adaptive shading, where non-critical parts of a moving scene are rendered with less detail. The image above shows how it can work in Forza Horizon. Traditionally, every part of the screen would be rendered in full detail; with motion adaptive shading, only the blue sections of the scene get that full-rate treatment.
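Conceptually, motion adaptive shading boils down to choosing a coarser shading rate for faster-moving screen tiles. The sketch below illustrates that decision only; it is not the real variable-rate shading API (which lives in D3D12/Vulkan/NVAPI), and the thresholds are made up for illustration.

```cuda
#include <cuda_runtime.h>

// Conceptual sketch of motion adaptive shading: each screen tile gets a
// coarser shading rate the faster it is moving. rate = 1 means full-rate
// shading, 2 means one shade per 2x2 pixel block, 4 means one per 4x4 block.
// Not the real VRS API; thresholds are illustrative only.
__global__ void pick_shading_rate(const float* motion_pixels_per_frame,
                                  int num_tiles, int* rate_out) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= num_tiles) return;

    float m = motion_pixels_per_frame[t];
    if (m < 2.0f)       rate_out[t] = 1;   // nearly static: full detail
    else if (m < 8.0f)  rate_out[t] = 2;   // moderate motion: 2x2 coarse
    else                rate_out[t] = 4;   // fast motion: 4x4 coarse
}
```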
Content adaptive shading follows the same principle, but dynamically identifies low-detail portions of the screen, or large areas of similar color, and shades them less finely, especially when the scene is in motion. It looked damn good in action during a playable demo of Wolfenstein II that let the feature be toggled on and off. I didn’t notice any change in image quality, but Nvidia’s Alben says activating CAS increases frame rates by 20 fps or more in situations where the GPU is targeting 60 fps. Fingers crossed that developers embrace this kind of technology with more enthusiasm than multi-resolution shading, which impressed me in Shadow Warrior 2 but gained no traction beyond that.
Variable-rate shading can also help with virtual-reality workloads by tailoring the level of detail to where you are looking. Another new VR technology, Multi-View Rendering, extends the simultaneous multi-projection technology introduced in the GTX 10 series to allow “developers to efficiently draw a scene from multiple angles, and even draw multiple instances of characters in different poses in one pass.”
Finally, Nvidia also introduced texture-space shading, which shades an object in texture space rather than screen space, allowing developers to reuse shading results across multiple views and frames.
Nvidia Ampere vs Turing GPU Architecture
A quick comparison of Nvidia’s Ampere and Turing GPU architectures.
| GPU Architecture | Ampere | Turing |
| --- | --- | --- |
| Producer | Nvidia | Nvidia |
| Manufacturing process | 8 nm (Samsung) | 12 nm (TSMC) |
| CUDA compute capability | 8 | 7.5 |
| RT cores | 2nd generation | 1st generation |
| Tensor cores | 3rd generation | 2nd generation |
| Streaming Multiprocessors | 2x FP32 | 1x FP32 |
| DLSS | DLSS 2.0 | DLSS 1.0 |
| Memory support | HBM2, GDDR6X | GDDR6, GDDR5, HBM2 |
| PCIe support | PCIe Gen 4 | PCIe Gen 3 |
| NVIDIA encoder (NVENC) | Gen 7 | Gen 7 |
| NVIDIA decoder (NVDEC) | Gen 5 | Gen 4 |
| DirectX 12 Ultimate | Yes | Yes |
| VR ready | Yes | Yes |
| Multi-GPU support | NVLink 3.0 | NVLink 2.0 |
| Energy efficiency | Better than Turing | Better than Volta |
| Video outputs | HDMI 2.1, DisplayPort 1.4a | HDMI 2.0b, DisplayPort 1.4a |
| Graphics cards | RTX 30 series | RTX 20 series, GTX 16 series |
| Applications | Games, workstations, artificial intelligence (AI) | Games, workstations, artificial intelligence (AI) |
Final words
The Ampere GPU architecture offers a significant improvement in ray tracing and DLSS, and even when those features are not used, Ampere delivers a larger performance gain than Turing did. Another notable addition in Ampere is PCIe Gen 4 support, which offers much higher bandwidth and could prove quite useful in the future. If you have something to add or say, feel free to do so by leaving a comment below.
Nvidia to Use Turing Again?…
Now, if you are familiar with graphics card architectures, you’ll know that Turing was used in the 20XX series and has since been superseded by the new 30XX “Ampere”. That may partly explain how Nvidia was able (eventually) to bring these new cryptocurrency-mining GPUs to market so quickly: they seemingly use “old parts” adapted for the purpose.
Honestly, this is an exceptionally clever move, as it directs older, “redundant” stock and chipsets toward a very eager market that cares more about mining than frame rates.
When Will We Know More?
The good news, or the bad news depending on how you view it, is that the 30HX and 40HX are not expected to be released until the end of Q1 this year. So when it comes to mainstream gaming GPUs, it unfortunately seems the current mining craze will continue to play a major role in the overall market shortage, at least in the short to medium term. But let’s hope that when these models do come out, they help us humble consumers get the GPU upgrades we so crave!
What do you think? – Let me know in the comments!
In the case of NVIDIA, tessellation, viewport transform, vertex fetch, and stream output are performed by different units within the PolyMorph geometry engine.
AMD Navi vs NVIDIA Turing GPU Architectures: SM vs CU
One of the main differences between NVIDIA and AMD GPU architectures lies in how their cores/shaders are grouped: AMD uses Compute Units (CUs), while NVIDIA uses the SM, or Streaming Multiprocessor. NVIDIA’s execution units are called CUDA cores, while AMD’s are called stream processors.
CUDA Cores vs Stream Processors: Super-scalar & Vector
AMD GPUs are vector processors, while NVIDIA’s architecture is superscalar in nature. Although the former theoretically uses the SIMD execution model and the latter is based on SIMT, there are a few practical differences. In AMD’s SIMD, there is always room for 32 work items, no matter how many threads actually execute per cycle. An application may issue 12, 15, 20, 25, or 30 threads per cycle, but the hardware natively supports 32. Work is issued to the CU in the form of waves, each containing 32 items.
With an NVIDIA SM, unless the GPU runs out of work, all 128 execution slots (32 threads x 4) stay saturated no matter which application is running. Threads are independent of each other and may diverge or reconverge as needed. This is one of the main advantages of a superscalar architecture: the level of parallelism is maintained and utilization is better.
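The 32-thread warp granularity is directly visible in CUDA code. The minimal sketch below sums values across one warp with warp-shuffle intrinsics; how the scheduler keeps those warps fed is the hardware’s business, but the 32-lane grouping is what the programmer sees.

```cuda
#include <cuda_runtime.h>

// Minimal sketch of the 32-thread warp granularity discussed above: all 32
// lanes of a warp execute together, and warp-level primitives such as
// __shfl_down_sync exchange data between lanes without touching memory.
__global__ void warp_sum(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float v = (i < n) ? in[i] : 0.0f;

    // Tree reduction across the 32 lanes of this warp.
    for (int offset = warpSize / 2; offset > 0; offset /= 2)
        v += __shfl_down_sync(0xffffffff, v, offset);

    if ((threadIdx.x & (warpSize - 1)) == 0)   // lane 0 of each warp
        out[i / warpSize] = v;                 // one partial sum per warp
}
```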
An NVIDIA Turing SM contains FP32 and INT32 cores plus eight tensor cores, along with load/store units, special function units, warp schedulers, and dispatch units. Like Volta, it has separate cores for INT and FP calculations that work in tandem, so a Turing SM can execute both floating-point and integer instructions each cycle. This concurrent execution is NVIDIA’s counterpart to asynchronous compute: while not exactly the same, the goal of both techniques is to improve GPU utilization.
AMD’s dual CU, on the other hand, consists of four SIMD units, each containing 32 shaders or execution lanes. There are no separate INT and FP shaders, so Navi’s stream processors handle both FP and INT work. And unlike the older GCN design, waves are issued every cycle, significantly increasing throughput.
The reason this matters is that most games issue short, narrow workloads that couldn’t fill the 4x 64-wide wave queues of the GCN-based Vega and Polaris graphics chips; issuing each wave over four clock cycles made the situation worse. Navi’s single-cycle issue significantly reduces this bottleneck, in some cases increasing IPC almost fourfold and putting design efficiency on a par with modern NVIDIA designs.
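The throughput argument is really just issue arithmetic. The host-only sketch below compares a wave64-issued-over-four-cycles design with a wave32-per-cycle design for a deliberately small dispatch; the numbers are illustrative, not measurements.

```cuda
#include <cstdio>

// Host-only arithmetic behind the wave-issue argument above: a short dispatch
// that needs only a handful of threads still occupies whole waves, and on the
// older GCN design each wave64 is issued over four cycles.
int main() {
    const int threads_needed = 40;   // e.g. a short game dispatch

    // GCN: 64-wide waves, each issued over 4 cycles.
    int gcn_waves   = (threads_needed + 63) / 64;    // = 1 wave (24 lanes idle)
    int gcn_cycles  = gcn_waves * 4;                 // = 4 issue cycles

    // Navi/RDNA: 32-wide waves, single-cycle issue.
    int navi_waves  = (threads_needed + 31) / 32;    // = 2 waves (24 lanes idle)
    int navi_cycles = navi_waves * 1;                // = 2 issue cycles

    printf("GCN : %d wave(s), %d issue cycles\n", gcn_waves, gcn_cycles);
    printf("Navi: %d wave(s), %d issue cycles\n", navi_waves, navi_cycles);
    return 0;
}
```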
Turing vs Navi: Graphics and Compute Pipeline
In the AMD Navi architecture, the Graphics Command Processor handles the standard graphics pipeline (pixel, vertex, and hull shaders), while the ACEs (Asynchronous Compute Engines) dispatch compute tasks through separate pipelines. They work in tandem with the HWS (hardware schedulers) and DMA (direct memory access) engines to let compute and graphics workloads run concurrently. There is also a geometry processor, which handles complex geometry workloads, including tessellation.
In Turing, the warp schedulers at the SM level and the GigaThread engine at the GPU level manage the compute and graphics workloads. While concurrent execution is not the same as asynchronous compute, it works toward the same end, supporting simultaneous floating-point (mostly graphics) and integer (mostly compute) work.
Of Warps and Waves
With AMD Navi, work is issued in groups of threads called waves. Each wave contains 32 threads (one per shader in a SIMD unit), whether compute or graphics, and is sent to the Dual Compute Unit for execution. Since each CU has two SIMDs, it can handle two waves, while a Dual Compute Unit can process four.
In NVIDIA’s case, thread scheduling is managed by the GigaThread engine with the help of the warp schedulers. Each collection of 32 threads is called a warp, and each group of cores (INT or FP) in the SM services a warp. Originally, the threads within a warp did not act independently but executed collectively, and the scheduler simply switched to another available warp whenever one stalled.
Historically, individual threads had no program counter of their own; there was only one per warp. From Volta onwards, however, and therefore in Turing and Ampere as well, NVIDIA gives each thread its own program counter, so threads within a warp can diverge and reconverge independently while still being scheduled together as SIMT groups, with convergence handled much as it was on Volta.
In fact, the TU106 is more like the TU102, only effectively cut in half. The TU102 and TU106 GPUs have 12 SMs in each GPC, while the TU104 has only eight. This means the RTX 2080 Ti and RTX 2080 both ship with six GPCs, but the RTX 2080’s chip packs fewer SMs into a smaller die.
Benchmarking: The RTX 3080 Delivers a TKO
Okay, enough dry chatter about ports and spec details. On to the show!
What do all these changes mean when we test the RTX 2080, RTX 2080 Super, and RTX 3080 side by side? Here’s how eight major AAA games fared.
Summarizing the above graph in one word? Wow. Crushing numbers, especially so many in sequence, tell the story better than I can. Simply put, the RTX 3080 shows a staggering performance boost, especially at 4K resolution. There is no test here from which the RTX 3080 doesn’t emerge looking far superior to any card Nvidia has released before.
Then we looked at less demanding multiplayer titles.
As you can see, the biggest gains again come at 4K. For a full comparison of the RTX 3080 against a dozen Nvidia “Turing” and “Pascal” cards and AMD Radeon “Navi” and “Vega” cards, see our full GeForce RTX 3080 Founders Edition review.
Verdict: One Giant Leap for Card-Kind
Nvidia has improved the performance of its flagship GeForce RTX 3080 in every way possible, and in tasks that support DLSS and ray tracing, there’s another hunk of performance gains layered on top like caramel sauce. (Again, many more details on this in our full RTX 3080 review.) The price is the same as the RTX 2080 Super’s, but the improvements are many, and for gamers who have been waiting for a card capable of no-compromise 4K play, the RTX 3080 is the only real option here in the second half of 2020.
If you have $700 burning a hole in your pocket and a 4K monitor or TV on your desk or in your entertainment center, the Nvidia GeForce RTX 3080 is your dream card. The last-generation Turing cards were excellent slow-burn hardware; technologies like ray tracing and DLSS simmered rather than roared. Ampere, by contrast, is a five-alarm fire: pedal to the metal, top gear on an open road. The huge performance difference between the RTX 2080 Founders Edition and the RTX 3080 Founders Edition is proof of this, and it’s the biggest thing you need to know about these two generations of cards. There is simply no comparison.