Kostas Anagnostou
Lead Rendering Engineer at Playground Games working on Fable. Always open for graphics questions or mentoring people who want to get in the industry. I tweet about graphics mostly. Views my own. Blog: interplayoflight.wordpress.com
- Post exploring the evolution of SIMT in GPUs: "SIMD Started It, SIMT Improved It" blog.siggraph.org/2026/01/simd...
- Reposted by Kostas Anagnostou: I'm finally writing up how Nanite Tessellation works. The first few blog posts are up. More will be coming. graphicrants.blogspot.com/2026/02/nani...
- IBL Optimization Study II: Faster Irradiance technik90.blogspot.com/2026/01/ibl-...
- Optimizing spatiotemporal variance-guided filtering for modern GPU architectures jcgt.org/published/00...
- Nice collection of graphics programming resources: cody-duncan.github.io/r-graphicspr...
- Reposted by Kostas Anagnostou: Slides are now available for my GPC 2025 talk with @phammer.bsky.social on Variable Rate Compute Shaders in Doom The Dark Ages static.graphicsprogrammingconference.com/public/2025/...
- As expected, the size of the MLP does matter when running it on Tensor cores. A 3x32x32x1 MLP running on Cooperative Vectors is ~70x faster than my compute shader version for the same amount of inference. The Coop Vectors version is using fp16 but the speedup is impressive regardless.
- Reposted by Kostas Anagnostou: My "How to Vulkan in 2026" @vulkan.org #Vulkan guide is now publicly available at www.howtovulkan.com I still consider it a preview, though I'm mostly happy with it and only plan on changing minor things and incorporating some feedback.
- Interesting post, working through and improving the performance issues of an LLM-generated IBL implementation: technik90.blogspot.com/2025/12/ibl-...
- "Improving Direct Lighting Material Occlusion - Part 1", discussing micro-occlusion for direct lighting, Naughty Dog's and Activision's approaches and alternatives irradiance.ca/posts/micros...
- Wordle 1,657 5/6 🟨⬜⬜⬜🟩 ⬜🟩⬜⬜🟩 🟩🟩⬜⬜🟩 ⬜🟩🟨⬜🟩 🟩🟩🟩🟩🟩 A bit embarrassed that I couldn't find today's word sooner. 😊 Happy New Year everyone!
- Good read: "GPU Cache Hierarchy: Understanding L1, L2, and VRAM" charlesgrassi.dev/blog/gpu-cac...
- Reposted by Kostas Anagnostou: #SIGGRAPH2025 Advances in Real-Time Rendering in Games course talk recording of "FAST AS HELL: IDTECH8 GLOBAL ILLUMINATION" by @idsoftwaretiago.bsky.social from id Software is now online: youtu.be/VTrdeqMMMK0?... Enjoy!
- Great collection of C/C++ compiler optimisations: xania.org/AoCO2025-arc...
- It's been an adventure but I finally managed to get Cooperative Vectors to use my tiny MLP (2 hidden layers, 3 nodes each) to infer sky vis at a specific pos. It isn't really faster than my compute shader version; the MLP is maybe too small to make good use of the tensor cores, but cool regardless.
- Reposted by Kostas Anagnostou: My "No Graphics API" blog post is live! Please repost :) www.sebastianaaltonen.com/blog/no-grap... I spent 1.5 years doing this. Full rewrite last summer and another partial rewrite last month. As Hemingway said: "First draft of everything is always shit".
- "Microbenchmarking NVIDIA’s Blackwell Architecture: An in-depth Architectural Analysis", focusing on the tensor cores arxiv.org/pdf/2512.02189
- Great read: "Video Game Blurs (and how the best one works)" blog.frost.kiwi/dual-kawase/
- The z-buffer and depth testing (aka z-testing) have been the dominant way of hidden surface elimination for over 50 years, introduced but not implemented in W. Straßer's PhD thesis in 1974, and actually implemented in Ed Catmull's PhD thesis in the same year. 1/4
- The post on using spatial hashing with raytraced ambient occlusion attracted quite a bit of interest so I expanded it into a blog post to discuss how it works behind the scenes to both reduce the noise and its cost. interplayoflight.wordpress.com/2025/11/23/s...
- TIL that you can use an LLM to create LaTeX equations by pretty much describing them. My past, postgraduate self who had to painstakingly create them by hand would be jealous.
- "Get Started with Neural Shading" course videos: youtube.com/playlist?lis...
- Reposted by Kostas Anagnostou: #SIGGRAPH2025 Advances in Real-Time Rendering in Games course talk recording of "Stochastic Tile-Based Lighting in HypeHype" by Jarkko Lempiäinen from HypeHype is now online: www.youtube.com/watch?v=8O44...
- Reposted by Kostas Anagnostou: #SIGGRAPH2025 Advances in Real-Time Rendering in Games course talk recording of "Strand-Based Hair And Fur Rendering In Indiana Jones and The Great Circle" by Sergei Kulikov from MachineGames is now online: youtu.be/jSE1XXBEK-w
- Did a quick and dirty implementation of a spatial hash structure to speed up RTAO: ray results are stored in cells indexed by pos/normal/cell size, and after storing a few rays, occlusion can be queried from the cell instead of raytracing it. 3x faster raytraced AO for that scene with no denoising.
- Reposted by Kostas Anagnostou: New blog post! Behind the scenes of some of the techniques involved in making our last PC demo 💫 gboisse.github.io/posts/this-i...
- Finally got around to adding support for (hardware) VRS to the toy engine. Forcing a 2x2 shading rate and comparing in GPU Trace, here is a summary of what is happening for the gbuffer pass (2nd trace has VRS on): the GPU is doing the same number of z-tests while doing about 64% less pixel shader work. 1/3
- Happy Halloween!
- Some information about Ghost of Yōtei's rendering systems blog.playstation.com/2025/10/23/g...
- ReGIR - An advanced implementation for many-lights offline rendering tomclabault.github.io/blog/2025/re...
- Good read! "Neural Super Sampling for Mobile": huggingface.co/Arm/neural-s...
- Reposted by Kostas Anagnostou: I finally found the time and energy to make a new blog and write a couple of posts. This time I wrote about PBR content and game development principles. Both posts are quite different so hopefully people find something interesting on either one of them. irradiance.ca/posts/
- Interesting read on the future of game graphics, from 12 years ago: mcvuk.com/business-new...
- Reading AMD GPU ISA rocm.blogs.amd.com/software-too...
- Quick experiment in Compiler Explorer to observe how VGPR allocation differs between wave32 and wave64 on RDNA: with wave32 it appears to be in batches of 8 while with wave64 it is in batches of 4 VGPRs. godbolt.org/z/W6ee8MhMx
- Really good, maths-free, introduction to the Fourier transform from this year's Siggraph, recommended watch: dl.acm.org/doi/10.1145/...
- Quick tip, add the "-fspv-target-env=vulkan1.1" command line argument to Compiler Explorer's RGA to get it to compile HLSL shaders with wave intrinsics: godbolt.org/z/PKoh5d51K
- "Inside NVIDIA GPUs: Anatomy of high performance matmul kernels", includes a great intro to GPU architecture and PTX/SASS: www.aleksagordic.com/blog/matmul
- Reposted by Kostas Anagnostou: New blog post! In "Billions of triangles in minutes" we'll walk through hierarchical cluster level of detail generation of, well, billions of triangles in minutes. Reposts welcome! zeux.io/2025/09/30/b...
- Nsight Graphics' GPU Trace/Trace Analysis often provides more low level hardware information, in the form of tooltips and performance advice, than the documentation available online does. It is worth exploring a few captures to understand the GPU architecture better.
- Continuing on the topic of GPU utilisation and performance, as a practical example, I looked a bit deeper into the impact of vertex shader exports on the cost of a drawcall and wrote another blog post with some observations interplayoflight.wordpress.com/2025/09/21/t...
- Brief introduction to a number of OIT techniques: "Advances in Order Independent Transparency for Real-Time & Virtual Production Workflows" www.youtube.com/watch?v=wXSJ...
- New blog post discussing a few approaches to reducing bottlenecks and increasing GPU utilisation and performance interplayoflight.wordpress.com/2025/08/29/g...
- It is interesting that both RTGI presentations at Advances in R-T Rendering this year (both great reads!) introduce it as a solution to scalability and baking size/time issues, due to the size/dynamic nature of the world, more than a visual improvement advances.realtimerendering.com/s2025/index....
- Detecting reads from uninitialised heap memory in C++ programs at runtime www.forwardscattering.org/post/71
- Worth re-sharing this oldish but still great presentation as an example of how perf should be viewed holistically, maximising all GPU units' utilisation even if it means making a particular drawcall's execution slower to achieve this. s3.amazonaws.com/nd.images/re... www.youtube.com/watch?v=CvS6...
- Intuitive Guide to Convolution betterexplained.com/articles/int...
- Pointer Tagging in C++: The Art of Packing Bits Into a Pointer vectrx.substack.com/p/pointer-ta...
- Anno 1800: Frame Analysis blog.thomaspoulet.fr/posts/anno-1...
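
The spatial hash for raytraced AO mentioned in the feed above (ray results stored in cells indexed by position, normal and cell size, then queried once a few rays have been stored) could be sketched roughly as follows. This is a minimal CPU-side Python sketch under stated assumptions, not the author's GPU implementation: the names `AOSpatialHash`, `ambient_occlusion`, `normal_bins`, `min_rays` and the `trace_ray` callback are all hypothetical, and the quantisation scheme is only one plausible choice.

```python
import math
from dataclasses import dataclass


@dataclass
class Cell:
    ray_count: int = 0       # rays stored in this cell so far
    occluded_count: int = 0  # how many of those rays hit geometry


class AOSpatialHash:
    """Hypothetical cache of AO ray results keyed by quantised position/normal."""

    def __init__(self, cell_size=0.25, normal_bins=2, min_rays=8):
        self.cell_size = cell_size      # world-space size of a cell (assumed)
        self.normal_bins = normal_bins  # coarseness of normal quantisation (assumed)
        self.min_rays = min_rays        # rays needed before a cell is trusted (assumed)
        self.cells = {}                 # a Python dict stands in for the GPU hash table

    def _key(self, pos, normal):
        # Quantise the position to the grid and the normal to coarse bins;
        # the cell size is part of the key so different sizes never alias.
        p = tuple(math.floor(c / self.cell_size) for c in pos)
        n = tuple(round(c * self.normal_bins) for c in normal)
        return (p, n, self.cell_size)

    def store(self, pos, normal, occluded):
        cell = self.cells.setdefault(self._key(pos, normal), Cell())
        cell.ray_count += 1
        cell.occluded_count += int(occluded)

    def query(self, pos, normal):
        """Return cached visibility in [0, 1], or None if the cell is not ready."""
        cell = self.cells.get(self._key(pos, normal))
        if cell is None or cell.ray_count < self.min_rays:
            return None
        return 1.0 - cell.occluded_count / cell.ray_count


def ambient_occlusion(cache, pos, normal, trace_ray):
    """Use the cached value when available, otherwise trace one ray and store it."""
    cached = cache.query(pos, normal)
    if cached is not None:
        return cached
    occluded = trace_ray(pos, normal)  # hypothetical callback returning True on a hit
    cache.store(pos, normal, occluded)
    return 0.0 if occluded else 1.0
```

In this sketch the saving comes from `ambient_occlusion` skipping `trace_ray` entirely once a cell holds `min_rays` results; the averaged cell value also smooths the per-ray noise, at the cost of blurring occlusion within a cell.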