

At the datacenter scale, Gaudi 3 was pretty good, at least when it came out.


Intel GPU support?
ZLUDA previously supported Intel GPUs, but does not currently. It would be possible to revive the Intel backend; the development team is focusing on high‑quality AMD GPU support and welcomes contributions.
Anyways, no actual AI company is going to buy $100M of AI cards just to run all of their software through an unfinished, community‑made translation layer, no matter how good it becomes.
oneAPI is decent, but apparently fairly cumbersome to work with, and people prefer to write software in CUDA since it’s the industry standard (and the standard in academia).


Intel’s Gaudi 3 datacenter accelerator from late 2024 advertises about 1800 TOPS in FP8, at 3.1 TOPS/W. Google’s mid‑2025 TPU v7 advertises 4600 TOPS FP8, at 4.7 TOPS/W. That’s a difference, but not a dramatic one. The reason the gap is so small is that GPUs are basically TPUs already; I’ve heard anecdotally that almost as much die space is allocated to matrix accelerators as to the actual shader units.
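
A quick back‑of‑the‑envelope using only those advertised numbers (a sketch; the wattages are implied by the quoted TOPS and TOPS/W figures, not independently sourced):

    # Ratio comparison using only the advertised figures quoted above.
    gaudi3_tops, gaudi3_tops_per_w = 1800, 3.1   # Intel Gaudi 3, FP8
    tpu_v7_tops, tpu_v7_tops_per_w = 4600, 4.7   # Google TPU v7, FP8

    print(f"throughput ratio: {tpu_v7_tops / gaudi3_tops:.2f}x")              # ~2.56x
    print(f"efficiency ratio: {tpu_v7_tops_per_w / gaudi3_tops_per_w:.2f}x")  # ~1.52x

    # TOPS divided by TOPS/W gives the implied power draw in watts:
    print(f"Gaudi 3 implied power: {gaudi3_tops / gaudi3_tops_per_w:.0f} W")  # ~581 W
    print(f"TPU v7 implied power:  {tpu_v7_tops / tpu_v7_tops_per_w:.0f} W")  # ~979 W

So roughly 2.6x the throughput at 1.5x the efficiency: a real lead, but not one that changes the shape of the market on its own.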


It’s not even a pivot; they’ve been focusing on AI already. I’m sure they want it to seem like a pivot (and build up hype), because in previous attempts, just having the hardware and software apparently wasn’t enough: nobody cared when the Gaudi cards came out, nobody uses SYCL or oneDNN, etc.


Although I like a lot of what Valve does (I have a lot of Steam games and Valve games, a Steam Deck OLED, use SteamVR, etc.), they are a fairly flawed company. Sweeney is so good at shooting himself in the foot, though, that whatever opinion he holds, people will by default believe the opposite (and probably should).


I do want to point out how hard it is to even find out about the views of these people; if you just look up the names of the projects and aren’t specifically looking for this information, there’s no way you’ll find anything about it.
Even looking up the name of David Heinemeier Hansson, the more vocally objectionable of the two, I had to go to the fifth link to find anything even vaguely mentioning his views.


Yes, it works out to a ton of power and money, but on the other hand, 2x the computation might translate to only a few percent better results. So it’s often a matter of orders of magnitude, because that’s what’s needed for a sufficiently noticeable difference in use.
Basing comparisons on theoretical TOPS also isn’t particularly equivalent to performance in actual use; it just gives a very rough idea of a perfect workload.
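
To illustrate why 2x compute buys so little (a sketch; the power‑law exponent here is an assumed, plausible value chosen for demonstration, not a measured one):

    # Assume loss follows a power law in training compute: loss ~ C**(-alpha).
    # alpha = 0.05 is an assumption for illustration only.
    alpha = 0.05

    for multiplier in (2, 10, 100, 1000):
        improvement = 1 - multiplier ** (-alpha)  # fractional loss reduction
        print(f"{multiplier:>5}x compute -> ~{improvement:.1%} lower loss")

    # 2x compute -> ~3.4% lower loss; only orders-of-magnitude jumps in
    # compute produce a clearly noticeable difference in use.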