Some kernel developers have been trying to work around the massive, horrifying, long-term security holes recently discovered in Intel hardware. In the course of doing so, they made some interesting comments about coding practices.
Christoph Hellwig and Jesper Dangaard Brouer were working to win back some of the enormous speed losses imposed by the workarounds for Intel's gaping security holes. And, Christoph said that one such patch would increase networking throughput from 7.5 million packets per second to 9.5 million, a speedup of more than 25%.
To do this, the patch would check, on the kernel's "fast path", whether a device was using dma_direct_ops and, if so, replace the indirect call through those ops with a simple direct call.
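The idea behind the patch can be sketched in a few lines of C. This is a minimal user-space sketch, not the kernel's code: `struct dma_ops`, `dma_direct_map()`, `fake_iommu_map()` and the `indirect_calls` counter are hypothetical stand-ins (the real `struct dma_map_ops` has many more members with different signatures), and it assumes the expensive part is the indirect call through the ops table, which the security mitigations make slower.

```c
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for the kernel's DMA ops table. */
struct dma_ops {
    uint64_t (*map)(uint64_t phys);
};

/* Hypothetical direct-mapping helper: in this sketch the bus address
 * is simply the physical address. */
static uint64_t dma_direct_map(uint64_t phys)
{
    return phys;
}

/* Hypothetical IOMMU-style mapping: pretend the IOMMU adds an offset. */
static uint64_t fake_iommu_map(uint64_t phys)
{
    return phys + 0x100000;
}

static const struct dma_ops dma_direct_ops = { .map = dma_direct_map };
static const struct dma_ops fake_iommu_ops = { .map = fake_iommu_map };

/* Counts how often the slow, indirect path is taken. */
static unsigned long indirect_calls;

static uint64_t dma_map(const struct dma_ops *ops, uint64_t phys)
{
    /* Fast path: the device uses the direct ops, so call the helper
     * directly instead of going through the function pointer. */
    if (ops == &dma_direct_ops)
        return dma_direct_map(phys);

    /* Slow path: indirect call through the ops table. */
    indirect_calls++;
    return ops->map(phys);
}
```

Calling `dma_map(&dma_direct_ops, phys)` never touches `ops->map`, while any other ops table still goes through the indirect call. A pointer comparison is cheap next to a mitigated indirect call, which is the kind of saving the patch was after.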
Linus Torvalds liked the code, but he noticed that Jesper and Christoph's code sometimes would perform certain tests before testing the fast path. But if the kernel actually were taking the fast path, those tests would not be needed. Linus said, "you made the fast case unnecessarily slow."
He suggested that switching the order of the tests would fix it right up. He added:
In fact, as a further micro-optimization, it might be a good idea to just specify that the dma_is_direct() ops is a special pointer (perhaps even just say that "NULL means it's direct"), because that then makes the fast-case test much simpler (avoids a whole nasty constant load, and testing for NULL in particular is often much better).
But that further micro-optimization absolutely *requires* that the ops pointer test comes first. So making that ordering change is not only "better code generation for the fast case to avoid extra cache accesses", it also allows future optimizations.
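Linus's two suggestions, testing the ops pointer first and letting NULL mean "direct", might look something like the following. It is again a hypothetical simplification: `struct device`, `struct dma_ops`, `offset_map()` and the `slow_path_checks` counter are invented for illustration, and only the `dma_is_direct()` name comes from the discussion.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified types; not the kernel's real definitions. */
struct dma_ops {
    uint64_t (*map)(uint64_t phys);
};

struct device {
    const struct dma_ops *dma_ops;  /* NULL means "use direct mapping" */
};

/* Hypothetical non-direct mapping: pretend an IOMMU adds an offset. */
static uint64_t offset_map(uint64_t phys)
{
    return phys + 0x100000;
}

static const struct dma_ops offset_ops = { .map = offset_map };

/* Counts how many calls got past the fast-path test. */
static unsigned long slow_path_checks;

/* Suggested convention: a NULL ops pointer means direct mapping,
 * so the fast-path test is a plain NULL check. */
static inline bool dma_is_direct(const struct dma_ops *ops)
{
    return ops == NULL;
}

static uint64_t dma_map(const struct device *dev, uint64_t phys)
{
    const struct dma_ops *ops = dev->dma_ops;

    /* Test the ops pointer first: the common direct case does nothing else. */
    if (dma_is_direct(ops))
        return phys;

    /* Any other per-device checks belong here, after the fast path has
     * been ruled out, followed by the indirect call. */
    slow_path_checks++;
    return ops->map(phys);
}
```

A device whose `dma_ops` is NULL returns straight from the first test, and only devices with a real ops table reach the slow path. Comparing against NULL, rather than against the address of a global ops structure, means the compiler does not need to load that address first, which is the "nasty constant load" Linus mentions.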
Regarding Linus' micro-optimization, Christoph explained:
I wanted to do the NULL case, and it would be much nicer. But the arm folks went to great lengths to make sure they don't have a default set of dma ops and require it to be explicitly set on every device to catch cases where people don't set things up properly, and I didn't want to piss them off....But maybe I should just go for it and see who screams, as the benefit is pretty obvious.
Linus also suggested that for Christoph's and Jesper's tests, the dma_is_direct() function should be sure to use the likely() macro.
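In the kernel, likely() and unlikely() are thin wrappers around the GCC/Clang builtin `__builtin_expect()`; the two macro definitions below match the ones in include/linux/compiler.h when branch profiling is disabled. The `dma_is_direct()` body and the `dma_path()` caller are a hypothetical sketch assuming the NULL-means-direct convention, not the code that was merged.

```c
#include <stdbool.h>
#include <stddef.h>

/* Same definitions include/linux/compiler.h uses when branch profiling
 * is off. The !! turns any nonzero value into exactly 1, and
 * __builtin_expect() tells the compiler which value to expect. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical placeholder type for this sketch. */
struct dma_ops {
    int unused;
};

/* Hypothetical: with NULL meaning "direct", mark the direct case as the
 * expected one, so the compiler will typically lay it out as the
 * straight-line, fall-through path. */
static inline bool dma_is_direct(const struct dma_ops *ops)
{
    return likely(ops == NULL);
}

/* Returns 0 for the direct path and 1 for the indirect path. */
static int dma_path(const struct dma_ops *ops)
{
    if (dma_is_direct(ops))
        return 0;
    return 1;
}
```

The hint changes only how the compiler arranges the code, never the truth value of the test: `likely(5)` still evaluates to 1. Because the hint is fixed at compile time, it cannot know that a particular machine always uses an IOMMU, which is the point Christoph raises.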
And this was interesting because likely() is used to tell the compiler that one block of code is more "likely" to run than another, so that the compiler can optimize for that case. And, Christoph wasn't sure this was true. He said, "Yes, for the common case, it is likely. But if you run a setup where you say always have an iommu, it is not, in fact, it is never called in that case, but we only know that at