Handling Complex Memory Situations

Jérôme Glisse felt that the time had come for the Linux kernel to seriously address the issue of having many different types of memory installed on a single running system. There was main system memory and device-specific memory, and associated hierarchies regarding which memory to use at which time and under which circumstances. This complicated new situation, Jérôme said, was actually now the norm, and it should be treated as such.

The physical connections between the various CPUs, devices and RAM chips (that is, the bus topology) were also relevant, because they influenced how quickly each of those components could communicate with the others.

Jérôme wanted to be clear that his proposal went beyond existing efforts to handle heterogeneous RAM. He wanted to take account of the wide range of hardware and its topological relationships, to eke out the absolute highest performance from a given system. He said:

One of the reasons for radical change is the advance of accelerator like GPU or FPGA means that CPU is no longer the only piece where computation happens. It is becoming more and more common for an application to use a mix and match of different accelerator to perform its computation. So we can no longer satisfy our self with a CPU centric and flat view of a system like NUMA and NUMA distance.

He posted some patches to accomplish several different things. First, he wanted to expose the bus topology and memory variety to userspace as a clear API, so that both the kernel and user applications could make the best possible use of the particular hardware configuration on a given system. Part of this, he said, would have to take account of the fact that not all memory on the system would always be equally available to all devices, CPUs or users.

To accomplish all this, his patches first identified four basic elements that could be used to construct an arbitrarily complex graph of CPU, memory and bus topology on a given system.

These included "targets", which were any sort of memory; "initiators", which were CPUs or any other device that might access memory; "links", which were any sort of bus-type connection between a target and an initiator; and "bridges", which could connect groups of initiators to remote targets.
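To make that concrete, here is a minimal sketch in C of how those four element types might be modeled as a graph. The struct names, fields and performance numbers are illustrative assumptions for this article, not the data structures from Jérôme's patches.

    /*
     * A minimal sketch of the four element types and how they might form
     * a graph.  The struct names, fields and numbers are illustrative
     * assumptions, not the data structures from the actual patches.
     */
    #include <stdio.h>

    struct target    { const char *name; unsigned long size_mb; };  /* any sort of memory */
    struct initiator { const char *name; };                         /* CPU, GPU, FPGA, ... */

    /* A link ties one initiator to one target, with its own performance numbers. */
    struct link {
            struct initiator *ini;
            struct target    *tgt;
            unsigned long     bandwidth_mbps;
            unsigned long     latency_ns;
    };

    int main(void)
    {
            struct target    sysram = { "main-memory", 64 * 1024 };
            struct target    vram   = { "gpu-memory",  16 * 1024 };
            struct initiator cpu0   = { "cpu0" };
            struct initiator gpu0   = { "gpu0" };

            /* The same memory looks very different depending on which
             * initiator reaches it, and a remote target may be reachable
             * only through a bridge. */
            struct link links[] = {
                    { &cpu0, &sysram, 100000,  80 },
                    { &gpu0, &vram,   900000, 200 },
                    { &gpu0, &sysram,  16000, 500 },  /* via a bridge, e.g. PCIe */
            };

            for (unsigned int i = 0; i < sizeof(links) / sizeof(links[0]); i++)
                    printf("%s -> %s: %lu MB/s, %lu ns\n",
                           links[i].ini->name, links[i].tgt->name,
                           links[i].bandwidth_mbps, links[i].latency_ns);
            return 0;
    }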

Aspects like bandwidth and latency would be associated with their relevant links and bridges. And the whole graph of the system would be exposed to userspace via files in the sysfs hierarchy.
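As a rough illustration of what consuming such a graph from userspace could look like, the following sketch reads bandwidth and latency attributes for a single link. The directory layout and attribute names are hypothetical stand-ins, not the paths proposed in the patches themselves.

    /*
     * Sketch of a userspace consumer of such a sysfs graph.  The paths and
     * attribute names below are hypothetical stand-ins for whatever layout
     * the final interface would define.
     */
    #include <stdio.h>

    static long read_attr(const char *path)
    {
            long value;
            FILE *f = fopen(path, "r");

            if (!f || fscanf(f, "%ld", &value) != 1)
                    value = -1;
            if (f)
                    fclose(f);
            return value;
    }

    int main(void)
    {
            /* Hypothetical attribute files describing one link in the graph. */
            long bw  = read_attr("/sys/class/memory-topology/link0/bandwidth");
            long lat = read_attr("/sys/class/memory-topology/link0/latency");

            if (bw < 0 || lat < 0) {
                    fprintf(stderr, "link attributes not available\n");
                    return 1;
            }
            printf("link0: bandwidth=%ld latency=%ld\n", bw, lat);
            return 0;
    }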