Daniel Colascione submitted some code to support processes knowing when
others have terminated. Normally a process can tell when its own child
processes have ended, but not unrelated processes, or at least not
trivially. Daniel's patch created a new file in the /proc directory entry
for each process—a file called "exithand" that is readable by any other
process. If the target process is still running, attempts to read the
exithand file will simply block, forcing the querying process to wait.
When the target process ends, the
read() operation will complete, and the
querying process will thereby know that the target process has ended.
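
As a rough illustration of how a querying process might use such a file, here is a minimal Python sketch. It assumes a kernel carrying Daniel's proposed patch, since mainline kernels do not provide an exithand file. The function name wait_for_exit and the proc_root parameter are hypothetical, introduced only for this example. The sketch makes no assumption about what data, if any, the read returns; only the fact that it completes matters.

```python
import os


def wait_for_exit(pid, proc_root="/proc"):
    """Block until process `pid` exits, via the proposed exithand file.

    Assumption: <proc_root>/<pid>/exithand exists only on a kernel with
    the proposed patch applied. On other kernels os.open() raises
    FileNotFoundError.
    """
    path = os.path.join(proc_root, str(pid), "exithand")
    fd = os.open(path, os.O_RDONLY)
    try:
        # Per the description above, this read blocks while the target
        # is running and completes once it has ended. Any returned data
        # is ignored here.
        os.read(fd, 1)
    finally:
        os.close(fd)
```

A caller would send SIGKILL to the target and then call wait_for_exit(pid), with no sleep interval to tune.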
It may not be immediately obvious why such a thing would be useful. After all, non-child processes are by definition unrelated. Why would the kernel want to support them keeping tabs on each other? Daniel gave a concrete example, saying:
Android's lmkd kills processes in order to free memory in response to various memory pressure signals. It's desirable to wait until a killed process actually exits before moving on (if needed) to killing the next process. Since the processes that lmkd kills are not lmkd's children, lmkd currently lacks a way to wait for a process to actually die after being sent SIGKILL.
Daniel explained that on Android, the lmkd process currently just keeps
checking the /proc directory for the existence of each process it has
tried to kill. With the new interface, instead of polling continually,
lmkd could simply wait until the read() operation completed, thus saving
the CPU cycles needed for continuous polling.
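
For contrast, the polling approach looks roughly like the following Python sketch. This is not lmkd's actual code, just an illustration of the pattern. The function name poll_until_gone and its parameters are hypothetical, and the sleep argument is injectable only so the loop can be exercised without real delays.

```python
import os
import time


def poll_until_gone(pid, interval=0.1, proc_root="/proc", sleep=time.sleep):
    """Poll until <proc_root>/<pid> disappears; return the number of polls.

    Illustrative only: the interval is an arbitrary guess, and the loop
    cannot tell whether the PID has been reused by a different process.
    """
    entry = os.path.join(proc_root, str(pid))
    polls = 0
    while os.path.exists(entry):
        polls += 1
        # Each iteration wakes the caller just to check again.
        sleep(interval)
    return polls
```

Every pass through the loop costs a wakeup, and whatever interval is chosen trades latency against wasted CPU time. A blocking read() has neither knob.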
And more generally, Daniel said in a later email:
I want to get polling loops out of the system. Polling loops are bad for wakeup attribution, bad for power, bad for priority inheritance, and bad for latency. There's no right answer to the question "How long should I wait before checking $CONDITION again?". If we can have an explicit waitqueue interface to something, we should. Besides, PID polling is vulnerable to PID reuse, whereas this mechanism (just like anything based on struct pid) is immune to it.
Joel Fernandes suggested, as an alternative, using ptrace() to get the process exit notifications, instead of creating a whole new file under /proc. Daniel explained: