Guest on 20th July 2021 02:57:12 PM

Small Task Packing in the big.LITTLE MP Reference Patch Set

What is small task packing?
----
Simply that the scheduler will fit as many small tasks on a single CPU
as possible before using other CPUs. A small task is defined as one
whose tracked load is less than 90% of a NICE_0 task. This is a change
from the usual behaviour since the scheduler will normally use an idle
CPU for a waking task unless that task is considered cache hot.
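
The small-task test described above can be sketched as follows. This is
an illustration only, not the patch set's code: the function names are
hypothetical, and NICE_0_LOAD is assumed to be 1024.

```python
# Assumption: NICE_0_LOAD is 1024, the load weight of a nice-0 task.
NICE_0_LOAD = 1024
SMALL_TASK_PERCENT = 90  # from the text: "less than 90% of a NICE_0 task"


def small_task_threshold(nice_0_load=NICE_0_LOAD):
    """Tracked-load value below which a task counts as small."""
    return nice_0_load * SMALL_TASK_PERCENT // 100


def is_small_task(tracked_load):
    """True if the tracked load is under the small-task threshold."""
    return tracked_load < small_task_threshold()


if __name__ == "__main__":
    print(small_task_threshold())
    print(is_small_task(102))   # a task producing roughly 10% load
    print(is_small_task(1000))  # a nearly always-running task
```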


How is it implemented?
----
Since all small tasks must wake up relatively frequently, the main
requirement for packing small tasks is to select a partly-busy CPU when
waking rather than looking for an idle CPU. We use the tracked load of
the CPU runqueue to determine how heavily loaded each CPU is and the
tracked load of the task to determine if it will fit on the CPU. We
always start with the lowest-numbered CPU in a sched domain and stop
looking when we find a CPU with enough space for the task.

Some further tweaks are necessary to suppress load balancing when the
CPU is not fully loaded, otherwise the scheduler attempts to spread
tasks evenly across the domain.
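
The wakeup-time search can be sketched as a first-fit scan. This is a
minimal model under stated assumptions, not the kernel code: the
function name is hypothetical, the load values are made up for
illustration, and the fit test (load after adding the task must not
exceed the limit) is an assumption about how "enough space" is decided.

```python
def select_packing_cpu(cpu_loads, task_load, packing_limit):
    """Return the lowest-numbered CPU that has room for the task.

    cpu_loads holds the tracked runqueue load of each CPU, indexed by
    CPU number. Returns None when no CPU has enough space, in which
    case a caller would fall back to its usual CPU selection.
    """
    for cpu, load in enumerate(cpu_loads):
        # Assumed fit test: the estimated load with the task added
        # must stay at or below the limit.
        if load + task_load <= packing_limit:
            return cpu
    return None


if __name__ == "__main__":
    # CPU0 is too full for a task of load 100, CPU1 has room,
    # so the scan stops at CPU1 and never considers idle CPU2.
    print(select_packing_cpu([600, 300, 0], 100, 650))
```

Starting from the lowest-numbered CPU each time is what concentrates
the load: an idle CPU further along the domain is only reached once
every earlier CPU is considered full.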


How does it interact with the HMP patches?
----
Firstly, we only enable packing on the little domain. The big domain is
intended to spread tasks amongst the available CPUs, one task per CPU.
The little domain, however, is attempting to use as little power as
possible while servicing its tasks.

Secondly, since we offload big tasks onto little CPUs in order to try
to devote one CPU to each task, we have a threshold above which we do
not try to pack a task and instead will select an idle CPU if possible.
This maintains maximum forward progress for busy tasks temporarily
demoted from big CPUs.


Can the behaviour be tuned?
----
Yes, the load level of a 'full' CPU can be modified in the source and
is also exposed through sysfs as /sys/kernel/hmp/packing_limit so that
it can be changed at runtime. The presence of the packing behaviour is
controlled by CONFIG_SCHED_HMP_LITTLE_PACKING and can be disabled at
runtime using /sys/kernel/hmp/packing_enable.
The definition of a small task is hard coded as 90% of NICE_0_LOAD
and cannot be modified at runtime.
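
The two sysfs files named above can be read and written from a small
script. The sketch below assumes each file holds a single decimal
integer, which the text does not state; the helper names are
hypothetical. Writing needs root on a real system.

```python
from pathlib import Path

HMP_SYSFS_DIR = Path("/sys/kernel/hmp")  # directory named in the text


def read_tunable(name, base=HMP_SYSFS_DIR):
    """Return the tunable as an int, or None if the file is absent."""
    try:
        # Assumption: the file contains one decimal integer.
        return int((Path(base) / name).read_text().strip())
    except FileNotFoundError:
        return None


def write_tunable(name, value, base=HMP_SYSFS_DIR):
    """Write an integer value to the tunable."""
    (Path(base) / name).write_text(f"{int(value)}\n")


if __name__ == "__main__":
    # Prints None for each file on kernels without the HMP patches.
    for name in ("packing_enable", "packing_limit"):
        print(name, read_tunable(name))
```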


Why do I need to tune it?
----
The optimal configuration is likely to be different depending upon the
design and manufacturing of your SoC.

In the main, there are two system effects from enabling small task
packing.

1. The CPU operating point may increase
2. The wakeup latency of tasks may increase

There are also likely to be secondary effects from loading one CPU
rather than spreading tasks.

Note that all of these system effects are dependent upon the workload
under consideration.


CPU Operating Point
----
The primary impact of loading one CPU with a number of light tasks is to
increase the compute requirement of that CPU since it is no longer idle
as often. Increased compute requirement causes an increase in the
frequency of the CPU through CPUfreq.

Consider this example:
We have a system with 3 CPUs which can operate at any frequency between
350MHz and 1GHz. The system has 6 tasks which would each produce 10%
load at 1GHz. The scheduler has frequency-invariant load scaling
enabled. Our DVFS governor aims for 80% utilization at the chosen
frequency.

Without task packing, these tasks will be spread out amongst all CPUs
such that each CPU has 2. This will produce roughly 20% load on each
CPU, and the frequency of the package will remain at 350MHz.

With task packing set to the default packing_limit, all of these tasks
will sit on one CPU and require a package frequency of ~750MHz to reach
80% utilization (0.75 = 0.6 / 0.8).
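
The frequency arithmetic in this example can be checked with a short
calculation. This is a simplified model of the governor described
above, not real CPUfreq behaviour: the function name is hypothetical,
and the governor is assumed to pick load / target utilization, clamped
to the available frequency range.

```python
F_MIN_MHZ = 350       # lowest operating point in the example
F_MAX_MHZ = 1000      # highest operating point in the example
TARGET_PERCENT = 80   # utilization the governor aims for


def required_freq_mhz(load_percent_at_fmax):
    """Frequency at which the given load reaches the target utilization.

    load_percent_at_fmax is the CPU load measured at F_MAX_MHZ.
    The result is clamped to the range the package supports.
    """
    wanted = load_percent_at_fmax * F_MAX_MHZ / TARGET_PERCENT
    return min(max(wanted, F_MIN_MHZ), F_MAX_MHZ)


if __name__ == "__main__":
    # Spread: two 10% tasks per CPU would need only 250 MHz,
    # so the package stays at the 350 MHz floor.
    print(required_freq_mhz(2 * 10))
    # Packed: six 10% tasks on one CPU need 60 / 80 of 1 GHz.
    print(required_freq_mhz(6 * 10))
```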

When a package operates on a single frequency domain, all CPUs in that
package share frequency and voltage.

Depending upon the SoC implementation there can be a significant amount
of energy lost to leakage from idle CPUs. The decision about how
loaded a CPU must be to be considered 'full' is therefore controllable
through sysfs (/sys/kernel/hmp/packing_limit) and directly in the code.

Continuing the example, let's set packing_limit to 450 which means we
will pack tasks until the total load of all running tasks >= 450. In
practice, this is very similar to a 55% idle 1GHz CPU.

Now we are only able to place 4 tasks on CPU0, and two will overflow
onto CPU1. CPU0 will have a load of 40% and CPU1 will have a load of
20%. In order to still hit 80% utilization, CPU0 now only needs to
operate at (0.5 = 0.4 / 0.8) 500MHz. That is above the 350MHz of the
non-packing case but well below the ~750MHz needed when all six tasks
are packed onto one CPU, and CPU2 is no longer needed and can be
power-gated.
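
The task counts in this part of the example follow from simple
arithmetic on the load scale. The sketch assumes that a fully busy
1GHz CPU has a tracked load of 1024 and that a task is only placed
while the resulting load stays at or below packing_limit; neither is
stated in the text, and the variable names are illustrative.

```python
LOAD_SCALE = 1024      # assumption: tracked load of a fully busy CPU
PACKING_LIMIT = 450    # value chosen in the example
NUM_TASKS = 6          # tasks in the example
TASK_LOAD = LOAD_SCALE // 10   # each task is about 10% load

# Fraction of a 1GHz CPU that may be busy before it counts as full.
busy_fraction = PACKING_LIMIT / LOAD_SCALE

# Whole tasks that fit on CPU0 without exceeding the limit.
tasks_on_cpu0 = min(NUM_TASKS, PACKING_LIMIT // TASK_LOAD)

# The remainder overflows onto the next CPU.
tasks_on_cpu1 = NUM_TASKS - tasks_on_cpu0

if __name__ == "__main__":
    print(f"limit is {busy_fraction:.0%} busy")
    print(tasks_on_cpu0, tasks_on_cpu1)
```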

In order to use less energy, the saving from power-gating CPU2 must be
more than the energy spent running CPU0 for the extra cycles. This
depends upon the SoC implementation.

This is obviously a contrived example requiring all the tasks to
be runnable at the same time, but it illustrates the point.


Wakeup Latency
----
Increased wakeup latency is an unavoidable consequence of trying to
pack tasks together rather than giving them a CPU each. If you cannot
find an acceptable level of wakeup latency, you should turn packing off.

Cyclictest is a good test application for determining the added latency
when configuring packing.


Why is it turned off for the VersatileExpress V2P_CA15A7 CoreTile?
----
Simply, this core tile only has power gating for the whole A7 package.
When small task packing is enabled, all our low-energy use cases
normally fit onto one A7 CPU. We therefore end up with 2 mostly-idle
CPUs and one mostly-busy CPU. This decreases the amount of time
during which the whole package is idle and can be turned off.
