Patch: cpu utilization monitor.

Thu Mar 18 08:41:24 EST 2004

Greg is right and so is Mike...

CKRM's province is a single Linux kernel image. It controls whatever the
image gets from the hypervisor and cannot, by itself, give users "true"
control. For that you need workload management middleware (like eWLM)
whose control spans LPARs.

However, it can report whatever a kernel entity is willing to provide.
In CKRM's current design, we have a number that represents the "100%" of
a resource (it's not 100, for reasons that are not relevant here). This
number is currently irrelevant to the user, except as an upper limit on
what can be distributed amongst the defined classes: together they must,
obviously, consume no more than 100% of the available resource. It would
be easy enough to modify that number dynamically to represent the real
fraction being served to an OS image, provided CKRM and the consumer of
that number (middleware or sysadmin) agree on the units.
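
To make the scaling idea concrete, here is a rough sketch (Python, purely
for illustration, not CKRM code). Everything in it is an assumption: the
constant CKRM_TOTAL stands in for the internal "100%" number, whose real
value isn't given here, and both function names are hypothetical. It
only shows how a hypervisor-reported fraction could shrink the total and
how a class share would then map onto physical CPU.

```python
# Illustrative sketch only. CKRM_TOTAL and both functions are hypothetical
# names; the real internal "100%" value is not stated in this thread.
CKRM_TOTAL = 10_000  # assumed placeholder for the "100%" of a resource


def _check_fraction(physical_fraction):
    """Reject fractions outside the closed interval [0, 1]."""
    if not 0.0 <= physical_fraction <= 1.0:
        raise ValueError("physical_fraction must be within [0, 1]")


def effective_total(physical_fraction, total=CKRM_TOTAL):
    """Scale the "100%" number by the fraction of the physical resource
    that the hypervisor is actually serving to this OS image."""
    _check_fraction(physical_fraction)
    return round(total * physical_fraction)


def class_physical_share(class_share, physical_fraction, total=CKRM_TOTAL):
    """Fraction of the physical resource a class is entitled to, given its
    share of the image-local total (expressed in the same units)."""
    _check_fraction(physical_fraction)
    if not 0 <= class_share <= total:
        raise ValueError("class_share must be within [0, total]")
    return (class_share / total) * physical_fraction
```

With these assumed units, an image getting a quarter of the physical CPU
would see its total drop from 10000 to 2500, and a class holding 4000 of
the 10000 would be entitled to about a tenth of the physical CPU.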

Manish/Linas, if you're writing the entity that determines the real
fraction, there's no duplication of effort. If you're getting into
reporting it to higher-level users (human or software), there might be:
we currently have two kernel-to-user paths for sending such info up to
the user (one for manual users of CKRM, one for middleware). We'll be
doing a code drop on lkml in the next day or two, so you'll be able to
determine for yourself.

Up in user space, CKRM's tooling is rudimentary. With the new filesystem
API that we're using, it's even more likely we'll be leaning towards
scripts initially.

Naturally, we'd be happy to discuss all this further. The CKRM project
has quite a few high-priority items on its plate, so integration with
other projects (such as cpumemsets for NUMA, or yours for LPAR) isn't
a priority yet. But if we keep in sync at a high level, it may be
possible to avoid duplication and incompatible design choices.

Hope this helps,
Shailabh

ahuja at austin.ibm.com wrote:

>Thanks for the comments everyone.
>
>Like Linas said earlier, the value reported by the OS, whether the cpu
>is 100% busy or 50% busy, no longer bears any relation to the actual
>physical CPU allocated to it.
>
>I am attempting to normalize the value that the OS reports to the actual
>cpu use and give a more accurate picture to other tools/user space. Now
>there are a couple of different requirements and I hope to get to all of
>them as this progresses.
>
>I will try to rectify the code based on the comments I have received so
>far. I did give CKRM a cursory glance, and I'm not sure that I am
>duplicating effort here. But let me look further into that.
>
>Thanks,
>Manish
>
>On Wed, 17 Mar 2004, Mike Kravetz wrote:
>
>>On Wed, Mar 17, 2004 at 11:13:59AM -0800, Dave Hansen wrote:
>>
>>>On Wed, 2004-03-17 at 10:56, linas at austin.ibm.com wrote:
>>>
>>>>This patch differs from other efforts in that it gets data directly from
>>>>the hypervisor.  Think multiple virtual cpus running on one physical cpu.
>>>>The traditional tools, whether CKRM or top or vmstat, are blind to the
>>>>fact that any given 'virtual cpu' might be getting only 10% of the physical
>>>>cycles in one hypervisor time-slice, and 90% in another.
>>>>
>>>>Very crudely, it's sort of like VM on the 390/zSeries.  Your kernel may
>>>>think it's 100% busy, but in fact it might be getting only 1% of the actual
>>>>physical hardware cycles.  The goal here is to be able to report the
>>>>fraction of the total physical cycles, and do so on a HZ or even sub-HZ
>>>>level of granularity.
>>>>
>>>But, the number is still just another performance counter, right?  Is
>>>the interface to fetch it the same as the other CPU performance
>>>counters?
>>>
>>>I think what Greg was getting at is that CKRM aims to be able to make
>>>resource decisions based on data it gets from all kinds of sources,
>>>including performance counters.  If you export this 'virtual cpu' slice
>>>in the same way that other CKRM-handled data are, then you can probably
>>>access it in whatever way you wanted, and you get the code reuse benefit
>>>of using the rest of the CKRM work.  Shailabh, am I on the right track
>>>here?  I'm kinda guessing at what the CKRM goals are here.
>>>
>>>What is the planned use of this counter?  Will it simply be exported to
>>>userspace, or will the kernel need it internally for something?
>>>
>>Actually, this type of data sounds like something that (forgive me
>>for mentioning this!!!) the IBM eWLM product would want to know.
>>I don't think CKRM, or the OS, can do much with this type of data
>>except report it for further analysis.  More interesting is what
>>something that, let's say, 'controls the entire machine' can do with
>>this data.  For example, one OS isn't getting enough CPU cycles
>>and another OS has excess cycles.  Let's turn the knobs to balance
>>things out at the machine/hypervisor level.
>>
>>Perhaps this is what was meant by Linas's original reference to
>>'on demand'?
>>
>>--
>>Mike

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/