G5 Xserve rackmeter broken?

Aaro Koskinen aaro.koskinen at iki.fi
Wed May 13 03:55:55 AEST 2015


Hi,

On Mon, May 11, 2015 at 08:13:35AM +1000, Benjamin Herrenschmidt wrote:
> On Sun, 2015-05-10 at 21:32 +0300, Aaro Koskinen wrote:
> > Hi,
> > 
> > With 4.1-rc2 the rackmeter driver for G5 Xserve is giving bogus
> > led patterns. So far I have seen at least the following:
> > 
> > a) With static load the leds seems to be sane and report CPU
> > usage properly, but after few minutes they go completely OFF,
> > even if the CPU load remains high.
> > 
> > b) On a completely idle system, leds alter between all OFF and all ON
> > roughly once a second.
> > 
> > Unfortunately I cannot say which was the last kernel where this worked
> > properly... These servers were away from normal use for a while due
> > to PSU issues.
> 
> And mine is dead due to ... PSU issue :-(

I had 4 dead servers, of which I have now managed to get 2 running
again by recapping the PSU.

> It could be that what we use to get the "idle time" isn't correct
> anymore...

It seems the idle ticks sometimes exceed the total ticks and mess up
the load calculation. That would explain at least the behaviour in
case b). I added the following debug patch:

@@ -234,6 +234,10 @@ static void rackmeter_do_timer(struct work_struct *work)
         */
        load = (9 * (total_ticks - idle_ticks)) / total_ticks;

+       if (load > 10)
+               pr_err("load: %d total: %u idle: %u\n", load,
+                       total_ticks, idle_ticks);
+
        offset = cpu << 3;
        cumm = 0;
        for (i = 0; i < 8; i++) {

Which shows:

[  795.832701] load: 515 total: 8333333 idle: 8333661
[  796.792767] load: 515 total: 8333333 idle: 8333551
[  796.832770] load: 515 total: 8333333 idle: 8333656
[  797.292799] load: 515 total: 8333334 idle: 8333532
[  798.082856] load: 515 total: 8333334 idle: 8333591
[  798.792903] load: 515 total: 8333333 idle: 8333424
[  798.832909] load: 515 total: 8333333 idle: 8333571
[  799.292937] load: 515 total: 8333334 idle: 8333459
[  799.832973] load: 515 total: 8333333 idle: 8333551
[  800.793038] load: 515 total: 8333333 idle: 8333414
[  800.833045] load: 515 total: 8333333 idle: 8333583
[  801.293071] load: 515 total: 8333334 idle: 8333455
[  801.833107] load: 515 total: 8333333 idle: 8333564
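
The constant 515 looks consistent with unsigned wraparound. Below is a
minimal userspace sketch of the arithmetic, assuming both tick counters
are 32-bit unsigned (the %u in the debug print suggests so). raw_load()
is only an illustrative name, not a function from the driver:

```c
#include <stdint.h>

/*
 * Userspace sketch of the load formula from the hunk above.
 * Assumption: both tick counters are 32-bit unsigned, so the
 * subtraction and the multiplication wrap modulo 2^32.
 * raw_load() is a hypothetical helper, not a driver function.
 */
static int raw_load(uint32_t total_ticks, uint32_t idle_ticks)
{
	/* Wraps to a huge value when idle_ticks > total_ticks. */
	uint32_t busy = total_ticks - idle_ticks;
	/* The multiplication wraps modulo 2^32 as well. */
	uint32_t scaled = 9u * busy;

	return (int)(scaled / total_ticks);
}
```

With total = 8333333 and idle = 8333661, busy wraps to 2^32 - 328,
scaled wraps to 2^32 - 2952 = 4294964344, and dividing that by 8333333
gives 515, matching the log.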

I'm running with HZ=100, so the excess is probably still within
jiffy resolution. Perhaps the calculation should first do
idle = min(idle, total)?
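
Roughly what I have in mind, as a standalone sketch rather than a tested
patch. clamped_load() is a made-up name, the counters are assumed to be
32-bit unsigned, and total_ticks is assumed to be non-zero:

```c
#include <stdint.h>

/*
 * Sketch of the proposed clamp, written as a standalone function.
 * clamped_load() is a hypothetical name; in the driver the same
 * effect would come from clamping idle_ticks to total_ticks
 * before the existing load calculation.
 * Assumption: total_ticks is non-zero.
 */
static int clamped_load(uint32_t total_ticks, uint32_t idle_ticks)
{
	/* Never let idle exceed total, so the subtraction cannot wrap. */
	if (idle_ticks > total_ticks)
		idle_ticks = total_ticks;

	return (int)((9u * (total_ticks - idle_ticks)) / total_ticks);
}
```

With the values from the log (total = 8333333, idle = 8333661) this
yields a load of 0 instead of 515, which is what an idle system should
report.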

A.

