G5 Xserve rackmeter broken?
Aaro Koskinen
aaro.koskinen at iki.fi
Wed May 13 03:55:55 AEST 2015
Hi,
On Mon, May 11, 2015 at 08:13:35AM +1000, Benjamin Herrenschmidt wrote:
> On Sun, 2015-05-10 at 21:32 +0300, Aaro Koskinen wrote:
> > Hi,
> >
> > With 4.1-rc2 the rackmeter driver for G5 Xserve is giving bogus
> > led patterns. So far I have seen at least the following:
> >
> > a) With static load the leds seems to be sane and report CPU
> > usage properly, but after few minutes they go completely OFF,
> > even if the CPU load remains high.
> >
> > b) On a completely idle system, leds alter between all OFF and all ON
> > roughly once a second.
> >
> > Unfortunately I cannot say which was the last kernel where this worked
> > properly... These servers were away from normal use for a while due
> > to PSU issues.
>
> And mine is dead due to ... PSU issue :-(
I had 4 dead servers, of which I have now managed to get 2 running
again by recapping the PSU.
> It could be that what we use to get the "idle time" isn't correct
> anymore...
It seems that the idle ticks sometimes exceed the total ticks and mess
up the load calculation. This would explain at least the behaviour in
case b).
I added the following debug patch:
@@ -234,6 +234,10 @@ static void rackmeter_do_timer(struct work_struct *work)
 	 */
 	load = (9 * (total_ticks - idle_ticks)) / total_ticks;
+	if (load > 10)
+		pr_err("load: %d total: %u idle: %u\n", load,
+		       total_ticks, idle_ticks);
+
 	offset = cpu << 3;
 	cumm = 0;
 	for (i = 0; i < 8; i++) {
Which shows:
[ 795.832701] load: 515 total: 8333333 idle: 8333661
[ 796.792767] load: 515 total: 8333333 idle: 8333551
[ 796.832770] load: 515 total: 8333333 idle: 8333656
[ 797.292799] load: 515 total: 8333334 idle: 8333532
[ 798.082856] load: 515 total: 8333334 idle: 8333591
[ 798.792903] load: 515 total: 8333333 idle: 8333424
[ 798.832909] load: 515 total: 8333333 idle: 8333571
[ 799.292937] load: 515 total: 8333334 idle: 8333459
[ 799.832973] load: 515 total: 8333333 idle: 8333551
[ 800.793038] load: 515 total: 8333333 idle: 8333414
[ 800.833045] load: 515 total: 8333333 idle: 8333583
[ 801.293071] load: 515 total: 8333334 idle: 8333455
[ 801.833107] load: 515 total: 8333333 idle: 8333564
I'm running with HZ=100, so the excess is probably still within
jiffy resolution; perhaps the calculation should first do
idle = min(idle, total)?
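
For illustration, here is a small standalone sketch of why the log shows
load: 515, and what the clamp would do. It assumes the tick counters are
32-bit unsigned (suggested by the %u format above, but not checked against
the driver); the helper names load_unclamped() and load_clamped() are made
up for this example and do not exist in the driver.

```c
#include <stdint.h>

/*
 * Load as computed in the driver line quoted above, assuming 32-bit
 * unsigned tick counters. If idle_ticks > total_ticks, the subtraction
 * wraps around modulo 2^32 instead of going negative.
 * total_ticks is assumed to be non-zero.
 */
static uint32_t load_unclamped(uint32_t total_ticks, uint32_t idle_ticks)
{
	return (9u * (total_ticks - idle_ticks)) / total_ticks;
}

/*
 * Same calculation with the proposed clamp, idle = min(idle, total),
 * applied first, so the result stays in the range 0..9.
 */
static uint32_t load_clamped(uint32_t total_ticks, uint32_t idle_ticks)
{
	if (idle_ticks > total_ticks)
		idle_ticks = total_ticks;
	return (9u * (total_ticks - idle_ticks)) / total_ticks;
}
```

With the first logged sample (total 8333333, idle 8333661) the difference
is -328, which wraps to 2^32 - 328; multiplied by 9 modulo 2^32 and divided
by 8333333 this gives 515, matching the log. With the clamp the same sample
gives 0.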
A.