G5 fan problems return moving to 2.6.15 with dual processor 2.7GHz machine

Brian D. Carlstrom bdc at carlstrom.com
Sun Feb 5 17:10:48 EST 2006


I've been having problems with overheating on two of my three
dual-processor 2.7GHz machines running Fedora Core 4's 2.6.14 kernels
since November. Because of a pending deadline and the time of year, I
simply opened the window and let nature cool the machines.

In early January, I saw the therm_pm72.c fix in 2.6.15
    http://ozlabs.org/pipermail/linuxppc64-dev/2006-January/007299.html
    [PATCH] powerpc: more g5 overtemp problem fix
I tried Fedora's updates-testing 2.6.15 kernel to get the fix.
That caused the fans to blow full blast like the old days. That was
still better than leaving the window open, which had its own problems:
storm winds closing the windows, unhappy facilities managers, and
"helpful" co-workers closing the windows for me.

Finally last week I had some time to work on this. My first step was to
backport the therm_pm72.c fix from 2.6.15 to 2.6.14, and that worked
like a charm, allowing humans with hearing to inhabit the office again.
I'm running CPU simulations 24/7 on these machines, and without the fix
they were powering off once a day or more, although I had set them up
to reboot instead with a /sbin/critical_overtemp script that called
"reboot -f".

However, even after reporting the problems with the 2.6.15
updates-testing kernel, Fedora Core released the 2.6.15 kernel update
anyway. I decided to try and debug what is going on since other people
are going to start seeing this issue.

Comparing the dmesg output from 2.6.14 and 2.6.15, both boots start
with the following:

    PowerMac G5 Thermal control driver 1.2b2
    Detected fan controls:
      0: PWM fan, id 1, location: BACKSIDE,SYS CTRLR FAN
      1: RPM fan, id 2, location: DRIVE BAY
      2: PWM fan, id 2, location: SLOT,PCI FAN
      3: RPM fan, id 3, location: CPU A INTAKE
      4: RPM fan, id 4, location: CPU A EXHAUST
      5: RPM fan, id 5, location: CPU B INTAKE
      6: RPM fan, id 6, location: CPU B EXHAUST
      7: RPM fan, id 1, location: CPU A PUMP
      8: RPM fan, id 0, location: CPU B PUMP

However, 2.6.14 has the following additional line, which I've come to
expect on the 2.5GHz and 2.7GHz machines, although not on the 2.0GHz
machines of course:
    Liquid cooling pumps detected, using new algorithm !

I decided to do a little more debugging before reporting this. I built
the driver with "#define DEBUG" and added some additional DBG tracing
messages (marked "XXX bdc" below). Here is the output with therm_pm72
built into the kernel, not as a module:

Feb  4 12:19:06 youngmc kernel: Detected fan controls:
Feb  4 12:19:06 youngmc kernel:   0: PWM fan, id 1, location: BACKSIDE,SYS CTRLR FAN
Feb  4 12:19:06 youngmc kernel:   1: RPM fan, id 2, location: DRIVE BAY
Feb  4 12:19:06 youngmc kernel:   2: PWM fan, id 2, location: SLOT,PCI FAN
Feb  4 12:19:06 youngmc kernel:   3: RPM fan, id 3, location: CPU A INTAKE
Feb  4 12:19:06 youngmc kernel:   4: RPM fan, id 4, location: CPU A EXHAUST
Feb  4 12:19:06 youngmc kernel:   5: RPM fan, id 5, location: CPU B INTAKE
Feb  4 12:19:06 youngmc kernel:   6: RPM fan, id 6, location: CPU B EXHAUST
Feb  4 12:19:06 youngmc kernel:   7: RPM fan, id 1, location: CPU A PUMP
Feb  4 12:19:06 youngmc kernel:   8: RPM fan, id 0, location: CPU B PUMP
Feb  4 12:19:06 youngmc kernel: XXX bdc therm_pm72_attach
Feb  4 12:19:06 youngmc kernel: XXX bdc therm_pm72_attach adapter->name=monid
Feb  4 12:19:06 youngmc kernel: XXX bdc therm_pm72_attach
Feb  4 12:19:06 youngmc kernel: XXX bdc therm_pm72_attach adapter->name=dvi
Feb  4 12:19:06 youngmc kernel: XXX bdc therm_pm72_attach
Feb  4 12:19:06 youngmc kernel: XXX bdc therm_pm72_attach adapter->name=vga
Feb  4 12:19:06 youngmc kernel: XXX bdc therm_pm72_attach
Feb  4 12:19:06 youngmc kernel: XXX bdc therm_pm72_attach adapter->name=crt2
Feb  4 12:19:06 youngmc kernel: Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
Feb  4 12:19:06 youngmc kernel: ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
Feb  4 12:19:06 youngmc kernel: ide0: Found Apple K2 ATA-6 controller, bus ID 3, irq 39
Feb  4 12:19:06 youngmc kernel: hda: PIONEER DVD-RW DVR-109, ATAPI CD/DVD-ROM drive
Feb  4 12:19:06 youngmc kernel: hda: Enabling Ultra DMA 4
Feb  4 12:19:06 youngmc kernel: ide0 at 0xd000080083656000-0xd000080083656007,0xd000080083656160 on irq 39
Feb  4 12:19:06 youngmc kernel: hda: ATAPI 32X DVD-ROM DVD-R CD-R/RW drive, 2000kB Cache, UDMA(66)
Feb  4 12:19:06 youngmc kernel: Uniform CD-ROM driver Revision: 3.20
Feb  4 12:19:06 youngmc kernel: ide-floppy driver 0.99.newide
Feb  4 12:19:06 youngmc kernel: usbcore: registered new driver libusual
Feb  4 12:19:06 youngmc kernel: usbcore: registered new driver hiddev
Feb  4 12:19:06 youngmc kernel: usbcore: registered new driver usbhid
Feb  4 12:19:06 youngmc kernel: drivers/usb/input/hid-core.c: v2.6:USB HID core driver
Feb  4 12:19:06 youngmc kernel: mice: PS/2 mouse device common for all mice
Feb  4 12:19:06 youngmc kernel: /u3@0,f8000000/i2c@f8001000: Missing interrupt or address !
Feb  4 12:19:06 youngmc kernel: XXX bdc therm_pm72_attach
Feb  4 12:19:06 youngmc kernel: XXX bdc therm_pm72_attach adapter->name=mac-io 0
Feb  4 12:19:06 youngmc kernel: Found K2
Feb  4 12:19:06 youngmc kernel: Found KeyWest i2c on "mac-io", 1 channel, stepping: 4 bits
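
For readers who want to reproduce tracing like the "XXX bdc" lines
above, here is a minimal userspace sketch of the pattern. It is not the
driver code: the struct, the function name bus_matches() and the use of
fprintf in place of printk are all illustrative assumptions, and the
idea that the driver picks buses by comparing adapter names is inferred
from the log rather than quoted from therm_pm72.c.

```c
#include <stdio.h>
#include <string.h>

/* Defining DEBUG turns the DBG() calls below into real output. */
#define DEBUG

#ifdef DEBUG
#define DBG(...) fprintf(stderr, __VA_ARGS__)   /* kernel code would use printk */
#else
#define DBG(...) do { } while (0)
#endif

/* Hypothetical stand-in for the kernel's i2c adapter structure. */
struct adapter_sketch {
    const char *name;
};

/*
 * Hypothetical stand-in for an attach callback: trace every adapter
 * that is offered, then report whether it is the bus we want.
 * Returns 1 on a name match, 0 otherwise.
 */
static int bus_matches(const struct adapter_sketch *adapter, const char *wanted)
{
    DBG("XXX bdc therm_pm72_attach\n");
    DBG("XXX bdc therm_pm72_attach adapter->name=%s\n", adapter->name);
    return strcmp(adapter->name, wanted) == 0;
}
```

Calling bus_matches() once per adapter prints one pair of trace lines
per call, which makes it easy to see which bus names were offered and
which expected one never showed up.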

I'm guessing I should have seen a "found U3-0", but I see a suspicious message here:
    /u3@0,f8000000/i2c@f8001000: Missing interrupt or address !
that I do not see in the working 2.6.14 boot.

I was wondering if the change from module to builtin was causing the
problem (grasping at straws I guess), so I also tried building it as a
module. I get similar results:

Feb  4 12:59:26 youngmc kernel: /u3@0,f8000000/i2c@f8001000: Missing interrupt or address !
Feb  4 12:59:26 youngmc kernel: Found KeyWest i2c on "mac-io", 1 channel, stepping: 4 bits
...
Feb  4 12:59:27 youngmc kernel: Detected fan controls:
Feb  4 12:59:27 youngmc kernel:   0: PWM fan, id 1, location: BACKSIDE,SYS CTRLR FAN
Feb  4 12:59:27 youngmc kernel:   1: RPM fan, id 2, location: DRIVE BAY
Feb  4 12:59:27 youngmc kernel:   2: PWM fan, id 2, location: SLOT,PCI FAN
Feb  4 12:59:27 youngmc kernel:   3: RPM fan, id 3, location: CPU A INTAKE
Feb  4 12:59:27 youngmc kernel:   4: RPM fan, id 4, location: CPU A EXHAUST
Feb  4 12:59:27 youngmc kernel:   5: RPM fan, id 5, location: CPU B INTAKE
Feb  4 12:59:27 youngmc kernel:   6: RPM fan, id 6, location: CPU B EXHAUST
Feb  4 12:59:27 youngmc kernel:   7: RPM fan, id 1, location: CPU A PUMP
Feb  4 12:59:28 youngmc kernel:   8: RPM fan, id 0, location: CPU B PUMP
Feb  4 12:59:28 youngmc kernel: XXX bdc therm_pm72_attach
Feb  4 12:59:28 youngmc kernel: XXX bdc therm_pm72_attach adapter->name=monid
Feb  4 12:59:28 youngmc kernel: XXX bdc therm_pm72_attach
Feb  4 12:59:28 youngmc kernel: XXX bdc therm_pm72_attach adapter->name=dvi
Feb  4 12:59:28 youngmc kernel: XXX bdc therm_pm72_attach
Feb  4 12:59:28 youngmc kernel: XXX bdc therm_pm72_attach adapter->name=vga
Feb  4 12:59:28 youngmc kernel: XXX bdc therm_pm72_attach
Feb  4 12:59:28 youngmc kernel: XXX bdc therm_pm72_attach adapter->name=crt2
Feb  4 12:59:28 youngmc kernel: XXX bdc therm_pm72_attach
Feb  4 12:59:28 youngmc kernel: XXX bdc therm_pm72_attach adapter->name=mac-io 0
Feb  4 12:59:28 youngmc kernel: Found K2

Now this "/u3@0,f8000000/i2c@f8001000: Missing interrupt or address !"
warning I'm seeing in both cases looked familiar. In fact, I was on a
thread about it when the 2.7GHz machines first came out:

    http://patchwork.ozlabs.org/linuxppc64/patch?id=1982

The code that this patch applied to has moved to a new location,
arch/powerpc/kernel/prom_init.c, but logically it still seems like it
should cover my case. The code says:

    if (u3_rev < 0x35 || u3_rev > 0x39)
        return;

and my u3_rev looks to be 0x35:
    $ hexdump /proc/device-tree/u3@0,f8000000/device-rev
    0000000 0000 0035
    0000004
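
To make the reasoning explicit: the property is four bytes, and
hexdump's default output groups them as 16-bit words in host byte
order, so on a big-endian G5 "0000 0035" corresponds to the bytes
00 00 00 35. The sketch below decodes such a property and applies the
same range test as the quoted code. The function names are
illustrative, not taken from prom_init.c, and reading the file from
/proc is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a 4-byte big-endian device-tree property into a host integer. */
static bool decode_be32(const unsigned char *buf, size_t len, uint32_t *out)
{
    if (len != 4)
        return false;
    *out = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
    return true;
}

/*
 * Same test as the quoted snippet, inverted: true means the revision
 * is inside the 0x35..0x39 range, so the early return is not taken.
 */
static bool u3_rev_in_fixup_range(uint32_t u3_rev)
{
    return !(u3_rev < 0x35 || u3_rev > 0x39);
}
```

With the bytes shown above, decode_be32() yields 0x35 and
u3_rev_in_fixup_range() returns true, which is why the quoted check by
itself does not explain the fixup being skipped on this machine.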

Unfortunately, it looks like I need to use prom_print to add debugging,
which I'm guessing only goes to the console, which I'm not near right
now.

Before going further, is there something obvious that the Fedora
2.6.15 kernel is doing wrong, given that the 2.6.14 kernel works and
the 2.6.15 seems to have a regression? I'm willing to do some more
debugging or try a more up-to-date kernel to help resolve this issue.

One last note, my dual processor 2.0GHz and 2.5GHz machines are running
fine with 2.6.15...

-bri


