[BUG] rebuild_sched_domains considered dangerous

Jesse Larrew jlarrew at linux.vnet.ibm.com
Thu May 12 02:17:52 EST 2011


On 05/10/2011 09:09 AM, Peter Zijlstra wrote:
> On Mon, 2011-05-09 at 16:26 -0500, Jesse Larrew wrote:
>>
>> According to the Power firmware folks, updating the home node of a
>> virtual cpu happens rather infrequently. The VPHN code currently
>> checks for topology updates every 60 seconds, but we can poll less
>> frequently if it helps. I chose 60 second intervals simply because
>> that's how often they check the topology on s390. ;-)
> 
> This just makes me shudder, so you poll the state? Meaning that the vcpu
> can actually run 99% of the time on another node?
> 
> What's the point of this if the vcpu scheduler can move the vcpu around
> much faster?
> 

Based on my discussion with the firmware folks, it sounds like the hypervisor will never automatically move vcpus around on its own. The firmware is designed to set the cpu home node at partition boot, then wait for the customer to run a tool to rebalance the affinity. Moving vcpus around costs performance, so they want to let the customer decide when to shuffle the vcpus. 

From the kernel's perspective, we can expect to see occasional batches of vcpus updating at once, after which the topology should remain fixed until the tool is run again.

>> As for updating the memory topology, there are cases where changing
>> the home node of a virtual cpu doesn't affect the memory topology. If
>> it does, there is a separate notification system for memory topology
>> updates that is independent from the cpu updates. I plan to start
>> working on a patch set to enable memory topology updates in the kernel
>> in the coming weeks, but I wanted to get the cpu patches out on the
>> list so we could start having these debates. :) 
> 
> Well, they weren't put out on a list (well maybe on the ppc list but
> that's the same as not posting them from my pov), they were merged (and
> thus declared done); that's not how you normally start a debate.
> 

That's a fair point. At the time, I didn't expect anyone outside of the PPC community to care much about a PPC-specific patch set, but I see now why it's important to keep everyone in the loop. Sorry about that. I'll be sure to send any future patches to LKML as well.

> I would really like to see both patch-sets together. Also, I'm not at
> all convinced it's a sane thing to do. Pretty much all NUMA-aware
> software I know of assumes that CPU<->NODE relations are static,
> breaking that in kernel renders all existing software broken.
> 

I suspect that's true. Then again, shouldn't the capabilities of the hardware dictate what the software does, rather than the other way around?

-- 

Jesse Larrew
Software Engineer, Linux on Power Kernel Team
IBM Linux Technology Center
Phone: (512) 973-2052 (T/L: 363-2052)
jlarrew at linux.vnet.ibm.com

