[3/5] pseries: Create device hotplug entry point

Wed Sep 24 00:43:13 EST 2014

On 09/22/2014 08:15 PM, Tyrel Datwyler wrote:
> On 09/17/2014 12:15 PM, Nathan Fontenot wrote:
>> On 09/17/2014 02:07 AM, Michael Ellerman wrote:
>>>
>>> On Mon, 2014-09-15 at 15:31 -0500, Nathan Fontenot wrote:
>>>> For pseries systems, the kernel will be notified of hotplug requests in
>>>> the form of rtas hotplug events.
>>>
>>> Can you flesh that design out a bit for me, I don't entirely get how it's going
>>> to work.
>>>
>>> The kernel gets the rtas hotplug events (in rtasd.c) and spits them out to
>>> userspace, which then writes them back in ?
>>>
>>>> This patch creates a common routine that can handle these requests in both
>>>> the PowerVM anbd PowerKVM environments, handle_dlpar_errorlog(). This also
>>>                 ^
>>>> creates the initial memory hotplug request handling stub.
>>>>
>>>> For PowerVM this patch also creates a new /proc file that the drmgr
>>>> command will use to write rtas hotplug events to.
>>>
>>> Why is this different between phyp and KVM?
>>>
>>>> For future PowerKVM handling the rtas check-exception code can pass
>>>> any rtas hotplug events received to handle_dlpar_errorlog().
>>>
>>> Internally to the kernel you mean?
>>>
>>
>> Perhaps a better explanation of how things work today and where I see
>> them going is needed. I was trying to avoid a long explanation and I
>> don't think my shortened explanation worked. I'll include this in v2
>> of the patchset too.
>>
>> The current hotplug (or dlpar) of devices (the process is generally the
>> same for memory, cpu, and pci) on PowerVM systems is initiated
>> from the HMC, which communicates the request to the partitions through
>> the RSCT framework. The RSCT framework then invokes the drmgr command.
>> The drmgr command performs the hotplug operation by doing some pieces,
>> such as most of the rtas calls and device tree parsing, in userspace
>> and making requests to the kernel to online/offline the device, update the
>> device tree and add/remove the device.
>>
>> For PowerKVM the approach is to follow what is currently being done for
>> pci hotplug. A hotplug request is initiated from the host. QEMU then
>> sends an EPOW interrupt to the guest which causes the guest to make the
>> rtas,check-exception call. In QEMU, the rtas,check-exception call
>> returns a rtas hotplug event to the guest. I was using this same framework
>> to also enable memory (and next cpu) hotplug.
>>
>> You are correct that the current pci hotplug path for PowerKVM involves
>> the kernel receiving the rtas event, passing it to rtas_errd in userspace,
>> and having rtas_errd invoke drmgr. The drmgr command then handles the request
>> as described above for PowerVM systems.
>>
>> There is no need for this circuitous route; we should just handle the entire
>> hotplug of devices in the kernel. What I am hoping to do is to enable this
>> by moving the code to handle hotplug from drmgr into the kernel and 
>> provide a single path for handling hotplug for PowerVM and PowerKVM. To
>> make this work for PowerKVM we will update the kernel rtas code to
>> recognize rtas hotplug events returned from rtas,check-exception calls
>> and call handle_dlpar_errorlog(). The hotplug rtas event is never sent out
>> to userspace.
> 
> Wouldn't we still want the event surfaced to userspace so that it can at
> least be logged?
> 

The only logging of hotplug/dlpar events we do is putting a notification
in /var/log/messages. This is done today by the drmgr command.

I can add a pr_info message to log the hotplug/dlpar request and its
success/failure.
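
A rough sketch of what that logging could look like, written as standalone
userspace C so it compiles on its own. This is not the patch code: printf
stands in for pr_info, and the struct layout, the enum values, and every
name other than handle_dlpar_errorlog() are hypothetical and do not match
the real rtas hotplug event format.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical resource and action codes, not the real RTAS encodings. */
enum hp_resource { HP_RES_MEM = 1, HP_RES_CPU = 2 };
enum hp_action { HP_ACTION_ADD = 1, HP_ACTION_REMOVE = 2 };

/* Hypothetical, simplified stand-in for an rtas hotplug event. */
struct hp_event {
	enum hp_resource resource;
	enum hp_action action;
	unsigned int drc_index;
};

/* Stub only: memory hotplug handling is not implemented in this sketch. */
static int dlpar_memory(const struct hp_event *ev)
{
	(void)ev;
	return -ENOSYS;
}

static const char *action_name(enum hp_action action)
{
	switch (action) {
	case HP_ACTION_ADD:
		return "add";
	case HP_ACTION_REMOVE:
		return "remove";
	}
	return "unknown";
}

/*
 * Common entry point: log the request, dispatch on the resource type,
 * then log whether the request succeeded or failed.
 */
int handle_dlpar_errorlog(const struct hp_event *ev)
{
	int rc;

	if (!ev)
		return -EINVAL;

	/* In the kernel this would be a pr_info() call. */
	printf("dlpar: %s request, resource %d, drc index 0x%x\n",
	       action_name(ev->action), (int)ev->resource, ev->drc_index);

	switch (ev->resource) {
	case HP_RES_MEM:
		rc = dlpar_memory(ev);
		break;
	default:
		printf("dlpar: unsupported resource %d\n", (int)ev->resource);
		rc = -EINVAL;
		break;
	}

	printf("dlpar: request %s (rc %d)\n", rc ? "failed" : "succeeded", rc);
	return rc;
}
```

With the memory handler still a stub, a memory add request logs the request
and then reports failure with -ENOSYS.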

Also, I believe one of the longer term goals is to not require the rtas_errd
daemon for PowerKVM.

-Nathan