[PATCH] powerpc/powernv: Add opal-prd channel
Benjamin Herrenschmidt
benh at kernel.crashing.org
Fri May 1 18:31:31 AEST 2015
On Fri, 2015-05-01 at 11:46 +0800, Jeremy Kerr wrote:
> Hi Ben,
>
> >> +static LIST_HEAD(opal_prd_msg_queue);
> >> +static DEFINE_SPINLOCK(opal_prd_msg_queue_lock);
> >> +static DECLARE_WAIT_QUEUE_HEAD(opal_prd_msg_wait);
> >> +static atomic_t usage;
> >
> > opal_prd_usage ... otherwise it's a mess in the symbols map
>
> OK, I'll change this.
>
> > Also why limit the number of opens ? we might want to have tools using
> > the opal prd for xscom :-) (in absence of debugfs). .. as long as not
> > two people read() it should be ok. Or a tool to dump the regions etc...
> >
> > I don't see any reason to block multiple open's.
>
> Simplicity, really. We can do a "get exclusive", but there's no
> (current) use-case for multiple openers on a PRD interface.
Sure but if we want to add one we have to change the kernel which is
nasty ... I always try to think a bit ahead when it comes to kernel
interfaces.
> Pulling this thread a little, you've hit on a key decision point of the
> prd design - I see there being two directions we could take with this:
>
> 1) This interface is specifically for PRD functions, or
>
> 2) This interface is a generic userspace interface to OPAL,
> and PRD is a subset of that.
>
> I've been aiming for (1) with the current code; and the nature of the
> generic read() & write() operations being PRD-specific enforces that.
>
> Allowing multiple openers will help with (2), but if we want to go in
> that direction, I think we'd be better off doing a couple of other
> changes too:
>
> * move the general functions (eg xscom, range mappings, OCC control)
> to a separate interface that isn't tied to PRD - say just /dev/opal
Well, there's debugfs but then we don't want to *rely* on that as API
> * using this prd code for only the prd-event handling, possibly
> renamed to /dev/opal-prd-events. This would still need some
> method of enforcing exclusive access.
>
> In this case, the actual PRD application would use both devices,
> dequeueing events (and updating the ipoll mask) from the latter, and
> using the former for helper functionality.
>
> Other tools (eg generic xscom access) would just use the generic
> interface, and not the PRD one, which wouldn't enforce exclusive access.
Or make it all /dev/opal with an ioctl to receive the PRD messages which
only one open fd can do. Keeps things simpler. Ie, rename /dev/prd
to /dev/opal and add _IOC_PRD :-)
> Regardless of the choice here, we could also remove the single-open
> exclusion, and shift that responsibility to userspace (eg, flock() on
> the PRD device node?). The main reason for the exclusion is to prevent
> multiple prd daemons running, which may get messy when updating the
> ipoll mask.
Well, the exclusion on _IOC_PRD that enables reception of PRD messages
works. Unless we want a way to "sniff" PRD messages but that gets harder
if the kernel has to maintain multiple queues so let's not go there.
> > Should we rely exclusively on userspace setting the right permissions or
> > should we check CAP_SYSADMIN here ?
>
> I'm okay with relying on userspace, is there any reason not to?
Not really I suppose. What does /dev/mem do ?
>
> >> + vma->vm_page_prot = phys_mem_access_prot(file, vma->vm_pgoff,
> >> + size, vma->vm_page_prot)
> >> + | _PAGE_SPECIAL;
> >> +
> >> + rc = remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff, size,
> >> + vma->vm_page_prot);
> >
> > Do we still have the warnings of process exist about the map count or is
> > that fixed ?
>
> No, not fixed at present. I'll need to chat to you about that.
Ok, I need to figure out/remember what we need to do to avoid it.
> >> + case OPAL_PRD_SCOM_READ:
> >> + rc = copy_from_user(&scom, (void __user *)param, sizeof(scom));
> >> + if (rc)
> >> + return -EFAULT;
> >> +
> >> + rc = opal_xscom_read(scom.chip, scom.addr,
> >> + (__be64 *)&scom.data);
> >
> > Are we exporting these for modules ?
>
> No, but opal-prd isn't configurable as a module at the moment.
Why ?
> >
> >> + scom.data = be64_to_cpu(scom.data);
> >> + pr_debug("ioctl SCOM_READ: chip %llx addr %016llx "
> >> + "data %016llx rc %d\n",
> >> + scom.chip, scom.addr, scom.data, rc);
> >
> > pr_devel ?
>
> This removes the possibility of CONFIG_DYNAMIC_DEBUG, is that intentional?
Too noisy. Enabling DYNAMIC_DEBUG doesn't mean I want to have my log
flooded with the thousands of SCOMs that PRD is going to do.
> >
> >> + if (rc)
> >> + return -EIO;
> >
> > Should we consider returning more info about the SCOM error ? HBRT might
> > actually need that... Maybe opal_prd_scom needs a field for the OPAL rc
> > which is currently not very descriptive but that's fixable.
>
> Sounds good, I'll add that in. On error, we'll return -EIO and have the
> OPAL error code in the struct for further detail.
No, don't return -EIO, that would indicate that you didn't update the
structure. Return 0 and put the error code in the structure.
> >> + nr_ranges = of_property_count_strings(np, "reserved-names");
> >> + ranges_prop = of_get_property(np, "reserved-ranges", NULL);
> >> + if (!ranges_prop) {
> >> + of_node_put(np);
> >> + return -ENODEV;
> >> + }
> >
> > Didn't we say we had a problem with using those properties due to
> > coalescing ? Shouldn't we define specific ones for the HBRT regions ?
>
> There's not a problem at the moment, but one day we will need to expand
> the PRD's get_reserved_mem interface to allow per-chip ranges. This
> would use a different device-tree representation.
>
> However, I think it'd be better to remove this code entirely (ie, remove
> the range member of struct opal_prd_info), and require userspace to do
> the device-tree parsing.
But that means /dev/prd just grew a generic "mmap any piece of memory"
capability ... Oh well.
> >> +static int __init opal_prd_init(void)
> >> +{
> >> + int rc;
> >> +
> >> + /* parse the code region information from the device tree */
> >> + rc = parse_regions();
> >> + if (rc) {
> >> + pr_err("Couldn't parse region information from DT\n");
> >> + return rc;
> >> + }
> >
> > Should we create a virtual device under the OPAL node in FW so we have
> > something to attach to ? That way we get module autoload as well...
>
> Can do, if we want to support modules...
Or if we make this /dev/opal, just attach to the ibm,opal node itself
and make it a platform device like we do for i2c etc...
> >> + rc = opal_message_notifier_register(OPAL_MSG_PRD, &opal_prd_event_nb);
> >> + if (rc) {
> >> + pr_err("Couldn't register event notifier\n");
> >> + return rc;
> >> + }
> >> +
> >> + rc = misc_register(&opal_prd_dev);
> >> + if (rc) {
> >> + pr_err("failed to register miscdev\n");
> >> + return rc;
> >> + }
> >> +
> >> + return 0;
> >> +}
> >> +
> >> +static void __exit opal_prd_exit(void)
> >> +{
> >> + misc_deregister(&opal_prd_dev);
> >> + opal_message_notifier_unregister(OPAL_MSG_PRD, &opal_prd_event_nb);
> >> +}
> >
> > Shouldn't you deregister the notifier first ?
>
> Yup, updated.
>
> Cheers,
>
>
> Jeremy
More information about the Linuxppc-dev
mailing list