[PATCH V2] cxl: Export library to support IBM XSL
christophe lombard
clombard at linux.vnet.ibm.com
Tue Jun 20 00:10:51 AEST 2017
On 15/06/2017 at 14:36, Frederic Barrat wrote:
> Hi Christophe,
>
> A few comments below, nothing major...
>
> On 14/06/2017 at 15:29, Christophe Lombard wrote:
>> This patch exports an in-kernel 'library' API which can be called by
>> other drivers to help interacting with an IBM XSL on a POWER9 system.
>>
>> The XSL (Translation Service Layer) is a stripped down version of the
>> PSL (Power Service Layer) used in some cards such as the Mellanox CX5.
>> Like the PSL, it implements the CAIA architecture, but has a number
>> of differences, mostly in its implementation-dependent registers.
>>
>> The XSL also uses a special DMA cxl mode, which uses a slightly
>> different init sequence for the CAPP and PHB.
>>
>> Changelog[v2]
>> - Rebase to latest upstream.
>> - Return -EFAULT in case of NULL pointer in cxllib_handle_fault().
>> - Reverse parameters when copro_handle_mm_fault() is called.
>>
>
>
> The change log shouldn't be part of the commit message, but below the
> next "---"
>
>
sure.
>> +++ b/drivers/misc/cxl/cxllib.c
>> @@ -0,0 +1,241 @@
>> +/*
>> + * Copyright 2016 IBM Corp.
>
>
> Year needs to be updated
>
>
>> + *
>> + * This program is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU General Public License
>> + * as published by the Free Software Foundation; either version
>> + * 2 of the License, or (at your option) any later version.
>> + */
>> +
>> +#include <linux/hugetlb.h>
>> +#include <linux/sched/mm.h>
>> +#include "cxl.h"
>> +#include <misc/cxllib.h>
>> +#include <asm/pnv-pci.h>
>
> Ordering of the #include is messy:
> #include <linux/hugetlb.h>
> #include <linux/sched/mm.h>
> #include <asm/pnv-pci.h>
> #include <misc/cxllib.h>
> #include "cxl.h"
>
>
>
>> +int cxllib_set_device_dma(struct pci_dev *dev, unsigned long flags)
>> +{
>> + int rc;
>> +
>> + if (flags)
>> + return -EINVAL;
>> +
>> + rc = dma_set_mask(&dev->dev, DMA_BIT_MASK(64));
>> + return rc;
>> +}
>> +EXPORT_SYMBOL_GPL(cxllib_set_device_dma);
>
>
> A comment in cxllib_set_device_dma() would help:
> /*
> * When switching the PHB to capi mode, the TVT#1 entry for
> * the Partitionable Endpoint is set in bypass mode, like
> * in PCI mode.
> * Configure the device dma to use TVT#1, which is done
> * by calling dma_set_mask() with a mask large enough.
> */
>
>
>
>> +
>> +int cxllib_get_PE_attributes(struct task_struct *task,
>> + unsigned long translation_mode, struct cxllib_pe_attributes *attr)
>> +{
>> + struct mm_struct *mm = NULL;
>> +
>> + if (translation_mode != CXL_TRANSLATED_MODE &&
>> + translation_mode != CXL_REAL_MODE)
>> + return -EINVAL;
>> +
>> + attr->sr = cxl_calculate_sr(false /* master */,
>> + task == NULL /* kernel ctx */,
>> + translation_mode == CXL_REAL_MODE,
>> + true /* p9 */);
>> + attr->lpid = mfspr(SPRN_LPID);
>> + if (task) {
>> + mm = get_task_mm(task);
>> + if (mm == NULL)
>> + return -EINVAL;
>> + /*
>> + * Caller is keeping a reference on mm_users for as long
>> + * as XSL uses the memory context
>> + */
>> + attr->pid = mm->context.id;
>> + mmput(mm);
>> + } else {
>> + attr->pid = 0;
>> + }
>> + attr->tid = 0;
>
>
> We'll need to remember to patch that function as well (attr->tid)
> when we add support for as_notify (even though Mellanox is not
> expected to use as_notify in capi mode, just pci I believe)
>
>
yep.
>
>> +int cxllib_handle_fault(struct mm_struct *mm, u64 addr, u64 size, u64 flags)
>> +{
>> + int rc;
>> + u64 dar;
>> + struct vm_area_struct *vma = NULL;
>> + unsigned long page_size;
>> +
>> + if (mm == NULL)
>> + return -EFAULT;
>> +
>> + down_read(&mm->mmap_sem);
>> +
>> + for (dar = addr; dar < addr + size; dar += page_size) {
>> + if (!vma || dar < vma->vm_start || dar > vma->vm_end) {
>> + vma = find_vma(mm, addr);
>> + if (!vma) {
>> + pr_err("Can't find vma for addr %016llx\n", addr);
>> + rc = -EFAULT;
>> + goto out;
>> + }
>> + /* get the size of the pages allocated */
>> + page_size = vma_kernel_pagesize(vma);
>> + }
>> +
>> + rc = cxl_handle_page_fault(true, mm, flags, dar);
>
>
> Why do we pass "true" for kernel parameter?
> Actually do we even need a kernel input parameter for
> cxl_handle_page_fault() ? It seems that we can infer it based on mm.
> If NULL, then we are in kernel space.
>
>
You are right. Previously, the test was based on the ctx->kernel field,
which explains the kernel parameter.
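Fred's point can be sketched in a few lines (a compilable toy, with struct mm_struct stubbed out and `is_kernel_context` a hypothetical helper name — in the driver the real type comes from the kernel headers):

```c
/* Minimal, self-contained sketch: infer the kernel-context flag from
 * mm instead of passing it explicitly. struct mm_struct is a stub
 * here, for illustration only. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mm_struct { int dummy; };  /* stub; the real struct lives in the kernel */

/* A NULL mm means the fault is being handled for a kernel context,
 * so cxl_handle_page_fault() would not need a kernel_context argument. */
bool is_kernel_context(const struct mm_struct *mm)
{
	return mm == NULL;
}
```

With that inference, the `kernel_context` parameter added in this patch could be dropped and the `!kernel_context` test replaced by `mm != NULL`.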
>> diff --git a/drivers/misc/cxl/fault.c b/drivers/misc/cxl/fault.c
>> index c79e39b..9db63f3 100644
>> --- a/drivers/misc/cxl/fault.c
>> +++ b/drivers/misc/cxl/fault.c
>> @@ -132,18 +132,16 @@ static int cxl_handle_segment_miss(struct cxl_context *ctx,
>> return IRQ_HANDLED;
>> }
>>
>> -static void cxl_handle_page_fault(struct cxl_context *ctx,
>> - struct mm_struct *mm, u64 dsisr, u64 dar)
>> +int cxl_handle_page_fault(bool kernel_context,
>> + struct mm_struct *mm, u64 dsisr, u64 dar)
>> {
>> unsigned flt = 0;
>> int result;
>> unsigned long access, flags, inv_flags = 0;
>>
>> - trace_cxl_pte_miss(ctx, dsisr, dar);
>> -
>> if ((result = copro_handle_mm_fault(mm, dar, dsisr, &flt))) {
>> pr_devel("copro_handle_mm_fault failed: %#x\n", result);
>> - return cxl_ack_ae(ctx);
>> + return result;
>> }
>>
>> if (!radix_enabled()) {
>> @@ -156,7 +154,7 @@ static void cxl_handle_page_fault(struct cxl_context *ctx,
>> access |= _PAGE_WRITE;
>>
>> access |= _PAGE_PRIVILEGED;
>> - if ((!ctx->kernel) || (REGION_ID(dar) == USER_REGION_ID))
>> + if (!kernel_context || (REGION_ID(dar) == USER_REGION_ID))
>> access &= ~_PAGE_PRIVILEGED;
>>
>> if (dsisr & DSISR_NOHPTE)
>> @@ -166,8 +164,7 @@ static void cxl_handle_page_fault(struct cxl_context *ctx,
>> hash_page_mm(mm, dar, access, 0x300, inv_flags);
>> local_irq_restore(flags);
>> }
>> - pr_devel("Page fault successfully handled for pe: %i!\n", ctx->pe);
>> - cxl_ops->ack_irq(ctx, CXL_PSL_TFC_An_R, 0);
>> + return 0;
>> }
>>
>> /*
>> @@ -261,9 +258,15 @@ void cxl_handle_fault(struct work_struct *fault_work)
>>
>> if (cxl_is_segment_miss(ctx, dsisr))
>> cxl_handle_segment_miss(ctx, mm, dar);
>> - else if (cxl_is_page_fault(ctx, dsisr))
>> - cxl_handle_page_fault(ctx, mm, dsisr, dar);
>> - else
>> + else if (cxl_is_page_fault(ctx, dsisr)) {
>> + trace_cxl_pte_miss(ctx, dsisr, dar);
>> + if (cxl_handle_page_fault(ctx->kernel, mm, dsisr, dar)) {
>> + cxl_ack_ae(ctx);
>> + } else {
>> + pr_devel("Page fault successfully handled for pe: %i!\n", ctx->pe);
>> + cxl_ops->ack_irq(ctx, CXL_PSL_TFC_An_R, 0);
>> + }
>
>
> Could we have that code in a wrapper before calling
> cxl_handle_page_fault()? It would keep the code cleaner and in line
> with what we do for cxl_handle_segment_miss().
>
We can try to do that.
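One way to structure the wrapper Fred suggests — a sketch only, with the kernel types and helpers (cxl_context, cxl_ack_ae, ack_irq, the fault handler itself) replaced by minimal stubs so the control flow can be shown self-contained; the wrapper name is hypothetical:

```c
/* Model of a page-fault wrapper that keeps the trace/ack plumbing in
 * one helper, so cxl_handle_fault() stays symmetric with the
 * cxl_handle_segment_miss() path. Everything below is a stand-in
 * stub; only the control flow mirrors the patch. */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct cxl_context { int kernel; int pe; };

const char *last_ack;   /* records which ack path ran */
int fake_fault_rc;      /* return value of the stubbed fault handler */

void cxl_ack_ae(struct cxl_context *ctx) { (void)ctx; last_ack = "ae"; }
void ack_irq(struct cxl_context *ctx)    { (void)ctx; last_ack = "irq"; }

/* stand-in for cxl_handle_page_fault(); non-zero means failure */
int cxl_handle_page_fault(int kernel, void *mm,
			  unsigned long long dsisr, unsigned long long dar)
{
	(void)kernel; (void)mm; (void)dsisr; (void)dar;
	return fake_fault_rc;
}

/* the wrapper: ack error on failure, ack irq on success */
void cxl_handle_page_fault_wrapped(struct cxl_context *ctx, void *mm,
				   unsigned long long dsisr,
				   unsigned long long dar)
{
	if (cxl_handle_page_fault(ctx->kernel, mm, dsisr, dar)) {
		cxl_ack_ae(ctx);
	} else {
		printf("Page fault successfully handled for pe: %i!\n", ctx->pe);
		ack_irq(ctx);
	}
}
```

cxl_handle_fault() would then call this wrapper exactly the way it calls cxl_handle_segment_miss(), and the exported cxllib path keeps calling the bare cxl_handle_page_fault().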
>
>
>> +++ b/include/misc/cxllib.h
>> @@ -0,0 +1,132 @@
>> +/*
>> + * Copyright 2016 IBM Corp.
>
>
> Year update.
>
>
>> +/*
>> + * Get the Process Element structure for the given thread
>> + *
>> + * Input:
>> + * pid: points the struct pid for the given thread (i.e. linux pid)
>> + * translation_mode: whether addresses should be translated
>> + */
>> +struct cxllib_pe_attributes {
>> + u64 sr;
>> + u32 lpid;
>> + u32 tid;
>> + u32 pid;
>> +};
>> +#define CXL_TRANSLATED_MODE 0
>> +#define CXL_REAL_MODE 1
>> +
>> +int cxllib_get_PE_attributes(struct task_struct *task,
>> + unsigned long translation_mode, struct cxllib_pe_attributes *attr);
>
>
> Description in comment no longer matches reality.
> /*
> * Get the Process Element structure for the given thread
> *
> * Input:
> * task: task_struct for the context of the translation
> * translation_mode: whether addresses should be translated
> * Output:
> * attr: attributes to fill up the Process Element structure from CAIA
> */
>
>
> Thanks,
>
> Fred