[Cbe-oss-dev] [patch 07/16] powerpc and x86_64: Initial port of PCIe endpoint device driver for Axon

Arnd Bergmann arnd.bergmann at de.ibm.com
Thu Mar 13 14:57:38 EST 2008


On Saturday 08 March 2008, Murali Iyer wrote:
> Subject: [patch 07/16] axon driver memory registration portion
> 
> Description:
> 	This patch provides the files used for DaCS memory
> 	registration.


> +void axon_initialize_dacs_sysfs(struct axon *axon_priv)
> +{

What does this function have to do with sysfs? It seems to just
set some generic variables per device that happen to also be exported
through sysfs.

> +	/* These values go into sysfs */
> +	axon_priv->num_memory_regions = AXON_MAX_MR_HANDLES;
> +	axon_priv->maximum_mr_size_reg_pages = AXON_MR_DESCRIPTOR_BLOCK_SIZE *
> +		 (AXON_MAX_NUM_REMOTE_DESC_BLOCKS - 2) * PAGE_SIZE;
> +	axon_priv->maximum_dma_size_reg_pages =
> +		(AXON_MR_DESCRIPTOR_LIST_SIZE - 4)*PAGE_SIZE;
> +	axon_priv->num_dma_descriptors = AXON_MR_DESCRIPTOR_LIST_SIZE - 4;
> +
> +	/* FIXME - needs to change once IOMMU issues are resolved */
> +#ifdef CONFIG_PPC64
> +	axon_priv->maximum_mr_request_size = 2048l*1024l*1024l;
> +#else
> +	axon_priv->maximum_mr_request_size = 512*1024*1024;
> +#endif
> +	return;
> +}

The request size limits seem rather strange to me. User space should not
assume that you can map anything of this size into the IOMMU. On most
systems, the IOMMU has a total size of just a few megabytes, shared by
all devices, and it's certainly not just architecture dependent.

E.g. on a System p host system, the limit will be around 128MB shared
by all of them. It may be good to start with a really small size here,
e.g. 16MB, and require the (root) user to increase it to a
system-specific reasonable setting that can be enforced by the device
driver, so we don't crash the system because of exhausted IOMMU space.
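A minimal sketch of what I mean, with a hypothetical module parameter
(the name and the 16MB default are made up for illustration):

	#include <linux/module.h>
	#include <linux/moduleparam.h>

	/* hypothetical knob: root can raise this to a value that is
	 * known to fit into the IOMMU space of the host system */
	static unsigned long axon_max_mr_size = 16 * 1024 * 1024;
	module_param(axon_max_mr_size, ulong, 0600);
	MODULE_PARM_DESC(axon_max_mr_size,
		"Upper limit for memory region registrations, in bytes");

The driver would then check registration requests against
axon_max_mr_size instead of the hardcoded per-architecture values.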

> +
> +static int count_sg_entries(struct scatterlist *sg, int nents)
> +{
> +	int count = 0;
> +	int i;
> +	__u64 hw_address;
> +	__u64 hw_length;
> +	__u64 end_address;
> +
> +	if (nents <= 0) {
> +		return count;
> +	}
> +
> +	count = 1;
> +	hw_address = sg_dma_address(sg);
> +	hw_length = sg_dma_len(sg);
> +	end_address = hw_address + hw_length;
> +	for (i = 1; i < nents; i++) {
> +		hw_address = sg_dma_address(sg + i);
> +		hw_length = sg_dma_len(sg + i);
> +		if (hw_address == end_address) {
> +			end_address += hw_length;
> +		} else {
> +			count++;
> +			end_address = hw_address + hw_length;
> +		}
> +	}
> +
> +	return count;
> +}

I'm not sure I understand what this function is used for, but
you'll have to audit this when going to a more recent kernel,
as you can no longer use 'sg + i' for indexing the sg list entry
with chained scatterlists.
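With chained scatterlists, you iterate using for_each_sg(), which
follows the chain pointers. Keeping your merging logic, the function
would look roughly like this (untested):

	#include <linux/types.h>
	#include <linux/scatterlist.h>

	static int count_sg_entries(struct scatterlist *sgl, int nents)
	{
		struct scatterlist *sg;
		u64 end_address = 0;
		int count = 0;
		int i;

		/* for_each_sg() follows chain links, unlike 'sgl + i' */
		for_each_sg(sgl, sg, nents, i) {
			u64 hw_address = sg_dma_address(sg);

			/* count a new run unless this entry directly
			 * continues the previous one */
			if (count == 0 || hw_address != end_address)
				count++;
			end_address = hw_address + sg_dma_len(sg);
		}

		return count;
	}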

> +/*
> + *
> + * axon_register_mr
> + *
> + */
> +int  axon_register_mr(struct axon *axon_priv, void __user *address, __u64 length, __u64 *mr_handle, int permissions)
> +{

This function is very long, which makes it hard to read. It would be
good if you could find logical parts of it that you can split into
separate static functions.

> +	int rc = 0;

Try not initializing the return code at the beginning of the function.
This will give gcc the chance to warn about cases where you don't
assign a specific error code at return time.
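A toy example of the pattern:

	#include <linux/errno.h>

	/* rc deliberately not pre-initialized, so gcc can warn when
	 * a code path forgets to assign an error code */
	static int example_check(unsigned long length)
	{
		int rc;

		if (length == 0)
			rc = -ETOOSMALL;
		else
			rc = 0;

		return rc;	/* a missed assignment on some path would
				 * now trigger -Wuninitialized */
	}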


> +
> +	if (length == 0) {
> +		rc = -ETOOSMALL;
> +		pr_info("axon_register_mr: Request length 0 is invalid\n");
> +		return rc;
> +	}

User-supplied values should normally not cause a kernel log message at
a level higher than pr_debug -- otherwise an unprivileged user can
easily flood syslogd.
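i.e. something like

	if (length == 0) {
		pr_debug("axon_register_mr: request length 0 is invalid\n");
		return -ETOOSMALL;
	}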

> +
> +// FIXME - figure out correct length check to add back in
> +#if 0
> +	if (length > 256*256*PAGE_SIZE) {
> +		rc = -EMSGSIZE;
> +		pr_info("axon_register_mr: Request length of %x (hex) is too large.\n", length);
> +		return rc;
> +	}
> +#endif

You should probably limit this to a much lower value to avoid
DoS attacks where an unprivileged user drains all system memory
resources.
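Something along these lines, where the limit would ideally be the
root-adjustable setting mentioned above rather than a compile-time
constant (the helper name and the 64MB figure are only placeholders):

	#include <linux/types.h>
	#include <linux/errno.h>

	/* placeholder; should really be the administrator-set limit */
	#define AXON_MR_MAX_LENGTH	(64ULL * 1024 * 1024)

	static int axon_check_mr_length(__u64 length)
	{
		if (length == 0)
			return -ETOOSMALL;
		if (length > AXON_MR_MAX_LENGTH)
			return -EMSGSIZE;
		return 0;
	}

This would also give you one of the small static helpers that
axon_register_mr() can be split into, as suggested above.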

> +
> +	if (mutex_lock_interruptible(&(mr_control->mr_mutex)) != 0) {
> +		pr_info("axon_register_mr: Bad mutex return!\n");
> +		return -ERESTARTSYS;
> +	}

There is nothing wrong with mutex_lock_interruptible returning -ERESTARTSYS,
so this certainly should not cause a log message.
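i.e. just

	if (mutex_lock_interruptible(&mr_control->mr_mutex))
		return -ERESTARTSYS;	/* interrupted by a signal, the
					 * syscall gets restarted */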

> +	/* compute num_list_entries, back up to page start */
> +	offset = ((__u64)address) % PAGE_SIZE;
> +	total_length = length+offset;
> +	start_address = (unsigned long)(address - offset);
> +	num_list_entries = (total_length + PAGE_SIZE - 1) / PAGE_SIZE;
> +	new_mr_entry->sg = kmalloc(num_list_entries * sizeof(struct scatterlist),
> +				   GFP_KERNEL);

Use sg_alloc_table() here instead of open-coding the scatterlist
allocation; it also initializes the list correctly, including chaining
for long lists.
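Roughly like this (untested, helper name made up, with struct sg_table
replacing the raw kmalloc'd array):

	#include <linux/gfp.h>
	#include <linux/scatterlist.h>

	static int axon_alloc_mr_sg(struct sg_table *sgt, unsigned int nents)
	{
		int ret;

		ret = sg_alloc_table(sgt, nents, GFP_KERNEL);
		if (ret)
			return ret;	/* -ENOMEM */

		/* the user pages then get attached with sg_set_page()
		 * on each entry, and sg_free_table() undoes all of it */
		return 0;
	}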

> +	/* map pages */
> +	new_mr_entry->num_pages = get_user_pages(current, current->mm, start_address, num_list_entries,
> +						 1, 0, new_mr_entry->page_list, NULL);
> +	if (new_mr_entry->num_pages <= 0) {
> +		rc = -ENOMEM;
> +		pr_info("axon_register_mr: get_user_pages failed %x\n", new_mr_entry->num_pages);
> +		goto error4;
> +	}
> +
> +	remaining_length = total_length;
> +	for (i = 0; i < new_mr_entry->num_pages; i++) {
> +		new_mr_entry->sg[i].page = new_mr_entry->page_list[i];
> +		new_mr_entry->sg[i].offset = 0;
> +		if (remaining_length > PAGE_SIZE) {
> +			new_mr_entry->sg[i].length = PAGE_SIZE;
> +		} else {
> +			new_mr_entry->sg[i].length = remaining_length;
> +		}
> +		remaining_length -= PAGE_SIZE;
> +	}
> +
> +	nents = axon_arch_map_sg(axon_priv, new_mr_entry->sg,
> +			   new_mr_entry->num_pages, DMA_BIDIRECTIONAL);
> +	if (nents <= 0) {
> +		rc = -ENOMEM;
> +		pr_info("axon_register_mr: axon_arch_map_sg failed %x\n", nents);
> +		goto error5;
> +	}

It looks like you pass the user pages directly into the IOMMU mapping,
which might be highly dangerous for anything except page cache or
hugepages. How is this code expected to deal with SPU local store
pages while the SPU is context switched?

> +	remote_entry = new_mr_entry->remote_entry;
> +	remote_entry->mr_handle = cpu_to_be64(new_mr_entry->mr_handle);
> +	remote_entry->flags = cpu_to_be64(new_mr_entry->flags);
> +	remote_entry->user_virt_address = cpu_to_be64(new_mr_entry->user_virt_address);
> +	remote_entry->region_base_address = cpu_to_be64(new_mr_entry->region_base_address);
> +	remote_entry->user_length = cpu_to_be64(new_mr_entry->user_length);
> +	hw_address = local_phys_to_plb5(axon_priv, sg_dma_address(new_mr_entry->sg));
> +	hw_length = sg_dma_len(new_mr_entry->sg);
> +	subindex = 0;
> +	block_index = 0;
> +	index_value = new_mr_entry->desc_blocks[block_index];
> +	desc_block = mr_control->rmt_desc_blocks + index_value;
> +	desc_block->desc[subindex].hw_address = cpu_to_be64(hw_address);
> +	desc_block->desc[subindex].hw_length = cpu_to_be64(hw_length);
> +	end_address = hw_address + hw_length;
> +	saved_length = hw_length;
> +	for (i = 1; i < nents; i++) {
> +		hw_address = local_phys_to_plb5(axon_priv, sg_dma_address(new_mr_entry->sg + i));

As mentioned a few times before, sg_dma_address() should return the
address with the right offset as it came from dma_map_sg, and not
require local_phys_to_plb5() here.
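i.e. with working dma ops, the descriptor fill loop should not need
more than this per entry (sketch only):

	struct scatterlist *sg;
	u64 hw_address, hw_length;
	int i;

	for_each_sg(new_mr_entry->sg, sg, nents, i) {
		/* already the bus address the device must use */
		hw_address = sg_dma_address(sg);
		hw_length = sg_dma_len(sg);
		/* ... store into the descriptor block ... */
	}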

> +error8:
> +	kfree(new_mr_entry->local_descriptors);
> +
> +error7:
> +	/* No special handling needed */
> +

If you don't need to do anything here, just don't add a label for it.
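The usual pattern is one label per step that actually needs undoing,
so a failure after a no-op step just jumps past it. A toy example
(names made up):

	#include <linux/errno.h>
	#include <linux/slab.h>

	static int example_setup(void **a, void **b)
	{
		*a = kmalloc(64, GFP_KERNEL);
		if (!*a)
			return -ENOMEM;

		*b = kmalloc(64, GFP_KERNEL);
		if (!*b)
			goto err_free_a;	/* no label for steps
						 * with nothing to undo */
		return 0;

	err_free_a:
		kfree(*a);
		return -ENOMEM;
	}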


> +#ifndef __LINUX_AXON_MR_H__
> +#define __LINUX_AXON_MR_H__
> +
> +#include <linux/triblade/axon_ioctl.h>

You shouldn't need to include the other header here,
just add a forward declaration for the structures in there,
like

struct axon;
struct axon_priv;

	Arnd <><


