[PATCH v6 05/42] powerpc/powernv: Track IO/M32/M64 segments from PE

Gavin Shan gwshan at linux.vnet.ibm.com
Wed Aug 12 21:20:19 AEST 2015


On Wed, Aug 12, 2015 at 09:05:09PM +1000, Alexey Kardashevskiy wrote:
>On 08/12/2015 08:45 PM, Gavin Shan wrote:
>>On Tue, Aug 11, 2015 at 12:23:42PM +1000, Alexey Kardashevskiy wrote:
>>>On 08/11/2015 10:03 AM, Gavin Shan wrote:
>>>>On Mon, Aug 10, 2015 at 05:16:40PM +1000, Alexey Kardashevskiy wrote:
>>>>>On 08/06/2015 02:11 PM, Gavin Shan wrote:
>>>>>>The patch is adding 6 bitmaps, three to PE and three to PHB, to track
>>>>>
>>>>>The patch is also removing 2 arrays (io_segmap and m32_segmap) - what is
>>>>>that all about? Also, there was no m64_segmap and now there is; that maybe
>>>>>needs an explanation.
>>>>>
>>>>
>>>>Originally, the bitmaps (io_segmap and m32_segmap) were allocated dynamically.
>>>>Now they have a fixed size - 512 bits.
>>>>
>>>>The subject "powerpc/powernv: Track IO/M32/M64 segments from PE" indicates
>>>>why m64_segmap is added.
>>>
>>>
>>>But before this patch, you somehow managed to keep it working without a map
>>>for M64, while at the same time you needed maps for IO and M32. It seems you
>>>are making things consistent in this patch, but it also feels like you do
>>>not have to, as M64 did not need a map before and I cannot see why it needs
>>>one now.
>>>
>>
>>The M64 map is used by [PATCH v6 23/42] powerpc/powernv: Release PEs dynamically
>>where the M64 segments consumed by one particular PE will be released.
>
>
>Then add it where it really starts being used. It is really hard to
>review a patch whose changes are actually spread across patches. Do not
>count on reviewers just trusting you.
>

Ok. I'll try. 

>
>>>>>
>>>>>>the segments consumed by one particular PE, which can be released once
>>>>>>the PE is destroyed at PCI unplugging time. Also, we're using a fixed
>>>>>>quantity of bits to track the IO, M32 and M64 segments used by PEs
>>>>>>in one particular PHB.
>>>>>>
>>>>>
>>>>>Out of curiosity - have you considered having just 3 arrays, in the PHB,
>>>>>storing PE numbers, and ditching the PE's arrays? Does the PE itself need
>>>>>to know what segments it is using? Not sure about the master/slave PEs
>>>>>though.
>>>>>
>>>>
>>>>I don't follow your suggestion. Can you rephrase and explain it a bit more?
>>>
>>>
>>>Please explain in what situations you need the same map in both the PHB and
>>>the PE and how you are going to use them. For example, pe::m64_segmap and
>>>phb::m64_segmap.
>>>
>>>I believe you need to know what segment is used by what PE and that's it;
>>>having 2 bitmaps is overcomplicated and hard to follow. Is there anything
>>>else that I am missing?
>>>
>>
>>The situation is the same for all (IO, M32 and M64) segment maps. Taking
>>m64_segmap as an example, it will be used when creating or destroying the PE
>>which consumes M64 segments. phb::m64_segmap records the M64 segment usage in
>>the PHB's domain; it's used to check that the same M64 segment won't be used
>>twice. pe::m64_segmap tracks the M64 segments consumed by the PE.
>
>
>You could have a single map in the PHB, where the key is a segment number and
>the value is a PE number. No need to have a map in the PE at all. No need to
>initialize bitmaps, etc.
>

So it would be arrays for the various segment maps, if I understood your
suggestion correctly. Please confirm:

#define PNV_IODA_MAX_SEG_NUM	512

In struct pnv_phb:
	int	io_segmap[PNV_IODA_MAX_SEG_NUM];
	int	m32_segmap[PNV_IODA_MAX_SEG_NUM];
	int	m64_segmap[PNV_IODA_MAX_SEG_NUM];

- Initially, all entries are initialized to IODA_INVALID_PE;
- When one segment is assigned to one PE, the corresponding entry
  of the array is set to the PE number;
- When one segment is released, the corresponding entry of the array
  is set to IODA_INVALID_PE.
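To make sure we mean the same thing, here is a minimal standalone sketch of
that bookkeeping in plain C (segmap_init, segmap_assign and
segmap_release_pe are hypothetical helper names for illustration, not
kernel APIs; no kernel headers are used):

```c
#define PNV_IODA_MAX_SEG_NUM	512
#define IODA_INVALID_PE		(-1)

/* One map per window type: the index is the segment number,
 * the value is the owning PE number (or IODA_INVALID_PE).
 */
static int m64_segmap[PNV_IODA_MAX_SEG_NUM];

static void segmap_init(int *map)
{
	int i;

	for (i = 0; i < PNV_IODA_MAX_SEG_NUM; i++)
		map[i] = IODA_INVALID_PE;
}

/* Assign one segment to one PE; fail if it is already owned */
static int segmap_assign(int *map, int seg, int pe)
{
	if (map[seg] != IODA_INVALID_PE)
		return -1;
	map[seg] = pe;
	return 0;
}

/* Release every segment owned by the given PE */
static void segmap_release_pe(int *map, int pe)
{
	int i;

	for (i = 0; i < PNV_IODA_MAX_SEG_NUM; i++)
		if (map[i] == pe)
			map[i] = IODA_INVALID_PE;
}
```

With this, releasing all segments of a dying PE is a single
segmap_release_pe() walk over the PHB map, and no per-PE map is needed.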
 

>>>>>It would be easier to read patches if this one was right before
>>>>>[PATCH v6 23/42] powerpc/powernv: Release PEs dynamically
>>>>>
>>>>
>>>>I'll try to reorder the patches, but don't expect too much...
>>>>
>>>>>
>>>>>
>>>>>>Signed-off-by: Gavin Shan <gwshan at linux.vnet.ibm.com>
>>>>>>---
>>>>>>  arch/powerpc/platforms/powernv/pci-ioda.c | 29 +++++++++++++++--------------
>>>>>>  arch/powerpc/platforms/powernv/pci.h      | 18 ++++++++++++++----
>>>>>>  2 files changed, 29 insertions(+), 18 deletions(-)
>>>>>>
>>>>>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>>>index e4ac703..78b49a1 100644
>>>>>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>>>@@ -388,6 +388,12 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>>>>>>  			list_add_tail(&pe->list, &master_pe->slaves);
>>>>>>  		}
>>>>>>
>>>>>>+		/* M64 segments consumed by slave PEs are tracked
>>>>>>+		 * by master PE
>>>>>>+		 */
>>>>>>+		set_bit(pe->pe_number, master_pe->m64_segmap);
>>>>>>+		set_bit(pe->pe_number, phb->ioda.m64_segmap);
>>>>>>+
>>>>>>  		/* P7IOC supports M64DT, which helps mapping M64 segment
>>>>>>  		 * to one particular PE#. However, PHB3 has fixed mapping
>>>>>>  		 * between M64 segment and PE#. In order to have same logic
>>>>>>@@ -2871,9 +2877,11 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
>>>>>>
>>>>>>  			while (index < phb->ioda.total_pe &&
>>>>>>  			       region.start <= region.end) {
>>>>>>-				phb->ioda.io_segmap[index] = pe->pe_number;
>>>>>>+				set_bit(index, pe->io_segmap);
>>>>>>+				set_bit(index, phb->ioda.io_segmap);
>>>>>>  				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>>>>>>-					pe->pe_number, OPAL_IO_WINDOW_TYPE, 0, index);
>>>>>>+					pe->pe_number, OPAL_IO_WINDOW_TYPE,
>>>>>>+					0, index);
>>>>>
>>>>>Unrelated change.
>>>>>
>>>>
>>>>True, will drop. However, checkpatch.pl will complain with:
>>>>line exceeding 80 characters.
>>>
>>>It will not, as you are not changing those lines; it only complains about changes.
>>>
>>>
>>>
>>>>
>>>>>>  				if (rc != OPAL_SUCCESS) {
>>>>>>  					pr_err("%s: OPAL error %d when mapping IO "
>>>>>>  					       "segment #%d to PE#%d\n",
>>>>>>@@ -2896,9 +2904,11 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
>>>>>>
>>>>>>  			while (index < phb->ioda.total_pe &&
>>>>>>  			       region.start <= region.end) {
>>>>>>-				phb->ioda.m32_segmap[index] = pe->pe_number;
>>>>>>+				set_bit(index, pe->m32_segmap);
>>>>>>+				set_bit(index, phb->ioda.m32_segmap);
>>>>>>  				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>>>>>>-					pe->pe_number, OPAL_M32_WINDOW_TYPE, 0, index);
>>>>>>+					pe->pe_number, OPAL_M32_WINDOW_TYPE,
>>>>>>+					0, index);
>>>>>
>>>>>Unrelated change.
>>>>>
>>>>
>>>>same as above.
>>>>
>>>>>>  				if (rc != OPAL_SUCCESS) {
>>>>>>  					pr_err("%s: OPAL error %d when mapping M32 "
>>>>>>  					       "segment#%d to PE#%d",
>>>>>>@@ -3090,7 +3100,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>>>>>  {
>>>>>>  	struct pci_controller *hose;
>>>>>>  	struct pnv_phb *phb;
>>>>>>-	unsigned long size, m32map_off, pemap_off, iomap_off = 0;
>>>>>>+	unsigned long size, pemap_off;
>>>>>>  	const __be64 *prop64;
>>>>>>  	const __be32 *prop32;
>>>>>>  	int len;
>>>>>>@@ -3175,19 +3185,10 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>>>>>
>>>>>>  	/* Allocate aux data & arrays. We don't have IO ports on PHB3 */
>>>>>
>>>>>
>>>>>This comment came with the if (IODA1) below; since you are removing that
>>>>>condition, it makes sense to remove the comment as well, or move it to
>>>>>where people will look for it (arch/powerpc/platforms/powernv/pci.h?)
>>>>>
>>>>
>>>>Yes, will do.
>>>>
>>>>>
>>>>>>  	size = _ALIGN_UP(phb->ioda.total_pe / 8, sizeof(unsigned long));
>>>>>>-	m32map_off = size;
>>>>>>-	size += phb->ioda.total_pe * sizeof(phb->ioda.m32_segmap[0]);
>>>>>>-	if (phb->type == PNV_PHB_IODA1) {
>>>>>>-		iomap_off = size;
>>>>>>-		size += phb->ioda.total_pe * sizeof(phb->ioda.io_segmap[0]);
>>>>>>-	}
>>>>>>  	pemap_off = size;
>>>>>>  	size += phb->ioda.total_pe * sizeof(struct pnv_ioda_pe);
>>>>>>  	aux = memblock_virt_alloc(size, 0);
>>>>>
>>>>>
>>>>>After adding static arrays to PE and PHB, do you still need this "aux"?
>>>>>
>>>>
>>>>"aux" is still needed to mark the boundary between pe_alloc and pe_array.
>>>>>
>>>>>>  	phb->ioda.pe_alloc = aux;
>>>>>>-	phb->ioda.m32_segmap = aux + m32map_off;
>>>>>>-	if (phb->type == PNV_PHB_IODA1)
>>>>>>-		phb->ioda.io_segmap = aux + iomap_off;
>>>>>>  	phb->ioda.pe_array = aux + pemap_off;
>>>>>>  	set_bit(phb->ioda.reserved_pe, phb->ioda.pe_alloc);
>>>>>>
>>>>>>diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>>>>>>index 62239b1..08a4e57 100644
>>>>>>--- a/arch/powerpc/platforms/powernv/pci.h
>>>>>>+++ b/arch/powerpc/platforms/powernv/pci.h
>>>>>>@@ -49,6 +49,15 @@ struct pnv_ioda_pe {
>>>>>>  	/* PE number */
>>>>>>  	unsigned int		pe_number;
>>>>>>
>>>>>>+	/* IO/M32/M64 segments consumed by the PE. Each PE can
>>>>>>+	 * have one M64 segment at most, but M64 segments consumed
>>>>>>+	 * by slave PEs will be contributed to the master PE. One
>>>>>>+	 * PE can own multiple IO and M32 segments.
>>>>>
>>>>>
>>>>>A PE can have multiple IO and M32 segments but just one M64 segment? Is this
>>>>>correct for IODA1 or IODA2 or both? Is this a limitation of this
>>>>>implementation or it comes from P7IOC/PHB3 hardware?
>>>>>
>>>>
>>>>It's correct for IO and M32. However, on IODA1 or IODA2, one PE can have
>>>>multiple M64 segments as well.
>>>
>>>
>>>But the comment says "Each PE can have one M64 segment at most". Which
>>>statement is correct?
>>>
>>
>>The comment is correct regarding the PHB's 15th M64 BAR: each PE can have
>>one M64 segment at most. That's a hardware limitation. However, once one PE
>>consumes multiple M64 segments, all of them will be tracked in the "master"
>>PE, and that is determined by the software implementation.
>>
>>>>>>+	 */
>>>>>>+	unsigned long		io_segmap[8];
>>>>>>+	unsigned long		m32_segmap[8];
>>>>>>+	unsigned long		m64_segmap[8];
>>>>>
>>>>>Magic constant "8", 64bit*8 = 512 PEs - where did this come from?
>>>>>
>>>>>Anyway,
>>>>>
>>>>>#define PNV_IODA_MAX_PE_NUM	512
>>>>>
>>>>>unsigned long io_segmap[PNV_IODA_MAX_PE_NUM/BITS_PER_LONG]
>>>>>
>>>>
>>>>I prefer "8", not a macro, for 3 reasons:
>>>>- The macro won't be used in the code.
>>>
>>>You will use it 6 times in the header; if you give it a good name, people
>>>won't have to guess whether all these "8"s mean the same thing, and you
>>>won't have to comment every use of it in this header file (as you do now).
>>>
>>>Also, using BITS_PER_LONG tells the reader that this is a bitmask for sure.
>>>
>>>
>>>>- The total number of segments for a specific resource varies
>>>>   between IODA1 and IODA2. I just chose the max value with some margin.
>>>>- PNV_IODA_MAX_PE_NUM, indicating the max PE number, isn't 512 on
>>>>   IODA1 or IODA2.
>>>
>>>Give it a better name.
>>>
>>
>>Ok. If it has to be a macro, then it's as below:
>>
>>#define PNV_IODA_MAX_SEG_NUM	512
>
>
>Thanks mate :)
>
>
>>>
>>>>
>>>>>>+
>>>>>>  	/* "Weight" assigned to the PE for the sake of DMA resource
>>>>>>  	 * allocations
>>>>>>  	 */
>>>>>>@@ -145,15 +154,16 @@ struct pnv_phb {
>>>>>>  			unsigned int		io_segsize;
>>>>>>  			unsigned int		io_pci_base;
>>>>>>
>>>>>>+			/* IO, M32, M64 segment maps */
>>>>>>+			unsigned long		io_segmap[8];
>>>>>>+			unsigned long		m32_segmap[8];
>>>>>>+			unsigned long		m64_segmap[8];
>>>>>>+
>>>>>>  			/* PE allocation */
>>>>>>  			struct mutex		pe_alloc_mutex;
>>>>>>  			unsigned long		*pe_alloc;
>>>>>>  			struct pnv_ioda_pe	*pe_array;
>>>>>>
>>>>>>-			/* M32 & IO segment maps */
>>>>>>-			unsigned int		*m32_segmap;
>>>>>>-			unsigned int		*io_segmap;
>>>>>>-
>>>>>>  			/* IRQ chip */
>>>>>>  			int			irq_chip_init;
>>>>>>  			struct irq_chip		irq_chip;
>>>>>>

Thanks,
Gavin


