[Skiboot] [RFC PATCH] virtual memory for OPAL boot

Mon Aug 27 17:22:05 AEST 2018

On Mon, 27 Aug 2018 16:16:06 +1000
Oliver <oohall at gmail.com> wrote:

> On Mon, Aug 27, 2018 at 12:15 PM, Nicholas Piggin <npiggin at gmail.com> wrote:
> > I tried hacking on this a bit more. This turns on HPT virtual memory
> > quite early in boot. There is a global EA=RA map for "global" mappings
> > which are things that are always mapped and shared, like text and heap.
> > Then there are transient per-CPU mappings that use their own private
> > addresses for temporary mappings of things that are accessed carefully
> > (e.g., the 0 page interrupt vectors).
> 
> cool
> 
> > VM gets shut down right before the kernel is booted.
> >
> > This rearranges skiboot.lds.S a bit to put similar regions together
> > as much as possible, which makes it easier to map things with specific
> > protections. Everything but text is no-execute, rodata is read only, etc.
> >
> > Not too sure where I'm going with this. I think it's good to minimise
> > the amount of time spent in real mode in general to catch bugs. Maybe
> > this is unintrusive enough to be worthwhile. But this is only boot, I
> > would like to get to a point where OPAL services run mostly in virtual
> > mode too, but that would look much different and probably require VM
> > provided by the OS.
> >
> > Anyway this "works" (in mambo), it's fairly unintrusive, most code
> > changes are just juggling link locations around.
> 
> We'll need to have a think about how we're going to deal with I/O if
> we want to do this on real hardware, or even on mambo before
> xscom_init() is called. Currently we use the explicit cache inhibited
> load/store instructions for accessing MMIO regions in skiboot and
> those are only available in hypervisor real mode. So we'll probably
> need some kind of instruction patching mechanism if we want to boot in
> virtual mode and switch to real mode at runtime.

Yeah I expect that will be the hard part on real hardware.

> Alternatively we could leave them as-is and emulate them at boot time.
> It might be a bit slow, but MMIOs aren't exactly fast to begin with.

Well, you don't need to do patching or interrupts; the happy medium, I
think, would just be a test and branch. Every thread knows whether or
not it's currently running with relocation on.
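
A rough sketch of what that test and branch could look like, written so
it builds on an ordinary host. Everything here is an assumption for
illustration: fake_msr stands in for mfmsr(), in_8_real() stands in for
an accessor built on the cache inhibited load instruction, and
in_8_virt() stands in for a plain load through a cache inhibited EA=RA
mapping. None of these names come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MSR[DR], data relocation: set when data accesses are translated. */
#define MSR_DR 0x10ULL

/* Hypothetical stand-in for mfmsr(), so the sketch runs on a host. */
static uint64_t fake_msr;

/* Records which path the last access took (illustration only). */
enum mmio_path { MMIO_PATH_REAL, MMIO_PATH_VIRT };
static enum mmio_path last_path;

static bool relocation_on(void)
{
	return (fake_msr & MSR_DR) != 0;
}

/* Real mode path: real code would use a cache inhibited load here. */
static uint8_t in_8_real(const volatile uint8_t *addr)
{
	last_path = MMIO_PATH_REAL;
	return *addr;
}

/* Virtual mode path: assumes addr is covered by a CI mapping. */
static uint8_t in_8_virt(const volatile uint8_t *addr)
{
	last_path = MMIO_PATH_VIRT;
	return *addr;
}

/* The accessor callers use: one test, one branch, no patching. */
uint8_t in_8(const volatile uint8_t *addr)
{
	if (relocation_on())
		return in_8_virt(addr);
	return in_8_real(addr);
}
```

The cost is one MSR test per access, which should be small next to the
MMIO itself.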

If we could enable virtual mode *really* early ahead of most of the
MMIOs, maybe we could always do them with cache inhibited mappings and
provide an exceptional case of explicit _rm accessors like the kernel
does.

I think that would actually be quite possible -- the vm code currently
needs the memory allocator up for the hash table, but we could allocate
that more simply much earlier. Or possibly we could do the
mem_region_init earlier.

> > @@ -971,6 +999,8 @@ void __noreturn __nomcount main_cpu_entry(const void *fdt)
> >          */
> >         mem_region_init();
> >
> > +       vm_init();
> > +  
> 
> This is a bit too late to really be useful. Ideally we'd want to be in
> virtual mode before the HDAT parser runs or the FDT is expanded.

Yeah... well it did catch a couple of NULL pointer bugs already. Let's
say it's a bit too late to be really useful.

> > +void vm_map_stacks(void)
> > +{
> > +       unsigned long start = stack_end;
> > +       unsigned long end = start + (cpu_max_pir + 1)*STACK_SIZE;
> > +       unsigned long va;
> > +
> > +       if (start == end)
> > +               return;
> > +
> > +       for (va = start; va < end; va += PAGE_SIZE)
> > +               htab_install(va, va, 1, 0, 1);
> > +
> > +       stack_end = end;
> > +}  
> 
> I'd look at having each thread map its own stack rather than doing it
> all at once. That way we can enter virtual mode before the DT has been
> expanded since we need the DT to find cpu_max_pir.

We need to map secondary stacks because the boot CPU sets them up
before calling in secondaries. We don't want to go to real mode for
that. It should be fine keeping this part here around init_all_cpus()
time and moving the rest of the vm init earlier though.
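
To make the ordering concrete, here is a host-buildable sketch of the
boot CPU mapping a range of per-PIR stacks EA=RA before secondaries are
called in. The PAGE_SIZE and STACK_SIZE values, map_cpu_stacks(), and
the counting stub that replaces htab_install() are all assumptions for
illustration, not what the patch does.

```c
#include <assert.h>

#define PAGE_SIZE	0x1000UL	/* assumed 4K pages */
#define STACK_SIZE	0x4000UL	/* assumed per-thread stack size */

/* Hypothetical stub for htab_install(): only counts installed pages. */
static unsigned long pages_installed;

static void htab_install_stub(unsigned long va, unsigned long pa)
{
	(void)va;
	(void)pa;
	pages_installed++;
}

/*
 * Map the stacks for PIRs first_pir..last_pir (inclusive) EA=RA.
 * Returns the end address of the mapped range, so a later call can
 * extend the mapping once the real cpu_max_pir is known.
 */
unsigned long map_cpu_stacks(unsigned long stack_base,
			     unsigned int first_pir, unsigned int last_pir)
{
	unsigned long start = stack_base + (unsigned long)first_pir * STACK_SIZE;
	unsigned long end = stack_base + ((unsigned long)last_pir + 1) * STACK_SIZE;
	unsigned long va;

	assert(first_pir <= last_pir);

	for (va = start; va < end; va += PAGE_SIZE)
		htab_install_stub(va, va);

	return end;
}
```

The boot CPU could call this once for its own PIR early on, then again
for the remaining PIRs around init_all_cpus() time.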

> 
> > +static void vm_init_cpu(void)
> > +{
> > +       struct cpu_thread *c = this_cpu();
> > +       unsigned long esid = (0x0800000000000000ULL + (c->pir << 28)) >> 28;
> > +       unsigned long vsid = (unsigned long)c->pir << 30; /* per-cpu VA */
> > +
> > +       mtspr(SPR_LPCR, mfspr(SPR_LPCR) &
> > +               ~(PPC_BITMASK(0,3) | PPC_BIT(41) | PPC_BIT(43) | PPC_BIT(54)));
> > +       mtspr(SPR_LPID, 0);
> > +       mtspr(SPR_PID, 0);
> > +       mtspr(SPR_HRMOR, 0);  
> 
> If HRMOR is non-zero we'll fail an assert long before we get here.
> IIRC HRMOR is replicated across threads on the same core so you need
> to rendezvous all the threads on a core at an address with the high
> bit set (bypasses HRMOR) to safely update it. Hostboot and the FSP
> should always load us with HRMOR set to zero so it shouldn't matter.

Okay I'll get rid of it.

> > +void vm_init(void)
> > +{
> > +       unsigned long va;
> > +
> > +//     prtab = local_alloc(0, 64*1024, 64*1024);
> > +       prtab = memalign(64*1024, 64*1024);
> > +       assert(prtab);
> > +       memset(prtab, 0, 64*1024);
> > +
> > +       global_slb_add(SKIBOOT_BASE >> 28, SKIBOOT_BASE >> 28);
> > +
> > +       htab_nr_bytes = 1UL<<18;
> > +       htab_nr_ptegs = htab_nr_bytes / sizeof(struct hpteg);
> > +       htab_pteg_mask = htab_nr_ptegs - 1;
> > +//     htab = local_alloc(0, htab_nr_bytes, 1UL<<18);
> > +       htab = memalign(1UL<<18, htab_nr_bytes);  
> 
> I'd just statically allocate some space for it in the skiboot memory
> map. That would allow entering virtual mode earlier too.

Yeah.
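
Something along these lines, as a host-buildable sketch. The struct
layouts assume the usual 16-byte HPTE and 8-entry PTEG, and the names
static_htab and sketch_*() are made up here. In skiboot the region (and
its 256K alignment) would presumably come from the memory map and
linker script rather than a plain C array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HTAB_BYTES	(1UL << 18)	/* same size the patch uses */

/* Assumed layouts: a 16-byte HPTE, eight of them per PTEG. */
struct hpte {
	uint64_t dword[2];
};

struct hpteg {
	struct hpte hpte[8];
};

_Static_assert(sizeof(struct hpteg) == 128, "PTEG should be 128 bytes");

/* Statically reserved hash table: no allocator needed to set it up. */
static struct hpteg static_htab[HTAB_BYTES / sizeof(struct hpteg)];

/* Number of PTEGs in the table. */
size_t sketch_htab_nr_ptegs(void)
{
	return sizeof(static_htab) / sizeof(static_htab[0]);
}

/* Mask used to reduce a hash value to a PTEG index. */
size_t sketch_htab_pteg_mask(void)
{
	size_t nr = sketch_htab_nr_ptegs();

	/* The mask trick only works for a power-of-two table size. */
	assert((nr & (nr - 1)) == 0);
	return nr - 1;
}
```

With these sizes that works out to 2048 PTEGs and a mask of 0x7ff.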

Thanks,
Nick