[Skiboot] [RFC PATCH] virtual memory for OPAL boot

Oliver oohall at gmail.com
Thu Aug 30 18:30:32 AEST 2018


On Mon, Aug 27, 2018 at 5:22 PM, Nicholas Piggin <npiggin at gmail.com> wrote:
> On Mon, 27 Aug 2018 16:16:06 +1000
> Oliver <oohall at gmail.com> wrote:
>
>> On Mon, Aug 27, 2018 at 12:15 PM, Nicholas Piggin <npiggin at gmail.com> wrote:
>> > I tried hacking on this a bit more. This turns on HPT virtual memory
>> > quite early in boot. There is a global EA=RA map for "global" mappings
>> > which are things that are always mapped and shared, like text and heap.
>> > Then there are transient per-CPU mappings that use their own private
>> > addresses for temporary mappings of things that are accessed carefully
>> > (e.g., the 0 page interrupt vectors).
>>
>> cool
>>
>> > VM gets shut down right before the kernel is booted.
>> >
>> > This rearranges skiboot.lds.S a bit to put similar regions together
>> > as far as possible, which makes it easier to map things with specific
>> > protections. Everything but text is no-execute, rodata is read only, etc.
>> >
>> > Not too sure where I'm going with this. I think it's good to minimise
>> > the amount of time spent in real mode in general to catch bugs. Maybe
>> > this is unintrusive enough to be worthwhile. But this is only boot, I
>> > would like to get to a point where OPAL services run mostly in virtual
>> > mode too, but that would look much different and probably require VM
>> > provided by the OS.
>> >
>> > Anyway this "works" (in mambo) and it's fairly unintrusive; most code
>> > changes are just juggling link locations around.
>>
>> We'll need to have a think about how we're going to deal with I/O if
>> we want to do this on real hardware, or even on mambo before
>> xscom_init() is called. Currently we use the explicit cache inhibited
>> load/store instructions for accessing MMIO regions in skiboot and
>> those are only available in hypervisor real mode. So we'll probably
>> need some kind of instruction patching mechanism if we want to boot in
>> virtual mode and switch to real mode at runtime.
>
> Yeah I expect that will be the hard part on real hardware.
>
>> Alternatively we could leave them as-is and emulate them at boot time.
>> It might be a bit slow, but MMIOs aren't exactly fast to begin with.
>
> Well you don't need to do patching or interrupts, the happy medium I
> think would just be a test and branch. Every thread knows whether or
> not it's currently running with relocation on.
>
> If we could enable virtual mode *really* early ahead of most of the
> MMIOs, maybe we could always do them with cache inhibited mappings and
> provide an exceptional case of explicit _rm accessors like the kernel
> does.
>
> I think that would actually be quite possible -- the vm code currently
> needs the memory allocator up for the hash table, but we could allocate
> that more simply much earlier. Or possibly we could do the
> mem_region_init earlier.
>
>
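
For illustration, the test-and-branch idea might look roughly like the
sketch below. It is only a sketch under assumptions: read_msr(),
load_ci_real() and load_ci_mapped() are made-up stand-ins (for mfmsr, an
lbzcix-style cache-inhibited load, and an ordinary load through a
cache-inhibited mapping), stubbed out so the logic builds off-target.
None of these names exist in skiboot.

```c
#include <stdint.h>

#define MSR_DR 0x10ULL /* MSR data relocation bit */

/* Hypothetical stand-ins so the sketch builds off-target. */
static uint64_t fake_msr;     /* what mfmsr() would return */
static unsigned int rm_loads; /* loads taken via the real-mode path */
static unsigned int vm_loads; /* loads taken via the virtual-mode path */

static uint64_t read_msr(void)
{
	return fake_msr;
}

/* Real mode: would be an explicit cache-inhibited load (lbzcix). */
static uint8_t load_ci_real(const volatile uint8_t *addr)
{
	rm_loads++;
	return *addr;
}

/* Virtual mode: would be a normal load via a cache-inhibited mapping. */
static uint8_t load_ci_mapped(const volatile uint8_t *addr)
{
	vm_loads++;
	return *addr;
}

/* Test and branch on whether data relocation is currently on. */
static uint8_t mmio_in_8(const volatile uint8_t *addr)
{
	if (read_msr() & MSR_DR)
		return load_ci_mapped(addr);
	return load_ci_real(addr);
}
```

The branch costs one MSR read per access, which is probably noise next
to the MMIO itself. Explicit _rm accessors would simply call the
real-mode path unconditionally.
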
>> > @@ -971,6 +999,8 @@ void __noreturn __nomcount main_cpu_entry(const void *fdt)
>> >          */
>> >         mem_region_init();
>> >
>> > +       vm_init();
>> > +
>>
>> This is a bit too late to really be useful. Ideally we'd want to be in
>> virtual mode before the HDAT parser runs or the FDT is expanded.
>
> Yeah... well it did catch a couple of NULL pointer bugs already. Let's
> say it's a bit too late to be really useful.

Er yeah, poor choice of words. I mean it'd be way more useful to me
personally if we turned it on earlier since I get to fix all the HDAT
bugs ;)

>> > +void vm_map_stacks(void)
>> > +{
>> > +       unsigned long start = stack_end;
>> > +       unsigned long end = start + (cpu_max_pir + 1)*STACK_SIZE;
>> > +       unsigned long va;
>> > +
>> > +       if (start == end)
>> > +               return;
>> > +
>> > +       for (va = start; va < end; va += PAGE_SIZE)
>> > +               htab_install(va, va, 1, 0, 1);
>> > +
>> > +       stack_end = end;
>> > +}
>>
>> I'd look at having each thread map its own stack rather than doing it
>> all at once. That way we can enter virtual mode before the DT has been
>> expanded since we need the DT to find cpu_max_pir.
>
> We need to map secondary stacks because the boot CPU sets them up
> before calling in secondaries. We don't want to go to real mode for
> that. It should be fine keeping this part here around init_all_cpus()
> time and moving the rest of the vm init earlier though.

Oh right, I forgot the boot cpu had to fill out the cpu_thread
structures at the top of the stack. You're right, leaving it here should
be fine.

>> > +static void vm_init_cpu(void)
>> > +{
>> > +       struct cpu_thread *c = this_cpu();
>> > +       unsigned long esid = (0x0800000000000000ULL + (c->pir << 28)) >> 28;
>> > +       unsigned long vsid = (unsigned long)c->pir << 30; /* per-cpu VA */
>> > +
>> > +       mtspr(SPR_LPCR, mfspr(SPR_LPCR) &
>> > +               ~(PPC_BITMASK(0,3) | PPC_BIT(41) | PPC_BIT(43) | PPC_BIT(54)));
>> > +       mtspr(SPR_LPID, 0);
>> > +       mtspr(SPR_PID, 0);
>> > +       mtspr(SPR_HRMOR, 0);
>>
>> If HRMOR is non-zero we'll fail an assert long before we get here.
>> IIRC HRMOR is replicated across threads on the same core so you need
>> to rendezvous all the threads on a core at an address with the high
>> bit set (bypasses HRMOR) to safely update it. Hostboot and the FSP
>> should always load us with HRMOR set to zero so it shouldn't matter.
>
> Okay I'll get rid of it.
>
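
As a worked example of the per-CPU addressing in vm_init_cpu() above:
each thread gets its own 256MB segment starting at 0x0800000000000000
plus pir * 256MB. The helpers below are hypothetical (the names are
mine, not from the patch) and just restate the arithmetic. One
assumption to flag: if pir is a 32-bit field, then c->pir << 28 would
wrap for pir >= 16, so the sketch widens to 64 bits before shifting.

```c
#include <stdint.h>

#define PERCPU_EA_BASE 0x0800000000000000ULL /* base used in the patch */
#define SEGMENT_SHIFT  28                    /* 256MB segments */

/* ESID of a CPU's private segment; pir is widened before the shift. */
static uint64_t percpu_esid(uint32_t pir)
{
	uint64_t ea = PERCPU_EA_BASE + ((uint64_t)pir << SEGMENT_SHIFT);

	return ea >> SEGMENT_SHIFT;
}

/* Per-CPU VSID, computed the same way as in the patch. */
static uint64_t percpu_vsid(uint32_t pir)
{
	return (uint64_t)pir << 30;
}
```

So pir 0 maps to ESID 0x80000000 and pir 1 to ESID 0x80000001 with
VSID 0x40000000.
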
>> > +void vm_init(void)
>> > +{
>> > +       unsigned long va;
>> > +
>> > +//     prtab = local_alloc(0, 64*1024, 64*1024);
>> > +       prtab = memalign(64*1024, 64*1024);
>> > +       assert(prtab);
>> > +       memset(prtab, 0, 64*1024);
>> > +
>> > +       global_slb_add(SKIBOOT_BASE >> 28, SKIBOOT_BASE >> 28);
>> > +
>> > +       htab_nr_bytes = 1UL<<18;
>> > +       htab_nr_ptegs = htab_nr_bytes / sizeof(struct hpteg);
>> > +       htab_pteg_mask = htab_nr_ptegs - 1;
>> > +//     htab = local_alloc(0, htab_nr_bytes, 1UL<<18);
>> > +       htab = memalign(1UL<<18, htab_nr_bytes);
>>
>> I'd just statically allocate some space for it in the skiboot memory
>> map. That would allow entering virtual mode earlier too.
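
Something along these lines, as a rough sketch only. A static C array
stands in here for a proper reservation in the memory map, and the
struct layouts and names are my assumptions (a PTEG being 8 PTEs of 16
bytes each), not what the patch defines:

```c
#include <stdint.h>

#define HTAB_SHIFT    18
#define HTAB_BYTES    (1UL << HTAB_SHIFT) /* 256KB, as in the patch */
#define PTES_PER_PTEG 8

/* Assumed layouts: a 16-byte PTE and a 128-byte PTEG. */
struct hpte {
	uint64_t dword[2];
};

struct hpteg {
	struct hpte hpte[PTES_PER_PTEG];
};

#define HTAB_NR_PTEGS (HTAB_BYTES / sizeof(struct hpteg))

/* The hash table must be aligned to its own size. */
static struct hpteg static_htab[HTAB_NR_PTEGS]
	__attribute__((aligned(HTAB_BYTES)));

/* Mask used to fold a hash value into a PTEG index. */
static const unsigned long static_htab_pteg_mask = HTAB_NR_PTEGS - 1;
```

With these sizes that is 2048 PTEGs and a mask of 0x7ff, and nothing
depends on the allocator being up.
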
>
> Yeah.
>
> Thanks,
> Nick

