[PATCH 14/27] Add book3s_64 specific opcode emulation

Thu Nov 5 21:09:43 EST 2009

On 05.11.2009, at 01:53, Segher Boessenkool wrote:

>>>> +		case OP_31_XOP_EIOIO:
>>>> +			break;
>>>
>>> Have you always executed an eieio or sync when you get here, or
>>> do you just not allow direct access to I/O devices?  Other context
>>> synchronising insns are not enough, they do not broadcast on the
>>> bus.
>>
>> There is no device passthrough yet :-). It's theoretically  
>> possible, but nothing for it is implemented so far.
>
> You could just always do an eieio here, it's not expensive at all
> compared to the emulation trap itself.
>
> However -- eieio is a Book II insn, it will never trap anyway!

Don't all 31 ops trap? I'm pretty sure I added the emulation because I  
saw the trap.

>>>> +		case OP_31_XOP_DCBZ:
>>>> +		{
>>>> +			ulong rb =  vcpu->arch.gpr[get_rb(inst)];
>>>> +			ulong ra = 0;
>>>> +			ulong addr;
>>>> +			u32 zeros[8] = { 0, 0, 0, 0, 0, 0, 0, 0 };
>>>> +
>>>> +			if (get_ra(inst))
>>>> +				ra = vcpu->arch.gpr[get_ra(inst)];
>>>> +
>>>> +			addr = (ra + rb) & ~31ULL;
>>>> +			if (!(vcpu->arch.msr & MSR_SF))
>>>> +				addr &= 0xffffffff;
>>>> +
>>>> +			if (kvmppc_st(vcpu, addr, 32, zeros)) {
>>>
>>> DCBZ zeroes out a cache line, not 32 bytes; except on 970, where  
>>> there
>>> are HID bits to make it work on 32 bytes only, and an extra DCBZL  
>>> insn
>>> that always clears a full cache line (128 bytes).
>>
>> Yes. We only come here when we patched the dcbz opcodes to invalid  
>> instructions
>
> Ah yes, I forgot.  Could you rename it to OP_31_XOP_FAKE_32BIT_DCBZ
> or such?

Good idea.

>> because cache line size of target == 32.
>> On 970 with MSR_HV = 0 we actually use the dcbz 32-bytes mode.
>>
>> Admittedly though, this could be a lot more clever.
>
>>>> +		/* guest HID5 set can change is_dcbz32 */
>>>> +		if (vcpu->arch.mmu.is_dcbz32(vcpu) &&
>>>> +		    (mfmsr() & MSR_HV))
>>>> +			vcpu->arch.hflags |= BOOK3S_HFLAG_DCBZ32;
>>>> +		break;
>>>
>>> Wait, does this mean you allow other HID writes when MSR[HV] isn't
>>> set?  All HIDs (and many other SPRs) cannot be read or written in
>>> supervisor mode.
>>
>> When we're running in MSR_HV=0 mode on a 970 we can use the 32 byte  
>> dcbz HID flag. So all we need to do is tell our entry/exit code to  
>> set this bit.
>
> Which patch contains that entry/exit code?

That's patch 7 / 27.

+	/* Some guests may need to have dcbz set to 32 byte length.
+	 *
+	 * Usually we ensure that by patching the guest's instructions
+	 * to trap on dcbz and emulate it in the hypervisor.
+	 *
+	 * If we can, we should tell the CPU to use 32 byte dcbz though,
+	 * because that's a lot faster.
+	 */
+
+	ld	r3, VCPU_HFLAGS(r4)
+	rldicl.	r3, r3, 0, 63		/* CR = ((r3 & 1) == 0) */
+	beq	no_dcbz32_on
+
+	mfspr   r3,SPRN_HID5
+	ori     r3, r3, 0x80		/* XXX HID5_dcbz32 = 0x80 */
+	mtspr   SPRN_HID5,r3
+
+no_dcbz32_on:

>> If we're on 970 on a hypervisor or on a non-970 though we can't use  
>> the HID5 bit, so we need to binary patch the opcodes.
>>
>> So in order to emulate real 970 behavior, we need to be able to  
>> emulate that HID5 bit too! That's what this chunk of code does - it  
>> basically sets us in dcbz32 mode when allowed on 970 guests.
>
> But when MSR[HV]=0 and MSR[PR]=0, mtspr to a hypervisor resource
> will not trap but be silently ignored.  Sorry for not being more  
> clear.
> ...Oh.  You run your guest as MSR[PR]=1 anyway!  Tricky.

Yeah, the guest is always running in PR=1, so all HV checks are for  
the host. Usually we run in HV=1 on the host, because IBM doesn't sell  
machines that have HV=0 accessible for mortals :-).

I'll address your comments in a follow-up patch once the stuff is  
merged.

Alex