[RFC PATCH v3 06/11] powerpc64/ftrace: Move ftrace sequence out of line

Naveen N Rao naveen@kernel.org
Tue Jul 2 05:44:13 AEST 2024


On Mon, Jul 01, 2024 at 08:39:03PM GMT, Nicholas Piggin wrote:
> On Fri Jun 21, 2024 at 4:54 AM AEST, Naveen N Rao wrote:
> > Function profile sequence on powerpc includes two instructions at the
> > beginning of each function:
> > 	mflr	r0
> > 	bl	ftrace_caller
> >
> > The call to ftrace_caller() gets nop'ed out during kernel boot and is
> > patched in when ftrace is enabled.
> >
> > Given the sequence, we cannot return from ftrace_caller with 'blr' as we
> > need to keep LR and r0 intact. This results in link stack imbalance when
> 
> (link stack is IBMese for "return address predictor", if that wasn't
> obvious)
> 
> > ftrace is enabled. To address that, we would like to use a three
> > instruction sequence:
> > 	mflr	r0
> > 	bl	ftrace_caller
> > 	mtlr	r0
> >
> > Furthermore, to support DYNAMIC_FTRACE_WITH_CALL_OPS, we need to
> > reserve two instruction slots before the function. This results in a
> > total of five instruction slots to be reserved for ftrace use on each
> > function that is traced.
> >
> > Move the function profile sequence out-of-line to minimize its impact.
> > To do this, we reserve a single nop at function entry using
> > -fpatchable-function-entry=1 and add a pass on vmlinux.o to determine
> 
> What's the need to do this on vmlinux.o rather than vmlinux? We have
> all function syms?

We want to be able to build another .o file and link it into vmlinux. 
That file contains symbols (pfe_stub_text, et al.) that are referenced 
by vmlinux.o.

> 
> > the total number of functions that can be traced. This is then used to
> > generate a .S file reserving the appropriate amount of space for use as
> > ftrace stubs, which is built and linked into vmlinux.
> 
> An example instruction listing for the "after" case would be nice too.

Sure.
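
Something along these lines with ftrace enabled (a sketch, not actual 
disassembly; the stub address is only assigned during boot):

	<func>:
		b	ftrace_pfe_stub		/* patched from the initial nop */
		...

	/* out-of-line, in the reserved stub space */
	ftrace_pfe_stub:
		mflr	r0
		bl	ftrace_caller		/* nop'ed out when ftrace is off */
		mtlr	r0
		b	<func> + 4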

> 
> Is this all ftrace stubs in the one place? And how do you deal with
> kernel size exceeding the limit, if so?

Yes, all at the end. Ftrace init fails at boot if the text size exceeds 
the branch range. I should really add a post-link script to detect that 
and break the build in that case.
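
Something like the following, perhaps (an untested sketch, using the 
symbol names from this series and assuming the stubs are linked at the 
end of kernel text):

	# Fail the build if the start of kernel text is beyond the
	# +/- 32MB direct branch range of the out-of-line stubs.
	stext=$(${CROSS_COMPILE}nm vmlinux | awk '$3 == "_stext" {print "0x"$1}')
	stubs=$(${CROSS_COMPILE}nm vmlinux | awk '$3 == "ftrace_pfe_stub_text" {print "0x"$1}')
	if [ $((stubs - stext)) -ge $((0x2000000)) ]; then
		echo "ftrace stubs are out of branch range" >&2
		exit 1
	fi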

> 
> >
> > On bootup, the stub space is split into separate stubs per function and
> > populated with the proper instruction sequence. A pointer to the
> > associated stub is maintained in dyn_arch_ftrace.
> >
> > For modules, space for ftrace stubs is reserved from the generic module
> > stub space.
> >
> > This is restricted to and enabled by default only on 64-bit powerpc.
> 
> This is cool.
> 
> [...]
> 
> > --- a/arch/powerpc/Kconfig
> > +++ b/arch/powerpc/Kconfig
> > @@ -568,6 +568,11 @@ config ARCH_USING_PATCHABLE_FUNCTION_ENTRY
> >  	def_bool $(success,$(srctree)/arch/powerpc/tools/gcc-check-fpatchable-function-entry.sh $(CC) -mlittle-endian) if PPC64 && CPU_LITTLE_ENDIAN
> >  	def_bool $(success,$(srctree)/arch/powerpc/tools/gcc-check-fpatchable-function-entry.sh $(CC) -mbig-endian) if PPC64 && CPU_BIG_ENDIAN
> >  
> > +config FTRACE_PFE_OUT_OF_LINE
> > +	def_bool PPC64 && ARCH_USING_PATCHABLE_FUNCTION_ENTRY
> > +	depends on PPC64
> > +	select ARCH_WANTS_PRE_LINK_VMLINUX
> 
> This remains powerpc specific? Maybe add a PPC_ prefix to the config
> option?
> 
> Bikeshed - should PFE be expanded to be consistent with the ARCH_
> option?

I agree. PFE isn't immediately obvious. Now that I think about it, I'm 
not sure it really matters that this uses -fpatchable-function-entry. 
I'll call this PPC_FTRACE_SEQUENCE_OUT_OF_LINE. Suggestions welcome :)

> 
> [...]
> 
> > diff --git a/arch/powerpc/include/asm/ftrace.h b/arch/powerpc/include/asm/ftrace.h
> > index 201f9d15430a..9da1da0f87b4 100644
> > --- a/arch/powerpc/include/asm/ftrace.h
> > +++ b/arch/powerpc/include/asm/ftrace.h
> > @@ -26,6 +26,9 @@ unsigned long prepare_ftrace_return(unsigned long parent, unsigned long ip,
> >  struct module;
> >  struct dyn_ftrace;
> >  struct dyn_arch_ftrace {
> > +#ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
> > +	unsigned long pfe_stub;
> > +#endif
> >  };
> 
> Ah, we put something else in here. This is the offset to the
> stub? Maybe call it pfe_stub_offset?

Ack.

> 
> [...]
> 
> > diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
> > index 2cff37b5fd2c..9f3c10307331 100644
> > --- a/arch/powerpc/kernel/trace/ftrace.c
> > +++ b/arch/powerpc/kernel/trace/ftrace.c
> > @@ -37,7 +37,8 @@ unsigned long ftrace_call_adjust(unsigned long addr)
> >  	if (addr >= (unsigned long)__exittext_begin && addr < (unsigned long)__exittext_end)
> >  		return 0;
> >  
> > -	if (IS_ENABLED(CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY))
> > +	if (IS_ENABLED(CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY) &&
> > +	    !IS_ENABLED(CONFIG_FTRACE_PFE_OUT_OF_LINE))
> >  		addr += MCOUNT_INSN_SIZE;
> >  
> >  	return addr;
> 
> I don't understand what this is doing actually (before this patch
> even). We still emit one/two patchable instructions at entry, so
> why do we only need to adjust by zero/one instruction here?

ftrace wants the address of the instruction that calls into 
ftrace_caller. We emit 2 nops with -fpatchable-function-entry, so we 
"adjust" the ftrace location to be the second nop.

> 
> 
> > @@ -82,7 +83,7 @@ static inline int ftrace_modify_code(unsigned long ip, ppc_inst_t old, ppc_inst_
> >  {
> >  	int ret = ftrace_validate_inst(ip, old);
> >  
> > -	if (!ret)
> > +	if (!ret && !ppc_inst_equal(old, new))
> >  		ret = patch_instruction((u32 *)ip, new);
> >  
> >  	return ret;
> 
> Is this leftover debugging stuff or should it be in a different patch?

It is intentional to simplify further patching and checks. I will move 
this to a separate patch.

> 
> > @@ -132,11 +133,23 @@ static unsigned long ftrace_lookup_module_stub(unsigned long ip, unsigned long a
> >  }
> >  #endif
> >  
> > +static unsigned long ftrace_get_pfe_stub(struct dyn_ftrace *rec)
> > +{
> > +#ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
> > +	return rec->arch.pfe_stub;
> > +#else
> > +	BUILD_BUG();
> > +#endif
> > +}
> > +
> >  static int ftrace_get_call_inst(struct dyn_ftrace *rec, unsigned long addr, ppc_inst_t *call_inst)
> >  {
> >  	unsigned long ip = rec->ip;
> >  	unsigned long stub;
> >  
> > +	if (IS_ENABLED(CONFIG_FTRACE_PFE_OUT_OF_LINE))
> > +		ip = ftrace_get_pfe_stub(rec) + MCOUNT_INSN_SIZE; /* second instruction in stub */
> 
> Maybe put the ip = rec->ip; into an else case here.

Ok.

> 
> [...]
> 
> > @@ -155,6 +168,79 @@ static int ftrace_get_call_inst(struct dyn_ftrace *rec, unsigned long addr, ppc_
> >  	return 0;
> >  }
> >  
> > +static int ftrace_init_pfe_stub(struct module *mod, struct dyn_ftrace *rec)
> > +{
> > +#ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
> > +	static int pfe_stub_text_index, pfe_stub_inittext_index;
> > +	int ret = 0, pfe_stub_count, *pfe_stub_index;
> > +	ppc_inst_t inst;
> > +	struct ftrace_pfe_stub *pfe_stub, pfe_stub_template = {
> > +		.insn = {
> > +			PPC_RAW_MFLR(_R0),
> > +			PPC_RAW_NOP(),		/* bl ftrace_caller */
> > +			PPC_RAW_MTLR(_R0),
> > +			PPC_RAW_NOP()		/* b rec->ip + 4 */
> > +		}
> > +	};
> > +
> > +	WARN_ON(rec->arch.pfe_stub);
> > +
> > +	if (is_kernel_inittext(rec->ip)) {
> > +		pfe_stub = ftrace_pfe_stub_inittext;
> > +		pfe_stub_index = &pfe_stub_inittext_index;
> > +		pfe_stub_count = ftrace_pfe_stub_inittext_count;
> > +	} else if (is_kernel_text(rec->ip)) {
> > +		pfe_stub = ftrace_pfe_stub_text;
> > +		pfe_stub_index = &pfe_stub_text_index;
> > +		pfe_stub_count = ftrace_pfe_stub_text_count;
> > +#ifdef CONFIG_MODULES
> > +	} else if (mod) {
> > +		pfe_stub = mod->arch.pfe_stubs;
> > +		pfe_stub_index = &mod->arch.pfe_stub_index;
> > +		pfe_stub_count = mod->arch.pfe_stub_count;
> > +#endif
> > +	} else {
> > +		return -EINVAL;
> > +	}
> > +
> > +	pfe_stub += (*pfe_stub_index)++;
> > +
> > +	if (WARN_ON(*pfe_stub_index > pfe_stub_count))
> > +		return -EINVAL;
> > +
> > +	if (!is_offset_in_branch_range((long)rec->ip - (long)&pfe_stub->insn[0]) ||
> > +	    !is_offset_in_branch_range((long)(rec->ip + MCOUNT_INSN_SIZE) - (long)&pfe_stub->insn[3])) {
> > +		pr_err("%s: ftrace pfe stub out of range (%p -> %p).\n",
> > +					__func__, (void *)rec->ip, (void *)&pfe_stub->insn[0]);
> > +		return -EINVAL;
> > +	}
> > +
> > +	rec->arch.pfe_stub = (unsigned long)&pfe_stub->insn[0];
> > +
> > +	/* bl ftrace_caller */
> > +	if (mod)
> > +		/* We can't lookup the module since it is not fully formed yet */
> 
> What do you mean here, what would a lookup look like if we could do it?

I originally had ftrace_get_call_inst() here, which does a 
__module_text_address() lookup. It didn't work since the module is not 
yet fully loaded when this is called. I can expand the comment.
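
Something along the lines of (suggestion only):

	/*
	 * We can't use ftrace_get_call_inst() here since it looks up the
	 * module via __module_text_address(), and the module is not fully
	 * formed yet at this point. Branch to the module trampoline
	 * (mod->arch.tramp) directly instead.
	 */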

> 
> > +		inst = ftrace_create_branch_inst(ftrace_get_pfe_stub(rec) + MCOUNT_INSN_SIZE,
> > +						 mod->arch.tramp, 1);
> > +	else
> > +		ret = ftrace_get_call_inst(rec, (unsigned long)ftrace_caller, &inst);
> > +	pfe_stub_template.insn[1] = ppc_inst_val(inst);
> > +
> > +	/* b rec->ip + 4 */
> > +	if (!ret && create_branch(&inst, &pfe_stub->insn[3], rec->ip + MCOUNT_INSN_SIZE, 0))
> > +		return -EINVAL;
> > +	pfe_stub_template.insn[3] = ppc_inst_val(inst);
> > +
> > +	if (!ret)
> > +		ret = patch_instructions((u32 *)pfe_stub, (u32 *)&pfe_stub_template,
> > +					 sizeof(pfe_stub_template), false);
> > +
> > +	return ret;
> > +#else /* !CONFIG_FTRACE_PFE_OUT_OF_LINE */
> > +	BUILD_BUG();
> > +#endif
> > +}
> > +
> >  #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> >  int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr, unsigned long addr)
> >  {
> > @@ -167,18 +253,29 @@ int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr, unsigned
> >  int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
> >  {
> >  	ppc_inst_t old, new;
> > -	int ret;
> > +	unsigned long ip = rec->ip;
> > +	int ret = 0;
> >  
> >  	/* This can only ever be called during module load */
> > -	if (WARN_ON(!IS_ENABLED(CONFIG_MODULES) || core_kernel_text(rec->ip)))
> > +	if (WARN_ON(!IS_ENABLED(CONFIG_MODULES) || core_kernel_text(ip)))
> >  		return -EINVAL;
> >  
> >  	old = ppc_inst(PPC_RAW_NOP());
> > -	ret = ftrace_get_call_inst(rec, addr, &new);
> > -	if (ret)
> > -		return ret;
> > +	if (IS_ENABLED(CONFIG_FTRACE_PFE_OUT_OF_LINE)) {
> > +		ip = ftrace_get_pfe_stub(rec) + MCOUNT_INSN_SIZE; /* second instruction in stub */
> > +		ret = ftrace_get_call_inst(rec, (unsigned long)ftrace_caller, &old);
> > +	}
> >  
> > -	return ftrace_modify_code(rec->ip, old, new);
> > +	ret |= ftrace_get_call_inst(rec, addr, &new);
> > +
> > +	if (!ret)
> > +		ret = ftrace_modify_code(ip, old, new);
> > +
> > +	if (!ret && IS_ENABLED(CONFIG_FTRACE_PFE_OUT_OF_LINE))
> > +		ret = ftrace_modify_code(rec->ip, ppc_inst(PPC_RAW_NOP()),
> > +			 ppc_inst(PPC_RAW_BRANCH((long)ftrace_get_pfe_stub(rec) - (long)rec->ip)));
> > +
> > +	return ret;
> >  }
> >  
> >  int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec, unsigned long addr)
> > @@ -211,6 +308,13 @@ void ftrace_replace_code(int enable)
> >  		new_addr = ftrace_get_addr_new(rec);
> >  		update = ftrace_update_record(rec, enable);
> >  
> > +		if (IS_ENABLED(CONFIG_FTRACE_PFE_OUT_OF_LINE) && update != FTRACE_UPDATE_IGNORE) {
> > +			ip = ftrace_get_pfe_stub(rec) + MCOUNT_INSN_SIZE;
> > +			ret = ftrace_get_call_inst(rec, (unsigned long)ftrace_caller, &nop_inst);
> > +			if (ret)
> > +				goto out;
> > +		}
> > +
> >  		switch (update) {
> >  		case FTRACE_UPDATE_IGNORE:
> >  		default:
> > @@ -235,6 +339,24 @@ void ftrace_replace_code(int enable)
> >  
> >  		if (!ret)
> >  			ret = ftrace_modify_code(ip, old, new);
> > +
> > +		if (!ret && IS_ENABLED(CONFIG_FTRACE_PFE_OUT_OF_LINE) &&
> > +		    (update == FTRACE_UPDATE_MAKE_NOP || update == FTRACE_UPDATE_MAKE_CALL)) {
> > +			/* Update the actual ftrace location */
> > +			call_inst = ppc_inst(PPC_RAW_BRANCH((long)ftrace_get_pfe_stub(rec) -
> > +							    (long)rec->ip));
> > +			nop_inst = ppc_inst(PPC_RAW_NOP());
> > +			ip = rec->ip;
> > +
> > +			if (update == FTRACE_UPDATE_MAKE_NOP)
> > +				ret = ftrace_modify_code(ip, call_inst, nop_inst);
> > +			else
> > +				ret = ftrace_modify_code(ip, nop_inst, call_inst);
> > +
> > +			if (ret)
> > +				goto out;
> > +		}
> > +
> >  		if (ret)
> >  			goto out;
> >  	}
> > @@ -254,7 +376,8 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
> >  	/* Verify instructions surrounding the ftrace location */
> >  	if (IS_ENABLED(CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY)) {
> >  		/* Expect nops */
> > -		ret = ftrace_validate_inst(ip - 4, ppc_inst(PPC_RAW_NOP()));
> > +		if (!IS_ENABLED(CONFIG_FTRACE_PFE_OUT_OF_LINE))
> > +			ret = ftrace_validate_inst(ip - 4, ppc_inst(PPC_RAW_NOP()));
> >  		if (!ret)
> >  			ret = ftrace_validate_inst(ip, ppc_inst(PPC_RAW_NOP()));
> >  	} else if (IS_ENABLED(CONFIG_PPC32)) {
> > @@ -278,6 +401,10 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
> >  	if (ret)
> >  		return ret;
> >  
> > +	/* Set up out-of-line stub */
> > +	if (IS_ENABLED(CONFIG_FTRACE_PFE_OUT_OF_LINE))
> > +		return ftrace_init_pfe_stub(mod, rec);
> > +
> >  	/* Nop-out the ftrace location */
> >  	new = ppc_inst(PPC_RAW_NOP());
> >  	addr = MCOUNT_ADDR;
> > diff --git a/arch/powerpc/kernel/trace/ftrace_entry.S b/arch/powerpc/kernel/trace/ftrace_entry.S
> > index 244a1c7bb1e8..b1cbef24f18f 100644
> > --- a/arch/powerpc/kernel/trace/ftrace_entry.S
> > +++ b/arch/powerpc/kernel/trace/ftrace_entry.S
> > @@ -78,10 +78,6 @@
> >  
> >  	/* Get the _mcount() call site out of LR */
> >  	mflr	r7
> > -	/* Save it as pt_regs->nip */
> > -	PPC_STL	r7, _NIP(r1)
> > -	/* Also save it in B's stackframe header for proper unwind */
> > -	PPC_STL	r7, LRSAVE+SWITCH_FRAME_SIZE(r1)
> >  	/* Save the read LR in pt_regs->link */
> >  	PPC_STL	r0, _LINK(r1)
> >  
> > @@ -96,16 +92,6 @@
> >  	lwz	r5,function_trace_op@l(r3)
> >  #endif
> >  
> > -#ifdef CONFIG_LIVEPATCH_64
> > -	mr	r14, r7		/* remember old NIP */
> > -#endif
> > -
> > -	/* Calculate ip from nip-4 into r3 for call below */
> > -	subi    r3, r7, MCOUNT_INSN_SIZE
> > -
> > -	/* Put the original return address in r4 as parent_ip */
> > -	mr	r4, r0
> > -
> >  	/* Save special regs */
> >  	PPC_STL	r8, _MSR(r1)
> >  	.if \allregs == 1
> > @@ -114,17 +100,64 @@
> >  	PPC_STL	r11, _CCR(r1)
> >  	.endif
> >  
> > +#ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
> > +	/* Save our real return address locally for return */
> > +	PPC_STL	r7, STACK_INT_FRAME_MARKER(r1)
> 
> Hmm, should you be using STACK_INT_FRAME_MARKER in a
> non-INT_FRAME? I actually wanted to turn the int marker
> into a 4 byte word and move it into a reserved space in
> the frame too. Could it go in pt_regs somewhere?

I can use a non-volatile register (nvr) instead.

> 
> > +	/*
> > +	 * We want the ftrace location in the function, but our lr (in r7)
> > +	 * points at the 'mtlr r0' instruction in the out of line stub.  To
> > +	 * recover the ftrace location, we read the branch instruction in the
> > +	 * stub, and adjust our lr by the branch offset.
> > +	 */
> > +	lwz	r8, MCOUNT_INSN_SIZE(r7)
> > +	slwi	r8, r8, 6
> > +	srawi	r8, r8, 6
> > +	add	r3, r7, r8
> 
> Clever. Maybe a comment in ftrace_init_pfe_stub() that says to
> keep that last instruction in synch with this?

Sure.
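
For reference, the C equivalent of that recovery is roughly (a sketch, 
not code from the patch):

	/* lr (r7) points at the 'mtlr r0' in the stub; the branch back
	 * to the function is the next instruction */
	u32 insn = *(u32 *)(lr + MCOUNT_INSN_SIZE);
	/* shift out the 6-bit opcode, sign-extend the 26-bit offset */
	long offset = (int)(insn << 6) >> 6;
	/* the branch is at lr + 4 and targets rec->ip + 4, so adding its
	 * offset to lr itself lands us on rec->ip */
	unsigned long ftrace_ip = lr + offset;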

> 
> > +	/*
> > +	 * Override our nip to point past the branch in the original function.
> > +	 * This allows reliable stack trace and the ftrace stack tracer to work as-is.
> > +	 */
> > +	add	r7, r3, MCOUNT_INSN_SIZE
> > +#else
> > +	/* Calculate ip from nip-4 into r3 for call below */
> > +	subi    r3, r7, MCOUNT_INSN_SIZE
> > +#endif
> > +
> > +	/* Save NIP as pt_regs->nip */
> > +	PPC_STL	r7, _NIP(r1)
> > +	/* Also save it in B's stackframe header for proper unwind */
> > +	PPC_STL	r7, LRSAVE+SWITCH_FRAME_SIZE(r1)
> > +#ifdef CONFIG_LIVEPATCH_64
> > +	mr	r14, r7		/* remember old NIP */
> > +#endif
> > +
> > +	/* Put the original return address in r4 as parent_ip */
> > +	mr	r4, r0
> > +
> >  	/* Load &pt_regs in r6 for call below */
> >  	addi    r6, r1, STACK_INT_FRAME_REGS
> >  .endm
> >  
> >  .macro	ftrace_regs_exit allregs
> > +#ifndef CONFIG_FTRACE_PFE_OUT_OF_LINE
> >  	/* Load ctr with the possibly modified NIP */
> >  	PPC_LL	r3, _NIP(r1)
> >  	mtctr	r3
> >  
> >  #ifdef CONFIG_LIVEPATCH_64
> >  	cmpd	r14, r3		/* has NIP been altered? */
> > +#endif
> > +#else /* !CONFIG_FTRACE_PFE_OUT_OF_LINE */
> > +#ifdef CONFIG_LIVEPATCH_64
> > +	/* Load ctr with the possibly modified NIP */
> 
> Comment doesn't apply to this leg of the ifdef AFAIKS.
> We load the original NIP into LR, and set CR0 for
> livepatch branch.

Indeed. Will update.

> 
> > +	PPC_LL	r3, _NIP(r1)
> > +
> > +	cmpd	r14, r3		/* has NIP been altered? */
> > +	bne-	1f
> > +#endif
> > +
> > +	PPC_LL	r3, STACK_INT_FRAME_MARKER(r1)
> > +1:	mtlr	r3
> >  #endif
> >  
> >  	/* Restore gprs */
> > @@ -139,7 +172,9 @@
> >  
> >  	/* Restore possibly modified LR */
> >  	PPC_LL	r0, _LINK(r1)
> > +#ifndef CONFIG_FTRACE_PFE_OUT_OF_LINE
> >  	mtlr	r0
> > +#endif
> >  
> >  #ifdef CONFIG_PPC64
> >  	/* Restore callee's TOC */
> > @@ -153,7 +188,12 @@
> >          /* Based on the cmpd above, if the NIP was altered handle livepatch */
> >  	bne-	livepatch_handler
> >  #endif
> > -	bctr			/* jump after _mcount site */
> > +	/* jump after _mcount site */
> > +#ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
> > +	blr
> > +#else
> > +	bctr
> > +#endif
> 
> Here is the crux of it all, we return here with a blr that matches
> the return address of the bl which it was called with, so the CPU
> can predict it.
> 
> I think it would be worth a comment here to explain why you go to
> so much effort to add the case that uses LR here. Because the out
> of line stub itself could pretty well have the same calling convention
> as the original mcount.

Sure. FWIW, with this patch, null_syscall with ftrace enabled showed an 
improvement of ~22%, going down from ~520 cycles to ~400 cycles.

> 
> Actually that's a thought too. Could you split this patch in two?
> First just the patch to add the out of line call but use the same
> calling convention as mprofile-kernel. Second which changes it to
> use the balanced call/return. Would that be a lot of extra work?

I'll check. My primary motive was to ensure there would only ever be two 
options:
- the existing -mprofile-kernel sequence, primarily for ppc32
- the new out-of-line sequence with a third instruction to balance the 
  link stack.

Though I agree splitting this makes the code easier to follow.

> 
> >  .endm
> >  
> >  _GLOBAL(ftrace_regs_caller)
> > @@ -177,6 +217,11 @@ _GLOBAL(ftrace_stub)
> >  
> >  #ifdef CONFIG_PPC64
> >  ftrace_no_trace:
> > +#ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
> > +	REST_GPR(3, r1)
> > +	addi	r1, r1, SWITCH_FRAME_SIZE+STACK_FRAME_MIN_SIZE
> > +	blr
> > +#else
> >  	mflr	r3
> >  	mtctr	r3
> >  	REST_GPR(3, r1)
> > @@ -184,6 +229,7 @@ ftrace_no_trace:
> >  	mtlr	r0
> >  	bctr
> >  #endif
> > +#endif
> >  
> >  #ifdef CONFIG_LIVEPATCH_64
> >  	/*
> > @@ -196,9 +242,9 @@ ftrace_no_trace:
> >  	 *
> >  	 * On entry:
> >  	 *  - we have no stack frame and can not allocate one
> > -	 *  - LR points back to the original caller (in A)
> > -	 *  - CTR holds the new NIP in C
> > -	 *  - r0, r11 & r12 are free
> > +	 *  - LR/r0 points back to the original caller (in A)
> > +	 *  - CTR/LR holds the new NIP in C
> > +	 *  - r11 & r12 are free
> 
> Could you explain the added case here, e.g.,
> 
> On entry, depending on CONFIG_FTRACE_PFE_OUT_OF_LINE (=n/=y)

Sure.
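
Perhaps:

	 * On entry (CONFIG_FTRACE_PFE_OUT_OF_LINE=n / =y):
	 *  - we have no stack frame and can not allocate one
	 *  - LR / r0 points back to the original caller (in A)
	 *  - CTR / LR holds the new NIP in C
	 *  - r11 & r12 are free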

> 
> >  	 */
> >  livepatch_handler:
> >  	ld	r12, PACA_THREAD_INFO(r13)
> > @@ -208,18 +254,23 @@ livepatch_handler:
> >  	addi	r11, r11, 24
> >  	std	r11, TI_livepatch_sp(r12)
> >  
> > -	/* Save toc & real LR on livepatch stack */
> > -	std	r2,  -24(r11)
> > -	mflr	r12
> > -	std	r12, -16(r11)
> > -
> >  	/* Store stack end marker */
> >  	lis     r12, STACK_END_MAGIC@h
> >  	ori     r12, r12, STACK_END_MAGIC@l
> >  	std	r12, -8(r11)
> >  
> > -	/* Put ctr in r12 for global entry and branch there */
> > +	/* Save toc & real LR on livepatch stack */
> > +	std	r2,  -24(r11)
> > +#ifndef CONFIG_FTRACE_PFE_OUT_OF_LINE
> > +	mflr	r12
> > +	std	r12, -16(r11)
> >  	mfctr	r12
> > +#else
> > +	std	r0, -16(r11)
> > +	mflr	r12
> > +	/* Put ctr in r12 for global entry and branch there */
> > +	mtctr	r12
> > +#endif
> >  	bctrl
> >  
> >  	/*
> > diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S
> > index f420df7888a7..0aef9959f2cd 100644
> > --- a/arch/powerpc/kernel/vmlinux.lds.S
> > +++ b/arch/powerpc/kernel/vmlinux.lds.S
> > @@ -267,14 +267,13 @@ SECTIONS
> >  	.init.text : AT(ADDR(.init.text) - LOAD_OFFSET) {
> >  		_sinittext = .;
> >  		INIT_TEXT
> > -
> > +		*(.tramp.ftrace.init);
> >  		/*
> >  		 *.init.text might be RO so we must ensure this section ends on
> >  		 * a page boundary.
> >  		 */
> >  		. = ALIGN(PAGE_SIZE);
> >  		_einittext = .;
> > -		*(.tramp.ftrace.init);
> >  	} :text
> >  
> >  	/* .exit.text is discarded at runtime, not link time,
> 
> Why this change?

I should have explained this in the commit log. Will add.
Without this change, the core_kernel_text() test was failing in 
ftrace_init_pfe_stub(), I think. Moving the init stubs before _einittext 
keeps them within the _sinittext/_einittext bounds used by the kernel 
text checks.

> 
> > diff --git a/arch/powerpc/tools/Makefile b/arch/powerpc/tools/Makefile
> > new file mode 100644
> > index 000000000000..9e2ba9a85baa
> > --- /dev/null
> > +++ b/arch/powerpc/tools/Makefile
> > @@ -0,0 +1,10 @@
> > +# SPDX-License-Identifier: GPL-2.0-or-later
> > +
> > +quiet_cmd_gen_ftrace_pfe_stubs = FTRACE  $@
> > +      cmd_gen_ftrace_pfe_stubs = $< $(objtree)/vmlinux.o $@
> > +
> > +targets += .arch.vmlinux.o
> > +.arch.vmlinux.o: $(srctree)/arch/powerpc/tools/gen-ftrace-pfe-stubs.sh $(objtree)/vmlinux.o FORCE
> > +	$(call if_changed,gen_ftrace_pfe_stubs)
> > +
> > +clean-files += $(objtree)/.arch.vmlinux.S $(objtree)/.arch.vmlinux.o
> > diff --git a/arch/powerpc/tools/gen-ftrace-pfe-stubs.sh b/arch/powerpc/tools/gen-ftrace-pfe-stubs.sh
> > new file mode 100755
> > index 000000000000..ec95e99aff14
> > --- /dev/null
> > +++ b/arch/powerpc/tools/gen-ftrace-pfe-stubs.sh
> > @@ -0,0 +1,48 @@
> > +#!/bin/sh
> > +# SPDX-License-Identifier: GPL-2.0-or-later
> > +
> > +# Error out on error
> > +set -e
> > +
> > +is_enabled() {
> > +	grep -q "^$1=y" include/config/auto.conf
> > +}
> > +
> > +vmlinux_o=${1}
> > +arch_vmlinux_o=${2}
> > +arch_vmlinux_S=$(dirname ${arch_vmlinux_o})/$(basename ${arch_vmlinux_o} .o).S
> > +
> > +RELOCATION=R_PPC64_ADDR64
> > +if is_enabled CONFIG_PPC32; then
> > +	RELOCATION=R_PPC_ADDR32
> > +fi
> 
> Started PPC32 support?

Yes, except perhaps for the module code. The intent was to enable as much 
of it as I could so that Christophe could try it and see if this would be 
useful on 32-bit. The config option is intentionally neutral.

> 
> > +
> > +num_pfe_stubs_text=$(${CROSS_COMPILE}objdump -r -j __patchable_function_entries ${vmlinux_o} |
> > +		     grep -v ".init.text" | grep "${RELOCATION}" | wc -l)
> > +num_pfe_stubs_inittext=$(${CROSS_COMPILE}objdump -r -j __patchable_function_entries ${vmlinux_o} |
> > +			 grep ".init.text" | grep "${RELOCATION}" | wc -l)
> > +
> > +cat > ${arch_vmlinux_S} <<EOF
> > +#include <asm/asm-offsets.h>
> > +#include <linux/linkage.h>
> > +
> > +.pushsection .tramp.ftrace.text,"aw"
> > +SYM_DATA(ftrace_pfe_stub_text_count, .long ${num_pfe_stubs_text})
> > +
> > +SYM_CODE_START(ftrace_pfe_stub_text)
> > +	.space ${num_pfe_stubs_text} * FTRACE_PFE_STUB_SIZE
> > +SYM_CODE_END(ftrace_pfe_stub_text)
> > +.popsection
> > +
> > +.pushsection .tramp.ftrace.init,"aw"
> > +SYM_DATA(ftrace_pfe_stub_inittext_count, .long ${num_pfe_stubs_inittext})
> > +
> > +SYM_CODE_START(ftrace_pfe_stub_inittext)
> > +	.space ${num_pfe_stubs_inittext} * FTRACE_PFE_STUB_SIZE
> > +SYM_CODE_END(ftrace_pfe_stub_inittext)
> > +.popsection
> > +EOF
> > +
> > +${CC} ${NOSTDINC_FLAGS} ${LINUXINCLUDE} ${KBUILD_CPPFLAGS} \
> > +      ${KBUILD_AFLAGS} ${KBUILD_AFLAGS_KERNEL} \
> > +      -c -o ${arch_vmlinux_o} ${arch_vmlinux_S}
> 
> Looking pretty good. I don't know the livepatch stuff well though.

Thanks for the detailed review!

- Naveen

