PROBLEM: monotonic clock going backwards on ppc64
Michael Ellerman
mpe at ellerman.id.au
Sat Mar 2 00:24:57 AEDT 2019
Hi Jakub,
[Cc += Timekeeping maintainers]
"Jakub Drnec" <jaydee at email.cz> writes:
> Hi all,
>
> I think I observed a potential problem. Is this the correct place to report it? (Please CC me, I am not on the list.)
>
> [1.] One line summary: monotonic clock can be made to decrease on ppc64
> [2.] Full description:
> Setting the realtime clock can sometimes make the monotonic clock go back by over a hundred years.
> Decreasing the realtime clock across the y2k38 threshold is one reliable way to reproduce.
> Allegedly this can also happen just by running ntpd, but I have not managed to reproduce that other
> than by booting with the RTC set past 2038 and then running ntp.
> When this happens, anything with timers (e.g. openjdk) breaks rather badly.
Thanks for the report.
> The problem seems to be in vDSO code in arch/powerpc/kernel/vdso64/gettimeofday.S.
You're right, the wall-to-monotonic offset (wtom_clock_sec) is a signed
32-bit value, so it can't hold an offset below -2^31 seconds.
If I do `date -s 2037-1-1` I see:
[ 26.024061] update_vsyscall: tk->wall_to_monotonic.tv_sec -2114341175
[ 26.042633] update_vsyscall: vdso_data->wtom_clock_sec -2114341175
Which looks sane.
But then 2040-1-1 shows:
[ 32.617020] update_vsyscall: tk->wall_to_monotonic.tv_sec -2208949168
[ 32.632642] update_vsyscall: vdso_data->wtom_clock_sec 2086018128
i.e. the larger negative offset no longer fits in 32 bits and has wrapped around to a positive value.
But then when we go back to 2037 we get a negative offset again and
monotonic time appears to go backward and things are unhappy.
I don't know this code well, but the patch below *appears* to work. I'll
have a closer look on Monday.
cheers
diff --git a/arch/powerpc/include/asm/vdso_datapage.h b/arch/powerpc/include/asm/vdso_datapage.h
index 1afe90ade595..139133ec21d5 100644
--- a/arch/powerpc/include/asm/vdso_datapage.h
+++ b/arch/powerpc/include/asm/vdso_datapage.h
@@ -82,7 +82,7 @@ struct vdso_data {
__u32 icache_block_size; /* L1 i-cache block size */
__u32 dcache_log_block_size; /* L1 d-cache log block size */
__u32 icache_log_block_size; /* L1 i-cache log block size */
- __s32 wtom_clock_sec; /* Wall to monotonic clock */
+ __s64 wtom_clock_sec; /* Wall to monotonic clock */
__s32 wtom_clock_nsec;
struct timespec stamp_xtime; /* xtime as at tb_orig_stamp */
__u32 stamp_sec_fraction; /* fractional seconds of stamp_xtime */
diff --git a/arch/powerpc/kernel/vdso64/gettimeofday.S b/arch/powerpc/kernel/vdso64/gettimeofday.S
index a4ed9edfd5f0..1f324c28705b 100644
--- a/arch/powerpc/kernel/vdso64/gettimeofday.S
+++ b/arch/powerpc/kernel/vdso64/gettimeofday.S
@@ -92,7 +92,7 @@ V_FUNCTION_BEGIN(__kernel_clock_gettime)
* At this point, r4,r5 contain our sec/nsec values.
*/
- lwa r6,WTOM_CLOCK_SEC(r3)
+ ld r6,WTOM_CLOCK_SEC(r3)
lwa r9,WTOM_CLOCK_NSEC(r3)
/* We now have our result in r6,r9. We create a fake dependency
@@ -125,7 +125,7 @@ V_FUNCTION_BEGIN(__kernel_clock_gettime)
bne cr6,75f
/* CLOCK_MONOTONIC_COARSE */
- lwa r6,WTOM_CLOCK_SEC(r3)
+ ld r6,WTOM_CLOCK_SEC(r3)
lwa r9,WTOM_CLOCK_NSEC(r3)
/* check if counter has updated */
> [3.] Keywords: gettimeofday, ppc64, vdso
> [4.] Kernel information
> [4.1.] Kernel version: any (tested on 4.19)
> [4.2.] Kernel .config file: any
> [5.] Most recent kernel version which did not have the bug: not a regression
> [6.] Output of Oops..: not applicable
> [7.] Example program which triggers the problem
> --- testcase.c
> #include <stdio.h>
> #include <time.h>
> #include <stdlib.h>
> #include <unistd.h>
>
> long get_time() {
>     struct timespec tp;
>     if (clock_gettime(CLOCK_MONOTONIC, &tp) != 0) {
>         perror("clock_gettime failed");
>         exit(1);
>     }
>     long result = tp.tv_sec + tp.tv_nsec / 1000000000;
>     return result;
> }
>
> int main() {
>     printf("monitoring monotonic clock...\n");
>     long last = get_time();
>     while(1) {
>         long now = get_time();
>         if (now < last) {
>             printf("clock went backwards by %ld seconds!\n",
>                    last - now);
>         }
>         last = now;
>         sleep(1);
>     }
>     return 0;
> }
> ---
> when running
> # date -s 2040-1-1
> # date -s 2037-1-1
> program outputs: clock went backwards by 4294967295 seconds!
>
> [8.] Environment: any ppc64, currently reproducing on qemu-system-ppc64le running debian unstable
> [X.] Other notes, patches, fixes, workarounds:
> The problem seems to be in vDSO code in arch/powerpc/kernel/vdso64/gettimeofday.S.
> (possibly because some values used in the calculation are only 32 bit?)
> Slightly silly workaround:
> nuke the "cmpwi cr1,r3,CLOCK_MONOTONIC" in __kernel_clock_gettime
> Now it always goes through the syscall fallback which does not have the same problem.
>
> Regards,
> Jakub Drnec