Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework winapi's QueryPerformance* functions to match XDK #663

Open
wants to merge 8 commits into
base: master
Choose a base branch
from
39 changes: 36 additions & 3 deletions lib/winapi/profiling.c
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,9 @@
// SPDX-FileCopyrightText: 2019 Stefan Schmidt

#include <profileapi.h>
#ifdef USE_RDTSC_FOR_FREQ
#include <synchapi.h>
#endif
#include <assert.h>
#include <stddef.h>
#include <xboxkrnl/xboxkrnl.h>
Expand All @@ -11,14 +14,44 @@ BOOL QueryPerformanceCounter (LARGE_INTEGER *lpPerformanceCount)
{
assert(lpPerformanceCount != NULL);

lpPerformanceCount->QuadPart = KeQueryPerformanceCounter();
lpPerformanceCount->QuadPart = __rdtsc();
return TRUE;
}

BOOL QueryPerformanceFrequency (LARGE_INTEGER *lpFrequency)
{
assert(lpFrequency != NULL);

lpFrequency->QuadPart = KeQueryPerformanceFrequency();
#ifdef USE_RDTSC_FOR_FREQ
#define AVG_SET 10
ULARGE_INTEGER f_rdtsc, avg;
ULONG f_ticks = 0;

avg.QuadPart = 0;

for (int i = 0; i < AVG_SET; i++) {
ULARGE_INTEGER s_rdtsc;
ULONG s_ticks;

s_rdtsc.QuadPart = __rdtsc();
s_ticks = KeTickCount;

s_rdtsc.QuadPart -= f_rdtsc.QuadPart;
JayFoxRox marked this conversation as resolved.
Show resolved Hide resolved
s_rdtsc.QuadPart /= s_ticks - f_ticks;
JayFoxRox marked this conversation as resolved.
Show resolved Hide resolved

f_rdtsc.QuadPart = __rdtsc();
f_ticks = KeTickCount;

// Skip the first result as invalid
if (i)
avg.QuadPart += s_rdtsc.QuadPart;

// If we call rdtsc too fast we'll end up with div by 0
Sleep(10);
Copy link
Member

@JayFoxRox JayFoxRox Oct 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is.. insane. Hard no from me.

(This function is often used in hot-code path and multiple times per frame; slowing it down like this is crazy. I can't imagine the actual XDK did this?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct it doesn't, that's added by me. It only ever replies 733mhz. I'd like to hear ideas of how to get this dynamically. Perhaps a run once and global var?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with jfr, code with this kind of performance issue isn't suitable for this type of functionality.
The loop doesn't make sense to me either - doing it in multiple small steps just accumulates rounding and measurement errors, exacerbated by using a low-precision timing source such as the tick counter.

We should calibrate the performance counter frequency value only once, so this function won't have to do anything more than just return the content of a global static variable.
The variable can be initialized via a function marked as a constructor to avoid having to do any manual initialization.
The timing source for the calibration should be KeQueryPerformanceCounter, which should give us better precision given the ACPI timer's tick rate of 3.579545 MHz. It may only support a 24 bit counter, so the code needs to handle a potential overflow during calibration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. We should prime, and thanks for the hint at constructor attribute. See newest commit for changes.

}
lpFrequency->QuadPart = (avg.QuadPart / (AVG_SET - 1)) * 1000;
#else
lpFrequency->QuadPart = 733333333;
#endif
return TRUE;
}
}