The Problem with GPU Benchmarks | Reality vs. Numbers, Animation Error Methodology White Paper

This is the limitation: Frames are displayed at an even pace in this example, but something is still wrong with it. The thing that’s wrong is why we have the new measurement methodology that we’re debuting today: Animation error.

Animation error is the difference between the pacing of animation and display. Putting “animation” in the name might be confusing: Intel called it Simulation Time Error at one point.

And this is a sample chart illustrating what we’re talking about, where you can see the animation error timing sometimes in alignment with frametime spikes, sometimes out of alignment with it, showing that it’s a different thing.

Here’s another chart showing the percent animation error across a test pass.

Or this one, the place we present the CPUStartTime versus animation time delta.

These are all new types of benchmark charts that haven’t been shown before. Tom Petersen first pitched the idea of animation error a couple years ago, something he arguably began on about 14 years ago.

The problem with GPU, CPU, and new game benchmarks has always been that it’s tough to accurately capture the actual player experience. Framerate was a good start, and frame-to-frame interval testing (or “frametime testing”) was a great expansion on that, but neither perfectly captures the real experience.

This is what we’re unveiling and detailing today. Consider this a whitepaper, like a research piece that’s intended to put information out to the community for people to start trying to experiment with. None of this is perfect yet, but we think we have a good foundation for viewers and other reviewers to build upon and advance our understanding of game behavior. If you’re a reviewer and this is useful, please point back to this story as it was over a month of work for us to wrap our heads around.

Let’s get into it.

This methodological deep-dive lays the foundation for new testing. It’s exploratory.

We rolled out 1% low and 0.1% lows in our testing back in 2014, eventually popularizing their presentation on bar charts alongside average framerate. These are the metrics we use today to point us toward a problem with frametime pacing. The industry has relied on frametimes, 1%, and 0.1% averages for over a decade now with few new metrics in between.

The history to that is all important, and it’s important that those who did the groundwork before us are recognized: Tom Petersen, PC Per’s Ryan Shrout, and Tech Report’s Scott Wasson all advanced this metric, with Petersen doing heavy lifting on providing software tools and early insights to frametime analysis. His engineering work has continued with the open source tool PresentMon, which gets us to animation error today.

Animation Error should be thought of in a more traditional sense: Like a flipbook with perfectly animated drawings, but without the execution of flipping through the pages at a constant pace. This is just one specific scenario for animation error; you could extend the metaphor, like flipping through the pages perfectly but messing up the drawing (or you could do a combination of both).

We’re talking about animation in the dictionary sense, “a film, scene, or sequence that simulates motion from a series of still frames,” so animation error applies to entire frames. Think of it like frames of drawings on an animation lamp, in a flipbook, or on a reel of film. We’re NOT talking about errors in animation of individual models, objects, NPCs, or sprites within the frames, and we’re also not talking about games that just have bad animation from the artists.

You could technically have animation error even looking at a blank wall in-game without any movement at all, although it might be impossible to notice.

This all comes down to two things: Smoothness and acceleration. When we were talking to Tom Petersen about this concept, he made some good points about this. Showing frames faster allows the brain to interpolate and generate an illusion of smooth motion, which is the illusion of TV and movies. But the brain also knows how to identify acceleration, something Tom equated back to “monkey times like running away from lions and shit.”

This is an example of animation error that we created in 3D space. It’s very smooth and accurate as a rate, but it’s slow. Every so often, you’ll see an error in it despite the smoothness of the frames, and that’s stutter. When the brain sees even subtle acceleration or deceleration, we pick it up quick. That’s what makes it feel so bad when we see stutter in gaming. Stutter is what you’re seeing here: Something speeds up or slows down quickly and the body is overreacting because, to quote Tom, “we don’t want to be eaten by a lion, or some shit like that.” He really has a gift with words.

But it would be great if we could measure smoothness and acceleration separately, because they’re different things: Framerate and frame-to-frame interval evaluate smoothness, but acceleration is something we haven’t done a good job at quantifying in this industry. That’s characterized by animation error, which we’re introducing with its first full charts today.

This chart is from our Dragon’s Dogma 2 testing at launch, where we were quietly beginning to farm data for this eventual piece. You can see where the animation error blips sometimes align with the frametime spikes, but not always.

AnimationError can also be positive or negative. When a frame is displayed “too late” (relative to its correct placement) that’s a negative error, and when it’s shown “too soon” that’s a positive error. Neither is good. At a macro level, it doesn’t matter which is which: further away from 0 is always worse.

The idea is that if frames are created at a certain pace, you should see those frames displayed at the same pace. There are differences between animation error and frametime pacing, though.

If there’s a mismatch, that’s where the animation part comes in: movement depicted in the frames will appear jerky and wrong, even if frametimes are perfectly consistent.

For a real-world demonstration, the simplest, most reliable way we found to directly induce animation error was with SLI.

Yeah, we all know.

We can take one card and get a normal result, then add a second and get a result with higher animation error.

That limits us to older hardware, and it also limits us to GPUs that we own in pairs. We selected two 1080 Tis (read our revisit) since they’re our newest cards that still use regular old SLI, and (as of now) they’re still supported in the latest NVIDIA driver package. We also had to pick a game that supported SLI.

We’re getting into benchmarks. We’ll break these down starting at the most abstracted metric, which is this chart.

This is framerate represented as bars, abstracting away from time. Next, we’ll look at the frametimes that create this framerate average, and last, we’ll look at the new animation error metric.

These are averaged results for 30-second logs of Far Cry 5’s built-in benchmark, one using a single 1080 Ti and one using two 1080 Tis in SLI. The 1% and 0.1% lows indicate that there weren’t huge frametime spikes, and it’s these metrics that tell us when we should inspect a frametime plot closer for major problems. Average FPS smooths over problems; 1% and 0.1% lows are still averages and can still smooth over them, but they are more likely to draw our attention toward a problem because they’re averaging the worst 1% and worst 0.1% of data.
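As a rough illustration of how an average of the worst frames differs from a plain average, here is a minimal Python sketch. It assumes a low is computed by averaging the slowest 1% (or 0.1%) of frametimes and converting that average to FPS. The function names and the sample log are hypothetical, and this is not necessarily the exact arithmetic behind the charts.

```python
def average_fps(frametimes_ms):
    """Average FPS for a log: frames rendered divided by seconds elapsed."""
    return 1000.0 * len(frametimes_ms) / sum(frametimes_ms)


def low_fps(frametimes_ms, one_in):
    """Average the slowest 1-in-`one_in` frametimes and convert to FPS.

    one_in=100 gives a "1% low"; one_in=1000 gives a "0.1% low".
    Assumption: lows are an average of the worst frames, not a percentile.
    """
    slowest_first = sorted(frametimes_ms, reverse=True)
    count = max(1, len(slowest_first) // one_in)
    worst = slowest_first[:count]
    return 1000.0 / (sum(worst) / count)


# Hypothetical 1000-frame log: mostly 10 ms, ten 30 ms frames, one 50 ms spike.
frametimes = [10.0] * 989 + [30.0] * 10 + [50.0]

print(round(average_fps(frametimes), 1))    # 97.7
print(round(low_fps(frametimes, 100), 2))   # 31.25
print(round(low_fps(frametimes, 1000), 2))  # 20.0
```

In this made-up log the average barely moves, while the lows drop sharply, which is why the lows are the numbers that prompt a closer look at the frametime plot.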

As a reminder, we aren’t using percentiles, which is a different way of approaching this. We explained that in our video where we did JayzTwoCents’ lab overhaul.

The average with SLI is higher, as expected; however, beyond that, there’s no dramatic change between how these numbers manifest. The single GTX 1080 Ti appears to have closer frametime pacing to its average, which is what we’ve been preaching for years as a good result, but the dual 1080 Tis still look good overall.

Data Presentation: Frametimes

The frametime plot helps us see deeper into these bars. 

This still isn’t animation error, though, and it’s still not new.

This plot of frametimes is for two individual passes. As we know based on the last chart, the SLI run’s gap between the average and its 1% and 0.1% metrics is wider. Here, that materializes in the form of spiky behavior (particularly in the 500-1000 frame range) for the SLI configuration. The experience is much less consistent, with more sporadic frametime excursions from baseline. Most users begin to notice these around 8ms, according to an interview we conducted with Scott Wasson years ago, but only if they occur frequently.

The single-card run has more consistent frametimes, despite its lower average. The SLI result is not so bad that it’s a ruinous experience, as the SLI configuration is ultimately still within 2-4 ms of the baseline, but the relative distance from baseline is larger. It’s not like we’re seeing 100ms spikes where you’d stare at one frame for 1/10 of a second, as we’ve seen in other tests on modern single cards.

Data Presentation: Animation Error

Here’s the new stuff.

The left axis shows animation error in ms, with deviations away from zero depending on whether frames showed up too soon or too late. 0 is perfect and 0 does occur. The X-axis represents frames, the same as the frametime plot we just showed. The animation errors could also be taken as absolute values for a lower-is-better representation, but this is a plot of the raw data as logged by PresentMon.

There are many ways this data could be plotted. A scatterplot lets us pick out the data points individually, but still keeps the points in order so we can see how behavior changes over the course of the test. Lines between the points wouldn’t mean much here.

By plotting the animation error, we can get rid of frametimes as a variable and just compare that relative spikiness as a player would really feel it interacting with the game itself. This is closer to the real experience, in the same way end-to-end latency would be but for different reasons. The further the deviation from zero in either direction, the worse. The cards in SLI frequently had 2-3ms of animation error per frame, while the single card typically had well under 1ms of animation error. The SLI configuration is significantly worse for animation error in a relative sense, even though the frametime data doesn’t look that dramatic.

The single card is clearly far better in terms of animation error, although we need more data to judge whether the SLI result is “bad.”

Strange Brigade Animation Error

This is another test. This time, we’re using Strange Brigade. Our goal today is exploratory, so we’re choosing games based on usefulness to explore the concept, not on their popularity. That’ll come later.

The main advantage of a plot like this is that we don’t have to insert our own calculations or conclusions: we can simply show the data. The single 1080 Ti sticks even closer to zero animation error per frame than it did in Far Cry, while the SLI 1080 Tis continue to generate 2-3ms of positive or negative error on nearly every frame, only rarely approaching zero.

SLI outperforms the single card so heavily in this title that the red line is significantly shorter, which is a downside of using frames as an X-axis. Although both its average framerate and its 1% and 0.1% lows are overall good here, the animation error is far superior on the single GPU. There’s a possibility that it feels better to a player, but not for the reasons everyone in this community has said for years: It’s not due to frametimes, on a technicality, but animation error, which is a metric that has been under the surface this whole time.

Animation Error Bar Chart

This is an “Error Per Frame” chart we tried, which puts the data back into bars for denser comparison of more cards. Maybe this would work better as a visualization: for those two individual passes in Far Cry from earlier, the total animation error (taken as absolute values) divided by the total number of frames was 0.13ms per frame for the single 1080 Ti and 2.31ms per frame for the SLI 1080 Tis. These are basically single-number summaries for the plots we just showed. But now we’re doing the bad thing again: We’re abstracting away from the base metric over time (shown as frames) and converting it into a bar, because that’s easier to read. This isn’t ideal for a lot of reasons.
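The error-per-frame number described above is just a mean of absolute values. A minimal Python sketch, using a hypothetical function name and made-up sample errors rather than the logged Far Cry data:

```python
def error_per_frame(animation_errors_ms):
    """Mean absolute animation error, in ms per frame."""
    total = sum(abs(e) for e in animation_errors_ms)
    return total / len(animation_errors_ms)


# Hypothetical per-frame errors in ms; positive and negative both count.
errors = [0.5, -0.5, 2.0, -1.0]

print(error_per_frame(errors))  # 1.0
```

Taking absolute values first matters: summing the raw signed errors would let early and late frames cancel each other out.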

First of all, this could obscure individual big error spikes (if that’s something we want to track), and secondly, it also adds framerate back in as a variable. The SLI setup generates more frames, which lowers the end result for this calculation, arguably making the SLI cards look unfairly good; then again, maybe it’s not “unfairly” because more frames going by quicker could help hide animation error.

This is getting complicated, but you can see why we’ve had to think about this for weeks.

Far Cry 5 Animation Error Percent

So we don’t like the prior chart for those reasons, and the scatter plot doesn’t accommodate more than two or three GPUs before it’s illegible. Maybe this will help.

This alternative was suggested by Tom last year. It divides the total animation error absolute values by the total frametimes (the length of the test run) to get a ratio or percentage.

This is equivalent to comparing those average error-per-frame numbers from the last chart to average frametimes.

However, in order to make the chart we’re showing accurate and not misleading, we had to use two times the total frametimes for the calculation. Otherwise, the implication is that 100% is the maximum value, and it wouldn’t be if we hadn’t corrected for that. In the absolute worst-case scenario, and assuming that latency doesn’t accumulate over the course of the test run (which is a different subject entirely), the maximum total animation error would be twice the total frametime.

THIS diagram makes that abundantly clear by showing that by alternating between infinitely small frametimes rounded to zero and 10ms frametimes, and inverting the display times, we can get 10ms or -10ms of animation error per frame, where adding up all frametimes gives us 30ms yet adding up absolute values of the animation error gives us 60ms.

OK, this graphic really isn’t helping make it less confusing. The point is, we already corrected for this in our percent chart and any other reviewers planning to use this type of chart will need to do the same.
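For anyone who prefers numbers to diagrams, the worst case described above can be reproduced in a few lines of Python. This is a sketch under stated assumptions: the function name is hypothetical, the six-frame log is the idealized alternating case rather than real data, and the sign convention (late is negative, early is positive) follows the description earlier in this piece.

```python
def animation_error_percent(animation_errors_ms, frametimes_ms):
    """Total absolute error as a percentage of twice the total frametime."""
    total_error = sum(abs(e) for e in animation_errors_ms)
    return 100.0 * total_error / (2.0 * sum(frametimes_ms))


# Idealized worst case: frametimes alternate between ~0 ms and 10 ms,
# while the display intervals are exactly inverted.
frametimes = [0.0, 10.0, 0.0, 10.0, 0.0, 10.0]
display_times = [10.0, 0.0, 10.0, 0.0, 10.0, 0.0]

# Assumed sign convention: shown too late is negative, too soon is positive.
errors = [f - d for f, d in zip(frametimes, display_times)]

print(sum(frametimes))                              # 30.0
print(sum(abs(e) for e in errors))                  # 60.0
print(animation_error_percent(errors, frametimes))  # 100.0
```

Without the factor of two in the denominator, this same log would come out to 200%, which is the misleading result the correction is meant to avoid.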

We used the same two Far Cry passes for this example: for the single 1080 Ti, the result is 0.7%, and for the SLI setup the result is 17.3%. This potentially cancels out or reverses the high-framerate advantage from the last chart: more frames equals more (total) error.

Real-World Uses

Let’s talk potential real-world use cases. Maybe this will give other reviewers some ideas.

Animation error is theoretically decoupled from framerate and frametime consistency, although in reality poor performance correlates across all of those categories.

In this mockup from Intel, the bar marked “CPU Stutter” marks a frametime spike, while the mismatch between the sizes of the bars in the top and bottom rows is the animation error (F1 versus F1 = error for F2, F2 versus F2 = error for F3), so we would see similar spikes in both metrics.

Animation error is also separate from latency: it’s about how the frames are spaced, not whether they’re all showing up ten seconds late.

Frame generation has interesting implications for animation error, but unfortunately there’s no animation time for fake frames, so there’s no reference point for calculating error. That makes this whole thing even more complicated to quantify.

We’d like to check on NVIDIA MFG in particular, since that comes along with flip metering that shuffles frame timings around at the end of the pipeline, which has the potential to actually induce animation error (as we discussed with the SLI example).

Animation error could shine a light on shortcomings of flip-metered frame generation that are currently masked by existing testing methods, but we won’t know until the software works on it, which may at least partially rely on NVIDIA’s willingness to play ball.

Dragon’s Dogma 2 Animation Error

Because we expect AnimationError to correlate with frametime spikes most of the time, its most common usages for us will be similar to 0.1% lows and frametime plots, but with a more direct representation of how the game feels.

For example, this is a chart that Tom helped us generate during the early stages of troubleshooting Dragon’s Dogma 2’s performance back when it launched. That’s how long we’ve been thinking about this data.

The spikes in both metrics line up exactly, but animation error adds depth by telling us why the stuttering was so noticeable and ugly. The positive and negative animation error dots that correspond to the frametime spike toward 60ms around frame 500 show a potential for 45ms of animation error. That’s potentially even more noticeable than the already noticeable frametime hitch of 58ms.

Borderlands 2 Animation Error

Here’s Borderlands 2. Yes, we know there are newer ones. But this is a better demonstration.

Since animation error effectively cancels out framerate differences, we can use it to compare two completely different pieces of hardware and get more nuance out of it.

These two results are from our piece about the death of 32-bit PhysX. These two particular results were fairly close together both in terms of average FPS and lows: the GTX 980 with GPU PhysX averaged 101 FPS across multiple passes and the RTX 5080 (read our review) with CPU PhysX averaged 95 FPS. These are real results and we explained them in our piece about NVIDIA killing 32-bit PhysX support this year.

In spite of that, there’s a clear difference in behavior, and animation error is much higher with the GTX 980. Using the percentage math we discussed earlier, that’s an error-to-frametime ratio of 11.9% for the 980 and 1.8% for the 5080, which allows us to identify a problem without having to generate frametime plots for every single result.

Percentile Limitations

We probably wouldn’t use a test scene that resulted in a graph like the GDC 2015 one shown above by NVIDIA, but as it points out, there are scenarios like this where frametime spikes could be artificially masked and not show up in 1% and 0.1% low calculations.

NVIDIA is using percentiles here instead, which we don’t use, but the idea is similar.

Assuming these frametime spikes were accompanied by animation error, calculating animation error would do a much better job of summarizing the problem (in this instance). That said, our approach to lows already helps to adjust for some of this, but it still requires knowledgeable testers to know when to look into the 0.1% and 1% low results.

Microstutter & Multi-GPU

Animation error has been associated with microstutter in the past, but it’s not quite the same thing. If anything, animation error is a way to measure microstutter, but not microstutter itself, depending on the definition. Microstuttering was frequently brought up in the context of multi-GPU rendering, so we’ll go back in time and start there by referencing materials from GDC 2015.

This simplified timeline establishes our foundation: each block is a frame, and frame N gets displayed while the computer works on frame N + 1 behind the scenes. CPU work isn’t represented here, but we’ll ignore that for now. The frametimes are perfectly consistent, which is good.

This next slide shows how Alternate Frame Rendering (AFR) multi-GPU operates, or at least operated back when anyone actually supported it. Each of the GPUs takes turns rendering frames, and the output is combined and displayed in order. The blue boxes are still uniform sizes, indicating consistent display times, and they’re smaller than in the previous diagram, indicating a higher framerate.

One of the major difficulties with AFR is trying to synchronize GPUs. Here, GPU0 and GPU1 are taking the same amount of time to complete individual frames, but they’re poorly synced so that some frames get little display time and are effectively wasted.

These are known as “runt frames,” where fractions of frames are shown in a way that elevates the average FPS, but creates an awful experience with bad tearing. The average framerate is higher because more frames were technically shown, but the additional frame is useless and the actual experience can look lower in framerate, which is one definition of microstutter.

Again, we can’t see the CPU stage of the pipeline here, but we’ll assume it’s being completely consistent.

In this example, the rhythm of animation matches the rhythm of display, so there is no animation error. The framerate is stuttery, but all the moving stuff in the frames shows up in the right place at the right time. After each short frametime, objects move a little; after each long frametime, objects move a lot.

You can avoid that microstutter by forcing the pacing into alignment. Ideally this happens by manipulating delays early in the pipeline (on the CPU), in which case you return to a clean result like this one.

However, if you were to meter these frames out at the END of the pipeline, that would directly contribute to animation error (or “animation stutter,” as NVIDIA put it back then). We’ve created and rendered an edited diagram to show what that would look like. If you just take the “short” frames and hold onto them longer before flipping, then there’s a mismatch between the pacing of the frames as they’re displayed versus the animations depicted in those frames. That leads to perceived stuttering and rubberbanding.

Back to our earlier animation, that’s visible when comparing the red and green indicators below the scene, where the imperfect ball stalls and then gets dragged forward in uneven intervals.

We aren’t exploring which method NVIDIA or AMD used to deal with microstutter; that’s a subject for another time, and that time was 12 years ago. Today, we’re just showing a real-world example of a situation where animation error was a risk.

Animated Examples

Andrew on the team made some 3D mockups of simulated animation error. We started by rendering out a scene at 120 frames per second. This could have been any arbitrary number, but picking a high framerate allowed us to downsample and play around with the spare frames.

Here’s what that looks like. The top row of squares represents our 120 FPS source video, and we’ll pretend that our simulated game has an impossibly low latency of zero, meaning that this row represents in-game reality. The bottom row of squares represents frames that we pulled out of the original sequence to create our 60 FPS video. Because we pulled exactly every other frame, we still have smooth playback with zero frametime spikes and zero animation error. Because we display the frames in sync with their original placement, we have zero latency as well.

Since we’re pulling frames every 16.67ms from the source video, our AnimationTimes are always 16.67ms. And since we’re also displaying those frames every 16.67ms, our DisplayedTimes are always 16.67ms. Therefore, AnimationError is always zero.

The green circle represents “reality” as determined by our source video, while the red circle represents what we’re actually seeing. Again, these match perfectly in this control example.

By taking some of the spare frames from the source video, we can create animation error, but we have to be specific. The diagonal lines mean that we’re taking the original frames and displaying them later than “reality,” which introduces latency. Latency itself is not animation error. That’s a different problem.

As we play through the big lump of diagonal lines at the start of this clip, the red circle falls behind the green circle, but the animation of the video stays smooth. Animation error is when the red circle jerks around, skipping to catch up with the green circle at an uneven pace. If you watch the video clip when this happens, you can see the interruptions. This is what we mean when we say animation error is a measurement of jitteriness.

Here’s an extreme example of our point about latency: we’re showing every frame 41.67ms “late,” so the red circle lags behind the green, but the resulting video is identical to the control. AnimationTimes and DisplayedTimes are still perfectly matched 16.67ms intervals every time, so there’s zero animation error.
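The control and the 41.67ms-late case can be checked with a small Python sketch. Everything here is an assumption for illustration: the helper names are hypothetical, timestamps are counted in frames of the 120 FPS source video, and the error is taken as the animation interval minus the display interval so that “too late” comes out negative, matching the sign convention described earlier. It is not PresentMon’s implementation.

```python
SOURCE_FRAME_MS = 1000.0 / 120.0  # one frame of the 120 FPS source video


def intervals(ticks):
    """Gaps between consecutive timestamps."""
    return [later - earlier for earlier, later in zip(ticks, ticks[1:])]


def animation_errors_ms(animation_ticks, display_ticks):
    """Per-frame error in ms. Assumed sign: negative means shown too late."""
    pairs = zip(intervals(animation_ticks), intervals(display_ticks))
    return [(anim - disp) * SOURCE_FRAME_MS for anim, disp in pairs]


# Timestamps counted in source frames (1 tick = 8.33 ms).
animation = [0, 2, 4, 6, 8, 10]    # every other source frame, i.e. 60 FPS
control = [0, 2, 4, 6, 8, 10]      # displayed exactly when animated
late = [t + 5 for t in animation]  # every frame 5 ticks (41.67 ms) late

print(animation_errors_ms(animation, control))  # [0.0, 0.0, 0.0, 0.0, 0.0]
print(animation_errors_ms(animation, late))     # [0.0, 0.0, 0.0, 0.0, 0.0]
```

Because only the intervals are compared, a constant delay drops out entirely, which is the sense in which latency and animation error are separate problems.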

We can do this multiple times during the clip to intensify that feeling of rubber-banding. We’re creating a lot of variation in our simulated AnimationTimes here: if we take two back-to-back frames from the 120 FPS source video, that’s an 8.33ms AnimationTime. If we pull two frames that were spaced four apart in the original video, that’s a 33.3ms AnimationTime. Meanwhile, our DisplayedTimes remain constant, because we’re still showing fresh new frames exactly 16.67ms apart. This is the mismatch in pacing that animation error quantifies.
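To put numbers on that mismatch, here is a minimal Python sketch of one hypothetical sampling pattern. The specific source frame indices are made up, the helper names are illustrative, and the sign convention (negative means shown too late) is an assumption carried over from the description earlier in this piece.

```python
SOURCE_FRAME_MS = 1000.0 / 120.0  # one frame of the 120 FPS source video


def intervals_ms(source_frame_indices):
    """Gaps between consecutive timestamps, converted from frames to ms."""
    pairs = zip(source_frame_indices, source_frame_indices[1:])
    return [(later - earlier) * SOURCE_FRAME_MS for earlier, later in pairs]


# Hypothetical pattern: which source frames we pull, and when we show them.
pulled = [0, 1, 5, 6, 8]   # uneven AnimationTimes
shown = [0, 2, 4, 6, 8]    # even cadence: a new frame every 16.67 ms

animation_times = intervals_ms(pulled)
displayed_times = intervals_ms(shown)

# Assumed sign convention: negative means the frame was shown too late.
errors = [a - d for a, d in zip(animation_times, displayed_times)]

print([round(t, 2) for t in animation_times])  # [8.33, 33.33, 8.33, 16.67]
print([round(t, 2) for t in displayed_times])  # [16.67, 16.67, 16.67, 16.67]
print([round(e, 2) for e in errors])           # [-8.33, 16.67, -8.33, 0.0]
```

A frametime plot of the display side would be a flat line here, yet three of the four frames carry animation error.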

As Tom told us before, our challenge is to “make people understand that you can take frames and show them with an even cadence on display and still have it look like shit.” He has a real way with words. It’s like poetry.

We can also create an inverse example. Think about a flipbook. Here we’ve pulled frames from the source video at even intervals for constant simulated AnimationTimes of 33.33ms, but by displaying those frames at uneven intervals, we still create animation error. This is like drawing a perfect flipbook animation and then failing to flip through the pages at a constant pace. This is a weird theoretical example, because if we assume this is a case where frametimes equal animation times, PresentMon would report a perfectly steady 30 FPS based on MsBetweenAppStart.

That’s not the only way for animation error to manifest, though. In fact, it’s pretty unlikely that you’d naturally encounter perfectly consistent DisplayedTimes with inconsistent AnimationTimes, or vice versa. A more realistic scenario is this one, which simulates CPU-based stuttering, like the diagram Intel shared with us.

Here, rather than showing a unique frame every 16.67ms, we freeze on individual frames. These are DisplayedTime spikes, which usually correlate to FrameTime spikes under the hood.

For each spike, we get two animation errors: after a frozen frame, the next frame is judged to be too late, and the frame after that is judged to be too soon.
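The “one spike, two errors” pattern falls out of the arithmetic. The Python sketch below uses one hypothetical timeline, with made-up timestamps counted in frames of the 120 FPS source video, illustrative helper names, and the assumed sign convention that negative means too late. Real stutters will not be this tidy.

```python
SOURCE_FRAME_MS = 1000.0 / 120.0  # one frame of the 120 FPS source video


def intervals_ms(ticks):
    """Gaps between consecutive timestamps, converted from frames to ms."""
    return [(b - a) * SOURCE_FRAME_MS for a, b in zip(ticks, ticks[1:])]


# Hypothetical timeline. The second frame stays on screen for an extra
# 16.67 ms, so the third frame arrives late; game time has moved on by
# then, so the fourth frame depicts a bigger jump in animation.
animated = [0, 2, 4, 8, 10]   # moment each frame depicts
displayed = [0, 2, 6, 8, 10]  # moment each frame reaches the screen

# Assumed sign convention: negative means shown too late, positive too soon.
errors = [a - d for a, d in zip(intervals_ms(animated), intervals_ms(displayed))]

print([round(e, 2) for e in errors])  # [0.0, -16.67, 16.67, 0.0]
```

The two errors are equal and opposite in this idealized case, which is another reason to sum absolute values rather than raw values when summarizing a run.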

This is closer to what we’ve observed in games, but just like with latency, it’s important to remember that the frametime spikes are not the same thing as animation error: you could theoretically keep freezing on frames while maintaining zero animation error.

To help explain, let’s cover a real-world example.

Capture Demonstrations

For easier discussion, we’re mostly ignoring VSYNC and variable refresh so that the monitor is not a factor in any way.

When we say frames are “displayed,” we mean that a flip has been signalled to the operating system, and without VSYNC, that flip can happen even if the monitor is in the middle of a refresh (leading to tearing).

That pushes numbers logged without VSYNC towards the theoretical realm, but no more so than usual: for example, in our launch review, the RTX 5090 averaged a ridiculous 407 FPS in the Dawntrail benchmark at 1080p. That’s a comparable performance number, independent of whatever monitor we used, and in that context, that’s what we wanted because we want percent scaling between devices.

Separately, higher framerates do correlate with lower latency, so there’s value from that side as well. Today though, we’re also ignoring latency for purposes of focusing discussion.

All of that stated, the simplest solution to truly present animation error in captured footage is with VSYNC to keep away from tearing. Should you ignore the general drops in framerate and deal with the motion of objects that must be easily touring throughout the display screen, you will see them seem to alter pace and bounce round: that is animation error.

It is most noticeable in quick panning pictures with easy monitoring: the digicam must be shifting at a gradual fee even when the framerate drops, however it seems to hitch and rubber-band, particularly when performed again at half pace. Animation error is separate from frametime spikes, however the two issues are steadily related, and so they’re each unhealthy.

We wish to watch out right here, as a result of animation error is uniquely unhealthy when VSYNC is enabled, and this footage is not consultant of the non-VSYNCed take a look at passes that we’re about to debate.

One of many solely methods we will characterize these take a look at passes in fixed-framerate seize is with an FCAT-style overlay, which provides a visible indicator of the place torn frames start and finish. This helps illustrate runt frames as properly, and was utilized in VirtualDub again within the day. That offers us an indicator of every particular person body that we’re discussing with out including the complication of VSYNC, however it does additionally imply that frames could solely present up as a tiny sliver of pixels.

If we play back this footage slowly, you can frequently see the pattern of tearing: a new color shows up at the bottom of the bar in one frame of the capture, then it continues from the top of the bar in the next frame of capture. It’s kind of a mess, and it’s difficult to focus on one sliver of frame at a time, hence using VSYNC for visualization in spite of the downsides. The FCAT functionality would be more useful as part of the PresentMon overlay, which could simultaneously show a live graph of animation error.

Advanced Definitions

Here are the PresentMon CPU metrics.

PresentMon CPU Metrics:

CPUStartTimeInMs: The moment when a new frame is born, expressed as a timestamp (relative to the beginning of the PresentMon session).

AnimationTime: This is PresentMon’s best estimate of the frozen moment in game time that’s depicted by a frame. As a player, you’re always seeing rendered images of a game a few milliseconds after the in-game reality that they depict. AnimationTime is the timestamp for that reality.

According to Tom Petersen last year, “Today people are mostly using CPUStart as the AnimationTime, which is a pretty good proxy, and that’s what we’ll be doing initially. There are explicit APIs, both from NVIDIA and from us [Intel] and others that are allowing game engines or games to tell you that AnimationTime. And so that becomes more available, we’ll be building that into PresentMon.”

Logically, AnimationTime should be the same as the moment the frame was born (CPUStartTimeInMs), but games can pull tricks to smooth animations so that AnimationTime for a frame doesn’t line up with wall-clock time. Under normal circumstances it should be close enough, though.

As an example of an exception, PresentMon can track SimStart events when using Intel XeLL (and soon NVIDIA Reflex) and base AnimationTime on that instead. This is helpful because Reflex and XeLL clear the render queue and keep the CPU sitting around waiting for input until the last possible moment, so there’s a greater potential for differences between CPUStart and the true animation time.

Here we have a PresentMon capture of Cyberpunk 2077. The X-axis shows individual logged frames, and the Y-axis is the delta between CPUStartTimeInMs and AnimationTime for each frame. Normally, this would result in a perfectly flat line at zero, but since we have XeLL enabled, AnimationTime is based on SimStart instead. With XeLL there’s a significant delta between the two values on nearly every frame, which shows that AnimationError would be incorrect if it were based on CPUStartTimeInMs when low-latency modes like XeLL, Reflex, and Anti-Lag 2 are enabled. AMD’s Anti-Lag 2 doesn’t generate SimStart events that PresentMon can capture, so (for now) we won’t be able to accurately score AnimationError with that feature enabled.
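As a rough sketch of what that chart plots, the per-frame delta can be computed from the two columns. The rows below are invented for illustration, the sign convention (AnimationTime minus CPUStartTimeInMs) is our assumption, and we assume both values are in milliseconds on the same clock.

```python
# Hypothetical per-frame rows; the numbers are invented for illustration only.
frames = [
    {"CPUStartTimeInMs": 100.0, "AnimationTime": 100.0},
    {"CPUStartTimeInMs": 116.5, "AnimationTime": 118.0},
    {"CPUStartTimeInMs": 133.0, "AnimationTime": 135.5},
]


def animation_vs_cpu_start(rows):
    """Return AnimationTime minus CPUStartTimeInMs for each frame, in ms.

    The sign convention is an assumption made for this sketch.
    """
    return [row["AnimationTime"] - row["CPUStartTimeInMs"] for row in rows]


deltas = animation_vs_cpu_start(frames)
print(deltas)  # a flat line at zero would mean CPUStart is a perfect proxy
```

Any frame with a non-zero delta is one where substituting CPUStartTimeInMs for AnimationTime would skew the animation error result.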

MsCPUBusy: This period starts at CPUStartTimeInMs and includes steps that Intel labels as Game and Render. “Game” is the time spent handling game logic and calculations for the frame, and “Render” is the time spent converting the results into API calls (DirectX, Vulkan, etc.).

The end of these CPU-specific tasks is marked by the Present() call, which signals to the GPU that it has everything it needs for rendering. Future versions of PresentMon may break this down further because CPU work is complex.

TimeInMs: This is the timestamp of the Present() call we just mentioned. It’s important to remember that this call doesn’t mark the beginning of the GPU rendering step, because the GPU can get a head start before the CPU is finished generating API calls.

Usually the end of the Present() call is the CPUStartTimeInMs of the next frame.

MsBetweenPresents: The delta between this frame’s Present() call (TimeInMs) and the previous frame’s. In the old days, time between Present() calls was used as a (fairly good) approximation for frametimes, but it’s technically a different thing. For that reason, MsBetweenPresents is unusable for per-frame calculations like animation error.

MsInPresentAPI: This is the same as MsCPUWait. This is the interval between the Present() call and the moment when the CPU starts working on a new frame, meaning that there’s nothing blocking further CPU work.

MsBetweenAppStart: PresentMon’s best representation of the literal time taken to create an individual frame start-to-finish (an improvement over MsBetweenPresents). It’s the delta between the CPU starting work on one frame and the next, so the difference between CPUStartTimeInMs for the current frame and CPUStartTimeInMs for the next frame (or MsCPUBusy plus MsCPUWait).
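The two equivalent ways of arriving at MsBetweenAppStart can be sketched as follows. This is a minimal illustration with invented numbers, not PresentMon’s own code; the function names are hypothetical, and only the column names come from the definitions above.

```python
# Hypothetical rows with invented values, chosen so that each frame's
# CPUStartTimeInMs equals the previous start plus MsCPUBusy + MsCPUWait.
frames = [
    {"CPUStartTimeInMs": 0.0, "MsCPUBusy": 6.0, "MsCPUWait": 2.0},
    {"CPUStartTimeInMs": 8.0, "MsCPUBusy": 7.5, "MsCPUWait": 4.5},
    {"CPUStartTimeInMs": 20.0, "MsCPUBusy": 5.0, "MsCPUWait": 3.0},
]


def app_start_deltas(rows):
    """Delta between each frame's CPU start and the next frame's CPU start."""
    starts = [row["CPUStartTimeInMs"] for row in rows]
    return [nxt - cur for cur, nxt in zip(starts, starts[1:])]


def busy_plus_wait(rows):
    """MsCPUBusy plus MsCPUWait for each frame."""
    return [row["MsCPUBusy"] + row["MsCPUWait"] for row in rows]


from_timestamps = app_start_deltas(frames)
from_cpu_time = busy_plus_wait(frames)
print(from_timestamps, from_cpu_time)
```

The timestamp method yields one fewer value than the busy-plus-wait method, because the last logged frame has no following frame to difference against.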

MsBetweenSimulationStart: This column would depend on SimStart events from Reflex or XeLL. In the current version of PresentMon, MsBetweenSimulationStart is “disabled until underlying event support is enabled.”

PresentMon GPU Metrics: We don’t need these numbers in order to calculate AnimationError, but we’ll go over them briefly. They include:

MsGPUTime: The total GPU render period, comprising GPUBusy and GPUWait periods. This was formerly called msGPUActive.

MsGPUBusy: The portion of the render period “during which at least one GPU engine is executing work from the target process.”

MsGPUWait: The portion of the GPU render period where the GPU was idle, possibly due to some codependency on CPU resources.

PresentMon Display Metrics:

MsUntilDisplayed: The time between the Present() call for the frame (TimeInMs) and the time at which the frame is displayed. You can calculate the timestamp at which the frame is displayed by adding these numbers, but it isn’t logged directly. “Displayed” here means that a flip (pointing to a new frame buffer) is signalled to the operating system. This is different from new pixels literally showing up on the physical monitor, although the timing should be very close.
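Since the display timestamp isn’t logged directly, here is a minimal sketch of deriving it by adding the two columns. The rows are invented for illustration and the function name is hypothetical.

```python
# Hypothetical rows with invented values, both columns in milliseconds.
frames = [
    {"TimeInMs": 10.0, "MsUntilDisplayed": 5.5},
    {"TimeInMs": 26.0, "MsUntilDisplayed": 6.0},
    {"TimeInMs": 43.5, "MsUntilDisplayed": 4.5},
]


def displayed_timestamps(rows):
    """Present() timestamp plus time-until-displayed gives the flip timestamp."""
    return [row["TimeInMs"] + row["MsUntilDisplayed"] for row in rows]


displayed = displayed_timestamps(frames)
print(displayed)  # [15.5, 32.0, 48.0]
```

These are flip timestamps as signalled to the operating system, not the moment pixels change on the panel.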

MsBetweenDisplayChange: How long the previous frame was displayed before the current frame started to be displayed. There’s an argument to be made that this reflects the user experience more directly than MsBetweenAppStart, but MsBetweenAppStart is directly tied to performance, so that metric is still better for testing hardware. However, since display happens at the end of the pipeline, MsBetweenDisplayChange is the only way to include post-processing stuff like generated frames and RTX 5000 frame metering in results (if you want that).

Combined Metrics:

MsGPULatency: The interval between the absolute start of work on the frame (CPUStartTimeInMs) and the point at which the GPU started working on it. The start of the GPU render period can be inferred from this.

MsRenderPresentLatency: This is the interval from the Present() call at the end of CPU rendering to the end of GPU rendering. This is equal to MsUntilDisplayed unless VSYNC is enabled.

MsAnimationError: Here’s how Intel represents the animation error formula for frame N:

(AnimationTime[N] - AnimationTime[N-1]) - MsBetweenDisplayChange[N]

Again, the result of the formula can be positive or negative, but further away from zero is always worse. Adding together all the positive and negative animation errors for a logging period will typically cancel out, so to get a useful total we need to take absolute values.
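Here is a minimal sketch of the formula and of why absolute values matter when totaling. The numbers are invented, the function name is hypothetical, and this is our illustration of the formula above rather than PresentMon’s implementation.

```python
def animation_errors(animation_times, display_changes):
    """Per-frame animation error in ms, starting from the second frame.

    error[N] = (AnimationTime[N] - AnimationTime[N-1]) - MsBetweenDisplayChange[N]
    Frame 0 is skipped because it has no previous frame.
    """
    errors = []
    for n in range(1, len(animation_times)):
        animation_step = animation_times[n] - animation_times[n - 1]
        errors.append(animation_step - display_changes[n])
    return errors


# Invented values: AnimationTime timestamps and MsBetweenDisplayChange per frame.
animation_times = [0.0, 16.0, 32.0, 56.0, 64.0]
display_changes = [0.0, 16.0, 20.0, 16.0, 12.0]  # index 0 is unused

errors = animation_errors(animation_times, display_changes)
signed_total = sum(errors)                   # positives and negatives cancel
absolute_total = sum(abs(e) for e in errors)  # total size of the mismatch
mean_absolute = absolute_total / len(errors)

print(errors)          # [0.0, -4.0, 8.0, -4.0]
print(signed_total)    # 0.0
print(absolute_total)  # 16.0
print(mean_absolute)   # 4.0
```

In this made-up run the signed sum is exactly zero even though three of the four frames were mispaced, which is why the signed total is not a usable score.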

In the words of Tom Petersen, “The animation step is basically equal to the frametime, mostly. There’s some cases where it’s a little different. But what you need that for is to be correlated with the DisplayedTime step. Because if the DisplayedTime step is different from the animation time step, you’ll get a simulation time error, which is measuring stutter directly for the first time.”

We’ve established that AnimationTime is the in-game point in time that a given frame depicts. The delta between animation times for consecutive frames is the amount of in-game time that has passed between them.

We’ve also established that MsBetweenDisplayChange is the time that a frame is displayed before the next one shows up.

If a long time passes between taking snapshots of the game state, a long time should pass between displaying the snapshots. Even when the AnimationTimes are spiky and uneven, the DisplayedTimes should be matched exactly, or else you get AnimationError.
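To make that concrete, here is a small sketch with invented numbers comparing two hypothetical cases: spiky animation steps that the display matches exactly, and the same spiky steps shown at a perfectly even display cadence. The helper name is ours.

```python
def errors_from_steps(animation_steps, display_intervals):
    """Animation step minus display interval for each frame, in ms."""
    return [a - d for a, d in zip(animation_steps, display_intervals)]


# In-game time between consecutive snapshots: uneven on purpose.
spiky_steps = [10.0, 30.0, 10.0, 30.0]

# Case 1: each snapshot stays on screen exactly as long as its animation step.
matched = errors_from_steps(spiky_steps, [10.0, 30.0, 10.0, 30.0])

# Case 2: the display cadence is perfectly even, but the snapshots are not.
mismatched = errors_from_steps(spiky_steps, [20.0, 20.0, 20.0, 20.0])

print(matched)     # [0.0, 0.0, 0.0, 0.0]
print(mismatched)  # [-10.0, 10.0, -10.0, 10.0]
```

The first case has frametime spikes but no animation error. The second would look flat on a display-interval plot while still carrying 10 ms of error on every frame.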

Conclusion

This doesn’t replace existing testing or run instead of it. It’s another tool, similar to frametime charts, to help better understand what’s happening in a game. There are also a lot of theoretical situations here, so it’s not always practical.

1% and 0.1% lows as bars on a chart took on a life of their own over the last decade. They’re still the fastest for “glanceability,” and we’re glad we introduced them to our charts now 11 years ago, and we’re going to continue to use them. But it’s time to try to find new metrics, and we hope animation error can complement the 1% and 0.1% average bar representations of frametime pacing as another way to determine why a game just feels bad sometimes.

But we don’t want to oversell what this number actually means. In Tom Petersen’s words: this is a “how jittery am I” metric. Our work here on and off over the past couple years, and more seriously over the past month, has been trying to prove the concept and find a way to put it on a chart that makes sense to anybody.

We devoted a lot of time to explaining how and why it’s different from the numbers that we already measure, but in practice and in general, we expect it to complement those numbers, not contradict them.

That also means we aren’t necessarily expecting any big upsets versus what we’ve already concluded in existing reviews.

That said, this is a helpful new tool that can do a few other things for us: it can more easily show us when we need to make a frametime plot, it can show us when stuttering happens independently from frametime spikes (although that may be unlikely), it can (kind of) normalize for frametimes in a way that makes comparisons between different hardware easier, and it can deal with up-and-down frametime trends during test passes effectively.

Most importantly, animation error forces us to think about why we measure the things we do. We’re now closer to discussing why stuttering feels bad, not just the fact that it exists.

We’re still experimenting with ways to make it useful.

If you want to try it out for yourself, PresentMon is free and open-source, and it now has a GUI version available as well. If you’re a reviewer and you find this useful in developing your own methods, we’d appreciate you pointing back here.

We’ve been using PresentMon for years, and actually, most people who’ve tested game performance have. They just often don’t know it.

PresentMon is wrapped by half a dozen other tools that reskin it or use it in some capacity, and it’s an open source project with contributions from around the hardware community.

We use the command line version, but there’s also a user interface tool that you can see in our video where we introduced Jay to it previously. They sometimes have different features.

Because of that, a quick security warning first: don’t go to PresentMon dot com or download anything from that site. PresentMon is hosted on GitHub and Intel.com and is an open source utility. Usually when someone pretends PresentMon is their own project and reskins it with an interface, they at least come up with a new name. For security purposes, we’d advise only downloading PresentMon itself from GitHub or the Intel site.

Testing animation error like this is exciting because we can finally directly score stuttering instead of merely deducing it. Big picture, this is similar to when PresentMon moved away from MsBetweenPresents: the numbers and conclusions may not change much, but the measurements are closer to what we’re really talking about.

This isn’t the endgame of benchmarking. Hopefully there won’t be one, because that’d be boring. There’s a lot more to learn, and this is exploratory: it’s just us putting research out to the internet to experiment with. It’s also up to the vendors to play ball with open source tools like PresentMon. Experiment with the new ideas and point back to us if you find our work helpful as a foundation, and credit to Tom Petersen for opening up the tools to measure these metrics.