Co-authored by Mario Tomás Serrafero & Steven Zimmerman. Thank you to John Poole from Primate Labs for his assistance.
A few years ago there was a considerable uproar when numerous major manufacturers were caught cheating on benchmarks. OEMs of all sizes (including Samsung, HTC, Sony, and LG) took part in this arms race of attempting to fool users without getting caught. After a bit of a public berating (and some private conversations) from technology publications, industry leaders, and the general public, most manufacturers got the message that benchmark cheating was simply not acceptable, and stopped as a result. Most of the few that didn’t stop at that point stopped soon after, as substantial changes were made to how many benchmarks run in an attempt to discourage benchmark cheating (by reducing the benefit from it). Many benchmarks were made longer so that the thermal throttling from maximizing clock speeds would become readily apparent. Others went further, and started building in ways to detect and prevent benchmark cheating directly (as Geekbench did). With the combination of public outcry and reduced effectiveness, benchmark cheating largely disappeared from the mainstream. Unfortunately, it appears that some OEMs still haven’t gotten the message, as we found evidence that OnePlus and Meizu are still trying to cheat on benchmarks.
Back in 2013, it was discovered that Samsung was artificially boosting its GPU clock speeds in certain applications, sparking a series of investigations into benchmark cheating across the whole range of manufacturers. At the time, the investigation found that almost every manufacturer except for Google/Motorola was engaging in benchmark cheating. They were all investing time and money into attempts to eke a little bit of extra performance out of their phones in benchmarks, in ways that wouldn’t have any positive effect on everyday usage, in an attempt to fool users into thinking that their phones were faster than they actually were.
It came as a bit of a shock to much of the Android enthusiast community at the time. While many people had suspected that OEMs were trying to game benchmarks, there hadn’t been much in the way of evidence up until that point. With that hard evidence in hand, users started demanding change (especially once they realized that the efforts to cheat on benchmarks often came at the cost of making the experience worse in normal usage). They didn’t want to be lied to, and they certainly didn’t want their user experience to be made worse (through higher temperatures and worse battery life) just so that OEMs could get higher benchmark scores.
In response to these investigations, UL (creators of 3DMark and PCMark) delisted numerous devices from their scoreboards, and benchmark developers across the industry (from Geekbench to GFXBench) started looking for ways to prevent benchmark cheating from affecting their scores. Eventually, with the combination of bad press and benchmarks that actively fought cheating, most OEMs stopped.
While testing out the OnePlus 3T’s OxygenOS Nougat update, we noticed some oddities with the CPU behavior. While the update did bring the phone up to Android 7.0, it also brought along some performance changes, including a more aggressive policy for ramping up the CPU in certain situations. The CPU appeared to be overly aggressive in how it ramped up the clocks for the little cores (sometimes jumping up to 100% usage across all four little cores for no readily apparent reason), while the big cores were sitting mostly idle. When we dug a little deeper, we found something far more interesting.
We brought a OnePlus 3T to Primate Labs’ office in Toronto for some initial analysis. The initial testing included a ROM dump which found that the OnePlus 3T was directly looking for quite a few apps by name. Most notably, the OnePlus 3T was looking for Geekbench, AnTuTu, Androbench, Quadrant, Vellamo, and GFXBench. Since by this point we had fairly clear evidence that OnePlus was engaging in benchmark cheating, Primate Labs built a “Bob’s Mini Golf Putt” version of Geekbench 4 for us. Thanks to the substantial changes between Geekbench 3 and 4, the “Mini Golf” version had to be rebuilt from the ground up specifically for this testing. This “Mini Golf” version of Geekbench 4 works exactly the same as the regular version; however, since it has a different name, the phone doesn’t know that it is a benchmark, and as a result does not activate its benchmark cheating code.
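To make the mechanism concrete, here is a minimal sketch of what name-based targeting amounts to, and why a renamed build slips past it. This is not the code found in the ROM dump: the package names, the policy labels, and the `select_cpu_policy` function are all hypothetical placeholders chosen for illustration.

```python
# Hypothetical illustration of name-based benchmark targeting.
# None of these identifiers come from the OnePlus 3T ROM; they are placeholders.
BOOSTED_PACKAGES = frozenset({
    "com.example.geekbench",
    "com.example.antutu",
    "com.example.androbench",
    "com.example.quadrant",
    "com.example.vellamo",
    "com.example.gfxbench",
})


def select_cpu_policy(foreground_package: str) -> str:
    """Pick a CPU policy purely from the foreground app's package name."""
    if foreground_package in BOOSTED_PACKAGES:
        return "aggressive"  # ramp clocks harder than normal
    return "default"         # everything else gets regular behavior


# The workload is identical in both cases; only the name differs.
regular = select_cpu_policy("com.example.geekbench")
disguised = select_cpu_policy("com.example.bobsminigolfputt")
print(regular, disguised)  # prints: aggressive default
```

Because a check like this never looks at what the app is doing, a build with a different name is treated like any other app, which is what makes a disguised benchmark a useful control.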
The results were not pretty. As we saw during our initial investigation, the little cores were ramping up far more aggressively than they normally would. As you can see in the chart below, when the hidden version of Geekbench 4 was running, the little cluster only hit its max frequency 29.7% of the time (and often sat at its minimum frequency of ~300MHz), but when the normal version was running, it hit its max frequency 86.8% of the time (and almost never sat at its minimum frequency). That means that the little cluster was running at full speed almost three times as often while running the normal version of Geekbench. The big cluster also saw a substantial increase in usage, going from 0.8% of the time at its max frequency to 4.4% of the time (over five times as often). That is a massive difference, and it results in a substantial score increase (going from 1972 / 4723 to 2041 / 4910 in this particular set of runs), especially when you remember that the OnePlus 3T is just targeting the application name. It isn’t doing anything clever and it isn’t trying to detect certain types of workloads; it is just looking to see if you are running Geekbench.
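The comparisons in the paragraph above are simple ratios, and they can be re-derived from the reported figures. The short sketch below uses only the numbers quoted in this article; the helper names `ratio` and `percent_gain` are our own.

```python
def ratio(normal: float, hidden: float) -> float:
    """How many times as often (or as high) the normal run is versus the hidden run."""
    return normal / hidden


def percent_gain(normal: float, hidden: float) -> float:
    """Relative increase of the normal run over the hidden run, in percent."""
    return (normal - hidden) / hidden * 100


# Share of time at max frequency (percent), hidden build vs. normal build.
little_ratio = ratio(86.8, 29.7)
big_ratio = ratio(4.4, 0.8)

# Geekbench 4 scores (single-core / multi-core), hidden build vs. normal build.
single_gain = percent_gain(2041, 1972)
multi_gain = percent_gain(4910, 4723)

print(f"little cluster at max: {little_ratio:.2f}x as often")  # 2.92x
print(f"big cluster at max:    {big_ratio:.2f}x as often")     # 5.50x
print(f"single-core score:     +{single_gain:.2f}%")           # +3.50%
print(f"multi-core score:      +{multi_gain:.2f}%")            # +3.96%
```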
This is particularly interesting, as it seems like a very targeted approach. As you may know, the big cores in the Snapdragon 821 that the OnePlus 3T uses are substantially more powerful than the little cores (with one big core being about as powerful as all four little cores combined in Geekbench). Yet, the benchmark cheating seems to have a bigger effect on the little cores. While the big cores are seeing a larger relative increase in usage at max frequency, they spend so little time there to begin with that it has a much smaller effect overall on the score. The benchmark cheating code in the OnePlus 3T gets a large boost out of the little cores (jumping from ~30% of the time at max frequency to ~87% of the time), which results in a substantial score increase. It makes one wonder if this was an attempt to fly under the radar, as most people looking for benchmark cheating code would be expecting the big cores to be targeted.
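For readers who want to collect time-at-max-frequency figures of this kind themselves, one common source on Linux-based devices is the kernel's cpufreq statistics file, `time_in_state`, which lists each frequency alongside the cumulative time spent at it. The sketch below is an assumption about how such a measurement could be done, not a description of the tooling used for this article, and the sample snapshots are invented values for demonstration only.

```python
def parse_time_in_state(text: str) -> dict:
    """Parse '<frequency_khz> <ticks>' lines into a {frequency: ticks} mapping."""
    table = {}
    for line in text.strip().splitlines():
        freq, ticks = line.split()
        table[int(freq)] = int(ticks)
    return table


def share_at_max(before: str, after: str) -> float:
    """Fraction of the interval between two snapshots spent at the top frequency."""
    start = parse_time_in_state(before)
    end = parse_time_in_state(after)
    # The counters are cumulative, so subtract the earlier snapshot.
    deltas = {freq: end[freq] - start.get(freq, 0) for freq in end}
    total = sum(deltas.values())
    if total == 0:
        return 0.0
    return deltas[max(deltas)] / total


# Invented snapshots taken before and after a run (not measured data).
snapshot_before = "300000 1000\n1600000 200\n"
snapshot_after = "300000 1700\n1600000 500\n"

print(f"{share_at_max(snapshot_before, snapshot_after):.1%}")  # prints: 30.0%
```

Taking one pair of snapshots around the regular benchmark and another around the disguised build would give two shares that can be compared directly.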
After seeing these results with the OnePlus 3T, we decided to test some other devices. We had previously covered the Meizu Pro 6’s interesting benchmark cheating method in our interview with Geekbench’s John Poole, and we were curious to see if it was still active. We reached out to Meizu to see if they had a Pro 6 available for us to test, and they were kind enough to provide one. Unfortunately, despite being aware that they were cheating on benchmarks, Meizu doesn’t appear to have removed their benchmark cheating code. The Meizu Pro 6 still appears to be disabling cores while not in a benchmark, resulting in a substantial score decrease when tested with a hidden version of Geekbench (with the score dropping from 1594 / 5092 in the normal version of Geekbench to just 822 / 3168 in our hidden version). While there is an argument that could be made for disabling cores to save power and reduce heat (as Meizu argued when they were originally caught), this should be done in a way that doesn’t mislead users. At the very least, it should be made clear to users what is going on, and ideally there should be a toggle available in settings to disable it.
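Core disabling of this kind can, in principle, be observed from the outside. On Linux-based devices the kernel exposes the set of currently online CPUs as a range list (for example `0-3,8`) in `/sys/devices/system/cpu/online`. The sketch below only parses that format; it is an assumption about how one might check, not the method used in our testing, and the sample strings are illustrative rather than readings from a Meizu Pro 6.

```python
def count_online_cpus(online: str) -> int:
    """Count CPUs in a kernel-style range list such as '0-3,8'."""
    count = 0
    for part in online.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            count += int(high) - int(low) + 1  # ranges are inclusive
        else:
            count += 1
    return count


# Illustrative strings only, not captured from a real device.
print(count_online_cpus("0-9"))    # prints: 10
print(count_online_cpus("0-3,8"))  # prints: 5
```

Comparing the count while the regular benchmark runs against the count while a disguised build runs would show whether cores are being held offline for ordinary apps.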
Following our testing, we reached out to OnePlus about the issues we found. In response, OnePlus swiftly promised to stop targeting benchmarking apps with this mechanism, but still intends to keep it for games (which also get benchmarked). In a future build of OxygenOS, this mechanism will not be triggered by benchmarks. OnePlus has been receptive to our suggestion to add a toggle as well, so that users know what is going on under the hood, and at the very least the unfair and misleading advantage in benchmarks should be corrected.
We also reached out to Meizu for comment regarding their continued benchmark cheating; however, we have not received a response as of the time of publishing.
We are pleased to hear that OnePlus will be removing the benchmark cheating from their phones. Going forward, we will continue to pressure OEMs to be more consumer-friendly.
What do you think about benchmark cheating? Let us know in the comments below!