This story first appeared on ShroutResearch.com.
Last week the gang at AnandTech posted a story uncovering systematic cheating by Huawei in smartphone benchmarks. In its story, AT focused on 3DMark and GFXBench, looking at how the Chinese-based silicon and phone provider was artificially increasing benchmark scores to gain an advantage in its battles with other smartphone providers and SoC vendors like Qualcomm.
As a result of that testing, UL Benchmarks (who acquired Futuremark) delisted several Huawei smartphones from 3DMark, taking the artificial scores down from the leaderboards. This puts the existing device reviews in question while also casting a cloud over the recently announced (and impressive-sounding) Kirin 980 SoC meant to battle with the Snapdragon 845 and the next-gen Qualcomm product. The Kirin 980 will be the first shipping processor to integrate high-performance Arm Cortex-A76 cores, so the need to cheat on performance claims is questionable.
Just a day after this story broke, UL and Huawei released a joint statement that is, quite honestly, laughable.
"In the discussion, Huawei explained that its smartphones use an artificial intelligent resource scheduling mechanism. Because different scenarios have different resource needs, the latest Huawei handsets leverage innovative technologies such as artificial intelligence to optimize resource allocation in a way so that the hardware can demonstrate its capabilities to the fullest extent, while fulfilling user demands across all scenarios."
To assert that any kind of AI processing happening on Huawei devices is responsible for the performance differences that AnandTech measured is at best naïve and at worst outright lying. This criticism is aimed at both Huawei and UL Benchmarks – I would have assumed that a company with as much experience in performance evaluation as UL would not succumb to this kind of messaging.
After that AT story was posted, I started talking with the team that builds Geekbench, one of the most widely used and respected benchmarks for processors on mobile devices and PCs. It provides a valuable resource of comparative performance and leaderboards. As it turns out, Huawei devices are exhibiting the same cheating behavior in this benchmark.
Below I have compiled results from Geekbench that were run by developer John Poole on a Huawei P20 Pro device powered by the Kirin 970 SoC. (Private app results, public app results.) To be clear: the public version is the application package as downloaded from the Google Play Store while the private version is a custom build he created to test against this behavior. It uses absolutely identical workloads and only renames the package and does basic string replacement in the application.
Clearly the Huawei P20 Pro is increasing performance on the public version of the Geekbench test and not on the private version, despite using identical workloads on both. In the single-threaded tests, the total score is 6.5% lower, with the largest outlier being the memory performance sub-score, where the true result is 14.3% slower than the inaccurate public version result. Raw integer performance drops by 3.7% and floating-point performance falls by 5.6%.
The multi-threaded score differences are much more substantial. Floating-point performance drops by 26% in the private version of Geekbench, a significant hit that would no doubt affect the phone's placement in the leaderboards and in reviews of flagship Android smartphones.
Overall, the performance of the Huawei P20 Pro is 6.5% slower in single-threaded testing and 16.7% slower in multi-threaded testing when the artificial score inflation built into the Huawei customized OS is removed. Despite claims that an AI system is somehow recognizing specific user scenarios and improving performance, this is another data point showing that Huawei was hoping to pull one over on the media and consumers with invalid performance comparisons.
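For readers who want to check the arithmetic on their own results, here is a minimal sketch of how a "percent slower" figure like the ones above can be derived, assuming the public (inflated) score is used as the baseline. The function name and the example scores are made up for illustration; they are not the measured P20 Pro numbers.

```python
def percent_slower(public_score: float, private_score: float) -> float:
    """Percentage by which the private-build score falls below the public-build score.

    Assumption: the public (Play Store) score is the baseline, matching the
    article's phrasing "slower than the public version result".
    """
    if public_score <= 0:
        raise ValueError("public_score must be positive")
    return (public_score - private_score) / public_score * 100.0


# Hypothetical placeholder scores, NOT real Geekbench results.
example_public = 1000.0
example_private = 900.0
print(f"{percent_slower(example_public, example_private):.1f}% slower")
```

Note that using the private score as the baseline instead would give a larger percentage for the same pair of results, so it is worth stating which baseline a comparison uses.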
Some have asked me why this issue matters; if the hardware is clearly capable of performance like this, why should Huawei and HiSilicon not be able to present it that way? The higher performance results that 3DMark, GFXBench, and now Geekbench show are not indicative of the performance consumers get with their devices on real applications. The entire goal of benchmarks and reviews is to try to convey the experience a buyer would get for a smartphone, or anything else for that matter.
If Huawei wanted one of its devices to offer this level of performance in games and other applications, it could do so, but at the expense of other traits. Skin temperature, battery life, and device lifespan could all be impacted – something that would definitely affect the reviews and reception of a smartphone. Hence, the practice of cheating in an attempt to have the best of both.
The sad part about all of this is that Huawei’s flagship smartphones have been exceptional in nearly every way. Design, screen quality, camera integration, features; the Mate and P-series devices have been excellent representations of what an Android device can be. Unfortunately, for enthusiasts that follow the market, this situation will follow the company and cloud some of those positives.
Today’s data shows that the story of Huawei and benchmarks goes beyond just 3DMark and GFXBench. We will be watching this closely to see how Huawei responds and if any kinds of updates to existing hardware are distributed. And, as the release of Kirin 980 devices nears, you can be sure that testing and evaluation of these will get a more scrutinizing eye than ever.
If Kirin 980 is as powerful as they say, and with every reviewer and every benchmark being much more careful with the scores, the whole debacle may be beneficial for Mate 20.
The only thing powerful about the 980 is the CPU. GPU is trash which is why they added GPU turbo.
I can’t muster any outrage over this. Simply put, 1-click synthetic benchmark apps are a sham for lazy reviewers and dick measuring “X brand” fanboys. Back in the day we used rulers for that kind of size measuring; synthetic benchmark apps are just the evolution of the ruler. Now how good is it at fucking? If we are going to continue in the same analogy. Josh would approve this 😉
I’ve never taken that kind of benchmark into any consideration when buying a new smartphone. So let’s talk about how this affects real world apps and battery life and then we can revisit the outrage-o-meter.
Ryan, going to have to disagree here. How is this cheating when it actually extends to real gaming performance? I could understand if the only thing inflated was benchmark scores, but PUBG, for example, is benefiting from the GPU Turbo technology.
Do you really think the opportunistic/fake turbo GPU technology is a benefit when it comes down to toasting your smartphone and shortening the life cycle of the battery?
Oh yeah you are so right. I overclocked my desktop processor and it shortened the life span. /sarcasm
I’m afraid you can’t replace the cooling system for your junkphone as easily as for your desktop computer in order to overclock the processor.
I for one am sick of the phone upgrade race anyways. The entire concept that every 2 years we need to consume a new phone with no end in sight until the end of time. Fighting simply to be the bestest thing for the sheep to buy the bestest, it’s getting boring and sad.
The entire phone benchmarking process is rigged and lacks the standardized scientific methods necessary to remove all unknown variables.
Also of more difficulty are the dedicated DSP/AI and other specialized processors that the phone makers make use of, and that’s becoming more common. That, and all these semi-custom ARM designs are somewhat different than the ARM Holdings Reference Designs, with the fully custom ARM cores not using any of ARM Holdings’ IP other than being engineered to execute the ARMv8-A/newer ISA. The Android based OSs on the phones are also customized by the device’s OEM, so there are differences there also.
I’d rather see the SOC tested outside of the phone on some testing mule to properly see the SOC’s full capability, then tested again in the specific phone SKU just to see how much the OEM has compromised things in order to allow for more battery life and other such limitations of their specific SKU. Everyone should know the Phone OEMs’ MO by now; the same holds true for the laptop OEMs.
The best way to know about any CPU/Platform is a complete breakdown of the CPU cores’ specific design and the design of any DSP/AI/Other processor/accelerator IP at the hardware level. And that Big:Middle:Little design of the Kirin 980 is different and new from HiSilicon(Huawei)/ARM Holdings(Softbank), as is that DynamIQ CPU cluster arrangement. DynamIQ is from ARM Holdings and not HiSilicon, so that’s just more unknown variables that can be used to game the benchmarks.
Huawei Kirin Cheating, Intel AnTuTu cheating, and all the others that are cheating and have cheated in the past! And it’s the fools that buy new phones before their current phones wear out that are more to blame for creating a market that attracts the cheating in the first place.
With all the Android OSs that are semi-customized to such a degree by the OEMs, it means that these types of things need to be called out more often, but this type of news is nothing new for the smartphone market that has been cheating for years.
“Chinese-based silicon” and ARM Holdings is owned by a Japanese Holding company Softbank with worldwide operations and investors. So what’s the difference Samsung, Intel, whatever the Chips maker/s the OEM Phone industry is corrupt as is the entire technology industry.
Huawei Technologies a “Collective” my A$$ they are an authoritarian Company more like any Company really with profit as the main motivator.
PCPer your posting system is being screwed up by your ad partners again! Lousy scripting trying to scan the post content for directed advertising! They are screwing up any text highlighting Copy/Paste functionality currently!
Geekbench is a respected name in testing? Are you kidding? I didn’t see any sarcasm tag, maybe it got eaten by the HTML parsing. Geekbench has long been known to be next to useless and to have significant platform bias.
And, similar to what CK said, one click benchmarking is about as close to a random number generator as it gets and only encourages this kind of cheating.