Hi all. In my last post, I showed that the AMD Ryzen 9 3950x is quite a good processor for fighting diseases like Cancer, Alzheimer’s, and COVID-19. Folding@Home, the distributed computing project helping researchers understand various diseases, definitely makes good use of the 16 cores / 32 threads on the 3950x.
In this article, I’m taking a look at how virtualized CPU cores (Simultaneous Multithreading in AMD speak or Hyperthreading for you Intel fans) helps computational performance and efficiency when running Folding@Home on a high-end CPU such as the Ryzen 9 3950x.
Instead of regurgitating all of the previous information, here are some links to bring you up to speed if you haven’t read the previous posts.
AMD Ryzen 9 3950x Review: Part 1 (Overview)
AMD Ryzen 9 3950X Review: Part 2 (Average Results vs. # of Threads)
For this test, I used the same settings as in Part 2, except that I disabled SMT in the BIOS on my motherboard. Thus, Windows 10 will only see the 16 physical CPU cores, and will not be able to run two logical threads per CPU core. As before, I ran all testing using Folding@Home’s V7 client. I set the CPU slot configuration for a thread value of 1-16. At each setting, I ran five work units and averaged the results. Note that AMD’s core performance boost was turned off for all tests, so at all times the processor ran at 3.5 GHz.
As expected, as you throw more CPU cores at a problem, the computer can chew through the math faster. Thus, more science gets done in a given amount of time. In the case of Folding@Home, this performance is rated in terms of Points Per Day (PPD). The following plot shows the increase in computational performance as a function of # of threads utilized by the solver. Unlike in my previous testing on the 3950x, here an increase of 1 thread corresponds to an increase of 1 engaged CPU core, since virtual threads (SMT / Hyperthreading) are disabled.
The plot below includes the individual samples at each data point as light gray dots, as well as a + / – 2 sigma (95%) confidence interval. This means that 95% of the results for a given thread setting are statistically predicted to fall within the dashed lines.
As a side note, certain settings of thread count actually result in the exact same performance, because the Folding@Home client is internally using a different number than the specified value. For example, setting the CPU slot to 5 threads will still result in a 4-thread solve, because the solver is avoiding the numerical issues that occur when trying to stitch the solution together with 5 threads (5 is a tricky prime number to work with numerically). I noted these regions on the plot. If you would like more detail about this, please read the previous part of this review (part 2).
One interesting observation is that the maximum performance occurs with 15 CPU cores enabled, not the complete 16! This is somewhat similar to what was observed in Part 2 of this review (SMT enabled), where 30 threads provided slightly more points than 32 threads. More on that in a moment…
Using my P3 Kill A Watt Power Meter, I measured the power consumption of the entire computer at the wall. As expected, as you increase the number of CPUs engaged, the instantaneous power consumption goes up. The power numbers reported here are averaged by “the eyeball method”, since the actual instantaneous power goes up and down by a few watts as the computer does its thing. I’d estimate that these numbers are accurate within 5 watts.
The ultimate goal of this blog is to find the most efficient settings for computer hardware, so that we can do the most scientific research for a given amount of power consumption. Thus, this next plot is just performance (in PPD) divided by power consumption (in watts). I left off all the work unit variation and confidence interval lines, since it looks about the same as the performance plot, and it’s cleaner with just the one average line.
As with performance, setting Folding@Home to use 15 CPUs instead of the full 16 is surprisingly the best option for efficiency. The difference is pretty profound here, as the processor used more power at 16 threads than at 15 threads while producing less points at 16 threads than at 15.
Comparison to Hyperthreaded Results
To get a better idea of what’s going on, here are the same three plots again with the average results overlaid on the previous results from when SMT was enabled. Of course the SMT results go up to 32 threads, since with virtual cores enabled, the 16-core Ryzen 9 3950x can support 32 total threads.
Disabling SMT (aka Hyperthreading) essentially limits the Ryzen 9 3950x to a maximum thread count of 16 (one thread per physical core). The results from 1-16 threads are very similar to those results obtained with SMT enabled. Due to work unit variation, the performance and efficiency plots show what I would say is effectively the same result with SMT on vs. off, up to 16 threads. One thing to note was that the power consumption in the 12-16 thread range did trend higher for the SMT off case, although the offset was small (about 5-10 watts). This is likely due to Windows scheduling work to a new physical core to handle the higher thread count when SMT is disabled, as opposed to virtualizing the work onto an already-running core using SMT. Ultimately, this slightly higher power consumption didn’t have a noticeable effect on the efficiency plot.
The big takeaway is that for thread counts above 16 (the physical core count), the Ryzen 9 3950x can utilize thread virtualization very well. The logical processors that Windows sees don’t work quite as well as true physical cores (hence the decrease in slope on the performance and efficiency plots above 16 CPUs). However, when the thread count is doubled, SMT still does allow the processor to eek out an extra 100K PPD (about 33% more) and run more efficiently than when it is limited to scheduling work to physical CPUs.
Pro Tip #1: Turn on Hyperthreading / SMT and run with high core counts to get the most out of Folding@Home!
The final observation worth noting is that in both cases, setting the F@H client to use the maximum available number of threads (16 for SMT off, 32 for SMT on) is not the fastest or most efficient setting. Backing the physical core count down to 15 (and, similarly, the SMT core count down to 30) results in the fastest and most efficient solver performance.
My theory is that by leaving one physical core free (one physical core = 2 threads with SMT on), the computer has enough spare capacity to run all the crap that Windows 10 does in the background. Thus, there is less competition for CPU resources, and everything just works better. The computer is also easier to use for other tasks when you don’t fully max out the CPU core count. This is also especially valuable for those people also trying to fold on a GPU while CPU folding (more on that in the next article).
Pro Tip #2: For high core count CPUs, don’t fold at 100% of your processor’s core capacity. Go right to the limit, and then back it off by a core.
Since you’re using SMT / Hyperthreading due to Pro Tip #1, this means setting the CPUs box in the client to 2 less than the maximum allowed. On my 16-core, 32-thread Ryzen 9 3950x, this means CPUs = 32 (theoretical max) – 2 (2 threads per core) = 30
This result will be different on CPUs with different numbers of cores, so YMMV…I always recommend testing out your individual processor. For lower core count processors such as Intel’s quad core Q6600, running with the maximum number of cores offers the best performance. I previously showed this here.
In the next article, I’m going to kick off folding on the GPU, an Nvidia GeForce 1650, which I previously tested by its lonesome here. In a CPU + GPU folding configuration, it’s important to make sure the CPU has enough resources free to “feed” the GPU, or else points will suffer.
I’ve also started re-running the thread tests with Core Performance Boost enabled. This allows the processor to scale up in frequency automatically based on the power and thermal headroom. This should significantly change the character of the SMT On and SMT Off plots, since everything up till now has been run at the stock speed of 3.5 GHz.
Support My Blog (please!)
If you are interested in measuring the power consumption of your own computer (or any device), please consider purchasing a P3 Kill A Watt Power Meter from Amazon. You’ll be surprised what a $35 investment in a watt meter can tell you about your home’s power usage, and if you make a few changes based on what you learn you will save money every year! Using this link won’t cost you anything extra, but will provide me with a small percentage of the sale to support the site hosting fees of GreenFolding@Home.
If you enjoyed this article, perhaps you are in the market for an AMD Ryzen 9 3950x or similar Ryzen processor. If so, please consider using one of the links below to buy one from Amazon. Thanks for reading!