Due to the relative high cost of accessing the main memory, modern processors make use of a data and an instruction cache in an attempt to lower the latency involved in accessing the main memory.
A processor may choose to keep in its cache the values of frequently accessed memory addresses, or perhaps prefetch the values of adjacent memory addresses for possible future usage.
When the processor attempts to access a memory address, it first check to see whether it's already exist in one of its caches (L1, L2, L3 etc.). If the address isn't found, then we have a "cache miss" and the processor we'll have to perform a round-trip to the main memory in order to obtain the required data. Wasting valuable CPU cycles on the way.
Not too long ago, one of the most popular guidelines in designing cache-aware data structures was to split them into "hot" and "cold" parts. This kind of separation makes sense since it could lead to a more efficient utilization of the cache. Frequently used fields (the hot part) are grouped together in the same cache lines, instead of being mixed with infrequently used fields (the cold part). This way, the cache is able to contain more hot data, and cache lines aren't wasted to hold cold data.
For those of you who are interested, the subject is covered in greater detail in the article Splitting Data Objects to Increase Cache Utilization by Michael Franz and Thomas Kistler.
Red: Hot data, Green: Cold data
While following this guideline could lead to a more efficient cache utilization, it won't necessarily lead to a performance improvement in a multithreaded environment. In fact, it's more likely that you'll face a performance degradation than gaining any performance improvement at all.
The problem in hand is that highly utilized caches could easily send your valuable CPU cycles down the drain, due to the effects of false sharing. Once a single cache line holds several frequently written-to fields, the chance it will get invalidated by another processor gets greater. Sharing a single cache line with multiple frequently modified fields could easily have a negative effect on your cache's locality.
In multithreaded environments, one could benefit from sparsely allocating frequently used fields in the cache. There's an obvious trade-off between cache utilization (memory usage) and locality (performance). While the cache will contain less hot data (which could result in more round-trips to the main memory), it would benefit from better locality, hence better scalability across multiple processors that might attempt to modify the same object simultaneously.
When designing a cache-aware data structure, you don't necessarily have to order your fields in such way that a hot field will always get followed by a cold field. Instead, "artificial" padding could be used to fill the excessive space left in the cache line which holds the structure. In .Net, decorating the type with an StructLayout(LayoutKind.Explicit) attribute and assigning each field with its appropriate offset would do the job.