M3 Architecture Research Group

Computer Systems Laboratory
361 Frank H.T. Rhodes Hall
Ithaca, NY 14853 USA
m3 at csl.cornell.edu

Reconfigurable and Self-optimizing Hardware

Chip multiprocessors (CMPs) have emerged as attractive alternatives to complex monolithic superscalar cores. Although a general consensus exists on the power, performance and complexity advantages of CMP architectures, the way CMPs will meet the intrinsically diverse requirements of the software that will run on them is much less clear.

In the short term, on-chip integration of a modest number of cores may yield high utilization when running multiple sequential applications. In that case, sequential programs will still favor relatively large cores that can extract high levels of instruction-level parallelism (ILP). However, although sequential codes are likely to remain important, they alone are not sufficient to sustain long-term performance scalability. Consequently, harnessing the full potential of CMPs in the long term also requires adopting parallel programming to build future applications. Overall, the conflicting demands of this software diversity, compounded by the need to support multiple such applications in a multiprogrammed environment, require a level of flexibility that is hard to come by today in the research literature, much less in the market.

In [ISCA '07] we investigate a novel reconfigurable hardware mechanism that we call core fusion. It is an architectural technique that empowers groups of relatively simple and fundamentally independent CMP cores with the ability to "fuse" into one large CPU on demand. The goal is to "synthesize" dynamically the right CMP architecture based on software needs at each point in time. We envision a core fusion CMP as a homogeneous substrate with conventional memory coherence/consistency support, where groups of up to four adjacent cores and their i- and d-caches can be fused at run-time into CPUs that have up to four times the fetch, issue, and commit width, and up to four times the i-cache, d-cache, branch predictor, and BTB size.
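The resource scaling described above can be illustrated with a small sketch. It only models the stated upper bound (a fused CPU aggregating the widths and cache capacities of up to four cores); the CoreConfig type, the fuse function, and all per-core numbers are hypothetical placeholders chosen for illustration, not figures from the paper, and the sketch says nothing about how fusion is implemented in hardware.

```python
from dataclasses import dataclass

# The text states that groups of up to four adjacent cores can be fused.
MAX_FUSION_GROUP = 4


@dataclass(frozen=True)
class CoreConfig:
    # All fields are illustrative; the values used below are assumptions.
    fetch_width: int
    issue_width: int
    commit_width: int
    icache_kb: int
    dcache_kb: int


def fuse(base: CoreConfig, n_cores: int) -> CoreConfig:
    """Return the upper-bound resources of n_cores identical cores fused together."""
    if not 1 <= n_cores <= MAX_FUSION_GROUP:
        raise ValueError("a fusion group contains between 1 and 4 cores")
    return CoreConfig(
        fetch_width=base.fetch_width * n_cores,
        issue_width=base.issue_width * n_cores,
        commit_width=base.commit_width * n_cores,
        icache_kb=base.icache_kb * n_cores,
        dcache_kb=base.dcache_kb * n_cores,
    )


# Hypothetical 2-wide core with 16 KB instruction and data caches.
base_core = CoreConfig(fetch_width=2, issue_width=2, commit_width=2,
                       icache_kb=16, dcache_kb=16)
fused_cpu = fuse(base_core, 4)
```

Under these assumed numbers, fusing four cores gives a CPU with four times each per-core resource, which is the "up to four times" bound mentioned in the text.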

In [ISCA '08] we propose the use of machine learning in designing a self-optimizing, adaptive memory controller capable of planning, learning, and continuously adapting to changing workload demands. We formulate memory access scheduling as a reinforcement learning problem. Reinforcement learning is a field of machine learning that studies how autonomous agents situated in stochastic environments can learn optimal control policies through interaction with their environment. Ultimately, a reinforcement learning design approach allows the hardware designer to focus on what variables to monitor and what performance to optimize, rather than devising a policy that describes how it should be done.
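To make the reinforcement learning formulation concrete, here is a minimal software sketch of an agent that learns which command to issue from observed rewards. It uses generic tabular Q-learning, which is an assumption made for illustration and not necessarily the algorithm or hardware organization used in the paper. The QScheduler class, the ACTIONS set, and the state labels are all hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical action set for illustration only.
ACTIONS = ("precharge", "activate", "read", "write", "noop")


class QScheduler:
    """Generic tabular Q-learning agent (a sketch, not the paper's design)."""

    def __init__(self, alpha=0.1, gamma=0.95, epsilon=0.05, seed=0):
        self.q = defaultdict(float)  # maps (state, action) to estimated value
        self.alpha = alpha           # learning rate
        self.gamma = gamma           # discount factor
        self.epsilon = epsilon       # exploration probability
        self.rng = random.Random(seed)

    def choose(self, state, legal_actions):
        """Pick a random legal action with probability epsilon, else the best known one."""
        legal = list(legal_actions)
        if self.rng.random() < self.epsilon:
            return self.rng.choice(legal)
        return max(legal, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state, next_legal):
        """Move Q(state, action) toward reward + gamma * max over next actions."""
        best_next = max((self.q[(next_state, a)] for a in next_legal), default=0.0)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


# One learning step: issuing "read" in state "s0" earned a reward of 1.0.
agent = QScheduler(alpha=0.5, gamma=0.0, epsilon=0.0)
agent.update("s0", "read", 1.0, "s1", ACTIONS)
chosen = agent.choose("s0", ACTIONS)
```

The designer's input here is limited to the state encoding (what to monitor) and the reward signal (what to optimize); the mapping from states to commands is learned rather than hand-written, which mirrors the point made in the paragraph above.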

Support

This work is supported in part by NSF awards CNS-0509404 and CNS-0720773, an IBM Faculty Award, and equipment donated by Intel.