A distributed variable-size cache placement architecture includes plural cache storage units (CSUs), each of which includes a CSU control logic, an address director, a data director, a placement array, placement logic (i.e., distribution controller), and a set associative memory for caching data. All CSUs in the architecture are connected over a communication network to a single processor interface and mainstore and which provides information to all CSUs about the status of each of the other CSUs. Any number of CSUs may be connected in parallel to provide a variable size cache. All CSUs sharing a processor interface utilize the same placement block size, contain the same number of sets of cache elements and use the same CSU placement mechanism.
The cache temporarily stores the data of the mainstorage element and tracks the data mainstore address. The cache handles all references to a mainstore address while the associated data is present in the cache.
A data reference, or search, which finds the information present in the cache is referred to as a cache HIT. When a data reference fails to find the data present in the cache the access is a cache MISS.
When a cache MISS occurs, the desired cache line from mainstorage is placed into the cache so that future references to the mainstorage element will result in a cache HIT. A cache MISS displaces any stale data present in the cache. The cache element used to store the data from mainstore is determined by the placement algorithm.
Known caching systems use a central placement algorithm. This arrangement is used in discrete cache implementations as well as in implementations which are integrated with processors. Some examples of known prior art cache implementations include cache tag devices, cache data devices and integrated cache devices.
The placement algorithm consists of a stored state for each set of elements in the cache and control logic for changing the state of this stored information. Prior art uses a central storage mechanism to track the placement element for each set in the SET ASSOCIATIVE cache memory, giving a fixed and limited number of elements for each set.
An example of an integrated cache device is found on the NEC uPD43608 8K byte cache device. This device supports programmable selection of interface signals, data width and up to three additional cache devices in the system. The internal organization includes 128 sets having four elements per set and a 16 byte cache block size. Another form of the integrated cache device is used in the Intel 82395 cache device, which is intended for use with the Intel 80386 integrated circuit. The Intel 82395 device has a special configuration option which allows the user to select a total of one, two or four devices in a system. The number of sets is increased by a factor equal to the number of devices present in the system, however, no more than four devices may be used in any one system. All system configurations have a four-way set associative memory.
Another form of the integrated cache is found in a chip set which utilizes the Intel 82495 cache controller and multiple 82490 cache data devices. The 82495 and 82490 combination offers a number of different configurations, however, they are all intended to work with the Intel 80486 processor and include an interface which is optimized for that processor. Use of the 82495 and 82490 caches are as secondary caches, as the 80486 circuit has an internal cache integrally formed therein. When arrayed with the 80486 processor, the 82495 and 82490 results in a two-way set associative memory.
While the aforementioned architectures are useful for their intended purpose, they do not have sufficient flexibility to allow distribution of data among a number of cache locations. Also increasing the size of the aforementioned architectures will have a major impact on the design of the circuit board.
The subject of this patent is a new mechanism for implementing the placement algorithm of mainstorage elements into a cache storage. This implementation allows the placement algorithm to be distributed across any number of devices. Additionally, only a subset of the total elements in each set need be present in the system for correct system operation.
An object of the invention is to provide a distributed cache placement architecture which allows the caching of information in a variety of locations.
Another object of the invention is to provide a single device type which may be operated in parallel to provide a cache storage sub-system.
A further object of the invention is to provide a cache placement architecture which supports a variable cache size.
Another object of the invention is to provide a single cache device type which may be configured in different caching densities and which may still be incorporated into a single system.
Yet another object of the invention is to provide a cache architecture which may be hardware-updated as new storage devices become available.
Another object of the invention is to provide a cache architecture which includes a mechanism for minimizing the delay path between a processor address and the cache during valid read data operations.
SUMMARY OF THE INVENTION
The cache placement architecture of the invention implements a cache placement algorithm and includes a cache storage unit (CSU) which includes a CSU control logic, an address director, a data director, a placement array, placement logic, a set associative memory for caching data, and a connection between individual cache subsystem components. Any number of CSUs may be connected in parallel to provide a variable size cache. The actual organization and features of the cache placement architectures are determined by the cache implementor, also referred to herein as the circuit designer. A CSU may be a single device, or may include multiple devices. All CSUs sharing a processor interface utilize the same placement block size, contain the same number of sets of cache elements and use the same CSU placement mechanism. The CSU(s) may be in a single device or may be located in multiple devices.
The following advantages result from this approach:
1. A single device type may be used to build an entire cache.
2. Cache size is variable based on the number of devices used.
3. Different device densities may be used in the same system.
4. Cache organization and features are independent of distributed placement.
All cache devices attached to a common processor interface utilize the same placement block size, contain the same number of SETs of cache elements, and use the same placement mechanism to determine which device is the placement device.
The use of distributed cache placement may be applied to all types of caches, several examples of which are described herein. Distributed placement provides an extension to caching not previously available. Distributed placement involves the tracking of the placement across an upgradable cache, normally involving multiple devices.
A system using a single device type called a CSU (Cache Storage Unit) is described as a device for implementing this distributed placement. The implementation of a single CSU may be used to build a cache subsystem having multiple configuration options.
These and other objects and advantages of the invention will become more fully apparent as the description which follows is read in connection with the drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of a distributed cache placement architecture constructed according to the invention.
FIG. 2 is a variation of the basic architecture of FIG. 1.
FIG. 3 is a further variation of the architecture of FIG. 1.
FIG. 4 depicts a mainstorage address mapping scheme.
FIG. 5 depicts a cache arrangement for the mapping scheme of FIG. 4.
FIG. 6 is a block diagram of a cache subsystem organization (CSU), constructed according to FIG. 2.
FIG. 7 is a block diagram depicting CSU organization having a HITCSU bus associated therewith.
FIG. 8 depicts the architecture of an element.
FIG. 9 depicts the architecture of the HIT logic of an element.
FIG. 10 is a block diagram of a CSU organization constructed according to FIG. 3.
FIG. 11 is a block diagram of an alternate embodiment of the CSU of FIG. 3, having a secondary TAG architecture
FIG. 12 is a block diagram of a MESI state machine.
FIG. 13 is a power-on flow chart.
FIG. 14 is a CSU operation flow chart.
FIG. 15 is a representative block diagram showing the system interconnection of a CSU implemented to support multiple CSUs.
FIG. 16 is a block diagram of a CSU internal state.
FIG. 17 is a schematic representative block diagram showing the system interconnection of a CSU implemented to support a maximum of 4 CSUs.
FIG. 18 is a block diagram depicting a processor having an integrated CSU.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
A cache subsystem is depicted in three different configurations in FIGS. 1, 2 and 3. The subsystems each contain a cache which is connected to a processor and a mainstore. Each configuration provides the same caching function. The difference is in the manner in which a cache is attached to the processor and mainstore. The attachment, or interface, consists of a control bus, an address bus, and a data bus.
The first configuration, shown in FIG. 1, depicts a system, or cache architecture 20 having a processor 22, a mainstore 24, a cache 26, and a processor interface 28. In this embodiment, cache 26 and mainstore 24 share a processor control line 30, a processor address line 32, and a processor data line 34. Cache 26 and mainstore 24 receive the address and data at the same time. However, mainstore operations are started only after a cache MISS has been detected.
The second configuration, shown in FIG. 2, depicts a cache architecture 40 in which cache 26 buffers a processor interface 42 from the mainstore and which includes a cache-mainstore interface 44, which is also referred to herein as a mainstore interface. In this configuration, cache 26 handles all the mainstore accesses and reads data from mainstore into the cache whenever the cache does not currently contain the information. Processor interface 42 includes a processor control bus 46, a processor address bus 48 and a processor data bus 50. Mainstore interface 44 includes a mainstore control bus 52, a mainstore address bus 54 and a mainstore data bus 56.
The third configuration, shown in FIG. 3, depicts a cache architecture 60 having an operand cache 26a for data and an instruction cache 26b for instructions. These caches may be an integrated structure, or may be constructed as a distinct architecture. Processor 22 is connected to operand cache 26a by an operand interface 62 and to instruction cache 26b by an instruction interface 64. A mainstore interface 66 connects caches 26a and 26b to mainstore 24. Operand interface 62 includes an operand control bus 68, an operand address bus 70 and an operand data bus 72. Instruction interface 64 includes an instruction control bus 74, and instruction address bus 76 and an instruction data bus 78. Mainstore interface 66 includes a mainstore control bus 80, a mainstore address bus 82 and a mainstore data bus 84.
All three configurations use the same basic SET ASSOCIATIVE memory structure. The exact interface or interfaces may be selected from a wide variety of options, with the three configurations shown in FIGS. 1, 2 and 3 as examples. Additional features may be added to the structure shown in these figures. These features enhance the performance of the cache and include: write-back cache operation, sectored TAG organization, addition of a secondary TAG for cache snooping, and cache block prefetch. A second interface, as shown in FIGS. 2 and 3, may be added to further enhance cache operations and speed.
ADDRESS MAPPING
FIG. 4 depicts the mapping of mainstore 24 addresses into cache 26. In the preferred embodiment, mainstore 24 is one contiguous memory unit, starting at address `0` and ending at address (2.sup.32 -1). The processor address which is sent to cache 26 is divided into three fields, which will be more fully described later herein. For now, the fields are the Set Address field (SA), the TAG Address field (TA) and the Block Address field (BA). Mainstore 24 has 2.sup.TA GROUPs, such as GROUP 86. Each mainstore GROUP consists of 2.sup.SA cache lines, with one cache line from each GROUP mapping into its one associated cache set.
Referring now to FIG. 5, cache 26 has SA SETs 88, each having `E` elements 90. `E` represents the number of places a cache line from mainstore may be placed in the cache. Each cache line from a mainstore GROUP may be placed into any one of the `E` elements, such as element 90, of its associated cache set. LINE X in FIG. 4 depicts an example of mapping from a line from a mainstore group mapping into cache SET X. The offset from the base of the GROUP and the base of the CACHE is always the same.
The limits of a SET ASSOCIATIVE cache may be illustrated by setting either E or 2.sup.SA to 1. If E=1, a direct mapped cache is defined wherein each location in mainstore 24 may be placed only in the one element of its SET in the cache. In a fully associative cache, 2.sup.sA =1 and any block in mainstore 24 may be placed in any element of the single SET in the cache.
CACHE SUBSYSTEM
Cache 26, as shown in FIG. 6, is made up of `N` Cache Storage Unit (CSU) devices, such as CSU 1, 92, and CSU N, 94, where `N` is the maximum number of CSUs available to cache 26 as determined by the CSU implementor. All devices operate in parallel from an interface 28, having, in this embodiment, which corresponds to architecture 20 in FIG. 1, a shared set of PCTL 30 (Processor ConTroL), PA 32 (Processor Address), and PD 34 (Processor Data) lines. As used herein, "line" and "bus" are equivalent structures. A HITCSU line 102 extends between the CSUs and is a structure unique to the distributed cache placement function architecture of the invention, and comprises a portion of what is referred to herein as a communications network.
PCTL signals processor cycle information to the cache and receives information back from the cache. Information signaled to the cache may include; start cycle, cycle type, processor status, cachable cycle, & block transfer requested. Information received back from the cache may include: cycle status, cycle completion, cycle termination, and block transfer status. Other information may be conveyed, depending on the processor used.
The HITCSU lines convey information between devices indicating potential distributed placement order transitions. These lines also serve to communicate the configuration of the system between the devices at system initialization.
CSU ORGANIZATION--FIRST EMBODIMENT
FIG. 7 depicts a CSU, such as CSU 1, block 92, organization having a CONTROL LOGIC 104, a SET ASSOCIATIVE MEMORY 106, a PLACEMENT ARRAY 108, and a PLACEMENT LOGIC 110.
CONTROL LOGIC
CONTROL LOGIC 104 controls the interface between processor 22, mainstore 24, and CSU 92. The CONTROL LOGIC tracks the state of this interface, interpreting cache accesses and providing cache responses to any cached request from the interface. Internal cache state is also tracked and coordinated by this logic, in cooperation with PLACEMENT LOGIC 110 for accessing one ELEMENT out of the `E` elements in SET ASSOCIATIVE memory 106, where `E` can be any number greater than `0`.
TABLE 1
______________________________________
CACHE OPERATIONS
Cache Cycle
______________________________________
Processor Memory Read
Processor Memory Write
Location Invalidation
______________________________________
Table 1 lists the minimum set of operations supported by a WRITE-THRU cache. The processor may support non-cachable memory operations. Processor 22 signals cache 26 whenever an operation is non-cachable. These non-cachable operations are directed to mainstore 24 without a cache reference.
Mainstore 24 may also provide an interface to I/O devices for processor 22. Whenever processor 22 requests an I/O operation, CONTROL LOGIC 104 directs the access immediately to the mainstore interface.
The mainstore interface may be shared with other processors and DMA (Direct Memory Access) I/O devices. A mainstore write by a source other than the processor to which the cache is attached is snooped to determine if the mainstore location is currently present in the cache. If the location is present in the cache, the cache block containing the location must either be invalidated or updated. The common implementation solution is to invalidate the cache block, as this represents the least complex implementation.
PA--PROCESSOR ADDRESS
PA 32 is divided into the three fields. These three fields are the SA, TA, and BA fields and are transmitted over the SA line 112, the TA line 114, and the BA line 116. The SA Set Address field is decoded to select a SET of cache blocks in the CSU. The TA field is the TAG Address. The BA Block Address field selects the desired data within the cache block
PD--PROCESSOR DATA
PD line 34 provides a bi-directional data path between processor 22 and cache 26. Processor read data is sourced by the cache, during a cache HIT, or by mainstore during a cache MISS. During a cache MISS operation the data coming from mainstore is directed to both the processor and the cache. This data is written to the cache location during the MISS as a result of a placement operation. Processor writes occur to a cache block with a HIT indication. PD for write operations are always directed to both the cache and mainstorage
SET ASSOCIATIVE MEMORY
SET ASSOCIATIVE MEMORY 106 contains at least one element for each SET. CSUs with different numbers of elements can be mixed on the same processor interface. As technology advances, it is foreseeable that device density for CSUs having a greater number of elements will be built which are 100% backward compatible with initial CSUs which may contain as little as one-element-per-SET.
For any processor interface, the total number of elements per SET is the sum of all the elements in all the CSUs attached to this processor interface. Every element in every CSU operates in
TABLE 2
______________________________________
SET ASSOCIATIVE MEMORY ADDRESSING
ADDRESS
FIELD RANGE NAME FUNCTION
______________________________________
BA (3:0) BYTE This field selects data within the
ADDRESS cache line down to the byte level.
SA (13:4) SET This field selects a SET of cache
ADDRESS elements which may contain the
mapped cache line from any
mainstore GROUP. When this field
is used in mainstorage it selects
the cache line associated with
this set from the GROUP selected
by the TA field.
TA (31:14) TAG This field stores the value of the
ADDRESS mainstore GROUP currently
resident in this cache block. When
this field is used by mainstorage it
selects the GROUP in mainstorage.
______________________________________
parallel simultaneously checking for a HIT on any element in the addressed SET. Changing the number of elements per SET is completely transparent to the system, other than a change in the overall cache HIT ratio.
SET ASSOCIATIVE memory 106 stores data by mapping a set of data locations from mainstorage 24 into the cache 26 as shown in FIG. 6. SET ASSOCIATIVE 106 memory is made up of ELEMENT `1` 126 thru ELEMENT `E` 128. Each element contains 2.sup.SA cache lines. A single address selected by the SA field 112 constitutes a single SET in the SET ASSOCIATIVE MEMORY. All ELEMENTs are accessed in parallel looking for a HIT indication for the selected SET. The cache examples shown use a SET ASSOCIATIVE mapping. The SET ASSOCIATIVE memory cache divides the processor address into the three fields shown in Table 2. Use of the address bits is fixed in the hardware to map mainstore addresses into the cache.
Also shown in Table 2 is an example assignment of these address fields for a processor with 32 address bits. The cache line size is 16 bytes resulting in the BA field being assigned 4 bits (3:0). The number of sets is 1024 requiring 10 SA address bits (13:4). While the remaining 18 bits (31:14) are used in the TA field
Still referring to FIG. 7, a HITCSU interface, or line, 102, extends from PLACEMENT LOGIC 110 and is used to communicate with other CSU devices about the interdevice placement tracking. Specifically, the HITCSU line or bus communicates to all other CSUs in the architecture so that each CSU is notified about all events occurring in the other CSUs.
PLACEMENT LOGIC
PLACEMENT LOGIC 110 provides an interface to other CSUs connected to this processor interface and controls the contents of PLACEMENT ARRAY 108. The interface to the other CSUs for this processor interface consists of the HITCSU indication lines 102 interconnecting the devices. PLACEMENT LOGIC 110 is also referred to herein as a distribution controller.
PLACEMENT LOGIC 110 action is dependent on the results of the cache access. The cache access has one of the results and takes the associated action shown in Table III.
The update of PLACEMENT ARRAY 108 always takes into account the steady state active HITCSU indications from every other CSU attached to this processor interface. The state entered by
TABLE 3
______________________________________
CACHE ACCESS PLACEMENT ACTIONS
CACHE
ACCESS RESULTS
PLACEMENT LOGIC ACTION
______________________________________
HIT in this CSU with
Assert HITCSU 102 and update both the
every entry valid.
CSU placement and entry placement
portions of the PLACEMENT ARRAY 108.
HIT this CSU with at
Update the entry placement but not the
least one entry
CSU placement portions of the
invalid. PLACEMENT ARRAY 108 without asserting
HITCSU 102.
MISS this CSU with a
Update both the CSU placement portion
HIT indication from
but not the entry portion of the
another CSU.
PLACEMENT ARRAY 108.
MISS in every CSU
Monitor mainstore access by placement
when this CSU is not
CSU. When placement CSU finally gives
the placement CSU.
the delayed HIT indication 104, update
the CSU placement portion of the
PLACEMENT ARRAY 108.
MISS in every CSU
The placement entry is selected by the
when this CSU the
internal PLC`X` line, where `X` is an
placement CSU and
entry between 1 & `N`. The CSU
every entry is
CONTROL LOGIC also receives the PLC`X`
valid. indication and does a placement access
to mainstore.
Simultaneously with the assertion of
the HIT indication by the CSU CONTROL
LOGIC 104 the HITCSU line 102 is
asserted. Update then occurs to both
the CSU placement and entry placement
portions of the PLACEMENT ARRAY 108.
MISS in every CSU
The placement entry is selected by the
when this CSU the
internal PLC`X` line, where `X` is an
placement CSU and at
entry between 1 & `N`. The CSU
least one entry is
CONTROL LOGIC also receives the PLC`X`
invalid. indication and does a placement access
to mainstore.
Update then occurs to the entry
placement but not the CSU placement
portion of the PLACEMENT ARRAY without
asserting HITCSU.
______________________________________
PLACEMENT ARRAY 108 when this CSU has a HIT is the state that would have been entered after receiving subsequent HITCSU indications from those other CSUs with HITCSU always in an asserted state. This feature allows use of less than the maximum number of CSUs on a processor interface.
Invalidation operations, whether from the processor or mainstore, do not alter the placement order. All HITCSU indications are ignored by every CSU attached to a processor interface during an Invalidation operation.
PLACEMENT ARRAY
For each SET ASSOCIATIVE MEMORY 106, PLACEMENT ARRAY 108 stores information indicating the placement CSU as well as the placement element within this CSU. The SA address bits select placement data for this SET from the PLACEMENT ARRAY. Data stored here indicates:
1. The CSU placement portion contains information on the position in the placement algorithm of this CSU relative to every other CSU on this processor interface. When this CSU has the lowest position according to the placement algorithm it is the placement CSU.
2. The element placement portion contains information regarding the placement order of every element in this CSU. The data stored here is determined by the placement algorithm chosen, which may be a different algorithm from the external CSU placement algorithm. This field is not used when the CSU contains a single element.
A SET OF ELEMENTS
Referring now to FIGS. 7 and 8, the contents of each element in the SET ASSOCIATIVE MEMORY will be discussed. SET ASSOCIATIVE MEMORY 106 includes one or more elements, or element sets 1 through E, depicted at 126, 128, respectively. Each ELEMENT contains 2.sup.SA cache lines and consists of a TAG ARRAY 130, its associated DATA ARRAY 132, HIT LOGIC 134, and ELEMENT CONTROL LOGIC 136.
As previously noted, the PA is divided into three fields, shown in Table II, for addressing an element. The SA field selects the SET that is being currently accessed by the processor. When the SET has been selected by the SA field, the TA field bits are compared to the respective TD data 138 stored in that TAG location, and if every TD bit matches its respective TA bit, and the VLD (VaLiD) bit 140 is also set, a cache HIT has occurred. The BA 116 combined with the SA 112 field select a location in the data storage and when a HIT 142 occurs, a data transfer is performed on DATA bus 116.
ELEMENT CONTROL LOGIC
CONTROL LOGIC 104 provides an indication to LOGIC 136 (FIG. 8) of the type of operation being performed. The ELEMENT CONTROL LOGIC 136 in each ELEMENT 126, 128, provides an indication to CONTROL LOGIC 104 and PLACEMENT LOGIC 110 as to whether or not a HIT occurred in this CSU. Receiving a HIT 142 indication from an element causes CONTROL LOGIC 104 to send a completion indication to the requesting interface. When a HIT does not occur within this CSU (92 . . . 94), the PLACEMENT LOGIC gives an indication to the CONTROL LOGIC on the PE Placement Element 146 line to the CONTROL LOGIC and each ELEMENT if this is the placement CSU and which ELEMENT of the SET is to be replaced. When the CONTROL LOGIC has finished the PLACEMENT operation it indicates this completion on the PCTL to the processor and the PLACED line 160 to the PLACEMENT LOGIC 110.
ELEMENT CONTROL LOGIC 136 receives signals from CONTROL LOGIC 114 indicating the type of operation to perform (see FIG. 7) on ETCL line 144. When a MISS has occurred, selection of this element as the placement element is indicated by the PLACEMENT LOGIC 110 on PLC line 146. When the ELEMENT LOGIC receives a HIT indication signal from the HIT LOGIC on HIT line 142, the desired data operation is performed on DATA ARRAY 132 or on TAG ARRAY 130, as shown in Table 4.
TABLE 4
______________________________________
ELEMENT DETAILS DIAGRAM
______________________________________
Memory Read
Drive DATA ARRAY 132 contents on PD bus.
Memory Write
Data from PD written to DATA ARRAY 132.
Location Write VLD bit 150 in TAG ARRAY to a `0`.
Invalidate
Place Line Write VLD bit 150 in TAG ARRAY to a `1`,
write line from mainstore into data array, &
write TA field from address into the TAG DATA
152 array.
______________________________________
An element placement occurs by updating both TAG ARRAY 130 and DATA ARRAY 132. Multiple data transfers normally occur when placing a cache placement block in DATA ARRAY 132. Only a single write update occurs to the TAG ARRAY 130. The TAG ARRAY update always sets VLD bit 140 during a placement operation.
The update of TAG ARRAY 130 requires a write indication at the same time the TA address field is gated to TD 138. This is shown in HIT LOGIC 134 (FIG. 9) and is controlled by the assertion of a write tag (WT) signal on WT control line 148 from ELEMENT CONTROL LOGIC 136.
TAG ARRAY
TAG ARRAY 130 contains the mainstore TAG (Target Address) and state information about a particular cache block. Each element in the TAG ARRAY covers one cache block. FIG. 8 depicts a VLD bit in the TAG state information 150 that is the minimum required for standard cache operation. The TAG DATA 152 stores upper order address bits from the TA field as data.
The state information for the cache block may be extended to include additional information about a cache block. This extended information may be used in support of additional cache features.
HIT LOGIC
Referring now to FIG. 9, HIT LOGIC 134 determines when to signal a HIT for this element of this SET to PLACEMENT LOGIC 110 and ELEMENT CONTROL LOGIC 136.
A HIT is defined to occur on the ELEMENT whenever the selected SET of TD bits from TAG ARRAY 130 matches the TA address bits from processor 22, and the VLD bit is set. FIG. 9 depicts the condition of HIT LOGIC 134 where the results of the bit-wise compare, block 154, of each pair of TA 114 and TAG DATA 138 bits are ANDed, block 156, with the VLD bit 140 set to generate a HIT indication, which is transmitted on HIT line 142.
A signal on WT control line 148 is asserted by ELEMENT CONTROL LOGIC 136 upon a TAG ARRAY write. The data written into TAG ARRAY 130 is the TA field from the address. Buffer 158 provides a path during a cache placement operation for such data.
DATA ARRAY
Referring now to FIG. 8, DATA ARRAY 132 contains blocks of memory which are copied from mainstore 24. Processor memory references, both reads and writes, are handled by DATA ARRAY 132 during all cache HIT references which occur in its associated TAG ARRAY.
PLACEMENT LOGIC
Returning now to FIG. 7, PLACEMENT LOGIC 110 controls the contents of PLACEMENT ARRAY 108. PLACEMENT LOGIC action is dependent on the results of the cache access. The cache access has one of the results, and takes the associated action, shown in Table 5. The placement algorithm is implemented here. The placement algorithm may be one of many known alternatives including, but not limited to LRU (Least Recently Used), FIFO (First In First Out), or B-tree (Binary tree structure). Successful placement of a data line results in the transmission of a PLACED signal on PLACED line 160 from PLACEMENT LOGIC 110 to CONTROL LOGIC 104.
PLACEMENT ARRAY
For each associative memory SET, PLACEMENT ARRAY 108 stores information indicating the placement element within this CSU. The SA address bits select placement data for this SET from
TABLE 5
______________________________________
PLACEMENT LOGIC ACTIONS
ACCESS RESULTS
PLACEMENT LOGIC ACTION
______________________________________
HIT Assert HIT and update the PLACEMENT ARRAY.
MISS - all The placement element is selected based on
elements the state of PLACEMENT ARRAY the by the
valid. internal PLC`X` line, where `X` is an
element between 1 & `E`. The CONTROL LOGIC
receives the PLC`X` indication and does a
placement access to mainstore.
The PLACEMENT ARRAY is updated to indicate a
HIT just occurred to the new mainstore
location.
MISS - any An invalid element is selected by the
element internal PLC`X` line, where `X` is an
invalid. element between 1 & `E`. The CONTROL LOGIC
receives the PLC`X` indication and does a
placement access to mainstore.
The PLACEMENT ARRAY is updated to indicate a
HIT just occurred to the new mainstore
location.
______________________________________
the PLACEMENT ARRAY. It should be appreciated that separate CSU devices, such as the TAG and data devices, may be provided in the architecture rather than the unitary structure which is described in the preferred embodiment.
CSU ORGANIZATION--SECOND EMBODIMENT
Architecture 40 in FIG. 2 has the organization shown in FIG. 10. This configuration includes CONTROL LOGIC 170, SET ASSOCIATIVE MEMORY 172, PLACEMENT array 174, and placement logic 176. These devices may have the same structure and configuration as those like named devices shown in FIG. 7. Processor Interface 42 is connected to CONTROL LOGIC 170 by PCTL line 46, while the second interface, or mainstore interface 44 is connected to control logic by MCTL (Mainstore ConTroL) line 52. This configuration isolates the processor references to the cache from the mainstore. In this configuration, the CSU includes two additional components, the first of which is an address director 182, which is connected to the address buses of the interfaces by a PA line 48 and a MA (Mainstore Address) line 54. The second additional component is a data director 188 which is connected to the interfaces by a PD line 50 and a MD (Mainstore Data) line 56.
The address sent to SET ASSOCIATIVE MEMORY 172 and PLACEMENT ARRAY 174 is determined by CONTROL LOGIC 170 which is connected to ADDRESS DIRECTOR 182 by Address Director Control (ADC) line 194. The MA address may be directed to SET ASSOCIATIVE MEMORY 172 or the PA address may be directed to SET ASSOCIATIVE MEMORY 172.
The data sent to SET ASSOCIATIVE MEMORY 172 is connected by a Cache Data (CD) line 198 to DATA DIRECTOR 188. A DATA DIRECTOR CONTROL (DDC) line 196 extends from CONTROL LOGIC 170 and directs the DATA DIRECTOR to direct either the PD 50 or MD 56 data to CD line 198.
CONTROL LOGIC
CONTROL LOGIC 170 is similar to that depicted in FIG. 7. A Mainstore ConTroL (MCTL) line 52 is added as an interface to mainstore 24. MCTL signals the mainstore operation cycle type, request status, and operations by Direct Memory Access (DMA) or other processors.
The CONTROL LOGIC tracks the state of both the processor and mainstore interfaces interpreting cache accesses for them and providing cache responses to any cache request from either interface.
The CONTROL LOGIC determines whether the interface doing the access currently accessing the SET ASSOCIATIVE MEMORY and PLACEMENT array is the processor or the mainstorage interface. When the access is from the processor the PA and PD are the source of the address and data respectively for the SET ASSOCIATIVE MEMORY and PLACEMENT array. When the access is from the mainstorage, the MA and MD are the source of the address and data respectively for the SET ASSOCIATIVE MEMORY and PLACEMENT array.
ADDRESS DIRECTOR
ADDRESS DIRECTOR 182 controls the path of an address from its source to its destination(s). Address sources are processor interface 42 and mainstore interface 44. Address destinations are cache 26 and mainstore interface 44. The ADDRESS DIRECTOR also divides the PA or MA into the three fields required by SET ASSOCIATIVE memory 172.
Many operations will direct the signal on PA line 48 through address director 182 to MA line 54. Whenever the mainstorage interface is not busy, the signal on PA line 184 is directed to MA line 54. This saves the time required to drive the interface after a cache MISS or processor write operation to mainstore.
PA is not directed to the mainstore interface when another device is using the mainstore interface. At this time the processor must wait to acquire the mainstore interface before driving the MA lines.
DATA DIRECTOR
DATA DIRECTOR 188 provides a data path from a single source to one or more destinations. All three data connections PD 50, MD 56 and CD 198 to the DATA DIRECTOR are bi-directional.
Processor read data has as its source either the cache or the mainstore interface. Processor read data is sourced by the cache, during a cache HIT, or by mainstore during a cache MISS. During a cache MISS operation the data coming from mainstore is directed to both the processor and the cache. This data is written to the cache location as a result of the placement operation.
Writes to mainstore from other sources do not cause data transfers to or from the cache, but do cause an invalidation of any cache location with a HIT indication.
CSU ORGANIZATION--THIRD EMBODIMENT
The configuration of cache 26 in the third embodiment, shown in FIG. 11, is the same basic cache as that used in FIG. 10. The difference is in the use of the cache. In this embodiment, a cache is dedicated to the instruction stream and a second cache is dedicated to use by processor data references
ADDITIONAL CACHE FEATURES
The operation of the cache may be enhanced in a number of ways to improve overall system performance. The following is a description of such additional features which may be added to the previously described CSU, and include the provision of WRITE-BACK Operations, the provision of a SECONDARY TAG, a PREFETCH capability, and INVALIDATION capabilities.
SECONDARY TAG
With the single set of TAG elements, shown in FIGS. 7, 8 and 10, every reference from either the processor or mainstore interface must access the TAG elements in the SET ASSOCIATIVE MEMORY. This may be adequate for some implementations, but requires time multiplexing of the TAG between mainstore and processor references. This time multiplexing works when idle time is available to read the TAG from the mainstore interface between processor references. Otherwise the mainstore references cause the processor to wait until the completion of the mainstore TAG reference before continuing operation.
Adding a secondary TAG solves the conflict associated with accessing the TAG array. Such a configuration is depicted in FIG. 11, which is a modification of the configuration of FIG. 10, and which uses like reference numbers. This configuration is most effective when used for multiprocessing applications. A secondary TAG contains a copy of the TAG data used by the processor. Mainstore references from sources other than the processor to which the cache is attached now only access the primary TAG in SET ASSOCIATIVE MEMORY 172 when the secondary TAG has a HIT. The probability of a HIT on the secondary TAG from other sources referencing mainstorage is low, meaning an insignificant number of processor or primary TAG references occur from the mainstore interface.
The SET ASSOCIATIVE MEMORY 172 handles all the data references to the cache. Placement operations to the cache occur to both the TAG of SET ASSOCIATIVE MEMORY 172 and SECONDARY TAG 210 at the same time.
FIG. 11 depicts a cache with a secondary TAG, which is based on the configuration of FIG. 10, and which incorporates, where applicable, like reference numbers, having a secondary TAG array 210. Secondary tag 210 includes elements 1 . . . E, 212 . . . 214, respectively, which are connected to CONTROL LOGIC 170 by Secondary ConTroL (SCTL) line 216 and SNoop Hit (SNH) line 218.
In addition to Secondary TAG 210, the address director has been replaced by two address directors to improve performance by allowing parallel access to the two TAG arrays within the SET ASSOCIATIVE MEMORY. The MAINSTORE ADDRESS DIRECTOR 220 provides the paths between the MA 54, Internal Address IA 224 and Secondary SA (SSA) 228 lines while the PROCESSOR ADDRESS DIRECTOR provides the paths between the PA 48, IA 224, SA 112, TA 114 and BA 116 lines.
A Processor Address Director Control (PADC) line 222 controls the source and destination of the PA, IA, SA, TA and BA. The normal path is to the divided address SA, TA and BA to the SET ASSOCIATIVE memory. Use of IA line 224 is coordinated with the MADC line 226 to the MAINSTORE ADDRESS DIRECTOR, as IA can only have one source at a time.
The MADC lines 226 control the source and destination of the MA, SSA and IA. The normal path is to the SSA, which is the same group of addresses used for the SA lines to the processor TAG, to the secondary TAG. Use of the IA bus is coordinated with the PADC lines to the PROCESSOR ADDRESS DIRECTOR as IA can only have one source at a time.
CONTROL LOGIC complexity has increased. References to the SECONDARY TAG which cause a HIT result in the activation of the associated SNH line 218. When the CONTROL LOGIC receives a SNH indication the MADC 226 directs the MAINSTORE ADDRESS DIRECTOR to direct the signal on MA line 186 to IA line 224. Simultaneously the PADC 222 directs the PROCESSOR ADDRESS DIRECTOR to direct the respective bits from the IA 224 bus to the SA 112, TA 114 and BA 116 lines. References from the processor which MISS the cache are directed in the reverse manner over the IA bus down to the MA bus. SCTL 216 controls the SECONDARY TAG in a manner similar to the ECTL 144, lacking the control of the data array. SNH›E:1! provides the indications of SECONDARY TAG HITs on the respective elements.
The secondary TAG refers to a second set of TAG array elements identical to the TAG array elements in the SET ASSOCIATIVE MEMORY. The secondary TAG elements 212 . . . 214 differ from that shown in FIG. 8 in that DATA ARRAY 132 is not provided in a secondary TAG element. Whenever a data transfer is required from the cache after a HIT on the secondary TAG the SET ASSOCIATIVE MEMORY is accessed to obtain the data. PLC line 146 is connected to the respective elements in the SECONDARY TAG 210 as the SET ASSOCIATIVE MEMORY for line placement operations.
WRITE-BACK OPERATION
All the configurations shown in FIGS. 1-3 may be configured with either be a WRITE-THRU cache or a WRITE-BACK cache. The WRITE-THRU cache causes all write operations to write to mainstore as well as updating cached locations. A WRITE-BACK cache does most write operations only to the cache DATA ARRAY with a mainstore write required only one time, when the cache block is the placement cache block on a cache MISS.
The WRITE-THRU function is a simple and straight forward implementation. Little concern is required to ensure that references to mainstore always reference only the most current copy of data in the system. Mainstore always contains the most recent data copy and is the only place a reference needs to go to obtain the correct copy of a data location.
Mainstore references from another source in the system with a WRITE-BACK cache are more complex than a system with a WRITE-THRU cache. A simple invalidation of the cache block is no longer adequate with the WRITE-BACK cache since the cache block may be the only current copy of the data in the system. This copy of the cache block must either be written to mainstore or copied to another cache. A protocol for handling the mainstore references must be implemented to handle the data transfers between mainstore and the caches, or in some cases from one cache to another cache
An example of such a WRITE-BACK protocol is the MESI protocol, an example of which is depicted in FIG. 12, at 230. MESI is an acronym representing the four possible states for each cache block. This is double the number of WRITE-THRU cache block states. Each letter in MESI represents a cache block state that is described in following paragraphs.
M Modified (block 232)--This cache block contains data that has been modified. The copy of data in mainstore is invalid. The processor interface may write this block without any reference to mainstore. This is the only cache in the system that contains a valid copy of this data block at this time. A Reference to this mainstore block from any other source in the system requires this cache furnish the data to the requester, or write the data to mainstore, then leave the modified state. New possible states are Invalid or Shared.
E Exclusive (block 234)--This cache block is set aside for the exclusive use of this processor interface. The processor interface may write this block without any reference to mainstore making a transition to the Modified state. References to this mainstore block from any other source in the system require the cache block leave this state to the I or S state. New possible states are Modified, Invalid or Shared.
S Shared (block 236)--This cache block may be present in other caches in the system. The processor interface must first request an exclusive or modified state transition at the mainstore interface before this block may be written. This cache block is invalidated when another cache requests this block without a shared indication.
I Invalid (block 238)--This cache block is Invalid and may not be used without first obtaining a copy of the data from the mainstore interface. State transitions may occur to any other state dependent on the type of operation required.
An implementor might chose the following set of operations for its mainstore interface operations for cachable memory blocks. The transitions between the above states are shown for these operations in FIG. 12.
MRDS Mainstore Read Shared--Read the data from its current location with no intention of modification by the requester
MRDE Mainstore Read Exclusive--Read the data from its current location with the intention of modification by the requester. Obtain the right to modify the data without a mainstore reference.
MWRB Mainstore Write Block--Write a modified block back to mainstore. This occurs whenever a placement needs to use a cache block holding modified data.
The processor used in this implementation example has the following cachable memory operations.
PRDS Processor Read Shared--Read the data from its current location with no intention of modification by the requester.
PRD Processor Read--Read the data from its current location with the intention of modification by the requester. Obtain the right to modify the data without a mainstore reference.
PWR Processor Write--Write a location in memory.
The state machine of FIG. 12 does not show all the details involved in placing a block in cache. The cache block must first be checked to determine if it is in the M state and written back to mainstore