# VLSI Design of Power Gated NOR-based Content Addressable Memory

Jayalakshmi Devi, Dr. Satyanarayana

Abstract: Content addressable memory (CAM) provides a high-speed search function in a single clock cycle. CAM is power hungry because of its parallel match-line comparison, so low-power, high-speed CAM designs are needed. In this project, high speed is achieved by using a parity bit. We introduce an effective gated-power technique to reduce the peak and average power consumption. A feedback loop automatically turns off the power supply to the comparison elements and hence reduces the average power consumption by 75%. The proposed design can work at a supply voltage of 0.35 V.

Index terms: CMOS, content addressable memory (CAM), match-line.

#### I. INTRODUCTION

Content addressable memory is a solid-state memory in which data are accessed by content rather than by physical location. It receives input search data, i.e., a search word, and returns the address of a matching word stored in its data bank. In general, a content addressable memory (CAM) has three modes of operation: READ, WRITE, and COMPARE. A normal RAM performs only the first two, READ and WRITE. COMPARE is the main operation of a CAM, which performs READ and WRITE only rarely. Fig. 1 shows a simplified block diagram of a CAM core with an incorporated search data register and an output encoder. A compare operation starts by loading an n-bit input search word into the search data register. The search data are distributed to the memory banks through n pairs of complementary search lines (SLs) and directly compared with every bit of the stored words using comparison circuits. Each stored word has a match line (ML) that is shared between its bits to convey the comparison result.

Fig. 1. Block diagram of a traditional CAM

The location of the matched word is identified by an output encoder, as shown in Fig. 1. During the pre-charge stage, the MLs are at ground voltage level, while both the search line (SL) and the complementary search line (~SL) are at VDD. During the evaluation stage, complementary search data are broadcast onto the SLs and ~SLs. When a mismatch occurs in any CAM cell (for example, at the first cell of the row, D = "1", ~D = "0", SL = "1", ~SL = "0"), transistors p3 and p4 are turned on, charging the ML up to a higher voltage level. An ML sense amplifier (MLSA) detects the voltage change on the ML and amplifies it to a full CMOS voltage output. If no cell on a row mismatches, no charge-up path is formed and the voltage on the ML remains unchanged, indicating a match.
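The compare operation described above can be illustrated with a small behavioral sketch. It models only the logic, not the circuit: every row is compared with the search word, a row's ML stays low when no bit mismatches, and an encoder reports the matching address. The function names and the lowest-address-wins encoder policy are assumptions made for illustration.

```python
from typing import List, Optional


def mismatch_count(stored: int, search: int, width: int) -> int:
    """Number of bit positions in which the stored and search words differ."""
    mask = (1 << width) - 1
    return bin((stored ^ search) & mask).count("1")


def cam_search(words: List[int], search: int, width: int) -> Optional[int]:
    """Return the address of a matching word, or None if no row matches."""
    # In hardware all rows evaluate in parallel; the list models the MLs.
    # True means "ML stayed at ground", i.e. no mismatch charged it up.
    match_lines = [mismatch_count(w, search, width) == 0 for w in words]
    # Assumed encoder policy: report the lowest matching address.
    for address, matched in enumerate(match_lines):
        if matched:
            return address
    return None
```

For example, with the stored words `[0b1010, 0b0111, 0b1100]`, searching for `0b0111` returns address 1, and searching for `0b0000` returns `None`.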

Since all available words in the CAM are compared in parallel, the result can be obtained in a single clock cycle. Because CAMs are faster than other hardware- and software-based search systems, they are preferred in high-throughput applications such as network routers and data compressors. However, the fully parallel search operation leads to two critical challenges in designing a low-power system for high-speed, high-capacity CAMs [1]: 1) the power-hungry nature due to the high switching activity of the SLs and the MLs, and 2) a huge surge-on current (i.e., peak current) at the beginning of the search operation, caused by the concurrent evaluation of the MLs, which may cause a serious IR drop on the power grid and thus affect the operational reliability of the chip [1]. As a result, considerable effort has been put into reducing both the peak and the total dynamic power consumption of CAMs [2]-[8]. For example, Zukowski and Pagiamtzis introduced selective pre-charge and a pipelined architecture, respectively, to reduce the peak and average power consumption of the CAM [8]. The designs in [3], [5], and [6] use the ML pre-charge-low scheme (i.e., low ML swing) to reduce the average power consumption. However, these designs are sensitive to process and supply voltage variations.

In this project, a parity bit is used to boost the search speed of the parallel CAM. A power-gated ML sense amplifier is proposed to improve the CAM ML comparison in terms of power and robustness. It also reduces the peak turn-on current at the beginning of each search cycle. The rest of this report is organized as follows. Section II introduces the parity-bit-based CAM architecture. Section III proposes the gated-power technique. Section IV presents the performance analysis. Section V concludes the project.

# II. BOOSTING THE SEARCH SPEED USING A PARITY BIT

We introduce a versatile auxiliary bit to boost the search speed of the CAM at the cost of less than 1% area overhead and power consumption. This auxiliary bit resembles the existing pre-computation schemes but has a different operating principle. We first briefly discuss the pre-computation schemes before presenting the proposed auxiliary-bit scheme.

*1) Pre-computation CAM design:* The pre-computation CAM uses additional bits to filter out some mismatched CAM words before the actual comparison.



Fig. 2. (a) Conventional pre-computation CAM and (b) proposed parity-bit-based CAM

These extra bits are derived from the data bits and are used in the first comparison stage. For example, in Fig. 2(a) the number of "1"s in each stored word is counted and kept in the counting-bits segment. When a search operation starts, the number of "1"s in the search word is counted and stored in the segment on the left of Fig. 2(a). This extra information is compared first, and only the words that have the same number of "1"s (e.g., the second and the fourth) are turned on in the second sensing stage for further comparison. Statistically, this scheme reduces the power required for data comparison, but it requires additional silicon area and search delay to do so. The pre-computation scheme and all other existing designs share one property: the ML sense amplifier has to distinguish between the matched ML and the 1-mismatch ML. Sooner or later this poses a challenge for CAM designs, since the driving strength of the single turned-on path gets weaker with each process generation while the leakage gets stronger. This problem is referred to as the Ion/Ioff problem. Thus, we propose a new auxiliary bit that boosts the sensing speed of the ML and at the same time improves the Ion/Ioff of the CAM by a factor of two.
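The two-stage filtering of the pre-computation scheme can be sketched as follows. This is a behavioral illustration only, with hypothetical function names: the first stage compares the stored ones counts with the ones count of the search word, and only the surviving rows take part in the full data comparison.

```python
from typing import List, Tuple


def ones_count(word: int) -> int:
    """Number of '1' bits in a word (the pre-computed counting bits)."""
    return bin(word).count("1")


def precompute_search(words: List[int], search: int,
                      width: int) -> Tuple[List[int], List[int]]:
    """Return (rows enabled for stage 2, rows that fully match)."""
    mask = (1 << width) - 1
    target = ones_count(search & mask)
    # In a real design the counts are stored alongside each word at write
    # time; they are recomputed here only to keep the sketch short.
    counts = [ones_count(w & mask) for w in words]
    # Stage 1: enable only the rows with the same number of '1's.
    candidates = [addr for addr, c in enumerate(counts) if c == target]
    # Stage 2: full data comparison on the enabled rows only.
    matches = [addr for addr in candidates
               if (words[addr] ^ search) & mask == 0]
    return candidates, matches
```

With the stored words `[0b0011, 0b0101, 0b1111, 0b1001]` and the search word `0b0101`, rows 0, 1, and 3 pass the first stage and only row 1 matches in the second. The rows rejected in stage 1 are the ones whose comparison power is saved.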

*2) Parity-bit-based CAM:* The parity-bit-based CAM design, shown in Fig. 2(b), consists of the original data segment and an extra one-bit segment derived from the actual data bits. Only the parity bit is computed, i.e., whether the number of "1"s is even or odd. The parity bit is placed directly on the corresponding word and ML. Thus the new architecture has the same interface as the conventional CAM, with one extra bit. During the search operation there is only one stage, as in the conventional CAM. Hence, the use of the parity bit does not improve the power performance.

However, this additional parity bit reduces the sensing delay and, in theory, doubles the driving strength in the 1-mismatch case (which is the worst case), as discussed below.

When the data segment matches (e.g., ML3), the parity bits of the search and stored words are the same, so the overall word returns a match. When one mismatch occurs in the data segment (e.g., ML2), the numbers of "1"s in the stored and search words differ by 1. As a result, the corresponding parity bits differ, and we now have two mismatches (one from the parity bit and one from the data bits). If there are two mismatches in the data segment (e.g., ML0, ML1, or ML4), the parity bits are the same and there are two mismatches overall. Cases with more mismatches can be ignored, as they are not critical. The sense amplifier now only has to distinguish between the 2-mismatch case and the matched case. Since the driving capability of the 2-mismatch word is twice as strong as that of the 1-mismatch word, the proposed design greatly improves the search speed and the Ion/Ioff ratio of the design.

# III. GATED-POWER ML SENSE AMPLIFIER

### 1) Operating principle
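The argument above can be checked exhaustively for short words. The sketch below, with hypothetical function names, counts the mismatching cells on a row that carries one extra parity cell and searches all pairs of words for the smallest non-zero mismatch count.

```python
from itertools import product
from typing import Optional


def parity(word: int) -> int:
    """1 if the word has an odd number of '1's, else 0."""
    return bin(word).count("1") & 1


def total_mismatches(stored: int, search: int, width: int) -> int:
    """Mismatching cells on a row: data cells plus the parity cell."""
    mask = (1 << width) - 1
    data = bin((stored ^ search) & mask).count("1")
    parity_cell = parity(stored & mask) ^ parity(search & mask)
    return data + parity_cell


def min_nonzero_mismatches(width: int) -> Optional[int]:
    """Smallest non-zero mismatch count over all stored/search pairs."""
    worst = None
    for stored, search in product(range(1 << width), repeat=2):
        m = total_mismatches(stored, search, width)
        if m and (worst is None or m < worst):
            worst = m
    return worst
```

For 4-bit words `min_nonzero_mismatches(4)` returns 2: a single data mismatch always flips the parity, so no row ever presents exactly one mismatching cell to the sense amplifier.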

The proposed CAM architecture is shown in Fig. 3. The CAM cells are organized into rows (words) and columns (bits). Each cell has the same number of transistors as the traditional p-type NOR CAM cell (shown in Fig. 1) and uses a similar ML structure. However, the comparison unit, i.e., transistors M1-M4, and the "SRAM" unit, i.e., the cross-coupled inverters, are powered by two separate metal rails, namely VDDML and VDD, respectively. VDDML is independently controlled by a power transistor (px) and a feedback loop that can automatically turn off the ML current to save power. The purpose of having two separate power rails, VDD and VDDML, is to completely isolate the SRAM cell from external power disturbances during the compare cycle.

Fig. 3. (a) Proposed CAM architecture. (b) Each CAM cell is powered by two power rails: VDDML for the comparison transistors and VDD for the SRAM transistors

As shown in Fig. 3, the gated power transistor px is controlled by a feedback loop, denoted "power control", which automatically turns off px once the voltage on the ML reaches a certain threshold. At the beginning of each cycle, the ML is first initialized by a global control signal EN. When EN is low, the power transistor is off, and the signals ML and C1 are initialized to ground and VDD, respectively. After some time, EN goes high and initiates the compare phase. If one or more mismatches occur in the CAM cells, the ML is charged up. All the cells of a row share the limited current offered by transistor px, regardless of the number of mismatches. When the voltage of the ML reaches the threshold voltage of transistor M8 (i.e., Vth8), the voltage at node C1 is pulled down. After a short delay the NAND2 gate toggles and the power transistor px is turned off again. As a result, the ML is not fully charged to VDD but is limited to a voltage slightly above Vth8.
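The self-turn-off behavior of the feedback loop can be illustrated with a very coarse discrete-time model. It is a sketch under strong assumptions, not a circuit simulation: the ML is treated as an ideal capacitor, px as a current limiter, and the C1/NAND2 path as a fixed delay. All parameter values (`vth8`, `c_ml`, `i_cell`, `i_px_max`, `loop_delay`) are hypothetical and are not taken from the design.

```python
from typing import Optional, Tuple


def simulate_ml(n_mismatch: int, vdd: float = 1.0, vth8: float = 0.4,
                c_ml: float = 10e-15, i_cell: float = 2e-6,
                i_px_max: float = 20e-6, loop_delay: float = 50e-12,
                dt: float = 1e-12,
                t_max: float = 20e-9) -> Tuple[float, Optional[float]]:
    """Return (final ML voltage in V, px turn-off time in s or None)."""
    # px limits the total current shared by all mismatching cells.
    current = min(n_mismatch * i_cell, i_px_max)
    v_ml = 0.0      # ML is initialised to ground while EN is low
    t = 0.0
    t_trip = None   # time at which the ML first reaches vth8
    while t < t_max:
        # The loop turns px off a fixed delay after the ML crosses vth8.
        if t_trip is not None and t - t_trip >= loop_delay:
            return v_ml, t
        v_ml = min(v_ml + current * dt / c_ml, vdd)
        if t_trip is None and v_ml >= vth8:
            t_trip = t
        t += dt
    return v_ml, None  # a matched row never trips the loop
```

In this model a matched row (`n_mismatch=0`) leaves the ML at ground, while a mismatched row stops charging slightly above `vth8` and well below `vdd`. A row with many mismatches trips the loop earlier than a row with a single mismatch. The fixed `loop_delay` is a simplification: in the actual circuit the transition of node C1 also depends on how fast the ML rises.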

Fig. 4. Waveforms of important nodes during the evaluation of three 128-bit rows of the proposed design

Fig. 4 shows the simulation result of the proposed power controller. The slopes of the ML, node C1, and node MLout depend on the number of mismatches. When more mismatches occur (e.g., 128 in the simulation), the ML and node C1 change faster. Fewer mismatches (e.g., 1 in the simulation) slow down the transition of node C1 and result in a longer delay before the power transistor px is turned off. The ML is finally charged to only around 0.5 V, which is far below VDD, and hence the power consumption is reduced. With the introduction of the power transistor px, the driving strength in the 1-mismatch case is about 10% weaker than that of the conventional design, and sensing is thus slower. However, when this sense amplifier is combined with the parity-bit scheme of Section II, the overall search delay is improved by 39%. Thus the new CAM architecture offers both low-power and high-speed operation.
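To first order, the saving that comes from the reduced ML swing can be estimated from the charge drawn from the supply. The sketch below assumes an ideal ML capacitance $C_{ML}$ (a symbol introduced here for illustration) that is charged from ground through px, and it ignores leakage and the energy of the control loop.

```latex
E_{\text{full}} = C_{ML}\, V_{DD}^{2}, \qquad
E_{\text{gated}} = C_{ML}\, V_{ML}\, V_{DD}, \qquad
\frac{E_{\text{gated}}}{E_{\text{full}}} = \frac{V_{ML}}{V_{DD}}
```

With the final ML voltage $V_{ML} \approx 0.5$ V from the simulation and an assumed $V_{DD} = 1$ V, each mismatched ML would draw roughly half the energy of a full-swing ML under these assumptions. This is only an order-of-magnitude estimate and does not reproduce the simulated power figures.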

### 2) Basic CAM cell layout

Fig. 5 shows the layout of the basic CAM cell in a 32-nm CMOS process. The cell is powered from the VDD supply.

The layout includes the signals IN, INB, y, yb, and ML. VDD and VSS are the power supply and ground, respectively. IN and INB are the input and the complementary input, which carry the search data. y and yb are the nodes that hold the stored data while the VDD supply is applied to the layout. ML is the match line, the main element of any CAM, which conveys the comparison result.







Fig. 6. Layout of the proposed CAM cell

### 3) Basic CAM cell layout operation

Fig. 5 shows the basic CAM cell of the traditional CAM, which compares the stored data with the search data. Whenever a mismatch occurs in the CAM cell, the match line goes to a low voltage, according to the layout information. When a match occurs in the CAM cell, the ML goes to a high voltage.

Fig. 6 shows the layout of the proposed CAM cell in a 32-nm CMOS process. Since the new CAM cell has a topology similar to that of the conventional design (except for the wl gated control unit), their layouts are similar. With the gated control unit we obtain lower power consumption, even though the area of the CAM cell layout increases.

#### IV. PERFORMANCE COMPARISON



Fig. 7. Output waveform of the basic CAM cell

This is the output waveform of the basic CAM cell whose layout is shown in Fig. 5. Here the yellow line (IN) is the search data and the white line (Y) is the stored data.

In this diagram the stored data and the search data match, so the ML (red line) is at a high voltage. If the search data do not match the stored data, the match line goes low and the output line indicates logic zero.

The tool can display different types of waveforms, such as voltage vs. time, voltage vs. current, voltage vs. voltage, frequency vs. time, and eye diagrams. The waveform shown is voltage vs. time, and the cell consumes 0.40 µW, as indicated in the waveform above.



Fig. 8. Output waveform of the proposed CAM cell

This is the output of the proposed CAM cell whose layout is shown in Fig. 6. Here we obtain a lower power of 0.25 µW compared with the traditional CAM, even though the circuit area is larger.

Here we use the Microwind tool, which allows the outputs of circuits to be checked in various nanometer technologies.

The following table lists the power consumption in various nanometer technologies for the traditional CAM and the proposed CAM.

Table I. Power consumption of the traditional and proposed CAM cells

| Technology node | Traditional CAM | Proposed CAM |
|-----------------|-----------------|--------------|
| 32 nm           | 0.40 µW         | 0.219 µW     |
| 65 nm           | 19.456 µW       | 1.739 µW     |
| 90 nm           | 0.311 mW        | 8.739 µW     |

In this project, the traditional and proposed CAMs are verified in various nanometer technologies: 32 nm, 65 nm, and 90 nm. In the 32-nm technology the proposed CAM consumes 0.219 µW, whereas the traditional CAM consumes 0.400 µW, so the proposed cell consumes considerably less power.

Table I covers only a single cell. In this project we also implemented a 4×4 array in the different nanometer technologies. The following table compares the 4×4 traditional CAM and the 4×4 proposed CAM in these technologies.

Table II. Power consumption of the 4×4 traditional and proposed CAMs

| Technology node | 4×4 traditional CAM | 4×4 proposed CAM |
|-----------------|---------------------|------------------|
| 32 nm           | 1.861 µW            | 1.750 µW         |
| 65 nm           | 79.985 µW           | 41.803 µW        |
| 90 nm           | 1.257 mW            | 0.297 mW         |

As the tables show, the power consumption of the proposed CAM is lower than that of the traditional CAM, even though the area of the proposed CAM is larger. This reduction in power consumption, which is the goal of this project, is achieved by the gated-power technique.

# V. CONCLUSION
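The relative savings implied by Tables I and II can be computed directly from the tabulated values. The short sketch below does this arithmetic, with all powers converted to microwatts. The dictionary and function names are chosen here for illustration.

```python
from typing import Dict, Tuple

# (traditional, proposed) power in microwatts, copied from Tables I and II.
TABLE_I_UW: Dict[str, Tuple[float, float]] = {
    "32nm": (0.40, 0.219),
    "65nm": (19.456, 1.739),
    "90nm": (311.0, 8.739),      # 0.311 mW = 311 uW
}
TABLE_II_UW: Dict[str, Tuple[float, float]] = {
    "32nm": (1.861, 1.750),
    "65nm": (79.985, 41.803),
    "90nm": (1257.0, 297.0),     # 1.257 mW and 0.297 mW
}


def saving_percent(traditional: float, proposed: float) -> float:
    """Power saving of the proposed design relative to the traditional one."""
    return 100.0 * (traditional - proposed) / traditional


def savings(table: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Saving in percent for every technology node in a table."""
    return {node: saving_percent(t, p) for node, (t, p) in table.items()}
```

For the single cell this gives savings of roughly 45%, 91%, and 97% at 32 nm, 65 nm, and 90 nm. For the 4×4 array the corresponding figures are roughly 6%, 48%, and 76%, so in these results the benefit is largest at the older technology nodes.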

We proposed an effective gated-power technique to reduce the peak current and average power consumption. It is much more stable than recently published designs while maintaining their low-power property. Compared with the conventional design, its stability is degraded by only 0.6%, and only at extremely low supply voltages. At the 1 V operating condition, both designs are stable with no sensing errors, according to our simulations. Its area overhead is about 11%. It is therefore the most suitable design for implementing high-capacity parallel CAMs in sub-32-nm CMOS technologies.

## REFERENCES

[1] K. Pagiamtzis and A. Sheikholeslami, "Content addressable memory (CAM) circuits and architectures: A tutorial and survey," *IEEE J. Solid-State Circuits*, vol. 41, no. 3, pp. 712-727, Mar. 2006.

[2] A. T. Do, S. S. Chen, Z. H. Kong, and K. S. Yeo, "A low power CAM with efficient power and delay trade-off," in *Proc. IEEE Int. Symp. Circuits Syst. (ISCAS)*, 2011, pp. 2573-2576.

[3] I. Arsovski and A. Sheikholeslami, "A mismatch dependent power allocation technique for match-line sensing in content addressable memories," *IEEE J. Solid-State Circuits*, vol. 38, no. 11, pp. 1958-1966, Nov. 2003.

[4] N. Mohan and M. Sachdev, "Low leakage storage cells for ternary content addressable memories," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 17, no. 5, pp. 604-612, May 2009.

[5] O. Tyshchenko and A. Sheikholeslami, "Match sensing using match-line stability in content addressable memories (CAM)," *IEEE J. Solid-State Circuits*, vol. 43, no. 9, pp. 1972-1981, Sep. 2008.

[6] N. Mohan, W. Fung, D. Wright, and M. Sachdev, "A low power ternary CAM with positive-feedback match-line sense amplifiers," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 56, no. 3, pp. 566-573, Mar. 2009.

[7] S. Baeg, "Low power ternary content addressable memory design using a segmented match-line," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 55, no. 6, pp. 1485-1494, Jul. 2008.

[8] K. Pagiamtzis and A. Sheikholeslami, "A low power content addressable memory (CAM) using pipelined hierarchical search scheme," *IEEE J. Solid-State Circuits*, vol. 39, no. 9, pp. 1512-1519, Sep. 2004.



Email:yatham.jayalaksmi@gmail.com



Yatham Jayalakshmi Devi is pursuing a Master of Technology in Digital Systems and Computer Electronics at Rajeev Gandhi Memorial College of Engineering and Technology, Nandyal, Andhra Pradesh, India.

Mr. D. Satya Narayana, Ph.D., is Professor and Head of the ECE Department, Rajeev Gandhi Memorial College of Engineering and Technology, Nandyal, Andhra Pradesh, India. He holds the professional memberships MISTE, FIETE, and MIEEE. He has published more than 30 papers in international and national journals and conferences. He is guiding 5 research scholars.

Email: dsn2003@rediffmail.com