The Mystery of Monte Carlo Simulation

If you work in the VLSI industry, you have almost certainly heard the term “Monte Carlo (MC)”. In this post, let us understand what Monte Carlo simulation means and how it is applied in circuit design. Wikipedia defines it as follows: “Monte Carlo methods are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results”. Put simply, a Monte Carlo algorithm introduces random variations within given limits, so that cases a fixed set of inputs would miss get explored. Monte Carlo methods are applied to a wide variety of problems, including risk analysis, finance, statistics, physics, and electronic design.

That was the introduction to Monte Carlo. Now let’s get down to business and talk about WHY and HOW Monte Carlo algorithms matter in VLSI design.



During VLSI circuit simulation, we run the design through various PVT (Process, Voltage and Temperature) corners so that the circuit operates reliably under all extreme conditions. These PVT variations can be generalized as:

  1. Temperature: from as low as -40°C to as high as 125°C.
  2. Voltage: ±10% variation from its nominal value.
  3. Process: a two-letter convention in which the first letter describes the behavior of the NMOS and the second that of the PMOS. TT, SS, FF, SF and FS are the corners generally used. T stands for Typical (nominal Vt), F for Fast (low Vt) and S for Slow (high Vt).
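To get a feel for how many simulations a plain corner sweep implies, here is a minimal Python sketch that enumerates the combinations from the list above. The 1.0 V nominal supply and the extra 25°C temperature point are assumptions made for illustration; they are not values from any particular technology.

```python
from itertools import product

# Process corners named in the list above (first letter NMOS, second PMOS).
process_corners = ["TT", "SS", "FF", "SF", "FS"]

# Assumed nominal supply; the real value depends on the technology node.
nominal_vdd = 1.0
voltages = [round(nominal_vdd * scale, 3) for scale in (0.9, 1.0, 1.1)]

# -40 and 125 come from the list above; 25 is an assumed nominal point.
temperatures_c = [-40, 25, 125]

# Every PVT corner is one (process, voltage, temperature) combination.
pvt_corners = list(product(process_corners, voltages, temperatures_c))

print(len(pvt_corners), "corner simulations")
for process, vdd, temp in pvt_corners[:3]:
    print(process, vdd, temp)
```

Even this small sweep needs 5 x 3 x 3 = 45 runs, and every one of them still treats all transistors of one type as identical.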

Running the design over different PVT corners covers the environmental variations (voltage and temperature) as well as the manufacturing variations (process). A very common figure illustrating the process corners is shown here:

process-corners

Now the million-dollar question is: “Can we guarantee the functionality of silicon across all conditions by simulating the design across PVT corners?” The answer is NO, and the reason is the manufacturing variation introduced during fabrication of the chip. But didn’t we cover manufacturing variations with the process corners (TT, SS, FF, FS and SF)? Yes, we did, but that is not enough. Let me explain!

Consider a design that has 1000 NMOS and 1000 PMOS transistors. Suppose we run this design at the FS corner: the simulation assumes that all 1000 NMOS are identically FAST and all 1000 PMOS are identically SLOW. This is not true in real silicon, where no two transistors are identical because of systematic and random variations. So even after running the design across the process corners, we have left out the case where transistors within the same process corner differ from one another. This is where Monte Carlo pitches in. It introduces randomness by shifting the Vt of each transistor in a different direction, so that all 1000 NMOS/PMOS differ from one another in any given run, which is closer to real silicon behavior.
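The difference between a corner run and a Monte Carlo run can be sketched in a few lines of Python. This is only an illustration: the Vt value, the sigma and the Gaussian distribution are assumptions, and a real simulator takes its statistics from the foundry's MC model files.

```python
import random

def corner_vt(n_devices, vt_corner):
    """Corner run: every device of one type gets exactly the same Vt."""
    return [vt_corner] * n_devices

def monte_carlo_vt(n_devices, vt_corner, sigma, rng):
    """One Monte Carlo run: each device gets its own random Vt."""
    return [rng.gauss(vt_corner, sigma) for _ in range(n_devices)]

rng = random.Random(1)   # fixed seed so the run is repeatable
VT_FAST_NMOS = 0.35      # volts, assumed value for a "fast" NMOS
SIGMA_VT = 0.02          # volts, assumed device-to-device spread

corner = corner_vt(1000, VT_FAST_NMOS)
mc = monte_carlo_vt(1000, VT_FAST_NMOS, SIGMA_VT, rng)

print("distinct Vt values in corner run:", len(set(corner)))
print("distinct Vt values in MC run:    ", len(set(mc)))
print("MC Vt range: %.3f V to %.3f V" % (min(mc), max(mc)))
```

The corner run has a single Vt value for all 1000 devices, while the Monte Carlo run gives each device its own.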

Industry-wide, the MC corner files are different from the usual process corner files, because the variations modeled at MC corners are different from those at the process corners. One should therefore avoid the mistake of running MC with the process corner files, which will not hit the worst case for the reasons mentioned above.

Monte Carlo simulations can be run in two ways for any given design: Global Monte and Local Monte. Again, the corner files for these two are different. Let’s understand what they are:

Global Monte: We can think of this Monte run as unconstrained, in the sense that the variations can span different process corners. Let’s understand this through a figure:

global-monte-carlo

In the figure, each dot represents one Monte Carlo run; every run introduces a different Vt change, so the dots spread out. In a Global MC run the variations are spread across the process corners, as the name suggests.

Local MC: This Monte run is constrained to a particular process corner. In general, the first step is to run the design at the various PVT corners to find the worst one. The second step is to run Monte on that particular corner to check functionality at the worst of the worst corners. Say the worst corner found in the first step was SS; the Monte variations will then look something like this:

local-monte-carlo

This is Local Monte, as the scope of the variations is limited to a particular corner.
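The two flavors can be mimicked with a small Python sketch in which every run is one dot in the (NMOS Vt shift, PMOS Vt shift) plane. All numbers here (the corner offsets, the sigmas and the Gaussian shape) are assumptions chosen only to make the picture visible; real runs take these from the Global and Local MC corner files.

```python
import random
import statistics

# Assumed Vt shifts in volts relative to typical, as (NMOS, PMOS).
# S = slow = higher Vt, F = fast = lower Vt; first letter is NMOS.
CORNER_SHIFT = {
    "TT": (0.0, 0.0),
    "SS": (0.05, 0.05),
    "FF": (-0.05, -0.05),
    "SF": (0.05, -0.05),
    "FS": (-0.05, 0.05),
}

def global_mc(n_runs, rng, sigma=0.02):
    """Runs centred on TT with a spread wide enough to reach the corners."""
    return [(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for _ in range(n_runs)]

def local_mc(n_runs, corner, rng, sigma=0.005):
    """Runs centred on one chosen corner with a much smaller spread."""
    dn, dp = CORNER_SHIFT[corner]
    return [(rng.gauss(dn, sigma), rng.gauss(dp, sigma)) for _ in range(n_runs)]

def spread(runs):
    """Standard deviation of the NMOS Vt shift over all runs."""
    return statistics.stdev([n for n, _ in runs])

rng = random.Random(7)
global_runs = global_mc(500, rng)
local_runs = local_mc(500, "SS", rng)

print("global NMOS spread: %.4f V" % spread(global_runs))
print("local  NMOS spread: %.4f V" % spread(local_runs))
```

The global runs scatter around the typical point across the whole corner box, while the local runs stay clustered around the SS corner that was picked as the worst case.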

Both methods have their own applications and are used across the industry to emulate silicon behavior during simulation, with the aim of getting working silicon in one go. Undoubtedly, the mysterious Monte Carlo has many flavors, or rather applications, and I hope you can now appreciate one more of them: VLSI circuit design!

SETUP Time and SETUP Violation in a Single D Latch

Setup and Hold time are among the fundamental concepts needed for analysing and closing timing margin. Register-to-register analysis in the digital domain is very popular, but the root cause of Setup and Hold time is often not covered in the education system. This post explains the cause of the setup requirement in a single D latch using its transistor-level schematic. I will also try to explain the points between which Setup and Hold time are measured, why we measure there, and why we cannot define them at any other points.

Fig.1 shows the transistor-level diagram of a simple D latch. D is the input; I1 and I2 are the inverters in the data path of the latch; T1 is the forward-path transmission gate and T2 is the feedback-path transmission gate; L_I_1 and L_I_2 are the cross-coupled latch inverters. The latch is controlled by the signal CK.

Full_latch

Fig.1 A single D Latch

Fig.2 shows the CK signal used. CLK_delay is the delay between the rising edge of CK and the falling edge of CK_bar.

CLOCK_and_clock_bar

Fig.2 Clock signals used

Setup Analysis

1. Setup time is the minimum time for which the data must be settled before the latching edge of the clock, which in this case is the rising edge.

2. The setup requirement arises from the fact that the latching action is performed by the cross-coupled inverters L_I_1 and L_I_2. The latch is bistable, which means it is stable at only two points, (0,1) or (1,0). If the latch sits at any in-between logic level, it can resolve in either direction. For the safest operation, the logic at point B should therefore be the same as the logic at point C. This means any change in data should propagate to point C before the latching action begins, that is, before T2 starts conducting and T1 switches off.

3. Moreover, as the latching edge of the clock arrives (here, the rising edge), the transistors of T1 begin switching off and T2 begins conducting. If the data is still propagating, this causes the output to degrade, and finally the latch is unable to sample the changed data and latches the previous data instead, which is a functional failure.

Fig.3 and Fig.4 show simulations of a setup violation in the D latch. The blue line is the clock edge; the red one is the data, which is pushed closer to the clock edge over multiple simulations; and the green one is the output, which is the inverse of the input data because of the circuit used.

setup1

Fig.3 Setup Simulation and Violation (A)

setup2

Fig.4 Setup Simulation and Violation (B)

4. It can be clearly seen that as we push the data towards the clock edge, the output degrades. Finally the output tries to reach logic 0 but reverts to logic 1: the latching action overcomes the changed data, which was not able to propagate to point C, and the previous data residing at C takes over.

5. Going by this explanation, if this clock is the same as the external clock, then to a first approximation the setup time for the surest operation is the data delay from point D to point C, commonly denoted as the MAX data delay.

6. Different measurement standards calculate setup time differently. Some industry definitions measure it as the minimum time before the clock edge for which the output degradation stays within 5% of its relaxed value.

For basic understanding, we can conclude that for sure operation the data should reach point C before the latching action kicks in, and this delay is the SETUP time of the latch. If it is violated, the output data starts degrading; how much degradation is allowed then depends on the design requirement.
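The first-order rule above (data must reach point C before the latching edge) can be written as a small slack check. This Python sketch only mirrors the sweep done in the simulations; the 0.12 ns D-to-C delay and the 5.0 ns clock edge are made-up numbers, and the function name is hypothetical.

```python
def setup_slack(data_edge_ns, clk_edge_ns, d_to_c_delay_ns):
    """Time left between the data settling at point C and the latching edge.

    A positive value means the data reached C in time; a negative value
    means the latch starts closing while the data is still propagating.
    """
    return clk_edge_ns - (data_edge_ns + d_to_c_delay_ns)

D_TO_C_DELAY_NS = 0.12   # assumed delay from input D to node C
CLK_EDGE_NS = 5.0        # assumed time of the rising (latching) edge

# Push the data edge closer and closer to the clock edge.
for data_edge in (4.50, 4.80, 4.90, 4.95):
    slack = setup_slack(data_edge, CLK_EDGE_NS, D_TO_C_DELAY_NS)
    status = "ok" if slack >= 0 else "SETUP VIOLATION"
    print(f"data at {data_edge:.2f} ns  slack {slack:+.3f} ns  {status}")
```

With these numbers the first two data edges meet setup and the last two violate it. A real characterization would replace the hard pass/fail line with a degradation criterion such as the 5% rule mentioned in point 6.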

I hope this answers your queries. Keep following for more posts on Setup and Hold. Please post your queries in the comments or start a thread in the discussion forum.

Synthesis and Functioning of Blocking and Non-Blocking Assignments.

Here are some examples of blocking and non-blocking assignments in Verilog that can be really useful for budding design engineers. First, let us discuss the features of these assignments.

  1. They are procedural assignments, always used inside a procedural block such as initial or always.
  2. In a BA (Blocking Assignment), the RHS is evaluated and assigned to the LHS immediately, in the Active region of the scheduler; the assignment does not wait for the procedural block to end. In an NBA (Non-Blocking Assignment), the RHS is evaluated first and the LHS is updated later, in the NBA region of the scheduler, which comes after the Active and Inactive regions of the same time step. In practice, the update happens after the procedural block (initial or always) has finished or suspended.
  3. In BA, assignments take effect in the order in which they are written. In NBA, all the RHS values are evaluated and held temporarily by the simulator, and the LHS updates are then applied together, so every RHS sees the values from before the update. NBAs made within one block are applied in statement order, so the last one to a given variable wins; the relative order between different blocks is not defined.
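The "evaluate now, update later" behaviour in points 2 and 3 can be imitated with a tiny Python model. This is not how a Verilog simulator is implemented; it is only a sketch of the two update rules, applied to the classic swap `a = b; b = a;` versus `a <= b; b <= a;`. The function names are hypothetical.

```python
def run_blocking(state, assignments):
    """Apply each (lhs, rhs) assignment immediately, in written order."""
    state = dict(state)
    for lhs, rhs in assignments:
        state[lhs] = state[rhs]      # later statements see this new value
    return state

def run_nonblocking(state, assignments):
    """Sample every RHS first, then update every LHS in statement order."""
    sampled = [(lhs, state[rhs]) for lhs, rhs in assignments]
    new_state = dict(state)
    for lhs, value in sampled:
        new_state[lhs] = value       # updates use only the sampled values
    return new_state

start = {"a": 0, "b": 1}
swap = [("a", "b"), ("b", "a")]

print("blocking:    ", run_blocking(start, swap))     # {'a': 1, 'b': 1}
print("non-blocking:", run_nonblocking(start, swap))  # {'a': 1, 'b': 0}
```

With blocking assignments the old value of a is lost before b reads it, while the non-blocking version swaps the two values.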

The following example illustrates the Blocking Assignment

initial 
begin
   a=1;
   b=0;
   y=1; // at 0 simulation time y gets value '1'.
   #5 y=a&b; // at 5 time units y gets updated to a&b, i.e. 1&0=0
   z=y; // at 5 time units z gets updated to y value, i.e. '0'
   #5 temp=a; // temp gets the value of a, i.e. '1' at 10 time units
end

Waveforms for the above example
BA

The following example illustrates the Non-Blocking Assignment

initial 
begin
   a=1;  // Blocking assignments are used here only to initialize a and b.
         // For synthesizable code, blocking and non-blocking assignments
         // should not be mixed in a single procedural block.
   b=0; 
   y<=1; // y is scheduled to get 1 in the NBA region of time 0.
   y<=#5 a&b; // y gets updated to a&b (0) at 5 time units; this delay does not block.
   z<=y; // z stays 'X': the RHS is sampled at time 0, when y is still 'X'
         // because its NBA update to 1 has not been applied yet.
   #5 temp<=a; // executes at 5 time units (not 10), because the delay in the
               // NBA above is an intra-assignment delay; temp gets '1'.
end

Wave forms for the above example
NBA

Example (a) Synthesis view of four bit shift reg.

module shiftreg_4bit(dout,clk,rst,din);
 input clk,rst,din;
 output reg dout;
 reg a,b,c;
 always @ (posedge clk or posedge rst)
  begin
   if (rst)
    begin
     dout<=0;
     a<=0;
     b<=0;
     c<=0;
    end
   else
    begin
     a<=din;  // Each non-blocking assignment reads the value from before the clock
     b<=a;    // edge and updates on the edge, implementing four registers as in a
     c<=b;    // conventional shift register.
     dout<=c;
    end
  end
endmodule

4bit_shftreg

But if the assignments in the else branch are changed into blocking assignments:

 begin
  a=din;
  b=a;
  c=b;    // Blocking assignments update immediately, so a, b and c collapse into wires,
  dout=c; // resulting in a single flip-flop.
 end

collapse_sr
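The collapse can also be checked with a behavioural sketch outside Verilog. The Python model below is only an illustration of the two update orders in the else branch, not of what a synthesis tool actually does; the function names are hypothetical. It counts how many clock edges a '1' on din needs to reach dout.

```python
def clock_edge_nba(regs, din):
    """a<=din; b<=a; c<=b; dout<=c;  every RHS uses the pre-edge value."""
    return {"a": din, "b": regs["a"], "c": regs["b"], "dout": regs["c"]}

def clock_edge_blocking(regs, din):
    """a=din; b=a; c=b; dout=c;  each statement sees the previous result."""
    a = din
    b = a
    c = b
    dout = c
    return {"a": a, "b": b, "c": c, "dout": dout}

def edges_until_output(step):
    """Hold din at 1 after reset and count clock edges until dout becomes 1."""
    regs = {"a": 0, "b": 0, "c": 0, "dout": 0}   # state after reset
    edges = 0
    while regs["dout"] == 0:
        regs = step(regs, 1)
        edges += 1
    return edges

print("non-blocking:", edges_until_output(clock_edge_nba), "edges")       # 4
print("blocking:    ", edges_until_output(clock_edge_blocking), "edges")  # 1
```

The non-blocking version behaves as four registers in a row, while the blocking version passes din to dout on the very first edge, which is the single flip-flop seen in the synthesis view.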

Example (b) Example for scheduling.

module clk_ba_nba();
reg clk;
initial
 clk<=1;
always @ (clk)
 // NBA: clk is updated after the block is already waiting on @(clk) again,
 // so the change is seen as an event and the clock keeps toggling.
 #5 clk<=~clk;
endmodule

clock_toggle

But if the assignments are changed to blocking assignments:

module clk_ba_nba();
reg clk;
initial
 clk=1;
always @ (clk)
 // BA: clk is updated in the active region, before the block waits on @(clk)
 // again, so the change is missed and the clock stops toggling.
 #5 clk=~clk;
endmodule

clk_not_tgle

P.S. Use blocking assignments whenever you want to implement a fast, event-independent circuit such as combinational logic, where the present value gets updated immediately. Use NBA whenever assignments have to be made together after an event, usually in sequential circuits.

NAND and NOR gate using CMOS Technology

To design any circuit in CMOS technology, we need parallel or series connections of nMOS and pMOS transistors, with the nMOS sources tied directly or indirectly to ground and the pMOS sources tied directly or indirectly to Vdd. The basic CMOS structure of any 2-input logic gate can be drawn as follows:

Basic CMOS Structure

2 Input NAND Gate

TRUTH TABLE

VA     VB     Vout
Low    Low    High
Low    High   High
High   Low    High
High   High   Low

CIRCUIT

CMOS NAND

The above drawn circuit is a 2-input CMOS NAND gate. Now let’s understand how this circuit will behave like a NAND gate. The circuit output should follow the same pattern as in the truth table for different input combinations.

Case-1 : VA – Low & VB – Low

As VA and VB are both low, both pMOS will be ON and both nMOS will be OFF. So the output Vout gets two paths to Vdd through the two ON pMOS and is charged to the Vdd level. The output has no path to GND, as both nMOS are OFF, so it cannot discharge. The output line holds at Vdd, i.e. High.

Case-2 : VA – Low & VB – High

VA – Low: pMOS1 – ON; nMOS1 – OFF

VB – High: pMOS2 – OFF; nMOS2 – ON

pMOS1 and pMOS2 are in parallel. Though pMOS2 is OFF, the output line still has a path to Vdd through pMOS1. nMOS1 and nMOS2 are in series. As nMOS1 is OFF, Vout cannot find a path to GND to discharge. As a result, Vout is held at the Vdd level, i.e. High.

Case-3 : VA – High & VB – Low

VA – High: pMOS1 – OFF; nMOS1 – ON

VB – Low: pMOS2 – ON; nMOS2 – OFF

The explanation is similar to Case-2. Vout will be High.

Case-4 : VA – High & VB – High

VA – High: pMOS1 – OFF; nMOS1 – ON

VB – High: pMOS2 – OFF; nMOS2 – ON

In this case, both pMOS are OFF, so Vout has no path to Vdd. As both nMOS are ON, the series-connected nMOS create a path from Vout to GND. Since the path to ground is established, Vout is discharged, i.e. Low.

In all the 4 cases we have observed that Vout is following the exact pattern as in the truth table for the corresponding input combination.
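The four cases can be replayed with a small switch-level sketch in Python. It treats every transistor as an ideal switch (pMOS ON for a Low gate, nMOS ON for a High gate), which is an idealization that ignores voltages and timing; the function name is hypothetical.

```python
def cmos_nand2(va, vb):
    """Return Vout (1 = High, 0 = Low) of a 2-input CMOS NAND."""
    pmos1_on, pmos2_on = (va == 0), (vb == 0)   # pMOS conducts on a Low gate
    nmos1_on, nmos2_on = (va == 1), (vb == 1)   # nMOS conducts on a High gate

    path_to_vdd = pmos1_on or pmos2_on          # pMOS in parallel (pull-up)
    path_to_gnd = nmos1_on and nmos2_on         # nMOS in series (pull-down)

    # In a static CMOS gate exactly one of the two networks conducts.
    assert path_to_vdd != path_to_gnd
    return 1 if path_to_vdd else 0

for va in (0, 1):
    for vb in (0, 1):
        print(va, vb, "->", cmos_nand2(va, vb))
```

The printed rows match Case-1 to Case-4: the output is Low only when both inputs are High.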

2 Input NOR Gate

TRUTH TABLE

VA     VB     Vout
Low    Low    High
Low    High   Low
High   Low    Low
High   High   Low

CIRCUIT

CMOS NOR

The above drawn circuit is a 2-input CMOS NOR gate. Now let’s understand how this circuit will behave like a NOR gate.

Case-1 : VA – Low & VB – Low

VA – Low: pMOS1 – ON; nMOS1 – OFF

VB – Low: pMOS2 – ON; nMOS2 – OFF

A path is established from Vdd to Vout through the series-connected ON pMOS transistors, and Vout gets charged to the Vdd level. There is no path from Vout to GND, so there is no discharging and Vout will be High.

Case-2 : VA – Low & VB – High

VA – Low: pMOS1 – ON; nMOS1 – OFF

VB – High: pMOS2 – OFF; nMOS2 – ON

In this case a path is established from Vout to GND through nMOS2, but there is no path to Vdd, because the series pMOS chain is broken by the OFF pMOS2. So Vout gets discharged and will be Low.

Case-3 : VA – High & VB – Low

VA – High: pMOS1 – OFF; nMOS1 – ON

VB – Low: pMOS2 – ON; nMOS2 – OFF

The explanation is similar to Case-2, with the path to GND going through nMOS1 this time. Vout will be Low.

Case-4 : VA – High & VB – High

VA – High: pMOS1 – OFF; nMOS1 – ON

VB – High: pMOS2 – OFF; nMOS2 – ON

There is no path to Vdd, and a path is established from Vout to GND through both ON nMOS. So Vout will be Low.

In all 4 cases we have observed that Vout follows the value expected from the 2-input NOR gate truth table.

For the design of ‘n’ input NAND or NOR gate:

Let’s say n = 3

In the case of a NAND gate, 3 pMOS will be connected in parallel and 3 nMOS will be connected in series; it is the other way around for a 3-input NOR gate. The same pattern continues for more than 3 inputs.
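The series/parallel rule for n inputs maps directly onto "all" and "any" in a switch-level sketch. The Python below is a minimal illustration with ideal switches; the function name and the "nand"/"nor" selector are hypothetical.

```python
from itertools import product

def cmos_gate(inputs, kind):
    """Vout (1 = High, 0 = Low) of an n-input CMOS NAND or NOR gate."""
    nmos_on = [v == 1 for v in inputs]   # one nMOS per input
    pmos_on = [v == 0 for v in inputs]   # one pMOS per input
    if kind == "nand":
        # pMOS in parallel, nMOS in series
        path_to_vdd, path_to_gnd = any(pmos_on), all(nmos_on)
    elif kind == "nor":
        # pMOS in series, nMOS in parallel
        path_to_vdd, path_to_gnd = all(pmos_on), any(nmos_on)
    else:
        raise ValueError("kind must be 'nand' or 'nor'")
    assert path_to_vdd != path_to_gnd    # exactly one network conducts
    return 1 if path_to_vdd else 0

# Print the truth tables of the 3-input gates.
for bits in product((0, 1), repeat=3):
    print(bits, "NAND:", cmos_gate(bits, "nand"), "NOR:", cmos_gate(bits, "nor"))
```

Changing repeat=3 to any other input count reuses the same two rules without modification.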

FAQs for Designing a Differential Amplifier

These are some of the commonly encountered questions regarding designing a Differential Amplifier –

Copyright - Gautam Vashisht


If you want to learn the basic design steps of a differential amplifier, then click here

Q-1) When the design technology node changes, say from 180nm to 90nm, which process-dependent parameters change in value?

A- When the technology node changes, the process-dependent parameters µnCox, µpCox, VDD, VTn and VTp change accordingly. For example, VDD = 1.8V for 180nm technology, but it is around 1.2V for 90nm. The rest of the parameters change in a similar way. For a successful design, care should be taken to find the correct values of these parameters for the particular technology node.

Q-2) What is ICMR? What is its use in designing?

A- ICMR stands for Input Common Mode Range. It is the range of common-mode input voltage applied to the differential amplifier for which all the MOSFETs stay in saturation, so that the circuit operates as a diff amp. Its value depends on the technology, as ICMR depends on VDD, which itself depends on the technology. It is used to determine the W/L ratios of transistors 3, 4, 5 and 6.

Q-3) What is Slew Rate & how is it used in designing?

A- It is the maximum rate of change of the output voltage, and it is used to determine the current ID.
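As a rough illustration of how a slew-rate spec leads to a current, the Python sketch below assumes the usual first-order picture in which the available current charges a load capacitor CL, so that I = CL x SR. The load capacitor, the numbers and the function name are assumptions for illustration and are not taken from the design above.

```python
def current_from_slew_rate(slew_rate_v_per_us, load_cap_pf):
    """Return the current in uA needed to slew load_cap_pf at the given rate."""
    slew_rate_v_per_s = slew_rate_v_per_us * 1e6
    load_cap_f = load_cap_pf * 1e-12
    current_a = load_cap_f * slew_rate_v_per_s   # I = C * dV/dt
    return current_a * 1e6

# Assumed spec: 10 V/us into an assumed 2 pF load.
print(f"{current_from_slew_rate(10, 2):.1f} uA")  # 20.0 uA
```

A faster slew-rate spec or a larger load raises the required current in direct proportion.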

Q-4) What is Leakage Power? How can it be determined for the circuit above?

A- Leakage power is primarily the result of unwanted sub-threshold current through the transistor when it is turned off. This sub-threshold-driven leakage power is strongly influenced by variations in the transistor threshold voltage VT. It can be determined by doing transient analysis of the circuit.

Q-5) How to determine the values of µnCox, µpCox, VTn, VTp for the technology used for designing?

A- The values of the above process-dependent parameters can be determined by keeping the nMOSFET & pMOSFET in saturation mode (each in a separate circuit) & then looking into their model files for the exact values.