# 79

### Midterm Exam, CS151B/EE116C, Spring 2019

#### General Guidelines:

- You can use a simple calculator, the course textbook, printouts of: pp. 318-340 from chapter 5 of the 3rd edition, and the class notes posted on the class web page. **NOTHING ELSE.**
- It is CHEATING to use any other notes (including your own notes), textbooks, or solutions to homework/example/old-exam problems.
- · Write all answers on the exam pages. Keep the pages stapled.
- Wherever there is space for an explanation, provide explanations and justifications for your answer and clearly state all your assumptions.
- If a problem requires computation, you must produce a result that is as close as possible to an actual number. Leaving calculations undone will **not** be accepted as a correct answer.
- Produce clear, well-organized answers! Points will be taken off for disorganized messy writing.
- Unless specified otherwise, when writing MIPS assembly code, you <u>must</u> use <u>only</u> the \$reg-number notation (e.g., \$22, \$5) to specify a register. Any other notation will be considered incorrect. Furthermore, you cannot use any pseudoinstructions.
- Your answers must fit in the space provided. You must not write on the margins or on the backs of pages. If lines are drawn, your answers must be written on the lines.
- (1) Write your name at the top of this page.
- (2) As discussed in class, the MIPS ISA defines the beq instruction to be a *delayed* branch. Consider the multicycle MIPS implementation shown in Figures 5.28 and 5.37 of Chap 5, **3rd Ed** (pages 323 and 338, slides 5.21 and 5.32).
  - A) For the beq instruction, does the multicycle MIPS implementation implement a *delayed* branch? Your answer must be **yes** or **no**.

Answer:

B) Is the implementation of the beq instruction in the multicycle MIPS implementation correct? Your answer must be **yes** or **no**.

Answer: 100

C) Explain your answers to both Part A and Part B.



The multicyclie MIPS implementation

first computes it the branch is needed and ensues the

PC is the conect value before fetching the new tinstruction,

so the branch isn't belayed. Since me multicycle

MIPS, be a has behavior not specified by

the MIPS ISA, it is technically incorrect.

Consider the single cycle MIPS implementation shown in Figure 4.24, page 271, in the textbook. The ALU is implemented as shown in Figure B.5.12, page B-36, in the textbook. In this implementation, the Bnegate input is connected to the Binvert inputs (Figure B.5.10, page B-33) of all the ALU bit slices.

Due to an implementation mistake, the Binvert inputs of all the bit slices are connected to 0 and are not connected to Bnegate. **Except for this specific change**, the circuit is unchanged.

A) Specify in full detail what will be the consequences of this fault when the processor executes programs — how will it change the behavior of the processor as observed by a user/programmer who does not know and does not care how the processor is implemented internally? Be sure to clearly identify each and every consequence of this fault with as much detail and specificity as possible.

If the uper performs subject, \$15, \$15, \$15, \$16, \$10 mill equal \$15+\$16 th rather ham \$15-\$16. If the, user performs \$16 \$16, \$15, \$16, \$16 is set to 1 if \$165-\$165-\$1 and 0 other wise. If me user persons bey \$15,516, taget, the instruction will branch if \$155+\$16 to 1.

B) Explain your answer to Part A based on the effect of the implementation mistake on the operation of **the implementation**.

The error win cause the ALV to not in verting to if a subtraction is needed. So, when it compares A-B, it really compares to that I Compare carry In MM SHM Sell, 1814 outputs of if ALV calculation 60 and will use subtraction on inputs from Irt. Sea instruction subtracts and Irt. Sea instruction subtracts and Irt and branches it result is a.

Do not write in this space or below

- (4) You are evaluating the purchase of a compiler for a processor that you have designed and implemented. Compiler C<sub>1</sub> is sold by Seller1 and compiler C<sub>2</sub> is sold by Seller2. Seller1 argues to you that you should purchase C<sub>1</sub> since the code produced by C<sub>1</sub> has a higher IPC. A higher IPC means a higher rate of instruction execution and that demonstrates the superiority of C<sub>1</sub>. Seller2 concedes that, for the same program, the code produced by C<sub>1</sub> has a higher IPC than the code produced by C<sub>2</sub>. However Seller2 claims that, despite this, the execution time of the code produced by C<sub>2</sub> is actually shorter. Hence, argues Seller2, you should purchase C<sub>2</sub>.
  - A) Is it possible that Seller2 is correct? Your answer must be **yes** or **no**.

Answer: Yes

B) If your answer to Part A is **yes**, explain and provide a concrete example. If your answer to Part A is **no**, explain.

Suppose IPC of complex Gre run on the same processor, so check

15 IPC, = 3. Both complex Gre run on the same processor, so check

Vica the is the same (Assume Itis 4/6 Hz), If the complex C,

generates 50 Msmutons for a program, but complex a generates

30, the Execution time of C/2 150 Hz 125 × 10-95, and executions

time of (2= 30/3 (4 × 10 4) = 2.5 × 15 95. Cz 15 faster, 2 14 generates less instructions

than CI

(5) Consider the multicycle MIPS implementation shown in Figures 5.28 and 5.37 of Chap 5, 3rd Ed (pages 323 and 338, slides 5.21 and 5.32). For this implementation, the MIPS jal instruction can be implemented so that it executes in three cycles. Consider replacing the jal instruction with an instruction, called njal, that pushes the return address on the stack instead of saving it in \$31. The processor that is almost identical to MIPS but with njal instead of jal is called NMIPS. With the multicycle implementation of NMIPS, the njal is executed in four cycles.

It turns out that for some programs the total performance overhead for procedure calls is lower with NMIPS than with MIPS. Explain how that is possible. Your answer must be as specific and **quantitative** as possible.

when we have mare procedure calls, we will need to pertorm a fall operations to call the procedures and stone from all ressesses & Don't need to preserve leaf procedure). MIPS needs: Graditional stone instruction to stone sig (4 cycles) but or NMIPS does not. So, MIPS needs 7 ca-1)+3 cycles to call procedures and stone \$19 but NMIPS needs 4n cycles

| (16) | (6) | Consider the following MIPS program segment: |
|------|-----|----------------------------------------------|
|------|-----|----------------------------------------------|

This program is executed in the MIPS implementation shown in Figure 4.65, page 326. Assume that in this figure all the modules shown are properly connected (as discussed in class, some connections are not shown to keep the figure simple).

A) The execution time of the program segment is measured starting from the cycle when the first program is fetched and ending with the cycle in which register \$18 is written. What is the execution time, in clock cycles, of the program segment, as written (unmodified)?

Answer: 11-12 cycles

B) Explain your answer to Part A. hot taken S+ (6-1)=100

Alsuming Stall only owns of branch is taken, me need ill yelles it branch

14 Instructions executed, 13+411) and (12) 4 not taken 66 instructions Break

C) Will the program segment, as written, execute correctly, i.e., produce the correct result in register \$18? Your answer must be **yes** or **no**.

Answer: ^ O

D) Explain, in detail, your answer to Part 🔨 📞

the Ipstage to determine it a branch owns. As a result, the comert value of register 19 connotible forwarded from the premons my mondon, so be bander is mismeet.

E) If your answer to Part C is no, mark on the code above the minimum necessary modifications so that the code will execute correctly. You can only use MIPS assembly instructions.

## Do not write in this space or below

(7) Consider the multicycle implementation shown in Figure 5.28, page 323, and Figure 5.37, page 338, of Chap 5, 3rd Ed (slides 5.21 and 5.32).

Your task is to modify this implementation so that it supports a new instruction, 1wd (load word direct). The instruction is a single 32-bit word in the following format:

| 101100 | Rd | addr |
|--------|----|------|
|        |    | 44.5 |

The width of the addr field is 21 bits. The lwd instruction reads a 32-bit word from memory into a register in the register file. The register number of the destination register is specified in the Rd field shown above. The address from which the word is read is derived from the addr field by multiplying it by 4 and sign-extending the result.

Your modifications must meet the following requirements:

- 1) All the existing functionality must be maintained.
- 2) You must not increase the execution time of any of the other instructions.
- 3) You cannot speed up any of the building blocks used to construct the implementation in Figure 5.28. This also means that you cannot add functionally-identical modules and specify that they are faster.

Within the above constraints, the first priority is to minimize the execution time of the lwd instruction. The second priority is to minimize the implementation complexity. This includes minimizing the number of new states (if any).

If you need to add any new building blocks to the datapath or the control, or replace an existing module with a new module that has some additional capabilities, it is **your responsibility** to make sure that it is completely clear **exactly** how each wire is connected to the new building block. If a new building block is not a **100% precise** copy of an already existing building block, you **must** draw the implementation of the new building block. Make sure that your drawings are not messy.

A) Explain the basic idea of your modifications in 2-4 clear sentences.

we add a shift-left-2 and sign extension modules to extract
altress from instruction, and use a max to pass it to them emory black
to it can be read the update the regost muse so instruction bits cas-21] can
be set as the destingtion register. Finally, we read memory in state 1 at instruction acoop

B) Specify the number of cycles it takes to execute the lwd instruction in your implementation:



C) Show the necessary modifications to the datapath. For your convenience, a copy of the datapath is on page 7 and you <u>must</u> use it. If you need to show the implementation of any new building blocks, show them in the available space on page 6.



D) Are any new control signals required? If so, list them with an explanation here and identify them on the datapath diagram.

Regist is applianted to 2 bits. Now it Register is 2, resister

TS will be yet as destrumon register. Respet has same be haven

it out I. we also added control sisnal Dir, which passes instruction

[20-0] as address of that we may it assured, and easses or ignally computed address

otherwise.

E) On page 8, there is the original state diagram of the control unit. Modify it to reflect your new implementation.

The space below can only be used for showing the implementation of any additional building blocks you need (part of your answer to Part C).

Non-extend Sign-extend

