ECE366 Advanced MIPS Simulator Python Program Project 4 Assignment
A MIPS project with description below in the PDF. I will upload the previous 3 projects later to help get this one done.

UIC – ECE 366: Computer Organization – Spring 2019
Project 4: Advanced MIPS Simulator
For this project, you will form your own team (of 1 – 3 members) to write a Python program that takes as
input a text file i_mem.txt (containing a program of MIPS machine code in hex) and outputs some important
information about running this MIPS program:
1. General Results: $8 – $23, M[0x2000 – 0x2050], Dynamic Instruction Count
2. (50%) Processor details of various implementations:
a. (20%) MC: a multi-cycle MIPS CPU:
i. (diagnose mode) cycle-by-cycle information: which assembly instruction is being
executed, at which stage, PC content, and any other information of your design;
ii. total cycle count, percentage of instructions with 3, 4, and 5 cycles.
b. (15%) Slow-Pipe: a cheap pipelined MIPS CPU, assuming:
• Control hazard: branches are resolved using the ALU, and the PC is updated at the 4th cycle
(M). Furthermore, no flushing mechanism is employed, therefore:
o 3 dummy instructions (NOP) need to be inserted after each conditional
branch to prevent the wrong instructions from being loaded.
• Data hazard: no hazard detection or forwarding paths are implemented, but the register
file supports the “write first, read next” function within a cycle, so data hazards have
to be resolved by inserting dummy instructions (NOP):
o 2 NOPs are needed to resolve a data dependency between instructions i and i+1;
o 1 NOP is needed to resolve a data dependency between instructions i and i+2.
Your program should output:
i. (in diagnose mode) cycle-by-cycle information: which instructions are currently in the
5 pipeline stages (F, D, E, M, W); hazard information (which and how many NOPs
are used to solve which hazard, etc);
ii. total cycle count, breakdown of delay due to control or data hazards.
c. (15%) Fast-Pipe: a comprehensive pipelined MIPS CPU, assuming:
• Control hazard: branches are resolved at the 2nd (ID) stage and the PC is updated with
the correct target at this stage. Furthermore, flushing is implemented, so a Not
Taken branch causes no delay, while a Taken branch results in 1 cycle of
delay.
• Data hazard: full hazard detection, stalling control, and forwarding paths are
implemented, so the only cases that incur delay are the lw-use
and compute-branch scenarios.
Your program should output:
i. (in diagnose mode) cycle-by-cycle information: which instructions are currently in the
5 pipeline stages (F, D, E, M, W); hazard information (which forwarding path is used,
whether stall / flush is used to solve which hazard, etc)
ii. total cycle count, breakdown of delay due to control or data hazards, comparison /
improvement data over the slow pipelined CPU and the multi-cycle CPU versions.
3. (40%) Cache behavior of memory access (lw & sw) instructions assuming LRU (least recently
used) policy for replacement, write-allocate for lw:
• (in diagnose mode) for each lw & sw instruction, provide:
o cache access info: memory address (in binary), which set / block of the cache is looked
at, information of every way (valid bit, tag bits), hit or miss,
o cache update info for hit and miss
• overall hit rate for the entire run of the program (hit # / (hit # + miss #))
To get full 40% points, your program should support any cache configurations by allowing the user
to input block size b (# of bytes), # of ways N, and # of sets S. Use the following three configurations
in your report to showcase:
a. a direct-mapped cache, block size of 2 words, a total of 8 blocks (b=8; N=1; S=8)
b. a fully associative cache, block size of 4 words, a total of 4 blocks (b=16; N=4; S=1)
c. a 2-way set-associative cache, block size of 8 bytes, 4 sets (b=8; N=2; S=4)
If you cannot achieve the configurable implementation, showcase the above individually, for partial
credit (10% for each).
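A configurable cache model along these lines might look like the following minimal sketch. It assumes byte addresses, block size b, N ways, and S sets as described above, and it assumes that a miss on either lw or sw loads the block into the cache. The class name `Cache` and its method names are hypothetical, chosen only for illustration.

```python
from collections import OrderedDict


class Cache:
    """Set-associative cache model with LRU replacement (illustrative sketch)."""

    def __init__(self, block_bytes, ways, sets):
        self.b, self.n, self.s = block_bytes, ways, sets
        # One OrderedDict per set: keys are tags, ordered from least to
        # most recently used. A tag being present means the way is valid.
        self.sets = [OrderedDict() for _ in range(sets)]
        self.hits = 0
        self.misses = 0

    def split(self, addr):
        """Return (tag, set_index, block_offset) for a byte address."""
        offset = addr % self.b
        block_number = addr // self.b
        return block_number // self.s, block_number % self.s, offset

    def access(self, addr):
        """Look up addr; return True on a hit, False on a miss.

        Assumption: every miss (lw or sw) allocates the block.
        """
        tag, index, _ = self.split(addr)
        ways = self.sets[index]
        if tag in ways:
            ways.move_to_end(tag)        # mark as most recently used
            self.hits += 1
            return True
        if len(ways) == self.n:
            ways.popitem(last=False)     # evict the least recently used tag
        ways[tag] = True
        self.misses += 1
        return False

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

For example, feeding the addresses 0x2000, 0x2004, 0x2008, 0x2040, 0x2000 to `Cache(8, 1, 8)` evicts the first block when 0x2040 maps to the same set, while `Cache(8, 2, 4)` keeps both blocks in the two ways of set 0.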
Assume the following limited support / subset of MIPS ISA for your program:
• Instructions: addu, sub, slt, sltu, xor, sll, addi, ori, beq, bne, lw, sw
• Registers: $0 (always = 0), $8 – $23
• Data memory address range: [0x2000, 0x2100)
• Instruction memory address range: [0x0000, 0x1000)
• All the registers / data memory content are initialized to be 0
• The program will end at a dead loop “label: beq $0, $0, label”, the machine code of
which is 0x1000FFFF
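As a sanity check on the dead-loop encoding, the standard MIPS field layout can be sketched as below. The helper names `decode` and `is_dead_loop` are hypothetical; only the bit positions come from the MIPS instruction formats.

```python
def decode(word):
    """Split a 32-bit MIPS machine word into its fields (illustrative helper)."""
    imm_u = word & 0xFFFF                       # zero-extended (used by ori)
    imm_s = imm_u - 0x10000 if imm_u & 0x8000 else imm_u   # sign-extended
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
        "imm_u": imm_u,
        "imm_s": imm_s,
    }


def is_dead_loop(word):
    """True for the terminating 'label: beq $0, $0, label' instruction."""
    return word == 0x1000FFFF
```

Decoding 0x1000FFFF gives opcode 4 (beq), rs = rt = 0, and a signed offset of -1, so the branch target is PC + 4 + (-1 × 4) = PC, which is why the instruction loops on itself.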
Your Python code should be able to read the file containing a valid MIPS program in hex, simulate its
execution, and correctly output the relevant information.
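Reading the input could be sketched as follows. The assignment does not spell out the exact layout of i_mem.txt, so this assumes one hex word per line, an optional 0x prefix, and '#' starting a comment; the function names `load_program` and `load_file` are hypothetical.

```python
def load_program(lines):
    """Parse an iterable of text lines into a list of 32-bit instruction words."""
    words = []
    for line in lines:
        text = line.split("#", 1)[0].strip()   # assumption: '#' starts a comment
        if not text:
            continue                           # skip blank lines
        words.append(int(text, 16) & 0xFFFFFFFF)
    return words


def load_file(path="i_mem.txt"):
    """Read a program from disk; instruction k lives at address 4 * k."""
    with open(path) as handle:
        return load_program(handle)
```

Keeping the parser separate from the file access makes it easy to feed small test programs to the simulator as plain lists of strings.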
You should be able to use MARS to help partially verify your code. For example, the behavior of the
following code can be checked with MARS. Here, you can use Tools -> Data Cache Simulator in MARS to
check the cache access behavior of a program.
Submission deadline and late penalty:
Submission by the end of the day of:   Thu (April 25th)   Fri   Sat   Sun   Mon
Late penalty on your total score:      0                  5%    10%   15%   20%
Submission components:
Individually, upload the following 10 files (separately, not zipped) on Bb:
1. p4_groupname_report.pdf: a self-contained PDF report
2. p4_groupname_sim.py: Python simulator
3. p4_groupname_out_p_X.txt: Simulator output: processor details of prog X
4. p4_groupname_out_c_Y.txt: Simulator output: cache behavior of prog Y
PDF Report components:
Part A) (10 pts) Intro & reflections
1. (2pts) Introduction of your simulator:
• Functionality – what does your simulator achieve (if anything, and what exactly) for each of the following components?
o MC
o Slow-Pipe
o Fast-Pipe
o Cache
o (extra credit) sorting algorithm comparison
• Interface – what are the inputs / outputs? How does it interact with the user?
2. (5pts) Project experience reflection:
• List your group members, and explain your general working style for this project.
• Are there any changes in how you work now compared with the previous projects?
• Tell us a few things that you have learned from your team members (including what to do and what
NOT to do) throughout projects 2, 3, and 4.
3. (3pts) Tool reflection:
• What are some of the features that you find useful in Python?
• What would you advise other students about Python programming for this course in the future?
Part B) (50pts) General result + Processor details for Program X
1. (10pts) Main findings.
• In general, what’s your conclusion about the performance comparison (as far as program X is
concerned) among the multi-cycle CPU, the simple pipelined CPU (without forwarding), and the aggressive
pipeline (with all forwarding and hardware solutions for hazards)? Is it worth all the design
effort and hardware overhead to implement a pipelined CPU with all the forwarding hardware?
• Support your conclusion with data in a table or chart: use your simulator to collect abundant data by
running program X with different seeds (use the slt version). Your table and chart should be well
annotated and easy to understand.
2. (40pts) Simulator showcase
Demonstrate the interface and results of your simulator running program X (see appendix).
You will be graded on program correctness (verify your simulator against MARS with as many small programs
as possible) and interface design (make the output unique, easy to understand, and nice!).
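When checking Slow-Pipe totals on small programs, the NOP rules stated earlier (2 NOPs for an i / i+1 dependency, 1 NOP for i / i+2, 3 NOPs after each conditional branch) can be applied to a dynamic instruction trace. The following is only a sketch under assumptions: the trace format and the name `slow_pipe_cycles` are hypothetical, and the final dead-loop beq is assumed to add no NOPs, which is what the sample cases in the appendix suggest.

```python
def slow_pipe_cycles(trace):
    """Count Slow-Pipe cycles for a dynamic trace.

    trace: list of (dest, sources, is_cond_branch), one entry per executed
    instruction. dest is a register number or None; sources is a tuple of
    register numbers. Returns (total_cycles, control_delay, data_delay).
    """
    last_write = {}      # register -> issue slot of its most recent writer
    slot = 0             # next free issue slot, counting inserted NOPs
    control = data = 0
    for i, (dest, sources, is_branch) in enumerate(trace):
        nops = 0
        for reg in sources:
            if reg != 0 and reg in last_write:
                # A reader must issue at least 3 slots after the writer.
                nops = max(nops, 3 - (slot - last_write[reg]))
        slot += nops
        data += nops
        if dest:                          # writes to $0 are ignored
            last_write[dest] = slot
        slot += 1
        is_last = i == len(trace) - 1     # assumption: final dead loop adds no NOPs
        if is_branch and not is_last:
            slot += 3
            control += 3
    # DIC instructions enter the pipeline, plus 4 cycles to drain the last one.
    return len(trace) + 4 + control + data, control, data
```

Tracking issue slots rather than instruction indices matters because NOPs inserted for one hazard also push later readers further from earlier writers, so a dependency that looks like i / i+2 in the source may need no extra NOP.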
Part C) (40pts) General result + Cache behavior for Program Y
1. (8pts) Main findings
• In general, what’s your conclusion about the cache performance (in terms of overall hit rate of
program Y) of the various cache configurations? What are some of the important tradeoffs of cache
design to minimize miss rate?
• Support your conclusion by data in a table or chart – use your simulator to collect abundant data by
running program Y with various cache configurations. Your table and chart should be well
annotated and easy to understand.
2. (32pts) Simulator showcase
Demonstrate the interface and results of your simulator running program Y (see appendix) on the three
given cache settings:
a. a direct-mapped cache, block size of 2 words, a total of 8 blocks (b=8; N=1; S=8)
b. a fully associative cache, block size of 4 words, a total of 4 blocks (b=16; N=4; S=1)
c. a 2-way set-associative cache, block size of 8 bytes, 4 sets (b=8; N=2; S=4)
If your simulator supports configurable cache setting via user input, showcase it with another setting of
your choice:
d. an N-way set-associative cache, block size of b bytes, S sets
Again, you will be graded by program correctness and interface design here.
Part D) (10% extra credit): Sorting algorithm comparison
Now that your simulator can run pretty much any MIPS program and produce data on
processor and cache performance, it can be used for many interesting experiments. Here, your
job is to compare two sorting algorithms (pick any two from here: https://www.geeksforgeeks.org/sortingalgorithms/ ) and determine which one is more “cache friendly”, i.e., has the lower miss rate.
Specifications:
• Use the given code (Program Z) to establish an array of pseudo random patterns in memory first, and then
provide your own code afterwards to sort the numbers in memory.
• The sorted array should be in increasing order. In other words, the lowest address has the smallest /
most negative number.
• We assume the numbers are signed, so use slt (not sltu) to compare.
Results:
• Which sorting algorithms did you choose? What are the main findings?
• Use a table or chart to support your main findings. Clearly provide all the parameters you have used to
collect the data (how many rounds have you run, on what seeds, and what cache configurations did you
use, etc).
• Use screenshots to demonstrate the interface and results of your simulator running program Z1 and Z2
(the two sorting algorithms).
• Include the assembly code for your program Z1 and Z2 at the end.

# prog Z, array generation
# seed = 24, size = 0x40
    ori $8, $0, 24
    addi $9, $0, 0x40
sw_loop:
    sw $8, 0x2000($9)
    addi $9, $9, -4
    beq $9, $0, sw_done
    sll $10, $8, 24
    addu $10, $10, $8
    sub $8, $0, $8
    xor $8, $10, $8
    beq $0, $0, sw_loop
sw_done:
    addi $10, $0, 0x40
# your sorting code below
# provide two versions:
# Z1 is one sorting algorithm
# Z2 is the other one
end: beq $0, $0, end
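To know what a correct sort should produce, the array-generation loop of Program Z can be modeled directly in Python. This is a sketch rather than part of the assignment: the names `generate_array` and `to_signed` are hypothetical, and arithmetic is assumed to wrap around at 32 bits (the overflow trap of `sub` is ignored).

```python
MASK = 0xFFFFFFFF


def generate_array(seed=24, size=0x40, shift=24, base=0x2000):
    """Model of the Program Z store loop; returns {address: 32-bit word}."""
    memory = {}
    r8, r9 = seed & MASK, size
    while True:
        memory[base + r9] = r8           # sw   $8, 0x2000($9)
        r9 -= 4                          # addi $9, $9, -4
        if r9 == 0:                      # beq  $9, $0, sw_done
            break
        r10 = (r8 << shift) & MASK       # sll  $10, $8, 24
        r10 = (r10 + r8) & MASK          # addu $10, $10, $8
        r8 = (0 - r8) & MASK             # sub  $8, $0, $8
        r8 = r10 ^ r8                    # xor  $8, $10, $8
    return memory


def to_signed(word):
    """Interpret a 32-bit word as a signed two's-complement integer."""
    return word - (1 << 32) if word & 0x80000000 else word
```

Sorting `to_signed(w)` over the generated values with Python's built-in `sorted` gives a reference order against which the memory contents left by Z1 and Z2 can be compared. Note that the loop exits before storing at offset 0, so the array occupies 0x2004 through 0x2040.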
Appendix: Testcase Programs
• Program X: sw pseudo random patterns into array, lw count positive numbers

#Prog X, seed = 24, version slt
    ori $8, $0, 24
    addi $9, $0, 0x40
sw_loop:
    sw $8, 0x2000($9)
    addi $9, $9, -4
    beq $9, $0, sw_done
    sll $10, $8, 24
    addu $10, $10, $8
    sub $8, $0, $8
    xor $8, $10, $8
    beq $0, $0, sw_loop
sw_done:
    addi $10, $0, 0x40
    addu $12, $0, $0
lw_loop:
    lw $8, 0x2000($9)
    slt $11, $8, $0
    bne $11, $0, skip
    addi $12, $12, 1
skip:
    addi $9, $9, 4
    bne $9, $10, lw_loop
    sw $12, 0x2000($0)
end: beq $0, $0, end

#Prog X, seed = 7, version sltu
    ori $8, $0, 7
    addi $9, $0, 0x40
sw_loop:
    sw $8, 0x2000($9)
    addi $9, $9, -4
    beq $9, $0, sw_done
    sll $10, $8, 7
    addu $10, $10, $8
    sub $8, $0, $8
    xor $8, $10, $8
    beq $0, $0, sw_loop
sw_done:
    addi $10, $0, 0x40
    addu $12, $0, $0
lw_loop:
    lw $8, 0x2000($9)
    sltu $11, $8, $0
    bne $11, $0, skip
    addi $12, $12, 1
skip:
    addi $9, $9, 4
    bne $9, $10, lw_loop
    sw $12, 0x2000($0)
end: beq $0, $0, end
• Program Y: in an array of numbers, find min of neighborhood and store the min to an array after

#Prog Y, parameter = (0x2070, 5)
    ori $8, $0, 2
    addi $9, $0, 0x60
sw_loop:
    sw $8, 0x2000($9)
    addi $9, $9, -4
    beq $9, $0, sw_done
    addu $8, $8, $8
    sub $8, $0, $8
    addi $8, $8, -3
    beq $0, $0, sw_loop
sw_done:
    addi $8, $0, 0x2070
    addi $10, $0, 0x2060
    addi $9, $0, 0x2000
outer_loop:
    addi $14, $0, 5
    lw $11, ($9)
inner_loop:
    addi $9, $9, 4
    lw $12, ($9)
    slt $13, $12, $11
    beq $13, $0, skip
    add $11, $0, $12
skip:
    addi $14, $14, -1
    bne $14, $0, inner_loop
    sw $11, ($8)
    addi $8, $8, 4
    slt $13, $9, $10
    bne $13, $0, outer_loop
end: beq $0, $0, end

#Prog Y, parameter = (0x2078, 3)
    ori $8, $0, 2
    addi $9, $0, 0x60
sw_loop:
    sw $8, 0x2000($9)
    addi $9, $9, -4
    beq $9, $0, sw_done
    addu $8, $8, $8
    sub $8, $0, $8
    addi $8, $8, -3
    beq $0, $0, sw_loop
sw_done:
    addi $8, $0, 0x2078
    addi $10, $0, 0x2060
    addi $9, $0, 0x2000
outer_loop:
    addi $14, $0, 3
    lw $11, ($9)
inner_loop:
    addi $9, $9, 4
    lw $12, ($9)
    slt $13, $12, $11
    beq $13, $0, skip
    add $11, $0, $12
skip:
    addi $14, $14, -1
    bne $14, $0, inner_loop
    sw $11, ($8)
    addi $8, $8, 4
    slt $13, $9, $10
    bne $13, $0, outer_loop
end: beq $0, $0, end
Small testcases for verification and debugging purposes:

# case 0: DIC = 4
    addi $8, $0, 5
    xor $9, $0, $0
    lw $10, 0x2004($0)
    sw $8, 0x2000($0)
• Multi-Cycle CPU run:
    cycle # = 4+4+5+4 = 17
• Slow-Pipe:
    cycle # = 4 + 4 = 8
    # instr entering pipeline: 4
    finishing up the last instruction: 4
    control hazard delay: 0
    data hazard delay: 0
• Fast-Pipe:
    cycle # = 4 + 4 = 8
    # instr entering pipeline: 4
    finishing up the last instruction: 4
    control hazard delay: 0
    data hazard delay: 0

# case 1: DIC = 5
    ori $9, $0, 0x1000
    addu $8, $9, $9
    sub $9, $0, $8
    slt $10, $8, $9
    sltu $11, $8, $9
• Multi-Cycle CPU run:
    cycle # = 4+4+4+4+4 = 20
• Slow-Pipe:
    cycle # = 5 + 4 + 6 = 15
    # instr entering pipeline: 5
    finishing up the last instruction: 4
    control hazard delay: 0
    data hazard delay: 2+2+2 = 6
• Fast-Pipe:
    cycle # = 5 + 4 = 9
    # instr entering pipeline: 5
    finishing up the last instruction: 4
    control hazard delay: 0
    data hazard delay: 0

# case 2: DIC = 6
    addi $8, $0, -2
loop:
    addi $8, $8, 1
    bne $0, $8, loop
end:
    beq $0, $0, end
• Multi-Cycle CPU run:
    cycle # = 4+4+3+4+3+3 = 21
• Slow-Pipe:
    cycle # = 6 + 4 + 6 + 6 = 22
    # instr entering pipeline: 6
    finishing up the last instruction: 4
    control hazard delay: 3×2 (bne) = 6
    data hazard delay: 2 (addi-addi) + 2 (addi-bne) + 2 (addi-bne) = 6
• Fast-Pipe:
    cycle # = 6 + 4 + 1 + 1 = 12
    # instr entering pipeline: 6
    finishing up the last instruction: 4
    control hazard delay: 1 (bne Taken)
    data hazard delay: 1 (comp-branch: addi-bne)

# case 3: DIC = 22
    addi $11, $0, -1
    ori $8, $0, 12
    xor $10, $10, $10
loop:
    addi $8, $8, -4
    lw $9, 0x2000($8)
    xor $9, $9, $11
    sll $9, $9, 2
    sw $9, 0x2010($8)
    bne $0, $8, loop
end:
    beq $0, $0, end
• Multi-Cycle CPU run:
    cycle # = 15×4 + 3×5 + 4×3 = 87
• Slow-Pipe:
    cycle # = 22+4+9+25 = 60
    # instr entering pipeline: 22
    finishing up the last instruction: 4
    control hazard delay: 3×3 (bne) = 9
    data hazard delay: 25
        1 (xor-addi)
        + 2×3 (addi-lw)
        + 2×3 (lw-xor)
        + 2×3 (xor-sll)
        + 2×3 (sll-sw)
• Fast-Pipe:
    cycle # = 22+4+2+3 = 31
    # instr entering pipeline: 22
    finishing up the last instruction: 4
    control hazard delay: 2 (bne Taken)
    data hazard delay: 1×3 (lw-xor use)
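The multi-cycle totals in the cases above are consistent with lw taking 5 cycles, beq and bne taking 3, and every other supported instruction (including sw) taking 4. That mapping is inferred from the sample numbers rather than stated in the assignment, and the names `CYCLES` and `multi_cycle_stats` are hypothetical. A minimal sketch:

```python
# Inferred from the sample cases: lw = 5, branches = 3, everything else = 4.
CYCLES = {"lw": 5, "beq": 3, "bne": 3}


def multi_cycle_stats(mnemonics):
    """Return (total_cycles, {cycles: percentage}) for a dynamic trace.

    mnemonics: list of instruction names in execution order, including the
    final dead-loop beq (counted once).
    """
    counts = {3: 0, 4: 0, 5: 0}
    for name in mnemonics:
        counts[CYCLES.get(name, 4)] += 1
    total = sum(cycles * n for cycles, n in counts.items())
    dic = len(mnemonics)
    share = {c: 100.0 * n / dic for c, n in counts.items()} if dic else {}
    return total, share
```

For case 0 this gives 17 cycles with 75% of instructions taking 4 cycles and 25% taking 5; for case 3 it gives 87 cycles.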
