Assignment 3: Simulated memory management

Files

Lab materials

The first task is a simulation, and you can use any computer and any programming environment, but I expect C using GCC on Linux will be easiest. For example, file formats can differ between systems, such as text files having "\r\n" (CRLF, or character codes 13 and 10) to mark end of line on Windows, and just "\n" (LF, or character code 10) on Linux.

The optional fourth task requires a Linux PC (or virtual machine).

Background: Virtual memory management and address translation

In a real computer, virtual addresses are translated to physical memory addresses using special hardware. We will write a program that simulates this address translation.

This task is similar, but not identical, to the programming project "Designing a Virtual Memory Manager" from the chapter "Virtual Memory" in the book by Silberschatz et al. In the 9th Edition it is in chapter 9, and in the 10th Edition it is in chapter 10.

We will simulate a computer with a byte-addressed virtual address space of size 220 = 1048576 bytes (1 megabyte, or "mebibyte" as some want to call it), and with a physical memory of size 216 = 65536 bytes. There is just one (simulated) process, so we will only have to deal with a single virtual address space.

You will write a program that reads virtual addresses from a text file, translates each virtual address to a physical address using a page table, and prints the physical address and the value of the byte stored at that physical address.

Some more information about the simulated computer:

Your program will need to translate virtual addresses to physical, copy pages from the backing store to the frames when needed, and update the page table. The virtual address space can be considered read-only, so you don't need to implement writing to the virtual address space, or copying data to the backing store. Also, the test files will not use more pages than fit in the simulated physical memory, so you don't need to implement page replacement. (Unless you wish to do an optional task.)

Address Translation

The book by Silberschatz et al. describes how virtual addresses are translated to physical addresses:
  1. First, the page number is extracted from the virtual address. It is just the high-order bits.
  2. If we are using a TLB (which we will not do at first in this lab), we look for the page number in the TLB. If the page number is found in the TLB, we get the frame number from the TLB.
  3. In case of a TLB miss, or if we don't use a TLB, we look for the page number in the page table. If the page number is found in the page table, we get the frame number from the page table.
  4. If the page number is not found in the page table, it is not in (simulated) physical memory. This is called a page fault. If there is a page fault, a free frame must be found, the data in the virtual page must be read from the backing store to that frame, and the page table must be updated.
  5. Finally, we create the physical address that corresponds to the virtual address, by combining the physical frame number with the offset part (that is, the low-order bits) of the virtual address.
As an example of a virtual address, here is the address 747875 (with normal decimal digits), or 10110110100101100011 with binary digits:

The virtual address 10110110100101100011

As we can see, the page number is 1011011010 in binary, which is 730 in decimal. The offset within the page is 0101100011 in binary, which is 355 in decimal.

Let's say that the data of the virtual page number 730 is stored in physical memory in frame number 47, which is 101111 in binary. The virtual address above would then get translated to the physical address 1011110101100011 with binary digits, or 48483 with normal decimal digits:

The physical address 1011110101100011

The page number 730 was translated to frame number 47 (101111 in binary). The bits for the offset were just copied, and are the same in the virtual and physical addresses.

In your program, you will probably use normal integers (the data type int in C), which are probably 32 bits on your computer, to store both virtual and physical addresses. Just ignore the extra high-order bits:

Addresses stored as 32-bit integers

(Note that this is similar to how modern 64-bit computers work. Addresses are stored with 64 bits, but only the lower 48 bits are actually used.)

Page Faults

Your program will use demand paging, as described in the book. For backing store (or "page file") we will use the provided data file called "backing-store.bin". It contains 1048576 bytes, or 1024 pages with 1024 bytes each. When a (simulated) page fault occurs, you will handle it by finding a free frame in the physical memory, reading the correct page from the file, and storing it in the free frame. You must also update the page table.

You need to treat the data file as a random-access file, so your program can seek to the correct positions in the file, and then read from it. Use standard C library functions such as fopen, fread, fseek, and fclose.

With the test data for the mandatory tasks in this lab, there will always be a free frame, so you will not need to throw out an existing page.

Test Files

The provided text file called "virtual-addresses.txt" contains 1000 integers between 0 and 1048575, representing 20-bit virtual addresses. Let your program read each virtual address from the file, translate it to its corresponding physical address, and print both the physical address and the byte value of the byte at that address.

Also, at the end, your program should report the page-fault rate: The percentage of address references that resulted in page faults.

The file "correct-output.txt" contains the expected output. (Unless I made a mistake in my own implementation!)

(The byte values must be the same, otherwise your program is wrong, but the physical addresses can be different, because they depend on in which frame your program places the page. I have just used the first free frame.)

Some hints

You can use the bit-masking (&, not &&) and bit-shifting (<<) operators in C to extract the page number and the offset from a virtual address. It is also possible to use division and modulus. You might want to start by writing a simple program that just reads an integer and prints the page number and the offset.

Here are some expected results you can compare to:

Virtual address Page number Offset
0 0 0
1 0 1
128 0 128
256 0 256
1023 0 1023
1024 1 0
262143 255 1023
262144 256 0
1048575 1023 1023
1048576 error error

When that program works, and you are confident that it is correct, go on and write the full simulation program.

Unless you are using a very strange and unusual implementation of the C language, the data type char can be used to store an 8-bit byte. If you want to store values from 0 to 255, and not -128 to 127, you might want to use the data type unsigned char.

The program should take the name of the file with virtual addresses (and possibly also the backing store file) as a command-line argument. That makes it easier to test it with different files. You can also write /dev/tty as the file name, and then enter addresses directly in the terminal, finishing the input with CTRL-D.

Task 1

Write the simulation program as described above, with a page table but without a TLB and without page replacement.

Task 2 (optional, counts toward a higher grade than 3)

Add a TLB, a "translation lookaside buffer" to your program. In a real computer, the TLB is implemented in hardware using associative memory, but here you need to simulate it in some way.

Now your program should report both the page-fault rate and the TLB hit rate:

The virtual addresses in "virtual-addresses.txt" were generated randomly, with no memory locality.

The file "more-localized-addresses.txt" contains addresses that are more localized than in the original "virtual-addresses.txt". If your program is working as intended, "more-localized-addresses.txt" should give a much better TLB hit rate.

Also, explain why a TLB is necessary in a real computer!

Task 3 (optional, counts toward a higher grade than 3)

Implement page replacement, either using FIFO or LRU as described in the book. You need to create a new input file with addresses, since the provided input file "virtual-addresses.txt" only uses a part of the virtual address space, and the frames will never run out, so no pages will need to be replaced.

Task 4 (optional, counts toward a higher grade than 3)

Write a C program that allocates lots and lots of memory using malloc (maybe 1 gigabyte at a time), without doing anything with that memory, until malloc returns NULL. How much memory was allocated? Try to explain why! Then let the program actually use that memory in some way, for example by filling it with garbage data. What happened? How much memory could be used? Try to explain what your OS is doing!

Report

Write down and hand in a short summary (1-2 pages) of what you have done. A pdf would be good. Don't forget to answer the questions above. Hand in using Blackboard. All source code for your programs must be attached. It is good if you can also demonstrate your results at one of the scheduled lab times.


Thomas Padron-McCarthy (thomas.padron-mccarthy@oru.se), 30 juni 2023