Exam in Datateknik C, Databaser 2003-01-15 (English version)

Mitthögskolan
ITE
Thomas Padron-McCarthy (tpm@nekotronic.se)

Exam in

Datateknik C, Databaser 5 poäng

Wednesday January 15, 2003

(This exam is available in a Swedish and an English version.)

Utilities: None.

Points: Maximum 60 points.
30 points required to pass.

Results and solutions: Will be posted on the course home page no later than Wednesday January 29, 2003.

Examiner and teacher on duty: Thomas Padron-McCarthy, telephone 070-7347013

In addition to the instructions on the cover of the test, please also:

Write readably and clearly. Solutions that can't be read will of course not give any points. Unclear sentences will be misunderstood.
Solve at most one problem on each sheet of paper. Write on one side of the paper only. Do not use a red pen or pencil.
Additional assumptions to those in each problem must be stated. Such assumptions may of course not alter the given problem.

GOOD LUCK!

Scenario for the problems:

Hulda creates a database for her CD records. The database contains three tables:

Artist(Anr, Name, Country)
Record(Rnr, Name, Artist)
Song(Record, Ordernumber, Name, Length)

In the table Artist, Anr is the primary key, and Name is an alternate key. In the table Record, Rnr is the primary key, and Artist is a reference attribute to Anr in the table Artist. Also, the combination of Name and Artist is an alternate key. In the table Song, Record and Ordernumber form a composite primary key, and Record is a reference attribute to Rnr in the table Record.

Problem 1 (5p)

Write the following queries both in SQL and in relational algebra:

a) (1p) What are the names of the records made by the artist Carola?

b) (2p) What is the sum of the lengths of all songs on the record Stranger by Carola?

c) (2p) Which artists are present in the database, while having no records in the database? We want to know the names of those artists.

Problem 2 (3p)

Queries of type (b) above, that require summing all the song lengths on a record, turn out to be very common. To speed up these queries Hulda adds a column Length to the table Record. This column contains the total length of all songs on that record. Hulda finds that it is difficult to keep the values in that column updated, so that they reflect the actual sum of the individual song lengths. For example, she must remember to change the content in Length if she discovers that a song length is wrong and has to be changed. Show how an active database could help her in her work! (You don't have to write any exactly correct SQL code for this, but explain what Hulda should do, and what the database management system will do.)

Problem 3 (8p)

The artist Carola has recorded a song called The Fourteenth of May, and we wish to know on which record we can find that song. We use the following SQL query:

select Record.Name
from Artist, Song, Record
where Artist.Name = 'Carola'
and Song.Name = 'The Fourteenth of May'
and Song.Record = Record.Rnr
and Record.Artist = Artist.Anr;

a) (1p) Perform the systematic translation of that SQL query into relational algebra, that is, translate it to the initial query tree, and draw that query tree.

b) (2p) A heuristic query optimizer transforms the query tree to a more efficient form. Draw the result!

c) (2p) Explain which transformations were made by the heuristic query optimizer, and why they (probably) will make the query faster to execute.

d) (1p) The database management system can execute a query tree in two different ways: using materialization, or using pipe-lining (also called "stream-based processing"). Explain the difference.

e) (2p) Which shortcomings (or problems) do a heuristic query optimizer of this type have? If we instead use a cost-based query optimizer, some of these shortcomings can be helped. How?

Problem 4 (7p)

a) (4p) Choose two algorithms that a database management system can use internally to execute the join operation, and describe how they work.

b) (2p) Is one of these two algorithms always the best one, or can which one that is best vary, and (in that case) on what does that depend?

c) (1p) What does "best" in problem b above mean?

Problem 5 (10p)

Here we have record number 717. Who made it? We use the following SQL query:

select Artist.Name
from Record, Artist
where Rnr = 717
and Record.Artist = Artist.Anr;

Hulda has 1000 records, with 100 different artists. On an average, there are ten songs on each record. She hasn't defined any indexes or other useful data structures. The database management system has a good query optimizer.

a) (2p) State reasonable assumptions concerning disk block size etc, and calculate how long, on an average, it will take to execute the query.

b) (2p) Hulda gets a job on the radio station Radio Megamix (with its well-known slogan "the most boring mix of songs you heard before") and decides to use a similar database for all the radio station's records. The radio station's record collection is one thousand times larger than Hulda's, so this database contains 1000000 (10⁶) records, with 100000 (10⁵) different artists. There are still ten songs on each record, on an average. How long does it take to execute the same query, still without any indexes or other useful data structures?

c) (2p) Hulda thinks the query is too slow. Which indexes should be created so that this particular query will execute quickly? Justify your answer.

d) (4p) Hulda creates these indexes. Hulda's database management system uses B⁺-trees for its indexes. How long will the query now take to execute? Show how you calculated the answer.

Problem 6 (4p)

The radio station Radio Megamix is going to make its record database searchable on the web, so that radio listeners can request records. Because of this, they have to connect their web site to the database. Hulda is considering using CGI scripts.

a) (2p) Explain how a CGI script would work in this case, so that data from the database can be shown by a web surfer's web browser.

b) (2p) Which alternatives are there to CGI scripts in this case? Which advantages and disadvantages do these alternatives have, compared to using CGI scripts?

Problem 7 (4p)

The people who work at the radio station need to access the database. They could work using the web, but Hulda, who is the database administrator, decides to write a special program for this. The program must be able to connect to the database, and Hulda is choosing between three different technologies: ODBC, ESQL and this particular database management system's own function library. Explain, in short, how the different technologies work, and which advantages and disadvantages they have, compared to each other. Would you recommend one of them? Justify your answer!

Problem 8 (12p)

All the employees at the radio station work diligently at adding all the records to the database, and correcting any errors that are discovered. Luckily, the database management system has ACID transactions.

a) (4p) Explain, for each of the four properties (A, C, I and D), what it means, and what could happen if it wasn't present.

b) (4p) The database management system uses two-phase locking of the simples kind, where data items are locked as late as possible, and unlocked as early as possible. Two transactions, T1 and T2, are executed concurrently, and if there were no locks their operations would be executed in this order:

Step Transaction T1 Transaction T2

1 Read(A)

2 Read(B)

3 A = A + B

4 B = 0

5 Read(C)

6 B = C

7 C = 0

8 Write(B)

9 Write(B)

10 Write(A)

11 Write(C)

Step	Transaction T1	Transaction T2
1	Read(A)
2	Read(B)
3	A = A + B
4	B = 0
5		Read(C)
6		B = C
7		C = 0
8		Write(B)
9	Write(B)
10	Write(A)
11		Write(C)

Show how the transactions will lock and unlock the data items A, B and C, and how they (perhaps) are forced to wait.

c) (4p) Now the database management system will instead use the time stamp method, as it is described in the course textbook, with the so-called "Thomas's Write Rule". The operations of the transactions are run in the same order as shown in the table above. Show what happens, including how the timestamps in the database change. Clearly state if, and when, any of the transactions have to be aborted. (In that case you don't have to show what happens after the transaction is aborted, but can finish your solution there.)

Problem 9 (3p)

Hulda puts searchable maps in the database, to show where the listeners are located. Show how a k-d tree and a quad tree, respectively, that index the following points, can look:

A system of coordinates, with points

Explain the trees that you have drawn.

Problem 10 (4p)

Explain, in short, the following concepts:

object-relational database
reconstruction program
data warehouse
cascade rollback

Utilities:	None.
Points:	Maximum 60 points. 30 points required to pass.
Results and solutions:	Will be posted on the course home page no later than Wednesday January 29, 2003.
Examiner and teacher on duty:	Thomas Padron-McCarthy, telephone 070-7347013