Mitthögskolan
ITE
Thomas Padron-McCarthy (tpm@nekotronic.se)
Datateknik C, Databaser 5 poäng
Wednesday January 15, 2003
(This exam is available in a Swedish and an English version.)
Utilities: | None. |
Points: |
Maximum 60 points.
30 points required to pass. |
Results and solutions: | Will be posted on the course home page no later than Wednesday January 29, 2003. |
Examiner and teacher on duty: | Thomas Padron-McCarthy, telephone 070-7347013 |
In addition to the instructions on the cover of the test, please also:
Hulda creates a database for her CD records. The database contains three tables:
In the table Artist, Anr is the primary key, and Name is an alternate key. In the table Record, Rnr is the primary key, and Artist is a reference attribute to Anr in the table Artist. Also, the combination of Name and Artist is an alternate key. In the table Song, Record and Ordernumber form a composite primary key, and Record is a reference attribute to Rnr in the table Record.
a) (1p) What are the names of the records made by the artist Carola?
b) (2p) What is the sum of the lengths of all songs on the record Stranger by Carola?
c) (2p) Which artists are present in the database, while having no records in the database? We want to know the names of those artists.
select Record.Name from Artist, Song, Record where Artist.Name = 'Carola' and Song.Name = 'The Fourteenth of May' and Song.Record = Record.Rnr and Record.Artist = Artist.Anr;
a) (1p) Perform the systematic translation of that SQL query into relational algebra, that is, translate it to the initial query tree, and draw that query tree.
b) (2p) A heuristic query optimizer transforms the query tree to a more efficient form. Draw the result!
c) (2p) Explain which transformations were made by the heuristic query optimizer, and why they (probably) will make the query faster to execute.
d) (1p) The database management system can execute a query tree in two different ways: using materialization, or using pipe-lining (also called "stream-based processing"). Explain the difference.
e) (2p) Which shortcomings (or problems) do a heuristic query optimizer of this type have? If we instead use a cost-based query optimizer, some of these shortcomings can be helped. How?
b) (2p) Is one of these two algorithms always the best one, or can which one that is best vary, and (in that case) on what does that depend?
c) (1p) What does "best" in problem b above mean?
Hulda has 1000 records, with 100 different artists. On an average, there are ten songs on each record. She hasn't defined any indexes or other useful data structures. The database management system has a good query optimizer.select Artist.Name from Record, Artist where Rnr = 717 and Record.Artist = Artist.Anr;
a) (2p) State reasonable assumptions concerning disk block size etc, and calculate how long, on an average, it will take to execute the query.
b) (2p) Hulda gets a job on the radio station Radio Megamix (with its well-known slogan "the most boring mix of songs you heard before") and decides to use a similar database for all the radio station's records. The radio station's record collection is one thousand times larger than Hulda's, so this database contains 1000000 (106) records, with 100000 (105) different artists. There are still ten songs on each record, on an average. How long does it take to execute the same query, still without any indexes or other useful data structures?
c) (2p) Hulda thinks the query is too slow. Which indexes should be created so that this particular query will execute quickly? Justify your answer.
d) (4p) Hulda creates these indexes. Hulda's database management system uses B+-trees for its indexes. How long will the query now take to execute? Show how you calculated the answer.
a) (2p) Explain how a CGI script would work in this case, so that data from the database can be shown by a web surfer's web browser.
b) (2p) Which alternatives are there to CGI scripts in this case? Which advantages and disadvantages do these alternatives have, compared to using CGI scripts?
a) (4p) Explain, for each of the four properties (A, C, I and D), what it means, and what could happen if it wasn't present.
b) (4p) The database management system uses two-phase locking of the simples kind, where data items are locked as late as possible, and unlocked as early as possible. Two transactions, T1 and T2, are executed concurrently, and if there were no locks their operations would be executed in this order:
Step | Transaction T1 | Transaction T2 |
---|---|---|
1 | Read(A) | |
2 | Read(B) | |
3 | A = A + B | |
4 | B = 0 | |
5 | Read(C) | |
6 | B = C | |
7 | C = 0 | |
8 | Write(B) | |
9 | Write(B) | |
10 | Write(A) | |
11 | Write(C) |
Show how the transactions will lock and unlock the data items A, B and C, and how they (perhaps) are forced to wait.
c) (4p) Now the database management system will instead use the time stamp method, as it is described in the course textbook, with the so-called "Thomas's Write Rule". The operations of the transactions are run in the same order as shown in the table above. Show what happens, including how the timestamps in the database change. Clearly state if, and when, any of the transactions have to be aborted. (In that case you don't have to show what happens after the transaction is aborted, but can finish your solution there.)
Explain the trees that you have drawn.