tl;dr: a review of week 7, plus a walkthrough of the steps to solve a mystery using only the information in a database that contains the town’s records from around the time of the theft.
Week Review
Week 7 flew by very quickly. It was interesting to learn more about databases: before this lecture I imagined a database as an unknown blob of information that could only be handled with a special power (aka code), but I’ve finally found this missing piece of the puzzle, and databases are simpler than I thought. Basically, a database is like a set of spreadsheet pages linked to each other: a place where data is stored, organized and shared between different tables (spreadsheets). To put data into a database and retrieve it, you need a specific set of instructions, and here comes SQL, a language with specific keywords for handling data in databases.
Spreadsheets and databases both store and organize data; one key difference is that a spreadsheet is a standalone document, while a database can be relational, meaning its tables are linked to each other through shared values such as IDs.
To me, databases seemed easier to work with than 2D arrays or files!
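To make the "linked spreadsheet pages" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module and a tiny hypothetical in-memory database (the names Alice and Bob and the account numbers are made up). The two tables are linked by a shared value, and a JOIN follows that link.

```python
import sqlite3

# Hypothetical toy data: two "spreadsheet pages" linked by person_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE bank_accounts (account_number INTEGER, person_id INTEGER);
    INSERT INTO people VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO bank_accounts VALUES (1001, 1), (1002, 2);
""")

# The JOIN matches each account to the person whose id equals its person_id.
rows = conn.execute("""
    SELECT people.name, bank_accounts.account_number
    FROM people
    JOIN bank_accounts ON bank_accounts.person_id = people.id
    ORDER BY people.name
""").fetchall()
print(rows)  # [('Alice', 1001), ('Bob', 1002)]
```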
As I wrote on Twitter:
I absolutely loved this problem set! Playing detective by solving mysteries and gathering data through SQL queries has been super fun and has awesome educational value!
Also, it really shows how powerful big data can be, that’s both scary and interesting.
DEFINITION
It’s basically a SQL detective game, where we need to investigate and retrieve the information needed to solve the mystery.
The CS50 duck has been stolen, and all we know is that the theft took place on July 28, 2020, on Chamberlin Street.
Starting from this little information, we can navigate the Fiftyville database, which contains tables of data from around the town, to find out:
- Who the thief is
- What city the thief escaped to
- Who the thief’s accomplice is (the person who helped them escape)
STEP 1 - KNOWING HOW THE DATABASE IS STRUCTURED
The first thing to do is find out how many tables the Fiftyville database has and what they contain. So, in the terminal, we make sure we’re in the fiftyville problem set directory and start SQLite on the database file with this command: sqlite3 fiftyville.db
Then, at the sqlite> prompt, we can run .schema to print the schema of our database.
This way we learn that the Fiftyville database has 10 tables:
- crime_scene_reports - columns: id, year, month, day, street, description
- interviews - columns: id, name, year, month, day, transcript
- courthouse_security_logs - columns: id, year, month, day, hour, minute, activity, license_plate
- atm_transactions - columns: id, account_number, year, month, day, atm_location, transaction_type, amount
- bank_accounts - columns: account_number, person_id, creation_year
- airports - columns: id, abbreviation, full_name, city
- flights - columns: id, origin_airport_id, destination_airport_id, year, month, day, hour, minute
- passengers - columns: flight_id, passport_number, seat
- phone_calls - columns: id, caller, receiver, year, month, day, duration
- people - columns: id, name, phone_number, passport_number, license_plate
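.schema is a dot-command of the sqlite3 command-line shell, not SQL itself. As a hedged sketch, the same information can be read with plain SQL from the sqlite_master table and PRAGMA table_info; the example below uses a hypothetical in-memory database that recreates only two of the ten tables, not the real fiftyville.db.

```python
import sqlite3

# Hypothetical stand-in for fiftyville.db with just two of its tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE crime_scene_reports (id INTEGER, year INTEGER, month INTEGER,
                                      day INTEGER, street TEXT, description TEXT);
    CREATE TABLE interviews (id INTEGER, name TEXT, year INTEGER, month INTEGER,
                             day INTEGER, transcript TEXT);
""")

# sqlite_master holds one row per schema object; keep only tables.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['crime_scene_reports', 'interviews']

# PRAGMA table_info returns one row per column; the column name is at index 1.
columns = [row[1] for row in conn.execute("PRAGMA table_info(interviews)")]
print(columns)  # ['id', 'name', 'year', 'month', 'day', 'transcript']
```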
STEP 2 - KNOWING MORE INFORMATION FROM THE CRIME REPORTS
Since the only information we have is a date and a street, we need more to move the investigation forward. The first place to look is the crime_scene_reports table, which we can easily filter with the date and street we already have:
SELECT * FROM crime_scene_reports
WHERE year = 2020 AND month = 7 AND day = 28 AND street = 'Chamberlin Street';
The result is:
295 | 2020 | 7 | 28 | Chamberlin Street | Theft of the CS50 duck took place at 10:15 am at the Chamberlin Street courthouse. Interviews were conducted today with three witnesses who were present at the time — each of their interview transcripts mentions the courthouse.
Thanks to the crime report, we now know the time of the theft (10:15 am), that there were 3 witnesses, and that each of their transcripts mentions the courthouse.
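In SQL, string literals are written with straight single quotes ('Chamberlin Street'); typographic quotes such as “ ” are not valid delimiters. When running queries from a program, placeholders avoid quoting problems entirely. Here is a minimal sketch with Python's sqlite3 module and hypothetical report rows (the other street name and the descriptions are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE crime_scene_reports
                (id INTEGER, year INTEGER, month INTEGER, day INTEGER,
                 street TEXT, description TEXT)""")
# Hypothetical rows: only the first matches both the date and the street.
conn.executemany("INSERT INTO crime_scene_reports VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 2020, 7, 28, "Chamberlin Street", "Hypothetical report A"),
    (2, 2020, 7, 28, "Some Other Street", "Hypothetical report B"),
    (3, 2020, 7, 27, "Chamberlin Street", "Hypothetical report C"),
])

# Each ? is filled in by the driver, so the street name needs no manual quoting.
query = """SELECT id, description FROM crime_scene_reports
           WHERE year = ? AND month = ? AND day = ? AND street = ?"""
reports = conn.execute(query, (2020, 7, 28, "Chamberlin Street")).fetchall()
print(reports)  # [(1, 'Hypothetical report A')]
```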
STEP 3 - KNOWING MORE INFORMATION FROM THE WITNESSES’ INTERVIEW TRANSCRIPTS
We still need more information, so, using what we know so far, we can query the interviews table to read the witnesses’ transcripts. We also add a condition requiring the keyword courthouse in the transcript column, since the crime report says every witness transcript mentions it. That way we only get the 3 witnesses we need to check:
SELECT * FROM interviews
WHERE transcript LIKE '%courthouse%'
AND year = 2020 AND month = 7 AND day = 28;
The results are:
161 | Ruth | 2020 | 7 | 28 | Sometime within ten minutes of the theft, I saw the thief get into a car in the courthouse parking lot and drive away. If you have security footage from the courthouse parking lot, you might want to look for cars that left the parking lot in that time frame.
162 | Eugene | 2020 | 7 | 28 | I don't know the thief's name, but it was someone I recognized. Earlier this morning, before I arrived at the courthouse, I was walking by the ATM on Fifer Street and saw the thief there withdrawing some money.
163 | Raymond | 2020 | 7 | 28 | As the thief was leaving the courthouse, they called someone who talked to them for less than a minute. In the call, I heard the thief say that they were planning to take the earliest flight out of Fiftyville tomorrow. The thief then asked the person on the other end of the phone to purchase the flight ticket.
The witnesses give us a lot of information, which we can use to retrieve the thief’s name by joining tables and nesting queries.
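The LIKE filter in this step relies on two behaviors worth checking: % matches any sequence of characters, and SQLite's LIKE ignores case for ASCII letters by default. A small sketch with hypothetical transcripts (not the real interview data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE interviews (id INTEGER, name TEXT, transcript TEXT)")
# Hypothetical transcripts; the second one capitalizes "Courthouse".
conn.executemany("INSERT INTO interviews VALUES (?, ?, ?)", [
    (1, "Witness A", "I saw a car leave the courthouse parking lot."),
    (2, "Witness B", "The Courthouse was busy that morning."),
    (3, "Witness C", "I was at the bakery all day."),
])

# '%courthouse%' matches the word anywhere in the text, in any ASCII case.
names = [row[0] for row in conn.execute(
    "SELECT name FROM interviews WHERE transcript LIKE '%courthouse%' ORDER BY id")]
print(names)  # ['Witness A', 'Witness B']
```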
STEP 4 - INVESTIGATING THE WITNESSES’ RECORDS
Since we now have a new set of clues, we need to organize them, so I checked each one individually to know exactly what kind of information it gives us.
We can start from the first witness, Ruth, looking for the car she saw leaving the courthouse around the theft date and hour. We can find it in the courthouse_security_logs table; we just need to be careful to limit the minutes to within ten minutes of the theft (the detail mentioned by the witness):
SELECT * FROM courthouse_security_logs
WHERE year = 2020 AND month = 7 AND day = 28
AND hour = 10 AND minute BETWEEN 15 AND 25;
From this query, we now know that 8 cars left the courthouse in the time frame specified, and we can retrieve their license plate numbers.
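Two details of this filter can be checked on toy data: BETWEEN is inclusive at both ends (10:15 and 10:25 are included), and the activity column could optionally restrict the result to cars leaving. This sketch assumes exits are recorded with the value 'exit'; the plates below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE courthouse_security_logs
                (hour INTEGER, minute INTEGER, activity TEXT, license_plate TEXT)""")
# Hypothetical log entries around the 10:15-10:25 window.
conn.executemany("INSERT INTO courthouse_security_logs VALUES (?, ?, ?, ?)", [
    (10, 14, "exit", "PLATE14"),
    (10, 15, "exit", "PLATE15"),
    (10, 20, "entrance", "PLATE20"),
    (10, 25, "exit", "PLATE25"),
    (10, 26, "exit", "PLATE26"),
])

base = ("SELECT license_plate FROM courthouse_security_logs "
        "WHERE hour = 10 AND minute BETWEEN 15 AND 25")
# Both endpoints (15 and 25) are included; 14 and 26 are not.
inclusive = [r[0] for r in conn.execute(base + " ORDER BY minute")]
# Optional extra condition (assumed value 'exit'): only cars leaving.
exits_only = [r[0] for r in conn.execute(base + " AND activity = 'exit' ORDER BY minute")]
print(inclusive)   # ['PLATE15', 'PLATE20', 'PLATE25']
print(exits_only)  # ['PLATE15', 'PLATE25']
```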
We can now move on to the second witness’s clue, searching the atm_transactions table with the theft date, the transaction type, and the ATM location mentioned by the witness:
SELECT * FROM atm_transactions
WHERE year = 2020 AND month = 7 AND day = 28
AND atm_location = 'Fifer Street'
AND transaction_type = 'withdraw';
From this query, we know that there were 8 withdrawals from the ATM on Fifer Street that day, from which we can retrieve the bank account numbers.
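An account number alone does not identify a person; the bank_accounts table links it to a person_id. A minimal sketch of that join, using hypothetical accounts and IDs in an in-memory database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE atm_transactions (id INTEGER, account_number INTEGER,
                                   atm_location TEXT, transaction_type TEXT);
    CREATE TABLE bank_accounts (account_number INTEGER, person_id INTEGER);
    -- Hypothetical rows: only account 111 withdrew on Fifer Street.
    INSERT INTO atm_transactions VALUES
        (1, 111, 'Fifer Street', 'withdraw'),
        (2, 222, 'Fifer Street', 'deposit'),
        (3, 333, 'Other Street', 'withdraw');
    INSERT INTO bank_accounts VALUES (111, 10), (222, 20), (333, 30);
""")

# Follow account_number from the transaction to the account owner's person_id.
person_ids = [r[0] for r in conn.execute("""
    SELECT bank_accounts.person_id
    FROM atm_transactions
    JOIN bank_accounts ON bank_accounts.account_number = atm_transactions.account_number
    WHERE atm_transactions.atm_location = 'Fifer Street'
      AND atm_transactions.transaction_type = 'withdraw'
""")]
print(person_ids)  # [10]
```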
Finally, we can move on to the last witness, who gives us two clues. The first is about the phone call: we search the phone_calls table for calls on the theft date that lasted less than a minute, as the witness described (duration < 60). The second is about the earliest flight out of Fiftyville (the one the thief was planning to take), which we can find in the flights table by specifying the next day’s date and sorting by time:
SELECT * FROM phone_calls
WHERE year = 2020 AND month = 7 AND day = 28
AND duration < 60;
SELECT * FROM flights
WHERE year = 2020 AND month = 7 AND day = 29
ORDER BY hour, minute ASC LIMIT 1;
The first query returns 9 phone call logs, from which we can retrieve each caller’s and receiver’s phone numbers. The second query returns just one flight, from which we can retrieve the destination airport ID (4) that we will need to find out where the thief escaped to.
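The flights query above picks the earliest flight of the day regardless of where it departs. If the flights table also held flights from other airports, an extra condition on origin_airport_id would keep the search to flights out of Fiftyville. This sketch shows the difference on hypothetical data (the airport rows, including the 'FIF' abbreviation, are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE airports (id INTEGER, abbreviation TEXT, city TEXT);
    CREATE TABLE flights (id INTEGER, origin_airport_id INTEGER,
                          destination_airport_id INTEGER,
                          day INTEGER, hour INTEGER, minute INTEGER);
    -- Hypothetical airports and flights on day 29.
    INSERT INTO airports VALUES (1, 'FIF', 'Fiftyville'), (2, 'XYZ', 'Elsewhere');
    INSERT INTO flights VALUES
        (1, 2, 1, 29, 7, 0),   -- earliest overall, but departs from Elsewhere
        (2, 1, 2, 29, 9, 30),
        (3, 1, 2, 29, 8, 20);  -- earliest departure from Fiftyville
""")

# Earliest flight of the day, from any origin.
any_origin = conn.execute(
    "SELECT id FROM flights WHERE day = 29 ORDER BY hour, minute LIMIT 1").fetchone()[0]

# Earliest flight of the day that departs from Fiftyville.
from_fiftyville = conn.execute("""
    SELECT id FROM flights
    WHERE day = 29
      AND origin_airport_id = (SELECT id FROM airports WHERE city = 'Fiftyville')
    ORDER BY hour, minute LIMIT 1
""").fetchone()[0]
print(any_origin, from_fiftyville)  # 1 3
```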
Unfortunately, we can’t do much with each set of information on its own; we need to cross-reference the data by joining tables and nesting queries to narrow the results down to the thief’s name.
STEP 5 - KNOWING THE THIEF NAME
This is the trickiest step, and it’s easy to make a mess since we need to combine many tables and queries. We’ll try to keep it as neat as possible, using comments to navigate through the subqueries (comments in SQLite start with --).
The first thing to do is start from the people table, since that’s where the thief’s name is stored and the name is the data we want to output.
Then we cross-reference the information obtained from the 3 witnesses: the license plate number, bank account number, phone number, and passport number.
(Everything below is a single query; if you’re going to copy it, copy everything up to the ;)
SELECT name FROM people
-- Query courthouse security logs table for the license plate
WHERE people.license_plate IN (
SELECT license_plate FROM courthouse_security_logs
WHERE year = 2020 AND month = 7 AND day = 28 AND hour = 10 AND minute BETWEEN 15 AND 25)
-- Query ATM transactions and bank accounts tables for bank account number
AND people.id IN (
SELECT person_id FROM bank_accounts
JOIN atm_transactions ON atm_transactions.account_number = bank_accounts.account_number
WHERE atm_transactions.year = 2020 AND atm_transactions.month = 7 AND atm_transactions.day = 28
AND transaction_type = 'withdraw'
AND atm_transactions.atm_location = 'Fifer Street')
-- Query phone calls table for phone number
AND people.phone_number IN (
SELECT caller FROM phone_calls
WHERE year = 2020 AND month = 7 AND day = 28
AND duration < 60)
-- Query flights and passengers tables for passport number
AND people.passport_number IN (
SELECT passport_number FROM passengers
WHERE flight_id IN (
SELECT id FROM flights WHERE year = 2020 AND month = 7 AND day = 29
ORDER BY hour, minute ASC LIMIT 1));
Finally, by crossing all the information we obtain the thief's name: Ernest.
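Each IN condition in the big query keeps only the people who appear in one clue's result, so together they act like an intersection of sets. As a hedged illustration on hypothetical suspects (simplified stand-in tables, not the real schema), the same answer can be reached with SQL's INTERSECT operator:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER, name TEXT, phone_number TEXT, license_plate TEXT);
    CREATE TABLE exits (license_plate TEXT);      -- stands in for the security-log clue
    CREATE TABLE withdrawals (person_id INTEGER); -- stands in for the ATM clue
    CREATE TABLE short_calls (caller TEXT);       -- stands in for the phone clue
    INSERT INTO people VALUES
        (1, 'Suspect A', '555-0001', 'AAA111'),
        (2, 'Suspect B', '555-0002', 'BBB222'),
        (3, 'Suspect C', '555-0003', 'CCC333'),
        (4, 'Suspect D', '555-0004', 'DDD444');
    INSERT INTO exits VALUES ('AAA111'), ('BBB222'), ('CCC333');
    INSERT INTO withdrawals VALUES (2), (3), (4);
    INSERT INTO short_calls VALUES ('555-0003'), ('555-0001');
""")

# Nested IN subqueries, in the same style as the query above.
nested = [r[0] for r in conn.execute("""
    SELECT name FROM people
    WHERE license_plate IN (SELECT license_plate FROM exits)
      AND id IN (SELECT person_id FROM withdrawals)
      AND phone_number IN (SELECT caller FROM short_calls)
""")]

# The same idea written as an explicit intersection of three name lists.
intersected = [r[0] for r in conn.execute("""
    SELECT name FROM people WHERE license_plate IN (SELECT license_plate FROM exits)
    INTERSECT
    SELECT name FROM people WHERE id IN (SELECT person_id FROM withdrawals)
    INTERSECT
    SELECT name FROM people WHERE phone_number IN (SELECT caller FROM short_calls)
""")]
print(nested, intersected)  # ['Suspect C'] ['Suspect C']
```

One design note: INTERSECT compares the selected names, so two different people sharing a name would be merged; selecting id instead of name avoids that.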
STEP 6 - KNOWING THE CITY WHERE THE THIEF ESCAPED TO
We basically already have the answer here: when we searched for the earliest flight mentioned in the thief’s call (overheard by the third witness), we found its destination airport ID: 4. So we just need to find out which city that ID represents in the airports table:
SELECT city FROM airports
WHERE id = 4;
The city where the thief escaped is London.
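Instead of copying the ID by hand, the flight and its destination city can be looked up together by joining flights to airports on destination_airport_id. A minimal sketch with a hypothetical flight and cities:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE airports (id INTEGER, city TEXT);
    CREATE TABLE flights (id INTEGER, origin_airport_id INTEGER,
                          destination_airport_id INTEGER);
    -- Hypothetical data: flight 100 goes from airport 1 to airport 2.
    INSERT INTO airports VALUES (1, 'Origin Town'), (2, 'Destination City');
    INSERT INTO flights VALUES (100, 1, 2);
""")

# Match the flight's destination_airport_id to the airport's id to get the city.
city = conn.execute("""
    SELECT airports.city
    FROM flights
    JOIN airports ON airports.id = flights.destination_airport_id
    WHERE flights.id = ?
""", (100,)).fetchone()[0]
print(city)  # Destination City
```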
STEP 7 - KNOWING THE ACCOMPLICE’S NAME
This step is easy once we know the thief’s name: we just search the phone_calls table again with the same date and duration conditions, adding that the caller must be the thief (looking up his phone number from his name):
SELECT name FROM people
WHERE phone_number IN (
SELECT receiver FROM phone_calls
WHERE year = 2020 AND month = 7 AND day = 28
AND caller = (
SELECT phone_number FROM people WHERE name = 'Ernest')
AND duration < 60);
The accomplice’s name is Berthold.
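The same lookup can also be written with joins instead of nested subqueries, joining people twice (once as the caller, once as the receiver). Here is a sketch of that alternative on hypothetical people and calls; it is not the query used above, just another way to express it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER, name TEXT, phone_number TEXT);
    CREATE TABLE phone_calls (caller TEXT, receiver TEXT, day INTEGER, duration INTEGER);
    -- Hypothetical people and calls.
    INSERT INTO people VALUES
        (1, 'Thief', '555-0100'), (2, 'Helper', '555-0200'), (3, 'Friend', '555-0300');
    INSERT INTO phone_calls VALUES
        ('555-0100', '555-0200', 28, 45),   -- short call from the thief
        ('555-0100', '555-0300', 28, 300),  -- too long
        ('555-0300', '555-0100', 28, 30);   -- the thief is the receiver here
""")

# Alias people twice so each side of the call gets its own name lookup.
accomplices = [r[0] for r in conn.execute("""
    SELECT receiver_person.name
    FROM phone_calls
    JOIN people AS caller_person ON caller_person.phone_number = phone_calls.caller
    JOIN people AS receiver_person ON receiver_person.phone_number = phone_calls.receiver
    WHERE caller_person.name = ? AND phone_calls.day = 28 AND phone_calls.duration < 60
""", ("Thief",))]
print(accomplices)  # ['Helper']
```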
With this last query, we solved the Fiftyville mystery by answering all the questions! CONGRATULATIONS!