Lecture 31 - Reviewing Data Structures

# Lecture 31 -- Reviewing Data Structures Prior to the midterm, we learned the shapes of various data structures and the runtime performance of various key operations on each. As you move ahead in computing, you'll be asked to choose data structures to fit the _features_ of problems; runtime may or may not be critical depending on context. This class revisits our data structures to date, this time thinking about what they offer to programmers trying to solve problems. What might we mean? ## What are Features of Data Structure? Consider the following programming problem: > You are writing a program to manage email messages. People want to view email messages ordered by time received, with the most recent messages first. Sometimes, they want that view restricted to the person who sent the email. What are the key features of this problem? - the core data are email messages - we need to keep the email messages _in order_ - we need to be able to _find messages from a particular sender_ How should we organize the email messages? To think through this, we should consider which data structures are good for accessing items in order, and which are good for selecting messages from a particular sender. Once we find good candidates, we can make a decision about which to use. ### How well do data structures do with order? Let's step back. Here are three data structures from this semester: which ones are well-suited to maintaining a user- or programming-controlled order of data elements? - LinkedList - HashMap - DecisionTree A LinkedList is a linear-sequence of elements and programmers can control where elements are placed in that sequence. They are a natural data structure for managing data that needs to be accessed and created with an order. While a HashMap has an array and LinkedLists within its implementation, these are not directly accessible to a programmer trying to manage email messages. While one could associate each message with a number based on its position in the order, then use those numbers as hashmap keys, the programmer would need to track the order numbers in order to access the messages, which is clumsy. A HashMap would require too much overhead to maintain emails in order. A DecisionTree isn't designed to hold single elements of data; it is designed to hold a collection of attribute variables and their values along with a label associated with that collection. There is no order to these collections. #### So which would we use? This analysis suggests that the LinkedList makes the most sense for maintaining order. If we did a similar analysis for filtering by sender, we might conclude that a hashmap makes more sense for that requirement. You'd then want to consider which of the two operations would happen more often, or in the worst case having two data structures (one of each) over the message objects. ## Summarizing Data Structure Features Our above analysis with order is an example of how we might summarize how a data structure aligns with a feature. As a way to synthesize what we've been doing this semester, it makes sense to make a larger table of features against data structures. Here is a [blank version of an initial such table](https://docs.google.com/spreadsheets/d/1IZg6mXWSM23-OinTGtXMNg4eMgSD-P732UOJPKd6_JQ/edit?usp=sharing), if you want to try filling in the analyses in each cell. We worked through a few in class. Here is a [partly-complete version](https://docs.google.com/spreadsheets/d/1WBhNMc0QfWeF3pcfYtuyl_-s_-m4I2s7C4T8N0z1-L0/edit?usp=sharing) of the table. To really check your conceptual understanding, you should try to fill in the remaining cells using a similar style and level of detail as in the pre-populated cells. **Will the staff post the completed table at some point?** No, that defeats the purpose of this exercise. Knowing what different data structures are good for is something you have to synthesize/assemble for yourself. If we provide the answers, then you'll just try to memorize the answers. Memorization isn't learning. The more you wrestle with these, the more you will deeply learn this material (which you will be drawing on for the rest of your career in CS). We encourage you to try developing responses, to review them with one another, and maybe ask about what you've come up with as part of office hours.