Python Lists

While data itself is important, you must also know how to organize and structure it. In Python, a common way to do so is through a list. Whether you are an analyst or a fledgling programmer, you should be familiar with lists.

In this tutorial, we will:

  • Explain what a list is and cover the basic components
  • Explain how to use a list
  • Identify when (and when not) to use a list
  • Demonstrate advanced list features

We assume that you have some familiarity with Python, including basic knowledge of strings, floats, and functions. While you can follow along here, it is also helpful to open up a Python interpreter so that you can run the code on your own computer.

The List: the structure of sequence

You've likely already interacted with lists in some way: planning to-dos, preparing for a grocery store trip, and so on. With a to-do list, you normally start with the first item and move in succession through your chores. If you need to add more chores, you can add them to the list.

Python lists are similar. Lists are ordered, mutable sequences of items. Ordered means that each item's place in the arrangement matters. Mutable means that the list can be changed (items can be removed or added).

Initializing a list

You can recognize a Python list by its surrounding brackets. To create an empty list, assign a pair of brackets to a variable. Alternatively, you can use the list() function to create an empty list.

# Initialize empty list with just brackets
empty_list = []

# The list() function performs the same action
empty_list2 = list()

While an empty list is a good starting point, we haven't yet covered how to add anything to it. We will do so later in the article.

For now, we'll create a populated list. Each item in your sequence must be separated by a comma. It's also good practice to give one space after each comma for better readability. Below are some examples of pre-populated lists:

# List of strings
languages = ["Python", "R", "SAS", "SPSS"]
# List of integers 
numerals = [1, 2, 3, 4, 5]

You'll notice these short lists keep everything on one line. You can also create longer lists, with items broken up across different lines. Python ignores line breaks and indentation inside brackets, so aligning each new line of items is a readability convention rather than a requirement. Breaking up long lists into lines keeps the items readable while keeping your code within a reasonable line length. The code block below demonstrates two acceptable ways to format longer lists.

multi_per_line = [
    "Python", "R", "SAS", "SPSS",
    "HTML", "CSS", "Javascript", "PHP"
]

one_per_line = [
    "Python", 
    "R", 
    "SAS", 
    "SPSS",
    "HTML", 
    "CSS", 
    "Javascript", 
    "PHP"
]

Choosing between these two options is a matter of preference. I personally use the former, since putting only one item per line often just turns a long horizontal list into a long vertical one.

Learning lists with a real-world data set

To play around with lists, let's dive into the Ramen Ratings data set found on Kaggle. The data set covers over 2,500 ramen reviews from Ramen Rater. We first read in the data with a few lines of Python, and the paragraphs below explain each part.


In order to use the ramen data set, we first need to read the data and store it in a variable. The data is contained in ramen-ratings.csv. A CSV is a comma-separated values file, one of the most common types of data files you will encounter. The first row of our CSV contains the column names in the order the columns appear. The other rows contain the actual data, and each row is a distinct data entry. As the file name suggests, the elements in each row are separated by commas. In the context of ramen-ratings.csv, each new line is an individual review of a ramen product.

Python includes a library specializing in handling CSV files, aptly called csv. To use it, we bring it in with the import statement. Once we've imported it, the code contained in the csv library is available to us.

The open function is built into Python and enables us to read our CSV file. The first item we pass is the file we'd like to open. The other items specify how we want the file to be read. The r specifies that we only want to read the file (as opposed to altering it), while utf-8 specifies how the characters should be decoded. The specifics are not important here; the major takeaway is that they help ensure our data is read correctly. When we open files, it is important to close them after we are done, since open files tie up system resources. The with statement ensures that the file is closed once we are finished with it.

Within the with statement block, we bind the ramen-ratings.csv data to a Python variable using the csv.reader function. Remember that CSV files are organized by lines. Because csv.reader knows that each row in the CSV has its columns separated by commas, it splits each row and turns its values into the items of a single list. This process happens with every row in the CSV. We then wrap the entire statement in the list() function to turn the whole contents into one giant list. The end result is a list of lists, with each inner list containing one row of data: a single ramen review.
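The loading step described above can be sketched as follows. This is a reconstruction from the description, not necessarily the exact original code. So that it runs anywhere, it first writes a tiny stand-in file; the header names in that sample are an assumption about the real file, while the two review rows are copied from output shown later in this article.

```python
import csv
import os
import tempfile

# Stand-in for ramen-ratings.csv: an assumed header row plus two reviews
sample_csv = (
    "Review #,Brand,Variety,Style,Country,Stars,Top Ten\n"
    "2580,New Touch,T's Restaurant Tantanmen ,Cup,Japan,3.75,\n"
    "2579,Just Way,Noodles Spicy Hot Sesame Spicy Hot Sesame Guan-miao Noodles,Pack,Taiwan,1,\n"
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "ramen-ratings.csv")
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(sample_csv)

    # Open the file for reading; the with block closes it for us afterwards
    with open(path, "r", encoding="utf-8", newline="") as f:
        # csv.reader yields one list of strings per row; list() collects them all
        data = list(csv.reader(f))

print(len(data))   # number of rows, header included
print(data[0])     # the header row
print(data[1])     # the first review
```

With the real data set on disk, only the inner with block is needed, pointed at wherever ramen-ratings.csv is saved.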

Now that our data is in the form of a list, we'll need to know how to start accessing items within it for us to be able to do any meaningful programming.

Using Lists

Indexing

To start looking at individual items within a list, you must use indexes. Indexes are numbered references to where items are placed in a list. That is to say, indexes are how we describe the first and last items in a list and everything in between.

Imagine a list as a row of boxes, each with a number associated with it, starting from 0. The numbers are our indexes, and they help us describe which box we'd like to interact with. Likewise, indexes help us indicate to Python which items in a list we'd like to interact with.

Zero-based indexing

You'll see that the first item is indexed by 0 instead of 1. This is because Python has zero-based numbering. This aspect throws off many beginning programmers since intuitively the word first is associated with the number 1. With 0 as the first index, you know that 1 corresponds to the second element, 2 to the third element and so on and so forth.

Currently, the data variable contains the entire contents of ramen-ratings.csv. We'd like to separate the column names from the actual data to keep our code readable and understandable. In order to separate them, we'll need to index into data.

columns = data[0]
ramen = data[1:len(data)]

When we want to index something from a list, we need to enclose the index in brackets. The index and brackets should appear right after the list we want to index from. We've assigned the column names to columns and the ramen data to ramen.

Since we know that columns is a single list, we can also use indexing to look at single items within it. Using columns[0] as an example, see if you can use the same format to return the third element of columns.
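A minimal sketch of indexing single items follows. The header values are written out by hand here and are an assumption about what the first row of ramen-ratings.csv contains.

```python
# Assumed header row, written out so the example is self-contained
columns = ["Review #", "Brand", "Variety", "Style", "Country", "Stars", "Top Ten"]

first_column = columns[0]   # index 0 is the first item
third_column = columns[2]   # index 2 is the third item

print(first_column)
print(third_column)
```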



We can also use indexing to change a particular element of a list. To do this, we just reassign that index to another value using =.

# Reassigning the first item of columns
columns[0] = "Review Number"


Slicing

To access multiple items at once, we might intuitively expect to pass multiple numbers as indexes. Instead of having us write out long lists of numbers, Python provides a convenient way to access multiple items in a list: slicing.

In order to create a slice, we need to indicate the start index and the end index of the items we want to slice from the list, separated by a colon. For ramen, we started with the second item (which we now know is indexed by 1) and ended with the last index. The len(data) part uses a built-in function (len()) in Python that returns the number of items in an object we pass into it, which was a list in this case. The end result is a slice that starts from the second item of data and includes everything until the end.


However, data[1:len(data)] isn't the best way to write this slice. Thankfully, Python gives us other options to perform the same slice in a more readable form. One alternative is to simply leave out the index where we need to end. When we leave out the ending index, Python understands this as grabbing everything from the start index onwards, so data[1:] returns the same items as data[1:len(data)]. The result is more concise and readable than having to write out the last index. Using the same syntax, have a go at slicing from the fifth review to the end.
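As a small sketch of the open-ended slice, the example below uses a trimmed-down, hypothetical stand-in for data: a two-column header followed by five reviews whose numbers and brands are taken from output shown later in the article.

```python
# Trimmed stand-in for data: one header row followed by review rows
data = [
    ["Review #", "Brand"],
    ["2580", "New Touch"],
    ["2579", "Just Way"],
    ["2578", "Nissin"],
    ["2577", "Wei Lih"],
    ["2576", "Ching's Secret"],
]

explicit_end = data[1:len(data)]   # the original, more verbose slice
ramen = data[1:]                   # same result with the end index omitted
from_fifth_review = ramen[4:]      # fifth review (index 4) through the end

print(explicit_end == ramen)
print(from_fifth_review)
```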



Negative Indexing

Another aspect of slicing you can take advantage of is negative indexing. As its name suggests, negative indexing refers to the use of negative numbers as indexes. Negative indexes start from the end of a list and move leftward as they get more negative. Thus, an index of -1 refers to the last element in a list.

Negative indexing is convenient when you want a more compact way to index the end of a list without having to invoke len() or check how long the list is.
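To make this concrete, here is a short sketch that reuses the languages list from earlier in the article. The other variable names are only for illustration.

```python
languages = ["Python", "R", "SAS", "SPSS"]

last_item = languages[-1]        # last element
second_to_last = languages[-2]   # one step further left
last_two = languages[-2:]        # negative indexes also work in slices
all_but_last = languages[:-1]    # everything except the final element

print(last_item)
print(second_to_last)
print(last_two)
print(all_but_last)
```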


We must make ourselves aware of one last bit of slicing behavior. When we slice with both a start and end index in Python, it includes everything from the first index up to but not including the last index. This behavior accounts for the fact that Python is zero-indexed, and it enables us to write our slices without having to adjust them with a -1. If we try to index only the last item using len(), Python will throw an error because the last item is technically indexed as len(data) - 1. Instead of using len() to reference the ends of the list, it's better to use negative indexes to be more concise.

ramen[len(ramen)]
>>> IndexError: list index out of range

# Not bad, but not great
ramen[len(ramen)-1]
>>> ['1', 'Westbrae', 'Miso Ramen', 'Pack', 'USA', '0.5', '']

# Better!
ramen[-1]
>>> ['1', 'Westbrae', 'Miso Ramen', 'Pack', 'USA', '0.5', '']

Adding and removing items from a list

We've covered how to use indexing to access the items of a list, so now we'll explore how to add and remove items using various list methods.

A primer on list methods

If you are unsure about what a method is, we'll cover that here. A method is a function that is attached to an object and called through that object. The objects in question are the objects of object-oriented programming. That topic is out of the scope of this article, but the key nugget of knowledge is that everything in Python is an object. That is to say, everything in Python has methods associated with it.

List methods are particular to lists themselves, so you must make sure that when you call the methods we explore below, you're using them with lists. If you try to use them with strings, Python will throw an error at you.
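A quick sketch of that error, using append() (covered in the next section) as the example method. The "Julia" entry and the not_a_list name are made up for illustration.

```python
languages = ["Python", "R", "SAS", "SPSS"]
languages.append("Julia")   # fine: append() is a list method

not_a_list = "Python"
try:
    not_a_list.append("!")  # strings have no append() method
except AttributeError as error:
    error_message = str(error)

print(languages)
print(error_message)
```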

append() and extend()

Let's say that we get a new ramen review. Chronologically, this review comes after the others, so it makes the most sense to add it to the end of our data set. Since it's just a single item we want to add, we'll use the append() method. append() is a list method which takes one argument. The item you pass into append() will be added to the end of the list that you called the method on.

new_review = [
    "2581", "Nongshim", "Pot-au-feu Flavor", "Pack",
    "South Korea", "3.5", ""
]

ramen.append(new_review)

ramen[-1]
>>> ["2581", "Nongshim", "Pot-au-feu Flavor", "Pack", "South Korea", "3.5", ""]

If we have multiple new reviews in the form of a list of lists, using append() will not add these lists correctly. Instead, it will add the whole list of lists as a single last element of the data set. In this case, we'll want to use extend(). extend() is another list method; it takes a list as an argument and adds each element of that list to the list that you called the method on.

new_reviews = [
    ["2582", "Maruchan", "Roast Beef", "Pack",
    "Japan", "2", ""],
    ["2583", "Mom", "Homemade Flavor", "Bowl",
    "USA", "5", ""]
]

ramen.extend(new_reviews)

# Ensure that the last item of ramen is not a list of lists, but just a list
ramen[-1]
>>> ["2583", "Mom", "Homemade Flavor", "Bowl", "USA", "5", ""]
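To see the difference between the two methods side by side, the sketch below runs both on shortened, two-column versions of the reviews. The shortened rows and the appended/extended names are for illustration only.

```python
reviews = [["2580", "New Touch"]]
new_reviews = [["2582", "Maruchan"], ["2583", "Mom"]]

appended = list(reviews)       # shallow copy so both variants start out equal
appended.append(new_reviews)   # adds the whole list of lists as ONE item

extended = list(reviews)
extended.extend(new_reviews)   # adds each inner list as its own item

print(len(appended))   # 2
print(len(extended))   # 3
print(appended[-1])    # a list of lists
print(extended[-1])    # a single review
```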

Both append() and extend() add items to the end of the list, but what if we want to add a new item within the list? Enter insert(). The insert() method takes two arguments in the given order:

  1. The index where you'd like to add the new item
  2. The item you'd like to add

inner_review = [
    "1.5", "Mama", "Shrimp Creamy Tom Yum Flavor", "Pack",
    "Thailand", "4", ""
]

ramen.insert(4, inner_review)
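The effect of insert() is easier to see on a short list. This sketch reuses the numerals list from earlier; the inserted values are arbitrary.

```python
numerals = [1, 2, 3, 4, 5]

numerals.insert(2, 99)   # 99 takes index 2; later items shift right
after_middle_insert = list(numerals)

numerals.insert(0, 0)    # index 0 places the item at the front
print(after_middle_insert)   # [1, 2, 99, 3, 4, 5]
print(numerals)              # [0, 1, 2, 99, 3, 4, 5]
```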

pop(), remove(), the del statement, and clear()

We've learned three methods that allow us to add items to our lists, but sometimes we'll need to remove items. We will discuss the main ways to remove items: pop(), remove(), del, and clear().

pop() functions much like an opposite to append(). pop() is a list method that can optionally be passed a single argument. If we do not pass in anything to pop(), it removes the last element of a list and returns it. Even though pop() returns the element that was removed, it also alters the list itself.

# Check the length of the data set
len(ramen)
>>> 2584

ramen.pop()
>>> ["2583", "Mom", "Homemade Flavor", "Bowl", "USA", "5", ""]

ramen.pop()
>>> ["2582", "Maruchan", "Roast Beef", "Pack", "Japan", "2", ""]

# Check it again to see that the list was in fact altered
len(ramen)
>>> 2582

If we pass an index to pop(), it will remove the element at that index and return the value to you (think the opposite of insert()).

ramen.pop(4)
>>> ["1.5", "Mama", "Shrimp Creamy Tom Yum Flavor", "Pack", "Thailand", "4", ""]

Using pop() is fine, but what if you want to specifically name an item to remove? Luckily, the remove() method takes in an argument and will remove the first instance of the argument from the list you call it from. If the item is not present in the list, then remove() will throw an error. With remove(), you don't need to know the index of the item so long as you know the precise item you'd like to remove.

unneeded_item = [
    "2581", "Nongshim", "Pot-au-feu Flavor", "Pack", 
    "South Korea", "3.5", ""
]
ramen.remove(unneeded_item)

not_present = "Fake Review"
ramen.remove(not_present)
>>> ValueError: list.remove(x): x not in list

Another way we can remove items from our lists is through the del statement. Unlike pop() and remove(), del is not a list method. However, we can use the del statement to delete single or even multiple items from a list at a time. We need to know the indices of the items we want to remove. If we want to remove multiple items using del, we need to pass it a slice.

# Delete the first review from the data set
del ramen[0]

# Delete the third-to-last and second-to-last items from the list
del ramen[-3:-1]
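Because a slice excludes its end index, del ramen[-3:-1] leaves the last item in place. The same two statements on a short list of numbers (chosen only for illustration) make this visible:

```python
numerals = [1, 2, 3, 4, 5, 6]

del numerals[0]       # removes the first item
after_first_del = list(numerals)

del numerals[-3:-1]   # removes the items at -3 and -2, but not -1
print(after_first_del)   # [2, 3, 4, 5, 6]
print(numerals)          # [2, 3, 6]
```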

If we're done with our list and want to empty it out entirely, we can use clear(). clear() deletes all of the elements from the list that you call it on, but leaves an empty list behind. This means that the variable holding the list will still exist, but the list itself will be empty. By contrast, del deletes the whole list and removes the variable from Python's knowledge.

ramen.clear()
ramen
>>> []

# Remove the list entirely
del ramen

# So that if we try to access it again...
ramen
>>> NameError: name 'ramen' is not defined

An incorrect way to remove items

If you want to remove items from your lists, then the methods described should handle all cases. However, some programmers may be tempted to reassign an undesired item to None. Doing this in our ramen context is not ideal because it leaves behind an item that is inconsistently formatted with the rest of the data.

# Incorrectly try to delete an item
ramen[0] = None

# Now instead of actually deleting the first item, it's replaced by None
ramen[:5]
>>> [None, # <- not good!
 ['2579', 'Just Way', 'Noodles Spicy Hot Sesame Spicy Hot Sesame Guan-miao Noodles', 'Pack', 'Taiwan', '1', ''],
 ['2578', 'Nissin', 'Cup Noodles Chicken Vegetable', 'Cup', 'USA', '2.25', ''],
 ['2577', 'Wei Lih', 'GGE Ramen Snack Tomato Flavor', 'Pack', 'Taiwan', '2.75', ''],
 ['2576', "Ching's Secret", 'Singapore Curry', 'Pack', 'India', '3.75', '']]

We've learned how to index items in our lists, and we've learned how to mutate our lists to our liking. These operations form the bulk of the actions you'll need to perform with lists, so we'll start to look at how we can leverage lists to perform common time-consuming tasks much more quickly than we ever could.

When to use a list

Lists and iteration

Lists are useful in situations where you need to iterate over each item and perform some operation on each element. Python provides a simple and intuitive way to iterate through a list: the for loop.

for review in ramen[:5]:
    print(review)
>>> ['2580', 'New Touch', "T's Restaurant Tantanmen ", 'Cup', 'Japan', '3.75', '']
['2579', 'Just Way', 'Noodles Spicy Hot Sesame Spicy Hot Sesame Guan-miao Noodles', 'Pack', 'Taiwan', '1', '']
['2578', 'Nissin', 'Cup Noodles Chicken Vegetable', 'Cup', 'USA', '2.25', '']
['2577', 'Wei Lih', 'GGE Ramen Snack Tomato Flavor', 'Pack', 'Taiwan', '2.75', '']
['2576', "Ching's Secret", 'Singapore Curry', 'Pack', 'India', '3.75', '']

The for loop reads like plain English: for each review in the first 5 elements of ramen, we want to print out what the review looks like. We know that each of the elements in ramen is a review in the form of a list, and that's what we get.

review is the loop variable: it holds the current item in the iteration. As we go through the for loop, review is rebound to the next item in ramen. We can rename review to anything that we'd like, but we chose that name because it accurately describes what each item in ramen is.

With the for loop at our disposal, we can start wrangling our data into more usable forms. One of the most common reasons we'd need to iterate through a list is to capture a particular aspect of each element in the list.

Let's say that we want to have a deeper look at each of the countries represented in our data set. The country in each data row is the 5th element (index 4), so we'll collect them all in their own list by using iteration and the append() method. Afterwards, you can confirm that countries contains only the countries in our data set.


Since we want to store the countries separately in their own list, we need to initialize an empty list with a descriptive variable name. With a list of all the countries on hand, we can start asking some more direct questions of the data. Next, we create a new list, unique_countries, and only add a country if it's not already inside. Knowing that unique_countries is a list, how would you calculate how many unique countries there are?


To arrive at our answer, we performed a similar task to how we gathered all of the countries. First, we created an empty list to store the information we needed. Then, we looped through our countries list and checked if a country was currently in unique_countries. If the country wasn't in the list, it's added to unique_countries. The end result is a list of unique countries in our data set. This pattern is extremely common, and you'll likely find yourself using it frequently.
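The two loops described above can be sketched as follows. To keep the example self-contained, ramen here holds only the five reviews printed earlier rather than the full data set, so the counts apply to this sample only.

```python
# Five reviews copied from the output shown earlier in the article
ramen = [
    ["2580", "New Touch", "T's Restaurant Tantanmen ", "Cup", "Japan", "3.75", ""],
    ["2579", "Just Way", "Noodles Spicy Hot Sesame Spicy Hot Sesame Guan-miao Noodles", "Pack", "Taiwan", "1", ""],
    ["2578", "Nissin", "Cup Noodles Chicken Vegetable", "Cup", "USA", "2.25", ""],
    ["2577", "Wei Lih", "GGE Ramen Snack Tomato Flavor", "Pack", "Taiwan", "2.75", ""],
    ["2576", "Ching's Secret", "Singapore Curry", "Pack", "India", "3.75", ""],
]

# Gather the country (index 4) from every review
countries = []
for review in ramen:
    countries.append(review[4])

# Keep each country only the first time it appears
unique_countries = []
for country in countries:
    if country not in unique_countries:
        unique_countries.append(country)

print(countries)
print(unique_countries)
print(len(unique_countries))   # len() answers "how many are there?"
```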

The combination of iteration and lists is a powerful tool for programmers. We can load in a data set and then make more specific inquiries of our data. Our ramen data set has over 2,500 rows, so it would be cumbersome and inefficient to gather all of the data by hand. The for loop turns this menial task into a trivial one.

Lists and mathematical calculations

Let's say that we want to know about the characteristics of the ramen scores in the data. Each score is the 6th element of its review (index 5), and they are all in string format. We'll have to iterate through the data again to gather all of the scores and convert them to floats. However, there's also an "Unrated" score floating around, so we'll need to be cognizant of this as we wrangle the data. If we don't have a list of all floats or integers, Python won't be able to perform the mathematical calculations we want. If you removed the "Unrated" filter or the float() conversion from this step, what do you think would happen?
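A sketch of the score-gathering loop follows. It uses the five reviews printed earlier plus one clearly hypothetical "Unrated" row, added only to exercise the filter. It also shows why the filter matters: float() raises a ValueError on a non-numeric string.

```python
# Five rows from the earlier output, plus one made-up "Unrated" row
ramen = [
    ["2580", "New Touch", "T's Restaurant Tantanmen ", "Cup", "Japan", "3.75", ""],
    ["2579", "Just Way", "Noodles Spicy Hot Sesame Spicy Hot Sesame Guan-miao Noodles", "Pack", "Taiwan", "1", ""],
    ["2578", "Nissin", "Cup Noodles Chicken Vegetable", "Cup", "USA", "2.25", ""],
    ["2577", "Wei Lih", "GGE Ramen Snack Tomato Flavor", "Pack", "Taiwan", "2.75", ""],
    ["2576", "Ching's Secret", "Singapore Curry", "Pack", "India", "3.75", ""],
    ["0000", "Example Brand", "Example Variety", "Pack", "Nowhere", "Unrated", ""],
]

scores = []
for review in ramen:
    # Skip reviews that carry no numeric score
    if review[5] != "Unrated":
        scores.append(float(review[5]))

# Without the filter, the conversion itself would fail
try:
    float("Unrated")
    unrated_converts = True
except ValueError:
    unrated_converts = False

print(scores)
print(unrated_converts)
```

Dropping the float() call instead would leave a list of strings, and calls such as sum(scores) would then raise a TypeError.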


Now that we have a uniform list of numbers in the correct format, we can take advantage of our lists to quickly answer our mathematical and statistical questions. We'll use our list of scores along with some of Python's built-in functions to ask how our scores are distributed and what their range is. Specifically, we look at the minimum and maximum scores, the average score of the data set, and the standard deviation.
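These calculations can be sketched as below. The scores list holds only the five sample scores from the earlier output, so the numbers it prints are for that sample, not for the full data set. The sketch computes the population standard deviation (dividing by the number of scores), which is an assumption about which variant is intended, and cross-checks it against the statistics module from the standard library.

```python
import statistics

# Sample scores only; the full data set has over 2,500
scores = [3.75, 1.0, 2.25, 2.75, 3.75]

lowest = min(scores)
highest = max(scores)
average = sum(scores) / len(scores)

# Iterate to build the sum of squared differences from the average
squared_diffs = 0
for score in scores:
    squared_diffs += (score - average) ** 2

# Population standard deviation: divide by the number of scores
std_dev = (squared_diffs / len(scores)) ** 0.5

print(lowest, highest)
print(average)
print(std_dev)
print(statistics.pstdev(scores))   # library version, for comparison
```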

Lists save us from the grueling task of having to calculate each of the figures above by hand. Take special note of how we leveraged iteration to calculate the sum needed for the standard deviation. While Python has implemented a standard deviation function (no doubt, more efficiently) in another library, this example still highlights how lists and iteration help us perform more complex mathematical calculations in just a few lines of code.

Thanks to our list of scores, we now know that the review scores range from 0.0 to 5.0 with an average score of 3.65. Theoretically, the standard deviation tells us that 68% of the data falls within one standard deviation of the average, assuming the data falls under a normal distribution.

However, we don't know if this assumption holds, so we'll need to test it. While there are numerical ways to test whether the data falls under a normal distribution, it's often simpler and faster to start by visualizing the data. We can use lists to start visualizing our data and check if it looks normally distributed. If it looks vaguely normally distributed, then we can investigate further, but if not, we save some time by eliminating an invalid assumption. Human beings are visual creatures, so a good visualization of our data can improve our understanding more quickly than any number or calculation can.

Lists and visualization

In order to check the distribution of our data, we need to create a histogram: a count of how many scores fall within each of a set of score ranges, called bins. These bins will cover all possible scores between 0 and 5 and will be evenly spaced (0 to 1, 1 to 2, etc.). We'll do this counting in the code below and use the matplotlib library to create our plot.

import matplotlib.pyplot as plt

# Each element in bins counts the scores that fall within a one-point
# score range (e.g. 0 to <1, 1 to <2)
bin_positions = [1, 2, 3, 4, 5]
bins = [0, 0, 0, 0, 0]
for score in scores:
    if 0 <= score < 1:
        bins[0] += 1
    elif 1 <= score < 2:
        bins[1] += 1
    elif 2 <= score < 3:
        bins[2] += 1
    elif 3 <= score < 4:
        bins[3] += 1
    else:
        # Scores from 4 up to and including 5 land in the last bin
        bins[4] += 1

# This code ensures that the plot will be properly labelled and informative
plt.bar(bin_positions, bins)
plt.xticks(bin_positions, ["0-1", "1-2", "2-3", "3-4", "4-5"])
plt.xlabel("Score Bins")
plt.ylabel("Score Frequency")
plt.title('Distribution of scores in Ramen Review data')
plt.show()


The code starts by initializing a list of zeros to indicate that we haven't started counting the scores for each bin yet. Because we need to see how many times a score falls into a bin, we iterate through the scores and use an if-else structure to make sure that the correct bin is incremented. This ensures that all scores are accounted for in the histogram.

We won't delve too deeply into the matplotlib code, but know that on a basic level, most of it formats the plot so that everything is labeled informatively. We want our visualizations to be understood immediately, or they defeat the purpose of visualization.

The meat of the plot is contained in plt.bar(bin_positions, bins). bin_positions is a list that will simply hold the x-positions for the plot. After the code executes through the if-else structure, bins will contain how many scores fall within each bin. plt.bar will take the two lists and create a bar plot based on the values contained in bin_positions and bins. bin_positions tells plt.bar where the x-positions of the bars should be, while bins describes how high these bars should be (y-positions). The final result is a histogram that reveals that our data is, in fact, not normally distributed. Most of the ramen is highly rated, so we would call this data left-skewed. The skew refers to the few extreme low review scores that push some of the distribution to the left.

Using a quick visualization, we were able to reject our earlier assumption that the scores are normally distributed. Lists come in handy for the visualization of our data because of their ordered nature. plt.bar knows to associate the first value of bin_positions with the first value of bins, and so on. Our number of bins was small in our example, but we can easily adjust the granularity of the plot by changing the width of the bins and adjusting the if-else code accordingly.
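As an alternative to rewriting the if-else chain every time the bins change, the bin index can be computed from the score. This is a different technique from the code above, offered only as a sketch: bin_width and num_bins are hypothetical names, the sample scores are made up, and it assumes every score lies between 0 and 5.

```python
# Made-up sample scores between 0 and 5
scores = [3.75, 1.0, 2.25, 2.75, 3.75, 5.0, 0.5]

bin_width = 1.0   # width of each score range
num_bins = 5      # 0-1, 1-2, 2-3, 3-4, 4-5

bins = [0] * num_bins
for score in scores:
    # Floor-divide to find the bin; min() keeps a score of exactly 5
    # in the last bin instead of running past the end of the list
    index = min(int(score // bin_width), num_bins - 1)
    bins[index] += 1

print(bins)
```

Changing the granularity then means adjusting bin_width and num_bins together (for example 0.5 and 10) rather than editing each branch.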

Lists and flexibility

The last application of lists we'll discuss relates to their flexibility. Recall that each of the rows in the data set was conveniently representable by a list, with each column represented as an element in the list. We could also represent the data set in its entirety by encapsulating all of these lists in one large outer list, where each element is itself a list. Python lists are extremely flexible because you are able to store anything you'd like in a list, whether it's a string, a float, or a dictionary.

A list of lists is reminiscent of a matrix, and these are ubiquitous in real life. All the images that are rendered on your screen can effectively be represented by a list of lists, with each element in the inner list being the color value at that precise location. Taken altogether, these individual elements store all of the information needed to convey an image on your computer!
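As a toy illustration of the idea, the sketch below treats a list of lists as a tiny grayscale image. The pixel values are invented, and real images are far larger and usually stored in specialized structures.

```python
# Hypothetical 3x3 grayscale "image": each inner list is one row of
# pixel brightness values (0 = black, 255 = white)
image = [
    [0, 128, 255],
    [64, 192, 32],
    [255, 0, 16],
]

height = len(image)      # number of rows
width = len(image[0])    # number of values in a row

pixel = image[1][2]      # row index 1, then column index 2
image[0][0] = 255        # inner lists are mutable too

print(height, width)
print(pixel)
print(image[0])
```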

We've demonstrated that lists are essential for data wrangling and analysis, thanks to their inherent structure and the power of iteration. These examples cover many of the basic uses of lists, but you may find other uses for them.

Advanced topic: list comprehensions

We've developed a common workflow to follow if we need to wrangle some specific information from our data set. First, we create an empty list to store the values that we need from the data set. Second, we create a for loop to start the process of iterating through our data set. Finally, we perform some operation on each item in the list and store the result of this operation in our empty list. The example below stores all of the different brands seen in the ramen data set. We've seen this pattern many times before in the article.

brands = []
for review in ramen:
    brands.append(review[1])

The above code is good for a beginner programmer because it's easy to read and instantly understand. While there's nothing inherently wrong with this structure, Python provides a more concise way to do this exact operation.

This more concise way is called the list comprehension. A list comprehension is a list that is created from a basis list, using some aspect of the basis list. The word comprehension doesn't come from the word for understanding; it comes from set comprehension in set theory. A comprehension can be thought of as defining the property that each member of the list must satisfy.

Despite its complicated-sounding name, the structure of a list comprehension is easy to pick up. The one-liner brands_lc = [review[1] for review in ramen] recreates the iteration from the code above as a list comprehension. You can confirm that both methods produce the same output.


The three lines we needed with the for-in loop can be reduced to a one-liner with a list comprehension! It looks remarkably similar to the for-in loop code, but there's no need to create an empty list beforehand. The review variable here serves the same placeholder purpose of the review in the for-in loop code. Similarly, we can easily replace review with any other name, and it will still perform the same task. The review[1] portion is what defines the property or what each of the elements of brands_lc should be defined by based on the elements of ramen (our basis list).
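The comparison can be sketched on a shortened, two-column sample of the data. The review numbers and brands come from the earlier output; the trimming is for illustration.

```python
# Shortened sample rows: review number and brand only
ramen = [
    ["2580", "New Touch"],
    ["2579", "Just Way"],
    ["2578", "Nissin"],
]

# The for loop version
brands = []
for review in ramen:
    brands.append(review[1])

# The list comprehension version
brands_lc = [review[1] for review in ramen]

print(brands_lc)
print(brands == brands_lc)
```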


Another powerful aspect of list comprehensions is that we can incorporate Boolean logic straight into them. Recall the collection of the ramen scores. We needed to filter out any unrated reviews, so we incorporated that logic into the for loop. We can collect all of the scores again using a list comprehension one-liner.

scores_lc = [float(review[5]) for review in ramen if review[5] != "Unrated"]

# Confirming that the list comprehension performs the same operation
scores == scores_lc
>>> True

List comprehensions are a powerful asset to you as a Python programmer. They provide the looping and filtering we've previously described in one line of code. This brevity comes at the cost of readability, since they are not as intuitive on sight as the for loop versions. Be aware that as you incorporate more complicated properties and more convoluted logic, your list comprehensions will become harder to read and harder to debug. Keep this in mind as you begin to include list comprehensions in your everyday code.

Downsides of lists

We've described some common uses for lists and worked with a Kaggle data set to demonstrate them. Lists are great data structures in many situations, but there are also situations where using a list is unwise or not optimal. Each data structure has its own advantages and disadvantages, and the list is no different.

One of the first disadvantages of lists comes from how they are implemented in Python. A list must store a pointer to each Python object it contains, in addition to the memory used by the objects themselves. Small lists are no problem for computers, but as your lists grow to millions of items, they can take up large amounts of your computer's memory. The end result can be a significant slowdown of your computer's ability to perform its operations.

Many of the disadvantages of lists relate to the issue of large scale. When you store data in a data structure, you'll have to access that information again in the future. Because lists are ordered as a sequence, the only ways to find information in a list are to use indexing or to check each item in turn.

Recall that the remove() method looks for a particular element of a list and deletes the first item it encounters that matches. With our ramen data set, the use of remove() seems instant because the data set is relatively small and our computers can traverse the list quickly. But imagine a situation where you'd need to remove an element from a list with millions of items. remove() will start with the first item and check if it matches the argument you gave it. It will repeat this until it either finds a match or throws an exception after going through every element. In the worst-case scenario, the item you want to remove is located at the end of the list or is not in the list at all! You may wait a noticeable amount of time for a single remove() call to finish, and if you need to perform that operation many times, the waiting will quickly build up.

This worst-case scenario applies to any other operation that requires you to look through many or all of the elements of a list: inserting a new item in the middle of the list, creating a list comprehension, or checking if an item is a member of the list at all. As your data sets increase in size, so will the amount of time these common list operations will take.

In these situations, it is important to assess whether the characteristics of a list (ordered and mutable) fit your needs as a programmer. If you need to check if a certain term or key is in your data, your data may better be stored in a dictionary. If you need to perform some functions on some key characteristics of your data, then perhaps objects may be better. We've described many reasons to use a list, but lists ultimately are only one tool in a set of many that you'll obtain as you progress as a programmer.
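As a rough sketch of the dictionary alternative, the example below indexes three sample reviews by their review number. The reviews_by_number name and the choice of key are assumptions made for illustration.

```python
# Three reviews copied from the earlier output
ramen = [
    ["2580", "New Touch", "T's Restaurant Tantanmen ", "Cup", "Japan", "3.75", ""],
    ["2579", "Just Way", "Noodles Spicy Hot Sesame Spicy Hot Sesame Guan-miao Noodles", "Pack", "Taiwan", "1", ""],
    ["2578", "Nissin", "Cup Noodles Chicken Vegetable", "Cup", "USA", "2.25", ""],
]

# List: scan item by item until a match is found
found_in_list = False
for review in ramen:
    if review[0] == "2578":
        found_in_list = True
        break

# Dictionary: build it once, keyed by review number...
reviews_by_number = {}
for review in ramen:
    reviews_by_number[review[0]] = review

# ...then look keys up directly instead of scanning every item
found_in_dict = "2578" in reviews_by_number
nissin_review = reviews_by_number["2578"]

print(found_in_list, found_in_dict)
print(nissin_review[1])
```

Building the dictionary still takes one pass over the list, so the payoff comes when the same data is searched many times.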

Conclusion

This tutorial covered how to access items from a list and how to add and remove items, common uses for lists from a data wrangling and analysis perspective, an advanced use of the list, and finally some areas where lists aren't useful or may be problematic. We covered a lot, but since lists are so ubiquitous, it's important to be familiar with the most common use cases. In my own experience, I've found a lot of use in understanding list comprehensions, and they've definitely increased my productivity as a data analyst.

Although we've covered many of the basic functions associated with lists, there are still plenty that we haven't covered. If you're interested in learning more, you can consult Python's own documentation on lists. List comprehensions also have much more depth than was covered in this tutorial, but you can read more about them here. I hope that you've learned a lot and can now use lists to your advantage with your own data sets!