Object-oriented programming (OOP) is a programming paradigm, which is just a fancy word for a particular way of organizing and thinking about your code. As its name suggests, OOP would like us to model our code in the form of objects.
As a programming term, an object is code that bundles data and functionality that are related to each other. You can think of data as the storage of information, like how you can store a string or a number in a variable. Functionality is the use of functions to perform tasks. Often, an object's functions will also use the data that the object has.
The idea of bundling data and functionality together is contrasted against having them stand alone. The example below contrasts non-OOP programming with OOP. In Python, we can create objects using classes. These classes represent Python's "code bundles," and they indicate to Python that the variables and functions we provide it are related and should be grouped together. You don't have to entirely understand the code below (we'll go in-depth later), but the main point to grasp is that a class enables us to group variables and functions together.
    # Non-OOP approach to describing a singer
    singer_name = "Emi"
    singer_albums = 5

    def sing(line):
        print(line)

    # OOP approach to describing a singer
    class Singer:
        name = "Emi"
        albums = 5

        def sing(self, line):
            print(line)
Both blocks of code store similar information and functions, but the crucial difference is their organization. The OOP code groups the variables and function in a bundle using the class statement. Here, the name and albums variables are grouped up with the sing function within the Singer block. On a higher level, the class block also helps to define the concept of a singer for yourself as a programmer. We know that singers in real life have names and albums with their music, so we can try to emulate these facts in our code. In contrast, the non-OOP code simply creates descriptive variables and a sole function.
You may wonder, "We've been talking all about object-oriented programming, so why are we using the class keyword? Why isn't it called object?"
Python's classy approach to OOP
As we alluded to earlier, Python is a class-based language, meaning that objects are created from classes. You can think of classes as blueprints for describing an object's attributes and behavior.
This distinction is slight, but worth mentioning again. You will often hear the terms class and object used interchangeably. Technically speaking, classes are the actual instructions for creating objects, while objects are instances, or distinct realizations, of these instructions. Python's approach to OOP is through classes, and classes are what you'll actually be coding.
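To make the class/object distinction concrete, here is a minimal sketch. The Dog class is a hypothetical stand-in used only for this illustration; it is not part of the running example.

```python
class Dog:
    """A hypothetical, empty class used only to contrast class and object."""
    pass


# The class is the blueprint; each call to it builds a separate object.
rex = Dog()
fido = Dog()

print(type(rex) is Dog)  # True: rex is an instance of the Dog class
print(rex is fido)       # False: two distinct objects from one blueprint
```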
All this talk has been theoretical so far, but classes are best learned through example. Let's say we want to represent a programmer in terms of Python code. Instead of just giving descriptive names to variables and functions, we'll create a class that will contain all of our programmer information.
We can start with the bare minimum code needed to define a class. Class names are conventionally started with a capital letter. You can name classes otherwise, but you run the risk of confusing your fellow programmers.
    class Programmer:
        """Represents a programming student and their current skills."""
        pass
There are three things to notice in the above block of code:
- The class statement and the name of the class
- A docstring surrounded by triple quotes
- The pass statement
The class statement is how we start creating classes. Everything that is indented below the class statement will be included in the class block. Immediately after the class keyword, you give the name of your class. In this case, we've called our class Programmer since we'd like to model real-life programmers.
The documentation string or docstring is a description in your code of what the class is supposed to represent. You don't need to include a docstring in your classes, but it is good practice to do so, since it documents what your code does for future reference. To be recognized as a docstring, the string must be the first statement in the class body, immediately after the class statement. You can see what your docstring is using the __doc__ attribute.
    >>> print(Programmer.__doc__)
    Represents a programming student and their current skills.
Docstrings can contain much more complex information, such as how to use the class or explanations of what should be put into the class. However, our current class is simple, so a simple one-liner will suffice. If you'd like more information on docstrings and their uses, have a look at the official documentation.
Finally, we have the pass keyword. The pass statement is what we use to create a null operation, which is an operation that does nothing. Absolutely nothing. After we create our Programmer class and give it a helpful docstring, those are really the only things in the class since pass does nothing. A class block cannot be left completely empty: with nothing indented beneath the class statement, Python will throw an IndentationError since it is expecting code to populate the class. (Strictly speaking, the docstring already counts as a statement, so pass is optional here, but it makes the emptiness explicit.) The ultimate result of our Programmer class block is that we can create Programmer objects, but they have no variables or functions we can use.
Remember that OOP wants us to bundle code together in a manner that is understandable to us as programmers. Our Programmer objects currently don't do much, but we'll continue to add code that lets us describe them with programmer-esque attributes.
Properties: how we describe objects
In OOP-speak, we need to add properties to the Programmer class. Properties are the data in the "data and functionality bundle" definition of an object. Naturally, we'd describe programmers by what kinds of languages they know and how many years of experience they have, so we can recreate these ideas as properties in the Programmer class. In order to create these properties, we'll name and set variables within a special function called __init__() (more on this soon). The __init__() function will take in arguments and assign them to whatever properties you specify. Just as a heads up, it's okay if you don't understand all of the terms used below; we'll go into careful detail on everything soon.
    class Programmer:
        """Represents a programming student and their current skills."""

        def __init__(self, name, langs, years):
            self.name = name
            self.langs = langs
            self.years = years
This is the same Programmer class as before, but with some major revisions:
- We've removed the pass statement, since our classes need to start describing programmers.
- We've created an __init__() function that takes in 4 arguments.
- We've assigned the arguments to some strange variables starting with self.
The ideas of __init__() and self are historically difficult topics for starting Python programmers, so we'll break them down and describe how they are related to our Programmer class properties.
The __init__() function: assigning properties to the object
At the most basic level, __init__() is a function. Like other functions, you give __init__() some arguments and it will do something with those arguments. What makes __init__() special is that it is run immediately when you create an object. This function is important because it enables us to establish what our object properties are when the object is created. __init__() constructs or builds our objects. Every time we create an object, __init__() is called.
In our current __init__() function, we pass in four arguments. Three of these arguments (name, langs and years) are properties that we want to assign to each of our Programmer objects. The remaining one, self, is a special argument that requires much more explanation.
Self: how objects refer to themselves
Objects must be able to access the properties that are assigned to them and not the properties of another object. To do so, objects have a concept of the self. The concept of a self sounds complicated, but it has a simple real-world example.
You, the reader, are a human being with your own name, desires and ambitions. You distinguish yourself from other human beings. You are yourself. Objects also have this idea of a "self," recreated in the self variable. The self argument is special because it refers to the object itself.
We'll look at our Programmer block again with our newfound knowledge. We now know that when we actually create an object, the __init__() function helps us initialize object properties. __init__() initializes properties by taking in arguments and assigning them to attributes of self. This ensures that when we create an object with its own properties, it is only that object that gets those properties. When we create another object, we have to give it its own properties as well. Not having a self variable here would mean that there is no reference to the object itself when it is actually created. Thus, without self, we cannot properly give each object its own properties. If we cannot create individual objects with their own unique properties, we aren't fully taking advantage of the benefits of OOP.
    class Programmer:
        """Represents a programming student and their current skills."""

        def __init__(self, name, langs, years):
            self.name = name
            self.langs = langs
            self.years = years
Object properties are created and accessed via dot notation. We have three properties (name, langs and years), which are assigned arguments of the same name. self.name is how the name property is stored in the object, and we are assigning it the name argument that is passed into __init__(). The same assignment happens with langs and years. Because self is a reference to the object itself, Python supplies it automatically, so it is not needed when we actually create an object. We will create two Programmer objects and store them in the variables chris and miho to demonstrate this point.
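A minimal sketch of what creating these two objects could look like. The language lists and years of experience are assumed values chosen purely for illustration.

```python
class Programmer:
    """Represents a programming student and their current skills."""

    def __init__(self, name, langs, years):
        self.name = name
        self.langs = langs
        self.years = years


# Python fills in self for us; we pass only name, langs and years.
# The specific languages and years below are illustrative assumptions.
chris = Programmer("Chris", ["Python", "Bash", "SQL"], 2)
miho = Programmer("Miho", ["Python", "R"], 3)
```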
Notice that creating an object is similar to passing in arguments to a function. The three values we passed in will be given to the __init__() function, where it will assign these values to the properties of self. We've created two objects, but self makes sure that the properties we give to chris are only given to chris and that the miho properties are only in miho.
When we create objects, we must pass in the properties in the order that they appear in __init__(). In our example above, name is the second argument in __init__() and it is assigned to the self.name property. Since we do not include self in object creation, the name argument will correspond to the first value given to the Programmer objects. "Chris" corresponds to the name argument. The __init__() function also dictates how many arguments it must be given to correctly create an object. If we try to give Programmer objects anything other than 3 arguments, Python will throw a TypeError and demand you give it the correct number.
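The argument-count check can be seen in a short sketch. The exact wording of the error message varies between Python versions, so the code only prints whatever message Python produces.

```python
class Programmer:
    """Represents a programming student and their current skills."""

    def __init__(self, name, langs, years):
        self.name = name
        self.langs = langs
        self.years = years


try:
    # Only two values are supplied, so the years argument is missing.
    Programmer("Chris", ["Python"])
except TypeError as error:
    message = str(error)
    print(message)
```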
Finally, we can discuss how to access the object properties within chris and miho. Since chris is a variable that stores the object, we can use dot notation with this variable to look at the properties we assigned.
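As a sketch of dot notation in use, the snippet below rebuilds chris with assumed example values (the languages and years are illustrative, not taken from the original listing) and reads each property back.

```python
class Programmer:
    """Represents a programming student and their current skills."""

    def __init__(self, name, langs, years):
        self.name = name
        self.langs = langs
        self.years = years


chris = Programmer("Chris", ["Python", "Bash", "SQL"], 2)

# variable.property reads the value stored on that particular object.
print(chris.name)   # Chris
print(chris.langs)  # ['Python', 'Bash', 'SQL']
print(chris.years)  # 2
```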
__init__() and the idea of self take time to fully internalize and understand, but they are critical to understanding how we code our classes and create our objects. Feel free to reread these sections and consult multiple sources to solidify your grasp of these concepts.
Methods: codifying object behavior
So far, we've been referring to __init__() as a function, but it is more correct to call it a method. The two terms are often used interchangeably, but the term "method" refers specifically to a function defined within a class. Methods are the functionality in the "data and functionality bundle" definition of an object. We would like to give our Programmer objects some capabilities, so we'll add in some methods.
    class Programmer:
        """Represents a programming student and their current skills."""

        def __init__(self, name, langs, years):
            self.name = name
            self.langs = langs
            self.years = years

        def number_of_langs(self):
            return len(self.langs)

        def add_lang(self, new_lang):
            self.langs.append(new_lang)
We've added two new methods to the Programmer class. The number_of_langs method performs the task of counting the number of items in the langs property, while add_lang takes in one argument and adds it to langs.
The concept of self returns in our introduction to methods. Remember that objects are bundles of data and functionality. We'd like to bundle them together because there's usually some relationship between the two. With number_of_langs, we don't need to pass it anything; we are just creating a method that enables an object to describe how many languages it knows. The implementation of the method is simple, but it is only through self that we can allow objects to describe their individual selves.
    >>> chris.number_of_langs()
    3
    >>> miho.number_of_langs()
    2
As with __init__(), we can pass in arguments to our methods if we'd like to use them. add_lang takes in an argument and adds it to the langs property. Because this method alters an object property, we must use self to make sure that we refer to the object itself.
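A short sketch of add_lang in action. The starting languages for miho are assumed values for illustration.

```python
class Programmer:
    """Represents a programming student and their current skills."""

    def __init__(self, name, langs, years):
        self.name = name
        self.langs = langs
        self.years = years

    def number_of_langs(self):
        return len(self.langs)

    def add_lang(self, new_lang):
        # self.langs is this object's own list, so only this object changes.
        self.langs.append(new_lang)


miho = Programmer("Miho", ["Python", "R"], 3)
miho.add_lang("SQL")

print(miho.langs)              # ['Python', 'R', 'SQL']
print(miho.number_of_langs())  # 3
```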
The methods we created are simple implementations. Currently, number_of_langs just uses the built-in len function. However, as our class code gets more complicated, we may want to change how number_of_langs is implemented, but still keep the method name. Keeping the method name has time-saving benefits. If you've been using the number_of_langs method multiple times throughout a script, changing the method name would mean looking for every single use of the method and adjusting it. If we only change its implementation, we can rest assured that our code will still work.
Let's say that we want to change number_of_langs to count only certain languages instead of including everything that a Programmer knows. All we need to do is change the implementation of number_of_langs.
    # New number_of_langs method
    class Programmer:
        ...

        def number_of_langs(self):
            related_langs = 0
            for language in self.langs:
                if language not in ["HTML", "CSS", "Swift", "Visual Basic"]:
                    related_langs += 1
            return related_langs
The key takeaway here is that we can implement any kind of code within our methods and keep them hidden. If we'd like to use this bit of code, we only need to use the method itself and not have to worry about the implementation. This idea of hiding the underlying implementation of your objects is referred to as abstraction. Abstraction is powerful because it allows us to hide complicated implementations behind methods in our objects and use these methods without having to keep referring back to the object properties. As you start creating your own objects and methods, the time-saving benefits of abstraction will make themselves quickly known.
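Here is a small sketch of that benefit. The describe helper is a hypothetical piece of calling code, not something from the original class; it depends only on the method name, so it keeps working no matter how number_of_langs counts internally.

```python
class Programmer:
    """Represents a programming student and their current skills."""

    def __init__(self, name, langs, years):
        self.name = name
        self.langs = langs
        self.years = years

    def number_of_langs(self):
        # Implementation detail hidden from callers: skip excluded languages.
        excluded = {"HTML", "CSS", "Swift", "Visual Basic"}
        return sum(1 for language in self.langs if language not in excluded)


def describe(programmer):
    """Hypothetical caller that relies only on the method's name."""
    return f"{programmer.name} knows {programmer.number_of_langs()} languages"


chris = Programmer("Chris", ["Python", "HTML", "SQL"], 2)
print(describe(chris))  # Chris knows 2 languages
```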
Properties II: class vs instance
Our Programmer class currently has only one type of property: an instance property. Instance properties are particular to each instance of a class, just as "Chris" was particular to only chris. These properties are created at the time of instantiation with the __init__() method. self makes noticing instance properties easy: anything attached to self is a property that belongs only to that object.
However, there is another important type of property: the class property. Instance and class properties are similar in most aspects with one important difference: class properties are shared across all instances of the class. We'll give our Programmer class a class property to demonstrate this.
    class Programmer:
        """Represents a programming student and their current skills."""
        favorite_language = "Python"

        def __init__(self, name, langs, years):
            self.name = name
            self.langs = langs
            self.years = years

        def number_of_langs(self):
            return len(self.langs)

        def add_lang(self, new_lang):
            self.langs.append(new_lang)
Creating favorite_language was as simple as creating a regular variable within the class block. Notice that the class property does not contain a self reference. This should make sense, since self marks the attributes that belong to individual objects, while a class property belongs to the class as a whole. We can confirm below that all Programmer objects share a love for Python.
    >>> chris.favorite_language
    'Python'
    >>> chris.favorite_language == miho.favorite_language
    True
Class properties are important because you may find that some properties are better characterized as being shared by all members of a class. Instead of having to initialize every object with the same value in __init__(), we can just create a class property to ensure that it is shared between all objects of that class.
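The sharing behavior, and one subtlety of it, can be sketched as follows. The constructor here is simplified to a single name argument as an assumption to keep the example short, and the replacement language values are arbitrary.

```python
class Programmer:
    favorite_language = "Python"  # class property, shared by every instance

    def __init__(self, name):
        self.name = name  # instance property, unique to each object


chris = Programmer("Chris")
miho = Programmer("Miho")

# Changing the property on the class is visible from every instance.
Programmer.favorite_language = "R"
print(chris.favorite_language, miho.favorite_language)  # R R

# Assigning through an instance creates an instance property that
# shadows the class property for that one object only.
chris.favorite_language = "Julia"
print(chris.favorite_language, miho.favorite_language)  # Julia R
```

Assigning through the class name is usually the safer habit when the intent is to change the value for every object.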
Now that we've covered properties and methods, you can start creating your own objects. However, no OOP article would be complete without reference to another powerful concept: inheritance between objects.
Inheritance: from one unto another
We've given ourselves a solid foundation in OOP with properties and methods. Just as an heir inherits an estate, we can program our classes to inherit properties and methods from other classes. Inheritance allows us to add new features and behavior to our objects while allowing us to reuse our older code.
Let's say we want to model two different jobs that programmers might be promoted into over their careers. Both of these jobs will still require programming skills, but will also need their own distinct attributes and behavior. Instead of writing new classes outright, we'll inherit from the Programmer class and save ourselves some time.
    class SoftwareEngineer(Programmer):
        """Creates software for consumption."""
        motto = "Software is the best thing ever!"

        def __init__(self, name, langs, years, ide):
            Programmer.__init__(self, name, langs, years)
            self.editor = ide

        def create_software(self):
            if "Python" in self.langs:
                print("I made this 'Hello World!' program for you!")
            else:
                print("I shouldn't have been promoted yet!")

    class DataEngineer(Programmer):
        """Creates data pipelines for use by businesses."""
        motto = "Data pipelines are the best thing ever!"

        def __init__(self, name, langs, years, db):
            Programmer.__init__(self, name, langs, years)
            self.database = db

        def create_database(self):
            if self.database == "SQLite":
                print("Yes, I can create a database for you.")
            else:
                print("Hold on, let me look at the documentation first...")
The SoftwareEngineer and DataEngineer classes inherit from our old Programmer class. Because these two classes inherit from Programmer, we refer to them as subclasses, or child classes, of Programmer. Programmer itself is the superclass or parent class.
As we've done for the Programmer class, we've given each subclass its own class property and instance property in addition to a unique method. Even though you can't see the Programmer class code here, you can still access all of its properties and methods thanks to inheritance!
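A brief sketch of inherited access, using a pared-down Programmer and SoftwareEngineer. The programmer "Dana" and her attribute values are hypothetical.

```python
class Programmer:
    """Represents a programming student and their current skills."""

    def __init__(self, name, langs, years):
        self.name = name
        self.langs = langs
        self.years = years

    def number_of_langs(self):
        return len(self.langs)


class SoftwareEngineer(Programmer):
    """Creates software for consumption."""
    motto = "Software is the best thing ever!"

    def __init__(self, name, langs, years, ide):
        Programmer.__init__(self, name, langs, years)
        self.editor = ide


dana = SoftwareEngineer("Dana", ["Python", "Go"], 5, "Vim")

print(dana.number_of_langs())        # 2, a method defined only on Programmer
print(dana.editor)                   # Vim, a property set by the subclass
print(isinstance(dana, Programmer))  # True: a subclass instance is also a Programmer
```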
Take note of the structure of the child classes' __init__() methods: each subclass calls upon its parent's __init__() method to initialize the parent properties. Furthermore, any properties that aren't set by the superclass will be initialized by the subclass.
SoftwareEngineer and DataEngineer both inherit from just one class, but what if we'd like to inherit from multiple classes? Python supports multiple inheritance as a simple extension of regular inheritance. The code below will create a new SuperSoftwareEngineer class that will inherit from the SoftwareEngineer class and a new Superhero class to gain both of their skill sets.
    class Superhero:
        """Your everyday superhero"""

        def __init__(self, power):
            self.superpower = power

    class SuperSoftwareEngineer(SoftwareEngineer, Superhero):
        """Not your average software engineer"""

        def __init__(self, name, langs, years, ide, power):
            SoftwareEngineer.__init__(self, name, langs, years, ide)
            Superhero.__init__(self, power)

        def number_of_langs(self):
            if self.superpower == "Super fluency":
                print("I know all languages!")
            else:
                return len(self.langs)
Notice that the __init__() method of our SuperSoftwareEngineer class simply calls the __init__() methods of both its parents. However, you'll need to give a SuperSoftwareEngineer all of the elements required by both parents to correctly initialize this class.
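The sketch below creates one such object and prints the order in which Python searches the classes for properties and methods (the method resolution order, available as the __mro__ attribute). The classes are pared down, and "Sam" and the attribute values are hypothetical.

```python
class Programmer:
    def __init__(self, name, langs, years):
        self.name = name
        self.langs = langs
        self.years = years


class SoftwareEngineer(Programmer):
    def __init__(self, name, langs, years, ide):
        Programmer.__init__(self, name, langs, years)
        self.editor = ide


class Superhero:
    def __init__(self, power):
        self.superpower = power


class SuperSoftwareEngineer(SoftwareEngineer, Superhero):
    def __init__(self, name, langs, years, ide, power):
        # Each parent initializes the properties it is responsible for.
        SoftwareEngineer.__init__(self, name, langs, years, ide)
        Superhero.__init__(self, power)


sam = SuperSoftwareEngineer("Sam", ["Python"], 10, "Emacs", "Flight")
print(sam.name, sam.editor, sam.superpower)  # Sam Emacs Flight

# The subclass comes first, then its parents from left to right.
lookup_order = [cls.__name__ for cls in SuperSoftwareEngineer.__mro__]
print(lookup_order)
# ['SuperSoftwareEngineer', 'SoftwareEngineer', 'Programmer', 'Superhero', 'object']
```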
We've also rewritten an older method to account for the fact that SuperSoftwareEngineer needs a more complex implementation of number_of_langs. Overriding parental methods allows us to change the behavior of our child classes. You only need to rewrite the method in the child class using the same method name from the parent. For SuperSoftwareEngineer, Python will first look within the subclass for the most updated version of number_of_langs, and only falls back to the parent classes if it isn't found there.
Instead of having to rewrite code for each new class, we've leveraged OOP concepts to build upon more basic functionality and add other features to our classes. We've learned a lot of OOP, so let's bring it all together in a small project!
Project: finding the best candies
You are a budding data analyst in a bit of a predicament. It is Halloween in a few days, and you have no idea which candies to purchase. You want to be known as the best house in the neighborhood for candy opportunities, and you don't want to be on the "trick" end of a trick-or-treat.
Thankfully, we have data on our side! The FiveThirtyEight data set contains data on different candies, their candy-related attributes (such as if the candy is a chocolate or is fruity) and their relative popularity. With this data, we'll be able to pick out candies by their characteristics and make data-driven decisions on what to buy.
We will be using the csv library to read in the data set. The data is in the form of a CSV, or comma-separated values, file. As the name suggests, columns are separated by commas, and rows are separated by line breaks. We'll use a reader from the csv library to iterate over each line in the CSV file and bring it into a variable we can operate on.
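A minimal sketch of reading the data with csv.reader. To keep it self-contained, an in-memory io.StringIO stands in for the real file and holds only the header and first data row quoted in this article; with the actual data you would pass an opened file instead (the file name in the comment is an assumption).

```python
import csv
import io

# Stand-in for the real file: header plus the first data row only.
sample_csv = io.StringIO(
    "competitorname,chocolate,fruity,caramel,peanutyalmondy,nougat,"
    "crispedricewafer,hard,bar,pluribus,sugarpercent,pricepercent,winpercent\n"
    "100 Grand,1,0,1,0,0,1,0,1,0,.73199999,.86000001,66.971725\n"
)

# With the real data this would be something like:
#     with open("candy-data.csv", newline="") as f:   # assumed file name
#         candies = list(csv.reader(f))
reader = csv.reader(sample_csv)
candies = list(reader)  # a list of rows, each row a list of strings

print(candies[0])  # the column names
print(candies[1])  # the first data row
```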
Let's have a look at the column names and the first data row to see what we're dealing with in terms of data.
    # Column names
    >>> candies[0]
    ['competitorname', 'chocolate', 'fruity', 'caramel', 'peanutyalmondy', 'nougat', 'crispedricewafer', 'hard', 'bar', 'pluribus', 'sugarpercent', 'pricepercent', 'winpercent']

    # First data row of the candies data set
    >>> candies[1]
    ['100 Grand', '1', '0', '1', '0', '0', '1', '0', '1', '0', '.73199999', '.86000001', '66.971725']
Looks like most of the columns in this data set are Boolean values, taking either a 1 for True or a 0 for False. sugarpercent and pricepercent indicate percentiles, and winpercent is how often the candy was chosen over other competitors. We must remember that all the values are strings, so we'll need to format them correctly.
We could continue our analysis with the data organized in this way, but this list of lists is difficult to use. We'd have to remember which columns represented what on top of remembering which values to use for calculations. Let's try again and give the data better structure and readability for our sanity.
We know that each row in the CSV represents a candy, so let's take advantage of this mental representation. We'll make a Candy class to store the raw data in appropriately named properties. By doing so, we can preserve the data in the form in which we received it. Then, we can leverage these properties to create some helper methods to add more readability to our object.
    class Candy:
        """Represents a candy, its taste attributes and popularity."""

        def __init__(self, name, choco, fruit, cara, pean, noug, crisp,
                     hard, bar, plur, sug, price, win):
            self.name = name
            self.chocolate_val = choco
            self.fruit_val = fruit
            self.caramel_val = cara
            self.peanut_almond_val = pean
            self.nougat_val = noug
            self.crisped_rice_wafer_val = crisp
            self.hard_val = hard
            self.bar_val = bar
            self.pluribus_val = plur
            self.sugar_percentile = float(sug)
            self.price_percentile = float(price)
            self.win_percentage = float(win)

        # Helper method that will indicate True or False based on the property
        def is_chocolatey(self):
            return self.chocolate_val == "1"

        # Repeat this for all the attributes...

        def is_pluribus(self):
            return self.pluribus_val == "1"
All we've added are some simple True-False methods to our class to make it easier to ask each Candy about its characteristics. Sure, we could have packed all of this logic into __init__(), but our point here was to keep the original data values and enhance usability at the same time.
Now we'll reorganize our data using our Candy objects.
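One way this reorganization could look is sketched below. The candies list mimics what the csv reader yields and contains only the header and the first data row quoted earlier; the candy_objects name and the use of argument unpacking (Candy(*row)) are choices made for this sketch.

```python
class Candy:
    """Represents a candy, its taste attributes and popularity."""

    def __init__(self, name, choco, fruit, cara, pean, noug, crisp,
                 hard, bar, plur, sug, price, win):
        self.name = name
        self.chocolate_val = choco
        self.fruit_val = fruit
        self.caramel_val = cara
        self.peanut_almond_val = pean
        self.nougat_val = noug
        self.crisped_rice_wafer_val = crisp
        self.hard_val = hard
        self.bar_val = bar
        self.pluribus_val = plur
        self.sugar_percentile = float(sug)
        self.price_percentile = float(price)
        self.win_percentage = float(win)

    def is_chocolatey(self):
        return self.chocolate_val == "1"


# Rows as the csv reader would yield them: a header row, then data rows.
candies = [
    ["competitorname", "chocolate", "fruity", "caramel", "peanutyalmondy",
     "nougat", "crispedricewafer", "hard", "bar", "pluribus",
     "sugarpercent", "pricepercent", "winpercent"],
    ["100 Grand", "1", "0", "1", "0", "0", "1", "0", "1", "0",
     ".73199999", ".86000001", "66.971725"],
]

candy_objects = []
for row in candies[1:]:  # skip the header row
    # Unpack the 13 columns into the constructor in column order.
    candy_objects.append(Candy(*row))

first = candy_objects[0]
print(first.name, first.is_chocolatey(), first.win_percentage)
# 100 Grand True 66.971725
```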
By listing the properties in the class in the order they appear in the data, we only have to remember the order of the candy attributes once. Iteration will do the rest of the work for us. We can start asking our Candy objects about what might be the best candy to buy based on different preferences. For example, what if we'd like to know what the best fruity candy was?
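One way to answer that kind of question is a loop that keeps the fruity candy with the highest win percentage seen so far. The sketch below uses a trimmed-down stand-in for Candy and three placeholder rows with made-up values, so its output says nothing about the real data set.

```python
class Candy:
    """Trimmed-down stand-in for the full Candy class (an assumption)."""

    def __init__(self, name, fruit, win):
        self.name = name
        self.fruit_val = fruit
        self.win_percentage = float(win)

    def is_fruity(self):
        return self.fruit_val == "1"


# Placeholder rows, NOT real survey values: name, fruity flag, win percentage.
candy_objects = [
    Candy("Candy A", "1", "40.0"),
    Candy("Candy B", "0", "80.0"),
    Candy("Candy C", "1", "65.5"),
]

best_fruity = None
for candy in candy_objects:
    # Extra conditions can be added to this if block for more nuanced questions.
    if candy.is_fruity():
        if best_fruity is None or candy.win_percentage > best_fruity.win_percentage:
            best_fruity = candy

print(best_fruity.name)  # Candy C
```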
Indeed, Starburst is a good fruity candy to get. You can replace is_fruity() with any of the other attribute methods we created, and you'll be able to answer different questions based on your interests. You may even try adding more conditions to the if block to get more nuanced answers to your candy-related questions.
The Candy class was useful for looking at the candies on an individual basis, but what if we wanted to get a higher-level view of each type of candy? How could we know if chocolates are overall more popular than fruity candies? Where there's a will, there's a class we can make to pave the way! Let's model a group of Candy objects and create some helpful calculation methods.
    class CandyCollection:
        """Represents a category of candy and all candies that fall into it."""

        def __init__(self, candies):
            self.relevant_candies = candies

        def calculate_avg_sweetness_prct(self):
            summed_sugar_pct = sum([rc.sugar_percentile for rc in self.relevant_candies])
            return summed_sugar_pct / len(self.relevant_candies)

        def calculate_avg_price_prct(self):
            summed_price_pct = sum([rc.price_percentile for rc in self.relevant_candies])
            return summed_price_pct / len(self.relevant_candies)

        def calculate_avg_win_prct(self):
            summed_win_pct = sum([rc.win_percentage for rc in self.relevant_candies])
            return summed_win_pct / len(self.relevant_candies)

        def best_candy(self):
            """Returns the most popular candy in the collection."""
            current_best = 0
            current_best_candy = None
            for candy in self.relevant_candies:
                if candy.win_percentage > current_best:
                    current_best_candy = candy
                    current_best = candy.win_percentage
            return current_best_candy
A quick aside: many of the methods within CandyCollection use what's called a list comprehension. If you're not familiar with list comprehensions, we will offer a quick overview here. List comprehensions create new lists from a basis list, using some aspect of the basis list to create the items in the new list. We'll take a closer look at the calculate_avg_win_prct method as an example to explore this and compare it with a for loop that performs the same task.
    # Original method using a list comprehension
    def calculate_avg_win_prct(self):
        summed_win_pct = sum([rc.win_percentage for rc in self.relevant_candies])
        return summed_win_pct / len(self.relevant_candies)

    # Same method as above, but uses a longer for loop
    def calculate_avg_win_prct(self):
        win_percentages = []
        for rc in self.relevant_candies:
            win_percentages.append(rc.win_percentage)
        summed_win_pct = sum(win_percentages)
        return summed_win_pct / len(self.relevant_candies)
The for loop in the second version of calculate_avg_win_prct goes through relevant_candies, which is a list of Candy objects. The rc variable you see in both versions is a placeholder variable that represents the current item in the iteration. The loop takes each candy's win_percentage and collects that information into win_percentages. Afterwards, win_percentages is summed and averaged so that we know the average win percentage of the CandyCollection. Whereas the for loop version took three lines, the list comprehension performs this same task in one line. We use a list comprehension here to save space and give each of the methods in CandyCollection a similar format. Of course, we could use for loops in each of the methods instead, but that results in bulkier code.
We designed our new CandyCollection class with some useful methods that abstract away our calculations. We've taken advantage of the list comprehension to extract the exact property that we wanted from our list of Candy objects. We'll use list comprehensions again to filter for particular candy types. Let's see how chocolates compare to their fruity counterparts.
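A sketch of how such a comparison could be set up. Both classes are trimmed-down stand-ins for the full versions, and the four candies are placeholders with made-up values, so the averages printed here are not results from the real data set.

```python
class Candy:
    """Trimmed-down stand-in for the full Candy class (an assumption)."""

    def __init__(self, name, choco, fruit, win):
        self.name = name
        self.chocolate_val = choco
        self.fruit_val = fruit
        self.win_percentage = float(win)

    def is_chocolatey(self):
        return self.chocolate_val == "1"

    def is_fruity(self):
        return self.fruit_val == "1"


class CandyCollection:
    """Represents a category of candy and all candies that fall into it."""

    def __init__(self, candies):
        self.relevant_candies = candies

    def calculate_avg_win_prct(self):
        summed_win_pct = sum([rc.win_percentage for rc in self.relevant_candies])
        return summed_win_pct / len(self.relevant_candies)


# Placeholder rows, NOT real survey values.
candy_objects = [
    Candy("Candy A", "1", "0", "70.0"),
    Candy("Candy B", "1", "0", "50.0"),
    Candy("Candy C", "0", "1", "40.0"),
    Candy("Candy D", "0", "1", "30.0"),
]

# A list comprehension with an if clause keeps only the matching candies.
chocolates = CandyCollection([c for c in candy_objects if c.is_chocolatey()])
fruities = CandyCollection([c for c in candy_objects if c.is_fruity()])

print(chocolates.calculate_avg_win_prct())  # 60.0
print(fruities.calculate_avg_win_prct())    # 35.0
```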
I don't think the results are surprising, but perhaps you may not like chocolate. As before, you can switch the attribute-testing method to compare whichever candies you want. Maybe you'd like to compare relative sugar percentiles or unit price percentiles?
One last step before we wrap up this investigation. Let's say we want to combine the best candies of multiple candy attributes into one ultimate sweet. Then, we could just combine all the best candies together and ensure that our house is the best candy house of them all.
A SuperCandy sounds a lot like our CandyCollection class, so let's use inheritance! All we'll need to do is rewrite the best_candy method to better suit our needs.
    class SuperCandy(CandyCollection):
        """The best candy of all time."""

        def __init__(self, candies):
            CandyCollection.__init__(self, candies)

        def best_candy(self):
            print("I am the ultimate candy composed of:")
            for candy in self.relevant_candies:
                print(" " * 4 + "-" + candy.name)
From your own experience, you enjoy chocolates, fruity candy, caramels, pluribus candies, nougats and hard candies. We can't go wrong if we combine them all!
It sounds a bit wonky, but I guess the data doesn't lie. If you think you can come up with a better combination, try subbing in different candy combinations and see what your ideal SuperCandy is.
Object-oriented programming in Python is a way of organizing your code into bundles called classes. Classes are blueprints that combine data with functionality, and objects are instances of the classes. Within these objects, properties are attributes that describe the object, and methods are functions that describe object behavior. Classes can also inherit properties and methods from other classes, in addition to having their own unique behavior. Finally, Python allows us to customize class creation even more with metaclasses.