
Object-Oriented Programming in Python

Object-oriented programming (OOP) is a programming paradigm, which is just a fancy word for a particular way of organizing and thinking about your code. As its name suggests, OOP would like us to model our code in the form of objects.

As a programming term, an object is code that bundles data and functionality that are related to each other. You can think of data as the storage of information, like how you can store a string or a number into a variable. Functionality is the use of functions to perform tasks. Many times, an object's functions will also use the data that the object has.

The idea of bundling data and functionality together is contrasted against having them stand alone. The example below contrasts non-OOP programming with OOP. In Python, we can create objects using classes. These classes represent Python's "code bundles," and they indicate to Python that the variables and functions we provide are related and should be grouped together. You don't have to entirely understand the code below (we'll go in-depth later), but the main point to grasp is that a class enables us to group variables and functions together.

# Non-OOP approach to describing a singer
singer_name = "Emi"
singer_albums = 5

def sing(line):
    print(line)

# OOP approach to describing a singer
class Singer:

    name = "Emi"
    albums = 5

    def sing(self, line):
        print(line)

Both blocks of code store similar information and functions, but the crucial difference is their organization. The OOP code groups the variables and function in a bundle using the class statement. Here, the name and albums variables are grouped with the sing function within the Singer block. On a higher level, the class block also helps to define the concept of a singer for yourself as a programmer. We know that singers in real life have names and albums with their music, so we can try to emulate these facts in our code. In contrast, the non-OOP code simply creates descriptive variables and a sole function.

You may wonder, "We've been talking all about object-oriented programming, so why are we using the class keyword? Why isn't it called object instead?"

Python's classy approach to OOP

As we alluded to earlier, Python is class-based, meaning that objects are created from classes. You can think of classes as blueprints for describing an object's attributes and behavior.

This distinction is slight, but worth mentioning again. You will often hear the terms class and object used interchangeably. Technically speaking, classes refer to the actual instructions for creating objects, while objects are instances, or distinct realizations, of these instructions. Python's approach to OOP is through classes, and classes are what you'll actually be coding.

All this talk has been theoretical so far, but classes are best learned through example. Let's say we want to represent a programmer in terms of Python code. Instead of just giving descriptive names to variables and functions, we'll create a class that will contain all of our programmer information.

We can start with the bare minimum code needed to define a class. Class names are conventionally started with a capital letter. You can name classes otherwise, but you run the risk of confusing your fellow programmers.

class Programmer:
    """Represents a programming student and their current skills."""
    pass

There are three things to notice in the code block above:

  1. The class statement and the name of the class
  2. A docstring surrounded by triple quotes
  3. The pass statement

The class statement is how we start creating classes. Everything that is indented below the class statement will be included in the class block. Immediately after the class statement, you will name what your class should be. In this case, we've called our class Programmer since we'd like to model real life programmers.

The documentation string, or docstring, is a description in your code of what the class is supposed to represent. You don't need to include a docstring in your classes, but it is good practice so that you can document what your code does for future reference. To be treated as a docstring, the string must be the first statement in the class body, immediately after the class statement. You can see your docstring by printing the class's __doc__ attribute.

print(Programmer.__doc__)
Represents a programming student and their current skills.

Docstrings can contain much more complex information, such as how to use the class or explanations of what should be put into the class. However, our current class is simple, so a simple one-liner will suffice. If you'd like more information on docstrings and their uses, have a look at the official documentation.

Finally, we have the pass statement. The pass statement is what we use to create a null operation, which is an operation that does nothing. Absolutely nothing. After we create our Programmer class and give it a helpful docstring, those are really the only things in the class since pass does nothing. A class body cannot be left completely empty: if nothing is indented under the class statement, Python raises an IndentationError because it expects at least one statement. Strictly speaking, the docstring alone would satisfy that requirement, but pass makes it explicit that the body is intentionally empty for now. The ultimate result of our Programmer class block is that we can create Programmer objects, but they have no variables or functions we can use.
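
To see what such a bare class gives us, here is a minimal sketch that creates two objects from it and checks that they are separate, empty instances. The variable names first and second are only illustrative.

```python
class Programmer:
    """Represents a programming student and their current skills."""
    pass

# Calling the class creates a new object (an instance) each time.
first = Programmer()
second = Programmer()

print(isinstance(first, Programmer))  # True: first was built from Programmer
print(first is second)                # False: they are two separate objects
print(vars(first))                    # {}: no properties have been assigned yet
```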

Remember that OOP wants us to bundle code together in an understandable manner to us as programmers. Our Programmer objects currently don't do much, but we'll continue to add in code in ways that will let us describe them in programmer-esque attributes.

Properties: how we describe objects

In OOP-speak, we need to add properties to the Programmer class. Properties are the data in the "data and functionality bundle" definition of an object. Naturally, we'd describe programmers by which languages they know and how many years of experience they have, so we can recreate these ideas as properties in the Programmer class. To create these properties, we'll name and set variables within a special function called __init__() (more on this soon). The __init__() function will take in arguments and assign them to whatever properties you specify. Just as a heads up, it's okay if you don't understand all of the terms used below; we'll go into careful detail on everything soon.

class Programmer:
    """Represents a programming student and their current skills."""
    
    def __init__(self, name, langs, years):
        self.name = name 
        self.langs = langs
        self.years = years

This is the same Programmer class as before, but with some major revisions:

  1. We've removed the pass statement, since our classes need to start describing programmers.
  2. We've created an __init__() function that takes in 4 arguments.
  3. We've assigned the arguments to some strange variables starting with self.

The ideas of __init__() and self are traditionally difficult topics for beginning Python programmers, so we'll break them down and describe how they are related to our Programmer class properties.

The __init__() function: assigning properties to the object

At the most basic level, __init__() is a function. Like other functions, you give __init__() some arguments and it will do something with those arguments. What makes __init__() special is that it is run immediately when you create an object. This function is important because it enables us to establish what our object properties are when the object is created. In other words, __init__() initializes our objects. Every time we create an object, __init__() is called.

In our current __init__() function, we pass in four arguments. Three of these arguments — name, langs and years — are properties that we want to assign to each of our Programmer objects. The remaining one, self, is a special argument that requires much more explanation.

Self: how objects refer to themselves

Objects must be able to access the properties that are assigned to them and not the properties of another object. To do so, objects have a concept of the self. The concept of a self sounds complicated, but has a simple real world example.

You, the reader, are a human being with your own name, desires and ambitions. You distinguish yourself from other human beings. You are yourself. Objects also have this idea of a "self," recreated in the self variable. The self argument is special because it refers to the object itself.

We'll look at our Programmer block again with our newfound knowledge. We now know that when we actually create an object, the __init__() function helps us initialize object properties. __init__() initializes properties by taking in arguments and assigning them to attributes of self. This ensures that when we create an object with its own properties, it is only that object that gets those properties. When we create another object, we have to give it its own properties as well. Not having a self variable here means that there is no reference to the object itself when it is actually created. Thus, without self, we cannot properly give each object their own properties. If we cannot create individual objects with their own unique properties, we aren't fully taking advantage of the benefits of OOP.

class Programmer:
    """Represents a programming student and their current skills."""
    
    def __init__(self, name, langs, years):
        self.name = name 
        self.langs = langs
        self.years = years

Object properties are created and accessed via dot notation. We have three properties (name, langs, and years), which are assigned arguments of the same name. self.name is how the name property is stored in the object, and we are assigning it the name argument that is passed into __init__(). The same assignment happens with langs and years. Because self is a reference to the object itself, Python supplies it automatically, so we do not pass it in when we actually create an object. We will create two Programmer objects and store them in the variables chris and miho to demonstrate this point.

chris = Programmer("Chris", ["Python", "R", "Javascript"], 3)
miho = Programmer("Miho", ["Ruby", "PHP"], 1)

Notice that creating an object is similar to passing arguments to a function. The three values we passed in are given to the __init__() function, which assigns them to properties of self. We've created two objects, but self makes sure that the properties we give to chris are only given to chris and that the miho properties are only in miho.

When we create objects, we must pass in the properties in the order that they appear in __init__(). In our example above, name is the second argument in __init__() and it is assigned to the self.name property. Since we do not include self in object creation, the name argument will correspond to the first value given to the Programmer objects. "Chris" corresponds to the name argument in chris.

The __init__() function also dictates how many arguments it must be given to correctly create an object. If we try to give Programmer objects anything other than 3 arguments, it will throw a TypeError and demand you give it the correct amount.
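
A quick sketch of what that failure looks like, assuming the three-argument Programmer class from above. The caught flag is only there for illustration.

```python
class Programmer:
    """Represents a programming student and their current skills."""

    def __init__(self, name, langs, years):
        self.name = name
        self.langs = langs
        self.years = years

try:
    # The years argument is missing, so __init__() cannot run.
    incomplete = Programmer("Chris", ["Python", "R"])
except TypeError as error:
    caught = True
    print("Could not create the object:", error)
else:
    caught = False
```

The error message names the missing argument, which makes this mistake fairly easy to track down.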

Finally, we can discuss how to access the object properties within chris. Since chris is a variable that stores the object, we can use dot notation with this variable to look at the properties we assigned.

>>> chris.name
"Chris"
>>> chris.langs
["Python", "R", "Javascript"]
>>> miho.years
1

__init__() and the idea of self take time to fully internalize and understand, but they are critical to understanding how we code our classes and create our objects. Feel free to reread these sections and consult multiple sources to solidify your grasp of these concepts.

Methods: codifying object behavior

So far, we've been referring to __init__() as a function, but it is more correct to call it a method. The two terms are often used interchangeably, but "method" refers specifically to a function defined within a class. Methods are the functionality in the "data and functionality bundle" definition of an object. We would like to give our Programmer objects some capabilities, so we'll add in some methods.

class Programmer:
    """Represents a programming student and their current skills."""
    
    def __init__(self, name, langs, years):
        self.name = name 
        self.langs = langs
        self.years = years

    def number_of_langs(self):
        return len(self.langs)
    
    def add_lang(self, new_lang):
        self.langs.append(new_lang)

We've added two new methods to the Programmer class: number_of_langs and add_lang. The number_of_langs method performs the task of counting the number of items in the langs property, while add_lang takes in one argument and adds it to langs.

The concept of self returns in our introduction to methods. Remember that objects are bundles of data and functionality. The reason we'd like to bundle them together is because there's usually some relationship between the two. With number_of_langs, we don't need to pass it anything; we are just creating a method that enables an object to describe how many languages it knows. The implementation of the method is simple, but it is only through self that we can allow objects to describe their individual selves.

>>> chris.number_of_langs()
3
>>> miho.number_of_langs()
2
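
One way to see what self is doing here is to call the method on the class and hand it the object ourselves. The sketch below assumes the Programmer class as defined above (trimmed to one method); the names via_object and via_class are illustrative.

```python
class Programmer:
    """Represents a programming student and their current skills."""

    def __init__(self, name, langs, years):
        self.name = name
        self.langs = langs
        self.years = years

    def number_of_langs(self):
        return len(self.langs)

chris = Programmer("Chris", ["Python", "R", "Javascript"], 3)
miho = Programmer("Miho", ["Ruby", "PHP"], 1)

# Calling the method on an object...
via_object = chris.number_of_langs()
# ...is equivalent to calling it on the class and passing the object as self.
via_class = Programmer.number_of_langs(chris)

print(via_object, via_class)             # 3 3
print(Programmer.number_of_langs(miho))  # 2
```

In everyday code we use the first form; Python fills in self with the object to the left of the dot.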

As with __init__(), we can pass arguments to our methods if we'd like to use them. add_lang takes in an argument and adds it to the langs property. Because this method alters an object property, we must use self to make sure that we refer to the object itself.

>>> chris.add_lang("Matlab")
>>> chris.langs
["Python", "R", "Javascript", "Matlab"]
>>> chris.number_of_langs()
4

The methods we created are simple implementations. Currently, number_of_langs just uses the built-in len function. However, as our class code gets more complicated, we may want to change how number_of_langs is implemented, but still keep the method name. Keeping the method name has time-saving benefits. If you've been using the number_of_langs method multiple times throughout a script, changing the method name would mean looking for every single instance of the method and adjusting it. If we only change its implementation, we can rest assured that our code will still work.

Let's say that we want to change number_of_langs to count only certain languages instead of including everything that a Programmer knows. All we need to do is change the implementation of number_of_langs.

# New number_of_langs method
class Programmer:
    ...
    def number_of_langs(self):
        related_langs = 0
        for language in self.langs:
            if language not in ["HTML", "CSS", "Swift", "Visual Basic"]:
                related_langs += 1
        return related_langs

The key takeaway here is that we can implement any kind of code within our methods and keep them hidden. If we'd like to use this bit of code, we only need to use the method itself and not have to worry about the implementation. This idea of hiding the underlying implementation of your objects is referred to as abstraction. Abstraction is powerful because it allows us to hide complicated implementations behind methods in our objects and use these methods without having to keep referring back to the object properties. As you start creating your own objects and methods, the time-saving benefits of abstraction will make themselves quickly known.


Properties II: class vs instance

The Programmer class currently has only one type of property: an instance property. Instance properties are particular to each instance of a class, just as "Chris" was particular to only chris. These properties are created at the time of instantiation with the __init__() method. self makes noticing instance properties easy: anything attached to self is a property that belongs only to that object.

However, there is another important type of property: the class property. Instance and class properties are similar in most aspects with one important difference: class properties are shared across all instances of the class. We'll give our Programmer class a class property to demonstrate this.

class Programmer:
    """Represents a programming student and their current skills."""
    
    favorite_language = "Python"
    
    def __init__(self, name, langs, years):
        self.name = name 
        self.langs = langs
        self.years = years

    def number_of_langs(self):
        return len(self.langs)
    
    def add_lang(self, new_lang):
        self.langs.append(new_lang)

Creating favorite_language was as simple as creating a regular variable within the class block. Notice that the class property does not contain a self reference. This should make sense: self marks the properties that belong to an individual object, whereas a class property belongs to the class itself. We can confirm below that all Programmer objects share a love for Python.

>>> chris.favorite_language
"Python"
>>> chris.favorite_language == miho.favorite_language
True

Class properties are important because you may find that some properties are better characterized as being shared by all members of a class. Instead of having to initialize every object with the same value in __init__(), we can just create a class property to ensure that it is shared between all objects of that class.
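
The sketch below illustrates the sharing. It assumes a cut-down Programmer whose __init__() only takes a name, purely to keep the example short.

```python
class Programmer:
    """Cut-down Programmer used only for this illustration."""

    favorite_language = "Python"

    def __init__(self, name):
        self.name = name

chris = Programmer("Chris")
miho = Programmer("Miho")

# Both objects read the value from the class, not from themselves.
print("favorite_language" in vars(chris))  # False: not stored on the instance
print(chris.favorite_language)             # Python

# Changing the property on the class is visible through every instance.
Programmer.favorite_language = "Go"
print(chris.favorite_language, miho.favorite_language)  # Go Go
```

Note that assigning chris.favorite_language = "R" would instead create a new instance property on chris alone and leave the class property untouched.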

Now that we've covered properties and methods, you can start creating your own objects. However, no OOP article would be complete without reference to another powerful concept: inheritance between objects.

Inheritance: from one unto another

We've given ourselves a solid foundation in OOP with properties and methods. Just as an heir inherits an estate, we can program our classes to inherit properties and methods from other classes. Inheritance allows us to add new features and behavior to our objects while reusing our older object code.

Let's say we want to model two different jobs that programmers might be promoted into over their careers. Both of these jobs will still require programming skills, but will also need their own distinct attributes and behavior. Instead of writing new classes outright, we'll inherit from the Programmer class and save ourselves some time.

class SoftwareEngineer(Programmer):
    """Creates software for consumption."""
    
    motto = "Software is the best thing ever!"
    
    def __init__(self, name, langs, years, ide):
        Programmer.__init__(self, name, langs, years)
        self.editor = ide
        
    def create_software(self):
        if "Python" in self.langs:
            print("I made this 'Hello World!' program for you!")
        else: 
            print("I shouldn't have been promoted yet!")

class DataEngineer(Programmer):
    """Creates data pipelines for use by businesses."""
    
    motto = "Data pipelines are the best thing ever!"
    
    def __init__(self, name, langs, years, db):
        Programmer.__init__(self, name, langs, years)
        self.database = db
        
    def create_database(self):
        if self.database == "SQLite":
            print("Yes, I can create a database for you.")
        else:
            print("Hold on, let me look at the documentation first...")

Both the SoftwareEngineer and DataEngineer classes inherit from our old Programmer class. Because these two classes inherit from Programmer, we refer to them as subclasses, or child classes, of Programmer. Programmer itself is the superclass or parent class.


As we did for the Programmer class, we've given each subclass its own class property and instance property in addition to a unique method. Even though you can't see the Programmer class code in the subclasses, you can still access all of its properties and methods thanks to inheritance!

Take note of the structure of the child classes' __init__() methods: each subclass calls its parent's __init__() method to initialize the parent properties. Any properties that aren't set by the superclass are then initialized by the subclass __init__().

hana = SoftwareEngineer("Hana", ["Python", "Go", "HTML", "Javascript"], 9, "Pycharm")

# Hana is an ardent software engineer
>>> hana.motto 
"Software is the best thing ever!"

# but she still remembers her Programmer roots
>>> hana.number_of_langs()
4

Multiple inheritance

SoftwareEngineer and DataEngineer both inherit from just one class, but what if we'd like to inherit from multiple classes? Python makes multiple inheritance easy since it's a simple extension of regular inheritance. The below code will create a new SuperSoftwareEngineer class that will inherit from the SoftwareEngineer class and a new class to gain both of their skill sets.

class Superhero:
    """Your everyday superhero"""
    
    def __init__(self, power):
        self.superpower = power

class SuperSoftwareEngineer(SoftwareEngineer, Superhero):
    """Not your average software engineer"""
    def __init__(self, name, langs, years, ide, power):
        SoftwareEngineer.__init__(self, name, langs, years, ide)
        Superhero.__init__(self, power)

    def number_of_langs(self):
        if self.superpower == "Super fluency":
            print("I know all languages!")
        else:
            return len(self.langs)


Notice that the __init__() method of our SuperSoftwareEngineer class simply calls the __init__() methods of both its parents. However, you'll need to give a SuperSoftwareEngineer all of the elements required from both parents to correctly initialize this class.

We've rewritten an older method to account for the fact that SuperSoftwareEngineer needs a more complex implementation of number_of_langs. Overriding parental methods allows us to change the behavior of our child classes. You only need to rewrite the method in the child class using the same method name from the parent. For SuperSoftwareEngineer, Python will first look within the subclass for the most updated number_of_langs() implementation.
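
The lookup order can be seen in a small sketch. Polyglot is a hypothetical subclass invented for this illustration, and the Programmer here is cut down to a single property.

```python
class Programmer:
    def __init__(self, langs):
        self.langs = langs

    def number_of_langs(self):
        return len(self.langs)

class Polyglot(Programmer):
    """Hypothetical subclass used only for this illustration."""

    def number_of_langs(self):
        # Same method name as the parent, so this version wins for Polyglot objects.
        return len(self.langs) + 100

ana = Polyglot(["Python", "Go"])

print(ana.number_of_langs())            # 102: the subclass version runs
print(Programmer.number_of_langs(ana))  # 2: the parent version is still available

# The order in which Python searches classes for a method.
print([cls.__name__ for cls in Polyglot.__mro__])  # ['Polyglot', 'Programmer', 'object']
```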

Instead of having to rewrite code for each new class, we've leveraged OOP concepts to build upon more basic functionality and add other features to our classes. We've learned a lot of OOP, so let's bring it all together in a small project!

Project: finding the best candies

For this project, we'll be working with FiveThirtyEight's "Candy Power Ranking" data set. You can find the original article here and the data set itself here.

You are a budding data analyst in a bit of a predicament. It is Halloween in a few days, and you have no idea which candies to purchase. You want to be known as the best house in the neighborhood for candy opportunities, and you don't want to be on the "trick" end of a trick-or-treat.

Thankfully, we have data on our side! The FiveThirtyEight data set contains data on different candies, their candy-related attributes (such as if the candy is a chocolate or is fruity) and their relative popularity. With this data, we'll be able to pick out candies by their characteristics and make data-driven decisions on what to buy.

We will be using the csv library to read in the data set. The data is in the form of a CSV, or comma-separated value file. As the name suggests, columns are separated by commas, and rows are separated by line breaks. We'll use a reader from the csv library to iterate over each line in the CSV file and bring it into a variable we can operate on.
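
A minimal sketch of that reading step follows. So that it runs anywhere, it reads from an in-memory stand-in containing only the header and the first data row; the file name mentioned in the comment is an assumption.

```python
import csv
import io

# Stand-in for the real file. With the data set saved on disk you would
# instead use something like open("candy-data.csv", newline=""); that file
# name is an assumption.
sample_file = io.StringIO(
    "competitorname,chocolate,fruity,caramel,peanutyalmondy,nougat,"
    "crispedricewafer,hard,bar,pluribus,sugarpercent,pricepercent,winpercent\n"
    "100 Grand,1,0,1,0,0,1,0,1,0,.73199999,.86000001,66.971725\n"
)

# csv.reader yields each line of the file as a list of strings.
reader = csv.reader(sample_file)
candies = list(reader)

print(candies[0][:3])  # ['competitorname', 'chocolate', 'fruity']
print(candies[1][0])   # 100 Grand
```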


Let's have a look at the column names and the first data row to see what we're dealing with in terms of data.

# Column names
candies[0]

['competitorname',
  'chocolate',
  'fruity',
  'caramel',
  'peanutyalmondy',
  'nougat',
  'crispedricewafer',
  'hard',
  'bar',
  'pluribus',
  'sugarpercent',
  'pricepercent',
  'winpercent']

# First data row of the candies data set
candies[1]

['100 Grand',
  '1',
  '0',
  '1',
  '0',
  '0',
  '1',
  '0',
  '1',
  '0',
  '.73199999',
  '.86000001',
  '66.971725']

Looks like most of the columns in this data set are Boolean values, taking either a 1 for True or a 0 for False. sugarpercent and pricepercent indicate percentiles, and winpercent is how often the candy was chosen over other competitors. We must remember that all the values are strings, so we'll need to convert them to the correct types.

We could continue our analysis with the data organized in this way, but this list of lists is difficult to use. We'd have to remember which columns represented what on top of remembering which values to use for calculations. Let's try again and give the data better structure and readability for our sanity.

We know that each row in the CSV represents a candy, so let's take advantage of this mental representation. We'll make a Candy class to store the raw data in appropriately named properties. By doing so, we can preserve the data in the original form in which we received it. Then, we can leverage these properties to create some helper methods to add more readability to our object.

class Candy:
    """Represents a candy, its taste attributes and popularity."""
    def __init__(self, name, choco, fruit, cara, pean, noug, crisp, hard, bar, plur, sug, price, win):
        self.name = name
        self.chocolate_val = choco
        self.fruit_val = fruit
        self.caramel_val = cara
        self.peanut_almond_val = pean
        self.nougat_val = noug
        self.crisped_rice_wafer_val = crisp
        self.hard_val = hard
        self.bar_val = bar
        self.pluribus_val = plur
        self.sugar_percentile = float(sug)
        self.price_percentile = float(price)
        self.win_percentage = float(win)

    # Helper method that will indicate True or False based on the property
    def is_chocolatey(self):
        if self.chocolate_val == "1":
            return True
        else:
            return False

    # Repeat this for all the attributes...
    def is_pluribus(self):
        if self.pluribus_val == "1":
            return True
        else:
            return False

All we've added are some simple True-False methods to our class to make it easier to ask each Candy about its characteristics. Sure, we could have packed all of this logic into __init__(), but our point here was to keep the original data values and enhance usability at the same time.

Now we'll reorganize our data using our Candy objects.
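
One way this step might look is sketched below. The Candy class is a trimmed copy with its helper methods left out, and candies mimics the list of rows read from the CSV. The second data row is an invented placeholder, not a real entry from the data set.

```python
class Candy:
    """Trimmed copy of the Candy class (helper methods omitted)."""

    def __init__(self, name, choco, fruit, cara, pean, noug, crisp,
                 hard, bar, plur, sug, price, win):
        self.name = name
        self.chocolate_val = choco
        self.fruit_val = fruit
        self.caramel_val = cara
        self.peanut_almond_val = pean
        self.nougat_val = noug
        self.crisped_rice_wafer_val = crisp
        self.hard_val = hard
        self.bar_val = bar
        self.pluribus_val = plur
        self.sugar_percentile = float(sug)
        self.price_percentile = float(price)
        self.win_percentage = float(win)

# Header row first, then data rows, all as strings.
candies = [
    ["competitorname", "chocolate", "fruity", "caramel", "peanutyalmondy",
     "nougat", "crispedricewafer", "hard", "bar", "pluribus",
     "sugarpercent", "pricepercent", "winpercent"],
    ["100 Grand", "1", "0", "1", "0", "0", "1", "0", "1", "0",
     ".73199999", ".86000001", "66.971725"],
    # Invented placeholder row for illustration only.
    ["Example Chew", "0", "1", "0", "0", "0", "0", "0", "0", "1",
     ".5", ".25", "50.0"],
]

candy_objects = []
for row in candies[1:]:  # skip the header row
    # *row unpacks the 13 values into Candy's arguments, in column order.
    candy_objects.append(Candy(*row))

print(candy_objects[0].name, candy_objects[0].win_percentage)  # 100 Grand 66.971725
```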


By listing the properties in the class in the order they appear in the data, we only have to remember the order of the candy attributes once. Iteration will do the rest of the work for us. We can start asking our Candy objects what might be the best candy to buy based on different preferences. For example, what if we'd like to know what the best fruity candy was?
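
A sketch of how that question could be asked in code is shown below. To stay short it uses a cut-down stand-in for Candy and three invented placeholder candies, so the printed name differs from the result on the real data set.

```python
class Candy:
    """Cut-down stand-in for the full Candy class: only what this sketch needs."""

    def __init__(self, name, fruit, win):
        self.name = name
        self.fruit_val = fruit
        self.win_percentage = float(win)

    def is_fruity(self):
        return self.fruit_val == "1"

# Placeholder candies with invented values, not rows from the real data set.
candy_objects = [
    Candy("Fruity A", "1", "41.5"),
    Candy("Chocolate B", "0", "80.0"),
    Candy("Fruity C", "1", "63.2"),
]

best_fruity = None
for candy in candy_objects:
    # Only fruity candies compete; keep the one with the highest win percentage.
    if candy.is_fruity():
        if best_fruity is None or candy.win_percentage > best_fruity.win_percentage:
            best_fruity = candy

print(best_fruity.name)  # Fruity C
```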


Indeed, Starburst is a good fruity candy to get. You can replace is_fruity() with any of the other attribute methods we created, and you'll be able to answer different questions based on your interests. You may even try adding more conditions to the if block to get more nuanced answers to your candy-related questions.

Creating the Candy class was useful for looking at the candies on an individual basis, but what if we wanted to get a higher level view of each type of candy? How could we know if chocolates are overall more popular than fruity candies? Where there's a will, there's a class we can make to pave the way! Let's model a group of Candy objects and create some helpful calculation methods.

class CandyCollection:
    """Represents a category of candy and all candies that fall into it."""
    def __init__(self, candies):
        self.relevant_candies = candies
    
    def calculate_avg_sweetness_prct(self):
        summed_sugar_pct = sum([rc.sugar_percentile for rc in self.relevant_candies])
        return summed_sugar_pct / len(self.relevant_candies)
        
    def calculate_avg_price_prct(self):
        summed_price_pct = sum([rc.price_percentile for rc in self.relevant_candies])
        return summed_price_pct / len(self.relevant_candies)
    
    def calculate_avg_win_prct(self):
        summed_win_pct = sum([rc.win_percentage for rc in self.relevant_candies])
        return summed_win_pct / len(self.relevant_candies)
    
    def best_candy(self):
        """Returns the most popular candy in the collection."""
        current_best = 0
        current_best_candy = None
        for candy in self.relevant_candies:
            if candy.win_percentage > current_best:
                current_best_candy = candy
                current_best = candy.win_percentage
        return current_best_candy

A quick aside: many of the methods within CandyCollection use what's called a list comprehension. If you're not familiar with list comprehensions, we will offer a quick overview here. List comprehensions create new lists from a basis list, using some aspect of the basis list to create the items in the new list. We'll take a closer look at the calculate_avg_win_prct method as an example to explore this and compare it with a for loop that performs the same task.

# Original method using a list comprehension
def calculate_avg_win_prct(self):
    summed_win_pct = sum([rc.win_percentage for rc in self.relevant_candies])
    return summed_win_pct / len(self.relevant_candies)

# Same method as above, but uses a longer for loop
def calculate_avg_win_prct(self):

    win_percentages = []
    for rc in self.relevant_candies:
        win_percentages.append(rc.win_percentage)
    
    summed_win_pct = sum(win_percentages)
    return summed_win_pct / len(self.relevant_candies)    

The for loop in the second version of calculate_avg_win_prct goes through relevant_candies, which is a list of Candy objects. The rc variable you see in both versions is a placeholder variable that represents the current item in the iteration of the for loop. It takes the win_percentage and collects that information into win_percentages. Afterwards, the win_percentages is summed and averaged so that we know the average win percentage of the CandyCollection. Whereas the for loop version took three lines, the list comprehension performs this same task in one line. We use a list comprehension here to save space and give each of the methods in CandyCollection a similar format. Of course, we can use the for loops to replace each of the methods, but it results in bulkier code.

We designed our new CandyCollection class with some useful methods that abstract away our calculations into useful functions. We've taken advantage of the list comprehension to extract the exact property that we wanted from our list of Candy objects. We'll use them again to filter out for particular candy types. Let's see how chocolates compare to their fruity counterparts.
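
The comparison might be sketched as follows. Candy and CandyCollection are trimmed copies holding only what the comparison needs, and the four candies carry invented placeholder values rather than real data.

```python
class Candy:
    """Cut-down stand-in for the full Candy class."""

    def __init__(self, name, choco, fruit, win):
        self.name = name
        self.chocolate_val = choco
        self.fruit_val = fruit
        self.win_percentage = float(win)

    def is_chocolatey(self):
        return self.chocolate_val == "1"

    def is_fruity(self):
        return self.fruit_val == "1"

class CandyCollection:
    """Trimmed copy of CandyCollection with a single calculation method."""

    def __init__(self, candies):
        self.relevant_candies = candies

    def calculate_avg_win_prct(self):
        summed_win_pct = sum([rc.win_percentage for rc in self.relevant_candies])
        return summed_win_pct / len(self.relevant_candies)

# Invented placeholder values, not rows from the real data set.
candy_objects = [
    Candy("Chocolate A", "1", "0", "70.0"),
    Candy("Chocolate B", "1", "0", "60.0"),
    Candy("Fruity C", "0", "1", "40.0"),
    Candy("Fruity D", "0", "1", "50.0"),
]

# A list comprehension with an if clause keeps only the matching candies.
chocolates = CandyCollection([c for c in candy_objects if c.is_chocolatey()])
fruities = CandyCollection([c for c in candy_objects if c.is_fruity()])

print(chocolates.calculate_avg_win_prct())  # 65.0
print(fruities.calculate_avg_win_prct())    # 45.0
```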


I don't think the results are surprising, but perhaps you don't like chocolate. As with the fruity candy question earlier, you can switch the attribute-testing method to compare the candies you want. Maybe you'd like to compare relative sugar percentiles or unit price percentiles?

One last step before we wrap up this investigation. Let's say we want to combine the best candies of multiple candy attributes into one ultimate sweet. Then, we could just combine all the best candies together and ensure that our house is the best candy house of them all.

This new SuperCandy sounds a lot like our CandyCollection class, so let's use inheritance! All we'll need to do is rewrite the best_candy method to better suit our needs.

class SuperCandy(CandyCollection):
    """The best candy of all time."""
    
    def __init__(self, candies):
        CandyCollection.__init__(self, candies)
    
    def best_candy(self):
        print("I am the ultimate candy composed of:")
        for candy in self.relevant_candies:
            print(" " * 4 + "-" + candy.name)

From your own experience, you enjoy chocolates, fruity candy, caramels, pluribus candies, nougats and hard candies. We can't go wrong if we combine them all!
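
Here is a sketch of how the combination could be assembled, cut down to two categories instead of six to stay short. The classes are trimmed copies and the candies are invented placeholders, so the names printed are not real results.

```python
class Candy:
    """Cut-down stand-in for the full Candy class."""

    def __init__(self, name, choco, fruit, win):
        self.name = name
        self.chocolate_val = choco
        self.fruit_val = fruit
        self.win_percentage = float(win)

    def is_chocolatey(self):
        return self.chocolate_val == "1"

    def is_fruity(self):
        return self.fruit_val == "1"

class CandyCollection:
    def __init__(self, candies):
        self.relevant_candies = candies

    def best_candy(self):
        """Returns the most popular candy in the collection."""
        current_best = 0
        current_best_candy = None
        for candy in self.relevant_candies:
            if candy.win_percentage > current_best:
                current_best_candy = candy
                current_best = candy.win_percentage
        return current_best_candy

class SuperCandy(CandyCollection):
    def best_candy(self):
        print("I am the ultimate candy composed of:")
        for candy in self.relevant_candies:
            print(" " * 4 + "-" + candy.name)

# Invented placeholder values, not rows from the real data set.
candy_objects = [
    Candy("Chocolate A", "1", "0", "70.0"),
    Candy("Chocolate B", "1", "0", "60.0"),
    Candy("Fruity C", "0", "1", "40.0"),
    Candy("Fruity D", "0", "1", "50.0"),
]

# Find the best candy in each category, then hand the winners to SuperCandy.
best_chocolate = CandyCollection([c for c in candy_objects if c.is_chocolatey()]).best_candy()
best_fruity = CandyCollection([c for c in candy_objects if c.is_fruity()]).best_candy()

super_candy = SuperCandy([best_chocolate, best_fruity])
super_candy.best_candy()
```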


It sounds a bit wonky, but I guess the data doesn't lie. If you think you can come up with a better combination, try subbing in different candy combinations and see what your ideal SuperCandy is.

Summary

Object-oriented programming in Python is a way of organizing your code in bundles called classes. Classes are blueprints that combine data with functionality, and objects are instances of those classes. Within these objects, properties are attributes that describe the object, and methods are functions that describe object behavior. Classes can also inherit properties and methods from other classes, in addition to having their own unique behavior. Beyond what we covered here, Python also allows us to customize class creation even further with metaclasses.