
Numbers and Strings in Python

Much of our interaction with the world comes via numerical or textual information. Imagine a trip to the grocery store—you use signs, labels, prices, weights, and more to make your purchasing decisions.

Computers are similar—they excel at processing numerical and text data. Numbers and strings will form the base of your Python programming skills.

Python's popularity stems, at least in part, from its ability to make working with numbers and strings easy and efficient. In this tutorial, we will practice doing math and manipulating text using samples from an online retailer's order history data set.


Math in Python

Python will quickly become your favorite calculator for any mathematical operation. We will only cover the basics in this tutorial, but Python can also handle far more advanced mathematics, and we encourage you to explore further once you have the basics down.

Let's explore using Python for basic math operations.

Basic operations

Writing code for math in Python is very similar to writing math with pen and paper. We only need to know the proper symbols to communicate each operation to Python and most are the same symbols a calculator would use. The table below displays the symbols Python uses for foundational operations.

Symbol    Operation
+         Addition
-         Subtraction and Negative Numbers
*         Multiplication
/         Division
**        Exponents

More information about all of the operators can be found in the documentation here. Run the code below to see these operations in action.
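As a minimal sketch of these operators in action, the snippet below uses illustrative numbers (an assumption, chosen so that every result works out to 27):

```python
# Hypothetical values, picked so each operation evaluates to 27
added = 20 + 7        # addition
subtracted = 30 - 3   # subtraction
multiplied = 9 * 3    # multiplication
divided = 81 / 3      # division
powered = 3 ** 3      # exponent

print(added)       # 27
print(subtracted)  # 27
print(multiplied)  # 27
print(divided)     # 27.0
print(powered)     # 27
```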


You may have noticed that most of our operations printed 27 and one printed 27.0. If you did, good for you: this distinction is an important concept for numbers in Python. We will learn more about it later; first, let's discuss order of operations and see if any other outputs surprise us.

Order of operations in Python

Python follows the standard order of operations when doing all of its calculations. Any time our calculation includes multiple operations, we need to be sure that they will be performed in our desired order. Fortunately, we can use parentheses in our Python code just like we would on paper.

x = 3 + 4 * 5   # 23
x = (3 + 4) * 5 # 35

One quick note: you will notice that I regularly place a space between my numbers and operators. Python does not require this space, but it makes the code easier to read. Since we read much more code than we write, do yourself and others who read your code a favor and follow this convention!

x=1+2*3/(4-5)           # ugly
x = 1 + 2 * 3 / (4 - 5) # beautiful

Now let's write our own code! Take a look at the sample order from our online retail order data set below, then follow the steps to complete the code in the code editor.
[Image: inventory_short-1, the sample order from the online retail data set]

  1. Calculate the total number of items sold in the total_sold variable
  2. Calculate the average item price in the average_price variable
  3. Calculate the income generated from the sale of items with "COAT RACK" in their name in the coat_rack_income variable
  4. Run the code and repeat if necessary

Again, you may have been surprised by the output. The result of coat_rack_income should be exactly 29.70, but we get a slightly inaccurate output. These differing outputs are the result of how numbers are stored in Python, our next topic.

Number types

Computers process information using binary code. The Python language allows us to write code using our familiar base-10 number system, but ultimately converts everything to a binary representation under the hood. How Python does this is outside the scope of this tutorial, but we do need to understand Python's number types.

Distinguishing between number types can provide useful information. Looking at our shop order data, we see that we use whole numbers to represent quantity and decimal numbers to represent price. We can't sell part of an item, so we want to ensure the Quantity column only contains whole numbers. In contrast, we definitely want our Price column to use decimal values. Let's see how Python defines these different number types.

Integers and floats (and complex)

An integer in Python refers to any whole number value and will not include a decimal point. A floating point number, or float, is Python's name for a number with a decimal point. Certain decimal values cannot be represented exactly in binary, so small inaccuracies can occur, as we saw in the code above. A third type, complex, exists for complex numbers, but our math in this tutorial will be limited to integers and floats.
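A well-known illustration of this binary rounding is adding 0.1 and 0.2. The sketch below uses illustrative values that do not come from the order data:

```python
# Neither 0.1 nor 0.2 can be stored exactly in binary,
# so their sum is very slightly off from 0.3
total = 0.1 + 0.2

print(total)            # 0.30000000000000004
print(total == 0.3)     # False
print(round(total, 2))  # 0.3
```

Rounding the result to a fixed number of decimal places, as in the last line, is one common way to tidy up values such as prices before displaying them.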

For most operations, if all of the input numbers are integers the output will be an integer. However, we have already seen there are exceptions, like the division operator. Similarly, if your math includes one or more floating point numbers, the output will be a float. Since we don't always know the output number's type, we can use Python to check or change it.

Checking and changing numerical types

We can use Python's built-in type() function to check object types. Run the code below to check the type of the variables x and y.
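A minimal sketch of such a check, assuming x holds a whole number and y holds a decimal number (both values are illustrative):

```python
x = 27    # assumed example value: a whole number
y = 27.0  # assumed example value: a number with a decimal point

print(type(x))      # <class 'int'>
print(type(y))      # <class 'float'>
print(type(x + y))  # <class 'float'>, mixing in a float gives a float
```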


Now we can see that Python uses int to represent the integer type and float to represent the floating point type. Python also includes the built-in functions int() and float() which convert a number to an integer and float, respectively. Running the code below will convert a float into an integer.
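One way such a conversion might look, using a made-up price rather than a value from the data set:

```python
price = 7.95             # hypothetical float value

whole = int(price)       # keeps only the part left of the decimal point
back = float(whole)      # converts the integer back to a float
nearest = round(price)   # for comparison: round() goes to the nearest integer

print(whole)    # 7
print(back)     # 7.0
print(nearest)  # 8
```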


Notice that the int() function does not round; it simply drops everything after the decimal point and returns the integer value to its left.

Algebra: using variables

Performing basic operations is useful, but the real power in Python comes from variables. For now, we will develop a base understanding of how to store and modify numbers as variables and use them in our computations. As you learn more about lists, loops, functions and more in Python, you'll be able to take full advantage of variables.

One great aspect of algebraic functions in Python is the freedom to use descriptive variables. Python encourages readability in its code and we do not have to use a single letter to represent a variable. Both of the following examples accomplish the same task with Python, but the second provides much more description.

i = 1000
c = 500
p = i - c

income = 1000
costs = 500
profit = income - costs

Python also makes it easy to change the value of variables. Python starts by performing any computations to the right of the =, then assigns that value to the variable on the left. If that variable has already been defined, Python will overwrite that value and assign the new value to the variable. Additionally, we can make adjustments to the current value of a variable and reassign the variable to the adjusted value. Run the code below to see how our income variable changes.
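A small sketch of this behavior, with illustrative amounts that are assumptions rather than values from the order data:

```python
income = 1000
print(income)          # 1000

# Assigning again overwrites the previous value
income = 1200
print(income)          # 1200

# The right side is computed first (1200 + 500), then stored in income
income = income + 500
print(income)          # 1700
```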


We have two variables in our order history below: quantity and price. Let's use this data to practice creating variables and changing their values.
[Image: inventory_short-1, the sample order from the online retail data set]

  1. Assign the price and quantity variables to their respective values for the RECIPE BOX WITH METAL HEART
  2. Reassign the price and quantity variables to their respective values for the JAM MAKING SET WITH JARS
  3. Reassign the price and quantity variables to the respective values for the YELLOW COAT RACK PARIS FASHION

Additional Math Tools

Here are a few more built-in Python tools that you may find useful when doing math with Python.

Symbol or Function    Operation
//                    Floor division operator
%                     Modulo operator
abs()                 Absolute Value Function
round()               Rounding Function

These functions can be found in the documentation here. If you want to extend Python's math abilities even further, take a look at Python's math library here, and the NumPy, SciPy, and Matplotlib libraries here.
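As a quick sketch of the four tools in the table, using arbitrary example numbers:

```python
quotient = 17 // 5            # floor division: how many whole 5s fit in 17
remainder = 17 % 5            # modulo: what is left over after that division
distance = abs(-4.25)         # absolute value: drops the negative sign
rounded = round(3.14159, 2)   # round to two decimal places

print(quotient)   # 3
print(remainder)  # 2
print(distance)   # 4.25
print(rounded)    # 3.14

# Floor division always rounds down, which matters for negative numbers
print(-17 // 5)   # -4
```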

Strings in Python

Python stores text information as a data type called a string. To create a string, we enclose the text within single or double quotes. Anything within the quotes is considered part of the string, even numbers! Strings are also case-sensitive, so "HELLO", "Hello", and "hello" are all unique strings.

String type

Similar to the integer data type, Python shortens the name of the string data type to str. Python also includes a built-in function str() which converts data into the string data type.

x = str(100) # '100'

String lengths

Knowing the length of a string often helps us interact with string data. To find the length of a string, we use Python's built-in len() function. This function will return the number of characters in the string.

len("Hello World!") # 12

String operations

Two mathematical operators provide functionality for strings as well. We can add two or more strings together to make one string using a +. We call the process of adding strings together concatenation. Similarly, using the multiplication operator *, we can repeat our string a given number of times.

print("Luke, I am" + " " + "your father.") # "Luke, I am your father."

letter_o = "o"
print('N' + letter_o * 20 + '!')           # 'Noooooooooooooooooooo!'

Finding patterns in data lets us take advantage of the computer's ability to perform repetitive tasks very quickly. In our data below, one pattern is that two of the items start with the words "BOX OF". We might actually have hundreds or even thousands of products that come in boxes in our store. We could avoid typing "BOX OF" repeatedly by using concatenation.
[Image: inventory_top, sample items from the online retail data set]

  1. Assign the object type of item2 to the variable item2_type
  2. Create a string variable named item3 which concatenates box_prefix and the string "VINTAGE JIGSAW BLOCKS"
  3. Assign the length of item3 to the variable item3_length

With our current skills, this seems less efficient than just typing "BOX OF". However, you'll soon add loops and lists to your skillset and quickly create thousands of items using the box_prefix variable only once!

String methods

Python includes many useful methods to make alterations to a string. Calling methods differs from calling the built-in functions that we have been using. The examples below demonstrate how to call the methods; if you would like more detail, there is a detailed answer on Stack Overflow here.

Using the upper() and lower() methods, we can change the case of our string. Since Python is case-sensitive, these methods are often used to ensure consistency across our data.

# calling method on string directly
"Hello World!".upper() # "HELLO WORLD!"
# calling method on a string variable
name = "Peter Piper"
name.lower() # "peter piper"

To use the count() method, we pass a substring as an argument and the method returns the number of times that substring is in the string.

sentence = "Peter Piper picked a peck of pickled peppers"
sentence.count("P") # 2
sentence.count("p") # 7
sentence.count("pick") # 2

Just like case-sensitivity, additional space characters can create problems when working with string data. The strings "Hello" and "Hello " are not the same. The strip() method will remove any extra whitespace at the beginning or end of a string. Additionally, the split() method will create a list of the words in a string, using whitespace as the separator. These are the default behaviors; both methods also accept an argument: strip() takes a set of characters to remove, and split() takes a separator substring.

extra_space = "  A peck of pickled peppers Peter Piper picked   "
extra_space.strip()  # "A peck of pickled peppers Peter Piper picked"

list_line = "If Peter Piper picked a peck of pickled peppers,"
list_line.split() # ['If', 'Peter', 'Piper', 'picked', 'a',
                  #  'peck', 'of', 'pickled', 'peppers,']
list_line.split("a") # ['If Peter Piper picked ', ' peck of pickled peppers,']
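The examples above show split() with an argument but not strip(). As a sketch, with made-up strings that are not taken from the order data, passing an argument to each might look like this:

```python
# Hypothetical strings for illustration only
padded_code = "xx85123Axx"
colors = "RED,WHITE,BLUE"

# strip() with an argument removes those characters from both ends
clean_code = padded_code.strip("x")

# split() with an argument splits on that separator instead of whitespace
color_list = colors.split(",")

print(clean_code)  # 85123A
print(color_list)  # ['RED', 'WHITE', 'BLUE']
```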

Python has many additional string methods; see the documentation here for a complete list.

String indexing

Many data types in Python make use of indexing because indexing allows us to access a specific part of our data instead of the entire thing. For strings, this means we can access each character individually using its index value. We use indexing in Python by passing the desired index value in square brackets [ ] after the string or string variable.

Python is zero-indexed, meaning it assigns the first character an index value of 0. Index values increase by one for each additional character, so the index value of the second character is 1, third character is 2, fourth character is 3, and so on. Indexing can also be done from the end of the string using negative values. -1 corresponds with the last character, -2 the second to last character, and so on.

name = "Bond...James Bond."
first_character = name[0]  # 'B'
third_character = name[2]  # 'n'
last_character = name[-1]  # "."
third_from_last = name[-3] # "n"

String slicing

Python does not stop with single characters; we can also choose a selection of characters, or slice, from the string. Slicing also uses square brackets, but now we pass up to three different values separated by a :.

The first two values are index values that tell Python where to start and stop. The start value is inclusive, meaning the character at the start index will be included in the slice. However, the stop value is exclusive: the slice will include all the characters up to the stop index, but not the character at the stop index.


The start or stop values can also be omitted and Python will default to index 0 and the length of the string, respectively.

martini = 'Shaken, not stirred.'
first_word = martini[0:6]    # 'Shaken'
second_word = martini[8:11]  # 'not'
third_word = martini[12:]    # 'stirred.'
omit_both = martini[:] # 'Shaken, not stirred.'

An optional third value can be included as well, the step value. By default, slicing begins at the start value and adds, or steps, by one until it reaches the stop value. Passing a different step value will add that amount, causing it to step over (not include) some of the indexes between start and stop. A negative value can also be passed to have Python step backward through the index values.

armstrong = "That's one small step for a man, one giant leap for mankind."
every_other = armstrong[0:60:2]  # "Ta' n ml tpframn n in epfrmnid"
mankind_backward = armstrong[58:51:-1] # "dniknam"
# reverse the entire string
reverse = armstrong[::-1] # ".dniknam rof pael tnaig eno ,nam a rof pets llams eno s'tahT"

Let's practice string methods, indexing, and slicing on our online order data set before putting all of our new skills together in a final project. This time, three new values for item1, item2 and item3 have already been uploaded for you. We will need all of our new string knowledge to get the information we want.

  1. item1 has an extra whitespace character at the end. Use the strip() method to remove it before we continue.
  2. item1 and item2 include a specific number of objects. Use indexing to assign the number values to item1_objects and item2_objects
  3. item3 has one lowercase letter. To stay consistent, apply a method to convert all the letters to uppercase and reassign that string to item3
  4. Use slicing to select the "500G" substring from item3 and assign it to the variable item3_weight

Additional string tools

  • Python also supports multiline strings using three single or double enclosing quotes """, '''.
  • Python documentation has an excellent tutorial on strings. It can be found here. For more advanced string handling, check out regular expressions with Python's re package.
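As a brief sketch of the multiline strings mentioned in the list above, with invented message text used only for illustration:

```python
# Triple quotes let a string span several lines; the line break
# becomes part of the string as a newline character
note = """Dear customer,
Thank you for your order!"""

print(note)
print(note.count("\n"))   # 1
print(note.splitlines())  # ['Dear customer,', 'Thank you for your order!']
```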

Project: analyzing orders based on word patterns

This tutorial covered numbers and strings, two of the basic building blocks of Python programming. This project will provide less structure than the previous examples and is designed to be more challenging. Thus, you may need to refer back to this content, use many print statements to check your output, or even use Google and/or Stack Overflow to find the solutions. Python can solve the same problem in many ways; you'll learn more and learn faster by trying to find creative ways to solve each problem.

The prompt below provides an example of a task where we might need to use all of the skills we covered in this tutorial:

An online retail company wants to email customers a discount offer after they make a purchase, to entice them to shop at its store again. Additionally, the company wants to tailor the offer to each customer. Based on each customer's previous orders, the company wants to know which type of product the customer ordered so it can discount a similar product, and it also wants to vary the discount based on the customer's volume of purchases. Specifically, if the customer's average order quantity is at least 40, the company will offer a 20% discount; otherwise it will offer a 10% discount. The company has provided its data set, which includes over 500,000 rows of online order data.

Completing this task in full with only our knowledge of numbers and strings would be a nearly impossible task given the amount of data. However, we can still use the lessons from this tutorial to begin solving many aspects of this task. And fortunately, you will soon learn many more Python fundamentals, such as loops, if/else statements, lists, dictionaries, and functions.
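To make the discount rule from the prompt concrete, here is a rough sketch for a single customer. The three quantities are invented for illustration, and the one-line conditional is a preview of the if/else logic covered in later tutorials, not something you are expected to know yet:

```python
# Hypothetical order quantities for one customer (not from the data set)
quantity1 = 48
quantity2 = 36
quantity3 = 60

# Average order quantity across the three orders
average_quantity = (quantity1 + quantity2 + quantity3) / 3

# Rule from the prompt: at least 40 earns 20%, otherwise 10%
discount = 0.20 if average_quantity >= 40 else 0.10

print(average_quantity)  # 48.0
print(discount)          # 0.2
```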

We'll begin by looking at one more data sample. Previously we were only using three columns of information, but the actual data set includes eight columns. Below, you will find data from the first order in the data set. We can use this data to begin analyzing which type of products the customer purchased and the customer's purchasing volume.

[Image: first_order, data from the first order in the data set]

MODIFYING DATA

  1. We want to format our data to ensure it is consistent. The Description of the fifth item in this order looks inconsistent. Fix the item5_desc variable.
  2. Next, our CustomerID variable is a float type. The decimal is unnecessary and we want the number to be stored as a string. Change the customerID format.
  3. The InvoiceNo is actually stored as a string type, but an integer would be better. Change the type of the invoiceNo variable.

NUMERICAL ANALYSIS

  1. We need to know if this customer made over 40 purchases to determine their discount. Calculate this value in the order_purchases variable.
  2. It might also be helpful to know the average unit price of this customer's order. Calculate this value, then round it to two decimal places and assign that value to the avg_unit_price variable

STRING PATTERN ANALYSIS

Identifying the Product

  1. The last two words of the Description usually provide a good summary of the product. For the first three items, assign these two-word summaries to item1_summary, item2_summary and item3_summary.
  2. Some StockCodes end in a letter while others do not. Knowing these letters is helpful to the retailer. Try using two different approaches to find the letter in the StockCodes for item4 and item5. Assign these to the variables item4_letter and item5_letter

Identifying Product Characteristics

  1. Customers may also prefer a certain color of item. This could be helpful in making a product recommendation. Concatenate all the item description variables together and assign it to the variable order_descriptions. Then update red_count, white_count and blue_count to the number of times those respective colors appear in order_descriptions
  2. Creating lists of words will be very helpful when we add lists to our Python skills. It will allow us to identify common words. For now, just create a list of all the words in the order_descriptions variable and assign it to the variable order_words_list

A look ahead

Congratulations! You now have a solid foundation in numbers and strings on which to build your Python skills. As a preview of some concepts that you will be learning next, the script below uses your order_purchases and order_words_list variables. It includes some explanatory comments, but don't worry if you don't understand the concepts yet. Keep working hard, be persistent, get involved in the community, and your Python skills will continue to grow!