Tutorial:PracticalPython/3 Program organization


Program Organization

So far, we’ve learned some Python basics and have written some short scripts. However, as you start to write larger programs, you’ll want to get organized. This section goes into greater detail on writing functions and handling errors, and it introduces modules. By the end you should be able to write programs that are subdivided into functions across multiple files. We’ll also give some useful code templates for writing more useful scripts.


Scripting

In this part we look more closely at the practice of writing Python scripts.

What is a Script?

A script is a program that runs a series of statements and stops.

# program.py

statement1
statement2
statement3
...

We have mostly been writing scripts to this point.

A Problem

If you write a useful script, it will grow in features and functionality. You may want to apply it to other related problems. Over time, it might become a critical application. And if you don’t take care, it might turn into a huge tangled mess. So, let’s get organized.

Defining Things

Names must always be defined before they get used later.

def square(x):
    return x*x

a = 42
b = a + 2     # Requires that `a` is defined

z = square(b) # Requires `square` and `b` to be defined

The order is important. You almost always put the definitions of variables and functions near the top.

Defining Functions

It is a good idea to put all of the code related to a single task all in one place. Use a function.

import csv

def read_prices(filename):
    prices = {}
    with open(filename) as f:
        f_csv = csv.reader(f)
        for row in f_csv:
            prices[row[0]] = float(row[1])
    return prices

A function also simplifies repeated operations.

oldprices = read_prices('oldprices.csv')
newprices = read_prices('newprices.csv')

What is a Function?

A function is a named sequence of statements.

def funcname(args):
  statement
  statement
  ...
  return result

Any Python statement can be used inside.

def foo():
    import math
    print(math.sqrt(2))
    help(math)

There are no special statements in Python (which makes it easy to remember).

Function Definition

Functions can be defined in any order.

def foo(x):
    bar(x)

def bar(x):
    statements

# OR
def bar(x):
    statements

def foo(x):
    bar(x)

Functions must only be defined prior to actually being used (or called) during program execution.

foo(3)        # foo must be defined already
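
As a small runnable sketch of this rule (the function bodies and the names early_error and result are made up for illustration), the following calls foo() once before bar() exists and once after:

```python
def foo(x):
    return bar(x) + 1      # `bar` is looked up when foo() runs, not when foo is defined

try:
    foo(3)                 # bar() does not exist yet
except NameError as e:
    early_error = str(e)   # The message mentions the missing name 'bar'

def bar(x):
    return x * 2

result = foo(3)            # Works now that bar() has been defined
print(early_error)
print(result)              # 7
```

The definition order of foo() and bar() doesn’t matter. What matters is that both exist by the time foo() is actually called.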

Stylistically, it is probably more common to see functions defined in a bottom-up fashion.

Bottom-up Style

Functions are treated as building blocks. The smaller/simpler blocks go first.

# myprogram.py
def foo(x):
    ...

def bar(x):
    ...
    foo(x)          # Defined above
    ...

def spam(x):
    ...
    bar(x)          # Defined above
    ...

spam(42)            # Code that uses the functions appears at the end

Later functions build upon earlier functions. Again, this is only a point of style. The only thing that matters in the above program is that the call to spam(42) go last.

Function Design

Ideally, functions should be a black box. They should only operate on passed inputs and avoid global variables and mysterious side-effects. Your main goals: Modularity and Predictability.

Doc Strings

It’s good practice to include documentation in the form of a doc-string. Doc-strings are strings written as the first statement in the body of the function, immediately after the def line. They feed help(), IDEs and other tools.

def read_prices(filename):
    '''
    Read prices from a CSV file of name,price data
    '''
    prices = {}
    with open(filename) as f:
        f_csv = csv.reader(f)
        for row in f_csv:
            prices[row[0]] = float(row[1])
    return prices

A good practice for doc strings is to write a short one sentence summary of what the function does. If more information is needed, include a short example of usage along with a more detailed description of the arguments.
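
As an illustration of that layout, here is a sketch of a longer doc-string. The function total_cost() and its arguments are hypothetical and not part of the course code:

```python
def total_cost(shares, price):
    '''
    Compute the total cost of a stock holding.

    shares is the number of shares held (an int) and price is the
    price per share (a float). The result is shares * price.

    Example:
        >>> total_cost(100, 32.5)
        3250.0
    '''
    return shares * price

# The first line of the doc-string is the one-sentence summary
summary = total_cost.__doc__.strip().splitlines()[0]
print(summary)
```

Typing help(total_cost) at the interactive prompt displays the same text.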

Type Annotations

You can also add optional type hints to function definitions.

def read_prices(filename: str) -> dict:
    '''
    Read prices from a CSV file of name,price data
    '''
    prices = {}
    with open(filename) as f:
        f_csv = csv.reader(f)
        for row in f_csv:
            prices[row[0]] = float(row[1])
    return prices

The hints do nothing operationally. They are purely informational. However, they may be used by IDEs, code checkers, and other tools to do more.
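
A quick way to see that the hints are not enforced is to call an annotated function with the “wrong” types. The function scale() below is a made-up example:

```python
def scale(price: float, factor: float = 2.0) -> float:
    return price * factor

hints = scale.__annotations__     # The hints are stored on the function, nothing more
print(hints)
print(scale(10.0))                # 20.0
print(scale('ab', 2))             # 'abab' -- no error, despite the float hints
```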

Exercises

In section 2, you wrote a program called report.py that printed out a report showing the performance of a stock portfolio. This program consisted of some functions. For example:

# report.py
import csv

def read_portfolio(filename):
    '''
    Read a stock portfolio file into a list of dictionaries with keys
    name, shares, and price.
    '''
    portfolio = []
    with open(filename) as f:
        rows = csv.reader(f)
        headers = next(rows)

        for row in rows:
            record = dict(zip(headers, row))
            stock = {
                'name' : record['name'],
                'shares' : int(record['shares']),
                'price' : float(record['price'])
            }
            portfolio.append(stock)
    return portfolio
...

However, there were also portions of the program that just performed a series of scripted calculations. This code appeared near the end of the program. For example:

...

# Output the report

headers = ('Name', 'Shares', 'Price', 'Change')
print('%10s %10s %10s %10s'  % headers)
print(('-' * 10 + ' ') * len(headers))
for row in report:
    print('%10s %10d %10.2f %10.2f' % row)
...

In this exercise, we’re going to take this program and organize it a little more strongly around the use of functions.

Exercise 3.1: Structuring a program as a collection of functions

Modify your report.py program so that all major operations, including calculations and output, are carried out by a collection of functions. Specifically:

  • Create a function print_report(report) that prints out the report.
  • Change the last part of the program so that it is nothing more than a series of function calls and no other computation.

Exercise 3.2: Creating a top-level function for program execution

Take the last part of your program and package it into a single function portfolio_report(portfolio_filename, prices_filename). Have the function work so that the following function call creates the report as before:

portfolio_report('Data/portfolio.csv', 'Data/prices.csv')

In this final version, your program will be nothing more than a series of function definitions followed by a single function call to portfolio_report() at the very end (which executes all of the steps involved in the program).

By turning your program into a single function, it becomes easy to run it on different inputs. For example, try these statements interactively after running your program:

>>> portfolio_report('Data/portfolio2.csv', 'Data/prices.csv')
... look at the output ...
>>> files = ['Data/portfolio.csv', 'Data/portfolio2.csv']
>>> for name in files:
        print(f'{name:-^43s}')
        portfolio_report(name, 'Data/prices.csv')
        print()

... look at the output ...
>>>

Commentary

Python makes it very easy to write relatively unstructured scripting code where you just have a file with a sequence of statements in it. In the big picture, it’s almost always better to utilize functions whenever you can. At some point, that script is going to grow and you’ll wish you had a bit more organization. Also, a little-known fact is that Python code runs a bit faster inside functions, because local variables are looked up faster than globals.

More on Functions

Although functions were introduced earlier, very few details were provided on how they actually work at a deeper level. This section aims to fill in some gaps and discuss matters such as calling conventions, scoping rules, and more.

Calling a Function

Consider this function:

def read_prices(filename, debug):
    ...

You can call the function with positional arguments:

prices = read_prices('prices.csv', True)

Or you can call the function with keyword arguments:

prices = read_prices(filename='prices.csv', debug=True)

Default Arguments

Sometimes you want an argument to be optional. If so, assign a default value in the function definition.

def read_prices(filename, debug=False):
    ...

If a default value is assigned, the argument is optional in function calls.

d = read_prices('prices.csv')
e = read_prices('prices.dat', True)

Note: Arguments with defaults must appear at the end of the arguments list (all non-optional arguments go first).

Prefer keyword arguments for optional arguments

Compare and contrast these two different calling styles:

parse_data(data, False, True) # ?????

parse_data(data, ignore_errors=True)
parse_data(data, debug=True)
parse_data(data, debug=True, ignore_errors=True)

In most cases, keyword arguments improve code clarity–especially for arguments that serve as flags or which are related to optional features.
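
To make the comparison concrete, here is a sketch with a stand-in parse_data(). The text never shows its definition, so the parameter order (debug first, then ignore_errors) and the body are assumptions:

```python
def parse_data(data, debug=False, ignore_errors=False):
    # Hypothetical body: just report which options were selected
    return {'items': len(data), 'debug': debug, 'ignore_errors': ignore_errors}

data = ['a', 'b', 'c']

positional = parse_data(data, False, True)       # Which flag is which?
keyword = parse_data(data, ignore_errors=True)   # Self-explanatory

print(positional == keyword)                     # True
```

Both calls do the same thing, but only the second can be read without looking up the function definition.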

Design Best Practices

Always give short, but meaningful names to function arguments.

Someone using a function may want to use the keyword calling style.

d = read_prices('prices.csv', debug=True)

Python development tools will show the names in help features and documentation.

Returning Values

The return statement returns a value.

def square(x):
    return x * x

If no return value is given or return is missing, None is returned.

def bar(x):
    statements
    return

a = bar(4)      # a = None

# OR
def foo(x):
    statements  # No `return`

b = foo(4)      # b = None

Multiple Return Values

Functions can only return one value. However, a function may return multiple values by returning them in a tuple.

def divide(a,b):
    q = a // b      # Quotient
    r = a % b       # Remainder
    return q, r     # Return a tuple

Usage example:

x, y = divide(37,5) # x = 7, y = 2

x = divide(37, 5)   # x = (7, 2)

Variable Scope

Programs assign values to variables.

x = value # Global variable

def foo():
    y = value # Local variable

Variable assignments occur outside and inside function definitions. Variables defined outside are “global”. Variables inside a function are “local”.

Local Variables

Variables assigned inside functions are private.

def read_portfolio(filename):
    portfolio = []
    for line in open(filename):
        fields = line.split(',')
        s = (fields[0], int(fields[1]), float(fields[2]))
        portfolio.append(s)
    return portfolio

In this example, filename, portfolio, line, fields and s are local variables. Those variables are not retained or accessible after the function call.

>>> stocks = read_portfolio('portfolio.csv')
>>> fields
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'fields' is not defined
>>>

Locals also can’t conflict with variables found elsewhere.

Global Variables

Functions can freely access the values of globals defined in the same file.

name = 'Dave'

def greeting():
    print('Hello', name)  # Using `name` global variable

However, assigning to a name inside a function doesn’t modify the global of the same name. It creates a new local variable instead:

name = 'Dave'

def spam():
  name = 'Guido'

spam()
print(name) # prints 'Dave'

Remember: All assignments in functions are local.

Modifying Globals

If you must modify a global variable you must declare it as such.

name = 'Dave'

def spam():
    global name
    name = 'Guido' # Changes the global name above

The global declaration must appear before its use and the corresponding variable must exist in the same file as the function. Having seen this, know that it is considered poor form. In fact, try to avoid global entirely if you can. If you need a function to modify some kind of state outside of the function, it’s better to use a class instead (more on this later).
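
Classes are covered later, but as a rough preview, the state can live on an object instead of in a global variable. The class Greeter and its method names are invented for this sketch:

```python
class Greeter:
    def __init__(self, name):
        self.name = name          # State is stored on the object, not in a global

    def rename(self, name):
        self.name = name          # Modifies this object's state only

    def greeting(self):
        return f'Hello {self.name}'

g = Greeter('Dave')
before = g.greeting()             # 'Hello Dave'
g.rename('Guido')
after = g.greeting()              # 'Hello Guido'
print(before, '->', after)
```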

Argument Passing

When you call a function, the argument variables are names that refer to the passed values. These values are NOT copies (see section 2.7). If mutable data types are passed (e.g. lists, dicts), they can be modified in-place.

def foo(items):
    items.append(42)    # Modifies the input object

a = [1, 2, 3]
foo(a)
print(a)                # [1, 2, 3, 42]

Key point: Functions don’t receive a copy of the input arguments.

Reassignment vs Modifying

Make sure you understand the subtle difference between modifying a value and reassigning a variable name.

def foo(items):
    items.append(42)    # Modifies the input object

a = [1, 2, 3]
foo(a)
print(a)                # [1, 2, 3, 42]

# VS
def bar(items):
    items = [4,5,6]    # Changes local `items` variable to point to a different object

b = [1, 2, 3]
bar(b)
print(b)                # [1, 2, 3]

Reminder: Variable assignment never overwrites memory. The name is merely bound to a new value.
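
If a function should leave the caller’s data untouched, one common approach is to work on a copy. The helper with_extra() below is a hypothetical example of that idea:

```python
def with_extra(items):
    result = list(items)    # Shallow copy: a new list object
    result.append(42)       # Modifies the copy, not the caller's list
    return result

a = [1, 2, 3]
b = with_extra(a)
print(a)                    # [1, 2, 3]
print(b)                    # [1, 2, 3, 42]
print(a is b)               # False: two different list objects
```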

Exercises

This set of exercises has you implement what is, perhaps, the most powerful and difficult part of the course. There are a lot of steps and many concepts from past exercises are put together all at once. The final solution is only about 25 lines of code, but take your time and make sure you understand each part.

A central part of your report.py program focuses on the reading of CSV files. For example, the function read_portfolio() reads a file containing rows of portfolio data and the function read_prices() reads a file containing rows of price data. In both of those functions, there are a lot of low-level “fiddly” bits and similar features. For example, they both open a file and wrap it with the csv module and they both convert various fields into new types.

If you were doing a lot of file parsing for real, you’d probably want to clean some of this up and make it more general purpose. That’s our goal.

Start this exercise by creating a new file called Work/fileparse.py. This is where we will be doing our work.

Exercise 3.3: Reading CSV Files

To start, let’s just focus on the problem of reading a CSV file into a list of dictionaries. In the file fileparse.py, define a function that looks like this:

# fileparse.py
import csv

def parse_csv(filename):
    '''
    Parse a CSV file into a list of records
    '''
    with open(filename) as f:
        rows = csv.reader(f)

        # Read the file headers
        headers = next(rows)
        records = []
        for row in rows:
            if not row:    # Skip rows with no data
                continue
            record = dict(zip(headers, row))
            records.append(record)

    return records

This function reads a CSV file into a list of dictionaries while hiding the details of opening the file, wrapping it with the csv module, ignoring blank lines, and so forth.

Try it out:

Hint: python3 -i fileparse.py.

>>> portfolio = parse_csv('Data/portfolio.csv')
>>> portfolio
[{'price': '32.20', 'name': 'AA', 'shares': '100'}, {'price': '91.10', 'name': 'IBM', 'shares': '50'}, {'price': '83.44', 'name': 'CAT', 'shares': '150'}, {'price': '51.23', 'name': 'MSFT', 'shares': '200'}, {'price': '40.37', 'name': 'GE', 'shares': '95'}, {'price': '65.10', 'name': 'MSFT', 'shares': '50'}, {'price': '70.44', 'name': 'IBM', 'shares': '100'}]
>>>

This is good except that you can’t do any kind of useful calculation with the data because everything is represented as a string. We’ll fix this shortly, but let’s keep building on it.

Exercise 3.4: Building a Column Selector

In many cases, you’re only interested in selected columns from a CSV file, not all of the data. Modify the parse_csv() function so that it optionally allows user-specified columns to be picked out as follows:

>>> # Read all of the data
>>> portfolio = parse_csv('Data/portfolio.csv')
>>> portfolio
[{'price': '32.20', 'name': 'AA', 'shares': '100'}, {'price': '91.10', 'name': 'IBM', 'shares': '50'}, {'price': '83.44', 'name': 'CAT', 'shares': '150'}, {'price': '51.23', 'name': 'MSFT', 'shares': '200'}, {'price': '40.37', 'name': 'GE', 'shares': '95'}, {'price': '65.10', 'name': 'MSFT', 'shares': '50'}, {'price': '70.44', 'name': 'IBM', 'shares': '100'}]

>>> # Read only some of the data
>>> shares_held = parse_csv('Data/portfolio.csv', select=['name','shares'])
>>> shares_held
[{'name': 'AA', 'shares': '100'}, {'name': 'IBM', 'shares': '50'}, {'name': 'CAT', 'shares': '150'}, {'name': 'MSFT', 'shares': '200'}, {'name': 'GE', 'shares': '95'}, {'name': 'MSFT', 'shares': '50'}, {'name': 'IBM', 'shares': '100'}]
>>>

An example of a column selector was given in Exercise 2.23. However, here’s one way to do it:

# fileparse.py
import csv

def parse_csv(filename, select=None):
    '''
    Parse a CSV file into a list of records
    '''
    with open(filename) as f:
        rows = csv.reader(f)

        # Read the file headers
        headers = next(rows)

        # If a column selector was given, find indices of the specified columns.
        # Also narrow the set of headers used for resulting dictionaries
        if select:
            indices = [headers.index(colname) for colname in select]
            headers = select
        else:
            indices = []

        records = []
        for row in rows:
            if not row:    # Skip rows with no data
                continue
            # Filter the row if specific columns were selected
            if indices:
                row = [ row[index] for index in indices ]

            # Make a dictionary
            record = dict(zip(headers, row))
            records.append(record)

    return records

There are a number of tricky bits to this part. Probably the most important one is the mapping of the column selections to row indices. For example, suppose the input file had the following headers:

>>> headers = ['name', 'date', 'time', 'shares', 'price']
>>>

Now, suppose the selected columns were as follows:

>>> select = ['name', 'shares']
>>>

To perform the proper selection, you have to map the selected column names to column indices in the file. That’s what this step is doing:

>>> indices = [headers.index(colname) for colname in select ]
>>> indices
[0, 3]
>>>

In other words, “name” is column 0 and “shares” is column 3. When you read a row of data from the file, the indices are used to filter it:

>>> row = ['AA', '6/11/2007', '9:50am', '100', '32.20' ]
>>> row = [ row[index] for index in indices ]
>>> row
['AA', '100']
>>>

Exercise 3.5: Performing Type Conversion

Modify the parse_csv() function so that it optionally allows type-conversions to be applied to the returned data. For example:

>>> portfolio = parse_csv('Data/portfolio.csv', types=[str, int, float])
>>> portfolio
[{'price': 32.2, 'name': 'AA', 'shares': 100}, {'price': 91.1, 'name': 'IBM', 'shares': 50}, {'price': 83.44, 'name': 'CAT', 'shares': 150}, {'price': 51.23, 'name': 'MSFT', 'shares': 200}, {'price': 40.37, 'name': 'GE', 'shares': 95}, {'price': 65.1, 'name': 'MSFT', 'shares': 50}, {'price': 70.44, 'name': 'IBM', 'shares': 100}]

>>> shares_held = parse_csv('Data/portfolio.csv', select=['name', 'shares'], types=[str, int])
>>> shares_held
[{'name': 'AA', 'shares': 100}, {'name': 'IBM', 'shares': 50}, {'name': 'CAT', 'shares': 150}, {'name': 'MSFT', 'shares': 200}, {'name': 'GE', 'shares': 95}, {'name': 'MSFT', 'shares': 50}, {'name': 'IBM', 'shares': 100}]
>>>

You already explored this in Exercise 2.24. You’ll need to insert the following fragment of code into your solution:

...
if types:
    row = [func(val) for func, val in zip(types, row) ]
...
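
To see what that fragment does in isolation, here it is applied to a single sample row taken from the portfolio data:

```python
types = [str, int, float]
row = ['AA', '100', '32.20']

# Pair each conversion function with the matching column value and apply it
converted = [func(val) for func, val in zip(types, row)]
print(converted)     # ['AA', 100, 32.2]
```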

Exercise 3.6: Working without Headers

Some CSV files don’t include any header information. For example, the file prices.csv looks like this:

"AA",9.22
"AXP",24.85
"BA",44.85
"BAC",11.27
...

Modify the parse_csv() function so that it can work with such files by creating a list of tuples instead. For example:

>>> prices = parse_csv('Data/prices.csv', types=[str,float], has_headers=False)
>>> prices
[('AA', 9.22), ('AXP', 24.85), ('BA', 44.85), ('BAC', 11.27), ('C', 3.72), ('CAT', 35.46), ('CVX', 66.67), ('DD', 28.47), ('DIS', 24.22), ('GE', 13.48), ('GM', 0.75), ('HD', 23.16), ('HPQ', 34.35), ('IBM', 106.28), ('INTC', 15.72), ('JNJ', 55.16), ('JPM', 36.9), ('KFT', 26.11), ('KO', 49.16), ('MCD', 58.99), ('MMM', 57.1), ('MRK', 27.58), ('MSFT', 20.89), ('PFE', 15.19), ('PG', 51.94), ('T', 24.79), ('UTX', 52.61), ('VZ', 29.26), ('WMT', 49.74), ('XOM', 69.35)]
>>>

To make this change, you’ll need to modify the code so that the first line of data isn’t interpreted as a header line. Also, you’ll need to make sure you don’t create dictionaries as there are no longer any column names to use for keys.

Exercise 3.7: Picking a different column delimiter

Although CSV files are pretty common, it’s also possible that you could encounter a file that uses a different column separator such as a tab or space. For example, the file Data/portfolio.dat looks like this:

name shares price
"AA" 100 32.20
"IBM" 50 91.10
"CAT" 150 83.44
"MSFT" 200 51.23
"GE" 95 40.37
"MSFT" 50 65.10
"IBM" 100 70.44

The csv.reader() function allows a different column delimiter to be given as follows:

rows = csv.reader(f, delimiter=' ')
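
As a quick check of how the delimiter argument behaves, the sketch below feeds csv.reader() an in-memory copy of the first few lines of that file (io.StringIO stands in for the open file):

```python
import csv
import io

text = 'name shares price\n"AA" 100 32.20\n"IBM" 50 91.10\n'
f = io.StringIO(text)                       # Stand-in for open('Data/portfolio.dat')

rows = list(csv.reader(f, delimiter=' '))   # Split columns on spaces instead of commas
for row in rows:
    print(row)
# ['name', 'shares', 'price']
# ['AA', '100', '32.20']
# ['IBM', '50', '91.10']
```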

Modify your parse_csv() function so that it also allows the delimiter to be changed.

For example:

>>> portfolio = parse_csv('Data/portfolio.dat', types=[str, int, float], delimiter=' ')
>>> portfolio
[{'price': 32.2, 'name': 'AA', 'shares': 100}, {'price': 91.1, 'name': 'IBM', 'shares': 50}, {'price': 83.44, 'name': 'CAT', 'shares': 150}, {'price': 51.23, 'name': 'MSFT', 'shares': 200}, {'price': 40.37, 'name': 'GE', 'shares': 95}, {'price': 65.1, 'name': 'MSFT', 'shares': 50}, {'price': 70.44, 'name': 'IBM', 'shares': 100}]
>>>

Commentary

If you’ve made it this far, you’ve created a nice library function that’s genuinely useful. You can use it to parse arbitrary CSV files, select out columns of interest, and perform type conversions, all without having to worry too much about the inner workings of files or the csv module.

Error Checking

Although exceptions were introduced earlier, this section fills in some additional details about error checking and exception handling.

How programs fail

Python performs no checking or validation of function argument types or values. A function will work on any data that is compatible with the statements in the function.

def add(x, y):
    return x + y

add(3, 4)               # 7
add('Hello', 'World')   # 'HelloWorld'
add('3', '4')           # '34'

If there are errors in a function, they appear at run time (as an exception).

def add(x, y):
    return x + y

>>> add(3, '4')
Traceback (most recent call last):
...
TypeError: unsupported operand type(s) for +:
'int' and 'str'
>>>

To verify code, there is a strong emphasis on testing (covered later).

Exceptions

Exceptions are used to signal errors. To raise an exception yourself, use the raise statement.

if name not in authorized:
    raise RuntimeError(f'{name} not authorized')

To catch an exception use try-except.

try:
    authenticate(username)
except RuntimeError as e:
    print(e)

Exception Handling

Exceptions propagate to the first matching except.

def grok():
    ...
    raise RuntimeError('Whoa!')   # Exception raised here

def spam():
    grok()                        # Call that will raise exception

def bar():
    try:
        spam()
    except RuntimeError as e:     # Exception caught here
        ...

def foo():
    try:
        bar()
    except RuntimeError as e:     # Exception does NOT arrive here
        ...

foo()

To handle the exception, put statements in the except block. You can add any statements you want to handle the error.

def grok():
    ...
    raise RuntimeError('Whoa!')

def bar():
    try:
        grok()
    except RuntimeError as e:   # Exception caught here
        statements              # Use these statements
        statements
        ...

bar()

After handling, execution resumes with the first statement after the try-except.

def grok():
    ...
    raise RuntimeError('Whoa!')

def bar():
    try:
        grok()
    except RuntimeError as e:   # Exception caught here
        statements
        statements
        ...
    statements                  # Resumes execution here
    statements                  # And continues here
    ...

bar()
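
The control flow described in this section can be traced with a runnable version of the grok/spam/bar/foo example. The events list is an addition used only to record the order in which things happen:

```python
events = []

def grok():
    events.append('grok: raising')
    raise RuntimeError('Whoa!')

def spam():
    grok()
    events.append('spam: after grok')     # Never runs: the exception skips it

def bar():
    try:
        spam()
    except RuntimeError as e:             # First matching except: caught here
        events.append(f'bar: caught {e}')
    events.append('bar: resumed')         # Execution resumes after the try-except

def foo():
    try:
        bar()
    except RuntimeError:
        events.append('foo: caught')      # Never runs: bar() already handled it
    events.append('foo: done')

foo()
print(events)
# ['grok: raising', 'bar: caught Whoa!', 'bar: resumed', 'foo: done']
```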

Built-in Exceptions

There are about two-dozen built-in exceptions. Usually the name of the exception is indicative of what’s wrong (e.g., a ValueError is raised because you supplied a bad value). This is not an exhaustive list. Check the documentation for more.

ArithmeticError
AssertionError
EnvironmentError
EOFError
ImportError
IndexError
KeyboardInterrupt
KeyError
MemoryError
NameError
ReferenceError
RuntimeError
SyntaxError
SystemError
TypeError
ValueError

Exception Values

Exceptions have an associated value. It contains more specific information about what’s wrong.

raise RuntimeError('Invalid user name')

This value is part of the exception instance that’s placed in the variable supplied to except.

try:
    ...
except RuntimeError as e:   # `e` holds the exception raised
    ...

e is an instance of the exception type. However, it often looks like a string when printed.

except RuntimeError as e:
    print('Failed : Reason', e)
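
For a closer look at what e contains, this short sketch inspects the exception instance inside the except block:

```python
try:
    raise RuntimeError('Invalid user name')
except RuntimeError as e:
    as_text = str(e)            # 'Invalid user name' -- what print(e) shows
    as_args = e.args            # ('Invalid user name',) -- the values given to raise
    kind = type(e).__name__     # 'RuntimeError'

print(kind, as_text, as_args)
```

Note that the values are saved to other variables inside the block because the name e is removed once the except block finishes.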

Catching Multiple Errors

You can catch different kinds of exceptions using multiple except blocks.

try:
  ...
except LookupError as e:
  ...
except RuntimeError as e:
  ...
except IOError as e:
  ...
except KeyboardInterrupt as e:
  ...

Alternatively, if the statements to handle them are the same, you can group them:

try:
  ...
except (IOError,LookupError,RuntimeError) as e:
  ...
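
Here is a runnable sketch of grouping. The helper lookup() is hypothetical, and it groups LookupError (the base class of KeyError and IndexError) with TypeError rather than the exact exceptions shown above:

```python
def lookup(container, key):
    try:
        return container[key]
    except (LookupError, TypeError) as e:    # One handler for both kinds of error
        return f'{type(e).__name__}: {e}'

print(lookup({'a': 1}, 'a'))     # 1
print(lookup({'a': 1}, 'b'))     # Reports a KeyError
print(lookup([1, 2], 5))         # Reports an IndexError
print(lookup([1, 2], 'x'))       # Reports a TypeError
```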

Catching All Errors

To catch any exception, use Exception like this:

try:
    ...
except Exception:       # DANGER. See below
    print('An error occurred')

In general, writing code like that is a bad idea because you’ll have no idea why it failed.

Wrong Way to Catch Errors

Here is the wrong way to use exceptions.

try:
    go_do_something()
except Exception:
    print('Computer says no')

This catches all possible errors and it may make it impossible to debug when the code is failing for some reason you didn’t expect at all (e.g., a Python module that isn’t installed).

Somewhat Better Approach

If you’re going to catch all errors, this is a more sane approach.

try:
    go_do_something()
except Exception as e:
    print('Computer says no. Reason :', e)

It reports a specific reason for failure. It is almost always a good idea to have some mechanism for viewing/reporting errors when you write code that catches all possible exceptions.

In general though, it’s better to catch the error as narrowly as is reasonable. Only catch the errors you can actually handle. Let other errors pass by–maybe some other code can handle them.

Reraising an Exception

Use raise to propagate a caught error.

try:
    go_do_something()
except Exception as e:
    print('Computer says no. Reason :', e)
    raise

This allows you to take action (e.g. logging) and pass the error on to the caller.
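
A small sketch of that pattern follows. The failing go_do_something() and the log list are stand-ins invented for the example:

```python
log = []

def go_do_something():
    raise ValueError('bad input')         # Hypothetical failure

def run():
    try:
        go_do_something()
    except Exception as e:
        log.append(f'Computer says no. Reason : {e}')   # Take action (e.g. logging)
        raise                                           # Re-raise the same exception

try:
    run()
except ValueError as e:
    outcome = f'caller saw {type(e).__name__}'

print(log)        # ['Computer says no. Reason : bad input']
print(outcome)    # caller saw ValueError
```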

Exception Best Practices

Don’t catch exceptions. Fail fast and loud. If it’s important, someone else will take care of the problem. Only catch an exception if you are that someone. That is, only catch errors where you can recover and sanely keep going.

finally statement

It specifies code that must run regardless of whether or not an exception occurs.

lock = Lock()
...
lock.acquire()
try:
    ...
finally:
    lock.release()  # this will ALWAYS be executed. With and without exception.

Commonly used to safely manage resources (especially locks, files, etc.).
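
The following sketch, using threading.Lock from the standard library and a made-up update() function, shows the lock being released both when the code succeeds and when it raises:

```python
from threading import Lock

lock = Lock()

def update(fail):
    lock.acquire()
    try:
        if fail:
            raise RuntimeError('update failed')
        return 'updated'
    finally:
        lock.release()          # Runs on normal return AND on exception

ok = update(False)              # Normal path
try:
    update(True)                # Exception path
except RuntimeError:
    pass

released = not lock.locked()
print(ok, released)             # updated True
```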

with statement

In modern code, try-finally is often replaced with the with statement.

lock = Lock()
with lock:
    # lock acquired
    ...
# lock released

A more familiar example:

with open(filename) as f:
    # Use the file
    ...
# File closed

with defines a usage context for a resource. When execution leaves that context, resources are released. with only works with certain objects that have been specifically programmed to support it.
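
“Programmed to support it” means the object defines the special methods __enter__() and __exit__(). The class Tracked below is a minimal, made-up example that only records when each one is called:

```python
class Tracked:
    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append('enter')     # Runs when the with block is entered
        return self                     # Value bound by `as`

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append('exit')      # Runs when the block is left, even on error
        return False                    # False means: do not suppress the exception

t = Tracked()
try:
    with t as res:
        res.events.append('body')
        raise RuntimeError('Whoa!')
except RuntimeError:
    pass

print(t.events)     # ['enter', 'body', 'exit']
```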

Exercises

Exercise 3.8: Raising exceptions

The parse_csv() function you wrote in the last section allows user-specified columns to be selected, but that only works if the input data file has column headers.

Modify the code so that an exception gets raised if both the select and has_headers=False arguments are passed. For example:

>>> parse_csv('Data/prices.csv', select=['name','price'], has_headers=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "fileparse.py", line 9, in parse_csv
    raise RuntimeError("select argument requires column headers")
RuntimeError: select argument requires column headers
>>>

Having added this one check, you might ask if you should be performing other kinds of sanity checks in the function. For example, should you check that the filename is a string, that types is a list, or anything of that nature?

As a general rule, it’s usually best to skip such tests and to just let the program fail on bad inputs. The traceback message will point at the source of the problem and can assist in debugging.

The main reason for adding the above check is to avoid running the code in a nonsensical mode (e.g., using a feature that requires column headers, but simultaneously specifying that there are no headers).

This indicates a programming error on the part of the calling code. Checking for cases that “aren’t supposed to happen” is often a good idea.

Exercise 3.9: Catching exceptions

The parse_csv() function you wrote is used to process the entire contents of a file. However, in the real world, it’s possible that input files might have corrupted, missing, or dirty data. Try this experiment:

>>> portfolio = parse_csv('Data/missing.csv', types=[str, int, float])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "fileparse.py", line 36, in parse_csv
    row = [func(val) for func, val in zip(types, row)]
ValueError: invalid literal for int() with base 10: ''
>>>

Modify the parse_csv() function to catch all ValueError exceptions generated during record creation and print a warning message for rows that can’t be converted.

The message should include the row number and information about the reason why it failed. To test your function, try reading the file Data/missing.csv above. For example:

>>> portfolio = parse_csv('Data/missing.csv', types=[str, int, float])
Row 4: Couldn't convert ['MSFT', '', '51.23']
Row 4: Reason invalid literal for int() with base 10: ''
Row 7: Couldn't convert ['IBM', '', '70.44']
Row 7: Reason invalid literal for int() with base 10: ''
>>>
>>> portfolio
[{'price': 32.2, 'name': 'AA', 'shares': 100}, {'price': 91.1, 'name': 'IBM', 'shares': 50}, {'price': 83.44, 'name': 'CAT', 'shares': 150}, {'price': 40.37, 'name': 'GE', 'shares': 95}, {'price': 65.1, 'name': 'MSFT', 'shares': 50}]
>>>
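
One possible shape for the error handling is sketched below on an in-memory list of rows rather than inside parse_csv() itself. The function convert_rows() is hypothetical, and fitting the idea into your own parse_csv() is still left as the exercise:

```python
def convert_rows(rows, types):
    records = []
    for rowno, row in enumerate(rows, start=1):      # Number data rows from 1
        try:
            record = tuple(func(val) for func, val in zip(types, row))
        except ValueError as e:
            # Report the problem and move on to the next row
            print(f"Row {rowno}: Couldn't convert {row}")
            print(f"Row {rowno}: Reason {e}")
            continue
        records.append(record)
    return records

rows = [['AA', '100', '32.20'], ['MSFT', '', '51.23']]
good = convert_rows(rows, [str, int, float])
print(good)     # [('AA', 100, 32.2)]
```

Only the conversion step sits inside the try block, so unrelated errors elsewhere are not swallowed by accident.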

Exercise 3.10: Silencing Errors

Modify the parse_csv() function so that parsing error messages can be silenced if explicitly desired by the user. For example:

>>> portfolio = parse_csv('Data/missing.csv', types=[str,int,float], silence_errors=True)
>>> portfolio
[{'price': 32.2, 'name': 'AA', 'shares': 100}, {'price': 91.1, 'name': 'IBM', 'shares': 50}, {'price': 83.44, 'name': 'CAT', 'shares': 150}, {'price': 40.37, 'name': 'GE', 'shares': 95}, {'price': 65.1, 'name': 'MSFT', 'shares': 50}]
>>>

Error handling is one of the most difficult things to get right in most programs. As a general rule, you shouldn’t silently ignore errors. Instead, it’s better to report problems and to give the user an option to silence the error messages if they choose to do so.

Modules

This section introduces the concept of modules and working with functions that span multiple files.

Modules and import

Any Python source file is a module.

# foo.py
def grok(a):
    ...
def spam(b):
    ...

The import statement loads and executes a module.

# program.py
import foo

a = foo.grok(2)
b = foo.spam('Hello')
...

Namespaces

A module is a collection of named values and is sometimes said to be a namespace. The names are all of the global variables and functions defined in the source file. After importing, the module name is used as a prefix. Hence the namespace.

import foo

a = foo.grok(2)
b = foo.spam('Hello')
...

The module name is directly tied to the file name (foo -> foo.py).

Global Definitions

Everything defined in the global scope is what populates the module namespace. Consider two modules that define the same variable x.

# foo.py
x = 42
def grok(a):
    ...

# bar.py
x = 37
def spam(a):
    ...

In this case, the x definitions refer to different variables. One is foo.x and the other is bar.x. Different modules can use the same names and those names won’t conflict with each other.

Modules are isolated.

Modules as Environments

Modules form an enclosing environment for all of the code defined inside.

# foo.py
x = 42

def grok(a):
    print(x)

Global variables are always bound to the enclosing module (same file). Each source file is its own little universe.
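
The isolation between modules can be tried without creating any files. The sketch below builds two in-memory modules with types.ModuleType as stand-ins for foo.py and bar.py; that shortcut is an assumption made to keep the example self-contained, since normally the modules would be separate source files.

```python
import types

# Build two in-memory modules that stand in for foo.py and bar.py.
foo = types.ModuleType('foo')
exec('x = 42\ndef grok():\n    return x\n', foo.__dict__)

bar = types.ModuleType('bar')
exec('x = 37\ndef spam():\n    return x\n', bar.__dict__)

# Each function looks up x in the module where it was defined.
foo_value = foo.grok()
bar_value = bar.spam()

# Rebinding bar.x has no effect on foo.x.
bar.x = 1000
foo_after = foo.grok()
bar_after = bar.spam()
```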

Module Execution

When a module is imported, all of the statements in the module execute one after another until the end of the file is reached. The contents of the module namespace are all of the global names that are still defined at the end of the execution process. If there are scripting statements that carry out tasks in the global scope (printing, creating files, etc.) you will see them run on import.

import as statement

You can change the name of a module as you import it:

import math as m

def rectangular(r, theta):
    x = r * m.cos(theta)
    y = r * m.sin(theta)
    return x, y

It works the same as a normal import. It just renames the module in that one file.

from module import

This picks selected symbols out of a module and makes them available locally.

from math import sin, cos

def rectangular(r, theta):
    x = r * cos(theta)
    y = r * sin(theta)
    return x, y

This allows parts of a module to be used without having to type the module prefix. It’s useful for frequently used names.

Comments on importing

Variations on import do not change the way that modules work.

import math
# vs
import math as m
# vs
from math import cos, sin
...

Specifically, import always executes the entire file and modules are still isolated environments.

The import module as statement is only changing the name locally. The from math import cos, sin statement still loads the entire math module behind the scenes. It’s merely copying the cos and sin names from the module into the local space after it’s done.
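
A quick check of these claims, using only the standard math module:

```python
import sys

import math
import math as m
from math import cos, sin

same_module = m is math            # Two names, one module object
same_function = cos is math.cos    # cos refers to the function inside math
loaded = 'math' in sys.modules     # The whole module is loaded in every case
```

All three values should be True: the variations only differ in which names end up in the importing file.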

Module Loading

Each module loads and executes only once. Note: Repeated imports just return a reference to the previously loaded module.

sys.modules is a dict of all loaded modules.

>>> import sys
>>> sys.modules.keys()
dict_keys(['sys', 'builtins', '_frozen_importlib', ...])
>>>

Caution: A common confusion arises if you repeat an import statement after changing the source code for a module. Because of the module cache sys.modules, repeated imports always return the previously loaded module, even if a change was made. The safest way to load modified code into Python is to quit and restart the interpreter.
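
The caching behavior can be observed with a throwaway module. The sketch below writes a small file to a temporary directory; the module name cachedemo is hypothetical. It also uses importlib.reload(), which re-executes a module's source in place. Reloading does not update objects that were already created from the old code or names copied with from ... import, so restarting the interpreter remains the safest option.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmpdir:
    source = Path(tmpdir, 'cachedemo.py')      # Hypothetical module name
    source.write_text("print('cachedemo is executing')\nvalue = 1\n")
    sys.path.insert(0, tmpdir)
    importlib.invalidate_caches()
    try:
        import cachedemo                  # Executes the file (prints once)
        first = cachedemo
        import cachedemo                  # Already loaded: nothing is executed
        same_object = sys.modules['cachedemo'] is first

        source.write_text("value = 2\n")  # Change the source code on disk
        import cachedemo                  # Still returns the cached module
        stale = cachedemo.value

        importlib.reload(cachedemo)       # Explicitly re-execute the new source
        fresh = cachedemo.value
    finally:
        # Clean up so the demonstration leaves no trace behind
        sys.path.remove(tmpdir)
        sys.modules.pop('cachedemo', None)
```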

Locating Modules

Python consults a path list (sys.path) when looking for modules.

>>> import sys
>>> sys.path
[
  '',
  '/usr/local/lib/python36/python36.zip',
  '/usr/local/lib/python36',
  ...
]

The current working directory is usually first.

Module Search Path

As noted, sys.path contains the search paths. You can manually adjust it if you need to.

import sys
sys.path.append('/project/foo/pyfiles')

Paths can also be added via environment variables.

% env PYTHONPATH=/project/foo/pyfiles python3
Python 3.6.0 (default, Feb 3 2017, 05:53:21)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.38)]
>>> import sys
>>> sys.path
['','/project/foo/pyfiles', ...]

As a general rule, it should not be necessary to manually adjust the module search path. However, it sometimes arises if you’re trying to import Python code that’s in an unusual location or not readily accessible from the current working directory.
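
To see the search path at work, the following sketch places a module in a temporary directory that is not on sys.path, tries to import it, and then tries again after appending the directory. The module name pathdemo_helper and its greet() function are hypothetical, and the example assumes no other module with that name is installed.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as extra_dir:
    Path(extra_dir, 'pathdemo_helper.py').write_text(
        "def greet():\n    return 'hello'\n")

    # The directory is not on sys.path yet, so the import fails.
    try:
        import pathdemo_helper
        found_before = True
    except ModuleNotFoundError:
        found_before = False

    # After the directory is appended, the same import succeeds.
    sys.path.append(extra_dir)
    importlib.invalidate_caches()
    try:
        import pathdemo_helper
        greeting = pathdemo_helper.greet()
    finally:
        sys.path.remove(extra_dir)
        sys.modules.pop('pathdemo_helper', None)
```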

Exercises

For this exercise involving modules, it is critically important to make sure you are running Python in a proper environment. Modules are often where programmers first run into problems with the current working directory or with Python’s path settings. For this course, it is assumed that you’re writing all of your code in the Work/ directory. For best results, you should make sure you’re also in that directory when you launch the interpreter. If not, you need to make sure practical-python/Work is added to sys.path.

Exercise 3.11: Module imports

In section 3, we created a general purpose function parse_csv() for parsing the contents of CSV datafiles.

Now, we’re going to see how to use that function in other programs. First, start in a new shell window. Navigate to the folder where you have all your files. We are going to import them.

Start Python interactive mode.

bash % python3
Python 3.6.1 (v3.6.1:69c0db5050, Mar 21 2017, 01:21:04)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>

Once you’ve done that, try importing some of the programs you previously wrote. You should see their output exactly as before. Just to emphasize, importing a module runs its code.

>>> import bounce
... watch output ...
>>> import mortgage
... watch output ...
>>> import report
... watch output ...
>>>

If none of this works, you’re probably running Python in the wrong directory. Now, try importing your fileparse module and getting some help on it.

>>> import fileparse
>>> help(fileparse)
... look at the output ...
>>> dir(fileparse)
... look at the output ...
>>>

Try using the module to read some data:

>>> portfolio = fileparse.parse_csv('Data/portfolio.csv',select=['name','shares','price'], types=[str,int,float])
>>> portfolio
... look at the output ...
>>> pricelist = fileparse.parse_csv('Data/prices.csv',types=[str,float], has_headers=False)
>>> pricelist
... look at the output ...
>>> prices = dict(pricelist)
>>> prices
... look at the output ...
>>> prices['IBM']
106.11
>>>

Try importing a function so that you don’t need to include the module name:

>>> from fileparse import parse_csv
>>> portfolio = parse_csv('Data/portfolio.csv', select=['name','shares','price'], types=[str,int,float])
>>> portfolio
... look at the output ...
>>>

Exercise 3.12: Using your library module

In section 2, you wrote a program report.py that produced a stock report like this:

      Name     Shares      Price     Change
---------- ---------- ---------- ----------
        AA        100      39.91       7.71
       IBM         50     106.11      15.01
       CAT        150      78.58      -4.86
      MSFT        200      30.47     -20.76
        GE         95      37.38      -2.99
      MSFT         50      30.47     -34.63
       IBM        100     106.11      35.67

Take that program and modify it so that all of the input file processing is done using functions in your fileparse module. To do that, import fileparse as a module and change the read_portfolio() and read_prices() functions to use the parse_csv() function.

Use the interactive example at the start of this exercise as a guide. Afterwards, you should get exactly the same output as before.

Exercise 3.14: Using more library imports

In section 1, you wrote a program pcost.py that read a portfolio and computed its cost.

>>> import pcost
>>> pcost.portfolio_cost('Data/portfolio.csv')
44671.15
>>>

Modify the pcost.py file so that it uses the report.read_portfolio() function.

Commentary

When you are done with this exercise, you should have three programs: fileparse.py, which contains a general purpose parse_csv() function; report.py, which produces a nice report but also contains the read_portfolio() and read_prices() functions; and pcost.py, which computes the portfolio cost but makes use of the read_portfolio() function written for the report.py program.


3.5 Main Module

This section introduces the concept of a main program or main module.

Main Functions

In many programming languages, there is a concept of a main function or method.

// c / c++
int main(int argc, char *argv[]) {
    ...
}

// java
class myprog {
    public static void main(String args[]) {
        ...
    }
}

This is the first function that executes when an application is launched.

Python Main Module

Python has no main function or method. Instead, there is a main module. The main module is the source file that runs first.

bash % python3 prog.py
...

Whatever file you give to the interpreter at startup becomes main. The name of the file doesn’t matter.

__main__ check

It is standard practice for modules that run as a main script to use this convention:

# prog.py
...
if __name__ == '__main__':
    # Running as the main program ...
    statements
    ...

Statements enclosed inside the if statement become the main program.

Main programs vs. library imports

Any Python file can either run as main or as a library import:

bash % python3 prog.py # Running as main
import prog   # Running as library import

In both cases, __name__ is the name of the module. However, it will only be set to __main__ if running as main.

Usually, you don’t want statements that are part of the main program to execute on a library import. So, it’s common to have an if-check in code that might be used either way.

if __name__ == '__main__':
    # Does not execute if loaded with import ...
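
The difference can be observed directly. In this small sketch, json is used only because it is a convenient standard library module; any imported module would show the same thing.

```python
import json                      # Any library module works for this demonstration

def main():
    print('Running as the main program')

# An imported module reports its own name.
library_name = json.__name__     # 'json'

# This file's __name__ is '__main__' only when it is launched directly,
# for example with: python3 prog.py
if __name__ == '__main__':
    main()
```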

Program Template

Here is a common program template for writing a Python program:

# prog.py
# Import statements (libraries)
import modules

# Functions
def spam():
    ...

def blah():
    ...

# Main function
def main():
    ...

if __name__ == '__main__':
    main()

Command Line Tools

Python is often used for command-line tools.

bash % python3 report.py portfolio.csv prices.csv

This means that the scripts are executed from the shell / terminal. Common use cases are automation, background tasks, etc.

Command Line Args

The command line is a list of text strings.

bash % python3 report.py portfolio.csv prices.csv

This list of text strings is found in sys.argv.

# In the previous bash command
sys.argv # ['report.py', 'portfolio.csv', 'prices.csv']

Here is a simple example of processing the arguments:

import sys

if len(sys.argv) != 3:
    raise SystemExit(f'Usage: {sys.argv[0]} ' 'portfile pricefile')
portfile = sys.argv[1]
pricefile = sys.argv[2]
...
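
One way to make this logic easier to try out is to put it in a function that takes the argument list as a parameter. The sketch below is only an illustration; parse_args() is a hypothetical helper name.

```python
import sys

def parse_args(argv):
    '''
    Return (portfile, pricefile) taken from an argv-style list.
    '''
    if len(argv) != 3:
        raise SystemExit(f'Usage: {argv[0]} portfile pricefile')
    return argv[1], argv[2]

# An explicit list lets the function be tried interactively.
portfile, pricefile = parse_args(['report.py', 'portfolio.csv', 'prices.csv'])

# In a real script, the list would come from the command line:
# portfile, pricefile = parse_args(sys.argv)
```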

Standard I/O

Standard Input / Output (or stdio) are files that work the same as normal files.

sys.stdout
sys.stderr
sys.stdin

By default, print is directed to sys.stdout. Input is read from sys.stdin. Tracebacks and errors are directed to sys.stderr.
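
A short sketch of these streams in use. The messages are invented for illustration, and io.StringIO stands in for a file to show that standard output behaves like any other file object.

```python
import io
import sys
from contextlib import redirect_stdout

print('Total cost: 44671.15')                       # Goes to sys.stdout
print('Warning: 2 rows skipped', file=sys.stderr)   # Diagnostics go to sys.stderr

# Temporarily send standard output to an in-memory file-like object.
buffer = io.StringIO()
with redirect_stdout(buffer):
    print('Total cost: 44671.15')
captured = buffer.getvalue()
```

Sending diagnostics to sys.stderr keeps them out of the data when the output of a script is redirected to a file or a pipe.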

Be aware that stdio could be connected to terminals, files, pipes, etc.

bash % python3 prog.py > results.txt
# or
bash % cmd1 | python3 prog.py | cmd2

Environment Variables

Environment variables are set in the shell.

bash % export NAME=dave
bash % export RSH=ssh
bash % python3 prog.py

os.environ is a dictionary that contains these values.

import os

name = os.environ['NAME'] # 'dave'

Changes are reflected in any subprocesses later launched by the program.
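
Indexing os.environ with a name that is not set raises a KeyError, so lookups often supply a default. A minimal sketch follows; DEMO_MODE is a hypothetical variable name used only for this example.

```python
import os

# Lookup with a fallback avoids a KeyError when the variable is not set.
name = os.environ.get('NAME', 'unknown')

# Assigning to os.environ changes the environment of this process and of
# any subprocess it starts afterwards.
os.environ['DEMO_MODE'] = 'test'
mode = os.environ['DEMO_MODE']
```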

Program Exit

Program exit is handled through exceptions.

raise SystemExit
raise SystemExit(exitcode)
raise SystemExit('Informative message')

An alternative.

import sys
sys.exit(exitcode)

A non-zero exit code indicates an error.
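
Because exit is an exception, it can be caught and inspected like any other. The sketch below does so only to show what SystemExit carries; check_args() is a hypothetical helper, and a real script would normally let the exception propagate.

```python
import sys

def check_args(argv):
    if len(argv) != 2:
        # If this propagates to the interpreter, the message is printed
        # to standard error and the exit code is 1.
        raise SystemExit(f'Usage: {argv[0]} filename')
    return argv[1]

try:
    check_args(['prog.py'])
except SystemExit as e:
    message = e.code       # The message passed to SystemExit

try:
    sys.exit(2)            # Same as: raise SystemExit(2)
except SystemExit as e:
    exit_code = e.code     # The integer exit code
```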

The #! line

On Unix, the #! line can launch a script as Python. Add the following to the first line of your script file.

#!/usr/bin/env python3
# prog.py
...

It requires the executable permission.

bash % chmod +x prog.py
# Then you can execute
bash % ./prog.py
... output ...

Note: The Python Launcher on Windows also looks for the #! line to indicate language version.

Script Template

Finally, here is a common code template for Python programs that run as command-line scripts:

#!/usr/bin/env python3
# prog.py

# Import statements (libraries)
import modules

# Functions
def spam():
    ...

def blah():
    ...

# Main function
def main(argv):
    # Parse command line args, environment, etc.
    ...

if __name__ == '__main__':
    import sys
    main(sys.argv)

Exercises

Exercise 3.15: main() functions

In the file report.py add a main() function that accepts a list of command line options and produces the same output as before. You should be able to run it interactively like this:

>>> import report
>>> report.main(['report.py', 'Data/portfolio.csv', 'Data/prices.csv'])
      Name     Shares      Price     Change
---------- ---------- ---------- ----------
        AA        100      39.91       7.71
       IBM         50     106.11      15.01
       CAT        150      78.58      -4.86
      MSFT        200      30.47     -20.76
        GE         95      37.38      -2.99
      MSFT         50      30.47     -34.63
       IBM        100     106.11      35.67
>>>

Modify the pcost.py file so that it has a similar main() function:

>>> import pcost
>>> pcost.main(['pcost.py', 'Data/portfolio.csv'])
Total cost: 44671.15
>>>

Exercise 3.16: Making Scripts

Modify the report.py and pcost.py programs so that they can execute as a script on the command line:

bash $ python3 report.py Data/portfolio.csv Data/prices.csv
      Name     Shares      Price     Change
---------- ---------- ---------- ----------
        AA        100      39.91       7.71
       IBM         50     106.11      15.01
       CAT        150      78.58      -4.86
      MSFT        200      30.47     -20.76
        GE         95      37.38      -2.99
      MSFT         50      30.47     -34.63
       IBM        100     106.11      35.67

bash $ python3 pcost.py Data/portfolio.csv
Total cost: 44671.15


3.6 Design Discussion

In this section we reconsider a design decision made earlier.

Filenames versus Iterables

Compare these two programs that return the same output.

# Provide a filename
def read_data(filename):
    records = []
    with open(filename) as f:
        for line in f:
            ...
            records.append(r)
    return records

d = read_data('file.csv')

# Provide lines
def read_data(lines):
    records = []
    for line in lines:
        ...
        records.append(r)
    return records

with open('file.csv') as f:
    d = read_data(f)

  • Which of these functions do you prefer? Why?
  • Which of these functions is more flexible?

Deep Idea: “Duck Typing”

Duck Typing is a computer programming concept to determine whether an object can be used for a particular purpose. It is an application of the duck test.

If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck.

In the second version of read_data() above, the function accepts any iterable object, not just the lines of a file.

def read_data(lines):
    records = []
    for line in lines:
        ...
        records.append(r)
    return records

This means that we can use it with other sources of lines.

import gzip
import sys

# A CSV file
lines = open('data.csv')
data = read_data(lines)

# A zipped file
lines = gzip.open('data.csv.gz','rt')
data = read_data(lines)

# The Standard Input
lines = sys.stdin
data = read_data(lines)

# A list of strings
lines = ['ACME,50,91.1','IBM,75,123.45', ... ]
data = read_data(lines)

There is considerable flexibility with this design.
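
Here is a runnable version of the idea. The body of read_data() is filled in with a simple three-column parser, which is an assumption made for illustration since the original elides that part.

```python
import io

def read_data(lines):
    records = []
    for line in lines:
        # Assumed record layout: name,shares,price
        name, shares, price = line.strip().split(',')
        records.append((name, int(shares), float(price)))
    return records

text = 'ACME,50,91.1\nIBM,75,123.45\n'

# The same function works with a list, a file-like object, and a generator.
from_list = read_data(['ACME,50,91.1', 'IBM,75,123.45'])
from_file_like = read_data(io.StringIO(text))
from_generator = read_data(line for line in text.splitlines())
```

All three calls produce the same records because the function only requires that its argument can be iterated over to produce lines.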

Question: Should we embrace or fight this flexibility?

Library Design Best Practices

Code libraries are often better served by embracing flexibility. Don’t restrict your options. With great flexibility comes great power.

Exercise

Exercise 3.17: From filenames to file-like objects

You’ve now created a file fileparse.py that contains a function parse_csv(). The function works like this:

>>> import fileparse
>>> portfolio = fileparse.parse_csv('Data/portfolio.csv', types=[str,int,float])
>>>

Right now, the function expects to be passed a filename. However, you can make the code more flexible. Modify the function so that it works with any file-like/iterable object. For example:

>>> import fileparse
>>> import gzip
>>> with gzip.open('Data/portfolio.csv.gz', 'rt') as file:
...      port = fileparse.parse_csv(file, types=[str,int,float])
...
>>> lines = ['name,shares,price', 'AA,100,34.23', 'IBM,50,91.1', 'HPE,75,45.1']
>>> port = fileparse.parse_csv(lines, types=[str,int,float])
>>>

In this new code, what happens if you pass a filename as before?

>>> port = fileparse.parse_csv('Data/portfolio.csv', types=[str,int,float])
>>> port
... look at output (it should be crazy) ...
>>>

Yes, you’ll need to be careful. Could you add a safety check to avoid this?
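
The strange output comes from the fact that a string is itself iterable: looping over it produces one character at a time. One possible safety check, shown on a small hypothetical function read_names() instead of parse_csv() itself, is to reject strings explicitly. The choice of TypeError is an assumption; other exception types would also be reasonable.

```python
def read_names(lines):
    '''
    Return the first comma-separated field of each line.
    '''
    if isinstance(lines, str):
        raise TypeError('lines must be an iterable of lines, not a filename string')
    return [line.split(',')[0] for line in lines]

# Iterating over a string yields single characters.
characters = list('Data')

names = read_names(['AA,100,32.2', 'IBM,50,91.1'])
```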

Exercise 3.18: Fixing existing functions

Fix the read_portfolio() and read_prices() functions in the report.py file so that they work with the modified version of parse_csv(). This should only involve a minor modification. Afterwards, your report.py and pcost.py programs should work the same way they always did.