Tutorial:PracticalPython/3 Program organization
Program Organization
So far, we’ve learned some Python basics and have written some short scripts. However, as you start to write larger programs, you’ll want to get organized. This section goes into greater detail on writing functions and handling errors, and it introduces modules. By the end you should be able to write programs that are subdivided into functions across multiple files. We’ll also provide some code templates for writing more useful scripts.
Scripting
In this part we look more closely at the practice of writing Python scripts.
What is a Script?
A script is a program that runs a series of statements and stops.
```python
# program.py

statement1
statement2
statement3
...
```
We have mostly been writing scripts to this point.
A Problem
If you write a useful script, it will grow in features and functionality. You may want to apply it to other related problems. Over time, it might become a critical application. And if you don’t take care, it might turn into a huge tangled mess. So, let’s get organized.
Defining Things
Names must always be defined before they get used later.
```python
def square(x):
    return x*x

a = 42
b = a + 2       # Requires that `a` is defined

z = square(b)   # Requires `square` and `b` to be defined
```
The order is important. You almost always put the definitions of variables and functions near the top.
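To see what happens when the order is wrong, here is a small sketch that reads a name before it has been assigned. The variable name `undefined_value` is purely illustrative.

```python
# Reading a name before it is assigned raises NameError
try:
    result = undefined_value + 1    # `undefined_value` has not been defined yet
except NameError as e:
    message = str(e)

undefined_value = 41                # defined too late for the line above
result = undefined_value + 1        # works now
print(message)                      # name 'undefined_value' is not defined
print(result)                       # 42
```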
Defining Functions
It is a good idea to put all of the code related to a single task in one place. Use a function.
```python
import csv

def read_prices(filename):
    prices = {}
    with open(filename) as f:
        f_csv = csv.reader(f)
        for row in f_csv:
            prices[row[0]] = float(row[1])
    return prices
```
A function also simplifies repeated operations.
```python
oldprices = read_prices('oldprices.csv')
newprices = read_prices('newprices.csv')
```
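Here is a minimal runnable sketch of that reuse. It writes two small sample files to a temporary directory (the file contents are made-up sample data) and calls the same function on each. Unlike the version above, this variant also skips blank rows.

```python
import csv
import os
import tempfile

def read_prices(filename):
    '''Read a CSV file of name,price rows into a dict'''
    prices = {}
    with open(filename) as f:
        for row in csv.reader(f):
            if row:                              # skip blank lines
                prices[row[0]] = float(row[1])
    return prices

# Create two small sample files (hypothetical data) in a temporary directory
tmpdir = tempfile.mkdtemp()
old_path = os.path.join(tmpdir, 'oldprices.csv')
new_path = os.path.join(tmpdir, 'newprices.csv')
with open(old_path, 'w') as f:
    f.write('AA,9.22\nIBM,106.28\n')
with open(new_path, 'w') as f:
    f.write('AA,9.50\nIBM,104.10\n')

# The same function handles both files
oldprices = read_prices(old_path)
newprices = read_prices(new_path)
print(oldprices)   # {'AA': 9.22, 'IBM': 106.28}
print(newprices)   # {'AA': 9.5, 'IBM': 104.1}
```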
What is a Function?
A function is a named sequence of statements.
```python
def funcname(args):
    statement
    statement
    ...
    return result
```
Any Python statement can be used inside.
```python
def foo():
    import math
    print(math.sqrt(2))
    help(math)
```
There are no special statements that only work inside functions, which makes this easy to remember.
Function Definition
Functions can be defined in any order.
```python
def foo(x):
    bar(x)

def bar(x):
    statements

# OR
def bar(x):
    statements

def foo(x):
    bar(x)
```
Functions must only be defined prior to actually being used (or called) during program execution.
foo(3) # foo must be defined already
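A small sketch of that rule: `foo()` can mention `bar()` before `bar()` exists, because the name is only looked up when `foo()` runs. Calling a function before its `def` has executed fails. The names `foo`, `bar`, and `baz` are illustrative.

```python
def foo(x):
    # `bar` is looked up only when foo() runs, so it can be defined later
    return bar(x) + 1

def bar(x):
    return x * 2

print(foo(3))   # 7: both functions exist by the time foo(3) is called

# Calling a function before its definition has executed raises NameError
try:
    baz(1)
    baz_failed = False
except NameError:
    baz_failed = True

def baz(x):
    return x

print(baz_failed)   # True
```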
Stylistically, it is probably more common to see functions defined in a bottom-up fashion.
Bottom-up Style
Functions are treated as building blocks. The smaller/simpler blocks go first.
```python
# myprogram.py
def foo(x):
    ...

def bar(x):
    ...
    foo(x)          # Defined above
    ...

def spam(x):
    ...
    bar(x)          # Defined above
    ...

spam(42)            # Code that uses the functions appears at the end
```
Later functions build upon earlier functions. Again, this is only a point of style. The only thing that matters in the above program is that the call to `spam(42)` comes last.
Function Design
Ideally, functions should behave like a black box. They should operate only on the inputs passed to them and avoid global variables and mysterious side effects. Your main goals: modularity and predictability.
Doc Strings
It’s good practice to include documentation in the form of a doc-string. A doc-string is a string written as the first statement of the function body, immediately after the `def` line. Doc-strings feed `help()`, IDEs, and other tools.
```python
import csv

def read_prices(filename):
    '''
    Read prices from a CSV file of name,price data
    '''
    prices = {}
    with open(filename) as f:
        f_csv = csv.reader(f)
        for row in f_csv:
            prices[row[0]] = float(row[1])
    return prices
```
A good practice for doc-strings is to write a short, one-sentence summary of what the function does. If more information is needed, include a short example of usage along with a more detailed description of the arguments.
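Following that advice, a fuller doc-string might look like the sketch below. The exact layout of the example and argument description is one common convention, not a requirement. The doc-string is stored on the function as `__doc__`, which is what `help()` displays.

```python
import csv

def read_prices(filename):
    '''
    Read prices from a CSV file of name,price data.

    Example:
        prices = read_prices('Data/prices.csv')
        prices['IBM']      # a float

    filename: path to a CSV file with one name,price pair per row
    '''
    prices = {}
    with open(filename) as f:
        for row in csv.reader(f):
            if row:
                prices[row[0]] = float(row[1])
    return prices

# The doc-string is attached to the function object; help(read_prices) shows it
print(read_prices.__doc__.strip().splitlines()[0])
# Read prices from a CSV file of name,price data.
```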
Type Annotations
You can also add optional type hints to function definitions.
```python
import csv

def read_prices(filename: str) -> dict:
    '''
    Read prices from a CSV file of name,price data
    '''
    prices = {}
    with open(filename) as f:
        f_csv = csv.reader(f)
        for row in f_csv:
            prices[row[0]] = float(row[1])
    return prices
```
The hints do nothing operationally: Python does not check them when the function is called. They are purely informational. However, IDEs, code checkers, and other tools may use them to do more, such as flagging calls that pass the wrong type.
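A quick way to confirm that hints are not enforced: they are stored in the function’s `__annotations__`, but a call with a completely different type still runs. The function `scale` is a hypothetical example.

```python
def scale(value: float, factor: float) -> float:
    '''Multiply value by factor'''
    return value * factor

# Annotations are stored on the function, not checked
print(scale.__annotations__)
# {'value': <class 'float'>, 'factor': <class 'float'>, 'return': <class 'float'>}

# Passing a str still works, because str * int is repetition
print(scale('ab', 3))   # ababab
```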
Exercises
In section 2, you wrote a program called `report.py` that printed out a report showing the performance of a stock portfolio. This program consisted of some functions. For example:
```python
# report.py
import csv

def read_portfolio(filename):
    '''
    Read a stock portfolio file into a list of dictionaries with keys
    name, shares, and price.
    '''
    portfolio = []
    with open(filename) as f:
        rows = csv.reader(f)
        headers = next(rows)

        for row in rows:
            record = dict(zip(headers, row))
            stock = {
                'name'   : record['name'],
                'shares' : int(record['shares']),
                'price'  : float(record['price'])
            }
            portfolio.append(stock)
    return portfolio
...
```
However, there were also portions of the program that just performed a series of scripted calculations. This code appeared near the end of the program. For example:
```python
...

# Output the report

headers = ('Name', 'Shares', 'Price', 'Change')
print('%10s %10s %10s %10s' % headers)
print(('-' * 10 + ' ') * len(headers))
for row in report:
    print('%10s %10d %10.2f %10.2f' % row)
...
```
In this exercise, we’re going to take this program and organize it a little more strongly around the use of functions.
Exercise 3.1: Structuring a program as a collection of functions
Modify your `report.py` program so that all major operations, including calculations and output, are carried out by a collection of functions. Specifically:
- Create a function `print_report(report)` that prints out the report.
- Change the last part of the program so that it is nothing more than a series of function calls and no other computation.
Exercise 3.2: Creating a top-level function for program execution
Take the last part of your program and package it into a single function `portfolio_report(portfolio_filename, prices_filename)`. Have the function work so that the following function call creates the report as before:
portfolio_report('Data/portfolio.csv', 'Data/prices.csv')
In this final version, your program will be nothing more than a series of function definitions followed by a single function call to `portfolio_report()` at the very end (which executes all of the steps involved in the program).
By turning your program into a single function, it becomes easy to run it on different inputs. For example, try these statements interactively after running your program:
```python
>>> portfolio_report('Data/portfolio2.csv', 'Data/prices.csv')
... look at the output ...
>>> files = ['Data/portfolio.csv', 'Data/portfolio2.csv']
>>> for name in files:
        print(f'{name:-^43s}')
        portfolio_report(name, 'Data/prices.csv')
        print()

... look at the output ...
>>>
```
Commentary
Python makes it very easy to write relatively unstructured scripting code where you just have a file with a sequence of statements in it. In the big picture, it’s almost always better to utilize functions whenever you can. At some point, that script is going to grow and you’ll wish you had a bit more organization. Also, a little-known fact is that Python runs a bit faster if you use functions, because local variables inside a function are accessed faster than global variables.
More on Functions
Although functions were introduced earlier, very few details were provided on how they actually work at a deeper level. This section aims to fill in some gaps and discuss matters such as calling conventions, scoping rules, and more.
Calling a Function
Consider this function:
def read_prices(filename, debug): ...
You can call the function with positional arguments:
prices = read_prices('prices.csv', True)
Or you can call the function with keyword arguments:
prices = read_prices(filename='prices.csv', debug=True)
Default Arguments
Sometimes you want an argument to be optional. If so, assign a default value in the function definition.
def read_prices(filename, debug=False): ...
If a default value is assigned, the argument is optional in function calls.
```python
d = read_prices('prices.csv')
e = read_prices('prices.dat', True)
```
Note: Arguments with defaults must appear at the end of the arguments list (all non-optional arguments go first).
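The sketch below exercises all of these calling styles on a stand-in `read_prices()` that simply returns what it received (hypothetical; it does not read any file). It also shows that putting a default before a non-default argument is rejected when the code is compiled.

```python
def read_prices(filename, debug=False):
    # Hypothetical stand-in: report the arguments it was called with
    return (filename, debug)

print(read_prices('prices.csv'))                        # ('prices.csv', False)
print(read_prices('prices.csv', True))                  # ('prices.csv', True)
print(read_prices(filename='prices.csv', debug=True))   # ('prices.csv', True)
print(read_prices(debug=True, filename='prices.csv'))   # keyword order doesn't matter

# A parameter with a default may not come before one without a default
try:
    compile('def bad(debug=False, filename): pass', '<example>', 'exec')
    rejected = False
except SyntaxError:
    rejected = True
print(rejected)   # True
```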
Prefer keyword arguments for optional arguments
Compare and contrast these two different calling styles:
```python
parse_data(data, False, True) # ?????

parse_data(data, ignore_errors=True)
parse_data(data, debug=True)
parse_data(data, debug=True, ignore_errors=True)
```
In most cases, keyword arguments improve code clarity, especially for arguments that serve as flags or that relate to optional features.
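To make the contrast concrete, here is a hypothetical `parse_data()` whose two flags are assumed to be declared in the order `debug`, `ignore_errors`. The positional and keyword calls below do the same thing, but only the keyword version says what `True` means.

```python
def parse_data(data, debug=False, ignore_errors=False):
    # Hypothetical function: returns the settings it was called with
    return {'debug': debug, 'ignore_errors': ignore_errors}

data = ['row1', 'row2']

positional = parse_data(data, False, True)      # which flag is True?
keyword = parse_data(data, ignore_errors=True)  # self-explanatory
print(positional == keyword)                    # True
```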
Design Best Practices
Always give short but meaningful names to function arguments.
Someone using a function may want to use the keyword calling style.
d = read_prices('prices.csv', debug=True)
Python development tools will show the names in help features and documentation.
Returning Values
The `return` statement returns a value.
```python
def square(x):
    return x * x
```
If no return value is given or `return` is missing, `None` is returned.
```python
def bar(x):
    statements
    return

a = bar(4)      # a = None

# OR
def foo(x):
    statements  # No `return`

b = foo(4)      # b = None
```
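A runnable version of both cases, with the placeholder `statements` replaced by a simple expression:

```python
def bar(x):
    x * 2           # computed, then discarded
    return          # bare return

def foo(x):
    x * 2           # no return statement at all

a = bar(4)
b = foo(4)
print(a, b)         # None None
```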
Multiple Return Values
Functions can only return one value. However, a function may return multiple values by returning them in a tuple.
```python
def divide(a,b):
    q = a // b      # Quotient
    r = a % b       # Remainder
    return q, r     # Return a tuple
```
Usage example:
```python
x, y = divide(37,5) # x = 7, y = 2

x = divide(37, 5)   # x = (7, 2)
```
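The sketch below confirms that the result really is a single tuple, which can be kept whole or unpacked. For this particular computation, the built-in `divmod()` returns the same pair.

```python
def divide(a, b):
    q = a // b     # Quotient
    r = a % b      # Remainder
    return q, r    # The comma builds one tuple

result = divide(37, 5)
print(type(result).__name__, result)   # tuple (7, 2)

x, y = divide(37, 5)                   # tuple unpacking
print(x, y)                            # 7 2

print(divmod(37, 5))                   # (7, 2)
```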
Variable Scope
Programs assign values to variables.
```python
x = value # Global variable

def foo():
    y = value # Local variable
```
Variable assignments occur both outside and inside function definitions. Variables defined outside are “global”. Variables assigned inside a function are “local”.
Local Variables
Variables assigned inside functions are private.
```python
def read_portfolio(filename):
    portfolio = []
    for line in open(filename):
        fields = line.split(',')
        s = (fields[0], int(fields[1]), float(fields[2]))
        portfolio.append(s)
    return portfolio
```
In this example, `filename`, `portfolio`, `line`, `fields` and `s` are local variables. Those variables are not retained or accessible after the function call.
```python
>>> stocks = read_portfolio('portfolio.csv')
>>> fields
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'fields' is not defined
>>>
```
Locals also can’t conflict with variables found elsewhere.
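For example, in the sketch below a global and a local share the name `fields` without interfering. The function `split_line` is illustrative.

```python
fields = 'global fields'

def split_line(line):
    fields = line.split(',')   # local variable; the global `fields` is untouched
    return fields

print(split_line('AA,100,32.20'))   # ['AA', '100', '32.20']
print(fields)                       # global fields
```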
Global Variables
Functions can freely access the values of globals defined in the same file.
```python
name = 'Dave'

def greeting():
    print('Hello', name)  # Using `name` global variable
```
However, functions can’t modify globals just by assigning to them:
```python
name = 'Dave'

def spam():
    name = 'Guido'

spam()
print(name) # prints 'Dave'
```
Remember: All assignments in functions are local.
Modifying Globals
If you must modify a global variable, you must declare it as such.
```python
name = 'Dave'

def spam():
    global name
    name = 'Guido' # Changes the global name above
```
The `global` declaration must appear before the variable is used in the function, and it always refers to a variable at the top level of the same file (module) as the function. Having seen this, know that it is considered poor form. In fact, try to avoid `global` entirely if you can. If you need a function to modify some kind of state outside of the function, it’s better to use a class instead (more on this later).
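A short sketch of the difference the declaration makes, using a hypothetical module-level counter:

```python
count = 0

def increment_without_global():
    count = 1          # creates a local `count`; the global stays 0

def increment_with_global():
    global count       # refer to the module-level `count`
    count += 1

increment_without_global()
print(count)           # 0
increment_with_global()
increment_with_global()
print(count)           # 2
```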
Argument Passing
When you call a function, the argument variables are names that refer to the passed values. These values are NOT copies (see [[../02_Working_with_data/07_Objects|section 2.7]]). If mutable data types are passed (e.g. lists, dicts), they can be modified in-place.
```python
def foo(items):
    items.append(42)    # Modifies the input object

a = [1, 2, 3]
foo(a)
print(a)                # [1, 2, 3, 42]
```
Key point: Functions don’t receive a copy of the input arguments.
Reassignment vs Modifying
Make sure you understand the subtle difference between modifying a value and reassigning a variable name.
```python
def foo(items):
    items.append(42)    # Modifies the input object

a = [1, 2, 3]
foo(a)
print(a)                # [1, 2, 3, 42]

# VS
def bar(items):
    items = [4,5,6]    # Changes local `items` variable to point to a different object

b = [1, 2, 3]
bar(b)
print(b)                # [1, 2, 3]
```
Reminder: Variable assignment never overwrites memory. The name is merely bound to a new value.
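One way to see this is to return the parameter and compare it with the caller’s object using `is`. In this sketch, `foo` and `bar` are the functions from the example above with a `return` added.

```python
def foo(items):
    items.append(42)           # modifies the object the caller passed in
    return items

def bar(items):
    items = [4, 5, 6]          # rebinds the local name only
    return items

a = [1, 2, 3]
print(foo(a) is a)    # True: same object, now [1, 2, 3, 42]

b = [1, 2, 3]
print(bar(b) is b)    # False: bar built a new list
print(b)              # [1, 2, 3]
```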
Exercises
This set of exercises has you implement what is, perhaps, the most powerful and difficult part of the course. There are a lot of steps and many concepts from past exercises are put together all at once. The final solution is only about 25 lines of code, but take your time and make sure you understand each part.
A central part of your `report.py` program focuses on the reading of CSV files. For example, the function `read_portfolio()` reads a file containing rows of portfolio data and the function `read_prices()` reads a file containing rows of price data. In both of those functions, there are a lot of low-level “fiddly” bits and similar features. For example, they both open a file and wrap it with the `csv` module and they both convert various fields into new types.
If you were doing a lot of file parsing for real, you’d probably want to clean some of this up and make it more general purpose. That’s our goal.
Start this exercise by creating a new file called `Work/fileparse.py`. This is where we will be doing our work.
Exercise 3.3: Reading CSV Files
To start, let’s just focus on the problem of reading a CSV file into a list of dictionaries. In the file `fileparse.py`, define a function that looks like this:
```python
# fileparse.py
import csv

def parse_csv(filename):
    '''
    Parse a CSV file into a list of records
    '''
    with open(filename) as f:
        rows = csv.reader(f)

        # Read the file headers
        headers = next(rows)
        records = []
        for row in rows:
            if not row:    # Skip rows with no data
                continue
            record = dict(zip(headers, row))
            records.append(record)

    return records
```
This function reads a CSV file into a list of dictionaries while hiding the details of opening the file, wrapping it with the `csv` module, ignoring blank lines, and so forth.
Try it out:
Hint: `python3 -i fileparse.py`.
```python
>>> portfolio = parse_csv('Data/portfolio.csv')
>>> portfolio
[{'price': '32.20', 'name': 'AA', 'shares': '100'}, {'price': '91.10', 'name': 'IBM', 'shares': '50'}, {'price': '83.44', 'name': 'CAT', 'shares': '150'}, {'price': '51.23', 'name': 'MSFT', 'shares': '200'}, {'price': '40.37', 'name': 'GE', 'shares': '95'}, {'price': '65.10', 'name': 'MSFT', 'shares': '50'}, {'price': '70.44', 'name': 'IBM', 'shares': '100'}]
>>>
```
This is good except that you can’t do any kind of useful calculation with the data because everything is represented as a string. We’ll fix this shortly, but let’s keep building on it.
Exercise 3.4: Building a Column Selector
In many cases, you’re only interested in selected columns from a CSV file, not all of the data. Modify the `parse_csv()` function so that it optionally allows user-specified columns to be picked out as follows:
```python
>>> # Read all of the data
>>> portfolio = parse_csv('Data/portfolio.csv')
>>> portfolio
[{'price': '32.20', 'name': 'AA', 'shares': '100'}, {'price': '91.10', 'name': 'IBM', 'shares': '50'}, {'price': '83.44', 'name': 'CAT', 'shares': '150'}, {'price': '51.23', 'name': 'MSFT', 'shares': '200'}, {'price': '40.37', 'name': 'GE', 'shares': '95'}, {'price': '65.10', 'name': 'MSFT', 'shares': '50'}, {'price': '70.44', 'name': 'IBM', 'shares': '100'}]

>>> # Read only some of the data
>>> shares_held = parse_csv('Data/portfolio.csv', select=['name','shares'])
>>> shares_held
[{'name': 'AA', 'shares': '100'}, {'name': 'IBM', 'shares': '50'}, {'name': 'CAT', 'shares': '150'}, {'name': 'MSFT', 'shares': '200'}, {'name': 'GE', 'shares': '95'}, {'name': 'MSFT', 'shares': '50'}, {'name': 'IBM', 'shares': '100'}]
>>>
```
An example of a column selector was given in [[../02_Working_with_data/06_List_comprehension|Exercise 2.23]]. Here’s one way to do it:
```python
# fileparse.py
import csv

def parse_csv(filename, select=None):
    '''
    Parse a CSV file into a list of records
    '''
    with open(filename) as f:
        rows = csv.reader(f)

        # Read the file headers
        headers = next(rows)

        # If a column selector was given, find indices of the specified columns.
        # Also narrow the set of headers used for resulting dictionaries
        if select:
            indices = [headers.index(colname) for colname in select]
            headers = select
        else:
            indices = []

        records = []
        for row in rows:
            if not row:    # Skip rows with no data
                continue
            # Filter the row if specific columns were selected
            if indices:
                row = [ row[index] for index in indices ]

            # Make a dictionary
            record = dict(zip(headers, row))
            records.append(record)

    return records
```
There are a number of tricky bits to this part. Probably the most important one is the mapping of the column selections to row indices. For example, suppose the input file had the following headers:
```python
>>> headers = ['name', 'date', 'time', 'shares', 'price']
>>>
```
Now, suppose the selected columns were as follows:
```python
>>> select = ['name', 'shares']
>>>
```
To perform the proper selection, you have to map the selected column names to column indices in the file. That’s what this step is doing:
```python
>>> indices = [headers.index(colname) for colname in select ]
>>> indices
[0, 3]
>>>
```
In other words, “name” is column 0 and “shares” is column 3. When you read a row of data from the file, the indices are used to filter it:
```python
>>> row = ['AA', '6/11/2007', '9:50am', '100', '32.20' ]
>>> row = [ row[index] for index in indices ]
>>> row
['AA', '100']
>>>
```
Exercise 3.5: Performing Type Conversion
Modify the `parse_csv()` function so that it optionally allows type-conversions to be applied to the returned data. For example:
```python
>>> portfolio = parse_csv('Data/portfolio.csv', types=[str, int, float])
>>> portfolio
[{'price': 32.2, 'name': 'AA', 'shares': 100}, {'price': 91.1, 'name': 'IBM', 'shares': 50}, {'price': 83.44, 'name': 'CAT', 'shares': 150}, {'price': 51.23, 'name': 'MSFT', 'shares': 200}, {'price': 40.37, 'name': 'GE', 'shares': 95}, {'price': 65.1, 'name': 'MSFT', 'shares': 50}, {'price': 70.44, 'name': 'IBM', 'shares': 100}]

>>> shares_held = parse_csv('Data/portfolio.csv', select=['name', 'shares'], types=[str, int])
>>> shares_held
[{'name': 'AA', 'shares': 100}, {'name': 'IBM', 'shares': 50}, {'name': 'CAT', 'shares': 150}, {'name': 'MSFT', 'shares': 200}, {'name': 'GE', 'shares': 95}, {'name': 'MSFT', 'shares': 50}, {'name': 'IBM', 'shares': 100}]
>>>
```
You already explored this in [[../02_Working_with_data/07_Objects|Exercise 2.24]]. You’ll need to insert the following fragment of code into your solution:
```python
...
if types:
    row = [func(val) for func, val in zip(types, row) ]
...
```
Exercise 3.6: Working without Headers
Some CSV files don’t include any header information. For example, the file `prices.csv` looks like this:

```
"AA",9.22
"AXP",24.85
"BA",44.85
"BAC",11.27
...
```
Modify the `parse_csv()` function so that it can work with such files by creating a list of tuples instead. For example:
```python
>>> prices = parse_csv('Data/prices.csv', types=[str,float], has_headers=False)
>>> prices
[('AA', 9.22), ('AXP', 24.85), ('BA', 44.85), ('BAC', 11.27), ('C', 3.72), ('CAT', 35.46), ('CVX', 66.67), ('DD', 28.47), ('DIS', 24.22), ('GE', 13.48), ('GM', 0.75), ('HD', 23.16), ('HPQ', 34.35), ('IBM', 106.28), ('INTC', 15.72), ('JNJ', 55.16), ('JPM', 36.9), ('KFT', 26.11), ('KO', 49.16), ('MCD', 58.99), ('MMM', 57.1), ('MRK', 27.58), ('MSFT', 20.89), ('PFE', 15.19), ('PG', 51.94), ('T', 24.79), ('UTX', 52.61), ('VZ', 29.26), ('WMT', 49.74), ('XOM', 69.35)]
>>>
```
To make this change, you’ll need to modify the code so that the first line of data isn’t interpreted as a header line. Also, you’ll need to make sure you don’t create dictionaries as there are no longer any column names to use for keys.
Exercise 3.7: Picking a different column delimiter
Although CSV files are pretty common, it’s also possible that you could encounter a file that uses a different column separator such as a tab or space. For example, the file `Data/portfolio.dat` looks like this:
```
name shares price
"AA" 100 32.20
"IBM" 50 91.10
"CAT" 150 83.44
"MSFT" 200 51.23
"GE" 95 40.37
"MSFT" 50 65.10
"IBM" 100 70.44
```
The `csv.reader()` function allows a different column delimiter to be given as follows:
rows = csv.reader(f, delimiter=' ')
Modify your `parse_csv()` function so that it also allows the delimiter to be changed. For example:
```python
>>> portfolio = parse_csv('Data/portfolio.dat', types=[str, int, float], delimiter=' ')
>>> portfolio
[{'price': 32.2, 'name': 'AA', 'shares': 100}, {'price': 91.1, 'name': 'IBM', 'shares': 50}, {'price': 83.44, 'name': 'CAT', 'shares': 150}, {'price': 51.23, 'name': 'MSFT', 'shares': 200}, {'price': 40.37, 'name': 'GE', 'shares': 95}, {'price': 65.1, 'name': 'MSFT', 'shares': 50}, {'price': 70.44, 'name': 'IBM', 'shares': 100}]
>>>
```
Commentary
If you’ve made it this far, you’ve created a nice library function that’s genuinely useful. You can use it to parse arbitrary CSV files, select out columns of interest, and perform type conversions without having to worry too much about the inner workings of files or the `csv` module.
Error Checking
Although exceptions were introduced earlier, this section fills in some additional details about error checking and exception handling.
How programs fail
Python performs no checking or validation of function argument types or values. A function will work on any data that is compatible with the statements in the function.
```python
def add(x, y):
    return x + y

add(3, 4)               # 7
add('Hello', 'World')   # 'HelloWorld'
add('3', '4')           # '34'
```
If there are errors in a function, they appear at run time (as an exception).
```python
def add(x, y):
    return x + y

>>> add(3, '4')
Traceback (most recent call last):
...
TypeError: unsupported operand type(s) for +: 'int' and 'str'
>>>
```
To verify code, there is a strong emphasis on testing (covered later).
Exceptions
Exceptions are used to signal errors. To raise an exception yourself, use the `raise` statement.
```python
if name not in authorized:
    raise RuntimeError(f'{name} not authorized')
```
To catch an exception, use `try-except`.
```python
try:
    authenticate(username)
except RuntimeError as e:
    print(e)
```
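Putting the two fragments together, here is a runnable sketch. The `authorized` set and the `authenticate()` function are hypothetical stand-ins for whatever real check your program performs.

```python
authorized = {'guido', 'dave'}        # hypothetical set of allowed users

def authenticate(name):
    '''Raise RuntimeError if name is not in the authorized set'''
    if name not in authorized:
        raise RuntimeError(f'{name} not authorized')
    return True

try:
    authenticate('mallory')
except RuntimeError as e:
    message = str(e)                  # keep the text for later use
    print(e)                          # mallory not authorized
```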
Exception Handling
Exceptions propagate to the first matching `except`.
```python
def grok():
    ...
    raise RuntimeError('Whoa!')   # Exception raised here

def spam():
    grok()                        # Call that will raise exception

def bar():
    try:
       spam()
    except RuntimeError as e:     # Exception caught here
        ...

def foo():
    try:
         bar()
    except RuntimeError as e:     # Exception does NOT arrive here
        ...

foo()
```
To handle the exception, put statements in the `except` block. You can add any statements you want to handle the error.
```python
def grok():
    ...
    raise RuntimeError('Whoa!')

def bar():
    try:
      grok()
    except RuntimeError as e:   # Exception caught here
        statements              # Use these statements
        statements
        ...

bar()
```
After handling, execution resumes with the first statement after the `try-except`.
```python
def grok():
    ...
    raise RuntimeError('Whoa!')

def bar():
    try:
      grok()
    except RuntimeError as e:   # Exception caught here
        statements
        statements
        ...
    statements                  # Resumes execution here
    statements                  # And continues here
    ...

bar()
```
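The following runnable sketch combines both ideas. It records the order in which statements run: the exception skips the rest of `spam()`, is handled by the first matching `except` in `bar()`, execution resumes after that `try-except`, and the handler in `foo()` never sees it. The `events` list exists only to make the flow visible.

```python
events = []

def grok():
    raise RuntimeError('Whoa!')         # exception raised here

def spam():
    grok()
    events.append('spam finished')      # never runs

def bar():
    try:
        spam()
    except RuntimeError as e:
        events.append(f'caught: {e}')   # first matching except handles it
    events.append('after try-except')   # execution resumes here

def foo():
    try:
        bar()
    except RuntimeError:
        events.append('foo caught it')  # never runs: bar already handled it

foo()
print(events)   # ['caught: Whoa!', 'after try-except']
```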
Built-in Exceptions
There are dozens of built-in exceptions. Usually the name of the exception is indicative of what’s wrong (e.g., a `ValueError` is raised because you supplied a bad value). The list below is not exhaustive. Check the documentation for more.
```
ArithmeticError
AssertionError
EnvironmentError
EOFError
ImportError
IndexError
KeyboardInterrupt
KeyError
MemoryError
NameError
ReferenceError
RuntimeError
SyntaxError
SystemError
TypeError
ValueError
```
Exception Values
Exceptions have an associated value. It contains more specific information about what’s wrong.
raise RuntimeError('Invalid user name')
This value is part of the exception instance that’s placed in the variable supplied to `except`.
```python
try:
    ...
except RuntimeError as e:   # `e` holds the exception raised
    ...
```
`e` is an instance of the exception type. However, it often looks like a string when printed.
```python
except RuntimeError as e:
    print('Failed : Reason', e)
```
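A small sketch of what the exception instance carries: its type, the arguments passed when it was raised (in `args`), and a string form used by `print()`. Note that the `as e` name is cleared when the `except` block ends, so the example saves a reference in `caught`.

```python
try:
    raise RuntimeError('Invalid user name')
except RuntimeError as e:
    caught = e                     # keep a reference; `e` is cleared after the block

print(type(caught).__name__)       # RuntimeError
print(caught.args)                 # ('Invalid user name',)
print('Failed : Reason', caught)   # Failed : Reason Invalid user name
```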
Catching Multiple Errors
You can catch different kinds of exceptions using multiple `except` blocks.
```python
try:
  ...
except LookupError as e:
  ...
except RuntimeError as e:
  ...
except IOError as e:
  ...
except KeyboardInterrupt as e:
  ...
```
Alternatively, if the statements to handle them are the same, you can group them:
```python
try:
  ...
except (IOError,LookupError,RuntimeError) as e:
  ...
```
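Here is a runnable sketch of a grouped handler. The helper `lookup()` is hypothetical. `KeyError` and `IndexError` are both subclasses of `LookupError`, which is why `except LookupError` in the earlier example would also catch either one.

```python
def lookup(data, key):
    try:
        return data[key]
    except (KeyError, IndexError) as e:     # both are handled the same way
        return f'missing: {type(e).__name__}'

print(lookup({'a': 1}, 'b'))   # missing: KeyError
print(lookup([1, 2, 3], 10))   # missing: IndexError
print(lookup([1, 2, 3], 0))    # 1
```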
Catching All Errors
To catch any exception, use `Exception` like this:
```python
try:
    ...
except Exception:       # DANGER. See below
    print('An error occurred')
```
In general, writing code like that is a bad idea because you’ll have no idea why it failed.
Wrong Way to Catch Errors
Here is the wrong way to use exceptions.
```python
try:
    go_do_something()
except Exception:
    print('Computer says no')
```
This catches all possible errors and it may make it impossible to debug when the code is failing for some reason you didn’t expect at all (e.g. uninstalled Python module, etc.).
Somewhat Better Approach
If you’re going to catch all errors, this is a more sane approach.
```python
try:
    go_do_something()
except Exception as e:
    print('Computer says no. Reason :', e)
```
It reports a specific reason for failure. It is almost always a good idea to have some mechanism for viewing/reporting errors when you write code that catches all possible exceptions.
In general, though, it’s better to catch the error as narrowly as is reasonable. Only catch the errors you can actually handle. Let other errors pass by; maybe some other code can handle them.
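For example, the hypothetical helper below handles only the error it understands (`ValueError` from a malformed number) and lets anything else, such as the `TypeError` raised by `int(None)`, pass through to the caller.

```python
def to_int(text):
    '''Convert text to int, returning None only for malformed values'''
    try:
        return int(text)
    except ValueError:          # the one error we know how to handle
        return None

print(to_int('100'))   # 100
print(to_int(''))      # None

# Other errors still propagate: int(None) raises TypeError, not ValueError
try:
    to_int(None)
    passed_through = False
except TypeError:
    passed_through = True
print(passed_through)  # True
```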
Reraising an Exception
Use `raise` to propagate a caught error.
```python
try:
    go_do_something()
except Exception as e:
    print('Computer says no. Reason :', e)
    raise
```
This allows you to take action (e.g. logging) and pass the error on to the caller.
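A runnable sketch of that pattern using the standard `logging` module. Here `go_do_something()` is a hypothetical function that always fails; the handler logs the problem, then the bare `raise` re-raises the same exception so the caller still sees it.

```python
import logging

log = logging.getLogger(__name__)

def go_do_something():
    # Hypothetical operation that fails
    raise ValueError('bad input')

def run():
    try:
        go_do_something()
    except Exception as e:
        log.error('Computer says no. Reason : %s', e)   # take action...
        raise                                          # ...then pass it on

try:
    run()
except ValueError as e:
    reraised = e
    print('Caller still sees:', repr(e))   # Caller still sees: ValueError('bad input')
```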
Exception Best Practices
Don’t catch exceptions. Fail fast and loud. If it’s important, someone else will take care of the problem. Only catch an exception if you are that someone. That is, only catch errors where you can recover and sanely keep going.
finally statement
The `finally` clause specifies code that must run regardless of whether or not an exception occurs.
```python
from threading import Lock

lock = Lock()
...
lock.acquire()
try:
    ...
finally:
    lock.release()  # this will ALWAYS be executed. With and without exception.
```
Commonly used to safely manage resources (especially locks, files, etc.).
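The sketch below shows the guarantee in action: an exception is raised while the lock is held, yet the `finally` block still releases it. The function `risky()` is illustrative.

```python
from threading import Lock

lock = Lock()

def risky():
    lock.acquire()
    try:
        raise RuntimeError('failure while holding the lock')
    finally:
        lock.release()      # runs even though an exception is propagating

try:
    risky()
except RuntimeError:
    pass

print(lock.locked())   # False: the lock was released
```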
with statement
In modern code, `try-finally` is often replaced with the `with` statement.
```python
from threading import Lock

lock = Lock()
with lock:
    # lock acquired
    ...
# lock released
```
A more familiar example:
```python
with open(filename) as f:
    # Use the file
    ...
# File closed
```
`with` defines a usage context for a resource. When execution leaves that context, resources are released. `with` only works with certain objects that have been specifically programmed to support it.
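One way to program an object to support `with` is the standard library’s `contextlib.contextmanager` decorator (classes with `__enter__` and `__exit__` methods are the other common way). The resource `managed_resource` below is hypothetical; it only records when it is acquired and released, including when the block exits with an error.

```python
from contextlib import contextmanager

trace = []

@contextmanager
def managed_resource(name):
    # Hypothetical resource: record when it is acquired and released
    trace.append(f'acquire {name}')
    try:
        yield name
    finally:
        trace.append(f'release {name}')   # runs when the with block exits, even on error

with managed_resource('db') as r:
    trace.append(f'using {r}')

try:
    with managed_resource('file'):
        raise ValueError('oops')
except ValueError:
    pass

print(trace)
# ['acquire db', 'using db', 'release db', 'acquire file', 'release file']
```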
Exercises
Exercise 3.8: Raising exceptions
The `parse_csv()` function you wrote in the last section allows user-specified columns to be selected, but that only works if the input data file has column headers.
Modify the code so that an exception gets raised if both the `select` and `has_headers=False` arguments are passed. For example:
```python
>>> parse_csv('Data/prices.csv', select=['name','price'], has_headers=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "fileparse.py", line 9, in parse_csv
    raise RuntimeError("select argument requires column headers")
RuntimeError: select argument requires column headers
>>>
```
Having added this one check, you might ask if you should be performing other kinds of sanity checks in the function. For example, should you check that the filename is a string, that types is a list, or anything of that nature?
As a general rule, it’s usually best to skip such tests and to just let the program fail on bad inputs. The traceback message will point at the source of the problem and can assist in debugging.
The main reason for adding the above check is to avoid running the code in a nonsensical mode (e.g., using a feature that requires column headers, but simultaneously specifying that there are no headers).
This indicates a programming error on the part of the calling code. Checking for cases that “aren’t supposed to happen” is often a good idea.
Exercise 3.9: Catching exceptions
The `parse_csv()` function you wrote is used to process the entire contents of a file. However, in the real world, it’s possible that input files might have corrupted, missing, or dirty data. Try this experiment:
```python
>>> portfolio = parse_csv('Data/missing.csv', types=[str, int, float])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "fileparse.py", line 36, in parse_csv
    row = [func(val) for func, val in zip(types, row)]
ValueError: invalid literal for int() with base 10: ''
>>>
```
Modify the `parse_csv()` function to catch all `ValueError` exceptions generated during record creation and print a warning message for rows that can’t be converted.

The message should include the row number and information about the reason why it failed. To test your function, try reading the file `Data/missing.csv` above. For example:
```python
>>> portfolio = parse_csv('Data/missing.csv', types=[str, int, float])
Row 4: Couldn't convert ['MSFT', '', '51.23']
Row 4: Reason invalid literal for int() with base 10: ''
Row 7: Couldn't convert ['IBM', '', '70.44']
Row 7: Reason invalid literal for int() with base 10: ''
>>>
>>> portfolio
[{'price': 32.2, 'name': 'AA', 'shares': 100}, {'price': 91.1, 'name': 'IBM', 'shares': 50}, {'price': 83.44, 'name': 'CAT', 'shares': 150}, {'price': 40.37, 'name': 'GE', 'shares': 95}, {'price': 65.1, 'name': 'MSFT', 'shares': 50}]
>>>
```
Exercise 3.10: Silencing Errors
Modify the `parse_csv()` function so that parsing error messages can be silenced if explicitly desired by the user. For example:
```python
>>> portfolio = parse_csv('Data/missing.csv', types=[str,int,float], silence_errors=True)
>>> portfolio
[{'price': 32.2, 'name': 'AA', 'shares': 100}, {'price': 91.1, 'name': 'IBM', 'shares': 50}, {'price': 83.44, 'name': 'CAT', 'shares': 150}, {'price': 40.37, 'name': 'GE', 'shares': 95}, {'price': 65.1, 'name': 'MSFT', 'shares': 50}]
>>>
```
Error handling is one of the most difficult things to get right in most programs. As a general rule, you shouldn’t silently ignore errors. Instead, it’s better to report problems and to give the user an option to silence the error message if they choose to do so.
Modules
This section introduces the concept of modules and working with functions that span multiple files.
Modules and import
Any Python source file is a module.
```python
# foo.py
def grok(a):
    ...
def spam(b):
    ...
```
The `import` statement loads and executes a module.
```python
# program.py
import foo

a = foo.grok(2)
b = foo.spam('Hello')
...
```
Namespaces
A module is a collection of named values and is sometimes said to be a namespace. The names are all of the global variables and functions defined in the source file. After importing, the module name is used as a prefix. Hence the namespace.
```python
import foo

a = foo.grok(2)
b = foo.spam('Hello')
...
```
The module name is directly tied to the file name (foo -> foo.py).
Global Definitions
Everything defined in the global scope is what populates the module namespace. Consider two modules that define the same variable `x`.
```python
# foo.py
x = 42
def grok(a):
    ...
```
```python
# bar.py
x = 37
def spam(a):
    ...
```
In this case, the `x` definitions refer to different variables. One is `foo.x` and the other is `bar.x`. Different modules can use the same names and those names won’t conflict with each other.
Modules are isolated.
Modules as Environments
Modules form an enclosing environment for all of the code defined inside.
```python
# foo.py
x = 42

def grok(a):
    print(x)
```
Global variables are always bound to the enclosing module (same file). Each source file is its own little universe.
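To see this without creating any files, the sketch below builds a module object by hand with `types.ModuleType` and runs source code in it with `exec()`. Real modules are normally created by `import`; this is just a compact way to show that `grok()` looks up `x` in its own module, not in the module that calls it. The module name `foo` is illustrative.

```python
import types

source = '''
x = 42
def grok():
    return x
'''
foo = types.ModuleType('foo')
exec(source, foo.__dict__)   # run the code with foo's namespace as its globals

x = 37                       # a different global in *this* module
print(foo.grok())            # 42: grok sees the x in its own module
print(foo.x, x)              # 42 37
```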
Module Execution
When a module is imported, all of the statements in the module execute one after another until the end of the file is reached. The contents of the module namespace are all of the global names that are still defined at the end of the execution process. If there are scripting statements that carry out tasks in the global scope (printing, creating files, etc.) you will see them run on import.
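The sketch below writes a tiny module to a temporary directory, makes that directory importable through `sys.path`, and imports it. The module name `demo_mod` is hypothetical. Its top-level `print()` runs during the import, and only names still defined when execution finishes (here `y`, but not the deleted `x`) remain in the module namespace.

```python
import importlib
import os
import sys
import tempfile

# Write a small module to a temporary directory
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, 'demo_mod.py'), 'w') as f:
    f.write(
        "print('demo_mod is running')\n"
        "x = 1\n"
        "y = x + 1\n"
        "del x\n"
    )

sys.path.insert(0, tmpdir)       # make the directory importable
importlib.invalidate_caches()    # make sure the finders notice the new file
import demo_mod                  # prints: demo_mod is running

# Only names still defined at the end of execution remain
print(hasattr(demo_mod, 'y'), hasattr(demo_mod, 'x'))   # True False
```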
import as statement
You can change the name of a module as you import it:
```python
import math as m

def rectangular(r, theta):
    x = r * m.cos(theta)
    y = r * m.sin(theta)
    return x, y
```
It works the same as a normal import. It just renames the module in that one file.
from module import
This picks selected symbols out of a module and makes them available locally.
from math import sin, cos

def rectangular(r, theta):
    x = r * cos(theta)
    y = r * sin(theta)
    return x, y
This allows parts of a module to be used without having to type the module prefix. It’s useful for frequently used names.
Comments on importing
Variations on import do not change the way that modules work.
import math
# vs
import math as m
# vs
from math import cos, sin
...
Specifically, import always executes the entire file, and modules are still isolated environments.
The import module as statement only changes the name locally. The from math import cos, sin statement still loads the entire math module behind the scenes. It’s merely copying the cos and sin names from the module into the local space after it’s done.
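You can check this from the interpreter. The sketch below imports math three ways and compares the results with the single entry cached in sys.modules; nothing here is specific to math beyond the names used.

```python
import sys
import math as m             # binds the name m to the math module
from math import cos, sin    # binds cos and sin, but still loads all of math

mod = sys.modules['math']    # the single cached module object
print(m is mod)              # True: 'as' only picks a local name
print(cos is mod.cos)        # True: from-import copies references out of the module
print(mod.sqrt(16.0))        # 4.0: the rest of the module is loaded too
```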
Module Loading
Each module loads and executes only once. Note: Repeated imports just return a reference to the previously loaded module.
sys.modules is a dict of all loaded modules.
>>> import sys
>>> sys.modules.keys()
dict_keys(['sys', 'builtins', ..., '__main__', ...])
>>>
Caution: A common confusion arises if you repeat an import statement after changing the source code for a module. Because of the module cache sys.modules, repeated imports always return the previously loaded module, even if a change was made. The safest way to load modified code into Python is to quit and restart the interpreter.
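The caching is easy to observe. This sketch writes a hypothetical module, demo_settings.py, edits it after importing, and shows that a second import changes nothing. It then uses importlib.reload() from the standard library, which re-executes the module’s code in the existing module object. Treat reload() as a convenience for experiments rather than a replacement for restarting: names copied out earlier with from ... import, and objects created from the old code, are not updated.

```python
import importlib
import sys
import tempfile
from pathlib import Path

sys.dont_write_bytecode = True          # no .pyc files, so edits are always re-read
workdir = Path(tempfile.mkdtemp())
source_file = workdir / 'demo_settings.py'
source_file.write_text('value = 1\n')
sys.path.insert(0, str(workdir))
importlib.invalidate_caches()

import demo_settings
first = demo_settings

source_file.write_text('value = 2\n')   # edit the file after it was imported
import demo_settings                    # served from sys.modules; the file is not re-read
print(demo_settings is first, demo_settings.value)   # True 1

importlib.reload(demo_settings)         # re-executes the code in the same module object
print(demo_settings.value, first.value)              # 2 2
```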
Locating Modules
Python consults a path list (sys.path) when looking for modules.
>>> import sys
>>> sys.path
[
  '',
  '/usr/local/lib/python36.zip',
  '/usr/local/lib/python3.6',
  ...
]
The current working directory is usually first.
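To see the search order on your own machine, you can print the first few entries of sys.path and ask importlib.util.find_spec() where a module would be loaded from; it performs the search without running the module. json is just a convenient standard-library example. When you run a script rather than the interactive interpreter, the first entry is the script’s own directory instead of ''.

```python
import importlib.util
import sys

# The first few locations that import will search, in order
for entry in sys.path[:3]:
    print(repr(entry))

# find_spec() locates a module the same way import would, without executing it
spec = importlib.util.find_spec('json')
print(spec.origin)    # e.g. /usr/local/lib/python3.X/json/__init__.py
```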
Module Search Path
As noted, sys.path contains the search paths. You can adjust it manually if you need to.
import sys
sys.path.append('/project/foo/pyfiles')
Paths can also be added via environment variables.
% env PYTHONPATH=/project/foo/pyfiles python3
Python 3.6.0 (default, Feb 3 2017, 05:53:21)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.38)]
>>> import sys
>>> sys.path
['', '/project/foo/pyfiles', ...]
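The PYTHONPATH effect can also be checked from Python itself. This sketch starts a fresh interpreter with subprocess.run(), passing an environment in which PYTHONPATH names the directory from the example above, and verifies that the directory appears in the child’s sys.path. The directory does not have to exist for this check.

```python
import os
import subprocess
import sys

extra = os.path.abspath('/project/foo/pyfiles')   # directory from the text; need not exist
env = dict(os.environ, PYTHONPATH=extra)          # copy of our environment plus PYTHONPATH

# Start a new interpreter and print its sys.path, one entry per line
child = subprocess.run(
    [sys.executable, '-c', 'import sys; print("\\n".join(sys.path))'],
    env=env, capture_output=True, text=True, check=True)

child_path = child.stdout.splitlines()
print(extra in child_path)   # True
```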
As a general rule, it should not be necessary to manually adjust the module search path. However, it sometimes arises if you’re trying to import Python code that’s in an unusual location or not readily accessible from the current working directory.
Exercises
For this exercise involving modules, it is critically important to make sure you are running Python in a proper environment. Modules are usually where programmers first run into problems with the current working directory or with Python’s path settings. For this course, it is assumed that you’re writing all of your code in the Work/ directory. For best results, you should make sure you’re also in that directory when you launch the interpreter. If not, you need to make sure practical-python/Work is added to sys.path.
Exercise 3.11: Module imports
In section 3, we created a general-purpose function parse_csv() for parsing the contents of CSV datafiles.
Now, we’re going to see how to use that function in other programs. First, start in a new shell window. Navigate to the folder where you have all your files. We are going to import them.
Start Python interactive mode.
bash % python3
Python 3.6.1 (v3.6.1:69c0db5050, Mar 21 2017, 01:21:04)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
Once you’ve done that, try importing some of the programs you previously wrote. You should see their output exactly as before. Just to emphasize, importing a module runs its code.
>>> import bounce
... watch output ...
>>> import mortgage
... watch output ...
>>> import report
... watch output ...
>>>
If none of this works, you’re probably running Python in the wrong directory. Now, try importing your fileparse module and getting some help on it.
>>> import fileparse
>>> help(fileparse)
... look at the output ...
>>> dir(fileparse)
... look at the output ...
>>>
Try using the module to read some data:
>>> portfolio = fileparse.parse_csv('Data/portfolio.csv', select=['name','shares','price'], types=[str,int,float])
>>> portfolio
... look at the output ...
>>> pricelist = fileparse.parse_csv('Data/prices.csv', types=[str,float], has_headers=False)
>>> pricelist
... look at the output ...
>>> prices = dict(pricelist)
>>> prices
... look at the output ...
>>> prices['IBM']
106.11
>>>
Try importing a function so that you don’t need to include the module name:
>>> from fileparse import parse_csv
>>> portfolio = parse_csv('Data/portfolio.csv', select=['name','shares','price'], types=[str,int,float])
>>> portfolio
... look at the output ...
>>>
Exercise 3.12: Using your library module
In section 2, you wrote a program report.py that produced a stock report like this:
      Name     Shares      Price     Change
---------- ---------- ---------- ----------
        AA        100      39.91       7.71
       IBM         50     106.11      15.01
       CAT        150      78.58      -4.86
      MSFT        200      30.47     -20.76
        GE         95      37.38      -2.99
      MSFT         50      30.47     -34.63
       IBM        100     106.11      35.67
Take that program and modify it so that all of the input file processing is done using functions in your fileparse module. To do that, import fileparse as a module and change the read_portfolio() and read_prices() functions to use the parse_csv() function.
Use the interactive example at the start of this exercise as a guide. Afterwards, you should get exactly the same output as before.
Exercise 3.14: Using more library imports
In section 1, you wrote a program pcost.py that read a portfolio and computed its cost.
>>> import pcost
>>> pcost.portfolio_cost('Data/portfolio.csv')
44671.15
>>>
Modify the pcost.py file so that it uses the report.read_portfolio() function.
Commentary
When you are done with this exercise, you should have three programs:
- fileparse.py, which contains a general-purpose parse_csv() function.
- report.py, which produces a nice report, but also contains read_portfolio() and read_prices() functions.
- pcost.py, which computes the portfolio cost, but makes use of the read_portfolio() function written for the report.py program.
3.5 Main Module
This section introduces the concept of a main program or main module.
Main Functions
In many programming languages, there is a concept of a main function or method.
// c / c++
int main(int argc, char *argv[]) {
    ...
}
// java
class myprog {
    public static void main(String args[]) {
        ...
    }
}
This is the first function that executes when an application is launched.
Python Main Module
Python has no main function or method. Instead, there is a main module. The main module is the source file that runs first.
bash % python3 prog.py
...
Whatever file you give to the interpreter at startup becomes main. Its name doesn’t matter.
__main__ check
It is standard practice for modules that run as a main script to use this convention:
# prog.py
...
if __name__ == '__main__':
    # Running as the main program ...
    statements
    ...
Statements enclosed inside the if statement become the main program.
Main programs vs. library imports
Any Python file can either run as main or as a library import:
bash % python3 prog.py # Running as main
import prog # Running as library import
In both cases, __name__ holds the name of the module. When the file runs as the main program, that name is set to '__main__'; when it is imported, it is the module’s own name (prog here).
Usually, you don’t want statements that are part of the main program to execute on a library import. So, it’s common to have an if-check in code that might be used either way.
if __name__ == '__main__':
    # Does not execute if loaded with import ...
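The two ways of running a file can be compared in one script. This sketch writes a hypothetical prog.py containing the if __name__ == '__main__': check, imports it, and then runs the same file with runpy.run_path(..., run_name='__main__'), which executes it the way python3 prog.py would. The result variable exists only to record whether main() was called.

```python
import importlib
import runpy
import sys
import tempfile
from pathlib import Path

source = '''
def main():
    return 'main() ran'

result = None
if __name__ == '__main__':
    result = main()
'''
workdir = Path(tempfile.mkdtemp())
path = workdir / 'prog.py'
path.write_text(source)
sys.path.insert(0, str(workdir))
importlib.invalidate_caches()
sys.modules.pop('prog', None)      # discard any prog imported earlier in this session

import prog                                     # library import
print(prog.__name__, prog.result)               # prog None

as_main = runpy.run_path(str(path), run_name='__main__')   # run it like a main program
print(as_main['__name__'], as_main['result'])              # __main__ main() ran
```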
Program Template
Here is a common program template for writing a Python program:
# prog.py
# Import statements (libraries)
import modules

# Functions
def spam():
    ...

def blah():
    ...

# Main function
def main():
    ...

if __name__ == '__main__':
    main()
Command Line Tools
Python is often used for command-line tools:
bash % python3 report.py portfolio.csv prices.csv
In other words, the scripts are executed from the shell or terminal. Common use cases include automation, background tasks, etc.
Command Line Args
The command line is a list of text strings.
bash % python3 report.py portfolio.csv prices.csv
This list of text strings is found in sys.argv.
# In the previous bash command
sys.argv # ['report.py', 'portfolio.csv', 'prices.csv']
Here is a simple example of processing the arguments:
import sys

if len(sys.argv) != 3:
    raise SystemExit(f'Usage: {sys.argv[0]} ' 'portfile pricefile')
portfile = sys.argv[1]
pricefile = sys.argv[2]
...
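Reading sys.argv at the top level makes this check awkward to test. A common variation, sketched below under the hypothetical name parse_args(), moves it into a function that receives the argument list. From the command line you would call parse_args(sys.argv); in a test or at the prompt you can pass a hand-built list instead.

```python
def parse_args(argv):
    # argv has the same layout as sys.argv: argv[0] is the program name
    if len(argv) != 3:
        raise SystemExit(f'Usage: {argv[0]} portfile pricefile')
    portfile, pricefile = argv[1], argv[2]
    return portfile, pricefile

print(parse_args(['report.py', 'portfolio.csv', 'prices.csv']))
# ('portfolio.csv', 'prices.csv')

try:
    parse_args(['report.py'])
except SystemExit as exc:
    usage = exc.code      # the message passed to SystemExit
print(usage)              # Usage: report.py portfile pricefile
```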
Standard I/O
Standard input, output, and error (stdio) are file objects that work the same as normal files.
sys.stdout
sys.stderr
sys.stdin
By default, print is directed to sys.stdout. Input is read from sys.stdin. Tracebacks and errors are directed to sys.stderr.
Be aware that stdio could be connected to terminals, files, pipes, etc.
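Because stdio streams are ordinary file objects, a program can act as a filter in a pipeline. This sketch runs a small, hypothetical filter in a child interpreter via subprocess.run(), feeds text to its stdin, and captures its stdout and stderr separately, much as cmd1 | python3 prog.py | cmd2 would connect them.

```python
import subprocess
import sys

# Hypothetical filter: upper-case each input line to stdout, report a count on stderr
filter_code = '''
import sys
count = 0
for line in sys.stdin:
    sys.stdout.write(line.upper())
    count += 1
print(f'{count} lines processed', file=sys.stderr)
'''

proc = subprocess.run([sys.executable, '-c', filter_code],
                      input='ibm\naa\n', capture_output=True, text=True)
print(repr(proc.stdout))   # 'IBM\nAA\n'
print(repr(proc.stderr))   # '2 lines processed\n'
```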
bash % python3 prog.py > results.txt
# or
bash % cmd1 | python3 prog.py | cmd2
Environment Variables
Environment variables are set in the shell.
bash % export NAME=dave
bash % export RSH=ssh
bash % python3 prog.py
os.environ is a dictionary-like mapping that contains these values.
import os

name = os.environ['NAME'] # 'dave'
Changes made to os.environ are inherited by any subprocesses the program launches afterward.
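Here is a minimal sketch of both directions: setting and reading a variable through os.environ, then checking that a child process started afterward sees the new value. Setting a variable this way affects only the current process and its children, not the shell that launched Python. The value 'dave' comes from the example above.

```python
import os
import subprocess
import sys

os.environ['NAME'] = 'dave'            # set (or overwrite) NAME for this process
name = os.environ.get('NAME')          # .get() returns None instead of raising if unset

# A child started now inherits the modified environment
child = subprocess.run(
    [sys.executable, '-c', 'import os; print(os.environ["NAME"])'],
    capture_output=True, text=True, check=True)
print(name, child.stdout.strip())      # dave dave
```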
Program Exit
Program exit is handled through exceptions.
raise SystemExit
raise SystemExit(exitcode)
raise SystemExit('Informative message')
An alternative:
import sys
sys.exit(exitcode)
A non-zero exit code indicates an error.
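To see what each form does to the exit status, this sketch runs a few one-line programs in a child interpreter and reports the status and any stderr output. Per the Python documentation, an integer is used as the status, no argument means 0, and any other object (such as a message string) is printed to stderr with status 1.

```python
import subprocess
import sys

def exit_status(stmt):
    """Run a one-line program; return (exit status, stripped stderr text)."""
    proc = subprocess.run([sys.executable, '-c', stmt],
                          capture_output=True, text=True)
    return proc.returncode, proc.stderr.strip()

print(exit_status('raise SystemExit'))                          # (0, '')
print(exit_status('raise SystemExit(3)'))                       # (3, '')
print(exit_status("raise SystemExit('Informative message')"))   # (1, 'Informative message')
print(exit_status('import sys; sys.exit(2)'))                   # (2, '')
```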
The #! line
On Unix, the #! line can launch a script as Python. Add the following to the first line of your script file.
#!/usr/bin/env python3
# prog.py
...
The script file also needs execute permission.
bash % chmod +x prog.py
# Then you can execute
bash % ./prog.py
... output ...
Note: The Python Launcher on Windows also looks for the #! line to indicate the language version.
Script Template
Finally, here is a common code template for Python programs that run as command-line scripts:
#!/usr/bin/env python3
# prog.py

# Import statements (libraries)
import modules

# Functions
def spam():
    ...

def blah():
    ...

# Main function
def main(argv):
    # Parse command line args, environment, etc.
    ...

if __name__ == '__main__':
    import sys
    main(sys.argv)
Exercises
Exercise 3.15: main() functions
In the file report.py, add a main() function that accepts a list of command-line options and produces the same output as before. You should be able to run it interactively like this:
>>> import report
>>> report.main(['report.py', 'Data/portfolio.csv', 'Data/prices.csv'])
      Name     Shares      Price     Change
---------- ---------- ---------- ----------
        AA        100      39.91       7.71
       IBM         50     106.11      15.01
       CAT        150      78.58      -4.86
      MSFT        200      30.47     -20.76
        GE         95      37.38      -2.99
      MSFT         50      30.47     -34.63
       IBM        100     106.11      35.67
>>>
Modify the pcost.py file so that it has a similar main() function:
>>> import pcost
>>> pcost.main(['pcost.py', 'Data/portfolio.csv'])
Total cost: 44671.15
>>>
Exercise 3.16: Making Scripts
Modify the report.py and pcost.py programs so that they can execute as a script on the command line:
bash $ python3 report.py Data/portfolio.csv Data/prices.csv
      Name     Shares      Price     Change
---------- ---------- ---------- ----------
        AA        100      39.91       7.71
       IBM         50     106.11      15.01
       CAT        150      78.58      -4.86
      MSFT        200      30.47     -20.76
        GE         95      37.38      -2.99
      MSFT         50      30.47     -34.63
       IBM        100     106.11      35.67
bash $ python3 pcost.py Data/portfolio.csv
Total cost: 44671.15
3.6 Design Discussion
In this section we reconsider a design decision made earlier.
Filenames versus Iterables
Compare these two programs that return the same output.
# Provide a filename
def read_data(filename):
    records = []
    with open(filename) as f:
        for line in f:
            ...
            records.append(r)
    return records

d = read_data('file.csv')
# Provide lines
def read_data(lines):
    records = []
    for line in lines:
        ...
        records.append(r)
    return records

with open('file.csv') as f:
    d = read_data(f)
- Which of these functions do you prefer? Why?
- Which of these functions is more flexible?
Deep Idea: “Duck Typing”
Duck Typing is a computer programming concept to determine whether an object can be used for a particular purpose. It is an application of the duck test.
If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck.
In the second version of read_data() above, the function accepts any iterable object, not just the lines of a file.
def read_data(lines):
    records = []
    for line in lines:
        ...
        records.append(r)
    return records
This means that we can use it with lines from other sources.
import gzip
import sys

# A CSV file
lines = open('data.csv')
data = read_data(lines)

# A zipped file
lines = gzip.open('data.csv.gz', 'rt')
data = read_data(lines)

# The Standard Input
lines = sys.stdin
data = read_data(lines)

# A list of strings
lines = ['ACME,50,91.1', 'IBM,75,123.45', ... ]
data = read_data(lines)
There is considerable flexibility with this design.
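Here is a runnable version of that idea, under the assumption that each line holds name,shares,price with no header. The same read_data() handles a list of strings, an in-memory io.StringIO, and a gzip file opened in text mode, because csv.reader() accepts any iterable of strings.

```python
import csv
import gzip
import io
import os
import tempfile

def read_data(lines):
    # Assumed record format: name,shares,price (no header line)
    records = []
    for name, shares, price in csv.reader(lines):
        records.append((name, int(shares), float(price)))
    return records

# A list of strings
data1 = read_data(['ACME,50,91.1', 'IBM,75,123.45'])

# An in-memory text file
data2 = read_data(io.StringIO('ACME,50,91.1\nIBM,75,123.45\n'))

# A gzip-compressed file opened in text mode ('rt')
path = os.path.join(tempfile.mkdtemp(), 'data.csv.gz')
with gzip.open(path, 'wt') as f:
    f.write('ACME,50,91.1\nIBM,75,123.45\n')
with gzip.open(path, 'rt') as f:
    data3 = read_data(f)

print(data1 == data2 == data3)   # True
```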
Question: Should we embrace or fight this flexibility?
Library Design Best Practices
Code libraries are often better served by embracing flexibility. Don’t restrict your options. With great flexibility comes great power.
Exercise
Exercise 3.17: From filenames to file-like objects
You’ve now created a file fileparse.py that contains a function parse_csv(). The function works like this:
>>> import fileparse
>>> portfolio = fileparse.parse_csv('Data/portfolio.csv', types=[str,int,float])
>>>
Right now, the function expects to be passed a filename. However, you can make the code more flexible. Modify the function so that it works with any file-like/iterable object. For example:
>>> import fileparse
>>> import gzip
>>> with gzip.open('Data/portfolio.csv.gz', 'rt') as file:
...     port = fileparse.parse_csv(file, types=[str,int,float])
...
>>> lines = ['name,shares,price', 'AA,100,34.23', 'IBM,50,91.1', 'HPE,75,45.1']
>>> port = fileparse.parse_csv(lines, types=[str,int,float])
>>>
In this new code, what happens if you pass a filename as before?
>>> port = fileparse.parse_csv('Data/portfolio.csv', types=[str,int,float])
>>> port
... look at output (it should be crazy) ...
>>>
Yes, you’ll need to be careful. Could you add a safety check to avoid this?
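The output looks strange because a string is itself iterable: iterating over 'Data/portfolio.csv' yields single characters, so each character is treated as a line. One possible safety check, sketched below, is to reject a str argument up front. The parse_csv() here is a simplified stand-in (header row plus optional type conversion, without the select or has_headers options), and raising TypeError is our choice, not something the exercise prescribes.

```python
import csv

def parse_csv(lines, types=None):
    """Simplified stand-in for fileparse.parse_csv(): header row + optional types."""
    # A str is iterable, but it yields characters, not lines, so refuse it early
    if isinstance(lines, str):
        raise TypeError('parse_csv() expects an iterable of lines, not a filename string')
    rows = csv.reader(lines)
    headers = next(rows)
    records = []
    for row in rows:
        if not row:              # skip blank lines
            continue
        if types:
            row = [func(val) for func, val in zip(types, row)]
        records.append(dict(zip(headers, row)))
    return records

lines = ['name,shares,price', 'AA,100,34.23', 'IBM,50,91.1']
port = parse_csv(lines, types=[str, int, float])
print(port[0])    # {'name': 'AA', 'shares': 100, 'price': 34.23}

try:
    parse_csv('Data/portfolio.csv')
except TypeError as exc:
    message = str(exc)
print(message)
```

With this check in place, an accidental filename fails immediately with a clear message instead of producing nonsense records.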
Exercise 3.18: Fixing existing functions
Fix the read_portfolio() and read_prices() functions in the report.py file so that they work with the modified version of parse_csv(). This should only involve a minor modification. Afterwards, your report.py and pcost.py programs should work the same way they always did.