Tutorial:PracticalPython/9 Packages

From HandWiki
Revision as of 07:59, 6 June 2020 by imported>Jworkorg


Packages

We conclude the course with a few details on how to organize your code into a package structure. We’ll also discuss the installation of third party packages and preparing to give your own code away to others.

The subject of packaging is an ever-evolving, overly complex part of Python development. Rather than focus on specific tools, the main focus of this section is on some general code organization principles that will prove useful no matter what tools you later use to give code away or manage dependencies.


Packages

If you are writing a larger program, you don’t really want to organize it as a large collection of standalone files at the top level. This section introduces the concept of a package.

Modules

Any Python source file is a module.

# foo.py
def grok(a):
    ...
def spam(b):
    ...

An import statement loads and executes a module.

# program.py
import foo

a = foo.grok(2)
b = foo.spam('Hello')
...
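To see the "loads and executes" part in action, here is a small sketch. It assumes a throwaway module named demo_foo (a made-up name, not part of the course code) written to a temporary directory. The module prints a message when it runs, so you can see that the first import executes the file and that a repeated import does not.

```python
import sys
import tempfile
from pathlib import Path

# Write a throwaway module to a temporary directory.
# The name demo_foo is hypothetical and exists only for this sketch.
tmpdir = tempfile.mkdtemp()
Path(tmpdir, "demo_foo.py").write_text(
    "print('executing demo_foo')\n"
    "def grok(a):\n"
    "    return a * 2\n"
)
sys.path.insert(0, tmpdir)        # make the directory importable

import demo_foo                   # first import: the file runs top to bottom
import demo_foo                   # repeated import: served from sys.modules

print(demo_foo.grok(2))           # functions are reached through the module name
print("demo_foo" in sys.modules)  # the loaded module is cached here
```

The message from the module appears only once, because Python keeps loaded modules in sys.modules and reuses them.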

Packages vs Modules

For larger collections of code, it is common to organize modules into a package.

# From this
pcost.py
report.py
fileparse.py

# To this
porty/
    __init__.py
    pcost.py
    report.py
    fileparse.py

Pick a name and make a top-level directory with that name, porty in the example above (clearly, picking this name is the most important first step).

Add an __init__.py file to the directory. It may be empty.

Put your source files into the directory.

Using a Package

A package serves as a namespace for imports.

This means that there are now multilevel imports.

import porty.report
port = porty.report.read_portfolio('port.csv')

There are other variations of import statements.

from porty import report
port = report.read_portfolio('portfolio.csv')

from porty.report import read_portfolio
port = read_portfolio('portfolio.csv')
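If you want to convince yourself that these variations all reach the same code, the following sketch may help. It builds a tiny stand-in package called porty_demo in a temporary directory (the name and the placeholder read_portfolio body are assumptions made for illustration) and then imports it in all three ways.

```python
import sys
import tempfile
from pathlib import Path

# Build a minimal stand-in package: porty_demo/__init__.py and porty_demo/report.py.
# The package name and the function body are placeholders for this sketch.
root = Path(tempfile.mkdtemp())
pkg = root / "porty_demo"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "report.py").write_text(
    "def read_portfolio(filename):\n"
    "    return f'would read {filename}'\n"
)
sys.path.insert(0, str(root))     # the directory that CONTAINS the package

import porty_demo.report                          # multilevel import
from porty_demo import report                     # import the submodule by name
from porty_demo.report import read_portfolio      # import one function

print(porty_demo.report is report)                # same module object
print(report.read_portfolio is read_portfolio)    # same function object
print(read_portfolio("portfolio.csv"))
```

Note that the directory added to sys.path is the parent of the package directory, not the package directory itself.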

Two problems

There are two main problems with this approach.

  • Imports between files in the same package break.
  • Main scripts placed inside the package break.

So, basically everything breaks. But, other than that, it works.

Problem: Imports

Imports between files in the same package must now include the package name in the import. Remember the structure.

porty/
    __init__.py
    pcost.py
    report.py
    fileparse.py

Modified import example.

# report.py
from porty import fileparse

def read_portfolio(filename):
    return fileparse.parse_csv(...)

All imports are treated as absolute, so a bare import of a sibling module no longer works.

# report.py
import fileparse    # BREAKS. fileparse not found

...

Relative Imports

Instead of directly using the package name, you can use . to refer to the current package.

# report.py
from . import fileparse

def read_portfolio(filename):
    return fileparse.parse_csv(...)

Syntax:

from . import modname

This makes it easy to rename the package.
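If you are curious how a leading dot gets turned into a full module name, the standard library exposes the rule through importlib.util.resolve_name. The sketch below is only an illustration of that rule. The subpackage name porty.sub in the second call is hypothetical.

```python
from importlib.util import resolve_name

# Inside porty/report.py the current package is "porty",
# so a single leading dot stands for "porty".
print(resolve_name(".fileparse", "porty"))

# Two dots step up one package level. "porty.sub" is a made-up subpackage.
print(resolve_name("..fileparse", "porty.sub"))
```

Both calls should give porty.fileparse. If the package directory is later renamed, the relative form keeps working because the package name is never spelled out in the source.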

Problem: Main Scripts

Running a package submodule as a main script breaks.

bash $ python porty/pcost.py # BREAKS
...

Reason: You are running Python on a single file and Python doesn’t see the rest of the package structure correctly (sys.path is wrong).

All imports break. To fix this, you need to run your program in a different way, using the -m option.

bash $ python -m porty.pcost # WORKS
...
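The difference between the two commands can be reproduced in a small experiment. The sketch below is not part of the course code: it writes a stripped-down porty package with placeholder contents into a temporary directory, then runs pcost both ways in a child process. It assumes the interpreter is started with default settings, so that -m adds the current directory to the search path.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Create a minimal package with one relative import (placeholder code).
root = Path(tempfile.mkdtemp())
pkg = root / "porty"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "fileparse.py").write_text("def parse_csv(name):\n    return []\n")
(pkg / "pcost.py").write_text(
    "from . import fileparse\n"
    "print('pcost ran with', fileparse.parse_csv('x'))\n"
)

def run(*args):
    """Run the current interpreter with args, from the directory above the package."""
    return subprocess.run(
        [sys.executable, *args], cwd=root, capture_output=True, text=True
    )

as_file = run(str(Path("porty") / "pcost.py"))   # python porty/pcost.py
as_module = run("-m", "porty.pcost")             # python -m porty.pcost

print("file run failed:", as_file.returncode != 0)
print("module run output:", as_module.stdout.strip())
```

When run as a plain file, pcost.py has no parent package, so the relative import raises an ImportError. With -m, Python imports porty first and then runs porty.pcost as part of it.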

__init__.py files

The primary purpose of these files is to stitch modules together.

Example: consolidating functions

# porty/__init__.py
from .pcost import portfolio_cost
from .report import portfolio_report

This makes names appear at the top-level when importing.

from porty import portfolio_cost
portfolio_cost('portfolio.csv')

Instead of using the multilevel imports.

from porty import pcost
pcost.portfolio_cost('portfolio.csv')
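Here is a self-contained sketch of the same idea. It uses a stand-in package named porty_flat with a placeholder portfolio_cost function (both are assumptions for illustration only) to show that the name re-exported by __init__.py is the very same function that lives in the submodule.

```python
import sys
import tempfile
from pathlib import Path

# Stand-in package: the __init__.py lifts one function to the top level.
root = Path(tempfile.mkdtemp())
pkg = root / "porty_flat"         # hypothetical package name
pkg.mkdir()
(pkg / "pcost.py").write_text(
    "def portfolio_cost(filename):\n"
    "    return 0.0  # placeholder result\n"
)
(pkg / "__init__.py").write_text("from .pcost import portfolio_cost\n")
sys.path.insert(0, str(root))

import porty_flat
from porty_flat import portfolio_cost

# The top-level name and the submodule attribute are one and the same object.
print(portfolio_cost is porty_flat.pcost.portfolio_cost)
# The function still remembers where it was defined.
print(portfolio_cost.__module__)
```

The re-export only adds a convenient shortcut. The multilevel form keeps working alongside it.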

Another solution for scripts

As noted, you now need to use -m package.module to run scripts within your package.

bash % python3 -m porty.pcost portfolio.csv

There is another alternative: Write a new top-level script.

#!/usr/bin/env python3
# pcost.py
import porty.pcost
import sys
porty.pcost.main(sys.argv)

This script lives outside the package. For example, looking at the directory structure:

pcost.py       # top-level-script
porty/         # package directory
    __init__.py
    pcost.py
    ...

Application Structure

Code organization and file structure are key to the maintainability of an application.

There is no “one-size fits all” approach for Python. However, one structure that works for a lot of problems is something like this.

porty-app/
  README.txt
  script.py         # SCRIPT
  porty/
    # LIBRARY CODE
    __init__.py
    pcost.py
    report.py
    fileparse.py

The top-level porty-app is a container for everything else: documentation, top-level scripts, examples, etc.

Again, top-level scripts (if any) need to exist outside the code package. One level up.

#!/usr/bin/env python3
# porty-app/script.py
import sys
import porty.report    # import the submodule explicitly; "import porty" alone does not load it

porty.report.main(sys.argv)

Exercises

At this point, you have a directory with several programs:

pcost.py          # computes portfolio cost
report.py         # Makes a report
ticker.py         # Produce a real-time stock ticker

There are a variety of supporting modules with other functionality:

stock.py          # Stock class
portfolio.py      # Portfolio class
fileparse.py      # CSV parsing
tableformat.py    # Formatted tables
follow.py         # Follow a log file
typedproperty.py  # Typed class properties

In this exercise, we’re going to clean up the code and put it into a common package.

Exercise 9.1: Making a simple package

Make a directory called porty/ and put all of the above Python files into it. Additionally create an empty __init__.py file and put it in the directory. You should have a directory of files like this:

porty/
    __init__.py
    fileparse.py
    follow.py
    pcost.py
    portfolio.py
    report.py
    stock.py
    tableformat.py
    ticker.py
    typedproperty.py

Remove the __pycache__ directory that’s sitting in your directory. This contains pre-compiled Python modules from before. We want to start fresh.

Try importing some of the package modules:

>>> import porty.report
>>> import porty.pcost
>>> import porty.ticker

If these imports fail, go into the appropriate file and fix the module imports to include a package-relative import. For example, a statement such as import fileparse might change to the following:

# report.py
from . import fileparse
...

If you have a statement such as from fileparse import parse_csv, change the code to the following:

# report.py
from .fileparse import parse_csv
...

Exercise 9.2: Making an application directory

Putting all of your code into a “package” often isn’t enough for an application. Sometimes there are supporting files, documentation, scripts, and other things. These files need to exist OUTSIDE of the porty/ directory you made above.

Create a new directory called porty-app. Move the porty directory you created in Exercise 9.1 into that directory. Copy the Data/portfolio.csv and Data/prices.csv test files into this directory. Additionally create a README.txt file with some information about yourself. Your code should now be organized as follows:

porty-app/
    portfolio.csv
    prices.csv
    README.txt
    porty/
        __init__.py
        fileparse.py
        follow.py
        pcost.py
        portfolio.py
        report.py
        stock.py
        tableformat.py
        ticker.py
        typedproperty.py

To run your code, you need to make sure you are working in the top-level porty-app/ directory. For example, from the terminal:

shell % cd porty-app
shell % python3
>>> import porty.report
>>>

Try running some of your prior scripts as a main program:

shell % cd porty-app
shell % python3 -m porty.report portfolio.csv prices.csv txt
      Name     Shares      Price     Change
---------- ---------- ---------- ----------
        AA        100       9.22     -22.98
       IBM         50     106.28      15.18
       CAT        150      35.46     -47.98
      MSFT        200      20.89     -30.34
        GE         95      13.48     -26.89
      MSFT         50      20.89     -44.21
       IBM        100     106.28      35.84

shell %

Exercise 9.3: Top-level Scripts

Using the python -m command is often a bit weird. You may want to write a top level script that simply deals with the oddities of packages. Create a script print-report.py that produces the above report:

#!/usr/bin/env python3
# print-report.py
import sys
from porty.report import main
main(sys.argv)

Put this script in the top-level porty-app/ directory. Make sure you can run it in that location:

shell % cd porty-app
shell % python3 print-report.py portfolio.csv prices.csv txt
      Name     Shares      Price     Change
---------- ---------- ---------- ----------
        AA        100       9.22     -22.98
       IBM         50     106.28      15.18
       CAT        150      35.46     -47.98
      MSFT        200      20.89     -30.34
        GE         95      13.48     -26.89
      MSFT         50      20.89     -44.21
       IBM        100     106.28      35.84

shell %

Your final code should now be structured something like this:

porty-app/
    portfolio.csv
    prices.csv
    print-report.py
    README.txt
    porty/
        __init__.py
        fileparse.py
        follow.py
        pcost.py
        portfolio.py
        report.py
        stock.py
        tableformat.py
        ticker.py
        typedproperty.py

Third Party Modules

Python has a large library of built-in modules (batteries included).

There are even more third party modules. Check for them in the Python Package Index (PyPI). Or just do a Google search for a specific topic.

How to handle third-party dependencies is an ever-evolving topic with Python. This section merely covers the basics to help you wrap your brain around how it works.

The Module Search Path

sys.path is a list of all the directories checked by the import statement. Look at it:

>>> import sys
>>> sys.path
... look at the result ...
>>>

If you import something and it’s not located in one of those directories, you will get an ImportError exception.
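Because sys.path is an ordinary list, you can watch the search path at work. The sketch below uses a throwaway module named pathdemo_mod (a hypothetical name) placed in a temporary directory. The import fails until that directory is appended to sys.path.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# A throwaway module in a directory that is NOT on sys.path yet.
extra = tempfile.mkdtemp()
Path(extra, "pathdemo_mod.py").write_text("VALUE = 42\n")

def try_import(name):
    """Return the imported module, or None if the import fails."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None

before = try_import("pathdemo_mod")   # directory unknown: import fails
sys.path.append(extra)                # extend the search path
importlib.invalidate_caches()         # make sure the finders see the new file
after = try_import("pathdemo_mod")    # now the module is found

print(isinstance(sys.path, list))
print(before is None)
print(after.VALUE)
```

Editing sys.path by hand like this is handy for experiments, but it is usually a sign that code should be installed or organized differently.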

Standard Library Modules

Modules from Python’s standard library usually come from a location such as ‘/usr/local/lib/python3.6’. You can find out for certain by trying a short test:

>>> import re
>>> re
<module 're' from '/usr/local/lib/python3.6/re.py'>
>>>

Simply looking at a module in the REPL is a good debugging tip to know about. It will show you the location of the file.

Third-party Modules

Third party modules are usually located in a dedicated site-packages directory. You’ll see it if you perform the same steps as above:

>>> import numpy
>>> numpy
<module 'numpy' from '/usr/local/lib/python3.6/site-packages/numpy/__init__.py'>
>>>

Again, looking at a module is a good debugging tip if you’re trying to figure out why something related to import isn’t working as expected.
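The same information is available from code, which can be useful in a script or a bug report. This is a minimal sketch. The helper name locate and the module name no_such_module_xyz are made up for illustration. It relies on the __file__ attribute of a loaded module and on importlib.util.find_spec, which looks a module up without running it.

```python
import importlib.util
import re

# A loaded module usually records the file it came from.
print(re.__file__)

def locate(name):
    """Return the origin recorded for a top-level module, or None if not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None

print(locate("json"))                  # a path inside the standard library
print(locate("no_such_module_xyz"))    # hypothetical name, assumed not to exist
```

For modules compiled into the interpreter itself, the origin is a label such as built-in rather than a file path.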

Installing Modules

The most common technique for installing a third-party module is to use pip. For example:

bash % python3 -m pip install packagename

This command will download the package and install it in the site-packages directory.
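After installing something, you may want to confirm what version ended up in your environment. One possible way, sketched below, is the importlib.metadata module from the standard library (Python 3.8 and newer). The helper installed_version and the distribution name no-such-distribution-xyz are assumptions for the example.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

print(installed_version("pip"))                        # a version string if pip is installed
print(installed_version("no-such-distribution-xyz"))   # made-up name, so None
```

The argument is the name you would give to pip, which is not always the same as the name you import.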

Problems

  • You may be using an installation of Python that you don’t directly control.
    • A corporate approved installation
    • You’re using the Python version that comes with the OS.
  • You might not have permission to install global packages on the computer.
  • There might be other dependencies.

Virtual Environments

A common solution to package installation issues is to create a so-called “virtual environment” for yourself. Naturally, there is no “one way” to do this. In fact, there are several competing tools and techniques. However, if you are using a standard Python installation, you can try typing this:

bash % python -m venv mypython
bash %

After a few moments of waiting, you will have a new directory mypython that’s your own little Python install. Within that directory you’ll find a bin/ directory (Unix) or a Scripts/ directory (Windows). If you run the activate script found there, it will “activate” this version of Python, making it the default python command for the shell. For example:

bash % source mypython/bin/activate
(mypython) bash %

From here, you can now start installing Python packages for yourself. For example:

(mypython) bash % python -m pip install pandas
...
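If you ever lose track of which Python you are running, a quick check from inside the interpreter can help. The sketch below assumes an environment made with the standard venv module, which leaves sys.base_prefix pointing at the original installation. The function name in_virtual_environment is made up for this example.

```python
import sys

def in_virtual_environment():
    """Return True if this interpreter appears to run inside a venv-style environment."""
    # In a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the installation it was created from.
    return sys.prefix != sys.base_prefix

print(sys.executable)              # the python binary actually being used
print(in_virtual_environment())
```

Other environment tools may behave differently, so treat this as a rough check rather than a guarantee.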

For the purposes of experimenting and trying out different packages, a virtual environment will usually work fine. If, on the other hand, you’re creating an application and it has specific package dependencies, that is a slightly different problem.

Handling Third-Party Dependencies in Your Application

If you have written an application and it has specific third-party dependencies, one challenge concerns the creation and preservation of the environment that includes your code and the dependencies. Sadly, this has been an area of great confusion and frequent change over Python’s lifetime. It continues to evolve even now.

Rather than provide information that’s bound to be out of date soon, I refer you to the Python Packaging User Guide.

Exercises

Exercise 9.4: Creating a Virtual Environment

See if you can recreate the steps of making a virtual environment and installing pandas into it as shown above.

Distribution

At some point you might want to give your code to someone else, possibly just a co-worker. This section gives the most basic technique of doing that. For more detailed information, you’ll need to consult the Python Packaging User Guide.

Creating a setup.py file

Add a setup.py file to the top-level of your project directory.

# setup.py
import setuptools

setuptools.setup(
    name="porty",
    version="0.0.1",
    author="Your Name",
    author_email="you@example.com",
    description="Practical Python Code",
    packages=setuptools.find_packages(),
)

Creating MANIFEST.in

If there are additional files associated with your project, specify them with a MANIFEST.in file. For example:

# MANIFEST.in
include *.csv

Put the MANIFEST.in file in the same directory as setup.py.

Creating a source distribution

To create a distribution of your code, use the setup.py file. For example:

bash % python setup.py sdist

This will create a .tar.gz or .zip file in the directory dist/. That file is something that you can now give away to others.
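Before handing the file to someone, it can be worth checking what actually ended up inside it. The sketch below lists the members of a .tar.gz archive with the standard tarfile module. To stay self-contained it first builds a small stand-in archive with empty files. The archive name demo-0.0.1.tar.gz and its contents are placeholders, not real output of the sdist command.

```python
import io
import tarfile
import tempfile
from pathlib import Path

def list_archive(path):
    """Return the sorted member names of a .tar.gz archive."""
    with tarfile.open(path, "r:gz") as tf:
        return sorted(tf.getnames())

# Build a stand-in archive with two empty files (placeholder layout only).
archive = Path(tempfile.mkdtemp()) / "demo-0.0.1.tar.gz"
with tarfile.open(archive, "w:gz") as tf:
    for name in ("demo-0.0.1/setup.py", "demo-0.0.1/demo/__init__.py"):
        tf.addfile(tarfile.TarInfo(name), io.BytesIO(b""))

for member in list_archive(archive):
    print(member)
```

To inspect a real distribution, pass the path of the file found in dist/ to list_archive. If a data file you expected is missing, the MANIFEST.in file is the first place to look.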

Installing your code

Others can install your Python code using pip in the same way that they do for other packages. They simply need to supply the file created in the previous step. For example:

bash % python -m pip install porty-0.0.1.tar.gz

Commentary

The steps above describe the absolute most minimal basics of creating a package of Python code that you can give to another person. In reality, it can be much more complicated depending on third-party dependencies, whether or not your application includes foreign code (i.e., C/C++), and so forth. Covering that is outside the scope of this course. We’ve only taken a tiny first step.

Exercises

Exercise 9.5: Make a package

Take the porty-app/ code you created for Exercise 9.3 and see if you can recreate the steps described here. Specifically, add a setup.py file and a MANIFEST.in file to the top-level directory. Create a source distribution file by running python setup.py sdist.

As a final step, see if you can install your package into a Python virtual environment.

The End!

You’ve made it to the end of the course. Thanks for your time and your attention. May your future Python hacking be fun and productive!

I’m always happy to get feedback. You can find me at https://dabeaz.com or on Twitter at [@dabeaz](https://twitter.com/dabeaz). - David Beazley.