Tutorial:PracticalPython/5 Object model

From HandWiki


Inner Workings of Python Objects

This section covers some of the inner workings of Python objects. Programmers coming from other programming languages often find Python’s notion of classes lacking in features. For example, there is no notion of access-control (e.g., private, protected), the whole self argument feels weird, and frankly, working with objects sometimes feels like a “free for all.” Maybe that’s true, but we’ll find out how it all works as well as some common programming idioms to better encapsulate the internals of objects.

It’s not necessary to worry about the inner details to be productive. However, most Python coders have a basic awareness of how classes work. So, that’s why we’re covering it.


Dictionaries Revisited

The Python object system is largely based on an implementation involving dictionaries. This section discusses that.

Dictionaries, Revisited

Remember that a dictionary is a collection of named values.

stock = {
    'name' : 'GOOG',
    'shares' : 100,
    'price' : 490.1
}

Dictionaries are commonly used for simple data structures. However, they are used for critical parts of the interpreter and may be the most important type of data in Python.

Dicts and Modules

Within a module, a dictionary holds all of the global variables and functions.

# foo.py

x = 42
def bar():
    ...

def spam():
    ...

If you inspect foo.__dict__ or globals(), you’ll see the dictionary.

{
    'x' : 42,
    'bar' : <function bar>,
    'spam' : <function spam>
}
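As a quick check of this idea, the sketch below builds a throwaway module object in memory rather than importing a real foo.py. The use of types.ModuleType and exec() is only an assumption made to keep the example self-contained; with a real file you would simply write import foo.

```python
import types

# Stand-in for the contents of foo.py (hypothetical; normally this lives in a file).
source = """
x = 42
def bar():
    ...

def spam():
    ...
"""

foo = types.ModuleType('foo')
exec(source, foo.__dict__)      # Run the module body with the module's dictionary as globals

# Skip the bookkeeping names (__name__, __doc__, ...) that every module carries
public_names = sorted(name for name in foo.__dict__ if not name.startswith('__'))
print(public_names)             # ['bar', 'spam', 'x']

# A function's global namespace is the same dictionary object as its module's __dict__
print(foo.bar.__globals__ is foo.__dict__)   # True
```

The last line is the interesting one: the functions in a module do not get a copy of the globals, they share the one dictionary that belongs to the module.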

Dicts and Objects

User-defined objects also use dictionaries for both instance data and classes. In fact, the entire object system is mostly an extra layer that’s put on top of dictionaries.

A dictionary holds the instance data, __dict__.

>>> s = Stock('GOOG', 100, 490.1)
>>> s.__dict__
{'name' : 'GOOG','shares' : 100, 'price': 490.1 }

You populate this dict (and instance) when assigning to self.

class Stock:
    def __init__(self, name, shares, price):
        self.name = name
        self.shares = shares
        self.price = price

The instance data, self.__dict__, looks like this:

{
    'name': 'GOOG',
    'shares': 100,
    'price': 490.1
}

Each instance gets its own private dictionary.

s = Stock('GOOG', 100, 490.1)     # {'name' : 'GOOG','shares' : 100, 'price': 490.1 }
t = Stock('AAPL', 50, 123.45)     # {'name' : 'AAPL','shares' : 50, 'price': 123.45 }

If you created 100 instances of some class, there are 100 dictionaries sitting around holding data.

Class Members

A separate dictionary also holds the methods.

class Stock:
    def __init__(self, name, shares, price):
        self.name = name
        self.shares = shares
        self.price = price

    def cost(self):
        return self.shares * self.price

    def sell(self, nshares):
        self.shares -= nshares

The dictionary is in Stock.__dict__.

{
    'cost': <function>,
    'sell': <function>,
    '__init__': <function>
}

Instances and Classes

Instances and classes are linked together. The __class__ attribute refers back to the class.

>>> s = Stock('GOOG', 100, 490.1)
>>> s.__dict__
{ 'name': 'GOOG', 'shares': 100, 'price': 490.1 }
>>> s.__class__
<class '__main__.Stock'>
>>>

The instance dictionary holds data unique to each instance, whereas the class dictionary holds data collectively shared by all instances.
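The split between the two dictionaries can be checked directly. The following is a small sketch that reuses the Stock class from this section; nothing in it is specific to Stock.

```python
class Stock:
    def __init__(self, name, shares, price):
        self.name = name
        self.shares = shares
        self.price = price

    def cost(self):
        return self.shares * self.price

s = Stock('GOOG', 100, 490.1)
t = Stock('AAPL', 50, 123.45)

print(s.__dict__ is t.__dict__)     # False: each instance has its own dictionary
print(s.__class__ is t.__class__)   # True: both instances link back to the same class
print('cost' in s.__dict__)         # False: methods are not stored on the instance
print('cost' in Stock.__dict__)     # True: methods live in the class dictionary
```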

Attribute Access

When you work with objects, you access data and methods using the . operator.

x = obj.name          # Getting
obj.name = value      # Setting
del obj.name          # Deleting

These operations are directly tied to the dictionaries sitting underneath the covers.

Modifying Instances

Operations that modify an object update the underlying dictionary.

>>> s = Stock('GOOG', 100, 490.1)
>>> s.__dict__
{ 'name':'GOOG', 'shares': 100, 'price': 490.1 }
>>> s.shares = 50       # Setting
>>> s.date = '6/7/2007' # Setting
>>> s.__dict__
{ 'name': 'GOOG', 'shares': 50, 'price': 490.1, 'date': '6/7/2007' }
>>> del s.shares        # Deleting
>>> s.__dict__
{ 'name': 'GOOG', 'price': 490.1, 'date': '6/7/2007' }
>>>

Reading Attributes

Suppose you read an attribute on an instance.

x = obj.name

The attribute may exist in two places:

  • Local instance dictionary.
  • Class dictionary.

Both dictionaries may need to be checked. Python first looks in the instance’s own __dict__. If the name is not found there, it follows __class__ and looks in the __dict__ of the class.

>>> s = Stock(...)
>>> s.name
'GOOG'
>>> s.cost()
49010.0
>>>

This lookup scheme is how the members of a class get shared by all instances.
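The two-step search can be written out by hand. The lookup() function below is a hypothetical helper, not something Python provides, and it is deliberately simplified: it ignores inheritance and the extra machinery that turns a function into a bound method.

```python
class Stock:
    def __init__(self, name, shares, price):
        self.name = name
        self.shares = shares
        self.price = price

    def cost(self):
        return self.shares * self.price

def lookup(obj, name):
    """Simplified attribute search: instance dictionary first, then class dictionary."""
    if name in obj.__dict__:
        return obj.__dict__[name]
    cls = obj.__class__
    if name in cls.__dict__:
        return cls.__dict__[name]
    raise AttributeError(name)

s = Stock('GOOG', 100, 490.1)
print(lookup(s, 'name'))         # GOOG  (found in the instance dictionary)
cost_func = lookup(s, 'cost')    # Plain function found in the class dictionary
print(cost_func(s))              # 49010.0  (the instance is passed as self by hand)
```

Because the helper returns the raw function from the class dictionary, the instance has to be supplied explicitly. Normal attribute access with s.cost does that step for you.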

How inheritance works

Classes may inherit from other classes.

class A(B, C):
    ...

The base classes are stored in a tuple in each class.

>>> A.__bases__
(<class '__main__.B'>, <class '__main__.C'>)
>>>

This provides a link to parent classes.

Reading Attributes with Inheritance

Logically, the process of finding an attribute is as follows. First, check in local __dict__. If not found, look in __dict__ of the class. If not found in class, look in the base classes through __bases__. However, there are some subtle aspects of this discussed next.

Reading Attributes with Single Inheritance

In inheritance hierarchies, attributes are found by walking up the inheritance tree in order.

class A: pass
class B(A): pass
class C(A): pass
class D(B): pass
class E(D): pass

With single inheritance, there is a single path to the top. You stop with the first match.

Method Resolution Order or MRO

Python precomputes an inheritance chain and stores it in the __mro__ attribute of the class. You can view it.

>>> E.__mro__
(<class '__main__.E'>, <class '__main__.D'>,
 <class '__main__.B'>, <class '__main__.A'>,
 <class 'object'>)
>>>

This chain is called the Method Resolution Order. To find an attribute, Python walks the MRO in order. The first match wins.

MRO in Multiple Inheritance

With multiple inheritance, there is no single path to the top. Let’s take a look at an example.

class A: pass
class B: pass
class C(A, B): pass
class D(B): pass
class E(C, D): pass

What happens when you access an attribute?

e = E()
e.attr

An attribute search process is carried out, but what is the order? That’s a problem.

Python uses cooperative multiple inheritance which obeys some rules about class ordering.

  • Children are always checked before parents
  • Parents (if multiple) are always checked in the order listed.

The MRO is computed by sorting all of the classes in a hierarchy according to those rules.

>>> E.__mro__
(
  <class 'E'>,
  <class 'C'>,
  <class 'A'>,
  <class 'D'>,
  <class 'B'>,
  <class 'object'>)
>>>

The underlying algorithm is called the “C3 Linearization Algorithm.” The precise details aren’t important as long as you remember that a class hierarchy obeys the same ordering rules you might follow if your house was on fire and you had to evacuate–children first, followed by parents.
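The two ordering rules can be verified against the MRO that Python computes for this hierarchy. The sketch below is only a check of the result; it does not implement C3 itself, and position() is a hypothetical helper.

```python
class A: pass
class B: pass
class C(A, B): pass
class D(B): pass
class E(C, D): pass

mro_names = [cls.__name__ for cls in E.__mro__]
print(mro_names)    # ['E', 'C', 'A', 'D', 'B', 'object']

def position(cls):
    """Index of a class on E's MRO (hypothetical helper for this check)."""
    return E.__mro__.index(cls)

# Rule 1: every class appears before each of its parents
children_first = all(position(cls) < position(base)
                     for cls in E.__mro__
                     for base in cls.__bases__)

# Rule 2: the parents of each class appear in the order they were listed
listed_order = all(position(first) < position(second)
                   for cls in E.__mro__
                   for first, second in zip(cls.__bases__, cls.__bases__[1:]))

print(children_first, listed_order)   # True True
```

Note how B comes after D even though C lists B as a parent: B is also a parent of D, so it cannot be visited until D has been.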

An Odd Code Reuse (Involving Multiple Inheritance)

Consider two completely unrelated objects:

class Dog:
    def noise(self):
        return 'Bark'

    def chase(self):
        return 'Chasing!'

class LoudDog(Dog):
    def noise(self):
        # Code commonality with LoudBike (below)
        return super().noise().upper()

And

class Bike:
    def noise(self):
        return 'On Your Left'

    def pedal(self):
        return 'Pedaling!'

class LoudBike(Bike):
    def noise(self):
        # Code commonality with LoudDog (above)
        return super().noise().upper()

There is a code commonality in the implementation of LoudDog.noise() and LoudBike.noise(). In fact, the code is exactly the same. Naturally, code like that is bound to attract software engineers.

The “Mixin” Pattern

The Mixin pattern is a class with a fragment of code.

class Loud:
    def noise(self):
        return super().noise().upper()

This class is not usable in isolation. It mixes with other classes via inheritance.

class LoudDog(Loud, Dog):
    pass

class LoudBike(Loud, Bike):
    pass

Miraculously, loudness is now implemented just once and reused in two completely unrelated classes. This sort of trick is one of the primary uses of multiple inheritance in Python.
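Putting the pieces of this example together gives a short runnable script. It uses only the classes defined in this section.

```python
class Dog:
    def noise(self):
        return 'Bark'

    def chase(self):
        return 'Chasing!'

class Bike:
    def noise(self):
        return 'On Your Left'

    def pedal(self):
        return 'Pedaling!'

class Loud:
    # Mixin: relies on some other class further along the MRO to supply noise()
    def noise(self):
        return super().noise().upper()

class LoudDog(Loud, Dog):
    pass

class LoudBike(Loud, Bike):
    pass

print(LoudDog().noise())     # BARK
print(LoudBike().noise())    # ON YOUR LEFT
print(LoudDog().chase())     # Chasing!  (ordinary methods are inherited unchanged)

# On its own, the mixin has nothing to delegate to
try:
    Loud().noise()
except AttributeError:
    print('Loud is not usable in isolation')
```

The mixin must be listed first (Loud, Dog) so that its noise() is found before the one it wraps.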

Why super()

Always use super() when overriding methods.

class Loud:
    def noise(self):
        return super().noise().upper()

super() delegates to the next class on the MRO.

The tricky bit is that you don’t know what it is. You especially don’t know what it is if multiple inheritance is being used.
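To see why the target of super() is not known in advance, compare the MRO of two classes that use the same mixin. In this sketch, next_in_mro() is a hypothetical helper that mimics the way super() picks the following class; it is not part of Python.

```python
class Dog:
    def noise(self):
        return 'Bark'

class Bike:
    def noise(self):
        return 'On Your Left'

class Loud:
    def noise(self):
        return super().noise().upper()

class LoudDog(Loud, Dog):
    pass

class LoudBike(Loud, Bike):
    pass

def next_in_mro(instance_cls, current_cls):
    """Return the class that follows current_cls on instance_cls's MRO."""
    mro = instance_cls.__mro__
    return mro[mro.index(current_cls) + 1]

print([c.__name__ for c in LoudDog.__mro__])    # ['LoudDog', 'Loud', 'Dog', 'object']
print([c.__name__ for c in LoudBike.__mro__])   # ['LoudBike', 'Loud', 'Bike', 'object']

# The same super() call inside Loud.noise() lands on a different class each time
print(next_in_mro(LoudDog, Loud).__name__)      # Dog
print(next_in_mro(LoudBike, Loud).__name__)     # Bike
```

The code in Loud never changes, yet the class it delegates to depends entirely on the class of the instance the method was called on.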

Some Cautions

Multiple inheritance is a powerful tool. Remember that with power comes responsibility. Frameworks / libraries sometimes use it for advanced features involving composition of components. Now, forget that you saw that.

Exercises

In Section 4, you defined a class Stock that represented a holding of stock. In this exercise, we will use that class. Restart the interpreter and make a few instances:

>>> ================================ RESTART ================================
>>> from stock import Stock
>>> goog = Stock('GOOG',100,490.10)
>>> ibm  = Stock('IBM',50, 91.23)
>>>

Exercise 5.1: Representation of Instances

At the interactive shell, inspect the underlying dictionaries of the two instances you created:

>>> goog.__dict__
... look at the output ...
>>> ibm.__dict__
... look at the output ...
>>>

Exercise 5.2: Modification of Instance Data

Try setting a new attribute on one of the above instances:

>>> goog.date = '6/11/2007'
>>> goog.__dict__
... look at output ...
>>> ibm.__dict__
... look at output ...
>>>

In the above output, you’ll notice that the goog instance has an attribute date whereas the ibm instance does not. It is important to note that Python really doesn’t place any restrictions on attributes. For example, the attributes of an instance are not limited to those set up in the __init__() method.

Instead of setting an attribute, try placing a new value directly into the __dict__ object:

>>> goog.__dict__['time'] = '9:45am'
>>> goog.time
'9:45am'
>>>

Here, you really notice the fact that an instance is just a layer on top of a dictionary. Note: it should be emphasized that direct manipulation of the dictionary is uncommon–you should always write your code to use the (.) syntax.

Exercise 5.3: The role of classes

The definitions that make up a class definition are shared by all instances of that class. Notice that all instances have a link back to their associated class:

>>> goog.__class__
... look at output ...
>>> ibm.__class__
... look at output ...
>>>

Try calling a method on the instances:

>>> goog.cost()
49010.0
>>> ibm.cost()
4561.5
>>>

Notice that the name ‘cost’ is not defined in either goog.__dict__ or ibm.__dict__. Instead, it is being supplied by the class dictionary. Try this:

>>> Stock.__dict__['cost']
... look at output ...
>>>

Try calling the cost() method directly through the dictionary:

>>> Stock.__dict__['cost'](goog)
49010.0
>>> Stock.__dict__['cost'](ibm)
4561.5
>>>

Notice how you are calling the function defined in the class definition and how the self argument gets the instance.

Try adding a new attribute to the Stock class:

>>> Stock.foo = 42
>>>

Notice how this new attribute now shows up on all of the instances:

>>> goog.foo
42
>>> ibm.foo
42
>>>

However, notice that it is not part of the instance dictionary:

>>> goog.__dict__
... look at output and notice there is no 'foo' attribute ...
>>>

The reason you can access the foo attribute on instances is that Python always checks the class dictionary if it can’t find something on the instance itself.

Note: This part of the exercise illustrates something known as a class variable. Suppose, for instance, you have a class like this:

class Foo(object):
     a = 13                  # Class variable
     def __init__(self,b):
         self.b = b          # Instance variable

In this class, the variable a, assigned in the body of the class itself, is a “class variable.” It is shared by all of the instances that get created. For example:

>>> f = Foo(10)
>>> g = Foo(20)
>>> f.a          # Inspect the class variable (same for both instances)
13
>>> g.a
13
>>> f.b          # Inspect the instance variable (differs)
10
>>> g.b
20
>>> Foo.a = 42   # Change the value of the class variable
>>> f.a
42
>>> g.a
42
>>>
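One consequence of the lookup order is worth trying as a follow-up: assigning to the name through an instance does not change the class variable. The sketch below repeats the Foo class so that it runs on its own; the values 99 and the deletion step are just illustrative choices.

```python
class Foo:
    a = 13                  # Class variable

    def __init__(self, b):
        self.b = b          # Instance variable

f = Foo(10)
g = Foo(20)

f.a = 99                    # Stores 'a' in f.__dict__; Foo.a is untouched
print(f.__dict__)           # {'b': 10, 'a': 99}
print(f.a, g.a, Foo.a)      # 99 13 13

del f.a                     # Remove the instance entry again
print(f.a)                  # 13  (the class variable shows through once more)
```

The instance dictionary is checked first, so an entry there hides the class variable of the same name for that one instance only.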

Exercise 5.4: Bound methods

A subtle feature of Python is that invoking a method actually involves two steps and something known as a bound method. For example:

>>> s = goog.sell
>>> s
<bound method Stock.sell of Stock('GOOG', 100, 490.1)>
>>> s(25)
>>> goog.shares
75
>>>

Bound methods actually contain all of the pieces needed to call a method. For instance, they keep a record of the function implementing the method:

>>> s.__func__
<function sell at 0x10049af50>
>>>

This is the same value as found in the Stock dictionary.

>>> Stock.__dict__['sell']
<function sell at 0x10049af50>
>>>

Bound methods also record the instance, which is the self argument.

>>> s.__self__
Stock('GOOG',75,490.1)
>>>

When you invoke the function using () all of the pieces come together. For example, calling s(25) actually does this:

>>> s.__func__(s.__self__, 25)    # Same as s(25)
>>> goog.shares
50
>>>

Exercise 5.5: Inheritance

Make a new class that inherits from Stock.

>>> class NewStock(Stock):
        def yow(self):
            print('Yow!')

>>> n = NewStock('ACME', 50, 123.45)
>>> n.cost()
6172.5
>>> n.yow()
Yow!
>>>

Inheritance is implemented by extending the search process for attributes. The __bases__ attribute has a tuple of the immediate parents:

>>> NewStock.__bases__
(<class 'stock.Stock'>,)
>>>

The __mro__ attribute has a tuple of all parents, in the order that they will be searched for attributes.

>>> NewStock.__mro__
(<class '__main__.NewStock'>, <class 'stock.Stock'>, <class 'object'>)
>>>

Here’s how the cost() method of instance n above would be found:

>>> for cls in n.__class__.__mro__:
        if 'cost' in cls.__dict__:
            break

>>> cls
<class 'stock.Stock'>
>>> cls.__dict__['cost']
<function cost at 0x101aed598>
>>>

Classes and Encapsulation

When writing classes, it is common to try and encapsulate internal details. This section introduces a few Python programming idioms for this including private variables and properties.

Public vs Private

One of the primary roles of a class is to encapsulate data and internal implementation details of an object. However, a class also defines a public interface that the outside world is supposed to use to manipulate the object. This distinction between implementation details and the public interface is important.

A Problem

In Python, almost everything about classes and objects is open.

  • You can easily inspect object internals.
  • You can change things at will.
  • There is no strong notion of access-control (e.g., private class members).

That is an issue when you are trying to isolate details of the internal implementation.

Python Encapsulation

Python relies on programming conventions to indicate the intended use of something. These conventions are based on naming. There is a general attitude that it is up to the programmer to observe the rules as opposed to having the language enforce them.

Private Attributes

Any attribute name with leading _ is considered to be private.

class Person(object):
    def __init__(self, name):
        self._name = name

As mentioned earlier, this is only a programming style. You can still access and change it.

>>> p = Person('Guido')
>>> p._name
'Guido'
>>> p._name = 'Dave'
>>>

As a general rule, any name with a leading _ is considered internal implementation whether it’s a variable, a function, or a module name. If you find yourself using such names directly, you’re probably doing something wrong. Look for higher level functionality.

Simple Attributes

Consider the following class.

class Stock:
    def __init__(self, name, shares, price):
        self.name = name
        self.shares = shares
        self.price = price

A surprising feature is that you can set the attributes to any value at all:

>>> s = Stock('IBM', 50, 91.1)
>>> s.shares = 100
>>> s.shares = "hundred"
>>> s.shares = [1, 0, 0]
>>>

You might look at that and think you want some extra checks.

s.shares = '50'     # Raise a TypeError, this is a string

How would you do it?

Managed Attributes

One approach: introduce accessor methods.

class Stock:
    def __init__(self, name, shares, price):
        self.name = name
        self.set_shares(shares)
        self.price = price

    # Function that layers the "get" operation
    def get_shares(self):
        return self._shares

    # Function that layers the "set" operation
    def set_shares(self, value):
        if not isinstance(value, int):
            raise TypeError('Expected an int')
        self._shares = value

Too bad that this breaks all of our existing code: s.shares = 50 becomes s.set_shares(50).

Properties

There is an alternative approach to the previous pattern.

class Stock:
    def __init__(self, name, shares, price):
        self.name = name
        self.shares = shares
        self.price = price

    @property
    def shares(self):
        return self._shares

    @shares.setter
    def shares(self, value):
        if not isinstance(value, int):
            raise TypeError('Expected int')
        self._shares = value

Normal attribute access now triggers the getter and setter methods under @property and @shares.setter.

>>> s = Stock('IBM', 50, 91.1)
>>> s.shares         # Triggers @property
50
>>> s.shares = 75    # Triggers @shares.setter
>>>
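A complete, runnable version of this example shows both where the value really ends up and what happens when the check fails. It uses the same class as above; only the test values are new.

```python
class Stock:
    def __init__(self, name, shares, price):
        self.name = name
        self.shares = shares      # Goes through the setter below
        self.price = price

    @property
    def shares(self):
        return self._shares

    @shares.setter
    def shares(self, value):
        if not isinstance(value, int):
            raise TypeError('Expected int')
        self._shares = value

s = Stock('IBM', 50, 91.1)
s.shares = 75
print(s.shares)       # 75
print(s.__dict__)     # {'name': 'IBM', '_shares': 75, 'price': 91.1}

try:
    s.shares = '50'   # Rejected by the setter: this is a string
except TypeError as e:
    message = str(e)
print(message)        # Expected int
print(s.shares)       # 75  (the failed assignment left the old value in place)
```

Notice that the instance dictionary holds _shares rather than shares. The name shares exists only on the class, as the property.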

With this pattern, no changes are needed in code that uses the class; reading s.shares and assigning s.shares = 75 keep working as before. The setter is also called when the assignment happens inside the class, including inside the __init__() method.

class Stock:
    def __init__(self, name, shares, price):
        ...
        # This assignment calls the setter below
        self.shares = shares
        ...

    ...
    @shares.setter
    def shares(self, value):
        if not isinstance(value, int):
            raise TypeError('Expected int')
        self._shares = value

There is often a confusion between a property and the use of private names. Although a property internally uses a private name like _shares, the rest of the class (not the property) can continue to use a name like shares.

Properties are also useful for computed data attributes.

class Stock:
    def __init__(self, name, shares, price):
        self.name = name
        self.shares = shares
        self.price = price

    @property
    def cost(self):
        return self.shares * self.price
    ...

This allows you to drop the extra parentheses, hiding the fact that it’s actually a method:

>>> s = Stock('GOOG', 100, 490.1)
>>> s.shares # Instance variable
100
>>> s.cost   # Computed Value
49010.0
>>>

Uniform access

The last example shows how to put a more uniform interface on an object. If you don’t do this, an object might be confusing to use:

>>> s = Stock('GOOG', 100, 490.1)
>>> s.cost()     # Method
49010.0
>>> s.shares     # Data attribute
100
>>>

Why is the () required for the cost, but not for the shares? A property can fix this.

Decorator Syntax

The @ syntax is known as “decoration.” It specifies a modifier that’s applied to the function definition that immediately follows.

...
@property
def cost(self):
    return self.shares * self.price

More details are given in Section 7.

__slots__ Attribute

You can restrict the set of attribute names.

class Stock:
    __slots__ = ('name','_shares','price')
    def __init__(self, name, shares, price):
        self.name = name
        ...

It will raise an error for other attributes.

>>> s.price = 385.15
>>> s.prices = 410.2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Stock' object has no attribute 'prices'

Although this catches errors such as misspelled attribute names and restricts how objects can be used, __slots__ is mainly used for performance: it lets Python store instances more compactly and use memory more efficiently.
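Here is a minimal runnable sketch of the restriction. The body of __init__() is filled in as an assumption, since the listing above elides it, and the exact wording of the error message varies between Python versions, so only the exception type is shown.

```python
class Stock:
    __slots__ = ('name', '_shares', 'price')

    def __init__(self, name, shares, price):
        # Assumed body: store each value in one of the declared slots
        self.name = name
        self._shares = shares
        self.price = price

s = Stock('GOOG', 100, 490.1)
s.price = 385.15              # Fine: 'price' is listed in __slots__
print(s.price)                # 385.15

try:
    s.prices = 410.2          # Misspelled name: not listed in __slots__
except AttributeError as e:
    error = e
print(type(error).__name__)   # AttributeError
```

Without __slots__, the misspelled assignment would silently create a new attribute and the mistake would go unnoticed.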

Final Comments on Encapsulation

Don’t go overboard with private attributes, properties, slots, etc. They serve a specific purpose and you may see them when reading other Python code. However, they are not necessary for most day-to-day coding.

Exercises

Exercise 5.6: Simple Properties

Properties are a useful way to add “computed attributes” to an object. In stock.py, you created an object Stock. Notice that on your object there is a slight inconsistency in how different kinds of data are extracted:

>>> from stock import Stock
>>> s = Stock('GOOG', 100, 490.1)
>>> s.shares
100
>>> s.price
490.1
>>> s.cost()
49010.0
>>>

Specifically, notice how you have to add the extra () to cost because it is a method.

You can get rid of the extra () on cost() if you turn it into a property. Take your Stock class and modify it so that the cost calculation works like this:

>>> ================================ RESTART ================================
>>> from stock import Stock
>>> s = Stock('GOOG', 100, 490.1)
>>> s.cost
49010.0
>>>

Try calling s.cost() as a function and observe that it doesn’t work now that cost has been defined as a property.

>>> s.cost()
... fails ...
>>>

Making this change will likely break your earlier pcost.py program. You might need to go back and get rid of the () on the cost() method.

Exercise 5.7: Properties and Setters

Modify the shares attribute so that the value is stored in a private attribute and a pair of property functions is used to ensure that it is always set to an integer value. Here is an example of the expected behavior:

>>> ================================ RESTART ================================
>>> from stock import Stock
>>> s = Stock('GOOG',100,490.10)
>>> s.shares = 50
>>> s.shares = 'a lot'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: expected an integer
>>>

Exercise 5.8: Adding slots

Modify the Stock class so that it has a __slots__ attribute. Then, verify that new attributes can’t be added:

>>> ================================ RESTART ================================
>>> from stock import Stock
>>> s = Stock('GOOG', 100, 490.10)
>>> s.name
'GOOG'
>>> s.blah = 42
... see what happens ...
>>>

When you use __slots__, Python uses a more efficient internal representation of objects. What happens if you try to inspect the underlying dictionary of s above?

>>> s.__dict__
... see what happens ...
>>>

It should be noted that __slots__ is most commonly used as an optimization on classes that serve as data structures. Using slots will make such programs use far less memory and run a bit faster. However, you should probably avoid __slots__ on most other classes.