Recursive Types in Python

Explore Python's type hinting evolution: from forward references for complex recursive types to the innovative Self type for simpler, more precise code. Stay updated on Python's advancements and best practices with our insightful articles. Subscribe for more Python programming insights.

Recursive Types in Python
Navigating the intricate mazes of Python's recursive types—a programmer's odyssey.

Overview

In computer science, some of the more valuable data structures are recursive, meaning they are either defined in terms of themselves like lists or composed of different parts that refer to each other, like Abstract Syntax Trees.

Being dynamically typed, Python does not perform static type checking at runtime. However, the interpreter has a name resolution step requiring that any identifiers we use, including in Type Hints, are declared beforehand.

Python's type checkers, such as mypy, can implement a more complex process, including multiple passes. Unfortunately, programs with recursive types that type-check cleanly might result in a NameError at runtime.

This limitation makes it difficult to directly declare recursive or mutually recursive types in Python. This article will discuss two solutions, one generally applicable in all cases and one for particular types referencing themselves.


Forward References

In Python, identifiers have to be defined before use.; Python's name resolution will throw a NameError otherwise. This presents a challenge for using Type Hints in recursive data structures.

We could avoid adding Type Hints in such situations, limiting the usability of the type system. Fortunately, PEP 484 introduced Forward References to enable us to declare direct or mutually recursive types.

Forward references use strings instead of identifiers to specify type hints. When Python's interpreter encounters these strings, it skips the name resolution because they are strings, not names. This approach allows the use of types defined later in the code, circumventing the limitations imposed by Python's one-pass interpreter.

However, type checkers are not limited to a single pass and are aware of forward references. They treat these strings as regular type hints, using them for type-checking our programs.

Consider an application designed to evaluate mathematical expressions. We define two primary classes: Expression and Operator. The Expression class represents a single expression, which could be as simple as a numeric value or as complex as a nested operation. The Operator class encapsulates an operation (like addition or multiplication) and operates on multiple Expression objects.

However, there's a catch: each Operator contains Expression objects, and each Expression can encapsulate an Operator. This circular dependency creates a mutually recursive relationship, posing a challenge for Python's type system.

To resolve this, we use forward references. Here's how the classes might be defined:

Python ≥ 3.8

from typing import Union, List

class Value:
    def __init__(self, data: int):
        self.data = data

    def evaluate(self) -> int:
        return self.data

class Operation:
    def __init__(self, operands: List['Expression']):
        self.operands = operands

    def evaluate(self) -> int:
        # Placeholder for operation logic (e.g., addition, multiplication)
        # For simplicity, let's assume it's a sum of operands
        return sum(operand.evaluate() for operand in self.operands)

class Expression:
    def __init__(self, content: Union[Value, Operation]):
        self.content = content

    def evaluate(self) -> int:
        return self.content.evaluate()
        

Understanding the Self Type in Python

Traditionally, if a Python method within a class needed to return an instance of its class or accept parameters of the same class type, it required us to declare Type Variables (TypeVar).

This could complicate the code, making it harder to understand for those new to the language.

PEP 673 addresses this by providing a concise, straightforward way to express that a method or attribute is inherently tied to its class. It Introduced the Self Type annotation into Python, simplifying the tasks of writing Type Hints for recursive Data Structures.

We can use Self anywhere inside a class or a protocol where we need to refer to the current object type. This reduces the need to introduce type variables and type bounds on those variables.

Self always refers to the immediately enclosing class; it will reference the innermost class scope if used within a nested class.

Let's understand Self Types using a recursive data structure known as Tries. A Trie (or Prefix Tree) is a tree-like data structure used to store a dynamic set of strings efficiently. In a Trie, each node represents a single character of a string, and a path from the root to a node represents a prefix of strings stored in the Trie.

In our TrieNode class example, each instance of TrieNode represents a single character and potentially leads to further characters. The children dictionary in each TrieNode instance maps characters to their subsequent TrieNode instances:

Python ≥ 3.8

from typing import Dict, Optional, Self

class TrieNode:
    def __init__(self):
        self.children: Dict[str, TrieNode] = {}
        self.is_end_of_word: bool = False

    def insert(self, word: str) -> Self:
        node = self
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
        node.is_end_of_word = True
        return self

    def count_words_with_prefix(self, prefix: str) -> int:
        node = self._find_node(prefix)
        if not node:
            return 0
        return self._count_words(node)

    def _count_words(self, node: 'TrieNode') -> int:
        count = 1 if node.is_end_of_word else 0
        for child in node.children.values():
            count += self._count_words(child)
        return count

    def _find_node(self, string: str) -> Optional[Self]:
        node = self
        for char in string:
            if char not in node.children:
                return None
            node = node.children[char]
        return node
        

Breakdown of the TrieNode Operations

Inserting a Word:

  • When inserting a word, the insert method starts at the root node (an instance of TrieNode).
  • For each character in the word, the method checks if the character already exists as a key in the children dictionary of the current node.
  • If the character is absent, it creates a new TrieNode and adds it to children.
  • It then moves to this new node and continues with the next character.
  • When the end of the word is reached, it marks the current node as the end of a word (is_end_of_word = True).

Searching for a Word:

  • The search method also starts at the root.
  • It traverses the Trie according to the characters in the word being searched.
  • If, at any point, the required character is not found in the children of the current node, it returns False.
  • If all characters are located, and the final node is marked as is_end_of_word, it returns True.

Checking for a Prefix:

  • The starts_with method is similar to search but doesn't require the final node to be marked as is_end_of_word.
  • It only checks if the path for the prefix exists in the Trie.

Counting Occurrences of a Prefix:

  • The method begins by finding the TrieNode corresponding to the last character of the given prefix. This is accomplished using the _find_node helper method.
  • _find_node traverses the Trie following the characters of the prefix. If any character in the prefix is not found, it returns None, indicating that no words start with that prefix.
  • Once the node corresponding to the last character of the prefix is found, the _count_words method is called on this node.
  • _count_words is a recursive method that traverses all the child nodes starting from the given node.
  • For each node it visits, it checks if the node represents the end of a word (is_end_of_word is True). If so, it increments the count.
  • The method then recursively calls itself for each of the current node's children, adding up the counts from each subtree.
  • The total count of words that start with the given prefix is returned.

In this implementation, the insert method adds a word to the Trie, while count_words_with_prefix counts the number of words starting with a given prefix. The Self type simplifies the representation of the TrieNode's methods, making the code more readable and maintainable.

Using Self in recursive data structures like Tries enhances code readability and a self-explanatory nature. It immediately clarifies that methods return an instance of their class or operate on parameters of their class type. This not only improves the clarity of the code but also maintains consistency in type annotations, making the codebase more maintainable and easier to understand.

Using the Trie

The true power of this Trie implementation is showcased in scenarios like counting the number of words that start with a particular prefix, a common requirement in applications like autocomplete systems or search engines:

Python ≥ 3.8

words = ["apple", "app", "apricot", "banana", "berry", "blueberry", "blackberry"]
trie = TrieNode()
for word in words:
    trie.insert(word)

# Counting words that start with 'app'
prefix = "app"
count = trie.count_words_with_prefix(prefix)
print(f"There are {count} words that start with '{prefix}'")
        

In this example, we efficiently insert a set of words into our Trie and then demonstrate how to count the number of words starting with a specific prefix. The Trie makes these operations highly efficient, as it reduces the search space with each character in the prefix, and the recursive count_words method enables quick counting of qualifying words.

A snake with a Python logo emerges from a digital landscape, representing the depth of recursive programming concepts.
n the matrix of recursion, Python strikes with precision.

Practical Applications of the Self-Type in Python

Self Type is advantageous in scenarios where a method in a class needs to refer to the class itself. Below are several practical applications where the Self Type can be particularly beneficial:

  1. Fluent Interfaces: In fluent interfaces, methods often return an instance of the class to allow for method chaining. Using Self as the return type clarifies that these methods are part of a fluent interface design pattern.
  2. Recursive Data Structures: Data structures like linked lists, trees, and graphs often have methods that return instances of the same class. The Self type simplifies the type hinting in these cases, making the recursive nature of these structures more apparent.
  3. Factory Methods: Factory methods within a class that return a new instance can use the Self type to indicate their purpose. This is particularly useful in classes designed to be extended or inherited.
  4. Copy or Clone Methods: Methods intended to create a copy or clone of the current instance can use Self to show that they return a new instance of the same class.
  5. Builders and Constructors: In builder patterns or alternative constructors, using Self can denote methods that build or construct instances of the class.
  6. Type Checking in Dynamic Environments: In dynamic environments where the exact class type might vary due to inheritance or extension, Self can provide a more accurate and flexible type hint than directly using the class name.
  7. Polymorphic Methods: In polymorphic class hierarchies, where methods may return instances of the base class or any of its subclasses, Self can be used to maintain generality and flexibility in the type hints.
  8. Method Overriding in Inheritance: In cases of method overriding in class hierarchies, Self can be used in the return type to ensure that the overridden methods in subclasses return instances of the subclass.

Conclusion

Python's journey in enhancing its type hinting capabilities has been remarkable. With the introduction of forward references, Python made it possible to declare complex recursive type hints, a significant leap in handling intricate data structures. Building upon this, introducing Self types represents a further development in Python's type system. Self-types streamline the process of writing type hints for simple recursive types, making the code more readable and intuitive for developers.

If you're interested in modern Python and passionate about staying updated with the latest in programming, don't forget to subscribe.


Addendum: A Special Note for Our Readers

I decided to delay the introduction of subscriptions, you can read the full story here.

In the meantime, I decided to accept donations.

If you can afford it, please consider donating:

Every donation helps me offset the running costs of the site and an unexpected tax bill. Any amount is greatly appreciated.

Also, if you are looking to buy some Swag, please visit I invite you to visit the TuringTacoTales Store on Redbubble.

Take a look, maybe you can find something you like: