Est. read time: 8 minutes | Last updated: November 07, 2024 by John Gentile


Language

Python is one of the most popular interpreted programming languages. An interpreted language is not directly compiled to a target machine code (e.g. x86 assembly); instead, a different program, the interpreter, reads and executes the code. This leads to the use of Python, and other interpreted languages, as scripting languages, since they can be used to quickly cobble together commands in scripted fashion. However, unlike pure scripting languages (e.g. shell scripting), Python has had some serious improvements to speed and scalability, which make it perfectly viable for “production code”; one of the canonical examples is how YouTube was able to outpace Google Video in implementing features, largely thanks to its Python codebase. Sometimes the speedup in development is more valuable than the speedup in execution time.

Python is mainly split between the 2.X line (older; it reached end of life in January 2020 and receives no further changes) and the 3.X line (the current generation, in which strings are Unicode by default), and it’s important to note where compatibility breaks between the two.

Pros of the language:

  • Quality: Python is designed to be readable and maintainable, typically needing less code than languages like C++ or Java to implement similar functionality.
  • Portability: For the most part, Python can be run on all supported platforms (Windows, Mac & Linux) with little to no code modification.
  • Libraries & Components: Python is so widespread that there is a huge number of libraries supporting a vast array of functionality that is easy to use. Easy component integration also makes Python flexible in calling other code libraries or frameworks.

Cons:

  • The main one is execution speed: Python’s intermediate “byte code” generally runs slower than a fully compiled language like C.

The Basics

Python uses whitespace (tabs or spaces) to structure code, unlike other programming languages which use braces (like C/C++). A colon (:) marks the start of an indented, logical code block, after which indentation returns to its previous level. For instance, this for loop with a conditional check:

# this is a comment
my_list = [3, 7, 2, 9]  # sample data so the loop can run as-is
max_value = 0
for i in my_list:
    if i < max_value:
        print("Not max")
    else:
        max_value = i

Objects & Datatypes

Everything is an object in Python.

Identifiers and Assignments

An assignment statement looks like:

temperature = 98.6

Here the identifier temperature is associated with a floating-point object with the value 98.6. The semantics of Python identifiers are similar to a reference variable in Java or a pointer in C/C++; an identifier is associated with the memory address of the object it refers to. Similarly to null references/pointers, a Python identifier can be assigned the special object None.

Python is a dynamically typed language; there is no type declaration associating an identifier/variable with a particular data type. An identifier can be associated with any object and can later be reassigned to another object of the same, or a different, type. Objects have definite types, so in the above assignment statement, temperature is associated with an instance of the float class with the value 98.6.
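As a small sketch of dynamic typing, the same identifier can be rebound to objects of different types; the seen_types list is only a helper added here to record what happened:

```python
seen_types = []

temperature = 98.6                       # bound to a float object
seen_types.append(type(temperature).__name__)

temperature = "warm"                     # rebound to a str object
seen_types.append(type(temperature).__name__)

temperature = None                       # rebound to the special None object
seen_types.append(type(temperature).__name__)

print(seen_types)  # ['float', 'str', 'NoneType']
```

The type belongs to the object, not to the name, which is why no declaration is needed.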

An alias can be established when a second identifier is assigned to an existing object/identifier. In this case, either name can be used to access the underlying object, and if it supports behaviors that affect its state, changes enacted through one alias will be apparent when using the other alias. However, if one of the names is reassigned to a new value using an assignment statement, this does not affect the aliased object, but rather breaks the alias. For example:

>>> temperature = 98.6
>>> new_temp = temperature
>>> temperature += 5
>>> temperature
103.6
>>> new_temp
98.6
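The float example shows an alias being broken by reassignment. A complementary sketch with a mutable list (the names colors and palette are made up for illustration) shows a change made through one alias being visible through the other:

```python
colors = ['red', 'green']
palette = colors             # alias: both names refer to the same list object

palette.append('blue')       # mutate the shared object through one alias
print(colors)                # ['red', 'green', 'blue']
print(palette is colors)     # True

palette = ['black']          # reassignment breaks the alias
print(colors)                # ['red', 'green', 'blue']
print(palette is colors)     # False
```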

Built-In Classes

Class | Description | Immutable?
--- | --- | ---
bool | Boolean value, True and False. Numbers evaluate to False if zero (True if non-zero), and container types (strings, lists, etc.) evaluate to False if empty, True otherwise. | Yes
int | Integer numeric type with arbitrary internal size (i.e. not limited to 32 bits; Python chooses the internal representation based on magnitude). Literals for binary, octal, decimal and hexadecimal representations are 0b1011, 0o56, -23, 0x5f respectively. Converting a floating-point value to int truncates toward zero, similar to other languages (e.g. int(3.14) gives 3). Converting a string that does not represent an integer (e.g. int('hello')) raises a ValueError. | Yes
float | Floating-point type, similar to the double precision type in Java/C++. Literals can be expressed as 2.0, 3., or 5.123e22 (for the value $5.123 \times 10^{22}$). Python can convert floating-point strings to float with constructors like float('3.14'). | Yes
list | Stores a sequence of objects, similar to an “array” in other languages, using [] delimiters. A list stores a sequence of references to its elements. Lists are mutable and can dynamically expand and contract their capacities. A list containing three strings can be shown as ['red', 'green', 'blue']. | No
tuple | An immutable sequence class using () delimiters. Note a one-element tuple should be expressed as (12,) to not be confused with general parenthesis usage. | Yes
str | Python string class to efficiently represent an immutable sequence of Unicode characters. String literals can use single or double quotes (e.g. 'hello' or "hello"). | Yes
set | A collection of elements without duplicates and without order, similar to the mathematical notion of a set, delimited with curly braces {}. Compared to a list, the major advantage is the internal hash table representation, which can efficiently check if a specific element is contained in the set. A set does not maintain its elements in any particular order, and only immutable (hashable) types can be added to a set. | No
frozenset | Immutable form of the set type, thus it’s legal to have a set of frozensets. | Yes
dict | A dictionary class or mapping, from a set of distinct keys to associated values. Python implements a dict similar to a set but with storage of associated values. Key:Value pairs are expressed with comma-separated pairs like {'ga' : 'Irish', 'de' : 'German'} to map 'ga' to 'Irish' and 'de' to 'German'. | No
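A few of the behaviors described in the table can be checked directly in the interpreter. This is just a quick sketch using the literals from the table:

```python
# Truthiness: zero and empty containers are False, everything else is True
print(bool(0), bool(42), bool(""), bool([1]))    # False True False True

# Integer literals in binary, octal, decimal and hexadecimal
print(0b1011, 0o56, -23, 0x5f)                   # 11 46 -23 95

# float -> int truncates; a non-numeric string raises ValueError
print(int(3.14))                                 # 3
try:
    int("hello")
except ValueError as err:
    print("ValueError:", err)

# A one-element tuple needs the trailing comma
single = (12,)
not_a_tuple = (12)
print(type(single).__name__, type(not_a_tuple).__name__)   # tuple int

# Sets drop duplicates; frozensets are hashable, so a set can hold them
primes = {2, 3, 5, 5}
nested = {frozenset({1, 2}), frozenset({3})}
print(len(primes), len(nested))                  # 3 2

# dict maps keys to values
languages = {'ga': 'Irish', 'de': 'German'}
print(languages['ga'])                           # Irish
```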

Exception Handling

Exceptions are unexpected events that happen during program execution. Exceptions (or errors) are objects raised (or thrown) by code that encounters an unexpected circumstance. A raised error may be caught by a surrounding context that “handles” the exception. If uncaught, an exception causes the interpreter to stop program execution.

Class | Description
--- | ---
Exception | A base class for most error types
AttributeError | Raised by syntax obj.foo, if obj has no member named foo
EOFError | Raised if “end of file” reached for console or file input
IOError | Raised upon failure of I/O operation (e.g., opening file); an alias of OSError in Python 3
IndexError | Raised if index to sequence is out of bounds
KeyError | Raised if nonexistent key requested for set or dictionary
KeyboardInterrupt | Raised if user types ctrl-C while program is executing
NameError | Raised if nonexistent identifier used
StopIteration | Raised by next(iterator) if there are no more elements
TypeError | Raised when wrong type of parameter is sent to a function
ValueError | Raised when parameter has invalid value (e.g., sqrt(-5))
ZeroDivisionError | Raised when any division operator used with 0 as divisor

An exception is thrown with a raise statement. For instance, a sqrt (square root) function like the one in Python’s math library might perform the following error-checking before executing the square root math:

def sqrt(x):
    # isinstance() checks the argument against a tuple of accepted types
    if not isinstance(x, (int, float)):
        raise TypeError('x must be numeric')
    elif x < 0:
        raise ValueError('x cannot be negative')
    # start doing sqrt() math now...

To catch an exception, use the try-except control structure. It suits failure paths that are possible but unlikely, which would otherwise need a more direct check (such as testing the divisor before dividing). For instance, to handle a potential divide-by-zero error:

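x, y = 1, 0  # sample values that trigger the error
try:
    ratio = x / y
except ZeroDivisionError:
    # handle the issue, e.g. fall back to a placeholder value
    ratio = None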
Conventions

Common Import Naming

The Python community has adopted some common naming conventions for popular modules:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import statsmodels.api as sm

Performance

Since Python is an interpreted language, compiled languages like C/C++, Rust, Java, etc. will generally execute faster. In applications where the execution time of only a small portion of the code is critical, Python can be used as “glue code” to implement the majority of the functionality, while calling the performance-critical code using bindings or a Foreign Function Interface (FFI). Alternatively, improvements in just-in-time (JIT) compilers and libraries offer a way to get nearly comparable performance while staying within Python. A popular JIT compiler is Numba, which can generate vectorized, multithreaded code from Python.
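A rough way to see the gap is to time a pure-Python loop against an equivalent routine that runs as compiled code. In this sketch the built-in sum (implemented in C in CPython) stands in for the performance-critical compiled code; the python_sum helper and the data size are arbitrary choices, and the timings will vary by machine:

```python
import timeit


def python_sum(values):
    """Add up values one at a time in interpreted Python code."""
    total = 0
    for v in values:
        total += v
    return total


data = list(range(100_000))

# Both approaches compute the same result
assert python_sum(data) == sum(data)

# Time 20 repetitions of each approach
loop_s = timeit.timeit(lambda: python_sum(data), number=20)
builtin_s = timeit.timeit(lambda: sum(data), number=20)

print(f"pure-Python loop: {loop_s:.4f} s")
print(f"built-in sum:     {builtin_s:.4f} s")
```

The same pattern applies to bindings and FFI: keep the orchestration in Python and push the hot loop into compiled code.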

Multithreading & the GIL

In general, it can be challenging to build highly concurrent applications in Python due to the Global Interpreter Lock (GIL), which prevents the interpreter from executing Python bytecode in more than one thread at a time.
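A minimal sketch of the effect, assuming a standard CPython build with the GIL enabled: running a CPU-bound function in two threads usually takes about as long as running it twice serially. The count_down workload and the loop size are arbitrary, and the timings depend on the machine:

```python
import time
from concurrent.futures import ThreadPoolExecutor

N = 1_000_000


def count_down(n):
    """CPU-bound busy loop written in pure Python."""
    while n > 0:
        n -= 1
    return n


def timed(fn):
    """Return the wall-clock seconds taken by calling fn()."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def run_serial():
    count_down(N)
    count_down(N)


def run_threaded():
    # Two worker threads, but only one can execute bytecode at a time
    with ThreadPoolExecutor(max_workers=2) as pool:
        list(pool.map(count_down, [N, N]))


serial_s = timed(run_serial)
threaded_s = timed(run_threaded)
print(f"serial:   {serial_s:.3f} s")
print(f"threaded: {threaded_s:.3f} s")
```

Threads still help for I/O-bound work, since the GIL is released while waiting on I/O.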

Testing

pytest

pytest is an easy-to-use testing framework; you need only write test functions whose names match test_* and use the standard Python assert statement. For example, a simple Python file test_sample.py can contain the following:

# content of test_sample.py
def inc(x):
    return x + 1

def test_answer():
    assert inc(3) == 5

And the expected unit test failure can be found with pytest as:

$ pytest
=========================== test session starts ============================
platform linux -- Python 3.x.y, pytest-7.x.y, pluggy-1.x.y
rootdir: /home/sweet/project
collected 1 item

test_sample.py F                                                     [100%]

================================= FAILURES =================================
_______________________________ test_answer ________________________________

    def test_answer():
>       assert inc(3) == 5
E       assert 4 == 5
E        +  where 4 = inc(3)

test_sample.py:6: AssertionError
========================= short test summary info ==========================
FAILED test_sample.py::test_answer - assert 4 == 5
============================ 1 failed in 0.12s =============================
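The failure is intentional, since inc(3) returns 4 rather than 5. As a sketch, a passing variant of the same file only needs the expectation corrected; the extra test_negative function is an addition made up here to show that pytest collects every test_* function:

```python
# content of test_sample.py (passing variant)
def inc(x):
    return x + 1


def test_answer():
    # expectation matches the actual behavior of inc()
    assert inc(3) == 4


def test_negative():
    # a second test; pytest collects every function named test_*
    assert inc(-1) == 0
```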

Other Testing Tools

  • tox automates standardized testing across Python platforms and environments.

Packaging

Virtual Environments venv

Python virtual environments, created with the venv module, are used to isolate packages and tool versions. When working on a Python project that uses third-party packages, it’s suggested to use pip within a venv. Generally, the steps to create and activate a venv are:

  1. Create the environment (need only do once, usually at the root of a project repo): $ python3 -m venv .venv
  2. Activate the venv: $ source .venv/bin/activate
  3. Do Python things, like pip install required packages: $ python3 -m pip install -r requirements.txt
  4. When done, deactivate the venv with simply: $ deactivate
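To confirm from within Python whether a venv is active, one simple sketch is to compare sys.prefix with sys.base_prefix, which differ inside a virtual environment; the in_virtualenv helper name is made up here:

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("interpreter:", sys.executable)
print("inside a venv:", in_virtualenv())
```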

Packaging and Uploading Python Projects

See the tutorial on Packaging Python Projects and register an account on PyPI to publicly share your project. After setting up the repo structure and pyproject.toml, you can:

  • Build the distribution archive using:
    • $ python3 -m pip install --upgrade build
    • $ python3 -m build
  • Install locally and automatically track any edits to the underlying repo with:
    • $ python3 -m pip install --editable . (from within the repo top-level).
  • Uninstall package with $ python3 -m pip uninstall <package name> -y

If your pyproject.toml properly includes the package dependencies, calling pip install will also install the correct dependencies. You can also install optional dependencies by specifying the list explicitly, like pip install .[docs] (note no space between . and [key] for optional install; in shells that treat square brackets specially, such as zsh, quote the argument as '.[docs]').

Documenting Your Project

A great way to document your Python project in Markdown, including its docstrings, is to use MkDocs. This is an example of setting up a repo for MkDocs.

Other documentation frameworks use the reStructuredText (*.rst) format instead; Sphinx is a widely used example.

Other Useful Tools

  • psf/black: uncompromising Python code formatter.
  • pyright: static type checker and linter for Python
  • mypy: Static Type checker
  • flake8: a python tool that glues together pycodestyle, pyflakes, mccabe, and third-party plugins to check the style and quality of some python code.
  • pylint: static code analyzer and linter.
  • ruff: extremely fast Python linter written in Rust

Basic Python Web Server

If you have some static web resources (e.g. HTML pages, etc.), you can quickly spin up a web server in that directory to display them using Python 3:

$ python -m http.server <port_number>
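The same server can also be started from Python code using the http.server module directly. This is a minimal sketch: the serve_directory helper, the temporary directory, and the index.html contents are made up for the example, and port 0 asks the OS for any free port:

```python
import functools
import tempfile
import threading
import urllib.request
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path


def serve_directory(directory, port=0):
    """Serve files from directory on localhost in a background thread."""
    handler = functools.partial(SimpleHTTPRequestHandler, directory=str(directory))
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


with tempfile.TemporaryDirectory() as tmp:
    # A throwaway static page to serve
    Path(tmp, "index.html").write_text("<h1>hello</h1>")

    server = serve_directory(tmp)
    port = server.server_address[1]  # the port the OS actually assigned

    # Fetch the page back, bypassing any proxy settings from the environment
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(f"http://127.0.0.1:{port}/index.html") as resp:
        body = resp.read().decode()

    server.shutdown()       # stop serve_forever()
    server.server_close()   # release the socket

print(body)  # <h1>hello</h1>
```

Like the command-line version, this is meant for quick local use rather than production serving.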

Libraries and Distributions

SciPy

SciPy is the de facto Python library for computing, math, science and engineering. The ecosystem encompasses other large, popular, open-source libraries such as:

  • NumPy: multi-dimensional array processing package. Provides a fast and efficient multidimensional array object ndarray. Has many linear algebra and signal processing algorithms built-in, as well as a mature C API to enable Python extensions and C/C++ to access NumPy’s data structures.
  • Matplotlib: a plotting & graphing library.
    • seaborn is a data visualization library, and high-level interface, based on matplotlib.
  • IPython: an interactive console approach to Python development, which also coincides with the Jupyter Project, which provides interactive computing notebooks.
    • Jupyter is a very powerful resource for data analysis, engineering, math and other disciplines. A wide variety of tools and plugins exist to make Jupyter notebooks similar to a full IDE (or similar to other full-fledged processing and graphing tools like MATLAB), and tools like nbconvert can be used to export Jupyter notebooks to other formats, such as LaTeX and PDF.
  • SymPy: performs symbolic math manipulations and computations. It can solve algebraic and differential equations, simplify expressions, apply trigonometric identities, differentiate, integrate, etc.
    • SageMath is a mathematics software system which integrates SymPy and other SciPy libraries in a complete system.
  • pandas: a library for data structures & analysis. pandas blends the array processing ideas of NumPy with the common data manipulation ideas found in spreadsheets and relational databases (e.g. SQL).

It’s recommended to install SciPy, and all of the associated packages, with pip since some distros still point to Python 2 repos. Or you could install Anaconda which can be installed on Mac, Windows or Linux and easily installs Python and all required libraries.

Jupyter Notebook Tips

Running Jupyter Notebook/Lab Remotely

Sometimes it’s advantageous to have Jupyter run on a remote machine that you can SSH into from a local machine. This can be accomplished using only SSH port forwarding:

  1. SSH into the remote box with port forwarding using $ ssh -L localhost:8889:localhost:8889 <remote IP address or hostname>
    • Add -Nf to the ssh command if wanting to launch in another terminal window and immediately return.
  2. In SSH session, launch Jupyter lab headlessly on a specific port, like $ jupyter lab --no-browser --port=8889
  3. Open a browser window to http://localhost:8889/lab?token=<token URL string from Jupyter launch in SSH session> (or just click the link it outputs in the terminal)
    • Besides running notebooks remotely, this method allows opening & viewing of other remote files.

Jupyter Lab Plugins

  • Jupyter Lab is the next-generation web-based UI for Jupyter notebooks.
  • Since plaintext diffs of Jupyter notebooks are sometimes not very insightful, tools like nbdime are useful to better diff & merge notebooks within a Git repo.
    • ReviewNB is a service that can similarly help with reviews on public repos like GitHub
    • You can also use a git filter to remove Jupyter notebook cell output, and other unnecessary metadata (see https://stackoverflow.com/a/58004619 and https://stackoverflow.com/a/73218382). This allows git operations (e.g. when diff’ing or committing changes) to operate on the cleaned notebook (e.g. just code and Markdown cell changes) while allowing the local file copy to retain any current output cell state.
  • Use %matplotlib widget to render animated matplotlib plots

Exporting Jupyter Notebooks

Jupyter notebooks can be exported to many formats, like HTML, LaTeX and PDF. However, if you run into weird font errors in nbconvert, note that it may be due to leading and/or trailing spaces in inline math text; for instance $ \epsilon \gt 0 $ within a text block may cause an error and should be changed to $\epsilon \gt 0$.
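If a notebook has many such cases, the padding could be stripped with a small regular expression before exporting. This is only a rough sketch: the strip_math_padding name is made up, and the pattern assumes every $ in the text delimits inline math on a single line (it would misfire on literal dollar signs such as prices):

```python
import re

# "$", optional padding, the math body (ending in a non-space), optional padding, "$"
_PADDED_INLINE_MATH = re.compile(r"\$[ \t]*([^$\n]*?[^$\s])[ \t]*\$")


def strip_math_padding(text):
    """Remove spaces just inside the $...$ delimiters of inline math."""
    return _PADDED_INLINE_MATH.sub(r"$\1$", text)


before = r"for any $ \epsilon \gt 0 $ there exists $ N$"
after = strip_math_padding(before)
print(after)  # for any $\epsilon \gt 0$ there exists $N$
```

The function works on plain strings, so it would need to be applied to the source of each Markdown cell.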

Data Analysis
