Formation: A Functional Middleware Infrastructure for Python

Posted on 2019-05-21

Formation is a generic functional middleware infrastructure for Python.

With Formation, you can build production-grade software: resilient, circuit-breaker-infused, operable HTTP clients and HTTP services. You can also apply the same best practices and standards to other kinds of software to keep it flexible, composable, and maintainable.

Although not tied to a web service framework, it takes inspiration from Ruby’s Rack middleware and Node’s Connect. In the context of Python, you can think of it as a higher-level abstraction or “WSGI over anything”.

The Pipeline Abstraction and Functions

A graphics pipeline in a digital camera often performs steps like these (photography connoisseurs may notice the list is neither complete nor accurate, but it helps make a point):

  1. White balancing
  2. Color optimization
  3. Sharpening
  4. Cropping

White balancing is the act of finding how much white is in a picture, so that the camera can optimize for what we care about — like skin tone, over what we don’t care about — like sun glare from the sea.

Before and after I shoot, I can select from several kinds of white balancing settings, regardless of the rest of the steps. I’m surgically changing one part, and all other parts, including the pipeline itself, keep working.

The idea of the pipeline abstraction is to organize and execute a set of similarly behaving, interchangeable building blocks. This means:

  1. With a successful design, we can place different building blocks into our pipeline, as well as borrow building blocks from others. There should be nothing specific to our pipeline design or anyone else’s that limits that kind of reuse.
  2. We can compose, order, and enable or disable the building blocks, and the pipeline should make that easy and unsurprising.
  3. It should be trivial to create new blocks from scratch, without unnecessary ceremonies.

Here’s a makeshift pipeline with a traditional object-oriented design:

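blocks = [balance, optimize]  # block objects, each exposing an execute method

def pipeline(img):
    for block in blocks:
        img = block.execute(img)
    return img  # hand the processed image back to the caller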

And here’s an example block which goes into that pipeline:

class Balance:
    def execute(self, img):
        return img

Those who recognize this interface will realize that it’s actually the interface of a function! Here’s a simpler pipeline and blocks:

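def balance(img):
    return img  # placeholder: a real block would return the adjusted image

def optimize(img):
    return img  # placeholder, same contract: image in, image out

blocks = [balance, optimize]

def pipeline(img):
    for f in blocks:
        img = f(img)
    return img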

What did we gain by moving away from a class (and by that, moving away from object-oriented design)? Well, we’ve got composability for free, because functions in programming, when pure, are much like functions in mathematics — composable:

f(x) = x*3
g(x) = x*10
k(x) = f(g(x))

Composing functions feels natural and is mechanical to code. Boring code is great; it easily lends itself to abstraction and DRY.

And with that in mind — here we compose with toolz, a functional programming library for Python:

from toolz import compose

optimizing_balancer = compose(balance, optimize)
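
For readers without toolz at hand, the same idea can be sketched with the standard library alone. This is an illustrative stand-in, not toolz's implementation: the `compose` helper below is hand-rolled, and the toy `balance` and `optimize` blocks work on a list of labels rather than an image. Like toolz's `compose`, it applies functions right to left, so `compose(balance, optimize)` runs `optimize` first.

```python
from functools import reduce


def compose(*funcs):
    """Compose functions right to left: compose(f, g)(x) == f(g(x))."""
    def composed(x):
        # Apply the last function first, then work back toward the first.
        return reduce(lambda acc, fn: fn(acc), reversed(funcs), x)
    return composed


def f(x):
    return x * 3


def g(x):
    return x * 10


k = compose(f, g)
print(k(2))  # f(g(2)) = (2 * 10) * 3 = 60


# Toy image blocks: each one records the step it performed.
def balance(img):
    return img + ["balanced"]


def optimize(img):
    return img + ["optimized"]


optimizing_balancer = compose(balance, optimize)
print(optimizing_balancer([]))  # ['optimized', 'balanced']
```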

But just to drive the point home, let’s go back to object-oriented design and our function-like class. How do we compose a class with an execute method? Well, we resort to design patterns.

We break our first version of the execute interface by making each block accept another block, and we introduce state by having to store it:

class Balance:
    def __init__(self, other_block):
        # state: we now have to remember which block comes next
        self.other_block = other_block

    def execute(self, img):
        # logic for _actually_ composing things with other_block
        pass

Any way we look at it, or try to make sense of it, the sad result is that we patch an unnatural abstraction with more abstractions until it makes sense.

Production Ready Services

Wrapping our request logic with concerns

Here are some common concerns for Web services (a representative set, by no means an exhaustive one). You can go even further and declare these to be standards enforced on your company’s services.

  • Content caching and e-tags: before spilling content onto a TCP connection, digest it, cache it, or more generally perform a content-wise operation that helps the receiving end cache it and save bytes on future calls.
  • Exception handling: when things blow up, we don’t want to send a stack trace or any part of our service’s internals. We can always detect the failure and wrap it in something user-friendly, with troubleshooting or retry advice.
  • Compression: it’s almost standard these days, with commodity computing power, to gzip all content passing into and out of services. In web servers like Apache and Nginx, this is a simple flag.
  • Throttling: it’s easy for a service to get overwhelmed these days, especially with the surge of serverless consumers with virtually endless scale and concurrency. What we do here is put in place a module that blocks any consumer that has passed a certain threshold.
  • Header aesthetics: security by obscurity, though not real security, is still effective against certain kinds of attackers. If we don’t say we’re using Apache, an attacker might decide we’re not a good ROI and move on to the next target. There are plenty of other “chatty” HTTP headers that reveal unnecessary information to any public consumer.
  • Circuit breaking: when a service is malfunctioning, it may be down or operating in a degraded mode. To support various levels of operation and fault tolerance, we use circuit breakers. There are several variants for building these; the most famous is Hystrix, by Netflix.
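
To make the last concern concrete, here is a minimal sketch of a circuit breaker as a plain function wrapper. It is not Hystrix and not Formation's implementation; the names `circuit_breaker`, `CircuitOpenError`, and the `clock` parameter are assumptions made for this example. After `fail_max` consecutive failures the breaker rejects calls for `reset_timeout` seconds, then lets a single trial call through.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


def circuit_breaker(func, fail_max=3, reset_timeout=30.0, clock=time.monotonic):
    """Wrap func so that repeated failures stop further calls for a while."""
    state = {"failures": 0, "opened_at": None}

    def guarded(*args, **kwargs):
        if state["opened_at"] is not None:
            if clock() - state["opened_at"] < reset_timeout:
                # Still cooling down: fail fast without touching the service.
                raise CircuitOpenError("circuit is open")
            # Cool-down elapsed: allow one trial call (half-open state).
            state["opened_at"] = None
            state["failures"] = fail_max - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            state["failures"] += 1
            if state["failures"] >= fail_max:
                state["opened_at"] = clock()
            raise
        state["failures"] = 0
        return result

    return guarded


def flaky():
    # Stand-in for a call to a service that is currently down.
    raise ConnectionError("service is down")


guarded = circuit_breaker(flaky, fail_max=2, reset_timeout=60)
for attempt in range(3):
    try:
        guarded()
    except ConnectionError:
        print("call failed")          # attempts 1 and 2 reach the service
    except CircuitOpenError:
        print("rejected by breaker")  # attempt 3 never reaches it
```

The injected `clock` is only there so the cool-down can be exercised without sleeping; a production breaker would also need to think about thread safety and shared state.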

My Point? It All Exists For Clients, Too

We’ve discussed a Web service, or an API. We can push all these requirements to the client side (the consumer), and you might be surprised to find that they are all valid there as well.

Historically, having Web services do all this hard work allowed clients to take a shallower approach: they don’t need to handle these concerns if the backend does.

With the move to microservices, the weight has been shifting. Since a service mesh involves complex inter-communication between services to fulfill a single request, it is now arguably as important to address these concerns on the client side as on the server side.

For example, if I clean my headers on my way out, I’m supporting better security. If I compress my content before it hits servers, I’m enabling faster, better traffic shapes. And lastly, if I avoid overwhelming a backend and stop short before I put bytes on the connection, I’m promoting a healthier backend server, for which an overwhelming traffic event becomes the exception rather than the norm.
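
As a rough illustration of the first two client-side examples, the sketch below cleans outgoing headers and gzips a request body using only the standard library. The header names in `CHATTY_HEADERS` and both helper functions are hypothetical, chosen for the example; whether a given server accepts gzip-encoded request bodies depends on that server.

```python
import gzip

# Hypothetical set of headers we would rather not send to every backend.
CHATTY_HEADERS = {"user-agent", "x-powered-by", "x-internal-host"}


def clean_headers(headers):
    """Drop headers that reveal more about the client than necessary."""
    return {k: v for k, v in headers.items() if k.lower() not in CHATTY_HEADERS}


def compress_body(headers, body):
    """Gzip the body and declare the encoding so the receiver can decode it."""
    compressed = gzip.compress(body)
    new_headers = dict(headers)
    new_headers["Content-Encoding"] = "gzip"
    new_headers["Content-Length"] = str(len(compressed))
    return new_headers, compressed


headers = {"User-Agent": "internal-build-7", "Accept": "application/json"}
body = b'{"query": "larry page"}' * 50

headers = clean_headers(headers)
headers, payload = compress_body(headers, body)
print(sorted(headers), len(body), len(payload))
```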

Middleware: A Blueprint

These two stories bring us to our point: middleware, functions, and client-side middleware. Here are a few examples of popular middleware libraries for web services and APIs:

Python’s WSGI

def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    yield b'Hello, World\n'

Ruby’s Rack

app = Proc.new do |env|
    ['200', {'Content-Type' => 'text/html'}, ['A barebones rack app.']]
end

Node’s Connect

app.use(function(req, res){
  res.end('Hello from Connect!\n');
});

All these work in a backend environment, serving as request handlers in a way. They all take and serve, through different kinds of interfaces, the same triplet: status code, headers, and body. This simple design goes back as far as CGI.
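
To show how middleware wraps such a handler, here is a small WSGI sketch that hides the `Server` header of whatever app it wraps. The `strip_server_header` function and the `Apache/2.4` header value are made up for the example; the app is driven by hand with a fake `start_response`, so no server is needed to see the triplet.

```python
def application(environ, start_response):
    headers = [("Content-Type", "text/plain"), ("Server", "Apache/2.4")]
    start_response("200 OK", headers)
    return [b"Hello, World\n"]


def strip_server_header(app):
    """WSGI middleware: wrap an app and hide its Server header."""
    def middleware(environ, start_response):
        def filtered_start_response(status, headers, exc_info=None):
            # Pass everything through except the header we want to hide.
            headers = [(k, v) for k, v in headers if k.lower() != "server"]
            return start_response(status, headers, exc_info)
        return app(environ, filtered_start_response)
    return middleware


wrapped = strip_server_header(application)

# Drive the app by hand to capture status, headers, and body.
captured = {}


def fake_start_response(status, headers, exc_info=None):
    captured["status"] = status
    captured["headers"] = headers


body = b"".join(wrapped({"REQUEST_METHOD": "GET"}, fake_start_response))
print(captured["status"], captured["headers"], body)
```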

If we look at Node’s Connect, there’s already a nice pile of middleware out there for you to choose from. Every middleware you use is code you don’t write in your handlers, and is a standard you can set to let others know they don’t need to write that code either.

Middleware in Clients

Let’s focus on HTTP clients for now. There’s already one popular example of a great HTTP client that supports middleware in Ruby: Faraday.

With Ruby’s elaborate DSL capabilities, Faraday lets you compose new clients with middleware — which turns these into advanced and robust HTTP clients, and it almost looks like prose:

Faraday.new(...) do |conn|
  # POST/PUT params encoders:
  conn.request :multipart
  conn.request :url_encoded
  # Last middleware must be the adapter:
  conn.adapter :net_http
end

There’s no such thing for Python — that is, until now.

Introducing: Formation

Formation is a generic middleware infrastructure for Python, as well as advanced HTTP client building infrastructure for Python — the same way Faraday makes all this possible in Ruby.

Here’s a quick Formation HTTP client:

@formation.client
class Google(object):
    base_uri = "https://google.com"
    middleware = [request_logger(structlog.getLogger())]
    response_as = html_response

    def search(self, text):
        return self.request.get("/", params=Query(text))

if __name__ == "__main__":
    google = Google()
    (xml, _code, _headers) = google.search("larry page")
    print(xml.xpath("//title/text()"))

And here’s a simple piece of code that we want to infuse with middleware magic, regardless of HTTP and clients:

from time import time as now

from formation import wrap
from requests import get

def log(ctx, call):
    print("started")
    ctx = call(ctx)
    print("ended")
    return ctx

def timeit(ctx, call):
    started = now()
    ctx = call(ctx)
    ctx['duration'] = now() - started
    return ctx

def to_requests(ctx):
    # innermost call: perform the request and hand the context back
    ctx['response'] = get(ctx['url'])
    return ctx

fancy_get = wrap(to_requests, middleware=[log, timeit])
fancy_get({'url':'https://google.com'})

Lastly, here’s how we build a new middleware, just like that:

def log(ctx, call):
    print("started")
    ctx = call(ctx)
    print("ended")
    return ctx
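
To see why such a small function is enough, here is one way a `wrap`-like helper could be written. This is a sketch of the idea under my own assumptions, not Formation's actual source: it folds the middleware list around the innermost function so that the first middleware in the list runs first. The `trace` list on the context is only there to make the call order visible.

```python
from functools import reduce


def wrap(call, middleware=()):
    """Build one callable that runs ctx through each middleware, then call."""
    def bind(next_call, mw):
        # Fix the "next" step for this middleware, giving a ctx -> ctx function.
        return lambda ctx: mw(ctx, next_call)
    # Fold from the innermost function outward so the first middleware runs first.
    return reduce(bind, reversed(list(middleware)), call)


def log(ctx, call):
    ctx["trace"].append("log:start")
    ctx = call(ctx)
    ctx["trace"].append("log:end")
    return ctx


def tag(ctx, call):
    ctx["trace"].append("tag")
    return call(ctx)


def handler(ctx):
    ctx["trace"].append("handler")
    return ctx


run = wrap(handler, middleware=[log, tag])
print(run({"trace": []})["trace"])
# ['log:start', 'tag', 'handler', 'log:end']
```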

Middleware Should Be Easy

Middleware should be easy to add.

What’s easy? A pure function, with no base class, no imports, no specialized programming language knowledge, and no prior knowledge about the host framework.

Knowledge of a request-response interaction and the concept of a context is needed, but that’s not a specialized form of knowledge in any way.

Here is one middleware:

def log(ctx, call):
    print("started")
    return call(ctx)

A simple function that takes a context (it can be anything, even a plain dict) and the next function it’s supposed to call or, if it decides to break the flow, not call.

From this, we extract the entire middleware API in a single sentence:

A middleware is a plain function that takes the current context and the next middleware, and is responsible for returning a new context.
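
The "not call" case deserves a small example. Below is a sketch of a caching middleware that breaks the flow when it already has an answer; the `cached` middleware, the module-level `_cache` dict, and the `fetch` stand-in are all hypothetical names made up for illustration.

```python
_cache = {}


def cached(ctx, call):
    """Return a stored context for a known URL instead of calling onward."""
    key = ctx["url"]
    if key in _cache:
        # Break the flow: the next function is never called.
        return dict(_cache[key], cache_hit=True)
    ctx = call(ctx)
    _cache[key] = dict(ctx)
    return ctx


calls = []


def fetch(ctx):
    # Stand-in for the real work, such as an HTTP request.
    calls.append(ctx["url"])
    return dict(ctx, body="<html>...</html>")


first = cached({"url": "https://example.com"}, fetch)
second = cached({"url": "https://example.com"}, fetch)
print(len(calls))  # 1
```

The second call returns a new context without ever reaching `fetch`, which is all "cancelling" means in this model.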

Decorators

Formation is opinionated. The definition of “should be easy” rules out decorators: first, they are language-specific and require prior knowledge; second, and more importantly, they’re not flexible.

Functions, given that they are first-class citizens in a language, inherently lend themselves to composition and high-level operations over functions.
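
One such high-level operation is building middleware from parameters with an ordinary closure. The sketch below is an assumption about how configurable middleware could be written, not Formation's code: `header`, `ua`, `accept`, and `chain` are local helpers defined here for the example.

```python
def header(name, value):
    """Build a middleware that sets one request header on the context."""
    def middleware(ctx, call):
        headers = dict(ctx.get("headers", {}))
        headers[name] = value
        return call(dict(ctx, headers=headers))
    return middleware


def ua(value):
    return header("User-Agent", value)


def accept(value):
    return header("Accept", value)


def chain(middleware, call):
    """Nest middleware around call, with the first item outermost."""
    for mw in reversed(middleware):
        # Bind mw and the current call eagerly to avoid late-binding surprises.
        call = (lambda m, nxt: lambda ctx: m(ctx, nxt))(mw, call)
    return call


send = chain(
    [ua("the-fabricator/1.0.0"), accept("application/json")],
    lambda ctx: ctx,  # innermost step just returns the context
)
result = send({"url": "https://httpbin.org/get"})
print(result["headers"])
```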

First Class Operability

Here’s a definition for Operability:

Operability is the ability to keep a piece of equipment, a system or a whole industrial installation in a safe and reliable functioning condition, according to pre-defined operational requirements.

In our more modest software world, I would define operability as:

The ability of an organization to move a piece of production software from a malfunctioning to a functioning state as quickly, reliably and precisely as possible, accompanied by effective and precise learning about the malfunction.

A couple of things Formation middleware does to support operable software are:

  1. context -- the context middleware gathers information about the running host that's needed to operate in case of malfunction: information to answer questions like "from what Git version was this service deployed?" and "are these two threads running on the same process?"
context(
    namespace="service",
    scope="all",
    env="local",
    sha="dev",
    version="0.01",
    context_fn=get_context,
    getpid=os.getpid,
    gettid=threading.get_ident,  # on Python 3; the old `thread` module is Python 2 only
)
  2. logger -- a structured request logger packs all information into structured logs. Structured logging promotes operable software much better than plain old text logs.
context_logger(
    structlog.getLogger()
)
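
As a rough picture of what a context-gathering middleware does, here is a standalone sketch written in the same `(ctx, call)` style. It is not Formation's `context` implementation; the function below only borrows a few parameter names from the call above, and the `sha` value is a made-up example.

```python
import os
import threading


def context(namespace, env="local", sha="dev", version="0.01",
            getpid=os.getpid, gettid=threading.get_ident):
    """Build a middleware that stamps host details onto the context."""
    def middleware(ctx, call):
        info = {
            "namespace": namespace,
            "env": env,
            "sha": sha,          # which Git version was deployed
            "version": version,
            "pid": getpid(),     # same process?
            "tid": gettid(),     # same thread?
        }
        return call(dict(ctx, context=info))
    return middleware


stamp = context("service", sha="3f2a1bc")
result = stamp({"url": "https://example.com"}, lambda ctx: ctx)
print(result["context"])
```

A logger further down the chain can then attach `ctx["context"]` to every log line.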

And there’s more.

Transparent Resilience

Resilience in Formation comes in the form of two concepts:

  1. circuit_breaker -- we don't want to bombard a service that's down, or one that's barely getting back up because it's overwhelmed.
def circuit_breaker(
    logger,
    name,
    fail_max=5,
    reset_timeout=60,
    state_storage=None,
    exclude=[]
)
  2. retry -- in the case of a failed request, transparently retry it without bothering the userland code. On a flaky network, you won't even notice Formation is doing the retries for you.
retry(
    max_retries=3
)
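
The retry idea fits naturally into the `(ctx, call)` shape. The following is a minimal sketch and an assumption on my part, not Formation's `retry`: it retries only on the exception types listed in the hypothetical `retry_on` parameter, has no backoff, and records the number of retries on the returned context.

```python
def retry(max_retries=3, retry_on=(ConnectionError, TimeoutError)):
    """Build a middleware that re-runs the rest of the chain on failure."""
    def middleware(ctx, call):
        attempts = 0
        while True:
            try:
                # Pass a copy so a failed attempt cannot leave ctx half-updated.
                result = call(dict(ctx))
            except retry_on:
                attempts += 1
                if attempts > max_retries:
                    raise
            else:
                return dict(result, retries=attempts)
    return middleware


# Simulate a flaky network: two failures, then success.
outcomes = iter([ConnectionError("reset"), ConnectionError("reset"), "ok"])


def flaky_send(ctx):
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return dict(ctx, body=outcome)


result = retry(max_retries=3)({"url": "https://example.com"}, flaky_send)
print(result["body"], result["retries"])  # ok 2
```

The calling code sees only the successful result; the two failed attempts stay inside the middleware.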

Built In Content Handling

Formation can create a client that only understands a certain content type, to ease your programming experience. For example, if an API supports only JSON (a reasonable thing these days), you don’t need content-sniffing code to figure out what you’ve just received in the response body, nor do you have to parse text into JSON manually.

Here’s an example client, in which we explicitly choose json_response as what this client receives:

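import structlog
from formation.for_requests import client, json_response
# timeout, accept, ua and request_logger are Formation middleware; import them
# from wherever your Formation version exposes them.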

@client
class HttpBin(object):
    response_as = json_response
    base_uri = "https://httpbin.org"
    middleware = [
        timeout(0.1),
        accept("application/json"),
        ua("the-fabricator/1.0.0"),
        request_logger(structlog.getLogger()),
    ]
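
Conceptually, a response adapter like this only has to turn a raw response into something the caller can unpack. The sketch below assumes the same (body, status code, headers) triplet that the Google example unpacks earlier; `as_json` and `FakeResponse` are hypothetical names, and this is not the code behind `json_response`.

```python
import json
from collections import namedtuple

# Minimal stand-in for an HTTP response object.
FakeResponse = namedtuple("FakeResponse", "text status_code headers")


def as_json(response):
    """Turn a raw response into a (parsed body, status code, headers) triplet."""
    return json.loads(response.text), response.status_code, response.headers


raw = FakeResponse(
    '{"origin": "127.0.0.1"}', 200, {"Content-Type": "application/json"}
)
data, code, headers = as_json(raw)
print(data["origin"], code)
```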

Structured Queries For Developer Happiness

Formation understands attrs, the most popular data-class-like library for Python. With Formation’s sister project, attrs-serde, it is possible to have fully structured queries in Formation clients, like so:

@serde
@attrs
class Query(object):
    query = attrib(metadata={"to": ["q"]})

@client
class Google(object):
    base_uri = "https://google.com"
    middleware = [request_logger(structlog.getLogger())]
    response_as = html_response

    def search(self, query):
        return self.request.get("/", params=query)

Now the Google client takes a Query object. We can create parameter hierarchies, factoring out common page and size parameters and applying them to all queries, and enjoy code completion and type checking.
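
To illustrate the parameter-hierarchy idea without extra dependencies, here is a sketch that uses standard-library dataclasses as a stand-in for attrs and attrs-serde. The `to_params` helper and the `Paged` base class are hypothetical; only the `{"to": [...]}` metadata convention is borrowed from the example above, and how attrs-serde itself interprets it may differ.

```python
from dataclasses import dataclass, field, fields


def to_params(obj):
    """Flatten a dataclass into a dict, honoring a "to" path in field metadata."""
    params = {}
    for f in fields(obj):
        path = f.metadata.get("to", [f.name])
        target = params
        # Walk or create nested dicts for every key except the last one.
        for key in path[:-1]:
            target = target.setdefault(key, {})
        target[path[-1]] = getattr(obj, f.name)
    return params


@dataclass
class Paged:
    # Common paging parameters shared by every query type.
    page: int = 1
    size: int = 10


@dataclass
class Query(Paged):
    query: str = field(default="", metadata={"to": ["q"]})


print(to_params(Query(query="larry page", page=2)))
# {'page': 2, 'size': 10, 'q': 'larry page'}
```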

Summary

Formation is a generic functional middleware infrastructure for Python. With it, you compose a stack of middleware, each of which has a tiny API, a shared context, and an ability to cancel or proceed to the next middleware.

In many ways, it is similar to Ruby’s Rack middleware and Node’s Connect, only for anything and not just the server side. In the context of Python, it is a higher-level abstraction over WSGI. Formation is not Pythonic, and it doesn’t abide by the Zen of Python; where suitable it does away with these and optimizes for developer happiness.

Feel free to look through Formation’s docs and have a go!