The pleasure of writing Clean OOP code /s

Jun 30, 2024

Here's the task:

You have an endpoint that receives a_id and b_id, and you have to associate a with b in the DB.
In addition, you have to associate a.c with b.
a can only have one b associated with it. If it already has one, it shouldn't be reassigned; the endpoint should just return success.
c can have multiple bs associated with it, although of course each one only once.
You have to treat the whole thing as a transaction, and only save once all the changes have been made successfully.

Now the above is almost the pseudocode, but let's do it in proper pseudocode, AKA Python that kinda looks like Django but not quite:

def assign_b_to_a_and_c(a_id: str, b_id: str):
    try:
        a = A.objects.get(id=a_id)
    except A.DoesNotExist:
        return 404

    # Already done, skip
    if "b_id" in a and a.b_id is not None:
        return 200

    # Just make sure it exists
    try:
        B.objects.get(id=b_id)
    except B.DoesNotExist:
        return 404

    # Associate A with B
    a.b_id = b_id

    try:
        c = C.objects.get(id=a.c_id)
    except C.DoesNotExist:
        return 404

    # Associate C with B
    if "b_ids" in c and c.b_ids is not None:
        # append, not +=: extending a list with a str would add it character by character
        c.b_ids.append(b_id)
    else:
        c.b_ids = [b_id]

    with transaction.atomic():
        a.save()
        c.save()

I'd say it's rather straightforward, easy to follow and to understand.

This is what I like to call simple code, at least as simple as it can be in Python, ignoring the kinda ugly try/except syntax and so on.

I can read it top to bottom, it fits in one screen in this case, and I know what's happening at any given moment, no surprises.
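
If you want to actually run that flow, here is a minimal sketch of the same logic with plain dicts standing in for the database. The A_ROWS, B_ROWS and C_ROWS stores and their contents are made up for illustration, and there is no real transaction here: the function simply does every check first and only then mutates anything, which is the same top-to-bottom shape.

```python
# Hypothetical in-memory stand-ins for the A, B and C tables.
A_ROWS = {"a1": {"c_id": "c1", "b_id": None}}
B_ROWS = {"b1": {}}
C_ROWS = {"c1": {"b_ids": []}}


def assign_b_to_a_and_c(a_id: str, b_id: str) -> int:
    a = A_ROWS.get(a_id)
    if a is None:
        return 404

    # Already done, skip
    if a["b_id"] is not None:
        return 200

    # Just make sure it exists
    if b_id not in B_ROWS:
        return 404

    c = C_ROWS.get(a["c_id"])
    if c is None:
        return 404

    # Every check passed, so apply both changes together.
    a["b_id"] = b_id
    if b_id not in c["b_ids"]:  # each b only once per c
        c["b_ids"].append(b_id)
    return 200
```

Calling assign_b_to_a_and_c("a1", "b1") returns 200 and updates both rows; calling it again returns 200 without touching anything.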

Code review: We need to follow the architecture rules.

OK.

Let's add some OOP to this. We do in fact need these same operations in other parts of the code, so let's DRY a little.

# views.py
def assign_b_to_a_and_c(a_id: str, b_id: str):
    # ... same as before ...

    # Associate A with B
    a.assign_b(b_id)

    # ... same as before ...

    # Associate C with B
    c.add_b(b_id)

    # ... same as before ...

# models.py
class A(models.Model):
    # ...
    def assign_b(self, b_id: str):
        # Oops some duplication, whatever
        if "b_id" in self and self.b_id is not None:
            return
        self.b_id = b_id

class C(models.Model):
    # ...
    def add_b(self, b_id: str):
        if "b_ids" in self and self.b_ids is not None:
            self.b_ids.append(b_id)
        else:
            self.b_ids = [b_id]

Hmm... suddenly I can't read the code in one go. And unless I'm familiar with what a.assign_b() and c.add_b() do, which as a first-time reader I hope are named correctly, I have to jump a file or two to figure out what's happening.

No biggie, this is normal.

Let's go a bit further to follow the proper architecture rules.

Every time we assign a b, we actually want to save the model, says someone. So when we call a.assign_b() we are going to also save. Reasonable statement, especially if in most cases this is what we intend to do.

# views.py
def assign_b_to_a_and_c(a_id: str, b_id: str):
    # ... same as before ...

    # Associate A with B
    a.assign_b(b_id)

    # ... same as before ...

    # Associate C with B
    c.add_b(b_id)

    # ... same as before ...

# models.py
class A(models.Model):
    # ...
    def assign_b(self, b_id: str):
        # Oops some duplication, whatever
        if "b_id" in self and self.b_id is not None:
            return
        self.b_id = b_id
        self.save()

Those of you following along will have realized that this breaks one of the requirements: the whole thing was meant to run as a transaction, and now a is saved before the transaction even starts.
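
To make the failure mode concrete, here is a small sketch using the standard library's sqlite3 as a stand-in for the ORM. The table, the row ids and the failing save_c() are all assumptions for illustration. The point is only the ordering: a save that commits on its own cannot be undone by whatever fails afterwards.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # Hypothetical schema: one row in `a`, no b assigned yet.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (id TEXT PRIMARY KEY, b_id TEXT)")
    conn.execute("INSERT INTO a (id, b_id) VALUES ('a1', NULL)")
    conn.commit()
    return conn


def assign_b(conn: sqlite3.Connection, commit_now: bool) -> None:
    conn.execute("UPDATE a SET b_id = 'b1' WHERE id = 'a1'")
    if commit_now:
        conn.commit()  # stand-in for a save() hidden inside the method


def save_c(conn: sqlite3.Connection) -> None:
    # Pretend the second half of the operation blows up.
    raise RuntimeError("saving c failed")


def current_b_id(conn: sqlite3.Connection):
    return conn.execute("SELECT b_id FROM a WHERE id = 'a1'").fetchone()[0]


def run_eager_save():
    conn = make_db()
    try:
        assign_b(conn, commit_now=True)  # committed before the block below
        with conn:  # commits on success, rolls back on exception
            save_c(conn)
    except RuntimeError:
        pass
    return current_b_id(conn)


def run_atomic():
    conn = make_db()
    try:
        with conn:  # both steps succeed together or not at all
            assign_b(conn, commit_now=False)
            save_c(conn)
    except RuntimeError:
        pass
    return current_b_id(conn)


print(run_eager_save())  # b1   (a changed even though c failed)
print(run_atomic())      # None (the change to a was rolled back)
```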

During PR review somebody realizes this and requests that the programmer fix it.

So he does.

# views.py
def assign_b_to_a_and_c(a_id: str, b_id: str):
    try:
        a = A.objects.get(id=a_id)
    except A.DoesNotExist:
        return 404

    # Already done, skip
    if "b_id" in a and a.b_id is not None:
        return 200

    # Just make sure it exists
    try:
        B.objects.get(id=b_id)
    except B.DoesNotExist:
        return 404

    try:
        c = C.objects.get(id=a.c_id)
    except C.DoesNotExist:
        return 404

    # Associate C with B
    if "b_ids" in c and c.b_ids is not None:
        c.b_ids.append(b_id)
    else:
        c.b_ids = [b_id]

    with transaction.atomic():
        a.assign_b(b_id)
        c.save()

Well obviously that's ugly so let's change c.add_b() as well:

# views.py
def assign_b_to_a_and_c(a_id: str, b_id: str):
    # ... same as before ...
    with transaction.atomic():
        a.assign_b(b_id)
        c.add_b(b_id)

# models.py
# ...
class C(models.Model):
    # ...
    def add_b(self, b_id: str):
        if "b_ids" in self and self.b_ids is not None:
            self.b_ids.append(b_id)
        else:
            self.b_ids = [b_id]
        self.save()

OK we're back to a normal scenario, and now things are transactional again.

This is how our code looks right now:

# views.py
def assign_b_to_a_and_c(a_id: str, b_id: str):
    try:
        a = A.objects.get(id=a_id)
    except A.DoesNotExist:
        return 404

    # Already done, skip
    if "b_id" in a and a.b_id is not None:
        return 200

    # Just make sure it exists
    try:
        B.objects.get(id=b_id)
    except B.DoesNotExist:
        return 404

    try:
        c = C.objects.get(id=a.c_id)
    except C.DoesNotExist:
        return 404

    with transaction.atomic():
        a.assign_b(b_id)
        c.add_b(b_id)

# models.py
class A(models.Model):
    # ...
    def assign_b(self, b_id: str):
        # Oops some duplication, whatever
        if "b_id" in self and self.b_id is not None:
            return
        self.b_id = b_id
        self.save()

class C(models.Model):
    # ...
    def add_b(self, b_id: str):
        if "b_ids" in self and self.b_ids is not None:
            self.b_ids.append(b_id)
        else:
            self.b_ids = [b_id]
        self.save()

Beautiful.

The amount of code increased slightly.

The amount of complexity hasn't decreased.

But now we're more properly encapsulated, y'know?

Technically Python doesn't have private methods and private classes, but in this way we at least let the model control the logic of how it expects to work and how it expects its logic to be modified.

It's true that now the code is harder to read, you have to jump around and just know that the self.save() will happen inside of these methods. But again, encapsulation is a clear win in this case.

Now, this code is a lie.

It's not segregated enough so it cannot be.

A and C are actually in two different Domains inside of our Onion Architecture (AKA Hexagonal Architecture, or Clean Architecture, all very similar). And the way you communicate between these layers in an Event-driven Architecture is of course by events!

So here's what we need to do.

- Add events
- Separate these models into two different Django apps
- Pass information between domains using a special "integration event"
- Save while handling the event, as part of handling the event

Actually, this is where my brilliant Python code breaks down, because Python doesn't really allow this: modules that import each other at the top level tend to fail with circular import errors, due to execution order.
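
A quick way to see the import problem for yourself is the sketch below. It writes two throwaway modules that import each other at the top level and tries to load one of them. The module names app_a and app_c are made up for the demo, and the exact wording of the error varies between Python versions.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def try_circular_import() -> str:
    """Return the ImportError message, or a note if the import worked."""
    with tempfile.TemporaryDirectory() as tmp:
        # Two hypothetical modules that need each other at import time.
        Path(tmp, "app_a.py").write_text(
            "from app_c import C\n\nclass A:\n    pass\n"
        )
        Path(tmp, "app_c.py").write_text(
            "from app_a import A\n\nclass C:\n    pass\n"
        )
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module("app_a")
        except ImportError as exc:
            return str(exc)
        finally:
            # Clean up so the demo leaves no trace behind.
            sys.path.remove(tmp)
            sys.modules.pop("app_a", None)
            sys.modules.pop("app_c", None)
    return "imported fine"


print(try_circular_import())
```

While app_a is still halfway through executing, app_c asks it for a class A that hasn't been defined yet, so the import fails.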

But for the sake of argument, so you can see how understandable and easy to read and maintain this code is, I give you some theoretical Python code.

BTW I have really worked on codebases this brilliant.

# views.py
def assign_b_to_a_and_c(a_id: str, b_id: str):
    try:
        a = A.objects.get(id=a_id)
    except A.DoesNotExist:
        return 404

    # Already done, skip
    if "b_id" in a and a.b_id is not None:
        return 200

    # Just make sure it exists
    try:
        B.objects.get(id=b_id)
    except B.DoesNotExist:
        return 404

    # Wow this code is so simple and minimal!
    with transaction.atomic():
        a.assign_b(b_id)
        a.save()

# A/events.py
class AUpdatedEvent():
    a: A

    def __init__(self, a: A):
        self.a = a


class AUpdatedIntegrationEvent():
    a: A

    def __init__(self, a: A):
        self.a = a

# A/handlers.py
class AUpdatedEventHandler():
    def handle(self, event: AUpdatedEvent):
        # Imagine this function exists
        push_to_async_event_bus(AUpdatedIntegrationEvent(event.a))

# A/models.py
class A(models.Model):
    # ...
    def assign_b(self, b_id: str):
        if "b_id" in self and self.b_id is not None:
            return
        self.b_id = b_id
        self.add_domain_event(AUpdatedEvent(self))

# C/handlers.py
class AUpdatedIntegrationEventBus():
    def handle(self, event: AUpdatedIntegrationEvent):
        # The event carries the updated A instance
        a = event.a
        try:
            c = C.objects.get(id=a.c_id)
        except C.DoesNotExist:
            return 404
        c.add_b(a.b_id)
        c.save()

# C/models.py
class C(models.Model):
    # ...
    def add_b(self, b_id: str):
        if "b_ids" in self and self.b_ids is not None:
            self.b_ids.append(b_id)
        else:
            self.b_ids = [b_id]

If you can't follow along with this easy to follow code, I'm sorry but, skill issues.

Of course we now violated the idea of simple, and of transactions, and all these things that are actually very useful to us. But in exchange: it's more maintainable and it's segregated and decentralized!
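
Here is a toy sketch of what "violated the idea of transactions" means in practice. Everything in it is hypothetical: the deque stands in for an async event bus, the dicts stand in for tables, and the C row is deliberately missing. The endpoint has already answered 200 and a is already changed by the time the C side finds out it cannot do its half.

```python
from collections import deque

# Hypothetical stand-ins: an async bus and two tables.
event_bus = deque()
a_table = {"a1": {"c_id": "missing-c", "b_id": None}}
c_table = {}  # the C row does not exist


def view(a_id: str, b_id: str) -> int:
    a = a_table[a_id]
    a["b_id"] = b_id  # A's change is final right here
    event_bus.append({"c_id": a["c_id"], "b_id": b_id})
    return 200  # the caller is told everything went fine


def drain_bus() -> list:
    """Handle queued events later, the way an async consumer would."""
    failures = []
    while event_bus:
        event = event_bus.popleft()
        c = c_table.get(event["c_id"])
        if c is None:
            # Too late: nothing here can roll A back.
            failures.append(event)
            continue
        c["b_ids"].append(event["b_id"])
    return failures


status = view("a1", "b1")
failed = drain_bus()
print(status, a_table["a1"]["b_id"], len(failed))  # 200 b1 1
```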

I hope you've realized that this article is a criticism of this kind of code, not a love letter.

This kind of code can only be produced when you've forsaken how the computer actually works (procedurally), and you're enamoured with the idea that code must be "Clean" (capital C, Uncle Bob), OOP, SOLID and so on.

You've decided that performance is for the hardware to handle, not for the engineer.

And you've adopted the idea that somehow accidental complexity is simpler to understand and more maintainable than simplicity.

Some people will genuinely argue that the final version is more maintainable.

Now let's get to the actual point of this article.

First, in case you're wondering, this really happened to me, recently even. In the end the transaction was thrown out the window. I'm sure that won't cause any issues ever.

But I don't want anyone to get distracted by the code.

Yes the code these ideologies produce is hard to follow.

There are good ideologue programmers who will produce good code, but it will be despite the ideology, not because of it.

There are good and bad programmers anywhere and everywhere.

But these ideologies encourage this kind of indirect and hard to follow code.

They make the programmer's job harder, and the computer's.

Don't think I will leave you just on the negatives!

If you want an alternative, I recommend you slowly load yourself up on the following:

- CppCon 2014: Mike Acton "Data-Oriented Design and C++"
- Casey Muratori's "Semantic Compression" post
- Casey Muratori "Getting rid of the OOP mindset"
- Brian Will's "Object-Oriented Programming is Bad"
- The "Software's Primary Technical Imperative: Managing Complexity" concept from the influential paper "No Silver Bullet: Essence and Accident in Software Engineering", later expanded by the "Out of the Tar Pit" paper.
  - The primary concept is that there is essential complexity and accidental complexity. Programmers are responsible for accidental complexity (OOP, "Clean" Code, etc.); reduce it as much as possible.

If this post is your first introduction to this idea, welcome!

The name of the idea is Data-Oriented Design as described by Mike Acton, not by Stoyan Nikolov who has a totally different concept.

Love it or hate it, hello to you too.

I sincerely hope some day the software engineering craft can come to think of this as common sense, instead of the excessive accidental complexity we consider normal nowadays.

P.S. If you want to keep the OOP and Clean mindset, then I highly encourage you to at least read Code Complete 2. It is a much more useful resource than Clean Code on the subject of how to actually program.