Senior software engineer with 9+ years of experience. Eschew hype; focus on delivery and performance.
Living in Switzerland 🇨🇠since 2017.
Senior software engineer with 9+ years of experience. Eschew hype; focus on delivery and performance.
Living in Switzerland 🇨🇠since 2017.
Here's the task:
a_id
and b_id
, you have to associate in the DB a
with b
.a.c
with b
.a
can only have one b
associated with it, and if it already does, then it shouldn't be reassigned, should just return success.c
can have multiple b
s associated with it, although of course each one only once.Now the above is almost the pseudocode, but let's do it in proper pseudocode, AKA Python that kinda looks like Django but not quite:
def assign_b_to_a_and_c(a_id: str, b_id: str):
try:
a = A.objects.get(id=a_id)
except A.DoesNotExist:
return 404
# Already done, skip
if "b_id" in a and a.b_id is not None:
return 200
# Just make sure it exists
try:
B.objects.get(id=b_id)
except B.DoesNotExist:
return 404
# Associate A with B
a.b_id = b_id
try:
c = C.objects.get(id=a.c_id)
except C.DoesNotExist:
return 404
# Associate C with B
if "b_ids" in c and c.b_ids is not None:
c.b_ids += b_id
else:
c.b_ids = [b_id]
with transaction.atomic():
a.save()
c.save()
I'd say it's rather straightforward, easy to follow and to understand.
This is what I like to call simple code, at least as simple as it can be in Python, ignoring the kinda ugly try
/except
syntax and so on.
I can read it top to bottom, it fits in one screen in this case, and I know what's happening at any given moment, no surprises.
Code review: We need to follow the architecture rules.
OK.
Let's add some OOP to this, because actually while here we do it, we do in fact need to do these same operations in other parts of the code, so let's DRY a little.
# views.py
def assign_b_to_a_and_c(a_id: str, b_id: str):
# ... same as before ...
# Associate A with B
a.assign_b(b_id)
# ... same as before ...
# Associate C with B
c.add_b(b_id)
# ... same as before ...
# models.py
class A(models.Model):
# ...
def assign_b(self, b_id: str):
# Oops some duplication, whatever
if "b_id" in self and self.b_id is not None:
return
self.b_id = b_id
class C(models.Model):
# ...
def add_b(self, b_id: str):
if "b_ids" in self and self.b_ids is not None:
self.b_ids += b_id
else:
self.b_ids = [b_id]
Hmm... suddenly I can't read the code in one go. And unless I'm familiar with what a.assign_b()
and c.add_b()
do, which as a first-time reader I hope are named correctly, I have to jump a file or two to figure out what's happening.
No biggie, this is normal.
Let's go a bit further to follow the proper architecture rules.
Every time we assign a label, we actually want to save the file, says someone. So when we call a.assign_b()
we are going to also save. Reasonable statement, especially if in most cases this is what we intend to do.
# views.py
def assign_b_to_a_and_c(a_id: str, b_id: str):
# ... same as before ...
# Associate A with B
a.assign_b(b_id)
# ... same as before ...
# Associate C with B
c.add_b(b_id)
# ... same as before ...
# models.py
class A(models.Model):
# ...
def assign_b(self, b_id: str):
# Oops some duplication, whatever
if "b_id" in self and self.b_id is not None:
return
self.b_id = b_id
self.save()
Those of you that are following along will realize, this breaks one of the requirements: this was meant to run as a transaction.
During PR review somebody realizes this and requests that the programmer fixes it.
So he does.
# views.py
def assign_b_to_a_and_c(a_id: str, b_id: str):
try:
a = A.objects.get(id=a_id)
except A.DoesNotExist:
return 404
# Already done, skip
if "b_id" in a and a.b_id is not None:
return 200
# Just make sure it exists
try:
B.objects.get(id=b_id)
except B.DoesNotExist:
return 404
try:
c = C.objects.get(id=a.c_id)
except C.DoesNotExist:
return 404
# Associate C with B
if "b_ids" in c and c.b_ids is not None:
c.b_ids += b_id
else:
c.b_ids = [b_id]
with transaction.atomic():
a.assign_b(b_id)
c.save()
Well obviously that's ugly so let's change c.add_b()
as well:
# views.py
def assign_b_to_a_and_c(a_id: str, b_id: str):
# ... same as before ...
with transaction.atomic():
a.assign_b(b_id)
c.add_b(b_id)
# models.py
# ...
class C(models.Model):
# ...
def add_b(self, b_id: str):
if "b_ids" in self and self.b_ids is not None:
self.b_ids += b_id
else:
self.b_ids = [b_id]
self.save()
OK we're back to a normal scenario, and now things are transactional again.
This is how our code looks right now:
# views.py
def assign_b_to_a_and_c(a_id: str, b_id: str):
try:
a = A.objects.get(id=a_id)
except A.DoesNotExist:
return 404
# Already done, skip
if "b_id" in a and a.b_id is not None:
return 200
# Just make sure it exists
try:
B.objects.get(id=b_id)
except B.DoesNotExist:
return 404
try:
c = C.objects.get(id=a.c_id)
except C.DoesNotExist:
return 404
with transaction.atomic():
a.assign_b(b_id)
c.add_b(b_id)
# models.py
class A(models.Model):
# ...
def assign_b(self, b_id: str):
# Oops some duplication, whatever
if "b_id" in self and self.b_id is not None:
return
self.b_id = b_id
self.save()
class C(models.Model):
# ...
def add_b(self, b_id: str):
if "b_ids" in self and self.b_ids is not None:
self.b_ids += b_id
else:
self.b_ids = [b_id]
self.save()
Beautiful.
The amount of code increased slightly.
The amount of complexity hasn't reduced.
But now we're more properly encapsulated, y'know?
Technically Python doesn't have private methods and private classes, but in this way we at least let the model control the logic of how it expects to work and how it expects its logic to be modified.
It's true that now the code is harder to read, you have to jump around and just know that the self.save()
will happen inside of these methods. But again, encapsulation is a clear win in this case.
Now, this code is a lie.
It's not segregated enough so it cannot be.
A
and C
are actually in two different Domains inside of our Onion Architecture (AKA Hexagonal Architecture, or Clean Architecture, all every similar). And the way you communicate between these layers in an Event-driven Architecture is of course by events!
So here's what we need to do.
Actually this is where my brilliant Python code breaks down because Python doesn't even allow this, because circular dependency graphs are not possible in Python due to execution order.
But for the sake of argument, so you can see how understandable and easy to read and maintain this code is, I give you some theoretical Python code.
BTW I have really worked on codebases this brilliant.
# views.py
def assign_b_to_a_and_c(a_id: str, b_id: str):
try: a = A.objects.get(id=a_id)
except A.DoesNotExist:
return 404
# Already done, skip
if "b_id" in a and a.b_id is not None:
return 200
# Just make sure it exists
try:
B.objects.get(id=b_id)
except B.DoesNotExist:
return 404
# Wow this code is so simple and minimal!
with transaction.atomic():
a.assign_b(b_id)
a.save()
# A/events.py
class AUpdatedEvent():
a: A
def __init__(self, a: A):
self.a = A
class AUpdatedIntegrationEvent():
a: A
def __init__(self, a: A):
self.a = A
# A/handlers.py
class AUpdatedEventHandler():
def handle(self, event: AUpdatedEvent):
# Imagine this function exists
push_to_async_event_bus(AUpdatedIntegrationEvent(event))
# A/models.py
class A(models.Model):
# ...
def assign_b(self, b_id: str):
if "b_id" in self and self.b_id is not None:
return
self.b_id = b_id
self.add_domain_event(AUpdatedEvent(self))
# C/handlers
class AUpdatedIntegrationEventBus():
def handle(self, event: AUpdatedIntegrationEvent):
try:
c = C.objects.get(id=a.c_id)
except C.DoesNotExist:
return 404
c.add_b(a.b_id)
c.save()
# C/models.py
class C(models.Model):
# ...
def add_b(self, b_id: str):
if "b_ids" in self and self.b_ids is not None:
self.b_ids += b_id
else:
self.b_ids = [b_id]
If you can't follow along with this easy to follow code, I'm sorry but, skill issues.
Of course we now violated the idea of simple, and of transactions, and all these things that are actually very useful to us. But in exchange: it's more maintainable and it's segregated and decentralized!
I hope you've realized that this article is a criticism of this kind of code, not a love letter.
This kind of code can only be produced when you've forsaken how the computer actually works (procedurally), and you're enamoured with the idea that code must be "Clean" (capital C, Uncle Bob), OOP, SOLID and so on.
That performance is for the hardware to handle, not for the engineer to handle.
And you've adopted the idea that somehow complexity is simpler to understand and more maintainable than simplicity.
Some people will genuinely argue that the final version is more maintainable.
Now let's get to the actual point of this article.
First, in case you're wondering, this really happened to me, recently even. In the end the transaction was thrown out the window. I'm sure that won't cause any issues ever.
But I don't want anyone to get distracted by the code.
Yes the code these ideologies produce is hard to follow.
But there are good ideologue programmers that will produce good code. But it will be despite the ideology, not because of it.
There are good and bad programmers anywhere and everywhere.
But these ideologies encourage this kind of code indirect and hard to follow code.
They make the programmer's job harder, and the computer's.
Don't think I will leave you just on the negatives!
If you want an alternative, I recommend you slowly load yourself up on the following:
If this post is your first introduction to this idea, welcome!
The name of the idea is Data-Oriented Design as described by Mike Acton, not by Stoyan Nikolov who has a totally different concept.
Love it or hate it, hello to you too.
I sincerely hope some day the software engineering craft can come to think of this as common sense, instead of the excessive accidental complexity we consider normal nowadays.
P.S. if you want to keep the OOP and Clean mindset, then I highly encourage you to at least read Code Complete 2, it is a much more useful resource on the subject of how to actually program, compared to Clean Code.