I make life easier, that is to say I've been writing software for 10+ years. Eschew hype; focus on delivery and performance.
Living in Switzerland 🇨🇭 since 2017.
I make life easier, that is to say I've been writing software for 10+ years. Eschew hype; focus on delivery and performance.
Living in Switzerland 🇨🇭 since 2017.
Version 1 of this post.
Problem definition:
Let's work through that step by step.
Note 1: Imagine that ¶ is \n
every time I mention it, like an actual newline in the text buffer. And imagine that § is also \n
but it was inserted purely for soft wrapping, it's not actually part of the original string buffer, but it is part of the string buffer handed to Raylib.
Note 2: We'll only be tackling monospaced text.
Note 3: All code shared in this post, meaning only the code on this post, not the actual source code, is licensed under CC0 for the purpose of this post. Similarly all graphics are licensed under CC0.
Note 4: I will not go into incredibly detailed explanations of code. You can read it yourself and figure it out yourself. The code itself is not as interesting as the problem/solution definition.
Note 5: I work in runes/characters for this code, not bytes.
Note 6: No AI was used to write this or figure this out. AI doesn't know Odin, first of all, and second it's a fun exercise. And I can't copyright it if AI writes it.
It is a data structure that is popular for text editors. See Resources section at bottom.
Its main features include:
That's about it.
Its characteristic design is that you keep track of two buffers, the "original" and the "add" buffer, and you keep track of which part of the buffer comes when using "pieces", and you just walk through the pieces, grabbing from the orig/add buffer as appropriate, the piece's length at the piece's offset.
I'm using it cause I'm a cool boy.
The Piece Table is not the main focus of this post however, the rest of it is.
Because the algorithm for putting together the Piece Table's content is actually rather straightforward:
pieces_builder := strings.builder_make()
for piece in table^.pieces {
if piece.length == 0 {
continue
}
content := strings.cut(piece.is_orig ? table.orig : table.add_str, piece.start, piece.length)
strings.write_string(&pieces_builder, content)
}
content := strings.to_string(pieces_builder)
As you can see that'd build just a flat, straightforward string.
Lorem ip¶sum
OK, next problem.
OK good we have our content: string
now, fantastic. And its value is:
Lorem ip¶sum
Making that display correctly in Raylib is actually rather straightforward. Just draw it. Raylib handles newlines for us.
I'll show you some extra setup code to get your own font in there too (only for TTF fonts):
font_size :: 24
// Need relative path to font from where Odin build command is being run
font := rl.LoadFontEx("resources/UbuntuMono-R.ttf", cast(i32)font_size, nil, 0)
defer rl.UnloadFont(font)
rl.SetTextureFilter(font.texture, .BILINEAR)
rl.SetTextLineSpacing(cast(i32)font_size)
char_spacing :: 1
char_measurement := rl.MeasureTextEx(font, "a", cast(f32)font_size, cast(f32)char_spacing)
char_height := char_measurement.y
char_width := char_measurement.x
// ...
rl.BeginDrawing()
rl.DrawTextEx(font, strings.clone_to_cstring(content), rl.Vector2{0, 0}, cast(f32)char_height, cast(f32)char_spacing, rl.DARKGRAY)
rl.EndDrawing()
The issue is that it overflows. It doesn't wrap. And we haven't implemented scrolling yet so it's even worse, you can't even read the text.
So that's the issue.
Now for the rest of this article I'm going to assume that you want to force soft wrapping. Of course if you don't want to then you have to do your own logic to enable/disable it, and if you want to do hard wrapping then your problem is less complex as well.
And here we go, this is the juicy part.
Now let's talk about the constraints of the solution.
The soft wrapping must be done through newlines in the string you render. This is the main constraint.
Why?
Because I don't want to call Raylib's draw 10x times for 10 lines, I want to let Raylib optimize that part out, so I just call it once and let it do the rest. Move logic out of the drawing process.
Let's be Data-Oriented.
If you don't understand the data, you don't understand the problem. Conversely, you understand the problem by understanding the data. If you have different data, you have a different problem. ― Excerpt from Mike Acton at CppCon 2014 "Data-Oriented Design and C++"
I was reminded of this quote repeatedly throughout the R&D process of what I will describe to you now, and I wish I was severely more strict about it.
Lorem ipsum, as put together by the above Piece Table algorithm.
Lorem ipsum dolor sit amet,¶consectetur adipiscing elit.
Please note, I don't insert a newline on purpose. We must think of this as a contiguous string.
When we draw the above through Raylib, we'll get:
And there's one more input, the cursor_offset
. Where is the cursor placed, on the whole contiguous string, as an offset.
So let's take the above Lor¶em
and say the cursor_offset
is 2
, what do we want to be rendered?
If we were to render the cursor taking up the whole space it's actually taking up, we'd use a full block and it'd look as follows:
Because of course the cursor_offset = 2
refers to the rune r
in this string.
cursor_offset = 3
cursor_offset = 4
Patience. Setting the scene up.
OK let's take a longer contiguous string as an example.
But this is too long for our screen. Our screen can only fit 4 characters at a time.
window_width :: 400
max_runes_per_line := int(window_width / (char_width + char_spacing)) - 1
// max_runes_per_line := 4
Then we need our program to produce a string like the following:
So Raylib can render the following, and it fits on the screen:
(Again note that ¶ is for real line breaks, and § is for line breaks we've inserted for the purpose of soft wrapping. In the resulting string buffer they're both \n
so that Raylib can render it properly.)
cursor_offset
interact with soft wrapping?Effectively it should be as if the § didn't exist.
So the cursor rendering should be as follows:
cursor_offset = 3
cursor_offset = 5
(notice ¶ are still real)
cursor_offset = 8
cursor_offset = 9
In other words the soft wrap doesn't affect the data, it only affects the rendering algorithm.
There's actually 3-4 distinct algorithms we need to make this work.
cursor_offset
on the real content, to a cursor_offset
on the soft-wrapped contentcursor_offset
on the soft-wrapped content to coordinates on the screenYou can combine 1-2 if you're clever, maybe you'll even get better performance. I'm not clever enough, but I tried once.
So let's get started.
We want this:
To turn into this:
That's the goal.
max_runes_per_line := 3
wrapped_builder := strings.builder_make()
for line, i in strings.split_lines(content) {
remaining := line
parts := [dynamic]string{}
for {
if len(remaining) == 0 {
break
}
cut := strings.cut(remaining, 0, max_runes_per_line)
append(&parts, cut)
remaining = strings.cut(remaining, utf8.rune_count(cut))
}
strings.write_string(&wrapped_builder, strings.join(parts[:], "\n"))
strings.write_string(&wrapped_builder, "\n")
}
wrapped_content := strings.to_string(wrapped_builder)
In short we split the content into lines and loop through them. We keep track of the remaining string, which is just whatever hasn't yet been consumed from the line by the algorithm, and we create a parts dynamic array which later we join with \n
(the wrap line breaks). We add an extra \n
to account for the line break we split into lines with to begin with.
If you wanted to wrap by word break, not just a hard break at the nth rune, then you can add the following to the algorithm:
// after this line:
cut := strings.cut(remaining, 0, max_runes_per_line)
// If after cutting we are at the limit, we try to find a word
// break that might be more suitable for the wrap.
if utf8.rune_count(cut) >= max_runes_per_line {
word_break_i := 0
#reverse for r, i in cut {
if r == ' ' {
word_break_i = i
break
}
}
// If we do find it, we take it.
if word_break_i > 0 {
cut = strings.cut(cut, 0, word_break_i)
}
}
**Note: my code is with the above word wrapping, which means some of the algorithm choices WILL be affected, primarily on the last two algorithms to figure out where to draw the cursor. The drawings and my explanations don't include or think about the word wrapping.
So you could integrate this into the algorithm above somehow, but I just wasn't able to do it very well, so I split it into its own algorithm.
Essentially what we want is to produce 4 distinct arrays with this algorithm. The names are what they are.
real_offsets
and all_offsets
Both can be calculated naively by just counting line breaks in a given string.
indexes :: proc(str: string, char: rune) -> []int {
offsets := [dynamic]int{}
for r, i in str {
if r == char {
append(&offsets, i)
}
}
return offsets[:]
}
// ...
real_offsets := indexes(content, '\n')
all_offsets := indexes(wrapped_content, '\n')
wrap_offsets
and real_offsets_with_wrap
The algorithm itself is actually quite simple/naive.
wrap_offsets := [dynamic]int{}
real_offsets_with_wrap := [dynamic]int{}
current_real_offset := 0
wraps_count := 0
for offset in all_offsets {
// Finished
if current_real_offset >= len(real_offsets) {
// TODO doesn't calculate § after last ¶
break
}
// Offset minus wrap offsets we've run across = real offset
// If we're at a real offset
if offset - wraps_count == real_offsets[current_real_offset] {
// ..then it's a real line
append(&real_offsets_with_wrap, offset)
current_real_offset += 1
} else {
// ..otherwise it's a wrap line
append(&wrap_offsets, offset)
wraps_count += 1
}
}
In short, put in human terms, we loop through all the offsets, and if we figure out it's a § then we wraps_count++
, if we figure out it's a ¶ then we look for the next ¶. As we find each, we add them to wrap_offsets
or real_offsets_with_wrap
.
Loop 1:
Loop 2:
Loop 3:
Psych! I haven't programmed this yet. But probably I'd just add all the remainders as §.
cursor_offset
from before to after soft wrappingFrom this:
To this:
How do we do it?
// Find line the cursor is on
lb_offset := -1
for offset, i in real_offsets {
// Find next one (easier check)
if offset > cursor_offset {
if i > 0 {
// Get thus previous one (current one)
lb_offset = i-1
}
break
}
}
lb := lb_offset == -1 ? 0 : real_offsets[lb_offset]
adjusted_lb := lb_offset == -1 ? 0 : real_offsets_with_wrap[lb_offset]
// Now we just need to find how many line breaks have been thus far, and add those to the adjusted_cursor_offset
adjusted_cursor_offset := lb_offset == -1 ? cursor_offset : cursor_offset + (adjusted_lb - lb)
// Cool, now we need to find out if there are any wrap line breaks in between the adjusted_lb and the adjusted_cursor_offset
for wrap_lb, i in table.table_to_draw.wrap_offsets {
// If it's in between adjusted_lb <-> adjusted_cursor_offset
if wrap_lb > adjusted_lb && wrap_lb < adjusted_cursor_offset {
adjusted_cursor_offset += 1
} else if wrap_lb >= adjusted_cursor_offset {
break
}
}
In short:
cursor_offset
has encountered.cursor_offset
(now adjusted_cursor_offset
)adjusted_cursor_offset
, are there any other §? ++ adjusted_cursor_offset
per each one we findDunkin' easy. (Only took me 16 hours to realize that the solution was this simple and implement it.)
adjusted_cursor_offset
to screen coordinatesNote again: the screen coordinates are monospaced.
So we want to go from this:
To this:
The code is actually rather simple, I discovered this simpler version in the middle of the development process, but then had to rediscover the correct solution and approach to all of the previous steps.
for offset, i in all_offsets {
// Find next one (easier check)
if offset > adjusted_cursor_offset {
if i == 0 {
line_offset = 0
char_offset = cursor_offset
} else if i > 0 {
offset := all_offsets[i-1]
line_offset = i // (i-1, the one we want) + 1
char_offset = adjusted_cursor_offset - offset - 1
}
// -1 means that the cursor is "on" a § or a ¶
if char_offset == -1 {
line_offset -= 1
prev_offset := i == 1 ? -1 : all_offsets[i-2]
char_offset = adjusted_cursor_offset - prev_offset - 1
}
break
}
}
return line_offset, char_offset // y and x coordinate offsets
I find the code rather self-explanatory, but the key points are:
all_offsets
, because for drawing you don't care if it's § or ¶, you just care that it's a line break.cursor_offset
or adjusted_cursor_offset
are the cursor's offset on a long contiguous string, which means that to know the X coordinate we just need to deduct the offset of the last § or ¶ in the content, from the cursor's offset, and we get the X coordinate.- 1
it because I don't know why, and honestly I don't care at this point. Not gonna lie to you.And so you get something like this https://x.com/textisenough/status/1848429885464711278
Beautiful.
This doesn't cover some cases very well. Off the top of my head I can think of:
This was a lot of frustrating fun. It was literally 16 hours of work to write and rewrite this code while figuring it out. Maybe longer.
Had to write and rewrite primarily because I didn't properly define the problem before I tackled it, unlike how I did in this post.
This is still not ideal or final implementation. It's actually missing stuff, and it hasn't been refined for user experience, but hopefully this serves as the heavy lifting for anyone out there figuring this stuff out on their own.
Piece Table:
Images were all created with Excalidraw, using the Excalidraw extension for Obsidian.