Soft-wrapping a Piece Table to draw with Raylib, in Odin

Oct 23, 2024

Version 1 of this post.

Problem definition:

We have a Piece Table
We need to draw its text on the screen
- We need to soft wrap it
- We use Raylib to draw our text

Let's work through that step by step.

Note 1: Imagine that ¶ is \n every time I mention it, like an actual newline in the text buffer. And imagine that § is also \n but it was inserted purely for soft wrapping, it's not actually part of the original string buffer, but it is part of the string buffer handed to Raylib.

Note 2: We'll only be tackling monospaced text.

Note 3: All code shared in this post, meaning only the code on this post, not the actual source code, is licensed under CC0 for the purpose of this post. Similarly all graphics are licensed under CC0.

Note 4: I will not go into incredibly detailed explanations of code. You can read it yourself and figure it out yourself. The code itself is not as interesting as the problem/solution definition.

Note 5: I work in runes/characters for this code, not bytes.

Note 6: No AI was used to write this or figure this out. AI doesn't know Odin, first of all, and second it's a fun exercise. And I can't copyright it if AI writes it.

What is a Piece Table

It is a data structure that is popular for text editors. See Resources section at bottom.

Its main features include:

Its memory usage is not much larger than the file its storing
It is not difficult to insert text in multiple locations at once
Inserting or removing text is rather cheap and does not require rearranging entire blocks of memory, unlike e.g. a cursor gap where you have to rearrange things just because the cursor moved

That's about it.

Its characteristic design is that you keep track of two buffers, the "original" and the "add" buffer, and you keep track of which part of the buffer comes when using "pieces", and you just walk through the pieces, grabbing from the orig/add buffer as appropriate, the piece's length at the piece's offset.

I'm using it cause I'm a cool boy.

The Piece Table is not the main focus of this post however, the rest of it is.

Why we mention the Piece Table

Because the algorithm for putting together the Piece Table's content is actually rather straightforward:

pieces_builder := strings.builder_make()
for piece in table^.pieces {
	if piece.length == 0 {
		continue
	}
	content := strings.cut(piece.is_orig ? table.orig : table.add_str, piece.start, piece.length)
	strings.write_string(&pieces_builder, content)
}
content := strings.to_string(pieces_builder)

As you can see that'd build just a flat, straightforward string.

Lorem ip¶sum

OK, next problem.

We need to draw the Piece Table's text on the screen

OK good we have our content: string now, fantastic. And its value is:

Lorem ip¶sum

Making that display correctly in Raylib is actually rather straightforward. Just draw it. Raylib handles newlines for us.

I'll show you some extra setup code to get your own font in there too (only for TTF fonts):

font_size :: 24
// Need relative path to font from where Odin build command is being run
font := rl.LoadFontEx("resources/UbuntuMono-R.ttf", cast(i32)font_size, nil, 0)
defer rl.UnloadFont(font)
rl.SetTextureFilter(font.texture, .BILINEAR)
rl.SetTextLineSpacing(cast(i32)font_size)

char_spacing :: 1
char_measurement := rl.MeasureTextEx(font, "a", cast(f32)font_size, cast(f32)char_spacing)
char_height := char_measurement.y
char_width := char_measurement.x

// ...

rl.BeginDrawing()

rl.DrawTextEx(font, strings.clone_to_cstring(content), rl.Vector2{0, 0}, cast(f32)char_height, cast(f32)char_spacing, rl.DARKGRAY)

rl.EndDrawing()

The problem is it overflows

The issue is that it overflows. It doesn't wrap. And we haven't implemented scrolling yet so it's even worse, you can't even read the text.

So that's the issue.

Now for the rest of this article I'm going to assume that you want to force soft wrapping. Of course if you don't want to then you have to do your own logic to enable/disable it, and if you want to do hard wrapping then your problem is less complex as well.

Soft wrapping constraints and details

And here we go, this is the juicy part.

Now let's talk about the constraints of the solution.

The soft wrapping must be done through newlines in the string you render. This is the main constraint.

Why?

Because I don't want to call Raylib's draw 10x times for 10 lines, I want to let Raylib optimize that part out, so I just call it once and let it do the rest. Move logic out of the drawing process.

Understanding our data

Let's be Data-Oriented.

If you don't understand the data, you don't understand the problem. Conversely, you understand the problem by understanding the data. If you have different data, you have a different problem. ― Excerpt from Mike Acton at CppCon 2014 "Data-Oriented Design and C++"

I was reminded of this quote repeatedly throughout the R&D process of what I will describe to you now, and I wish I was severely more strict about it.

What is our input?

Lorem ipsum, as put together by the above Piece Table algorithm.

Lorem ipsum dolor sit amet,¶consectetur adipiscing elit.

Please note, I don't insert a newline on purpose. We must think of this as a contiguous string.

When we draw the above through Raylib, we'll get:

And there's one more input, the cursor_offset. Where is the cursor placed, on the whole contiguous string, as an offset.

What is our desired output?

So let's take the above Lor¶em and say the cursor_offset is 2, what do we want to be rendered?

If we were to render the cursor taking up the whole space it's actually taking up, we'd use a full block and it'd look as follows:

Because of course the cursor_offset = 2 refers to the rune r in this string.

cursor_offset = 3

cursor_offset = 4

OK but what about soft wrap?

Patience. Setting the scene up.

OK let's take a longer contiguous string as an example.

But this is too long for our screen. Our screen can only fit 4 characters at a time.

window_width :: 400
max_runes_per_line := int(window_width / (char_width + char_spacing)) - 1
// max_runes_per_line := 4

Then we need our program to produce a string like the following:

So Raylib can render the following, and it fits on the screen:

(Again note that ¶ is for real line breaks, and § is for line breaks we've inserted for the purpose of soft wrapping. In the resulting string buffer they're both \n so that Raylib can render it properly.)

How should `cursor_offset` interact with soft wrapping?

Effectively it should be as if the § didn't exist.

So the cursor rendering should be as follows:

cursor_offset = 3

cursor_offset = 5 (notice ¶ are still real)

cursor_offset = 8

cursor_offset = 9

In other words the soft wrap doesn't affect the data, it only affects the rendering algorithm.

Soft wrapping algorithms we'll need

There's actually 3-4 distinct algorithms we need to make this work.

Inserting soft wrapping into the long contiguous string
Figuring out which line breaks are real, and which are soft wrap
Translating a cursor_offset on the real content, to a cursor_offset on the soft-wrapped content
Translating a cursor_offset on the soft-wrapped content to coordinates on the screen

You can combine 1-2 if you're clever, maybe you'll even get better performance. I'm not clever enough, but I tried once.

So let's get started.

Inserting soft wrap line breaks

We want this:

To turn into this:

That's the goal.

max_runes_per_line := 3

wrapped_builder := strings.builder_make()
for line, i in strings.split_lines(content) {
	remaining := line
	parts := [dynamic]string{}

	for {
		if len(remaining) == 0 {
			break
		}

		cut := strings.cut(remaining, 0, max_runes_per_line)

		append(&parts, cut)
		remaining = strings.cut(remaining, utf8.rune_count(cut))
	}

	strings.write_string(&wrapped_builder, strings.join(parts[:], "\n"))
	strings.write_string(&wrapped_builder, "\n")
}
wrapped_content := strings.to_string(wrapped_builder)

In short we split the content into lines and loop through them. We keep track of the remaining string, which is just whatever hasn't yet been consumed from the line by the algorithm, and we create a parts dynamic array which later we join with \n (the wrap line breaks). We add an extra \n to account for the line break we split into lines with to begin with.

Word break

If you wanted to wrap by word break, not just a hard break at the nth rune, then you can add the following to the algorithm:

// after this line:
cut := strings.cut(remaining, 0, max_runes_per_line)

// If after cutting we are at the limit, we try to find a word
// break that might be more suitable for the wrap.
if utf8.rune_count(cut) >= max_runes_per_line {
	word_break_i := 0
	#reverse for r, i in cut {
		if r == ' ' {
			word_break_i = i
			break
		}
	}
	// If we do find it, we take it.
	if word_break_i > 0 {
		cut = strings.cut(cut, 0, word_break_i)
	}
}

**Note: my code is with the above word wrapping, which means some of the algorithm choices WILL be affected, primarily on the last two algorithms to figure out where to draw the cursor. The drawings and my explanations don't include or think about the word wrapping.

Identifying real and soft line breaks

So you could integrate this into the algorithm above somehow, but I just wasn't able to do it very well, so I split it into its own algorithm.

Essentially what we want is to produce 4 distinct arrays with this algorithm. The names are what they are.

wrap_offsets = offset for each §
real_offsets = offset for each ¶ BEFORE any soft wrapping
real_offsets_with_wrap = as the name suggests, offset for each ¶ AFTER soft wrapping
all_offsets = all the offsets, for both ¶ and §

Calculating `real_offsets` and `all_offsets`

Both can be calculated naively by just counting line breaks in a given string.

indexes :: proc(str: string, char: rune) -> []int {
	offsets := [dynamic]int{}
	for r, i in str {
		if r == char {
			append(&offsets, i)
		}
	}
	return offsets[:]
}

// ...

real_offsets := indexes(content, '\n')
all_offsets := indexes(wrapped_content, '\n')

Calculating `wrap_offsets` and `real_offsets_with_wrap`

The algorithm itself is actually quite simple/naive.

wrap_offsets := [dynamic]int{}
real_offsets_with_wrap := [dynamic]int{}

current_real_offset := 0
wraps_count := 0

for offset in all_offsets {
	// Finished
	if current_real_offset >= len(real_offsets) {
		// TODO doesn't calculate § after last ¶
		break
	}
	// Offset minus wrap offsets we've run across = real offset
	// If we're at a real offset
	if offset - wraps_count == real_offsets[current_real_offset] {
		// ..then it's a real line
		append(&real_offsets_with_wrap, offset)
		current_real_offset += 1
	} else {
		// ..otherwise it's a wrap line
		append(&wrap_offsets, offset)
		wraps_count += 1
	}
}

In short, put in human terms, we loop through all the offsets, and if we figure out it's a § then we wraps_count++, if we figure out it's a ¶ then we look for the next ¶. As we find each, we add them to wrap_offsets or real_offsets_with_wrap.

Loop 1:

Loop 2:

Loop 3:

Psych! I haven't programmed this yet. But probably I'd just add all the remainders as §.

Translating the `cursor_offset` from before to after soft wrapping

From this:

To this:

How do we do it?

// Find line the cursor is on
lb_offset := -1
for offset, i in real_offsets {
	// Find next one (easier check)
	if offset > cursor_offset {
		if i > 0 {
			// Get thus previous one (current one)
			lb_offset = i-1
		}
		break
	}
}

lb := lb_offset == -1 ? 0 : real_offsets[lb_offset]
adjusted_lb := lb_offset == -1 ? 0 : real_offsets_with_wrap[lb_offset]

// Now we just need to find how many line breaks have been thus far, and add those to the adjusted_cursor_offset

adjusted_cursor_offset := lb_offset == -1 ? cursor_offset : cursor_offset + (adjusted_lb - lb)

// Cool, now we need to find out if there are any wrap line breaks in between the adjusted_lb and the adjusted_cursor_offset

for wrap_lb, i in table.table_to_draw.wrap_offsets {
	// If it's in between adjusted_lb <-> adjusted_cursor_offset
	if wrap_lb > adjusted_lb && wrap_lb < adjusted_cursor_offset {
		adjusted_cursor_offset += 1
	} else if wrap_lb >= adjusted_cursor_offset {
		break
	}
}

In short:

Find the array offset of the last ¶ the cursor_offset has encountered.
Translate that into the equivalent ¶ adjusted for § (§+¶)
We find out how many § there are in §+¶, and add those to the cursor_offset (now adjusted_cursor_offset)
And finally, between §+¶ and the adjusted_cursor_offset, are there any other §? ++ adjusted_cursor_offset per each one we find

Dunkin' easy. (Only took me 16 hours to realize that the solution was this simple and implement it.)

Translating the `adjusted_cursor_offset` to screen coordinates

Note again: the screen coordinates are monospaced.

So we want to go from this:

To this:

The code is actually rather simple, I discovered this simpler version in the middle of the development process, but then had to rediscover the correct solution and approach to all of the previous steps.

for offset, i in all_offsets {
	// Find next one (easier check)
	if offset > adjusted_cursor_offset {
		if i == 0 {
			line_offset = 0
			char_offset = cursor_offset
		} else if i > 0 {
			offset := all_offsets[i-1]
			line_offset = i // (i-1, the one we want) + 1
			char_offset = adjusted_cursor_offset - offset - 1
		}
		// -1 means that the cursor is "on" a § or a ¶
		if char_offset == -1 {
			line_offset -= 1
			prev_offset := i == 1 ? -1 : all_offsets[i-2]
			char_offset = adjusted_cursor_offset - prev_offset - 1
		}
		break
	}
}

return line_offset, char_offset // y and x coordinate offsets

I find the code rather self-explanatory, but the key points are:

We use all_offsets, because for drawing you don't care if it's § or ¶, you just care that it's a line break.
Remember the cursor_offset or adjusted_cursor_offset are the cursor's offset on a long contiguous string, which means that to know the X coordinate we just need to deduct the offset of the last § or ¶ in the content, from the cursor's offset, and we get the X coordinate.
We - 1 it because I don't know why, and honestly I don't care at this point. Not gonna lie to you.

The result

And so you get something like this https://x.com/textisenough/status/1848429885464711278

Beautiful.

Caveats

This doesn't cover some cases very well. Off the top of my head I can think of:

If word wrapping is not possible, like the single word on the line is longer than the screen, the cursor coordinate algorithm probably doesn't account for this very well for this.
As mentioned before, it doesn't properly count § after the last ¶, but that should be a simple fix.

Closing notes

This was a lot of frustrating fun. It was literally 16 hours of work to write and rewrite this code while figuring it out. Maybe longer.

Had to write and rewrite primarily because I didn't properly define the problem before I tackled it, unlike how I did in this post.

This is still not ideal or final implementation. It's actually missing stuff, and it hasn't been refined for user experience, but hopefully this serves as the heavy lifting for anyone out there figuring this stuff out on their own.

Resources

Piece Table:

Wikipedia page
Text Buffer Reimplementation a story of how Visual Studio Code implemented its data structure using a Piece Table
Text Editor Data Structures on invoke::thought()

Images were all created with Excalidraw, using the Excalidraw extension for Obsidian.