A Unicode Standard Annex (UAX) forms an integral part of the Unicode Standard, but is published online as a separate document.

In that case, the performance does not depend on the complexity or number of rules.

This annex describes guidelines for determining default segmentation boundaries between certain significant text elements: grapheme clusters (“user-perceived characters”), words, and sentences. This document has been reviewed by Unicode members and other interested parties, and has been approved for publication by the Unicode Consortium.

This is a stable document and may be used as reference material or cited as a normative reference by other specifications.

Each rule consists of a left side, a boundary symbol, and a right side. The left and right sides use the boundary property values in regular expressions.
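The three-part rule structure can be sketched in code. The following is a minimal illustration, not the annex's actual rule set: the property letters (`L`, `N`, `O`), the `prop` function, the `Rule` class, and the three toy rules are all hypothetical. The symbols "×" (no boundary) and "÷" (boundary) are assumed here as the boundary symbols.

```python
import re
from dataclasses import dataclass


def prop(ch: str) -> str:
    """Hypothetical toy property assignment: one letter per character."""
    if ch.isalpha():
        return "L"  # letter
    if ch.isdigit():
        return "N"  # numeric
    return "O"      # other


@dataclass
class Rule:
    left: str    # regex over property letters, must match ending at the position
    symbol: str  # "×" = no boundary here, "÷" = boundary here
    right: str   # regex over property letters, must match starting at the position


# Toy rules (assumptions for illustration), tried in order; the first match wins.
RULES = [
    Rule("L", "×", "L"),  # do not break between two letters
    Rule("N", "×", "N"),  # do not break between two digits
    Rule(".", "÷", "."),  # otherwise, break
]


def is_break(text: str, pos: int) -> bool:
    """Decide whether a boundary falls between text[pos-1] and text[pos]."""
    props = "".join(prop(c) for c in text)
    before, after = props[:pos], props[pos:]
    for rule in RULES:
        left_ok = re.search(f"(?:{rule.left})\\Z", before) is not None
        right_ok = re.match(rule.right, after) is not None
        if left_ok and right_ok:
            return rule.symbol == "÷"
    return True  # no rule applied (for example, at the very start of the text)
```

For the string `"ab12"`, this sketch reports a boundary only at offset 2, between the letters and the digits.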

The regular expression syntax used in the rules is a simplified one. An open-box symbol (“␣”) is used to indicate a space in examples.

Rather, performance is only affected by the number of characters that may match after the boundary position in a rule that applies.

A boundary specification summarizes boundary property values used in that specification, then lists the rules for boundary determinations in terms of those property values.

Rather than searching algorithmically for the text elements themselves (often called segments), it is simpler and more useful to detect the boundaries (or breaks) between those text elements.
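The boundary-first approach can be illustrated with a short sketch: collect the boundary offsets first, then slice the text between consecutive offsets to recover the segments. The function names and the `toy_break` predicate below are hypothetical and stand in for a real boundary specification.

```python
from typing import Callable, List


def boundaries(text: str, is_break: Callable[[str, int], bool]) -> List[int]:
    """Return boundary offsets, including the start and end of the text."""
    if not text:
        return []
    inner = [i for i in range(1, len(text)) if is_break(text, i)]
    return [0, *inner, len(text)]


def segments(text: str, offsets: List[int]) -> List[str]:
    """Slice the text between each pair of consecutive boundary offsets."""
    return [text[a:b] for a, b in zip(offsets, offsets[1:])]


def toy_break(text: str, i: int) -> bool:
    """Hypothetical predicate: break wherever whitespace starts or stops."""
    return text[i - 1].isspace() != text[i].isspace()
```

With this toy predicate, `boundaries("ab cd", toy_break)` gives `[0, 2, 3, 5]`, and `segments` turns those offsets into `["ab", " ", "cd"]`. Keeping the boundary predicate separate from the slicing step means the same driver can serve grapheme cluster, word, or sentence boundaries by swapping the predicate.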