
Line Break Behavior Details

The following table enumerates the line breaking semantics of all characters in a manner consistent with CSS3 Text.

The columns of the table are defined as follows:

  • Code - Unicode code point (in hexadecimal); the value - denotes all other characters not explicitly listed
  • UAX14 - Line breaking class assigned by UAX14
  • ICU() - Behavior implemented by ICU when the primary language subtag of the locale is not specified, is not 'ja', or is otherwise not explicitly supported
  • ICU(ja) - Behavior implemented by ICU when the primary language subtag of the locale is 'ja' (or equivalent)
  • Loose() - Behavior prescribed by CSS3 when line-break is loose and content language is not Chinese, Japanese, or Korean
  • Loose(cjk) - Behavior prescribed by CSS3 when line-break is loose and content language is Chinese, Japanese, or Korean
  • Normal() - Behavior prescribed by CSS3 when line-break is normal and content language is not Chinese, Japanese, or Korean
  • Normal(cjk) - Behavior prescribed by CSS3 when line-break is normal and content language is Chinese, Japanese, or Korean
  • Strict() - Behavior prescribed by CSS3 when line-break is strict and content language is not Chinese, Japanese, or Korean
  • Strict(cjk) - Behavior prescribed by CSS3 when line-break is strict and content language is Chinese, Japanese, or Korean
  • Character Name - Unicode character name
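Most column conditions above depend only on whether the content language is Chinese, Japanese, or Korean, and the two ICU columns depend on whether the locale's primary language subtag is 'ja'. The following is a minimal Python sketch of those two tests. It assumes BCP 47-style tags such as "ja-JP" or "zh_Hant_TW" and treats only the primary subtags zh, ja, and ko as CJK. The helper names are hypothetical, not WebKit or ICU APIs, and the "(or equivalent)" Japanese tags are not modeled.

```python
# Hypothetical helpers (not part of WebKit or ICU) showing how a content
# language tag selects between the plain and (cjk)/(ja) table columns.

# Assumption: only these primary subtags count as Chinese, Japanese, Korean.
CJK_PRIMARY_SUBTAGS = {"zh", "ja", "ko"}


def primary_subtag(lang_tag: str) -> str:
    """Return the lowercased primary language subtag of a BCP 47-style tag."""
    # Accept both '-' and '_' as separators (e.g. 'ja-JP', 'zh_Hant_TW').
    return lang_tag.replace("_", "-").split("-", 1)[0].lower()


def is_cjk(lang_tag: str) -> bool:
    """True if the content language is Chinese, Japanese, or Korean."""
    return primary_subtag(lang_tag) in CJK_PRIMARY_SUBTAGS


def icu_column(lang_tag: str) -> str:
    """Pick ICU(ja) when the primary subtag is 'ja', otherwise ICU()."""
    # Equivalent Japanese tags (e.g. three-letter codes) are not handled here.
    return "ICU(ja)" if primary_subtag(lang_tag) == "ja" else "ICU()"
```

An unspecified language (an empty tag) falls through to the non-CJK columns and to ICU(), which matches the ICU() definition above.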

The values of the UAX14 column designate a subset of the line breaking classes defined by UAX14, as follows:

  • BA - Break After
  • CJ - Conditional Japanese Starter
  • EX - Exclamation/Interrogation
  • IN - Inseparable
  • IS - Infix Numeric Separator
  • NS - Nonstarters
  • PO - Postfix Numeric
  • PR - Prefix Numeric
  • - - as otherwise defined by UAX14

The values of the ICU() through Strict(cjk) columns designate the following breaking behavior:

  • A - break permitted after
  • B - break permitted before
  • B/A - break permitted before or after
  • XA - break excluded after
  • XB - break excluded before
  • XP - break excluded between any pair of adjacent characters in the same class
  • - - as otherwise defined by default line breaking behavior
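The values above constrain either the position before a character, the position after it, or the position between two characters of the same class. The following is a minimal Python sketch of how one break position could be resolved from the values of the two characters around it in the selected column. The function name is hypothetical, and the rule that an exclusion on either side overrides a permission on the other is an assumption: this page does not say how such conflicts are resolved.

```python
from typing import Optional


def break_allowed(prev_value: str, next_value: str,
                  prev_class: str, next_class: str) -> Optional[bool]:
    """Resolve the break position between two adjacent characters.

    prev_value/next_value are table values (A, B, B/A, XA, XB, XP, or -)
    from the selected column; prev_class/next_class are their UAX14 classes.
    Returns True if a break is permitted, False if it is excluded, and None
    when neither value constrains this position, so the default line
    breaking behavior decides.
    """
    # XP: no break between two XP characters of the same class.
    if prev_value == "XP" and next_value == "XP" and prev_class == next_class:
        return False
    # Assumption (not stated on this page): exclusions outrank permissions.
    if prev_value == "XA" or next_value == "XB":
        return False
    # A or B/A permits a break after; B or B/A permits a break before.
    if prev_value in ("A", "B/A") or next_value in ("B", "B/A"):
        return True
    # '-' on both sides (or an unpaired XP): defer to default behavior.
    return None
```

For example, under Normal() the two dot leader (U+2025) and the ellipsis (U+2026) are both XP in class IN, so break_allowed("XP", "XP", "IN", "IN") excludes a break between them.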

    ISSUE: Need to verify behavior for U+2010 (hyphen) and U+2013 (en dash) below. UAX14 classifies both as BA (break permitted after) but does not address break-before behavior, whereas CSS3 Text prescribes certain break-before behavior but no break-after behavior.

Code UAX14 ICU() ICU(ja) Loose() Loose(cjk) Normal() Normal(cjk) Strict() Strict(cjk) Character Name
0021 EX XB XB - B/A - XB - XB exclamation mark
0024 PR XA XA - B/A - XA - XA dollar sign
0025 PO XB XB - B/A - XB - XB percent sign
003A IS XB XB - B/A - XB - XB colon
003B IS XB XB - B/A - XB - XB semicolon
003F EX XB XB - B/A - XB - XB question mark
00A2 PO XB XB - B/A - XB - XB cent sign
00A3 PR XA XA - B/A - XA - XA pound sign
00A5 PR XA XA - B/A - XA - XA yen sign
00B0 PO XB XB - B/A - XB - XB degree sign
2010 BA A A - B/A - B/A - XB hyphen
2013 BA A A - B/A - B/A - XB en dash
2025 IN XP XP B/A B/A XP XP XP XP two dot leader
2026 IN XP XP B/A B/A XP XP XP XP horizontal ellipsis
2030 PO XB XB - B/A - XB - XB per mille sign
2032 PO XB XB - B/A - XB - XB prime
2033 PO XB XB - B/A - XB - XB double prime
203C NS XB XB - B/A - XB - XB double exclamation mark
2047 NS XB XB - B/A - XB - XB double question mark
2048 NS XB XB - B/A - XB - XB question exclamation mark
2049 NS XB XB - B/A - XB - XB exclamation question mark
20AC PR XA XA - B/A - XA - XA euro sign
2103 PO XB XB - B/A - XB - XB degree celsius
2116 PR XA XA - B/A - XA - XA numero sign
3005 NS XB XB B/A B/A XB XB XB XB ideographic iteration mark
301C NS XB XB - B/A - B/A - XB wave dash
303B NS XB XB B/A B/A XB XB XB XB vertical ideographic iteration mark
3041 CJ XB B/A B/A B/A B/A B/A XB XB hiragana letter small a
3043 CJ XB B/A B/A B/A B/A B/A XB XB hiragana letter small i
3045 CJ XB B/A B/A B/A B/A B/A XB XB hiragana letter small u
3047 CJ XB B/A B/A B/A B/A B/A XB XB hiragana letter small e
3049 CJ XB B/A B/A B/A B/A B/A XB XB hiragana letter small o
3063 CJ XB B/A B/A B/A B/A B/A XB XB hiragana letter small tu
3083 CJ XB B/A B/A B/A B/A B/A XB XB hiragana letter small ya
3085 CJ XB B/A B/A B/A B/A B/A XB XB hiragana letter small yu
3087 CJ XB B/A B/A B/A B/A B/A XB XB hiragana letter small yo
308E CJ XB B/A B/A B/A B/A B/A XB XB hiragana letter small wa
3095 CJ XB B/A B/A B/A B/A B/A XB XB hiragana letter small ka
3096 CJ XB B/A B/A B/A B/A B/A XB XB hiragana letter small ke
309D NS XB XB B/A B/A XB XB XB XB hiragana iteration mark
309E NS XB XB B/A B/A XB XB XB XB hiragana voiced iteration mark
30A0 NS XB XB - B/A - B/A - XB katakana-hiragana double hyphen
30A1 CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small a
30A3 CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small i
30A5 CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small u
30A7 CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small e
30A9 CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small o
30C3 CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small tu
30E3 CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small ya
30E5 CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small yu
30E7 CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small yo
30EE CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small wa
30F5 CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small ka
30F6 CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small ke
30FB NS XB XB - B/A - XB - XB katakana middle dot
30FC CJ XB B/A B/A B/A B/A B/A XB XB katakana-hiragana prolonged sound mark
30FD NS XB XB B/A B/A XB XB XB XB katakana iteration mark
30FE NS XB XB B/A B/A XB XB XB XB katakana voiced iteration mark
31F0 CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small ku
31F1 CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small si
31F2 CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small su
31F3 CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small to
31F4 CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small nu
31F5 CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small ha
31F6 CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small hi
31F7 CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small hu
31F8 CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small he
31F9 CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small ho
31FA CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small mu
31FB CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small ra
31FC CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small ri
31FD CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small ru
31FE CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small re
31FF CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small ro
FF01 EX XB XB - B/A - XB - XB fullwidth exclamation mark
FF04 PR XA XA - B/A - XA - XA fullwidth dollar sign
FF05 PO XB XB - B/A - XB - XB fullwidth percent sign
FF1A NS XB XB - B/A - XB - XB fullwidth colon
FF1B NS XB XB - B/A - XB - XB fullwidth semicolon
FF1F EX XB XB - B/A - XB - XB fullwidth question mark
FF65 NS XB XB - B/A - XB - XB halfwidth katakana middle dot
FF67 CJ XB B/A B/A B/A B/A B/A XB XB halfwidth katakana letter small a
FF68 CJ XB B/A B/A B/A B/A B/A XB XB halfwidth katakana letter small i
FF69 CJ XB B/A B/A B/A B/A B/A XB XB halfwidth katakana letter small u
FF6A CJ XB B/A B/A B/A B/A B/A XB XB halfwidth katakana letter small e
FF6B CJ XB B/A B/A B/A B/A B/A XB XB halfwidth katakana letter small o
FF6C CJ XB B/A B/A B/A B/A B/A XB XB halfwidth katakana letter small ya
FF6D CJ XB B/A B/A B/A B/A B/A XB XB halfwidth katakana letter small yu
FF6E CJ XB B/A B/A B/A B/A B/A XB XB halfwidth katakana letter small yo
FF6F CJ XB B/A B/A B/A B/A B/A XB XB halfwidth katakana letter small tu
FF70 CJ XB B/A B/A B/A B/A B/A XB XB halfwidth katakana-hiragana prolonged sound mark
FFE0 PO XB XB - B/A - XB - XB fullwidth cent sign
FFE1 PR XA XA - B/A - XA - XA fullwidth pound sign
FFE5 PR XA XA - B/A - XA - XA fullwidth yen sign
- - - - - - - - - - all other characters
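Each row above is whitespace-separated: a hexadecimal code point, the UAX14 class, eight behavior values, and then the character name, which may itself contain spaces. The following is a minimal Python sketch of loading rows in that layout into a lookup keyed by code point. TABLE_TEXT, parse_table, and lookup are hypothetical names, only three rows are copied for brevity, and the catch-all "-" row is not parsed; instead, lookup returns "-" for any unlisted code point.

```python
from typing import Dict, Tuple

COLUMNS = ["UAX14", "ICU()", "ICU(ja)", "Loose()", "Loose(cjk)",
           "Normal()", "Normal(cjk)", "Strict()", "Strict(cjk)"]

# A few rows copied from the table above; the other rows use the same layout.
TABLE_TEXT = """\
0021 EX XB XB - B/A - XB - XB exclamation mark
2010 BA A A - B/A - B/A - XB hyphen
3041 CJ XB B/A B/A B/A B/A B/A XB XB hiragana letter small a
"""

Row = Tuple[Dict[str, str], str]  # (column -> value, character name)


def parse_table(text: str) -> Dict[int, Row]:
    """Parse whitespace-separated table rows into a dict keyed by code point."""
    rows: Dict[int, Row] = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        values = fields[1:1 + len(COLUMNS)]
        # Everything after the value columns is the (multi-word) name.
        name = " ".join(fields[1 + len(COLUMNS):])
        rows[int(fields[0], 16)] = (dict(zip(COLUMNS, values)), name)
    return rows


def lookup(rows: Dict[int, Row], cp: int, column: str) -> str:
    """Return the value for code point cp, or '-' for unlisted characters."""
    entry = rows.get(cp)
    return entry[0][column] if entry else "-"
```

With these rows, lookup(rows, 0x2010, "Strict(cjk)") yields "XB", while lookup(rows, 0x0041, "Normal()") yields "-", deferring to the default behavior.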

Implementation Details

  • If -webkit-line-break is auto, content language is not Chinese, Japanese, or Korean, and ICU does not provide a tailored set of rules that applies to the content language, then use the ICU() column's behavior;
  • If -webkit-line-break is auto, content language is not Chinese, Japanese, or Korean, and ICU does provide a tailored set of rules that applies to the content language, then use the ICU() column's behavior as modified by the tailored rules;
  • If -webkit-line-break is auto and content language is Chinese, Japanese, or Korean, then use the Normal(cjk) column's behavior;
  • If -webkit-line-break is loose and content language is not Chinese, Japanese, or Korean, then use the Loose() column's behavior;
  • If -webkit-line-break is loose and content language is Chinese, Japanese, or Korean, then use the Loose(cjk) column's behavior;
  • If -webkit-line-break is normal and content language is not Chinese, Japanese, or Korean, then use the Normal() column's behavior;
  • If -webkit-line-break is normal and content language is Chinese, Japanese, or Korean, then use the Normal(cjk) column's behavior;
  • If -webkit-line-break is strict and content language is not Chinese, Japanese, or Korean, then use the Strict() column's behavior;
  • If -webkit-line-break is strict and content language is Chinese, Japanese, or Korean, then use the Strict(cjk) column's behavior;
  • If -webkit-line-break is after-white-space, then use the procedure defined in Handling of after-white-space.
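The rules above map a -webkit-line-break value and the content language to a single column of the table. The following is a minimal Python sketch of that mapping. The name select_column is hypothetical; whether the content language is Chinese, Japanese, or Korean is passed in as a boolean because determining the content language is outside the sketch; ICU tailoring is assumed to be applied by ICU itself and so does not change the column chosen; and after-white-space returns None because its handling is not yet specified.

```python
from typing import Optional


def select_column(line_break: str, content_is_cjk: bool) -> Optional[str]:
    """Map a -webkit-line-break value and content language to a table column.

    Returns None for after-white-space, whose procedure is not yet specified.
    """
    if line_break == "after-white-space":
        return None
    if line_break == "auto":
        # auto follows Normal(cjk) for CJK content; otherwise ICU(), with any
        # ICU tailoring for the content language applied by ICU itself.
        return "Normal(cjk)" if content_is_cjk else "ICU()"
    if line_break in ("loose", "normal", "strict"):
        suffix = "(cjk)" if content_is_cjk else "()"
        return line_break.capitalize() + suffix  # e.g. "Strict(cjk)"
    raise ValueError(f"unknown -webkit-line-break value: {line_break!r}")
```

Combined with a per-character lookup into the table, the returned column name selects which value constrains each break position.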

For implementation purposes, content language is determined as described by to determine the language of a node.

For implementation purposes, default line breaking behavior is interpreted as the behavior implemented by ICU when applying either ICU's default rules or locale-specific rules tailored to an identified content language.

Handling of after-white-space

ISSUE: To be supplied.

Last modified on Sep 5, 2012, 10:41:54 PM