Version 3 (modified by glenn@skynav.com, 12 years ago)

Per-Character CSS3 Line Break Semantics

The following table lists the line breaking semantics of the characters that CSS3 Text calls out explicitly.

The columns of the table are defined as follows:

  • Code - Unicode code point (in hexadecimal); a value of - stands for all characters not explicitly listed
  • UAX14 - Line breaking class assigned by UAX14
  • ICU() - Behavior implemented by ICU when primary language subtag of locale is not specified or not 'ja' or 'zh' (or equivalent)
  • ICU(ja) - Behavior implemented by ICU when primary language subtag of locale is 'ja' or 'zh' (or equivalent)
  • Loose() - Behavior prescribed by CSS3 when line-break is loose and content language is not Japanese or Chinese
  • Loose(ja) - Behavior prescribed by CSS3 when line-break is loose and content language is Japanese or Chinese
  • Normal() - Behavior prescribed by CSS3 when line-break is normal and content language is not Japanese or Chinese
  • Normal(ja) - Behavior prescribed by CSS3 when line-break is normal and content language is Japanese or Chinese
  • Strict() - Behavior prescribed by CSS3 when line-break is strict and content language is not Japanese or Chinese
  • Strict(ja) - Behavior prescribed by CSS3 when line-break is strict and content language is Japanese or Chinese
  • Character Name - Unicode character name
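
The split between the () and (ja) columns depends only on the primary language subtag. As a minimal sketch, assuming the language arrives as a BCP 47-style tag such as "ja-JP" or an ICU-style locale such as "zh_Hant_TW", the column suffix could be derived as follows. The function names are hypothetical, and the "or equivalent" cases mentioned above (for example three-letter codes) are not handled.

```python
def primary_language_subtag(tag: str) -> str:
    """Return the lowercased primary language subtag of a language tag."""
    # Accept both the BCP 47 separator "-" and the ICU-style separator "_".
    return tag.replace("_", "-").split("-", 1)[0].lower()


def column_suffix(tag: str) -> str:
    """Return "(ja)" for Japanese or Chinese content, "()" otherwise.

    An empty tag (language not specified) falls into the "()" group.
    """
    return "(ja)" if primary_language_subtag(tag) in {"ja", "zh"} else "()"
```

For example, column_suffix("zh_Hant_TW") selects the (ja) columns, while column_suffix("en-US") and column_suffix("") select the () columns.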

The values of the UAX14 column designate (a subset of the) line breaking classes defined by UAX14 as follows:

  • BA - Break After
  • CJ - Conditional Japanese Starter
  • EX - Exclamation/Interrogation
  • IN - Inseparable
  • IS - Infix Numeric Separator
  • NS - Nonstarter
  • PO - Postfix Numeric
  • PR - Prefix Numeric

The values of the ICU() through Strict(ja) columns designate the following breaking behavior:

  • B/A - break permitted before or after
  • XA - break excluded after
  • XB - break excluded before
  • XP - break excluded between any pair in class
  • - - no specific behavior; the default line breaking rules apply
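
The behavior codes above can be read as constraints on the boundary between two adjacent characters. The following is a rough sketch of that reading, not a definitive implementation. It assumes that "pair in class" for XP means two characters of the same UAX14 class, and it only reports whether the table itself excludes a break; the function name and parameters are hypothetical.

```python
def break_excluded(left_behavior: str, right_behavior: str,
                   left_class: str = "", right_class: str = "") -> bool:
    """Return True if the table excludes a break between two adjacent characters.

    left_* describes the character before the boundary, right_* the one after.
    """
    if left_behavior == "XA":   # no break after the left character
        return True
    if right_behavior == "XB":  # no break before the right character
        return True
    # XP: no break between two characters of the same class (assumed here to
    # mean the same UAX14 class, e.g. IN followed by IN).
    if (left_behavior == "XP" and right_behavior == "XP"
            and left_class and left_class == right_class):
        return True
    # "B/A" and "-" add no exclusion of their own at this boundary.
    return False
```

A result of False only means that the table does not exclude the break; the default line breaking rules may still prohibit it.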

Code UAX14 ICU() ICU(ja) Loose() Loose(ja) Normal() Normal(ja) Strict() Strict(ja) Character Name
0021 EX XB XB - B/A - XB - XB exclamation mark
0024 PR XA XA - B/A - XA - XA dollar sign
0025 PO XB XB - B/A - XB - XB percent sign
003A IS XB XB - B/A - XB - XB colon
003B IS XB XB - B/A - XB - XB semicolon
003F EX XB XB - B/A - XB - XB question mark
00A2 PO XB XB - B/A - XB - XB cent sign
00A3 PR XA XA - B/A - XA - XA pound sign
00A5 PR XA XA - B/A - XA - XA yen sign
00B0 PO XB XB - B/A - XB - XB degree sign
2010 BA B/A B/A - B/A - B/A - XB hyphen
2013 BA B/A B/A - B/A - B/A - XB en dash
2025 IN XP XP B/A B/A XP XP XP XP two dot leader
2026 IN XP XP B/A B/A XP XP XP XP ellipsis
2030 PO XB XB - B/A - XB - XB per mille sign
2032 PO XB XB - B/A - XB - XB prime
2033 PO XB XB - B/A - XB - XB double prime
203C NS XB XB - B/A - XB - XB double exclamation mark
2047 NS XB XB - B/A - XB - XB double question mark
2048 NS XB XB - B/A - XB - XB question exclamation mark
2049 NS XB XB - B/A - XB - XB exclamation question mark
20AC PR XA XA - B/A - XA - XA euro sign
2103 PO XB XB - B/A - XB - XB degree celsius
2116 PR XA XA - B/A - XA - XA numero sign
3005 NS XB XB B/A B/A XB XB XB XB ideographic iteration mark
301C NS XB XB - B/A - B/A - XB wave dash
303B NS XB XB B/A B/A XB XB XB XB vertical ideographic iteration mark
30A0 NS XB XB - B/A - B/A - XB katakana-hiragana double hyphen
3041 CJ XB B/A B/A B/A B/A B/A XB XB hiragana letter small a
3043 CJ XB B/A B/A B/A B/A B/A XB XB hiragana letter small i
3045 CJ XB B/A B/A B/A B/A B/A XB XB hiragana letter small u
3047 CJ XB B/A B/A B/A B/A B/A XB XB hiragana letter small e
3049 CJ XB B/A B/A B/A B/A B/A XB XB hiragana letter small o
3063 CJ XB B/A B/A B/A B/A B/A XB XB hiragana letter small tu
3083 CJ XB B/A B/A B/A B/A B/A XB XB hiragana letter small ya
3085 CJ XB B/A B/A B/A B/A B/A XB XB hiragana letter small yu
3087 CJ XB B/A B/A B/A B/A B/A XB XB hiragana letter small yo
308E CJ XB B/A B/A B/A B/A B/A XB XB hiragana letter small wa
3095 CJ XB B/A B/A B/A B/A B/A XB XB hiragana letter small ka
3096 CJ XB B/A B/A B/A B/A B/A XB XB hiragana letter small ke
309D NS XB XB B/A B/A XB XB XB XB hiragana iteration mark
309E NS XB XB B/A B/A XB XB XB XB hiragana voiced iteration mark
30A1 CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small a
30A3 CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small i
30A5 CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small u
30A7 CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small e
30A9 CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small o
30C3 CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small tu
30E3 CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small ya
30E5 CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small yu
30E7 CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small yo
30EE CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small wa
30F5 CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small ka
30F6 CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small ke
30FB NS XB XB - B/A - XB - XB katakana middle dot
30FC CJ XB B/A B/A B/A B/A B/A XB XB katakana-hiragana prolonged sound mark
30FD NS XB XB B/A B/A XB XB XB XB katakana iteration mark
30FE NS XB XB B/A B/A XB XB XB XB katakana voiced iteration mark
31F0 CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small ku
31F1 CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small si
31F2 CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small su
31F3 CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small to
31F4 CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small nu
31F5 CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small ha
31F6 CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small hi
31F7 CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small hu
31F8 CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small he
31F9 CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small ho
31FA CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small mu
31FB CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small ra
31FC CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small ri
31FD CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small ru
31FE CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small re
31FF CJ XB B/A B/A B/A B/A B/A XB XB katakana letter small ro
FF01 EX XB XB - B/A - XB - XB fullwidth exclamation mark
FF04 PR XA XA - B/A - XA - XA fullwidth dollar sign
FF05 PO XB XB - B/A - XB - XB fullwidth percent sign
FF1A NS XB XB - B/A - XB - XB fullwidth colon
FF1B NS XB XB - B/A - XB - XB fullwidth semicolon
FF1F EX XB XB - B/A - XB - XB fullwidth question mark
FF65 NS XB XB - B/A - XB - XB halfwidth katakana middle dot
FF67 CJ XB B/A B/A B/A B/A B/A XB XB halfwidth katakana letter small a
FF68 CJ XB B/A B/A B/A B/A B/A XB XB halfwidth katakana letter small i
FF69 CJ XB B/A B/A B/A B/A B/A XB XB halfwidth katakana letter small u
FF6A CJ XB B/A B/A B/A B/A B/A XB XB halfwidth katakana letter small e
FF6B CJ XB B/A B/A B/A B/A B/A XB XB halfwidth katakana letter small o
FF6C CJ XB B/A B/A B/A B/A B/A XB XB halfwidth katakana letter small ya
FF6D CJ XB B/A B/A B/A B/A B/A XB XB halfwidth katakana letter small yu
FF6E CJ XB B/A B/A B/A B/A B/A XB XB halfwidth katakana letter small yo
FF6F CJ XB B/A B/A B/A B/A B/A XB XB halfwidth katakana letter small tu
FF70 CJ XB B/A B/A B/A B/A B/A XB XB halfwidth katakana-hiragana prolonged sound mark
FFE0 PO XB XB - B/A - XB - XB fullwidth cent sign
FFE1 PR XA XA - B/A - XA - XA fullwidth pound sign
FFE5 PR XA XA - B/A - XA - XA fullwidth yen sign
- - - - - - - - - - all other characters
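
Because every row has ten single-token fields followed by a free-form name, the table can be loaded with a simple whitespace split. The sketch below is illustrative only: it parses three rows copied from the table, and the names COLUMNS, SAMPLE_ROWS, parse_rows, and lookup are hypothetical. Characters without a row fall back to "-", matching the final "all other characters" row.

```python
COLUMNS = ["ICU()", "ICU(ja)", "Loose()", "Loose(ja)",
           "Normal()", "Normal(ja)", "Strict()", "Strict(ja)"]

# Three rows copied verbatim from the table above.
SAMPLE_ROWS = """\
0024 PR XA XA - B/A - XA - XA dollar sign
2026 IN XP XP B/A B/A XP XP XP XP ellipsis
3041 CJ XB B/A B/A B/A B/A B/A XB XB hiragana letter small a
"""


def parse_rows(text):
    """Parse table rows into a dict keyed by code point."""
    table = {}
    for line in text.splitlines():
        # Ten single-token fields, then the name (which may contain spaces).
        fields = line.split(maxsplit=10)
        if len(fields) != 11 or fields[0] == "-":
            continue  # skip malformed lines and the catch-all row
        code, uax14, *behaviors, name = fields
        table[int(code, 16)] = {
            "uax14": uax14,
            "behavior": dict(zip(COLUMNS, behaviors)),
            "name": name,
        }
    return table


def lookup(table, char, column):
    """Return the behavior code for char in the given column, or "-"."""
    row = table.get(ord(char))
    return row["behavior"][column] if row else "-"
```

With these sample rows, lookup(parse_rows(SAMPLE_ROWS), "$", "Strict(ja)") yields "XA", and any character that is not listed yields "-".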

Implementation Details

  • If line-break is auto, content language is neither Japanese nor Chinese, and ICU does not provide a tailored set of rules that apply to the content language, then use the ICU() column's behavior;
  • If line-break is auto, content language is neither Japanese nor Chinese, and ICU does provide a tailored set of rules that apply to the content language, then use the ICU() column's behavior as modified by those tailored rules;
  • If line-break is auto and content language is Japanese or Chinese, then use the Normal(ja) column's behavior;
  • If line-break is loose and content language is neither Japanese nor Chinese, then use the Loose() column's behavior;
  • If line-break is loose and content language is Japanese or Chinese, then use the Loose(ja) column's behavior;
  • If line-break is normal and content language is neither Japanese nor Chinese, then use the Normal() column's behavior;
  • If line-break is normal and content language is Japanese or Chinese, then use the Normal(ja) column's behavior;
  • If line-break is strict and content language is neither Japanese nor Chinese, then use the Strict() column's behavior;
  • If line-break is strict and content language is Japanese or Chinese, then use the Strict(ja) column's behavior.
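
The rules above reduce to a small selection function. The following sketch assumes the line-break value is one of the four keywords used above and that the content language is given as a language tag; select_column is a hypothetical name. ICU's language-specific tailoring is not modeled, so for auto with a non-Japanese, non-Chinese language the function only names the base ICU() column.

```python
def select_column(line_break: str, language: str) -> str:
    """Return the name of the table column that governs breaking behavior."""
    # Japanese or Chinese is decided by the primary language subtag alone.
    primary = language.replace("_", "-").split("-", 1)[0].lower()
    cjk = primary in {"ja", "zh"}

    if line_break == "auto":
        # auto maps to Normal(ja) for Japanese or Chinese, otherwise to ICU()
        # (any tailored ICU rules would be applied on top of that column).
        return "Normal(ja)" if cjk else "ICU()"
    if line_break in {"loose", "normal", "strict"}:
        return line_break.capitalize() + ("(ja)" if cjk else "()")
    raise ValueError(f"unsupported line-break value: {line_break!r}")
```

For instance, select_column("strict", "ja-JP") names the Strict(ja) column, and select_column("auto", "en") names the ICU() column.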