Changeset 28793 in webkit


Timestamp:
Dec 16, 2007 8:19:25 PM
Author:
Darin Adler
Message:

Reviewed by Maciej.

  • http://bugs.webkit.org/show_bug.cgi?id=16438
  • removed some more unused code
  • changed quite a few more names to WebKit-style
  • moved more things out of pcre_internal.h
  • changed some indentation to WebKit-style
  • improved design of the functions for reading and writing 2-byte values from the opcode stream (in pcre_internal.h)
  • pcre/dftables.cpp: (main): Added the kjs prefix a normal way in lieu of using macros.
  • pcre/pcre_compile.cpp: Moved some definitions here from pcre_internal.h. (errorText): Name changes, fewer typedefs. (checkEscape): Ditto. Changed uppercase conversion to use toASCIIUpper. (isCountedRepeat): Name change. (readRepeatCounts): Name change. (firstSignificantOpcode): Got rid of the use of OP_lengths, which is very lightly used here. Hard-coded the length of OP_BRANUMBER. (firstSignificantOpcodeSkippingAssertions): Ditto. Also changed to use the advanceToEndOfBracket function. (getOthercaseRange): Name changes. (encodeUTF8): Ditto. (compileBranch): Name changes. Removed unused after_manual_callout and the code to handle it. Removed code to handle OP_ONCE since we never emit this opcode. Changed to use advanceToEndOfBracket in more places. (compileBracket): Name changes. (branchIsAnchored): Removed code to handle OP_ONCE since we never emit this opcode. (bracketIsAnchored): Name changes. (branchNeedsLineStart): More of the same. (bracketNeedsLineStart): Ditto. (branchFindFirstAssertedCharacter): Removed OP_ONCE code. (bracketFindFirstAssertedCharacter): More of the same. (calculateCompiledPatternLengthAndFlags): Ditto. (returnError): Name changes. (jsRegExpCompile): Ditto.
  • pcre/pcre_exec.cpp: Moved some definitions here from pcre_internal.h. (matchRef): Updated names. Improved macros to use the do { } while(0) idiom so they expand to single statements rather than to blocks or multiple statements. And refactored the recursive match macros. (MatchStack::pushNewFrame): Name changes. (getUTF8CharAndIncrementLength): Name changes. (match): Name changes. Removed the ONCE opcode. (jsRegExpExecute): Name changes.
  • pcre/pcre_internal.h: Removed quite a few unneeded includes. Rewrote quite a few comments. Removed the macros that add kjs prefixes to the functions with external linkage; instead renamed the functions. Removed the unneeded typedefs pcre_uint16, pcre_uint32, and uschar. Removed the dead and not-all-working code for LINK_SIZE values other than 2, although we aim to keep the abstraction working. Removed the OP_LENGTHS macro. (put2ByteValue): Replaces put2ByteOpcodeValueAtOffset. (get2ByteValue): Replaces get2ByteOpcodeValueAtOffset. (put2ByteValueAndAdvance): Replaces put2ByteOpcodeValueAtOffsetAndAdvance. (putLinkValueAllowZero): Replaces putOpcodeValueAtOffset; doesn't do the addition, since a comma is really no better than a plus sign. Added an assertion to catch out of range values and changed the parameter type to int rather than unsigned. (getLinkValueAllowZero): Replaces getOpcodeValueAtOffset. (putLinkValue): New function that most former callers of the putOpcodeValueAtOffset function can use; asserts the value that is being stored is non-zero and then calls putLinkValueAllowZero. (getLinkValue): Ditto. (putLinkValueAndAdvance): Replaces putOpcodeValueAtOffsetAndAdvance. No caller was using an offset, which makes sense given the advancing behavior. (putLinkValueAllowZeroAndAdvance): Ditto. (isBracketOpcode): Added. For use in an assertion. (advanceToEndOfBracket): Renamed from moveOpcodePtrPastAnyAlternateBranches, and removed comments about how it's not well designed. This function takes a pointer to the beginning of a bracket and advances to the end of the bracket.
  • pcre/pcre_tables.cpp: Updated names.
  • pcre/pcre_ucp_searchfuncs.cpp: (kjs_pcre_ucp_othercase): Ditto.
  • pcre/pcre_xclass.cpp: (getUTF8CharAndAdvancePointer): Ditto. (kjs_pcre_xclass): Ditto.
  • pcre/ucpinternal.h: Ditto.
  • wtf/ASCIICType.h: (WTF::isASCIIAlpha): Added an int overload, like the one we already have for isASCIIDigit. (WTF::isASCIIAlphanumeric): Ditto. (WTF::isASCIIHexDigit): Ditto. (WTF::isASCIILower): Ditto. (WTF::isASCIISpace): Ditto. (WTF::toASCIILower): Ditto. (WTF::toASCIIUpper): Ditto.
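
The pcre_internal.h entry above describes the reworked helpers for 2-byte values and link values. A minimal sketch of how such helpers could fit together follows. It is not the code from pcre_internal.h: the function names come from the message, but the signatures, the high-byte-first storage order, and the exact assertions are assumptions made for illustration.

```cpp
#include <cassert>

// Hypothetical sketch. Names follow the commit message; bodies are assumed.

// Store a 16-bit value at opcodePtr, high byte first (assumed byte order).
inline void put2ByteValue(unsigned char* opcodePtr, int value)
{
    assert(value >= 0 && value <= 0xFFFF); // catch out-of-range values
    opcodePtr[0] = static_cast<unsigned char>(value >> 8);
    opcodePtr[1] = static_cast<unsigned char>(value & 0xFF);
}

// Read back a 16-bit value stored by put2ByteValue.
inline int get2ByteValue(const unsigned char* opcodePtr)
{
    return (opcodePtr[0] << 8) | opcodePtr[1];
}

// Store a value and move the caller's pointer past it.
inline void put2ByteValueAndAdvance(unsigned char*& opcodePtr, int value)
{
    put2ByteValue(opcodePtr, value);
    opcodePtr += 2;
}

// Link values are 2 bytes wide here (LINK_SIZE of 2), so they reuse the
// 2-byte helpers. No offset parameter: callers add the offset themselves.
inline void putLinkValueAllowZero(unsigned char* opcodePtr, int value)
{
    put2ByteValue(opcodePtr, value);
}

inline int getLinkValueAllowZero(const unsigned char* opcodePtr)
{
    return get2ByteValue(opcodePtr);
}

// Variant for the common case where a zero link would indicate a bug.
inline void putLinkValue(unsigned char* opcodePtr, int value)
{
    assert(value);
    putLinkValueAllowZero(opcodePtr, value);
}

inline int getLinkValue(const unsigned char* opcodePtr)
{
    int value = getLinkValueAllowZero(opcodePtr);
    assert(value);
    return value;
}
```

Under these assumptions a caller that used to pass a base pointer and an offset would write something like putLinkValue(code + 1, length), which is the "plus sign rather than comma" point made in the message.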
Location:
trunk/JavaScriptCore
Files:
10 edited
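
The pcre_exec.cpp entry in the message says the macros were changed to use the do { } while(0) idiom so that each expands to a single statement. The sketch below shows why that matters. The macro and function are hypothetical and exist only for illustration; they are not the macros in pcre_exec.cpp.

```cpp
#include <cassert>

// Hypothetical macro, for illustration only.
// Wrapping the body in do { } while (0) makes the expansion one statement
// that is completed by the caller's semicolon. A bare { ... } block followed
// by ';' would leave an empty statement behind and detach the 'else' below.
#define BUMP_TWICE(x) do { ++(x); ++(x); } while (0)

inline int bumpIfPositive(int value)
{
    if (value > 0)
        BUMP_TWICE(value); // behaves like an ordinary function call statement
    else
        value = 0;
    return value;
}
```

With a plain brace block in place of the do/while wrapper, the if/else in bumpIfPositive would not compile; with two unwrapped statements, only the first increment would be guarded by the if.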

  • trunk/JavaScriptCore/ChangeLog

    r28785 r28793  
     12007-12-16  Darin Adler  <darin@apple.com>
     2
     3        Reviewed by Maciej.
     4
     5        - http://bugs.webkit.org/show_bug.cgi?id=16438
     6        - removed some more unused code
     7        - changed quite a few more names to WebKit-style
     8        - moved more things out of pcre_internal.h
     9        - changed some indentation to WebKit-style
     10        - improved design of the functions for reading and writing
     11          2-byte values from the opcode stream (in pcre_internal.h)
     12
     13        * pcre/dftables.cpp:
     14        (main): Added the kjs prefix a normal way in lieu of using macros.
     15
     16        * pcre/pcre_compile.cpp: Moved some definitions here from pcre_internal.h.
     17        (errorText): Name changes, fewer typedefs.
     18        (checkEscape): Ditto. Changed uppercase conversion to use toASCIIUpper.
     19        (isCountedRepeat): Name change.
     20        (readRepeatCounts): Name change.
     21        (firstSignificantOpcode): Got rid of the use of OP_lengths, which is
     22        very lightly used here. Hard-coded the length of OP_BRANUMBER.
     23        (firstSignificantOpcodeSkippingAssertions): Ditto. Also changed to
     24        use the advanceToEndOfBracket function.
     25        (getOthercaseRange): Name changes.
     26        (encodeUTF8): Ditto.
     27        (compileBranch): Name changes. Removed unused after_manual_callout and
     28        the code to handle it. Removed code to handle OP_ONCE since we never
     29        emit this opcode. Changed to use advanceToEndOfBracket in more places.
     30        (compileBracket): Name changes.
     31        (branchIsAnchored): Removed code to handle OP_ONCE since we never emit
     32        this opcode.
     33        (bracketIsAnchored): Name changes.
     34        (branchNeedsLineStart): More of the same.
     35        (bracketNeedsLineStart): Ditto.
     36        (branchFindFirstAssertedCharacter): Removed OP_ONCE code.
     37        (bracketFindFirstAssertedCharacter): More of the same.
     38        (calculateCompiledPatternLengthAndFlags): Ditto.
     39        (returnError): Name changes.
     40        (jsRegExpCompile): Ditto.
     41
     42        * pcre/pcre_exec.cpp: Moved some definitions here from pcre_internal.h.
     43        (matchRef): Updated names.
     44        Improved macros to use the do { } while(0) idiom so they expand to single
     45        statements rather than to blocks or multiple statements. And refactored
     46        the recursive match macros.
     47        (MatchStack::pushNewFrame): Name changes.
     48        (getUTF8CharAndIncrementLength): Name changes.
     49        (match): Name changes. Removed the ONCE opcode.
     50        (jsRegExpExecute): Name changes.
     51
     52        * pcre/pcre_internal.h: Removed quite a few unneeded includes. Rewrote
     53        quite a few comments. Removed the macros that add kjs prefixes to the
     54        functions with external linkage; instead renamed the functions. Removed
     55        the unneeded typedefs pcre_uint16, pcre_uint32, and uschar. Removed the
     56        dead and not-all-working code for LINK_SIZE values other than 2, although
     57        we aim to keep the abstraction working. Removed the OP_LENGTHS macro.
     58        (put2ByteValue): Replaces put2ByteOpcodeValueAtOffset.
     59        (get2ByteValue): Replaces get2ByteOpcodeValueAtOffset.
     60        (put2ByteValueAndAdvance): Replaces put2ByteOpcodeValueAtOffsetAndAdvance.
     61        (putLinkValueAllowZero): Replaces putOpcodeValueAtOffset; doesn't do the
     62        addition, since a comma is really no better than a plus sign. Added an
     63        assertion to catch out of range values and changed the parameter type to
     64        int rather than unsigned.
     65        (getLinkValueAllowZero): Replaces getOpcodeValueAtOffset.
     66        (putLinkValue): New function that most former callers of the
     67        putOpcodeValueAtOffset function can use; asserts the value that is
     68        being stored is non-zero and then calls putLinkValueAllowZero.
     69        (getLinkValue): Ditto.
     70        (putLinkValueAndAdvance): Replaces putOpcodeValueAtOffsetAndAdvance. No
     71        caller was using an offset, which makes sense given the advancing behavior.
     72        (putLinkValueAllowZeroAndAdvance): Ditto.
     73        (isBracketOpcode): Added. For use in an assertion.
     74        (advanceToEndOfBracket): Renamed from moveOpcodePtrPastAnyAlternateBranches,
     75        and removed comments about how it's not well designed. This function takes
     76        a pointer to the beginning of a bracket and advances to the end of the
     77        bracket.
     78
     79        * pcre/pcre_tables.cpp: Updated names.
     80        * pcre/pcre_ucp_searchfuncs.cpp:
     81        (kjs_pcre_ucp_othercase): Ditto.
     82        * pcre/pcre_xclass.cpp:
     83        (getUTF8CharAndAdvancePointer): Ditto.
     84        (kjs_pcre_xclass): Ditto.
     85        * pcre/ucpinternal.h: Ditto.
     86
     87        * wtf/ASCIICType.h:
     88        (WTF::isASCIIAlpha): Added an int overload, like the one we already have for
     89        isASCIIDigit.
     90        (WTF::isASCIIAlphanumeric): Ditto.
     91        (WTF::isASCIIHexDigit): Ditto.
     92        (WTF::isASCIILower): Ditto.
     93        (WTF::isASCIISpace): Ditto.
     94        (WTF::toASCIILower): Ditto.
     95        (WTF::toASCIIUpper): Ditto.
     96
    1972007-12-16  Darin Adler  <darin@apple.com>
    298
     
    11711267        Reviewed by Maciej.
    11721268
    1173         Centralize code for subjectPtr adjustments using inlines, only ever check for a single trailing surrogate (as UTF16 only allows one), possibly fix PCRE bugs involving char classes and garbled UTF16 strings.
     1269        Centralize code for subjectPtr adjustments using inlines, only ever check for a single
     1270        trailing surrogate (as UTF16 only allows one), possibly fix PCRE bugs involving char
     1271        classes and garbled UTF16 strings.
    11741272
    11751273        * pcre/pcre_exec.cpp:
  • trunk/JavaScriptCore/pcre/dftables.cpp

    r27730 r28793  
    7979  "128 (ASCII characters). These tables are used when no external tables are\n"
    8080  "passed to PCRE. */\n\n"
    81   "const unsigned char _pcre_default_tables[%d] = {\n\n"
     81  "const unsigned char kjs_pcre_default_tables[%d] = {\n\n"
    8282  "/* This table is a lower casing table. */\n\n", tables_length);
    8383
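
The commit message notes that checkEscape now uses toASCIIUpper for its uppercase conversion, and that wtf/ASCIICType.h gained int overloads. A minimal sketch of the \cx computation under that change follows. The helper toASCIIUpperSketch is a stand-in written for this example, not the WTF function, and controlEscapeValue is a hypothetical name for the one-line computation inside checkEscape.

```cpp
#include <cassert>

// Stand-in for an int overload of toASCIIUpper (assumed behavior:
// map 'a'..'z' to 'A'..'Z' and leave every other value unchanged).
inline int toASCIIUpperSketch(int c)
{
    return (c >= 'a' && c <= 'z') ? c - ('a' - 'A') : c;
}

// Value of the escape \cX: upper-case the letter, then flip the 0x40 bit.
// This is ASCII-specific, as is the whole concept of \cx.
inline int controlEscapeValue(int c)
{
    return toASCIIUpperSketch(c) ^ 0x40;
}
```

Taking and returning int avoids narrowing when the caller holds the character in an int, as checkEscape does with c.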
  • trunk/JavaScriptCore/pcre/pcre_compile.cpp

    r28785 r28793  
    5151using namespace WTF;
    5252
     53/* Negative values for the firstchar and reqchar variables */
     54
     55#define REQ_UNSET (-2)
     56#define REQ_NONE  (-1)
     57
    5358/*************************************************
    5459*      Code parameters and static tables         *
     
    8994};
    9095
    91 /* Table of sizes for the fixed-length opcodes. It's defined in a macro so that
    92 the definition is next to the definition of the opcodes in pcre_internal.h. */
    93 
    94 static const uschar OP_lengths[] = { OP_LENGTHS };
    95 
    9696/* The texts of compile-time error messages. These are "char *" because they
    9797are passed to the outside world. */
    9898
    99 static const char* error_text(ErrorCode code)
     99static const char* errorText(ErrorCode code)
    100100{
    101     static const char error_texts[] =
     101    static const char errorTexts[] =
    102102      /* 1 */
    103103      "\\ at end of pattern\0"
     
    124124
    125125    int i = code;
    126     const char* text = error_texts;
     126    const char* text = errorTexts;
    127127    while (i > 1)
    128128        i -= !*text++;
     
    142142        needOuterBracket = false;
    143143    }
    144     const uschar* start_code;   /* The start of the compiled code */
     144    const unsigned char* start_code;   /* The start of the compiled code */
    145145    const UChar* start_pattern; /* The start of the pattern */
    146146    int top_backref;            /* Maximum back reference */
     
    152152/* Definitions to allow mutual recursion */
    153153
    154 static bool compileBracket(int, int*, uschar**, const UChar**, const UChar*, ErrorCode*, int, int*, int*, CompileData&);
    155 static bool bracketIsAnchored(const uschar* code);
    156 static bool bracketNeedsLineStart(const uschar* code, unsigned captureMap, unsigned backrefMap);
    157 static int bracketFindFirstAssertedCharacter(const uschar* code, bool inassert);
     154static bool compileBracket(int, int*, unsigned char**, const UChar**, const UChar*, ErrorCode*, int, int*, int*, CompileData&);
     155static bool bracketIsAnchored(const unsigned char* code);
     156static bool bracketNeedsLineStart(const unsigned char* code, unsigned captureMap, unsigned backrefMap);
     157static int bracketFindFirstAssertedCharacter(const unsigned char* code, bool inassert);
    158158
    159159/*************************************************
     
    179179*/
    180180
    181 static int check_escape(const UChar** ptrptr, const UChar* patternEnd, ErrorCode* errorcodeptr, int bracount, bool isclass)
     181static int checkEscape(const UChar** ptrptr, const UChar* patternEnd, ErrorCode* errorcodeptr, int bracount, bool isclass)
    182182{
    183183    const UChar* ptr = *ptrptr + 1;
     
    209209    } else {
    210210        switch (c) {
    211         case '1':
    212         case '2':
    213         case '3':
    214         case '4':
    215         case '5':
    216         case '6':
    217         case '7':
    218         case '8':
    219         case '9':
    220             /* Escape sequences starting with a non-zero digit are backreferences,
    221              unless there are insufficient brackets, in which case they are octal
    222              escape sequences. Those sequences end on the first non-octal character
    223              or when we overflow 0-255, whichever comes first. */
    224            
    225             if (!isclass) {
    226                 const UChar* oldptr = ptr;
    227                 c -= '0';
    228                 while ((ptr + 1 < patternEnd) && isASCIIDigit(ptr[1]) && c <= bracount)
    229                     c = c * 10 + *(++ptr) - '0';
    230                 if (c <= bracount) {
    231                     c = -(ESC_REF + c);
     211            case '1':
     212            case '2':
     213            case '3':
     214            case '4':
     215            case '5':
     216            case '6':
     217            case '7':
     218            case '8':
     219            case '9':
     220                /* Escape sequences starting with a non-zero digit are backreferences,
     221                 unless there are insufficient brackets, in which case they are octal
     222                 escape sequences. Those sequences end on the first non-octal character
     223                 or when we overflow 0-255, whichever comes first. */
     224               
     225                if (!isclass) {
     226                    const UChar* oldptr = ptr;
     227                    c -= '0';
     228                    while ((ptr + 1 < patternEnd) && isASCIIDigit(ptr[1]) && c <= bracount)
     229                        c = c * 10 + *(++ptr) - '0';
     230                    if (c <= bracount) {
     231                        c = -(ESC_REF + c);
     232                        break;
     233                    }
     234                    ptr = oldptr;      /* Put the pointer back and fall through */
     235                }
     236               
     237                /* Handle an octal number following \. If the first digit is 8 or 9,
     238                 this is not octal. */
     239               
     240                if ((c = *ptr) >= '8')
    232241                    break;
    233                 }
    234                 ptr = oldptr;      /* Put the pointer back and fall through */
    235             }
    236            
    237             /* Handle an octal number following \. If the first digit is 8 or 9,
    238              this is not octal. */
    239            
    240             if ((c = *ptr) >= '8')
    241                 break;
    242            
     242
    243243            /* \0 always starts an octal number, but we may drop through to here with a
    244244             larger first octal digit. */
    245        
    246         case '0': {
    247             c -= '0';
    248             int i;
    249             for (i = 1; i <= 2; ++i) {
    250                 if (ptr + i >= patternEnd || ptr[i] < '0' || ptr[i] > '7')
    251                     break;
    252                 int cc = c * 8 + ptr[i] - '0';
    253                 if (cc > 255)
    254                     break;
    255                 c = cc;
     245
     246            case '0': {
     247                c -= '0';
     248                int i;
     249                for (i = 1; i <= 2; ++i) {
     250                    if (ptr + i >= patternEnd || ptr[i] < '0' || ptr[i] > '7')
     251                        break;
     252                    int cc = c * 8 + ptr[i] - '0';
     253                    if (cc > 255)
     254                        break;
     255                    c = cc;
     256                }
     257                ptr += i - 1;
     258                break;
    256259            }
    257             ptr += i - 1;
    258             break;
    259         }
    260         case 'x': {
    261             c = 0;
    262             int i;
    263             for (i = 1; i <= 2; ++i) {
    264                 if (ptr + i >= patternEnd || !isASCIIHexDigit(ptr[i])) {
    265                     c = 'x';
    266                     i = 1;
    267                     break;
    268                 }
    269                 int cc = ptr[i];
    270                 if (cc >= 'a')
    271                     cc -= 32;             /* Convert to upper case */
    272                 c = c * 16 + cc - ((cc < 'A') ? '0' : ('A' - 10));
     260
     261            case 'x': {
     262                c = 0;
     263                int i;
     264                for (i = 1; i <= 2; ++i) {
     265                    if (ptr + i >= patternEnd || !isASCIIHexDigit(ptr[i])) {
     266                        c = 'x';
     267                        i = 1;
     268                        break;
     269                    }
     270                    int cc = ptr[i];
     271                    if (cc >= 'a')
     272                        cc -= 32;             /* Convert to upper case */
     273                    c = c * 16 + cc - ((cc < 'A') ? '0' : ('A' - 10));
     274                }
     275                ptr += i - 1;
     276                break;
    273277            }
    274             ptr += i - 1;
    275             break;
    276         }
    277         case 'u': {
    278             c = 0;
    279             int i;
    280             for (i = 1; i <= 4; ++i) {
    281                 if (ptr + i >= patternEnd || !isASCIIHexDigit(ptr[i])) {
    282                     c = 'u';
    283                     i = 1;
    284                     break;
    285                 }
    286                 int cc = ptr[i];
    287                 if (cc >= 'a')
    288                     cc -= 32;             /* Convert to upper case */
    289                 c = c * 16 + cc - ((cc < 'A') ? '0' : ('A' - 10));
     278
     279            case 'u': {
     280                c = 0;
     281                int i;
     282                for (i = 1; i <= 4; ++i) {
     283                    if (ptr + i >= patternEnd || !isASCIIHexDigit(ptr[i])) {
     284                        c = 'u';
     285                        i = 1;
     286                        break;
     287                    }
     288                    int cc = ptr[i];
     289                    if (cc >= 'a')
     290                        cc -= 32;             /* Convert to upper case */
     291                    c = c * 16 + cc - ((cc < 'A') ? '0' : ('A' - 10));
     292                }
     293                ptr += i - 1;
     294                break;
    290295            }
    291             ptr += i - 1;
    292             break;
    293            
    294             /* Other special escapes not starting with a digit are straightforward */
    295         }
    296         case 'c':
    297             if (++ptr == patternEnd) {
    298                 *errorcodeptr = ERR2;
    299                 return 0;
     296
     297            case 'c':
     298                if (++ptr == patternEnd) {
     299                    *errorcodeptr = ERR2;
     300                    return 0;
     301                }
     302                c = *ptr;
     303               
     304                /* A letter is upper-cased; then the 0x40 bit is flipped. This coding
     305                 is ASCII-specific, but then the whole concept of \cx is ASCII-specific. */
     306                c = toASCIIUpper(c) ^ 0x40;
     307                break;
    300308            }
    301             c = *ptr;
    302            
    303             /* A letter is upper-cased; then the 0x40 bit is flipped. This coding
    304              is ASCII-specific, but then the whole concept of \cx is ASCII-specific. */
    305            
    306             if (c >= 'a' && c <= 'z')
    307                 c -= 32;
    308             c ^= 0x40;
    309             break;
    310         }
    311309    }
    312310   
     
    314312    return c;
    315313}
    316 
    317 
    318314
    319315/*************************************************
     
    332328*/
    333329
    334 static bool is_counted_repeat(const UChar* p, const UChar* patternEnd)
     330static bool isCountedRepeat(const UChar* p, const UChar* patternEnd)
    335331{
    336332    if (p >= patternEnd || !isASCIIDigit(*p))
     
    356352}
    357353
    358 
    359354/*************************************************
    360355*         Read repeat counts                     *
     
    362357
    363358/* Read an item of the form {n,m} and return the values. This is called only
    364 after is_counted_repeat() has confirmed that a repeat-count quantifier exists,
     359after isCountedRepeat() has confirmed that a repeat-count quantifier exists,
    365360so the syntax is guaranteed to be correct, but we need to check the values.
    366361
     
    376371*/
    377372
    378 static const UChar* read_repeat_counts(const UChar* p, int* minp, int* maxp, ErrorCode* errorcodeptr)
     373static const UChar* readRepeatCounts(const UChar* p, int* minp, int* maxp, ErrorCode* errorcodeptr)
    379374{
    380375    int min = 0;
     
    420415}
    421416
    422 
    423417/*************************************************
    424418*      Find first significant op code            *
     
    427421/* This is called by several functions that scan a compiled expression looking
    428422for a fixed first character, or an anchoring op code etc. It skips over things
    429 that do not influence this. For some calls, a change of option is important.
    430 For some calls, it makes sense to skip negative forward and all backward
    431 assertions, and also the \b assertion; for others it does not.
     423that do not influence this.
    432424
    433425Arguments:
    434426  code         pointer to the start of the group
    435   skipassert   true if certain assertions are to be skipped
    436 
    437427Returns:       pointer to the first significant opcode
    438428*/
    439429
    440 static const uschar* firstSignificantOpCode(const uschar* code)
     430static const unsigned char* firstSignificantOpcode(const unsigned char* code)
    441431{
    442432    while (*code == OP_BRANUMBER)
    443         code += OP_lengths[*code];
     433        code += 3;
    444434    return code;
    445435}
    446436
    447 static const uschar* firstSignificantOpCodeSkippingAssertions(const uschar* code)
     437static const unsigned char* firstSignificantOpcodeSkippingAssertions(const unsigned char* code)
    448438{
    449439    while (true) {
    450440        switch (*code) {
    451         case OP_ASSERT_NOT:
    452             do {
    453                 code += getOpcodeValueAtOffset(code, 1);
    454             } while (*code == OP_ALT);
    455             code += OP_lengths[*code];
    456             break;
    457         case OP_WORD_BOUNDARY:
    458         case OP_NOT_WORD_BOUNDARY:
    459         case OP_BRANUMBER:
    460             code += OP_lengths[*code];
    461             break;
    462         default:
    463             return code;
     441            case OP_ASSERT_NOT:
     442                advanceToEndOfBracket(code);
     443                code += 1 + LINK_SIZE;
     444                break;
     445            case OP_WORD_BOUNDARY:
     446            case OP_NOT_WORD_BOUNDARY:
     447                ++code;
     448                break;
     449            case OP_BRANUMBER:
     450                code += 3;
     451                break;
     452            default:
     453                return code;
    464454        }
    465455    }
    466     ASSERT_NOT_REACHED();
    467456}
    468 
    469 
    470 /*************************************************
    471 *        Find the fixed length of a pattern      *
    472 *************************************************/
    473 
    474 /* Scan a pattern and compute the fixed length of subject that will match it,
    475 if the length is fixed. This is needed for dealing with backward assertions.
    476 In UTF8 mode, the result is in characters rather than bytes.
    477 
    478 Arguments:
    479   code     points to the start of the pattern (the bracket)
    480   options  the compiling options
    481 
    482 Returns:   the fixed length, or -1 if there is no fixed length,
    483              or -2 if \C was encountered
    484 */
    485 
    486 static int find_fixedlength(uschar* code, int options)
    487 {
    488     int length = -1;
    489    
    490     int branchlength = 0;
    491     uschar* cc = code + 1 + LINK_SIZE;
    492    
    493     /* Scan along the opcodes for this branch. If we get to the end of the
    494      branch, check the length against that of the other branches. */
    495    
    496     while (true) {
    497         int d;
    498         int op = *cc;
    499         if (op >= OP_BRA)
    500             op = OP_BRA;
    501        
    502         switch (op) {
    503             case OP_BRA:
    504             case OP_ONCE:
    505                 d = find_fixedlength(cc, options);
    506                 if (d < 0)
    507                     return d;
    508                 branchlength += d;
    509                 do {
    510                     cc += getOpcodeValueAtOffset(cc, 1);
    511                 } while (*cc == OP_ALT);
    512                 cc += 1 + LINK_SIZE;
    513                 break;
    514                
    515                 /* Reached end of a branch; if it's a ket it is the end of a nested
    516                  call. If it's ALT it is an alternation in a nested call. If it is
    517                  END it's the end of the outer call. All can be handled by the same code. */
    518                
    519             case OP_ALT:
    520             case OP_KET:
    521             case OP_KETRMAX:
    522             case OP_KETRMIN:
    523             case OP_END:
    524                 if (length < 0)
    525                     length = branchlength;
    526                 else if (length != branchlength)
    527                     return -1;
    528                 if (*cc != OP_ALT)
    529                     return length;
    530                 cc += 1 + LINK_SIZE;
    531                 branchlength = 0;
    532                 break;
    533                
    534                 /* Skip over assertive subpatterns */
    535                
    536             case OP_ASSERT:
    537             case OP_ASSERT_NOT:
    538                 do {
    539                     cc += getOpcodeValueAtOffset(cc, 1);
    540                 } while (*cc == OP_ALT);
    541                 /* Fall through */
    542                
    543                 /* Skip over things that don't match chars */
    544                
    545             case OP_BRANUMBER:
    546             case OP_CIRC:
    547             case OP_DOLL:
    548             case OP_NOT_WORD_BOUNDARY:
    549             case OP_WORD_BOUNDARY:
    550                 cc += OP_lengths[*cc];
    551                 break;
    552                
    553                 /* Handle literal characters */
    554                
    555             case OP_CHAR:
    556             case OP_CHAR_IGNORING_CASE:
    557             case OP_NOT:
    558                 branchlength++;
    559                 cc += 2;
    560                 while ((*cc & 0xc0) == 0x80)
    561                     cc++;
    562                 break;
    563                
    564             case OP_ASCII_CHAR:
    565             case OP_ASCII_LETTER_IGNORING_CASE:
    566                 branchlength++;
    567                 cc += 2;
    568                 break;
    569                
    570                 /* Handle exact repetitions. The count is already in characters, but we
    571                  need to skip over a multibyte character in UTF8 mode.  */
    572                
    573             case OP_EXACT:
    574                 branchlength += get2ByteOpcodeValueAtOffset(cc,1);
    575                 cc += 4;
    576                 while((*cc & 0x80) == 0x80)
    577                     cc++;
    578                 break;
    579                
    580             case OP_TYPEEXACT:
    581                 branchlength += get2ByteOpcodeValueAtOffset(cc,1);
    582                 cc += 4;
    583                 break;
    584                
    585                 /* Handle single-char matchers */
    586                
    587             case OP_NOT_DIGIT:
    588             case OP_DIGIT:
    589             case OP_NOT_WHITESPACE:
    590             case OP_WHITESPACE:
    591             case OP_NOT_WORDCHAR:
    592             case OP_WORDCHAR:
    593             case OP_NOT_NEWLINE:
    594                 branchlength++;
    595                 cc++;
    596                 break;
    597                
    598                 /* Check a class for variable quantification */
    599                
    600             case OP_XCLASS:
    601                 cc += getOpcodeValueAtOffset(cc, 1) - 33;
    602                 /* Fall through */
    603                
    604             case OP_CLASS:
    605             case OP_NCLASS:
    606                 cc += 33;
    607                
    608                 switch (*cc) {
    609                 case OP_CRSTAR:
    610                 case OP_CRMINSTAR:
    611                 case OP_CRQUERY:
    612                 case OP_CRMINQUERY:
    613                     return -1;
    614                    
    615                 case OP_CRRANGE:
    616                 case OP_CRMINRANGE:
    617                     if (get2ByteOpcodeValueAtOffset(cc, 1) != get2ByteOpcodeValueAtOffset(cc, 3))
    618                         return -1;
    619                     branchlength += get2ByteOpcodeValueAtOffset(cc, 1);
    620                     cc += 5;
    621                     break;
    622                    
    623                 default:
    624                     branchlength++;
    625                 }
    626                 break;
    627                
    628                 /* Anything else is variable length */
    629                
    630             default:
    631                 return -1;
    632         }
    633     }
    634     ASSERT_NOT_REACHED();
    635 }
    636 
    637 
    638 /*************************************************
    639 *         Complete a callout item                *
    640 *************************************************/
    641 
    642 /* A callout item contains the length of the next item in the pattern, which
    643 we can't fill in till after we have reached the relevant point. This is used
    644 for both automatic and manual callouts.
    645 
    646 Arguments:
    647   previous_callout   points to previous callout item
    648   ptr                current pattern pointer
    649   cd                 pointers to tables etc
    650 */
    651 
    652 static void complete_callout(uschar* previous_callout, const UChar* ptr, const CompileData& cd)
    653 {
    654     int length = ptr - cd.start_pattern - getOpcodeValueAtOffset(previous_callout, 2);
    655     putOpcodeValueAtOffset(previous_callout, 2 + LINK_SIZE, length);
    656 }
    657 
    658 
    659457
    660458/*************************************************
     
    676474*/
    677475
    678 static bool get_othercase_range(int* cptr, int d, int* ocptr, int* odptr)
     476static bool getOthercaseRange(int* cptr, int d, int* ocptr, int* odptr)
    679477{
    680478    int c, othercase = 0;
    681479   
    682480    for (c = *cptr; c <= d; c++) {
    683         if ((othercase = _pcre_ucp_othercase(c)) >= 0)
     481        if ((othercase = kjs_pcre_ucp_othercase(c)) >= 0)
    684482            break;
    685483    }
     
    692490   
    693491    for (++c; c <= d; c++) {
    694         if (_pcre_ucp_othercase(c) != next)
     492        if (kjs_pcre_ucp_othercase(c) != next)
    695493            break;
    696494        next++;
     
    717515 */
    718516
    719 // FIXME: This should be removed as soon as all UTF8 uses are removed from PCRE
    720 int _pcre_ord2utf8(int cvalue, uschar *buffer)
     517static int encodeUTF8(int cvalue, unsigned char *buffer)
    721518{
    722519    int i;
    723     for (i = 0; i < _pcre_utf8_table1_size; i++)
    724         if (cvalue <= _pcre_utf8_table1[i])
     520    for (i = 0; i < kjs_pcre_utf8_table1_size; i++)
     521        if (cvalue <= kjs_pcre_utf8_table1[i])
    725522            break;
    726523    buffer += i;
     
    729526        cvalue >>= 6;
    730527    }
    731     *buffer = _pcre_utf8_table2[i] | cvalue;
     528    *buffer = kjs_pcre_utf8_table2[i] | cvalue;
    732529    return i + 1;
    733530}
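
Reviewer note: the hunk above shows only the edges of encodeUTF8, so here is a self-contained sketch of the whole table-driven encoder. The table contents are an assumption (the usual UTF-8 length thresholds and lead-byte markers), because the diff does not show the kjs_pcre_utf8_table1 and kjs_pcre_utf8_table2 definitions. Every name ending in Sketch is hypothetical.

```cpp
#include <cassert>

// Assumed table contents, not taken from the diff:
// sketchTable1[i] is the largest value that fits in (i + 1) bytes,
// sketchTable2[i] holds the marker bits of the lead byte for that length.
static const int sketchTable1[] = { 0x7f, 0x7ff, 0xffff, 0x1fffff, 0x3ffffff, 0x7fffffff };
static const int sketchTable2[] = { 0, 0xc0, 0xe0, 0xf0, 0xf8, 0xfc };
static const int sketchTable1Size = sizeof(sketchTable1) / sizeof(sketchTable1[0]);

// Writes the encoding of cvalue into buffer (at least 6 bytes) and
// returns the number of bytes written.
static int encodeUTF8Sketch(int cvalue, unsigned char* buffer)
{
    // Find how many continuation bytes are needed.
    int i;
    for (i = 0; i < sketchTable1Size; i++)
        if (cvalue <= sketchTable1[i])
            break;

    // Fill continuation bytes from the last one backwards, 6 bits at a time.
    buffer += i;
    for (int j = i; j > 0; j--) {
        *buffer-- = static_cast<unsigned char>(0x80 | (cvalue & 0x3f));
        cvalue >>= 6;
    }

    // What is left of cvalue goes into the lead byte.
    *buffer = static_cast<unsigned char>(sketchTable2[i] | cvalue);
    return i + 1;
}
```

Because the return value is the encoded length, callers such as the class_utf8data lines in this changeset can advance their write pointer with `ptr += encodeUTF8(...)`.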
     
    759556
    760557static bool
    761 compileBranch(int options, int* brackets, uschar** codeptr,
     558compileBranch(int options, int* brackets, unsigned char** codeptr,
    762559               const UChar** ptrptr, const UChar* patternEnd, ErrorCode* errorcodeptr, int *firstbyteptr,
    763560               int* reqbyteptr, CompileData& cd)
     
    767564    int bravalue = 0;
    768565    int reqvary, tempreqvary;
    769     int after_manual_callout = 0;
    770566    int c;
    771     uschar* code = *codeptr;
    772     uschar* tempcode;
     567    unsigned char* code = *codeptr;
     568    unsigned char* tempcode;
    773569    bool groupsetfirstbyte = false;
    774570    const UChar* ptr = *ptrptr;
    775571    const UChar* tempptr;
    776     uschar* previous = NULL;
    777     uschar* previous_callout = NULL;
    778     uschar classbits[32];
     572    unsigned char* previous = NULL;
     573    unsigned char classbits[32];
    779574   
    780575    bool class_utf8;
    781     uschar* class_utf8data;
    782     uschar utf8_char[6];
     576    unsigned char* class_utf8data;
     577    unsigned char utf8_char[6];
    783578   
    784579    /* Initialize no first byte, no required byte. REQ_UNSET means "no char
     
    815610        int subfirstbyte;
    816611        int mclength;
    817         uschar mcbuffer[8];
     612        unsigned char mcbuffer[8];
    818613       
    819614        /* Next byte in the pattern */
     
    824619         a quantifier. */
    825620       
    826         bool is_quantifier = c == '*' || c == '+' || c == '?' || (c == '{' && is_counted_repeat(ptr + 1, patternEnd));
    827        
    828         if (!is_quantifier && previous_callout && after_manual_callout-- <= 0) {
    829             complete_callout(previous_callout, ptr, cd);
    830             previous_callout = NULL;
    831         }
     621        bool is_quantifier = c == '*' || c == '+' || c == '?' || (c == '{' && isCountedRepeat(ptr + 1, patternEnd));
    832622       
    833623        switch (c) {
     
    922712                 bit map. */
    923713               
    924                 memset(classbits, 0, 32 * sizeof(uschar));
     714                memset(classbits, 0, 32 * sizeof(unsigned char));
    925715               
    926716                /* Process characters until ] is reached. The first pass
     
    939729                   
    940730                    if (c == '\\') {
    941                         c = check_escape(&ptr, patternEnd, errorcodeptr, *brackets, true);
     731                        c = checkEscape(&ptr, patternEnd, errorcodeptr, *brackets, true);
    942732                        if (c < 0) {
    943733                            class_charcount += 2;     /* Greater than 1 is what matters */
     
    1006796                        if (d == '\\') {
    1007797                            const UChar* oldptr = ptr;
    1008                             d = check_escape(&ptr, patternEnd, errorcodeptr, *brackets, true);
     798                            d = checkEscape(&ptr, patternEnd, errorcodeptr, *brackets, true);
    1009799                           
    1010800                            /* \X is literal X; any other special means the '-' was literal */
     
    1037827                                int cc = c;
    1038828                                int origd = d;
    1039                                 while (get_othercase_range(&cc, origd, &occ, &ocd)) {
     829                                while (getOthercaseRange(&cc, origd, &occ, &ocd)) {
    1040830                                    if (occ >= c && ocd <= d)
    1041831                                        continue;  /* Skip embedded ranges */
     
    1056846                                    else {
    1057847                                        *class_utf8data++ = XCL_RANGE;
    1058                                         class_utf8data += _pcre_ord2utf8(occ, class_utf8data);
     848                                        class_utf8data += encodeUTF8(occ, class_utf8data);
    1059849                                    }
    1060                                     class_utf8data += _pcre_ord2utf8(ocd, class_utf8data);
     850                                    class_utf8data += encodeUTF8(ocd, class_utf8data);
    1061851                                }
    1062852                            }
     
    1066856                           
    1067857                            *class_utf8data++ = XCL_RANGE;
    1068                             class_utf8data += _pcre_ord2utf8(c, class_utf8data);
    1069                             class_utf8data += _pcre_ord2utf8(d, class_utf8data);
     858                            class_utf8data += encodeUTF8(c, class_utf8data);
     859                            class_utf8data += encodeUTF8(d, class_utf8data);
    1070860                           
    1071861                            /* With UCP support, we are done. Without UCP support, there is no
     
    1104894                        class_utf8 = true;
    1105895                        *class_utf8data++ = XCL_SINGLE;
    1106                         class_utf8data += _pcre_ord2utf8(c, class_utf8data);
     896                        class_utf8data += encodeUTF8(c, class_utf8data);
    1107897                       
    1108898                        if (options & IgnoreCaseOption) {
    1109899                            int othercase;
    1110                             if ((othercase = _pcre_ucp_othercase(c)) >= 0) {
     900                            if ((othercase = kjs_pcre_ucp_othercase(c)) >= 0) {
    1111901                                *class_utf8data++ = XCL_SINGLE;
    1112                                 class_utf8data += _pcre_ord2utf8(othercase, class_utf8data);
     902                                class_utf8data += encodeUTF8(othercase, class_utf8data);
    1113903                            }
    1114904                        }
     
    1198988                    /* Now fill in the complete length of the item */
    1199989                   
    1200                     putOpcodeValueAtOffset(previous, 1, code - previous);
     990                    putLinkValue(previous + 1, code - previous);
    1201991                    break;   /* End of class handling */
    1202992                }
     
    12231013                if (!is_quantifier)
    12241014                    goto NORMAL_CHAR;
    1225                 ptr = read_repeat_counts(ptr+1, &repeat_min, &repeat_max, errorcodeptr);
     1015                ptr = readRepeatCounts(ptr + 1, &repeat_min, &repeat_max, errorcodeptr);
    12261016                if (*errorcodeptr)
    12271017                    goto FAILED;
     
    12611051                /* Save start of previous item, in case we have to move it up to make space
    12621052                 for an inserted OP_ONCE for the additional '+' extension. */
     1053                /* FIXME: Probably don't need this because we don't use OP_ONCE. */
    12631054               
    12641055                tempcode = previous;
     
    12891080                   
    12901081                    if (code[-1] & 0x80) {
    1291                         uschar *lastchar = code - 1;
     1082                        unsigned char *lastchar = code - 1;
    12921083                        while((*lastchar & 0xc0) == 0x80)
    12931084                            lastchar--;
     
    13351126                    int prop_value = -1;
    13361127                   
    1337                     uschar* oldcode = code;
     1128                    unsigned char* oldcode = code;
    13381129                    code = previous;                  /* Usually overwrite previous item */
    13391130                   
     
    13581149                        else {
    13591150                            *code++ = OP_UPTO + repeat_type;
    1360                             put2ByteOpcodeValueAtOffsetAndAdvance(code, 0, repeat_max);
     1151                            put2ByteValueAndAdvance(code, repeat_max);
    13611152                        }
    13621153                    }
     
    13751166                                goto END_REPEAT;
    13761167                            *code++ = OP_UPTO + repeat_type;
    1377                             put2ByteOpcodeValueAtOffsetAndAdvance(code, 0, repeat_max - 1);
     1168                            put2ByteValueAndAdvance(code, repeat_max - 1);
    13781169                        }
    13791170                    }
     
    13841175                    else {
    13851176                        *code++ = OP_EXACT + op_type;  /* NB EXACT doesn't have repeat_type */
    1386                         put2ByteOpcodeValueAtOffsetAndAdvance(code, 0, repeat_min);
     1177                        put2ByteValueAndAdvance(code, repeat_min);
    13871178                       
    13881179                        /* If the maximum is unlimited, insert an OP_STAR. Before doing so,
     
    14211212                            repeat_max -= repeat_min;
    14221213                            *code++ = OP_UPTO + repeat_type;
    1423                             put2ByteOpcodeValueAtOffsetAndAdvance(code, 0, repeat_max);
     1214                            put2ByteValueAndAdvance(code, repeat_max);
    14241215                        }
    14251216                    }
     
    14631254                    else {
    14641255                        *code++ = OP_CRRANGE + repeat_type;
    1465                         put2ByteOpcodeValueAtOffsetAndAdvance(code, 0, repeat_min);
     1256                        put2ByteValueAndAdvance(code, repeat_min);
    14661257                        if (repeat_max == -1)
    14671258                            repeat_max = 0;  /* 2-byte encoding for max */
    1468                         put2ByteOpcodeValueAtOffsetAndAdvance(code, 0, repeat_max);
     1259                        put2ByteValueAndAdvance(code, repeat_max);
    14691260                    }
    14701261                }
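
Reviewer note: several hunks replace put2ByteOpcodeValueAtOffsetAndAdvance(code, 0, x) with put2ByteValueAndAdvance(code, x). The real helpers live in pcre_internal.h, which this section does not show, so the following is only a sketch of how such helpers could look. The byte order (high byte first) and the Sketch names are assumptions.

```cpp
#include <cassert>

// Hypothetical stand-ins for the helpers named in the diff.
// Assumption: the high-order byte is stored first.
static inline void put2ByteValueSketch(unsigned char* opcodePtr, int value)
{
    assert(value >= 0 && value <= 0xFFFF);
    opcodePtr[0] = static_cast<unsigned char>(value >> 8);
    opcodePtr[1] = static_cast<unsigned char>(value & 0xFF);
}

static inline int get2ByteValueSketch(const unsigned char* opcodePtr)
{
    return (opcodePtr[0] << 8) | opcodePtr[1];
}

// Takes the pointer by reference so the caller's cursor moves past the value.
static inline void put2ByteValueAndAdvanceSketch(unsigned char*& opcodePtr, int value)
{
    put2ByteValueSketch(opcodePtr, value);
    opcodePtr += 2;
}
```

With helpers of this shape, the call site passes an address (for example `code + 2 + LINK_SIZE`) instead of a base pointer plus a separate offset argument, which is the pattern the new lines in this changeset follow.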
     
    14731264                 cases. */
    14741265               
    1475                 else if (*previous >= OP_BRA || *previous == OP_ONCE) {
     1266                else if (*previous >= OP_BRA) {
    14761267                    int ketoffset = 0;
    14771268                    int len = code - previous;
    1478                     uschar* bralink = NULL;
     1269                    unsigned char* bralink = NULL;
    14791270                   
    14801271                    /* If the maximum repeat count is unlimited, find the end of the bracket
     
    14851276                   
    14861277                    if (repeat_max == -1) {
    1487                         uschar* ket = previous;
    1488                         do {
    1489                             ket += getOpcodeValueAtOffset(ket, 1);
    1490                         } while (*ket != OP_KET);
     1278                        const unsigned char* ket = previous;
     1279                        advanceToEndOfBracket(ket);
    14911280                        ketoffset = code - ket;
    14921281                    }
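
Reviewer note: the hunk above swaps an open-coded do/while loop for a call to advanceToEndOfBracket, whose definition is not part of this section. The sketch below simply wraps the removed loop in a function. It is not the actual WebKit definition: the opcode numbers, the 2-byte link width, and all Sketch names are made up for illustration.

```cpp
#include <cassert>

// Hypothetical opcode numbers; the real ones come from pcre_internal.h.
enum { SKETCH_OP_KET = 1, SKETCH_OP_ALT = 2, SKETCH_OP_BRA = 3 };

// Assumes a 2-byte link stored high byte first, directly after the opcode.
static inline int getLinkValueSketch(const unsigned char* opcodePtr)
{
    return (opcodePtr[0] << 8) | opcodePtr[1];
}

// Follows the chain of links (BRA -> ALT -> ... -> KET) and leaves
// opcodePtr on the closing KET. A zero link would loop forever, so this
// must only be called on a bracket that has already been closed.
static inline void advanceToEndOfBracketSketch(const unsigned char*& opcodePtr)
{
    do
        opcodePtr += getLinkValueSketch(opcodePtr + 1);
    while (*opcodePtr != SKETCH_OP_KET);
}
```

Taking the pointer by reference mirrors the call site in the new code, where `ket` is declared as `const unsigned char*` and then passed directly.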
     
    15401329                            int offset = (!bralink) ? 0 : previous - bralink;
    15411330                            bralink = previous;
    1542                             putOpcodeValueAtOffsetAndAdvance(previous, 0, offset);
     1331                            putLinkValueAllowZeroAndAdvance(previous, offset);
    15431332                        }
    15441333                       
     
    15811370                                int offset = (!bralink) ? 0 : code - bralink;
    15821371                                bralink = code;
    1583                                 putOpcodeValueAtOffsetAndAdvance(code, 0, offset);
     1372                                putLinkValueAllowZeroAndAdvance(code, offset);
    15841373                            }
    15851374                           
     
    15931382                        while (bralink) {
    15941383                            int offset = code - bralink + 1;
    1595                             uschar* bra = code - offset;
    1596                             int oldlinkoffset = getOpcodeValueAtOffset(bra, 1);
    1597                             bralink = oldlinkoffset ? bralink - oldlinkoffset : 0;
     1384                            unsigned char* bra = code - offset;
     1385                            int oldlinkoffset = getLinkValueAllowZero(bra + 1);
     1386                            bralink = (!oldlinkoffset) ? 0 : bralink - oldlinkoffset;
    15981387                            *code++ = OP_KET;
    1599                             putOpcodeValueAtOffsetAndAdvance(code, 0, offset);
    1600                             putOpcodeValueAtOffset(bra, 1, offset);
     1388                            putLinkValueAndAdvance(code, offset);
     1389                            putLinkValue(bra + 1, offset);
    16011390                        }
    16021391                    }
     
    16391428                if (*(++ptr) == '?') {
    16401429                    switch (*(++ptr)) {
    1641                     case ':':                 /* Non-extracting bracket */
    1642                         bravalue = OP_BRA;
    1643                         ptr++;
    1644                         break;
    1645                        
    1646                     case '=':                 /* Positive lookahead */
    1647                         bravalue = OP_ASSERT;
    1648                         ptr++;
    1649                         break;
    1650                        
    1651                     case '!':                 /* Negative lookahead */
    1652                         bravalue = OP_ASSERT_NOT;
    1653                         ptr++;
    1654                         break;
    1655                        
     1430                        case ':':                 /* Non-extracting bracket */
     1431                            bravalue = OP_BRA;
     1432                            ptr++;
     1433                            break;
     1434                           
     1435                        case '=':                 /* Positive lookahead */
     1436                            bravalue = OP_ASSERT;
     1437                            ptr++;
     1438                            break;
     1439                           
     1440                        case '!':                 /* Negative lookahead */
     1441                            bravalue = OP_ASSERT_NOT;
     1442                            ptr++;
     1443                            break;
     1444                           
    16561445                        /* Character after (? not specially recognized */
    1657                        
    1658                     default:                  /* Option setting */
    1659                         *errorcodeptr = ERR12;
    1660                         goto FAILED;
    1661                     }
     1446                           
     1447                        default:
     1448                            *errorcodeptr = ERR12;
     1449                            goto FAILED;
     1450                        }
    16621451                }
    16631452               
     
    16701459                        bravalue = OP_BRA + EXTRACT_BASIC_MAX + 1;
    16711460                        code[1 + LINK_SIZE] = OP_BRANUMBER;
    1672                         put2ByteOpcodeValueAtOffset(code, 2+LINK_SIZE, *brackets);
     1461                        put2ByteValue(code + 2 + LINK_SIZE, *brackets);
    16731462                        skipbytes = 3;
    16741463                    }
     
    16821471                 new setting for the ims options if they have changed. */
    16831472               
    1684                 previous = (bravalue >= OP_ONCE) ? code : 0;
     1473                previous = (bravalue >= OP_BRAZERO) ? code : 0;
    16851474                *code = bravalue;
    16861475                tempcode = code;
     
    17151504                groupsetfirstbyte = false;
    17161505               
    1717                 if (bravalue >= OP_BRA || bravalue == OP_ONCE) {
     1506                if (bravalue >= OP_BRA) {
    17181507                    /* If we have not yet set a firstbyte in this branch, take it from the
    17191508                     subpattern, remembering that it was set here so that a repeat of more
     
    17751564            case '\\':
    17761565                tempptr = ptr;
    1777                 c = check_escape(&ptr, patternEnd, errorcodeptr, *brackets, false);
     1566                c = checkEscape(&ptr, patternEnd, errorcodeptr, *brackets, false);
    17781567               
    17791568                /* Handle metacharacters introduced by \. For ones like \d, the ESC_ values
     
    18021591                        previous = code;
    18031592                        *code++ = OP_REF;
    1804                         put2ByteOpcodeValueAtOffsetAndAdvance(code, 0, number);
     1593                        put2ByteValueAndAdvance(code, number);
    18051594                    }
    18061595                   
     
    18381627                    }
    18391628                } else {
    1840                     mclength = _pcre_ord2utf8(c, mcbuffer);
     1629                    mclength = encodeUTF8(c, mcbuffer);
    18411630                   
    18421631                    *code++ = (options & IgnoreCaseOption) ? OP_CHAR_IGNORING_CASE : OP_CHAR;
     
    18881677    return false;
    18891678}
    1890 
    1891 
    1892 
    18931679
    18941680/*************************************************
     
    19191705
    19201706static bool
    1921 compileBracket(int options, int* brackets, uschar** codeptr,
     1707compileBracket(int options, int* brackets, unsigned char** codeptr,
    19221708    const UChar** ptrptr, const UChar* patternEnd, ErrorCode* errorcodeptr, int skipbytes,
    19231709    int* firstbyteptr, int* reqbyteptr, CompileData& cd)
    19241710{
    19251711    const UChar* ptr = *ptrptr;
    1926     uschar* code = *codeptr;
    1927     uschar* last_branch = code;
    1928     uschar* start_bracket = code;
     1712    unsigned char* code = *codeptr;
     1713    unsigned char* last_branch = code;
     1714    unsigned char* start_bracket = code;
    19291715    int firstbyte = REQ_UNSET;
    19301716    int reqbyte = REQ_UNSET;
     
    19321718    /* Offset is set zero to mark that this bracket is still open */
    19331719   
    1934     putOpcodeValueAtOffset(code, 1, 0);
     1720    putLinkValueAllowZero(code + 1, 0);
    19351721    code += 1 + LINK_SIZE + skipbytes;
    19361722   
     
    19981784            int length = code - last_branch;
    19991785            do {
    2000                 int prev_length = getOpcodeValueAtOffset(last_branch, 1);
    2001                 putOpcodeValueAtOffset(last_branch, 1, length);
     1786                int prev_length = getLinkValueAllowZero(last_branch + 1);
     1787                putLinkValue(last_branch + 1, length);
    20021788                length = prev_length;
    20031789                last_branch -= length;
     
    20071793           
    20081794            *code = OP_KET;
    2009             putOpcodeValueAtOffset(code, 1, code - start_bracket);
     1795            putLinkValue(code + 1, code - start_bracket);
    20101796            code += 1 + LINK_SIZE;
    20111797           
     
    20251811       
    20261812        *code = OP_ALT;
    2027         putOpcodeValueAtOffset(code, 1, code - last_branch);
     1813        putLinkValue(code + 1, code - last_branch);
    20281814        last_branch = code;
    20291815        code += 1 + LINK_SIZE;
     
    20321818    ASSERT_NOT_REACHED();
    20331819}
    2034 
    20351820
    20361821/*************************************************
     
    20511836*/
    20521837
    2053 static bool branchIsAnchored(const uschar* code)
     1838static bool branchIsAnchored(const unsigned char* code)
    20541839{
    2055     const uschar* scode = firstSignificantOpCode(code);
     1840    const unsigned char* scode = firstSignificantOpcode(code);
    20561841    int op = *scode;
    20571842
    20581843    /* Brackets */
    2059     if (op >= OP_BRA || op == OP_ASSERT || op == OP_ONCE)
     1844    if (op >= OP_BRA || op == OP_ASSERT)
    20601845        return bracketIsAnchored(scode);
    20611846
     
    20641849}
    20651850
    2066 static bool bracketIsAnchored(const uschar* code)
     1851static bool bracketIsAnchored(const unsigned char* code)
    20671852{
    20681853    do {
    20691854        if (!branchIsAnchored(code + 1 + LINK_SIZE))
    20701855            return false;
    2071         code += getOpcodeValueAtOffset(code, 1);
     1856        code += getLinkValue(code + 1);
    20721857    } while (*code == OP_ALT);   /* Loop for each alternative */
    20731858    return true;
     
    20961881*/
    20971882
    2098 static bool branchNeedsLineStart(const uschar* code, unsigned captureMap, unsigned backrefMap)
     1883static bool branchNeedsLineStart(const unsigned char* code, unsigned captureMap, unsigned backrefMap)
    20991884{
    2100     const uschar* scode = firstSignificantOpCode(code);
     1885    const unsigned char* scode = firstSignificantOpcode(code);
    21011886    int op = *scode;
    21021887   
     
    21051890        int captureNum = op - OP_BRA;
    21061891        if (captureNum > EXTRACT_BASIC_MAX)
    2107             captureNum = get2ByteOpcodeValueAtOffset(scode, 2 + LINK_SIZE);
     1892            captureNum = get2ByteValue(scode + 2 + LINK_SIZE);
    21081893        int bracketMask = (captureNum < 32) ? (1 << captureNum) : 1;
    21091894        return bracketNeedsLineStart(scode, captureMap | bracketMask, backrefMap);
     
    21111896   
    21121897    /* Other brackets */
    2113     if (op == OP_BRA || op == OP_ASSERT || op == OP_ONCE)
     1898    if (op == OP_BRA || op == OP_ASSERT)
    21141899        return bracketNeedsLineStart(scode, captureMap, backrefMap);
    21151900   
     
    21241909}
    21251910
    2126 static bool bracketNeedsLineStart(const uschar* code, unsigned captureMap, unsigned backrefMap)
     1911static bool bracketNeedsLineStart(const unsigned char* code, unsigned captureMap, unsigned backrefMap)
    21271912{
    21281913    do {
    21291914        if (!branchNeedsLineStart(code + 1 + LINK_SIZE, captureMap, backrefMap))
    21301915            return false;
    2131         code += getOpcodeValueAtOffset(code, 1);
     1916        code += getLinkValue(code + 1);
    21321917    } while (*code == OP_ALT);  /* Loop for each alternative */
    21331918    return true;
     
    21541939*/
    21551940
    2156 static int branchFindFirstAssertedCharacter(const uschar* code, bool inassert)
     1941static int branchFindFirstAssertedCharacter(const unsigned char* code, bool inassert)
    21571942{
    2158     const uschar* scode = firstSignificantOpCodeSkippingAssertions(code);
     1943    const unsigned char* scode = firstSignificantOpcodeSkippingAssertions(code);
    21591944    int op = *scode;
    21601945   
     
    21681953        case OP_BRA:
    21691954        case OP_ASSERT:
    2170         case OP_ONCE:
    21711955            return bracketFindFirstAssertedCharacter(scode, op == OP_ASSERT);
    21721956
     
    21871971}
    21881972
    2189 static int bracketFindFirstAssertedCharacter(const uschar* code, bool inassert)
     1973static int bracketFindFirstAssertedCharacter(const unsigned char* code, bool inassert)
    21901974{
    21911975    int c = -1;
     
    21981982        else if (c != d)
    21991983            return -1;
    2200         code += getOpcodeValueAtOffset(code, 1);
     1984        code += getLinkValue(code + 1);
    22011985    } while (*code == OP_ALT);
    22021986    return c;
     
    22182002    unsigned brastackptr = 0;
    22192003    int brastack[BRASTACK_SIZE];
    2220     uschar bralenstack[BRASTACK_SIZE];
     2004    unsigned char bralenstack[BRASTACK_SIZE];
    22212005    int bracount = 0;
    22222006   
     
    22332017
    22342018            case '\\':
    2235                 c = check_escape(&ptr, patternEnd, &errorcode, bracount, false);
     2019                c = checkEscape(&ptr, patternEnd, &errorcode, bracount, false);
    22362020                if (errorcode != 0)
    22372021                    return -1;
     
    22442028                    if (c > 127) {
    22452029                        int i;
    2246                         for (i = 0; i < _pcre_utf8_table1_size; i++)
    2247                             if (c <= _pcre_utf8_table1[i]) break;
     2030                        for (i = 0; i < kjs_pcre_utf8_table1_size; i++)
     2031                            if (c <= kjs_pcre_utf8_table1[i]) break;
    22482032                        length += i;
    22492033                        lastitemlength += i;
     
    22672051                        cd.top_backref = refnum;
    22682052                    length += 2;   /* For single back reference */
    2269                     if (safelyCheckNextChar(ptr, patternEnd, '{') && is_counted_repeat(ptr + 2, patternEnd)) {
    2270                         ptr = read_repeat_counts(ptr + 2, &minRepeats, &maxRepeats, &errorcode);
     2053                    if (safelyCheckNextChar(ptr, patternEnd, '{') && isCountedRepeat(ptr + 2, patternEnd)) {
     2054                        ptr = readRepeatCounts(ptr + 2, &minRepeats, &maxRepeats, &errorcode);
    22712055                        if (errorcode)
    22722056                            return -1;
     
    22992083
    23002084            case '{':
    2301                 if (!is_counted_repeat(ptr+1, patternEnd))
     2085                if (!isCountedRepeat(ptr + 1, patternEnd))
    23022086                    goto NORMAL_CHAR;
    2303                 ptr = read_repeat_counts(ptr+1, &minRepeats, &maxRepeats, &errorcode);
     2087                ptr = readRepeatCounts(ptr + 1, &minRepeats, &maxRepeats, &errorcode);
    23042088                if (errorcode != 0)
    23052089                    return -1;
     
    23662150                   
    23672151                    if (*ptr == '\\') {
    2368                         c = check_escape(&ptr, patternEnd, &errorcode, bracount, true);
     2152                        c = checkEscape(&ptr, patternEnd, &errorcode, bracount, true);
    23692153                        if (errorcode != 0)
    23702154                            return -1;
     
    24012185                            if (safelyCheckNextChar(ptr, patternEnd, '\\')) {
    24022186                                ptr++;
    2403                                 d = check_escape(&ptr, patternEnd, &errorcode, bracount, true);
     2187                                d = checkEscape(&ptr, patternEnd, &errorcode, bracount, true);
    24042188                                if (errorcode != 0)
    24052189                                    return -1;
     
    24222206                           
    24232207                            if ((d > 255 || (ignoreCase && d > 127))) {
    2424                                 uschar buffer[6];
     2208                                unsigned char buffer[6];
    24252209                                if (!class_utf8)         /* Allow for XCLASS overhead */
    24262210                                {
     
    24392223                                    int cc = c;
    24402224                                    int origd = d;
    2441                                     while (get_othercase_range(&cc, origd, &occ, &ocd)) {
     2225                                    while (getOthercaseRange(&cc, origd, &occ, &ocd)) {
    24422226                                        if (occ >= c && ocd <= d)
    24432227                                            continue;   /* Skip embedded */
     
    24562240                                        /* An extra item is needed */
    24572241                                       
    2458                                         length += 1 + _pcre_ord2utf8(occ, buffer) +
    2459                                         ((occ == ocd) ? 0 : _pcre_ord2utf8(ocd, buffer));
     2242                                        length += 1 + encodeUTF8(occ, buffer) +
     2243                                        ((occ == ocd) ? 0 : encodeUTF8(ocd, buffer));
    24602244                                    }
    24612245                                }
     
    24632247                                /* The length of the (possibly extended) range */
    24642248                               
    2465                                 length += 1 + _pcre_ord2utf8(c, buffer) + _pcre_ord2utf8(d, buffer);
     2249                                length += 1 + encodeUTF8(c, buffer) + encodeUTF8(d, buffer);
    24662250                            }
    24672251                           
     
    24752259                        else {
    24762260                            if ((c > 255 || (ignoreCase && c > 127))) {
    2477                                 uschar buffer[6];
     2261                                unsigned char buffer[6];
    24782262                                class_optcount = 10;     /* Ensure > 1 */
    24792263                                if (!class_utf8)         /* Allow for XCLASS overhead */
     
    24822266                                    length += LINK_SIZE + 2;
    24832267                                }
    2484                                 length += (ignoreCase ? 2 : 1) * (1 + _pcre_ord2utf8(c, buffer));
     2268                                length += (ignoreCase ? 2 : 1) * (1 + encodeUTF8(c, buffer));
    24852269                            }
    24862270                        }
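The length pre-pass above only needs the byte count that encodeUTF8 returns; the bytes written to the scratch buffer are discarded. A minimal sketch of such an encoder follows, assuming the historical 1-to-6-byte UTF-8 layout that the 6-byte buffer suggests. The names encodeUTF8Sketch and rangeItemLengthSketch are hypothetical and are not the functions in pcre_compile.cpp.

```cpp
#include <cassert>

// Hypothetical stand-in for the encodeUTF8 helper named in the diff:
// writes the UTF-8 form of cvalue into buffer and returns the byte count.
// Assumes cvalue <= 0x7fffffff and that buffer has room for 6 bytes.
static int encodeUTF8Sketch(unsigned cvalue, unsigned char* buffer)
{
    // Largest value representable in 1..6 bytes.
    static const unsigned limits[] = { 0x7f, 0x7ff, 0xffff, 0x1fffff, 0x3ffffff, 0x7fffffff };
    // Leading-byte marker bits for each sequence length.
    static const unsigned char leadBits[] = { 0x00, 0xc0, 0xe0, 0xf0, 0xf8, 0xfc };

    int extra = 0; // number of continuation bytes
    while (extra < 5 && cvalue > limits[extra])
        ++extra;

    // Fill the continuation bytes from the end, 6 bits at a time.
    for (int j = extra; j > 0; --j) {
        buffer[j] = static_cast<unsigned char>(0x80 | (cvalue & 0x3f));
        cvalue >>= 6;
    }
    buffer[0] = static_cast<unsigned char>(leadBits[extra] | cvalue);
    return extra + 1;
}

// Mirrors the shape of "length += 1 + encodeUTF8(c, buffer) + encodeUTF8(d, buffer)":
// one marker byte plus the encoded endpoints of the range.
static int rangeItemLengthSketch(unsigned c, unsigned d)
{
    unsigned char buffer[6];
    return 1 + encodeUTF8Sketch(c, buffer) + encodeUTF8Sketch(d, buffer);
}
```

For a range from U+0100 to U+20AC this gives 1 + 2 + 3 bytes.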
     
    25082292                     we also need extra for wrapping the whole thing in a sub-pattern. */
    25092293                   
    2510                     if (safelyCheckNextChar(ptr, patternEnd, '{') && is_counted_repeat(ptr+2, patternEnd)) {
    2511                         ptr = read_repeat_counts(ptr+2, &minRepeats, &maxRepeats, &errorcode);
     2294                    if (safelyCheckNextChar(ptr, patternEnd, '{') && isCountedRepeat(ptr + 2, patternEnd)) {
     2295                        ptr = readRepeatCounts(ptr + 2, &minRepeats, &maxRepeats, &errorcode);
    25122296                        if (errorcode != 0)
    25132297                            return -1;
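The call site passes ptr + 2 because ptr[1] is the '{' and the quantifier body starts one character later. The check can be sketched as below, assuming the usual {n}, {n,} and {n,m} forms; isCountedRepeatSketch and UCharSketch are hypothetical names, and the real isCountedRepeat may differ in detail.

```cpp
#include <cassert>

typedef char16_t UCharSketch; // stand-in for the UChar type used in the diff

static bool isASCIIDigitSketch(UCharSketch c) { return c >= '0' && c <= '9'; }

// Returns true if [p, end) begins with "n}", "n,}" or "n,m}",
// where n and m are runs of one or more ASCII digits.
static bool isCountedRepeatSketch(const UCharSketch* p, const UCharSketch* end)
{
    if (p >= end || !isASCIIDigitSketch(*p))
        return false;
    while (p < end && isASCIIDigitSketch(*p))
        ++p;
    if (p < end && *p == '}')
        return true;            // {n}
    if (p >= end || *p++ != ',')
        return false;
    if (p < end && *p == '}')
        return true;            // {n,}
    if (p >= end || !isASCIIDigitSketch(*p))
        return false;
    while (p < end && isASCIIDigitSketch(*p))
        ++p;
    return p < end && *p == '}'; // {n,m}
}
```

Every dereference is guarded by a comparison against end, in the same spirit as the safelyCheckNextChar calls in the surrounding code.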
     
    25382322                if (safelyCheckNextChar(ptr, patternEnd, '?')) {
    25392323                    switch (c = (ptr + 2 < patternEnd ? ptr[2] : 0)) {
    2540                             /* Non-referencing groups and lookaheads just move the pointer on, and
    2541                              then behave like a non-special bracket, except that they don't increment
    2542                              the count of extracting brackets. Ditto for the "once only" bracket,
    2543                              which is in Perl from version 5.005. */
     2324                        /* Non-referencing groups and lookaheads just move the pointer on, and
     2325                         then behave like a non-special bracket, except that they don't increment
     2326                         the count of extracting brackets. Ditto for the "once only" bracket,
     2327                         which is in Perl from version 5.005. */
    25442328                           
    25452329                        case ':':
     
    25492333                            break;
    25502334                           
    2551                             /* Else loop checking valid options until ) is met. Anything else is an
    2552                              error. If we are without any brackets, i.e. at top level, the settings
    2553                              act as if specified in the options, so massage the options immediately.
    2554                              This is for backward compatibility with Perl 5.004. */
     2335                        /* Else loop checking valid options until ) is met. Anything else is an
     2336                         error. If we are without any brackets, i.e. at top level, the settings
     2337                         act as if specified in the options, so massage the options immediately.
     2338                         This is for backward compatibility with Perl 5.004. */
    25552339                           
    25562340                        default:
     
    26052389                    duplength = 0;
    26062390               
    2607                 /* Leave ptr at the final char; for read_repeat_counts this happens
     2391                /* Leave ptr at the final char; for readRepeatCounts this happens
    26082392                 automatically; for the others we need an increment. */
    26092393               
    2610                 if ((ptr + 1 < patternEnd) && (c = ptr[1]) == '{' && is_counted_repeat(ptr+2, patternEnd)) {
    2611                     ptr = read_repeat_counts(ptr+2, &minRepeats, &maxRepeats, &errorcode);
     2394                if ((ptr + 1 < patternEnd) && (c = ptr[1]) == '{' && isCountedRepeat(ptr + 2, patternEnd)) {
     2395                    ptr = readRepeatCounts(ptr + 2, &minRepeats, &maxRepeats, &errorcode);
    26122396                    if (errorcode)
    26132397                        return -1;
     
    26722456                if (c > 127) {
    26732457                    int i;
    2674                     for (i = 0; i < _pcre_utf8_table1_size; i++)
    2675                         if (c <= _pcre_utf8_table1[i])
     2458                    for (i = 0; i < kjs_pcre_utf8_table1_size; i++)
     2459                        if (c <= kjs_pcre_utf8_table1[i])
    26762460                            break;
    26772461                    length += i;
     
    27092493*/
    27102494
    2711 static JSRegExp* returnError(ErrorCode errorcode, const char** errorptr)
     2495static inline JSRegExp* returnError(ErrorCode errorcode, const char** errorptr)
    27122496{
    2713     *errorptr = error_text(errorcode);
     2497    *errorptr = errorText(errorcode);
    27142498    return 0;
    27152499}
     
    27462530     passed around in the compile data block. */
    27472531   
    2748     const uschar* codeStart = (const uschar*)(re + 1);
     2532    const unsigned char* codeStart = (const unsigned char*)(re + 1);
    27492533    cd.start_code = codeStart;
    27502534    cd.start_pattern = (const UChar*)pattern;
     
    27562540    const UChar* ptr = (const UChar*)pattern;
    27572541    const UChar* patternEnd = pattern + patternLength;
    2758     uschar* code = (uschar*)codeStart;
     2542    unsigned char* code = (unsigned char*)codeStart;
    27592543    int firstbyte, reqbyte;
    27602544    int bracketCount = 0;
  • trunk/JavaScriptCore/pcre/pcre_exec.cpp

    r28627 r28793  
    7474    struct {
    7575        const UChar* subjectPtr;
    76         const uschar* instructionPtr;
    77         int offset_top;
     76        const unsigned char* instructionPtr;
     77        int offsetTop;
    7878        const UChar* subpatternStart;
    7979    } args;
     
    8484     store local variables on the current MatchFrame. */
    8585    struct {
    86         const uschar* data;
    87         const uschar* startOfRepeatingBracket;
     86        const unsigned char* data;
     87        const unsigned char* startOfRepeatingBracket;
    8888        const UChar* subjectPtrAtStartOfInstruction; // Several instructions stash away a subjectPtr here for later compare
    89         const uschar* instructionPtrAtStartOfOnce;
     89        const unsigned char* instructionPtrAtStartOfOnce;
    9090       
    91         int repeat_othercase;
     91        int repeatOthercase;
    9292       
    9393        int ctype;
     
    9898        int number;
    9999        int offset;
    100         int save_offset1;
    101         int save_offset2;
    102         int save_offset3;
     100        int saveOffset1;
     101        int saveOffset2;
     102        int saveOffset3;
    103103       
    104104        const UChar* subpatternStart;
     
    110110
    111111struct MatchData {
    112   int*   offset_vector;         /* Offset vector */
    113   int    offset_end;            /* One past the end */
    114   int    offset_max;            /* The maximum usable for return data */
    115   bool   offset_overflow;       /* Set if too many extractions */
    116   const UChar*  start_subject;         /* Start of the subject string */
    117   const UChar*  end_subject;           /* End of the subject string */
    118   const UChar*  end_match_ptr;         /* Subject position at end match */
    119   int    end_offset_top;        /* Highwater mark at end of match */
     112  int*   offsetVector;         /* Offset vector */
     113  int    offsetEnd;            /* One past the end */
     114  int    offsetMax;            /* The maximum usable for return data */
     115  bool   offsetOverflow;       /* Set if too many extractions */
     116  const UChar*  startSubject;         /* Start of the subject string */
     117  const UChar*  endSubject;           /* End of the subject string */
     118  const UChar*  endMatchPtr;         /* Subject position at end match */
     119  int    endOffsetTop;        /* Highwater mark at end of match */
    120120  bool   multiline;
    121121  bool   ignoreCase;
     
    123123
    124124/* Non-error returns from the match() function. Error returns are externally
    125 defined PCRE_ERROR_xxx codes, which are all negative. */
     125defined error codes, which are all negative. */
    126126
    127127#define MATCH_MATCH        1
    128128#define MATCH_NOMATCH      0
     129
     130/* The maximum remaining length of subject we are prepared to search for a
     131req_byte match. */
     132
     133#define REQ_BYTE_MAX 1000
     134
     135/* The below limit restricts the number of recursive match calls in order to
     136limit the maximum amount of storage.
     137 
     138This limit is tied to the size of MatchFrame.  Right now we allow PCRE to allocate up
     139to MATCH_RECURSION_LIMIT - 16 * sizeof(MatchFrame) bytes of "stack" space before we give up.
     140Currently that's 100000 - 16 * (23 * 4)  ~ 90MB. */
     141
     142#define MATCH_RECURSION_LIMIT 100000
    129143
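The storage estimate in the comment above can be worked through explicitly. The sketch below assumes the expression is meant as (MATCH_RECURSION_LIMIT - 16) * sizeof(MatchFrame), that the 16 counts frames which do not need separate allocation, and that 23 * 4 bytes is the frame size; none of these readings is confirmed by the diff, and the constant names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

const std::size_t matchRecursionLimitSketch = 100000; // MATCH_RECURSION_LIMIT in the diff
const std::size_t preallocatedFramesSketch = 16;      // assumption: frames not counted against the budget
const std::size_t frameSizeInBytesSketch = 23 * 4;    // the comment's estimate of sizeof(MatchFrame)

// Bytes needed if every frame beyond the first 16 is allocated separately.
const std::size_t maxExtraBytesSketch =
    (matchRecursionLimitSketch - preallocatedFramesSketch) * frameSizeInBytesSketch;
```

Under those assumptions the product is 99984 * 92 = 9,198,528 bytes, or roughly 9.2 MB, so the "~ 90MB" figure in the comment is best read as approximate.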
    130144#ifdef DEBUG
     
    139153  p           points to characters
    140154  length      number to print
    141   is_subject  true if printing from within md.start_subject
    142   md          pointer to matching data block, if is_subject is true
     155  isSubject  true if printing from within md.startSubject
     156  md          pointer to matching data block, if isSubject is true
    143157*/
    144158
    145 static void pchars(const UChar* p, int length, bool is_subject, const MatchData& md)
     159static void pchars(const UChar* p, int length, bool isSubject, const MatchData& md)
    146160{
    147     if (is_subject && length > md.end_subject - p)
    148         length = md.end_subject - p;
     161    if (isSubject && length > md.endSubject - p)
     162        length = md.endSubject - p;
    149163    while (length-- > 0) {
    150164        int c;
     
    159173#endif
    160174
    161 
    162 
    163175/*************************************************
    164176*          Match a back-reference                *
     
    177189*/
    178190
    179 static bool match_ref(int offset, const UChar* subjectPtr, int length, const MatchData& md)
     191static bool matchRef(int offset, const UChar* subjectPtr, int length, const MatchData& md)
    180192{
    181     const UChar* p = md.start_subject + md.offset_vector[offset];
     193    const UChar* p = md.startSubject + md.offsetVector[offset];
    182194   
    183195#ifdef DEBUG
    184     if (subjectPtr >= md.end_subject)
     196    if (subjectPtr >= md.endSubject)
    185197        printf("matching subject <null>");
    186198    else {
     
    195207    /* Always fail if not enough characters left */
    196208   
    197     if (length > md.end_subject - subjectPtr)
     209    if (length > md.endSubject - subjectPtr)
    198210        return false;
    199211   
     
    203215        while (length-- > 0) {
    204216            UChar c = *p++;
    205             int othercase = _pcre_ucp_othercase(c);
     217            int othercase = kjs_pcre_ucp_othercase(c);
    206218            UChar d = *subjectPtr++;
    207219            if (c != d && othercase != d)
     
    239251#endif
    240252
    241 #define CHECK_RECURSION_LIMIT \
    242     if (stack.size >= MATCH_LIMIT_RECURSION) \
    243         return matchError(JSRegExpErrorRecursionLimit, stack);
    244 
    245 #define RECURSE_WITH_RETURN_NUMBER(num) \
    246     CHECK_RECURSION_LIMIT \
     253#define RECURSIVE_MATCH_COMMON(num) \
     254    if (stack.size >= MATCH_RECURSION_LIMIT) \
     255        return matchError(JSRegExpErrorRecursionLimit, stack); \
    247256    goto RECURSE;\
    248     RRETURN_##num:
     257    RRETURN_##num: \
     258    stack.popCurrentFrame();
    249259
    250260#define RECURSIVE_MATCH(num, ra, rb) \
    251 {\
    252     stack.pushNewFrame((ra), (rb), RMATCH_WHERE(num)); \
    253     RECURSE_WITH_RETURN_NUMBER(num) \
    254     stack.popCurrentFrame(); \
    255 }
     261    do { \
     262        stack.pushNewFrame((ra), (rb), RMATCH_WHERE(num)); \
     263        RECURSIVE_MATCH_COMMON(num) \
     264    } while (0)
    256265
    257266#define RECURSIVE_MATCH_STARTNG_NEW_GROUP(num, ra, rb) \
    258 {\
    259     stack.pushNewFrame((ra), (rb), RMATCH_WHERE(num)); \
    260     startNewGroup(stack.currentFrame); \
    261     RECURSE_WITH_RETURN_NUMBER(num) \
    262     stack.popCurrentFrame(); \
    263 }
     267    do { \
     268        stack.pushNewFrame((ra), (rb), RMATCH_WHERE(num)); \
     269        startNewGroup(stack.currentFrame); \
     270        RECURSIVE_MATCH_COMMON(num) \
     271    } while (0)
    264272
    265273#define RRETURN goto RRETURN_LABEL
    266274
    267 #define RRETURN_NO_MATCH \
    268   {\
    269     is_match = false;\
    270     RRETURN;\
    271   }
     275#define RRETURN_NO_MATCH do { isMatch = false; RRETURN; } while (0)
    272276
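The macro changes above replace bare brace blocks with the do { } while (0) idiom, so that each macro use followed by a semicolon is exactly one statement. A small illustration of why that matters, using hypothetical macros unrelated to the matcher:

```cpp
#include <cassert>

// Brace-block form: "SET_BOTH_BLOCK(x, y, 1);" expands to "{ ... };".
// The stray ';' ends the if statement, so a following else fails to compile.
#define SET_BOTH_BLOCK(a, b, v) { (a) = (v); (b) = (v); }

// do/while (0) form: expands to a single statement that consumes the ';'.
#define SET_BOTH(a, b, v) do { (a) = (v); (b) = (v); } while (0)

static int pickSketch(bool flag)
{
    int x = 0;
    int y = 0;
    if (flag)
        SET_BOTH(x, y, 1); // SET_BOTH_BLOCK here would orphan the else below
    else
        SET_BOTH(x, y, 2);
    return x + y;
}
```

The same reasoning applies to RRETURN_NO_MATCH, which is used as the body of unbraced if statements elsewhere in match().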
    273277/*************************************************
     
    285289   subjectPtr        pointer in subject
    286290   instructionPtr       position in code
    287    offset_top  current top pointer
     291   offsetTop  current top pointer
    288292   md          pointer to "static" info for the match
    289293
    290294Returns:       MATCH_MATCH if matched            )  these values are >= 0
    291295               MATCH_NOMATCH if failed to match  )
    292                a negative PCRE_ERROR_xxx value if aborted by an error condition
     296               a negative error value if aborted by an error condition
    293297                 (e.g. stopped by repeated call or recursion limit)
    294298*/
     
    322326    }
    323327   
    324     inline void pushNewFrame(const uschar* instructionPtr, const UChar* subpatternStart, ReturnLocation returnLocation)
     328    inline void pushNewFrame(const unsigned char* instructionPtr, const UChar* subpatternStart, ReturnLocation returnLocation)
    325329    {
    326330        MatchFrame* newframe = allocateNextFrame();
     
    328332
    329333        newframe->args.subjectPtr = currentFrame->args.subjectPtr;
    330         newframe->args.offset_top = currentFrame->args.offset_top;
     334        newframe->args.offsetTop = currentFrame->args.offsetTop;
    331335        newframe->args.instructionPtr = instructionPtr;
    332336        newframe->args.subpatternStart = subpatternStart;
     
    362366 if there are extra bytes. This is called when we know we are in UTF-8 mode. */
    363367
    364 static inline void getUTF8CharAndIncrementLength(int& c, const uschar* subjectPtr, int& len)
     368static inline void getUTF8CharAndIncrementLength(int& c, const unsigned char* subjectPtr, int& len)
    365369{
    366370    c = *subjectPtr;
    367371    if ((c & 0xc0) == 0xc0) {
    368         int gcaa = _pcre_utf8_table4[c & 0x3f];  /* Number of additional bytes */
     372        int gcaa = kjs_pcre_utf8_table4[c & 0x3f];  /* Number of additional bytes */
    369373        int gcss = 6 * gcaa;
    370         c = (c & _pcre_utf8_table3[gcaa]) << gcss;
     374        c = (c & kjs_pcre_utf8_table3[gcaa]) << gcss;
    371375        for (int gcii = 1; gcii <= gcaa; gcii++) {
    372376            gcss -= 6;
     
    402406}
    403407
    404 static int match(const UChar* subjectPtr, const uschar* instructionPtr, int offset_top, MatchData& md)
     408static int match(const UChar* subjectPtr, const unsigned char* instructionPtr, int offsetTop, MatchData& md)
    405409{
    406     int is_match = false;
     410    int isMatch = false;
    407411    int min;
    408412    bool minimize = false; /* Initialization not really needed, but some compilers think so. */
     
    413417#ifdef USE_COMPUTED_GOTO_FOR_MATCH_OPCODE_LOOP
    414418#define EMIT_JUMP_TABLE_ENTRY(opcode) &&LABEL_OP_##opcode,
    415     static void* opcode_jump_table[256] = { FOR_EACH_OPCODE(EMIT_JUMP_TABLE_ENTRY) };
     419    static void* opcodeJumpTable[256] = { FOR_EACH_OPCODE(EMIT_JUMP_TABLE_ENTRY) };
    416420#undef EMIT_JUMP_TABLE_ENTRY
    417421#endif
     
    419423    /* One-time setup of the opcode jump table. */
    420424#ifdef USE_COMPUTED_GOTO_FOR_MATCH_OPCODE_LOOP
    421     for (int i = 255; !opcode_jump_table[i]; i--)
    422         opcode_jump_table[i] = &&CAPTURING_BRACKET;
     425    for (int i = 255; !opcodeJumpTable[i]; i--)
     426        opcodeJumpTable[i] = &&CAPTURING_BRACKET;
    423427#endif
    424428   
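The jump table above lets the opcode loop dispatch with a computed goto instead of a switch. A reduced sketch of the two dispatch styles behind one pair of macros follows; the opcodes, macro names, and function are hypothetical, and labels-as-values is a GCC/Clang extension, so the switch path is kept as the portable fallback.

```cpp
#include <cassert>

enum OpcodeSketch { OP_INC, OP_DOUBLE, OP_END }; // hypothetical opcodes

// Runs a tiny opcode stream and returns the accumulated value.
static int runSketch(const unsigned char* code)
{
    int value = 0;
#if defined(__GNUC__)
    // One label address per opcode, indexed by opcode value.
    static void* jumpTable[] = { &&LABEL_OP_INC, &&LABEL_OP_DOUBLE, &&LABEL_OP_END };
#define BEGIN_OPCODE_SKETCH(op) LABEL_##op
#define NEXT_OPCODE_SKETCH goto *jumpTable[*code]
    NEXT_OPCODE_SKETCH;
#else
#define BEGIN_OPCODE_SKETCH(op) case op
#define NEXT_OPCODE_SKETCH continue
    for (;;) switch (*code)
#endif
    {
        BEGIN_OPCODE_SKETCH(OP_INC):
            ++value;
            ++code;
            NEXT_OPCODE_SKETCH;
        BEGIN_OPCODE_SKETCH(OP_DOUBLE):
            value *= 2;
            ++code;
            NEXT_OPCODE_SKETCH;
        BEGIN_OPCODE_SKETCH(OP_END):
            return value;
    }
}
```

Each handler ends by jumping straight to the next handler, which avoids returning to a central switch on every opcode.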
     
    432436    stack.currentFrame->args.subjectPtr = subjectPtr;
    433437    stack.currentFrame->args.instructionPtr = instructionPtr;
    434     stack.currentFrame->args.offset_top = offset_top;
     438    stack.currentFrame->args.offsetTop = offsetTop;
    435439    stack.currentFrame->args.subpatternStart = 0;
    436440    startNewGroup(stack.currentFrame);
     
    449453#ifdef USE_COMPUTED_GOTO_FOR_MATCH_OPCODE_LOOP
    450454#define BEGIN_OPCODE(opcode) LABEL_OP_##opcode
    451 #define NEXT_OPCODE goto *opcode_jump_table[*stack.currentFrame->args.instructionPtr]
     455#define NEXT_OPCODE goto *opcodeJumpTable[*stack.currentFrame->args.instructionPtr]
    452456#else
    453457#define BEGIN_OPCODE(opcode) case OP_##opcode
     
    468472                do {
    469473                    RECURSIVE_MATCH_STARTNG_NEW_GROUP(2, stack.currentFrame->args.instructionPtr + 1 + LINK_SIZE, stack.currentFrame->args.subpatternStart);
    470                     if (is_match)
     474                    if (isMatch)
    471475                        RRETURN;
    472                     stack.currentFrame->args.instructionPtr += getOpcodeValueAtOffset(stack.currentFrame->args.instructionPtr, 1);
     476                    stack.currentFrame->args.instructionPtr += getLinkValue(stack.currentFrame->args.instructionPtr + 1);
    473477                } while (*stack.currentFrame->args.instructionPtr == OP_ALT);
    474478                DPRINTF(("bracket 0 failed\n"));
     
    484488               
    485489            BEGIN_OPCODE(END):
    486                 md.end_match_ptr = stack.currentFrame->args.subjectPtr;          /* Record where we ended */
    487                 md.end_offset_top = stack.currentFrame->args.offset_top;   /* and how many extracts were taken */
    488                 is_match = true;
     490                md.endMatchPtr = stack.currentFrame->args.subjectPtr;          /* Record where we ended */
     491                md.endOffsetTop = stack.currentFrame->args.offsetTop;   /* and how many extracts were taken */
     492                isMatch = true;
    489493                RRETURN;
    490494               
     
    498502                do {
    499503                    RECURSIVE_MATCH_STARTNG_NEW_GROUP(6, stack.currentFrame->args.instructionPtr + 1 + LINK_SIZE, NULL);
    500                     if (is_match)
     504                    if (isMatch)
    501505                        break;
    502                     stack.currentFrame->args.instructionPtr += getOpcodeValueAtOffset(stack.currentFrame->args.instructionPtr, 1);
     506                    stack.currentFrame->args.instructionPtr += getLinkValue(stack.currentFrame->args.instructionPtr + 1);
    503507                } while (*stack.currentFrame->args.instructionPtr == OP_ALT);
    504508                if (*stack.currentFrame->args.instructionPtr == OP_KET)
     
    508512                 mark, since extracts may have been taken during the assertion. */
    509513               
    510                 moveOpcodePtrPastAnyAlternateBranches(stack.currentFrame->args.instructionPtr);
     514                advanceToEndOfBracket(stack.currentFrame->args.instructionPtr);
    511515                stack.currentFrame->args.instructionPtr += 1 + LINK_SIZE;
    512                 stack.currentFrame->args.offset_top = md.end_offset_top;
     516                stack.currentFrame->args.offsetTop = md.endOffsetTop;
    513517                NEXT_OPCODE;
    514518               
     
    518522                do {
    519523                    RECURSIVE_MATCH_STARTNG_NEW_GROUP(7, stack.currentFrame->args.instructionPtr + 1 + LINK_SIZE, NULL);
    520                     if (is_match)
     524                    if (isMatch)
    521525                        RRETURN_NO_MATCH;
    522                     stack.currentFrame->args.instructionPtr += getOpcodeValueAtOffset(stack.currentFrame->args.instructionPtr, 1);
     526                    stack.currentFrame->args.instructionPtr += getLinkValue(stack.currentFrame->args.instructionPtr + 1);
    523527                } while (*stack.currentFrame->args.instructionPtr == OP_ALT);
    524528               
     
    526530                NEXT_OPCODE;
    527531               
    528             /* "Once" brackets are like assertion brackets except that after a match,
    529              the point in the subject string is not moved back. Thus there can never be
    530              a move back into the brackets. Friedl calls these "atomic" subpatterns.
    531              Check the alternative branches in turn - the matching won't pass the KET
    532              for this kind of subpattern. If any one branch matches, we carry on as at
    533              the end of a normal bracket, leaving the subject pointer. */
    534                
    535             BEGIN_OPCODE(ONCE):
    536                 stack.currentFrame->locals.instructionPtrAtStartOfOnce = stack.currentFrame->args.instructionPtr;
    537                 stack.currentFrame->locals.subjectPtrAtStartOfInstruction = stack.currentFrame->args.subjectPtr;
    538                
    539                 do {
    540                     RECURSIVE_MATCH_STARTNG_NEW_GROUP(9, stack.currentFrame->args.instructionPtr + 1 + LINK_SIZE, stack.currentFrame->args.subpatternStart);
    541                     if (is_match)
    542                         break;
    543                     stack.currentFrame->args.instructionPtr += getOpcodeValueAtOffset(stack.currentFrame->args.instructionPtr, 1);
    544                 } while (*stack.currentFrame->args.instructionPtr == OP_ALT);
    545                
    546                 /* If hit the end of the group (which could be repeated), fail */
    547                
    548                 if (*stack.currentFrame->args.instructionPtr != OP_ONCE && *stack.currentFrame->args.instructionPtr != OP_ALT)
     532            /* An alternation is the end of a branch; scan along to find the end of the
     533             bracketed group and go to there. */
     534               
     535            BEGIN_OPCODE(ALT):
     536                advanceToEndOfBracket(stack.currentFrame->args.instructionPtr);
     537                NEXT_OPCODE;
     538               
     539            /* BRAZERO and BRAMINZERO occur just before a bracket group, indicating
     540             that it may occur zero times. It may repeat infinitely, or not at all -
     541             i.e. it could be ()* or ()? in the pattern. Brackets with fixed upper
     542             repeat limits are compiled as a number of copies, with the optional ones
     543             preceded by BRAZERO or BRAMINZERO. */
     544               
     545            BEGIN_OPCODE(BRAZERO): {
     546                stack.currentFrame->locals.startOfRepeatingBracket = stack.currentFrame->args.instructionPtr + 1;
     547                RECURSIVE_MATCH_STARTNG_NEW_GROUP(14, stack.currentFrame->locals.startOfRepeatingBracket, stack.currentFrame->args.subpatternStart);
     548                if (isMatch)
    549549                    RRETURN;
    550                
    551                 /* Continue as from after the assertion, updating the offsets high water
    552                  mark, since extracts may have been taken. */
    553                
    554                 moveOpcodePtrPastAnyAlternateBranches(stack.currentFrame->args.instructionPtr);
    555                
    556                 stack.currentFrame->args.offset_top = md.end_offset_top;
    557                 stack.currentFrame->args.subjectPtr = md.end_match_ptr;
     550                advanceToEndOfBracket(stack.currentFrame->locals.startOfRepeatingBracket);
     551                stack.currentFrame->args.instructionPtr = stack.currentFrame->locals.startOfRepeatingBracket + 1 + LINK_SIZE;
     552                NEXT_OPCODE;
     553            }
     554               
     555            BEGIN_OPCODE(BRAMINZERO): {
     556                stack.currentFrame->locals.startOfRepeatingBracket = stack.currentFrame->args.instructionPtr + 1;
     557                advanceToEndOfBracket(stack.currentFrame->locals.startOfRepeatingBracket);
     558                RECURSIVE_MATCH_STARTNG_NEW_GROUP(15, stack.currentFrame->locals.startOfRepeatingBracket + 1 + LINK_SIZE, stack.currentFrame->args.subpatternStart);
     559                if (isMatch)
     560                    RRETURN;
     561                stack.currentFrame->args.instructionPtr++;
     562                NEXT_OPCODE;
     563            }
     564               
     565            /* End of a group, repeated or non-repeating. If we are at the end of
     566             an assertion "group", stop matching and return MATCH_MATCH, but record the
     567             current high water mark for use by positive assertions. Do this also
     568             for the "once" (not-backup up) groups. */
     569               
     570            BEGIN_OPCODE(KET):
     571            BEGIN_OPCODE(KETRMIN):
     572            BEGIN_OPCODE(KETRMAX):
     573                stack.currentFrame->locals.instructionPtrAtStartOfOnce = stack.currentFrame->args.instructionPtr - getLinkValue(stack.currentFrame->args.instructionPtr + 1);
     574                stack.currentFrame->args.subpatternStart = stack.currentFrame->locals.subpatternStart;
     575                stack.currentFrame->locals.subpatternStart = stack.currentFrame->previousFrame->args.subpatternStart;
     576
     577                if (*stack.currentFrame->locals.instructionPtrAtStartOfOnce == OP_ASSERT || *stack.currentFrame->locals.instructionPtrAtStartOfOnce == OP_ASSERT_NOT) {
     578                    md.endOffsetTop = stack.currentFrame->args.offsetTop;
     579                    isMatch = true;
     580                    RRETURN;
     581                }
     582               
     583                /* In all other cases except a conditional group we have to check the
     584                 group number back at the start and if necessary complete handling an
     585                 extraction by setting the offsets and bumping the high water mark. */
     586               
     587                stack.currentFrame->locals.number = *stack.currentFrame->locals.instructionPtrAtStartOfOnce - OP_BRA;
     588               
     589                /* For extended extraction brackets (large number), we have to fish out
     590                 the number from a dummy opcode at the start. */
     591               
     592                if (stack.currentFrame->locals.number > EXTRACT_BASIC_MAX)
     593                    stack.currentFrame->locals.number = get2ByteValue(stack.currentFrame->locals.instructionPtrAtStartOfOnce + 2 + LINK_SIZE);
     594                stack.currentFrame->locals.offset = stack.currentFrame->locals.number << 1;
     595               
     596#ifdef DEBUG
     597                printf("end bracket %d", stack.currentFrame->locals.number);
     598                printf("\n");
     599#endif
     600               
     601                /* Test for a numbered group. This includes groups called as a result
     602                 of recursion. Note that whole-pattern recursion is coded as a recurse
     603                 into group 0, so it won't be picked up here. Instead, we catch it when
     604                 the OP_END is reached. */
     605               
     606                if (stack.currentFrame->locals.number > 0) {
     607                    if (stack.currentFrame->locals.offset >= md.offsetMax)
     608                        md.offsetOverflow = true;
     609                    else {
     610                        md.offsetVector[stack.currentFrame->locals.offset] =
     611                        md.offsetVector[md.offsetEnd - stack.currentFrame->locals.number];
     612                        md.offsetVector[stack.currentFrame->locals.offset+1] = stack.currentFrame->args.subjectPtr - md.startSubject;
     613                        if (stack.currentFrame->args.offsetTop <= stack.currentFrame->locals.offset)
     614                            stack.currentFrame->args.offsetTop = stack.currentFrame->locals.offset + 2;
     615                    }
     616                }
    558617               
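The extraction step above stores each capture as a pair of slots, with group n at indices 2n and 2n + 1. A simplified sketch of that bookkeeping follows; CaptureStateSketch and closeGroupSketch are hypothetical, and the sketch assumes the provisional start of the group was saved at offsetVector[offsetEnd - number] when the bracket was entered.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, reduced stand-in for the MatchData fields used here.
struct CaptureStateSketch {
    std::vector<int> offsetVector; // [start, end) pairs, plus scratch slots at the top
    int offsetEnd;                 // one past the last slot
    int offsetMax;                 // slots usable for returned captures
    bool offsetOverflow;           // set if a group did not fit
};

// Records the end of capture group `number` at subject index `endIndex`
// and returns the updated high-water mark (offsetTop).
static int closeGroupSketch(CaptureStateSketch& md, int number, int endIndex, int offsetTop)
{
    int offset = number << 1; // group n occupies slots 2n and 2n + 1
    if (offset >= md.offsetMax) {
        md.offsetOverflow = true;
        return offsetTop;
    }
    // Assumed: the start index was stashed here when the group opened.
    md.offsetVector[offset] = md.offsetVector[md.offsetEnd - number];
    md.offsetVector[offset + 1] = endIndex;
    if (offsetTop <= offset)
        offsetTop = offset + 2;
    return offsetTop;
}
```

The high-water mark only ever grows, which is why the assertion handlers earlier in the function copy it back from md.endOffsetTop after a successful sub-match.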
    559618                /* For a non-repeating ket, just continue at this level. This also
     
    569628               
    570629                /* The repeating kets try the rest of the pattern or restart from the
    571                  preceding bracket, in the appropriate order. We need to reset any options
    572                  that changed within the bracket before re-running it, so check the next
    573                  opcode. */
    574                
    575                 if (*stack.currentFrame->args.instructionPtr == OP_KETRMIN) {
    576                     RECURSIVE_MATCH(10, stack.currentFrame->args.instructionPtr + 1 + LINK_SIZE, stack.currentFrame->args.subpatternStart);
    577                     if (is_match)
    578                         RRETURN;
    579                     RECURSIVE_MATCH_STARTNG_NEW_GROUP(11, stack.currentFrame->locals.instructionPtrAtStartOfOnce, stack.currentFrame->args.subpatternStart);
    580                     if (is_match)
    581                         RRETURN;
    582                 } else { /* OP_KETRMAX */
    583                     RECURSIVE_MATCH_STARTNG_NEW_GROUP(12, stack.currentFrame->locals.instructionPtrAtStartOfOnce, stack.currentFrame->args.subpatternStart);
    584                     if (is_match)
    585                         RRETURN;
    586                     RECURSIVE_MATCH(13, stack.currentFrame->args.instructionPtr + 1 + LINK_SIZE, stack.currentFrame->args.subpatternStart);
    587                     if (is_match)
    588                         RRETURN;
    589                 }
    590                 RRETURN;
    591                
    592             /* An alternation is the end of a branch; scan along to find the end of the
    593              bracketed group and go to there. */
    594                
    595             BEGIN_OPCODE(ALT):
    596                 moveOpcodePtrPastAnyAlternateBranches(stack.currentFrame->args.instructionPtr);
    597                 NEXT_OPCODE;
    598                
    599             /* BRAZERO and BRAMINZERO occur just before a bracket group, indicating
    600              that it may occur zero times. It may repeat infinitely, or not at all -
    601              i.e. it could be ()* or ()? in the pattern. Brackets with fixed upper
    602              repeat limits are compiled as a number of copies, with the optional ones
    603              preceded by BRAZERO or BRAMINZERO. */
    604                
    605             BEGIN_OPCODE(BRAZERO): {
    606                 stack.currentFrame->locals.startOfRepeatingBracket = stack.currentFrame->args.instructionPtr + 1;
    607                 RECURSIVE_MATCH_STARTNG_NEW_GROUP(14, stack.currentFrame->locals.startOfRepeatingBracket, stack.currentFrame->args.subpatternStart);
    608                 if (is_match)
    609                     RRETURN;
    610                 moveOpcodePtrPastAnyAlternateBranches(stack.currentFrame->locals.startOfRepeatingBracket);
    611                 stack.currentFrame->args.instructionPtr = stack.currentFrame->locals.startOfRepeatingBracket + 1 + LINK_SIZE;
    612                 NEXT_OPCODE;
    613             }
    614                
    615             BEGIN_OPCODE(BRAMINZERO): {
    616                 stack.currentFrame->locals.startOfRepeatingBracket = stack.currentFrame->args.instructionPtr + 1;
    617                 moveOpcodePtrPastAnyAlternateBranches(stack.currentFrame->locals.startOfRepeatingBracket);
    618                 RECURSIVE_MATCH_STARTNG_NEW_GROUP(15, stack.currentFrame->locals.startOfRepeatingBracket + 1 + LINK_SIZE, stack.currentFrame->args.subpatternStart);
    619                 if (is_match)
    620                     RRETURN;
    621                 stack.currentFrame->args.instructionPtr++;
    622                 NEXT_OPCODE;
    623             }
    624                
    625             /* End of a group, repeated or non-repeating. If we are at the end of
    626              an assertion "group", stop matching and return MATCH_MATCH, but record the
    627              current high water mark for use by positive assertions. Do this also
    628              for the "once" (not-backup up) groups. */
    629                
    630             BEGIN_OPCODE(KET):
    631             BEGIN_OPCODE(KETRMIN):
    632             BEGIN_OPCODE(KETRMAX):
    633                 stack.currentFrame->locals.instructionPtrAtStartOfOnce = stack.currentFrame->args.instructionPtr - getOpcodeValueAtOffset(stack.currentFrame->args.instructionPtr, 1);
    634                 stack.currentFrame->args.subpatternStart = stack.currentFrame->locals.subpatternStart;
    635                 stack.currentFrame->locals.subpatternStart = stack.currentFrame->previousFrame->args.subpatternStart;
    636 
    637                 if (*stack.currentFrame->locals.instructionPtrAtStartOfOnce == OP_ASSERT || *stack.currentFrame->locals.instructionPtrAtStartOfOnce == OP_ASSERT_NOT || *stack.currentFrame->locals.instructionPtrAtStartOfOnce == OP_ONCE) {
    638                     md.end_match_ptr = stack.currentFrame->args.subjectPtr;      /* For ONCE */
    639                     md.end_offset_top = stack.currentFrame->args.offset_top;
    640                     is_match = true;
    641                     RRETURN;
    642                 }
    643                
    644                 /* In all other cases except a conditional group we have to check the
    645                  group number back at the start and if necessary complete handling an
    646                  extraction by setting the offsets and bumping the high water mark. */
    647                
    648                 stack.currentFrame->locals.number = *stack.currentFrame->locals.instructionPtrAtStartOfOnce - OP_BRA;
    649                
    650                 /* For extended extraction brackets (large number), we have to fish out
    651                  the number from a dummy opcode at the start. */
    652                
    653                 if (stack.currentFrame->locals.number > EXTRACT_BASIC_MAX)
    654                     stack.currentFrame->locals.number = get2ByteOpcodeValueAtOffset(stack.currentFrame->locals.instructionPtrAtStartOfOnce, 2+LINK_SIZE);
    655                 stack.currentFrame->locals.offset = stack.currentFrame->locals.number << 1;
    656                
    657 #ifdef DEBUG
    658                 printf("end bracket %d", stack.currentFrame->locals.number);
    659                 printf("\n");
    660 #endif
    661                
    662                 /* Test for a numbered group. This includes groups called as a result
    663                  of recursion. Note that whole-pattern recursion is coded as a recurse
    664                  into group 0, so it won't be picked up here. Instead, we catch it when
    665                  the OP_END is reached. */
    666                
    667                 if (stack.currentFrame->locals.number > 0) {
    668                     if (stack.currentFrame->locals.offset >= md.offset_max)
    669                         md.offset_overflow = true;
    670                     else {
    671                         md.offset_vector[stack.currentFrame->locals.offset] =
    672                         md.offset_vector[md.offset_end - stack.currentFrame->locals.number];
    673                         md.offset_vector[stack.currentFrame->locals.offset+1] = stack.currentFrame->args.subjectPtr - md.start_subject;
    674                         if (stack.currentFrame->args.offset_top <= stack.currentFrame->locals.offset)
    675                             stack.currentFrame->args.offset_top = stack.currentFrame->locals.offset + 2;
    676                     }
    677                 }
    678                
    679                 /* For a non-repeating ket, just continue at this level. This also
    680                  happens for a repeating ket if no characters were matched in the group.
    681                  This is the forcible breaking of infinite loops as implemented in Perl
    682                  5.005. If there is an options reset, it will get obeyed in the normal
    683                  course of events. */
    684                
    685                 if (*stack.currentFrame->args.instructionPtr == OP_KET || stack.currentFrame->args.subjectPtr == stack.currentFrame->locals.subjectPtrAtStartOfInstruction) {
    686                     stack.currentFrame->args.instructionPtr += 1 + LINK_SIZE;
    687                     NEXT_OPCODE;
    688                 }
    689                
    690                 /* The repeating kets try the rest of the pattern or restart from the
    691630                 preceding bracket, in the appropriate order. */
    692631               
    693632                if (*stack.currentFrame->args.instructionPtr == OP_KETRMIN) {
    694633                    RECURSIVE_MATCH(16, stack.currentFrame->args.instructionPtr + 1 + LINK_SIZE, stack.currentFrame->args.subpatternStart);
    695                     if (is_match)
     634                    if (isMatch)
    696635                        RRETURN;
    697636                    RECURSIVE_MATCH_STARTNG_NEW_GROUP(17, stack.currentFrame->locals.instructionPtrAtStartOfOnce, stack.currentFrame->args.subpatternStart);
    698                     if (is_match)
     637                    if (isMatch)
    699638                        RRETURN;
    700639                } else { /* OP_KETRMAX */
    701640                    RECURSIVE_MATCH_STARTNG_NEW_GROUP(18, stack.currentFrame->locals.instructionPtrAtStartOfOnce, stack.currentFrame->args.subpatternStart);
    702                     if (is_match)
     641                    if (isMatch)
    703642                        RRETURN;
    704643                    RECURSIVE_MATCH(19, stack.currentFrame->args.instructionPtr + 1 + LINK_SIZE, stack.currentFrame->args.subpatternStart);
    705                     if (is_match)
     644                    if (isMatch)
    706645                        RRETURN;
    707646                }
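
The ordering described in the comment above (rest of the pattern first for OP_KETRMIN, another pass through the bracket first for OP_KETRMAX) can be illustrated with a toy matcher. This is only a sketch under stated assumptions: `matchRepeated` and `Continuation` are hypothetical names that do not appear in pcre_exec.cpp, and plain recursion stands in for the RECURSIVE_MATCH macros.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stand-in for "the rest of the pattern after the ket".
using Continuation = std::function<bool(const std::string&, size_t)>;

// Matches `unit` repeated zero or more times starting at `pos`, then hands
// off to `rest`. Returns the number of repetitions used, or -1 on failure.
int matchRepeated(const std::string& subject, size_t pos, const std::string& unit,
                  bool lazy, const Continuation& rest, int count = 0)
{
    bool canRepeat = !unit.empty() && subject.compare(pos, unit.size(), unit) == 0;

    if (lazy) {
        // Like OP_KETRMIN: try the rest of the pattern, then restart the group.
        if (rest(subject, pos))
            return count;
        if (!canRepeat)
            return -1;
        return matchRepeated(subject, pos + unit.size(), unit, lazy, rest, count + 1);
    }

    // Like OP_KETRMAX: restart the group first, then try the rest of the pattern.
    if (canRepeat) {
        int result = matchRepeated(subject, pos + unit.size(), unit, lazy, rest, count + 1);
        if (result >= 0)
            return result;
    }
    return rest(subject, pos) ? count : -1;
}
```

With a continuation that accepts any position, the lazy order stops after zero repetitions while the greedy order consumes every repetition available; when the continuation is restrictive, both orders arrive at the same answer by different routes.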
     
    711650               
    712651            BEGIN_OPCODE(CIRC):
    713                 if (stack.currentFrame->args.subjectPtr != md.start_subject && (!md.multiline || !isNewline(stack.currentFrame->args.subjectPtr[-1])))
     652                if (stack.currentFrame->args.subjectPtr != md.startSubject && (!md.multiline || !isNewline(stack.currentFrame->args.subjectPtr[-1])))
    714653                    RRETURN_NO_MATCH;
    715654                stack.currentFrame->args.instructionPtr++;
     
    719658               
    720659            BEGIN_OPCODE(DOLL):
    721                 if (stack.currentFrame->args.subjectPtr < md.end_subject && (!md.multiline || !isNewline(*stack.currentFrame->args.subjectPtr)))
     660                if (stack.currentFrame->args.subjectPtr < md.endSubject && (!md.multiline || !isNewline(*stack.currentFrame->args.subjectPtr)))
    722661                    RRETURN_NO_MATCH;
    723662                stack.currentFrame->args.instructionPtr++;
     
    731670                bool previousCharIsWordChar = false;
    732671               
    733                 if (stack.currentFrame->args.subjectPtr > md.start_subject)
     672                if (stack.currentFrame->args.subjectPtr > md.startSubject)
    734673                    previousCharIsWordChar = isWordChar(stack.currentFrame->args.subjectPtr[-1]);
    735                 if (stack.currentFrame->args.subjectPtr < md.end_subject)
     674                if (stack.currentFrame->args.subjectPtr < md.endSubject)
    736675                    currentCharIsWordChar = isWordChar(*stack.currentFrame->args.subjectPtr);
    737676               
     
    746685               
    747686            BEGIN_OPCODE(NOT_NEWLINE):
    748                 if (stack.currentFrame->args.subjectPtr >= md.end_subject)
     687                if (stack.currentFrame->args.subjectPtr >= md.endSubject)
    749688                    RRETURN_NO_MATCH;
    750689                if (isNewline(*stack.currentFrame->args.subjectPtr++))
     
    754693
    755694            BEGIN_OPCODE(NOT_DIGIT):
    756                 if (stack.currentFrame->args.subjectPtr >= md.end_subject)
     695                if (stack.currentFrame->args.subjectPtr >= md.endSubject)
    757696                    RRETURN_NO_MATCH;
    758697                if (isASCIIDigit(*stack.currentFrame->args.subjectPtr++))
     
    762701
    763702            BEGIN_OPCODE(DIGIT):
    764                 if (stack.currentFrame->args.subjectPtr >= md.end_subject)
     703                if (stack.currentFrame->args.subjectPtr >= md.endSubject)
    765704                    RRETURN_NO_MATCH;
    766705                if (!isASCIIDigit(*stack.currentFrame->args.subjectPtr++))
     
    770709
    771710            BEGIN_OPCODE(NOT_WHITESPACE):
    772                 if (stack.currentFrame->args.subjectPtr >= md.end_subject)
     711                if (stack.currentFrame->args.subjectPtr >= md.endSubject)
    773712                    RRETURN_NO_MATCH;
    774713                if (isSpaceChar(*stack.currentFrame->args.subjectPtr++))
     
    778717
    779718            BEGIN_OPCODE(WHITESPACE):
    780                 if (stack.currentFrame->args.subjectPtr >= md.end_subject)
     719                if (stack.currentFrame->args.subjectPtr >= md.endSubject)
    781720                    RRETURN_NO_MATCH;
    782721                if (!isSpaceChar(*stack.currentFrame->args.subjectPtr++))
     
    786725               
    787726            BEGIN_OPCODE(NOT_WORDCHAR):
    788                 if (stack.currentFrame->args.subjectPtr >= md.end_subject)
     727                if (stack.currentFrame->args.subjectPtr >= md.endSubject)
    789728                    RRETURN_NO_MATCH;
    790729                if (isWordChar(*stack.currentFrame->args.subjectPtr++))
     
    794733               
    795734            BEGIN_OPCODE(WORDCHAR):
    796                 if (stack.currentFrame->args.subjectPtr >= md.end_subject)
     735                if (stack.currentFrame->args.subjectPtr >= md.endSubject)
    797736                    RRETURN_NO_MATCH;
    798737                if (!isWordChar(*stack.currentFrame->args.subjectPtr++))
     
    810749               
    811750            BEGIN_OPCODE(REF):
    812                 stack.currentFrame->locals.offset = get2ByteOpcodeValueAtOffset(stack.currentFrame->args.instructionPtr, 1) << 1;               /* Doubled ref number */
     751                stack.currentFrame->locals.offset = get2ByteValue(stack.currentFrame->args.instructionPtr + 1) << 1;               /* Doubled ref number */
    813752                stack.currentFrame->args.instructionPtr += 3;                                 /* Advance past item */
    814753               
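
The REF hunk above replaces `get2ByteOpcodeValueAtOffset(ptr, 1)` with `get2ByteValue(ptr + 1)`, so the caller now does the pointer arithmetic and the reader takes only a pointer. The function bodies are not part of this hunk; the sketch below assumes the two bytes are stored high byte first, as in PCRE's compiled form, and expresses the old call shape through the new one only to show that the two spellings read the same bytes.

```cpp
#include <cassert>

// Assumption: the 2-byte value is stored high byte first in the opcode stream.
static inline int get2ByteValue(const unsigned char* opcodePtr)
{
    return (opcodePtr[0] << 8) | opcodePtr[1];
}

// The older call shape from the removed lines, written in terms of the new one.
// This wrapper is illustrative only; the changeset removes the old function.
static inline int get2ByteOpcodeValueAtOffset(const unsigned char* opcodePtr, int offset)
{
    return get2ByteValue(opcodePtr + offset);
}
```

One plausible benefit of the pointer-only form is that the same reader works for any 2-byte field without each call site carrying a separate offset argument.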
     
    818757                 minima. */
    819758               
    820                 if (stack.currentFrame->locals.offset >= stack.currentFrame->args.offset_top || md.offset_vector[stack.currentFrame->locals.offset] < 0)
     759                if (stack.currentFrame->locals.offset >= stack.currentFrame->args.offsetTop || md.offsetVector[stack.currentFrame->locals.offset] < 0)
    821760                    stack.currentFrame->locals.length = 0;
    822761                else
    823                     stack.currentFrame->locals.length = md.offset_vector[stack.currentFrame->locals.offset+1] - md.offset_vector[stack.currentFrame->locals.offset];
     762                    stack.currentFrame->locals.length = md.offsetVector[stack.currentFrame->locals.offset+1] - md.offsetVector[stack.currentFrame->locals.offset];
    824763               
    825764                /* Set up for repetition, or handle the non-repeated case */
     
    838777                    case OP_CRMINRANGE:
    839778                        minimize = (*stack.currentFrame->args.instructionPtr == OP_CRMINRANGE);
    840                         min = get2ByteOpcodeValueAtOffset(stack.currentFrame->args.instructionPtr, 1);
    841                         stack.currentFrame->locals.max = get2ByteOpcodeValueAtOffset(stack.currentFrame->args.instructionPtr, 3);
     779                        min = get2ByteValue(stack.currentFrame->args.instructionPtr + 1);
     780                        stack.currentFrame->locals.max = get2ByteValue(stack.currentFrame->args.instructionPtr + 3);
    842781                        if (stack.currentFrame->locals.max == 0)
    843782                            stack.currentFrame->locals.max = INT_MAX;
     
    846785                   
    847786                    default:               /* No repeat follows */
    848                         if (!match_ref(stack.currentFrame->locals.offset, stack.currentFrame->args.subjectPtr, stack.currentFrame->locals.length, md))
     787                        if (!matchRef(stack.currentFrame->locals.offset, stack.currentFrame->args.subjectPtr, stack.currentFrame->locals.length, md))
    849788                            RRETURN_NO_MATCH;
    850789                        stack.currentFrame->args.subjectPtr += stack.currentFrame->locals.length;
     
    861800               
    862801                for (int i = 1; i <= min; i++) {
    863                     if (!match_ref(stack.currentFrame->locals.offset, stack.currentFrame->args.subjectPtr, stack.currentFrame->locals.length, md))
     802                    if (!matchRef(stack.currentFrame->locals.offset, stack.currentFrame->args.subjectPtr, stack.currentFrame->locals.length, md))
    864803                        RRETURN_NO_MATCH;
    865804                    stack.currentFrame->args.subjectPtr += stack.currentFrame->locals.length;
     
    877816                    for (stack.currentFrame->locals.fi = min;; stack.currentFrame->locals.fi++) {
    878817                        RECURSIVE_MATCH(20, stack.currentFrame->args.instructionPtr, stack.currentFrame->args.subpatternStart);
    879                         if (is_match)
     818                        if (isMatch)
    880819                            RRETURN;
    881                         if (stack.currentFrame->locals.fi >= stack.currentFrame->locals.max || !match_ref(stack.currentFrame->locals.offset, stack.currentFrame->args.subjectPtr, stack.currentFrame->locals.length, md))
     820                        if (stack.currentFrame->locals.fi >= stack.currentFrame->locals.max || !matchRef(stack.currentFrame->locals.offset, stack.currentFrame->args.subjectPtr, stack.currentFrame->locals.length, md))
    882821                            RRETURN;
    883822                        stack.currentFrame->args.subjectPtr += stack.currentFrame->locals.length;
     
    891830                    stack.currentFrame->locals.subjectPtrAtStartOfInstruction = stack.currentFrame->args.subjectPtr;
    892831                    for (int i = min; i < stack.currentFrame->locals.max; i++) {
    893                         if (!match_ref(stack.currentFrame->locals.offset, stack.currentFrame->args.subjectPtr, stack.currentFrame->locals.length, md))
     832                        if (!matchRef(stack.currentFrame->locals.offset, stack.currentFrame->args.subjectPtr, stack.currentFrame->locals.length, md))
    894833                            break;
    895834                        stack.currentFrame->args.subjectPtr += stack.currentFrame->locals.length;
     
    897836                    while (stack.currentFrame->args.subjectPtr >= stack.currentFrame->locals.subjectPtrAtStartOfInstruction) {
    898837                        RECURSIVE_MATCH(21, stack.currentFrame->args.instructionPtr, stack.currentFrame->args.subpatternStart);
    899                         if (is_match)
     838                        if (isMatch)
    900839                            RRETURN;
    901840                        stack.currentFrame->args.subjectPtr -= stack.currentFrame->locals.length;
     
    934873                    case OP_CRMINRANGE:
    935874                        minimize = (*stack.currentFrame->args.instructionPtr == OP_CRMINRANGE);
    936                         min = get2ByteOpcodeValueAtOffset(stack.currentFrame->args.instructionPtr, 1);
    937                         stack.currentFrame->locals.max = get2ByteOpcodeValueAtOffset(stack.currentFrame->args.instructionPtr, 3);
     875                        min = get2ByteValue(stack.currentFrame->args.instructionPtr + 1);
     876                        stack.currentFrame->locals.max = get2ByteValue(stack.currentFrame->args.instructionPtr + 3);
    938877                        if (stack.currentFrame->locals.max == 0)
    939878                            stack.currentFrame->locals.max = INT_MAX;
     
    949888               
    950889                for (int i = 1; i <= min; i++) {
    951                     if (stack.currentFrame->args.subjectPtr >= md.end_subject)
     890                    if (stack.currentFrame->args.subjectPtr >= md.endSubject)
    952891                        RRETURN_NO_MATCH;
    953892                    int c = *stack.currentFrame->args.subjectPtr++;
     
    972911                    for (stack.currentFrame->locals.fi = min;; stack.currentFrame->locals.fi++) {
    973912                        RECURSIVE_MATCH(22, stack.currentFrame->args.instructionPtr, stack.currentFrame->args.subpatternStart);
    974                         if (is_match)
     913                        if (isMatch)
    975914                            RRETURN;
    976                         if (stack.currentFrame->locals.fi >= stack.currentFrame->locals.max || stack.currentFrame->args.subjectPtr >= md.end_subject)
     915                        if (stack.currentFrame->locals.fi >= stack.currentFrame->locals.max || stack.currentFrame->args.subjectPtr >= md.endSubject)
    977916                            RRETURN;
    978917                        int c = *stack.currentFrame->args.subjectPtr++;
     
    992931                   
    993932                    for (int i = min; i < stack.currentFrame->locals.max; i++) {
    994                         if (stack.currentFrame->args.subjectPtr >= md.end_subject)
     933                        if (stack.currentFrame->args.subjectPtr >= md.endSubject)
    995934                            break;
    996935                        int c = *stack.currentFrame->args.subjectPtr;
     
    1006945                    for (;;) {
    1007946                        RECURSIVE_MATCH(24, stack.currentFrame->args.instructionPtr, stack.currentFrame->args.subpatternStart);
    1008                         if (is_match)
     947                        if (isMatch)
    1009948                            RRETURN;
    1010949                        if (stack.currentFrame->args.subjectPtr-- == stack.currentFrame->locals.subjectPtrAtStartOfInstruction)
     
    1020959            BEGIN_OPCODE(XCLASS):
    1021960                stack.currentFrame->locals.data = stack.currentFrame->args.instructionPtr + 1 + LINK_SIZE;                /* Save for matching */
    1022                 stack.currentFrame->args.instructionPtr += getOpcodeValueAtOffset(stack.currentFrame->args.instructionPtr, 1);                      /* Advance past the item */
     961                stack.currentFrame->args.instructionPtr += getLinkValue(stack.currentFrame->args.instructionPtr + 1);                      /* Advance past the item */
    1023962               
    1024963                switch (*stack.currentFrame->args.instructionPtr) {
     
    1035974                    case OP_CRMINRANGE:
    1036975                        minimize = (*stack.currentFrame->args.instructionPtr == OP_CRMINRANGE);
    1037                         min = get2ByteOpcodeValueAtOffset(stack.currentFrame->args.instructionPtr, 1);
    1038                         stack.currentFrame->locals.max = get2ByteOpcodeValueAtOffset(stack.currentFrame->args.instructionPtr, 3);
     976                        min = get2ByteValue(stack.currentFrame->args.instructionPtr + 1);
     977                        stack.currentFrame->locals.max = get2ByteValue(stack.currentFrame->args.instructionPtr + 3);
    1039978                        if (stack.currentFrame->locals.max == 0)
    1040979                            stack.currentFrame->locals.max = INT_MAX;
     
    1044983                    default:               /* No repeat follows */
    1045984                        min = stack.currentFrame->locals.max = 1;
    1046             }
     985                }
    1047986               
    1048987                /* First, ensure the minimum number of matches are present. */
    1049988               
    1050989                for (int i = 1; i <= min; i++) {
    1051                     if (stack.currentFrame->args.subjectPtr >= md.end_subject)
     990                    if (stack.currentFrame->args.subjectPtr >= md.endSubject)
    1052991                        RRETURN_NO_MATCH;
    1053992                    int c = *stack.currentFrame->args.subjectPtr++;
    1054                     if (!_pcre_xclass(c, stack.currentFrame->locals.data))
     993                    if (!kjs_pcre_xclass(c, stack.currentFrame->locals.data))
    1055994                        RRETURN_NO_MATCH;
    1056995                }
     
    10681007                    for (stack.currentFrame->locals.fi = min;; stack.currentFrame->locals.fi++) {
    10691008                        RECURSIVE_MATCH(26, stack.currentFrame->args.instructionPtr, stack.currentFrame->args.subpatternStart);
    1070                         if (is_match)
     1009                        if (isMatch)
    10711010                            RRETURN;
    1072                         if (stack.currentFrame->locals.fi >= stack.currentFrame->locals.max || stack.currentFrame->args.subjectPtr >= md.end_subject)
     1011                        if (stack.currentFrame->locals.fi >= stack.currentFrame->locals.max || stack.currentFrame->args.subjectPtr >= md.endSubject)
    10731012                            RRETURN;
    10741013                        int c = *stack.currentFrame->args.subjectPtr++;
    1075                         if (!_pcre_xclass(c, stack.currentFrame->locals.data))
     1014                        if (!kjs_pcre_xclass(c, stack.currentFrame->locals.data))
    10761015                            RRETURN;
    10771016                    }
     
    10841023                    stack.currentFrame->locals.subjectPtrAtStartOfInstruction = stack.currentFrame->args.subjectPtr;
    10851024                    for (int i = min; i < stack.currentFrame->locals.max; i++) {
    1086                         if (stack.currentFrame->args.subjectPtr >= md.end_subject)
     1025                        if (stack.currentFrame->args.subjectPtr >= md.endSubject)
    10871026                            break;
    10881027                        int c = *stack.currentFrame->args.subjectPtr;
    1089                         if (!_pcre_xclass(c, stack.currentFrame->locals.data))
     1028                        if (!kjs_pcre_xclass(c, stack.currentFrame->locals.data))
    10901029                            break;
    10911030                        ++stack.currentFrame->args.subjectPtr;
     
    10931032                    for(;;) {
    10941033                        RECURSIVE_MATCH(27, stack.currentFrame->args.instructionPtr, stack.currentFrame->args.subpatternStart);
    1095                         if (is_match)
     1034                        if (isMatch)
    10961035                            RRETURN;
    10971036                        if (stack.currentFrame->args.subjectPtr-- == stack.currentFrame->locals.subjectPtrAtStartOfInstruction)
     
    11101049                getUTF8CharAndIncrementLength(stack.currentFrame->locals.fc, stack.currentFrame->args.instructionPtr, stack.currentFrame->locals.length);
    11111050                stack.currentFrame->args.instructionPtr += stack.currentFrame->locals.length;
    1112                 if (stack.currentFrame->args.subjectPtr >= md.end_subject)
     1051                if (stack.currentFrame->args.subjectPtr >= md.endSubject)
    11131052                    RRETURN_NO_MATCH;
    11141053                if (stack.currentFrame->locals.fc != *stack.currentFrame->args.subjectPtr++)
     
    11231062                getUTF8CharAndIncrementLength(stack.currentFrame->locals.fc, stack.currentFrame->args.instructionPtr, stack.currentFrame->locals.length);
    11241063                stack.currentFrame->args.instructionPtr += stack.currentFrame->locals.length;
    1125                 if (stack.currentFrame->args.subjectPtr >= md.end_subject)
     1064                if (stack.currentFrame->args.subjectPtr >= md.endSubject)
    11261065                    RRETURN_NO_MATCH;
    11271066                int dc = *stack.currentFrame->args.subjectPtr++;
    1128                 if (stack.currentFrame->locals.fc != dc && _pcre_ucp_othercase(stack.currentFrame->locals.fc) != dc)
     1067                if (stack.currentFrame->locals.fc != dc && kjs_pcre_ucp_othercase(stack.currentFrame->locals.fc) != dc)
    11291068                    RRETURN_NO_MATCH;
    11301069                NEXT_OPCODE;
     
    11341073               
    11351074            BEGIN_OPCODE(ASCII_CHAR):
    1136                 if (md.end_subject == stack.currentFrame->args.subjectPtr)
     1075                if (md.endSubject == stack.currentFrame->args.subjectPtr)
    11371076                    RRETURN_NO_MATCH;
    11381077                if (*stack.currentFrame->args.subjectPtr != stack.currentFrame->args.instructionPtr[1])
     
    11451084               
    11461085            BEGIN_OPCODE(ASCII_LETTER_IGNORING_CASE):
    1147                 if (md.end_subject == stack.currentFrame->args.subjectPtr)
     1086                if (md.endSubject == stack.currentFrame->args.subjectPtr)
    11481087                    RRETURN_NO_MATCH;
    11491088                if ((*stack.currentFrame->args.subjectPtr | 0x20) != stack.currentFrame->args.instructionPtr[1])
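
The comparison in the ASCII_LETTER_IGNORING_CASE case relies on ASCII upper- and lowercase letters differing only in bit 0x20. A minimal sketch of that test follows; `matchesASCIILetterIgnoringCase` is a hypothetical helper name, and it assumes the operand byte stored after the opcode is a lowercase ASCII letter (the result of `| 0x20` always has that bit set, so an uppercase operand could never compare equal).

```cpp
#include <cassert>

// Hypothetical helper mirroring the comparison in the opcode handler.
// Precondition (assumed): lowercaseLetter is in 'a'..'z'. For non-letters the
// trick gives false positives, e.g. '[' (0x5B) | 0x20 == '{' (0x7B).
static inline bool matchesASCIILetterIgnoringCase(unsigned short subjectChar, unsigned char lowercaseLetter)
{
    return (subjectChar | 0x20) == lowercaseLetter;
}
```

Because the subject character is a 16-bit code unit, characters outside ASCII keep their high bits after the OR and so cannot compare equal to a one-byte letter.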
     
    11561095               
    11571096            BEGIN_OPCODE(EXACT):
    1158                 min = stack.currentFrame->locals.max = get2ByteOpcodeValueAtOffset(stack.currentFrame->args.instructionPtr, 1);
     1097                min = stack.currentFrame->locals.max = get2ByteValue(stack.currentFrame->args.instructionPtr + 1);
    11591098                minimize = false;
    11601099                stack.currentFrame->args.instructionPtr += 3;
     
    11641103            BEGIN_OPCODE(MINUPTO):
    11651104                min = 0;
    1166                 stack.currentFrame->locals.max = get2ByteOpcodeValueAtOffset(stack.currentFrame->args.instructionPtr, 1);
     1105                stack.currentFrame->locals.max = get2ByteValue(stack.currentFrame->args.instructionPtr + 1);
    11671106                minimize = *stack.currentFrame->args.instructionPtr == OP_MINUPTO;
    11681107                stack.currentFrame->args.instructionPtr += 3;
     
    11851124                stack.currentFrame->locals.length = 1;
    11861125                getUTF8CharAndIncrementLength(stack.currentFrame->locals.fc, stack.currentFrame->args.instructionPtr, stack.currentFrame->locals.length);
    1187                 if (min * (stack.currentFrame->locals.fc > 0xFFFF ? 2 : 1) > md.end_subject - stack.currentFrame->args.subjectPtr)
     1126                if (min * (stack.currentFrame->locals.fc > 0xFFFF ? 2 : 1) > md.endSubject - stack.currentFrame->args.subjectPtr)
    11881127                    RRETURN_NO_MATCH;
    11891128                stack.currentFrame->args.instructionPtr += stack.currentFrame->locals.length;
    11901129               
    11911130                if (stack.currentFrame->locals.fc <= 0xFFFF) {
    1192                     int othercase = md.ignoreCase ? _pcre_ucp_othercase(stack.currentFrame->locals.fc) : -1;
     1131                    int othercase = md.ignoreCase ? kjs_pcre_ucp_othercase(stack.currentFrame->locals.fc) : -1;
    11931132                   
    11941133                    for (int i = 1; i <= min; i++) {
     
    12021141                   
    12031142                    if (minimize) {
    1204                         stack.currentFrame->locals.repeat_othercase = othercase;
     1143                        stack.currentFrame->locals.repeatOthercase = othercase;
    12051144                        for (stack.currentFrame->locals.fi = min;; stack.currentFrame->locals.fi++) {
    12061145                            RECURSIVE_MATCH(28, stack.currentFrame->args.instructionPtr, stack.currentFrame->args.subpatternStart);
    1207                             if (is_match)
     1146                            if (isMatch)
    12081147                                RRETURN;
    1209                             if (stack.currentFrame->locals.fi >= stack.currentFrame->locals.max || stack.currentFrame->args.subjectPtr >= md.end_subject)
     1148                            if (stack.currentFrame->locals.fi >= stack.currentFrame->locals.max || stack.currentFrame->args.subjectPtr >= md.endSubject)
    12101149                                RRETURN;
    1211                             if (*stack.currentFrame->args.subjectPtr != stack.currentFrame->locals.fc && *stack.currentFrame->args.subjectPtr != stack.currentFrame->locals.repeat_othercase)
     1150                            if (*stack.currentFrame->args.subjectPtr != stack.currentFrame->locals.fc && *stack.currentFrame->args.subjectPtr != stack.currentFrame->locals.repeatOthercase)
    12121151                                RRETURN;
    12131152                            ++stack.currentFrame->args.subjectPtr;
     
    12171156                        stack.currentFrame->locals.subjectPtrAtStartOfInstruction = stack.currentFrame->args.subjectPtr;
    12181157                        for (int i = min; i < stack.currentFrame->locals.max; i++) {
    1219                             if (stack.currentFrame->args.subjectPtr >= md.end_subject)
     1158                            if (stack.currentFrame->args.subjectPtr >= md.endSubject)
    12201159                                break;
    12211160                            if (*stack.currentFrame->args.subjectPtr != stack.currentFrame->locals.fc && *stack.currentFrame->args.subjectPtr != othercase)
     
    12251164                        while (stack.currentFrame->args.subjectPtr >= stack.currentFrame->locals.subjectPtrAtStartOfInstruction) {
    12261165                            RECURSIVE_MATCH(29, stack.currentFrame->args.instructionPtr, stack.currentFrame->args.subpatternStart);
    1227                             if (is_match)
     1166                            if (isMatch)
    12281167                                RRETURN;
    12291168                            --stack.currentFrame->args.subjectPtr;
     
    12471186                        for (stack.currentFrame->locals.fi = min;; stack.currentFrame->locals.fi++) {
    12481187                            RECURSIVE_MATCH(30, stack.currentFrame->args.instructionPtr, stack.currentFrame->args.subpatternStart);
    1249                             if (is_match)
     1188                            if (isMatch)
    12501189                                RRETURN;
    1251                             if (stack.currentFrame->locals.fi >= stack.currentFrame->locals.max || stack.currentFrame->args.subjectPtr >= md.end_subject)
     1190                            if (stack.currentFrame->locals.fi >= stack.currentFrame->locals.max || stack.currentFrame->args.subjectPtr >= md.endSubject)
    12521191                                RRETURN;
    12531192                            if (*stack.currentFrame->args.subjectPtr != stack.currentFrame->locals.fc)
     
    12591198                        stack.currentFrame->locals.subjectPtrAtStartOfInstruction = stack.currentFrame->args.subjectPtr;
    12601199                        for (int i = min; i < stack.currentFrame->locals.max; i++) {
    1261                             if (stack.currentFrame->args.subjectPtr > md.end_subject - 2)
     1200                            if (stack.currentFrame->args.subjectPtr > md.endSubject - 2)
    12621201                                break;
    12631202                            if (*stack.currentFrame->args.subjectPtr != stack.currentFrame->locals.fc)
     
    12671206                        while (stack.currentFrame->args.subjectPtr >= stack.currentFrame->locals.subjectPtrAtStartOfInstruction) {
    12681207                            RECURSIVE_MATCH(31, stack.currentFrame->args.instructionPtr, stack.currentFrame->args.subpatternStart);
    1269                             if (is_match)
     1208                            if (isMatch)
    12701209                                RRETURN;
    12711210                            stack.currentFrame->args.subjectPtr -= 2;
     
    12801219               
    12811220            BEGIN_OPCODE(NOT): {
    1282                 if (stack.currentFrame->args.subjectPtr >= md.end_subject)
     1221                if (stack.currentFrame->args.subjectPtr >= md.endSubject)
    12831222                    RRETURN_NO_MATCH;
    12841223                stack.currentFrame->args.instructionPtr++;
     
    13041243               
    13051244            BEGIN_OPCODE(NOTEXACT):
    1306                 min = stack.currentFrame->locals.max = get2ByteOpcodeValueAtOffset(stack.currentFrame->args.instructionPtr, 1);
     1245                min = stack.currentFrame->locals.max = get2ByteValue(stack.currentFrame->args.instructionPtr + 1);
    13071246                minimize = false;
    13081247                stack.currentFrame->args.instructionPtr += 3;
     
    13121251            BEGIN_OPCODE(NOTMINUPTO):
    13131252                min = 0;
    1314                 stack.currentFrame->locals.max = get2ByteOpcodeValueAtOffset(stack.currentFrame->args.instructionPtr, 1);
     1253                stack.currentFrame->locals.max = get2ByteValue(stack.currentFrame->args.instructionPtr + 1);
    13151254                minimize = *stack.currentFrame->args.instructionPtr == OP_NOTMINUPTO;
    13161255                stack.currentFrame->args.instructionPtr += 3;
     
    13301269               
    13311270            REPEATNOTCHAR:
    1332                 if (min > md.end_subject - stack.currentFrame->args.subjectPtr)
     1271                if (min > md.endSubject - stack.currentFrame->args.subjectPtr)
    13331272                    RRETURN_NO_MATCH;
    13341273                stack.currentFrame->locals.fc = *stack.currentFrame->args.instructionPtr++;
     
    13621301                        for (stack.currentFrame->locals.fi = min;; stack.currentFrame->locals.fi++) {
    13631302                            RECURSIVE_MATCH(38, stack.currentFrame->args.instructionPtr, stack.currentFrame->args.subpatternStart);
    1364                             if (is_match)
     1303                            if (isMatch)
    13651304                                RRETURN;
    13661305                            int d = *stack.currentFrame->args.subjectPtr++;
    13671306                            if (d < 128)
    13681307                                d = toLowerCase(d);
    1369                             if (stack.currentFrame->locals.fi >= stack.currentFrame->locals.max || stack.currentFrame->args.subjectPtr >= md.end_subject || stack.currentFrame->locals.fc == d)
     1308                            if (stack.currentFrame->locals.fi >= stack.currentFrame->locals.max || stack.currentFrame->args.subjectPtr >= md.endSubject || stack.currentFrame->locals.fc == d)
    13701309                                RRETURN;
    13711310                        }
     
    13791318                       
    13801319                        for (int i = min; i < stack.currentFrame->locals.max; i++) {
    1381                             if (stack.currentFrame->args.subjectPtr >= md.end_subject)
     1320                            if (stack.currentFrame->args.subjectPtr >= md.endSubject)
    13821321                                break;
    13831322                            int d = *stack.currentFrame->args.subjectPtr;
     
    13901329                        for (;;) {
    13911330                            RECURSIVE_MATCH(40, stack.currentFrame->args.instructionPtr, stack.currentFrame->args.subpatternStart);
    1392                             if (is_match)
     1331                            if (isMatch)
    13931332                                RRETURN;
    13941333                            if (stack.currentFrame->args.subjectPtr-- == stack.currentFrame->locals.subjectPtrAtStartOfInstruction)
     
    14161355                        for (stack.currentFrame->locals.fi = min;; stack.currentFrame->locals.fi++) {
    14171356                            RECURSIVE_MATCH(42, stack.currentFrame->args.instructionPtr, stack.currentFrame->args.subpatternStart);
    1418                             if (is_match)
     1357                            if (isMatch)
    14191358                                RRETURN;
    14201359                            int d = *stack.currentFrame->args.subjectPtr++;
    1421                             if (stack.currentFrame->locals.fi >= stack.currentFrame->locals.max || stack.currentFrame->args.subjectPtr >= md.end_subject || stack.currentFrame->locals.fc == d)
     1360                            if (stack.currentFrame->locals.fi >= stack.currentFrame->locals.max || stack.currentFrame->args.subjectPtr >= md.endSubject || stack.currentFrame->locals.fc == d)
    14221361                                RRETURN;
    14231362                        }
     
    14311370                       
    14321371                        for (int i = min; i < stack.currentFrame->locals.max; i++) {
    1433                             if (stack.currentFrame->args.subjectPtr >= md.end_subject)
     1372                            if (stack.currentFrame->args.subjectPtr >= md.endSubject)
    14341373                                break;
    14351374                            int d = *stack.currentFrame->args.subjectPtr;
     
    14401379                        for (;;) {
    14411380                            RECURSIVE_MATCH(44, stack.currentFrame->args.instructionPtr, stack.currentFrame->args.subpatternStart);
    1442                             if (is_match)
     1381                            if (isMatch)
    14431382                                RRETURN;
    14441383                            if (stack.currentFrame->args.subjectPtr-- == stack.currentFrame->locals.subjectPtrAtStartOfInstruction)
     
    14561395               
    14571396            BEGIN_OPCODE(TYPEEXACT):
    1458                 min = stack.currentFrame->locals.max = get2ByteOpcodeValueAtOffset(stack.currentFrame->args.instructionPtr, 1);
     1397                min = stack.currentFrame->locals.max = get2ByteValue(stack.currentFrame->args.instructionPtr + 1);
    14591398                minimize = true;
    14601399                stack.currentFrame->args.instructionPtr += 3;
     
    14641403            BEGIN_OPCODE(TYPEMINUPTO):
    14651404                min = 0;
    1466                 stack.currentFrame->locals.max = get2ByteOpcodeValueAtOffset(stack.currentFrame->args.instructionPtr, 1);
     1405                stack.currentFrame->locals.max = get2ByteValue(stack.currentFrame->args.instructionPtr + 1);
    14671406                minimize = *stack.currentFrame->args.instructionPtr == OP_TYPEMINUPTO;
    14681407                stack.currentFrame->args.instructionPtr += 3;
     
    14891428                 the minimum number of characters before we start. */
    14901429               
    1491                 if (min > md.end_subject - stack.currentFrame->args.subjectPtr)
     1430                if (min > md.endSubject - stack.currentFrame->args.subjectPtr)
    14921431                    RRETURN_NO_MATCH;
    14931432                if (min > 0) {
     
    15661505                    for (stack.currentFrame->locals.fi = min;; stack.currentFrame->locals.fi++) {
    15671506                        RECURSIVE_MATCH(48, stack.currentFrame->args.instructionPtr, stack.currentFrame->args.subpatternStart);
    1568                         if (is_match)
     1507                        if (isMatch)
    15691508                            RRETURN;
    1570                         if (stack.currentFrame->locals.fi >= stack.currentFrame->locals.max || stack.currentFrame->args.subjectPtr >= md.end_subject)
     1509                        if (stack.currentFrame->locals.fi >= stack.currentFrame->locals.max || stack.currentFrame->args.subjectPtr >= md.endSubject)
    15711510                            RRETURN;
    15721511                       
     
    16251564                        case OP_NOT_NEWLINE:
    16261565                            for (int i = min; i < stack.currentFrame->locals.max; i++) {
    1627                                 if (stack.currentFrame->args.subjectPtr >= md.end_subject || isNewline(*stack.currentFrame->args.subjectPtr))
     1566                                if (stack.currentFrame->args.subjectPtr >= md.endSubject || isNewline(*stack.currentFrame->args.subjectPtr))
    16281567                                    break;
    16291568                                stack.currentFrame->args.subjectPtr++;
     
    16331572                        case OP_NOT_DIGIT:
    16341573                            for (int i = min; i < stack.currentFrame->locals.max; i++) {
    1635                                 if (stack.currentFrame->args.subjectPtr >= md.end_subject)
     1574                                if (stack.currentFrame->args.subjectPtr >= md.endSubject)
    16361575                                    break;
    16371576                                int c = *stack.currentFrame->args.subjectPtr;
     
    16441583                        case OP_DIGIT:
    16451584                            for (int i = min; i < stack.currentFrame->locals.max; i++) {
    1646                                 if (stack.currentFrame->args.subjectPtr >= md.end_subject)
     1585                                if (stack.currentFrame->args.subjectPtr >= md.endSubject)
    16471586                                    break;
    16481587                                int c = *stack.currentFrame->args.subjectPtr;
     
    16551594                        case OP_NOT_WHITESPACE:
    16561595                            for (int i = min; i < stack.currentFrame->locals.max; i++) {
    1657                                 if (stack.currentFrame->args.subjectPtr >= md.end_subject)
     1596                                if (stack.currentFrame->args.subjectPtr >= md.endSubject)
    16581597                                    break;
    16591598                                int c = *stack.currentFrame->args.subjectPtr;
     
    16661605                        case OP_WHITESPACE:
    16671606                            for (int i = min; i < stack.currentFrame->locals.max; i++) {
    1668                                 if (stack.currentFrame->args.subjectPtr >= md.end_subject)
     1607                                if (stack.currentFrame->args.subjectPtr >= md.endSubject)
    16691608                                    break;
    16701609                                int c = *stack.currentFrame->args.subjectPtr;
     
    16771616                        case OP_NOT_WORDCHAR:
    16781617                            for (int i = min; i < stack.currentFrame->locals.max; i++) {
    1679                                 if (stack.currentFrame->args.subjectPtr >= md.end_subject)
     1618                                if (stack.currentFrame->args.subjectPtr >= md.endSubject)
    16801619                                    break;
    16811620                                int c = *stack.currentFrame->args.subjectPtr;
     
    16881627                        case OP_WORDCHAR:
    16891628                            for (int i = min; i < stack.currentFrame->locals.max; i++) {
    1690                                 if (stack.currentFrame->args.subjectPtr >= md.end_subject)
     1629                                if (stack.currentFrame->args.subjectPtr >= md.endSubject)
    16911630                                    break;
    16921631                                int c = *stack.currentFrame->args.subjectPtr;
     
    17061645                    for (;;) {
    17071646                        RECURSIVE_MATCH(52, stack.currentFrame->args.instructionPtr, stack.currentFrame->args.subpatternStart);
    1708                         if (is_match)
     1647                        if (isMatch)
    17091648                            RRETURN;
    17101649                        if (stack.currentFrame->args.subjectPtr-- == stack.currentFrame->locals.subjectPtrAtStartOfInstruction)
     
    17561695               
    17571696                if (stack.currentFrame->locals.number > EXTRACT_BASIC_MAX)
    1758                     stack.currentFrame->locals.number = get2ByteOpcodeValueAtOffset(stack.currentFrame->args.instructionPtr, 2+LINK_SIZE);
     1697                    stack.currentFrame->locals.number = get2ByteValue(stack.currentFrame->args.instructionPtr + 2 + LINK_SIZE);
    17591698                stack.currentFrame->locals.offset = stack.currentFrame->locals.number << 1;
    17601699               
     
    17651704#endif
    17661705               
    1767                 if (stack.currentFrame->locals.offset < md.offset_max) {
    1768                     stack.currentFrame->locals.save_offset1 = md.offset_vector[stack.currentFrame->locals.offset];
    1769                     stack.currentFrame->locals.save_offset2 = md.offset_vector[stack.currentFrame->locals.offset + 1];
    1770                     stack.currentFrame->locals.save_offset3 = md.offset_vector[md.offset_end - stack.currentFrame->locals.number];
    1771                    
    1772                     DPRINTF(("saving %d %d %d\n", stack.currentFrame->locals.save_offset1, stack.currentFrame->locals.save_offset2, stack.currentFrame->locals.save_offset3));
    1773                     md.offset_vector[md.offset_end - stack.currentFrame->locals.number] = stack.currentFrame->args.subjectPtr - md.start_subject;
     1706                if (stack.currentFrame->locals.offset < md.offsetMax) {
     1707                    stack.currentFrame->locals.saveOffset1 = md.offsetVector[stack.currentFrame->locals.offset];
     1708                    stack.currentFrame->locals.saveOffset2 = md.offsetVector[stack.currentFrame->locals.offset + 1];
     1709                    stack.currentFrame->locals.saveOffset3 = md.offsetVector[md.offsetEnd - stack.currentFrame->locals.number];
     1710                   
     1711                    DPRINTF(("saving %d %d %d\n", stack.currentFrame->locals.saveOffset1, stack.currentFrame->locals.saveOffset2, stack.currentFrame->locals.saveOffset3));
     1712                    md.offsetVector[md.offsetEnd - stack.currentFrame->locals.number] = stack.currentFrame->args.subjectPtr - md.startSubject;
    17741713                   
    17751714                    do {
    17761715                        RECURSIVE_MATCH_STARTNG_NEW_GROUP(1, stack.currentFrame->args.instructionPtr + 1 + LINK_SIZE, stack.currentFrame->args.subpatternStart);
    1777                         if (is_match)
     1716                        if (isMatch)
    17781717                            RRETURN;
    1779                         stack.currentFrame->args.instructionPtr += getOpcodeValueAtOffset(stack.currentFrame->args.instructionPtr, 1);
     1718                        stack.currentFrame->args.instructionPtr += getLinkValue(stack.currentFrame->args.instructionPtr + 1);
    17801719                    } while (*stack.currentFrame->args.instructionPtr == OP_ALT);
    17811720                   
    17821721                    DPRINTF(("bracket %d failed\n", stack.currentFrame->locals.number));
    17831722                   
    1784                     md.offset_vector[stack.currentFrame->locals.offset] = stack.currentFrame->locals.save_offset1;
    1785                     md.offset_vector[stack.currentFrame->locals.offset + 1] = stack.currentFrame->locals.save_offset2;
    1786                     md.offset_vector[md.offset_end - stack.currentFrame->locals.number] = stack.currentFrame->locals.save_offset3;
     1723                    md.offsetVector[stack.currentFrame->locals.offset] = stack.currentFrame->locals.saveOffset1;
     1724                    md.offsetVector[stack.currentFrame->locals.offset + 1] = stack.currentFrame->locals.saveOffset2;
     1725                    md.offsetVector[md.offsetEnd - stack.currentFrame->locals.number] = stack.currentFrame->locals.saveOffset3;
    17871726                   
    17881727                    RRETURN;
     
    18461785   
    18471786RETURN:
    1848     ASSERT(is_match == MATCH_MATCH || is_match == MATCH_NOMATCH);
    1849     return is_match;
     1787    ASSERT(isMatch == MATCH_MATCH || isMatch == MATCH_NOMATCH);
     1788    return isMatch;
    18501789}
    18511790
     
    19041843}
    19051844
    1906 static bool tryRequiredByteOptimization(const UChar*& subjectPtr, const UChar* endSubject, int req_byte, int req_byte2, bool req_byte_caseless, bool hasFirstByte, const UChar*& req_byte_ptr)
     1845static bool tryRequiredByteOptimization(const UChar*& subjectPtr, const UChar* endSubject, int req_byte, int req_byte2, bool req_byte_caseless, bool hasFirstByte, const UChar*& reqBytePtr)
    19071846{
    19081847    /* If req_byte is set, we know that that character must appear in the subject
     
    19261865         place we found it at last time. */
    19271866
    1928         if (p > req_byte_ptr) {
     1867        if (p > reqBytePtr) {
    19291868            if (req_byte_caseless) {
    19301869                while (p < endSubject) {
     
    19531892             the start hasn't passed this character yet. */
    19541893
    1955             req_byte_ptr = p;
     1894            reqBytePtr = p;
    19561895        }
    19571896    }
     
    19681907    ASSERT(offsets || offsetcount == 0);
    19691908   
    1970     MatchData match_block;
    1971     match_block.start_subject = subject;
    1972     match_block.end_subject = match_block.start_subject + length;
    1973     const UChar* end_subject = match_block.end_subject;
    1974    
    1975     match_block.multiline = (re->options & MatchAcrossMultipleLinesOption);
    1976     match_block.ignoreCase = (re->options & IgnoreCaseOption);
     1909    MatchData matchBlock;
     1910    matchBlock.startSubject = subject;
     1911    matchBlock.endSubject = matchBlock.startSubject + length;
     1912    const UChar* endSubject = matchBlock.endSubject;
     1913   
     1914    matchBlock.multiline = (re->options & MatchAcrossMultipleLinesOption);
     1915    matchBlock.ignoreCase = (re->options & IgnoreCaseOption);
    19771916   
    19781917    /* If the expression has got more back references than the offsets supplied can
     
    19891928    if (re->top_backref > 0 && re->top_backref >= ocount/3) {
    19901929        ocount = re->top_backref * 3 + 3;
    1991         match_block.offset_vector = new int[ocount];
    1992         if (!match_block.offset_vector)
     1930        matchBlock.offsetVector = new int[ocount];
     1931        if (!matchBlock.offsetVector)
    19931932            return JSRegExpErrorNoMemory;
    19941933        using_temporary_offsets = true;
    19951934    } else
    1996         match_block.offset_vector = offsets;
    1997    
    1998     match_block.offset_end = ocount;
    1999     match_block.offset_max = (2*ocount)/3;
    2000     match_block.offset_overflow = false;
     1935        matchBlock.offsetVector = offsets;
     1936   
     1937    matchBlock.offsetEnd = ocount;
     1938    matchBlock.offsetMax = (2*ocount)/3;
     1939    matchBlock.offsetOverflow = false;
    20011940   
    20021941    /* Compute the minimum number of offsets that we need to reset each time. Doing
     
    20121951     initialize them to avoid reading uninitialized locations. */
    20131952   
    2014     if (match_block.offset_vector) {
    2015         int* iptr = match_block.offset_vector + ocount;
     1953    if (matchBlock.offsetVector) {
     1954        int* iptr = matchBlock.offsetVector + ocount;
    20161955        int* iend = iptr - resetcount/2 + 1;
    20171956        while (--iptr >= iend)
     
    20481987     the loop runs just once. */
    20491988   
    2050     const UChar* start_match = subject + start_offset;
    2051     const UChar* req_byte_ptr = start_match - 1;
     1989    const UChar* startMatch = subject + start_offset;
     1990    const UChar* reqBytePtr = startMatch - 1;
    20521991    bool useMultiLineFirstCharOptimization = re->options & UseMultiLineFirstByteOptimizationOption;
    20531992   
    20541993    do {
    20551994        /* Reset the maximum number of extractions we might see. */
    2056         if (match_block.offset_vector) {
    2057             int* iptr = match_block.offset_vector;
     1995        if (matchBlock.offsetVector) {
     1996            int* iptr = matchBlock.offsetVector;
    20581997            int* iend = iptr + resetcount;
    20591998            while (iptr < iend)
     
    20612000        }
    20622001       
    2063         tryFirstByteOptimization(start_match, end_subject, first_byte, first_byte_caseless, useMultiLineFirstCharOptimization, match_block.start_subject + start_offset);
    2064         if (tryRequiredByteOptimization(start_match, end_subject, req_byte, req_byte2, req_byte_caseless, first_byte >= 0, req_byte_ptr))
     2002        tryFirstByteOptimization(startMatch, endSubject, first_byte, first_byte_caseless, useMultiLineFirstCharOptimization, matchBlock.startSubject + start_offset);
     2003        if (tryRequiredByteOptimization(startMatch, endSubject, req_byte, req_byte2, req_byte_caseless, first_byte >= 0, reqBytePtr))
    20652004            break;
    20662005               
     
    20732012       
    20742013        /* The code starts after the JSRegExp block and the capture name table. */
    2075         const uschar* start_code = (const uschar*)(re + 1);
     2014        const unsigned char* start_code = (const unsigned char*)(re + 1);
    20762015       
    2077         int returnCode = match(start_match, start_code, 2, match_block);
     2016        int returnCode = match(startMatch, start_code, 2, matchBlock);
    20782017       
    20792018        /* When the result is no match, advance the pointer to the next character
     
    20812020       
    20822021        if (returnCode == MATCH_NOMATCH) {
    2083             start_match++;
     2022            startMatch++;
    20842023            continue;
    20852024        }
     
    20952034        if (using_temporary_offsets) {
    20962035            if (offsetcount >= 4) {
    2097                 memcpy(offsets + 2, match_block.offset_vector + 2, (offsetcount - 2) * sizeof(int));
     2036                memcpy(offsets + 2, matchBlock.offsetVector + 2, (offsetcount - 2) * sizeof(int));
    20982037                DPRINTF(("Copied offsets from temporary memory\n"));
    20992038            }
    2100             if (match_block.end_offset_top > offsetcount)
    2101                 match_block.offset_overflow = true;
     2039            if (matchBlock.endOffsetTop > offsetcount)
     2040                matchBlock.offsetOverflow = true;
    21022041           
    21032042            DPRINTF(("Freeing temporary memory\n"));
    2104             delete [] match_block.offset_vector;
     2043            delete [] matchBlock.offsetVector;
    21052044        }
    21062045       
    2107         returnCode = match_block.offset_overflow ? 0 : match_block.end_offset_top / 2;
     2046        returnCode = matchBlock.offsetOverflow ? 0 : matchBlock.endOffsetTop / 2;
    21082047       
    21092048        if (offsetcount < 2)
    21102049            returnCode = 0;
    21112050        else {
    2112             offsets[0] = start_match - match_block.start_subject;
    2113             offsets[1] = match_block.end_match_ptr - match_block.start_subject;
     2051            offsets[0] = startMatch - matchBlock.startSubject;
     2052            offsets[1] = matchBlock.endMatchPtr - matchBlock.startSubject;
    21142053        }
    21152054       
    21162055        DPRINTF((">>>> returning %d\n", rc));
    21172056        return returnCode;
    2118     } while (start_match <= end_subject);
     2057    } while (startMatch <= endSubject);
    21192058   
    21202059    if (using_temporary_offsets) {
    21212060        DPRINTF(("Freeing temporary memory\n"));
    2122         delete [] match_block.offset_vector;
     2061        delete [] matchBlock.offsetVector;
    21232062    }
    21242063   
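Note on the offset vector indexing in the pcre_exec.cpp hunks above: the renamed fields are set up as offsetEnd = ocount and offsetMax = (2*ocount)/3. The bracket code stores group n's pair at index n << 1 and keeps a provisional start position at offsetEnd - n. The following is only a minimal standalone sketch of that arithmetic. The names OffsetLayout, pairIndex, scratchIndex and fits are hypothetical and do not appear in the WebKit source.

```cpp
#include <cassert>

// Hypothetical illustration of the index arithmetic visible in the diff.
// This is not WebKit code; only the formulas are taken from the hunks.
struct OffsetLayout {
    int offsetEnd;  // total number of ints in the vector (ocount)
    int offsetMax;  // first index past the capture-pair area: (2 * ocount) / 3

    explicit OffsetLayout(int ocount)
        : offsetEnd(ocount)
        , offsetMax((2 * ocount) / 3)
    {
    }

    // Group `number` keeps its start at this index and its end at the next one.
    int pairIndex(int number) const { return number << 1; }

    // Slot that holds the provisional start while the bracket is being matched.
    int scratchIndex(int number) const { return offsetEnd - number; }

    // Mirrors the `locals.offset < md.offsetMax` check before saving offsets.
    bool fits(int number) const { return pairIndex(number) < offsetMax; }
};
```

With ocount = 9, for example, the pair area covers indices 0 to 5 (the whole match plus two groups) and the scratch slots for groups 1 and 2 land at indices 8 and 7, so the two regions do not overlap.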
  • trunk/JavaScriptCore/pcre/pcre_internal.h

    r28525 r28793  
    7777#endif
    7878
     79#include "pcre.h"
     80
    7981/* The value of LINK_SIZE determines the number of bytes used to store links as
    8082offsets within the compiled regex. The default is 2, which allows for compiled
    81 patterns up to 64K long. This covers the vast majority of cases. However, PCRE
    82 can also be compiled to use 3 or 4 bytes instead. This allows for longer
    83 patterns in extreme cases. On systems that support it, "configure" can be used
    84 to override this default. */
     83patterns up to 64K long. */
    8584
    8685#define LINK_SIZE   2
    87 
    88 /* The below limit restricts the number of recursive match calls in order to
    89 limit the maximum amount of stack (or heap, if NO_RECURSE is defined) that is used. The
    90 value of MATCH_LIMIT_RECURSION applies only to recursive calls of match().
    91  
    92  This limit is tied to the size of MatchFrame.  Right now we allow PCRE to allocate up
    93  to MATCH_LIMIT_RECURSION - 16 * sizeof(MatchFrame) bytes of "stack" space before we give up.
    94  Currently that's 100000 - 16 * (23 * 4)  ~ 90MB
    95  */
    96 
    97 #define MATCH_LIMIT_RECURSION 100000
    98 
    99 #define _pcre_default_tables kjs_pcre_default_tables
    100 #define _pcre_ord2utf8 kjs_pcre_ord2utf8
    101 #define _pcre_utf8_table1 kjs_pcre_utf8_table1
    102 #define _pcre_utf8_table2 kjs_pcre_utf8_table2
    103 #define _pcre_utf8_table3 kjs_pcre_utf8_table3
    104 #define _pcre_utf8_table4 kjs_pcre_utf8_table4
    105 #define _pcre_xclass kjs_pcre_xclass
    10686
    10787/* Define DEBUG to get debugging output on stdout. */
     
    121101#define DPRINTF(p) /*nothing*/
    122102#endif
    123 
    124 /* Standard C headers plus the external interface definition. The only time
    125 setjmp and stdarg are used is when NO_RECURSE is set. */
    126 
    127 #include <ctype.h>
    128 #include <limits.h>
    129 #include <setjmp.h>
    130 #include <stdarg.h>
    131 #include <stddef.h>
    132 #include <stdio.h>
    133 #include <stdlib.h>
    134 #include <string.h>
    135 
    136 /* Include the public PCRE header and the definitions of UCP character property
    137 values. */
    138 
    139 #include "pcre.h"
    140 
    141 typedef unsigned short pcre_uint16;
    142 typedef unsigned pcre_uint32;
    143 typedef unsigned char uschar;
    144103
    145104/* PCRE keeps offsets in its compiled code as 2-byte quantities (always stored
     
    149108for almost everybody. However, I received a request for an even bigger limit.
    150109For this reason, and also to make the code easier to maintain, the storing and
    151 loading of offsets from the byte string is now handled by the macros that are
    152 defined here.
    153 
    154 The macros are controlled by the value of LINK_SIZE. This defaults to 2 in
    155 the config.h file, but can be overridden by using -D on the command line. This
    156 is automated on Unix systems via the "configure" command. */
    157 
    158 #if LINK_SIZE == 2
    159 
    160 static inline void putOpcodeValueAtOffset(uschar* opcodePtr, size_t offset, unsigned short value)
    161 {
    162     opcodePtr[offset] = value >> 8;
    163     opcodePtr[offset + 1] = value & 255;
    164 }
    165 
    166 static inline short getOpcodeValueAtOffset(const uschar* opcodePtr, size_t offset)
    167 {
    168     return ((opcodePtr[offset] << 8) | opcodePtr[offset + 1]);
    169 }
    170 
    171 #define MAX_PATTERN_SIZE (1 << 16)
    172 
    173 #elif LINK_SIZE == 3
    174 
    175 static inline void putOpcodeValueAtOffset(uschar* opcodePtr, size_t offset, unsigned value)
    176 {
    177     ASSERT(!(value & 0xFF000000)); // This function only allows values < 2^24
    178     opcodePtr[offset] = value >> 16;
    179     opcodePtr[offset + 1] = value >> 8;
    180     opcodePtr[offset + 2] = value & 255;
    181 }
    182 
    183 static inline int getOpcodeValueAtOffset(const uschar* opcodePtr, size_t offset)
    184 {
    185     return ((opcodePtr[offset] << 16) | (opcodePtr[offset + 1] << 8) | opcodePtr[offset + 2]);
    186 }
    187 
    188 #define MAX_PATTERN_SIZE (1 << 24)
    189 
    190 #elif LINK_SIZE == 4
    191 
    192 static inline void putOpcodeValueAtOffset(uschar* opcodePtr, size_t offset, unsigned value)
    193 {
    194     opcodePtr[offset] = value >> 24;
    195     opcodePtr[offset + 1] = value >> 16;
    196     opcodePtr[offset + 2] = value >> 8;
    197     opcodePtr[offset + 3] = value & 255;
    198 }
    199 
    200 static inline int getOpcodeValueAtOffset(const uschar* opcodePtr, size_t offset)
    201 {
    202     return ((opcodePtr[offset] << 24) | (opcodePtr[offset + 1] << 16) | (opcodePtr[offset + 2] << 8) | opcodePtr[offset + 3]);
    203 }
    204 
    205 #define MAX_PATTERN_SIZE (1 << 30)   /* Keep it positive */
    206 
    207 #else
    208 #error LINK_SIZE must be either 2, 3, or 4
    209 #endif
    210 
    211 static inline void putOpcodeValueAtOffsetAndAdvance(uschar*& opcodePtr, size_t offset, unsigned short value)
    212 {
    213     putOpcodeValueAtOffset(opcodePtr, offset, value);
    214     opcodePtr += LINK_SIZE;
    215 }
     110loading of offsets from the byte string is now handled by the functions that are
     111defined here. */
    216112
    217113/* PCRE uses some other 2-byte quantities that do not change when the size of
     
    219115capturing parenthesis numbers in back references. */
    220116
    221 static inline void put2ByteOpcodeValueAtOffset(uschar* opcodePtr, size_t offset, unsigned short value)
    222 {
    223     opcodePtr[offset] = value >> 8;
    224     opcodePtr[offset + 1] = value & 255;
    225 }
    226 
    227 static inline short get2ByteOpcodeValueAtOffset(const uschar* opcodePtr, size_t offset)
    228 {
    229     return ((opcodePtr[offset] << 8) | opcodePtr[offset + 1]);
    230 }
    231 
    232 static inline void put2ByteOpcodeValueAtOffsetAndAdvance(uschar*& opcodePtr, size_t offset, unsigned short value)
    233 {
    234     put2ByteOpcodeValueAtOffset(opcodePtr, offset, value);
     117static inline void put2ByteValue(unsigned char* opcodePtr, int value)
     118{
     119    ASSERT(value >= 0 && value <= 0xFFFF);
     120    opcodePtr[0] = value >> 8;
     121    opcodePtr[1] = value;
     122}
     123
     124static inline int get2ByteValue(const unsigned char* opcodePtr)
     125{
     126    return (opcodePtr[0] << 8) | opcodePtr[1];
     127}
     128
     129static inline void put2ByteValueAndAdvance(unsigned char*& opcodePtr, int value)
     130{
     131    put2ByteValue(opcodePtr, value);
    235132    opcodePtr += 2;
     133}
     134
     135static inline void putLinkValueAllowZero(unsigned char* opcodePtr, int value)
     136{
     137    put2ByteValue(opcodePtr, value);
     138}
     139
     140static inline int getLinkValueAllowZero(const unsigned char* opcodePtr)
     141{
     142    return get2ByteValue(opcodePtr);
     143}
     144
     145#define MAX_PATTERN_SIZE (1 << 16)
     146
     147static inline void putLinkValue(unsigned char* opcodePtr, int value)
     148{
     149    ASSERT(value);
     150    putLinkValueAllowZero(opcodePtr, value);
     151}
     152
     153static inline int getLinkValue(const unsigned char* opcodePtr)
     154{
     155    int value = getLinkValueAllowZero(opcodePtr);
     156    ASSERT(value);
     157    return value;
     158}
     159
     160static inline void putLinkValueAndAdvance(unsigned char*& opcodePtr, int value)
     161{
     162    putLinkValue(opcodePtr, value);
     163    opcodePtr += LINK_SIZE;
     164}
     165
     166static inline void putLinkValueAllowZeroAndAdvance(unsigned char*& opcodePtr, int value)
     167{
     168    putLinkValueAllowZero(opcodePtr, value);
     169    opcodePtr += LINK_SIZE;
    236170}
    237171
     
    246180};
    247181
    248 /* Negative values for the firstchar and reqchar variables */
    249 
    250 #define REQ_UNSET (-2)
    251 #define REQ_NONE  (-1)
    252 
    253 /* The maximum remaining length of subject we are prepared to search for a
    254 req_byte match. */
    255 
    256 #define REQ_BYTE_MAX 1000
    257 
    258182/* Flags added to firstbyte or reqbyte; a "non-literal" item is either a
    259183variable-length repeat, or anything other than literal characters. */
     
    367291    macro(ASSERT_NOT) \
    368292    \
    369     macro(ONCE) \
    370     \
    371293    macro(BRAZERO) \
    372294    macro(BRAMINZERO) \
     
    382304
    383305/* The highest extraction number before we have to start using additional
    384 bytes. (Originally PCRE didn't have support for extraction counts highter than
     306bytes. (Originally PCRE didn't have support for extraction counts higher than
    385307this number.) The value is limited by the number of opcodes left after OP_BRA,
    386308i.e. 255 - OP_BRA. We actually set it a bit lower to leave room for additional
    387309opcodes. */
    388310
     311/* FIXME: Note that OP_BRA + 100 is > 128, so the two comments above
     312are in conflict! */
     313
    389314#define EXTRACT_BASIC_MAX  100
    390 
    391 /* This macro defines the length of fixed length operations in the compiled
    392 regex. The lengths are used when searching for specific things, and also in the
    393 debugging printing of a compiled regex. We use a macro so that it can be
    394 defined close to the definitions of the opcodes themselves.
    395 
    396 As things have been extended, some of these are no longer fixed lenths, but are
    397 minima instead. For example, the length of a single-character repeat may vary
    398 in UTF-8 mode. The code that uses this table must know about such things. */
    399 
    400 #define OP_LENGTHS \
    401   1,                             /* End                                    */ \
    402   1, 1, 1, 1, 1, 1, 1, 1,        /* \B, \b, \D, \d, \S, \s, \W, \w         */ \
    403   1,                             /* Any                                    */ \
    404   1, 1,                          /* ^, $                                   */ \
    405   2, 2,                          /* Char, Charnc - minimum lengths         */ \
    406   2, 2,                          /* ASCII char or non-cased                */ \
    407   2,                             /* not                                    */ \
    408   /* Positive single-char repeats                            ** These are  */ \
    409   2, 2, 2, 2, 2, 2,              /* *, *?, +, +?, ?, ??      ** minima in  */ \
    410   4, 4, 4,                       /* upto, minupto, exact     ** UTF-8 mode */ \
    411   /* Negative single-char repeats - only for chars < 256                   */ \
    412   2, 2, 2, 2, 2, 2,              /* NOT *, *?, +, +?, ?, ??                */ \
    413   4, 4, 4,                       /* NOT upto, minupto, exact               */ \
    414   /* Positive type repeats                                                 */ \
    415   2, 2, 2, 2, 2, 2,              /* Type *, *?, +, +?, ?, ??               */ \
    416   4, 4, 4,                       /* Type upto, minupto, exact              */ \
    417   /* Character class & ref repeats                                         */ \
    418   1, 1, 1, 1, 1, 1,              /* *, *?, +, +?, ?, ??                    */ \
    419   5, 5,                          /* CRRANGE, CRMINRANGE                    */ \
    420  33,                             /* CLASS                                  */ \
    421  33,                             /* NCLASS                                 */ \
    422   0,                             /* XCLASS - variable length               */ \
    423   3,                             /* REF                                    */ \
    424   1 + LINK_SIZE,                   /* Alt                                    */ \
    425   1 + LINK_SIZE,                   /* Ket                                    */ \
    426   1 + LINK_SIZE,                   /* KetRmax                                */ \
    427   1 + LINK_SIZE,                   /* KetRmin                                */ \
    428   1 + LINK_SIZE,                   /* Assert                                 */ \
    429   1 + LINK_SIZE,                   /* Assert not                             */ \
    430   1 + LINK_SIZE,                   /* Once                                   */ \
    431   1, 1,                          /* BRAZERO, BRAMINZERO                    */ \
    432   3,                             /* BRANUMBER                              */ \
    433   1 + LINK_SIZE                    /* BRA                                    */ \
    434 
    435315
    436316/* The index of names and the
     
    443323
    444324struct JSRegExp {
    445     pcre_uint32 options;
    446 
    447     pcre_uint16 top_bracket;
    448     pcre_uint16 top_backref;
     325    unsigned options;
     326
     327    unsigned short top_bracket;
     328    unsigned short top_backref;
    449329   
    450     // jsRegExpExecute && jsRegExpCompile currently only how to handle ASCII
    451     // chars for thse optimizations, however it would be trivial to add support
    452     // for optimized UChar first_byte/req_byte scans
    453     pcre_uint16 first_byte;
    454     pcre_uint16 req_byte;
     330    unsigned short first_byte;
     331    unsigned short req_byte;
    455332};
    456333
     
    460337 pcre_tables.c module. */
    461338
    462 #define _pcre_utf8_table1_size 6
    463 
    464 extern const int    _pcre_utf8_table1[6];
    465 extern const int    _pcre_utf8_table2[6];
    466 extern const int    _pcre_utf8_table3[6];
    467 extern const uschar _pcre_utf8_table4[0x40];
    468 
    469 extern const uschar _pcre_default_tables[tables_length];
    470 
    471 static inline uschar toLowerCase(uschar c)
    472 {
    473     static const uschar* lowerCaseChars = _pcre_default_tables + lcc_offset;
     339#define kjs_pcre_utf8_table1_size 6
     340
     341extern const int    kjs_pcre_utf8_table1[6];
     342extern const int    kjs_pcre_utf8_table2[6];
     343extern const int    kjs_pcre_utf8_table3[6];
     344extern const unsigned char kjs_pcre_utf8_table4[0x40];
     345
     346extern const unsigned char kjs_pcre_default_tables[tables_length];
     347
     348static inline unsigned char toLowerCase(unsigned char c)
     349{
     350    static const unsigned char* lowerCaseChars = kjs_pcre_default_tables + lcc_offset;
    474351    return lowerCaseChars[c];
    475352}
    476353
    477 static inline uschar flipCase(uschar c)
    478 {
    479     static const uschar* flippedCaseChars = _pcre_default_tables + fcc_offset;
     354static inline unsigned char flipCase(unsigned char c)
     355{
     356    static const unsigned char* flippedCaseChars = kjs_pcre_default_tables + fcc_offset;
    480357    return flippedCaseChars[c];
    481358}
    482359
    483 static inline uschar classBitmapForChar(uschar c)
    484 {
    485     static const uschar* charClassBitmaps = _pcre_default_tables + cbits_offset;
     360static inline unsigned char classBitmapForChar(unsigned char c)
     361{
     362    static const unsigned char* charClassBitmaps = kjs_pcre_default_tables + cbits_offset;
    486363    return charClassBitmaps[c];
    487364}
    488365
    489 static inline uschar charTypeForChar(uschar c)
    490 {
    491     const uschar* charTypeMap = _pcre_default_tables + ctypes_offset;
     366static inline unsigned char charTypeForChar(unsigned char c)
     367{
     368    const unsigned char* charTypeMap = kjs_pcre_default_tables + ctypes_offset;
    492369    return charTypeMap[c];
    493370}
     
    495372static inline bool isWordChar(UChar c)
    496373{
    497     /* UTF8 Characters > 128 are assumed to be "non-word" characters. */
    498     return (c < 128 && (charTypeForChar(c) & ctype_word));
     374    return c < 128 && (charTypeForChar(c) & ctype_word);
    499375}
    500376
    501377static inline bool isSpaceChar(UChar c)
    502378{
    503     return (c < 128 && (charTypeForChar(c) & ctype_space));
    504 }
    505 
    506 /* Internal shared functions. These are functions that are used by more than
    507 one of the exported public functions. They have to be "external" in the C
    508 sense, but are not part of the PCRE public API. */
    509 
    510 extern int         _pcre_ucp_othercase(const unsigned int);
    511 extern bool        _pcre_xclass(int, const uschar*);
     379    return c < 128 && (charTypeForChar(c) & ctype_space);
     380}
    512381
    513382static inline bool isNewline(UChar nl)
     
    516385}
    517386
    518 // FIXME: It's unclear to me if this moves the opcode ptr to the start of all branches
    519 // or to the end of all branches -- ecs
    520 // FIXME: This abstraction is poor since it assumes that you want to jump based on whatever
    521 // the next value in the stream is, and *then* follow any OP_ALT branches.
    522 static inline void moveOpcodePtrPastAnyAlternateBranches(const uschar*& opcodePtr)
    523 {
    524     do {
    525         opcodePtr += getOpcodeValueAtOffset(opcodePtr, 1);
    526     } while (*opcodePtr == OP_ALT);
    527 }
     387static inline bool isBracketStartOpcode(unsigned char opcode)
     388{
     389    if (opcode >= OP_BRA)
     390        return true;
     391    switch (opcode) {
     392        case OP_ASSERT:
     393        case OP_ASSERT_NOT:
     394            return true;
     395        default:
     396            return false;
     397    }
     398}
     399
     400static inline void advanceToEndOfBracket(const unsigned char*& opcodePtr)
     401{
     402    ASSERT(isBracketStartOpcode(*opcodePtr) || *opcodePtr == OP_ALT);
     403    do
     404        opcodePtr += getLinkValue(opcodePtr + 1);
     405    while (*opcodePtr == OP_ALT);
     406}
     407
     408/* Internal shared functions. These are functions that are used in more
     409than one of the source files. They have to have external linkage, but
     410are not part of the public API and so not exported from the library. */
     411
     412extern int kjs_pcre_ucp_othercase(unsigned);
     413extern bool kjs_pcre_xclass(int, const unsigned char*);
    528414
    529415#endif
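
The pcre_internal.h changes above replace the offset-based accessors with pointer-based ones and add advanceToEndOfBracket, which walks link values from a bracket start across each OP_ALT. The following is a minimal sketch of that idea, not the WebKit code itself: the opcode numbers (the `*_DEMO` constants), the toy stream layout, and the helpers `roundTrip`, `buildToyStream`, and `offsetOfBracketEnd` are assumptions made for illustration, and `assert` stands in for WebKit's `ASSERT`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical opcode numbers for this sketch only; the real values come
// from the opcode list in pcre_internal.h.
enum : unsigned char { OP_END_DEMO = 0, OP_ALT_DEMO = 1, OP_KET_DEMO = 2, OP_BRA_DEMO = 3 };

// Store a 16-bit value big-endian, high byte first, as put2ByteValue does.
static void put2ByteValue(unsigned char* opcodePtr, int value)
{
    assert(value >= 0 && value <= 0xFFFF);
    opcodePtr[0] = static_cast<unsigned char>(value >> 8);
    opcodePtr[1] = static_cast<unsigned char>(value & 0xFF);
}

// Read the same 16-bit big-endian value back.
static int get2ByteValue(const unsigned char* opcodePtr)
{
    return (opcodePtr[0] << 8) | opcodePtr[1];
}

// Write a value and read it back through a 2-byte buffer.
static int roundTrip(int value)
{
    unsigned char buffer[2];
    put2ByteValue(buffer, value);
    return get2ByteValue(buffer);
}

// Follow the link stored after the opcode, and keep following while the
// opcode reached is an alternative; stops at the closing opcode.
static void advanceToEndOfBracket(const unsigned char*& opcodePtr)
{
    do
        opcodePtr += get2ByteValue(opcodePtr + 1);
    while (*opcodePtr == OP_ALT_DEMO);
}

// Toy stream for something shaped like (a|b). Layout by offset:
// 0 BRA, 1-2 link, 3 payload, 4 ALT, 5-6 link, 7 payload, 8 KET, 9-10 link, 11 END.
static std::vector<unsigned char> buildToyStream()
{
    std::vector<unsigned char> code(12, 0);
    code[0] = OP_BRA_DEMO;
    put2ByteValue(&code[1], 4); // distance from BRA to ALT
    code[3] = 'a';
    code[4] = OP_ALT_DEMO;
    put2ByteValue(&code[5], 4); // distance from ALT to KET
    code[7] = 'b';
    code[8] = OP_KET_DEMO;
    put2ByteValue(&code[9], 8); // distance back to the bracket start
    code[11] = OP_END_DEMO;
    return code;
}

// Returns the offset at which the traversal stops in the toy stream.
static int offsetOfBracketEnd()
{
    std::vector<unsigned char> code = buildToyStream();
    const unsigned char* opcodePtr = code.data();
    advanceToEndOfBracket(opcodePtr);
    return static_cast<int>(opcodePtr - code.data());
}
```

In this toy layout the traversal starts at the bracket opcode, hops to the alternative, and stops on the closing opcode. Taking a pointer rather than a pointer plus offset keeps each call site to a single expression such as `opcodePtr + 1`.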
  • trunk/JavaScriptCore/pcre/pcre_tables.cpp

    r27730 r28793  
    5050character. */
    5151
    52 const int _pcre_utf8_table1[6] =
     52const int kjs_pcre_utf8_table1[6] =
    5353  { 0x7f, 0x7ff, 0xffff, 0x1fffff, 0x3ffffff, 0x7fffffff};
    5454
     
    5656first byte of a character, indexed by the number of additional bytes. */
    5757
    58 const int _pcre_utf8_table2[6] = { 0,    0xc0, 0xe0, 0xf0, 0xf8, 0xfc};
    59 const int _pcre_utf8_table3[6] = { 0xff, 0x1f, 0x0f, 0x07, 0x03, 0x01};
     58const int kjs_pcre_utf8_table2[6] = { 0,    0xc0, 0xe0, 0xf0, 0xf8, 0xfc};
     59const int kjs_pcre_utf8_table3[6] = { 0xff, 0x1f, 0x0f, 0x07, 0x03, 0x01};
    6060
    6161/* Table of the number of extra characters, indexed by the first character
     
    63630x3d. */
    6464
    65 const uschar _pcre_utf8_table4[0x40] = {
     65const unsigned char kjs_pcre_utf8_table4[0x40] = {
    6666  1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
    6767  1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
  • trunk/JavaScriptCore/pcre/pcre_ucp_searchfuncs.cpp

    r28161 r28793  
    6060*/
    6161
    62 int _pcre_ucp_othercase(const unsigned c)
     62int kjs_pcre_ucp_othercase(unsigned c)
    6363{
    6464    int bot = 0;
  • trunk/JavaScriptCore/pcre/pcre_xclass.cpp

    r28169 r28793  
    6060 know we are in UTF-8 mode. */
    6161
    62 static inline void getUTF8CharAndAdvancePointer(int& c, const uschar*& subjectPtr)
     62static inline void getUTF8CharAndAdvancePointer(int& c, const unsigned char*& subjectPtr)
    6363{
    6464    c = *subjectPtr++;
    6565    if ((c & 0xc0) == 0xc0) {
    66         int gcaa = _pcre_utf8_table4[c & 0x3f];  /* Number of additional bytes */
     66        int gcaa = kjs_pcre_utf8_table4[c & 0x3f];  /* Number of additional bytes */
    6767        int gcss = 6 * gcaa;
    68         c = (c & _pcre_utf8_table3[gcaa]) << gcss;
     68        c = (c & kjs_pcre_utf8_table3[gcaa]) << gcss;
    6969        while (gcaa-- > 0) {
    7070            gcss -= 6;
     
    7474}
    7575
    76 bool _pcre_xclass(int c, const uschar* data)
     76bool kjs_pcre_xclass(int c, const unsigned char* data)
    7777{
    7878    bool negated = (*data & XCL_NOT);
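
The getUTF8CharAndAdvancePointer hunk above masks the lead byte with kjs_pcre_utf8_table3 and then shifts in six bits per continuation byte. The sketch below illustrates that decoding scheme under stated assumptions: `leadByteMask` copies the table3 values shown in the diff, but `continuationByteCount` is a hypothetical stand-in that derives the count from the lead byte's high bits instead of looking it up in kjs_pcre_utf8_table4, the loop body is a reconstruction rather than a quote, and the input is assumed to be well-formed UTF-8.

```cpp
#include <cassert>

// Payload-bit masks for a lead byte, indexed by the number of continuation
// bytes (same values as kjs_pcre_utf8_table3 in the diff).
static const int leadByteMask[6] = { 0xff, 0x1f, 0x0f, 0x07, 0x03, 0x01 };

// Hypothetical helper: counts the set bits that follow the top bit of a
// lead byte, capped at 5. The real code uses a table lookup instead.
static int continuationByteCount(int leadByte)
{
    int count = 0;
    for (int mask = 0x40; count < 5 && (leadByte & mask); mask >>= 1)
        ++count;
    return count;
}

// Decode one character and advance the pointer past it.
// Assumes well-formed input; no validation is attempted.
static int decodeUTF8(const unsigned char*& subjectPtr)
{
    int c = *subjectPtr++;
    if ((c & 0xc0) == 0xc0) {
        int extra = continuationByteCount(c);
        int shift = 6 * extra;
        c = (c & leadByteMask[extra]) << shift;
        while (extra-- > 0) {
            shift -= 6;
            c |= (*subjectPtr++ & 0x3f) << shift;
        }
    }
    return c;
}

struct Decoded {
    int codePoint;
    int bytesConsumed;
};

// Decode the first character of a byte sequence and report its length.
static Decoded decodeOne(const unsigned char* bytes)
{
    const unsigned char* p = bytes;
    int c = decodeUTF8(p);
    return { c, static_cast<int>(p - bytes) };
}

// U+20AC (euro sign) is encoded as the three bytes E2 82 AC.
static Decoded decodeEuroSign()
{
    const unsigned char euro[] = { 0xE2, 0x82, 0xAC };
    return decodeOne(euro);
}
```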
  • trunk/JavaScriptCore/pcre/ucpinternal.h

    r27686 r28793  
    4646
    4747typedef struct cnode {
    48   pcre_uint32 f0;
    49   pcre_uint32 f1;
     48  unsigned f0;
     49  unsigned f1;
    5050} cnode;
    5151
  • trunk/JavaScriptCore/wtf/ASCIICType.h

    r27686 r28793  
    4949    inline bool isASCIIAlpha(wchar_t c) { return (c | 0x20) >= 'a' && (c | 0x20) <= 'z'; }
    5050#endif
     51    inline bool isASCIIAlpha(int c) { return (c | 0x20) >= 'a' && (c | 0x20) <= 'z'; }
    5152
    5253    inline bool isASCIIAlphanumeric(char c) { return c >= '0' && c <= '9' || (c | 0x20) >= 'a' && (c | 0x20) <= 'z'; }
     
    5556    inline bool isASCIIAlphanumeric(wchar_t c) { return c >= '0' && c <= '9' || (c | 0x20) >= 'a' && (c | 0x20) <= 'z'; }
    5657#endif
     58    inline bool isASCIIAlphanumeric(int c) { return c >= '0' && c <= '9' || (c | 0x20) >= 'a' && (c | 0x20) <= 'z'; }
    5759
    5860    inline bool isASCIIDigit(char c) { return (c >= '0') & (c <= '9'); }
     
    6870    inline bool isASCIIHexDigit(wchar_t c) { return c >= '0' && c <= '9' || (c | 0x20) >= 'a' && (c | 0x20) <= 'f'; }
    6971#endif
     72    inline bool isASCIIHexDigit(int c) { return c >= '0' && c <= '9' || (c | 0x20) >= 'a' && (c | 0x20) <= 'f'; }
    7073
    7174    inline bool isASCIILower(char c) { return c >= 'a' && c <= 'z'; }
     
    7477    inline bool isASCIILower(wchar_t c) { return c >= 'a' && c <= 'z'; }
    7578#endif
     79    inline bool isASCIILower(int c) { return c >= 'a' && c <= 'z'; }
    7680
    7781    inline bool isASCIISpace(char c) { return c == '\t' || c == '\n' || c == '\v' || c =='\f' || c == '\r' || c == ' '; }
     
    8084    inline bool isASCIISpace(wchar_t c) { return c == '\t' || c == '\n' || c == '\v' || c =='\f' || c == '\r' || c == ' '; }
    8185#endif
     86    inline bool isASCIISpace(int c) { return c == '\t' || c == '\n' || c == '\v' || c =='\f' || c == '\r' || c == ' '; }
    8287
    8388    inline char toASCIILower(char c) { return c | ((c >= 'A' && c <= 'Z') << 5); }
     
    8691    inline wchar_t toASCIILower(wchar_t c) { return c | ((c >= 'A' && c <= 'Z') << 5); }
    8792#endif
     93    inline int toASCIILower(int c) { return c | ((c >= 'A' && c <= 'Z') << 5); }
    8894
    8995    inline char toASCIIUpper(char c) { return static_cast<char>(c & ~((c >= 'a' && c <= 'z') << 5)); }
     
    9298    inline wchar_t toASCIIUpper(wchar_t c) { return static_cast<wchar_t>(c & ~((c >= 'a' && c <= 'z') << 5)); }
    9399#endif
     100    inline int toASCIIUpper(int c) { return static_cast<int>(c & ~((c >= 'a' && c <= 'z') << 5)); }
    94101
    95102}
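
The int overloads added to ASCIICType.h rely on the fact that ASCII upper- and lower-case letters differ only in bit 5 (0x20), so the range test result can be shifted into that bit without a branch. A minimal sketch of the same expressions follows; the `Sketch` suffix marks these as illustrative copies, not the WTF functions themselves.

```cpp
#include <cassert>

// The comparison yields 0 or 1; shifting it left by 5 gives 0 or 0x20,
// which is OR-ed in to set the lower-case bit only for 'A'..'Z'.
inline int toASCIILowerSketch(int c) { return c | ((c >= 'A' && c <= 'Z') << 5); }

// Same idea in reverse: clear bit 5 only for 'a'..'z'.
inline int toASCIIUpperSketch(int c) { return c & ~((c >= 'a' && c <= 'z') << 5); }

// Folding bit 5 on first lets one range test cover both cases.
inline bool isASCIIAlphaSketch(int c) { return (c | 0x20) >= 'a' && (c | 0x20) <= 'z'; }
```

Values outside the ASCII letter ranges pass through unchanged, which is why an int overload is safe to call with wider character values such as the `int c` used in the pcre code.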