Re: spelling: stripping non-alphabetic characters

From: Stephen Viles (sviles_abi@iinet.net.au)
Date: Thu Jul 24 2003 - 05:15:40 EDT


    Re-posting as text (rather than HTML)

    24/07/03 3:33:26 AM, Raphael Finkel <raphael@cs.uky.edu> wrote:

    I enclose a patch that I have built to strip leading and trailing
    non-alphabetic characters from strings sent to the spellcheck apparatus.
    It's a bit tricky: alphabetic characters are those in the Unicode general
    categories Lm, Lo, Lu, Ll, Lt, Mn, Me, and Mc (the L categories are
    letters; the M categories are combining marks, such as accents).
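    The category rule above can be expressed as a predicate on the two-letter
    general-category code. This is a hypothetical helper for illustration, not
    code from the patch (`isAlphaOrMarkCategory` is an invented name). The
    eight listed categories are exactly every L and every M category, so the
    rule amounts to "any letter or any mark".

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns true if a Unicode general category code
// is one of the eight the post treats as alphabetic.
bool isAlphaOrMarkCategory(const std::string& gc) {
    assert(gc.size() == 2 && "expected a two-letter general category code");
    return gc == "Lu" ||  // uppercase letter, e.g. U+0041 'A'
           gc == "Ll" ||  // lowercase letter, e.g. U+0061 'a'
           gc == "Lt" ||  // titlecase letter, e.g. U+01C5
           gc == "Lm" ||  // modifier letter, e.g. U+02B0
           gc == "Lo" ||  // other letter, e.g. U+05D0 (Hebrew alef)
           gc == "Mn" ||  // nonspacing mark, e.g. U+0301 combining acute
           gc == "Mc" ||  // spacing combining mark, e.g. U+0903
           gc == "Me";    // enclosing mark, e.g. U+20DD
}
```

    Digits (Nd), punctuation (P*), and symbols (S*) all fall outside the set,
    so they are the characters that get trimmed.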

    The patch introduces two new routines in ut_string:
    UT_UCS2_isalphaormark() and UT_UCS4_isalphaormark(), which are
    implemented as a binary search (included) through a table I built with
    the uniset program. Then, in fl_BlockLayout.cpp, I changed two "if"s to
    "while"s so that leading and trailing characters are removed repeatedly
    rather than just once, and changed the test that decides what to remove
    so that it calls my isalphaormark() routines.
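    A minimal sketch of the approach described above, assuming code points
    arrive as a std::u32string. It is not the patch itself: the names
    `isAlphaOrMark`, `stripNonAlpha`, and `kAlphaOrMark` are hypothetical,
    and the table is a short hand-written excerpt (ASCII and Latin-1 letters,
    Latin Extended-A/B, IPA, some modifier letters, and the basic combining
    diacriticals), not the full table generated with uniset. The two `while`
    loops play the role of the former `if`s: each keeps trimming until it
    reaches a letter or mark.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// One inclusive range of code points that are letters or marks.
struct CharRange {
    std::uint32_t lo;
    std::uint32_t hi;
};

// Hypothetical EXCERPT of a sorted, non-overlapping range table; a real
// table covering all of Unicode would be generated by a tool.
static const CharRange kAlphaOrMark[] = {
    {0x0041, 0x005A},  // A-Z (Lu)
    {0x0061, 0x007A},  // a-z (Ll)
    {0x00AA, 0x00AA},  // feminine ordinal indicator (Lo)
    {0x00B5, 0x00B5},  // micro sign (Ll)
    {0x00BA, 0x00BA},  // masculine ordinal indicator (Lo)
    {0x00C0, 0x00D6},  // Latin-1 letters before U+00D7 multiplication sign
    {0x00D8, 0x00F6},  // Latin-1 letters before U+00F7 division sign
    {0x00F8, 0x02C1},  // Latin-1 through IPA and early modifier letters
    {0x0300, 0x036F},  // combining diacritical marks (Mn)
};

// Binary search over the range table.
bool isAlphaOrMark(char32_t c) {
    std::size_t lo = 0;
    std::size_t hi = sizeof(kAlphaOrMark) / sizeof(kAlphaOrMark[0]);
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        if (c < kAlphaOrMark[mid].lo) {
            hi = mid;       // search the lower half
        } else if (c > kAlphaOrMark[mid].hi) {
            lo = mid + 1;   // search the upper half
        } else {
            return true;    // c lies inside this range
        }
    }
    return false;
}

// Repeatedly drop non-letter, non-mark code points from both ends.
std::u32string stripNonAlpha(const std::u32string& word) {
    std::size_t begin = 0;
    std::size_t end = word.size();
    while (begin < end && !isAlphaOrMark(word[begin])) ++begin;
    while (end > begin && !isAlphaOrMark(word[end - 1])) --end;
    assert(begin <= end);
    return word.substr(begin, end - begin);
}
```

    For example, `stripNonAlpha(U"e\u0301!")` keeps the combining acute
    accent and returns the two code points e + U+0301. This is why the M
    categories matter: without them, a trailing decomposed accent would be
    trimmed along with the punctuation.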

    There may be more elegant ways to do this.

    Raphael

    This archive was generated by hypermail 2.1.4 : Thu Jul 24 2003 - 05:32:12 EDT