From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail4-relais-sop.national.inria.fr (mail4-relais-sop.national.inria.fr [192.134.164.105]) by walapai.inria.fr (8.13.6/8.13.6) with ESMTP id q1G9TS4N020717 for ; Thu, 16 Feb 2012 10:29:32 +0100 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AkAFAPnLPE8+3JIEgWdsb2JhbABEgw6tWCIBARYmJ4IgiiWZPp93jB4LBAUFAQIBCAQEEgYDAoQ5Dg2COmMEmxONDA X-IronPort-AV: E=Sophos;i="4.73,428,1325458800"; d="scan'208";a="131602687" Received: from a.mx.strauss-acoustics.ch ([62.220.146.4]) by mail4-smtp-sop.national.inria.fr with ESMTP; 16 Feb 2012 10:29:32 +0100 Received: from [192.168.1.102] (80-218-101-216.dclient.hispeed.ch [80.218.101.216]) by a.mx.strauss-acoustics.ch (Postfix) with ESMTPSA id 8328D69C5E9 for ; Thu, 16 Feb 2012 10:29:31 +0100 (CET) From: Philippe Strauss Content-Type: text/plain; charset=iso-8859-1 Date: Thu, 16 Feb 2012 10:29:30 +0100 Message-Id: <32FE3555-556A-43AC-8B1B-9A4AF08DBA02@strauss-acoustics.ch> To: caml-list@inria.fr Mime-Version: 1.0 (Apple Message framework v1084) X-Mailer: Apple Mail (2.1084) Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by walapai.inria.fr id q1G9TS4N020717 Subject: [Caml-list] ocaml-pcre and UTF-8 Hello caml'ers, How do I convince PCRE to be UTF-8 friendly? example: -- open Pcre external show : 'a -> string = "%show" let recomp = regexp ~flags:[`UTF8; `CASELESS] let res_w = "(*UTF8)^(\w+)$" let rec_w = recomp res_w let accents = ["blurb"; "toxicité"; "velléités"; "à"; "où"; "über"; "marie-jeanne"] let () = Printf.printf "config_utf8=%b\n" Pcre.config_utf8 ; List.iter (fun word -> try let sub = Pcre.extract_opt ~full_match:false ~rex:rec_w word in print_endline (show sub) with Not_found -> Printf.eprintf "Not_found was raised on \"%s\" :-(\n%!" word ) accents -- at least on my setup it gives: -- philou@air:~/mysrc/web/myco$ ./pcre_utf config_utf8=true [|Some ("blurb")|] Not_found was raised on "toxicité" :-( Not_found was raised on "velléités" :-( Not_found was raised on "à" :-( Not_found was raised on "où" :-( Not_found was raised on "über" :-( Not_found was raised on "marie-jeanne" :-( -- A php user which happen to be a nice guy got it working on PHP :-( how lame. -- Philippe Strauss http://www.strauss-acoustics.ch/