From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <465C7954.1060208@gmail.com>
Date: Tue, 29 May 2007 14:04:52 -0500
From: Edgar Friendly
User-Agent: Thunderbird 2.0.0.0 (X11/20070326)
MIME-Version: 1.0
To: caml-list
Subject: GtkSourceView2.0 syntax highlighting
Content-Type: multipart/mixed; boundary="------------010502060705000103000207"

This is a multi-part message in MIME format.
--------------010502060705000103000207
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit

The upcoming version of GtkSourceView, the library used by many GNOME
text editors for syntax highlighting, supports new and more powerful
language parsing.  I volunteer to update the language definition for
OCaml, and would like some feedback from the community regarding useful
things to highlight.

The following are currently matched:
====================================
* Comments (* *), and within them, email addresses, net addresses and
  TODO/FIXME/XXX
* Decimal, Octal, Hex, Binary and Floating point literals
* Labeled and Optional function arguments
* Polymorphic variants and normal Variant constructors
* Module paths (as a prefix to anything)
* Strings and Character Literals, with all escape codes allowed within
  them sub-matched

The reasonably large list of keywords has been broken into four
sections.  (I encourage comments on this division.)

1) booleans
true false

2) flow control & common keywords
and assert begin do done downto else end for fun function if in let
match rec then to try val when while with

3) types, objects & modules
as class constraint exception external functor include inherit
initializer method module mutable new object of open private struct sig
type virtual

4) function-like keywords
asr land lazy lor lsl lsr lxor mod or

Things not matched currently
============================
* Line number directives (probably never seen in actual code)
* Record constructors - { record with label = value; label = value }
* Object duplication - {< var = value; var = value >}
* List literals - [ elem1; elem2; elem3 ]
* Array literals - [| elem1; elem2; elem3 |]
* Tuples - elem1, elem2, elem3 (hard to parse - no parentheses needed,
  only commas)
* Array access and modification - arr.(i), arr.(i) <- 5
* String access and modification - str.[i], str.[i] <- 'w'
* Coercion - ( expr :> type ), (expr : type1 :> type2)
* Method calls - obj#method args
* There are a ton of character-sequence keywords.  Should any of them be
  handled as keywords in their own right, or should they be handled only
  in the above cases where they're used?
  != # & && ' ( ) * + , - -. -> . .. : :: := :> ; ;; < <- = > >] >} ?
  ?? [ [< [> [| ] _ ` { {< | |] } ~
* What about camlp4 keywords?  New keywords in the new camlp4?
  parser << <: >> $ $$ $:

Thanks for your comments.  Attached is a compressed version of my
current language definition in the XML format required by gtksourceview.

E.

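
For reference, the five numeric literal forms listed under "currently matched" can be told apart with patterns along the following lines. This is only a rough Python sketch based on OCaml's lexical conventions, not the patterns used in the attached ocaml2.lang. The names LITERALS, TOKEN and classify are made up for illustration, and integer suffixes (l, L, n) are left out.

```python
import re

# Illustrative patterns only (an assumption, not copied from ocaml2.lang).
# Underscores are allowed after the first digit of each literal.
# The float pattern requires a dot or an exponent so that plain
# integers fall through to the "decimal" alternative.
LITERALS = [
    ("float",
     r"[0-9][0-9_]*(?:\.[0-9_]*(?:[eE][+-]?[0-9][0-9_]*)?"
     r"|[eE][+-]?[0-9][0-9_]*)"),
    ("hex",     r"0[xX][0-9A-Fa-f][0-9A-Fa-f_]*"),
    ("octal",   r"0[oO][0-7][0-7_]*"),
    ("binary",  r"0[bB][01][01_]*"),
    ("decimal", r"[0-9][0-9_]*"),
]

# One alternation with a named group per literal kind.
TOKEN = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in LITERALS))


def classify(text):
    """Return the literal kind if text is exactly one numeric literal, else None."""
    m = TOKEN.fullmatch(text)
    return m.lastgroup if m else None


if __name__ == "__main__":
    for sample in ["42", "1_000", "0x1F", "0o17", "0b101", "3.14", "1e10", "0b102"]:
        print(sample, "->", classify(sample))
```

Using fullmatch means a malformed literal such as 0b102 is rejected outright instead of being split into a binary literal plus a stray digit, which is the behaviour one would probably want from a highlighter as well.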
--------------010502060705000103000207
Content-Type: application/x-gzip; name="ocaml2.lang.gz"
Content-Transfer-Encoding: base64
Content-Disposition: inline; filename="ocaml2.lang.gz"

[base64-encoded attachment ocaml2.lang.gz omitted]
--------------010502060705000103000207--