OCaml Weekly News
Hello
Here is the latest OCaml Weekly News, for the week of August 08 to 15, 2023.
Table of Contents
- decimal 1.0.0
- rtree 0.1.0
- Shuttle 0.9.1 released
- dream-html 1.0.0
- tiny_httpd 0.14
- kcas and kcas_data 0.6.1: STM and compositional lock-dree data structures
- ppx_update 0.81
- Is a mutable project structure inherently slower?
- OCaml.org Newsletter: July 2023
- Using
[@poll error]
attribute to implement systhread safe data structures - forester 2.3
- Old CWN
decimal 1.0.0
Yawar Amin announced
Small update that we have released a bugfix 1.0.1: https://ocaml.org/p/decimal/latest
This fixes the parsing of floating-point numbers into decimal. It was off by an order of magnitude. Yikes!
rtree 0.1.0
Patrick Ferris announced
Hello :wave:
I’m happy to announce on behalf of the Geocaml organisation, a pure OCaml R-tree library. This library is an updated version of https://github.com/mariusae/ocaml-rtree and you can read a little about it in this short blog post. It supports OMT-style bulk-loading for producing efficient trees.
Happy hacking :deciduous_tree:
Shuttle 0.9.1 released
Anurag Soni announced
Some new updates since the last release:
shuttle_http
now supports HTTP/1.1 servers that useasync_ssl
based encrypted connections.- A server context is forwarded to http services. This context can be used to lookup peer client’s address, check if the http service is running on a SSL encrypted server, and if SSL is used lookup peer client’s SSL certificates.
-
HTTP servers support web socket upgrades now. The implementation is based on Janestreet’s web socket library (https://github.com/janestreet/async_websocket), and shuttle has a companion library
shuttle_websocket
that provides the web socket upgrade negotiation for servers, and allows converting a web socket handler to arequest -> response promise
based http service that can be used byshuttle_http
.Example for a http service that serves web socket traffic on some requests:
open! Core open! Async open Shuttle_http let websocket_handler = Shuttle_websocket.create (fun ws -> let rd, wr = Websocket.pipes ws in Pipe.transfer_id rd wr) ;; let service context request = Log.Global.info "Peer address: %s" (Socket.Address.to_string (Server.peer_addr context)); match Request.path request with | "/websocket"-> websocket_handler request | "/" -> return (Response.create ~body:(Body.string "Hello World") `Ok) | _ -> return (Response.create `Not_found) ;;
The latest release has been published to the opam package repository, and can be installed via opam:
opam install shuttle_http opam install shuttle_websocket
- Git repository: https://git.sr.ht/~soni/shuttle_http
dream-html 1.0.0
Yawar Amin announced
Hi, dream-html 1.0.0 has been released to opam: https://ocaml.org/p/dream-html
- Repo: https://github.com/yawaramin/dream-html
- API docs: https://yawaramin.github.io/dream-html/dream-html/Dream_html/index.html
Dream-html is a library for generating HTML, closely integrated with Dream. It can be used as an alternative to Dream’s built-in Embedded ML templating language. Here’s the Dream home page example using dream-html:
let hello who = let open Dream_html in let open HTML in html [] [ body [] [ h1 [] [txt "Hello, %s!" who]]] let () = Dream.run @@ Dream.logger @@ Dream.router [Dream.get "/" (fun _ -> Dream_html.respond (hello "world"))]
In this release, I made a breaking change (hence major version bump) to group all HTML tags and attributes under the same
HTML
module, so only two ~open~s are needed to access all HTML functionality directly.
Another smaller improvement is more granular escaping of HTML text nodes and attribute values, following browser rules more closely. E.g. I’m no longer escaping
'
and "
in text nodes, and not escaping &
,
<
, >
in attribute values.
More details in the repo readme and documentation. Enjoy!
tiny_httpd 0.14
Simon Cruanes announced
Bonjour bonjour,
It’s with delectation that I announce the release of tiny_httpd 0.14. Tiny_httpd is a web server library that relies on threads[^1] to handle client connections. Overall Tiny_httpd aims at being self contained, reasonably simple, and performant enough for non-google scale.
This release brings a significant amount of improvements:
- a
Tiny_httpd_io
module provides extensible input and output channels (as records of functions) -
In terms of flexibility, a request handler can now choose to obtain an output channel (from the
Tiny_httpd_io
module above) into which to write the response’s body; as opposed to only being able to returning a string or a stream (as powerful but the inversion of control isn’t as easy to produce). This means that reading a file to return it can look like this:Tiny_httpd_server.(add_route_handler server Route.(exact "passwd" @/ return)) @@ fun _req -> (* stream the content of the file *) let write oc = let buf = Bytes.create 4096 in let ic = open_in "/etc/passwd" in Fun.protect ~finally:(fun () -> close_in_noerr ic) @@ fun () -> while let n = input ic buf 0 (Bytes.length buf) in if n > 0 then IO.Output.output oc buf 0 n; n > 0 do () done in let writer = IO.Writer.make ~write () in Response.make_writer @@ Ok writer)
- client address is passed to the handler throught the
Request.t
object; - performance was improved by setting
TCP_NODELAY
on the sockets and by adding a buffer pool to reduce memory churn; - some improvements to termination behavior were implemented by @VPhantom, so that the main thread waits for all connections to terminate before returning, and also handling signals better.
Full changelog here.
Cheerio!
kcas and kcas_data 0.6.1: STM and compositional lock-dree data structures
Vesa Karvonen announced
And the second part of the blog post Kcas: Building a lock-free STM for OCaml is now online as well.
If you have any feedback or questions on Kcas, I’m happy to discuss.
Feel free to ask here or on OCaml Discord, for example.
ppx_update 0.81
Yotam Barnoy announced
Hi guys. I wanted to let you know about ppx_update
. This is a
small utility ppx that rewrites some record update expressions to make them more efficient.
When updating the contents of a record functionally, one might want to avoid reallocating the record if the contents haven’t changed. Similarly, when updating the contents of a mutable field of a record, one might want to avoid the cost of the write barrier
in case the field content hasn’t changed. Instead of having to write error-prone code checking each field with physical equality,
ppx_update
does the added physical comparisons for you behind the scenes.
You can install ppx_update
via opam
and easily apply it to your code with
dune
.
Is a mutable project structure inherently slower?
Deep in this thread, Daniel Bünzli said
Btw. there are a few gameboy emulators written in OCaml you may want to check out what they do:
OCaml.org Newsletter: July 2023
Thibaut Mattio announced
Welcome to the July 2023 edition of the OCaml.org newsletter! As with the previous issues, this update has been compiled by @sabine and @tmattio.
Our goal is to make OCaml.org the best resource for anyone who wants to get started and be productive in OCaml. The OCaml.org newsletter provides an update of our progress towards that goal and an overview of changes we are working on.
We couldn’t do it without all the amazing OCaml community members who help us review, revise, and create better OCaml documentation. Your feedback enables us to better prioritise our work and make progress towards our goal. Thank you!
This month, our priorities were:
- Learn Area: We’re working towards making OCaml.org a great resource to learn OCaml and discover its ecosystem. This month, we continued writing the new documentation content and iterating on community feedback. We also finalised the Figma light desktop designs and started implementing the UI.
- JavaScript Toplevels: We started exploring how to generate JavaScript toplevels for OCaml packages, with the goal of allowing users to load packages into the OCaml Playground, and adding a new toplevel feature to the OCaml Packages area. Ultimately, we aim to make every code block on OCaml.org interactive!
- General Improvements: As usual, we also worked on general maintenance and improvements based on user feedback, and we’re highlighting some of our work.
In addition to our work on the site, we introduced new ways for the team to interact with the community. We’ve created an #ocaml.org Discord channel, and we started holding public OCaml.org dev meetings. Don’t hesitate to reach out to us on Discord and join the dev meetings. We’re always looking for new insights on things to improve!
Learn Area
- 1. Redesign of the Learn Area
As the designs for the new Learn area are nearing completion, we started implementing the UI. If you have visited the documentation in the past few weeks, you’ve probably noticed a few changes. The most prominent one being the new tabs to navigate the different parts of the documentation.
On the design front, our focus will now be directed to the mobile views and dark mode.
Relevant PRs and Activities:
- Continued work on
Figma UX/UI designs for the new Learn area:
- Finalised the light theme designs
- Created color variants and a color palette in Figma, aiming for consistency with Figma to the Tailwind configuration, and established naming conventions for light and dark mode colors.
- Designed various button variants on Figma, including Extra large, Large, Small, Large Ghost, Ghost, and Level tag styles.
- Started implementing new components for the Learn Area:
- Tab – ocaml/ocaml.org#1389
- Tutorial block – ocaml/ocaml.org#1387
- Language Manual banner – ocaml/ocaml.org#1406
- Skill level tag – ocaml/ocaml.org#1427
- Introduced new tabs to navigate the OCaml documentation by section – ocaml/ocaml.org#1429
- Continued work on
Figma UX/UI designs for the new Learn area:
- 2. OCaml Documentation
We also continued the work on the new documentation content. As we’ve been through the lifecycle of new pages a couple times, we’re getting more structured. Each new page goes through the following steps: Outline Approval, Drafting, Internal Review and, finally, Community Review. We have two new pages that are in the final stage (community review), namely the File Manipulation tutorial and Arrays guide. They should be ready to merge in the coming weeks. We also have a completely new Getting Started tutorial that aims to replace the existing “Your First Day with OCaml.” It’s currently in the internal review stage and should be shared on Discuss for community review soon.
Plus, we’ve got a lot more content in the drafting stage.
Stay tuned, as we’ll be sharing more and more new documentation pages for community review!
Relevant PRs and Activities:
- Created a tentative high-level outline and [meta-issue]((https://github.com/ocaml/ocaml.org/issues/1415)) to track our progress.
- Worked on the new documentation content
- File Manipulation (status: community review)
- New Arrays tutorial (status: community review)
- Tour of OCaml (status: internal review)
- S-Expressions tutorial (internal review)
- Maps and Sets guides (status: drafting)
- Basic Datatypes guide (status: drafting)
- Watched TheVimeagen “Learning OCaml Part 1” and “Learn OCaml Part 2”. Subsequently, made it clearer how to activate the opam switch on the install page – ocaml/ocaml.org#1390
- Incorporating feedback from reviews:
- Include @gmevel proof-reading of Seq tutorial ocaml/ocaml.org#1376
- Other documentation improvements
- Line edits on existing Labels tutorial ocaml.org#1040
- Moved the Error Handling guide from Language to the Guides section – ocaml.org#1383
- Converted example from LaTeX to markdown in the If Statements, Loops, and Recursion tutorial – ocaml.org#1439
- Replaced
dune build @runtest
bydune runtest
in the Running Executables and Tests with Dune tutorial – ocaml.org#1430
- 3. Preparing the Move of the opam Documentation to OCaml.org
The next step for the centralised package documentation is to serve the documentation of critical OCaml packages, including the OCaml manual and the Platform tools documentation. This requires a lot of work on
odoc
to remove the blockers that prevents project from moving from their current documentation generator toodoc
. As an intermediate step, we’ll be moving the opam documentation to OCaml.org’s Learn area, so we can retire the frontend of opam.ocaml.org and redirect all the trafic to ocaml.org.We’ve been working towards these goals this month. You can follow our progress on this PR.
Relevant PRs and Activities:
- ocaml/opam:
- Move opam documentation from opam.ocaml.org to ocaml.org – ocaml/ocaml.org#1367
- Convert man pages to Markdown with YAML header – ocaml/opam#5594
- Changing the Markdown files in
doc/pages
to be amenable for use on OCaml.org – ocaml/opam#5593
- ocaml-opam/opam2web:
- Rearrange
opam2web
to remove all package info, build only opam archive, keep public key, and create redirections from opam.ocamlorg to ocaml.org in a Caddyfile. Current WIP branch at https://github.com/sabine/opam2web/tree/strip_to_bare_minimum
- Rearrange
- ocaml/ocaml.org:
- Give Local Blogs a Page and RSS Feeds. This introduces the concept of a “blog hosted on OCaml.org.” This way, we can host the non-changelog posts of the opam blog in such a way that we can redirect
opam.ocaml.org/blog/feed.xml
toocaml.org/blog/opam/feed.xml
- Give Local Blogs a Page and RSS Feeds. This introduces the concept of a “blog hosted on OCaml.org.” This way, we can host the non-changelog posts of the opam blog in such a way that we can redirect
- ocaml/opam:
JavaScript Toplevels
Always with the aim to improve the learning experience, we’re exploring how to generate JavaScript toplevels for all the OCaml packages (the ones that are JavaScript-compatible, that is).
This would enable a few very neat new features:
- Loading OCaml packages from the OCaml Playground: to enable the use of any JavaScript-compatible package. This is very handy to share code snippets to beginners, which is currently limited to using the standard library.
- Toplevels for OCaml packages on the centralised documentation: to spawn a toplevel while navigating the documentation.
- Interactive toplevels for every code block: This includes the OCaml packages that contain code examples, but also every code block and exercices on the Learn area. You’d be able to run the code, edit it, run it again and inspect the result directly from the browser. Every documentation page becomes a Jupyter notebook!
We’re very excited at the possibilities this brings to improving the learning experience. Let us know what you think, and stay tuned for updates on our explorations!
Relevant PRs and Activities:
- Process
.cma
’s,.cmi
’s and toplevel.js
files – ocaml-doc/voodoo#114
General Improvements
This month, we’re welcoming no less than 4 new contributors:
- @contificate improved the OCaml Playground layout with @StonedHesus doing a review
- @just-max fixed an issue with code sharing on the OCaml Playground
- @AshineFoster updated the dev setup to be able to run the site without an internet connection.
- @theteachr contributed a typo fix to the homepage.
- @brandoncc contributed a typo fix to the First Day with OCaml tutorial
Thanks a lot to all the contributors this month! It’s lovely to see more and more people making contributions to the site!
Relevant PRs and Activities:
- OCaml Playground:
- @contificate resolved the layout problem of the playground’s bottom bar and thoroughly tested it in different browsers with a review from @StonedHesus – ocaml.org#1384
- Building the playground was challenging due to a script incompatibility with POSIX – ocaml.org#1456
- @just-max discovered and resolved an issue with Base64-encoded URLs generated by the Playground share button, ensuring backward compatibility – ocaml.org#1434
- OCaml.org package documentation:
- Voodoo output format was updated to list README/LICENSE/CHANGELOG as part of
status.json
– voodoo#68, ocaml.org#1435 - Voodoo now includes a
Voodoo_serialize
module for data serialisation and deserialisation – voodoo#103, ocaml.org#1442 - Compile step issues with documentation pipeline generation tool addressed – voodoo#115
- In case of missing documentation, users are now redirected to the last documented version – ocaml.org#1438
- Voodoo output format was updated to list README/LICENSE/CHANGELOG as part of
- Bug fixes and miscellaneous improvements:
- @AshineFoster made ocaml.org run offline during development – ocaml.org#1366
- OCaml Changelog is no longer experimental – ocaml.org#1369
- Resolved OCaml Changelog tags’ overflow issue – ocaml.org#1358
- Fixed unreadable components due to tailwind configuration changes – ocaml.org#1375, ocaml.org#1377, ocaml.org#1428
- Dark mode navigation’s logo color was corrected for mobile view – ocaml.org#1385
- Applied
odoc
’s styles to package documentation pages – ocaml.org#1378 - Improved CONTRIBUTING.md instructions – ocaml.org#1365
- Added a Be Sport social network success story – ocaml.org#1362
- Published “Invitation to Contribute to OCaml.org” news entry – ocaml.org#1363
- URLs in the
data/
folder are now routinely checked bytarides/olinkcheck
.
Using [@poll error]
attribute to implement systhread safe data structures
Vesa Karvonen announced
For a long time OCaml has supported lightweight threads exposed via the Thread module. These threads are often called “systhreads”, but I will simply call them “threads” in this post.
The OCaml Stdlib also provides many mutable data structures such as Hashtbl, Queue, and Stack. As the documentation alerts, none of these are thread-safe.
In this post I will very briefly describe an approach to implementing lock-free thread-safe data structures.
In OCaml 4 and in OCaml 5, within a single Domain, only a single thread may run at a time. In other words, threads do not run in parallel except when they run in different domains in OCaml 5. The OCaml runtime schedules threads and semi pre-emptively switches (within a domain) between threads (created within the domain) during “safe points”. In other words, thread switches cannot happen at arbitrary points — they may only happen at safe points. Memory allocations are safe points. Additional safe points (where no actual memory allocation happens) are inserted into various constructs such as loops.
This means that within a block of code where there are no safe points it is possible to make multiple read and write accesses atomically with respect to threads (within the domain).
How does one ensure that a block of code does not include a safe point?
The OCaml compiler provides an annotation
[@poll error]
that one can use on a function to ensure that the function does not include a safe point.
IOW, using [@poll error]
one can essentially create functions that are executed atomically with respect to threads (within a domain).
With a particular application in mind, I have created a lock-free thread-safe (integer keyed) hash table, thread-table.
As mentioned in the README, the implementation has “zero synchronization overhead on lookups”. Indeed, if you look at the
find
operation implementation
let find t k' = let h = Mix.int k' in let buckets = t.buckets in let n = Array.length buckets in let i = h land (n - 1) in find k' (Array.unsafe_get buckets i)
it includes no synchronization. In this case not even a [@poll error]
attribute is needed.
For other operations the thread-table implementation uses functions annotated with the
[@poll error]
attribute (to make atomic updates) and familiar lock-free programming patterns such as retry loops and cooperation to avoid starvation. As an example, see the
add
operation implementation:
let[@poll error] add_atomically t buckets n i before after = t.rehash = 0 && buckets == t.buckets && before == Array.unsafe_get buckets i && begin Array.unsafe_set buckets i after; let length = t.length + 1 in t.length <- length; if n < length && n < max_buckets_div_2 then t.rehash <- n * 2; true end let rec add t k' v' = let h = Mix.int k' in maybe_rehash t; let buckets = t.buckets in let n = Array.length buckets in let i = h land (n - 1) in let before = Array.unsafe_get buckets i in let after = Cons (k', v', before) in if not (add_atomically t buckets n i before after) then add t k' v'
Compared to e.g. using a Stdlib
Mutex
to protect a data structure against concurrent accesses by threads, this sort of lock-free implementation can give better performance (especially for read-only operations) and also allows use of the operations in contexts, such as signal
handlers, where locks are not appropriate.
Note that this technique is not sufficient for parallelism-safe implementation of data structures.
Guillaume Munch-Maccagnoni said
Thanks for the write-up! I do not remember someone writing about this before.
This trick is used in JaneStreet’s
Nano_mutex and
Thread_safe_queue. [@poll error]
was in fact motivated by these use-cases (and I am surprised not to see them used in the latest version of JaneStreet’s libraries).
As you note, with multicore OCaml, these data structures should never be shared between different domains, but the technique remains valid and useful for data structures designed to stay on a single domain.
Be careful that [@poll error]
is a recent addition (OCaml 4.14). Earlier version of OCaml require attributes to
disable inlining (among other things), to avoid that polling points could be added during compilation via code transformations.
[@poll error]
has the correct semantics in this regard in OCaml 4.14, but earlier OCaml versions will disregard the attribute and potentially produce incorrect code in the absence of additional attributes. Also, I would not recommend trying to
do without the [@poll error]
attribute, because this is error-prone and requires knowledge of the compiler.
[@poll error]
is also inoperant in bytecode (which is trickier because it has more polling locations).
Lastly, it should be noted that [@poll error]
is very inexpressive in the kind of code that it accepts. The reasoning-about-polling-locations trick is also used in parts of the stdlib, for which
[@poll error]
is not expressive enough. I proposed a more expressive attribute to handle those cases, but it was not accepted. There is also a proposal to delay the polling with “masking” during critical sections, at runtime (hence even more expressive).
Calascibetta Romain asked and Vesa Karvonen replied
As far as I understand, the usage of
[poll error]
starts to be interesting when we start to use a mix ofThread
andDomain
?
Yes, and also when using only Thread
s (and no Domain
s).
One might ask why one would use Thread
s when we have Domain
s and effects?
My comment here hopefully provides some ideas where threads could still be very useful. The tl;dr is that threads could be used to allow effects based schedulers to effectively share domains and threads could also be used, in part, to e.g. implement IO in such a way that it becomes scheduler independent. If we do use threads, then it will likely be very useful to be able to implement communication between threads within a domain with as little synchronization as possible.
For instance, if we allocate only
Domain
s, the usage of anHashtbl
into one (and uniquely one, theHashtbl
is not shared betweenDomain
s) is “safe”? Moreover,Mutex
still is the best practice (regardlessDomain
orThread
) to protect anHashtbl
against data-race conditon?
If you mean the Stdlib Hashtbl
, then, yes, it is neither thread-safe nor parallelism-safe and one will need to e.g. use a
Mutex
to protect it when it might be accessed from multiple threads concurrently or from multiple domains in parallel.
As another currently available alternative, the
Kcas library comes with a companion package of lock-free and parallelism-safe (and also thread-safe) data structures including a
Hashtbl
implementation that is designed to be an almost drop-in replacement for the Stdlib
Hashtbl
. When used in parallel from multiple domains it should provide better performance than Stdlib
Hashtbl
protected by a Mutex
. It is also
composable (read from “But why should you care about composability?”), which can make the implementation of more interesting use cases a breeze compared to the use of non-composable concurrent programming techniques.
Calascibetta Romain asked and Vesa Karvonen replied
Did you consider the idea to integrate that directly into the Stdlib’s
Hashtbl
?
Yes, and not really.
The API of Stdlib Hashtbl
is not designed for concurrent programming. In sequential use cases it will be faster than any parallelism safe implementation that supports the full API.
The reason for mimicking the Stdlib Hashtbl
API in Kcas is to allow for easier learning curve and to potentially make it easier to port an existing application to parallelism-safe OCaml 5.
In the future I expect there will be data structures that are designed from the start for concurrent programming and will e.g. avoid or de-emphasize features with inherent sequential bottlenecks (such as maintenance of exact length) and operations with inherent risk of starvation (such as being able to (atomically) insert an arbitrary number of elements to a data structure) and provide fused operations that support common use cases (such as get-or-add).
forester 2.3
Jon Sterling announced
I would like to announce the release on opam of forester 2.3. is an OCaml utility to develop “Forests”, which are densely interlinked scientific websites / Zettelkästen similar to the Stacks Project or Kerodon. An example of a “Forest” is my own website.
This is a major release involving changes to the command line interface, among other things. Please see the full changelog for a detailed description of the changes. Below I give a brief summary:
- The existing behavior of the
forester
command is now located underforester build
. - A new
forester new
command to create the “next” tree under the base-36 tree addressing scheme. - A new
forester complete
command for completing tree titles, to facilitate tool support. - Rudimentary support for emitting XML attributes.
- Subdirectories of input directories will now be traversed automatically; note that the tree address model remains flat, and subdirectories are present only for convenience.
- Added a nicer command line interface with
--help
documentation. - I have migrated much of the system code to use the experimental Eio library for improved portability.
- The example forest has been removed from the main repository, and moved into a separate template repository.
My thanks to Armaël Guéneau, Riley Shahar, and Masanori Ogino for their contributions of code and ideas that made it into this release.
Old CWN
If you happen to miss a CWN, you can send me a message and I’ll mail it to you, or go take a look at the archive or the RSS feed of the archives.
If you also wish to receive it every week by mail, you may subscribe online.