Strange behaviour of string_of_float

Mailing list for all users of the OCaml language and system.

* Strange behaviour of string_of_float
@ 2008-06-22 16:56 Paolo Donadeo
  2008-06-22 19:58 ` [Caml-list] " Richard Jones
  2008-06-22 20:32 ` Daniel Bünzli
  0 siblings, 2 replies; 15+ messages in thread
From: Paolo Donadeo @ 2008-06-22 16:56 UTC
  To: caml-list caml-list

Today I noticed this strange behaviour of string_of_float:

Let's start with:

# let pi = 4.0 *. atan 1.0;;
val pi : float = 3.14159265358979312
# let (|>) x f = f x;;
val ( |> ) : 'a -> ('a -> 'b) -> 'b = <fun>

Ok, I want to serialize pi:

# (pi |> string_of_float |> float_of_string) -. pi;;
- : float = 2.06945571790129179e-13

string_of_float is not the inverse of float_of_string, at least in this example.

Is this correct? It's not a problem at all; I used this workaround:

# let my_string_of_float = Printf.sprintf "%.1000g";;
val my_string_of_float : float -> string = <fun>

# (pi |> my_string_of_float |> float_of_string) -. pi;;
- : float = 0.


-- 
Paolo
~
~
:wq
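Paolo's check subtracts the two values, which cannot tell -0.0 from 0.0 (their difference still compares equal to 0.) and gives nan whenever a nan is involved. A stricter round-trip test compares the IEEE 754 bit patterns returned by Int64.bits_of_float. The helper name roundtrips is hypothetical; string_of_float and Printf.sprintf "%.17g" are simply the two conversions being compared.

```ocaml
(* Bit-exact round-trip test: compare the bit patterns instead of
   subtracting, because subtraction cannot distinguish -0.0 from 0.0 and
   yields nan whenever a nan is involved. *)
let roundtrips (to_string : float -> string) (x : float) : bool =
  Int64.bits_of_float (float_of_string (to_string x)) = Int64.bits_of_float x

let pi = 4.0 *. atan 1.0

let () =
  Printf.printf "string_of_float round-trips: %b\n" (roundtrips string_of_float pi);
  Printf.printf "%%.17g round-trips:          %b\n"
    (roundtrips (Printf.sprintf "%.17g") pi)
```

The first line prints false, since string_of_float pi is "3.14159265359"; the second prints true.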



* Re: [Caml-list] Strange behaviour of string_of_float
  2008-06-22 16:56 Strange behaviour of string_of_float Paolo Donadeo
@ 2008-06-22 19:58 ` Richard Jones
  2008-06-22 20:45   ` Paolo Donadeo
  2008-06-23  8:35   ` Jon Harrop
  2008-06-22 20:32 ` Daniel Bünzli
  1 sibling, 2 replies; 15+ messages in thread
From: Richard Jones @ 2008-06-22 19:58 UTC
  To: Paolo Donadeo; +Cc: caml-list caml-list

On Sun, Jun 22, 2008 at 06:56:22PM +0200, Paolo Donadeo wrote:
> string_of_float is not the inverse of float_of_string, at least in
> this example.

Yes, you wouldn't expect it to be, because the string is an
approximate base 10 representation of the float (which is itself only
an approximate base 2 representation of the transcendental number pi).
You might want to read a presentation called "What every computer
programmer should know about floating point arithmetic".  There's a
PDF version here:

http://blogs.sun.com/darcy/resource/Wecpskafpa-ACCU.pdf

Rich.

-- 
Richard Jones
Red Hat


* Re: [Caml-list] Strange behaviour of string_of_float
  2008-06-22 19:58 ` [Caml-list] " Richard Jones
@ 2008-06-22 20:45   ` Paolo Donadeo
  2008-06-23  1:25     ` Brian Hurt
  2008-06-23  8:32     ` Mattias Engdegård
  2008-06-23  8:35   ` Jon Harrop
  1 sibling, 2 replies; 15+ messages in thread
From: Paolo Donadeo @ 2008-06-22 20:45 UTC
  To: caml-list caml-list

I know what a float number is from my numerical analysis course :-).

In any case, what is the suggested way to serialize/deserialize a
float number in OCaml? Sexplib, for example, suffers from the same
problem as the string_of_float function.

My intent is to extract an ASCII representation of an OCaml float
value so that it can be used to recreate *exactly* the same value, at
least on the same architecture.

-- 
Paolo
~
~
:wq


* Re: [Caml-list] Strange behaviour of string_of_float
  2008-06-22 20:45   ` Paolo Donadeo
@ 2008-06-23  1:25     ` Brian Hurt
  2008-06-23  7:50       ` Paolo Donadeo
  2008-06-23  8:32     ` Mattias Engdegård
  1 sibling, 1 reply; 15+ messages in thread
From: Brian Hurt @ 2008-06-23  1:25 UTC
  To: Paolo Donadeo; +Cc: caml-list caml-list



On Sun, 22 Jun 2008, Paolo Donadeo wrote:

> I know what a float number is from my numerical analysis course :-).
>
> In any case, what is the suggested way to serialize/deserialize a
> float number in OCaml? Sexplib, for example, suffers from the same
> problem as the string_of_float function.
>
> My intent is to extract an ASCII representation of an OCaml float
> value so that it can be used to recreate *exactly* the same value, at
> least on the same architecture.
>

Code something like this should work:

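let encode_float x =
     match (classify_float x) with
     (* x = -0.0 also holds for +0.0, so read the sign of zero from 1/x *)
     | FP_zero -> if 1.0 /. x < 0.0 then "-0.0" else "0.0"
     | FP_infinite -> if (x = neg_infinity) then "-INF" else "INF"
     | FP_nan -> "NaN"
     | _ ->
         let s = x < 0.0 in
         let x = abs_float x in
         (* x = frac * 2^exp, with frac in [0.5, 1) *)
         let frac, exp = frexp x in
         let frac = frac *. 268435456.0 in (* 2^28 *)
         (* i1: top 28 mantissa bits; i2: the next 28 bits (53 bits fit in 56) *)
         let i1 = int_of_float frac in
         let i2 = int_of_float ((frac -. (floor frac)) *. 268435456.0) in
         (* |x| = (i1 * 2^28 + i2) * 2^(exp - 56) *)
         let exp = exp - 56 in
         let s2 = exp < 0 in
         let exp = if exp < 0 then -exp else exp in
         (* sign, 7 hex digits, 7 hex digits, 'X', exponent sign, hex exponent *)
         Printf.sprintf "%c%07X%07XX%c%X" (if s then '-' else '+') i1 i2
             (if s2 then '-' else '+') exp
;;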


I'll leave the decoder to you; it should be obvious once you discover the
ldexp function.

Brian
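Brian leaves the decoder as an exercise. Below is one possible decode_float for the format encode_float produces: a sign, two 7-digit hex halves of a 56-bit integer mantissa, a literal X, then a signed hex exponent. The function name and the failwith error handling are assumptions of this sketch, not something from the thread; it relies only on int_of_string accepting a 0x prefix and on ldexp.

```ocaml
(* Decoder for strings of the form
   <sign><7 hex digits><7 hex digits>X<exponent sign><hex exponent>,
   where |x| = (i1 * 2^28 + i2) * 2^exp. *)
let decode_float (s : string) : float =
  match s with
  | "0.0" -> 0.0
  | "-0.0" -> -0.0
  | "INF" -> infinity
  | "-INF" -> neg_infinity
  | "NaN" -> nan
  | _ ->
      let len = String.length s in
      if len < 18 || s.[15] <> 'X' then failwith "decode_float";
      (* parse n hex digits starting at position pos *)
      let hex pos n = int_of_string ("0x" ^ String.sub s pos n) in
      let i1 = hex 1 7 and i2 = hex 8 7 in
      let exp = hex 17 (len - 17) in
      let exp = if s.[16] = '-' then - exp else exp in
      (* i1 * 2^28 + i2 has at most 53 significant bits, so the sum is exact *)
      let m = ldexp (float_of_int i1) 28 +. float_of_int i2 in
      let x = ldexp m exp in
      if s.[0] = '-' then -. x else x
```

For instance, 1.0 is encoded as "+80000000000000X-37" and pi as "+C90FDAA22168C0X-36", and both decode back to the same doubles.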



* Re: [Caml-list] Strange behaviour of string_of_float
  2008-06-23  1:25     ` Brian Hurt
@ 2008-06-23  7:50       ` Paolo Donadeo
  0 siblings, 0 replies; 15+ messages in thread
From: Paolo Donadeo @ 2008-06-23  7:50 UTC
  To: caml-list caml-list

> Code something like this should work:

Thanks, this is even better. I should pay more attention to the
Pervasives API: I never noticed frexp and ldexp.


-- 
Paolo
~
~
:wq



* Re: [Caml-list] Strange behaviour of string_of_float
  2008-06-22 20:45   ` Paolo Donadeo
  2008-06-23  1:25     ` Brian Hurt
@ 2008-06-23  8:32     ` Mattias Engdegård
  2008-06-23  8:50       ` Olivier Andrieu
  1 sibling, 1 reply; 15+ messages in thread
From: Mattias Engdegård @ 2008-06-23  8:32 UTC
  To: p.donadeo; +Cc: caml-list

>My intent is to extract an ASCII representation of an OCaml float
>value so that it can be used to recreate *exactly* the same value, at
>least on the same architecture.

A somewhat more portable (and readable, maybe) representation of
floating-point numbers is in hex (a la C99). It is independent of the
precision and binary format used. Unfortunately, OCaml's Printf has
already appropriated %a for a different purpose, but it remains a good
option for those willing to do some manual work.

I have used it in the past to good effect in text-based interchange
formats between applications written in C.

Of course the decimal notation can unambiguously represent any
(binary) floating-point number, so that representation is fine if you
have confidence in the output and reading routines. See, for instance,
William Clinger's _How to Read Floating Point Numbers Accurately_
(http://ftp.ccs.neu.edu/pub/people/will/retrospective.pdf).
But decimal handling will always be a little slower.


* Re: [Caml-list] Strange behaviour of string_of_float
  2008-06-23  8:32     ` Mattias Engdegård
@ 2008-06-23  8:50       ` Olivier Andrieu
  0 siblings, 0 replies; 15+ messages in thread
From: Olivier Andrieu @ 2008-06-23  8:50 UTC
  To: p.donadeo, Mattias Engdegård, caml-list

On Mon, Jun 23, 2008 at 10:32, Mattias Engdegård <mattias@virtutech.se> wrote:
>>My intent is to extract an ASCII representation of an OCaml float
>>value so that it can be used to recreate *exactly* the same value, at
>>least on the same architecture.
>
> A somewhat more portable (and readable, maybe) representation of
> floating-point numbers is in hex (a la C99). It is independent of the
> precision and binary format used. Unfortunately, OCaml's Printf has
> already appropriated %a for a different purpose, but it remains a good
> option for those willing to do some manual work.
>
> I have used it in the past to good effect in text-based interchange
> formats between applications written in C.

Indeed, that's a good solution. It's possible to use this %a
conversion directly, without writing external C code:

  (* this external is in pervasives.ml *)
  external format_float : string -> float -> string = "caml_format_float"
  let hex_string_of_float f =
    format_float "%a" f

# hex_string_of_float pi ;;
- : string = "0x1.921fb54442d18p+1"


Mind that this only works if the underlying C library knows how to
handle this C99 conversion specifier (MSVC6 doesn't for instance).

-- 
  Olivier
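In later OCaml releases (4.03 and newer, as far as I know) the external declaration should no longer be necessary: Printf's %h conversion prints a float in hexadecimal notation and float_of_string accepts that notation back. A minimal sketch, with hypothetical wrapper names, assuming such a release:

```ocaml
(* Assumes OCaml 4.03 or later: "%h" prints a float in hexadecimal
   notation and float_of_string parses hexadecimal floats. *)
let hex_string_of_float (x : float) : string = Printf.sprintf "%h" x

let float_of_hex_string (s : string) : float = float_of_string s

let () =
  let pi = 4.0 *. atan 1.0 in
  let s = hex_string_of_float pi in
  Printf.printf "%s round-trips: %b\n" s (float_of_hex_string s = pi)
```

Because the conversion is exact in both directions, no digit-count tuning is needed, unlike the decimal formats discussed elsewhere in the thread.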



* Re: [Caml-list] Strange behaviour of string_of_float
  2008-06-22 19:58 ` [Caml-list] " Richard Jones
  2008-06-22 20:45   ` Paolo Donadeo
@ 2008-06-23  8:35   ` Jon Harrop
  1 sibling, 0 replies; 15+ messages in thread
From: Jon Harrop @ 2008-06-23  8:35 UTC
  To: caml-list

On Sunday 22 June 2008 20:58:31 Richard Jones wrote:
> On Sun, Jun 22, 2008 at 06:56:22PM +0200, Paolo Donadeo wrote:
> > string_of_float is not the inverse of float_of_string, at least in
> > this example.
>
> Yes, you wouldn't expect it to be, because the string is an
> approximate base 10 representation of the float...

That is not true. All finite floats have exact finite decimal representations. 
So it is perfectly reasonable to expect the conversions to recover the 
original number exactly.

As Paolo has shown, OCaml's current string_of_float function is approximate. 
The accuracy of this routine is unspecified but a quick test indicates that 
it is simply printing too few digits to be exact:

# string_of_float pi;;
- : string = "3.14159265359"

Fortunately, you can ask sprintf to generate a sufficiently accurate result:

# open Printf;;
# sprintf "%0.17g" pi;;
- : string = "3.1415926535897931"

The float_of_string function does then recover the number exactly in this 
case:

# float_of_string "3.1415926535897931" -. pi;;
- : float = 0.

Also, you should keep in mind in this context that calculations may be done 
with 80-bit float arithmetic in registers or truncated to 64-bits when stored 
to memory. Moreover, OCaml's bytecode and native code targets can behave 
differently in this context. I do not believe that is a problem with Paolo's 
code here though.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e
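Jon's "%0.17g" always suffices for a finite double but sometimes prints more digits than needed. One simple approach, sketched here rather than a tuned algorithm, is to try increasing precisions and keep the first string that float_of_string maps back to the same bits. The name round_trip_repr is hypothetical, and this search is not guaranteed to find the absolute shortest string in every corner case (dedicated shortest-representation algorithms exist for that).

```ocaml
(* Try 1, 2, ... significant digits and keep the first string that reads
   back to the same bits. 17 digits always suffice for a finite double, so
   the search stops there (a nan, whose payload may not survive, ends there). *)
let round_trip_repr (x : float) : string =
  let bits = Int64.bits_of_float x in
  let same s = Int64.bits_of_float (float_of_string s) = bits in
  let rec go p =
    (* "%.*g" takes the precision p as an extra int argument *)
    let s = Printf.sprintf "%.*g" p x in
    if p >= 17 || same s then s else go (p + 1)
  in
  go 1

let () =
  List.iter (fun x -> print_endline (round_trip_repr x))
    [0.1; 0.1 +. 0.2; 4.0 *. atan 1.0]
```

On an IEEE 754 platform this should print 0.1, 0.30000000000000004 and 3.141592653589793.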


* Re: [Caml-list] Strange behaviour of string_of_float
  2008-06-22 16:56 Strange behaviour of string_of_float Paolo Donadeo
  2008-06-22 19:58 ` [Caml-list] " Richard Jones
@ 2008-06-22 20:32 ` Daniel Bünzli
  2008-06-22 20:50   ` Paolo Donadeo
  2008-06-23  1:06   ` Brian Hurt
  1 sibling, 2 replies; 15+ messages in thread
From: Daniel Bünzli @ 2008-06-22 20:32 UTC
  To: caml-list caml-list

Richard gave you the reason.

If you can serialize to binary, you can achieve what you want by
serializing the 64-bit integers you get with Int64.bits_of_float and
applying Int64.float_of_bits to the integers you deserialize.

Best,

Daniel
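Daniel's suggestion serializes to binary, but the same 64-bit integer can also be written as text. A sketch, assuming a fixed 16-digit lowercase hex encoding (the function names are hypothetical): Printf's %Lx prints the int64 as unsigned hex, and Int64.of_string reads a 0x-prefixed string back, accepting values above Int64.max_int, which negative floats need because their sign bit is set.

```ocaml
(* Text form of the int64 from Int64.bits_of_float: exactly 16 lowercase
   hex digits (an assumed format, not a standard one). *)
let float_to_bits_string (x : float) : string =
  (* %Lx prints an int64 as unsigned hex, so negative floats are fine *)
  Printf.sprintf "%016Lx" (Int64.bits_of_float x)

let float_of_bits_string (s : string) : float =
  (* hex input to Int64.of_string may exceed Int64.max_int (sign bit set) *)
  Int64.float_of_bits (Int64.of_string ("0x" ^ s))

let () =
  let s = float_to_bits_string (4.0 *. atan 1.0) in
  Printf.printf "%s -> %.17g\n" s (float_of_bits_string s)
```

This keeps the exactness of the bit-level approach while staying ASCII, which was Paolo's original requirement.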


* Re: [Caml-list] Strange behaviour of string_of_float
  2008-06-22 20:32 ` Daniel Bünzli
@ 2008-06-22 20:50   ` Paolo Donadeo
  2008-06-23  8:45     ` David Allsopp
  2008-06-23  1:06   ` Brian Hurt
  1 sibling, 1 reply; 15+ messages in thread
From: Paolo Donadeo @ 2008-06-22 20:50 UTC
  To: caml-list caml-list

> If you can serialize to binary, you can achieve what you want by serializing
> the 64-bit integers you get with Int64.bits_of_float and applying
> Int64.float_of_bits to the integers you deserialize.

Just posted a useless message :-)

This is *exactly* what I was searching for, thanks Daniel.


-- 
Paolo
~
~
:wq



* RE: [Caml-list] Strange behaviour of string_of_float
  2008-06-22 20:50   ` Paolo Donadeo
@ 2008-06-23  8:45     ` David Allsopp
  2008-06-23  8:55       ` Olivier Andrieu
  0 siblings, 1 reply; 15+ messages in thread
From: David Allsopp @ 2008-06-23  8:45 UTC
  To: 'caml-list caml-list'

> > Richard gave you the reason.

Erm, please correct me if I'm wrong, but every single possible floating-point
value (on the same architecture) has a string representation that will be
reparsed to the same floating-point value (on the same architecture). It's
the reverse that isn't true, because floating-point numbers are only an
approximation.

> > If you can serialize to binary, you can achieve what you want by
> > serializing the 64-bit integers you get with Int64.bits_of_float and
> > applying Int64.float_of_bits to the integers you deserialize.
>
> Just posted a useless message :-)
>
> This is *exactly* what I was searching for, thanks Daniel.

This is of course a better, more reliable and faster way of serialising, but
the real cause for your original spurious result is down to how
string_of_float is defined in pervasives.ml:

# pi;;
- : float = 3.1415926535897931

# string_of_float pi;;
- : string = "3.14159265359"

In other words, (pi |> string_of_float |> float_of_string) is never going to
be equal to your original pi. For some reason, string_of_float is defined
as:

let string_of_float f = valid_float_lexem (format_float "%.12g" f);;

Perhaps Xavier can say why it's only "%.12g" in the format (I imagine
there's a historical reason), but if you increase it to 16 then you'll get
the answer you expected (0.) for pi; 17 significant digits are needed to
round-trip every double. All that said, the values given by
string_of_float cannot always be fed back to float_of_string anyway (e.g.
float_of_string (string_of_float nan)).

David
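Sixteen significant digits happen to be enough for pi, but not for every double. A small counterexample, assuming IEEE 754 doubles:

```ocaml
(* 0.1 +. 0.2 evaluates to 0.3000000000000000444..., not to the double
   nearest 0.3. With 16 significant digits %g prints "0.3", which reads
   back as a different double; 17 digits preserve the value. *)
let x = 0.1 +. 0.2
let s16 = Printf.sprintf "%.16g" x
let s17 = Printf.sprintf "%.17g" x

let () =
  Printf.printf "%s -> %b\n" s16 (float_of_string s16 = x);
  Printf.printf "%s -> %b\n" s17 (float_of_string s17 = x)
```

This prints "0.3 -> false" followed by "0.30000000000000004 -> true".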


* Re: [Caml-list] Strange behaviour of string_of_float
  2008-06-23  8:45     ` David Allsopp
@ 2008-06-23  8:55       ` Olivier Andrieu
  2008-06-23 12:06         ` David Allsopp
  0 siblings, 1 reply; 15+ messages in thread
From: Olivier Andrieu @ 2008-06-23  8:55 UTC
  To: David Allsopp; +Cc: caml-list caml-list

On Mon, Jun 23, 2008 at 10:45, David Allsopp <dra-news@metastack.com> wrote:
> All that said, the values given by
> string_of_float cannot always be fed back to float_of_string anyway (e.g.
> float_of_string (string_of_float nan))

Er, why do you say that? It does:

  # float_of_string (string_of_float nan) ;;
  - : float = nan

float_of_string is basically strtod, which should correctly handle nan and inf.

-- 
  Olivier



* RE: [Caml-list] Strange behaviour of string_of_float
  2008-06-23  8:55       ` Olivier Andrieu
@ 2008-06-23 12:06         ` David Allsopp
  0 siblings, 0 replies; 15+ messages in thread
From: David Allsopp @ 2008-06-23 12:06 UTC
  To: 'caml-list caml-list'

> > All that said, the values given by
> > string_of_float cannot always be fed back to float_of_string anyway
> > (e.g. float_of_string (string_of_float nan))
>
> Er, why do you say that? It does:
>
>   # float_of_string (string_of_float nan) ;;
>   - : float = nan

Because:


        Objective Caml version 3.09.3

# float_of_string (string_of_float nan);;
Exception: Failure "float_of_string".


but this is clearly fixed in 3.10!


David



* Re: [Caml-list] Strange behaviour of string_of_float
  2008-06-22 20:32 ` Daniel Bünzli
  2008-06-22 20:50   ` Paolo Donadeo
@ 2008-06-23  1:06   ` Brian Hurt
  2008-06-23  7:58     ` Xavier Leroy
  1 sibling, 1 reply; 15+ messages in thread
From: Brian Hurt @ 2008-06-23  1:06 UTC
  To: Daniel Bünzli; +Cc: caml-list caml-list


On Sun, 22 Jun 2008, Daniel Bünzli wrote:

> Richard gave you the reason.
>
> If you can serialize to binary, you can achieve what you want by serializing
> the 64-bit integers you get with Int64.bits_of_float and applying
> Int64.float_of_bits to the integers you deserialize.

Actually, for serialization, I strongly recommend first using
classify_float to split off and handle NaNs, infinities, etc., then using
frexp to split the float into a fraction and exponent.  The exponent is
just an int, and the fractional part can be multiplied by, say, 2^56 and
then converted into an integer.

The advantage of doing things this way, despite it being slightly more
complicated, is twofold: one, it guarantees you the ability to recover
the exact binary value of the float, and two, it sidesteps a huge number
of compatibility issues between architectures: IIRC, IEEE 754 specifies
how many bits have to be used to represent each part of the float, but not
where they have to be in the word.  Also, if you use hexadecimal for
saving the integers, this can actually be faster than converting to
base-10, as conversion to base-10 isn't cheap.  It's a couple more
branches, but a lot of divs and mods get turned into shifts and ands.

Brian


* Re: [Caml-list] Strange behaviour of string_of_float
  2008-06-23  1:06   ` Brian Hurt
@ 2008-06-23  7:58     ` Xavier Leroy
  0 siblings, 0 replies; 15+ messages in thread
From: Xavier Leroy @ 2008-06-23  7:58 UTC
  To: Brian Hurt; +Cc: Daniel Bünzli, caml-list caml-list

>> If you can serialize to binary, you can achieve what you want by
>> serializing the 64-bit integers you get with Int64.bits_of_float and
>> applying Int64.float_of_bits to the integers you deserialize.
>
> Actually, for serialization, I strongly recommend first using
> classify_float to split off and handle NaNs, infinities, etc., then
> using frexp to split the float into a fraction and exponent.  The
> exponent is just an int, and the fractional part can be multiplied by,
> say, 2^56 and then converted into an integer.
>
> The advantage of doing things this way, despite it being slightly more
> complicated, is twofold: one, it guarantees you the ability to recover
> the exact binary value of the float, and two, it sidesteps a huge number
> of compatibility issues between architectures: IIRC, IEEE 754 specifies
> how many bits have to be used to represent each part of the float, but
> not where they have to be in the word.

The only architecture I know where this problem could occur is the old
(pre-EABI) ABI for ARM, which has "mixed-endian" floats.  But the
implementation of Int64.{bits_of_float,float_of_bits} goes to some
length to rearrange bits as expected, i.e. with the sign bit in the
most significant bit of the int64, followed by the exponent bits,
followed by the mantissa bits in the least significant bits of the
int64.

So, the case analysis on the float that Brian suggests is a bit of
overkill, and I strongly suggest using the result of
Int64.bits_of_float as the exact, serializable representation of a
Caml float.

- Xavier Leroy
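Xavier's description of the int64 layout (sign bit most significant, then the exponent bits, then the mantissa bits) can be checked by splitting the bits into the three IEEE 754 double fields: 1 sign bit, 11 exponent bits biased by 1023, and 52 mantissa bits. The helper name fields is hypothetical.

```ocaml
(* Split the int64 from Int64.bits_of_float into the IEEE 754 double fields:
   sign (bit 63), biased exponent (bits 52-62), mantissa (bits 0-51). *)
let fields (x : float) : int * int * int64 =
  let bits = Int64.bits_of_float x in
  let sign = Int64.to_int (Int64.shift_right_logical bits 63) in
  let exponent =
    Int64.to_int (Int64.logand (Int64.shift_right_logical bits 52) 0x7FFL) in
  let mantissa = Int64.logand bits 0xF_FFFF_FFFF_FFFFL in
  (sign, exponent, mantissa)

let () =
  let s, e, m = fields (4.0 *. atan 1.0) in
  (* for the double nearest pi: sign 0, exponent 1024 (1 + bias 1023) *)
  Printf.printf "sign=%d exponent=%d mantissa=0x%Lx\n" s e m
```

Because the runtime normalizes this layout, the int64 itself is a portable exact representation, which is the point of Xavier's recommendation.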


end of thread, newest message: 2008-06-23 12:06 UTC

Thread overview: 15+ messages
2008-06-22 16:56 Strange behaviour of string_of_float Paolo Donadeo
2008-06-22 19:58 ` [Caml-list] " Richard Jones
2008-06-22 20:45   ` Paolo Donadeo
2008-06-23  1:25     ` Brian Hurt
2008-06-23  7:50       ` Paolo Donadeo
2008-06-23  8:32     ` Mattias Engdegård
2008-06-23  8:50       ` Olivier Andrieu
2008-06-23  8:35   ` Jon Harrop
2008-06-22 20:32 ` Daniel Bünzli
2008-06-22 20:50   ` Paolo Donadeo
2008-06-23  8:45     ` David Allsopp
2008-06-23  8:55       ` Olivier Andrieu
2008-06-23 12:06         ` David Allsopp
2008-06-23  1:06   ` Brian Hurt
2008-06-23  7:58     ` Xavier Leroy
