From: Kenneth Adam Miller
To: caml users
Date: Sat, 7 Mar 2015 01:21:25 -0500
Subject: Re: [Caml-list] Error with and Proper Library Usage
In-Reply-To: <9C7D03E3-FC7C-4C09-92FA-232731E53263@ieee.org>

Epic! Library author!

Were you influenced by recent research from UTD concerning high-accuracy
disassembly using machine learning techniques for distinguishing data &
code?

On Sat, Mar 7, 2015 at 12:56 AM, Ivan Gotovchits <ivg@ieee.org> wrote:

> On Mar 7, 2015, at 12:28 AM, Kenneth Adam Miller
> <kennethadammiller@gmail.com> wrote:
>
> > So, I want to use CMU's BAP to do some internal processing for a task
> > that I have been assigned. One of the pertinent parts is transforming
> > assembler representations of CPU instructions into the BAP
> > Intermediate Language, or BIL.
>
> BAP is more about disassembling. So it can easily lift a binary string
> into BIL. But if you have assembly, you need to compile it into machine
> code first. So, you need to find an assembler. (For example, you can
> use `llvm-mc` from the LLVM toolkit.)

Right, but shouldn't "\x90" or "\xc3" be interpreted by the toplevel as
hex so that it is also disassembled properly? Our current use case is
primitive, but it involves passing each instruction individually on the
command line to a binary to be transformed to BIL.

> > It's kind of difficult, because there's only so much documentation
> > that is really anything more than just the MLI interface and the
> > ocamldoc-generated stuff. I have a lot of questions about how to
> > proceed, but before I begin laying out the problem and all, let me
> > explain how I got where I am.
>
> Yes, unfortunately that's true. First of all, BAP is currently under
> active development. Next, there is no properly working doc generator
> for OCaml right now that will handle a complex project containing many
> modules and sublibraries. I'm looking at odoc with great hope.

No loss, I'm happy to dive in with BAP any way it goes; documentation
just makes things easier. OK, possibly I could do some write-ups about
what I learn and pass them back or something to help?

> > You can install BAP through opam, but I don't think you get the
> > documentation. So,
> >
> >     git clone github.com/BinaryAnalysisPlatform/bap/
> >
> > and then just follow the instructions on how to build it; it's not
> > hard at all. I got it going on Ubuntu 14.04. The only thing I ran
> > into was an error on an LLVM dependency, which required that I edit
> > the opam file so that "--with-llvm-version=3.4" is passed as an
> > option on the configure command line. After that everything ran
> > smoothly.
>
> Actually, when you install BAP with opam you will have the
> documentation installed also. It is automatically installed at
> `~/.opam/???/doc/bap`. You can query the path to the documentation
> with the following command:
>
>     opam config var bap:doc

Yeah, I didn't exactly research the directions, because writing up the
2-day log was a pell-mell effort.

> Moreover, we provide compiled API documentation on GitHub Pages. I
> will update the main site with the link. Also, you may find this
> page [1] interesting.
>
> [1]: https://github.com/BinaryAnalysisPlatform/bap/wiki/Build-tips-and-tricks

Cool :) I'm just happy I got it built on my machine. BAP is looking
really decent; several of the things I had sitting on a private machine
and had thought about giving back I see already done very well: BAP as
a service (ZMQ too!), dependency segregation, baptop, OCamlMakefile
elimination (I had only even wished for this one).

> > Once you run bapbuild and make and all that…
>
> You don't need to run bapbuild at all; it is not a tool to build BAP,
> it is a tool to build applications and plugins that use BAP.
>
> > Opening up the index file at _build/bap.docdir/index.html, you can
> > see that the documentation starts off with a note about using
> > Bap.Std, as everything else is interface files. What confused me is
> > the seeming repetition in the generated documentation. It seems that
> > some of the documentation on the very same pages is duplicated for
> > certain sections. Why does it do so much duplication?
>
> That's how ocamldoc works. Actually, the auto-generated documentation
> is of very low quality. I personally suggest you set up your Emacs
> environment with merlin and everything else. Then you can navigate
> through the project using `C-c C-l` (jump to definition). Look
> here [2] for instructions on how to configure Emacs.
>
> [2]: https://github.com/BinaryAnalysisPlatform/bap/wiki/Emacs
>
> > First, using the toplevel I tried to construct a BIL set of
> > statements. But the way the code works, you actually have to compose
> > a disassembler that is specific to your architecture (x32/64 and ARM
> > vs Intel or whatever). You then have to construct memory, and from
> > that memory construct an Insn type, which is meant to be the
> > canonical, cross-disassembler representation of an instruction. I
> > can see how module use makes for great reusability of code. Problem
> > is, the type definitions that the toplevel (baptop) reports and
> > those reported in the documentation often seem to differ. TL;DR: I
> > tried to get as close as I could to the front-page description of
> > how to use module Disasm, which meant the Disasm.insn_at_mem
> > function, but I had a hard time navigating the modules to create
> > what I wanted. It seems like each thing depends on some other
> > portion of the library, and at one point I hit a dead end. The
> > documentation mentions the same functions being exposed copiously,
> > but that's when the type definitions wouldn't match up or something.
>
> I'm not sure that I understand you correctly. If you have just bytes,
> then use the function `disassemble`, which accepts memory and arch.
> You can use `Memory.create` to make memory, and `Bigstring.of_string`
> to create a bigstring from a string.

I re-read the documentation on the front page and went back to that, as
per my most recent email. :)

> > Lastly, and ultimately even more confusing, is bap_mc.ml, which I
> > saw as my second-easiest avenue for using the BAP library. I saw
> > bap_mc.ml line 55 as my chance:
> >
> > https://github.com/BinaryAnalysisPlatform/bap/blob/master/src/bap_mc/bap_mc.ml#L55
> >
> > If I were just to modify it so that, instead of watering down the
> > string constructed, it piped the insn object to a BIL constructor
> > and then used the sexp_of_bil transformer, then I could just drop it
> > from there to be printed, or converted to a string and then printed.
> >
> > Naturally, I tried several different modules' bil constructors. But
> > most notably I think that the Std bil constructor blew up, so here's
> > what I replaced that line with:
>
> Oh, please, don't use bap_mc as an example, as it is very low level.
> It is intended for debugging the underlying disassembly and uses a
> very low-level interface, with lots of hard-to-understand phantom
> types. Please try to stay with the convenient Disasm module.

Perhaps there are subtleties to the module language that I'm missing,
and my lack of understanding of how the modules fit together, combined
with an overlapping module-language vernacular, would be why it's more
difficult. OK, sure, after interacting more with the intended
interface, I can see a bit better how BAP is meant to be used.
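
On the question above about "\x90" and "\xc3": a minimal sketch, using
only the OCaml standard library and no BAP calls, of how such bytes
behave. It assumes OCaml 4.07 or later, where Bigarray is part of the
standard library. In a string literal each "\xHH" escape is a single
byte, but a shell hands the backslash sequence to the program as literal
characters, so a command-line argument has to be unescaped first. The
names `hex_of_bytes`, `bytes_of_arg`, and `bigstring_of_string` are
hypothetical helpers; the last one only illustrates the kind of buffer a
bigstring is and is not BAP's or Core's implementation of
`Bigstring.of_string`.

```ocaml
(* Each "\xHH" escape in an OCaml string literal denotes exactly one byte. *)
let nop = "\x90" (* x86 nop *)
let ret = "\xc3" (* x86 ret *)

(* Render a byte string as space-separated hex, e.g. for logging. *)
let hex_of_bytes (s : string) : string =
  String.concat " "
    (List.init (String.length s)
       (fun i -> Printf.sprintf "%02x" (Char.code s.[i])))

(* A shell passes the four characters \ x 9 0 through unchanged, so an
   argument such as '\x90' must be unescaped by the program itself.
   Scanf.unescaped applies OCaml's own escape conventions. *)
let bytes_of_arg (arg : string) : string = Scanf.unescaped arg

(* A bigstring is a Bigarray of chars in C layout (assumed alias). *)
type bigstring =
  (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

(* Hypothetical stand-in for Bigstring.of_string: copy byte by byte. *)
let bigstring_of_string (s : string) : bigstring =
  let n = String.length s in
  let b = Bigarray.Array1.create Bigarray.char Bigarray.c_layout n in
  String.iteri (fun i c -> Bigarray.Array1.set b i c) s;
  b

let () =
  (* What a program would receive in Sys.argv for the argument '\x90\xc3'. *)
  let arg = "\\x90\\xc3" in
  let bytes = bytes_of_arg arg in
  let buf = bigstring_of_string bytes in
  Printf.printf "literal: %d byte(s): %s\n"
    (String.length nop) (hex_of_bytes nop);
  Printf.printf "argument: %d chars, decoded: %s\n"
    (String.length arg) (hex_of_bytes bytes);
  Printf.printf "buffer: %d bytes\n" (Bigarray.Array1.dim buf)
```

The decoded buffer is the sort of value that would then be handed on to
the memory-construction step described in the quoted reply; that step
is left out here because its exact signature is not given in the
thread.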