* type inference for python @ 2005-02-01 5:26 Philippe Fremy 2005-02-01 6:56 ` [Caml-list] " Richard Cole ` (2 more replies) 0 siblings, 3 replies; 7+ messages in thread From: Philippe Fremy @ 2005-02-01 5:26 UTC (permalink / raw) To: caml-list Hi, I would like to implement something similar to the type inference of ocaml for the python language. I have always found it very impressive (although I have only used caml light). I have no experience with the topic, it is just a project that seems cool to me :-) Do you have any hints or where I could build up my knowledge (code, books, article, ...) to do that kind of thing. I imagine that it works in a kind of three way pass: 1. analyse all the constraints of the code Ex: def f(a): a.append(1) def g(a): a=a+1; f(a) g('coucou') => a must support append => a must also be an int 2. cross-validate the constraints consistency => inconsistent assertions 3. validate the constraints against reality => g('coucou') will not work The 2. and 3. looks like the most difficult aspect of it. regards, Philippe -- Philippe Fremy InSeal Technical Director tel: +33 6 07 98 76 44 ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Caml-list] type inference for python 2005-02-01 5:26 type inference for python Philippe Fremy @ 2005-02-01 6:56 ` Richard Cole 2005-02-01 6:58 ` skaller 2005-02-01 7:25 ` Nicolas Cannasse 2 siblings, 0 replies; 7+ messages in thread From: Richard Cole @ 2005-02-01 6:56 UTC (permalink / raw) To: Philippe Fremy; +Cc: caml-list Philippe Fremy wrote: > > Hi, > > I would like to implement something similar to the type inference of > ocaml for the python language. I have always found it very impressive > (although I have only used caml light). > > I have no experience with the topic, it is just a project that seems > cool to me :-) Yeah me too :) Checkout "Demand-Driven Type Inference with Subgoal Pruning: Trading Precision for Scalability." by Alexander Spoon. Google can finds the paper if you type in the title. This paper gives a nice description of why type inference is hard thing to do in a dynamicly typed OO language like Python (aka smalltalk) and gives pointers to some work in the area including Cecile and Squeak. There was a later paper published by the same author in 04 but its not on the web yet :( I spent a little time considering adding interfaces to Ruby [http://kvo.itee.uq.edu.au/twiki/bin/view/Main/RjBlog39] which is pretty similar to Python, I guess. The main motivation for me, in adding interfaces to a language like Ruby, is to make the software structure explicit, and to give coders a place to document contracts that are shared by a number of classes. So I don't mind if the type inference isn't complete in the sense that every variable gets assigned some sort of minimal type. I'm sure one of the Ocaml guys can give you a list of references to the ML type system. Is there a canonical reference of the Ocaml type system? regards, Richard. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Caml-list] type inference for python 2005-02-01 5:26 type inference for python Philippe Fremy 2005-02-01 6:56 ` [Caml-list] " Richard Cole @ 2005-02-01 6:58 ` skaller 2005-02-01 10:54 ` Philippe Fremy 2005-02-01 7:25 ` Nicolas Cannasse 2 siblings, 1 reply; 7+ messages in thread From: skaller @ 2005-02-01 6:58 UTC (permalink / raw) To: Philippe Fremy; +Cc: caml-list On Tue, 2005-02-01 at 16:26, Philippe Fremy wrote: > Hi, > > I would like to implement something similar to the type inference of > ocaml for the python language. I have always found it very impressive > (although I have only used caml light). > > I have no experience with the topic, I do. I did considerable work on this. The project was called Vyper. The key idea was to build an interpreter which was used to initialise all the dictionaries, then analyse them for type information, and generate code that used that type information. I got as far as getting the interpreter to work very well, running most standard pure python plus some emulations of popular C extensions, and with reasonable performance. In addition, I made some extensions, many of which got into later versions of Python .. including proper lexical scoping, garbage collection (well, my version used Ocaml for that .. ) and a few other things. However, the type analysis was never done, and I lost interest in the project, primarily because it would never support continuation passing in the style of Stackless Python. (And the source is lost now). A few technical problems also got in the way: (1) I needed finalisers for __del__ methods. Ocaml has Ocaml finalises today as a consequence. (C finalisers existed, but I needed Ocaml ones). Unfortunately, Python uses ref counting and finalisation is therefore synchronous and well ordered for non-cyclic references, whereas my implementation just used GC. This meant certain programs didn't quite work as they did in CPython. (2) Ocaml native code compiler can't generate shared libraries, so compiling Python to Ocaml will never emulate dynamic loading. Perhaps you could do this with Ocaml bytecode, but the interpreter needed to be fast. This was a very serious obstacle, since every Python C extension I wanted to emulate had to be hardwired into the program (even though using it emulated the usual import semantics). (3) Python specs weren't written by someone knowing about languages and standardisation. By far the worst problem was exceptions. In many cases errors are specified to throw exceptions... which means the behaviour is well defined. This alone make static type analysis almost impossible. To do static typing you need a specification that says the result of a wrong typing is undefined. Otherwise the program can *rely* on catching an exception, and a type error isn't an error (since the behaviour is well defined). Trying to explain that to Pythonistas and van Rossum in particular proved difficult.. although they seemed to understand the problem which dynamic mutations to modules caused and considered 'freezing' modules after they'd been imported. This may have been done in later versions of Python: some changes to 'eval' and friends were made for his reason (you have to specify the dictionary to use for an eval now). > 1. analyse all the constraints of the code > Ex: > def f(a): a.append(1) > def g(a): a=a+1; f(a) > g('coucou') > > => a must support append No, you cannot assume that due to the 'exception' problem mentioned above. > => a must also be an int And you can't assume that either. A could be a class with an __add__ method. Alternatively, it could be anything, and the called of g() is again relying on a type error to throw an exception. > 2. cross-validate the constraints consistency > => inconsistent assertions Python variables are all polymorphic. It isn't inconsistent to have a is an int a is a string This is possible, at different times .. as an example: for i in [1,"string"]: .. i is first an int then a string. > The 2. and 3. looks like the most difficult aspect of it. Actually the difficulty is deciding how to modify the semantics of Python so analysis is possible. This will break some programs, so you need to choose a reasonable set of constraints. You would typically assume that modules were immutable after loading, whilst class objects remained dynamic. This means you can apply some static analysis by actually executing the program up to the main function call, then examining what got imported. At least that was my theory .. in particular, you can type module level variables, and you can actually lookup functions defined in modules and examine the code. By using global analysis, you can then optimise some functions. For example, you might determine that some function def f(a): return a+1 really is always called with an int argument, and thus optimise it. -- John Skaller, mailto:skaller@users.sf.net voice: 061-2-9660-0850, snail: PO BOX 401 Glebe NSW 2037 Australia Checkout the Felix programming language http://felix.sf.net ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Caml-list] type inference for python 2005-02-01 6:58 ` skaller @ 2005-02-01 10:54 ` Philippe Fremy 2005-02-01 12:37 ` skaller 0 siblings, 1 reply; 7+ messages in thread From: Philippe Fremy @ 2005-02-01 10:54 UTC (permalink / raw) To: caml-list Hi, > I do. I did considerable work on this. The project > was called Vyper. Woa, Vyper looks like a killer project. But unlike vyper, my goal is not to provide an alternative python implementation with type inference and other goodies. I aim at something like pychecker. That is, a tool that you can run to check the types consistency of a python. It is perfectly acceptable for it to be limited and to report false warning for twisted constructs. The goal is just to make python programming more friendly. There are many small mistakes that could be caught by such a tool and creep in python programs. I keep thinking about the interview of a mailman developer, who said that a big problem of python is that eventhough you have a program running for a long time, there might be some hidden place where you did type mistake that will break your whole program with an exception. Regarding this, python is no better than C program segfaulting. > (3) Python specs weren't written by someone knowing about > languages and standardisation. By far the worst problem > was exceptions. It is tricky for sure. >>1. analyse all the constraints of the code >>Ex: >>def f(a): a.append(1) >>def g(a): a=a+1; f(a) >>g('coucou') >> >>=> a must support append > > > No, you cannot assume that due to the 'exception' problem > mentioned above. Indeed, the append could be a problem if it is meant to be caught by an AttributeError exception. This is why I never liked exceptions. When deeply hidden, they break the execution flow too much and are just as dangerous as pointers. But since I am taking the "helper tool" approach, it is ok to make assumptions that will trigger real warning in 90% of the cases and false warning when you trick python too much. You can then disable the warning in your code, like pychecker does. I am ready to make a few assumptions as a starter: - operations on int/float will return an int or float - operation + involves two objects of the same type and return an object of the same type - AttributesError are not supposed to be raised and add whatever other simplification is necessary in order to get a limited but useful tool. > Alternatively, it could be anything, > and the called of g() is again relying on a type error > to throw an exception. Did you find a lot of code relying on this idiom ? That's clearly a stopper, but I am ok for ignoring it. It would be probably ok to detect this if the try/catch is in the same block. If it is not in the same scope, I consider this bad programming practice anyway so I do not mind throwing a warning. > >>2. cross-validate the constraints consistency >>=> inconsistent assertions > > > Python variables are all polymorphic. It isn't > inconsistent to have > > a is an int > a is a string > > This is possible, at different times .. as an example: > > for i in [1,"string"]: .. > > i is first an int then a string. Sure. I used to program a bit like that: def f(a): if not type(a) is type([]): a = [a] That means that the type of a variable must be tied to a portion of the code or would have to be one of a possible choice. Iterating over lists with variable types is sure tricky. But again, that's not a common programming pattern (or is it my imagination). You had the constraint of supporting every language feature. I can afford to support only 50% of them. That would already be a big win over the current situation. I understand that python typing is so loose that it creates many problems. I am puzzled because the industry uses poor languages to do important things. Our life depends so much on computers those days that it is a pity to uses such insecure languages. There have been a lot of good languages produced but few of them really caught on. Whay I like in python is that it has some characteristics that make it an easy to pick language, and it can replace C++, VB and Java in many areas. But this type fully dynamic types is one of its strength but also the number one error made by beginners. regards, Philippe -- Philippe Fremy InSeal Technical Director tel: +33 6 07 98 76 44 ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Caml-list] type inference for python 2005-02-01 10:54 ` Philippe Fremy @ 2005-02-01 12:37 ` skaller 2005-02-01 23:06 ` Philippe Fremy 0 siblings, 1 reply; 7+ messages in thread From: skaller @ 2005-02-01 12:37 UTC (permalink / raw) To: Philippe Fremy; +Cc: caml-list On Tue, 2005-02-01 at 21:54, Philippe Fremy wrote: > But unlike vyper, my goal is not to provide an alternative python > implementation with type inference and other goodies. That wasn't mine either. > I aim at something like pychecker. That is, a tool that you can run to > check the types consistency of a python. Perhaps you missed the point of my description. In order to check the types of something, where there are no type annotations, you need to use the known types as a starting point: namely the literals. Often in Python you will see: def f(a): return M.x + a where you don't know the type of a. But what about M.x? Perhaps you can look up x in M. But attributes are not declared in Python, let alone their types. What you need to do is actually find the value of M.x to see what type it is. IN order to do this, you observe there is a statement: import M in the program. There is a simple way to find the type of M.x, and that is to actually *execute* the import statement. In other words, you have no choice but to implement an interpreter -- or use the existing one. More precisely, you could look at the code in M, do flow analysis, and try to guess at the type of x from that. But that is much more complicated and difficult. Why bother to do that when you can just import M and then examine the module dictionary?? So you see my purpose was not to implement a python interpreter, I did that because it is *necessary*. There is no better way to find the types of attributes in modules. The reason this works is based on the fact that *usually* in Python, a program does three things: (a) import some modules (b) do some initialisations in the mainline (c) call the main function It is a bit tricky to distinguish (b) and (c). The point however, is that (a) is usually expected to just create a set of values and functions which the mainline then uses. So if you want to type check the mainline code, you need to establish the types of the values and functions in the imported modules as a starting point... and as mentioned you do that by actually importing them. This is basically the only possibility: it is reliable because you don't have to guess about how to do it, the semantics are defined in the language reference. In addition, for many modules the typing is known because it is documented in the Library Reference. After you have done this, then you must analyse the mainline. You CANNOT execute it, in the same fashion, to find the types, because you don't have the input data, it could be a server, etc etc. But most of the modules, you can expect them to execute and terminate quickly, since the execution serves mainly to just to store things in the module dictionary. The difficulty you will face with 'pycheck' or whatever is that you really do have to establish with some certainty what some of the types are, so that a violation can be noted and warned about. Otherwise, if whenever you're in doubt you just say 'polymorphic' then clearly every doubtful use will just lead to saying 'polymorphic' and you won't be able to warn about *any* misuses .. :) > It is perfectly acceptable for > it to be limited and to report false warning for twisted constructs. The question is whether you can provide a reasonable signal to noise ratio. To do that you really have to be right about the intended typing most of the type. > I keep thinking about the interview of a mailman > developer, who said that a big problem of python is that eventhough you > have a program running for a long time, there might be some hidden place > where you did type mistake that will break your whole program with an > exception. > > Regarding this, python is no better than C program segfaulting. C is probably more reliable because it is statically typed, however there is a loss of reliability in C because the type system isn't very expressive, and the workaround -- casting -- is used far more often than one would like. Python, on the other hand, is easier to test (no long compile times), and dynamic type checks catch errors earlier than in C. Either way, what you would like to do is a laudible goal, if you can get it to work .. but i suspect you will find it much harder than you might think -- I certainly did :) > But since I am taking the "helper tool" approach, it is ok to make > assumptions that will trigger real warning in 90% of the cases and false > warning when you trick python too much. Sure, but what do you mean 'trick Python'? Writing code based on the specified semantics -- including exception handling -- is not a trick. I myself write lots of Python even today (all my build scripts and utilities are Python). I do things often that you might think are tricks: try: x = f() except: x = default I do that one heaps, my code is littered with it, almost the entire Felix configuration script works that way. > I am ready to make a few assumptions as a starter: > - operations on int/float will return an int or float > - operation + involves two objects of the same type and return an object > of the same type > - AttributesError are not supposed to be raised My literate programming tool interscript relies on that heavily. It is in fact the central technique I use to build polymorphic weavers (a weaver is a driver for a typesetter). you are allowed to call the moral equivalent of: x.start_emphasise() x.output("jhgjhgjhg") x.end_emphasise() and an error finding the start/end emphasise method is just ignored. If they're found and fail that's an error. It isn't a bug to leave out the emphasise methods for the text weaver -- plain text doesn't support it. Latex and HTML do. This allows you to encode pretty features (like emphasis) whether or not the typesetter supports it -- that is, without knowing which typesetter the client will want to use for your document. You also might have some trouble finding this kind of thing out -- since the weavers are all plugins. They're loaded dynamically on demand. Is this trickery? Well, if it is, if this is bad programming, then why bother using Python? One uses it *because* it is a dynamic language. > Did you find a lot of code relying on this idiom ? No, most code doesn't do anything fancy. Most Python code is 'well typed' and reasonbly static. By well typed I mean a variable is either a single type, or clearly polymorphic. Unfortunately, 'most code' is easily type checked by a single test case. The real advantage of a static checker is weeding out corner cases -- programmers don't get the ordinary cases wrong that often, and when they do the first test run usually finds the bug. > Iterating over lists with variable types is sure tricky. But again, > that's not a common programming pattern (or is it my imagination). The common case is (im Ocaml notation): 'a option That is, some value of a fixed type, or None. That one you have to handle. Luckily, there's lots of theory for this. None is a bottom or ground or Error or whatever, so for each plain type int, string, ... you need to handle Some int | None, Some string | None which are called 'pointed types'. > You had the constraint of supporting every language feature. I can > afford to support only 50% of them. That would already be a big win over > the current situation. Not really. The problem is that doubt propagates exponentially. So you have to handle quite a bit :) You probably need to distinguish (a) monomorphic unpointed type (int, string etc) (b) monomorphic pointed (allows None too) (c) dynamic (any type is allowed) (d) don't know Don't confuse (c) and (d). Dynamic means you DO know that the type varies (no further deduction required). Don't know means you're open to further constraints as you see more code. > I am puzzled because the industry uses poor languages to do important > things. Our life depends so much on computers those days that it is a > pity to uses such insecure languages. There have been a lot of good > languages produced but few of them really caught on. You will get much agreement with that sentiment here :) > Whay I like in python is that it has some characteristics that make it > an easy to pick language, and it can replace C++, VB and Java in many > areas. But this type fully dynamic types is one of its strength but also > the number one error made by beginners. I suspect dynamic typing is mostly a workaround for lack of polymorphism and the lack of sum types (variants). It does have the advantages of avoiding the difficulties of find a sound, simple, and expressive type system, which continues to be a bleeding edge research topic :) If you're serious, and you think programmers will take your tool seriously, you might consider establishing a protocol for type annotations -- in comments for example, perhaps like: def f(a): #a:int x = y + z #z:int Using such annotations, your tool could be helped quite a bit. In particular, when you can't determine a type, the diagnostic is a hint to add an annotation. Software houses might then require enough annotations as part of their coding standards to shut your tool up :) -- John Skaller, mailto:skaller@users.sf.net voice: 061-2-9660-0850, snail: PO BOX 401 Glebe NSW 2037 Australia Checkout the Felix programming language http://felix.sf.net ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Caml-list] type inference for python 2005-02-01 12:37 ` skaller @ 2005-02-01 23:06 ` Philippe Fremy 0 siblings, 0 replies; 7+ messages in thread From: Philippe Fremy @ 2005-02-01 23:06 UTC (permalink / raw) Cc: caml-list Hi Skaller, Thank you for your very detailed explanation. You have obviously dedicated a lot of thought to the problem. I know understand that it is indeed necessary to almost rerun the interpreter to check the code. Actually, that's what pychecker does. But even running the interprter would probably not be enough. I knew when asking that the problem was probably beyound my reach. Now, I know why :-). > Sure, but what do you mean 'trick Python'? Writing code > based on the specified semantics -- including exception handling -- > is not a trick. Actually, I realise that there are many usage patterns for python. Some people like you use it for its highly dynamic nature. I use it mainly as an easy Java/C++ replacement and it would be ok for me to drop many of the dynamic aspect, in order to get more type and execution safety. > Unfortunately, 'most code' is easily type checked by a > single test case. Well, it is so easy to make a small mistake with big consequences. An intern just wrote some code for me where he was supposed to return a list. In one case, he would return a None instead of an empty list, and broke the program the whole concept (always review the code written by intenrs :-). There should be a way to avoid that kind of silly mistakes. > you might consider establishing a protocol for type annotations This is already being discussed in python-dev by more advanced programmer than I am. And raises a lot of controversy on how to do it and whether it will be useful. Anyway, discussing the topic gave me a higher view of the problem. Python is currently a single solution (CPython) that lends itself to many programming paradigm, but in an unsatisfactory manners for some of them. I would like to get rid of the dynamically nature of the language sometimes, which some people consider equivalent to killing it. That's how I would react if somebody suggested to "remove all the OO crap of python" and use it like a good pascal replacement. Still, it would be a valid use. The good news is that a solution is coming for this problem: pypy. This generated implementation of python should allows to build python interpreter with different feature set. They also have a nice program flow chart where they can infer types and optimise code to a wild level. So, I just have to find myself another fun project :-). Thank you for your answers. regards, Philippe -- Philippe Fremy InSeal Technical Director tel: +33 6 07 98 76 44 ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Caml-list] type inference for python 2005-02-01 5:26 type inference for python Philippe Fremy 2005-02-01 6:56 ` [Caml-list] " Richard Cole 2005-02-01 6:58 ` skaller @ 2005-02-01 7:25 ` Nicolas Cannasse 2 siblings, 0 replies; 7+ messages in thread From: Nicolas Cannasse @ 2005-02-01 7:25 UTC (permalink / raw) To: Philippe Fremy, caml-list > Hi, > > I would like to implement something similar to the type inference of > ocaml for the python language. I have always found it very impressive > (although I have only used caml light). > > I have no experience with the topic, it is just a project that seems > cool to me :-) > > Do you have any hints or where I could build up my knowledge (code, > books, article, ...) to do that kind of thing. > > I imagine that it works in a kind of three way pass: > > 1. analyse all the constraints of the code > Ex: > def f(a): a.append(1) > def g(a): a=a+1; f(a) > g('coucou') > > => a must support append > => a must also be an int > > 2. cross-validate the constraints consistency > => inconsistent assertions > > 3. validate the constraints against reality > => g('coucou') will not work In fact, the ML style type inference can be done in only one pass, at first your variables are monomorphic then they're unified witht the first type they match, creating linkages between types if needed. However in an OO system such as Python, you need more than core ML type inference. You will need some kind of structural subtyping, and subtyping is known to be tricky when used together with polymorphism. There is several kind of subtyping algorithms possible, using constraints is one of them (see work of François Pottier on this). I used another approach based on ocaml polymorphic variants typing scheme (somehow reversed) to implement this feature in MTypes, but it still have a lot of implementation holes difficult to handle. Also, the Python standard library must be designed with dynamic typing in mind (several types possible for arguments, variable number of arguments, operator overriding.....). It's possible to rely on a dynamicly typed library but you'll have to constraint the types a little more and add primitives so users can access different kinds of features which are not usable anymore. Good luck, Nicolas Cannasse ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2005-02-01 23:06 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2005-02-01 5:26 type inference for python Philippe Fremy 2005-02-01 6:56 ` [Caml-list] " Richard Cole 2005-02-01 6:58 ` skaller 2005-02-01 10:54 ` Philippe Fremy 2005-02-01 12:37 ` skaller 2005-02-01 23:06 ` Philippe Fremy 2005-02-01 7:25 ` Nicolas Cannasse
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox