GMLscripts.com

Discuss and collaborate on GML scripts
Invert

You are not logged in.

#1 2011-06-06 10:32:46

paul23
Member
Registered: 2007-10-17
Posts: 110

Mathematical, Logical expression parser

Just to inform everyone:

I released the first non-beta version, it is a much more complete package.
http://gmc.yoyogames.com/index.php?showtopic=510018 <- gmc topic, feel free to leave some rants/comments there.


PS: I'm wondering how many people will notice I've edited this :P.




Well the other day I wanted to create a system where I could load external "events", an "event" would be (in pseudo format):

ExpandWHEN event_name //when event is tested/executed
CONDITION conditional_statement //(var1 > 2 && func(var2) < 3) type conditions
ACTION action_script

Now the "when" part was easy to write, and easy to implement, the first difficult position was the "condition" part. I set to myself as goal to not use execute_string() at all, and to preferable not use dlls either. Thus I had to write my own parser for these expressions, after some weeks of evading the problem (while thinking of ideas) I decided to start writing.

Turned out to be not as difficult as I once thought:
Download math_parser.gm81 from Host-A

Above example shows "a few" features of the parser. pressing "ENTER" forces the parser to actually parse the functions, pressing "SPACE BAR" allows you to edit the last expression. You can edit, and use all (and more) features shown in the above functions. It handles the syntax (mostly correct). When looking into the code you can see an example of how a "parser" is set up in "objTest". - First you initialize the parser (which sets some default operators). This returns an instance (id) with which you have to do all other operations, you can add operators, functions and "variables". (MLP_Add..)


There are a few drawbacks/bugs in this parser:
-First of all much information is "lost".. especially for functions this might become problematic.
-There's no syntax checking and any small mistake will lead at some point to a gamemaker error. I wanted to create some syntax rules, but the problem is that if an error occurs parsing has to "stop". However gamemaker doesn't have good exception handling for this build in. Meaning that there's no guarantee that I quit functions when I'm x-levels deep.. The only way is to check after each function which can possible result in an error whether or not an error was raised.
Things like: 5 * * 5 cause errors now.
-If an expression contains some "random" signs the lexer can't identify, it considers it simply a "value". I'm not doing any checking on "value" tokens, and just feeding them into gamemaker's "real" function. Gamemaker will then spit out errors. Updating this would be a really tedious task, as this should be done by a regex-engine. (And hence I'm building half an engine while creating the lexer). - I could've used yourself's engine but my goal was to not use dlls at all (if I would use dlls I'd have used a library build for this already).

-I've chosen to use the precedence rules as described in the "c" family.. In particular this means that && has now precedence over || signs.
-I'm totally unsure what to do with the "^" sign, currently it handles those sign as "xor" operators (bitwise as well as boolean). Maybe using it for "to the power of" would be better?
-Currently the reading order is predetermined: unary operators are always read from right-left, and binary always from left-right. This works for almost all cases (only exception I know is the "to the power of" binary operator, which reads from right-to-left).

Well the basics are now done at least.. Though when I was looking for more documentation about the manner I discovered some pretty nifty algorithms to handle the data, this one seemed terribly easy.. Maybe a rewrite is in order :P..


Well I guess I'll be working on the next part of the problem now: handling the actions of the events.

Last edited by paul23 (2011-07-27 05:17:17)

Offline

#2 2011-06-18 18:45:44

paul23
Member
Registered: 2007-10-17
Posts: 110

Re: Mathematical, Logical expression parser

Wow, just noticed I forgot to update this.. The above version is completely wrong & outdated.. I rewrote it using "shunting yard". - It should be much more robust and better to extend. I'm unsure how much more I'll update this, the enthousiams isn't really heavy, probably goes back to the private workbench. (I really love working on things like this - am thinking about writing a full-fledged parser someday).


GMC post:

MLP - Mathematical Logical Parser
updated:19/06/2011

Parsing user-inpute, and then especially "user defined" functions & expressions was at best hard, and often evaded in gamemaker. People often resorted -out of lazyness- to using "execute_string" type structures to parse user given expressions. With this group of scripts (& objects) I tried to prevent the use of exectue string, and all it's bad habits with it. The goal was to completely prevent execute_string() and to make the expression-parser completely in "vanilla GML", independent of any other extension or dll.

The parser can be found here: Download mlp_parser.zip from Host-A
The zip contains 2 files, an demonstation implementation as gmk file, and a resource file for the actual engine..

key features

  • based around the Shunting yard algorithm

  • independent of dlls, so portable to any platform gamemaker is working

  • Complete safe; YOU register the function which can be used, and the syntax checker prevents anything else

  • Robust exception handling; If the input expression isn't a valid expression, an "exception" is raised. Not leaking memory and not crashing your program

  • Many default operators; over 30 operators defined

  • Respects (c-style) precedence rules

  • easy to extend; you can add your own functions and operators with simple functions

  • precedence rules can be adapted and extended.

  • Assigning to specified variables.

  • ...

A good example for when to use is described below:

Suppose I have "external events", an external file describes certain "events". Such an event would have the format:
EVENTNAME: //identifyer name
WHEN: //when event occurs
CONDITION: //condition which has to be true when the event occurs
EVENT: //script with event

The CONDITION part can in this case be perfectly handled by this engine, you would simply give the expression/condition and then see if the answer is "1". As this parser is 100% robust there's no need to "hide" such external data to prevent hacking of the "main" game. Only the function you allowed to be execute will ever be executed. And it won't ever SET variables (other than done by your allowed functions).

in topic help
Maybe at some day I'll write a full documentation, but for now you'll have to do it with below (and the demonstration file):

set up
1) Start up:
parser = MLP_InitParser([buildin]) - buildin determines wether to use the build-in expression list. The function returns an "id" which has to be used as base for all other functions.
2) Register your own functions, operators and variables
MLP_AddBinaryOper(operator_string,precedence,script_index)  Adds a binary operator, operator_string is the identifying string, script_index the index of the script.
MLP_AddUnaryOper(operator_string,precedence,script_index)  Adds a unary operator, operator_string is the identifying string, script_index the index of the script.
MLP_AddFunction(function_string,script_index,argument_number);  Adds a function, function_string is the identifying string, script_index the index of the script, and argument_number the number of arguments this function needs.
MLP_AddVariable(variable_string,variable_pointer,scope); Adds a variable, variable_string is the identifying string, variable_pointer is a string which "points" to the real variable, and scope the scope where the variable is stored (can be instance id or "global").

3) register the expression string:
MLP_SetExpression(expression) Sets the expression string

4) Calculate:
MLP_Calculate() Calculates the lot, notice that you have to sure the error-state is "cleared" before starting a calculation
MLP_HasAnswer() Whether or not the parser has something calculated (succesfully)
MLP_GetAnswer() Gives the last answer found.. - Undetermined value if exception was raised during calculation.

5) Check for error, and handle it:
MLP_NoException() Wether there was no exception
MLP_LastExceptionPosition() Postion of last exception
MLP_LastExceptionString() String data of last exception
MLP_LastException() ID of last exception
MLP_ClearExceptions() Clear the Error state. (use before recalculating).

example

Expandglobal.somevar = 2;
parser = MLP_InitParser(); //initializes a parser, using build-in operators
with (parser) {
   MLP_AddBinaryOper("nCr",11,script_combinations); //adds "nCr" - given we have a script_combinations already
   MLP_SetExpression("2 * somevar nCr 2");
}
Expandwith (parser) {
   if (!MLP_NoException) {
      MLP_ClearExceptions();
   }
   MLP_Calculate()
   if (MLP_HasAnswer()) {
      show_message("answer is: "+string(MLP_GetAnswer));
   } else {
      show_message("Error during  calculation.");
   }
}

Build in operators
The following operators are build in already:

token | precedence | explanation
+     | 12         | unary prefix
-     | 12         | unary negative
!     | 12         | unary not
~     | 12         | unary bitwise not
*     | 11         | multiply
/     | 11         | division
%     | 11         | modulo
div   | 11         | integer division
+     | 10         | addition
-     | 10         | substraction
<<    | 9          | bitwise shift left
>>    | 9          | bitwise shift right
<     | 8          | less (comparison)
<=    | 8          | less or equal (comparison)
>     | 8          | greater (comparison)
>=    | 8          | greater or equal (comparison)
==    | 7          | equal (comparison)
!=    | 7          | inequal (comparison)
&     | 6          | bitwise and
^     | 5          | bitwise xor
|     | 4          | bitwise or
&&    | 3          | boolean and)
^^    | 2          | boolean xor
||    | 1          | boolean or
=     | 0          | assignment
+=    | 0          | assignment by addition
-=    | 0          | assignment by substraction
*=    | 0          | assignment by multiplication
/=    | 0          | assignment by division
%=    | 0          | assignment by modulo
<<=   | 0          | assignment by left bitshift
>>=   | 0          | assignment by right bitshift
&=    | 0          | assignment by binary and
|=    | 0          | assignment by binary or
^=    | 0          | assignment by binary xor

A small "todo" list:
-Arrays
-Ability to use multiple expressions at once. (";" or "," sign)
-ternary operator


UPDATE 19/06/2011:Fixed a few memory leaks
Added ability to allow for right associated operators - so you can now finally use "^" for "to the power off" symbol.
Aded assign operator structure, now this parser should be even better implemented into a game-structure.

Some funny thing I did to circumvent the terrible support for errors in gamemaker:

Expanddo {
    tokenlist = MLP_LexicalAnalysis();
    if (!MLP_NoException())  { break;}
    rpn = MLP_ShuntingYard(tokenlist);
    if (!MLP_NoException())  {ds_queue_destroy(rpn); break;}
    Ans = MLP_Parser(rpn)
    ds_queue_destroy(rpn);
    if (!MLP_NoException()) break;
    Calculated = true;
} until 1 = 1;

//cleanup
var i, s;
s = ds_list_size(tokenlist) 
for (i = 0; i < s; i += 1) {
    with (ds_list_find_value(tokenlist,i)) {
        instance_destroy();
    }
}
ds_list_destroy(tokenlist);

By using "break" I could stop a part of the script, and after the loop it would continue.. This allowed me to "catch" errors and handle the memory.

EDIT:meh, forum engine seems to not handle spoiler trees.. changed to quote

Last edited by xot (2011-06-19 03:59:35)

Offline

#3 2011-06-19 18:32:22

xot
Administrator
Registered: 2007-08-18
Posts: 1,240

Re: Mathematical, Logical expression parser

Wow! This is pretty great stuff, Paul. Seems to work really well and gets around that pesky execute_string() security hole perfectly for most likely applications. Being able to register functions and variables adds amazing power to it. Very nice demo.

I think it's also got some vital future applications. I'll bet dollars to donuts that execute_string() will not appear in some no-doubt popular versions of the GM runner that are coming soon (namely iOS and HTML5 versions, but potentially all versions).

I tweaked the formatting of your post. Hope you don't mind.


Abusing forum power since 1986.

Offline

#4 2011-06-20 18:10:38

paul23
Member
Registered: 2007-10-17
Posts: 110

Re: Mathematical, Logical expression parser

Thanks for cleaning the bbcode smile - The GMC has quite a bit different tags tongue.
Some funny fact is that around 40% of the processing time is spend on error checking & validation. But then again this is one of the main features I aim for: User input errors should never give an error from gamemaker, but instead raise an exception as the program might actually handle this (asking for new user input, rereading the file, ignoring the code piece..). Wish gamemaker had some better build in method for this.
Also I got to rant about the lack of programming features in the department of "pass-by-pointer" and "pass-by-reference".. Being able to pass by reference would've made many things (assignments) so much more easy.. No more scope checking, no more verbose "v=variable_*_get(X)" followed by "variable_*_set(X,v+t)"


Download math_parser0_2.gm81 from Host-A

Well as a quick "update": I fixed a few possible location for user errors in the default operators (mostly "division by 0" problems which would originally crash gm, are now handled by an exception). To follow the paradigm that any action by the END USER shouldn't result in a gamemaker error.

But more importantly I added "strings" towards this, and with strings there are now 2/3 types:
Strings/reals/any (last 1 means the parser doesn't check).

This last change meant a complete overhaul of the interface, as now you need to give the types for everything. I decided to go strongly typed, this allows me to catch type-errors during parsing instead of relying on the writing of a function. It also meant the parser is quite a bit faster, as when using a weak typed format each operation would mean checking the types.. Now I can make assumptions already.
Users would mostly feel this in the need to define operators/functions well.. And the fact you can't assign a "real" to a variable deemed "string" (this would result in an exception).

Though I immediatelly made a work-around of this strongly typing already (yes I'm that bad). MLP_VAL_ANY as parameter means that both strings & reals can be fed to this function/operator. - This mostly comes from the fact I didn't allow for overloading of functions.. So a function/operator would basically either work for strings or reals, not a combination. Obviously the "+" operator is a troubling point already.

I am still believing this is a terrible way to solve this problem and am activelly searching for a way to solve this problem. Not only does this put much more stress on the programmer (he has to take account for each different argument-type combination).. But it also slows down the program (return type can also be "any", in which case the parser has to manually verify the returned type).


I've thought going the way like how C++ compilers often do: they mangle the names to include also arguments in the "signature". So:
string foo(string); would be like foo$string in the function table
real foo(real); would be like foo$real in the function table.

Problem with this approach is with the lexer (lexical analyzer, the very first step to conver human lines towards seperate tokens), whenever it sees a "word" it looks this word up in the various tables (operators, variables and functions) to determine what it is, and if it actually IS something. The lexer however is too stupid to understand what argument (types) will follow - this is the task of the parser (3rd step).
So the lexer would only see "foo" - and then it has to know, "hey this is possibly a function". When using mangled names this would mean I have to manually go through the map untill I find a "function with foo as start". Not a problem in a compiled language (as it's just another lookup and shouldn't be slower than normal lookups), however you don't have access to the internals of the ds_map in gamemaker so it would mean you have to go through the whole map 1-by-1, you can't even use the fact maps are stored as binary trees. Another problem with this approach would be to operators, overloading operators for different types wouldn't be possible. And you'll have to go through
So this was kind of "out the picture" after trying it once.

I've got another method in my mind: simply keep the same function signature.. But then each function keeps a list inside himself what argument types are possible:
funciont FOO:
{real, real}
{string, string}
The problem for this method would be to find a way to quickly check if combination "X" is in the list. - How would I define "X". Obviously the types are identified by "constants" (real = 0, string = 1).. So the function would have a list filled with arrays which identify the possibilities. But as arrays are impossible to pass to lists it would be lists-of-lists in gamemaker. But this would mean terrible syntax, and slow, manual searching.



But I'm rambling away again, probably lost everyone during my first sentence already -.-. Well safe to say that soon I'll have to rewrite large parts again - if I add arrays to that part as well as the ability to create ternary operators I think I'll call that version 1 and stop for a while. The shunting yard algorithm (which is the part to convert the tokens found by the lexer to an abstract data tree necessary for a parser) isn't quite fit for structures like loops & if statements. So that will be done at a very much later date (if ever).

Last edited by paul23 (2011-06-20 18:11:08)

Offline

Board footer

Powered by FluxBB