Discuss and collaborate on GML scripts

You are not logged in.

- Topics: Active | Unanswered

Pages: **1**

**leif902****Member**- From: Georgia, USA
- Registered: 2007-10-09
- Posts: 24

Recently, I needed to calculate the hamming distance (the number of different characters, ex. "123" 'and "127" have a distance of 1 because the 3 and the 7 are different) between two numbers.

I first calculated the binary hamming distance:

**#define** hamming_distance2**var** dist, val;

dist = 0;

val = argument0 ^ argument1;

**while** (val)**{**

dist += 1;

val &= val - 1;**}**

**return** ( dist );

This worked well enough, until I realized that I needed the hamming distance for decimal numbers:**#define** hamming_distance10**var** len, dist, i, a, b;

len = ( floor(log10(argument0)) + 1 );

dist = 0;

**for**(i = 0; i < len; i += 1)**{**

a = floor(argument0 / ( 10 ^ (len - i) )) * 0.1;

a = floor((a - floor(a)) * 10);

b = floor(vec2 / ( 10 ^ (len - i) )) * 0.1;

b = floor((b - floor(b)) * 10);

dist = dist + (a != b);**}**

The base 10 function is very impractical and I'm sure there is a better way to do it mathematically.

Also, the first line of the decimal hamming function " floor(log10(argument0)) + 1;" could be a separate function as it returns the length of the decimal number which is often very useful to know.

*Last edited by leif902 (2008-09-13 00:08:37)*

Leif902

Offline

**xot****Administrator**- Registered: 2007-08-18
- Posts: 1,240

Why not do this using strings?

*Abusing forum power since 1986.*

Offline

**leif902****Member**- From: Georgia, USA
- Registered: 2007-10-09
- Posts: 24

Heh, I've had several people say that since I wrote this script. I just didn't think it was a good idea mixing and matching data types like that; doing it mathematically seems so much more... pure?

It is also more memory efficient this way and I assume there is a mathematical way that is more efficient all around as well (mine's not so bad, but I'm sure there is a better way to do the decimal compare).

EDIT: Of course, it would also be useful to have a separate function for doing this to strings. I wasn't working with strings, but of course it would be a useful script too.

*Last edited by leif902 (2008-09-13 01:20:26)*

Leif902

Offline

**xot****Administrator**- Registered: 2007-08-18
- Posts: 1,240

Yeah, I get what you are saying. Still, Hamming distance really is a string function, and trying to reproduce it mathematically, for an arbitrary base, without using strings, just seems like a roundabout way to go, not to mention slower for an interpreted language like GML. And memory? You're talking about a very, very tiny difference in memory usage.

Also, shouldn't "10 ^ (len - i)" be "power(10, len - i)"?

If it were me, I wouldn't hesitate to use strings. Maybe someone more mathematically inclined can provide another option.

*Abusing forum power since 1986.*

Offline

**leif902****Member**- From: Georgia, USA
- Registered: 2007-10-09
- Posts: 24

Ah, yes, in regards to 10^... you're absolutely right; I meant "ten to the power of" and simply wrote it wrong (I tested this function in MATLAB, not GML).

As for the hamming distance normally being used for strings, I would argue that it is mainly used for comparing binary numbers, however, that comes from very limited experiences with the function.

EDIT: Regarding memory, you're right again... I was forgetting GM stores everything as doubles... so again, not an issue.

*Last edited by leif902 (2008-09-13 01:45:29)*

Leif902

Offline

**xot****Administrator**- Registered: 2007-08-18
- Posts: 1,240

I meant string in the most generic terms. A binary number is just a string of bits. All that matters is the length of the strings (number of digits) and the number differences in their sequences of arbitrary symbols. The actual values of the digits are inconsequential, so bringing a bunch of math into it seems almost pointless.

*Last edited by xot (2008-09-13 02:37:54)*

*Abusing forum power since 1986.*

Offline

**leif902****Member**- From: Georgia, USA
- Registered: 2007-10-09
- Posts: 24

Oh right, sorry. Using strings would allow for an arbitrary base, but it still makes the engineering side of me shutter.

Leif902

Offline

**xot****Administrator**- Registered: 2007-08-18
- Posts: 1,240

I would be surprised if there was a more efficient or less error-prone method than the obvious:

```
{
var val1,val2,len1,len2,dist,i;
val1 = string(argument0);
val2 = string(argument1);
len1 = string_length(val1);
len2 = string_length(val2);
if (len1 != len 2) return -1;
dist = 0;
for (i=1; i<=len1; i+=1)
{
if (string_char_at(val1,i) != string_char_at(val2,i))
{
dist += 1;
}
}
return dist;
}
```

That's at most: (length * (4 operations + 2 compares + 2 assignments) + 4 operations + 1 compare + 6 assignments)

Your decimal version: (length * (20 operations + 2 compares + 6 assignments) + 3 operations + 3 assignments)

I'm curious, why do you need decimal vs. binary? Are you doing something beyond simple error-detection?

*Abusing forum power since 1986.*

Offline

**leif902****Member**- From: Georgia, USA
- Registered: 2007-10-09
- Posts: 24

I needed the base 10 version of the script for an assignment in CS class... the binary version is something I'm thinking about using in an AI which will compare various recordings and tell you if they are of the same piece of music or not.

Oddly enough, I've never actually used the hamming distance for error detection (though, not being a professional in the field, I have never really had the need to detect errors in anything).

Leif902

Offline

Pages: **1**