10 years has passed since MD5 was broken, yet it is still frequently used in web applications, particularly PHP powered applications (maybe because there’s a function after it?). Break it again to prove the point!

We are given access to the source code:

if (isset($_GET['src']))
    highlight_file(__FILE__) and die();
if (isset($_GET['md5']))
    if ($md5==md5($md5))
        echo "Wonderbubulous! Flag is [here goes a flag]"; //.require __DIR__."/flag.php";
        echo "Nah... '",htmlspecialchars($md5),"' not the same as ",md5($md5);


As we can see we need to generate an MD5 hash that is ‘equal’ to the string it was generated from.

The flaw in the source is the result of a type juggling issue with the comparison using ‘==‘ instead of ‘===‘.

When using ‘==‘ both values being compared that have have ‘0e’ will be converted to a float value such that the string has leading zeroes into an ‘e’ and followed by only integers.

Consider the following:

var_dump('000e1032' == '00e032098103123'); // Returns True
var_dump('000e1032' === '00e032098103123'); // Returns False
var_dump('000e1032' == '00e03209A103123'); // Returns False

A common misconception is that this would be entirely limited to values starting with 0e. However, we can have any number of leading zeroes.

So in order to create a ‘collision’ we need to find a string that matches a regex of ‘^[0]{1,30}e[0-9]{1,30}$’ and that also has an MD5 value matching that regex.

I setup the script with a single argument that I planned to use for a range of 0-32.

I have an empty file setup that each additional process can check to see if another process found the collision.  The aim here is to maximize CPU without being limited to a single core via the process.

Essentially each value from 0-32 represents a block of 15 million attempts.

We setup the counter based on the process integer from 0-32 and multiply by one million:

count = int(thread_num) * MILL

Then we initiate our WHILE LOOP that first checks the output file to see if any other processes found a collision first:

if os.stat(OUTFILE).stsize > 0:

If not, then we enter our leading zero loop for each count value:

for i in range(1,16):
    m = hashlib.md5()
    plain = '0' * int(i) + 'e' + str(count)
    collide = m.hexdigest()

Lets break this down. The plain text that we will generate the hash from is going to have a number of zeroes equivalent to ‘i’ generated before ‘e’ followed by the current loop counter.

So, if we are at the very start of process 3, our current plain value would be ‘0e3000000’. As we step through the ‘i’ loop it will increase. So once ‘i’ is 10 for example, the plain value will be ‘0000000000e3000000’.

To illustrate the order that hashes are being generated:


We then check it against our regex to see if it matches and will work:

if re.match('^[0]{1,30}e[0-9]{1,30}$',collide):
   f=open(OUTFILE, 'w+')
   f.write('[+]:: Collision Found!\n[+]:: Value '+str(plain)+'\n[+]:: HASH '+collide)
   print '[+]:: Execution Time - ' + str( - starttime)
   print '[+]:: Hashes Generated - ' + str((count - (MILL*int(thread_num))) * 16 - i)


There are plenty of ways we can execute our script to open multiple processes (for loops, python, etc). I have too many issues with my workstation not maximizing CPU with threading and mp modules to try to waste time troubleshooting for now. Once we run with command line ‘python x’ where x is 0-32, we can find the hash in just a few seconds.

Turns out process number 9 generates a match the quickest per the md5_out.txt file:

[+]:: Collision Found!
[+]:: Value 0000e9012915
[+]:: HASH 0e799233929553026239472795857843

[+]:: Execution Time – 0:00:05.844000

Leave a Reply

Your email address will not be published. Required fields are marked *