Base64 Detection — Mar 27, 2013, 9:25 am
I tried to create a Base64 toggle function for my Webtools and found out that there is no perfect solution. Google turned up some attempts, but none of them really convinced me.
What we know for sure is that standard Base64 output only contains the characters A-Z, a-z, 0-9, + and /, plus = as padding (URL-safe variants use - and _ instead of + and /). But that doesn't mean every string made up only of these characters is Base64-encoded (e.g. "hello").

As one 'solution' I found this:
if (base64_decode($str, true) === false) ...
Passing true as the second argument (strict mode) makes the function return FALSE if the input contains characters from outside the Base64 alphabet. But this gives us no real advantage over a simple regex.
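To illustrate why strict mode alone doesn't help, here is a small sketch comparing it with a plain alphabet regex. Both helper names, passesStrictDecode() and onlyBase64Alphabet(), are hypothetical and only exist for this example. A normal word like "test" passes both checks, so neither tells us whether a string was actually encoded. The two checks are not guaranteed to agree in every edge case (e.g. padding or whitespace handling), so treat this as an illustration rather than an exact equivalence.

```php
<?php
// Hypothetical helper: strict-mode decode (second argument true)
// returns false for characters outside the Base64 alphabet.
function passesStrictDecode(string $s): bool {
    return base64_decode($s, true) !== false;
}

// Hypothetical helper: the "simple regex" alphabet check.
function onlyBase64Alphabet(string $s): bool {
    return preg_match('~^[A-Za-z0-9+/=]*$~', $s) === 1;
}

// "test" is a plain word, "dGVzdA==" is base64_encode("test"),
// "abc!" contains a character outside the alphabet.
foreach (['test', 'dGVzdA==', 'abc!'] as $s) {
    printf("%-10s strict: %-3s regex: %s\n", $s,
        passesStrictDecode($s) ? 'yes' : 'no',
        onlyBase64Alphabet($s) ? 'yes' : 'no');
}
```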

I also found another, much better validation using a regex on Stack Overflow:
^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{4})$
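I tried this one, but was also disappointed: it only checks the alphabet and that the length (with padding) is a multiple of four, so plain words such as "test" or "abcd1234" still pass and would be falsely decoded.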
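Here is a minimal sketch of the Stack Overflow regex used from PHP with preg_match(). The wrapper name looksLikeBase64() is hypothetical. The regex rejects strings with characters outside the alphabet or a length that isn't a multiple of four, but ordinary words that happen to satisfy both rules still match.

```php
<?php
// Hypothetical wrapper around the regex quoted above.
// It requires groups of 4 alphabet characters, with optional
// "==" or "=" padding on the last group.
function looksLikeBase64(string $s): bool {
    $pattern = '~^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{4})$~';
    return preg_match($pattern, $s) === 1;
}

// "dGVzdA==" is real Base64; "test" and "abcd1234" are plain text
// that still match; "abc!" is rejected.
foreach (['dGVzdA==', 'test', 'abcd1234', 'abc!'] as $s) {
    echo $s, ': ', looksLikeBase64($s) ? 'match' : 'no match', "\n";
}
```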

Finally, I came up with an idea that was backed up by another hit on Google. It occurred to me because I had used a similar method before to get rid of website spam.
Whenever base64_decode swallows something that was not properly encoded, it outputs strange symbols (bytes outside the plain ASCII range). So I decided to count these symbols, compute their percentage, and decide based on that whether to encode or decode. Here is my full function:
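function base64Toggle($str) {
	// Only strings made up purely of Base64-alphabet characters are decode candidates
	if ($str !== '' && !preg_match('~[^0-9a-zA-Z+/=]~', $str)) {
		$decoded = base64_decode($str);
		$x = 0;
		// Count decoded bytes above ASCII #126 (the "strange symbols")
		foreach (str_split($decoded) as $char) if (ord($char) > 126) $x++;
		// Decode only if less than 30% of the decoded bytes are strange (guards against division by zero)
		if ($decoded !== '' && $x / strlen($decoded) * 100 < 30) return $decoded;
	}
	return base64_encode($str);
}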

This worked perfectly in all my tests, which is why I wanted to share it. But even with this approach you can never be 100% sure. For this example I set the threshold to 30%, for a character set that only includes plain ASCII (up to character #126) and no international characters like German umlauts.
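Since the 30% threshold depends on the expected character set (one commenter below needed 35 for some encoded images), it can help to make it a parameter. The following is a sketch of such a variant, assuming the same heuristic as the function above; the name base64ToggleThreshold() and the $threshold parameter are hypothetical.

```php
<?php
// Hypothetical variant of base64Toggle() with a configurable threshold.
// Same heuristic: decode, count bytes above ASCII #126, and return the
// decoded string only if their share stays below $threshold percent.
function base64ToggleThreshold(string $str, float $threshold = 30.0): string {
    if ($str !== '' && !preg_match('~[^0-9a-zA-Z+/=]~', $str)) {
        $decoded = base64_decode($str);
        $len = strlen($decoded);
        if ($len > 0) {
            $strange = 0;
            for ($i = 0; $i < $len; $i++) {
                if (ord($decoded[$i]) > 126) {
                    $strange++;
                }
            }
            if ($strange / $len * 100 < $threshold) {
                return $decoded;
            }
        }
    }
    // Anything else (including failed decode candidates) gets encoded
    return base64_encode($str);
}

echo base64ToggleThreshold('SGVsbG8gV29ybGQ='), "\n"; // Hello World
echo base64ToggleThreshold('Hello World'), "\n";      // SGVsbG8gV29ybGQ=
echo base64ToggleThreshold('test'), "\n";             // dGVzdA==
```

The plain word "test" is encoded because two of its three decoded bytes are above #126. Raising the threshold far enough (e.g. to 70) would make it decode "test" into garbage, so higher values trade correct decodes of binary-ish data for more false positives.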

Feel free to use it and let me know if there are any problems with it!
Alan on Sep 16, 2014, 10:38 pm:
For my specific use case, I needed to remove the "!" from the first if clause; otherwise it works great. Thank you for posting this, it has been very helpful!
Cheat on Nov 19, 2014, 5:26 am:
this is shit man!
J on Jun 6, 2016, 8:39 pm:
That's fucking awesome man, thanks a lot. I had to raise the threshold to 35 to get it to work with some base64 encoded images.