Base64 Detection — Mar 27, 2013, 9:25 am
I tried to create a Base64-Toggle function for my Webtools and found out there is no perfect solution. Google showed me some attempts, but they never really convinced me.
What we know for sure is that Base64-encoded output only contains the characters A-Z, a-z, 0-9, +, / and =. But that doesn't mean that every string made up of only these characters is an encoded one (e.g. "hello").
As one 'solution' I found this:
if (base64_decode($str, true) === false) ...
With the second argument (strict) set to true, the function returns FALSE if the input contains characters from outside the base64 alphabet; with false, such characters are silently discarded. But this doesn't give us any advantage over a simple REGEX.
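To make the limits of this check concrete, here is a minimal sketch of how base64_decode behaves with and without strict mode. The sample strings are my own picks for illustration, not taken from any real data.

```php
<?php
// Strict mode (second argument true): '*' is outside the alphabet, so decoding fails.
$invalid = base64_decode('not*base64', true);
var_dump($invalid === false);

// Non-strict mode: the stray '*' is silently dropped and the rest is decoded.
$lenient = base64_decode('aGVs*bG8=', false);
var_dump($lenient === 'hello');

// A plain word made only of alphabet characters passes strict mode anyway
// and comes back as three bytes of binary noise.
$word = base64_decode('test', true);
var_dump(is_string($word), bin2hex($word));
```

So strict mode only filters out foreign characters. It cannot tell whether a string that happens to fit the alphabet was ever encoded.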
There was also another, much better approach using REGEX that I found on Stack Overflow:
^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{4})$
I tried this one, but was also disappointed, because it still wrongly accepts plain terms whose length is a multiple of four, like "test" or "abcd1234".
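A quick way to try the pattern yourself is to wrap it in preg_match. This is only a sketch: the function name looksLikeBase64 is made up for this example, and the test strings are arbitrary.

```php
<?php
// Hypothetical helper name; the pattern is the one quoted above.
function looksLikeBase64(string $str): bool
{
    $pattern = '~^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{4})$~';
    return preg_match($pattern, $str) === 1;
}

var_dump(looksLikeBase64('aGVsbG8='));  // real base64 for "hello"
var_dump(looksLikeBase64('test'));      // plain word, four characters
var_dump(looksLikeBase64('abcd1234'));  // plain text, eight characters
var_dump(looksLikeBase64('a b c'));     // contains spaces
```

The pattern checks the alphabet, the length and the padding, but a plain word that fits all three still gets through.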
Finally I thought of something that was also supported by another hit on Google. It came to my mind because I had used a similar method before to get rid of some website spam.
Whenever the base64_decode function swallows something that is not properly encoded, it puts out strange symbols (bytes outside the regular ASCII range). So I decided to count these strange symbols, calculate their percentage, and decide based on that whether to encode or decode. Here is my full function:
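function base64Toggle($str) {
    // Only try decoding if the input uses nothing but base64 characters
    if (!preg_match('~[^0-9a-zA-Z+/=]~', $str)) {
        $decoded = base64_decode($str);
        $check = str_split($decoded);
        $x = 0;
        // Count the "strange symbols" (bytes above ASCII 126)
        foreach ($check as $char) if (ord($char) > 126) $x++;
        // The empty-string check avoids a division by zero on newer PHP versions
        if ($decoded !== '' && $x/count($check)*100 < 30) return $decoded;
    }
    return base64_encode($str);
}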
For all my tests this worked perfectly, which is why I wanted to share it. But even with this you can never be 100% sure. For this example I set the threshold to 30% for a character set that only includes the regular alphabet (up to ASCII character #126) and no international characters like German umlauts.
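To see what the percentage looks like for a given input, the counting step can be pulled out into its own function. This is just a sketch under my own assumptions: the names highBytePercent and shouldDecode are invented for this example, and treating an empty decode result as 100% is an arbitrary choice.

```php
<?php
// Hypothetical helper (name invented here): share of decoded bytes above ASCII 126.
function highBytePercent(string $str): float
{
    $decoded = base64_decode($str);
    if ($decoded === false || $decoded === '') {
        return 100.0; // assumption: nothing decodable counts as "not base64"
    }
    $high = 0;
    foreach (str_split($decoded) as $char) {
        if (ord($char) > 126) {
            $high++;
        }
    }
    return $high / strlen($decoded) * 100;
}

// Hypothetical wrapper: the 30 from the function above becomes a parameter.
function shouldDecode(string $str, float $threshold = 30.0): bool
{
    return highBytePercent($str) < $threshold;
}

var_dump(shouldDecode('aGVsbG8='));  // decodes to "hello", no bytes above 126
var_dump(shouldDecode('test'));      // two of the three decoded bytes are above 126
```

With the threshold as a parameter, a different character set only needs a different number instead of a change to the function.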
Feel free to use it and let me know if there are any problems with it!
Alan on Sep 16, 2014, 10:38 pm:
For my specific use case, I needed to remove the "!" from the first if clause, otherwise it works great. Thank you for posting this - it has been very helpful!
Cheat on Nov 19, 2014, 5:26 am:
this is shit man!
J on Jun 6, 2016, 8:39 pm:
That's fucking awesome man, thanks a lot. I had to raise the threshold to 35 to get it to work with some base64 encoded images.
zaza on Jun 11, 2018, 10:07 pm:
works great thanks, but needs to change !preg_match with preg_match