Convert PHP serialized data to Unicode

Lea Verou

I recently had to convert a database of a large Greek website from single-byte Greek to Unicode (UTF-8). One of the problems I faced was the stored PHP serialized data: As PHP stores the length of the data (in bytes) inside the serialized string, the stored serialized strings could not be unserialized after the conversion.

I didn’t want anyone to go through the frustration I went through while searching for a solution, so here is a little function I wrote to recount the string lengths, since I couldn’t find anything on this:

function recount_serialized_bytes($text) {
	mb_internal_encoding("UTF-8");
	mb_regex_encoding("UTF-8");

	mb_ereg_search_init($text, 's:[0-9]+:"');

	$offset = 0;

	while(preg_match('/s:([0-9]+):"/u', $text, $matches, PREG_OFFSET_CAPTURE, $offset) ||
		  preg_match('/s:([0-9]+):"/u', $text, $matches, PREG_OFFSET_CAPTURE, ++$offset)) {
		$number = $matches[1][0];
		$pos = $matches[1][1];

		$digits = strlen("$number");
		$pos_chars = mb_strlen(substr($text, 0, $pos)) + 2 + $digits;

		$str = mb_substr($text, $pos_chars, $number);

		$new_number = strlen($str);
		$new_digits = strlen($new_number);

		if($number != $new_number) {
			// Change stored number
			$text = substr_replace($text, $new_number, $pos, $digits);
			$pos += $new_digits - $digits;
		}

		$offset = $pos + 2 + $new_number;
	}

	return $text;
}

My initial approach was to do it with regular expressions, but the PHP serialized data format is not a regular language and cannot be properly parsed with regular expressions. All approaches fail on edge cases, and I had lots of edge cases in my data (I even had nested serialized strings!).

Note that this will only work when converting from single-byte encoded data, since it assumes the stored lengths are the string lengths in characters. Admittedly, it’s not my best code, it could be optimized in many ways. It was something I had to write quickly and was only going to be used by me in a one-time conversion process. However, it works smoothly and has been tested with lots of different serialized data. I know that not many people will find it useful, but it’s going to be a lifesaver for the few ones that need it.

blogs
tech

Convert PHP serialized data to Unicode

Lea Verou

I recently had to convert a database of a large Greek website from single-byte Greek to Unicode (UTF-8). One of the problems I faced was the stored PHP serialized data: As PHP stores the length of the data (in bytes) inside the serialized string, the stored serialized strings could not be unserialized after the conversion.

I didn’t want anyone to go through the frustration I went through while searching for a solution, so here is a little function I wrote to recount the string lengths, since I couldn’t find anything on this:

function recount_serialized_bytes($text) {
	mb_internal_encoding("UTF-8");
	mb_regex_encoding("UTF-8");

	mb_ereg_search_init($text, 's:[0-9]+:"');

	$offset = 0;

	while(preg_match('/s:([0-9]+):"/u', $text, $matches, PREG_OFFSET_CAPTURE, $offset) ||
		  preg_match('/s:([0-9]+):"/u', $text, $matches, PREG_OFFSET_CAPTURE, ++$offset)) {
		$number = $matches[1][0];
		$pos = $matches[1][1];

		$digits = strlen("$number");
		$pos_chars = mb_strlen(substr($text, 0, $pos)) + 2 + $digits;

		$str = mb_substr($text, $pos_chars, $number);

		$new_number = strlen($str);
		$new_digits = strlen($new_number);

		if($number != $new_number) {
			// Change stored number
			$text = substr_replace($text, $new_number, $pos, $digits);
			$pos += $new_digits - $digits;
		}

		$offset = $pos + 2 + $new_number;
	}

	return $text;
}

My initial approach was to do it with regular expressions, but the PHP serialized data format is not a regular language and cannot be properly parsed with regular expressions. All approaches fail on edge cases, and I had lots of edge cases in my data (I even had nested serialized strings!).

Note that this will only work when converting from single-byte encoded data, since it assumes the stored lengths are the string lengths in characters. Admittedly, it’s not my best code, it could be optimized in many ways. It was something I had to write quickly and was only going to be used by me in a one-time conversion process. However, it works smoothly and has been tested with lots of different serialized data. I know that not many people will find it useful, but it’s going to be a lifesaver for the few ones that need it.

blogs
tech

Styling elements based on sibling count

Lea Verou

The original idea belongs to André Luís, but I think it could be improved to be much less verbose.

André’s solution is like this:

/* one item */
li:nth-child(1):nth-last-child(1) {
	width: 100%;
}

/* two items */
li:nth-child(1):nth-last-child(2),
li:nth-child(2):nth-last-child(1) {
	width: 50%;
}

/* three items */
li:nth-child(1):nth-last-child(3),
li:nth-child(2):nth-last-child(2),
li:nth-child(3):nth-last-child(1) {
	width: 33.3333%;
}

/* four items */
li:nth-child(1):nth-last-child(4),
li:nth-child(2):nth-last-child(3),
li:nth-child(3):nth-last-child(2),
li:nth-child(4):nth-last-child(1) {
	width: 25%;
}

It’s based on the relationship between :nth-child and :nth-last-child. As you can see, the number of total rules is O(N) and the number of selectors in every rule is also O(N).

However, what you really want, is to just target the first element. The others can be targeted with just a sibling selector. With my improvement, the number of total rules is still O(N), but the number of selectors in every rule becomes just 2, making this trick practical for far larger numbers of children:

/* one item */ li:first-child:nth-last-child(1) { width: 100%; }

/* two items */ li:first-child:nth-last-child(2), li:first-child:nth-last-child(2) ~ li { width: 50%; }

/* three items */ li:first-child:nth-last-child(3), li:first-child:nth-last-child(3) ~ li { width: 33.3333%; }

/* four items */ li:first-child:nth-last-child(4), li:first-child:nth-last-child(4) ~ li { width: 25%; }

And here’s a fiddle to prove it:

Yes, I know that with Flexbox and the other layout modules, techniques such as these are soon becoming obsolete, but I think they are still useful right now. I’m also aware that you can emulate this particular example with table display modes, but a) Table display modes have other implications that are sometimes undesirable and b) Widths are just an example, you could come up with other ways to style the elements based on their total count, which can’t be emulated by CSS tables.

tech
blogs

Styling elements based on sibling count

Lea Verou

The original idea belongs to André Luís, but I think it could be improved to be much less verbose.

André’s solution is like this:

/* one item */
li:nth-child(1):nth-last-child(1) {
	width: 100%;
}

/* two items */
li:nth-child(1):nth-last-child(2),
li:nth-child(2):nth-last-child(1) {
	width: 50%;
}

/* three items */
li:nth-child(1):nth-last-child(3),
li:nth-child(2):nth-last-child(2),
li:nth-child(3):nth-last-child(1) {
	width: 33.3333%;
}

/* four items */
li:nth-child(1):nth-last-child(4),
li:nth-child(2):nth-last-child(3),
li:nth-child(3):nth-last-child(2),
li:nth-child(4):nth-last-child(1) {
	width: 25%;
}

It’s based on the relationship between :nth-child and :nth-last-child. As you can see, the number of total rules is O(N) and the number of selectors in every rule is also O(N).

However, what you really want, is to just target the first element. The others can be targeted with just a sibling selector. With my improvement, the number of total rules is still O(N), but the number of selectors in every rule becomes just 2, making this trick practical for far larger numbers of children:

/* one item */ li:first-child:nth-last-child(1) { width: 100%; }

/* two items */ li:first-child:nth-last-child(2), li:first-child:nth-last-child(2) ~ li { width: 50%; }

/* three items */ li:first-child:nth-last-child(3), li:first-child:nth-last-child(3) ~ li { width: 33.3333%; }

/* four items */ li:first-child:nth-last-child(4), li:first-child:nth-last-child(4) ~ li { width: 25%; }

And here’s a fiddle to prove it:

Yes, I know that with Flexbox and the other layout modules, techniques such as these are soon becoming obsolete, but I think they are still useful right now. I’m also aware that you can emulate this particular example with table display modes, but a) Table display modes have other implications that are sometimes undesirable and b) Widths are just an example, you could come up with other ways to style the elements based on their total count, which can’t be emulated by CSS tables.

blogs
tech

Yet another redesign

Lea Verou

I had grown sick of my previous blog style and its various bugs (since it was put together in just a few hours), so I decided to make a new, more minimalistic one. Best viewed in browsers that support CSS gradients, like Firefox, Safari and Chrome. I also finally got around to making a logo for myself, although I’m not sure I’ll keep it. I also switched to HTML5, using Toolbox as a base.

I want to make a few more changes, but I have to go to sleep sometime :p

I also started using DISQUS for the blog comments. I like it when a blog I read has it (since it offers a few features I find convenient, like comment editing for instance), so I wanted to offer it to my readers too. It’s a shame that in some of their buttons they haven’t added the standard CSS3 border-radius declarations, but only the prefixed proprietary ones, so they’re square in Opera (and probably IE9). I’m fed up with seeing this in websites, TOPSY’s widget also does it. However, their carelessness will backfire soon, when browsers stop supporting the prefixed versions *evil grin*

blogs
tech

Reading List

Convert PHP serialized data to Unicode

Convert PHP serialized data to Unicode

Styling elements based on sibling count

Styling elements based on sibling count

Yet another redesign