After implementing a filter system into a post search with post meta fields, I stumbled upon a major performance bottleneck. The more filters were set, the slower the query was – exponentially.

The problem

Originally, I’ve added the active filters via the pre_get_posts filter. For each active filter, I’ve added an additional meta query to the WP_Query object. So far, so common. And while it worked in a reasonable amount of time for one filter, adding a second one already took multiple seconds to run. Adding a third one results in multiple minutes and in a local test adding a fourth filter even didn’t end in a result after multiple hours. There was something in the query that took exponentially longer for each meta field. If you don’t know, WordPress performs a separate INNER JOIN for each meta field, which looks like this:

SELECT SQL_CALC_FOUND_ROWS  wp_posts.ID
FROM wp_posts
INNER JOIN wp_postmeta ON ( wp_posts.ID = wp_postmeta.post_id )
INNER JOIN wp_postmeta AS mt1 ON ( wp_posts.ID = mt1.post_id )
INNER JOIN wp_postmeta AS mt2 ON ( wp_posts.ID = mt2.post_id )
INNER JOIN wp_postmeta AS mt3 ON ( wp_posts.ID = mt3.post_id )
INNER JOIN wp_postmeta AS mt4 ON ( wp_posts.ID = mt4.post_id )
WHERE 1=1
AND (
	(
		(
			( wp_postmeta.meta_key = 'filter_1' AND CAST(wp_postmeta.meta_value AS DECIMAL) BETWEEN '5000' AND '10000' )
			OR ( wp_postmeta.meta_key LIKE 'filter_1_%_name' AND CAST(wp_postmeta.meta_value AS DECIMAL) BETWEEN '5000' AND '10000' )
		)
	)
	OR (
		(
			( mt1.meta_key = 'filter_2' AND CAST(mt1.meta_value AS DECIMAL) BETWEEN '1980' AND '2016' )
			OR ( mt1.meta_key LIKE 'filter_2_%_name' AND CAST(mt1.meta_value AS DECIMAL) BETWEEN '1980' AND '2016' )
		)
	)
	OR (
		(
			( mt2.meta_key = 'filter_3' AND CAST(mt2.meta_value AS DECIMAL) BETWEEN '49' AND '50' )
			OR ( mt2.meta_key LIKE 'filter_3_%_name' AND CAST(mt2.meta_value AS DECIMAL) BETWEEN '49' AND '50' )
		)
	)
	OR (
		(
			( mt3.meta_key = 'filter_4' AND CAST(mt3.meta_value AS DECIMAL) BETWEEN '500' AND '1000' )
			OR ( mt3.meta_key LIKE 'filter_4_%_name' AND CAST(mt3.meta_value AS DECIMAL) BETWEEN '500' AND '1000' )
		)
	)
	OR (
		(
			( mt4.meta_key = 'mast_hubhoehe_range' AND CAST(mt4.meta_value AS DECIMAL) BETWEEN '500' AND '1000' )
			OR ( mt4.meta_key LIKE 'mast_hubhoehe_range_%_name' AND CAST(mt4.meta_value AS DECIMAL) BETWEEN '500' AND '1000' )
		)
	)
)
AND ((wp_posts.post_type = 'product' AND (wp_posts.post_status = 'publish' OR wp_posts.post_status = 'acf-disabled' OR wp_posts.post_status = 'product_homepage' OR wp_posts.post_status = 'private')))
GROUP BY wp_posts.ID
ORDER BY wp_posts.post_date DESC
LIMIT 0, 10;Code language: SQL (Structured Query Language) (sql)

The first approach was checking an EXPLAIN SELECT, which didn’t really seem to show any problem with the query:

+----+-------------+-------------+------------+-------+------------------------------------------------------------+------------------+---------+-----------------------+------+----------+-----------------------------------------------------------+
| id | select_type | table       | partitions | type  | possible_keys                                              | key              | key_len | ref                   | rows | filtered | Extra                                                     |
+----+-------------+-------------+------------+-------+------------------------------------------------------------+------------------+---------+-----------------------+------+----------+-----------------------------------------------------------+
|  1 | SIMPLE      | wp_posts    | NULL       | range | PRIMARY,post_name,type_status_date,post_parent,post_author | type_status_date | 164     | NULL                  |    6 |   100.00 | Using where; Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | wp_postmeta | NULL       | ref   | post_id,meta_key                                           | post_id          | 8       | wp_wd50_5.wp_posts.ID |   16 |   100.00 | NULL                                                      |
|  1 | SIMPLE      | mt1         | NULL       | ref   | post_id,meta_key                                           | post_id          | 8       | wp_wd50_5.wp_posts.ID |   16 |   100.00 | NULL                                                      |
|  1 | SIMPLE      | mt4         | NULL       | ref   | post_id,meta_key                                           | post_id          | 8       | wp_wd50_5.wp_posts.ID |   16 |   100.00 | NULL                                                      |
|  1 | SIMPLE      | mt2         | NULL       | ref   | post_id,meta_key                                           | post_id          | 8       | wp_wd50_5.wp_posts.ID |   16 |   100.00 | NULL                                                      |
|  1 | SIMPLE      | mt3         | NULL       | ref   | post_id,meta_key                                           | post_id          | 8       | wp_wd50_5.wp_posts.ID |   16 |   100.00 | Using where                                               |
+----+-------------+-------------+------------+-------+------------------------------------------------------------+------------------+---------+-----------------------+------+----------+-----------------------------------------------------------+
6 rows in set, 2 warnings (0.01 sec)
Code language: PHP (php)

Rewriting to use a single JOIN

My first idea was to use a single INNER JOIN for all meta fields. This seemed to work for this case where all post meta fields are handled via OR. For AND, there must be a single JOIN for every post meta field. Luckily (in this case), the query execution time was only so long when the post meta fields were selected via OR.

So, I used the posts_clauses filter to adjust the query accordingly. This filter allows you to manipulate the actual SQL query before its execution. The filter content can look like this:

array(7) {
  ["where"]=>
  string(1993) " AND (
  (
    (
      ( wp_postmeta.meta_key = 'filter_1' AND CAST(wp_postmeta.meta_value AS DECIMAL) BETWEEN '5000' AND '10000' )
      OR ( wp_postmeta.meta_key LIKE 'filter_1_%_name' AND CAST(wp_postmeta.meta_value AS DECIMAL) BETWEEN '5000' AND '10000' )
    )
  )
  OR (
    (
      ( mt1.meta_key = 'filter_2' AND CAST(mt1.meta_value AS DECIMAL) BETWEEN '1980' AND '2016' )
      OR ( mt1.meta_key LIKE 'filter_2_%_name' AND CAST(mt1.meta_value AS DECIMAL) BETWEEN '1980' AND '2016' )
    )
  )
  OR (
    (
      ( mt2.meta_key = 'filter_3' AND CAST(mt2.meta_value AS DECIMAL) BETWEEN '49' AND '50' )
      OR ( mt2.meta_key LIKE 'filter_3_%_name' AND CAST(mt2.meta_value AS DECIMAL) BETWEEN '49' AND '50' )
    )
  )
  OR (
    (
      ( mt3.meta_key = 'filter_4' AND CAST(mt3.meta_value AS DECIMAL) BETWEEN '500' AND '1000' )
      OR ( mt3.meta_key LIKE 'filter_4_%_name' AND CAST(mt3.meta_value AS DECIMAL) BETWEEN '500' AND '1000' )
    )
  )
  OR (
    (
      ( mt4.meta_key = 'mast_hubhoehe_range' AND CAST(mt4.meta_value AS DECIMAL) BETWEEN '500' AND '1000' )
      OR ( mt4.meta_key LIKE 'mast_hubhoehe_range_%_name' AND CAST(mt4.meta_value AS DECIMAL) BETWEEN '500' AND '1000' )
    )
  )
) AND ((wp_posts.post_type = 'product' AND (wp_posts.post_status = 'publish' OR wp_posts.post_status = 'acf-disabled' OR wp_posts.post_status = 'internal_area' OR wp_posts.post_status = 'product_homepage' 
OR wp_posts.post_status = 'private')))"
  ["groupby"]=>
  string(13) "wp_posts.ID"
  ["join"]=>
  string(206) " INNER JOIN wp_postmeta ON ( wp_posts.ID = wp_postmeta.post_id )  INNER JOIN wp_postmeta AS mt1 ON ( wp_posts.ID = mt1.post_id )  INNER JOIN wp_postmeta AS mt2 ON ( wp_posts.ID = mt2.post_id )"
  ["orderby"]=>
  string(25) "wp_posts.post_date DESC"
  ["distinct"]=>
  string(0) ""
  ["fields"]=>
  string(12) "wp_posts.*"
  ["limits"]=>
  string(11) "LIMIT 0, 10"
}
Code language: SQL (Structured Query Language) (sql)

What I needed to do here, was removing all subsequent joins. First, replace all mtX identifiers for the table names with the actual post meta table in the where key:

$clauses['where'] = (string) \preg_replace( '/mt\d+\.meta_/', $query->meta_query->meta_table . '.meta_', $clauses['where'] );Code language: PHP (php)

Here, mt1 becomes wp_postmeta, as well as mt2, mt3 and mt4.

Then, check where the first mt1 appears in the join array key. As we don’t need these additional joins, use only the substring ahead of them.

$second_inner_join = ' INNER JOIN ' . $query->meta_query->meta_table . ' AS mt1';
$second_inner_join_position = \strpos( $clauses['join'], $second_inner_join );

if ( $second_inner_join_position !== false ) {
	$clauses['join'] = \substr( $clauses['join'], 0, $second_inner_join_position );
}
Code language: PHP (php)

The returned value now looks like this:

array(7) {
  ["where"]=>
  string(1993) " AND (
  (
    (
      ( wp_postmeta.meta_key = 'filter_1' AND CAST(wp_postmeta.meta_value AS DECIMAL) BETWEEN '5000' AND '10000' )
      OR ( wp_postmeta.meta_key LIKE 'filter_1_%_name' AND CAST(wp_postmeta.meta_value AS DECIMAL) BETWEEN '5000' AND '10000' )
    )
  )
  OR (
    (
      ( wp_postmeta.meta_key = 'filter_2' AND CAST(wp_postmeta.meta_value AS DECIMAL) BETWEEN '1980' AND '2016' )
      OR ( wp_postmeta.meta_key LIKE 'filter_2_%_name' AND CAST(wp_postmeta.meta_value AS DECIMAL) BETWEEN '1980' AND '2016' )
    )
  )
  OR (
    (
      ( wp_postmeta.meta_key = 'filter_3' AND CAST(wp_postmeta.meta_value AS DECIMAL) BETWEEN '49' AND '50' )
      OR ( wp_postmeta.meta_key LIKE 'filter_3_%_name' AND CAST(wp_postmeta.meta_value AS DECIMAL) BETWEEN '49' AND '50' )
    )
  )
  OR (
    (
      ( wp_postmeta.meta_key = 'filter_4' AND CAST(wp_postmeta.meta_value AS DECIMAL) BETWEEN '500' AND '1000' )
      OR ( wp_postmeta.meta_key LIKE 'filter_4_%_name' AND CAST(wp_postmeta.meta_value AS DECIMAL) BETWEEN '500' AND '1000' )
    )
  )
  OR (
    (
      ( wp_postmeta.meta_key = 'mast_hubhoehe_range' AND CAST(wp_postmeta.meta_value AS DECIMAL) BETWEEN '500' AND '1000' )
      OR ( wp_postmeta.meta_key LIKE 'mast_hubhoehe_range_%_name' AND CAST(wp_postmeta.meta_value AS DECIMAL) BETWEEN '500' AND '1000' )
    )
  )
) AND ((wp_posts.post_type = 'product' AND (wp_posts.post_status = 'publish' OR wp_posts.post_status = 'acf-disabled' OR wp_posts.post_status = 'internal_area' OR wp_posts.post_status = 'product_homepage' 
OR wp_posts.post_status = 'private')))"
  ["groupby"]=>
  string(13) "wp_posts.ID"
  ["join"]=>
  string(206) " INNER JOIN wp_postmeta ON ( wp_posts.ID = wp_postmeta.post_id )"
  ["orderby"]=>
  string(25) "wp_posts.post_date DESC"
  ["distinct"]=>
  string(0) ""
  ["fields"]=>
  string(12) "wp_posts.*"
  ["limits"]=>
  string(11) "LIMIT 0, 10"
}
Code language: SQL (Structured Query Language) (sql)

As you might already noticed, I use this filter for a custom post type product. Thus, I first check in my hook callback whether I’m in the main query and whether this query does belong to my post type:

if (
	(
		! $query->is_main_query()
		|| $query->query_vars['post_type'] !== 'product'
		|| ! $query->is_archive()
	)
	&& ! \has_action( 'my_plugin_enforce_filter' )
) {
	return $clauses;
}
Code language: PHP (php)

Additionally, I also check whether a specific action has been called with add_action to be able to short-circuit this function.

AND relation

As mentioned above, if the AND relation of meta fields ist used, the multiple joins are necessary. Thus, if enabled (via a global option in my case), I add them again:

if ( \get_option( 'my_plugin_filter_relation', 'OR' ) === 'AND' ) {
	/** @var \wpdb $wpdb */
	global $wpdb;
	$join_count = 0;
	
	foreach ( $query->meta_query as $key_query ) {
		if ( ! \is_array( $key_query ) ) {
			continue;
		}
		
		foreach ( $key_query as $meta_query ) {
			if ( ! \is_array( $meta_query ) ) {
				continue;
			}
			
			++$join_count;
			$clauses['join'] .= \sprintf(
				' INNER JOIN %1$s AS mt%2$d ON (%3$s.ID = mt%2$d.post_id AND (mt%2$d.meta_key = \'%4$s\' OR mt%2$d.meta_key LIKE \'%5$s\') AND %6$s %7$s %8$s) ',
				$wpdb->postmeta,
				$join_count,
				$wpdb->posts,
				$meta_query[0]['key'],
				$meta_query[0]['key'] . '_%_name',
				'mt' . $join_count . '.meta_value',
				$meta_query[0]['compare'],
				"'" . \implode( '\' AND \'', $meta_query[0]['value'] ) . "'"
			) . \PHP_EOL;
		}
	}
}
Code language: PHP (php)

Why removing it first and then explicitly adding it afterwards? This is for an easier maintenance, since I can make sure that the actual base is identical, means the query is built the same way for both types.

Conclusion

To this day, I don’t know what actually happens to make such queries exponentially slower. I consider the solution above as workaround, but as working. If you have a clue, why this happens, feel free to comment. šŸ™‚

Leave a Reply

Your email address will not be published. Required fields are marked *