WP_Query: Replace multiple joins
Published: ā Leave a comment Last update:
After implementing a filter system into a post search with post meta fields, I stumbled upon a major performance bottleneck. The more filters were set, the slower the query was āĀ exponentially.
The problem
Originally, Iāve added the active filters via the pre_get_posts
filter. For each active filter, Iāve added an additional meta query to the WP_Query
object. So far, so common. And while it worked in a reasonable amount of time for one filter, adding a second one already took multiple seconds to run. Adding a third one results in multiple minutes and in a local test adding a fourth filter even didnāt end in a result after multiple hours. There was something in the query that took exponentially longer for each meta field. If you donāt know, WordPress performs a separate INNER JOIN
for each meta field, which looks like this:
SELECT SQL_CALC_FOUND_ROWS wp_posts.ID
FROM wp_posts
INNER JOIN wp_postmeta ON ( wp_posts.ID = wp_postmeta.post_id )
INNER JOIN wp_postmeta AS mt1 ON ( wp_posts.ID = mt1.post_id )
INNER JOIN wp_postmeta AS mt2 ON ( wp_posts.ID = mt2.post_id )
INNER JOIN wp_postmeta AS mt3 ON ( wp_posts.ID = mt3.post_id )
INNER JOIN wp_postmeta AS mt4 ON ( wp_posts.ID = mt4.post_id )
WHERE 1=1
AND (
(
(
( wp_postmeta.meta_key = 'filter_1' AND CAST(wp_postmeta.meta_value AS DECIMAL) BETWEEN '5000' AND '10000' )
OR ( wp_postmeta.meta_key LIKE 'filter_1_%_name' AND CAST(wp_postmeta.meta_value AS DECIMAL) BETWEEN '5000' AND '10000' )
)
)
OR (
(
( mt1.meta_key = 'filter_2' AND CAST(mt1.meta_value AS DECIMAL) BETWEEN '1980' AND '2016' )
OR ( mt1.meta_key LIKE 'filter_2_%_name' AND CAST(mt1.meta_value AS DECIMAL) BETWEEN '1980' AND '2016' )
)
)
OR (
(
( mt2.meta_key = 'filter_3' AND CAST(mt2.meta_value AS DECIMAL) BETWEEN '49' AND '50' )
OR ( mt2.meta_key LIKE 'filter_3_%_name' AND CAST(mt2.meta_value AS DECIMAL) BETWEEN '49' AND '50' )
)
)
OR (
(
( mt3.meta_key = 'filter_4' AND CAST(mt3.meta_value AS DECIMAL) BETWEEN '500' AND '1000' )
OR ( mt3.meta_key LIKE 'filter_4_%_name' AND CAST(mt3.meta_value AS DECIMAL) BETWEEN '500' AND '1000' )
)
)
OR (
(
( mt4.meta_key = 'mast_hubhoehe_range' AND CAST(mt4.meta_value AS DECIMAL) BETWEEN '500' AND '1000' )
OR ( mt4.meta_key LIKE 'mast_hubhoehe_range_%_name' AND CAST(mt4.meta_value AS DECIMAL) BETWEEN '500' AND '1000' )
)
)
)
AND ((wp_posts.post_type = 'product' AND (wp_posts.post_status = 'publish' OR wp_posts.post_status = 'acf-disabled' OR wp_posts.post_status = 'product_homepage' OR wp_posts.post_status = 'private')))
GROUP BY wp_posts.ID
ORDER BY wp_posts.post_date DESC
LIMIT 0, 10;
Code language: SQL (Structured Query Language) (sql)
The first approach was checking an EXPLAIN SELECT
, which didnāt really seem to show any problem with the query:
+----+-------------+-------------+------------+-------+------------------------------------------------------------+------------------+---------+-----------------------+------+----------+-----------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------+------------+-------+------------------------------------------------------------+------------------+---------+-----------------------+------+----------+-----------------------------------------------------------+
| 1 | SIMPLE | wp_posts | NULL | range | PRIMARY,post_name,type_status_date,post_parent,post_author | type_status_date | 164 | NULL | 6 | 100.00 | Using where; Using index; Using temporary; Using filesort |
| 1 | SIMPLE | wp_postmeta | NULL | ref | post_id,meta_key | post_id | 8 | wp_wd50_5.wp_posts.ID | 16 | 100.00 | NULL |
| 1 | SIMPLE | mt1 | NULL | ref | post_id,meta_key | post_id | 8 | wp_wd50_5.wp_posts.ID | 16 | 100.00 | NULL |
| 1 | SIMPLE | mt4 | NULL | ref | post_id,meta_key | post_id | 8 | wp_wd50_5.wp_posts.ID | 16 | 100.00 | NULL |
| 1 | SIMPLE | mt2 | NULL | ref | post_id,meta_key | post_id | 8 | wp_wd50_5.wp_posts.ID | 16 | 100.00 | NULL |
| 1 | SIMPLE | mt3 | NULL | ref | post_id,meta_key | post_id | 8 | wp_wd50_5.wp_posts.ID | 16 | 100.00 | Using where |
+----+-------------+-------------+------------+-------+------------------------------------------------------------+------------------+---------+-----------------------+------+----------+-----------------------------------------------------------+
6 rows in set, 2 warnings (0.01 sec)
Code language: PHP (php)
Rewriting to use a single JOIN
My first idea was to use a single INNER JOIN
for all meta fields. This seemed to work for this case where all post meta fields are handled via OR
. For AND
, there must be a single JOIN
for every post meta field. Luckily (in this case), the query execution time was only so long when the post meta fields were selected via OR
.
So, I used the posts_clauses
filter to adjust the query accordingly. This filter allows you to manipulate the actual SQL query before its execution. The filter content can look like this:
array(7) {
["where"]=>
string(1993) " AND (
(
(
( wp_postmeta.meta_key = 'filter_1' AND CAST(wp_postmeta.meta_value AS DECIMAL) BETWEEN '5000' AND '10000' )
OR ( wp_postmeta.meta_key LIKE 'filter_1_%_name' AND CAST(wp_postmeta.meta_value AS DECIMAL) BETWEEN '5000' AND '10000' )
)
)
OR (
(
( mt1.meta_key = 'filter_2' AND CAST(mt1.meta_value AS DECIMAL) BETWEEN '1980' AND '2016' )
OR ( mt1.meta_key LIKE 'filter_2_%_name' AND CAST(mt1.meta_value AS DECIMAL) BETWEEN '1980' AND '2016' )
)
)
OR (
(
( mt2.meta_key = 'filter_3' AND CAST(mt2.meta_value AS DECIMAL) BETWEEN '49' AND '50' )
OR ( mt2.meta_key LIKE 'filter_3_%_name' AND CAST(mt2.meta_value AS DECIMAL) BETWEEN '49' AND '50' )
)
)
OR (
(
( mt3.meta_key = 'filter_4' AND CAST(mt3.meta_value AS DECIMAL) BETWEEN '500' AND '1000' )
OR ( mt3.meta_key LIKE 'filter_4_%_name' AND CAST(mt3.meta_value AS DECIMAL) BETWEEN '500' AND '1000' )
)
)
OR (
(
( mt4.meta_key = 'mast_hubhoehe_range' AND CAST(mt4.meta_value AS DECIMAL) BETWEEN '500' AND '1000' )
OR ( mt4.meta_key LIKE 'mast_hubhoehe_range_%_name' AND CAST(mt4.meta_value AS DECIMAL) BETWEEN '500' AND '1000' )
)
)
) AND ((wp_posts.post_type = 'product' AND (wp_posts.post_status = 'publish' OR wp_posts.post_status = 'acf-disabled' OR wp_posts.post_status = 'internal_area' OR wp_posts.post_status = 'product_homepage'
OR wp_posts.post_status = 'private')))"
["groupby"]=>
string(13) "wp_posts.ID"
["join"]=>
string(206) " INNER JOIN wp_postmeta ON ( wp_posts.ID = wp_postmeta.post_id ) INNER JOIN wp_postmeta AS mt1 ON ( wp_posts.ID = mt1.post_id ) INNER JOIN wp_postmeta AS mt2 ON ( wp_posts.ID = mt2.post_id )"
["orderby"]=>
string(25) "wp_posts.post_date DESC"
["distinct"]=>
string(0) ""
["fields"]=>
string(12) "wp_posts.*"
["limits"]=>
string(11) "LIMIT 0, 10"
}
Code language: SQL (Structured Query Language) (sql)
What I needed to do here, was removing all subsequent joins. First, replace all mtX
identifiers for the table names with the actual post meta table in the where
key:
$clauses['where'] = (string) \preg_replace( '/mt\d+\.meta_/', $query->meta_query->meta_table . '.meta_', $clauses['where'] );
Code language: PHP (php)
Here, mt1
becomes wp_postmeta
, as well as mt2
, mt3
and mt4
.
Then, check where the first mt1
appears in the join
array key. As we donāt need these additional joins, use only the substring ahead of them.
$second_inner_join = ' INNER JOIN ' . $query->meta_query->meta_table . ' AS mt1';
$second_inner_join_position = \strpos( $clauses['join'], $second_inner_join );
if ( $second_inner_join_position !== false ) {
$clauses['join'] = \substr( $clauses['join'], 0, $second_inner_join_position );
}
Code language: PHP (php)
The returned value now looks like this:
array(7) {
["where"]=>
string(1993) " AND (
(
(
( wp_postmeta.meta_key = 'filter_1' AND CAST(wp_postmeta.meta_value AS DECIMAL) BETWEEN '5000' AND '10000' )
OR ( wp_postmeta.meta_key LIKE 'filter_1_%_name' AND CAST(wp_postmeta.meta_value AS DECIMAL) BETWEEN '5000' AND '10000' )
)
)
OR (
(
( wp_postmeta.meta_key = 'filter_2' AND CAST(wp_postmeta.meta_value AS DECIMAL) BETWEEN '1980' AND '2016' )
OR ( wp_postmeta.meta_key LIKE 'filter_2_%_name' AND CAST(wp_postmeta.meta_value AS DECIMAL) BETWEEN '1980' AND '2016' )
)
)
OR (
(
( wp_postmeta.meta_key = 'filter_3' AND CAST(wp_postmeta.meta_value AS DECIMAL) BETWEEN '49' AND '50' )
OR ( wp_postmeta.meta_key LIKE 'filter_3_%_name' AND CAST(wp_postmeta.meta_value AS DECIMAL) BETWEEN '49' AND '50' )
)
)
OR (
(
( wp_postmeta.meta_key = 'filter_4' AND CAST(wp_postmeta.meta_value AS DECIMAL) BETWEEN '500' AND '1000' )
OR ( wp_postmeta.meta_key LIKE 'filter_4_%_name' AND CAST(wp_postmeta.meta_value AS DECIMAL) BETWEEN '500' AND '1000' )
)
)
OR (
(
( wp_postmeta.meta_key = 'mast_hubhoehe_range' AND CAST(wp_postmeta.meta_value AS DECIMAL) BETWEEN '500' AND '1000' )
OR ( wp_postmeta.meta_key LIKE 'mast_hubhoehe_range_%_name' AND CAST(wp_postmeta.meta_value AS DECIMAL) BETWEEN '500' AND '1000' )
)
)
) AND ((wp_posts.post_type = 'product' AND (wp_posts.post_status = 'publish' OR wp_posts.post_status = 'acf-disabled' OR wp_posts.post_status = 'internal_area' OR wp_posts.post_status = 'product_homepage'
OR wp_posts.post_status = 'private')))"
["groupby"]=>
string(13) "wp_posts.ID"
["join"]=>
string(206) " INNER JOIN wp_postmeta ON ( wp_posts.ID = wp_postmeta.post_id )"
["orderby"]=>
string(25) "wp_posts.post_date DESC"
["distinct"]=>
string(0) ""
["fields"]=>
string(12) "wp_posts.*"
["limits"]=>
string(11) "LIMIT 0, 10"
}
Code language: SQL (Structured Query Language) (sql)
As you might already noticed, I use this filter for a custom post type product
. Thus, I first check in my hook callback whether Iām in the main query and whether this query does belong to my post type:
if (
(
! $query->is_main_query()
|| $query->query_vars['post_type'] !== 'product'
|| ! $query->is_archive()
)
&& ! \has_action( 'my_plugin_enforce_filter' )
) {
return $clauses;
}
Code language: PHP (php)
Additionally, I also check whether a specific action has been called with add_action
to be able to short-circuit this function.
AND
relation
As mentioned above, if the AND
relation of meta fields ist used, the multiple joins are necessary. Thus, if enabled (via a global option in my case), I add them again:
if ( \get_option( 'my_plugin_filter_relation', 'OR' ) === 'AND' ) {
/** @var \wpdb $wpdb */
global $wpdb;
$join_count = 0;
foreach ( $query->meta_query as $key_query ) {
if ( ! \is_array( $key_query ) ) {
continue;
}
foreach ( $key_query as $meta_query ) {
if ( ! \is_array( $meta_query ) ) {
continue;
}
++$join_count;
$clauses['join'] .= \sprintf(
' INNER JOIN %1$s AS mt%2$d ON (%3$s.ID = mt%2$d.post_id AND (mt%2$d.meta_key = \'%4$s\' OR mt%2$d.meta_key LIKE \'%5$s\') AND %6$s %7$s %8$s) ',
$wpdb->postmeta,
$join_count,
$wpdb->posts,
$meta_query[0]['key'],
$meta_query[0]['key'] . '_%_name',
'mt' . $join_count . '.meta_value',
$meta_query[0]['compare'],
"'" . \implode( '\' AND \'', $meta_query[0]['value'] ) . "'"
) . \PHP_EOL;
}
}
}
Code language: PHP (php)
Why removing it first and then explicitly adding it afterwards? This is for an easier maintenance, since I can make sure that the actual base is identical, means the query is built the same way for both types.
Conclusion
To this day, I donāt know what actually happens to make such queries exponentially slower. I consider the solution above as workaround, but as working. If you have a clue, why this happens, feel free to comment. š