When running the following query:
SELECT productid FROM product WHERE productid=ROUND(RAND()*(SELECT MAX(productid) FROM product));
The result should be 0 or 1 rows (0 because of gaps in the IDs, 1 if a record is found). However, it frequently returns multiple rows. This is very easy to reproduce: about 90% of executions return more than 1 row.
Sample output:
+-----------+
| productid |
+-----------+
|     11701 |
|     20602 |
|     22029 |
|     24994 |
+-----------+
(Number of records in DB is about 30k).
Running a single SELECT RAND() always returns a single row.
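One possible way to narrow this down, offered as a sketch rather than a confirmed diagnosis: the MySQL manual notes that RAND() in a WHERE clause is evaluated for every row, so the comparison value may not be a single fixed number for the whole query. Selecting the expression next to productid makes that visible. The alias target is a hypothetical name used only for illustration.

```sql
-- Sketch: show the comparison value next to each row's productid.
-- If the value differs from row to row, the expression is being evaluated
-- once per row, and every row whose own value happens to equal its
-- productid would pass the WHERE filter.
SELECT productid,
       ROUND(RAND() * (SELECT MAX(productid) FROM product)) AS target  -- hypothetical alias
FROM product
LIMIT 10;
```

If target changes on every line of the output, that would be consistent with several rows matching in one execution.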
Explain:
explain SELECT productid FROM product WHERE productid=ROUND(RAND()*(SELECT MAX(productid) FROM product));
+----+-------------+---------+------------+-------+---------------+--------------+---------+------+-------+----------+------------------------------+
| id | select_type | table   | partitions | type  | possible_keys | key          | key_len | ref  | rows  | filtered | Extra                        |
+----+-------------+---------+------------+-------+---------------+--------------+---------+------+-------+----------+------------------------------+
|  1 | PRIMARY     | product | NULL       | index | NULL          | idx_prod_url | 2003    | NULL | 31197 |    10.00 | Using where; Using index     |
|  2 | SUBQUERY    | NULL    | NULL       | NULL  | NULL          | NULL         | NULL    | NULL |  NULL |     NULL | Select tables optimized away |
+----+-------------+---------+------------+-------+---------------+--------------+---------+------+-------+----------+------------------------------+
Who can explain this behavior?
Follow up:
Following Martin's remark, I rewrote the query as:
SELECT productid FROM product WHERE productid=(SELECT ROUND(RAND()*(SELECT MAX(productid) FROM product)));
Explain:
explain SELECT productid FROM product WHERE productid=(SELECT ROUND(RAND()*(SELECT MAX(productid) FROM product)));
+----+----------------------+---------+------------+-------+---------------+--------------+---------+------+-------+----------+------------------------------+
| id | select_type          | table   | partitions | type  | possible_keys | key          | key_len | ref  | rows  | filtered | Extra                        |
+----+----------------------+---------+------------+-------+---------------+--------------+---------+------+-------+----------+------------------------------+
|  1 | PRIMARY              | product | NULL       | index | NULL          | idx_prod_url | 2003    | NULL | 31197 |   100.00 | Using where; Using index     |
|  2 | UNCACHEABLE SUBQUERY | NULL    | NULL       | NULL  | NULL          | NULL         | NULL    | NULL |  NULL |     NULL | No tables used               |
|  3 | SUBQUERY             | NULL    | NULL       | NULL  | NULL          | NULL         | NULL    | NULL |  NULL |     NULL | Select tables optimized away |
+----+----------------------+---------+------------+-------+---------------+--------------+---------+------+-------+----------+------------------------------+
However, despite the changed plan, the behavior stays the same.
Follow up 2:
Using an INNER JOIN, the behavior disappears:
SELECT a.productid FROM product a INNER JOIN (SELECT ROUND(RAND()*(SELECT MAX(productid))) as productid FROM product) b ON a.productid=b.productid;
Explain:
explain SELECT a.productid FROM product a INNER JOIN (SELECT ROUND(RAND()*(SELECT MAX(productid))) as productid FROM product) b ON a.productid=b.productid;
+----+--------------------+------------+------------+--------+---------------+--------------+---------+-------+-------+----------+----------------+
| id | select_type        | table      | partitions | type   | possible_keys | key          | key_len | ref   | rows  | filtered | Extra          |
+----+--------------------+------------+------------+--------+---------------+--------------+---------+-------+-------+----------+----------------+
|  1 | PRIMARY            | <derived2> | NULL       | system | NULL          | NULL         | NULL    | NULL  |     1 |   100.00 | NULL           |
|  1 | PRIMARY            | a          | NULL       | const  | PRIMARY       | PRIMARY      | 4       | const |     1 |   100.00 | Using index    |
|  2 | DERIVED            | product    | NULL       | index  | NULL          | idx_prod_url | 2003    | NULL  | 31197 |   100.00 | Using index    |
|  3 | DEPENDENT SUBQUERY | NULL       | NULL       | NULL   | NULL          | NULL         | NULL    | NULL  |  NULL |     NULL | No tables used |
+----+--------------------+------------+------------+--------+---------------+--------------+---------+-------+-------+----------+----------------+
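In this plan the derived table is resolved once (type system, 1 row) and the outer lookup becomes a const access on the primary key, which would explain why at most one row comes back. The same effect can probably be had without a join by computing the random value once, up front, and filtering on the stored value. This is only a minimal sketch, assuming productid is the primary key (as the plan above suggests); @target is a hypothetical user-variable name.

```sql
-- Sketch: compute the random target once, then filter on the fixed value.
-- @target is a hypothetical user-variable name.
SET @target := (SELECT ROUND(RAND() * MAX(productid)) FROM product);

-- The comparison value no longer changes from row to row,
-- so the result is 0 rows (ID falls in a gap) or 1 row.
SELECT productid
FROM product
WHERE productid = @target;
```

Because the value is fixed before the second statement runs, the optimizer can treat it like any other constant in the comparison.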