MySQL: COUNT(DISTINCT) too slow, even when using a covering index
We have a table in MySQL with about 30 million records. Here is the table structure:
CREATE TABLE `campaign_logs` (
  `domain` varchar(50) DEFAULT NULL,
  `campaign_id` varchar(50) DEFAULT NULL,
  `subscriber_id` varchar(50) DEFAULT NULL,
  `message` varchar(21000) DEFAULT NULL,
  `log_time` datetime DEFAULT NULL,
  `log_type` varchar(50) DEFAULT NULL,
  `level` varchar(50) DEFAULT NULL,
  `campaign_name` varchar(500) DEFAULT NULL,
  KEY `subscriber_id_index` (`subscriber_id`),
  KEY `log_type_index` (`log_type`),
  KEY `log_time_index` (`log_time`),
  KEY `campid_domain_logtype_logtime_subid_index` (`campaign_id`,`domain`,`log_type`,`log_time`,`subscriber_id`),
  KEY `domain_logtype_logtime_index` (`domain`,`log_type`,`log_time`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Here is my query. I am doing a UNION ALL instead of using an IN clause:
SELECT log_type,
       DATE_FORMAT(CONVERT_TZ(log_time,'+00:00','+05:30'),'%l %p') AS log_date,
       COUNT(DISTINCT subscriber_id) AS COUNT,
       COUNT(subscriber_id) AS total
FROM stats.campaign_logs USE INDEX(campid_domain_logtype_logtime_subid_index)
WHERE DOMAIN='xxx'
  AND campaign_id='123'
  AND log_type = 'EMAIL_OPENED'
  AND log_time BETWEEN CONVERT_TZ('2015-02-01 00:00:00','+00:00','+05:30') AND CONVERT_TZ('2015-03-01 23:59:58','+00:00','+05:30')
GROUP BY log_date
UNION ALL
SELECT log_type,
       DATE_FORMAT(CONVERT_TZ(log_time,'+00:00','+05:30'),'%l %p') AS log_date,
       COUNT(DISTINCT subscriber_id) AS COUNT,
       COUNT(subscriber_id) AS total
FROM stats.campaign_logs USE INDEX(campid_domain_logtype_logtime_subid_index)
WHERE DOMAIN='xxx'
  AND campaign_id='123'
  AND log_type = 'EMAIL_SENT'
  AND log_time BETWEEN CONVERT_TZ('2015-02-01 00:00:00','+00:00','+05:30') AND CONVERT_TZ('2015-03-01 23:59:58','+00:00','+05:30')
GROUP BY log_date
UNION ALL
SELECT log_type,
       DATE_FORMAT(CONVERT_TZ(log_time,'+00:00','+05:30'),'%l %p') AS log_date,
       COUNT(DISTINCT subscriber_id) AS COUNT,
       COUNT(subscriber_id) AS total
FROM stats.campaign_logs USE INDEX(campid_domain_logtype_logtime_subid_index)
WHERE DOMAIN='xxx'
  AND campaign_id='123'
  AND log_type = 'EMAIL_CLICKED'
  AND log_time BETWEEN CONVERT_TZ('2015-02-01 00:00:00','+00:00','+05:30') AND CONVERT_TZ('2015-03-01 23:59:58','+00:00','+05:30')
GROUP BY log_date;
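For comparison, the single-query IN form that the UNION ALL above replaces would look roughly like the sketch below (my illustration of the alternative, not something benchmarked against this table). The reason to prefer UNION ALL is that each branch then filters log_type with an equality, so the composite index can be walked with one contiguous log_time range per branch; with IN, all three log types share a single range scan.

SELECT log_type,
       DATE_FORMAT(CONVERT_TZ(log_time,'+00:00','+05:30'),'%l %p') AS log_date,
       COUNT(DISTINCT subscriber_id) AS COUNT,
       COUNT(subscriber_id) AS total
FROM stats.campaign_logs
WHERE DOMAIN='xxx'
  AND campaign_id='123'
  AND log_type IN ('EMAIL_OPENED','EMAIL_SENT','EMAIL_CLICKED')
  AND log_time BETWEEN CONVERT_TZ('2015-02-01 00:00:00','+00:00','+05:30') AND CONVERT_TZ('2015-03-01 23:59:58','+00:00','+05:30')
GROUP BY log_type, log_date;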
Here is the EXPLAIN output for the query:
+------+--------------+---------------+-------+--------------------------------------------+--------------------------------------------+---------+------+--------+-------------------------------------------+
| id   | select_type  | table         | type  | possible_keys                              | key                                        | key_len | ref  | rows   | Extra                                     |
+------+--------------+---------------+-------+--------------------------------------------+--------------------------------------------+---------+------+--------+-------------------------------------------+
|    1 | PRIMARY      | campaign_logs | range | campid_domain_logtype_logtime_subid_index  | campid_domain_logtype_logtime_subid_index  | 468     | NULL |  55074 | Using where; Using index; Using filesort  |
|    2 | UNION        | campaign_logs | range | campid_domain_logtype_logtime_subid_index  | campid_domain_logtype_logtime_subid_index  | 468     | NULL | 330578 | Using where; Using index; Using filesort  |
|    3 | UNION        | campaign_logs | range | campid_domain_logtype_logtime_subid_index  | campid_domain_logtype_logtime_subid_index  | 468     | NULL |   1589 | Using where; Using index; Using filesort  |
| NULL | UNION RESULT | <union1,2,3>  | ALL   | NULL                                       | NULL                                       | NULL    | NULL |   NULL |                                           |
+------+--------------+---------------+-------+--------------------------------------------+--------------------------------------------+---------+------+--------+-------------------------------------------+
1. I changed COUNT(subscriber_id) to COUNT(*) and observed no performance gain.
2. I removed COUNT(DISTINCT subscriber_id) from the query and got a huge performance gain: I now get results in about 1.5 seconds, where previously it took 50 seconds to a minute. But I need the distinct count of subscriber_id in the query.

Here is the EXPLAIN when I remove COUNT(DISTINCT subscriber_id) from the query:
+------+--------------+---------------+-------+--------------------------------------------+--------------------------------------------+---------+------+--------+------------------------------------------------------------+
| id   | select_type  | table         | type  | possible_keys                              | key                                        | key_len | ref  | rows   | Extra                                                      |
+------+--------------+---------------+-------+--------------------------------------------+--------------------------------------------+---------+------+--------+------------------------------------------------------------+
|    1 | PRIMARY      | campaign_logs | range | campid_domain_logtype_logtime_subid_index  | campid_domain_logtype_logtime_subid_index  | 468     | NULL |  55074 | Using where; Using index; Using temporary; Using filesort  |
|    2 | UNION        | campaign_logs | range | campid_domain_logtype_logtime_subid_index  | campid_domain_logtype_logtime_subid_index  | 468     | NULL | 330578 | Using where; Using index; Using temporary; Using filesort  |
|    3 | UNION        | campaign_logs | range | campid_domain_logtype_logtime_subid_index  | campid_domain_logtype_logtime_subid_index  | 468     | NULL |   1589 | Using where; Using index; Using temporary; Using filesort  |
| NULL | UNION RESULT | <union1,2,3>  | ALL   | NULL                                       | NULL                                       | NULL    | NULL |   NULL |                                                            |
+------+--------------+---------------+-------+--------------------------------------------+--------------------------------------------+---------+------+--------+------------------------------------------------------------+
3. I ran the three queries individually, removing the UNION ALL. One query took 32 seconds and the other two took 1.5 seconds each; but the first query processes about 350K records, whereas the others handle only about 2K rows.
I could solve my performance problem by leaving out COUNT(DISTINCT ...), but I need those values. Is there a way to restructure my query, add an index, or do anything else, that gets me the COUNT(DISTINCT ...) values, but much faster?
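One restructuring that sometimes helps here, purely a sketch and assuming the inner grouping can be served by the same covering index, is to collapse each subscriber to a single row per log_date in a derived table, so the outer query counts plain rows instead of evaluating DISTINCT over a large sorted set:

SELECT log_type,
       log_date,
       COUNT(*) AS COUNT,  -- one inner row per distinct subscriber
       SUM(cnt) AS total   -- total events across all subscribers
FROM (
    SELECT log_type,
           DATE_FORMAT(CONVERT_TZ(log_time,'+00:00','+05:30'),'%l %p') AS log_date,
           subscriber_id,
           COUNT(*) AS cnt
    FROM stats.campaign_logs USE INDEX(campid_domain_logtype_logtime_subid_index)
    WHERE DOMAIN='xxx'
      AND campaign_id='123'
      AND log_type = 'EMAIL_OPENED'
      AND log_time BETWEEN CONVERT_TZ('2015-02-01 00:00:00','+00:00','+05:30') AND CONVERT_TZ('2015-03-01 23:59:58','+00:00','+05:30')
    GROUP BY log_date, subscriber_id
) AS per_subscriber
GROUP BY log_type, log_date;

Whether this beats the original depends on how expensive the inner GROUP BY turns out to be; it trades the per-group DISTINCT bookkeeping for one larger grouping pass.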
UPDATE

The following information is about the data distribution of the above table, for:

1 domain
1 campaign
20 log_types
1K-200K subscribers

For the query above, the domain I am running it against has 180,000 subscribers.
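Given this distribution, where one month of a single log_type for one campaign already spans roughly 350K index entries, another direction (only a sketch; every name below is illustrative and not part of the original schema) is to maintain an hourly rollup table at write time, so the report reads pre-aggregated rows instead of scanning the raw log:

-- Hypothetical rollup table, keyed the same way as the covering index
CREATE TABLE campaign_logs_hourly (
  campaign_id   varchar(50) NOT NULL,
  domain        varchar(50) NOT NULL,
  log_type      varchar(50) NOT NULL,
  log_hour      datetime    NOT NULL,  -- log_time truncated to the hour
  subscriber_id varchar(50) NOT NULL,
  cnt           int         NOT NULL,
  PRIMARY KEY (campaign_id, domain, log_type, log_hour, subscriber_id)
) ENGINE=InnoDB;

-- On each event, upsert one row per subscriber per hour
INSERT INTO campaign_logs_hourly
VALUES ('123', 'xxx', 'EMAIL_OPENED', '2015-02-01 10:00:00', 'sub_42', 1)
ON DUPLICATE KEY UPDATE cnt = cnt + 1;

Against such a table, COUNT(DISTINCT subscriber_id) becomes a plain COUNT(*) over the primary key and COUNT(subscriber_id) becomes SUM(cnt), at the cost of one extra upsert per log write.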