关于SQL代码性能优化的建议

问题描述:


我在Qubole中有一个代码需要花费大约3个小时来执行。我正在寻找一些减少代码执行时间的建议。

I have a code in Qubole that's taking a lot almost 3 hours to execute. i am looking for some recommendation to decrease the code execution time.


WITH

WITH


- 获取最新日期 - 日前10天

-- Get latest date - 10 days before as day


d AS(SELECT CAST(CONCAT(SUBSTR)(CAST(DATE_ADD('day',-10,CAST) (CURRENT_TIMESTAMP AS DATE))AS varchar),1,4),SUBSTR(CAST(DATE_ADD('day', - 10,CAST(CURRENT_TIMESTAMP AS DATE))AS varchar),6,2),SUBSTR(CAST(DATE_ADD( 'day',-10,CAST(CURRENT_TIMESTAMP AS DATE))
AS varchar),9,2) )AS bigint)AS day,CAST(CONCAT(SUBSTR(CAST(DATE_ADD('day',-10,CAST(CURRENT_TIMESTAMP AS DATE))AS varchar),1,4),' - ',SUBSTR(CAST(DATE_ADD( 'day',-10,CAST(CURRENT_TIMESTAMP AS DATE))AS varchar),6,2),' - ',SUBSTR(CAST(DATE_ADD('day',
-10,CAST(CURRENT_TIMESTAMP AS DATE)) )AS varchar),9,2))AS DATE)AS date,'FR'AS country)

d AS (SELECT CAST(CONCAT(SUBSTR(CAST(DATE_ADD('day', -10, CAST(CURRENT_TIMESTAMP AS DATE)) AS varchar),1, 4), SUBSTR(CAST(DATE_ADD('day', -10, CAST(CURRENT_TIMESTAMP AS DATE)) AS varchar),6, 2), SUBSTR(CAST(DATE_ADD('day', -10, CAST(CURRENT_TIMESTAMP AS DATE)) AS varchar),9, 2)) AS bigint) AS day, CAST(CONCAT(SUBSTR(CAST(DATE_ADD('day', -10, CAST(CURRENT_TIMESTAMP AS DATE)) AS varchar),1, 4), '-',SUBSTR(CAST(DATE_ADD('day', -10, CAST(CURRENT_TIMESTAMP AS DATE)) AS varchar),6, 2), '-', SUBSTR(CAST(DATE_ADD('day', -10, CAST(CURRENT_TIMESTAMP AS DATE)) AS varchar),9, 2)) AS DATE) AS date, 'FR' AS country )


选择'流'作为交易,'Spotify'作为帐户,p_day,访问,COUNT(DISTINCT customer_id)作为用户,COUNT(*)作为单位FROM temp_1 WHERE day> =(SELECT day d)AND country_code =(SELECT country FROM d)GROUP BY 1,2,3,4

SELECT 'Streaming' as transaction, 'Spotify' as account, p_day, access, COUNT(DISTINCT customer_id) as users, COUNT(*) as units FROM temp_1 WHERE day >= (SELECT day FROM d) AND country_code = (SELECT country FROM d) GROUP BY 1,2,3,4


UNION ALL

UNION ALL


SELECT'流媒体'作为交易,'Deezer'作为帐户,p_day,CASE WHEN offer_code IN('APP','BAO','BDP','BDS','BMO', 'BMS', '宝马', 'BPF', 'BPP', '流程再造', 'BSO', 'BWE', 'BWP', 'BWS', 'DEE', 'DEP', 'ETT',"EXT ", 'FFX', 'IOS', 'OT1', 'PBH', 'PE1', 'PE2', 'PEM', 'PLS', 'PRM', 'PSC', 'PTP', 'SDP', '竞彩', 'SPF', 'SPP', 'SPR', 'SUP', '瑞典', 'SWP', '3M', 'FAM', 'GOO', '火焰杯', 'HFP',"HFF ','HFI')
那么'溢价'当offer_code IN('BFR','MFS','MOD','SMR')然后'*'ELSE NULL END AS访问,COUNT(DISTINCT masked_consumer_id)AS用户,SUM(units_sold_streams)作为流来自temp_2 WHERE day> =(SELECT day FROM d)AND country_code =(SELECT country
FROM d)GROUP BY 1,2,3,4

SELECT 'Streaming' as transaction, 'Deezer' as account, p_day, CASE WHEN offer_code IN ('APP','BAO','BDP','BDS','BMO','BMS','BMW','BPF','BPP','BPR','BSO','BWE','BWP','BWS','DEE','DEP','ETT','EXT','FFX','IOS','OT1','PBH','PE1','PE2','PEM','PLS','PRM','PSC','PTP','SDP','SMG','SPF','SPP','SPR','SUP','SWE','SWP','3M','FAM','GOO','GOF','HFP','HFF','HFI') THEN 'premium' WHEN offer_code IN ('BFR','MFS','MOD','SMR') THEN 'free' ELSE NULL END AS access, COUNT(DISTINCT masked_consumer_id) AS users, SUM(units_sold_streams) as streams FROM temp_2 WHERE day >= (SELECT day FROM d) AND country_code = (SELECT country FROM d) GROUP BY 1,2,3,4


UNION ALL

UNION ALL


SELECT'Streaming'作为交易,'Apple Music'作为帐户,ingest_datestamp作为p_day,'premium'作为访问,COUNT(DISTINCT anonymized_person_id)作为用户,COUNT(*)AS stream FROM temp_streams1 WHERE ingest_datestamp> =(SELECT日期FROM d)AND country_code
=(SELECT country FROM d)GROUP BY 1,2,3,4

SELECT 'Streaming' as transaction, 'Apple Music' as account, ingest_datestamp as p_day, 'premium' as access, COUNT(DISTINCT anonymized_person_id) as users, COUNT(*) AS streams FROM temp_streams1 WHERE ingest_datestamp >= (SELECT date FROM d) AND country_code = (SELECT country FROM d) GROUP BY 1,2,3,4

这是一个SQL Server查询吗?DATE_ADD是MySQL函数...

Is this a SQL Server query? DATE_ADD is MySQL function...

另外,不看表结构/索引和底层数据集,它会非常很难提出任何建议。

Also, without looking at the table structure/indexes and underlying dataset, it will be very difficult to suggest anything.