当前位置: 首页 > news >正文

一个人做网站难吗网站建设策略阿里巴巴

一个人做网站难吗,网站建设策略阿里巴巴,黄骅港招聘最新招聘,网站开发时间计划表day10_会员主题域开发 会员主题_DWS和ADS层 DWS层开发 门店会员分类天表: 维度指标: 指标#xff1a;新增注册会员数、累计注册会员数、新增消费会员数、累计消费会员数、新增复购会员数、累计复购会员数、活跃会员数、沉睡会员数、会员消费金额 维度: 时间维度#xff08…day10_会员主题域开发 会员主题_DWS和ADS层 DWS层开发 门店会员分类天表: 维度指标: 指标新增注册会员数、累计注册会员数、新增消费会员数、累计消费会员数、新增复购会员数、累计复购会员数、活跃会员数、沉睡会员数、会员消费金额 维度: 时间维度天、周、月 ​ 涉及表: 门店会员分类天表 ​ 表字段的组成: 维度字段 指标结果字段 建表语句: CREATE TABLE IF NOT EXISTS dws.dws_mem_store_member_classify_day_i(trade_date                   STRING COMMENT 统计时间,week_trade_date             STRING COMMENT 周一日期,month_trade_date           STRING COMMENT 月一日期, ​store_no                   STRING COMMENT 店铺编码,store_name                 STRING COMMENT 店铺名称,store_sale_type             BIGINT COMMENT 店铺销售类型,store_type_code             BIGINT COMMENT 分店类型,city_id                     BIGINT COMMENT 城市ID,city_name                   STRING COMMENT 城市名称,region_code                 STRING COMMENT 区域编码,region_name                 STRING COMMENT 区域名称,is_day_clear                BIGINT COMMENT 是否日清:0否,1是, ​reg_num_add                 BIGINT COMMENT 新增注册会员数,reg_num_sum                 BIGINT COMMENT 累计注册会员数,consume_num_add             BIGINT COMMENT 新增消费会员数,consume_num_sum             BIGINT COMMENT 累计消费会员数,repurchase_num_add          BIGINT COMMENT 新增复购会员数,repurchase_num_sum          BIGINT COMMENT 累计复购会员数,active_member_num           BIGINT COMMENT 活跃会员数,sleep_member_num            BIGINT COMMENT 沉睡会员数,sale_amount_bind            DECIMAL(27, 2) COMMENT 会员消费金额 ) comment 门店会员分类天表 partitioned by (dt STRING COMMENT 统计日期) row format delimited fields terminated by , stored as orc tblproperties (orc.compressSNAPPY); 数据导入: 活跃会员30天内有消费 沉睡会员90天内有消费30天内没有消费 这个需求的难点在于计算累计值。思路是 使用天进行聚合得到每天的值然后使用sum () over 窗口得到累加值对于每天的累积情况这里需要使用拉链表的思想即构造一个生效日期这里使用lead() over 窗口函数取到下一个日期然后再用这个当日时间去卡即可得到当日对应的累计值。 -- DWS层: 门店会员分类天 -- 注意: 以下内容仅仅以2023-11-14为例,实际需要把14-20日的所有数据都要导入对应表中 with t1 as (selecttrade_date as start_date,store_no,reg_num_add, -- 新增注册会员数sum(reg_num_add) over(partition by store_no order by trade_date) as reg_num_sum, -- 累计注册会员数lead(trade_date,1,9999-99-99) over (partition by store_no order by trade_date) as end_datefrom( -- 先统计每天注册会员数selecttrade_date,reg_md as store_no,count(1) as reg_num_addfrom dwm.dwm_mem_member_behavior_day_iwhere is_register 1group bytrade_date, reg_md) temp1 ), t2 as (selecttrade_date as start_date,store_no,consume_num_add, -- 新增消费会员数sum(consume_num_add) over(partition by store_no order by trade_date) as consume_num_sum, -- 累计消费会员数lead(trade_date,1,9999-99-99) over (partition by store_no order by trade_date) as end_datefrom (selecttrade_date,store_no,count(1) as consume_num_addfrom dwm.dwm_mem_first_buy_igroup by trade_date, store_no) temp2 ), t3 as (selecttrade_date as start_date,store_no,repurchase_num_add, -- 新增充值会员数sum(repurchase_num_add) over(partition by store_no order by trade_date) as repurchase_num_sum, -- 累计充值会员数lead(trade_date,1,9999-99-99) over (partition by store_no order by trade_date) as end_datefrom (selecttrade_date,store_no,count(1) as repurchase_num_addfrom dwm.dwm_mem_second_buy_igroup by trade_date, store_no) temp2 ), t4 as (-- 活跃会员数(最近30天有消费) 2023-11-14select2023-11-14 as trade_date,bind_md as store_no,count(distinct zt_id) as active_member_numfrom dwm.dwm_mem_member_behavior_day_iwhere trade_date 2023-11-14 and trade_date date_sub(2023-11-14,30) and is_consume 1group by bind_md ), t5 as (-- 沉睡会员数: 最近90天有消费 , 但是最近30天无消费select2023-11-14 as trade_date,temp3.bind_md as store_no,count(temp3.zt_id) as sleep_member_numfrom(selectbind_md,zt_idfrom dwm.dwm_mem_member_behavior_day_iwhere trade_date 2023-11-14 and trade_date date_sub(2023-11-14,90) and is_consume 1group by bind_md,zt_id) temp3LEFT JOIN(selectbind_md,zt_idfrom dwm.dwm_mem_member_behavior_day_iwhere trade_date 2023-11-14 and trade_date date_sub(2023-11-14,30) and is_consume 1group by bind_md,zt_id) temp4 on temp3.bind_md temp4.bind_md and temp3.zt_id temp4.zt_idwhere temp4.zt_id is nullgroup by temp3.bind_md ), t6 as (selecttrade_date,store_no,sum(real_paid_amount) as sale_amount_bindfrom dwm.dwm_mem_sell_order_iwhere trade_date 2023-11-14group by trade_date,store_no ), t7 as (select2023-11-14 as trade_date,store_no,if(start_date 2023-11-14,reg_num_add,0) as reg_num_add,reg_num_sum,0 as consume_num_add,0 as consume_num_sum,0 as repurchase_num_add,0 as repurchase_num_sum,0 as active_member_num,0 as sleep_member_num,0 as sale_amount_bindfrom t1where start_date 2023-11-14 and end_date 2023-11-14union allselect2023-11-14 as trade_date,store_no,0 reg_num_add,0 as reg_num_sum,if( start_date 2023-11-14,consume_num_add,0) as consume_num_add,consume_num_sum,0 as repurchase_num_add,0 as repurchase_num_sum,0 as active_member_num,0 as sleep_member_num,0 as sale_amount_bindfrom t2where start_date 2023-11-14 and end_date 2023-11-14union allselect2023-11-14 as trade_date,store_no,0 reg_num_add,0 as reg_num_sum,0 as consume_num_add,0 as consume_num_sum,if(start_date 2023-11-14,repurchase_num_add,0) as repurchase_num_add,repurchase_num_sum,0 as active_member_num,0 as sleep_member_num,0 as sale_amount_bindfrom t3where start_date 2023-11-14 and end_date 2023-11-14union allselecttrade_date,store_no,0 reg_num_add,0 as reg_num_sum,0 as consume_num_add,0 as consume_num_sum,0 as repurchase_num_add,0 as repurchase_num_sum,active_member_num,0 as sleep_member_num,0 as sale_amount_bindfrom t4union allselecttrade_date,store_no,0 reg_num_add,0 as reg_num_sum,0 as consume_num_add,0 as consume_num_sum,0 as repurchase_num_add,0 as repurchase_num_sum,0 as active_member_num,sleep_member_num,0 as sale_amount_bindfrom t5union allselecttrade_date,store_no,0 reg_num_add,0 as reg_num_sum,0 as consume_num_add,0 as consume_num_sum,0 as repurchase_num_add,0 as repurchase_num_sum,0 as active_member_num,0 as sleep_member_num,sale_amount_bindfrom t6 ) -- insert overwrite table dws.dws_mem_store_member_classify_day_i partition (dt) selectt7.trade_date,t8.week_trade_date,t8.month_trade_date,t7.store_no,t9.store_name,t9.store_sale_type,t9.store_type_code,t9.city_id,t9.city_name,t9.region_code,t9.region_name,t9.is_day_clear,sum(t7.reg_num_add) as reg_num_add,sum(t7.reg_num_sum) as reg_num_sum,sum(t7.consume_num_add) as consume_num_add,sum(t7.consume_num_sum) as consume_num_sum,sum(t7.repurchase_num_add) as repurchase_num_add,sum(t7.repurchase_num_sum) as repurchase_num_sum,sum(t7.active_member_num) as active_member_num,sum(t7.sleep_member_num) as sleep_member_num,sum(t7.sale_amount_bind) as sale_amount_bind,t7.trade_date as dt from t7left join dim.dwd_dim_date_f t8 on t7.trade_date t8.trade_date-- 注意: 一定要检查自己的dwd_dim_store_i分区目录,此处填写自己的分区目录时间left join dim.dwd_dim_store_i t9 on t7.store_no t9.store_no and t9.dt 2023-11-23 group byt7.trade_date,t8.week_trade_date,t8.month_trade_date,t7.store_no,t9.store_name,t9.store_sale_type,t9.store_type_code,t9.city_id,t9.city_name,t9.region_code,t9.region_name,t9.is_day_clear; 门店会员统计天表: 维度指标: 指标: 门店销售额、门店总订单量、当日注册人数、累计注册会员数、当日注册且充值会员数、当日注册且充值且消费会员数、当日注册且消费会员数、充值会员数、充值金额、累计会员充值金额、当日有余额的会员人数、当日会员余额、余额消费人数/单量、余额支付金额、余额消费金额、会员消费人数/单量、会员消费金额、会员首单人数/订单量/销售额、会员非首单人数/订单量/销售额 维度: 时间维度天、周、月涉及表:门店会员统计天表表字段的组成: 维度字段 指标结果字段 建表语句: CREATE TABLE IF NOT EXISTS dws.dws_mem_store_member_statistics_day_i(trade_date STRING COMMENT 统计时间,week_trade_date STRING COMMENT 周一日期,month_trade_date STRING COMMENT 月一日期,store_no STRING COMMENT 店铺编码,store_name STRING COMMENT 店铺名称,store_sale_type BIGINT COMMENT 店铺销售类型,store_type_code BIGINT COMMENT 分店类型,city_id BIGINT COMMENT 城市ID,city_name STRING COMMENT 城市名称,region_code STRING COMMENT 区域编码,region_name STRING COMMENT 区域名称,is_day_clear BIGINT COMMENT 是否日清:0否,1是,store_sale_amount DECIMAL(27, 2) COMMENT 门店销售金额,store_orders_number BIGINT COMMENT 门店总订单量,register_member_num BIGINT COMMENT 当日注册人数,register_member_num_all BIGINT COMMENT 累计注册会员数,register_recharge_num BIGINT COMMENT 当日注册且充值会员数,rg_rc_td_num BIGINT COMMENT 当日注册且充值且消费会员数,register_trade_num BIGINT COMMENT 当日注册且消费会员数,recharge_member_num BIGINT COMMENT 充值会员数,recharge_amount DECIMAL(27, 2) COMMENT 充值金额,recharge_amount_all DECIMAL(27, 2) COMMENT 累计会员充值金额,remain_member_num BIGINT COMMENT 当日有余额的会员人数,remain_member_amount DECIMAL(27, 2) COMMENT 当日会员余额,balance_member_num BIGINT COMMENT 余额消费人数,balance_member_order_num BIGINT COMMENT 余额消费单量,balance_pay_amount DECIMAL(27, 2) COMMENT 余额支付金额,balance_member_amount DECIMAL(27, 2) COMMENT 余额消费金额,member_num BIGINT COMMENT 会员消费人数,member_order_num BIGINT COMMENT 会员消费单量,member_amount DECIMAL(27, 2) COMMENT 会员消费金额,member_first_num BIGINT COMMENT 会员首单人数,member_first_order_num BIGINT COMMENT 会员首单订单量,member_first_amount DECIMAL(27, 2) COMMENT 会员首单销售额,member_nofirst_num BIGINT COMMENT 会员非首单人数,member_nofirst_order_num BIGINT COMMENT 会员非首单订单量,member_nofirst_amount DECIMAL(27, 2) COMMENT 会员非首单销售额 ) comment 门店会员统计日表 partitioned by (dt STRING COMMENT 统计日期) row format delimited fields terminated by , stored as orc tblproperties (orc.compressSNAPPY); 数据导入: 门店的消费情况可以从dwm_sell_o2o_order_i表中出 注册、充值、消费这些数据可以从dwm_mem_member_behavior_day_i中出 余额数据可以从dwd_mem_balance_online_i中出。 需要注意的是这里有新增的指标还有累计的指标为了方便计算可以分开求解。 新增指标可以大部分从dwm_mem_member_behavior_day_i中出因为 dwm_mem_member_behavior_day_i是会员粒度的表记录了会员的各种行为。在计算会员指标的时候很多需要count()来计算的指标可以转化成sum1根据条件进行判断即可。 -- DWS 门店会员统计宽表 -- 注意: 以下内容仅仅以2023-11-14为例,实际需要把14-20日的所有数据都要导入对应表中 with t1 as (selecttrade_date,store_no,sum(real_paid_amount) as store_sale_amount,count(if(trade_type 0,parent_order_no,NULL)) - count(if(trade_type 5,parent_order_no,NULL)) as store_orders_number,0 as register_member_num,0 as register_member_num_all,0 as register_recharge_num,0 as rg_rc_td_num,0 as register_trade_num,0 as recharge_member_num,0 as recharge_amount,0 as recharge_amount_all,0 as remain_member_num,0 as remain_member_amount,0 as balance_member_num,0 as balance_member_order_num,0 as balance_pay_amount,0 as balance_member_amount,0 as member_num,0 as member_order_num,0 as member_amount,0 as member_first_num,0 as member_first_order_num,0 as member_first_amount,0 as member_nofirst_num,0 as member_nofirst_order_num,0 as member_nofirst_amountfrom dwm.dwm_sell_o2o_order_i where dt 2023-11-14group by trade_date,store_nounion allselecttrade_date,bind_md as store_no,0 as store_sale_amount, -- 门店销售额0 as store_orders_number, -- 门店总订单量sum(is_register) as register_member_num, -- 当日注册人数0 as register_member_num_all, -- 累计注册人数sum(if(is_register 1 and is_recharge 1, 1,0)) as register_recharge_num, -- 当日注册且充值会员数sum(if(is_register 1 and is_recharge 1 and is_consume 1, 1,0)) as rg_rc_td_num, -- 当日注册 且充值且消费会员数sum(if(is_register 1 and is_consume 1, 1,0)) as register_trade_num, -- 当日注册且消费会员数sum(is_recharge) as recharge_member_num, -- 充值会员数sum(if( is_recharge 1,recharge_amount,0) ) as recharge_amount, -- 充值金额0 as recharge_amount_all, -- 累计会员充值金额0 as remain_member_num, -- 当日有余额的会员人数0 as remain_member_amount, -- 当日会员余额sum(is_balance_consume) as balance_member_num, --余额消费人数sum(if(is_balance_consume 1, balance_consume_times, 0)) as balance_member_order_num, --余额消费单量sum(if(is_balance_consume 1, balance_pay_amount, 0)) as balance_pay_amount, -- 余额支付金额sum(if(is_balance_consume 1, balance_consume_amount, 0)) as balance_member_amount, -- 余额消费金额sum(is_consume) as member_num, -- 会员消费人数sum(if(is_consume 1, consume_times, 0)) as member_order_num, -- 会员消费单量sum(if(is_consume 1, consume_amount, 0)) as member_amount, -- 会员消费金额sum(is_first_consume) as member_first_num, -- 会员首单人数sum(is_first_consume) as member_first_order_num, -- 会员首单订单量sum(if(is_first_consume 1, first_consume_amount,0)) as member_first_amount, -- 会员首单销售额sum(is_consume) - sum(is_first_consume) as member_nofirst_num, -- 会员非首单人数sum(if(is_consume 1, consume_times, 0)) - sum(is_first_consume) as member_nofirst_order_num, -- 会员非首单订单量sum(if(is_consume 1, consume_amount, 0)) - sum(if(is_first_consume 1, first_consume_amount,0)) as member_nofirst_amount -- 会员非首单销售额from dwm.dwm_mem_member_behavior_day_i where dt 2023-11-14group by trade_date,bind_mdunion allselecttrade_date,store_no,0 as store_sale_amount,0 as store_orders_number,0 as register_member_num,0 as register_member_num_all,0 as register_recharge_num,0 as rg_rc_td_num,0 as register_trade_num,0 as recharge_member_num,0 as recharge_amount,0 as recharge_amount_all,count(1) as remain_member_num,sum(balance_amount) as remain_member_amount,0 as balance_member_num,0 as balance_member_order_num,0 as balance_pay_amount,0 as balance_member_amount,0 as member_num,0 as member_order_num,0 as member_amount,0 as member_first_num,0 as member_first_order_num,0 as member_first_amount,0 as member_nofirst_num,0 as member_nofirst_order_num,0 as member_nofirst_amountfrom dwd.dwd_mem_balance_online_i where dt 2023-11-14group by trade_date,store_nounion allselectstart_date as trade_date,store_no,0 as store_sale_amount,0 as store_orders_number,0 as register_member_num,register_member_num_all,0 as register_recharge_num,0 as rg_rc_td_num,0 as register_trade_num,0 as recharge_member_num,0 as recharge_amount,recharge_amount_all,0 as remain_member_num,0 as remain_member_amount,0 as balance_member_num,0 as balance_member_order_num,0 as balance_pay_amount,0 as balance_member_amount,0 as member_num,0 as member_order_num,0 as member_amount,0 as member_first_num,0 as member_first_order_num,0 as member_first_amount,0 as member_nofirst_num,0 as member_nofirst_order_num,0 as member_nofirst_amountfrom(selecttrade_date as start_date,store_no,sum(reg_num_add) over(partition by store_no order by trade_date) as register_member_num_all, -- 累计注册会员数sum(recharge_amount) over(partition by store_no order by trade_date) as recharge_amount_all, -- 累计充值金额lead(trade_date,1,9999-99-99) over (partition by store_no order by trade_date) as end_datefrom( -- 先统计每天注册会员数selecttrade_date,bind_md as store_no,sum(is_register) as reg_num_add,sum(if(is_recharge 1,recharge_amount,0)) as recharge_amountfrom dwm.dwm_mem_member_behavior_day_igroup bytrade_date, bind_md) temp1) twhere start_date 2023-11-14 and end_date 2023-11-14 ) insert overwrite table dws.dws_mem_store_member_statistics_day_i partition(dt) selectt1.trade_date,t2.week_trade_date,t2.month_trade_date,t1.store_no,t3.store_name,t3.store_sale_type,t3.store_type_code,t3.city_id,t3.city_name,t3.region_code,t3.region_name,t3.is_day_clear,sum(t1.store_sale_amount) as store_sale_amount,sum(t1.store_orders_number) as store_orders_number,sum(t1.register_member_num) as register_member_num,sum(t1.register_member_num_all) as register_member_num_all,sum(t1.register_recharge_num) as register_recharge_num,sum(t1.rg_rc_td_num) as rg_rc_td_num,sum(t1.register_trade_num) as register_trade_num,sum(t1.recharge_member_num) as recharge_member_num,sum(t1.recharge_amount) as recharge_amount,sum(t1.recharge_amount_all) as recharge_amount_all,sum(t1.remain_member_num) as remain_member_num,sum(t1.remain_member_amount) as remain_member_amount,sum(t1.balance_member_num) as balance_member_num,sum(t1.balance_member_order_num) as balance_member_order_num,sum(t1.balance_pay_amount) as balance_pay_amount,sum(t1.balance_member_amount) as balance_member_amount,sum(t1.member_num) as member_num,sum(t1.member_order_num) as member_order_num,sum(t1.member_amount) as member_amount,sum(t1.member_first_num) as member_first_num,sum(t1.member_first_order_num) as member_first_order_num,sum(t1.member_first_amount) as member_first_amount,sum(t1.member_nofirst_num) as member_nofirst_num,sum(t1.member_nofirst_order_num) as member_nofirst_order_num,sum(t1.member_nofirst_amount) as member_nofirst_amount,t1.trade_date as dt from t1left join dim.dwd_dim_date_f t2 on t1.trade_date t2.trade_date-- 注意: 一定要检查自己的dwd_dim_store_i分区目录,此处填写自己的分区目录时间left join dim.dwd_dim_store_i t3 on t1.store_no t3.store_no and t3.dt 2023-11-23 group byt1.trade_date,t2.week_trade_date,t2.month_trade_date,t1.store_no,t3.store_name,t3.store_sale_type,t3.store_type_code,t3.city_id,t3.city_name,t3.region_code,t3.region_name,t3.is_day_clear; ADS层开发 回顾dayofweek函数 -- dayofweek -- 注意: dayofweek是老外从周日算,所以返回的结果和咱们中国人思路差1天 select dayofweek(2023-12-7); -- 需求: 获取到2023-12-7所在的周中的周一日期 select date_sub(2023-12-7,if(dayofweek(2023-12-7)1,6,dayofweek(2023-12-7)-2));-- 需求: 获取到2023-12-7所在的周中的周日日期 select date_sub(2023-12-8,if(dayofweek(2023-12-8)1,0,dayofweek(2023-12-8)-8));-- day0fmonth select dayofmonth(2023-12-07); -- 需求: 获取2023-12-7所在月的第一天的日期 select date_sub(2023-12-07,dayofmonth(2023-12-07)-1); -- 需求: 获取2023-12-7所在月的最后一天的日期 select last_day(2023-12-07); 各类会员数量统计分析 维度指标: 指标新增注册会员数、累计注册会员数、新增消费会员数、累计消费会员数、新增复购会员数、累计复购会员数、活跃会员数、沉睡会员数、会员消费金额 维度: 时间维度天、周、月涉及ADS表:门店会员分类月表 和 门店会员分类周表表字段的组成: 维度字段 指标结果字段 门店会员分类周表 建表语句: CREATE TABLE IF NOT EXISTS ads.ads_mem_store_member_classify_week_i(trade_date STRING COMMENT 周一日期,store_no STRING COMMENT 店铺编码,store_name STRING COMMENT 店铺名称,store_sale_type BIGINT COMMENT 店铺销售类型,store_type_code BIGINT COMMENT 分店类型,city_id BIGINT COMMENT 城市ID,city_name STRING COMMENT 城市名称,region_code STRING COMMENT 区域编码,region_name STRING COMMENT 区域名称,is_day_clear BIGINT COMMENT 是否日清:0否,1是,reg_num_add BIGINT COMMENT 新增注册会员数,reg_num_sum BIGINT COMMENT 累计注册会员数,consume_num_add BIGINT COMMENT 新增消费会员数,consume_num_sum BIGINT COMMENT 累计消费会员数,repurchase_num_add BIGINT COMMENT 新增复购会员数,repurchase_num_sum BIGINT COMMENT 累计复购会员数,active_member_num BIGINT COMMENT 活跃会员数,sleep_member_num BIGINT COMMENT 沉睡会员数,sale_amount_bind DECIMAL(27, 2) COMMENT 会员消费金额 ) comment 门店会员分类周表 partitioned by (dt STRING COMMENT 统计日期) row format delimited fields terminated by , stored as orc tblproperties (orc.compressSNAPPY); 数据导入: 指标分为累计值和新增值累计值可以取当周最后一天的数值、新增值可以进行聚合得到。 ​ 需要注意的是这里计算的是一张周表所以当考虑到数据的场景时需要取到当周所有的数据进行聚合以及取到当周最后一天进行取累加值。 思考: 当计算某一天对应这一周的指标, 如果获取这一周相关的数据呢? where t.dt in (select max(dt) from dws.dws_mem_store_member_classify_day_iwhere dtdate_sub(${inputdate}, if (dayofweek(${inputdate}) 1, 6, dayofweek(${inputdate}) - 2)) and dtdate_sub(${inputdate}, if (dayofweek(${inputdate}) 1, 0, dayofweek(${inputdate}) - 8)) ) 代码实现: with t1 as ( -- 计算非累加值 selectweek_trade_date,store_no,sum(reg_num_add) as reg_num_add,sum(consume_num_add) as consume_num_add,sum(repurchase_num_add) as repurchase_num_add,sum(sale_amount_bind) as sale_amount_bind from dws.dws_mem_store_member_classify_day_i where dt date_sub(2023-11-14,if(dayofweek(2023-11-14) 1,6, dayofweek(2023-11-14)-2 ))and dt date_add(2023-11-14,if(dayofweek(2023-11-14) 1,0,-dayofweek(2023-11-14)8)) group by week_trade_date,store_no ), t2 as ( -- 计算 累计值 -- 如果获取这一周的最后一天呢?selectweek_trade_date as trade_date,store_no,store_name,store_sale_type,store_type_code,city_id,city_name,region_code,region_name,is_day_clear,reg_num_sum,consume_num_sum,repurchase_num_sum,active_member_num,sleep_member_numfrom dws.dws_mem_store_member_classify_day_i where dt in (selectmax(dt) as c1from dws.dws_mem_store_member_classify_day_i as twhere dt date_sub(2023-11-14,if(dayofweek(2023-11-14) 1,6, dayofweek(2023-11-14)-2 ))and dt date_add(2023-11-14,if(dayofweek(2023-11-14) 1,0,-dayofweek(2023-11-14)8))) ) insert overwrite table ads.ads_mem_store_member_classify_week_i partition (dt) selectt2.trade_date,t2.store_no,t2.store_name,t2.store_sale_type,t2.store_type_code,t2.city_id,t2.city_name,t2.region_code,t2.region_name,t2.is_day_clear,t1.reg_num_add,t2.reg_num_sum,t1.consume_num_add,t2.consume_num_sum,t1.repurchase_num_add,t2.repurchase_num_sum,t2.active_member_num,t2.sleep_member_num,t1.sale_amount_bind,t2.trade_date as dt from t2 left join t1 on t2.trade_date t1.week_trade_date and t2.store_no t1.store_no; 门店会员分类月表 建表语句: CREATE TABLE IF NOT EXISTS ads.ads_mem_store_member_classify_month_i(trade_date STRING COMMENT 月一日期,store_no STRING COMMENT 店铺编码,store_name STRING COMMENT 店铺名称,store_sale_type BIGINT COMMENT 店铺销售类型,store_type_code BIGINT COMMENT 分店类型,city_id BIGINT COMMENT 城市ID,city_name STRING COMMENT 城市名称,region_code STRING COMMENT 区域编码,region_name STRING COMMENT 区域名称,is_day_clear BIGINT COMMENT 是否日清:0否,1是,reg_num_add BIGINT COMMENT 新增注册会员数,reg_num_sum BIGINT COMMENT 累计注册会员数,consume_num_add BIGINT COMMENT 新增消费会员数,consume_num_sum BIGINT COMMENT 累计消费会员数,repurchase_num_add BIGINT COMMENT 新增复购会员数,repurchase_num_sum BIGINT COMMENT 累计复购会员数,active_member_num BIGINT COMMENT 活跃会员数,sleep_member_num BIGINT COMMENT 沉睡会员数,sale_amount_bind DECIMAL(27, 2) COMMENT 会员消费金额 ) comment 门店会员分类月表 partitioned by (dt STRING COMMENT 统计日期) row format delimited fields terminated by , stored as orc tblproperties (orc.compressSNAPPY); 数据导入: 处理思路: 同周表ads_mem_store_member_classify_week_i,改变下范围即可 思考: 如果获取一个月范围的数据呢? select date_sub(2023-09-30,dayofmonth(2023-09-30)-1), last_day(2023-09-30) 门店会员分析 维度指标: 指标: 门店销售额、门店总订单量、当日注册人数、累计注册会员数、当日注册且充值会员数、当日注册且充值且消费会员数、当日注册且消费会员数、充值会员数、充值金额、累计会员充值金额、当日有余额的会员人数、当日会员余额、余额消费人数/单量、余额支付金额、余额消费金额、会员消费人数/单量、会员消费金额、会员首单人数/订单量/销售额、会员非首单人数/订单量/销售额 维度: 时间维度天、周、月涉及表: 门店会员统计周表 和 门店会员统计月表涉及表字段: 维度字段 指标结果字段 门店会员统计周表 建表语句: CREATE TABLE IF NOT EXISTS ads.ads_mem_store_member_statistics_week_i(trade_date STRING COMMENT 周一日期,store_no STRING COMMENT 店铺编码,store_name STRING COMMENT 店铺名称,store_sale_type BIGINT COMMENT 店铺销售类型,store_type_code BIGINT COMMENT 分店类型,city_id BIGINT COMMENT 城市ID,city_name STRING COMMENT 城市名称,region_code STRING COMMENT 区域编码,region_name STRING COMMENT 区域名称,is_day_clear BIGINT COMMENT 是否日清:0否,1是,store_sale_amount DECIMAL(27, 2) COMMENT 门店销售金额,store_orders_number BIGINT COMMENT 门店总订单量,register_member_num BIGINT COMMENT 当日注册人数,register_member_num_all BIGINT COMMENT 累计注册会员数,register_recharge_num BIGINT COMMENT 当日注册且充值会员数,rg_rc_td_num BIGINT COMMENT 当日注册且充值且消费会员数,register_trade_num BIGINT COMMENT 当日注册且消费会员数,recharge_member_num BIGINT COMMENT 充值会员数,recharge_amount DECIMAL(27, 2) COMMENT 充值金额,recharge_amount_all DECIMAL(27, 2) COMMENT 累计会员充值金额,remain_member_num BIGINT COMMENT 当周最后一天有余额的会员人数,remain_member_amount DECIMAL(27, 2) COMMENT 当周最后一天会员余额,balance_member_num BIGINT COMMENT 余额消费人数,balance_member_order_num BIGINT COMMENT 余额消费单量,balance_pay_amount DECIMAL(27, 2) COMMENT 余额支付金额,balance_member_amount DECIMAL(27, 2) COMMENT 余额消费金额,member_num BIGINT COMMENT 会员消费人数,member_order_num BIGINT COMMENT 会员消费单量,member_amount DECIMAL(27, 2) COMMENT 会员消费金额,member_first_num BIGINT COMMENT 会员首单人数,member_first_order_num BIGINT COMMENT 会员首单订单量,member_first_amount DECIMAL(27, 2) COMMENT 会员首单销售额,member_nofirst_num BIGINT COMMENT 会员非首单人数,member_nofirst_order_num BIGINT COMMENT 会员非首单订单量,member_nofirst_amount DECIMAL(27, 2) COMMENT 会员非首单销售额 ) comment 门店会员统计周表 partitioned by (dt STRING COMMENT 统计日期) row format delimited fields terminated by , stored as orc tblproperties (orc.compressSNAPPY); 数据导入: 指标分为三种情况一种是状态值比如说累计指标register_member_num_all等还有状态指标remain_member_num等另一种情况是可累加的指标比如金额和单量等还有一种情况是不可累积指标比如人数。状态值可以从最新的天表中dws_mem_store_member_statistics_day_i获取。 然后以这张表作为主表关联其他表。 可累加的指标直接从dws_mem_store_member_statistics_day_i中进行聚合得到。 不可累加的指标从dwm_mem_member_behavior_day_i中进行计算得到。 代码实现: -- ads 门店会员统计周表 with t1 as ( -- 第一部分: 基于层门店会员统计天表 获取指定天的对应这一周的数据, 对这一周进行聚合统计 selectweek_trade_date as trade_date,store_no,sum(store_sale_amount) as store_sale_amount,sum(store_orders_number) as store_orders_number,sum(register_member_num) as register_member_num,sum(register_recharge_num) as register_recharge_num,sum(rg_rc_td_num) as rg_rc_td_num,sum(register_trade_num) as register_trade_num,sum(recharge_amount) as recharge_amount,sum(balance_member_order_num) as balance_member_order_num,sum(balance_pay_amount) as balance_pay_amount,sum(balance_member_amount) as balance_member_amount,sum(member_order_num) as member_order_num,sum(member_amount) as member_amount,sum(member_first_num) as member_first_num,sum(member_first_order_num) as member_first_order_num,sum(member_first_amount) as member_first_amount,sum(member_nofirst_order_num) as member_nofirst_order_num,sum(member_nofirst_amount) as member_nofirst_amountfrom dws.dws_mem_store_member_statistics_day_i where dt date_sub(2023-11-14,if(dayofweek(2023-11-14) 1,6, dayofweek(2023-11-14)-2 ))and dt date_add(2023-11-14,if(dayofweek(2023-11-14) 1,0,-dayofweek(2023-11-14)8)) group by week_trade_date,store_no ),t2 as (selectweek_trade_date as trade_date,store_no,store_name,store_sale_type,store_type_code,city_id,city_name,region_code,region_name,is_day_clear,register_member_num_all,recharge_amount_all,remain_member_num,remain_member_amountfrom dws.dws_mem_store_member_statistics_day_i where dt in (selectmax(dt)from dws.dws_mem_store_member_statistics_day_i twhere dt date_sub(2023-11-14,if(dayofweek(2023-11-14) 1,6, dayofweek(2023-11-14)-2 ))and dt date_add(2023-11-14,if(dayofweek(2023-11-14) 1,0,-dayofweek(2023-11-14)8))) ),t3 as (selectweek_trade_date as trade_date,bind_md as store_no,count( DISTINCT if(is_recharge 1,zt_id,NULL) ) AS recharge_member_num,count( DISTINCT if(is_balance_consume 1,zt_id,NULL) ) AS balance_member_num,count( DISTINCT if(is_consume 1,zt_id,NULL) ) AS member_num,count( DISTINCT if(is_first_consume 0 and consume_times 0,zt_id,NULL) ) AS member_nofirst_numfrom dwm.dwm_mem_member_behavior_day_iwhere dt date_sub(2023-11-14,if(dayofweek(2023-11-14) 1,6, dayofweek(2023-11-14)-2 ))and dt date_add(2023-11-14,if(dayofweek(2023-11-14) 1,0,-dayofweek(2023-11-14)8))group by week_trade_date,bind_md )insert overwrite table ads.ads_mem_store_member_statistics_week_i partition (dt) selectt2.trade_date,t2.store_no,t2.store_name,t2.store_sale_type,t2.store_type_code,t2.city_id,t2.city_name,t2.region_code,t2.region_name,t2.is_day_clear,t1.store_sale_amount,t1.store_orders_number,t1.register_member_num,t2.register_member_num_all,t1.register_recharge_num,t1.rg_rc_td_num,t1.register_trade_num,t3.recharge_member_num,t1.recharge_amount,t2.recharge_amount_all,t2.remain_member_num,t2.remain_member_amount,t3.balance_member_num,t1.balance_member_order_num,t1.balance_pay_amount,t1.balance_member_amount,t3.member_num,t1.member_order_num,t1.member_amount,t1.member_first_num,t1.member_first_order_num,t1.member_first_amount,t3.member_nofirst_num,t1.member_nofirst_order_num,t1.member_nofirst_amount,t2.trade_date as dt from t2 left join t1 on t2.trade_date t1.trade_date and t2.store_no t1.store_noleft join t3 on t2.trade_date t3.trade_date and t2.store_no t3.store_no; 门店会员统计月表 建表语句: CREATE TABLE IF NOT EXISTS ads.ads_mem_store_member_statistics_month_i(trade_date STRING COMMENT 月一日期,store_no STRING COMMENT 店铺编码,store_name STRING COMMENT 店铺名称,store_sale_type BIGINT COMMENT 店铺销售类型,store_type_code BIGINT COMMENT 分店类型,city_id BIGINT COMMENT 城市ID,city_name STRING COMMENT 城市名称,region_code STRING COMMENT 区域编码,region_name STRING COMMENT 区域名称,is_day_clear BIGINT COMMENT 是否日清:0否,1是,store_sale_amount DECIMAL(27, 2) COMMENT 门店销售金额,store_orders_number BIGINT COMMENT 门店总订单量,register_member_num BIGINT COMMENT 当日注册人数,register_member_num_all BIGINT COMMENT 累计注册会员数,register_recharge_num BIGINT COMMENT 当日注册且充值会员数,rg_rc_td_num BIGINT COMMENT 当日注册且充值且消费会员数,register_trade_num BIGINT COMMENT 当日注册且消费会员数,recharge_member_num BIGINT COMMENT 充值会员数,recharge_amount DECIMAL(27, 2) COMMENT 充值金额,recharge_amount_all DECIMAL(27, 2) COMMENT 累计会员充值金额,remain_member_num BIGINT COMMENT 当月最后一天有余额的会员人数,remain_member_amount DECIMAL(27, 2) COMMENT 当月最后一天会员余额,balance_member_num BIGINT COMMENT 余额消费人数,balance_member_order_num BIGINT COMMENT 余额消费单量,balance_pay_amount DECIMAL(27, 2) COMMENT 余额支付金额,balance_member_amount DECIMAL(27, 2) COMMENT 余额消费金额,member_num BIGINT COMMENT 会员消费人数,member_order_num BIGINT COMMENT 会员消费单量,member_amount DECIMAL(27, 2) COMMENT 会员消费金额,member_first_num BIGINT COMMENT 会员首单人数,member_first_order_num BIGINT COMMENT 会员首单订单量,member_first_amount DECIMAL(27, 2) COMMENT 会员首单销售额,member_nofirst_num BIGINT COMMENT 会员非首单人数非去重,member_nofirst_order_num BIGINT COMMENT 会员非首单订单量,member_nofirst_amount DECIMAL(27, 2) COMMENT 会员非首单销售额 ) comment 门店会员统计月表 partitioned by (dt STRING COMMENT 统计日期) row format delimited fields terminated by , stored as orc tblproperties (orc.compressSNAPPY); 数据导入: 同 ads_mem_store_member_statistics_week_i,改变下范围即可 ADS层其他需求(基于Presto实现) Presto--分布式SQL查询引擎 Presto-简介 背景 大数据分析类软件发展历程。 Apache Hadoop-MapReduce 优点统一、通用、简单的编程模型分而治之思想处理海量数据。 缺点java学习成本高、MR执行慢、内部过程繁琐 Apache Hive 优点SQL on Hadoop。sql语言上手方便。学习成本低。 缺点底层默认还是MapReduce引擎、慢、延迟高 各种SQL类计算引擎开始出现主要追求的就是一个问题计算如何更快延迟如何降低。 Presto/trino Spark On Hive、Spark SQL Flink ....... FaceBook维护的原始版本: presto, 也叫prestoDBPresto创始人团队离职后研发并维护的: PrestoSQL 因为版权更名为Trino已经给大家整理好了对应网址如下:FaceBook维护的, Presto的官网: https://prestodb.io/ 创始人团队维护的, Trino的官网: https://trino.io/ Presto创始人团队维护的, 因为版权更名为Trino: http://github.com/trinodb/trino 相关文章如下:Presto在有赞的实践之路: https://cloud.tencent.com/developer/news/606849 Presto更名为-Trino: https://www.sohu.com/a/441836081_106784 介绍 Presto是一个开源的分布式SQL查询引擎适用于交互式查询数据量支持GB到PB字节。 Presto的设计和编写完全是为了解决Facebook这样规模的商业数据仓库交互式分析和处理速度的问题。 presto简介: 一条Presto查询可以将多个数据源进行合并可以跨越整个组织进行分析;presto特点: Presto以分析师的需求作为目标他们期望响应速度小于1秒到几分钟; 优缺点 # 优点 1Presto与Hive对比都能够处理PB级别的海量数据分析但Presto是基于内存运算减少没必要的硬盘IO所以更快。2能够连接多个数据源跨数据源连表查如从Hive查询大量网站访问记录然后从Mysql中匹配出设备信息。3部署也比Hive简单因为Hive是基于HDFS的需要先部署HDFS。# 缺点 1虽然能够处理PB级别的海量数据分析但不是代表Presto把PB级别都放在内存中计算的。而是根据场景如countavg等聚合运算是边读数据边计算再清内存再读数据再计算这种耗的内存并不高。但是连表查就可能产生大量的临时数据因此速度会变慢反而Hive此时会更擅长。2为了达到实时查询可能会想到用它直连MySql来操作查询这效率并不会提升瓶颈依然在MySql此时还引入网络瓶颈所以会比原本直接操作数据库要慢。 Presto-架构、相关术语 架构图 Presto是一个运行在多台服务器上的分布式系统。 完整安装包括一个coordinator和多个worker。 由客户端提交查询从Presto命令行CLI提交到coordinator; coordinator进行解析分析并执行查询计划然后分发处理队列到worker。 整个presto是一个 M-S架构 (主从架构):coordinator: 主节点 作用: 负责接收客户端发送的SQL, 对SQL进行编译, 形成执行计划, 根据执行计划, 分发给各个从节点进行执行操作 discovery service: 附属节点作用: 一般内嵌在主节点中, 主要负责维护从节点列表, 当从节点启动后, 都需要到 discovery 节点进行注册操作 worker节点: 从节点作用: 负责接收coordinator传递过来任务, 对任务进行具体处理工作(读取数据, 处理数据, 将处理后结果数据返回给coordinator) Connector 连接器 1、Presto通过Connector连接器来连接访问不同数据源例如Hive或mysql。连接器功能类似于数据库的驱动程序。允许Presto使用标准API与资源进行交互。2、Presto包含几个内置连接器JMX连接器可访问内置系统表的System连接器Hive连接器和旨在提供TPC-H基准数据的TPCH连接器。许多第三方开发人员都贡献了连接器因此Presto可以访问各种数据源中的数据比如ES、Kafka、MongoDB、Redis、Postgre、Druid、Cassandra等。 Catalog 连接目录: hive或者mysql等数据源 1、Presto Catalog是数据源schema的上一级并通过连接器访问数据源。2、例如可以配置Hive Catalog以通过Hive Connector连接器提供对Hive信息的访问。3、在Presto中使用表时标准表名始终是被支持的。 例如hive.test_data.test的标准表名将引用hive catalog中test_data schema中的test table。 Catalog需要在Presto的配置文件中进行配置。schema 库 Schema是组织表的一种方式。Catalog和Schema共同定义了一组可以查询的表。当使用Presto访问Hive或关系数据库例如MySQL时Schema会转换为目标数据库中的对应Schema(database)。 schema通俗理解就是我们所讲的database.想一下在hive中下面这两个sql是否相等。 show databases; -- presto不支持 show schemas; table 表 ... Presto-集群启停 [roothadoop01 ~]# /export/server/presto/bin/launcher start Started as 89560# 可以使用jps 配合kill -9命令 关闭进程 web UI页面 链接: http://192.168.88.80:8090/ui/ Presto-Datagrip连接使用 JDBC 驱动presto-jdbc-0.245.1.jar JDBC 地址jdbc:presto://192.168.88.80:8090/hive step1创建连接 由于驱动比较大,好多人经常下载失败,可以按照下图关联资料中提供的包: presto-jdbc-0.245.1.jar Presto--时间日期类型注意事项 date_format(timestamp, format) varchar 作用: 将指定的日期对象转换为字符串操作 date_parse(string, format) → timestamp 作用: 用于将字符串的日期数据转换为日期对象 select date_format( timestamp 2020-10-10 12:50:50 , %Y/%m/%d %H:%i:%s); select date_format( date_parse(2020:10:10 12-50-50,%Y:%m:%d %H-%i-%s) ,%Y/%m/%d %H:%i:%s);---- 注意: 参数一必须是日期对象所以如果传递的是字符串, 必须将先转换为日期对象: 方式一: 标识为日期对象, 但是格式必须为标准日期格式timestamp 2020-10-10 12:50:50date 2020-10-10方式二: 如果不标准先用date_parse解析成为标准date_parse(2020-10-10 12:50:50,%Y-%m-%d %H:%i:%s) 扩展说明: 日期format格式说明年%Y月%m日%d时%H分%i 秒%s周几%w(0..6) date_add(unit, value, timestamp) → [same as input] 作用: 用于对日期数据进行 加 减 操作 date_diff(unit, timestamp1, timestamp2) → bigint 作用: 用于比对两个日期之间差值 select date_add(hour,3,timestamp 2021-09-02 15:59:50); select date_add(day,-1,timestamp 2021-09-02 15:59:50); select date_add(month,-1,timestamp 2021-09-02 15:59:50);select date_diff(year,timestamp 2020-09-02 06:30:30,timestamp 2021-09-02 15:59:50) select date_diff(month,timestamp 2021-06-02 06:30:30,timestamp 2021-09-02 15:59:50) select date_diff(day,timestamp 2021-08-02 06:30:30,timestamp 2021-09-02 15:59:50) Presto-常规优化 数据存储优化 --1合理设置分区与Hive类似Presto会根据元信息读取分区数据合理的分区能减少Presto数据读取量提升查询性能。--2使用列式存储Presto对ORC文件读取做了特定优化因此在Hive中创建Presto使用的表时建议采用ORC格式存储。相对于ParquetPresto对ORC支持更好。Parquet和ORC一样都支持列式存储但是Presto对ORC支持更好而Impala对Parquet支持更好。在数仓设计时要根据后续可能的查询引擎合理设置数据存储格式。--3使用压缩数据压缩可以减少节点间数据传输对IO带宽压力对于需要快速解压的建议采用Snappy压缩。--4预先排序对于已经排序的数据在查询的数据过滤阶段ORC格式支持跳过读取不必要的数据。比如对于经常需要过滤的字段可以预先排序。SQL优化 列裁剪 分区裁剪 group by优化 按照数据量大小降序排列 order by使用limit join时候大表放置在左边 ... 替换非ORC格式的Hive表 Presto-内存调优 内存管理机制--内存分类 Presto管理的内存分为两大类user memory和system memory user memory用户内存 跟用户数据相关的比如读取用户输入数据会占据相应的内存这种内存的占用量跟用户底层数据量大小是强相关的 system memory系统内存 执行过程中衍生出的副产品比如tablescan表扫描write buffers写入缓冲区跟查询输入的数据本身不强相关的内存。 内存管理机制--内存池 内存池中来实现分配user memory和system memory。 内存池为常规内存池GENERAL_POOL、预留内存池RESERVED_POOL。 1、GENERAL_POOL:在一般情况下一个查询执行所需要的user/system内存都是从general pool中分配的reserved pool在一般情况下是空闲不用的。2、RESERVED_POOL:大部分时间里是不参与计算的但是当集群中某个Worker节点的general pool消耗殆尽之后coordinator会选择集群中内存占用最多的查询把这个查询分配到reserved pool这样这个大查询自己可以继续执行而腾出来的内存也使得其它的查询可以继续执行从而避免整个系统阻塞。注意: reserved pool到底多大呢这个是没有直接的配置可以设置的他的大小上限就是集群允许的最大的查询的大小(query.total-max-memory-per-node)。reserved pool也有缺点一个是在普通模式下这块内存会被浪费掉了二是大查询可以用Hive来替代。因此也可以禁用掉reserved poolexperimental.reserved-pool-enabled设置为false那系统内存耗尽的时候没有reserved pool怎么办呢它有一个OOM Killer的机制对于超出内存限制的大查询SQL将会被系统Kill掉从而避免影响整个presto。 内存相关参数 1、user memory用户内存参数 query.max-memory-per-node:单个query操作在单个worker上user memory能用的最大值 query.max-memory:单个query在整个集群中允许占用的最大user memory2、usersystem总内存参数 query.max-total-memory-per-node:单个query操作可在单个worker上使用的最大(user system)内存 query.max-total-memory:单个query在整个集群中允许占用的最大(user system) memory当这些阈值被突破的时候query会以insufficient memory内存不足的错误被终结。3、协助阻止机制 在高内存压力下保持系统稳定。当general pool常规内存池已满时操作会被置为blocked阻塞状态直到通用池中的内存可用为止。此机制可防止激进的查询填满JVM堆并引起可靠性问题。4、其他参数 memory.heap-headroom-per-node:这个内存是JVM堆中预留给第三方库的内存分配presto无法跟踪统计默认值是-Xmx * 0.35、结论 GeneralPool 服务器总内存 - ReservedPool - memory.heap-headroom-per-node - Linux系统内存常规内存池内存大小服务器物理总内存-服务器linux操作系统内存-预留内存池大小-预留给第三方库内存 内存优化建议 常见的报错解决 1、Query exceeded per-node total memory limit of xx 适当增加query.max-total-memory-per-node。2、Query exceeded distributed user memory limit of xx 适当增加query.max-memory。3、Could not communicate with the remote task. The node may have crashed or be under too much load 内存不够导致节点crash可以查看/var/log/message。 建议参数设置 1、query.max-memory-per-node和query.max-total-memory-per-node是query操作使用的主要内存配置因此这两个配置可以适当加大。 memory.heap-headroom-per-node是三方库的内存默认值是JVM-Xmx * 0.3可以手动改小一些。1) 各节点JVM内存推荐大小: 当前节点剩余内存*80%2) 对于heap-headroom-pre-node第三方库的内存配置: 建议jvm内存的%15左右3) 在配置的时候, 不要正正好好, 建议预留一点点, 以免出现问题数据量在35TB , presto节点数量大约在30台左右 (128GB内存 8核CPU) 注意 1、query.max-memory-per-node小于query.max-total-memory-per-node。 2、query.max-memory小于query.max-total-memory。 3、query.max-total-memory-per-node 与memory.heap-headroom-per-node 之和必须小于 jvm max memory也就是jvm.config 中配置的-Xmx。 ADS层开发_其他需求(Presto实现) 维度指标 需求一: 会员首次充值(统计每个会员首次充值的时间, 交易单ID以及对应门店和充值金额) 需求二: 门店新老会员消费(统计每个门店每个月新会员、老会员、全部会员、非会员的数量、消费金额、消费单量(新会员指的首次消费后30天内, 老会员指的首次消费后大于30天)) 需求三: 会员复购统计(留存)(统计的指标为统计日期用户量、一日后用户量、二日后用户量、三日后用户量、四日后用户量、五日后用户量、六日后用户量) 需求四: 会员贡献(统计各个会员每天在各个门店消费单量、消费金额、消费成本、线上订单量、线上消费金额、线上消费成本、线下订单量、线下消费金额、线下消费成本)涉及表:会员首次充值表 和 门店新老会员消费月表 和 会员复购统计天表 以及 会员贡献天表 会员首次充值表 建表语句: CREATE TABLE IF NOT EXISTS ads.ads_mem_member_first_recharge_i(trade_date_time STRING COMMENT 交易时间,trade_date STRING COMMENT 日期,trade_order_id STRING COMMENT 对应的交易单id,zt_id BIGINT COMMENT 中台 会员id,store_no STRING COMMENT 门店编号,city_id BIGINT COMMENT 城市ID,recharge_amount DECIMAL(27, 2) COMMENT 充值金额 ) comment 会员首次充值表 partitioned by (dt STRING COMMENT 统计日期) row format delimited fields terminated by , stored as orc tblproperties (orc.compressSNAPPY); 数据导入: 说明: 同dwm_mem_first_buy_i insert into hive.ads.ads_mem_member_first_recharge_i with t1 as (selectdate_format(trade_date,%Y-%m-%d %H:%i:%s) as trade_date_time,date_format(trade_date,%Y-%m-%d) as trade_date,trade_order_id,zt_id,store_no,city_id,amount as recharge_amount,row_number() over(partition by zt_id order by trade_date) as rnfrom hive.ods.ods_mem_store_amount_record_i where record_type 2 and date_format(trade_date,%Y-%m-%d) 2023-11-20 ) selectt1.trade_date_time,t1.trade_date,t1.trade_order_id,t1.zt_id,t1.store_no,t1.city_id,t1.recharge_amount,2023-11-20 as dt from t1left join hive.ads.ads_mem_member_first_recharge_i ton t1.zt_id t.zt_id and t1.store_no t.store_no and t.dt 2023-11-20 where rn 1 and t.zt_id is null; 门店新老会员消费月表 建表语句: CREATE TABLE IF NOT EXISTS ads.ads_mem_store_new_old_member_month_i(trade_date STRING COMMENT 月一时间,store_no STRING COMMENT 店铺编码,store_name STRING COMMENT 店铺名称,store_sale_type BIGINT COMMENT 店铺销售类型,store_type_code BIGINT COMMENT 分店类型,city_id BIGINT COMMENT 城市ID,city_name STRING COMMENT 城市名称,region_code STRING COMMENT 区域编码,region_name STRING COMMENT 区域名称,is_day_clear BIGINT COMMENT 是否日清:0否,1是,member_type BIGINT COMMENT 会员类型:1新会员,2老会员,3会员,4非会员,member_num BIGINT COMMENT 消费会员数,sale_amount DECIMAL(27, 2) COMMENT 消费金额,order_num BIGINT COMMENT 消费单量 ) comment 门店新老会员消费月表 partitioned by (dt STRING COMMENT 消费日期) row format delimited fields terminated by , stored as orc tblproperties (orc.compressSNAPPY); 数据导入: 新会员首次消费后30天内的 老会员首次消费后大于30天需要统计每个门店每个月新会员、老会员、全部会员、非会员的数量、消费金额、消费单量。 注意这里是一个月表在判断新老会员的时候按照当月最后一天为标准往前推30天30天内的为新会员。比如今天是5月28日在计算5月份的数据时4月29日——5月28日这30天的都是新会员。而在计算4月份数据时因为4月份已经过去了所以以4月30日为最后一天4月1日——4月30日为4月份的新会员。 1在计算月表时需要取到当月最后一天然后以最后一天为标准取到前30天 selecta.trade_date,date_sub(a.trade_date, 30) as day30 -- 前30天,a.month_trade_date -- 对应的月一时间 from dim.dwd_dim_date_f a inner join(select max(dt) mdt from dws.dws_mem_store_member_statistics_day_i -- 取到最大的分区where dtdate_sub(${inputdate}, dayofmonth(${inputdate}) - 1)and dtlast_day(${inputdate}) ) b on a.trade_date b.mdt 2新会员30天内首次消费的会员 取新会员可以使用首次消费表取前30天到当月最大一天的会员即可。 因为在hive的where语句中不能使用子查询所以这里使用join的方式解决 with dtt as (selecta.trade_date,date_sub(a.trade_date, 30) as day30 -- 前30天,a.month_trade_date -- 对应的月一时间from dim.dwd_dim_date_f ainner join(select max(dt) mdt from dws.dws_mem_store_member_statistics_day_i -- 取到最大的分区where dtdate_sub(${inputdate}, dayofmonth(${inputdate}) - 1)and dtlast_day(${inputdate}) ) bon a.trade_date b.mdt), zt as (select s.zt_id from dwm.dwm_mem_first_buy_i s -- 取到最大分区与其前30天的数据cross join dttwhere s.dt dtt.day30 and s.dt dtt.trade_date) 代码实现: insert into hive.ads.ads_mem_store_new_old_member_month_i with t1 as (selecttrade_date,date_format(date_add(day,-30,date 2023-11-20),%Y-%m-%d) as day30,month_trade_date,month_end_datefrom hive.dim.dwd_dim_date_f where trade_date (selectmax(dt)from hive.dws.dws_mem_store_member_statistics_day_iwhere dt date_format(date_add(day, -day(date 2023-11-20) 1 ,date 2023-11-20),%Y-%m-%d)and dt date_format(date_add(month,1,date_add(day, -day(date 2023-11-20) 1 ,date 2023-11-20)),%Y-%m-%d)) ), t2 as (-- 获取最近30天有过消费的新用户selecttemp1.trade_date,temp1.zt_id,temp1.store_nofrom hive.dwm.dwm_mem_first_buy_i temp1cross join t1where temp1.dt t1.day30 and temp1.dt t1.trade_date ), t3 as (-- 获取 最近30天新用户的消费select1 as member_type,temp2.bind_md as store_no,count(distinct temp2.zt_id) as member_num,sum(consume_amount) as sale_amount,sum(consume_times) as order_numfrom hive.dwm.dwm_mem_member_behavior_day_i temp2join t2 on temp2.zt_id t2.zt_id and temp2.bind_md t2.store_nowhere dt date_format(date_add(day, -day(date 2023-11-20) 1 ,date 2023-11-20),%Y-%m-%d)and dt date_format(date_add(month,1,date_add(day, -day(date 2023-11-20) 1 ,date 2023-11-20)),%Y-%m-%d)and consume_times 0group bytemp2.bind_mdunion all-- 获取 老会员select2 as member_type,temp2.bind_md as store_no,count(distinct temp2.zt_id) as member_num,sum(consume_amount) as sale_amount,sum(consume_times) as order_numfrom hive.dwm.dwm_mem_member_behavior_day_i temp2left join t2 on temp2.zt_id t2.zt_id and temp2.bind_md t2.store_nowhere dt date_format(date_add(day, -day(date 2023-11-20) 1 ,date 2023-11-20),%Y-%m-%d)and dt date_format(date_add(month,1,date_add(day, -day(date 2023-11-20) 1 ,date 2023-11-20)),%Y-%m-%d)and consume_times 0 and t2.zt_id is nullgroup bytemp2.bind_mdunion all-- 获取 全部会员select3 as member_type,temp2.bind_md as store_no,count(distinct temp2.zt_id) as member_num,sum(consume_amount) as sale_amount,sum(consume_times) as order_numfrom hive.dwm.dwm_mem_member_behavior_day_i temp2where dt date_format(date_add(day, -day(date 2023-11-20) 1 ,date 2023-11-20),%Y-%m-%d)and dt date_format(date_add(month,1,date_add(day, -day(date 2023-11-20) 1 ,date 2023-11-20)),%Y-%m-%d)and consume_times 0group bytemp2.bind_mdunion all-- 非会员数据select4 as member_type,store_no,0 as member_num,sum(real_paid_amount) as sale_amount,count(if(trade_type 0,parent_order_no,NULL)) - count(if(trade_type 5,parent_order_no,NULL)) as order_numfrom hive.dwm.dwm_sell_o2o_order_iwhere dt date_format(date_add(day, -day(date 2023-11-20) 1 ,date 2023-11-20),%Y-%m-%d)and dt date_format(date_add(month,1,date_add(day, -day(date 2023-11-20) 1 ,date 2023-11-20)),%Y-%m-%d)and member_type 0group by store_no ) selectt1.month_trade_date as trade_date,t3.store_no,t4.store_name,t4.store_sale_type,t4.store_type_code,t4.city_id,t4.city_name,t4.region_code,t4.region_name,t4.is_day_clear,t3.member_type,t3.member_num,cast(t3.sale_amount as decimal(27,2)),t3.order_num,t1.month_trade_date as dt from t3 cross join t1-- 注意: 一定要检查自己的dwd_dim_store_i分区目录,此处填写自己的分区目录时间left join hive.dim.dwd_dim_store_i t4 on t3.store_no t4.store_no and t4.dt 2023-11-23; 会员复购统计天表 建表语句: CREATE TABLE IF NOT EXISTS ads.ads_mem_repurchase_day_i(trade_date STRING COMMENT 统计时间,store_no STRING COMMENT 店铺编码,store_name STRING COMMENT 店铺名称,store_sale_type BIGINT COMMENT 店铺销售类型,store_type_code BIGINT COMMENT 分店类型,city_id BIGINT COMMENT 城市ID,city_name STRING COMMENT 城市名称,region_code STRING COMMENT 区域编码,region_name STRING COMMENT 区域名称,is_day_clear BIGINT COMMENT 是否日清:0否,1是,member_count BIGINT COMMENT 统计日期用户量,next_member_count_1 BIGINT COMMENT 一日后用户量,next_member_count_2 BIGINT COMMENT 二日后用户量,next_member_count_3 BIGINT COMMENT 三日后用户量,next_member_count_4 BIGINT COMMENT 四日后用户量,next_member_count_5 BIGINT COMMENT 五日后用户量,next_member_count_6 BIGINT COMMENT 六日后用户量 ) comment 会员复购统计天表 partitioned by (dt STRING COMMENT 消费日期) row format delimited fields terminated by , stored as orc tblproperties (orc.compressSNAPPY); 数据导入: 复购是一个非常重要的指标用来衡量客户的粘性。这个需求需要统计当天下单的用户一日、二日到六日的复购情况为了方便后续使用这里不直接统计复购率而是统计人数。所以这个需求需要统计的指标为统计日期用户量、一日后用户量、二日后用户量、三日后用户量、四日后用户量、五日后用户量、六日后用户量。本需求类似于计算留存也就是统计当天的用户在1日、2日、3日。。。之后是否再次购买。 使用dwm_mem_member_behavior_day_i表进行计算。因为需求中最多需要计算六日后的用户量所以当天中的这些消费用户需要6天之后才能拿到所有的数据1日后2日后...6日后。换个角度看只有6天前的数据才会稳定6天内的分区对应的数据每天都要进行更新所以每天要更新6个分区的数据。如果计算n天后的复购人数其实就是用第一天的会员与第n天的会员进行关联这里使用左关联关联条件为会员id以及日期 能关联上的即是复购的用户然后再count()则可得到相应数值。 代码实现: -- 六天前消费用户和往后每一天的复购情况--会员主题: ADS层 会员复购天表 -- 需求: 计算某一天及相对于第一天往后六天每天的复购的人数 -- 思路: 首先知道6天前的那一天的所有的消费用户 基于这个结果 left join 往后1天的所有消费用户 left join 往后2天的所有消费用户 ...往后6天的所有消费用户 selectdate_format(date_add(day,-6,date 2023-09-20),%Y-%m-%d) as trade_date,s.store_no,s.store_name,s.store_sale_type,s.store_type_code,s.city_id,s.city_name,s.region_code,s.region_name,s.is_day_clear,count(day0.zt_id) as member_count,count(day1.zt_id) as next_member_count_1,count(day2.zt_id) as next_member_count_2,count(day3.zt_id) as next_member_count_3,count(day4.zt_id) as next_member_count_4,count(day5.zt_id) as next_member_count_5,count(day6.zt_id) as next_member_count_6,date_format(date_add(day,-6,date 2023-09-20),%Y-%m-%d) as dt from (selectt.zt_id,t.bind_md,t.dt as after,tt.days_after1,tt.days_after2,tt.days_after3,tt.days_after4,tt.days_after5,tt.days_after6from hive.dwm.dwm_mem_member_behavior_day_i t left join hive.dim.dwd_dim_date_f tt on t.trade_date tt.trade_datewhere dt date_format(date_add(day,-6,date 2023-09-20),%Y-%m-%d) and consume_times0 ) as day0 left join (selectzt_id,bind_md,dt as after1from hive.dwm.dwm_mem_member_behavior_day_iwhere dt date_format(date_add(day,-5,date 2023-09-20),%Y-%m-%d) and consume_times0 ) day1 on day0.days_after1 day1.after1 and day0.zt_id day1.zt_id left join (selectzt_id,bind_md,dt as after2from hive.dwm.dwm_mem_member_behavior_day_iwhere dt date_format(date_add(day,-4,date 2023-09-20),%Y-%m-%d) and consume_times0 ) day2 on day0.days_after2 day2.after2 and day0.zt_id day2.zt_id left join (selectzt_id,bind_md,dt as after3from hive.dwm.dwm_mem_member_behavior_day_iwhere dt date_format(date_add(day,-3,date 2023-09-20),%Y-%m-%d) and consume_times0 ) day3 on day0.days_after3 day3.after3 and day0.zt_id day3.zt_id left join (selectzt_id,bind_md,dt as after4from hive.dwm.dwm_mem_member_behavior_day_iwhere dt date_format(date_add(day,-2,date 2023-09-20),%Y-%m-%d) and consume_times0 ) day4 on day0.days_after4 day4.after4 and day0.zt_id day4.zt_id left join (selectzt_id,bind_md,dt as after5from hive.dwm.dwm_mem_member_behavior_day_iwhere dt date_format(date_add(day,-1,date 2023-09-20),%Y-%m-%d) and consume_times0 ) day5 on day0.days_after5 day5.after5 and day0.zt_id day5.zt_id left join (selectzt_id,bind_md,dt as after6from hive.dwm.dwm_mem_member_behavior_day_iwhere dt 2023-09-20 and consume_times0 ) day6 on day0.days_after6 day6.after6 and day0.zt_id day6.zt_id join hive.dim.dwd_dim_store_i s on day0.bind_md s.store_no and s.dt 2023-09-24 group bys.store_no,s.store_name,s.store_sale_type,s.store_type_code,s.city_id,s.city_name,s.region_code,s.region_name,s.is_day_clear;另一种写法: 直接计算出 6天 及其每一天和后面六天的数据 --会员主题: ADS层 会员复购天表 -- 需求: 计算某一天及相对于第一天往后六天每天的复购的人数 -- 思路: 首先知道6天前的那一天的所有的消费用户 基于这个结果 left join 往后1天的所有消费用户 left join 往后2天的所有消费用户 ...往后6天的所有消费用户 selectday0.after as trade_date,s.store_no,s.store_name,s.store_sale_type,s.store_type_code,s.city_id,s.city_name,s.region_code,s.region_name,s.is_day_clear,count(day0.zt_id) as member_count,count(day1.zt_id) as next_member_count_1,count(day2.zt_id) as next_member_count_2,count(day3.zt_id) as next_member_count_3,count(day4.zt_id) as next_member_count_4,count(day5.zt_id) as next_member_count_5,count(day6.zt_id) as next_member_count_6,date_format(date_add(day,-6,date 2023-09-20),%Y-%m-%d) as dt from (-- 获取 统计日期前6天的所有的消费数据selectt.zt_id,t.bind_md,t.dt as after,tt.days_after1,tt.days_after2,tt.days_after3,tt.days_after4,tt.days_after5,tt.days_after6from hive.dwm.dwm_mem_member_behavior_day_i t left join hive.dim.dwd_dim_date_f tt on t.trade_date tt.trade_datewhere dt date_format(date_add(day,-6,date 2023-09-20),%Y-%m-%d)and dt 2023-09-20and consume_times0 ) as day0 left join (-- 获取 统计日期前5天和 后1天的的所有的消费数据selectzt_id,bind_md,dt as after1from hive.dwm.dwm_mem_member_behavior_day_iwhere dt date_format(date_add(day,-5,date 2023-09-20),%Y-%m-%d)and dt date_format(date_add(day,1,date 2023-09-20),%Y-%m-%d)and consume_times0 ) day1 on day0.days_after1 day1.after1 and day0.zt_id day1.zt_id left join (-- 获取 统计日期前4天和 后2天的的所有的消费数据selectzt_id,bind_md,dt as after2from hive.dwm.dwm_mem_member_behavior_day_iwhere dt date_format(date_add(day,-4,date 2023-09-20),%Y-%m-%d)and dt date_format(date_add(day,2,date 2023-09-20),%Y-%m-%d)and consume_times0 ) day2 on day0.days_after2 day2.after2 and day0.zt_id day2.zt_id left join (-- 获取 统计日期前3天和 后3天的的所有的消费数据selectzt_id,bind_md,dt as after3from hive.dwm.dwm_mem_member_behavior_day_iwhere dt date_format(date_add(day,-3,date 2023-09-20),%Y-%m-%d)and dt date_format(date_add(day,3,date 2023-09-20),%Y-%m-%d)and consume_times0 ) day3 on day0.days_after3 day3.after3 and day0.zt_id day3.zt_id left join (-- 获取 统计日期前2天和 后4天的的所有的消费数据selectzt_id,bind_md,dt as after4from hive.dwm.dwm_mem_member_behavior_day_iwhere dt date_format(date_add(day,-2,date 2023-09-20),%Y-%m-%d)and dt date_format(date_add(day,4,date 2023-09-20),%Y-%m-%d)and consume_times0 ) day4 on day0.days_after4 day4.after4 and day0.zt_id day4.zt_id left join (-- 获取 统计日期前1天和 后5天的的所有的消费数据selectzt_id,bind_md,dt as after5from hive.dwm.dwm_mem_member_behavior_day_iwhere dt date_format(date_add(day,-1,date 2023-09-20),%Y-%m-%d)and dt date_format(date_add(day,5,date 2023-09-20),%Y-%m-%d)and consume_times0 ) day5 on day0.days_after5 day5.after5 and day0.zt_id day5.zt_id left join (-- 获取 统计日期后6天的的所有的消费数据selectzt_id,bind_md,dt as after6from hive.dwm.dwm_mem_member_behavior_day_iwhere dt 2023-09-20and dt date_format(date_add(day,6,date 2023-09-20),%Y-%m-%d)and consume_times0 ) day6 on day0.days_after6 day6.after6 and day0.zt_id day6.zt_id join hive.dim.dwd_dim_store_i s on day0.bind_md s.store_no group byday0.after,s.store_no,s.store_name,s.store_sale_type,s.store_type_code,s.city_id,s.city_name,s.region_code,s.region_name,s.is_day_clear;
http://wiki.neutronadmin.com/news/38310/

相关文章:

  • 青海省住房和城乡建设局网站佛山英文网站建设
  • 工信部网站登陆嘉兴企业网站开发
  • 做网站好迷茫怎样调试 wordpress
  • 云虚服务器网站建设微信公众号人工咨询电话
  • 免费网站建站百度云怎么用wordpress建电商网站
  • 用ps做网站广告图k8s部署wordpress
  • 整站建设和网站优化拍卖公司宣传册设计样本
  • 具有品牌的做pc端网站打开网站乱码怎么做
  • 网页制作与网站建设策划书案例微餐饮网站建设用途
  • 烟台H5网站设计公司莱特币做空国外网站
  • 国内网站制作欣赏网站建设公司销售技巧
  • 如何自己创建一个网址seo发外链的网站
  • 润滑油 东莞网站建设网站快速备案被退回的几种原因分析
  • 东莞seo建站排名软文推广是什么意思
  • 扬中网站建设策划朋友圈广告代理商官网
  • 南阳做网站推广建一个网页需要多少钱
  • 北京飞雨网站建设公司wordpress dokan
  • 查外链网站wordpress5.2自动保存
  • 网站开发销售单页展示网站
  • 商品网站建设实验记录公司名字大全 必过
  • 网站的百度推广怎么做嵌入式软件开发工程师工作内容
  • 手机网站页面范例企业宣传册ppt模板
  • 免费制作动画网站成化区建设局网站
  • 全平台开发网站及app博客网站开发背景
  • 山东川畅信息技术有限公司网站建设网站怎么加代码
  • 郑州市建设局官方网站wordpress php
  • 网站快照历史各学院二级网站建设通报
  • 网站报价单申请网页要多少钱
  • 重庆网站运营购物网站功能
  • 网站中 点击出现登录框怎么做网站上怎么做弹幕效果