# Pandas advanced processing

Dataset: https://download.csdn.net/download/weixin_44827418/12548095

Contents: missing-value handling, data discretization, merging, crosstabs and pivot tables, grouping and aggregation, and a comprehensive case study.

## Outline

**4.6 Advanced processing: missing values**

- How to handle missing values. There are two approaches:
  1. drop the samples that contain missing values;
  2. replace (impute) the missing values.
- 4.6.1 How to handle NaN
  1. Check whether the data contains NaN: `pd.isnull(df)`, `pd.notnull(df)`
  2. Drop the samples that contain missing values with `df.dropna(inplace=False)`, or replace/impute them with `df.fillna(value, inplace=False)`
- 4.6.2 Missing values that are not NaN but use a default marker
  1. Replace the marker (for example `?`) with `np.nan`: `df.replace(to_replace="?", value=np.nan)`
  2. Then handle the `np.nan` values with the steps above.
- Worked example of missing-value handling

**4.7 Advanced processing: data discretization**

Encoding a category as 1, 2, 3 implies an order between the categories that does not exist. One-hot encoding (dummy variables) instead gives each category its own 0/1 column:

| Sample | Sex | Age |
|---|---|---|
| A | 1 | 23 |
| B | 2 | 30 |
| C | 1 | 18 |

| Sample | Species | Hair |
|---|---|---|
| A | 1 | 2 |
| B | 2 | 1 |
| C | 3 | 1 |

After one-hot encoding:

| Sample | Male | Female | Age |
|---|---|---|---|
| A | 1 | 0 | 23 |
| B | 0 | 1 | 30 |
| C | 1 | 0 | 18 |

| Sample | Dog | Pig | Mouse | Hair |
|---|---|---|---|---|
| A | 1 | 0 | 0 | 2 |
| B | 0 | 1 | 0 | 1 |
| C | 0 | 0 | 1 | 1 |

- 4.7.1 What is data discretization? Example raw height data: 165, 174, 160, 180, 159, 163, 192, 184
- 4.7.2 Why discretize?
- 4.7.3 How to discretize
  1. Group the data. Automatic grouping: `sr = pd.qcut(data, q)`. Custom grouping: `sr = pd.cut(data, [])`, passing a list of bin edges.
  2. Convert the grouped result to one-hot encoding: `pd.get_dummies(sr, prefix=...)`

**4.8 Advanced processing: merging**

- NumPy: `np.concatenate((a, b), axis=...)`; horizontal stacking `np.hstack()`; vertical stacking `np.vstack()`
- Concatenate along an axis: `pd.concat([data1, data2], axis=1)`
- Merge on keys with `pd.merge`: `pd.merge(left, right, how="inner", on=[key columns])`

**4.9 Advanced processing: crosstabs and pivot tables**

Used to find and explore the relationship between two variables.

- 4.9.1 What are crosstabs and pivot tables for?
- 4.9.2 Using `crosstab`: `pd.crosstab(value1, value2)`
- 4.9.3 `pivot_table`

**4.10 Advanced processing: grouping and aggregation**

- 4.10.1 What is grouping and aggregation?
- 4.10.2 Grouping and aggregation API: available on both DataFrame and Series

## 4.6 Advanced processing: missing values

### 4.6.1 Handling NaN

```python
import pandas as pd

movie = pd.read_csv("./datas/IMDB-Movie-Data.csv")
movie
```

Output: a DataFrame of 1000 rows × 12 columns (Rank, Title, Genre, Description, Director, Actors, Year, Runtime (Minutes), Rating, Votes, Revenue (Millions), Metascore). Some values are missing. For example, Revenue (Millions) is NaN for rows 995 (Secret in Their Eyes) and 998 (Search Party).
```python
import numpy as np

# 1. Check for NaN-type missing values: True marks a missing value
movie.isnull()   # 1000 rows × 12 columns of True/False
```

Rows 995 and 998, for example, are True in the Revenue (Millions) column.

```python
# any() returns True as soon as one element is True,
# so True here means the data contains missing values
np.any(movie.isnull())       # True

# With notnull(), False marks a missing value
pd.notnull(movie)            # 1000 rows × 12 columns of True/False

# all() returns False as soon as one element is False,
# so False here means the data contains missing values
np.all(pd.notnull(movie))    # False
```

Checking column by column shows which fields contain missing values:

```python
pd.isnull(movie).any()
```

```
Rank                  False
Title                 False
Genre                 False
Description           False
Director              False
Actors                False
Year                  False
Runtime (Minutes)     False
Rating                False
Votes                 False
Revenue (Millions)     True
Metascore              True
dtype: bool
```

`pd.notnull(movie).all()` gives the complementary result. It is True for every column except Revenue (Millions) and Metascore, which are False.
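You can try the same checks on a frame small enough to inspect by eye. This sketch uses a made-up three-row DataFrame. The column names and values are illustrative and do not come from the IMDB file. It also uses `isnull().sum()`, which counts the missing values in each column instead of only flagging them.

```python
import numpy as np
import pandas as pd

# Made-up data: one NaN in "rating", one in "revenue", none in "title"
df = pd.DataFrame({
    "rating": [8.1, 7.0, np.nan],
    "revenue": [333.13, np.nan, 138.12],
    "title": ["A", "B", "C"],
})

mask = df.isnull()                       # True where a value is missing
per_column = mask.any()                  # does each column contain a NaN?
has_missing = bool(mask.any().any())     # does the whole frame contain a NaN?
complete = df.notnull().all()            # True only for columns with no NaN
missing_counts = mask.sum()              # number of NaN values per column

print(per_column)
print(missing_counts)
```

`mask.any().any()` first reduces each column and then reduces the resulting Series, so it always yields a single boolean for the whole frame.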
### Handling the missing values

```python
# Method 1: drop the samples that contain missing values
movie_full = movie.dropna()
movie_full.isnull().any()   # False for every column
```

```python
# Method 2: replace the missing values
movie.head()

movie["Revenue (Millions)"].mean()   # 82.95637614678897

# Fields that contain missing values (False in notnull().all()):
#   Revenue (Millions), Metascore
# Fill each one with its column mean and assign the result back to the column.
# Calling movie[col].fillna(..., inplace=True) is chained assignment. Recent
# pandas versions warn about it, and under Copy-on-Write it leaves movie unchanged.
movie["Revenue (Millions)"] = movie["Revenue (Millions)"].fillna(movie["Revenue (Millions)"].mean())
movie["Revenue (Millions)"].isnull().any()   # False

movie["Metascore"] = movie["Metascore"].fillna(movie["Metascore"].mean())
movie["Metascore"].isnull().any()            # False

# All missing values have now been handled
movie.isnull().any()                         # False for every column
```
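Here is a compact sketch of both methods on made-up data. Instead of one `fillna` call per column, it fills every numeric column at once by passing a Series of column means to `fillna`. The frame and its values are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up data: revenue has one gap, metascore has one gap
df = pd.DataFrame({
    "revenue": [10.0, np.nan, 30.0, 50.0],
    "metascore": [60.0, 80.0, np.nan, 70.0],
    "title": ["a", "b", "c", "d"],
})

# Method 1: drop every row that has at least one NaN (returns a new frame)
dropped = df.dropna()

# Method 2: fill each numeric column with its own mean in one call.
# mean(numeric_only=True) skips NaN and ignores the text column;
# fillna() with a Series fills each column using the value under its label.
col_means = df.mean(numeric_only=True)   # revenue 30.0, metascore 70.0
filled = df.fillna(col_means)

print(dropped)
print(filled)
```

`dropna` and `fillna` return new objects here, so `df` itself still contains its two NaN values.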
### Missing values marked with something other than NaN

In this dataset, some missing values are marked with `?` instead of NaN.

```python
data = pd.read_csv("./datas/GBvideos.csv", encoding="GBK")
data
```

Output: 1600 rows × 11 columns (video_id, title, channel_title, category_id, tags, views, likes, dislikes, comment_total, thumbnail_link, date). In the last row (index 1599, "The Child in Time: Trailer - BBC One"), dislikes is `?`.

```python
# 1. Replace "?" with np.nan
new_data = data.replace(to_replace="?", value=np.nan)
new_data
```

Output: the same 1600 rows × 11 columns, with `NaN` where `?` was (dislikes in row 1599). `replace` matches whole cell values, so question marks inside titles such as "Baby Name Challenge?" are left unchanged.
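Here is a self-contained sketch of the marker-replacement step. It uses a few made-up CSV lines in the same style as GBvideos.csv and reads them from memory with `io.StringIO`. It also shows two follow-ups: converting the cleaned column back to numbers, and letting `read_csv` treat `?` as missing from the start through its `na_values` parameter.

```python
import io

import numpy as np
import pandas as pd

# Made-up CSV text in which "?" marks a missing dislikes value
csv_text = """video_id,title,dislikes
a1,Baby Name Challenge?,12
b2,Weezer - Beach Boys,?
c3,What Do You Want to Eat??,40
"""

data = pd.read_csv(io.StringIO(csv_text))

# replace() matches whole cell values, so the "?" inside titles is kept
new_data = data.replace(to_replace="?", value=np.nan)

# The "?" made pandas read dislikes as text (object dtype);
# convert the column to numbers once the marker is gone
new_data["dislikes"] = pd.to_numeric(new_data["dislikes"])

# Alternative: declare "?" as a missing-value marker while reading
direct = pd.read_csv(io.StringIO(csv_text), na_values="?")

print(new_data.dtypes)
print(direct)
```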
```python
new_data.isnull().any()
```

Only `dislikes` is True, which shows that the `?` values in that column have been replaced with NaN. The NaN rows can then be handled as in 4.6.1, for example by dropping them:

```python
new_data.dropna(inplace=True)
new_data.isnull().any()   # False for every column
```

## 4.7 Advanced processing: data discretization

```python
import pandas as pd

# Prepare the data
data = pd.Series(
    [165, 174, 160, 180, 159, 163, 192, 184],
    index=["No1:165", "No2:174", "No3:160", "No4:180",
           "No5:159", "No6:163", "No7:192", "No8:184"],
)
data
```

### Automatic grouping

```python
# 1. Group the data
# Automatic grouping: qcut(data, number_of_groups) splits at sample quantiles,
# so each group holds roughly the same number of values
sr = pd.qcut(data, 3)
sr
```

```
No1:165      (163.667, 178.0]
No2:174      (163.667, 178.0]
No3:160    (158.999, 163.667]
No4:180        (178.0, 192.0]
No5:159    (158.999, 163.667]
No6:163    (158.999, 163.667]
No7:192        (178.0, 192.0]
No8:184        (178.0, 192.0]
dtype: category
Categories (3, interval[float64]): [(158.999, 163.667] < (163.667, 178.0] < (178.0, 192.0]]
```

```python
# Check how many values fall into each group
sr.value_counts()
```

```
(178.0, 192.0]        3
(158.999, 163.667]    3
(163.667, 178.0]      2
dtype: int64
```

`type(sr)` is `pandas.core.series.Series`.
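To see where the interval edges such as 163.667 come from, the sketch below asks `qcut` for its bin edges with `retbins=True` and compares them with the 0, 1/3, 2/3 and 1 sample quantiles of the same heights. The `labels=` names "short", "medium" and "tall" are illustrative.

```python
import numpy as np
import pandas as pd

heights = pd.Series([165, 174, 160, 180, 159, 163, 192, 184])

# retbins=True also returns the bin edges that qcut computed
groups, edges = pd.qcut(heights, 3, retbins=True)

# The edges are the 0, 1/3, 2/3 and 1 sample quantiles
expected = heights.quantile([0, 1 / 3, 2 / 3, 1]).to_numpy()

# labels= replaces the interval labels with names (illustrative labels)
named = pd.qcut(heights, 3, labels=["short", "medium", "tall"])
counts = named.value_counts()

print(edges)
print(counts)
```

The printed intervals start at 158.999 rather than 159 because pandas lowers the first label edge slightly so that the minimum value is included. The returned `edges` keep the exact quantile values.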
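```python
# 2. Convert the grouped result into one-hot encoding
# prefix sets the prefix of the new column names;
# dtype=int keeps 0/1 values (pandas 2.x returns True/False by default)
pd.get_dummies(sr, prefix="height", dtype=int)
```

|  | height_(158.999, 163.667] | height_(163.667, 178.0] | height_(178.0, 192.0] |
|---|---|---|---|
| No1:165 | 0 | 1 | 0 |
| No2:174 | 0 | 1 | 0 |
| No3:160 | 1 | 0 | 0 |
| No4:180 | 0 | 0 | 1 |
| No5:159 | 1 | 0 | 0 |
| No6:163 | 1 | 0 | 0 |
| No7:192 | 0 | 0 | 1 |
| No8:184 | 0 | 0 | 1 |

### Custom grouping

```python
# Custom grouping: pd.cut(data, list of all bin edges)
sr = pd.cut(data, [150, 165, 180, 195])
sr
```

```
No1:165    (150, 165]
No2:174    (165, 180]
No3:160    (150, 165]
No4:180    (165, 180]
No5:159    (150, 165]
No6:163    (150, 165]
No7:192    (180, 195]
No8:184    (180, 195]
dtype: category
Categories (3, interval[int64]): [(150, 165] < (165, 180] < (180, 195]]
```

```python
sr.value_counts()
```

```
(150, 165]    4
(180, 195]    2
(165, 180]    2
dtype: int64
```

```python
pd.get_dummies(sr, prefix="height", dtype=int)
```

|  | height_(150, 165] | height_(165, 180] | height_(180, 195] |
|---|---|---|---|
| No1:165 | 1 | 0 | 0 |
| No2:174 | 0 | 1 | 0 |
| No3:160 | 1 | 0 | 0 |
| No4:180 | 0 | 1 | 0 |
| No5:159 | 1 | 0 | 0 |
| No6:163 | 1 | 0 | 0 |
| No7:192 | 0 | 0 | 1 |
| No8:184 | 0 | 0 | 1 |

## 4.8 Advanced processing: merging

### 4.8.1 Merging with pd.concat (concatenating along an axis)

```python
import numpy as np
import pandas as pd

data1 = pd.DataFrame(np.arange(0, 20, 1).reshape(4, 5))     # values 0..19
data2 = pd.DataFrame(np.arange(100, 120, 1).reshape(4, 5))  # values 100..119

# Concatenate data1 and data2 horizontally
data_concat = pd.concat([data1, data2], axis=1)
data_concat
```

|  | 0 | 1 | 2 | 3 | 4 | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 2 | 3 | 4 | 100 | 101 | 102 | 103 | 104 |
| 1 | 5 | 6 | 7 | 8 | 9 | 105 | 106 | 107 | 108 | 109 |
| 2 | 10 | 11 | 12 | 13 | 14 | 110 | 111 | 112 | 113 | 114 |
| 3 | 15 | 16 | 17 | 18 | 19 | 115 | 116 | 117 | 118 | 119 |

```python
data2.T   # 5 rows × 4 columns

# Concatenate data1 and the transpose of data2 vertically
data_concat1 = pd.concat([data1, data2.T], axis=0)
data_concat1
```

data2.T has no column 4, so its rows get NaN there, and column 4 becomes float:

|  | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | 0 | 1 | 2 | 3 | 4.0 |
| 1 | 5 | 6 | 7 | 8 | 9.0 |
| 2 | 10 | 11 | 12 | 13 | 14.0 |
| 3 | 15 | 16 | 17 | 18 | 19.0 |
| 0 | 100 | 105 | 110 | 115 | NaN |
| 1 | 101 | 106 | 111 | 116 | NaN |
| 2 | 102 | 107 | 112 | 117 | NaN |
| 3 | 103 | 108 | 113 | 118 | NaN |
| 4 | 104 | 109 | 114 | 119 | NaN |

### 4.8.2 Merging with pd.merge (joining on keys)

```python
left = pd.DataFrame({"key1": ["K0", "K0", "K1", "K2"],
                     "key2": ["K0", "K1", "K0", "K1"],
                     "A": ["A0", "A1", "A2", "A3"],
                     "B": ["B0", "B1", "B2", "B3"]})

right = pd.DataFrame({"key1": ["K0", "K1", "K1", "K2"],
                      "key2": ["K0", "K0", "K0", "K0"],
                      "C": ["C0", "C1", "C2", "C3"],
                      "D": ["D0", "D1", "D2", "D3"]})

# Inner join (the default): keep only the keys present in both tables
result = pd.merge(left, right, on=["key1", "key2"], how="inner")
result
```

|  | key1 | key2 | A | B | C | D |
|---|---|---|---|---|---|---|
| 0 | K0 | K0 | A0 | B0 | C0 | D0 |
| 1 | K1 | K0 | A2 | B2 | C1 | D1 |
| 2 | K1 | K0 | A2 | B2 | C2 | D2 |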
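The inner join above turns two matching key pairs into three rows, because (K1, K0) matches two rows on the right. The sketch below checks this on trimmed copies of the same `left`/`right` frames, with columns B and D dropped for brevity. It also uses `merge`'s `indicator=True` option to label which table each row of an outer join came from.

```python
import pandas as pd

left = pd.DataFrame({"key1": ["K0", "K0", "K1", "K2"],
                     "key2": ["K0", "K1", "K0", "K1"],
                     "A": ["A0", "A1", "A2", "A3"]})
right = pd.DataFrame({"key1": ["K0", "K1", "K1", "K2"],
                      "key2": ["K0", "K0", "K0", "K0"],
                      "C": ["C0", "C1", "C2", "C3"]})

# Inner join: only (K0, K0) and (K1, K0) occur in both tables;
# (K1, K0) matches two right-hand rows, so it appears twice
inner = pd.merge(left, right, on=["key1", "key2"], how="inner")

# indicator=True adds a "_merge" column: left_only, right_only or both
outer = pd.merge(left, right, on=["key1", "key2"], how="outer", indicator=True)
counts = outer["_merge"].value_counts()

print(inner)
print(counts)
```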
```python
# Left join: keep every key from the left table; the left table drives the merge
result_left = pd.merge(left, right, on=["key1", "key2"], how="left")
result_left
```

|  | key1 | key2 | A | B | C | D |
|---|---|---|---|---|---|---|
| 0 | K0 | K0 | A0 | B0 | C0 | D0 |
| 1 | K0 | K1 | A1 | B1 | NaN | NaN |
| 2 | K1 | K0 | A2 | B2 | C1 | D1 |
| 3 | K1 | K0 | A2 | B2 | C2 | D2 |
| 4 | K2 | K1 | A3 | B3 | NaN | NaN |

```python
# Right join: keep every key from the right table; the right table drives the merge
result_right = pd.merge(left, right, on=["key1", "key2"], how="right")
result_right
```

|  | key1 | key2 | A | B | C | D |
|---|---|---|---|---|---|---|
| 0 | K0 | K0 | A0 | B0 | C0 | D0 |
| 1 | K1 | K0 | A2 | B2 | C1 | D1 |
| 2 | K1 | K0 | A2 | B2 | C2 | D2 |
| 3 | K2 | K0 | NaN | NaN | C3 | D3 |

```python
# Outer join: keep every key from both tables
result_outer = pd.merge(left, right, on=["key1", "key2"], how="outer")
result_outer
```

|  | key1 | key2 | A | B | C | D |
|---|---|---|---|---|---|---|
| 0 | K0 | K0 | A0 | B0 | C0 | D0 |
| 1 | K0 | K1 | A1 | B1 | NaN | NaN |
| 2 | K1 | K0 | A2 | B2 | C1 | D1 |
| 3 | K1 | K0 | A2 | B2 | C2 | D2 |
| 4 | K2 | K1 | A3 | B3 | NaN | NaN |
| 5 | K2 | K0 | NaN | NaN | C3 | D3 |

## 4.9 Advanced processing: crosstabs and pivot tables

Crosstabs and pivot tables are used to explore the relationship between two variables.

### 4.9.2 Using crosstab

```python
data = pd.read_excel("./datas/szfj_baoan.xls")
data
```

Output: 1251 rows × 9 columns (district, roomnum, hall, AREA, C_floor, floor_num, school, subway, per_price).

A short aside on pandas date types:

```python
time = "2020-06-23"

# Convert the string to a pandas Timestamp
date = pd.to_datetime(time)   # Timestamp('2020-06-23 00:00:00')
type(date)                    # pandas._libs.tslibs.timestamps.Timestamp
date.year                     # 2020
date.month                    # 6

# weekday() is a method (Monday = 0), so 2020-06-23, a Tuesday, gives 1
data["week"] = date.weekday()
data.drop("week", axis=1, inplace=True)
```

```python
# Mark whether the price per square metre is above 5.0 (50,000)
data["feature"] = np.where(data["per_price"] > 5.0000, 1, 0)
data   # 1251 rows × 10 columns
```

```python
# Crosstab: relationship between the floor and whether the price per square
# metre is above 50,000. For each floor it counts the rows with feature 0 and 1
data0 = pd.crosstab(data["floor_num"], data["feature"])
data0
```

First rows of the output (columns are the values of `feature`):

| floor_num | 0 | 1 |
|---|---|---|
| 1 | 6 | 8 |
| 3 | 0 | 1 |
| 4 | 0 | 10 |
| 6 | 3 | 7 |
| 7 | 16 | 25 |
| ... | ... | ... |
| 53 | 0 | 1 |

```python
data0.sum(axis=1)   # row totals: 1 → 14, 3 → 1, 4 → 10, 6 → 10, 7 → 41, ..., 53 → 1
```

```python
# Divide each row by its row total to get proportions
data0.div(data0.sum(axis=1), axis=0)
```

| floor_num | 0 | 1 |
|---|---|---|
| 1 | 0.428571 | 0.571429 |
| 3 | 0.000000 | 1.000000 |
| 4 | 0.000000 | 1.000000 |
| 6 | 0.300000 | 0.700000 |
| 7 | 0.390244 | 0.609756 |
| ... | ... | ... |
| 53 | 0.000000 | 1.000000 |

```python
data_percent = data0.div(data0.sum(axis=1), axis=0)
```
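`pd.crosstab` can also produce the row proportions directly through its `normalize="index"` option. The sketch below compares that option with the divide-by-row-total approach on a few made-up rows. The floors and 0/1 flags are hypothetical, not taken from szfj_baoan.xls.

```python
import numpy as np
import pandas as pd

# Made-up rows: floor number and whether the price is above the threshold (1) or not (0)
df = pd.DataFrame({
    "floor_num": [1, 1, 1, 2, 2, 3],
    "feature":   [0, 1, 1, 1, 1, 0],
})

counts = pd.crosstab(df["floor_num"], df["feature"])

# Row-wise proportions as in the notes: divide each row by its row total
manual = counts.div(counts.sum(axis=1), axis=0)

# The same table in one call
direct = pd.crosstab(df["floor_num"], df["feature"], normalize="index")

print(direct)
```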
```python
data_percent   # the same proportions as above

# stacked=True stacks the 0 and 1 bars of each floor into a single bar
data_percent.plot(kind="bar", stacked=True)
```

### 4.9.3 Using pivot_table

A pivot table makes the whole process simpler. Its default aggregation is the mean, and the mean of a 0/1 column is the share of 1s, so the result is directly the proportion of rows whose `feature` is 1:

```python
data.pivot_table(["feature"], index=["floor_num"])
```

| floor_num | feature |
|---|---|
| 1 | 0.571429 |
| 3 | 1.000000 |
| 4 | 1.000000 |
| 6 | 0.700000 |
| ... | ... |
| 50 | 1.000000 |
| 51 | 1.000000 |
| 52 | 1.000000 |
| 53 | 1.000000 |
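Here is a quick check of the claim that the pivot table's mean equals the share of 1s. It computes the same numbers three ways on made-up rows: `pivot_table`, an explicit `groupby().mean()`, and the `1` column of a row-normalized crosstab. The data is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "floor_num": [1, 1, 1, 2, 2, 3],
    "feature":   [0, 1, 1, 1, 1, 0],
})

# pivot_table aggregates with the mean by default
pivot = df.pivot_table(values="feature", index="floor_num")

# The same numbers with an explicit groupby
share = df.groupby("floor_num")["feature"].mean()

# The feature == 1 column of a row-normalized crosstab
ct = pd.crosstab(df["floor_num"], df["feature"], normalize="index")[1]

print(pivot)
```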
## 4.10 Advanced processing: grouping and aggregation

### 4.10.2 The grouping and aggregation API

```python
col = pd.DataFrame({"color": ["white", "red", "green", "red", "green"],
                    "object": ["pen", "pencil", "pencil", "ashtray", "pen"],
                    "price1": [4.56, 4.20, 1.30, 0.56, 2.75],
                    "price2": [4.75, 4.12, 1.68, 0.75, 3.15]})
col
```

|  | color | object | price1 | price2 |
|---|---|---|---|---|
| 0 | white | pen | 4.56 | 4.75 |
| 1 | red | pencil | 4.20 | 4.12 |
| 2 | green | pencil | 1.30 | 1.68 |
| 3 | red | ashtray | 0.56 | 0.75 |
| 4 | green | pen | 2.75 | 3.15 |

```python
# Group by color and aggregate price1 (maximum per group)

# DataFrame approach
col.groupby(by="color")["price1"].max()

# Series approach: group the price1 Series by the color Series
col["price1"].groupby(col["color"])        # a SeriesGroupBy object
col["price1"].groupby(col["color"]).max()
```

Both approaches give:

```
color
green    2.75
red      4.20
white    4.56
Name: price1, dtype: float64
```
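A groupby is not limited to one statistic. This sketch reuses the `col` data from above and computes several aggregations at once with `agg`. It also uses named aggregation, where each output column name maps to a (column, function) pair. The output names `max_price1`, `mean_price2` and `items` are illustrative choices.

```python
import pandas as pd

col = pd.DataFrame({
    "color": ["white", "red", "green", "red", "green"],
    "object": ["pen", "pencil", "pencil", "ashtray", "pen"],
    "price1": [4.56, 4.20, 1.30, 0.56, 2.75],
    "price2": [4.75, 4.12, 1.68, 0.75, 3.15],
})

# Several statistics of one column per group
summary = col.groupby("color")["price1"].agg(["max", "mean"])

# Named aggregation: output name = (input column, function)
named = col.groupby("color").agg(
    max_price1=("price1", "max"),
    mean_price2=("price2", "mean"),
    items=("object", "count"),
)

print(summary)
print(named)
```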
## 4.11 Comprehensive case study

```python
# 1. Prepare the data
movie = pd.read_csv("./datas/IMDB-Movie-Data.csv")
movie   # 1000 rows × 12 columns
```

Question 1: what are the average rating, the number of directors, and similar summary figures for these movies?

```python
movie["Rating"].mean()   # 6.723200000000003

movie["Director"]
```

Output: the director of each of the 1000 movies (James Gunn, Ridley Scott, M. Night Shyamalan, Christophe Lourdelet, David Ayer, ...), `Name: Director, Length: 1000, dtype: object`.
```python
# np.unique() removes duplicates, because one person may have directed several movies
np.unique(movie["Director"])
```

```
array(['Aamir Khan', 'Abdellatif Kechiche', 'Adam Leon', 'Adam McKay', ...,
       'Yimou Zhang', 'Yorgos Lanthimos', 'Zack Snyder', 'Zackary Adler'], dtype=object)
```

```python
# Number of directors
np.unique(movie["Director"]).size   # 644
```

Question 2: for this movie data, how should we present the distribution of rating and runtime?

```python
movie["Rating"].plot(kind="hist", figsize=(20, 8), fontsize=40)
```

```python
import matplotlib.pyplot as plt

# 1. Create the canvas
plt.figure(figsize=(20, 8), dpi=100)

# 2. Draw a histogram with 20 bins
plt.hist(movie["Rating"], 20)

# Put the x ticks at the 21 bin edges between the lowest and highest rating
plt.xticks(np.linspace(movie["Rating"].min(), movie["Rating"].max(), 21))

# Add a grid
plt.grid(linestyle="--", alpha=0.5)

# 3. Show the figure
plt.show()
```

`movie["Rating"]` itself is a float64 Series of length 1000 (8.1, 7.0, 7.3, 7.2, 6.2, ..., 6.2, 5.5, 6.2, 5.6, 5.3).
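The reason `np.linspace(min, max, 21)` lines up with a 20-bin histogram: with an integer bin count, `np.histogram` (which `plt.hist` uses underneath) splits the range from minimum to maximum into equal-width bins. Twenty bins therefore have exactly those 21 edges. The sketch checks this on the first and last five ratings shown in the printout above, used here as a small stand-in sample.

```python
import numpy as np

# First and last five ratings from the printout, used as a small sample
ratings = np.array([8.1, 7.0, 7.3, 7.2, 6.2, 6.2, 5.5, 6.2, 5.6, 5.3])

# 20 equal-width bins between the minimum and the maximum
counts, edges = np.histogram(ratings, bins=20)

# The 21 tick positions used in the notes
ticks = np.linspace(ratings.min(), ratings.max(), 21)

print(edges)
print(counts)
```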
Question 3: for this movie data, how should we process the data to count the movies in each genre?

```python
# First find out which genres exist
movie_genre = [i.split(",") for i in movie["Genre"]]
movie_genre   # [['Action', 'Adventure', 'Sci-Fi'], ['Adventure', 'Mystery', 'Sci-Fi'], ...]

# Flatten the nested list
[j for i in movie_genre for j in i]   # ['Action', 'Adventure', 'Sci-Fi', 'Adventure', ...]

movie_class = np.unique([j for i in movie_genre for j in i])
movie_class
```

```
array(['Action', 'Adventure', 'Animation', 'Biography', 'Comedy', 'Crime',
       'Drama', 'Family', 'Fantasy', 'History', 'Horror', 'Music',
       'Musical', 'Mystery', 'Romance', 'Sci-Fi', 'Sport', 'Thriller',
       'War', 'Western'], dtype='<U9')
```

```python
len(movie_class)   # 20 genres
```

```python
# Count how many movies belong to each genre.
# First create a table of zeros: one row per movie, one column per genre
count = pd.DataFrame(np.zeros(shape=[1000, 20], dtype="int32"), columns=movie_class)
count.head()   # all zeros

movie_genre[0]                 # ['Action', 'Adventure', 'Sci-Fi']
count.loc[0, movie_genre[0]]   # Action 0, Adventure 0, Sci-Fi 0

# Fill in the table: put a 1 under every genre of every movie
for i in range(1000):
    count.loc[i, movie_genre[i]] = 1
count   # 1000 rows × 20 columns of 0/1
```

```python
# Sum each column: the number of movies per genre (in alphabetical order)
count.sum(axis=0)

# Sorted from most to least common
count.sum(axis=0).sort_values(ascending=False)
```

```
Drama        513
Action       303
Comedy       279
Adventure    259
Thriller     195
Crime        150
Romance      141
Sci-Fi       120
Horror       119
Mystery      106
Fantasy      101
Biography     81
Family        51
Animation     49
History       29
Sport         18
Music         16
War           13
Western        7
Musical        5
dtype: int64
```

```python
count.sum(axis=0).sort_values(ascending=False).plot(kind="bar", fontsize=20, figsize=(20, 9), colormap="cool")
```