A classic and fun Kaggle case: the Titanic problem

Everyone knows the story of Jack and Rose: the luxury liner went down, everyone scrambled in panic, but there were not enough lifeboats for all, and the first officer ordered "ladies and kids first!" So survival was not random; it depended on background, with a clear ranking.

The training and test data consist of passengers' personal information plus survival status. The task is to build a suitable model from it and predict survival for the other passengers.

Yes, this is a binary classification problem, and many classification algorithms can handle it.

Let's see what the data looks like. As usual, load it with pandas:

# This notebook walks through my approach to the Kaggle Titanic problem
%matplotlib inline
import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False

import pandas as pd    # data analysis
import numpy as np     # scientific computing
from pandas import Series, DataFrame

Step 1: read the data and get to know it

data_train = pd.read_csv('Train.csv')   # read the training set from disk
data_train.columns                      # list the attribute columns
# data_train[data_train.Cabin.notnull()]['Survived'].value_counts()

Index([u'PassengerId', u'Survived', u'Pclass', u'Name', u'Sex', u'Age',
       u'SibSp', u'Parch', u'Ticket', u'Fare', u'Cabin', u'Embarked'],
      dtype='object')

So we have roughly these fields:

PassengerId: passenger ID
Pclass: passenger class (1st/2nd/3rd class cabin)
Name: passenger name
Sex: sex
Age: age
SibSp: number of siblings/spouses aboard
Parch: number of parents/children aboard
Ticket: ticket number
Fare: ticket price
Cabin: cabin
Embarked: port of embarkation

Being as lazy as I am, I'll obviously let pandas tell me a few things first:

data_train.info()   # each column's type (numeric vs. categorical) and whether it has missing values

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
PassengerId    891 non-null int64
Survived       891 non-null int64
Pclass         891 non-null int64
Name           891 non-null object
Sex            891 non-null object
Age            714 non-null float64
SibSp          891 non-null int64
Parch          891 non-null int64
Ticket         891 non-null object
Fare           891 non-null float64
Cabin          204 non-null object
Embarked       889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.6 KB

What does this tell us? There are 891 passengers in the training data, but unfortunately some attributes are incomplete, for example:

Age: only 714 passengers have a recorded age
Cabin: known for only 204 passengers

Still a bit thin. Want a peek at the actual numbers? We can get the distributions of the numeric columns as follows (text attributes such as Name, and categorical ones such as Embarked, won't show up here):

data_train.describe()
# summary statistics of the numeric columns: a rough view of their distributions

       PassengerId    Survived      Pclass         Age       SibSp       Parch        Fare
count   891.000000  891.000000  891.000000  714.000000  891.000000  891.000000  891.000000
mean    446.000000    0.383838    2.308642   29.699118    0.523008    0.381594   32.204208
std     257.353842    0.486592    0.836071   14.526497    1.102743    0.806057   49.693429
min       1.000000    0.000000    1.000000    0.420000    0.000000    0.000000    0.000000
25%     223.500000    0.000000    2.000000   20.125000    0.000000    0.000000    7.910400
50%     446.000000    0.000000    3.000000   28.000000    0.000000    0.000000   14.454200
75%     668.500000    1.000000    3.000000   38.000000    1.000000    0.000000   31.000000
max     891.000000    1.000000    3.000000   80.000000    8.000000    6.000000  512.329200

The mean row tells us that about 38.38% of passengers survived, that there are more 2nd/3rd-class passengers than 1st-class, that the average age is about 29.7 (records with missing Age are skipped in this computation), and so on.

"Knowing your data is crucial!" "Knowing your data is crucial!" "Knowing your data is crucial!"

Slogan chanted. The simple summary above isn't actually that useful on its own; we need to analyze the data at a finer grain. Let's see how each attribute (and combinations of attributes) relates to the final Survived outcome.

Step 2: visualize the data to understand it better

import matplotlib.pyplot as plt
fig = plt.figure(figsize=(18, 9))
fig.set(alpha=0.2)   # alpha for the charts

# Only 5 numeric attributes in total, so visualize each against the label
plt.subplot2grid((2, 3), (0, 0))   # several small plots in one figure; this is row 1, col 1
data_train.Survived.value_counts().plot(kind='bar')   # bar chart of survived vs. not
plt.title('Survival (1 = survived)')
plt.ylabel('Count')

plt.subplot2grid((2, 3), (0, 1))   # row 1, col 2
data_train.Pclass.value_counts().plot(kind='bar')   # bar chart of passenger class counts
plt.ylabel('Count')
plt.title('Passenger class distribution')

plt.subplot2grid((2, 3), (0, 2))   # row 1, col 3
plt.scatter(data_train.Survived, data_train.Age)    # scatter of survival vs. age
plt.ylabel('Age')
plt.grid(b=True, which='major', axis='y')
plt.title('Survival by age (1 = survived)')

plt.subplot2grid((2, 3), (1, 0), colspan=2)   # row 2, cols 1-2
data_train.Age[data_train.Pclass == 1].plot(kind='kde')   # kernel density estimate of 1st-class ages
data_train.Age[data_train.Pclass == 2].plot(kind='kde')
data_train.Age[data_train.Pclass == 3].plot(kind='kde')
plt.xlabel('Age')
plt.ylabel('Density')
plt.title('Age distribution by passenger class')
plt.legend(('1st class', '2nd class', '3rd class'), loc='best')

plt.subplot2grid((2, 3), (1, 2))   # row 2, col 3
data_train.Embarked.value_counts().plot(kind='bar')
plt.title('Passengers by port of embarkation')
plt.ylabel('Count')
plt.show()

So we get a figure like the one below. Bingo — charts beat raw numbers. From the plots we can see:

- A bit over 300 people survived, less than half.
- Third-class passengers are by far the most numerous.
- Both victims and survivors span a wide age range.
- The overall age trends in the three classes look similar: 2nd/3rd class peaks in the early twenties, 1st class around forty (which, ahem, roughly matches how wealth tracks age — ignore me, just rambling).
- Embarkation counts decrease in the order S, C, Q, with S far ahead of the other two ports.

At this point some hypotheses emerge:

- Cabin class may relate to wealth/status, so survival probability may differ by class.
- Age must affect survival probability too; after all, the officer said "ladies and kids first".
- Could the embarkation port matter? Perhaps passengers from different ports have different backgrounds.

Talk is cheap and speculation gets us nowhere. Let's actually tally the distributions of these attribute values.

# survival by passenger class
fig = plt.figure()
fig.set(alpha=0.2)   # alpha for the chart

Survived_0 = data_train.Pclass[data_train.Survived == 0].value_counts()   # class counts among non-survivors
Survived_1 = data_train.Pclass[data_train.Survived == 1].value_counts()   # class counts among survivors
df = pd.DataFrame({'Survived': Survived_1, 'Died': Survived_0})           # build a DataFrame
df.plot(kind='bar', stacked=True)   # stacked bar chart
plt.title('Survival by passenger class')
plt.xlabel('Passenger class')
plt.ylabel('Count')
plt.show()
df

   Died  Survived
1    80       136
2    97        87
3   372       119

We get this chart. Tsk tsk, so money and status do affect cabin class, and in turn the chance of rescue. Ahem, off topic: the point is that passengers in class 1 clearly survived at a much higher rate. This will surely be a feature in the final model.

# survival by port of embarkation
fig = plt.figure()
fig.set(alpha=0.2)   # alpha for the chart
# split the embarkation ports by survival
Survived_0 = \
data_train.Embarked[data_train.Survived == 0].value_counts()
Survived_1 = data_train.Embarked[data_train.Survived == 1].value_counts()
df = pd.DataFrame({'Survived': Survived_1, 'Died': Survived_0})   # combine the two splits
df.plot(kind='bar', stacked=True)
plt.title('Survival by port of embarkation')
plt.xlabel('Port of embarkation')
plt.ylabel('Count')
plt.show()
df

   Died  Survived
S   427       217
C    75        93
Q    47        30

Nothing obvious here… Let's look at sex instead.

# survival by sex
fig = plt.figure()
fig.set(alpha=0.2)   # alpha for the chart

# split the survival counts by sex
Survived_m = data_train.Survived[data_train.Sex == 'male'].value_counts()
Survived_f = data_train.Survived[data_train.Sex == 'female'].value_counts()
df = pd.DataFrame({'Male': Survived_m, 'Female': Survived_f})   # stitch the two splits into one DataFrame
df.plot(kind='bar', stacked=True)
plt.title('Survival by sex')
plt.xlabel('Sex')
plt.ylabel('Count')
plt.show()
df

   Female  Male
0      81   468
1     233   109

Our Western friends really do respect the ladies — "ladies first" was well practiced. Sex clearly has to go into the final model as an important feature.

Now a more detailed version:

# survival by cabin class and sex together
fig = plt.figure(figsize=(12, 5))
fig.set(alpha=0.65)   # figure transparency, not important
plt.title('Survival by cabin class and sex')

# women in 1st/2nd class
ax1 = fig.add_subplot(141)
data_train.Survived[data_train.Sex == 'female'][data_train.Pclass != 3].value_counts().plot(kind='bar', label='female highclass', color='#FA2479')
ax1.set_xticklabels(['Survived', 'Died'], rotation=0)
ax1.legend(['female/high class'], loc='best')

# women in 3rd class
ax2 = fig.add_subplot(142, sharey=ax1)
data_train.Survived[data_train.Sex == 'female'][data_train.Pclass == 3].value_counts().plot(kind='bar', label='female, low class', color='pink')
ax2.set_xticklabels(['Died', 'Survived'], rotation=0)
plt.legend(['female/low class'], loc='best')

# men in 1st/2nd class
ax3 = fig.add_subplot(143, sharey=ax1)
data_train.Survived[data_train.Sex == 'male'][data_train.Pclass !=
3].value_counts().plot(kind='bar', label='male, high class', color='lightblue')
ax3.set_xticklabels(['Died', 'Survived'], rotation=0)
plt.legend(['male/high class'], loc='best')

# men in 3rd class
ax4 = fig.add_subplot(144, sharey=ax1)
data_train.Survived[data_train.Sex == 'male'][data_train.Pclass == 3].value_counts().plot(kind='bar', label='male low class', color='steelblue')
ax4.set_xticklabels(['Died', 'Survived'], rotation=0)
plt.legend(['male/low class'], loc='best')

plt.show()

What about siblings and parents? Do big families have an advantage?

g = data_train.groupby(['SibSp', 'Survived'])   # group by SibSp and Survived together
df = pd.DataFrame(g.count()['PassengerId'])
df

                PassengerId
SibSp Survived
0     0                 398
      1                 210
1     0                  97
      1                 112
2     0                  15
      1                  13
3     0                  12
      1                   4
4     0                  15
      1                   3
5     0                   5
8     0                   7

g = data_train.groupby(['Parch', 'Survived'])
df = pd.DataFrame(g.count()['PassengerId'])
df

                PassengerId
Parch Survived
0     0                 445
      1                 233
1     0                  53
      1                  65
2     0                  40
      1                  40
3     0                   2
      1                   3
4     0                   4
5     0                   4
      1                   1
6     0                   1

Well, no particularly striking pattern (facepalming at my own IQ here…). Let's park these as candidate features for now.

What about tickets? Ticket is the ticket number and should be unique, so it probably has little to do with the outcome; we leave it out of the feature set. Cabin has values for only 204 passengers, so let's first look at its distribution:

# Ticket is the ticket number and should be unique; probably unrelated to the outcome, so excluded
# Cabin has only 204 non-null values; look at its distribution first
data_train.Cabin.value_counts()   # tally the Cabin values

C23 C25 C27    4
G6             4
B96 B98        4
D              3
C22 C26        3
...
Name: Cabin, Length: 147, dtype: int64

(output truncated: 147 distinct Cabin values, almost all appearing only once or twice)
Scattered in twos and threes like this, so unconcentrated… A guess: maybe the leading letters A-E are deck positions and the numbers are room numbers… OK, just speculating, don't take it seriously.

The key point is that this pesky Cabin attribute should count as categorical, it has lots of missing values to begin with, and its values are this scattered: a troublesome one for sure. Gut feeling: if we treat it directly as a categorical feature, it's too sparse, and each factorized feature would probably get negligible weight. Given all those missing values, how about first using "Cabin missing or not" as the condition (although the missing entries may just mean the record was lost, not that there was no cabin, so this may not be entirely sound), and looking at Survived at this coarse granularity first?

# Cabin counts are too scattered; most values appear only once; as a categorical feature it may not help
# So let's see how the presence/absence of a Cabin value relates to survival
fig = plt.figure()
fig.set(alpha=0.2)   # alpha for the chart

Survived_cabin = data_train.Survived[pd.notnull(data_train.Cabin)].value_counts()
Survived_nocabin = data_train.Survived[pd.isnull(data_train.Cabin)].value_counts()
df = pd.DataFrame({'Has Cabin': Survived_cabin, 'No Cabin': Survived_nocabin}).transpose()
df.plot(kind='bar', stacked=True)
plt.title('Survival by presence of Cabin value')
plt.xlabel('Cabin recorded?')
plt.ylabel('Count')
plt.show()
df
# Passengers with a Cabin record seem to survive at a somewhat higher rate;
# let's try splitting this into two categories (has Cabin / no Cabin) and add it to the categorical features later

             0    1
No Cabin   481  206
Has Cabin   68  136

Passengers with a Cabin record do seem to have somewhat higher survival odds. Let's leave it at that for now.

Start with the most glaring data issues: Cabin and Age have missing values, which badly hampers the next steps.

Cabin first: for now, as just discussed, we reduce it to the two types Yes and No according to whether the value is present.

Now Age. Common ways to handle missing values:

- If the missing values are a very large fraction of the samples, we may simply drop the attribute; including it as a feature might just inject noise and hurt the final result.
- If a moderate fraction is missing and the attribute is a discrete (e.g. categorical) feature, add NaN as a new category to the categorical feature.
- If a moderate fraction is missing and the attribute is continuous, sometimes we pick a step (for Age here, say bins of 2-3 years), discretize it, and then add NaN as one more type in the resulting categories.
- In some cases, when not that many values are missing, we can also try fitting a model on the existing values and filling them in.

In this example the last two approaches should both be workable. Let's try fitting first (although there isn't much background information to fit on, so this isn't necessarily a great choice). We use scikit-learn's RandomForest to fit the missing age values.

Step 3: data preprocessing

## missing-value handling

from sklearn.ensemble import RandomForestRegressor

### fill in the missing Age values with a RandomForestRegressor
def set_missing_ages(df):
    # pull out the existing numeric features and feed them to the regressor
    age_df = df[['Age', 'Fare', 'Parch', 'SibSp', 'Pclass']]
    # split the passengers into known-age and unknown-age groups
    known_age = age_df[age_df.Age.notnull()].as_matrix()    # samples used for training
    unknown_age = age_df[age_df.Age.isnull()].as_matrix()   # samples to predict
    # y is the target: age
    y = known_age[:, 0]
    # X are the feature values
    X = known_age[:, 1:]   # the remaining 4 columns as training features
    # fit a RandomForestRegressor
    rfr = \
        RandomForestRegressor(random_state=0, n_estimators=2000, n_jobs=-1)
    rfr.fit(X, y)
    # predict the unknown ages with the fitted model
    predictedAges = rfr.predict(unknown_age[:, 1:])
    # fill the original missing entries with the predictions
    df.loc[(df.Age.isnull()), 'Age'] = predictedAges
    return df, rfr

# binarize the Cabin attribute
def set_Cabin_type(df):
    df.loc[(df.Cabin.notnull()), 'Cabin'] = 'Yes'
    df.loc[(df.Cabin.isnull()), 'Cabin'] = 'No'
    return df

data_train, rfr = set_missing_ages(data_train)
data_train = set_Cabin_type(data_train)
data_train

(pandas emits FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.)

(output: the 891 rows × 12 columns training DataFrame, now with Age imputed and Cabin mapped to Yes/No)
Because logistic regression needs numeric input features, we usually factorize / one-hot encode the categorical features first.

What is factorization / one-hot encoding? An example:

Take Embarked. It is originally a single attribute dimension with possible values ['S', 'C', 'Q'], which we flatten into three attributes: 'Embarked_C', 'Embarked_S', 'Embarked_Q'.

- A passenger whose Embarked was S gets 1 under 'Embarked_S' and 0 under 'Embarked_C' and 'Embarked_Q'.
- A passenger whose Embarked was C gets 1 under 'Embarked_C' and 0 under 'Embarked_S' and 'Embarked_Q'.
- A passenger whose Embarked was Q gets 1 under 'Embarked_Q' and 0 under 'Embarked_C' and 'Embarked_S'.

We use pandas' get_dummies for this and concatenate the result onto the original data_train, as shown below.

# Logistic regression needs numeric input features,
# so first discretize/factorize the categorical ones.
# E.g. Cabin: originally one dimension with values ['Yes', 'No'], flattened into Cabin_Yes and Cabin_No.
# Cabin == 'Yes' -> Cabin_Yes = 1, Cabin_No = 0; Cabin == 'No' -> Cabin_Yes = 0, Cabin_No = 1.
# We use pandas' get_dummies for this and concatenate onto data_train, as below.

# categorical feature handling
dummies_Cabin = pd.get_dummies(data_train['Cabin'], prefix='Cabin')   # one-hot encode the Cabin attribute

dummies_Embarked = pd.get_dummies(data_train['Embarked'], prefix='Embarked')

dummies_Sex = pd.get_dummies(data_train['Sex'], prefix='Sex')

dummies_Pclass = pd.get_dummies(data_train['Pclass'], prefix='Pclass')

df = pd.concat([data_train, dummies_Cabin, dummies_Embarked, dummies_Sex, dummies_Pclass], axis=1)   # concatenate the DataFrames column-wise
df.drop(['Pclass', 'Name', 'Sex', 'Ticket', 'Cabin', 'Embarked'], axis=1, inplace=True)
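The one-hot expansion described above can be checked in isolation. A minimal sketch on a toy frame (the column values are illustrative, not the Titanic data itself):

```python
import pandas as pd

# Toy stand-in for data_train, just to show the expansion.
toy = pd.DataFrame({'Embarked': ['S', 'C', 'Q', 'S']})
dummies = pd.get_dummies(toy['Embarked'], prefix='Embarked')
# Each original category becomes its own 0/1 indicator column.
print(sorted(dummies.columns.tolist()))            # ['Embarked_C', 'Embarked_Q', 'Embarked_S']
print(dummies['Embarked_S'].astype(int).tolist())  # [1, 0, 0, 1]
```

Note that recent pandas versions return boolean dummy columns by default, which is why the sketch casts to int before printing.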
df

(output: the DataFrame with the one-hot columns Cabin_No/Cabin_Yes, Embarked_C/Q/S, Sex_female/male and Pclass_1/2/3 appended)
(891 rows × 16 columns)

We still have some work to do. Look closely at the Age and Fare attributes: the numeric ranges passengers take vary wildly. If you know logistic regression and gradient descent, you'll know that hugely different scales across attributes deal a few-thousand-hit-point blow to convergence speed, or even prevent convergence (╬▔皿▔)… So let's first use scikit-learn's preprocessing module to scale these two. "Scaling" here means squashing features with large swings into roughly [-1, 1].

# Next, more preprocessing work, e.g. scaling: squash wide-ranging features into [-1, 1]
# to speed up the convergence of logistic regression
# numeric feature handling
import sklearn.preprocessing as preprocessing
scaler = preprocessing.StandardScaler()   # standardize wide-ranging numeric features to mean 0, variance 1
age_scale_param = scaler.fit(np.array(df['Age']).reshape((-1, 1)))   # note: the input must be a 2-D numpy array, not a Series
df['Age_scaled'] = scaler.fit_transform(np.array(df['Age']).reshape((-1, 1)), age_scale_param)
fare_scale_param = scaler.fit(np.array(df['Fare']).reshape((-1, 1)))
df['Fare_scaled'] = scaler.fit_transform(np.array(df['Fare']).reshape((-1, 1)), fare_scale_param)
df

(output: the same DataFrame with Age_scaled and Fare_scaled columns added, 891 rows × 18 columns)
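The effect of that scaling step can be verified on its own. A minimal sketch with made-up fare values (not the real Fare column):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up fares just to show the transform: StandardScaler maps a
# column to zero mean and unit variance.
fares = np.array([[7.25], [71.2833], [8.05], [53.1], [8.4583]])
scaled = StandardScaler().fit_transform(fares)
print(bool(np.isclose(scaled.mean(), 0.0)))  # True
print(bool(np.isclose(scaled.std(), 1.0)))   # True
```

After this, features like Age and Fare contribute gradients of comparable magnitude, which is what speeds up convergence.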
We take the feature columns we need, convert them to numpy format, and build a model with scikit-learn's LogisticRegression.

Step 4: train a baseline model

# Take the needed feature columns, convert to numpy, and model with scikit-learn's LogisticRegression
from sklearn import linear_model

train_df = df.filter(regex='Survived|Age_.*|SibSp|Parch|Fare_.*|Cabin_.*|Embarked_.*|Sex_.*|Pclass_.*')   # select the processed features
train_np = train_df.as_matrix()   # convert the DataFrame to a matrix for model input
# train_np
# y is the Survived outcome
y_final = train_np[:, 0]   # column 0 holds the survival labels
# X are the feature values
X_final = train_np[:, 1:]
# fit a LogisticRegression
clf = linear_model.LogisticRegression(C=1.0, penalty='l1', tol=1e-6)   # LR with an L1 regularization term
clf.fit(X_final, y_final)
clf

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
    intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
    penalty='l1', random_state=None, solver='liblinear', tol=1e-06,
    verbose=0, warm_start=False)

Next we apply the same transformations to the test set as to the training set:

data_test = pd.read_csv('test.csv')
## handle missing values
data_test.loc[(data_test.Fare.isnull()), 'Fare'] = 0   # fill the missing Fare values with 0
# Apply the same feature transforms to test_data as to train_data.
# First fill the missing ages with the same RandomForestRegressor model:
tmp_df = data_test[['Age', 'Fare', 'Parch', 'SibSp', 'Pclass']]
null_age = tmp_df[data_test.Age.isnull()].as_matrix()   # rows with missing Age, as a matrix for the model
# predict Age from the feature attributes X and fill it in
X = null_age[:, 1:]   # the other 4 numeric attributes for the rows with missing Age
predictedAges = rfr.predict(X)   # fill the gaps with the RF model trained on the training set
data_test.loc[(data_test.Age.isnull()), 'Age'] = predictedAges

## one-hot encode the categorical attributes
data_test = set_Cabin_type(data_test)
dummies_Cabin = pd.get_dummies(data_test['Cabin'], prefix='Cabin')
dummies_Embarked = pd.get_dummies(data_test['Embarked'], prefix='Embarked')
dummies_Sex = pd.get_dummies(data_test['Sex'], prefix='Sex')
dummies_Pclass = pd.get_dummies(data_test['Pclass'], prefix='Pclass')

df_test = pd.concat([data_test,
                     dummies_Cabin, dummies_Embarked, dummies_Sex, dummies_Pclass], axis=1)
df_test.drop(['Pclass', 'Name', 'Sex', 'Ticket', 'Cabin', 'Embarked'], axis=1, inplace=True)
## scale the numeric attributes
df_test['Age_scaled'] = scaler.fit_transform(np.array(df_test['Age']).reshape(-1, 1), age_scale_param)
df_test['Fare_scaled'] = scaler.fit_transform(np.array(df_test['Fare']).reshape(-1, 1), fare_scale_param)
df_test

(output: the transformed test DataFrame, 418 rows × 17 columns)
test = df_test.filter(regex='Age_.*|SibSp|Parch|Fare_.*|Cabin_.*|Embarked_.*|Sex_.*|Pclass_.*')   # select the processed features
predictions = clf.predict(test)   # predict on the test set
result = \
pd.DataFrame({'PassengerId': data_test['PassengerId'].as_matrix(), 'Survived': predictions.astype(np.int32)})   # structure the results by passenger ID
result.to_csv('logistic_regression_predictions.csv', index=False)   # save the predictions for submission

pd.read_csv('logistic_regression_predictions.csv')

(output: the 418 rows × 2 columns submission table of PassengerId and predicted Survived)

0.76555. Not bad — after all, this is just a baseline system from a quick first analysis.

Step 5: model optimization

First, judge which state the current model is in (underfitting or overfitting).

One very likely problem: as we keep doing feature engineering and produce more and more features, training on them fits the training set better and better, while the model may gradually lose generalization ability and perform poorly on the data to be predicted — i.e., overfitting occurs.

From the other angle, if the model performs poorly on unseen data, then besides overfitting it may also be underfitting: it doesn't even fit the training set that well.

Er, how to explain underfitting and overfitting? Like this:

- Overfitting is like that rote-learning classmate in your math class: he memorizes every problem the teacher covered word for word, so he nails identical problems in seconds. But math exams always bring new problems, so his grades are mediocre.
- Underfitting is like, ahem, a poor student at roughly the blogger's level: he can't even remember the teacher's practice problems, so he fails even the weekly quiz that reuses them, and the exam goes without saying.

For these two situations in machine learning, we optimize in different ways.

For overfitting, the following strategies usually help:

- Do feature selection: train on a good subset of the features.
- Provide more data, compensating for bias in the original data, so the learned model is more accurate.

For underfitting, we usually need more features and a more complex model to improve accuracy.

The famous learning curve can help us judge which state our model is in. With the number of samples on the x-axis and the error rate on the training and cross-validation sets on the y-axis, the two states look like the two sketches: overfitting (high variance) and underfitting (high bias).

We can also replace the error rate with accuracy (score) to get another form of learning curve (this is what sklearn does).

Back to our problem: let's use scikit-learn's learning_curve to diagnose the state of our model, by plotting the learning curve of the baseline model we obtained first.

## judge the state of the model from its learning curve
import numpy as np
import matplotlib.pyplot as plt
from sklearn.learning_curve import learning_curve

# Use sklearn's learning_curve to get training_score and cv_score, and plot the learning curve with matplotlib
def plot_learning_curve(estimator, title, X, y, ylim=None, cv=None, n_jobs=1,
                        train_sizes=np.linspace(.05, 1., 20), verbose=0, plot=True):
    """
    Plot the learning curve of a model on the data.

    Parameters
    ----------
    estimator : the classifier you use
    title : title of the plot
    X : input features, numpy array
    y : input target vector
    ylim : tuple (ymin, ymax), the y-axis limits of the plot
    cv : number of cross-validation folds; one fold is the cv set, the other n-1 are training (default 3)
    n_jobs : number of parallel jobs (default 1)
    """
    train_sizes, train_scores, test_scores = learning_curve(
        estimator, X, y, cv=cv, n_jobs=n_jobs, train_sizes=train_sizes, verbose=verbose)

    train_scores_mean = np.mean(train_scores, axis=1)
    train_scores_std = np.std(train_scores, axis=1)
    test_scores_mean = np.mean(test_scores, axis=1)
    test_scores_std = np.std(test_scores, axis=1)

    if plot:
        plt.figure()
        plt.title(title)
        if ylim is not None:
            plt.ylim(*ylim)
        plt.xlabel('Number of training samples')
        plt.ylabel('Score')
        plt.gca().invert_yaxis()
        plt.grid()
        plt.fill_between(train_sizes, train_scores_mean - train_scores_std,
                         train_scores_mean + train_scores_std, alpha=0.1, color='b')
        plt.fill_between(train_sizes, test_scores_mean - test_scores_std,
                         test_scores_mean + test_scores_std, alpha=0.1, color='r')
        plt.plot(train_sizes, train_scores_mean, 'o-', color='b', label='Training score')
        plt.plot(train_sizes, test_scores_mean, 'o-', color='r', label='Cross-validation score')
        plt.legend(loc='best')
        plt.draw()
        plt.gca().invert_yaxis()
        plt.show()

    midpoint = ((train_scores_mean[-1] + train_scores_std[-1]) + (test_scores_mean[-1] - test_scores_std[-1])) / 2
    diff = (train_scores_mean[-1] + train_scores_std[-1]) - (test_scores_mean[-1] - test_scores_std[-1])
    return midpoint, diff

plot_learning_curve(clf, 'Learning curve', X_final, y_final)

(0.80656968448540245, 0.018258876711338634)

On real data, the learning curve we get isn't as smooth as the theoretical sketches, ha, but the trends of the training and cross-validation score curves still roughly match expectations.

From the current curves, our model does not appear to be overfitting (overfitting typically shows a high training score but a much lower cross-validation score, with a large gap in between). So we can do some more feature engineering, adding newly produced features or combined features to the model.

Next, how do we optimize the baseline system?

There are still features we can mine further:
- For instance, we discarded the Name and Ticket attributes entirely (well, honestly, at first we had no direct way of handling attributes where every record takes a completely different value).
- For instance, the age imputation itself may not be all that reliable. Also, by everyday experience, small children and the elderly may receive extra care; seen that way, treating age as a continuous value with one fixed coefficient fails to capture the "both ends get priority" reality, so maybe discretizing age into bins as a categorical attribute would fit better.

But how do we know which places can be improved, and which optimizations are promising? Yes:

Do cross-validation!
Do cross-validation!
Do cross-validation!

Important things get said three times.

Because test.csv has no Survived column (fine, that's stating the obvious — it's exactly what we're asked to predict), we cannot evaluate our algorithm's performance on that data. The usual cross-validation setup: split train.csv into two parts, train the model we need on one, and measure the prediction quality on the other. We can do this with scikit-learn's cross_validation.

Before that, let's look at the coefficients of the model we have, since their magnitudes correlate positively with how strongly each feature drives the verdict:

pd.DataFrame({'columns': list(train_df.columns)[1:], 'coef': list(clf.coef_.T)})   # gauge feature importance from the LR coefficients

coef                 columns
[-0.34423548326]     SibSp
[-0.104915808836]    Parch
[0.0]                Cabin_No
[0.902107533438]     Cabin_Yes
[0.0]                Embarked_C
[0.0]                Embarked_Q
[-0.417263127613]    Embarked_S
[1.95657020854]      Sex_female
[-0.677421170681]    Sex_male
[0.341159711576]     Pclass_1
[0.0]                Pclass_2
[-1.1941300472]      Pclass_3
[-0.523766573778]    Age_scaled
[0.0844349202536]    Fare_scaled

These coefficients correlate positively with the final outcome. Looking first at the features with very large absolute weights in our model:

- Sex: female greatly raises the survival probability, while male pulls it down substantially.
- Pclass: being a 1st-class passenger raises the survival probability, while class 3 pulls it down sharply.
- Having a Cabin value raises the survival probability considerably (there's a hint of something here: in fact, from the earlier chart of Survived by Cabin presence, some passengers with Cabin records still died, so we probably haven't mined this attribute enough).
- Age is negatively correlated, meaning that in our model, the younger you are, the higher your rescue priority (worth going back to the raw data to check whether that is reasonable).
- One embarkation port, S, pulls the survival probability down a lot, while the other two ports do basically nothing (this is actually quite strange, because the earlier charts did not show a particularly low survival rate at port S, so maybe try dropping the embarkation feature).
- Fare has a small positive correlation (which doesn't mean this feature is useless; maybe we haven't refined it enough — for example, perhaps it should be discretized and broken down per passenger class).

OK, observations done, and we have some ideas — but how do we know which optimizations are promising? Right, cross-validation!

from sklearn import cross_validation

# a quick look at the scores
clf = linear_model.LogisticRegression(C=1.0, penalty='l1', tol=1e-6)
all_data = df.filter(regex='Survived|Age_.*|SibSp|Parch|Fare_.*|Cabin_.*|Embarked_.*|Sex_.*|Pclass_.*')
X = all_data.as_matrix()[:, 1:]
y = all_data.as_matrix()[:, 0]
print cross_validation.cross_val_score(clf, X, y, cv=5)

# split the data
split_train, split_cv = cross_validation.train_test_split(df, test_size=0.3, random_state=0)   # split into training and evaluation sets
train_df = \
split_train.filter(regex='Survived|Age_.*|SibSp|Parch|Fare_.*|Cabin_.*|Embarked_.*|Sex_.*|Pclass_.*')
# build the model
clf = linear_model.LogisticRegression(C=1.0, penalty='l1', tol=1e-6)
clf.fit(train_df.as_matrix()[:, 1:], train_df.as_matrix()[:, 0])  # train the model

# predict on the cross-validation data
cv_df = split_cv.filter(regex='Survived|Age_.*|SibSp|Parch|Fare_.*|Cabin_.*|Embarked_.*|Sex_.*|Pclass_.*')
predictions = clf.predict(cv_df.as_matrix()[:, 1:])  # get the predictions
# split_cv[predictions != cv_df.as_matrix()[:, 0]].drop(axis=0)  # drop the wrongly predicted samples

[ 0.81564246  0.81564246  0.78651685  0.78651685  0.81355932]

F:\ancoda\soft\envs\py27\lib\site-packages\ipykernel_launcher.py:6: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
(similar FutureWarnings repeated for the other .as_matrix calls)

# look up the wrongly predicted cases in the original dataframe
# split_cv['PredictResult'] = predictions
origin_data_train = pd.read_csv("Train.csv")
bad_cases = origin_data_train.loc[origin_data_train['PassengerId'].isin(
    split_cv[predictions != cv_df.as_matrix()[:, 0]]['PassengerId'].values)]
bad_cases

PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked141503Vestrom, Miss. Hulda Amanda Adolfinafemale14.00003504067.8542NaNS495003Arnold-Franchi, Mrs. Josef (Josefine Franchi)female18.001034923717.8000NaNS555611Woolner, Mr. HughmaleNaN001994735.5000C52S656613Moubarek, Master. GeriosmaleNaN11266115.2458NaNC686913Andersson, Miss. Erna Alexandrafemale17.004231012817.9250NaNS858613Backstrom, Mrs. Karl Alfred (Maria Mathilda Gu...female33.0030310127815.8500NaNS11311403Jussila, Miss.
Katriinafemale20.001041369.8250NaNS14014103Boulos, Mrs. Joseph (Sultana)femaleNaN02267815.2458NaNC20420513Cohen, Mr. Gurshon Gusmale18.0000A/5 35408.0500NaNS24024103Zabour, Miss. ThaminefemaleNaN10266514.4542NaNC25125203Strom, Mrs. Wilhelm (Elna Matilda Persson)female29.001134705410.4625G6S26126213Asplund, Master. Edvin Rojj Felixmale3.004234707731.3875NaNS26426503Henry, Miss. DeliafemaleNaN003826497.7500NaNQ26726813Persson, Mr. Ernst Ulrikmale25.00103470837.7750NaNS27127213Tornquist, Mr. William Henrymale25.0000LINE0.0000NaNS27928013Abbott, Mrs. Stanton (Rosa Hunt)female35.0011C.A. 267320.2500NaNS28328413Dorking, Mr. Edward Arthurmale19.0000A/5. 104828.0500NaNS29329403Haas, Miss. Aloisiafemale24.00003492368.8500NaNS29829911Saalfeld, Mr. AdolphemaleNaN001998830.5000C106S30130213McCoy, Mr. BernardmaleNaN2036722623.2500NaNQ31231302Lahtinen, Mrs. William (Anna Sylfven)female26.001125065126.0000NaNS33833913Dahl, Mr. Karl Edwartmale45.000075988.0500NaNS36236303Barbara, Mrs. (Catherine David)female45.0001269114.4542NaNC39039111Carter, Mr. William Ernestmale36.0012113760120.0000B96 B98S40240303Jussila, Miss. Mari Ainafemale21.001041379.8250NaNS44744811Seward, Mr. Frederic Kimbermale34.000011379426.5500NaNS47447503Strandberg, Miss. Ida Sofiafemale22.000075539.8375NaNS48348413Turkula, Mrs. (Hedwig)female63.000041349.5875NaNS48949013Coutts, Master. Eden Leslie Nevillemale9.0011C.A. 3767115.9000NaNS50150203Canavan, Miss. Maryfemale21.00003648467.7500NaNQ50350403Laitinen, Miss. Kristina Sofiafemale37.000041359.5875NaNS50550601Penasco y Castellana, Mr. Victor de Satodemale18.0010PC 17758108.9000C65C56456503Meanwell, Miss. (Marion Ogden)femaleNaN00SOTON/O.Q. 3920878.0500NaNS56756803Palsson, Mrs. Nils (Alma Cornelia Berglund)female29.000434990921.0750NaNS57057112Harris, Mr. Georgemale62.0000S.W./PP 75210.5000NaNS58758811Frolicher-Stehli, Mr. Maxmillianmale60.00111356779.2000B41C64264303Skoog, Miss. Margit Elizabethfemale2.003234708827.9000NaNS64364413Foo, Mr. 
ChoongmaleNaN00160156.4958NaNS64764811Simonius-Blumer, Col. Oberst Alfonsmale56.00001321335.5000A26C65465503Hegarty, Miss. Hanora Norafemale18.00003652266.7500NaNQ68068103Peters, Miss. KatiefemaleNaN003309358.1375NaNQ71271311Taylor, Mr. Elmer Zebleymale48.00101999652.0000C126S74074111Hawksford, Mr. Walter JamesmaleNaN001698830.0000D45S76276313Barah, Mr. Hanna Assimale20.000026637.2292NaNC78878913Dean, Master. Bertram Veremale1.0012C.A. 231520.5750NaNS80380413Thomas, Master. Assad Alexandermale0.420126258.5167NaNC83883913Chip, Mr. Changmale32.0000160156.4958NaNS83984011Marechal, Mr. PierremaleNaN001177429.7000C47C85285303Boulos, Miss. Nourelainfemale9.0011267815.2458NaNC88288303Dahlberg, Miss. Gerda Ulrikafemale22.0000755210.5167NaNS

Going through these bad cases, let's look carefully at which features we mishandled, or handled too coarsely, in the samples we predicted wrong.

Here are some candidate improvements, listed off the cuff:

- For Age, instead of the current regression fit, fill missing values with the mean age of the 'Mr', 'Mrs', 'Miss', etc. groups extracted from the name.
- Don't keep Age as one continuous value; discretize it with some step size into a categorical feature.
- Refine Cabin a bit: for records that have one, split it into the leading letter part (my guess: deck/location information) and the trailing number part (probably the room number; interestingly, if you look at the raw data, larger values seem to come with a higher chance of survival).
- Pclass and Sex are both so important that we should try building a combined feature out of them — refinement of another kind.
- Add a Child field: 1 if Age <= 12, else 0 (look at the data — small kids really did get high priority).
- If the name contains 'Mrs' and Parch > 1, we guess she may be a mother, who should also have a higher chance of being rescued, so add a Mother field: 1 in that case, 0 otherwise.
- Consider dropping the embarkation port first (Q and C carry no weight anyway, and S looks odd).
- Add SibSp, Parch and the passenger herself together into a Family_size field (large families might affect the final outcome).
- Name is an attribute we never touched; we can do some simple processing, e.g. map males whose names contain certain words ('Capt', 'Don', 'Major', 'Sir') onto one Title, and likewise for females.

Keep digging and you may well think of more refinements. I'll stop the list here; we can now use the "train_df" and "cv_df" at hand to test whether these feature engineering tricks actually work.

data_train[data_train['Name'].str.contains("Major")]

PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked44945011Peuchen, Major. Arthur Godfreymale520011378630.50YesS53653701Butt, Major.
Archibald Willinghammale450011305026.55YesS

data_train = pd.read_csv("Train.csv")
data_train['Sex_Pclass'] = data_train.Sex + "_" + data_train.Pclass.map(str)

from sklearn.ensemble import RandomForestRegressor

### fill in the missing Age values with a RandomForestRegressor
def set_missing_ages(df):
    # pull out the existing numerical features and feed them to the regressor
    age_df = df[['Age', 'Fare', 'Parch', 'SibSp', 'Pclass']]
    # split the passengers into known-age and unknown-age parts
    known_age = age_df[age_df.Age.notnull()].as_matrix()
    unknown_age = age_df[age_df.Age.isnull()].as_matrix()
    # y is the target: age
    y = known_age[:, 0]
    # X holds the feature values
    X = known_age[:, 1:]
    # fit a RandomForestRegressor
    rfr = RandomForestRegressor(random_state=0, n_estimators=2000, n_jobs=-1)
    rfr.fit(X, y)
    # predict the unknown ages with the fitted model
    predictedAges = rfr.predict(unknown_age[:, 1:])
    # fill the original missing entries with the predictions
    df.loc[(df.Age.isnull()), 'Age'] = predictedAges
    return df, rfr

def set_Cabin_type(df):
    df.loc[(df.Cabin.notnull()), 'Cabin'] = "Yes"
    df.loc[(df.Cabin.isnull()), 'Cabin'] = "No"
    return df

data_train, rfr = set_missing_ages(data_train)
data_train = set_Cabin_type(data_train)

dummies_Cabin = pd.get_dummies(data_train['Cabin'], prefix='Cabin')
dummies_Embarked = pd.get_dummies(data_train['Embarked'], prefix='Embarked')
dummies_Sex = pd.get_dummies(data_train['Sex'], prefix='Sex')
dummies_Pclass = pd.get_dummies(data_train['Pclass'], prefix='Pclass')
dummies_Sex_Pclass = pd.get_dummies(data_train['Sex_Pclass'], prefix='Sex_Pclass')

df = pd.concat([data_train, dummies_Cabin, dummies_Embarked, dummies_Sex, dummies_Pclass, dummies_Sex_Pclass], axis=1)
df.drop(['Pclass', 'Name', 'Sex', 'Ticket', 'Cabin', 'Embarked', 'Sex_Pclass'], axis=1, inplace=True)

import sklearn.preprocessing as preprocessing
scaler = preprocessing.StandardScaler()
age_scale_param = scaler.fit(np.array(df['Age']).reshape(-1, 1))
df['Age_scaled'] = scaler.fit_transform(np.array(df['Age']).reshape(-1, 1), age_scale_param)
fare_scale_param = scaler.fit(np.array(df['Fare']).reshape(-1, 1))
df['Fare_scaled'] = scaler.fit_transform(np.array(df['Fare']).reshape(-1, 1), fare_scale_param)

from sklearn import linear_model

train_df =
df.filter(regex='Survived|Age_.*|SibSp|Parch|Fare_.*|Cabin_.*|Embarked_.*|Sex_.*|Pclass.*')
train_np = train_df.as_matrix()

# y is the Survived label
y = train_np[:, 0]

# X holds the feature values
X = train_np[:, 1:]

# fit a LogisticRegression model
clf = linear_model.LogisticRegression(C=1.0, penalty='l1', tol=1e-6)
clf.fit(X, y)
clf

F:\ancoda\soft\envs\py27\lib\site-packages\ipykernel_launcher.py:13: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
(similar FutureWarnings repeated for the other .as_matrix calls)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l1', random_state=None, solver='liblinear', tol=1e-06,
          verbose=0, warm_start=False)

data_test = pd.read_csv("test.csv")
data_test.loc[(data_test.Fare.isnull()), 'Fare'] = 0
data_test['Sex_Pclass'] = data_test.Sex + "_" + data_test.Pclass.map(str)
# apply the same feature transforms to test_data as we did to train_data
# first fill the missing ages with the same RandomForestRegressor model
tmp_df = data_test[['Age', 'Fare', 'Parch', 'SibSp', 'Pclass']]
null_age = tmp_df[data_test.Age.isnull()].as_matrix()
# predict age from the feature values X and fill it in
X = null_age[:, 1:]
predictedAges = rfr.predict(X)
data_test.loc[(data_test.Age.isnull()), 'Age'] = predictedAges

data_test = set_Cabin_type(data_test)
dummies_Cabin = pd.get_dummies(data_test['Cabin'], prefix='Cabin')
dummies_Embarked = pd.get_dummies(data_test['Embarked'], prefix='Embarked')
dummies_Sex = pd.get_dummies(data_test['Sex'], prefix='Sex')
dummies_Pclass = pd.get_dummies(data_test['Pclass'], prefix='Pclass')
dummies_Sex_Pclass = pd.get_dummies(data_test['Sex_Pclass'], prefix='Sex_Pclass')

df_test = pd.concat([data_test, dummies_Cabin, dummies_Embarked, dummies_Sex, dummies_Pclass, dummies_Sex_Pclass], axis=1)
df_test.drop(['Pclass', 'Name', 'Sex', 'Ticket', 'Cabin', 'Embarked', 'Sex_Pclass'], axis=1,
inplaceTrue) df_test[Age_scaled] scaler.fit_transform(np.array(df_test[Age]).reshape(-1,1), age_scale_param) df_test[Fare_scaled] scaler.fit_transform(np.array(df_test[Fare]).reshape(-1,1), fare_scale_param) df_testF:\ancoda\soft\envs\py27\lib\site-packages\ipykernel_launcher.py:7: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.import sysPassengerIdAgeSibSpParchFareCabin_NoCabin_YesEmbarked_CEmbarked_QEmbarked_S...Pclass_2Pclass_3Sex_Pclass_female_1Sex_Pclass_female_2Sex_Pclass_female_3Sex_Pclass_male_1Sex_Pclass_male_2Sex_Pclass_male_3Age_scaledFare_scaled089234.500000007.829210010...010000010.307521-0.496637189347.000000107.000010001...010010001.256241-0.511497289462.000000009.687510010...100000102.394706-0.463335389527.000000008.662510001...01000001-0.261711-0.481704489622.0000001112.287510001...01001000-0.641199-0.416740589714.000000009.225010001...01000001-1.248380-0.471623689830.000000007.629210010...01001000-0.034018-0.500221789926.0000001129.000010001...10000010-0.337609-0.117238890018.000000007.229210100...01001000-0.944790-0.507390990121.0000002024.150010001...01000001-0.717097-0.2041541090227.947206007.895810001...01000001-0.189820-0.4954441190346.0000000026.000010001...000001001.180344-0.1710001290423.0000001082.266701001...00100000-0.5653010.8373491390563.0000001026.000010001...100000102.470603-0.1710001490647.0000001061.175001001...001000001.2562410.4593671590724.0000001027.720810100...10010000-0.489404-0.1401621690835.0000000012.350010010...100000100.345470-0.4156201790921.000000007.225010100...01000001-0.717097-0.5074651891027.000000107.925010001...01001000-0.261711-0.4949201991145.000000007.225010100...010010001.104446-0.5074652091255.0000001059.400010100...000001001.8634220.427557219139.000000013.170810001...01000001-1.627868-0.5801202291452.3143110031.683310001...001000001.659585-0.0691512391521.0000000161.379210100...00000100-0.7170970.4630262491648.00000013262.375001100...001000001.3321394.0650492591750
.0000001014.500010001...010000011.483934-0.3770902691822.0000000161.979201100...00100000-0.6411990.4737792791922.500000007.225010100...01000001-0.603250-0.5074652892041.0000000030.500001001...000001000.800856-0.0903562992123.4596832021.679210100...01000001-0.530413-0.248433..................................................................388128021.000000007.750010010...01000001-0.717097-0.49805638912816.0000003121.075010001...01000001-1.855561-0.259261390128223.0000000093.500001001...00000100-0.5653011.038659391128351.0000000139.400001001...001000001.5598320.069140392128413.0000000220.250010001...01000001-1.324278-0.274045393128547.0000000010.500010001...100000101.256241-0.448774394128629.0000003122.025010001...01000001-0.109916-0.242236395128718.0000001060.000001001...00100000-0.9447900.438310396128824.000000007.250010010...01000001-0.489404-0.507017397128948.0000001179.200001100...001000001.3321390.782391398129022.000000007.775010001...01000001-0.641199-0.497608399129131.000000007.733310010...010000010.041880-0.498356400129230.00000000164.866701001...00100000-0.0340182.317614401129338.0000001021.000010001...100000100.573163-0.260605402129422.0000000159.400010100...00100000-0.6411990.427557403129517.0000000047.100010001...00000100-1.0206870.207130404129643.0000001027.720801100...000001000.952651-0.140162405129720.0000000013.862501100...10000010-0.792994-0.388515406129823.0000001010.500010001...10000010-0.565301-0.448774407129950.00000011211.500001100...000001001.4839343.153324408130019.895581007.720810010...01001000-0.800919-0.49858040913013.0000001113.775010001...01001000-2.083254-0.390083410130235.295824007.750010010...010010000.367922-0.498056411130337.0000001090.000001010...001000000.4972650.975936412130428.000000007.775010001...01001000-0.185813-0.497608413130530.705727008.050010001...010000010.019545-0.492680414130639.00000000108.900001100...001000000.6490611.314641415130738.500000007.250010001...010000010.611112-0.507017416130830.705727008.050010001...010000
010.019545-0.492680417130925.7833771122.358310100...01000001-0.354050-0.236263

418 rows × 23 columns

test = df_test.filter(regex='Age_.*|SibSp|Parch|Fare_.*|Cabin_.*|Embarked_.*|Sex_.*|Pclass.*')
predictions = clf.predict(test)
result = pd.DataFrame({'PassengerId': data_test['PassengerId'].as_matrix(), 'Survived': predictions.astype(np.int32)})
result.to_csv("logistic_regression_predictions2.csv", index=False)

F:\ancoda\soft\envs\py27\lib\site-packages\ipykernel_launcher.py:3: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.

In the later stages, the usual way to squeeze out more performance is model ensembling.

First, what is model ensembling? A few examples give the intuition.

You have all seen quiz shows where the contestant asks the studio audience and takes the answer with the most votes as their own: everyone makes a judgment, and we trust that the answer lies with the majority.

A more down-to-earth example: you are close with the math whiz in your class and "model" every homework on his, so most of the time, whenever he is right, you are right too. Then one day he slips, his hand shakes, he miswrites a number, and... well, you can only be wrong along with him.

Now imagine another scenario: you are close with five math whizzes, take all their homework, compare them, and only then "do your own". If one of them slips some day but the other four get it right, you will surely trust the answer of the other four.

The simplest model ensembling is roughly this idea. Take classification: when we have a bunch of classifiers trained on the same dataset (say logistic regression, SVM, KNN, random forest, a neural network), we let each of them make a prediction, tally the votes, and take the majority answer as the final result.

Bingo — problem solved, just like that.

Model ensembling mitigates the overfitting produced during training fairly well, so it tends to help the accuracy of the final result.

Back to our problem. So far we have only talked about logistic regression; if we still want to use this ensembling idea to improve our result, what can we do?

Since there is no choice of model here, let's play with the data instead. Think about it: if the model overfits, it must be overfitting our training set, right?

So let's not use the whole training set every time: train on a subset of it in each round. We use the same learning algorithm, yet get different models; and since no single sub-dataset is complete, even if overfitting occurs it happens on a sub-training-set rather than on the whole data, so ensembling these models may help the final result. Yes, this is the widely used Bagging.

We use scikit-learn's Bagging to implement the idea above; the process is very simple. Code as follows:

Step 6: model ensembling

from sklearn.ensemble import BaggingRegressor

train_df = df.filter(regex='Survived|Age_.*|SibSp|Parch|Fare_.*|Cabin_.*|Embarked_.*|Sex_.*|Pclass.*|Mother|Child|Family|Title')
train_np = train_df.as_matrix()

# y is the Survived label
y = train_np[:, 0]

# X holds the feature values
X = train_np[:, 1:]

# fit a BaggingRegressor
clf = linear_model.LogisticRegression(C=1.0, penalty='l1', tol=1e-6)
bagging_clf = BaggingRegressor(clf, n_estimators=10, max_samples=0.8, max_features=1.0,
                               bootstrap=True, bootstrap_features=False, n_jobs=-1)
bagging_clf.fit(X, y)

test =
df_test.filter(regex='Age_.*|SibSp|Parch|Fare_.*|Cabin_.*|Embarked_.*|Sex_.*|Pclass.*|Mother|Child|Family|Title')
predictions = bagging_clf.predict(test)
result = pd.DataFrame({'PassengerId': data_test['PassengerId'].as_matrix(), 'Survived': predictions.astype(np.int32)})
result.to_csv("logistic_regression_predictions2.csv", index=False)

F:\ancoda\soft\envs\py27\lib\site-packages\ipykernel_launcher.py:4: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.

result

PassengerIdSurvived08920189303894038950489605897068981789908900199010109020119030129041139050149061159071169080179090189100199110209120219130229141239150249161259170269181279190289200299210.........388128003891281039012820391128313921284039312850394128603951287139612880397128913981290039912910400129214011293040212941403129504041296040512970406129804071299040813001409130114101302041113031412130404131305041413061415130704161308041713090

418 rows × 2 columns

Below is the code for solving the problem with another classifier:

import numpy as np
import pandas as pd
from pandas import DataFrame
from patsy import dmatrices
import string
from operator import itemgetter
import json
from sklearn.ensemble import RandomForestClassifier
from sklearn.cross_validation import cross_val_score    # modern sklearn: sklearn.model_selection
from sklearn.pipeline import Pipeline
from sklearn.grid_search import GridSearchCV            # modern sklearn: sklearn.model_selection
from sklearn.cross_validation import train_test_split, StratifiedShuffleSplit, StratifiedKFold
from sklearn import preprocessing
from sklearn.metrics import classification_report
from sklearn.externals import joblib

## Read configuration parameters
train_file = "train.csv"
MODEL_PATH = "./"
test_file = "test.csv"
SUBMISSION_PATH = "./"
seed = 0

print train_file, seed

# print the scores
def report(grid_scores, n_top=3):
    top_scores = sorted(grid_scores, key=itemgetter(1), reverse=True)[:n_top]
    for i, score in
enumerate(top_scores):
        print("Model with rank: {0}".format(i + 1))
        print("Mean validation score: {0:.3f} (std: {1:.3f})".format(
              score.mean_validation_score,
              np.std(score.cv_validation_scores)))
        print("Parameters: {0}".format(score.parameters))
        print("")

# clean and process the data
def substrings_in_string(big_string, substrings):
    for substring in substrings:
        if string.find(big_string, substring) != -1:  # Python 2; in Python 3 use big_string.find(substring)
            return substring
    print big_string
    return np.nan

le = preprocessing.LabelEncoder()
enc = preprocessing.OneHotEncoder()

def clean_and_munge_data(df):
    # treat zero fares as missing (filled below with the class median)
    df.Fare = df.Fare.map(lambda x: np.nan if x == 0 else x)
    # process the names a bit and build a Title field
    title_list = ['Mrs', 'Mr', 'Master', 'Miss', 'Major', 'Rev',
                  'Dr', 'Ms', 'Mlle', 'Col', 'Capt', 'Mme', 'Countess',
                  'Don', 'Jonkheer']
    df['Title'] = df['Name'].map(lambda x: substrings_in_string(x, title_list))

    # map the special titles onto Mr, Mrs, Miss, Master
    def replace_titles(x):
        title = x['Title']
        if title in ['Mr', 'Don', 'Major', 'Capt', 'Jonkheer', 'Rev', 'Col']:
            return 'Mr'
        elif title in ['Master']:
            return 'Master'
        elif title in ['Countess', 'Mme', 'Mrs']:
            return 'Mrs'
        elif title in ['Mlle', 'Ms', 'Miss']:
            return 'Miss'
        elif title == 'Dr':
            if x['Sex'] == 'Male':  # NB: the data actually stores 'male' (lowercase), so this check never matches
                return 'Mr'
            else:
                return 'Mrs'
        elif title == '':
            if x['Sex'] == 'Male':
                return 'Master'
            else:
                return 'Miss'
        else:
            return title
    df['Title'] = df.apply(replace_titles, axis=1)

    # is the family big enough, ahem
    df['Family_Size'] = df['SibSp'] + df['Parch']
    df['Family'] = df['SibSp'] * df['Parch']

    df.loc[(df.Fare.isnull()) & (df.Pclass == 1), 'Fare'] = np.median(df[df['Pclass'] == 1]['Fare'].dropna())
    df.loc[(df.Fare.isnull()) & (df.Pclass == 2), 'Fare'] = np.median(df[df['Pclass'] == 2]['Fare'].dropna())
    df.loc[(df.Fare.isnull()) & (df.Pclass == 3), 'Fare'] = np.median(df[df['Pclass'] == 3]['Fare'].dropna())

    df['Gender'] = df['Sex'].map({'female': 0, 'male': 1}).astype(int)

    df['AgeFill'] = df['Age']
    mean_ages = np.zeros(4)
    mean_ages[0] = np.average(df[df['Title'] == 'Miss']['Age'].dropna())
    mean_ages[1] = np.average(df[df['Title'] == 'Mrs']['Age'].dropna())
    mean_ages[2] = np.average(df[df['Title'] == 'Mr']['Age'].dropna())
    mean_ages[3] = np.average(df[df['Title'] == 'Master']['Age'].dropna())
    df.loc[(df.Age.isnull()) & (df.Title == 'Miss'), 'AgeFill'] = mean_ages[0]
    df.loc[(df.Age.isnull()) & (df.Title == 'Mrs'), 'AgeFill'] = mean_ages[1]
    df.loc[(df.Age.isnull()) & (df.Title == 'Mr')
, 'AgeFill'] = mean_ages[2]
    df.loc[(df.Age.isnull()) & (df.Title == 'Master'), 'AgeFill'] = mean_ages[3]

    df['AgeCat'] = df['AgeFill']
    df.loc[(df.AgeFill <= 10), 'AgeCat'] = 'child'
    df.loc[(df.AgeFill > 60), 'AgeCat'] = 'aged'
    df.loc[(df.AgeFill > 10) & (df.AgeFill <= 30), 'AgeCat'] = 'adult'
    df.loc[(df.AgeFill > 30) & (df.AgeFill <= 60), 'AgeCat'] = 'senior'

    df.Embarked = df.Embarked.fillna('S')

    df.loc[df.Cabin.isnull() == True, 'Cabin'] = 0.5
    df.loc[df.Cabin.isnull() == False, 'Cabin'] = 1.5

    df['Fare_Per_Person'] = df['Fare'] / (df['Family_Size'] + 1)

    # Age times class
    df['AgeClass'] = df['AgeFill'] * df['Pclass']
    df['ClassFare'] = df['Pclass'] * df['Fare_Per_Person']

    df['HighLow'] = df['Pclass']
    df.loc[(df.Fare_Per_Person < 8), 'HighLow'] = 'Low'
    df.loc[(df.Fare_Per_Person >= 8), 'HighLow'] = 'High'

    le.fit(df['Sex'])
    x_sex = le.transform(df['Sex'])
    df['Sex'] = x_sex.astype(np.float)

    le.fit(df['Ticket'])
    x_Ticket = le.transform(df['Ticket'])
    df['Ticket'] = x_Ticket.astype(np.float)

    le.fit(df['Title'])
    x_title = le.transform(df['Title'])
    df['Title'] = x_title.astype(np.float)

    le.fit(df['HighLow'])
    x_hl = le.transform(df['HighLow'])
    df['HighLow'] = x_hl.astype(np.float)

    le.fit(df['AgeCat'])
    x_age = le.transform(df['AgeCat'])
    df['AgeCat'] = x_age.astype(np.float)

    le.fit(df['Embarked'])
    x_emb = le.transform(df['Embarked'])
    df['Embarked'] = x_emb.astype(np.float)

    df = df.drop(['PassengerId', 'Name', 'Age', 'Cabin'], axis=1)  # remove Name, Age and PassengerId
    return df

# read the data
traindf = pd.read_csv(train_file)
## clean the data
df = clean_and_munge_data(traindf)

########################################formula################################
formula_ml = 'Survived~Pclass+C(Title)+Sex+C(AgeCat)+Fare_Per_Person+Fare+Family_Size'

y_train, x_train = dmatrices(formula_ml, data=df, return_type='dataframe')
y_train = np.asarray(y_train).ravel()
print y_train.shape, x_train.shape

## pick the training and test sets
X_train, X_test, Y_train, Y_test = train_test_split(x_train, y_train, test_size=0.2, random_state=seed)

# initialize the classifier
clf = RandomForestClassifier(n_estimators=500, criterion='entropy', max_depth=5, min_samples_split=1,
                             min_samples_leaf=1, max_features='auto', bootstrap=False, oob_score=False,
                             n_jobs=1, random_state=seed, verbose=0)

### grid search for the best parameters
param_grid = dict()
## build the classification pipeline
pipeline = Pipeline([('clf', clf)])
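```python
# Aside (hypothetical example, not part of the original script): with
# param_grid left empty, the GridSearchCV below merely cross-validates the
# fixed-parameter model. A real search grid would prefix each parameter
# with the pipeline step name 'clf', e.g.:
param_grid_example = dict(clf__n_estimators=[100, 300, 500],
                          clf__max_depth=[3, 5, 8])
```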
grid_search = GridSearchCV(pipeline, param_grid=param_grid, verbose=3, scoring='accuracy',
                           cv=StratifiedShuffleSplit(Y_train, n_iter=10, test_size=0.2, train_size=None,
                                                     indices=None, random_state=seed, n_iterations=None)).fit(X_train, Y_train)
# score the results
print("Best score: %0.3f" % grid_search.best_score_)
print(grid_search.best_estimator_)
report(grid_search.grid_scores_)

print('-----grid search end------------')
print('on all train set')
scores = cross_val_score(grid_search.best_estimator_, x_train, y_train, cv=3, scoring='accuracy')
print scores.mean(), scores
print('on test set')
scores = cross_val_score(grid_search.best_estimator_, X_test, Y_test, cv=3, scoring='accuracy')
print scores.mean(), scores

# score the results
print(classification_report(Y_train, grid_search.best_estimator_.predict(X_train)))
print('test data')
print(classification_report(Y_test, grid_search.best_estimator_.predict(X_test)))

model_file = MODEL_PATH + 'model-rf.pkl'
joblib.dump(grid_search.best_estimator_, model_file)

/Users/MLS/Downloads/train.csv 0
(891,) (891, 12)
Fitting 10 folds for each of 1 candidates, totalling 10 fits
[CV] ................................................................
[CV] .......................................  , score=0.860140 -   0.4s
[CV] ................................................................
[CV] .......................................  , score=0.832168 -   0.4s
[CV] ................................................................
[CV] .......................................  , score=0.818182 -   0.4s
[CV] ................................................................
[CV] .......................................  , score=0.839161 -   0.4s
[CV] ................................................................
[CV] .......................................  , score=0.811189 -   0.5s
[CV] ................................................................
[CV] .......................................  , score=0.874126 -   0.4s
[CV] ................................................................
[CV] .......................................  , score=0.811189 -   0.4s
[CV] ................................................................
[CV] .......................................  , score=0.783217 -   0.4s
[CV] ................................................................
[CV] .......................................  , score=0.825175 -   0.4s
[CV] ................................................................
[CV] .......................................  , score=0.839161 -   0.4s

[Parallel(n_jobs=1)]: Done   1 jobs       | elapsed:    0.4s
[Parallel(n_jobs=1)]: Done  10 out of  10 | elapsed:    4.1s finished

Best score: 0.829
Pipeline(steps=[('clf', RandomForestClassifier(bootstrap=False, class_weight=None,
            criterion='entropy', max_depth=5, max_features='auto',
            max_leaf_nodes=None, min_samples_leaf=1, min_samples_split=1,
            min_weight_fraction_leaf=0.0, n_estimators=500, n_jobs=1,
            oob_score=False, random_state=0, verbose=0, warm_start=False))])
Model with rank: 1
Mean validation score: 0.829 (std: 0.025)
Parameters: {}

-----grid search end------------
on all train set
0.826038159371 [ 0.81144781  0.83501684  0.83164983]
on test set
0.782203389831 [ 0.76666667  0.78333333  0.79661017]

             precision    recall  f1-score   support

        0.0       0.86      0.90      0.88       439
        1.0       0.83      0.75      0.79       273

avg / total       0.85      0.85      0.85       712

test data

             precision    recall  f1-score   support

        0.0       0.86      0.87      0.86       110
        1.0       0.79      0.77      0.78        69

avg / total       0.83      0.83      0.83
       179

['/Users/MLS/Downloads/model-rf.pkl',
 '/Users/MLS/Downloads/model-rf.pkl_01.npy',
 '/Users/MLS/Downloads/model-rf.pkl_02.npy',
 ...
 '/Users/MLS/Downloads/model-rf.pkl_239.npy',
 ...]

(the return value of joblib.dump: the pickled model plus one companion .npy file per internal array of the fitted forest; the long path list is truncated here)
del-rf.pkl_240.npy,/Users/MLS/Downloads/model-rf.pkl_241.npy,/Users/MLS/Downloads/model-rf.pkl_242.npy,/Users/MLS/Downloads/model-rf.pkl_243.npy,/Users/MLS/Downloads/model-rf.pkl_244.npy,/Users/MLS/Downloads/model-rf.pkl_245.npy,/Users/MLS/Downloads/model-rf.pkl_246.npy,/Users/MLS/Downloads/model-rf.pkl_247.npy,/Users/MLS/Downloads/model-rf.pkl_248.npy,/Users/MLS/Downloads/model-rf.pkl_249.npy,/Users/MLS/Downloads/model-rf.pkl_250.npy,/Users/MLS/Downloads/model-rf.pkl_251.npy,/Users/MLS/Downloads/model-rf.pkl_252.npy,/Users/MLS/Downloads/model-rf.pkl_253.npy,/Users/MLS/Downloads/model-rf.pkl_254.npy,/Users/MLS/Downloads/model-rf.pkl_255.npy,/Users/MLS/Downloads/model-rf.pkl_256.npy,/Users/MLS/Downloads/model-rf.pkl_257.npy,/Users/MLS/Downloads/model-rf.pkl_258.npy,/Users/MLS/Downloads/model-rf.pkl_259.npy,/Users/MLS/Downloads/model-rf.pkl_260.npy,/Users/MLS/Downloads/model-rf.pkl_261.npy,/Users/MLS/Downloads/model-rf.pkl_262.npy,/Users/MLS/Downloads/model-rf.pkl_263.npy,/Users/MLS/Downloads/model-rf.pkl_264.npy,/Users/MLS/Downloads/model-rf.pkl_265.npy,/Users/MLS/Downloads/model-rf.pkl_266.npy,/Users/MLS/Downloads/model-rf.pkl_267.npy,/Users/MLS/Downloads/model-rf.pkl_268.npy,/Users/MLS/Downloads/model-rf.pkl_269.npy,/Users/MLS/Downloads/model-rf.pkl_270.npy,/Users/MLS/Downloads/model-rf.pkl_271.npy,/Users/MLS/Downloads/model-rf.pkl_272.npy,/Users/MLS/Downloads/model-rf.pkl_273.npy,/Users/MLS/Downloads/model-rf.pkl_274.npy,/Users/MLS/Downloads/model-rf.pkl_275.npy,/Users/MLS/Downloads/model-rf.pkl_276.npy,/Users/MLS/Downloads/model-rf.pkl_277.npy,/Users/MLS/Downloads/model-rf.pkl_278.npy,/Users/MLS/Downloads/model-rf.pkl_279.npy,/Users/MLS/Downloads/model-rf.pkl_280.npy,/Users/MLS/Downloads/model-rf.pkl_281.npy,/Users/MLS/Downloads/model-rf.pkl_282.npy,/Users/MLS/Downloads/model-rf.pkl_283.npy,/Users/MLS/Downloads/model-rf.pkl_284.npy,/Users/MLS/Downloads/model-rf.pkl_285.npy,/Users/MLS/Downloads/model-rf.pkl_286.npy,/Users/MLS/Downloads/model-rf.pkl_287.npy,/Users/
MLS/Downloads/model-rf.pkl_288.npy,/Users/MLS/Downloads/model-rf.pkl_289.npy,/Users/MLS/Downloads/model-rf.pkl_290.npy,/Users/MLS/Downloads/model-rf.pkl_291.npy,/Users/MLS/Downloads/model-rf.pkl_292.npy,/Users/MLS/Downloads/model-rf.pkl_293.npy,/Users/MLS/Downloads/model-rf.pkl_294.npy,/Users/MLS/Downloads/model-rf.pkl_295.npy,/Users/MLS/Downloads/model-rf.pkl_296.npy,/Users/MLS/Downloads/model-rf.pkl_297.npy,/Users/MLS/Downloads/model-rf.pkl_298.npy,/Users/MLS/Downloads/model-rf.pkl_299.npy,/Users/MLS/Downloads/model-rf.pkl_300.npy,/Users/MLS/Downloads/model-rf.pkl_301.npy,/Users/MLS/Downloads/model-rf.pkl_302.npy,/Users/MLS/Downloads/model-rf.pkl_303.npy,/Users/MLS/Downloads/model-rf.pkl_304.npy,/Users/MLS/Downloads/model-rf.pkl_305.npy,/Users/MLS/Downloads/model-rf.pkl_306.npy,/Users/MLS/Downloads/model-rf.pkl_307.npy,/Users/MLS/Downloads/model-rf.pkl_308.npy,/Users/MLS/Downloads/model-rf.pkl_309.npy,/Users/MLS/Downloads/model-rf.pkl_310.npy,/Users/MLS/Downloads/model-rf.pkl_311.npy,/Users/MLS/Downloads/model-rf.pkl_312.npy,/Users/MLS/Downloads/model-rf.pkl_313.npy,/Users/MLS/Downloads/model-rf.pkl_314.npy,/Users/MLS/Downloads/model-rf.pkl_315.npy,/Users/MLS/Downloads/model-rf.pkl_316.npy,/Users/MLS/Downloads/model-rf.pkl_317.npy,/Users/MLS/Downloads/model-rf.pkl_318.npy,/Users/MLS/Downloads/model-rf.pkl_319.npy,/Users/MLS/Downloads/model-rf.pkl_320.npy,/Users/MLS/Downloads/model-rf.pkl_321.npy,/Users/MLS/Downloads/model-rf.pkl_322.npy,/Users/MLS/Downloads/model-rf.pkl_323.npy,/Users/MLS/Downloads/model-rf.pkl_324.npy,/Users/MLS/Downloads/model-rf.pkl_325.npy,/Users/MLS/Downloads/model-rf.pkl_326.npy,/Users/MLS/Downloads/model-rf.pkl_327.npy,/Users/MLS/Downloads/model-rf.pkl_328.npy,/Users/MLS/Downloads/model-rf.pkl_329.npy,/Users/MLS/Downloads/model-rf.pkl_330.npy,/Users/MLS/Downloads/model-rf.pkl_331.npy,/Users/MLS/Downloads/model-rf.pkl_332.npy,/Users/MLS/Downloads/model-rf.pkl_333.npy,/Users/MLS/Downloads/model-rf.pkl_334.npy,/Users/MLS/Downloads/model-rf.pkl
_335.npy,/Users/MLS/Downloads/model-rf.pkl_336.npy,/Users/MLS/Downloads/model-rf.pkl_337.npy,/Users/MLS/Downloads/model-rf.pkl_338.npy,/Users/MLS/Downloads/model-rf.pkl_339.npy,/Users/MLS/Downloads/model-rf.pkl_340.npy,/Users/MLS/Downloads/model-rf.pkl_341.npy,/Users/MLS/Downloads/model-rf.pkl_342.npy,/Users/MLS/Downloads/model-rf.pkl_343.npy,/Users/MLS/Downloads/model-rf.pkl_344.npy,/Users/MLS/Downloads/model-rf.pkl_345.npy,/Users/MLS/Downloads/model-rf.pkl_346.npy,/Users/MLS/Downloads/model-rf.pkl_347.npy,/Users/MLS/Downloads/model-rf.pkl_348.npy,/Users/MLS/Downloads/model-rf.pkl_349.npy,/Users/MLS/Downloads/model-rf.pkl_350.npy,/Users/MLS/Downloads/model-rf.pkl_351.npy,/Users/MLS/Downloads/model-rf.pkl_352.npy,/Users/MLS/Downloads/model-rf.pkl_353.npy,/Users/MLS/Downloads/model-rf.pkl_354.npy,/Users/MLS/Downloads/model-rf.pkl_355.npy,/Users/MLS/Downloads/model-rf.pkl_356.npy,/Users/MLS/Downloads/model-rf.pkl_357.npy,/Users/MLS/Downloads/model-rf.pkl_358.npy,/Users/MLS/Downloads/model-rf.pkl_359.npy,/Users/MLS/Downloads/model-rf.pkl_360.npy,/Users/MLS/Downloads/model-rf.pkl_361.npy,/Users/MLS/Downloads/model-rf.pkl_362.npy,/Users/MLS/Downloads/model-rf.pkl_363.npy,/Users/MLS/Downloads/model-rf.pkl_364.npy,/Users/MLS/Downloads/model-rf.pkl_365.npy,/Users/MLS/Downloads/model-rf.pkl_366.npy,/Users/MLS/Downloads/model-rf.pkl_367.npy,/Users/MLS/Downloads/model-rf.pkl_368.npy,/Users/MLS/Downloads/model-rf.pkl_369.npy,/Users/MLS/Downloads/model-rf.pkl_370.npy,/Users/MLS/Downloads/model-rf.pkl_371.npy,/Users/MLS/Downloads/model-rf.pkl_372.npy,/Users/MLS/Downloads/model-rf.pkl_373.npy,/Users/MLS/Downloads/model-rf.pkl_374.npy,/Users/MLS/Downloads/model-rf.pkl_375.npy,/Users/MLS/Downloads/model-rf.pkl_376.npy,/Users/MLS/Downloads/model-rf.pkl_377.npy,/Users/MLS/Downloads/model-rf.pkl_378.npy,/Users/MLS/Downloads/model-rf.pkl_379.npy,/Users/MLS/Downloads/model-rf.pkl_380.npy,/Users/MLS/Downloads/model-rf.pkl_381.npy,/Users/MLS/Downloads/model-rf.pkl_382.npy,/Users/MLS/Downlo
ads/model-rf.pkl_383.npy,/Users/MLS/Downloads/model-rf.pkl_384.npy,/Users/MLS/Downloads/model-rf.pkl_385.npy,/Users/MLS/Downloads/model-rf.pkl_386.npy,/Users/MLS/Downloads/model-rf.pkl_387.npy,/Users/MLS/Downloads/model-rf.pkl_388.npy,/Users/MLS/Downloads/model-rf.pkl_389.npy,/Users/MLS/Downloads/model-rf.pkl_390.npy,/Users/MLS/Downloads/model-rf.pkl_391.npy,/Users/MLS/Downloads/model-rf.pkl_392.npy,/Users/MLS/Downloads/model-rf.pkl_393.npy,/Users/MLS/Downloads/model-rf.pkl_394.npy,/Users/MLS/Downloads/model-rf.pkl_395.npy,/Users/MLS/Downloads/model-rf.pkl_396.npy,/Users/MLS/Downloads/model-rf.pkl_397.npy,/Users/MLS/Downloads/model-rf.pkl_398.npy,/Users/MLS/Downloads/model-rf.pkl_399.npy,/Users/MLS/Downloads/model-rf.pkl_400.npy,/Users/MLS/Downloads/model-rf.pkl_401.npy,/Users/MLS/Downloads/model-rf.pkl_402.npy,/Users/MLS/Downloads/model-rf.pkl_403.npy,/Users/MLS/Downloads/model-rf.pkl_404.npy,/Users/MLS/Downloads/model-rf.pkl_405.npy,/Users/MLS/Downloads/model-rf.pkl_406.npy,/Users/MLS/Downloads/model-rf.pkl_407.npy,/Users/MLS/Downloads/model-rf.pkl_408.npy,/Users/MLS/Downloads/model-rf.pkl_409.npy,/Users/MLS/Downloads/model-rf.pkl_410.npy,/Users/MLS/Downloads/model-rf.pkl_411.npy,/Users/MLS/Downloads/model-rf.pkl_412.npy,/Users/MLS/Downloads/model-rf.pkl_413.npy,/Users/MLS/Downloads/model-rf.pkl_414.npy,/Users/MLS/Downloads/model-rf.pkl_415.npy,/Users/MLS/Downloads/model-rf.pkl_416.npy,/Users/MLS/Downloads/model-rf.pkl_417.npy,/Users/MLS/Downloads/model-rf.pkl_418.npy,/Users/MLS/Downloads/model-rf.pkl_419.npy,/Users/MLS/Downloads/model-rf.pkl_420.npy,/Users/MLS/Downloads/model-rf.pkl_421.npy,/Users/MLS/Downloads/model-rf.pkl_422.npy,/Users/MLS/Downloads/model-rf.pkl_423.npy,/Users/MLS/Downloads/model-rf.pkl_424.npy,/Users/MLS/Downloads/model-rf.pkl_425.npy,/Users/MLS/Downloads/model-rf.pkl_426.npy,/Users/MLS/Downloads/model-rf.pkl_427.npy,/Users/MLS/Downloads/model-rf.pkl_428.npy,/Users/MLS/Downloads/model-rf.pkl_429.npy,/Users/MLS/Downloads/model-rf.pkl_430.npy,/
Users/MLS/Downloads/model-rf.pkl_431.npy,/Users/MLS/Downloads/model-rf.pkl_432.npy,/Users/MLS/Downloads/model-rf.pkl_433.npy,/Users/MLS/Downloads/model-rf.pkl_434.npy,/Users/MLS/Downloads/model-rf.pkl_435.npy,/Users/MLS/Downloads/model-rf.pkl_436.npy,/Users/MLS/Downloads/model-rf.pkl_437.npy,/Users/MLS/Downloads/model-rf.pkl_438.npy,/Users/MLS/Downloads/model-rf.pkl_439.npy,/Users/MLS/Downloads/model-rf.pkl_440.npy,/Users/MLS/Downloads/model-rf.pkl_441.npy,/Users/MLS/Downloads/model-rf.pkl_442.npy,/Users/MLS/Downloads/model-rf.pkl_443.npy,/Users/MLS/Downloads/model-rf.pkl_444.npy,/Users/MLS/Downloads/model-rf.pkl_445.npy,/Users/MLS/Downloads/model-rf.pkl_446.npy,/Users/MLS/Downloads/model-rf.pkl_447.npy,/Users/MLS/Downloads/model-rf.pkl_448.npy,/Users/MLS/Downloads/model-rf.pkl_449.npy,/Users/MLS/Downloads/model-rf.pkl_450.npy,/Users/MLS/Downloads/model-rf.pkl_451.npy,/Users/MLS/Downloads/model-rf.pkl_452.npy,/Users/MLS/Downloads/model-rf.pkl_453.npy,/Users/MLS/Downloads/model-rf.pkl_454.npy,/Users/MLS/Downloads/model-rf.pkl_455.npy,/Users/MLS/Downloads/model-rf.pkl_456.npy,/Users/MLS/Downloads/model-rf.pkl_457.npy,/Users/MLS/Downloads/model-rf.pkl_458.npy,/Users/MLS/Downloads/model-rf.pkl_459.npy,/Users/MLS/Downloads/model-rf.pkl_460.npy,/Users/MLS/Downloads/model-rf.pkl_461.npy,/Users/MLS/Downloads/model-rf.pkl_462.npy,/Users/MLS/Downloads/model-rf.pkl_463.npy,/Users/MLS/Downloads/model-rf.pkl_464.npy,/Users/MLS/Downloads/model-rf.pkl_465.npy,/Users/MLS/Downloads/model-rf.pkl_466.npy,/Users/MLS/Downloads/model-rf.pkl_467.npy,/Users/MLS/Downloads/model-rf.pkl_468.npy,/Users/MLS/Downloads/model-rf.pkl_469.npy,/Users/MLS/Downloads/model-rf.pkl_470.npy,/Users/MLS/Downloads/model-rf.pkl_471.npy,/Users/MLS/Downloads/model-rf.pkl_472.npy,/Users/MLS/Downloads/model-rf.pkl_473.npy,/Users/MLS/Downloads/model-rf.pkl_474.npy,/Users/MLS/Downloads/model-rf.pkl_475.npy,/Users/MLS/Downloads/model-rf.pkl_476.npy,/Users/MLS/Downloads/model-rf.pkl_477.npy,/Users/MLS/Downloads/model-
rf.pkl_478.npy,/Users/MLS/Downloads/model-rf.pkl_479.npy,/Users/MLS/Downloads/model-rf.pkl_480.npy,/Users/MLS/Downloads/model-rf.pkl_481.npy,/Users/MLS/Downloads/model-rf.pkl_482.npy,/Users/MLS/Downloads/model-rf.pkl_483.npy,/Users/MLS/Downloads/model-rf.pkl_484.npy,/Users/MLS/Downloads/model-rf.pkl_485.npy,/Users/MLS/Downloads/model-rf.pkl_486.npy,/Users/MLS/Downloads/model-rf.pkl_487.npy,/Users/MLS/Downloads/model-rf.pkl_488.npy,/Users/MLS/Downloads/model-rf.pkl_489.npy,/Users/MLS/Downloads/model-rf.pkl_490.npy,/Users/MLS/Downloads/model-rf.pkl_491.npy,/Users/MLS/Downloads/model-rf.pkl_492.npy,/Users/MLS/Downloads/model-rf.pkl_493.npy,/Users/MLS/Downloads/model-rf.pkl_494.npy,/Users/MLS/Downloads/model-rf.pkl_495.npy,/Users/MLS/Downloads/model-rf.pkl_496.npy,/Users/MLS/Downloads/model-rf.pkl_497.npy,/Users/MLS/Downloads/model-rf.pkl_498.npy,/Users/MLS/Downloads/model-rf.pkl_499.npy,/Users/MLS/Downloads/model-rf.pkl_500.npy,/Users/MLS/Downloads/model-rf.pkl_501.npy,/Users/MLS/Downloads/model-rf.pkl_502.npy,/Users/MLS/Downloads/model-rf.pkl_503.npy,/Users/MLS/Downloads/model-rf.pkl_504.npy,/Users/MLS/Downloads/model-rf.pkl_505.npy,/Users/MLS/Downloads/model-rf.pkl_506.npy,/Users/MLS/Downloads/model-rf.pkl_507.npy,/Users/MLS/Downloads/model-rf.pkl_508.npy,/Users/MLS/Downloads/model-rf.pkl_509.npy,/Users/MLS/Downloads/model-rf.pkl_510.npy,/Users/MLS/Downloads/model-rf.pkl_511.npy,/Users/MLS/Downloads/model-rf.pkl_512.npy,/Users/MLS/Downloads/model-rf.pkl_513.npy,/Users/MLS/Downloads/model-rf.pkl_514.npy,/Users/MLS/Downloads/model-rf.pkl_515.npy,/Users/MLS/Downloads/model-rf.pkl_516.npy,/Users/MLS/Downloads/model-rf.pkl_517.npy,/Users/MLS/Downloads/model-rf.pkl_518.npy,/Users/MLS/Downloads/model-rf.pkl_519.npy,/Users/MLS/Downloads/model-rf.pkl_520.npy,/Users/MLS/Downloads/model-rf.pkl_521.npy,/Users/MLS/Downloads/model-rf.pkl_522.npy,/Users/MLS/Downloads/model-rf.pkl_523.npy,/Users/MLS/Downloads/model-rf.pkl_524.npy,/Users/MLS/Downloads/model-rf.pkl_525.npy,/Users/MLS/
Downloads/model-rf.pkl_526.npy,/Users/MLS/Downloads/model-rf.pkl_527.npy,/Users/MLS/Downloads/model-rf.pkl_528.npy,/Users/MLS/Downloads/model-rf.pkl_529.npy,/Users/MLS/Downloads/model-rf.pkl_530.npy,/Users/MLS/Downloads/model-rf.pkl_531.npy,/Users/MLS/Downloads/model-rf.pkl_532.npy,/Users/MLS/Downloads/model-rf.pkl_533.npy,/Users/MLS/Downloads/model-rf.pkl_534.npy,/Users/MLS/Downloads/model-rf.pkl_535.npy,/Users/MLS/Downloads/model-rf.pkl_536.npy,/Users/MLS/Downloads/model-rf.pkl_537.npy,/Users/MLS/Downloads/model-rf.pkl_538.npy,/Users/MLS/Downloads/model-rf.pkl_539.npy,/Users/MLS/Downloads/model-rf.pkl_540.npy,/Users/MLS/Downloads/model-rf.pkl_541.npy,/Users/MLS/Downloads/model-rf.pkl_542.npy,/Users/MLS/Downloads/model-rf.pkl_543.npy,/Users/MLS/Downloads/model-rf.pkl_544.npy,/Users/MLS/Downloads/model-rf.pkl_545.npy,/Users/MLS/Downloads/model-rf.pkl_546.npy,/Users/MLS/Downloads/model-rf.pkl_547.npy,/Users/MLS/Downloads/model-rf.pkl_548.npy,/Users/MLS/Downloads/model-rf.pkl_549.npy,/Users/MLS/Downloads/model-rf.pkl_550.npy,/Users/MLS/Downloads/model-rf.pkl_551.npy,/Users/MLS/Downloads/model-rf.pkl_552.npy,/Users/MLS/Downloads/model-rf.pkl_553.npy,/Users/MLS/Downloads/model-rf.pkl_554.npy,/Users/MLS/Downloads/model-rf.pkl_555.npy,/Users/MLS/Downloads/model-rf.pkl_556.npy,/Users/MLS/Downloads/model-rf.pkl_557.npy,/Users/MLS/Downloads/model-rf.pkl_558.npy,/Users/MLS/Downloads/model-rf.pkl_559.npy,/Users/MLS/Downloads/model-rf.pkl_560.npy,/Users/MLS/Downloads/model-rf.pkl_561.npy,/Users/MLS/Downloads/model-rf.pkl_562.npy,/Users/MLS/Downloads/model-rf.pkl_563.npy,/Users/MLS/Downloads/model-rf.pkl_564.npy,/Users/MLS/Downloads/model-rf.pkl_565.npy,/Users/MLS/Downloads/model-rf.pkl_566.npy,/Users/MLS/Downloads/model-rf.pkl_567.npy,/Users/MLS/Downloads/model-rf.pkl_568.npy,/Users/MLS/Downloads/model-rf.pkl_569.npy,/Users/MLS/Downloads/model-rf.pkl_570.npy,/Users/MLS/Downloads/model-rf.pkl_571.npy,/Users/MLS/Downloads/model-rf.pkl_572.npy,/Users/MLS/Downloads/model-rf.pkl_573
.npy,/Users/MLS/Downloads/model-rf.pkl_574.npy,/Users/MLS/Downloads/model-rf.pkl_575.npy,/Users/MLS/Downloads/model-rf.pkl_576.npy,/Users/MLS/Downloads/model-rf.pkl_577.npy,/Users/MLS/Downloads/model-rf.pkl_578.npy,/Users/MLS/Downloads/model-rf.pkl_579.npy,/Users/MLS/Downloads/model-rf.pkl_580.npy,/Users/MLS/Downloads/model-rf.pkl_581.npy,/Users/MLS/Downloads/model-rf.pkl_582.npy,/Users/MLS/Downloads/model-rf.pkl_583.npy,/Users/MLS/Downloads/model-rf.pkl_584.npy,/Users/MLS/Downloads/model-rf.pkl_585.npy,/Users/MLS/Downloads/model-rf.pkl_586.npy,/Users/MLS/Downloads/model-rf.pkl_587.npy,/Users/MLS/Downloads/model-rf.pkl_588.npy,/Users/MLS/Downloads/model-rf.pkl_589.npy,/Users/MLS/Downloads/model-rf.pkl_590.npy,/Users/MLS/Downloads/model-rf.pkl_591.npy,/Users/MLS/Downloads/model-rf.pkl_592.npy,/Users/MLS/Downloads/model-rf.pkl_593.npy,/Users/MLS/Downloads/model-rf.pkl_594.npy,/Users/MLS/Downloads/model-rf.pkl_595.npy,/Users/MLS/Downloads/model-rf.pkl_596.npy,/Users/MLS/Downloads/model-rf.pkl_597.npy,/Users/MLS/Downloads/model-rf.pkl_598.npy,/Users/MLS/Downloads/model-rf.pkl_599.npy,/Users/MLS/Downloads/model-rf.pkl_600.npy,/Users/MLS/Downloads/model-rf.pkl_601.npy,/Users/MLS/Downloads/model-rf.pkl_602.npy,/Users/MLS/Downloads/model-rf.pkl_603.npy,/Users/MLS/Downloads/model-rf.pkl_604.npy,/Users/MLS/Downloads/model-rf.pkl_605.npy,/Users/MLS/Downloads/model-rf.pkl_606.npy,/Users/MLS/Downloads/model-rf.pkl_607.npy,/Users/MLS/Downloads/model-rf.pkl_608.npy,/Users/MLS/Downloads/model-rf.pkl_609.npy,/Users/MLS/Downloads/model-rf.pkl_610.npy,/Users/MLS/Downloads/model-rf.pkl_611.npy,/Users/MLS/Downloads/model-rf.pkl_612.npy,/Users/MLS/Downloads/model-rf.pkl_613.npy,/Users/MLS/Downloads/model-rf.pkl_614.npy,/Users/MLS/Downloads/model-rf.pkl_615.npy,/Users/MLS/Downloads/model-rf.pkl_616.npy,/Users/MLS/Downloads/model-rf.pkl_617.npy,/Users/MLS/Downloads/model-rf.pkl_618.npy,/Users/MLS/Downloads/model-rf.pkl_619.npy,/Users/MLS/Downloads/model-rf.pkl_620.npy,/Users/MLS/Downloads/
model-rf.pkl_621.npy,/Users/MLS/Downloads/model-rf.pkl_622.npy,/Users/MLS/Downloads/model-rf.pkl_623.npy,/Users/MLS/Downloads/model-rf.pkl_624.npy,/Users/MLS/Downloads/model-rf.pkl_625.npy,/Users/MLS/Downloads/model-rf.pkl_626.npy,/Users/MLS/Downloads/model-rf.pkl_627.npy,/Users/MLS/Downloads/model-rf.pkl_628.npy,/Users/MLS/Downloads/model-rf.pkl_629.npy,/Users/MLS/Downloads/model-rf.pkl_630.npy,/Users/MLS/Downloads/model-rf.pkl_631.npy,/Users/MLS/Downloads/model-rf.pkl_632.npy,/Users/MLS/Downloads/model-rf.pkl_633.npy,/Users/MLS/Downloads/model-rf.pkl_634.npy,/Users/MLS/Downloads/model-rf.pkl_635.npy,/Users/MLS/Downloads/model-rf.pkl_636.npy,/Users/MLS/Downloads/model-rf.pkl_637.npy,/Users/MLS/Downloads/model-rf.pkl_638.npy,/Users/MLS/Downloads/model-rf.pkl_639.npy,/Users/MLS/Downloads/model-rf.pkl_640.npy,/Users/MLS/Downloads/model-rf.pkl_641.npy,/Users/MLS/Downloads/model-rf.pkl_642.npy,/Users/MLS/Downloads/model-rf.pkl_643.npy,/Users/MLS/Downloads/model-rf.pkl_644.npy,/Users/MLS/Downloads/model-rf.pkl_645.npy,/Users/MLS/Downloads/model-rf.pkl_646.npy,/Users/MLS/Downloads/model-rf.pkl_647.npy,/Users/MLS/Downloads/model-rf.pkl_648.npy,/Users/MLS/Downloads/model-rf.pkl_649.npy,/Users/MLS/Downloads/model-rf.pkl_650.npy,/Users/MLS/Downloads/model-rf.pkl_651.npy,/Users/MLS/Downloads/model-rf.pkl_652.npy,/Users/MLS/Downloads/model-rf.pkl_653.npy,/Users/MLS/Downloads/model-rf.pkl_654.npy,/Users/MLS/Downloads/model-rf.pkl_655.npy,/Users/MLS/Downloads/model-rf.pkl_656.npy,/Users/MLS/Downloads/model-rf.pkl_657.npy,/Users/MLS/Downloads/model-rf.pkl_658.npy,/Users/MLS/Downloads/model-rf.pkl_659.npy,/Users/MLS/Downloads/model-rf.pkl_660.npy,/Users/MLS/Downloads/model-rf.pkl_661.npy,/Users/MLS/Downloads/model-rf.pkl_662.npy,/Users/MLS/Downloads/model-rf.pkl_663.npy,/Users/MLS/Downloads/model-rf.pkl_664.npy,/Users/MLS/Downloads/model-rf.pkl_665.npy,/Users/MLS/Downloads/model-rf.pkl_666.npy,/Users/MLS/Downloads/model-rf.pkl_667.npy,/Users/MLS/Downloads/model-rf.pkl_668.npy,/User
s/MLS/Downloads/model-rf.pkl_669.npy,/Users/MLS/Downloads/model-rf.pkl_670.npy,/Users/MLS/Downloads/model-rf.pkl_671.npy,/Users/MLS/Downloads/model-rf.pkl_672.npy,/Users/MLS/Downloads/model-rf.pkl_673.npy,/Users/MLS/Downloads/model-rf.pkl_674.npy,/Users/MLS/Downloads/model-rf.pkl_675.npy,/Users/MLS/Downloads/model-rf.pkl_676.npy,/Users/MLS/Downloads/model-rf.pkl_677.npy,/Users/MLS/Downloads/model-rf.pkl_678.npy,/Users/MLS/Downloads/model-rf.pkl_679.npy,/Users/MLS/Downloads/model-rf.pkl_680.npy,/Users/MLS/Downloads/model-rf.pkl_681.npy,/Users/MLS/Downloads/model-rf.pkl_682.npy,/Users/MLS/Downloads/model-rf.pkl_683.npy,/Users/MLS/Downloads/model-rf.pkl_684.npy,/Users/MLS/Downloads/model-rf.pkl_685.npy,/Users/MLS/Downloads/model-rf.pkl_686.npy,/Users/MLS/Downloads/model-rf.pkl_687.npy,/Users/MLS/Downloads/model-rf.pkl_688.npy,/Users/MLS/Downloads/model-rf.pkl_689.npy,/Users/MLS/Downloads/model-rf.pkl_690.npy,/Users/MLS/Downloads/model-rf.pkl_691.npy,/Users/MLS/Downloads/model-rf.pkl_692.npy,/Users/MLS/Downloads/model-rf.pkl_693.npy,/Users/MLS/Downloads/model-rf.pkl_694.npy,/Users/MLS/Downloads/model-rf.pkl_695.npy,/Users/MLS/Downloads/model-rf.pkl_696.npy,/Users/MLS/Downloads/model-rf.pkl_697.npy,/Users/MLS/Downloads/model-rf.pkl_698.npy,/Users/MLS/Downloads/model-rf.pkl_699.npy,/Users/MLS/Downloads/model-rf.pkl_700.npy,/Users/MLS/Downloads/model-rf.pkl_701.npy,/Users/MLS/Downloads/model-rf.pkl_702.npy,/Users/MLS/Downloads/model-rf.pkl_703.npy,/Users/MLS/Downloads/model-rf.pkl_704.npy,/Users/MLS/Downloads/model-rf.pkl_705.npy,/Users/MLS/Downloads/model-rf.pkl_706.npy,/Users/MLS/Downloads/model-rf.pkl_707.npy,/Users/MLS/Downloads/model-rf.pkl_708.npy,/Users/MLS/Downloads/model-rf.pkl_709.npy,/Users/MLS/Downloads/model-rf.pkl_710.npy,/Users/MLS/Downloads/model-rf.pkl_711.npy,/Users/MLS/Downloads/model-rf.pkl_712.npy,/Users/MLS/Downloads/model-rf.pkl_713.npy,/Users/MLS/Downloads/model-rf.pkl_714.npy,/Users/MLS/Downloads/model-rf.pkl_715.npy,/Users/MLS/Downloads/model-rf.p
kl_716.npy,/Users/MLS/Downloads/model-rf.pkl_717.npy,/Users/MLS/Downloads/model-rf.pkl_718.npy,/Users/MLS/Downloads/model-rf.pkl_719.npy,/Users/MLS/Downloads/model-rf.pkl_720.npy,/Users/MLS/Downloads/model-rf.pkl_721.npy,/Users/MLS/Downloads/model-rf.pkl_722.npy,/Users/MLS/Downloads/model-rf.pkl_723.npy,/Users/MLS/Downloads/model-rf.pkl_724.npy,/Users/MLS/Downloads/model-rf.pkl_725.npy,/Users/MLS/Downloads/model-rf.pkl_726.npy,/Users/MLS/Downloads/model-rf.pkl_727.npy,/Users/MLS/Downloads/model-rf.pkl_728.npy,/Users/MLS/Downloads/model-rf.pkl_729.npy,/Users/MLS/Downloads/model-rf.pkl_730.npy,/Users/MLS/Downloads/model-rf.pkl_731.npy,/Users/MLS/Downloads/model-rf.pkl_732.npy,/Users/MLS/Downloads/model-rf.pkl_733.npy,/Users/MLS/Downloads/model-rf.pkl_734.npy,/Users/MLS/Downloads/model-rf.pkl_735.npy,/Users/MLS/Downloads/model-rf.pkl_736.npy,/Users/MLS/Downloads/model-rf.pkl_737.npy,/Users/MLS/Downloads/model-rf.pkl_738.npy,/Users/MLS/Downloads/model-rf.pkl_739.npy,/Users/MLS/Downloads/model-rf.pkl_740.npy,/Users/MLS/Downloads/model-rf.pkl_741.npy,/Users/MLS/Downloads/model-rf.pkl_742.npy,/Users/MLS/Downloads/model-rf.pkl_743.npy,/Users/MLS/Downloads/model-rf.pkl_744.npy,/Users/MLS/Downloads/model-rf.pkl_745.npy,/Users/MLS/Downloads/model-rf.pkl_746.npy,/Users/MLS/Downloads/model-rf.pkl_747.npy,/Users/MLS/Downloads/model-rf.pkl_748.npy,/Users/MLS/Downloads/model-rf.pkl_749.npy,/Users/MLS/Downloads/model-rf.pkl_750.npy,/Users/MLS/Downloads/model-rf.pkl_751.npy,/Users/MLS/Downloads/model-rf.pkl_752.npy,/Users/MLS/Downloads/model-rf.pkl_753.npy,/Users/MLS/Downloads/model-rf.pkl_754.npy,/Users/MLS/Downloads/model-rf.pkl_755.npy,/Users/MLS/Downloads/model-rf.pkl_756.npy,/Users/MLS/Downloads/model-rf.pkl_757.npy,/Users/MLS/Downloads/model-rf.pkl_758.npy,/Users/MLS/Downloads/model-rf.pkl_759.npy,/Users/MLS/Downloads/model-rf.pkl_760.npy,/Users/MLS/Downloads/model-rf.pkl_761.npy,/Users/MLS/Downloads/model-rf.pkl_762.npy,/Users/MLS/Downloads/model-rf.pkl_763.npy,/Users/MLS/Down
loads/model-rf.pkl_764.npy,/Users/MLS/Downloads/model-rf.pkl_765.npy,/Users/MLS/Downloads/model-rf.pkl_766.npy,/Users/MLS/Downloads/model-rf.pkl_767.npy,/Users/MLS/Downloads/model-rf.pkl_768.npy,/Users/MLS/Downloads/model-rf.pkl_769.npy,/Users/MLS/Downloads/model-rf.pkl_770.npy,/Users/MLS/Downloads/model-rf.pkl_771.npy,/Users/MLS/Downloads/model-rf.pkl_772.npy,/Users/MLS/Downloads/model-rf.pkl_773.npy,/Users/MLS/Downloads/model-rf.pkl_774.npy,/Users/MLS/Downloads/model-rf.pkl_775.npy,/Users/MLS/Downloads/model-rf.pkl_776.npy,/Users/MLS/Downloads/model-rf.pkl_777.npy,/Users/MLS/Downloads/model-rf.pkl_778.npy,/Users/MLS/Downloads/model-rf.pkl_779.npy,/Users/MLS/Downloads/model-rf.pkl_780.npy,/Users/MLS/Downloads/model-rf.pkl_781.npy,/Users/MLS/Downloads/model-rf.pkl_782.npy,/Users/MLS/Downloads/model-rf.pkl_783.npy,/Users/MLS/Downloads/model-rf.pkl_784.npy,/Users/MLS/Downloads/model-rf.pkl_785.npy,/Users/MLS/Downloads/model-rf.pkl_786.npy,/Users/MLS/Downloads/model-rf.pkl_787.npy,/Users/MLS/Downloads/model-rf.pkl_788.npy,/Users/MLS/Downloads/model-rf.pkl_789.npy,/Users/MLS/Downloads/model-rf.pkl_790.npy,/Users/MLS/Downloads/model-rf.pkl_791.npy,/Users/MLS/Downloads/model-rf.pkl_792.npy,/Users/MLS/Downloads/model-rf.pkl_793.npy,/Users/MLS/Downloads/model-rf.pkl_794.npy,/Users/MLS/Downloads/model-rf.pkl_795.npy,/Users/MLS/Downloads/model-rf.pkl_796.npy,/Users/MLS/Downloads/model-rf.pkl_797.npy,/Users/MLS/Downloads/model-rf.pkl_798.npy,/Users/MLS/Downloads/model-rf.pkl_799.npy,/Users/MLS/Downloads/model-rf.pkl_800.npy,/Users/MLS/Downloads/model-rf.pkl_801.npy,/Users/MLS/Downloads/model-rf.pkl_802.npy,/Users/MLS/Downloads/model-rf.pkl_803.npy,/Users/MLS/Downloads/model-rf.pkl_804.npy,/Users/MLS/Downloads/model-rf.pkl_805.npy,/Users/MLS/Downloads/model-rf.pkl_806.npy,/Users/MLS/Downloads/model-rf.pkl_807.npy,/Users/MLS/Downloads/model-rf.pkl_808.npy,/Users/MLS/Downloads/model-rf.pkl_809.npy,/Users/MLS/Downloads/model-rf.pkl_810.npy,/Users/MLS/Downloads/model-rf.pkl_811.npy
,/Users/MLS/Downloads/model-rf.pkl_812.npy,/Users/MLS/Downloads/model-rf.pkl_813.npy,/Users/MLS/Downloads/model-rf.pkl_814.npy,/Users/MLS/Downloads/model-rf.pkl_815.npy,/Users/MLS/Downloads/model-rf.pkl_816.npy,/Users/MLS/Downloads/model-rf.pkl_817.npy,/Users/MLS/Downloads/model-rf.pkl_818.npy,/Users/MLS/Downloads/model-rf.pkl_819.npy,/Users/MLS/Downloads/model-rf.pkl_820.npy,/Users/MLS/Downloads/model-rf.pkl_821.npy,/Users/MLS/Downloads/model-rf.pkl_822.npy,/Users/MLS/Downloads/model-rf.pkl_823.npy,/Users/MLS/Downloads/model-rf.pkl_824.npy,/Users/MLS/Downloads/model-rf.pkl_825.npy,/Users/MLS/Downloads/model-rf.pkl_826.npy,/Users/MLS/Downloads/model-rf.pkl_827.npy,/Users/MLS/Downloads/model-rf.pkl_828.npy,/Users/MLS/Downloads/model-rf.pkl_829.npy,/Users/MLS/Downloads/model-rf.pkl_830.npy,/Users/MLS/Downloads/model-rf.pkl_831.npy,/Users/MLS/Downloads/model-rf.pkl_832.npy,/Users/MLS/Downloads/model-rf.pkl_833.npy,/Users/MLS/Downloads/model-rf.pkl_834.npy,/Users/MLS/Downloads/model-rf.pkl_835.npy,/Users/MLS/Downloads/model-rf.pkl_836.npy,/Users/MLS/Downloads/model-rf.pkl_837.npy,/Users/MLS/Downloads/model-rf.pkl_838.npy,/Users/MLS/Downloads/model-rf.pkl_839.npy,/Users/MLS/Downloads/model-rf.pkl_840.npy,/Users/MLS/Downloads/model-rf.pkl_841.npy,/Users/MLS/Downloads/model-rf.pkl_842.npy,/Users/MLS/Downloads/model-rf.pkl_843.npy,/Users/MLS/Downloads/model-rf.pkl_844.npy,/Users/MLS/Downloads/model-rf.pkl_845.npy,/Users/MLS/Downloads/model-rf.pkl_846.npy,/Users/MLS/Downloads/model-rf.pkl_847.npy,/Users/MLS/Downloads/model-rf.pkl_848.npy,/Users/MLS/Downloads/model-rf.pkl_849.npy,/Users/MLS/Downloads/model-rf.pkl_850.npy,/Users/MLS/Downloads/model-rf.pkl_851.npy,/Users/MLS/Downloads/model-rf.pkl_852.npy,/Users/MLS/Downloads/model-rf.pkl_853.npy,/Users/MLS/Downloads/model-rf.pkl_854.npy,/Users/MLS/Downloads/model-rf.pkl_855.npy,/Users/MLS/Downloads/model-rf.pkl_856.npy,/Users/MLS/Downloads/model-rf.pkl_857.npy,/Users/MLS/Downloads/model-rf.pkl_858.npy,/Users/MLS/Downloads/mode
l-rf.pkl_859.npy,/Users/MLS/Downloads/model-rf.pkl_860.npy,/Users/MLS/Downloads/model-rf.pkl_861.npy,/Users/MLS/Downloads/model-rf.pkl_862.npy,/Users/MLS/Downloads/model-rf.pkl_863.npy,/Users/MLS/Downloads/model-rf.pkl_864.npy,/Users/MLS/Downloads/model-rf.pkl_865.npy,/Users/MLS/Downloads/model-rf.pkl_866.npy,/Users/MLS/Downloads/model-rf.pkl_867.npy,/Users/MLS/Downloads/model-rf.pkl_868.npy,/Users/MLS/Downloads/model-rf.pkl_869.npy,/Users/MLS/Downloads/model-rf.pkl_870.npy,/Users/MLS/Downloads/model-rf.pkl_871.npy,/Users/MLS/Downloads/model-rf.pkl_872.npy,/Users/MLS/Downloads/model-rf.pkl_873.npy,/Users/MLS/Downloads/model-rf.pkl_874.npy,/Users/MLS/Downloads/model-rf.pkl_875.npy,/Users/MLS/Downloads/model-rf.pkl_876.npy,/Users/MLS/Downloads/model-rf.pkl_877.npy,/Users/MLS/Downloads/model-rf.pkl_878.npy,/Users/MLS/Downloads/model-rf.pkl_879.npy,/Users/MLS/Downloads/model-rf.pkl_880.npy,/Users/MLS/Downloads/model-rf.pkl_881.npy,/Users/MLS/Downloads/model-rf.pkl_882.npy,/Users/MLS/Downloads/model-rf.pkl_883.npy,/Users/MLS/Downloads/model-rf.pkl_884.npy,/Users/MLS/Downloads/model-rf.pkl_885.npy,/Users/MLS/Downloads/model-rf.pkl_886.npy,/Users/MLS/Downloads/model-rf.pkl_887.npy,/Users/MLS/Downloads/model-rf.pkl_888.npy,/Users/MLS/Downloads/model-rf.pkl_889.npy,/Users/MLS/Downloads/model-rf.pkl_890.npy,/Users/MLS/Downloads/model-rf.pkl_891.npy,/Users/MLS/Downloads/model-rf.pkl_892.npy,/Users/MLS/Downloads/model-rf.pkl_893.npy,/Users/MLS/Downloads/model-rf.pkl_894.npy,/Users/MLS/Downloads/model-rf.pkl_895.npy,/Users/MLS/Downloads/model-rf.pkl_896.npy,/Users/MLS/Downloads/model-rf.pkl_897.npy,/Users/MLS/Downloads/model-rf.pkl_898.npy,/Users/MLS/Downloads/model-rf.pkl_899.npy,/Users/MLS/Downloads/model-rf.pkl_900.npy,/Users/MLS/Downloads/model-rf.pkl_901.npy,/Users/MLS/Downloads/model-rf.pkl_902.npy,/Users/MLS/Downloads/model-rf.pkl_903.npy,/Users/MLS/Downloads/model-rf.pkl_904.npy,/Users/MLS/Downloads/model-rf.pkl_905.npy,/Users/MLS/Downloads/model-rf.pkl_906.npy,/Users/ML
(truncated output from saving the random-forest model: model-rf.pkl together with its long list of companion .npy files)

Summary: general steps for machine learning on structured data
1. Read the raw data from disk and make a backup; it is usually loaded as a DataFrame.
2. Inspect the raw data: understand what each attribute means and identify which attributes are categorical and which are numerical, using df.info() and df.describe().
3. Preprocess the data: handle missing values (fill them, drop them, or impute them with a model); for numerical attributes whose values are much larger than other attributes', or whose range within the attribute is wide, apply normalization, or optionally convert them into categorical attributes; one-hot encode the categorical attributes.
4. Feature engineering: feature generation, feature combination, feature extraction, and feature selection (embedded, wrapper, and filter methods).
5. Train a baseline model.
6. Evaluate the model's state (e.g. underfitting vs. overfitting).
7. Optimize the model: cross-validation, hyperparameter tuning, and so on.
8. Model ensembling: combine the models obtained from the optimization step above.
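The steps above can be sketched end to end in a few lines. This is a minimal illustration only: instead of the real Train.csv it uses a tiny synthetic DataFrame (column names borrowed from the Titanic data, values made up), and a plain logistic regression stands in as the baseline classifier.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler

# Steps 1-2: load the data. A small synthetic stand-in replaces
# pd.read_csv("Train.csv") so the sketch runs on its own; the column
# names mirror the Titanic data, the values are invented.
df = pd.DataFrame({
    "Pclass":   [1, 3, 3, 2, 1, 3, 2, 1, 3, 2] * 3,
    "Sex":      ["female", "male", "female", "male", "female",
                 "male", "female", "male", "female", "male"] * 3,
    "Age":      [38.0, 22.0, None, 35.0, 54.0, 2.0, 27.0, None, 14.0, 30.0] * 3,
    "Fare":     [71.3, 7.25, 7.9, 26.0, 51.9, 21.1, 13.0, 30.0, 11.2, 16.0] * 3,
    "Survived": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0] * 3,
})

# Step 3: preprocessing -- fill missing ages with the median,
# one-hot encode the categorical column, scale the numeric columns.
df["Age"] = df["Age"].fillna(df["Age"].median())
df = pd.get_dummies(df, columns=["Sex"])
df[["Age", "Fare"]] = StandardScaler().fit_transform(df[["Age", "Fare"]])

X = df.drop("Survived", axis=1)
y = df["Survived"]

# Steps 5 and 7: a baseline classifier scored with 3-fold cross-validation.
clf = LogisticRegression()
scores = cross_val_score(clf, X, y, cv=3)
print("CV accuracy: %.3f (+/- %.3f)" % (scores.mean(), scores.std()))
```

Feature engineering (step 4) and model ensembling (step 8) are omitted here; they are covered in the modeling sections of the post.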