Original title: Sentiment Analysis of the Harry Potter Novels with Python

Preparing the data

Each novel in the data set is stored in its own txt file. We want to analyze the novels chapter by chapter, with the chapters collected in a list (the first item holds the text of chapter 1, the second item the text of chapter 2, and so on). To get there, the data first has to be tidied with regular expressions. Take 01-Harry Potter and the Sorcerers Stone.txt as an example. Opening the txt and searching through it shows that every chapter heading follows the same pattern:

[Chapter][space][integer][newline \n][English title, possibly containing spaces][newline \n]

To get familiar with regular expressions, we first turn this template into a pattern and use it to extract the chapter headings:

    import re

    # assuming the text files are UTF-8 encoded
    with open('data/01-Harry Potter and the Sorcerers Stone.txt', encoding='utf-8') as f:
        raw_text = f.read()

    # "Chapter", a space, the chapter number, a newline,
    # a title made of ASCII letters and spaces, a newline
    pattern = r'Chapter \d+\n[a-zA-Z ]+\n'
    re.findall(pattern, raw_text)

The result:

    ['Chapter 1\nThe Boy Who Lived\n',
     'Chapter 2\nThe Vanishing Glass\n',
     'Chapter 3\nThe Letters From No One\n',
     'Chapter 4\nThe Keeper Of The Keys\n',
     'Chapter 5\nDiagon Alley\n',
     'Chapter 7\nThe Sorting Hat\n',
     'Chapter 8\nThe Potions Master\n',
     'Chapter 9\nThe Midnight Duel\n',
     'Chapter 10\nHalloween\n',
     'Chapter 11\nQuidditch\n',
     'Chapter 12\nThe Mirror Of Erised\n',
     'Chapter 13\nNicholas Flamel\n',
     'Chapter 14\nNorbert the Norwegian Ridgeback\n',
     'Chapter 15\nThe Forbidden Forest\n',
     'Chapter 16\nThrough the Trapdoor\n',
     'Chapter 17\nThe Man With Two Faces\n']

Note that Chapter 6 is missing from the result. Its title, "The Journey from Platform Nine and Three-Quarters", contains a hyphen. The character class [a-zA-Z ] does not allow hyphens, so that heading is not matched.

With the regular expression above working, we now want something more precise: the content of each chapter rather than just its heading. For this I prepared a test text whose chapter headings look like those in the novel. It is much shorter, so it is easier to follow. The test text contains only 5 chapters, so the resulting list should have length 5. Its first item should be the content of chapter 1, its second item the content of chapter 2, and so on.

    import re

    test = ("Chapter 1\nThe Boy Who Lived\n"
            "Mr. and Mrs. Dursley, of number four, Privet Drive, were proud to say that they were perfectly normal, thank you very much. They were the last people you’d expect to be involved in anything strange or mysterious, because they just didn’t hold with such nonsense.\n"
            "Mr. Dursley was the director of a firm called Grunnings,\n "
            "Chapter 2\nThe Vanishing Glass\n"
            "For a second, Mr. Dursley didn’t realize what he had seen — then he jerked his head around to look again. There was a tabby cat standing on the corner of Privet Drive, but there wasn’t a map in sight. What could he have been thinking of? It must have been a trick of the light. Mr. Dursley blinked and stared at the cat.\n "
            "Chapter 3\nThe Letters From No One\n"
            "The traffic moved on and a few minutes later, Mr. Dursley arrived in the Grunnings parking lot, his mind back on drills.\n"
            "Mr. Dursley always sat with his back to the window in his office on the ninth floor. If he hadn’t, he might have found it harder to concentrate on drills that morning.\n "
            "Chapter 4\nThe Keeper Of The Keys\n"
            "He didn’t know why, but they made him uneasy. This bunch were whispering excitedly, too, and he couldn’t see a single collecting tin. \n "
            "Chapter 5\nDiagon Alley\n"
            "It was a few seconds before Mr. Dursley realized that the man was wearing a violet cloak. ")

    # List of chapter contents (first item = chapter 1, second item = chapter 2, ...).
    # re.split returns '' for the (empty) text before the first heading; the `if c`
    # filter drops empty items so the list length matches the number of chapters.
    chapter_contents = [c for c in re.split(r'Chapter \d+\n[a-zA-Z ]+\n', test) if c]
    chapter_contents

The result:

    ['Mr. and Mrs. Dursley, of number four, Privet Drive, were proud to say that they were perfectly normal, thank you very much. They were the last people you’d expect to be involved in anything strange or mysterious, because they just didn’t hold with such nonsense.\nMr. Dursley was the director of a firm called Grunnings,\n ',
     'For a second, Mr. Dursley didn’t realize what he had seen — then he jerked his head around to look again. There was a tabby cat standing on the corner of Privet Drive, but there wasn’t a map in sight. What could he have been thinking of? It must have been a trick of the light. Mr. Dursley blinked and stared at the cat.\n ',
     'The traffic moved on and a few minutes later, Mr. Dursley arrived in the Grunnings parking lot, his mind back on drills.\nMr. Dursley always sat with his back to the window in his office on the ninth floor. If he hadn’t, he might have found it harder to concentrate on drills that morning.\n ',
     'He didn’t know why, but they made him uneasy. This bunch were whispering excitedly, too, and he couldn’t see a single collecting tin. \n ',
     'It was a few seconds before Mr. Dursley realized that the man was wearing a violet cloak. ']

With a list of chapter contents for each Harry Potter book, we can now do real text analysis.

Data analysis

Comparing chapter counts

    import re
    import matplotlib.pyplot as plt

    colors = ['#78C850', '#A8A878', '#F08030', '#C03028', '#6890F0', '#A890F0', '#A040A0']
    harry_potters = ['Harry Potter and the Sorcerers Stone.txt',
                     'Harry Potter and the Chamber of Secrets.txt',
                     'Harry Potter and the Prisoner of Azkaban.txt',
                     'Harry Potter and the Goblet of Fire.txt',
                     'Harry Potter and the Order of the Phoenix.txt',
                     'Harry Potter and the Half-Blood Prince.txt',
                     'Harry Potter and the Deathly Hallows.txt']
    # x-axis: book titles ("Harry Potter and the " removed, [:-4] drops ".txt")
    harry_potter_names = [n.replace('Harry Potter and the ', '')[:-4] for n in harry_potters]
    # y-axis: number of chapters per book
    chapter_nums = []
    for harry_potter in harry_potters:
        file = 'data/' + harry_potter
        with open(file, encoding='utf-8') as f:  # assuming UTF-8 files
            raw_text = f.read()
        pattern = r'Chapter \d+\n[a-zA-Z ]+\n'
        chapter_contents = [c for c in re.split(pattern, raw_text) if c]
        chapter_nums.append(len(chapter_contents))

    # Canvas size
    plt.figure(figsize=(20, 10))
    # Chart title: font size, bold
    plt.title('Chapter Number of Harry Potter', fontsize=25, weight='bold')
    # Bar chart with one color per book
    plt.bar(harry_potter_names, chapter_nums, color=colors)
    # Tick labels: font size and rotation
    plt.xticks(rotation=25, fontsize=16, weight='bold')
    plt.yticks(fontsize=16, weight='bold')
    # Axis labels
    plt.xlabel('Harry Potter Series', fontsize=20, weight='bold')
    plt.ylabel('Chapter Number', rotation=25, fontsize=20, weight='bold')
    plt.show()

The resulting chart shows that the last four books in the series have more chapters than the first three. (The analysis is not very useful in itself; it is mainly practice.)

Vocabulary richness

The measure used here is the number of words divided by the number of distinct words, after lower-casing and stemming. A 100-word sentence in which no word repeats scores 100/100 = 1. A sentence of the same length that uses only 20 distinct words scores 100/20 = 5. A higher score therefore means more repetition, that is, a less varied vocabulary. It is the inverse of the usual type-token ratio.

    import matplotlib.pyplot as plt
    from nltk import word_tokenize
    from nltk.stem.snowball import SnowballStemmer

    plt.style.use('fivethirtyeight')
    colors = ['#78C850', '#A8A878', '#F08030', '#C03028', '#6890F0', '#A890F0', '#A040A0']
    harry_potters = ['Harry Potter and the Sorcerers Stone.txt',
                     'Harry Potter and the Chamber of Secrets.txt',
                     'Harry Potter and the Prisoner of Azkaban.txt',
                     'Harry Potter and the Goblet of Fire.txt',
                     'Harry Potter and the Order of the Phoenix.txt',
                     'Harry Potter and the Half-Blood Prince.txt',
                     'Harry Potter and the Deathly Hallows.txt']
    # x-axis: book titles
    harry_potter_names = [n.replace('Harry Potter and the ', '')[:-4] for n in harry_potters]
    # y-axis: words per distinct (stemmed) word
    richness_of_words = []
    stemmer = SnowballStemmer('english')
    for harry_potter in harry_potters:
        file = 'data/' + harry_potter
        with open(file, encoding='utf-8') as f:  # assuming UTF-8 files
            raw_text = f.read()
        words = word_tokenize(raw_text)
        words = [stemmer.stem(w.lower()) for w in words]
        wordset = set(words)
        richness = len(words) / len(wordset)
        richness_of_words.append(richness)

    # Canvas size
    plt.figure(figsize=(20, 10))
    # Chart title: font size, bold
    plt.title('The Richness of Word in Harry Potter', fontsize=25, weight='bold')
    # Bar chart with one color per book
    plt.bar(harry_potter_names, richness_of_words, color=colors)
    # Tick labels: font size and rotation
    plt.xticks(rotation=25, fontsize=16, weight='bold')
    plt.yticks(fontsize=16, weight='bold')
    # Axis labels
    plt.xlabel('Harry Potter Series', fontsize=20, weight='bold')
    plt.ylabel('Richness of Words', rotation=25, fontsize=20, weight='bold')
    plt.show()

Sentiment analysis

Emotional trend across the Harry Potter series

Here we use VADER, which is available as the ready-made library vaderSentiment. Its polarity_scores function returns four values: neg (negative score), neu (neutral score), pos (positive score) and compound (overall sentiment score).

    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    test = 'i am so sorry'
    analyzer.polarity_scores(test)

The result:

    {'neg': 0.443, 'neu': 0.557, 'pos': 0.0, 'compound': -0.1513}

Each chapter's score is the average compound score of its sentences. The chapters of all seven books are numbered consecutively along the x-axis:

    import re
    import matplotlib.pyplot as plt
    from nltk.tokenize import sent_tokenize
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    harry_potters = ['Harry Potter and the Sorcerers Stone.txt',
                     'Harry Potter and the Chamber of Secrets.txt',
                     'Harry Potter and the Prisoner of Azkaban.txt',
                     'Harry Potter and the Goblet of Fire.txt',
                     'Harry Potter and the Order of the Phoenix.txt',
                     'Harry Potter and the Half-Blood Prince.txt',
                     'Harry Potter and the Deathly Hallows.txt']
    # x-axis: running chapter index across all books
    chapter_indexes = []
    # y-axis: average sentiment score of each chapter
    compounds = []
    analyzer = SentimentIntensityAnalyzer()
    chapter_index = 1
    for harry_potter in harry_potters:
        file = 'data/' + harry_potter
        with open(file, encoding='utf-8') as f:  # assuming UTF-8 files
            raw_text = f.read()
        pattern = r'Chapter \d+\n[a-zA-Z ]+\n'
        chapters = [c for c in re.split(pattern, raw_text) if c]
        # Sentiment score of each chapter = mean compound score of its sentences
        for chapter in chapters:
            compound = 0
            sentences = sent_tokenize(chapter)
            for sentence in sentences:
                score = analyzer.polarity_scores(sentence)
                compound += score['compound']
            compounds.append(compound / len(sentences))
            chapter_indexes.append(chapter_index)
            chapter_index += 1

    # Canvas size
    plt.figure(figsize=(20, 10))
    # Chart title: font size, bold
    plt.title('Average Sentiment of the Harry Potter', fontsize=25, weight='bold')
    # Line chart
    plt.plot(chapter_indexes, compounds, color='#A040A0')
    # Tick labels: font size and rotation
    plt.xticks(rotation=25, fontsize=16, weight='bold')
    plt.yticks(fontsize=16, weight='bold')
    # Axis labels
    plt.xlabel('Chapter', fontsize=20, weight='bold')
    plt.ylabel('Average Sentiment', rotation=25, fontsize=20, weight='bold')
    plt.show()

The curve is quite jagged. To smooth out the fluctuations, we define a custom moving-average function:

    import re
    import numpy as np
    import matplotlib.pyplot as plt
    from nltk.tokenize import sent_tokenize
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    # Curve smoothing: moving average over window_size points
    def movingaverage(value_series, window_size):
        window = np.ones(int(window_size)) / float(window_size)
        return np.convolve(value_series, window, 'same')

    harry_potters = ['Harry Potter and the Sorcerers Stone.txt',
                     'Harry Potter and the Chamber of Secrets.txt',
                     'Harry Potter and the Prisoner of Azkaban.txt',
                     'Harry Potter and the Goblet of Fire.txt',
                     'Harry Potter and the Order of the Phoenix.txt',
                     'Harry Potter and the Half-Blood Prince.txt',
                     'Harry Potter and the Deathly Hallows.txt']
    # x-axis: running chapter index across all books
    chapter_indexes = []
    # y-axis: average sentiment score of each chapter
    compounds = []
    analyzer = SentimentIntensityAnalyzer()
    chapter_index = 1
    for harry_potter in harry_potters:
        file = 'data/' + harry_potter
        with open(file, encoding='utf-8') as f:  # assuming UTF-8 files
            raw_text = f.read()
        pattern = r'Chapter \d+\n[a-zA-Z ]+\n'
        chapters = [c for c in re.split(pattern, raw_text) if c]
        # Sentiment score of each chapter = mean compound score of its sentences
        for chapter in chapters:
            compound = 0
            sentences = sent_tokenize(chapter)
            for sentence in sentences:
                score = analyzer.polarity_scores(sentence)
                compound += score['compound']
            compounds.append(compound / len(sentences))
            chapter_indexes.append(chapter_index)
            chapter_index += 1

    # Canvas size
    plt.figure(figsize=(20, 10))
    # Chart title: font size, bold
    plt.title('Average Sentiment of the Harry Potter', fontsize=25, weight='bold')
    # Raw per-chapter scores, plus the smoothed curve as a dotted line; both are
    # plotted against chapter_indexes so they share the same x values
    plt.plot(chapter_indexes, compounds, color='red')
    plt.plot(chapter_indexes, movingaverage(compounds, 10), color='black', linestyle=':')
    # Tick labels: font size and rotation
    plt.xticks(rotation=25, fontsize=16, weight='bold')
    plt.yticks(fontsize=16, weight='bold')
    # Axis labels
    plt.xlabel('Chapter', fontsize=20, weight='bold')
    plt.ylabel('Average Sentiment', rotation=25, fontsize=20, weight='bold')
    plt.show()
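The [a-zA-Z ]+ part of the chapter-heading pattern accepts only ASCII letters and spaces. A heading whose title contains a hyphen, apostrophe, comma or period is therefore not matched, and re.split then leaves that chapter (heading included) inside the previous chunk. Below is a minimal sketch of a broader pattern. It assumes each heading is "Chapter N" on a line of its own followed by a single-line title. NARROW, BROAD, chapter_headings and the sample text are made up for illustration.

```python
import re

# The article's pattern: the title may contain only ASCII letters and spaces.
NARROW = re.compile(r"Chapter \d+\n[a-zA-Z ]+\n")
# Hypothetical broader pattern: the title is whatever fills the line after
# "Chapter N" (hyphens, apostrophes, commas and periods included).
BROAD = re.compile(r"^Chapter (\d+)\n(.+)\n", re.MULTILINE)


def chapter_headings(text):
    """Return (number, title) pairs found with the broader pattern."""
    return [(int(num), title) for num, title in BROAD.findall(text)]


sample = ("Chapter 5\nDiagon Alley\n(chapter text)\n"
          "Chapter 6\nThe Journey from Platform Nine and Three-Quarters\n(chapter text)\n")

print(len(NARROW.findall(sample)))  # 1: the hyphenated title is skipped
print(chapter_headings(sample))
```

BROAD contains capture groups, so passing it to re.split would also return the captured numbers and titles as list items. For splitting, drop the groups or work from the match positions instead.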
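As an alternative to re.split, the positions of the heading matches (re.finditer) can be used directly. Keying each chapter by the number in its heading makes a skipped heading visible as a gap in the numbering. Text before the first heading is dropped instead of being counted as a chapter. With re.split plus the `if c` filter, non-empty front matter would count as one extra chapter. This is a sketch assuming the one-line "Chapter N" / title layout described above. HEADING, split_chapters and missing_numbers are hypothetical names.

```python
import re

# Hypothetical heading pattern: "Chapter N" on its own line, then a one-line title.
HEADING = re.compile(r"^Chapter (\d+)\n(.+)\n", re.MULTILINE)


def split_chapters(text):
    """Map chapter number -> (title, body) using the positions of heading matches.

    Anything before the first heading (title page, front matter) is discarded.
    """
    matches = list(HEADING.finditer(text))
    chapters = {}
    for i, m in enumerate(matches):
        # A body runs from the end of its heading to the start of the next heading.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chapters[int(m.group(1))] = (m.group(2), text[m.end():end])
    return chapters


def missing_numbers(chapters):
    """Chapter numbers between 1 and the largest number found that have no heading."""
    if not chapters:
        return []
    return sorted(set(range(1, max(chapters) + 1)) - set(chapters))


sample = ("FRONT MATTER\n"
          "Chapter 1\nFirst Title\nbody one\n"
          "Chapter 3\nThird Title\nbody three\n")
chapters = split_chapters(sample)
print(sorted(chapters))           # [1, 3]
print(chapters[1])                # ('First Title', 'body one\n')
print(missing_numbers(chapters))  # [2]
```

Under these assumptions, a book's chapter count would be len(split_chapters(raw_text)). A non-empty missing_numbers result flags headings the pattern did not catch.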
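The vocabulary-richness measure above divides the number of words by the number of distinct words, so larger values mean more repetition. The sketch below puts it next to the type-token ratio (distinct words / words) and checks both against the 100-word examples from the text. The regex tokenizer is a crude stand-in for nltk's word_tokenize and does no stemming. All function names are hypothetical.

```python
import re


def tokenize(text):
    """Lower-cased word tokens; a crude stand-in for nltk's word_tokenize (no stemming)."""
    return re.findall(r"[a-z]+(?:['’][a-z]+)?", text.lower())


def words_per_distinct_word(tokens):
    """The article's measure, len(words) / len(set(words)): higher = more repetition."""
    return len(tokens) / len(set(tokens))


def type_token_ratio(tokens):
    """Distinct words / words, the inverse: higher = more varied vocabulary."""
    return len(set(tokens)) / len(tokens)


no_repeats = [f"w{i}" for i in range(100)]         # 100 words, all different
twenty_types = [f"w{i % 20}" for i in range(100)]  # 100 words, 20 different

print(words_per_distinct_word(no_repeats), words_per_distinct_word(twenty_types))  # 1.0 5.0
print(type_token_ratio(no_repeats), type_token_ratio(twenty_types))                # 1.0 0.2
print(tokenize("Mr. Dursley didn’t realize"))  # ['mr', 'dursley', 'didn’t', 'realize']
```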
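Words per distinct word also depends on how long the text is. The longer a text runs, the more often its common words recur, so the ratio tends to rise with book length even when the style does not change. One common length-controlled alternative is the mean segmental type-token ratio (MSTTR), which averages the ratio over equal-sized chunks. The sketch assumes the tokens have already been produced (for example with word_tokenize and stemming as above). mean_segmental_ttr and its default segment size of 1000 are illustrative choices, not taken from the article.

```python
def mean_segmental_ttr(tokens, segment_size=1000):
    """Average type-token ratio over consecutive segments of equal length.

    Comparing equal-sized segments keeps book length out of the measure;
    a final segment shorter than segment_size is ignored.
    """
    n_segments = len(tokens) // segment_size
    if n_segments == 0:
        raise ValueError("need at least segment_size tokens")
    ratios = [
        len(set(tokens[k * segment_size:(k + 1) * segment_size])) / segment_size
        for k in range(n_segments)
    ]
    return sum(ratios) / n_segments


# 105 made-up tokens: ten segments of 10 tokens with 2 distinct words each,
# plus a 5-token remainder that is ignored.
tokens = ["a", "b"] * 50 + ["c"] * 5
print(round(mean_segmental_ttr(tokens, segment_size=10), 6))  # 0.2
```

Here higher means more varied, the opposite direction of the article's chart. The segment size has to be well below the length of the shortest book for the comparison to use most of each text.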
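The per-chapter score in the sentiment loop is the plain mean of the sentences' compound values. The expression compound / len(sentences) would raise ZeroDivisionError if sent_tokenize ever returned no sentences for a chunk. The sketch below isolates that step with an explicit guard. The scoring function is passed in, so analyzer.polarity_scores can be plugged in directly. toy_score is a hypothetical stand-in used only to make the example runnable without vaderSentiment; it is not VADER.

```python
from typing import Callable, Dict, List


def chapter_score(sentences: List[str], score: Callable[[str], Dict[str, float]]) -> float:
    """Mean 'compound' value over a chapter's sentences.

    Returns 0.0 for an empty sentence list instead of dividing by zero.
    """
    if not sentences:
        return 0.0
    return sum(score(s)["compound"] for s in sentences) / len(sentences)


# Hypothetical toy scorer standing in for analyzer.polarity_scores; NOT VADER.
def toy_score(sentence: str) -> Dict[str, float]:
    words = sentence.lower().split()
    value = 0.5 * words.count("happy") - 0.5 * words.count("sorry")
    return {"compound": max(-1.0, min(1.0, value))}


print(chapter_score(["I am happy", "happy happy", "I am so sorry"], toy_score))  # 0.3333333333333333
print(chapter_score([], toy_score))  # 0.0
```

With the real library the call would be chapter_score(sent_tokenize(chapter), analyzer.polarity_scores). Whether an empty chapter should count as 0.0 or be skipped is a judgment call.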
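VADER's compound score is normalized to the range -1 to +1. The vaderSentiment README suggests cut-offs of ±0.05 for turning it into a positive / neutral / negative label. A small helper makes those labels available, for instance to count how many chapters lean negative. vader_label is a hypothetical name; the threshold follows the README convention.

```python
def vader_label(compound: float, threshold: float = 0.05) -> str:
    """Label a compound score using the +/-0.05 cut-offs suggested in the vaderSentiment README."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


print(vader_label(-0.1513))  # negative (the 'i am so sorry' example)
print(vader_label(0.02))     # neutral
print(vader_label(0.4))      # positive
```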
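In the sentiment charts, chapter_index keeps counting across all seven books, so the x-axis does not show where one book ends and the next begins. A hypothetical helper can compute each book's starting index from the per-book chapter counts (such as chapter_nums from the chapter-count section). The counts in the example call are illustrative input only.

```python
from itertools import accumulate


def book_start_indexes(chapter_counts):
    """Running chapter index (counting from 1, like chapter_index) at which each book starts."""
    if not chapter_counts:
        return []
    # A book starts right after all chapters of the books before it.
    chapters_before = accumulate([0] + list(chapter_counts[:-1]))
    return [1 + n for n in chapters_before]


print(book_start_indexes([17, 18, 22]))  # [1, 18, 36]
```

The boundaries could be marked on the chart with plt.axvline(start - 0.5, color='grey', linestyle='--') for each start after the first. The counts must come from the same splitting step that produced compounds, or the lines will drift.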
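With mode 'same', np.convolve returns an output as long as the input. Near both ends, though, part of the window falls outside the data and contributes nothing, so the first and last few smoothed points are pulled toward 0. On a sentiment curve, that can look like a real trend at the start of the first book or the end of the last. The pure-Python sketch below contrasts that zero-padding effect with a centered window that shrinks at the edges. Both function names are hypothetical. The zero-padded version only illustrates the padding effect and is not guaranteed to match numpy's centering for even windows.

```python
def zero_padded_moving_average(values, window):
    """Centered mean over `window` points; positions outside the series count as 0."""
    left = (window - 1) // 2
    right = window - 1 - left
    n = len(values)
    return [sum(values[max(0, i - left):min(n, i + right + 1)]) / window for i in range(n)]


def shrinking_moving_average(values, window):
    """Centered mean over the points that exist, so the edges are not pulled toward 0."""
    left = (window - 1) // 2
    right = window - 1 - left
    n = len(values)
    smoothed = []
    for i in range(n):
        chunk = values[max(0, i - left):min(n, i + right + 1)]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


flat = [1.0] * 20  # a series with no fluctuation at all
print(zero_padded_moving_average(flat, 10)[:3])  # [0.6, 0.7, 0.8]
print(shrinking_moving_average(flat, 10)[:3])    # [1.0, 1.0, 1.0]
```

The window size is assumed to be a positive integer. In the plotting code, shrinking_moving_average(compounds, 10) could replace movingaverage(compounds, 10) and still be plotted against chapter_indexes.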
