

Original title: Sentiment Analysis of the Harry Potter Novels with Python

Preparing the data

Each novel is stored in a single txt file. We want to analyze it chapter by chapter, with the text held in a list whose first element is the content of chapter 1, whose second element is the content of chapter 2, and so on. Getting there requires a regular expression to tidy the data.

Take 01-Harry Potter and the Sorcerers Stone.txt as an example. Opening the file and searching through it shows that every chapter heading follows the same layout:

    [Chapter][space][integer][newline \n][English title that may contain spaces][newline \n]

To get familiar with regular expressions, we first design a pattern from this layout and use it to extract the chapter headings:

    import re

    raw_text = open("data/01-Harry Potter and the Sorcerers Stone.txt").read()
    # "Chapter", a space, one or more digits, a newline,
    # then a title made of letters and spaces, then a newline
    pattern = r'Chapter \d+\n[a-zA-Z ]+\n'
    re.findall(pattern, raw_text)

    ['Chapter 1\nThe Boy Who Lived\n',
     'Chapter 2\nThe Vanishing Glass\n',
     'Chapter 3\nThe Letters From No One\n',
     'Chapter 4\nThe Keeper Of The Keys\n',
     'Chapter 5\nDiagon Alley\n',
     'Chapter 7\nThe Sorting Hat\n',
     'Chapter 8\nThe Potions Master\n',
     'Chapter 9\nThe Midnight Duel\n',
     'Chapter 10\nHalloween\n',
     'Chapter 11\nQuidditch\n',
     'Chapter 12\nThe Mirror Of Erised\n',
     'Chapter 13\nNicholas Flamel\n',
     'Chapter 14\nNorbert the Norwegian Ridgeback\n',
     'Chapter 15\nThe Forbidden Forest\n',
     'Chapter 16\nThrough the Trapdoor\n',
     'Chapter 17\nThe Man With Two Faces\n']

Note that Chapter 6 is absent from the result. The most likely reason is that its title contains a hyphen, which the character class [a-zA-Z ] does not match, so chapter counts based on this pattern can come out slightly low.

With the pattern in hand, we can be more precise. The test text below imitates the chapter headings of the real novel but is much shorter and easier to follow. It contains 5 chapters, so the resulting list should have length 5, with the first element holding the content of chapter 1, the second the content of chapter 2, and so on.

    import re

    test = (
        "Chapter 1\nThe Boy Who Lived\n"
        "Mr. and Mrs. Dursley, of number four, Privet Drive, were proud to say that they were perfectly normal, thank you very much. They were the last people you’d expect to be involved in anything strange or mysterious, because they just didn’t hold with such nonsense.\n"
        "Mr. Dursley was the director of a firm called Grunnings,\n "
        "Chapter 2\nThe Vanishing Glass\n"
        "For a second, Mr. Dursley didn’t realize what he had seen — then he jerked his head around to look again. There was a tabby cat standing on the corner of Privet Drive, but there wasn’t a map in sight. What could he have been thinking of? It must have been a trick of the light. Mr. Dursley blinked and stared at the cat.\n "
        "Chapter 3\nThe Letters From No One\n"
        "The traffic moved on and a few minutes later, Mr. Dursley arrived in the Grunnings parking lot, his mind back on drills.\n"
        "Mr. Dursley always sat with his back to the window in his office on the ninth floor. If he hadn’t, he might have found it harder to concentrate on drills that morning.\n "
        "Chapter 4\nThe Keeper Of The Keys\n"
        "He didn’t know why, but they made him uneasy. This bunch were whispering excitedly, too, and he couldn’t see a single collecting tin. \n "
        "Chapter 5\nDiagon Alley\n"
        "It was a few seconds before Mr. Dursley realized that the man was wearing a violet cloak. \n"
    )

    # Build the list of chapter contents (element 0 is chapter 1, element 1 is chapter 2, ...).
    # re.split leaves an empty string before the first heading; the "if c" filter drops it
    # so that the list length matches the expected number of chapters.
    chapter_contents = [c for c in re.split(r'Chapter \d+\n[a-zA-Z ]+\n', test) if c]
    chapter_contents

    ['Mr. and Mrs. Dursley, of number four, Privet Drive, were proud to say that they were perfectly normal, thank you very much. They were the last people you’d expect to be involved in anything strange or mysterious, because they just didn’t hold with such nonsense.\nMr. Dursley was the director of a firm called Grunnings,\n ',
     'For a second, Mr. Dursley didn’t realize what he had seen — then he jerked his head around to look again. There was a tabby cat standing on the corner of Privet Drive, but there wasn’t a map in sight. What could he have been thinking of? It must have been a trick of the light. Mr. Dursley blinked and stared at the cat.\n ',
     'The traffic moved on and a few minutes later, Mr. Dursley arrived in the Grunnings parking lot, his mind back on drills.\nMr. Dursley always sat with his back to the window in his office on the ninth floor. If he hadn’t, he might have found it harder to concentrate on drills that morning.\n ',
     'He didn’t know why, but they made him uneasy. This bunch were whispering excitedly, too, and he couldn’t see a single collecting tin. \n ',
     'It was a few seconds before Mr. Dursley realized that the man was wearing a violet cloak. \n']

Once we can produce a list of chapter contents for a Harry Potter novel, we can start the actual text analysis.

Data analysis

Chapter count comparison

    import re
    import matplotlib.pyplot as plt

    colors = ['#78C850', '#A8A878', '#F08030', '#C03028', '#6890F0', '#A890F0', '#A040A0']
    harry_potters = ["Harry Potter and the Sorcerers Stone.txt",
                     "Harry Potter and the Chamber of Secrets.txt",
                     "Harry Potter and the Prisoner of Azkaban.txt",
                     "Harry Potter and the Goblet of Fire.txt",
                     "Harry Potter and the Order of the Phoenix.txt",
                     "Harry Potter and the Half-Blood Prince.txt",
                     "Harry Potter and the Deathly Hallows.txt"]

    # x axis: book names (strip the common prefix and the ".txt" suffix)
    harry_potter_names = [n.replace('Harry Potter and the ', '')[:-4]
                          for n in harry_potters]

    # y axis: number of chapters
    chapter_nums = []
    for harry_potter in harry_potters:
        file = "data/" + harry_potter
        raw_text = open(file).read()
        pattern = r'Chapter \d+\n[a-zA-Z ]+\n'
        chapter_contents = [c for c in re.split(pattern, raw_text) if c]
        chapter_nums.append(len(chapter_contents))

    # figure size
    plt.figure(figsize=(20, 10))
    # chart title, font size, bold
    plt.title('Chapter Number of Harry Potter', fontsize=25, weight='bold')
    # colored bar chart
    plt.bar(harry_potter_names, chapter_nums, color=colors)
    # tick label font size and rotation
    plt.xticks(rotation=25, fontsize=16, weight='bold')
    plt.yticks(fontsize=16, weight='bold')
    # axis labels
    plt.xlabel('Harry Potter Series', fontsize=20, weight='bold')
    plt.ylabel('Chapter Number', rotation=25, fontsize=20, weight='bold')
    plt.show()

The chart shows that the last four books of the series have more chapters. (This analysis is not very useful in itself; it is mainly practice.)

Lexical richness

The measure used here is the ratio of the total number of words to the number of distinct words. If a 100-word sentence contains no repeated words, the ratio is 100/100 = 1. If a sentence of the same length uses only 20 distinct words, the ratio is 100/20 = 5. With this definition, a higher value means more repetition, that is, a less varied vocabulary.

    import re
    import matplotlib.pyplot as plt
    from nltk import word_tokenize
    from nltk.stem.snowball import SnowballStemmer

    plt.style.use('fivethirtyeight')

    colors = ['#78C850', '#A8A878', '#F08030', '#C03028', '#6890F0', '#A890F0', '#A040A0']
    harry_potters = ["Harry Potter and the Sorcerers Stone.txt",
                     "Harry Potter and the Chamber of Secrets.txt",
                     "Harry Potter and the Prisoner of Azkaban.txt",
                     "Harry Potter and the Goblet of Fire.txt",
                     "Harry Potter and the Order of the Phoenix.txt",
                     "Harry Potter and the Half-Blood Prince.txt",
                     "Harry Potter and the Deathly Hallows.txt"]

    # x axis: book names
    harry_potter_names = [n.replace('Harry Potter and the ', '')[:-4]
                          for n in harry_potters]

    # y axis: total words / distinct words for each book
    richness_of_words = []
    stemmer = SnowballStemmer("english")
    for harry_potter in harry_potters:
        file = "data/" + harry_potter
        raw_text = open(file).read()
        words = word_tokenize(raw_text)
        # lower-case and stem each token so that inflected forms count as one word
        words = [stemmer.stem(w.lower()) for w in words]
        wordset = set(words)
        richness = len(words) / len(wordset)
        richness_of_words.append(richness)

    # figure size
    plt.figure(figsize=(20, 10))
    # chart title, font size, bold
    plt.title('The Richness of Word in Harry Potter', fontsize=25, weight='bold')
    # colored bar chart
    plt.bar(harry_potter_names, richness_of_words, color=colors)
    # tick label font size and rotation
    plt.xticks(rotation=25, fontsize=16, weight='bold')
    plt.yticks(fontsize=16, weight='bold')
    # axis labels
    plt.xlabel('Harry Potter Series', fontsize=20, weight='bold')
    plt.ylabel('Richness of Words', rotation=25, fontsize=20, weight='bold')
    plt.show()

Sentiment analysis

Sentiment trend across the Harry Potter series

This part uses VADER, which is available as the ready-made library vaderSentiment. Its polarity_scores function returns four scores:

    neg: negative score
    neu: neutral score
    pos: positive score
    compound: overall sentiment score

    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    test = 'i am so sorry'
    analyzer.polarity_scores(test)

    {'neg': 0.443, 'neu': 0.557, 'pos': 0.0, 'compound': -0.1513}

Next, compute the average compound score of the sentences in each chapter and plot it across all seven books:

    import re
    import matplotlib.pyplot as plt
    from nltk.tokenize import sent_tokenize
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    harry_potters = ["Harry Potter and the Sorcerers Stone.txt",
                     "Harry Potter and the Chamber of Secrets.txt",
                     "Harry Potter and the Prisoner of Azkaban.txt",
                     "Harry Potter and the Goblet of Fire.txt",
                     "Harry Potter and the Order of the Phoenix.txt",
                     "Harry Potter and the Half-Blood Prince.txt",
                     "Harry Potter and the Deathly Hallows.txt"]

    # x axis: running chapter index across the whole series
    chapter_indexes = []
    # y axis: sentiment score of each chapter
    compounds = []

    analyzer = SentimentIntensityAnalyzer()
    chapter_index = 1
    for harry_potter in harry_potters:
        file = "data/" + harry_potter
        raw_text = open(file).read()
        pattern = r'Chapter \d+\n[a-zA-Z ]+\n'
        chapters = [c for c in re.split(pattern, raw_text) if c]
        # sentiment score of a chapter = mean compound score of its sentences
        for chapter in chapters:
            compound = 0
            sentences = sent_tokenize(chapter)
            for sentence in sentences:
                score = analyzer.polarity_scores(sentence)
                compound += score['compound']
            compounds.append(compound / len(sentences))
            chapter_indexes.append(chapter_index)
            chapter_index += 1

    # figure size
    plt.figure(figsize=(20, 10))
    # chart title, font size, bold
    plt.title('Average Sentiment of the Harry Potter', fontsize=25, weight='bold')
    # line chart
    plt.plot(chapter_indexes, compounds, color='#A040A0')
    # tick label font size and rotation
    plt.xticks(rotation=25, fontsize=16, weight='bold')
    plt.yticks(fontsize=16, weight='bold')
    # axis labels
    plt.xlabel('Chapter', fontsize=20, weight='bold')
    plt.ylabel('Average Sentiment', rotation=25, fontsize=20, weight='bold')
    plt.show()

The curve is not smooth enough. To iron out its fluctuations, we define a moving-average function and plot the smoothed series on top of the raw one:

    import re
    import numpy as np
    import matplotlib.pyplot as plt
    from nltk.tokenize import sent_tokenize
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    # curve-smoothing function: moving average over window_size points
    def movingaverage(value_series, window_size):
        window = np.ones(int(window_size)) / float(window_size)
        return np.convolve(value_series, window, 'same')

    harry_potters = ["Harry Potter and the Sorcerers Stone.txt",
                     "Harry Potter and the Chamber of Secrets.txt",
                     "Harry Potter and the Prisoner of Azkaban.txt",
                     "Harry Potter and the Goblet of Fire.txt",
                     "Harry Potter and the Order of the Phoenix.txt",
                     "Harry Potter and the Half-Blood Prince.txt",
                     "Harry Potter and the Deathly Hallows.txt"]

    # x axis: running chapter index across the whole series
    chapter_indexes = []
    # y axis: sentiment score of each chapter
    compounds = []

    analyzer = SentimentIntensityAnalyzer()
    chapter_index = 1
    for harry_potter in harry_potters:
        file = "data/" + harry_potter
        raw_text = open(file).read()
        pattern = r'Chapter \d+\n[a-zA-Z ]+\n'
        chapters = [c for c in re.split(pattern, raw_text) if c]
        # sentiment score of a chapter = mean compound score of its sentences
        for chapter in chapters:
            compound = 0
            sentences = sent_tokenize(chapter)
            for sentence in sentences:
                score = analyzer.polarity_scores(sentence)
                compound += score['compound']
            compounds.append(compound / len(sentences))
            chapter_indexes.append(chapter_index)
            chapter_index += 1

    # figure size
    plt.figure(figsize=(20, 10))
    # chart title, font size, bold
    plt.title('Average Sentiment of the Harry Potter', fontsize=25, weight='bold')
    # raw line plus smoothed line; both use chapter_indexes so the curves line up
    plt.plot(chapter_indexes, compounds, color='red')
    plt.plot(chapter_indexes, movingaverage(compounds, 10), color='black', linestyle=':')
    # tick label font size and rotation
    plt.xticks(rotation=25, fontsize=16, weight='bold')
    plt.yticks(fontsize=16, weight='bold')
    # axis labels
    plt.xlabel('Chapter', fontsize=20, weight='bold')
    plt.ylabel('Average Sentiment', rotation=25, fontsize=20, weight='bold')
    plt.show()
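To make the behaviour of the smoothing step more concrete, here is a minimal sketch of a convolution-based moving average applied to a constant series. The function name `moving_average` and the sample values are assumptions made for illustration and are not taken from the novels. The point it shows: `np.convolve` in `'same'` mode pads the series with zeros, so the first and last few smoothed values are pulled toward zero.

```python
import numpy as np

def moving_average(values, window_size):
    """Centered moving average via convolution with a uniform window."""
    window = np.ones(int(window_size)) / float(window_size)
    # 'same' keeps the output as long as the input by zero-padding the ends
    return np.convolve(values, window, "same")

# Hypothetical constant series, chosen so that any deviation from 1.0
# in the output is caused purely by the zero padding at the edges.
flat = np.ones(6)
smoothed = moving_average(flat, 3)
print(smoothed)

# Interior points average three real values; edge points average
# two real values and one padded zero.
interior = smoothed[1:-1]
edges = (smoothed[0], smoothed[-1])
print(interior, edges)
```

When reading the smoothed sentiment curve, the dip or rise at the very start and end of the series may therefore be an artifact of the padding rather than a change in tone. Trimming those points, or using `'valid'` mode and shifting the x values accordingly, are two possible ways to avoid this.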
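On the chapter-splitting step: a title pattern that accepts only letters and spaces skips any heading whose title contains a hyphen, apostrophe, or other punctuation, and the skipped chapter is silently merged into the previous one. The following is a sketch of a more permissive variant, assuming every heading is `Chapter <number>` on its own line followed by a one-line title. The helper name `split_chapters` and the sample text are hypothetical placeholders, not content from the data files.

```python
import re

# Assumed heading layout: "Chapter <number>" on one line, then a title line.
# [^\n]+ accepts any characters in the title, including hyphens and apostrophes.
HEADING = re.compile(r"Chapter \d+\n[^\n]+\n")

def split_chapters(raw_text):
    """Return chapter bodies in order, dropping empty or whitespace-only pieces."""
    return [c for c in HEADING.split(raw_text) if c.strip()]

# Placeholder text: three headings, one of which has a hyphenated title.
sample = (
    "Chapter 5\nDiagon Alley\nBody of chapter five.\n"
    "Chapter 6\nThe Journey from Platform Nine and Three-Quarters\nBody of chapter six.\n"
    "Chapter 7\nThe Sorting Hat\nBody of chapter seven.\n"
)

# Letters-and-spaces pattern: the hyphenated heading is not recognized.
strict = [c for c in re.split(r"Chapter \d+\n[a-zA-Z ]+\n", sample) if c]
# Permissive pattern: all three headings are recognized.
loose = split_chapters(sample)

print(len(strict), len(loose))
```

The trade-off is that a looser title pattern is also more likely to match body text that happens to contain the words "Chapter 3" followed by a line break, so it is worth comparing the resulting chapter count against the book's table of contents.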
