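Problem

Recently, while writing a crawler, I noticed that many pages show content in the browser that does not appear in the page source, because the content is loaded asynchronously. There are two ways to get that content. The first is to work out the page's rules, combine the right parameters, and send another HTTP request for the data. That gets hard to handle when the logic is complex. The second is to let ChromeDriver do the work. ChromeDriver drives a real Chrome browser, so the page's scripts run and driver.PageSource returns the rendered DOM rather than the raw HTML.

Solution

As an example, let's scrape Tencent Video to get the link of a TV series or film. In the browser the player shows the video, but the page source contains no video address at all.

Use ChromeOptions to make the browser behave like a regular user's. After loading the Tencent player page, the code reuses the same driver to crawl 1688 image-search results and their detail pages. SetText, Write, trylogin, CloseChromeDriver, Pagetypeenum and OfferItemModel are helpers from the author's own project and are not shown in the post.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

ChromeOptions options = new ChromeOptions();
options.AddArguments("--test-type", "--ignore-certificate-errors");
// Present the browser as an old Android mobile browser
options.AddArguments("user-agent=mozilla/5.0 (linux; u; android 2.3.3; en-us; sdk build/ gri34) applewebkit/533.1 (khtml, like gecko) version/4.0 mobile safari/533.1");
options.AddArgument("enable-automation");
// options.AddArgument("headless");
// options.AddArguments("--proxy-server=http://user:password@yourProxyServer.com:8080");
// IWebDriver driver = new ChromeDriver(System.Environment.CurrentDirectory, options);

// Arguments: directory containing chromedriver.exe, options, command timeout
using (IWebDriver driver = new ChromeDriver(@"C:\Users\Administrator\Downloads\chromedriver_win32", options, TimeSpan.FromSeconds(120)))
{
    // trylogin(driver);
    // Tencent Video player page; the player element has id tenvideo_video_player_0
    driver.Url = "http://v.qq.com/iframe/player.html?tiny=1&auto=0&vid=z0023uikqoj";
    SetText(driver.PageSource);   // show the rendered page source
    Thread.Sleep(200);
    try
    {
        for (int a = 1; a < 2; a++)
        {
            SetText("\r\nNo. " + a);
            driver.Navigate().GoToUrl("https://s.1688.com/youyuan/index.htm?tab=imageSearch&imageType=oss&imageAddress=cbuimgsearch/eWXC7XHHPN1607529600000&spm=");
            // Redirected to the login page?
            if (driver.Url.Contains("login.1688.com"))
            {
                SetText("\r\nLogin required, trying to log in...");
                trylogin(driver);   // attempt to log in
                // Try the search page again
                driver.Navigate().GoToUrl("https://s.1688.com/youyuan/index.htm?tab=imageSearch&imageType=oss&imageAddress=cbuimgsearch/eWXC7XHHPN1607529600000&spm=");
                if (driver.Url.Contains("login.1688.com"))
                {
                    // Still on the login page: give up
                    SetText("\r\nGiving up; switch IP and retry...");
                    return;
                }
            }
            // Hover panels: the page itself only shows one at a time, so they cannot all be
            // displayed and then downloaded this way; they have to be downloaded by other means.
            // var elements = document.getElementsByClassName("hover-container");
            // Array.prototype.forEach.call(elements, function(element) {
            //     element.style.display = "block";
            //     console.log(element);
            // });
            IJavaScriptExecutor js = (IJavaScriptExecutor)driver;
            // Copy the live HTMLCollection into an array first: changing an element's class
            // removes it from the live collection, which would skip every other element.
            js.ExecuteScript(@"
                var elements = Array.from(document.getElementsByClassName('hover-container'));
                elements.forEach(function (element) {
                    element.setAttribute('class', '测试title');
                    element.style.display = 'block';
                });");
            Thread.Sleep(500);
            // Write (author's helper) saves the list page (Pagetypeenum.列表 = list) and returns the parsed model
            var responseModel = Write(driver.PageSource, Pagetypeenum.列表);
            Thread.Sleep(500);
            int i = 1;
            foreach (var offer in responseModel?.data?.offerList ?? new List<OfferItemModel>())
            {
                // Open each offer's detail page (Pagetypeenum.详情 = detail) and save its source
                driver.Navigate().GoToUrl(offer.information.detailUrl);
                Write(driver.PageSource, Pagetypeenum.详情);
                SetText("\r\nNo. " + a + "-" + i);
                Thread.Sleep(500);
                i++;
            }
        }
    }
    catch (Exception)
    {
        CloseChromeDriver(driver);
        throw;
    }
}
// Thread thread = new Thread(go);
// thread.Start();
```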
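The code above waits with fixed Thread.Sleep calls (200 ms and 500 ms). Asynchronously loaded content can take a variable time to appear, so a fixed sleep may be too short on a slow connection and wasteful on a fast one. A common alternative is to poll until a condition holds or a timeout expires. The following is a minimal sketch of that idea using only the standard library. PollingWait and WaitUntil are hypothetical names, not part of Selenium or the original project. Selenium's own WebDriverWait class (namespace OpenQA.Selenium.Support.UI) implements the same idea.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper, not part of Selenium or the original project:
// repeatedly evaluates a condition until it is true or the timeout expires.
public static class PollingWait
{
    public static bool WaitUntil(Func<bool> condition, TimeSpan timeout, TimeSpan interval)
    {
        if (condition == null) throw new ArgumentNullException(nameof(condition));
        if (interval < TimeSpan.Zero) throw new ArgumentOutOfRangeException(nameof(interval));

        Stopwatch stopwatch = Stopwatch.StartNew();
        while (true)
        {
            // Check first, so a condition that already holds returns immediately.
            if (condition()) return true;

            // Read the clock once, so the deadline test and the sleep length agree.
            TimeSpan remaining = timeout - stopwatch.Elapsed;
            if (remaining <= TimeSpan.Zero) return false;

            // Sleep one interval, but never past the deadline.
            Thread.Sleep(remaining < interval ? remaining : interval);
        }
    }
}
```

For example, with the driver from the code above, PollingWait.WaitUntil(() => driver.PageSource.Contains("tenvideo_video_player_0"), TimeSpan.FromSeconds(10), TimeSpan.FromMilliseconds(200)) returns true as soon as the player element is in the DOM. It returns false if the element has not appeared after ten seconds.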
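Get the page information with SetText(driver.PageSource). The handler below reads the saved page source from a text file and parses it with HtmlAgilityPack:

```csharp
using System;
using System.IO;
using System.Text;
using HtmlAgilityPack;

private void button2_Click(object sender, EventArgs e)
{
    // Path of the saved page source
    string filePath = @"G:\conan\reptiles1688\bin\Debug\test.txt";
    // FileShare.ReadWrite allows reading even if another process still has the file open.
    // ReadToEnd reads the whole file; a single Stream.Read call may return fewer bytes.
    using (FileStream fsRead = new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
    using (StreamReader reader = new StreamReader(fsRead, Encoding.Default))
    {
        this.textBox1.Text = reader.ReadToEnd();
    }

    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(this.textBox1.Text);
    // The Tencent player element; its src attribute holds the video address
    HtmlNode node = doc.GetElementbyId("tenvideo_video_player_0");
    textBox1.Text = node?.GetAttributeValue("src", string.Empty) ?? string.Empty;
    // var nodes = doc.DocumentNode.SelectNodes("//video[@id='tenvideo_video_player_0']//video");
    // textBox1.Text = nodes[3].InnerHtml;
}
```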
Parsing the page this way gives us the video address we want.
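If HtmlAgilityPack is not available, a rougher standard-library alternative is a regular expression. It finds the opening tag whose id is tenvideo_video_player_0 and reads its src attribute. This swaps in a different technique from the author's HTML parser and is only a sketch. It assumes lowercase attribute names, quoted attribute values, and no '>' characters inside those values, and it is far more brittle than a real parser. PlayerSrcExtractor and ExtractSrcById are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: a regex-based stand-in for HtmlAgilityPack's GetElementbyId + src lookup.
// Assumes lowercase attribute names, quoted values, and no '>' inside attribute values.
public static class PlayerSrcExtractor
{
    public static string ExtractSrcById(string html, string id)
    {
        if (html == null) throw new ArgumentNullException(nameof(html));
        if (id == null) throw new ArgumentNullException(nameof(id));

        // An opening tag with id="..." or id='...' anywhere among its attributes.
        // Requiring whitespace before "id" keeps attributes such as data-id from matching;
        // the backreference \1 makes the closing quote match the opening one.
        string tagPattern = @"<[a-zA-Z][^>]*\sid\s*=\s*([""'])" + Regex.Escape(id) + @"\1[^>]*>";
        Match tag = Regex.Match(html, tagPattern);
        if (!tag.Success) return null;

        // The src attribute inside that tag only (data-src is skipped for the same reason).
        Match src = Regex.Match(tag.Value, @"\ssrc\s*=\s*([""'])(.*?)\1");
        return src.Success ? src.Groups[2].Value : null;
    }
}
```

For example, ExtractSrcById(driver.PageSource, "tenvideo_video_player_0") returns the player's src when the markup meets these assumptions. It returns null when no matching tag or src attribute is found.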