前言
即将用到爬虫,于是打算收集一下ur agent(ua)数据。接着马上想到自己网站的访问日志不就是现成的优质数据源吗?于是愉快的决定写个脚本统计一下nginx访问日志中的ua信息。
这类简单操作,用脚本语言就足够,毫无疑问肯定要用最熟悉的php。打开vim就开撸,十几分钟下来,功能简单的统计脚本就搞定了。
脚本目前有三个功能:
1. 找出所有的ua信息并排序; 2. 统计操作系统数据; 3. 统计浏览器数据。
程序运行截图如下:
1、ua信息
2、操作系统信息
3、浏览器
用脚本统计最近一个月的访问日志,得到以下结果:
搜索引擎爬虫比较频繁,每天有好几千次数据访问;windows仍是份额最大的操作系统,linux桌面依然份额很小;chrome目请假邮件前是浏览器领域的霸主,其次是firefox,opera已经很小众了。最后附上php脚本的代码,也可以从本人的github里找到:https://github.com/tlanyan/scrip民生问题ts/blob/master/statua.php
#!/usr/bin/php<?php/** * @brief stat ua in access log * * @author tlanyan<tlanyan@hotmail.com> * @link http://tlanyan.me *//* vim: t ts=4; t sw=4; t ss=4; t expandtab; */function getfilelist(string $path) : array { return glob(rtrim($path, "/") . "/*access.log*");}function statfiles(array $files) : array { $stat = []; echo php_eol, "start to read files...", php_eol; foreach ($files as $file) { echo "read file: $file ...", php_eol; $contents = getfilecontent($file); foreach ($contents as $line) { $ua = getua($line); if (ist($stat[$ua])) { $stat[$ua] += 1; } el { $stat[$ua] = 1; } } } echo "stat all files done!", php_eol, php_eol; return $stat;}function getfilecontent(string $file) : array { if (substr($file, -3, 3) === ".gz") { return gzfile($file); } return file($file);}function getua(string $line) : ?string { // important! nginx log format determins the ua location in the line! // you may have to refactor following codes to get the right result // ua starts from fifth double quote $count = 0; $offt = 0; while ($count < 5) { $pos = strpos($line, '"', $offt); if ($pos === fal) { echo "error! unknown line: $line", php_eol; return null; } $count ++; $offt = $pos + 1; } $end = strpos($line, '"', $offt); return substr($line, $offt, $end - $offt);}function usage() { echo "usage: php statua.php [option] [dir]", php_eol; echo " options:", php_eol; echo " -h: show this help", php_eol; echo " -v: verbo mode", php_eol; echo "-n num: ua list number", php_eol; echo " dir: directory to the log files", php_eol; echo php_eol;}function filterua(array& $stat, array $uafilters) { $filtercount = 0; foreach ($uafilters as $filter) { foreach ($stat as $ua => $count) { if (stripos($ua, $filter) !== fal) { $filtercount += $count; unt($stat[$ua]); } } } echo "filter $filtercount records!", php_eol;}function printcount(array $stat) { $sum = array_sum($stat); foreach ($stat as $key => $count) { echo $key, " : ", $count, ", percent: ", sprintf("%.2f", 100*$count/$sum), php_eol; }}function statos(array $uas) : array { global $debug; echo php_eol, "stat os...", php_eol; $os = ["windows", "macos", "linux", "android", "ios", "other"]; $stat = array_fill_keys($os, 0); foreach ($uas as $key => $count) { if (strpos($key, "windows") !== fal) { $stat["windows"] += $count; } el if (strpos($key, "macintosh") !== fal) { $stat["macos"] += $count; // must deal android first, then linux } el if (strpos($key, "android") !== fal) { $stat["android"] += $count; } el if (strpos($key, "linux") !== fal) { $stat["linux"] += $count; } el if (strpos($key, "iphone") !== fal || strpos($key, "ios") !== fal || strpos($key, "like mac os") !== fal || strpos($key, "darwin") !== fal) { $stat["ios"] += $count; } el { if ($debug) { echo "other: $key, count: $count", php_eol; } $stat["other"] += $count; } } return $stat;}function statbrowr(array $uas) : array { global $debug; echo php_eol, "stat brwor...", php_eol; $browrs = ["chrome", "firefox", "ie", "safari", "edge", "opera", "other"]; $stat = array_fill_keys($browrs, 0); foreach ($uas as $key => $count) { if (strpos($key, "msie") !== fal) { $stat["ie"] += $count; } el if (strpos($key, "edge") !== fal) { $stat["edge"] += $count; } el if (strpos($key, "firefox") !== fal) { $stat["firefox"] += $count; } el if (strpos($key, "opr") !== fal) { $stat["opera"] += $count; // first chrome, then safari } el if (strpos($key, "chrome") !== fal) { $stat["chrome"] += $count; } el if (strpos($key, "safari") !== fal) { $stat["safari"] += $count; } el { if ($debug) { echo "other: $key, count: $count", php_eol; } $stat["other"] += $count; } } return $st憧憬造句at;}function parcmd() { global $debug, $num, $path, $argc, $argv; $optind = null; $options = getopt("hvn:", [], $optind); if ($argc > 2 && empty($options)) { usage(); exit(1); } if (ist($options['h'])) { usage(); exit(0); } if (ist($options['v'])) { $debug = true; } if (ist($options['n'])) { $num = intval($options['n']); if ($num <= 0) { $num = 10; } } if ($argc === 2 && empty($options)) { $path = $argv[1]; } if ($argc > $optind) { $path = $argv[$optind]; } if (!is_dir($path)) { echo "invalid directory: $path", php_eol; exit(1); } if ($debug) { echo "num: $num", php_eol; echo "verbo: ", var_export($debug, true), php_eol; echo "path: $path", php_eol; }}if (version_compare(php_version, "7.1") < 0) { exit("scripts require php >=7.1");}$path = ".";$debug = fal;$num = 10;$uafilters = [ "spider", "bot", "wget", "curl",];parcmd();$files = getfilelist($path);if (empty($files)) { echo '"' . realpath($path) . '" does not contain access log files.', php_eol; exit(0);}$allua = statfiles($fil曹冲称象文言文es);if (empty($allua)) { echo "no data", php_eol; exit(0);}filterua($allua, $uafilters);// sort array with countuasort($allua, function ($a, $b) { return $b - $a;});if ($debug) { print_r($allua);}echo php_eol, "---- top $num ua ----", php_eol;printcount(array_slice($allua, 0, $num));echo "-------------------", php_eol;$os = statos($allua);echo php_eol, "os count:", php电机与电器_eol;printcount($os);$browr = statbrowr($allua);echo php_eol, "browr count:", php_eol;printcount($browr);
总结
以上就是这篇文章的全部内容了,希望本文的内容对大家的学习或者工作具有一定的参考学习价值,如果有疑问大家可以留言交流,谢谢大家对www.887551.com的支持。
本文发布于:2023-04-06 15:06:59,感谢您对本站的认可!
本文链接:https://www.wtabcd.cn/fanwen/zuowen/6799088d0a42e8b0243c5358b657ba7d.html
版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。
本文word下载地址:利用PHP如何统计Nginx日志的User Agent数据.doc
本文 PDF 下载地址:利用PHP如何统计Nginx日志的User Agent数据.pdf
留言与评论(共有 0 条评论) |