首页 > 作文

利用PHP如何统计Nginx日志的User Agent数据

更新时间:2023-04-06 15:07:16 阅读: 评论:0

前言

即将用到爬虫,于是打算收集一下ur agent(ua)数据。接着马上想到自己网站的访问日志不就是现成的优质数据源吗?于是愉快的决定写个脚本统计一下nginx访问日志中的ua信息。

这类简单操作,用脚本语言就足够,毫无疑问肯定要用最熟悉的php。打开vim就开撸,十几分钟下来,功能简单的统计脚本就搞定了。

脚本目前有三个功能:

1. 找出所有的ua信息并排序; 2. 统计操作系统数据; 3. 统计浏览器数据。

程序运行截图如下:

1、ua信息

2、操作系统信息

3、浏览器

用脚本统计最近一个月的访问日志,得到以下结果:

搜索引擎爬虫比较频繁,每天有好几千次数据访问;windows仍是份额最大的操作系统,linux桌面依然份额很小;chrome目请假邮件前是浏览器领域的霸主,其次是firefox,opera已经很小众了。

最后附上php脚本的代码,也可以从本人的github里找到:https://github.com/tlanyan/scrip民生问题ts/blob/master/statua.php

#!/usr/bin/php<?php/** * @brief stat ua in access log * * @author tlanyan<tlanyan@hotmail.com> * @link http://tlanyan.me *//* vim: t ts=4; t sw=4; t ss=4; t expandtab; */function getfilelist(string $path) : array { return glob(rtrim($path, "/") . "/*access.log*");}function statfiles(array $files) : array { $stat = []; echo php_eol, "start to read files...", php_eol; foreach ($files as $file) {  echo "read file: $file ...", php_eol;  $contents = getfilecontent($file);  foreach ($contents as $line) {   $ua = getua($line);   if (ist($stat[$ua])) {    $stat[$ua] += 1;   } el {    $stat[$ua] = 1;   }  } } echo "stat all files done!", php_eol, php_eol; return $stat;}function getfilecontent(string $file) : array { if (substr($file, -3, 3) === ".gz") {  return gzfile($file); } return file($file);}function getua(string $line) : ?string { // important! nginx log format determins the ua location in the line! // you may have to refactor following codes to get the right result // ua starts from fifth double quote  $count = 0; $offt = 0; while ($count < 5) {  $pos = strpos($line, '"', $offt);  if ($pos === fal) {   echo "error! unknown line: $line", php_eol;   return null;  }  $count ++;  $offt = $pos + 1; } $end = strpos($line, '"', $offt); return substr($line, $offt, $end - $offt);}function usage() { echo "usage: php statua.php [option] [dir]", php_eol; echo " options:", php_eol; echo " -h: show this help", php_eol; echo " -v: verbo mode", php_eol; echo "-n num: ua list number", php_eol; echo " dir: directory to the log files", php_eol; echo php_eol;}function filterua(array& $stat, array $uafilters) { $filtercount = 0; foreach ($uafilters as $filter) {  foreach ($stat as $ua => $count) {   if (stripos($ua, $filter) !== fal) {    $filtercount += $count;    unt($stat[$ua]);   }  } } echo "filter $filtercount records!", php_eol;}function printcount(array $stat) { $sum = array_sum($stat); foreach ($stat as $key => $count) {  echo $key, " : ", $count, ", percent: ", sprintf("%.2f", 100*$count/$sum), php_eol; }}function statos(array $uas) : array { global $debug; echo php_eol, "stat os...", php_eol; $os = ["windows", "macos", "linux", "android", "ios", "other"]; $stat = array_fill_keys($os, 0); foreach ($uas as $key => $count) {  if (strpos($key, "windows") !== fal) {   $stat["windows"] += $count;  } el if (strpos($key, "macintosh") !== fal) {   $stat["macos"] += $count;  // must deal android first, then linux  } el if (strpos($key, "android") !== fal) {   $stat["android"] += $count;  } el if (strpos($key, "linux") !== fal) {   $stat["linux"] += $count;  } el if (strpos($key, "iphone") !== fal || strpos($key, "ios") !== fal || strpos($key, "like mac os") !== fal || strpos($key, "darwin") !== fal) {   $stat["ios"] += $count;  } el {   if ($debug) {    echo "other: $key, count: $count", php_eol;   }   $stat["other"] += $count;  } } return $stat;}function statbrowr(array $uas) : array { global $debug; echo php_eol, "stat brwor...", php_eol; $browrs = ["chrome", "firefox", "ie", "safari", "edge", "opera", "other"]; $stat = array_fill_keys($browrs, 0); foreach ($uas as $key => $count) {  if (strpos($key, "msie") !== fal) {   $stat["ie"] += $count;  } el if (strpos($key, "edge") !== fal) {   $stat["edge"] += $count;  } el if (strpos($key, "firefox") !== fal) {   $stat["firefox"] += $count;  } el if (strpos($key, "opr") !== fal) {   $stat["opera"] += $count;  // first chrome, then safari  } el if (strpos($key, "chrome") !== fal) {   $stat["chrome"] += $count;  } el if (strpos($key, "safari") !== fal) {   $stat["safari"] += $count;  } el {   if ($debug) {    echo "other: $key, count: $count", php_eol;   }   $stat["other"] += $count;  } } return $st憧憬造句at;}function parcmd() { global $debug, $num, $path, $argc, $argv; $optind = null; $options = getopt("hvn:", [], $optind); if ($argc > 2 && empty($options)) {  usage();  exit(1); } if (ist($options['h'])) {  usage();  exit(0); } if (ist($options['v'])) {  $debug = true; } if (ist($options['n'])) {  $num = intval($options['n']);  if ($num <= 0) {   $num = 10;  } } if ($argc === 2 && empty($options)) {  $path = $argv[1]; } if ($argc > $optind) {  $path = $argv[$optind]; } if (!is_dir($path)) {  echo "invalid directory: $path", php_eol;  exit(1); } if ($debug) {  echo "num: $num", php_eol;  echo "verbo: ", var_export($debug, true), php_eol;  echo "path: $path", php_eol; }}if (version_compare(php_version, "7.1") < 0) { exit("scripts require php >=7.1");}$path = ".";$debug = fal;$num = 10;$uafilters = [ "spider", "bot", "wget", "curl",];parcmd();$files = getfilelist($path);if (empty($files)) { echo '"' . realpath($path) . '" does not contain access log files.', php_eol; exit(0);}$allua = statfiles($fil曹冲称象文言文es);if (empty($allua)) { echo "no data", php_eol; exit(0);}filterua($allua, $uafilters);// sort array with countuasort($allua, function ($a, $b) { return $b - $a;});if ($debug) { print_r($allua);}echo php_eol, "---- top $num ua ----", php_eol;printcount(array_slice($allua, 0, $num));echo "-------------------", php_eol;$os = statos($allua);echo php_eol, "os count:", php电机与电器_eol;printcount($os);$browr = statbrowr($allua);echo php_eol, "browr count:", php_eol;printcount($browr);

总结

以上就是这篇文章的全部内容了,希望本文的内容对大家的学习或者工作具有一定的参考学习价值,如果有疑问大家可以留言交流,谢谢大家对www.887551.com的支持。

本文发布于:2023-04-06 15:06:59,感谢您对本站的认可!

本文链接:https://www.wtabcd.cn/fanwen/zuowen/6799088d0a42e8b0243c5358b657ba7d.html

版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。

本文word下载地址:利用PHP如何统计Nginx日志的User Agent数据.doc

本文 PDF 下载地址:利用PHP如何统计Nginx日志的User Agent数据.pdf

下一篇:返回列表
标签:脚本   数据   爬虫   信息
相关文章
留言与评论(共有 0 条评论)
   
验证码:
Copyright ©2019-2022 Comsenz Inc.Powered by © 专利检索| 网站地图