Scraping Facebook Public Page Posts with Python

Updated: 2023-05-17 08:13:02

Resource Recommendation
A while ago I needed to scrape Facebook for a project, but because of the pandemic, applications for the official personal Graph API permissions were suspended, so after much head-scratching I turned to the ever-reliable GitHub for resources. I tried quite a few packages and have listed the ones I found most useful below, for personal study and exchange only, not for commercial use.
1. An online tool for scraping basic information from Facebook pages (public address, phone number, email, business hours, etc.). Fast and convenient, with a free trial version.
/automations/facebook/8369/facebook-profile-scraper
2. From GitHub. In my tests it is quite powerful for scraping posts, videos, etc. from personal profiles, but it requires valid credentials (a registered email and password).
/harismuneer/Ultimate-Facebook-Scraper
3. From GitHub. It can scrape all posts from a public page, along with their timestamps, share/like/comment counts, post IDs, and more, and it needs no credentials. It is one of the few pieces of working code I found that can scrape public pages; unfortunately, the actual text of the comments cannot be scraped. /kevinzg/facebook-scraper
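The kevinzg/facebook-scraper package exposes a `get_posts` generator that yields one dict per post. Below is a minimal sketch of that call pattern; the page name `"nintendo"` is only an illustration, and the `try/except` falls back to a stub generator so the snippet also runs without the package installed or network access:

```python
# Minimal sketch of the kevinzg/facebook-scraper call pattern.
# Assumption: the package is installed as `facebook_scraper`
# (pip install facebook-scraper). The stub below is a stand-in that
# mimics the dict shape of real posts, so the snippet runs offline.
try:
    from facebook_scraper import get_posts
except ImportError:
    def get_posts(account=None, pages=1, **kwargs):
        # Offline stub: yield roughly 2 fake posts per page.
        for i in range(2 * pages):
            yield {"post_id": str(i), "text": f"stub post {i}",
                   "time": None, "likes": 0, "comments": 0, "shares": 0}

posts = list(get_posts(account="nintendo", pages=1))
for p in posts:
    print(p["post_id"], p["likes"], p["comments"])
```

Each yielded dict carries the fields the script below relies on (`post_id`, `text`, `time`, `likes`, and so on), so collecting them into a DataFrame is straightforward.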
Practical Usage
In the end I chose the third option above to scrape all posts from the target companies' Facebook public pages and export them as an xlsx file:
import re
import time
import datetime
import pandas as pd
import numpy as np
from Facebook_Scraper.facebook_scraper import get_posts
from Facebook_Scraper.facebook_scraper import fetch_share_and_reactions

def facebook_scrap():
    # The incorporation date and dissolution date are stored as timestamps;
    # we'll convert them into strings containing only the date.
    data = pd.read_excel('../data/datat.xlsx',
                         converters={'Date of Establishment_legal': str, 'Dissolved_legal': str})
    # Column 'Date of Establishment_legal' contains the company's incorporation date,
    # column 'Dissolved_legal' contains the company's dissolution date, and column
    # 'Facebook' contains the link of the company's Facebook public page, if any.
    # We only keep companies with Facebook links.
    data = data[data['Facebook'].notna()]
    data['Date of Establishment_legal'] = data['Date of Establishment_legal'].apply(lambda x: x[0:10])
    data['Dissolved_legal'] = data['Dissolved_legal'].apply(lambda x: x[0:10] if type(x) == str else str(x))
    # The input of the Facebook scraping code should be the account name,
    # so we need to extract the account name from the link.
    links = data['Facebook'].to_list()
    account = [0 for _ in range(data.shape[0])]
    pattern = re.compile('/([a-zA-Z0-9.]+)')
    for i in range(len(links)):
        try:
            name = re.findall(pattern, links[i])[0]
            account[i] = name
        except:
            account[i] = 0
    # Seed DataFrame with one empty row; it is dropped at the end.
    posts_data = pd.DataFrame({"post_id": "", "text": "", "post_text": "", "shared_text": "",
                               "time": "", "image": "", "likes": "", "comments": "",
                               "shares": "", "post_url": "", "link": ""}, index=["0"])
    abbreviation = data['Company name_abbreviation'].to_list()
    incorporation_date = data['Date of Establishment_legal'].to_list()
    dissolution_date = data['Dissolved_legal'].to_list()
    # Start scraping posts
    for i in range(0, len(account)):
        cnt = 0
        # There are about 2 posts per page, and pages=4000 should be enough for us
        # to scrape all the Facebook posts since the account was created.
        for post in get_posts(account=account[i], pages=4000):
            cnt += 1
            more_info_post = fetch_share_and_reactions(post)
            more_info_post['Company name_abbreviation'] = abbreviation[i]
            more_info_post['account'] = account[i]
            more_info_post['incorporation_date'] = incorporation_date[i]
            more_info_post['dissolution_date'] = dissolution_date[i]
            df = pd.DataFrame(more_info_post, index=["0"])
            posts_data = posts_data.append(df, ignore_index=True, sort=False)
        print(account[i], cnt, ' posts are scraped.')
    useful_columns = ['post_id', 'text', 'shared_text', 'time', 'image', 'likes',
                      'comments', 'shares', 'post_url', 'link',
                      'Company name_abbreviation', 'account', 'incorporation_date', 'dissolution_date']
    posts_data = pd.DataFrame(posts_data, columns=useful_columns)
    posts_data = posts_data.drop([0])
    posts_data.to_excel('../data/all_facebook_posts.xlsx', index=False)
    return posts_data
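The account-name regex in `facebook_scrap` can be checked in isolation. One thing to watch: with a full URL, the domain is itself a `/xxx` match, so the page name is the *last* match rather than the first; the `[0]` used above therefore assumes the stored links carry no scheme prefix. The sample links here are made up:

```python
import re

# Same pattern as in facebook_scrap(): a '/' followed by letters, digits, or dots.
pattern = re.compile('/([a-zA-Z0-9.]+)')

full_url = "https://www.facebook.com/some.company.page"   # hypothetical links
bare_link = "/some.company.page"

# With a scheme present, the first match is the domain, so take the last one.
print(re.findall(pattern, full_url))        # ['www.facebook.com', 'some.company.page']
print(re.findall(pattern, full_url)[-1])    # some.company.page
print(re.findall(pattern, bare_link)[0])    # some.company.page
```

Using `[-1]` instead of `[0]` would make the extraction robust to both link formats.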
