House hunting in Montreal using data science
Introduction
I happen to live in Montreal, in a condo on the edge of the McGill Ghetto, close to Saint Laurent Boulevard, or the Main as locals call it, with all its attractions: bars, restaurants, night clubs, drunken students. Once upon a time, on a particularly lively night, listening to the sounds of McGill frosh students drunkenly heading home after a hard night of studying, I thought that it might be a good idea to move into my own house, a little bit further away from the action.
It was not my first rodeo buying real estate in Montreal, but it was my first time buying a house. So I decided to do a little bit of research before trusting my money to a real estate agent. I quickly realized that I can't afford a house anywhere close to a subway station on the Island, but I could possibly afford a duplex or a triplex, where tenants would be covering part of my mortgage. The solution to this problem depends not only on the price of the house, but also on the rent, or potential rent, that the tenants could be paying.
So, being a visual person with a background in research, I wanted to see a map of how much things cost around the island, and how much revenue I could get. In the States, and even in Ontario, there are services like Zillow that can show some of this information, but for Montreal I couldn't find anything apart from the realtor association. Maybe my preference for using English is to blame.
So, after studying realtor.ca and kijiji for a few weeks, I wrote a Python script that scrapes information from them, using some resources I found on GitHub. In addition, the city of Montreal has an open data website that helps fill in the gaps.
After the data is collected by the web scrapers, it is processed in R using the tidyverse and sf packages. I found excellent resources on how to process geospatial information in R. I also made the graphs and maps in R, the maps with the tmap package.
Now I have more than a year's worth of data to study.
Data pre-processing
I preprocess the data by first converting it into simple-features format, and then changing the coordinate reference system from longitude/latitude (EPSG:4326) to the projected EPSG:32188.
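library(tidyverse)
library(sf)
# read the scraped listings, build point geometries from the lng/lat columns,
# then reproject from EPSG:4326 to EPSG:32188
property <- read_csv("....") %>%
  st_as_sf(coords=c("lng","lat"), crs=4326) %>%
  st_transform(crs=32188)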
Condo price
First I wanted to evaluate how much I could get for my condo. To do that, I need to define my neighborhood and find all the condos for sale around me.
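Neighborhood map
library(geojsonsf)  # provides geojson_sf()
# load the neighbourhood polygons, keep the two of interest and merge them into one shape
neighbourhood <- geojson_sf("json") %>%
  st_transform(32188) %>%
  filter(nom_qr %in% c("Saint-Louis", "Milton-Parc")) %>%
  summarize() %>%
  st_buffer(dist=0)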
Selecting condos for sale.
# inner spatial join: keep only the listings that fall inside the neighbourhood
neighbors <- st_join(property, neighbourhood, left=FALSE)
Using a basemap from OpenStreetMap.
library(tmaptools)  # provides read_osm()
osm_neighbourhood <- read_osm(st_bbox(neighbourhood %>% st_transform(4326)), ext=1.5, type="esri")
Drawing the results using the tmap package.
library(tmap)
library(tmaptools)
tm_shape(osm_neighbourhood) + tm_rgb(alpha=0.7)+
tm_shape(neighbourhood) + tm_borders(col='red',alpha=0.8) +
tm_shape(neighbors) + tm_symbols(shape=3,size=0.2,alpha=0.8) +
tm_shape(ref_home) + tm_symbols(col='red',shape=4,size=0.5,alpha=0.8)+
tm_compass(position=c("right", "bottom"))+
tm_scale_bar(position=c("right", "bottom"))
Neighbourhood condo prices
Now I can show the prices and see how they depend on the condo surface area and on whether it comes with parking. And if I use a simple linear regression, I can get a first approximation of what my condo might be worth.
Linear model
More formally, I can use a linear model to predict the price and its confidence interval:
model_price_lm <- lm(mprice ~ parking:area_interior, data=neighbors_)
summary(model_price_lm)
##
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)
## (Intercept)                 41861.30   22421.28   1.867   0.0628 .
## parkingFALSE:area_interior    436.65      23.56  18.530   <2e-16 ***
## parkingTRUE:area_interior     511.95      19.40  26.393   <2e-16 ***
So, in my neighborhood every square foot in a condo without parking adds 437$ to the base price of 42k$, and with parking it is 512$ per square foot. Now I can make a prediction of the price: 443k$, with confidence interval [422k$, 465k$].
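As a rough illustration of how such a prediction and its interval can be obtained, here is a minimal sketch on synthetic data. The data frame `condos`, all of its values, and the 800 sq ft example condo are invented for the example. Only the model formula follows the text above, so the numbers will not reproduce the results quoted here.

```r
# Synthetic stand-in for the scraped listings (all values are invented)
set.seed(1)
n <- 200
condos <- data.frame(
  area_interior = runif(n, 400, 1500),                     # interior area, sq ft
  parking       = sample(c(TRUE, FALSE), n, replace = TRUE)
)
# price grows with area, with a steeper slope when parking is included
condos$mprice <- 40000 +
  ifelse(condos$parking, 510, 440) * condos$area_interior +
  rnorm(n, sd = 30000)

# same formula as in the text: one slope per parking category
fit_lm <- lm(mprice ~ parking:area_interior, data = condos)

# hypothetical condo to price: 800 sq ft with parking
my_condo <- data.frame(area_interior = 800, parking = TRUE)
pred_lm <- predict(fit_lm, newdata = my_condo, interval = "confidence")
print(round(pred_lm / 1000))   # columns: fit, lwr, upr (in k$)
```

Note that `interval = "confidence"` describes the uncertainty of the mean price for such a condo. `interval = "prediction"` would give the wider range expected for a single listing.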
However, if I look at the difference between what my model predicts for all the condos in the neighborhood and the prices, I can see that the error depends on the predicted value:
This violates one of the conditions under which simple linear regression can be used. This kind of behaviour is called heteroscedasticity, and there are several ways of dealing with it. In particular, I found in the literature that I should be using a generalized linear model with an inverse Gaussian distribution for the errors and a logarithmic link function.
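One simple way to see this effect is to compare the spread of the residuals for the cheaper and the more expensive half of the fitted values. The sketch below uses invented data in which the noise is proportional to the expected price. The variable names and numbers are assumptions, not the scraped listings.

```r
# Synthetic data with noise proportional to the expected price (invented numbers)
set.seed(2)
n <- 500
area <- runif(n, 400, 1500)
expected <- 40000 + 480 * area
price <- expected * (1 + rnorm(n, sd = 0.15))

fit <- lm(price ~ area)

# compare the residual spread in the lower and upper half of the fitted values
res <- resid(fit)
fv  <- fitted(fit)
low_half <- fv <= median(fv)
sd_low  <- sd(res[low_half])
sd_high <- sd(res[!low_half])
print(c(sd_low = sd_low, sd_high = sd_high))
# plot(fv, res) would show the typical funnel shape
```

A clearly larger spread in the upper half indicates that the constant-variance assumption of ordinary least squares does not hold.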
Generalized linear model
The estimate using a generalized linear model is the following:
model_price_glm <- glm(mprice ~ parking:area_interior, data=neighbors_,
                       family=inverse.gaussian(link="log"))
This gives a prediction of 436k$ [422k$, 452k$].
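The text does not say how the interval was computed. One common approach, sketched here on synthetic data, is to predict on the link (log) scale, build a normal-approximation interval there, and exponentiate the bounds. The data frame `condos`, its values, and the 800 sq ft example condo are assumptions made for illustration.

```r
# Synthetic listings with multiplicative noise (all values are invented)
set.seed(3)
n <- 300
condos <- data.frame(
  area_interior = runif(n, 400, 1500),
  parking       = sample(c(TRUE, FALSE), n, replace = TRUE)
)
mu <- exp(12.3 + ifelse(condos$parking, 0.00085, 0.00075) * condos$area_interior)
condos$mprice <- mu * exp(rnorm(n, sd = 0.1))

# same family and link as in the text
fit_glm <- glm(mprice ~ parking:area_interior, data = condos,
               family = inverse.gaussian(link = "log"))

# predict on the log scale, then transform the estimate and bounds back to dollars
my_condo <- data.frame(area_interior = 800, parking = TRUE)
p   <- predict(fit_glm, newdata = my_condo, type = "link", se.fit = TRUE)
est <- as.numeric(exp(p$fit))
ci  <- as.numeric(exp(p$fit + c(-1, 1) * qnorm(0.975) * p$se.fit))
print(round(c(estimate = est, lower = ci[1], upper = ci[2]) / 1000))  # in k$
```

Building the interval on the link scale keeps both bounds positive, which matters for prices.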
Note that I am ignoring the number of rooms, the floor of the building and the location of the condo for simplicity. It is possible to plug them all into the regression, but that would increase the number of parameters and make the modelling results more difficult to interpret. Also, many parameters are correlated: for example, bigger apartments tend to have more rooms, and more of them come with parking.
Now, to make it simpler to compare different properties, I could estimate the price per square foot and how it is affected by different factors.
Again, using a generalized linear model with an inverse Gaussian distribution and log link:
Price per square foot
It's easy to make sense of the regression results:
print(exp(model_psqft$coeff))
## (Intercept) parkingTRUE bedrooms2 bedrooms3 bedrooms4
## 501.7826165 1.1215192 0.9769839 0.9818974 0.8349424
So, a square foot is worth 502$, parking adds 12%, two bedrooms reduce the price by 2.3%, three bedrooms by 1.8%, and four bedrooms by 17% (given the same total price).
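Because the model uses a log link, the exponentiated coefficients act as multipliers on the baseline price per square foot. The short sketch below turns the printed values into percentage changes and combines them for one example. It assumes that the intercept corresponds to a condo without parking in the baseline bedroom category, which the output suggests but the text does not state.

```r
# exponentiated coefficients copied from the output above
mult <- c(intercept = 501.7826165, parkingTRUE = 1.1215192,
          bedrooms2 = 0.9769839, bedrooms3 = 0.9818974, bedrooms4 = 0.8349424)

# percentage change relative to the baseline for each factor
pct <- round((mult[-1] - 1) * 100, 1)
print(pct)

# multipliers combine by multiplication:
# price per square foot of a two-bedroom condo with parking
psqft_2bed_parking <- unname(mult["intercept"] * mult["parkingTRUE"] * mult["bedrooms2"])
print(round(psqft_2bed_parking, 1))
```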
The predicted price of my condo is 431k$ [414k$, 449k$].
Longitudinal condo price model
All my previous models show results based on the condos on the market during the last year, without trying to account for price changes. It would be interesting to see how the price changes with time. I have no idea how prices should behave, and considering the seasonal rise and fall in prices, there is no reason to think that there is a steady linear trend. So first, I could just smooth the data using the loess function.
Loess smoothing
If I pile all the data together:
But if I try to separate by number of bedrooms, the results are kind of random, since the data might be too sparse.
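For reference, a loess smooth of price over time can be sketched as follows. The data are synthetic, and the column names `day` and `price_sqft`, the span of 0.5, and the shape of the trend are all assumptions made for the example.

```r
# Synthetic listings: a slow upward trend plus a seasonal wave and noise (invented)
set.seed(4)
n <- 400
day <- sort(runif(n, 0, 365))
price_sqft <- 480 + 0.08 * day + 15 * sin(2 * pi * day / 365) + rnorm(n, sd = 40)
listings <- data.frame(day = day, price_sqft = price_sqft)

# local regression; span controls how much of the data each local fit uses
fit_lo <- loess(price_sqft ~ day, data = listings, span = 0.5)

# evaluate the smooth curve and its standard error on a regular grid
grid <- data.frame(day = seq(10, 350, by = 10))
smooth <- predict(fit_lo, newdata = grid, se = TRUE)
print(head(data.frame(day = grid$day, fit = smooth$fit, se = smooth$se.fit)))
```

Splitting the same data by bedroom count leaves fewer points per curve, so each local fit becomes noisier, which matches the behaviour described above.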
So it seems that I would rather have an overall smooth variation in price, while taking into account some features of the condos; i.e. there is actually no reason to think that two-bedroom condos are gaining value more slowly than three-bedroom ones. But the proportion of different apartment types varies with time, which would bias the results.
So I am going to use a generalized additive model (GAM), where I can model the overall change of price using a smooth function, while taking into account the differences between different kinds of condos.
Longitudinal condo price model: GAM model
library(mgcv)
# price model with time: smooth trend over start_date (cubic regression spline basis)
model_psqft_t <- gam(price_sqft ~ bedrooms + parking + s(start_date, k=24, bs="cr"),
                     data=neighbors_, method='REML',
                     family=inverse.gaussian(link="log"))
It still looks like the prices are going up.
Using this model, the predicted price is 468k$ [435k$, 503k$].
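A self-contained sketch of this kind of model, fitted to synthetic data, is shown below. The data frame `d`, its columns and the coefficients used to generate it are invented. The interval is again built on the log scale, which is an assumption about how the quoted interval was obtained.

```r
library(mgcv)

# Synthetic listings: price per sq ft drifts upward over a year (all values invented)
set.seed(5)
n <- 400
d <- data.frame(
  day      = runif(n, 0, 365),
  bedrooms = factor(sample(1:3, n, replace = TRUE)),
  parking  = factor(sample(c("FALSE", "TRUE"), n, replace = TRUE))
)
mu <- exp(log(480) + 0.0003 * d$day +
          0.10 * (d$parking == "TRUE") - 0.03 * (d$bedrooms == "2"))
d$price_sqft <- mu * exp(rnorm(n, sd = 0.1))

# parametric effects for bedrooms and parking, plus one shared smooth trend in time
fit_gam <- gam(price_sqft ~ bedrooms + parking + s(day, k = 24, bs = "cr"),
               data = d, method = "REML",
               family = inverse.gaussian(link = "log"))

# predict for a two-bedroom condo with parking near the end of the period
new <- data.frame(day = 360,
                  bedrooms = factor("2", levels = levels(d$bedrooms)),
                  parking  = factor("TRUE", levels = levels(d$parking)))
p   <- predict(fit_gam, newdata = new, se.fit = TRUE)   # link (log) scale
est <- as.numeric(exp(p$fit))
ci  <- as.numeric(exp(p$fit + c(-1, 1) * qnorm(0.975) * p$se.fit))
print(round(c(estimate = est, lower = ci[1], upper = ci[2])))
```

Multiplying the predicted price per square foot by the condo's area would then give a total price estimate.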
How long would it take to sell
Another important question: how long would it take to sell? For this, one can look at how long listings stay on the market. Technically, it looks like some types of condos sell faster than others, but the difference is not big. It looks like half of the condos disappear from the market within 60 days:
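The method used for this estimate is not named in the text. One standard option for time-on-market data is a Kaplan-Meier estimate from the survival package, sketched here on synthetic listings. The 180-day observation window, the column names, and the decision to treat "disappeared from the market" as "sold" are all assumptions.

```r
library(survival)

# Synthetic listings whose true median time on the market is 60 days (invented)
set.seed(6)
n <- 500
time_to_sale <- rexp(n, rate = log(2) / 60)

# listings still active after 180 days are censored: the sale was not observed
followup <- 180
listings <- data.frame(
  days = pmin(time_to_sale, followup),
  sold = as.integer(time_to_sale <= followup)
)

# Kaplan-Meier estimate of the share of listings still on the market over time
km <- survfit(Surv(days, sold) ~ 1, data = listings)
median_days <- summary(km)$table[["median"]]
print(median_days)
# plot(km) draws the curve; Surv(days, sold) ~ bedrooms would compare condo types
```

Treating still-active listings as censored rather than dropping them avoids underestimating the time to sell.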
Plex price estimate
Similarly, when I am looking at a potential plex, I would like to know how much houses cost in the neighborhood. Let's say within a 2 km radius of a plex I was interested in at some point:
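Since the listings are stored in a projected coordinate system with units of metres, a 2 km radius is a plain Euclidean distance threshold. The sketch below shows the idea in base R on invented coordinates. The column names `x`, `y`, `price` and the reference point are hypothetical.

```r
# Synthetic listings in projected coordinates, metres (all values invented)
set.seed(7)
n <- 1000
listings <- data.frame(
  x     = runif(n, 290000, 310000),
  y     = runif(n, 5030000, 5050000),
  price = round(runif(n, 400000, 900000))
)

# hypothetical location of the plex of interest
ref <- c(x = 300000, y = 5040000)

# straight-line distance from every listing to the reference point
dist_m <- sqrt((listings$x - ref[["x"]])^2 + (listings$y - ref[["y"]])^2)

# keep the listings within 2 km
nearby <- listings[dist_m <= 2000, ]
print(nrow(nearby))
print(summary(nearby$price))
```

With sf objects, the same selection can be written with `st_is_within_distance()`, or with `st_buffer()` followed by `st_join()`.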
The price distribution is:
Here I can see that the seller is asking slightly more than the neighborhood average, but at the same time the variability is quite high. For plexes, many more parameters matter than for condos, like the size of the backyard, the year the building was built, and how much the existing tenants are paying.
Using a GLM similar to the one for condos, the estimate for the price is the following: 567k$ [522k$, 616k$]
To estimate rental prices in the neighborhood, I can find all the apartments close by that were listed on Kijiji during the last year.