Data Science Analysis Methods: 3 Ways to Get Into Data Science
Data Science First Steps
With the popularity of and demand for data scientists, and the well-documented […], more people are interested in data science as a career. Over time, I’ve received a growing number of questions about how to start out as a data scientist. As with many other roles, landing the first job is typically the hardest, because many employers require some prior experience. This can create a vicious catch-22: how do you land your first job if they all require prior experience?
In this post, I’ll try to give you some advice, based on my own experience moving into data science several years back and on my current experience managing a data science department, where I interview dozens of candidates and review hundreds of applications every year.
What’s your background?
From my experience, people trying to start a career in data science can be split into three relatively distinct groups. It’s important to identify which of these you most resemble in order to figure out your best next steps.
1. The STEM career change: These are people with an advanced academic degree in a technical/scientific field who may already have several years’ work experience in an adjacent field. As the hype around data science has grown, they’ve started considering the option of transitioning. They typically have a strong mathematics and research background and can follow the linear algebra and statistics behind machine learning models. They have experience reading academic papers and aren’t intimidated by the formulas. Their transferable skills can help them become good data scientists relatively quickly.
2. The data science new grad: While it’s taken a few years, universities have started to address the industry demand, and various faculties are now offering MSc programs in data science. Depending on the university, these might include the statistics, electrical engineering or industrial engineering departments. While these degrees can’t cover everything, they’re quickly becoming a gold standard for comprehensive data science training that a 3- or 6-month bootcamp can’t meet. A good program will also include a thesis (and publications), which gives the employer an opportunity to discuss your work in greater detail. Whenever I interview new grads, I dive deep into their thesis, making sure they understand alternative approaches, discussing why they made certain decisions and ascertaining how they handle feedback. Due to the scope of a thesis, it’s usually a great way to evaluate how someone performs research and how well they really know their material, in a way that a Kaggle project they did a while back can’t achieve.
3. The optimist: This is someone who hasn’t gone through formal data science training and doesn’t have an extensive statistics/math background. They may have several years’ experience in data analytics within a specific vertical (finance, healthcare, etc.) and want to complement their current skills to gradually move into a data science role. In the past, several people have asked me about their chances of becoming a data scientist in fintech or some other specific vertical. While business acumen and experience in the vertical are important, this is the wrong mindset.
The commonality between data science roles in various verticals is significant: the tools and algorithms solve generic mathematical problems, not vertical-specific ones. It’s easier to teach a good data scientist about a new domain than it is to teach a business analyst with domain knowledge how to program, along with statistics and machine learning. If you want to be a data scientist, you want to be just that, not a fintech data scientist.
If you’ve read this far, you probably know that there are a lot of online courses teaching everything data science related. While those courses are fundamental and deliver a ton of content, the vast majority try to give the most practical information as fast as possible. This typically means you’re going to learn a lot of machine learning models but only get the 30,000-foot explanation of how each algorithm actually works. Many courses avoid complex math so they can remain accessible to as big an audience as possible. While it’s definitely possible to train models and ‘do data science’ without understanding the intricacies of the algorithm, your capabilities will be limited. With the trend of […], plugging in an algorithm and trying out a few standard options won’t require a data scientist in the near future. Like many other professionals, data scientists will need to keep an edge over automated systems to keep their jobs, which will typically mean a much deeper understanding of the algorithms.
Because data science training is so accessible and there are no standard qualifications required to practice data science, anyone who has completed a 50-hour course can self-appoint themselves as a data scientist. As elsewhere, when a role is in high demand, supply will increase to meet the demand and an influx of new candidates will start moving in. To have a serious chance of making it in the field, a significant investment of time is required.
How to break into data science
There are different ways to gain the minimal experience and knowledge to get your first data science position. When hiring for a junior position, the interviewer is going to look for a few things:
Do you understand the fundamentals and theory of machine learning?
Do you have the necessary coding skills (usually Python or R)?
Can you demonstrate both of these points (i.e. walk the walk, not just talk the talk)?
As a candidate, you need to remember that the company’s loss function is asymmetric: hiring a bad candidate can have a much worse outcome than turning down a good hire. This means that companies are going to be cautious about taking risks on someone lacking a track record. You need to help the hiring manager as much as possible by demonstrating that you’re a low-risk and high-potential hire. This also means that your chances may be relatively low, and you need to be emotionally prepared for a lot of rejections before getting an offer.
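The asymmetric loss described above can be made concrete with a small expected-cost calculation. The sketch below is only an illustration: the function names and the cost figures are hypothetical and do not come from any real hiring process. It assumes the employer assigns a probability that a candidate is good and compares the expected cost of hiring with the expected cost of rejecting.

```python
def expected_costs(p_good, cost_bad_hire, cost_missed_hire):
    """Expected cost of each decision for a candidate who is good with probability p_good."""
    cost_if_hired = (1.0 - p_good) * cost_bad_hire  # paid only if the hire turns out bad
    cost_if_rejected = p_good * cost_missed_hire    # paid only if a good hire was turned away
    return cost_if_hired, cost_if_rejected


def min_confidence_to_hire(cost_bad_hire, cost_missed_hire):
    """Smallest p_good at which hiring is no more costly than rejecting.

    Hiring is preferred when (1 - p) * cost_bad_hire <= p * cost_missed_hire,
    which rearranges to p >= cost_bad_hire / (cost_bad_hire + cost_missed_hire).
    """
    return cost_bad_hire / (cost_bad_hire + cost_missed_hire)


if __name__ == "__main__":
    # Hypothetical costs in arbitrary units: a bad hire hurts 5x more than a missed good hire.
    print(min_confidence_to_hire(cost_bad_hire=5, cost_missed_hire=1))
    # Symmetric costs, for comparison.
    print(min_confidence_to_hire(cost_bad_hire=1, cost_missed_hire=1))
    print(expected_costs(p_good=0.7, cost_bad_hire=5, cost_missed_hire=1))
```

Under these assumed numbers the employer needs to be about 83% confident (5/6) before hiring, compared with 50% if the two mistakes cost the same. That gap is why evidence that lowers perceived risk, such as a thesis or a track record, matters so much for a first role.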
There are 3 main ways to gain the theoretical knowledge and expertise necessary for your first role, and they can be combined in various ways:
1. Master’s degree (with thesis): As mentioned above, this is probably the gold standard for training today. While it can take 1–2 years, it is time well spent, especially if you study at a well-known university. University pedigrees vary by location, so it helps to understand what’s considered a good university in your vicinity.
2. Bootcamp: These typically run 3–6 months for full-time immersive programs and much longer if they’re part-time. It’s best to pay close attention to the financial incentive the program has with regard to your future career. In some bootcamps it’s very straightforward: you pay for the training. The best bootcamps, on the other hand, will also offer Income Share Agreements. In this scenario, after the bootcamp is complete you pay them a percentage of your salary only if it is above a threshold. The agreement is usually in effect for 2–4 years and is capped (e.g. 1.5–2X the upfront tuition cost). In Israel, […] and […] operate in this fashion and put a bigger focus on helping their students land their first role. Other bootcamps work by keeping you on their payroll for 2 years following the training period, during which you work on a project for their client companies (e.g. […] in Israel). The bootcamp pays your salary directly and pockets the difference between it and their outsourcing fee, while typically offering the employee an exit clause (which covers their training expenses).
Generally speaking, these bootcamps cover a wide range of topics and include theoretical machine learning knowledge, coding skills, statistics and (at least one) capstone project. As you can see, different bootcamps have varying levels of incentive to ensure your successful placement following their training. In some cases, it may be worthwhile to invest the time in a bootcamp even if you already know a fair chunk of the material, just to benefit from their assistance in landing the first position.
3. Online courses: The number and quality of these courses has been transformational, enabling anyone around the world to learn from the top experts. The fact that such high-quality content is now freely accessible to anyone has dramatically reduced the barrier to entry. At a very high level one can separate these courses into two types: intro-level courses that try to cover a bit of everything in machine learning, and more advanced courses that dive deeper into specific areas. Several of the popular intro-level courses can be completed in under 80 hours of dedicated effort.
While this does require dedication (especially for someone doing this on top of a full-time job), it’s a relatively trivial time investment compared to many other high-paying professions (e.g. think of the time required to become a pilot, lawyer or doctor). I’ve seen a few applicants who put down […] as their single training in the field. I agree that it’s a great course (it was the first one I took when transitioning to data science), but it is definitely not sufficient to qualify someone as a data scientist. You should be very wary of any course that claims to teach you the A-Z of ML. Such courses might be a great intro to the field, but you should treat them as the first step in a long journey.
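The Income Share Agreement arithmetic mentioned under the bootcamp option can be worked through with a short calculation. This is a minimal sketch that follows only the description above (a percentage of salary, owed only when salary is above a threshold, for a fixed term, capped at a multiple of tuition). The function name and every number in the example are hypothetical, and real agreements will differ in their details.

```python
def isa_total_paid(monthly_salary, share_percent, salary_threshold,
                   term_months, tuition, cap_multiple):
    """Total repaid under a simplified Income Share Agreement with a constant salary."""
    cap = tuition * cap_multiple  # e.g. 1.5-2x the upfront tuition cost
    total = 0.0
    for _ in range(term_months):
        if monthly_salary <= salary_threshold:
            continue  # nothing is owed while salary is not above the threshold
        payment = monthly_salary * share_percent / 100
        total = min(cap, total + payment)  # payments stop counting once the cap is reached
    return total


if __name__ == "__main__":
    # Hypothetical terms: 10% of salary, threshold 3000/month, tuition 10000, cap 1.5x.
    for months in (24, 36):
        paid = isa_total_paid(monthly_salary=5000, share_percent=10,
                              salary_threshold=3000, term_months=months,
                              tuition=10000, cap_multiple=1.5)
        print(months, paid)
```

With these assumed terms, a 24-month agreement costs 12,000 in total, more than the 10,000 upfront tuition, and a 36-month agreement hits the 15,000 cap. A graduate earning below the threshold pays nothing, which is the incentive that pushes such a bootcamp to help with placement.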
What do these trends mean for me?
The STEM career change: Of the three paths this is probably the fastest one, and if you invest enough time, your chances of success are pretty good. Additionally, the closer your background is to data science, the better. Depending on your background, you may already have most of the mathematical background and need to invest more heavily in your programming skills. As an employer, discussing someone’s thesis or dissertation can help show how well they grasp complex research subjects. Can they get into the weeds and back up to 30,000 feet quickly? Do they really understand why they made different decisions or used certain algorithms? What value might their research have? While strong research capabilities aren’t enough for a data scientist, checking these marks can help de-risk a new candidate, especially one with limited direct experience in the field. As someone who went through this path several years back (my MSc was in applied physics), I continue to see how my education gives me a different viewpoint in solving problems compared to colleagues with math, statistics, economics or biology backgrounds.
Someone going through this path also has the benefit of being able to pick up more advanced material quickly. Once you’ve gotten your feet wet, you’ll want to understand the algorithms in much greater depth and develop insight into their hyperparameters. This is a lot easier if you’re accustomed to advanced math.
Pro Tip: If you’re at all able to highlight data science / machine learning work you did before you officially started as a data scientist, you might be able to get additional years of your experience recognized as relevant when negotiating compensation. While you don’t want to embellish your past work, it is useful to point out your programming experience, data analytics, advanced statistics, experimental design, algorithm development or other adjacent types of work.
The data science new grad: Assuming you still have some time to complete your studies, look for any extra-curricular activities that can help you gain experience. Ideally, this would involve an internship within a data science team. One of my past employers would regularly bring in interns each summer and make offers to the most promising ones at the end of the season. This was a great win-win, and a large portion of the company’s hires came through that program. If an internship isn’t possible, your university might have a capstone project you can invest in. At […] we’ve collaborated with a local university, giving one of their teams an open project to work on with our guidance as their capstone. If the students invest and do genuinely good work (i.e. not just enough to pass their course, but something that would qualify as good work in the company), we could be interested in hiring them, or at the very least in writing a letter of recommendation for future employers.
Pro Tip: When working in data science (as in almost any career), you’ll need to be able to explain things to people outside your domain (side note: never make the mistake of thinking non-technical people aren’t as smart as you). During your interviews, you’re going to be asked quite a bit about your thesis. Find a smart friend with limited knowledge of machine learning to ask you about it. Can you explain to them what you did and how it was different from existing solutions? I’ve interviewed several new grads who could describe all the details of their research but were stumped by some high-level, introductory questions (e.g. why is this research important?).
Finally, don’t forget that success requires lifelong learning and you’ve only completed one phase of your training so far. Continuing to learn on the job is just as important, and may be more difficult because it isn’t as structured.
The optimists: There are a lot of people learning to become data scientists through online courses and bootcamps. Competition is stiff and you’re not going to get a job in the field after investing 80 hours. Employers are going to look at the duration of your class/bootcamp and how familiar they are with it: nano-degrees on edX or a 6-month bootcamp are going to be a lot more impressive than a single course on Udemy or Coursera.