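Principles of Machine Learning -- Before You Start (Translation)
The whole world is learning AI, and of course I am no exception. Self-driving cars, face recognition, robots everywhere... So, starting today, I will begin translating the entire Principles of Machine Learning course, which has 7 chapters plus an introduction. If there are labs along the way, I will work through them together with you. So now, let's begin our machine learning journey!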
Introduction
Welcome to Principles of Machine Learning! My name is Cynthia Rudin. >> And I’m Steve Elston. >> Now machine learning is everywhere. This is the time for machine learning;
it’s becoming mainstream, it’s in the search engines we use every day, it’s in the bank teller machines reading our checks, it’s in our smartphone assistants like Cortana, it’s – you know,
jobs in machine learning are in every industry and we are thrilled to be able to give you an introduction to machine learning in this course. So let Steve and me introduce ourselves first.
So I am an associate professor of computer science and electrical and computer engineering at Duke, and an associate professor of statistics at MIT, and my main expertise is in machine learning and data mining.
My lab is called the “Prediction Analysis Lab”. And I have a PhD from Princeton University, and a lot of the work that I do is applied machine learning, and it’s applied to problems in the electric power industry, in healthcare,
and in computational criminology. >>
Hi and I’m Steve Elston. I’m a co-founder and principal consultant at a data science consultancy in Seattle called Quantia Analytics. I’ve been working in predictive analytics and machine learning for several decades now.
I’ve been a long-term R, S/S-PLUS, and Python user and developer; I started using S when it was a Bell Labs project and of course more – you know, in the recent decade moved to R like everybody else.
I’m currently an advisor to Microsoft on Azure Machine Learning and some other analytics products, and I’ve worked in a variety of industries:
payment fraud prevention, telecommunication, capital markets including things like market credit risk models, clearing, and collateral management,
and also worked in several industrial areas such as forecasting for logistics management.
And I have a PhD also from Princeton University and mine is in geophysics. >> Now when I first learned about machine learning, I thought it was magic.
A way for computers to predict the future, just by seeing the past. And you know, it’s a way for computers to learn on their own how to solve problems that I can’t solve, and that’s exactly what’s going on.
Computers are learning, just from observing what’s happened in the past. But it’s nothing like magic. Now machine learning, in addition to being a really useful toolbox for industrial applications,
also gives you a perspective about the way your mind works. So let’s say that I asked you why you can learn and why a computer can’t, right, what would you say?
Would you say that it’s because you’ve seen more of the world than a computer has?
I mean, I think that’s not particularly true anymore, because we have lots of pictures and video and sound now that we could feed to any computer.
Is it because there are more connections in your brain than in a computer? Well that might be part of it, but lots of creatures with much smaller brains than my computer can still learn,
so that’s not it. Maybe you could argue that a brain is more flexible in some ways than a computer; maybe you could think your brain is somehow more open to identifying new types of patterns than your computer,
and that’s why you can learn perhaps.
The interesting thing is that actually that’s not quite the way it is; in fact, it’s sort of the opposite.
Your brain is really good at identifying only certain kinds of patterns; in fact, these are the types of patterns that it’s expecting.
The fact that humans can learn is not so much a consequence of the human brain being flexible as it is of the human brain being inflexible,
being wired to identify exactly the types of patterns that it comes across, right. Natural images, real sounds, patterns of behavior… these are things that we’re really good at identifying. Humans are absolutely awful at identifying patterns in large databases, right, we just can’t learn in some settings, and what enables us to learn in the settings we can learn in is the way that our brains are wired. It’s the structure in our mind; it’s not the flexibility, it’s the limited flexibility.
It’s just that structure. Okay so what is the field of machine learning exactly? It completely revolves around setting up structures in the computer that limit its flexibility and allow it to learn.
Okay, setting up these structures is really a form of statistical modelling, and that’s what we’re going to do in this course. And once you can teach a computer to learn, there are a huge number of applications that you can use it on.
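The point about structure limiting flexibility can be made concrete with a small sketch. It is not taken from the course: the synthetic data, the helper names `fit_line` and `make_data`, and the choice of a straight-line model are all assumptions made for illustration. A lookup table that memorizes every training pair is maximally flexible yet says nothing about an input it has never seen, while a model restricted to straight lines can still make a prediction there.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b, using the closed-form solution."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def make_data(n, seed):
    """Hypothetical data: points near the line y = 2x + 1 with Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0, 0.5) for x in xs]
    return xs, ys


train_x, train_y = make_data(50, seed=0)

# Fully flexible "model": memorize every training pair exactly.
lookup = dict(zip(train_x, train_y))

# Structured model: only straight lines are allowed.
a, b = fit_line(train_x, train_y)

new_x = 12.0  # an input that never appears in the training data
line_prediction = a * new_x + b
lookup_prediction = lookup.get(new_x)  # None: memorization alone cannot generalize

print("slope:", a, "intercept:", b)
print("line prediction at x=12:", line_prediction)
print("lookup prediction at x=12:", lookup_prediction)
```

The straight-line restriction is what lets the fitted model extrapolate; whether that restriction is the right one depends on the data, which is the modelling question the course goes on to address.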
>> So, let’s talk about a few of the applications that we’ll use both for our demos and for the labs that you’re going to do hands-on in this course. So first off, we’re going to do a classification example,
and we’ll be coming back to this at several points in the course – actually to each of these examples,
and so we’re going to work on classifying diabetes patients who have been in a hospital for treatment and we want to classify the ones who are at high risk of being readmitted to the hospital;
that is, that somehow their treatment or the follow up to their treatment or something isn’t likely to be sufficient and they’re going to wind up being re-hospitalized, which is, as you can imagine, a serious problem.
It’s expensive, it’s dangerous for the patients, etc., so there are a lot of reasons why this is an important area. We’re going to look at forecasting; forecasting for demand is used all over the place from warehouse management to power generation.
In particular, we’re going to look at forecasting demand for rented bicycles, and so that will be – again, an application we’ll come back to at several points in this course.
A lot of things are done with clustering and segmentation, and we’re going to look at segmenting people by their income level, and that’s –
again, an analog for lots of different things that are done in everything from political science to marketing. And finally, we’re going to look at how a recommender works;
we’re going to use a restaurant database of Mexican restaurants and compute some recommendations for some of the customers who have written reviews for these Mexican restaurants. >>
Okay now as I mentioned, humans are lousy at finding patterns in large databases, and so here are some of the applications that we’re working on in my lab that use large databases and machine learning,
and in all of these applications, the answer is really in the data. It really is, and by providing the computer with the proper machine learning structure to find important patterns, we can really make headway into societal problems.
For instance, we’ve been looking at power grid failures and personalized advertising, and healthcare applications.
>> So, why would you want to continue with this course? What should you expect to get out of this course? Well first off, it’s going to be a hands-on introduction to machine learning.
We have some great labs laid out here, there’s going to be demos – so you’re going to gain some practical experience at working with data and applying machine learning algorithms of various types to those data.
We’re going to look at actually all the major focus areas in machine learning, so we’ll cover a wide variety of algorithms,
methods and techniques. We’re going to use Azure Machine Learning quite a bit for demos and for your labs; and why we’re actually doing this is that it’s not only a great environment,
but it’s also a great learning environment because a lot of the tedious stuff is kind of taken care of for you, so there are a lot of things you won’t have to spend time on when you do your weekly labs.
Nonetheless, we’ll do a significant amount of data cleaning and visualization using R and/or Python; you can pick which path you’re on. So you’ll be building some skills with that. And we hope that as you go along here, as you work on these examples, as you listen to the theory lectures, you start to build some intuition around analytics and machine learning and how it all fits together, and mostly gain intuition about what’s a useful result, what’s adding value, and what’s going in the direction you or, say, your boss wants you to go. And we’re going to minimize the math; there’s not going to be any heavy theory, so if you remember a little bit of calculus and some minimal linear algebra, you should be good to go here.
So what are we going to cover specifically in this course? In the first module, we’re going to give an introduction to classification, and classification is, in the history of machine learning, kind of where machine learning largely grew out of.
Then we’re going to talk about regression; there are many regression methods that are important in machine learning, and they have an even longer history in statistics, going back to the late 19th century.
We’re then going to talk about how, once you have machine learning models, you evaluate the performance. How do you know what to do to improve that performance?
We’re going to then look at some more modern powerful methods like tree and ensemble learning methods, and if you don’t know what that means, stay tuned, you’ll find out a lot about it.
And we’re going to look at optimization-based learning methods such as support vector machines and neural networks. And we’ll finish up with clustering and recommenders. >>
So, as you’re taking this course, we hope you will take some steps to get the most out of it to maximize your learning experience.
So overall, think about the fact that this course is going to run over 6 weeks; we have one module per week over those 6 weeks, so you can kind of plan your time and your work that way.
For each module, we have lectures, demos, and labs; and the labs derive from the lectures and the demos and they are for you to do on your own to reinforce key learning concepts.
And you’ll perform the labs using – as I already mentioned, Azure machine learning, but also either R or Python, and I suggest you decide if you’re going to use R or Python.
Every lab has the same materials and the same steps in either language; it doesn’t matter in terms of the learning experience. If you’re very ambitious, of course you can try both, but for most people just doing one or the other is going to be just great. So some of you want to get the certificate from this course, so what do you need to know?
First off, you need a 70% score to pass and get the certificate, and that score is divided between assessments at the end of each of the 6 modules and the final exam.
So all those module assessments together are half your grade, 50% of your grade, and on each assessment question you actually get two tries, so if you mess it up the first time don’t panic,
you get another chance. The other half of your grade is a final exam at the end of the class. On this one you only get one try per question, but by then you’ve been through the lectures, you’ve seen all the demos,
and you’ve done all the labs, and so you should – you know, be in a great position to ace that.
So we hope you get a lot out of this course, and we’re looking forward to presenting it and I think it’s going to be a really great, informative class to get yourself bootstrapped into the wonderful world of machine learning!
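The grading scheme described above (module assessments worth 50%, final exam worth 50%, 70% to pass) can be written out as a short calculation. This is only a sketch: the function names are made up for illustration, and it assumes the six module assessments count equally within their half, which the transcript does not actually state.

```python
def final_score(module_scores, exam_score):
    """Combine six module assessment scores with the final exam score.

    All scores are fractions in [0, 1]. Assumption (not stated in the
    transcript): each module assessment carries equal weight within the
    50% allotted to assessments.
    """
    if len(module_scores) != 6:
        raise ValueError("expected one score per module (6 modules)")
    assessments = sum(module_scores) / len(module_scores)
    return 0.5 * assessments + 0.5 * exam_score


def passes(score, threshold=0.70):
    """Return True when the combined score reaches the 70% pass mark."""
    return score >= threshold


# Hypothetical learner: solid module scores, weaker final exam.
modules = [0.80, 0.90, 0.70, 0.60, 0.85, 0.75]
score = final_score(modules, exam_score=0.65)
print(round(score, 3), passes(score))
```

Under these assumptions, strong module assessments can offset a weaker exam and vice versa, since each half contributes equally to the final score.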
總結(jié)
以上是生活随笔為你收集整理的Principles of Machine Learning -- Before You Start 翻译的全部內(nèi)容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: 关于端口号
- 下一篇: 在C++程序中使用系统热键(附代码)