The Best Way to Start Learning Data Science Is to Understand Its Context
DATA SCIENCE EDUCATION
Table of Contents
The Importance of Context Knowledge
(Optional) Research Supporting Context-Based Learning
The Context of Data Science
Understand the Concept, not the Calculation
The Context of the Sub-Disciplines
Next Steps
The Importance of Context Knowledge
I made the decision to orient my career path towards data science during my senior year of university. It only took one or two research-binges before I realized the vast depth of the field in front of me. I knew eventually I’d have to understand things like the architecture of a convolutional neural network, the process of numericalization for NLP, or the underpinnings of principal component analysis. However, rather than jumping into the minutiae of these concepts in a void, I’ve always needed to develop a rock-solid contextual foundation of knowledge first. I’ll call this approach context-based learning.
What is context-based learning?
I will loosely define context-based learning as learning a concept by first focusing on its contextual elements: in other words, understanding the big picture before delving into the deep theory. It’s important to emphasize “first” in that definition, as learning the context is analogous to building the chassis of a vehicle. Although the chassis is an essential element, it is not a car, and it is non-functional on its own. Rather, it is the bedrock on which the car is built. In the same way, a contextual framework is the bedrock on which technical content is laid.
(Optional) Research Supporting Context-Based Learning
This style of learning leverages a fact well supported by research in the psychology of learning: humans retain knowledge most effectively by associating it with something they already have a firm grasp on, rather than by memorizing new concepts in a void. In short, we learn by association.
The late educational psychology professor Dr. Barak Rosenshine at the University of Illinois emphasized the importance of these contextual frameworks in education in Principles of Instruction:
“When one’s knowledge on a particular topic is large and well-connected, it is easier to learn new information and prior knowledge is more readily available for use.”
The amount of background knowledge you have is also correlated with how well you comprehend new material. Therefore, to learn most efficiently, one should develop a strong foundation of background knowledge before delving into the details.
The Context of Data Science
So what is the background knowledge, or context, of data science? Well, I always begin context-based learning by asking a lot of questions. Specifically, I try to ask broad, conceptual questions, as opposed to detail-oriented ones.
The following are a handful of questions I asked myself at the beginning of my data science journey, along with the answers I came up with. I want to emphasize that my answers filled the gaps in my own contextual knowledge at the time. In the same way, you should answer these and other questions in a manner that relates directly to your educational and personal background.
How does data science fit into my understanding of other fields?
Data science is an interdisciplinary field that leverages math, programming, business, and domain knowledge to tackle difficult data problems. The overlap between data science and my major (cognitive science with machine learning & neural computation) rests on math (which is necessary for machine learning), programming (which provides computational functionality for the field as a whole), as well as data analysis techniques, such as those used in computational neuroscience. The “science” in data science comes from its use of various scientific methodologies, such as statistical significance.
What are the most important elements of data science, and how do they relate to one another?
All data scientists go through a process known as the “data science pipeline”, essentially a step-by-step, end-to-end process outlining the workflow of a data scientist. Acronyms like OSEMN make the basic pipeline easy to remember, but generally, pipelines vary in their subtleties. The basic structure is as follows:
- Data Collection
- Data Cleaning
- Exploratory Data Analysis
- Model Building
- Visualization / Model Deployment
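To make the pipeline stages above a little more concrete, here is a minimal sketch in plain Python. Everything in it is an illustrative assumption rather than part of any real project: the toy `RAW_ROWS` data, the function names, and the choice of a least-squares line as the "model" are all hypothetical stand-ins, one per stage.

```python
from statistics import mean

# Hypothetical toy data standing in for a real data source.
RAW_ROWS = [
    {"hours": "1", "score": "52"},
    {"hours": "2", "score": "61"},
    {"hours": "", "score": "70"},   # a row with a missing value
    {"hours": "4", "score": "79"},
    {"hours": "5", "score": "88"},
]


def collect():
    """Data collection: here, just a copy of the in-memory rows."""
    return list(RAW_ROWS)


def clean(rows):
    """Data cleaning: drop rows with missing fields and convert to floats."""
    return [
        (float(r["hours"]), float(r["score"]))
        for r in rows
        if r["hours"] and r["score"]
    ]


def explore(pairs):
    """Exploratory data analysis: simple summary statistics."""
    xs, ys = zip(*pairs)
    return {"n": len(pairs), "mean_hours": mean(xs), "mean_score": mean(ys)}


def build_model(pairs):
    """Model building: fit the least-squares line score = a + b * hours."""
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in pairs) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda hours: a + b * hours


def deploy(model, hours):
    """Deployment stand-in: use the fitted model to answer a new query."""
    return model(hours)


pairs = clean(collect())
summary = explore(pairs)
model = build_model(pairs)
prediction = deploy(model, 3.0)
print(summary, prediction)
```

The point is not the individual functions but the shape: each stage consumes the output of the previous one, which is why a gap early in the pipeline (for example, skipped cleaning) propagates into the model.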
What is machine learning? And why is machine learning so tied to data science specifically?
Machine learning (ML) is a field that studies algorithms that are not traditional “closed” algorithms, whose behavior is fully specified in advance by the programmer. Instead, ML algorithms “learn” from data. This reliance on data is what makes ML so integral to data science. ML falls in the “model building” and “model deployment” stages of the data science pipeline.
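As a rough illustration of that contrast, the sketch below compares a rule whose threshold is hard-coded with one whose threshold is chosen from labeled examples. The data, the function names, and the "pick the threshold with the fewest mistakes" procedure are assumptions made up for this example; real ML algorithms are far more sophisticated, but the idea of fitting behavior to data is the same.

```python
def closed_rule(x):
    """A 'closed' rule: the threshold 5.0 was picked by the programmer."""
    return x > 5.0


def learn_threshold(examples):
    """Choose the threshold that misclassifies the fewest labeled examples."""
    candidates = sorted(x for x, _ in examples)

    def errors(t):
        # Count examples where the prediction (x > t) disagrees with the label.
        return sum((x > t) != label for x, label in examples)

    return min(candidates, key=errors)


# Hypothetical labeled data: (measurement, is_positive)
examples = [(1.0, False), (2.0, False), (3.0, True), (4.0, True), (6.0, True)]

threshold = learn_threshold(examples)


def learned_rule(x):
    """The same kind of rule, but with a threshold derived from the data."""
    return x > threshold


print(threshold, closed_rule(3.0), learned_rule(3.0))
```

If the data changes, `learned_rule` changes with it, while `closed_rule` stays fixed until someone edits the code.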
What are the sub-disciplines of data science?
There are many fields that contribute to data science, but the most fundamental disciplines that make up data science are computer science, statistics, machine learning, and linear algebra. Although business and domain knowledge are also critical, the academic scope of data science relies on the original sub-disciplines mentioned. Furthermore, the sub-disciplines themselves often have their own sub-disciplines, such as calculus being necessary to understand how machine learning algorithms work.
Understand the Concept, not the Calculation
One important dichotomy I discovered early on during my undergrad math studies was the distinction between calculations and conceptual understanding. For example, in the case of statistics, memorizing how to calculate a chi-square test statistic by hand is far less important than understanding its use case in testing hypotheses about the relationship between categorical variables. Or, for calculus, understanding that the definite integral of a quadratic function describes the area underneath a quadratic curve is far more important than memorizing fancy methods to solve it by hand. (*ahem*)
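The formulas originally shown here did not survive, so the following is only a sketch of the same idea under stated assumptions: the contingency table and the function x² are invented for illustration. Once you know what each quantity means, a few lines of Python can do the arithmetic for you.

```python
def chi_square_statistic(observed):
    """Chi-square statistic for a contingency table of observed counts.

    Sums (observed - expected)^2 / expected over all cells, where the
    expected count assumes the row and column variables are independent.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (obs - expected) ** 2 / expected
    return stat


def area_under(f, a, b, steps=10_000):
    """Approximate the area under f between a and b with a midpoint sum."""
    width = (b - a) / steps
    return sum(f(a + (k + 0.5) * width) for k in range(steps)) * width


# Hypothetical 2x2 table: rows are groups, columns are outcomes.
table = [[30, 10], [20, 40]]
stat = chi_square_statistic(table)

# Area under the quadratic curve y = x^2 between 0 and 3 (exactly 9).
area = area_under(lambda x: x ** 2, 0.0, 3.0)

print(stat, area)
```

A large statistic suggests the two categorical variables are related; turning it into a p-value needs the chi-square distribution, which statistics libraries provide.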
I actually find building programs to be an incredibly apt analogy for this. When learning to program, it becomes clear early on that trying to learn every implementation of every function is impossible. A much more efficient strategy is to understand the inputs and outputs so that you can piece together snippets of code to make things work.
Even when you aren’t searching Google or Stack Overflow, courses like fastai abstract away the vast majority of the implementation so that you can first build an end-to-end framework of understanding (in fastai’s case, an end-to-end model). Only afterward do you go back and try to understand the fundamental details that underlie the abstractions.
In this way, learning the concepts as opposed to the calculations is an application of context-based learning, as the contextual framework is built up so that when you do need to learn the calculations, they are compartmentalized properly.
The Context of the Sub-Disciplines
Following the context-based learning approach, once we have figured out the sub-disciplines of data science, we should dig into their context to understand how they fit in with the overall scope of the field.
Computer Science
Why are all data science projects so coding-heavy?
Modern statistics dates back to the 19th century, yet the application of statistics was confined to small samples as there was no efficient means of organizing large amounts of data and calculating parameters. The computer was that means.
Furthermore, the advent of GPU parallel processing enabled machine learning models to train hundreds of times faster. In essence, incredibly powerful tools for statistics became accessible via the computer, thus the heavy emphasis on coding.
FURTHER Qs: What programming languages are the most important for data science? How much programming do I need for data science?
Statistics
Why is statistics important for data science?
Given that much of data science is essentially computational statistics, this field lays the groundwork and provides the toolset for rigorous mathematical analysis of data.
FURTHER Qs: Just what the hell is all this talk about Bayes? What specific statistics libraries do data scientists use?
Linear algebra
What is linear algebra and how does it relate to data science?
Linear algebra is, at its core, the study of linear equations. Multiple linear equations stacked together can be expressed as a matrix. Matrices, collections of numbers in rows and columns, are essentially equivalent to tabular data (data in a table). Moreover, image data is nothing but a multi-dimensional array of numbers (i.e., a list of lists of pixel values). This is why a good understanding of linear algebra provides an understanding of the structure of data itself.
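A small sketch may help connect these ideas. The equations, the tiny 2x2 "image", and the helper names below are all made up for illustration, and nested Python lists stand in for the array types that numerical libraries would normally provide.

```python
# Two linear equations stacked into a matrix A and a right-hand side b:
#   2x + 1y = 5
#   1x + 3y = 10
A = [[2.0, 1.0],
     [1.0, 3.0]]
b = [5.0, 10.0]


def solve_2x2(A, b):
    """Solve a 2x2 linear system with Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    if det == 0:
        raise ValueError("system has no unique solution")
    x = (b[0] * A[1][1] - A[0][1] * b[1]) / det
    y = (A[0][0] * b[1] - b[0] * A[1][0]) / det
    return x, y


def shape(nested):
    """Return the dimensions of a regular nested list, outermost first."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


solution = solve_2x2(A, b)

# A hypothetical 2x2 RGB image: rows of pixels, each pixel three numbers.
image = [
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
]

print(solution, shape(A), shape(image))
```

The matrix `A` has the same rows-and-columns layout as a small table, and the image simply adds one more level of nesting for the color channels.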
FURTHER Qs: What is a tensor? How is linear algebra used in deep learning?
Machine Learning & Calculus
What is the link between calculus and machine learning?
A critical component of calculus is the study of optimization. Since the objective of most machine learning algorithms is to minimize an error function, calculus provides the tools to understand how that minimization occurs.
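As one hedged illustration of how calculus drives that minimization, the sketch below runs plain gradient descent on a one-parameter model y = w * x. The data, the learning rate, and the step count are arbitrary assumptions chosen so the example converges; the derivative of the mean squared error tells each step which way to move `w`.

```python
def mse(w, data):
    """Mean squared error of the model y = w * x on (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def mse_gradient(w, data):
    """Derivative of the mean squared error with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)


def gradient_descent(data, w=0.0, learning_rate=0.05, steps=200):
    """Repeatedly step against the gradient to reduce the error."""
    for _ in range(steps):
        w -= learning_rate * mse_gradient(w, data)
    return w


# Hypothetical data generated by y = 2 * x, so the best w is 2.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = gradient_descent(data)

print(w, mse(w, data))
```

Back-propagation applies the same principle to models with many parameters, using the chain rule to compute each parameter's derivative.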
FURTHER Qs: What is gradient descent? What is back-propagation? Why is calculus involved in it?
Next Steps
Ask yourself conceptual questions. Lots of conceptual questions. These questions will vary from person to person, as their aim should be to patch the gaps in your knowledge of how data science fits into what you already know.
Get creative. A colleague of mine mentioned that visualization maps really helped her understand the context of AI, machine learning, and deep learning and how they all fit together. Similarly, use maps and flowcharts to understand any topics in data science you’re currently struggling to piece together.
After you’re armed with a strong contextual understanding of data science, go ahead and dig deep into the nuances of various supervised algorithms, the best practices for data preprocessing, or the creation of beautiful dashboard visualizations with Tableau.
Just try to make sure every new concept is put into context along the way.
Translated from: https://towardsdatascience.com/the-best-way-to-start-learning-data-science-is-to-understand-its-context-751e917e655e