Heterogeneous Database Replication to TiDB
This article is based on a talk given by Tianshuang Qin at TiDB DevCon 2020.
When we convert from a standalone system to a distributed one, one of the challenges is migrating the database. We’re faced with questions such as:
- Should I migrate full or incremental data?
- Should I perform an online or offline migration?
- Should I use a ready-made data replication tool or develop a new one?
When it comes to data migration, users are often faced with many options. At PingCAP, we’ve probably tried most of them. Over the years, we’ve migrated many heterogeneous databases between different database platforms and application scenarios. Today, I’ll save you some time by sharing with you the approaches that worked best.
A typical database migration process
1. Application adaptation development
Almost all early TiDB users have gone through this step. In version 3.0 and earlier, TiDB supports optimistic concurrency control, the Repeatable Read (RR) isolation level, and transactions limited to about 100 MB. Given these constraints, users need to put a lot of effort into adapting their applications. In contrast, TiDB 4.0 supports pessimistic concurrency control, Read Committed (RC) isolation, and large transactions of up to 10 GB. Users can adapt their applications to TiDB at a much lower cost.
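To make the contrast concrete, here is a minimal sketch of how an application opts into pessimistic transactions on TiDB 4.0, using the `tidb_txn_mode` session variable and the `BEGIN PESSIMISTIC` statement. The connection settings and the `accounts` table are illustrative assumptions, not part of the original talk.

```python
# A minimal sketch, assuming a reachable TiDB 4.0 cluster. TiDB speaks the
# MySQL protocol, so a standard driver such as PyMySQL works.
import pymysql

conn = pymysql.connect(host="tidb-host", port=4000,      # illustrative settings
                       user="app", password="...", database="app")
with conn.cursor() as cur:
    # Make pessimistic locking the default for this session (TiDB 4.0+).
    cur.execute("SET SESSION tidb_txn_mode = 'pessimistic'")
    # Or opt in for a single transaction:
    cur.execute("BEGIN PESSIMISTIC")
    cur.execute("UPDATE accounts SET balance = balance - 100 WHERE id = %s", (1,))
conn.commit()
```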
2. Application verification testing
There are two ways to perform application verification testing. You can combine the two methods to effectively verify your application.
*Application verification*

The first method is to test your application with production data. To do this, you first use database replication technology to replicate the data from the production database to TiDB. Then, you use a testing application to run a stress test. To stress TiDB and make sure it will be stable in your production environment, apply a workload 10 to 20 times higher than your real production traffic. One advantage of replicating data from the production database to TiDB is that you avoid wide variations between test data and production data, which can cause many problems. For example, a SQL query that was tuned for peak performance in the testing environment may turn out to be a slow query in production if the test was not run against replicated production data.
The second way to verify your application is to test it with production traffic. In this case, you adopt a service bus, similar to the enterprise service bus (ESB) used in banking, or message queuing technology. For example, you can use Kafka's message queuing mechanism to implement multi-path replication of production traffic. The main path still flows to the existing production database, which is what the application's success in production actually depends on; alongside it runs a bypass path. On the bypass, we load a version of the application that has been adapted to TiDB and connect it to TiDB for verification, as the sketch below illustrates.
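The bypass can be as simple as a consumer that replays captured statements against TiDB. The sketch below assumes a hypothetical `prod-traffic` topic carrying JSON-encoded SQL events; the topic name, message format, and connection settings are all assumptions made for illustration.

```python
# A minimal sketch of the bypass path: replay production traffic from Kafka
# into TiDB for verification, using the kafka-python and PyMySQL drivers.
import json

import pymysql
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "prod-traffic",                          # hypothetical topic of captured traffic
    bootstrap_servers=["kafka:9092"],
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

tidb = pymysql.connect(host="tidb-host", port=4000,
                       user="app", password="...", database="app")

for message in consumer:
    event = message.value                    # e.g. {"sql": "...", "params": [...]}
    with tidb.cursor() as cur:
        cur.execute(event["sql"], event.get("params"))
    tidb.commit()                            # the main path is unaffected either way
```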
3. Migration testing
Migration testing mainly involves verifying the operations that will be performed during the maintenance window. For example, you rehearse the migration activities specified in the migration manual in advance, both to verify that the manual is correct and to determine whether the maintenance window is long enough to complete the migration. You also need to perform rollback testing: if the deployment to production fails, you may need to roll back to the previous database.
4. Data migration and production database switchover
Your applications may run 24/7, or you may have only a short maintenance window for switching over databases, so you must migrate your data before the maintenance window ends. To do that, you use heterogeneous database replication technology. During the maintenance window, you stop all running applications, replicate the remaining incremental data to the secondary database, compare the two databases to ensure that the secondary is fully synchronized with the primary, and then verify the applications. Once application verification is complete, the database switchover starts. If the switchover succeeds, TiDB runs as the primary database in the production environment.
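The comparison step deserves special care. PingCAP ships sync-diff-inspector for chunk-level checksum comparison between MySQL-compatible databases; the sketch below is only the rough row-count sanity check you might run first, with all hosts and table names illustrative.

```python
# A rough sanity check before switchover: compare row counts per table.
# This is not a substitute for checksum-based tools like sync-diff-inspector.
import pymysql

def row_counts(conn, tables):
    """Return {table: row count} for the given connection."""
    counts = {}
    with conn.cursor() as cur:
        for table in tables:
            cur.execute(f"SELECT COUNT(*) FROM `{table}`")
            counts[table] = cur.fetchone()[0]
    return counts

primary = pymysql.connect(host="primary-host", user="checker",
                          password="...", database="app")
secondary = pymysql.connect(host="tidb-host", port=4000, user="checker",
                            password="...", database="app")

tables = ["accounts", "orders"]              # illustrative table names
p, s = row_counts(primary, tables), row_counts(secondary, tables)
mismatches = {t: (p[t], s[t]) for t in tables if p[t] != s[t]}
print("out of sync:", mismatches or "none")
```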
Application scenarios of database replication
Migrating data
We’ve talked about this application scenario in the previous section.
Creating a disaster recovery database
If you use Oracle as the primary database, you can use TiDB as its disaster recovery database. If you’ve just deployed a TiDB database in the production environment without sufficient verification, you can use an Oracle database as the disaster recovery database for TiDB. That way, if the TiDB database suddenly crashes, you can promptly migrate the data back to the original production database.
Creating a read-only or archive database
First, let's look at building a read-only database. This scenario applies to some bank clients. A bank's core services run in a closed system, and it may be impossible to migrate them to an open platform or a distributed database in a short time. However, some read-only applications, such as querying account details, bills, or monthly statements in the app client, can be served without touching the core production database, which should only process real transactions. Therefore, we can use database replication technology to replicate this read-only application data from the production database to TiDB and perform the read-only operations only in TiDB.
Another scenario is building an archive database. Suppose you use a traditional standalone database for production and its capacity is limited, but your application data is growing quickly and cannot be migrated to a distributed database in a short time. A solution is to keep data in the production database only for a specific period (for example, 30 or 40 days), delete expired data from the production database, and retain the full history in TiDB. That is, the deletion is performed only in the production database, and TiDB serves as the archive database.
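A sketch of the purge side of this pattern is shown below. It assumes a hypothetical `orders` table with an indexed `created_at` column and a 30-day retention window, and it assumes replication into TiDB is handled separately (for example by a CDC tool); the job only deletes from production once the archive has caught up.

```python
# A minimal sketch of the production-side purge for the archive pattern.
import pymysql

RETENTION_DAYS = 30                          # illustrative retention window

prod = pymysql.connect(host="prod-host", user="archiver",
                       password="...", database="app")
tidb = pymysql.connect(host="tidb-host", port=4000, user="archiver",
                       password="...", database="app")

count_sql = "SELECT COUNT(*) FROM orders WHERE created_at < NOW() - INTERVAL %s DAY"

with prod.cursor() as pcur, tidb.cursor() as tcur:
    pcur.execute(count_sql, (RETENTION_DAYS,))
    expired = pcur.fetchone()[0]             # rows due for deletion in production
    tcur.execute(count_sql, (RETENTION_DAYS,))
    archived = tcur.fetchone()[0]            # the same rows as seen in the archive

    # Purge production only after the archive has caught up, and delete in
    # small batches to keep each transaction short.
    if archived >= expired:
        while pcur.execute(
            "DELETE FROM orders WHERE created_at < NOW() - INTERVAL %s DAY LIMIT 1000",
            (RETENTION_DAYS,),
        ):
            prod.commit()
```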
Aggregating data from multiple sources
You can use TiDB as a data hub. You might run multiple applications on separate Online Transactional Processing (OLTP) databases and want to use database replication technology to aggregate their data into one TiDB database, where you can then perform in-depth analysis or read-only queries. The motivation for multi-source aggregation is cross-database querying: the sources may be heterogeneous databases, and you cannot create database links among them, because database links can only be created between Oracle databases. To solve this problem, you can use heterogeneous database replication and let TiDB play a role similar to a widely deployed operational data store (ODS) for aggregating data.
Replicating heterogeneous databases
This section discusses some commonly used heterogeneous database replication methods.
Data transfer via interface files
This method is widely used for transferring data between OLTP and Online Analytical Processing (OLAP) systems. Because the transfer involves two different systems, it's often difficult to connect the two database networks: databases belong to backend systems, and for security reasons it is usually not appropriate to connect them directly.
A comma-separated values (CSV) file is a typical interface file. An interface file here is a file generated by an application according to a predefined format, with agreed rules for delimiters and line breaks. After receiving a generated interface file, the receiving end parses it based on the agreed format, converts each record into an INSERT statement, and inserts it into the target database.
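On the receiving end, the loader can be quite small. This sketch assumes a comma-delimited file whose columns match a hypothetical `accounts(id, name, balance)` table; the file name, table, and connection settings are illustrative.

```python
# A minimal sketch of the receiving end: parse a CSV interface file and
# insert the rows into the target database in batches.
import csv

import pymysql

INSERT_SQL = "INSERT INTO accounts (id, name, balance) VALUES (%s, %s, %s)"
BATCH_SIZE = 1000                            # limit round trips and transaction size

conn = pymysql.connect(host="tidb-host", port=4000,
                       user="loader", password="...", database="app")

with open("accounts_20200730.csv", newline="") as f, conn.cursor() as cur:
    batch = []
    for row in csv.reader(f):                # parse by the agreed format
        batch.append(row)
        if len(batch) >= BATCH_SIZE:
            cur.executemany(INSERT_SQL, batch)
            conn.commit()
            batch = []
    if batch:                                # flush the final partial batch
        cur.executemany(INSERT_SQL, batch)
        conn.commit()
```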
The advantage of this method is that it applies to any database. As long as the upstream and downstream databases support standard SQL interfaces, you can transfer data through an interface file.
However, this approach has several disadvantages:
- It requires additional development in your application code. For example, if the application was originally developed in Java, you need to add more programming logic: logic in the upstream code to generate the interface file, and matching logic in the downstream code to import it. Moreover, to improve performance, you may need to control the concurrency of the file import.
- Interface files are only useful for full refreshes and append-only writes. It is difficult to capture the data changes generated by UPDATE and DELETE operations through an interface file.
- Data may not be timely. As 5G technology rolls out, terminal applications demand lower latency; banks, for example, are gradually moving from traditional T+1 analytics to T+0 or even near real-time analytics. When you transfer data through interface files, it's hard to keep the data fresh, because file loading is only triggered at scheduled times, with low frequency and efficiency.
- When data is exported, the upstream database must be scanned extensively to access the data through the SQL interface, which may affect performance. A common practice, therefore, is for the upstream application to export the read-only file from a secondary database rather than from the primary, and ship it to the downstream database.
Developing and scheduling an ETL job for data transfer
You can develop an extract, transform, load (ETL) job and schedule the job on a regular basis to transfer data. This method is commonly applied to data transfer and processing between OLTP and OLAP systems.
If the data transfer is ongoing rather than one-off, the job must repeatedly pick up incremental data and write it to the target database over a long period. This requires the ability to schedule ETL jobs, which involves additional development.
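As a concrete illustration, the sketch below is a minimal incremental ETL loop that uses a hypothetical indexed `updated_at` column as a watermark and runs on a fixed schedule; all names and settings are assumptions. Note its built-in blind spot: it only sees rows that still exist, so it cannot capture DELETEs, which is exactly the limitation discussed below.

```python
# A minimal sketch of a scheduled incremental ETL job driven by a watermark.
import time

import pymysql

source = pymysql.connect(host="oltp-host", user="etl",
                         password="...", database="app")
target = pymysql.connect(host="tidb-host", port=4000, user="etl",
                         password="...", database="app")

watermark = "1970-01-01 00:00:00"            # timestamp of the last loaded row

while True:
    with source.cursor() as scur, target.cursor() as tcur:
        scur.execute(
            "SELECT id, name, balance, updated_at FROM accounts "
            "WHERE updated_at > %s ORDER BY updated_at",
            (watermark,),
        )
        rows = scur.fetchall()
        if rows:
            # REPLACE upserts rows that changed since the last run; it cannot
            # see rows that were deleted upstream.
            tcur.executemany(
                "REPLACE INTO accounts (id, name, balance, updated_at) "
                "VALUES (%s, %s, %s, %s)",
                rows,
            )
            target.commit()
            watermark = rows[-1][3]          # advance the watermark
    time.sleep(300)                          # scheduling frequency: 5 minutes
```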
Using an ETL job has the following advantages:
- Just like an interface file, an ETL job uses SQL interfaces and applies to any database. As long as the upstream and downstream databases support the SQL standard, you can use ETL jobs.
- In addition, you can process data during the ETL job. If the upstream and downstream databases have different table schemas, or if you need to apply extra logic while moving the data, ETL jobs are the best choice.
The disadvantages of an ETL job are similar to those of using an interface file:
- An ETL job requires additional development. You need to create a set of independent SQL jobs and build a scheduling system.
- The data changes caused by UPDATE and DELETE operations are difficult to capture via ETL jobs. Compared with interface files, the timeliness of ETL may be slightly better, but it depends on the scheduling frequency, which is in turn bounded by how long each job run takes. For example, if each run takes 5 minutes to process the imported data, the delay may be as long as 5 to 10 minutes.
- To access the data through the SQL interface, extensive scanning of the upstream database is required. This may affect performance.
Using a CDC tool
We recommend that you use change data capture (CDC) tools to replicate heterogeneous databases. There are many CDC tools, such as Oracle GoldenGate (OGG), IBM InfoSphere CDC, and TiDB Data Migration (DM).
The following table summarizes the advantages and disadvantages of using CDC tools. As you can see, there are far more advantages.
| Advantages | Disadvantages |
| --- | --- |
| Your application requires no additional development. | CDC tools are mostly commercial products, and you need to purchase them. |
| CDC tools can capture all DML changes, such as DELETE and UPDATE. | Most CDC tools only support a specific database as the upstream database. If you have multiple types of upstream databases, you need multiple CDC tools. |
| Because the replication workload is spread throughout the day, these tools have higher performance. | |
| CDC tools provide low latency and near real-time replication. | |
| Upstream data is obtained by reading redo logs, which does not impact SQL performance. | |
Best practices for heterogeneous database replication in TiDB
I’d like to offer some best practice tips for heterogeneous database replication in TiDB:
Tips based on replication tasks:
- If you want to replicate incremental changes caused by operations such as UPDATE and DELETE, a CDC tool is your best choice.
- If you only need full data replication, you can use a lightweight ETL tool dedicated to data migration, such as Kettle or DataX. You don't need to purchase CDC tools or build extra architecture to complete the replication; you only need to ensure that the ETL tool can access the upstream and downstream databases simultaneously and run the ETL jobs that perform the full copy, as sketched after this list.
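Here is a minimal hand-rolled version of such a full-copy job, in the spirit of what Kettle or DataX automate. It assumes the source table has a hypothetical auto-increment primary key `id` suitable for chunking; all names and settings are illustrative.

```python
# A minimal sketch of a chunked full-copy job between two SQL databases.
import pymysql

CHUNK = 5000                                 # rows per chunk

source = pymysql.connect(host="oltp-host", user="etl",
                         password="...", database="app")
target = pymysql.connect(host="tidb-host", port=4000, user="etl",
                         password="...", database="app")

last_id = 0
while True:
    with source.cursor() as scur:
        # Chunk by primary key to avoid one huge scan and one huge transaction.
        scur.execute(
            "SELECT id, name, balance FROM accounts "
            "WHERE id > %s ORDER BY id LIMIT %s",
            (last_id, CHUNK),
        )
        rows = scur.fetchall()
    if not rows:
        break                                # full copy complete
    with target.cursor() as tcur:
        tcur.executemany(
            "INSERT INTO accounts (id, name, balance) VALUES (%s, %s, %s)", rows
        )
    target.commit()
    last_id = rows[-1][0]                    # advance the chunk cursor
```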
Tips based on scenarios:
- If you are creating a disaster recovery database, creating a read-only or archive database, or aggregating data from multiple sources, we recommend that you use a CDC tool for data replication. If you used ETL jobs in these scenarios to capture all DML changes, the development costs would be very high.
- If your upstream database is a MySQL-like or MySQL-based database (such as Amazon RDS or Aurora on the public cloud, and some sharding products built on MySQL), you can use the TiDB DM tool for data transfer. For more information about DM, see TiDB Data Migration Overview.
If you have any questions on any of the topics we covered today, you can join our community on Slack and send us your feedback.
Originally published at www.pingcap.com on July 30, 2020