Characteristics of open source distributed databases
In recent years, deep learning and knowledge graph technologies have developed rapidly. Compared with the "black box" of deep learning, knowledge graphs are highly interpretable and are widely used in scenarios such as search and recommendation, intelligent assistants, and financial risk control. By mining the associations in its accumulated business data in combination with concrete usage scenarios, Meituan has gradually built nearly ten domain knowledge graphs, including food, travel, and product graphs, and has deployed them in multiple business scenarios to make local life services more intelligent.
To store and retrieve graph data efficiently, we chose a graph database rather than a traditional relational database as the storage engine, because graph databases have a clear performance advantage in multi-hop queries. There are currently dozens of well-known graph database products in the industry. Selecting one that meets Meituan's actual business needs is the basis for building a graph storage and graph learning platform. Based on the current state of our business, we formulated the following basic selection criteria:
An open source project that is friendly to business use.
Only by having control over the source code can we ensure data security and service availability.
Supports cluster mode and has the ability to scale out for storage and computing.
Meituan's graph data can exceed 100 billion vertices, and throughput can reach tens of thousands of QPS. A single-node deployment cannot meet these storage needs.
Able to serve OLTP scenarios, with millisecond multi-hop query capability.
In the Meituan search scenario, the timeout of each link in the request chain is strictly limited to protect the user's search experience, so query response times of a second or more are not acceptable.
Has the ability to import data in bulk.
Graph data is generally stored in data warehouses such as Hive. There must be a way to import it quickly into graph storage so that the timeliness of the service can be guaranteed.
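To make the multi-hop query requirement concrete, here is a minimal Python sketch of what such a query computes: every vertex reachable from a starting vertex within a given number of hops. It is only an in-memory illustration using breadth-first search. The function name `k_hop_neighbors`, the toy graph, and its vertex names are assumptions made up for this example, not Meituan's data or the API of any graph database.

```python
from collections import deque


def k_hop_neighbors(adjacency, start, max_hops):
    """Return {vertex: hop_distance} for every vertex reachable
    from `start` in at most `max_hops` hops (excluding `start`)."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        hops = distances[vertex]
        if hops == max_hops:
            # Do not expand beyond the requested number of hops.
            continue
        for neighbor in adjacency.get(vertex, ()):
            if neighbor not in distances:
                distances[neighbor] = hops + 1
                queue.append(neighbor)
    del distances[start]
    return distances


# Hypothetical toy graph: a user, the shops they visited,
# the dishes those shops sell, and one dish category.
toy_graph = {
    "user:1": ["shop:a", "shop:b"],
    "shop:a": ["dish:x"],
    "shop:b": ["dish:x", "dish:y"],
    "dish:x": ["category:noodles"],
}

if __name__ == "__main__":
    print(k_hop_neighbors(toy_graph, "user:1", 2))
```

In a relational database, each additional hop typically becomes another self-join on an edge table, whereas an adjacency-based store follows neighbor lists directly, which is one reason multi-hop latency matters in the selection criteria above.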
Characteristics of open source distributed databases.
(1) Data distribution. The data in a distributed database is spread across the nodes of a network, which distinguishes it from a traditional centralized database and from a centralized database system shared over a computer network.
(2) Unity. This shows in two aspects: the logical unity of the data and the unity of data management. Using network technology, a distributed database system combines local, dispersed databases into a single logical database, so that users see what looks like one unified, centralized database. This logical unity of the data distinguishes it from multiple independent databases that are merely interconnected by a network. A distributed database is also managed and maintained uniformly by a distributed database management system, which distinguishes it from a general distributed file system.
(3) Transparency. Using a distributed database is like using a centralized one: users do not need to know where the data they care about is stored or how many copies of it are kept. All they need to care about is the logical structure of the database as a whole.
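The three characteristics above can be illustrated with a small Python sketch. It assumes a toy in-memory store in which a key is hashed to pick an owning node and then copied to the next node as well. The class `ShardedStore`, its methods, and the node names are hypothetical and exist only for this illustration. Real distributed databases use far more elaborate placement, replication, and consistency mechanisms.

```python
import hashlib


class ShardedStore:
    """Toy store: data is spread over several nodes (distribution),
    accessed through one interface (unity), and callers never see
    which nodes hold a key or how many copies exist (transparency)."""

    def __init__(self, node_names, replicas=2):
        if not 1 <= replicas <= len(node_names):
            raise ValueError("replicas must be between 1 and the node count")
        self._names = list(node_names)
        # Each "node" is just a dict standing in for a remote server.
        self._nodes = {name: {} for name in self._names}
        self._replicas = replicas

    def _owners(self, key):
        # Hash the key to choose the first node, then take the
        # following nodes in order as additional replicas.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        first = int.from_bytes(digest[:8], "big") % len(self._names)
        return [self._names[(first + i) % len(self._names)]
                for i in range(self._replicas)]

    def put(self, key, value):
        for name in self._owners(key):
            self._nodes[name][key] = value

    def get(self, key):
        for name in self._owners(key):
            if key in self._nodes[name]:
                return self._nodes[name][key]
        raise KeyError(key)

    def copies(self, key):
        # Diagnostic only: count how many nodes physically hold the key.
        return sum(1 for data in self._nodes.values() if key in data)


if __name__ == "__main__":
    store = ShardedStore(["node-1", "node-2", "node-3"], replicas=2)
    store.put("shop:a", {"name": "example shop"})
    print(store.get("shop:a"), store.copies("shop:a"))
```

The caller only ever uses `put` and `get`. Which nodes hold the record, and how many copies there are, stay hidden behind the logical interface.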