Alimama announces the deep learning framework Euler
Editor's note: a long-awaited release! Alimama has officially open-sourced its major project, the deep learning framework Euler. It is the first deep learning framework in China to be open-sourced after large-scale deployment in core business. Euler ships with a large set of built-in algorithms that users can apply directly, and the code can be downloaded from GitHub.
Graph learning and deep learning are both branches of artificial intelligence. As Alibaba's big-data marketing platform, Alimama has innovatively combined graph learning with deep learning in Euler, which can greatly improve marketing efficiency. Euler has been honed and validated in Alibaba's core business scenarios and has high application value wherever complex network analysis is involved, such as finance, telecommunications, and healthcare. For example, users can apply Euler to learn and reason over complex heterogeneous graphs built from financial data such as user transactions, and then use the results in scenarios such as financial anti-fraud.
Let's step into the world of Euler.
Euler open-source repository
In the past few years, with the rapid growth of data scale and hardware computing power, deep learning has been widely adopted in industry and has generated a huge technical dividend. Its current applications are relatively mature, and the industry is actively exploring where the next dividend will come from. Graph neural networks combine end-to-end learning with inductive reasoning and are expected to address a series of problems that deep learning alone cannot handle, such as relational reasoning and interpretability. The representation, computation, and compositional generalization of structured knowledge are key to achieving human-like AI; graph neural networks aim to make breakthroughs in exactly these areas, further extending machine capability. The deep application of graph neural networks therefore promises to be the next wave of technical dividend.
As a general-purpose data structure with strong expressive power, the graph can describe many real-world problems: user networks in social scenarios, user-item networks in e-commerce, communication networks in telecommunications, transaction networks in finance, drug-molecule networks in medicine, and more. Unlike text, speech, and image data, which are easily processed into the grid-like form of Euclidean space that existing deep learning models handle well, graphs are non-Euclidean data to which existing methods cannot be applied directly; specially designed graph neural network systems are required.
1.1 Euler's core capabilities
1) Distributed learning on large-scale graphs
Industrial graphs often have billions of nodes and tens of billions of edges, and some scenarios even reach tens of billions of nodes and hundreds of billions of edges. Training on graphs of this size on a single machine is infeasible. Euler supports graph partitioning and efficient, stable distributed training, and can easily handle a computing scale of billions of nodes and tens of billions of edges.
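The core idea behind graph partitioning can be sketched in a few lines. The following is a hypothetical toy partitioner (not Euler's actual scheme): edges are sharded by source-node ID so that each compute node loads only one subgraph.

```python
def partition_edges(edges, num_shards):
    """Shard a graph's edge list by source-node ID so that each
    compute node loads only its own subgraph (toy modulo partitioner)."""
    shards = [[] for _ in range(num_shards)]
    for src, dst in edges:
        shards[src % num_shards].append((src, dst))
    return shards
```

Real systems use more careful partitioners (balancing load and minimizing cross-shard edges), but the principle of assigning each subgraph to one worker is the same.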
2) Support for representing complex heterogeneous graphs
Graph relationships in industry are mostly intricate, which shows up as heterogeneous nodes and heterogeneous edges. Moreover, nodes and edges may carry very rich attributes, which makes it hard for common graph neural networks to learn effective representations. Euler supports heterogeneous node and edge types in both its graph-storage and graph-computation abstractions, along with rich heterogeneous attributes, so graph learning algorithms can easily perform representation learning on heterogeneous graphs.
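To make the heterogeneity concrete, here is a minimal toy sketch (names invented, not Euler's storage layout): adjacency lists are keyed by edge type, and each node carries an attribute dict, so neighbor queries can be restricted to one relation.

```python
from collections import defaultdict

class HeteroGraph:
    """Toy heterogeneous graph: adjacency lists keyed by (node, edge_type),
    plus per-node attribute dicts. Illustrative only."""

    def __init__(self):
        self.adj = defaultdict(list)   # (src, edge_type) -> [dst, ...]
        self.attrs = {}                # node -> {attr_name: value}

    def add_node(self, node, **attrs):
        self.attrs[node] = attrs

    def add_edge(self, src, dst, edge_type):
        self.adj[(src, edge_type)].append(dst)

    def neighbors(self, src, edge_type):
        """Neighbors of `src` reached via edges of one specific type."""
        return self.adj[(src, edge_type)]
```

Organizing storage by type like this is what lets a heterogeneous algorithm ask for, say, only "click" neighbors of a user node.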
3) Combining graph learning with deep learning
Many classic industrial scenarios, such as search, recommendation, and advertising, are already served well by traditional deep learning methods. How to combine graph learning with these traditional methods to further strengthen the model is worth exploring. Euler supports mini-batch training driven by deep learning samples, mapping graph representations directly into the deep network for joint training.
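A minimal sketch of what "mapping representations into the deep network" means in practice (hypothetical, using NumPy): look up the graph-derived embedding for each sample's node and concatenate it with the sample's ordinary dense features, so a single network consumes both.

```python
import numpy as np

def join_graph_and_dense(node_ids, embedding_table, dense_features):
    """Concatenate graph-derived node embeddings with ordinary dense
    features so one network can train on both (toy sketch)."""
    graph_part = embedding_table[node_ids]        # (batch, emb_dim)
    return np.concatenate([graph_part, dense_features], axis=1)
```

In end-to-end training the embedding table would itself be a trainable variable updated by the same optimizer, rather than a fixed lookup.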
4) Layered abstraction and flexible extension
The Euler system is abstracted into three layers: the graph engine layer, the graph-operator layer, and the algorithm layer. New graph learning algorithms can be built quickly on top of the high-level abstractions. Euler also ships with many built-in algorithms ready for direct use.
1.2 Euler's built-in algorithms
For ease of use, the framework builds in a number of well-known algorithms as well as several of our own innovations. Every implementation has been carefully tested for efficiency, and the results are aligned with the original papers. Users need no extra development: after loading their data into the platform, they can use the algorithms directly. The list of built-in algorithms is shown in the table below; see Section 2.3 for more on our in-house algorithms.
2. System design
The Euler system is divided into three layers: the distributed graph engine at the bottom, graph semantic operators in the middle, and graph representation learning algorithms at the top.
Below we describe the core functionality of each layer.
Figure 1 Euler architecture overview
2.1 Distributed graph engine
To support our business, we face not only the challenges of hyperscale graph storage and computation but also the complexity of building heterogeneous graphs from many different types of nodes, edges, and attributes. Our distributed graph engine is optimized for massive graph storage, distributed parallel graph computation, and heterogeneous graphs, to ensure efficient use in industrial scenarios.
- First, to store hyperscale graphs (billions of nodes, tens of billions of edges), Euler must break through the limits of a single machine, so it adopts a distributed storage architecture. When a graph is loaded, the engine divides it into multiple subgraphs, and each compute node is assigned one or several subgraphs to load.
- To fully utilize all compute nodes, graph operations issued at the top level are decomposed into operations on multiple subgraphs and executed in parallel across the nodes; as more nodes are added, we obtain better serving capacity. Second, we introduced support for multiple replicas, letting users flexibly balance the number of shards and replicas to achieve better serving capability. Finally, we optimized the underlying graph-storage data structures and operation algorithms specifically for graph representation learning, improving single-machine graph-operation performance several-fold.
- Heterogeneous graphs with many different types of edges, nodes, and attributes are essential in many complex business scenarios. To support heterogeneous graph computation, the underlying storage is organized by node and edge type, so heterogeneous graph operations can be served efficiently.
2.2 Middle-layer graph operators
Given the diversity of graph learning algorithms and the complexity of real businesses, a fixed set of even dozens of algorithm implementations cannot meet every customer's needs. In designing Euler, we therefore focused on flexible, powerful graph operators built around the core capabilities of the underlying system, all of which support heterogeneous graph semantics. Users can compose them to quickly build algorithm variants that fit their own business needs.
First, the Euler distributed graph engine exposes all graph operations through a C++ API. On top of this API, we can easily add graph operators to a deep learning framework, letting it access the underlying graph engine through the Euler C++ interface. We support widely used deep learning frameworks, including Alibaba's X-DeepLearning and the popular TensorFlow, and we will also consider supporting others such as PyTorch.
With these flexible graph operators, the machine learning framework can interact with Euler on every mini-batch to dynamically expand and assemble training samples. In this way, Euler not only supports traditional graph-centric learning models but can also inject graph learning capability into conventional learning tasks for end-to-end training.
By function, the APIs provided by the core system fall into the following categories:
- Global sampling of nodes and edges, used mainly for random mini-batch generation and negative sampling.
- Neighbor operations on a given node. This is the core capability of graph computation, including weighted neighbor sampling, top-weight neighbors, and so on.
- Node/edge attribute lookup, which lets algorithms use features richer than node/edge IDs alone.
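The three API families above can be illustrated on a toy in-memory graph. The method names below are invented for illustration and do not match Euler's real interface.

```python
import random

class ToyGraphEngine:
    """Toy illustration of the three API families: global sampling,
    neighbor operations, and attribute lookup."""

    def __init__(self, edges, weights, attrs):
        self.edges = edges      # node -> list of neighbor nodes
        self.weights = weights  # node -> matching list of edge weights
        self.attrs = attrs      # node -> {attr_name: value}

    def sample_nodes(self, k):
        """Global node sampling, e.g. for mini-batches or negatives."""
        return random.choices(list(self.edges), k=k)

    def sample_neighbor(self, node):
        """Weighted neighbor sampling around one node."""
        return random.choices(self.edges[node],
                              weights=self.weights[node], k=1)[0]

    def top_k_neighbors(self, node, k):
        """Neighbors with the largest edge weights."""
        ranked = sorted(zip(self.edges[node], self.weights[node]),
                        key=lambda pair: -pair[1])
        return [n for n, _ in ranked[:k]]

    def lookup(self, node, attr_name):
        """Node attribute lookup, for features beyond raw IDs."""
        return self.attrs[node][attr_name]
```

In Euler these operations are served by the distributed engine over sharded storage; the toy version only conveys the operator semantics.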
2.3 High-level algorithm implementations
As described in Section 1.2, apart from the LINE algorithm, the algorithms we implement fall into two families: random-walk methods and neighbor-aggregation methods. For more on the external algorithms, see the paper links in Section 1.2. Below we detail our three in-house innovations; links to the related papers are provided on GitHub.
Scalable-GCN is an efficient GCN training algorithm. GCN, and the more general graph neural network (GNN) methods, outperform earlier approaches on many tasks because they effectively extract graph-structure information. However, the GCN model introduces a huge amount of computation, making training time unacceptable. Scalable-GCN reduces the computational complexity of mini-batch GCN training from exponential in the number of layers to linear, while preserving accuracy. This makes a three-layer GCN feasible on Alimama's massive data, significantly improving the effect of ad matching.
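The exponential cost that Scalable-GCN attacks comes from the receptive field of a multi-layer GCN: each extra layer pulls in the neighbors of every previously touched node. The following toy sketch (illustrative only, not Scalable-GCN itself) counts that blow-up.

```python
def receptive_field(adj, seeds, num_layers):
    """All nodes an L-layer GCN must touch to compute outputs for
    `seeds`; with fan-out d this grows roughly like d ** num_layers."""
    visited = set(seeds)
    frontier = set(seeds)
    for _ in range(num_layers):
        frontier = {n for u in frontier for n in adj.get(u, [])}
        visited |= frontier
    return visited
```

On a graph with fan-out 2, a mini-batch of one seed touches 3 nodes for one layer, 7 for two, 15 for three; with realistic fan-outs the growth quickly becomes intractable, which is why reducing it to linear in depth matters.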
LsHNE is an innovative unsupervised large-scale heterogeneous network embedding method developed for Alimama's search advertising scenario. Unlike DeepWalk-style algorithms, LsHNE: a) uses a deep neural network to learn representations, effectively fusing attribute information; b) accounts for the distance-sensitivity requirement of the embedding and proposes two negative-sampling principles, distribution consistency and weak correlation; and c) supports heterogeneous networks.
LasGNN is a semi-supervised large-scale heterogeneous graph convolutional network learning method. It effectively combines graph-structure information with massive user-behavior information, greatly improving model accuracy, and is the first application of semi-supervised graph learning in an industrial advertising scenario. The method contains several innovations: it applies the metapath idea to graph convolutional networks and proposes the metapathGCN model, effectively solving convolution on heterogeneous networks; it also proposes the metapathSAGE model, in which we design an efficient neighbor-sampling scheme that makes large-scale multi-layer neighbor convolution feasible.
3. Application examples
The Euler platform has been widely used across Alibaba's search advertising, achieving excellent business results in scenarios such as retrieval matching, CTR prediction, marketing tools, and anti-cheating. Below we take the matching scenario as an example of how Euler is applied.
The task of ad matching is: given a user's search request, the matching module must quickly and accurately find a small, high-quality candidate ad set from the massive ad inventory by understanding the user's intent, and deliver it to the downstream ranking module for ranking.
We first use traditional mining algorithms to extract the relationships among Query (search terms), Item (products), and Ad (advertisements) from user behavior logs, content attributes, and other dimensions. We then use the LsHNE method on the Euler platform to learn node embeddings on this graph, where distances in the embedding space characterize the relationships in the original graph. At serving time, an online request is vectorized efficiently; by computing distances between the user's query vector, the vectors of nodes in the user's recent behavior, and the ad node vectors, a nearest-neighbor search quickly retrieves ads that match the user's intent. Figure 2 shows the offline and online pipelines of the LsHNE method, and Figure 3 shows the sample construction and network structure.
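The online retrieval step reduces to nearest-neighbor search in the embedding space. A brute-force sketch of the idea (production systems use approximate nearest-neighbor indexes; this is illustrative only):

```python
import numpy as np

def top_k_ads(query_vec, ad_vecs, k):
    """Rank ad embeddings by cosine similarity to a query embedding
    (brute-force nearest-neighbor retrieval, for illustration)."""
    q = query_vec / np.linalg.norm(query_vec)
    a = ad_vecs / np.linalg.norm(ad_vecs, axis=1, keepdims=True)
    return np.argsort(-(a @ q))[:k]
```

Because LsHNE is trained so that embedding distance reflects graph relationships, the top-k ads under this similarity are those most related to the user's intent in the original graph.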
Figure 2 DeepMatch recall framework
Figure 3 Offline training process