March 2, 2023

9:35 am - 10:05 am PST

NoSQL Database and Hybrid Cloud Usage Scenarios at Rakuten Travel

Rakuten Travel uses many NoSQL databases in our service and all run in the private cloud in Rakuten. For several reasons, we wanted to use both private and public clouds. For example, high availability was fully achieved in the multi-region architecture but required securing physical resources. Today our document databases can be used in a hybrid cloud model (private cloud and GCP). This talk will share the background, verification, and use case, and highlight the benefits of using document databases in a hybrid cloud.

Topics Covered

Real-world implementation


Note: This transcript was created using speech recognition software. It may contain errors.

Akihiro Tatematsu:

Hello everyone. Thank you for coming to the Rakuten Group session. I’m very happy to be able to talk to you today. We’d like to talk about the benefits of using document databases in a hybrid cloud, and we are presenting from Tokyo, Japan, today. I’d like to briefly introduce myself. I have been working as a database engineer since joining the Rakuten Group in 2010. When I first joined the company, I worked as a relational database engineer. Since NoSQL databases became popular, I have mainly managed them. Edward, who will present today’s main technical part, joined the Rakuten Group in 2019. Since then, he has mainly managed NoSQL databases. He has been leading the verification of NoSQL databases in the hybrid cloud, which we will talk about this time. The order of this session is as shown. I will talk about the first part, the introduction and background. Edward will talk about the technical part. First, let me briefly explain the Rakuten Group and Rakuten Travel. Founded in 1997, the Rakuten Group is now one of Japan’s largest web service companies. We are also expanding our businesses overseas, in 30 countries.

Overseas, we are developing Viber, a call and messaging app, Kobo, an e-book service, and so on. We have over 70 services worldwide and are supported by 1.6 billion users. We are actively investing not only in business but also in sports and entertainment. In particular, some people may say that they only know our company name from the uniform of the NBA’s Golden State Warriors. This is the latest global gross transaction value. It has continued to grow since the IPO, reaching 33.8 trillion Japanese yen last year. This is also part of our key performance indicators for Japan. Domestic businesses such as mobile phone, credit card, securities, and banking continue to grow.

What we will talk about today is the Rakuten Travel system. The Rakuten Travel business in Japan is already outperforming the pre-COVID-19 era. Rakuten Travel is an online travel agency operated by the Rakuten Group and is known as the top online travel agency in Japan. This is a rough image of the data access of Rakuten Travel. Since Rakuten Travel is a service that continues to grow, flexible system development and expansion are required, depending on the speed of data acquisition and the shape of the stored data. We sometimes use a database or a NoSQL database that synchronizes part of the data from the master data. One of our duties as database engineers is to consider the characteristics of each system and prepare the most suitable database in the most suitable form. From the perspective of cost reduction and efficiency, basically everything is completed within the Rakuten private cloud.

The NoSQL database we will talk about in the main part is MongoDB. I think MongoDB is the most used NoSQL database, and it is a kind of document store. Rakuten Travel operates 126 nodes for sharded clusters and 24 replica sets for the document store alone. Due to the flexible nature of the system, this number may increase or decrease each month. A document store is basically easy to use, so the number of systems using it is increasing. As an aside, the Rakuten Group is currently actively hiring engineers, so if you are interested in such an environment, please feel free to contact me on LinkedIn. Before we get to the point, let me give you a little background. Rakuten Travel works on various events throughout the year. When an event starts, the load on the system can swell dozens of times. Basically, we anticipate the load of the event, and we plan and prepare the environment in advance to withstand that load. That is not difficult for us, but for some events it has been difficult to predict start times and load sizes. Especially under COVID-19, there were many such government-led events. Against this background, we considered the necessity of a hybrid cloud. Edward will speak from here.

Edward Chow:

All right, it’s almost 3:00 AM here in Tokyo, but good morning everyone. Thank you for coming today, and I hope you’re enjoying Subsurface Live 2023. This is Edward Chow, a database engineer at Rakuten Travel. Just now, Tatematsu-san covered some recent background information on Japan’s tourism industry. Next, I will talk about why we are considering a transition to a hybrid cloud architecture for the NoSQL database clusters we are managing. First, let’s consider a multi-region cluster in a private cloud environment, the architecture we are currently adopting. It offers high availability via an automatic failover mechanism. In case of a failure, the system can offload production workloads to nodes located in other, unaffected data centers without human intervention, avoiding data loss or service downtime.

In addition, as the cluster nodes are spread across multiple regions, applications can choose the node closest to them geographically, which results in better performance. This advantage is especially important for businesses providing services globally, such as Rakuten Travel. However, there are also some challenges associated with the use of a private cloud. In the past few years, as the COVID-19 situation has stabilized, the Japanese government has been encouraging people to travel by holding various sales campaigns, such as the Go To Travel program and the National Travel Assistance program. As most of these coupons were distributed within a very limited timeframe, they could cause severe congestion. Under this high load, the databases involved could become unstable or even crash, causing services to become unavailable. To mitigate this risk, the most obvious solution is to accommodate peak load by provisioning additional hardware in the private cloud. However, not only is this approach inefficient and uneconomical, as the additional hardware would sit idle when the campaigns are over, but scaling this way is also not very flexible. Say we are using bare-metal servers in the private cloud: preparation could take a lot of time and effort. This is where a hybrid cloud comes into play.

A hybrid cloud refers to an environment that combines public clouds, such as AWS, GCP, or Azure, with private clouds, where sharing data or applications between them is possible. In this context, the hybrid cloud is introduced by adding nodes set up on the public cloud side to our existing cluster located in our self-managed Rakuten cloud, as shown here on the left. This raises the question: what advantages does the hybrid cloud offer? The four major benefits brought by a hybrid cloud are flexibility and scalability, cost saving, maximum resiliency, and automated management. Let me go through them briefly. While preparing on-premises machines can be time consuming, provisioning new servers on a public cloud takes only a few clicks and completes within seconds. The flavor, OS, and middleware can be specified using an intuitive GUI, so cluster expansion becomes seamless and swift. It can also help cut costs.

It does not commit you to paying the full cost of all the machines necessary when starting up new services, meaning that the initial investment is more affordable. We can easily adjust the number of servers, employing more only when user growth is anticipated during sales campaigns, which avoids wasting resources. There are scenarios where long-term usage is feasible too. While a private cloud can withstand data-center-level disasters, a hybrid cloud provides an extra layer of protection. Having at least one node active on the public cloud makes sure that your mission-critical applications are always up, thus achieving maximum resiliency. Finally, hybrid clouds nowadays provide a wide range of features like self-monitoring and alerting, budget control, and report generation, and these automation features make server administration a walk in the park.

This was a brief introduction to the hybrid cloud. Next, I will talk about how we verified database clusters living in it. This time we used MongoDB, a very popular, or the most popular, NoSQL product in the market. Many of you might have some experience working with it, but let’s review the architecture of a MongoDB sharded cluster. First are the MongoDB servers on the right, where data is stored. For load balancing purposes, data is distributed across multiple shards. To ensure high availability and redundancy, each shard is deployed as a replica set, which contains a cluster of servers holding the same data. By increasing the number of nodes inside, the cluster can handle more incoming requests, thus improving performance. In this verification, we added MongoDB nodes in GCP Tokyo as part of the replica sets, and various performance tests were conducted. mongos, on the left, provides an interface between client applications and the sharded cluster and acts as a query router that routes queries to the appropriate shards. The sharding configurations are stored on config servers, which are not shown here for simplicity.
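
As an editorial illustration of adding a public-cloud node to a shard's replica set, the sketch below builds the replica set configuration document that would be submitted through MongoDB's `replSetReconfig` command. The host names, the set name `shard01`, the `cloud` tag key, and the choice of `priority: 0` for the GCP member (so it serves reads but is never elected primary) are all assumptions made for the example, not details given in the talk.

```python
import copy

def add_member(config, host, tags, priority=0, votes=1):
    """Return a new replica set config with one extra member appended.

    The input config is left untouched so the old document can be kept
    for rollback. The version is bumped because replSetReconfig expects
    a higher version than the one currently in use.
    """
    new_config = copy.deepcopy(config)
    next_id = max(member["_id"] for member in new_config["members"]) + 1
    new_config["members"].append({
        "_id": next_id,
        "host": host,
        "priority": priority,  # assumption: 0 keeps this node from becoming primary
        "votes": votes,
        "tags": tags,          # tags can later be targeted by read preference
    })
    new_config["version"] += 1
    return new_config

# Hypothetical existing shard replica set in the private cloud.
current = {
    "_id": "shard01",
    "version": 3,
    "members": [
        {"_id": 0, "host": "db1.dc-a.example.internal:27018",
         "priority": 1, "votes": 1, "tags": {"cloud": "private"}},
        {"_id": 1, "host": "db2.dc-a.example.internal:27018",
         "priority": 1, "votes": 1, "tags": {"cloud": "private"}},
        {"_id": 2, "host": "db1.dc-b.example.internal:27018",
         "priority": 1, "votes": 1, "tags": {"cloud": "private"}},
    ],
}

# Hypothetical node provisioned on the public cloud side.
expanded = add_member(current, "db1.gcp-tokyo.example.internal:27018",
                      {"cloud": "gcp"})
print(expanded["members"][-1])
```

Nothing here talks to a database. The resulting document is only what an administrator would review before applying it, and the new member still has to complete initial synchronization before it can serve queries.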

Next is the verification methodology. First, we verified the speed of initial synchronization in a hybrid cloud. Nodes newly added to an existing cluster are not queryable until initial synchronization has finished. We added a node from either the Rakuten data center side or the GCP side to an existing hybrid cloud cluster, and the times taken by initial synchronization were measured and compared. Second, we measured the read performance of the hybrid cloud cluster. Specifically, we specified read preference tags, a config that tells the MongoDB driver which node to read from, and measured the response time of read queries sent to different nodes. Results from a private cloud and a hybrid cloud were obtained and compared based on the resulting values. The parameters used are shown here as a reference, aiming to simulate an actual sales campaign. As we were using actual production data, we did not test write performance this time. However, we should be able to predict it based on the results we obtained from initial synchronization.
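
To make the read preference setting concrete, here is a minimal sketch that assembles a MongoDB connection string using the standard `readPreference` and `readPreferenceTags` URI options. The mongos host name, the database name `travel`, and the `cloud` tag are hypothetical. The tag only has an effect if replica set members carry a matching tag in their configuration.

```python
def build_uri(hosts, read_preference, tag_sets, database="travel"):
    """Build a MongoDB connection string with read preference tags.

    Each dict in tag_sets becomes one readPreferenceTags option. Tag sets
    are tried in order, and an empty dict produces an empty option that
    means "any eligible member", which is a common final fallback.
    """
    options = [f"readPreference={read_preference}"]
    for tags in tag_sets:
        tag_text = ",".join(f"{key}:{value}" for key, value in tags.items())
        options.append(f"readPreferenceTags={tag_text}")
    return f"mongodb://{','.join(hosts)}/{database}?{'&'.join(options)}"

# Prefer members tagged cloud:gcp, then fall back to any eligible member.
uri = build_uri(
    hosts=["mongos1.example.internal:27017"],  # hypothetical mongos router
    read_preference="nearest",
    tag_sets=[{"cloud": "gcp"}, {}],
)
print(uri)
```

Tag sets cannot be combined with the `primary` read preference mode, so a non-primary mode such as `nearest` or `secondaryPreferred` is needed for the tags to apply.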

Let’s see how the hybrid cloud cluster performs. In terms of initial sync, the hybrid cloud cluster synchronizes at 2.1 gigabytes per minute, versus 2.4 gigabytes per minute for the private cloud cluster, or about 15% slower. This matches our expectation, as copying data between a private cloud and a public cloud is always going to be slower than within a LAN. Next is the read query response time. When compared to the private cloud cluster, the hybrid cloud cluster takes up to 50% more time in the worst case, but when looking at the average values, they differ by only a slight 2%. As with the initial sync, we would expect the hybrid cloud cluster’s read query response time to be much longer, especially when mongos, the router, needs to direct a query to multiple shards across the public cloud and private cloud. However, it is quite surprising to see that the hybrid cloud cluster holds up well even in a mixed cloud environment.
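
For a feel of what these sync rates mean in practice, the short calculation below estimates initial sync duration for a node. It assumes that 2.4 gigabytes per minute is the private cloud cluster's rate and 2.1 gigabytes per minute is the hybrid cloud cluster's rate, which is one reading of the figures quoted in the talk. The 500 GB data size is a made-up example value.

```python
PRIVATE_GB_PER_MIN = 2.4  # assumed: private cloud initial sync rate
HYBRID_GB_PER_MIN = 2.1   # hybrid cloud initial sync rate from the talk

def sync_minutes(data_gb, rate_gb_per_min):
    """Estimated initial sync duration, assuming a constant copy rate."""
    return data_gb / rate_gb_per_min

data_gb = 500  # hypothetical data size held by one replica set member
private = sync_minutes(data_gb, PRIVATE_GB_PER_MIN)
hybrid = sync_minutes(data_gb, HYBRID_GB_PER_MIN)
extra = hybrid / private - 1  # fraction of additional time in the hybrid case

print(f"private: {private:.1f} min, hybrid: {hybrid:.1f} min, "
      f"extra time: {extra:.1%}")
```

Under these assumptions the hybrid sync takes roughly 14% longer, which is in line with the "about 15% slower" figure, and the extra time is paid only once per added node.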

Finally, there is the slow query count comparison. A slow query is a query taking more than 100 milliseconds. In a production environment, we would like to avoid slow queries at all costs, as they mean that applications spend longer waiting for a response. By checking the slow query count, we get to know how well load balancing is working. From this chart, we can see that after running four sets of tests on each cluster, the hybrid cloud cluster got 30% fewer slow queries, which supports the argument that increasing the node count helps improve cluster performance. In short, in the hybrid cloud cluster, initial sync takes significantly longer. This is unavoidable due to higher network latency, but it is only required once, during node addition, and should not be a major concern. While the read query performance is comparable to that of the private cloud cluster, the slow query count has decreased thanks to load balancing.
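
The three metrics compared in this part of the talk (average response time, worst case, and slow query count with a 100 millisecond threshold) can be computed from raw response times as in the sketch below. The sample latencies are invented purely to exercise the code and are not the measurements from the verification.

```python
SLOW_QUERY_MS = 100  # threshold from the talk: slower than this counts as slow

def summarize(latencies_ms):
    """Summarize read query response times for one cluster."""
    return {
        "avg_ms": sum(latencies_ms) / len(latencies_ms),
        "worst_ms": max(latencies_ms),
        "slow_count": sum(1 for t in latencies_ms if t > SLOW_QUERY_MS),
    }

# Hypothetical response times in milliseconds, one list per cluster.
private_ms = [20, 35, 120, 18, 150, 101, 40, 22]
hybrid_ms = [22, 30, 95, 25, 180, 130, 38, 24]

private_stats = summarize(private_ms)
hybrid_stats = summarize(hybrid_ms)

for name, stats in (("private", private_stats), ("hybrid", hybrid_stats)):
    print(name, stats)
```

Looking at all three numbers together matters because a cluster can have a worse worst case and still produce fewer slow queries overall, which is the pattern described for the hybrid cloud cluster.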

Here is a table comparing the time required for scaling our cluster in a private cloud and a public cloud. In the case of a private cloud, we must go through the selection, purchase, provisioning, and configuration of bare-metal servers before they become ready, and the entire process could take up to four months. In contrast, for a public cloud, we only need to set up a dedicated interconnect from our private cloud to GCP and configure the firewall and port forwarding before proceeding to server provisioning, and this can be finished in as short as two weeks. Creating servers on a public cloud is fast, and hence a hybrid cloud offers great flexibility. Having explained our verification results, I hope you now understand the advantages a hybrid cloud offers. Even so, there are things that we need to pay attention to when using a hybrid cloud. Let’s look at some of the best practices as well as precautions.

Firstly, always specify which node to connect to in the driver config if possible. For MongoDB, we have read preference tags to prioritize certain nodes, and other database products usually have an equivalent. For applications deployed worldwide, such settings could have a significant impact on database performance. Secondly, remember to confirm the necessary security configurations with relevant parties, for example, the network department. We as DBAs often focus on database management, and it is easy to overlook something related to network security. Always discuss the best topology and security measures with your network experts before employing a hybrid cloud architecture. Thirdly, ensure that permission settings are in place on the public cloud. This is important, as other departments in your company might also be using the same public cloud for their services. Make sure that only those authorized may access a database where confidential data is stored. Fourthly, it may be a little too risky to go all in at the beginning with no prior experience, and it might be a better idea to start small and handle only noncritical data on the public cloud.

As for the precautions, the first thing to pay attention to is latency, as it affects database performance. Before using a hybrid cloud in a production environment, it is advisable to measure latency and identify which regions have an acceptable latency. Some products like MongoDB do not care much about latency between different nodes, but others like Elasticsearch assume a low-latency environment within a single cluster. Read the documentation carefully to see if your use case is recommended. Next is the encryption of data and traffic. Don’t forget that a public cloud is public, and anyone could access it with the correct credentials. To protect the data inside, we must utilize encryption at rest to prevent unauthorized access. The use of SSL/TLS and a firewall is also compulsory.
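
One simple way to approach the latency check is sketched below: time a TCP connection to a test host in each candidate region, collect several samples, and keep the regions whose median stays under a threshold. The sample values and the 15 millisecond threshold are hypothetical, and TCP connect time is only a rough stand-in for one network round trip. The region identifiers follow GCP's naming, but the choice of regions is just an example.

```python
import socket
import time
from statistics import median

def connect_ms(host, port, timeout=2.0):
    """Time one TCP connect to host:port, in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0

def acceptable_regions(samples_ms, max_median_ms):
    """Return {region: median latency} for regions under the threshold."""
    result = {}
    for region, samples in samples_ms.items():
        region_median = median(samples)
        if region_median <= max_median_ms:
            result[region] = region_median
    return result

# Hypothetical samples, as if collected with connect_ms() from the
# private cloud toward one test host per region.
samples = {
    "asia-northeast1": [2.1, 2.4, 2.0, 3.5, 2.2],
    "asia-northeast2": [9.8, 10.4, 9.9, 11.0, 10.1],
    "us-west1": [98.0, 101.5, 99.2, 120.3, 100.4],
}

print(acceptable_regions(samples, max_median_ms=15.0))
```

The median is used rather than the mean so that a single slow sample does not disqualify a region. What counts as acceptable depends on the database product, as noted above.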

Cost monitoring is another thing that often gets overlooked. Some hybrid clouds offer dynamic scaling, which allows servers to scale automatically based on the current workload. If such a feature is enabled, it is possible that the associated cost will grow beyond expectation, so it is a must to check cost information regularly. The last one is compatibility. Middleware or automation tools used internally might not be compatible with the public cloud due to OS version or hardware differences, and further testing with the tools is desirable. To summarize, a hybrid cloud allows us to make the initial investment as affordable as possible while maximizing system availability. It also makes the cluster more flexible in terms of scaling. However, it might not be as suitable if the existing system is already complicated or confidential data is being handled. Long-term usage of a hybrid cloud might not be as economical either.

Of course, hybrid cloud verification does not stop here, as many things are still unknown to us: for instance, a hybrid cloud involving regions other than Southeast Asia, other NoSQL products, or even other public clouds. Different architectures might be required in different cases accordingly. As part of our next steps, we also plan to migrate part of our clusters in service to a hybrid cloud. With all these upcoming projects, we are seeking the best talents like you to join our team. If you’re interested, do make sure to reach out to us and learn more about the opportunities available here. I hope you have learned more about hybrid cloud usage from this talk.