What's a commit log (Apache Kafka example)

Logs are a crucial tool throughout the software development lifecycle. Relational databases, Git, and most analytics tools would not exist without them.

For many developers, that is where interest in logs ends. Log-based technologies are everywhere in the tools we use, yet most of us only open a log file when we need to figure out what went wrong. In fact, until recently you could go an entire career without giving logs any more thought than that.

Logs, and commit logs in particular, have since risen from the shadows and taken center stage. Commit logs are the backbone of solutions like Apache Kafka, and microservices designs rely on them to handle ever-increasing amounts of data while keeping distributed systems consistent.

If you know how commit logs work, you can use Kafka and similar technologies to build more robust systems.

What's a log

The good news is that commit logs are as simple as they come. They consist of a series of records, each with its own unique identifier. Do you want to add a record? No problem: it is appended to the end of the log. Do you want to modify an existing record? That is not possible: once written, records are immutable.

What about reading? Reads usually run from left to right, that is, from the oldest record to the newest. There is no query language, but you can specify the start and end positions of a read using offsets.
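To make the append-and-read-by-offset idea concrete, here is a minimal in-memory sketch. The `CommitLog` class and its method names are hypothetical and chosen only for illustration; it assumes the record's identifier is simply its position in the log, and it deliberately offers no way to update or delete a record.

```python
from typing import Any, List, Optional


class CommitLog:
    """Toy in-memory commit log: append-only, read by offset (illustrative only)."""

    def __init__(self) -> None:
        self._records: List[Any] = []

    def append(self, record: Any) -> int:
        """Add a record at the end of the log and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, start: int = 0, end: Optional[int] = None) -> List[Any]:
        """Return records from offset `start` up to, but not including, `end`."""
        if start < 0 or (end is not None and end < start):
            raise ValueError("offsets must satisfy 0 <= start <= end")
        return self._records[start:end]


log = CommitLog()
log.append("page_view:/home")
log.append("search:red shoes")
log.append("click:product-42")

# Read everything from offset 1 onward, oldest record first.
print(log.read(start=1))
```

A real commit log would persist records to disk and usually split them into segments, but the interface stays roughly this small.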

That's all there is to it, and we're all free to go. Well, perhaps not just yet: the raw "what" is often far less interesting than the "why."

Why are logs necessary?

Commit logs have existed for a long time because they address a critical issue in software development. They serve as a record of what occurred in a system and in what sequence.

Consider a relational database management system. Before data is modified in a table or an index, each write must first be recorded in the write-ahead log. One advantage is that this speeds up write operations, since a sequential append to a log is generally cheaper than updating tables and indexes in place.

Perhaps the most significant feature is that a database can be rebuilt from scratch simply by replaying the write-ahead log, whether for disaster recovery or to stream live updates to a read-only replica.
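The rebuild-by-replay idea can be sketched in a few lines. This is a simplified illustration, not how any particular database implements its write-ahead log: the entry format `(operation, key, value)` and the `replay` function are assumptions made up for the example, and the "database" is just a dictionary.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical log entry format: (operation, key, value).
LogEntry = Tuple[str, str, Optional[str]]


def replay(entries: Iterable[LogEntry]) -> Dict[str, Optional[str]]:
    """Rebuild key-value state by applying log entries in order."""
    state: Dict[str, Optional[str]] = {}
    for operation, key, value in entries:
        if operation == "set":
            state[key] = value
        elif operation == "delete":
            state.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {operation!r}")
    return state


wal: list = [
    ("set", "user:1", "Ada"),
    ("set", "user:2", "Grace"),
    ("set", "user:1", "Ada L."),
    ("delete", "user:2", None),
]

# The primary and a replica end up identical because they
# apply the same entries in the same order.
primary = replay(wal)
replica = replay(wal)
print(primary == replica)
```

Because the log is ordered and immutable, replaying it always produces the same state, which is what makes it usable for both recovery and replication.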

Now, what if the commit log could serve the same function for a full software architecture, rather than simply a database?

Logs are capable of handling large volumes

Assume you are developing an online store application. You want to gain a better understanding of how people navigate your site, so you track every click, every search phrase, and every page visit.

Regardless of how the data is eventually processed, it must first be captured. The simplest option would probably be to save each event in your operational database. However, this places you on the losing end of a trade-off. The operational database, which is almost certainly a relational database, has relatively slow writes. That is normally acceptable because it provides advantages such as rich querying, transaction support, and mutability. Right now, however, all you care about is capturing the data and deciding what to do with it afterwards.

One possibility is to place Redis in front of your operational database, where it can absorb the volume and slowly release it at a rate the main database can handle. However, this only mitigates the issue, leaving your very expensive operational database brimming with semi-structured data. The visitor data is arriving in massive quantities, and there is no need to increase the cost and complexity of your operational database to hold "nice to have" data.

A commit log, however, is an excellent fit. The data you need to save is a series of distinct, sequential events. And because commit logs are so simple, they can readily handle far larger data volumes than a typical relational database.

Indeed, this is a common use case for Apache Kafka, which bases its data architecture on a commit log.
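As a rough sketch of capturing events this way, the example below appends each click or search as one JSON line to a file and reads them back from a given position. It uses only the Python standard library rather than a Kafka client, and the function names, the file name `clickstream.log`, and the use of a byte offset as the record position are all assumptions made for illustration.

```python
import json
import os
import tempfile
from typing import Any, Dict, Iterator


def append_event(path: str, event: Dict[str, Any]) -> int:
    """Append one event as a JSON line; return the byte offset where it starts."""
    line = (json.dumps(event) + "\n").encode("utf-8")
    with open(path, "ab") as f:
        f.seek(0, os.SEEK_END)  # make sure tell() reports the end of the file
        offset = f.tell()
        f.write(line)
    return offset


def read_events(path: str, offset: int = 0) -> Iterator[Dict[str, Any]]:
    """Yield events in write order, starting at the given byte offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        for raw in f:
            yield json.loads(raw)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "clickstream.log")
    append_event(path, {"type": "page_view", "url": "/home"})
    search_offset = append_event(path, {"type": "search", "query": "red shoes"})
    append_event(path, {"type": "click", "target": "product-42"})

    # Resume reading from the search event onward.
    for event in read_events(path, search_offset):
        print(event)
```

The write path does nothing but serialize and append, which is why this kind of capture stays cheap even when the volume is high. Deciding what to do with the events can happen later, in a separate reader.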

Logs serve as a reliable source of information

Logs are fast, easy to use, and can handle large volumes of data. They are also a reliable source of information. As a result, they're well suited to situations in which a unified system is made up of multiple separate components. This might be a fully fledged microservices architecture or a monolith from which one or two processes have been split off.
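One way to picture several components sharing a log is to give each reader its own position in it. The sketch below is a simplification under stated assumptions: the log is a plain Python list, and the `Consumer` class, its `poll` method, and the event strings are hypothetical names invented for the example. Real Kafka clients are considerably more involved.

```python
from typing import Any, List


class Consumer:
    """A reader that remembers how far it has read in a shared log."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.offset = 0  # next position this consumer will read

    def poll(self, log: List[Any], max_records: int = 100) -> List[Any]:
        """Return up to `max_records` unread records and advance the offset."""
        batch = log[self.offset:self.offset + max_records]
        self.offset += len(batch)
        return batch


events = ["order_placed:1001", "order_paid:1001", "order_shipped:1001"]
billing = Consumer("billing")
analytics = Consumer("analytics")

billing.poll(events, max_records=1)  # billing is slow and reads one record
analytics.poll(events)               # analytics reads everything available

events.append("order_delivered:1001")

# Each consumer picks up where it left off, independently of the other.
print(billing.name, billing.poll(events))
print(analytics.name, analytics.poll(events))
```

Neither consumer changes the log or needs to know about the other, so a slow or newly added component can catch up at its own pace from the same ordered history.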

Conclusion

You could create your own log, but Apache Kafka is likely to provide a higher return on your investment.

Kafka combines the features of a commit log (an immutable, ordered record of events) with the flexibility to integrate with standard data sources, write data to other systems such as Postgres, and act on and transform the data it processes. Most major languages have SDKs, and if you use a hosted service, you won't have to worry about managing your own Kafka cluster.

