How can companies benefit from data-driven applications and open-source software?

Apache Flink

17 December 2018 by Jakub Piasecki

This blog post is a Q&A session with Jakub Piasecki, Director of Technology at Freeport Metrics, about the benefits of real-time, data-driven applications and open-source software.

Q: Freeport Metrics has extensive experience working with companies in Fin-Tech, Ag-Tech, Life Sciences & Bioinformatics, Healthcare, and other sectors. Can you give us some example use cases where Flink was chosen as the underlying processing framework and the type of results the customers have seen?

Jakub: Our journey with Flink started two years ago, when we decided to use it for a large-scale, real-time asset tracking system for one of our clients. We had to combine data from multiple sources, such as RFID antennas, hand-held scanners, and mobile and web applications. What is more, we needed it to happen in real time. After all, you want to be alerted immediately whenever something is off, for example when a valuable asset leaves your facility when it shouldn’t. The solution is targeted at enterprise customers such as warehouses and hospitals, so we had to make sure we could support large volumes of data.

Looking back, I can honestly say that Flink helped us a lot with solving those challenges and delivering a solution that met our client’s functional and performance requirements. Flink allowed us to build a fairly complex event-processing component with a relatively small team, which freed us to deliver more value in other areas of the system.
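To make the alerting scenario above more concrete, here is a minimal sketch in plain Python. It is not Freeport Metrics' implementation and it does not use Flink's API. It only illustrates the general idea of merging time-ordered readings from several sources into one stream and raising an alert as soon as an asset that is not cleared to leave shows up at an exit. The names `Reading`, `merge_sources`, `exit_alerts`, the `"exit"` zone label, and the `cleared` set are all hypothetical and chosen for illustration.

```python
import heapq
from dataclasses import dataclass
from typing import Iterable, Iterator, Set


@dataclass(frozen=True)
class Reading:
    """One observation of an asset (hypothetical schema, for illustration only)."""
    timestamp: float  # assumed: seconds, comparable across sources
    asset_id: str
    source: str       # e.g. "rfid", "scanner", "mobile", "web"
    zone: str         # assumed convention: "exit" means the asset passed a facility exit


def merge_sources(*sources: Iterable[Reading]) -> Iterator[Reading]:
    """Merge streams that are each already ordered by time into one ordered stream."""
    return heapq.merge(*sources, key=lambda r: r.timestamp)


def exit_alerts(readings: Iterable[Reading], cleared: Set[str]) -> Iterator[str]:
    """Yield an alert for every exit reading of an asset that is not cleared to leave."""
    for r in readings:
        if r.zone == "exit" and r.asset_id not in cleared:
            yield f"ALERT: {r.asset_id} seen at exit by {r.source} at t={r.timestamp}"


# Made-up sample data: two sources, one asset cleared to leave and one not.
rfid = [
    Reading(1.0, "pump-7", "rfid", "ward-a"),
    Reading(3.0, "pump-7", "rfid", "exit"),
]
scanner = [Reading(2.0, "cart-2", "scanner", "exit")]
cleared = {"cart-2"}

alerts = list(exit_alerts(merge_sources(rfid, scanner), cleared))
for line in alerts:
    print(line)
```

Because both functions are generators, each reading is evaluated as it arrives rather than after a batch completes. A stream processor such as Flink provides the same property at scale, along with distribution and fault tolerance that this sketch leaves out.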

Q: Do you think that open source software can benefit companies across such industries when it comes to building cutting-edge, data-driven applications?

Jakub: Definitely. We have seen significant demand from our clients for large-scale, real-time data processing in all the verticals mentioned previously. The use cases range from analysing solar farm data to running parallel computations on genomic and proteomic data to diagnose cancer. Most of those problems are non-trivial to solve, but luckily there are many great open source tools out there to help. At Freeport Metrics, we have a strong Java background, so we have been using the Apache Software Foundation’s (ASF) solutions for a long time. Looking at the available data processing tools from the ASF, Flink was the natural choice for us.

Q: What are the reasons these companies can benefit from open-source technologies?

Jakub: If I had to choose one, I would say that it is probably the knowledge sharing aspect that drives the popularity of those technologies. One of the main challenges for IT companies right now is the talent shortage. By building your solutions on standardised and widely adopted technologies, you position yourself better to attract top developers. You also reduce the cost of ramp-up if you need to scale up your team. Last but not least, it is always easier to find an answer on Stack Overflow if the particular technology community is large and active.

Q: What would be your advice to a tech lead or business owner interested in building data-driven applications and moving to stream processing?

Jakub: First of all, assess your needs in-depth to decide whether you really need to move into the large-scale, distributed data processing world. If a transactional database and a single node server are good enough for your needs, it may be the optimal choice.

Assuming you are past that step, there are a couple of considerations when you first start with large-scale, distributed data processing.

Firstly, the deployment model of distributed applications may be different from what your team is used to and may require new procedures and tools. This may be especially important when you need to deploy to a regulated, on-premises enterprise environment that you don’t fully control.

Secondly, be prepared for a lot of learning if you want to transition a team that has worked with more ‘traditional’ technologies so far. This is, of course, relevant to any technology transition. I just want to point out that switching to distributed computing may require more effort than replacing one web framework with another. (As a side note, I find all initiatives that try to lower entry barriers for newcomers, like Streaming SQL, very interesting in this context.)

The good news is that data streaming pays off. Once you get past the artificial barriers imposed by batch processing, you can explore new use cases and build better products. Real-time data processing is still perceived as a competitive advantage, but soon enough it will be a necessity to stay relevant in many industries. Users don’t like to wait and sometimes just cannot afford to wait.

Q: What do you think about the future of open source technologies?

Jakub: Open source is crucial to the strategy of most IT companies right now. This is best demonstrated by the increased corporate involvement of big brands such as Microsoft, or IBM with its recent acquisition of Red Hat. Having full-time contributors employed by sponsoring companies is good news for the stability of projects. How this involvement will transform the open source community in the long term remains to be seen.

Q: Do you think that Apache Flink is a good fit for companies eager to build their stream processing architecture? If yes, why?

Jakub: Each company or tech startup has its own unique needs, so it is hard for me to answer that question. The one thing I know is that we were able to deliver our solution with a relatively small team. I can also imagine that the cost factor may be an advantage for startups.

Q: What are the differentiating factors between Flink and other stream processing frameworks? What makes Flink unique and where would you see Apache Flink in the future?

Jakub: In our case, we were looking for a solution dedicated to real-time streaming, and we liked that Flink was built specifically for that purpose. We also needed something that could be deployed on-premises. While cloud has definitely been a hot topic in the enterprise world for the last couple of years, many large organisations still maintain their own data centers for cost, compliance or historical reasons. Lastly, it is our practice to always evaluate the maturity of any new tool that we want to use, as well as the community supporting it. We want to be sure that our proposed solution will be stable and will be actively maintained for the foreseeable future.

Q: What is your impression about the Apache Flink community?

Jakub: When we attended the Flink Forward conference last September, I was really impressed by the quality of the talks and the variety of use cases presented by the speakers. Prior to the conference, I don’t think I had a full understanding of how popular Flink is in China. Additionally, the mere presence of big brands like Microsoft, Netflix or Airbnb shows that Flink isn’t a niche technology anymore, but rather one with strong adoption in the enterprise. Leaving the conference, I felt confident that our investment in adopting Flink at Freeport Metrics was a good move.