Data Strategy: Part 1 - Data Management is Fundamental



In 2012, MIT Sloan published an article explaining the "Big Data Revolution." The authors stressed the need to harness the enormous volumes of data generated by digitization, the internet, and the then-nascent Internet of Things (IoT). Fast forward to today. A few months ago, we wrote an article titled "Screwed: The Real Value of Data." In it, we argued that the question is no longer whether data has value, but rather whether your organization can transform itself into one that uses data to build its digital foundation for the future.

Rather than leave you sitting on the platitude that data is the "digital screws" of the future, we at INFORM decided to put together a three-part series on effective data strategy. It contains plenty of information to help readers understand the fundamentals of data strategy and to set out on the path toward one. In this series, we introduce what cutting-edge technology makes possible and what the implications are for maritime operations. Most importantly, you will get an idea of how to capture these benefits for yourself. In Part 1, we begin with a look at data management.


What is Data Management in Layman’s Terms?

Data Management is a term that can be interpreted in various ways. Its definition ranges from technical concerns such as data sources, storage, connectivity, transfer, transformation, and modeling to less technical ones like governance, security, and cataloging. So, what is it? Well – all of the above. Data Management describes the collection, refinement, and provisioning of data. It includes everything that has to happen between the creation of a data point and that data being made available in an appropriate form for consumption by data analytics, data science, artificial intelligence, operations research, and other advanced computing practices.

Why is Data Integration Relevant?

Spoiler alert: Speed is what is important here.

One key component of data management is data integration. The value of datasets is vastly increased if they are enriched with context, especially across system boundaries. If we can, for example, integrate the information of inbound shipments with metadata from the port of origin, shipping line, container master data, handling equipment parameters, etc., we can start to form the much-discussed digital twin that is quickly becoming a focal point of many ports and terminals around the world.
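The enrichment step described above can be sketched in a few lines. This is a minimal illustration with invented field names and values (container IDs, ports, dwell times are all hypothetical), not a real terminal schema:

```python
# Hypothetical inbound shipment records from a terminal operating system.
shipments = [
    {"container_id": "MSKU1234567", "port_of_origin": "SGSIN", "line": "Maersk"},
    {"container_id": "HLCU7654321", "port_of_origin": "CNSHA", "line": "Hapag-Lloyd"},
]

# Master data from other systems, keyed for fast lookup.
port_metadata = {
    "SGSIN": {"port_name": "Singapore", "avg_dwell_hours": 18},
    "CNSHA": {"port_name": "Shanghai", "avg_dwell_hours": 26},
}
container_master = {
    "MSKU1234567": {"size_ft": 40, "reefer": False},
    "HLCU7654321": {"size_ft": 20, "reefer": True},
}

def enrich(shipment):
    """Join one inbound shipment with port and container master data."""
    enriched = dict(shipment)
    enriched.update(port_metadata.get(shipment["port_of_origin"], {}))
    enriched.update(container_master.get(shipment["container_id"], {}))
    return enriched

enriched_shipments = [enrich(s) for s in shipments]
```

Each enriched record now carries context from three systems at once, which is exactly the kind of cross-boundary view a digital twin is built on.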

The availability of associated information for every part of port and terminal operations would dramatically increase transparency and control, and provide the best foundation for making insight-based, split-second decisions. Speed is crucial. Insights in retrospect can help refine processes, but they do not help you identify problems in real time and certainly do not give you data-based options for resolving them.

Modern Tools Make Data Integration Straightforward

Up until recently, the entry cost and effort to implement solutions for Extract, Transform, and Load (ETL), as well as storage (data warehouses especially), was a significant deterrent to implementing data integration as part of your data strategy. Add to this the lack of qualified personnel, and the challenges typically outweighed the ROI potential.

However, since 2012, many things have changed. “Big Data” has just become “data.” No one really bats an eye at millions of records anymore. Machine learning is ubiquitous in our everyday life, be it in navigation, shopping, meal recommendations, or smart assistants. Computing storage cost and power have been made highly accessible and extremely affordable through the propagation of cloud-based computation business and service models. Before, the scope of data-driven projects used to be limited by the horsepower available in one’s on-premises servers. Nowadays, fully scalable resources are available through cloud providers like Amazon, Google, and Microsoft – to name just a few of the prominent players.

This has seen the cost of storing a terabyte of data in a cloud data warehouse drop to as low as 23 USD (20 EUR) per month. To put this into perspective, a consumer solid-state disk costs five times as much and does not come with built-in enterprise-level security. The same goes for the ability to run analytics queries on the data. Cloud computing power is scaled to handle whatever complex calculation is thrown at it and charged by the minute of usage. Gone are the days of paying for dormant CPUs that only spin up occasionally.

Another major development is the emergence of capable ETL tools that not only move data from the source to centralized data storage (be it data lake or warehouse – more on that below), but also assess data quality (at a rudimentary level), create data models, and, in some cases, automatically create data marts for immediate consumption by data analytics solutions. Every process along the value-added data chain where data gets handled, transferred, or transformed is also often referred to as "data in motion." Capable contenders include Qlik Data Integration or TimeXtender as well as proprietary data pipelines like Snowpipe (Snowflake) or Microsoft Azure Data Factory (MS Azure). Other tools come with built-in data catalogs that allow business users to simply "shop" for the data necessary to tackle the business challenges before them.
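To make the "move, check quality, load" pattern concrete, here is a toy ETL run against an in-memory SQLite warehouse. The table, field names, and quality rule are illustrative assumptions, not the behavior of any specific product named above:

```python
import sqlite3

# Raw extract from a hypothetical source system; one row is missing its
# arrival date and one has an implausible container ID.
raw_rows = [
    ("MSKU1234567", "2024-05-01", 40),
    ("HLCU7654321", None, 20),          # missing arrival date
    ("BADID", "2024-05-02", 40),        # malformed container ID
]

def quality_ok(row):
    """Rudimentary quality gate: valid-looking 11-char ID, date present."""
    container_id, arrival, _size = row
    return len(container_id) == 11 and arrival is not None

# Load only the rows that pass the quality gate into the "warehouse".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inbound (container_id TEXT, arrival TEXT, size_ft INT)")
clean = [r for r in raw_rows if quality_ok(r)]
conn.executemany("INSERT INTO inbound VALUES (?, ?, ?)", clean)
loaded = conn.execute("SELECT COUNT(*) FROM inbound").fetchone()[0]
```

Real tools add far richer profiling and lineage tracking, but the shape of the pipeline, extract, gate, load, is the same.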

This allows companies to approach data management in a more flexible and versatile fashion. In traditional systems, the design of the solution determines the necessary data model. Based on the data model, lengthy data architecture projects are necessary to facilitate data analytics projects. If, at a later stage, additional fields or transformations are necessary, these changes could only be embedded after days, weeks, or even months of modeling. This greatly delays the benefits generated by the insights coming from that data, often to the point of redundancy.

Using ETL tools, or their more modern variant, Extract, Load, Transform (ELT), in which you move the data, store it, and only then transform it by purpose, combined with data warehouse or data lake automation, reduces the time, effort, and human resources required to react to new developments and requirements in the rapidly evolving context of data analytics and data science by up to a factor of ten.
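The ELT idea, land the raw data first and transform it where it lives, can be sketched as follows. The table and view names are invented for illustration; in practice the transform step would be SQL running inside a cloud warehouse rather than local SQLite:

```python
import sqlite3

# Step 1 (Extract + Load): land raw container moves untransformed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_moves (container_id TEXT, ts TEXT, move TEXT)")
conn.executemany(
    "INSERT INTO raw_moves VALUES (?, ?, ?)",
    [
        ("MSKU1234567", "2024-05-01T08:00", "discharge"),
        ("MSKU1234567", "2024-05-01T09:30", "yard"),
        ("HLCU7654321", "2024-05-01T08:15", "discharge"),
    ],
)

# Step 2 (Transform): derive a purpose-built view inside the store,
# without re-extracting anything from the source system.
conn.execute("""
    CREATE VIEW moves_per_container AS
    SELECT container_id, COUNT(*) AS n_moves
    FROM raw_moves
    GROUP BY container_id
""")
counts = dict(conn.execute("SELECT * FROM moves_per_container"))
```

Because the raw data stays in the store, a new analytical requirement usually means adding another view, not rebuilding the pipeline, which is where much of the claimed time saving comes from.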

