Solving Real-World Problems with Data

Real-world problems are rarely solved directly with algorithms. Instead, a detour through the world of data is required.

The process consists of three phases:

  1. Formulation of the real-world problem as a data problem.
  2. Solving the data problem using a methodology based on data and algorithms.
  3. Transfer of the solution back to the real world.

By data problem, I refer to classes of tasks that can be addressed using data and algorithms – such as optimization, prediction, planning, clustering, search, or learning.

The concrete methodology for solving a data problem typically involves three central decisions:

  • How is the problem environment represented in data?
  • Which algorithms are selected, applied, or developed?
  • How is the solution objectively evaluated?

The result – for example, a trained model or an implemented algorithm – is then deployed in the real-world application. This process is illustrated in Figure 1.

Figure 1: The three-phase process for solving real-world problems using algorithms – from the real-world problem (left), via formulation and solving as a data problem in the data space, to the transfer of the solution back to the real world (right).

Example: Movie Recommender System

  1. Real-World Problem → Data Problem
    Users should discover suitable movies → Formulate as a prediction task: What rating will user X give to movie Y?
  2. Solving in the Data Space
    • Data: User IDs, movies, ratings (1–5), click history, etc.
    • Algorithm: Collaborative filtering, either via matrix factorization (e.g., SVD) or via neighborhood-based methods
    • Evaluation: RMSE on validation data
  3. Transfer to the Real World
    The trained model delivers top-10 recommendations → integrated directly into the app.
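
The three steps of this example can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not a production recommender: the toy rating matrix, the function names (fit_svd, rmse, top_n), and the choice of rank are all made up for this sketch, and missing ratings are simply filled with the global mean before a truncated SVD is taken. Real systems usually fit a regularized factorization on the observed ratings only.

```python
import numpy as np

def fit_svd(ratings, mask, rank):
    """Rank-limited approximation of a user x movie rating matrix.

    ratings: 2-D float array; mask: True where a rating may be used for fitting.
    Cells outside the mask are filled with the mean of the masked ratings.
    """
    global_mean = ratings[mask].mean()
    filled = np.where(mask, ratings, global_mean)
    u, s, vt = np.linalg.svd(filled - global_mean, full_matrices=False)
    approx = (u[:, :rank] * s[:rank]) @ vt[:rank] + global_mean
    return np.clip(approx, 1.0, 5.0)  # keep predictions on the 1-5 scale

def rmse(predicted, actual, mask):
    """Root mean squared error over the cells selected by mask."""
    diff = (predicted - actual)[mask]
    return float(np.sqrt(np.mean(diff ** 2)))

def top_n(predicted, seen_mask, user, n=10):
    """Indices of the n best-scoring movies the user has not rated yet."""
    scores = np.where(seen_mask[user], -np.inf, predicted[user])
    n_unseen = int((~seen_mask[user]).sum())
    return np.argsort(-scores)[:min(n, n_unseen)].tolist()

# Toy data (hypothetical): 4 users x 5 movies, 0 means "not rated".
ratings = np.array([
    [5, 4, 0, 1, 0],
    [4, 5, 1, 0, 0],
    [1, 0, 5, 4, 5],
    [0, 1, 4, 5, 4],
], dtype=float)
observed = ratings > 0

# Hold out two observed ratings as validation data.
val = np.zeros_like(observed)
val[0, 1] = True
val[2, 3] = True
train = observed & ~val

pred = fit_svd(ratings, train, rank=2)          # solving in the data space
print("validation RMSE:", rmse(pred, ratings, val))
print("recommendations for user 0:", top_n(pred, observed, user=0))
```

The validation RMSE is computed only on ratings that were hidden during fitting, which is what makes it usable as an objective evaluation criterion.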

This post summarizes my experiences with the described process and provides practical tips for its application – from defining the real-world problem and formulating it as a data problem to identifying, selecting, and developing an appropriate methodology.

Defining the Real-World Problem

Before developing a solution, it is worth reflecting on a few key points.

First, assess whether solving the problem delivers genuine value. Does it create benefit for others? A purely personal interest is legitimate for self-directed learning and growth, but when significant time is involved, collaboration is required, or the solution is intended for others, the value of the solution should take priority.

Once this is settled, sketch the ideal solution. What observations can be made in the problem domain? How do they influence the outcome? Documenting these thoughts early – before external ideas come into play – makes it easier to contextualize them later.

Afterward, adopt a pragmatic stance and investigate whether and how others have already addressed the problem, before attempting to solve every aspect of the process from scratch.

Review articles provide an effective starting point for identifying related work and offer an overview of existing solution approaches. However, this overview typically does not replace the need to study the original sources firsthand.

Contributions addressing the same problem can offer valuable insights and ideas across multiple steps of the process outlined above. They reveal how others have formulated the problem as a data problem and which methodology they applied in their solution attempt. From the methodology, one can infer what type of data was used, how it was preprocessed and transformed, which families of methods and specific methods were employed, and which criteria were used to evaluate the solutions. Over time, this builds a comprehensive understanding of prior approaches to the problem.

If little or no literature is found for the specific problem, it is worthwhile to examine similar or more general variants and apply the same questions. For instance, in the case of a movie recommender system, relevant work on music or e-commerce recommendations can be consulted – a strategy that is generally recommended to gain a thorough overview of the broader research area.

Evaluation of Approaches

After the survey, the identified work must be evaluated – in particular, whether any of the presented methodologies meets the requirements for an ideal solution as defined upfront. These requirements may evolve during reading; such adaptation is legitimate and often beneficial. It is equally valid to hold on to the original criteria, which is precisely why they were documented early.

At this stage, three scenarios emerge: a fully suitable methodology has been found, a partially suitable one, or none at all. The next steps depend on this assessment.

Fully Suitable Methodology

If the goal is simply a working solution, it has been achieved at this point. Following the motto “No problem should ever have to be solved twice” (Raymond, 2001), the identified methodology can be implemented and deployed directly.

If no fully suitable methodology exists, however, an existing one must be developed further or an entirely new one must be created. This reveals a research gap – from here, new knowledge has to be generated.

Partially Suitable Methodology

In this case, the identified methodology must be refined: proven components are retained, while unsuitable ones are replaced. This requires a systematic assessment of which elements meet the requirements and why, which fall short and for what reasons, how necessary modifications should be designed and on what grounds – as well as which variations should explicitly be avoided, and why.

No Suitable Methodology

If no viable solution exists – whether because the research is not public, one is conducting pioneering work, or a deliberate from-scratch approach is desired – the following systematic procedure can be applied:

  1. Define the problem.
  2. Generalize the problem (as a data problem).
  3. Identify a suitable method family capable of solving problems of this type and design the data model – i.e., determine how the real world is represented in data.
  4. Derive requirements from observations, the use case, and the data model, then select a specific method from the family.
  5. Verify whether the method meets all requirements – if not, make targeted adjustments.
  6. Implement the method: adopt an existing implementation or develop one from scratch, test it, and optimize as needed.
  7. Apply the methodology, including hyperparameter tuning if applicable.
  8. Evaluate the results objectively.
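
Steps 7 and 8 of the procedure above can be sketched as follows. This is a hedged illustration, not a prescribed implementation: ridge regression stands in for "the method", the synthetic data, the 60/20/20 split, the candidate values for alpha, and the function names are all assumptions made for this sketch. The point is the separation of roles: the hyperparameter is chosen on validation data, and the final score is reported once on test data and compared against a trivial baseline.

```python
import numpy as np

def fit_ridge(x, y, alpha):
    """Closed-form ridge regression weights (a stand-in for the chosen method)."""
    n_features = x.shape[1]
    return np.linalg.solve(x.T @ x + alpha * np.eye(n_features), x.T @ y)

def rmse(weights, x, y):
    """Root mean squared error of the linear predictions x @ weights."""
    return float(np.sqrt(np.mean((x @ weights - y) ** 2)))

def tune_and_evaluate(x, y, alphas, seed=0):
    # Shuffle once, then split into 60% train, 20% validation, 20% test.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_train, n_val = int(0.6 * len(y)), int(0.2 * len(y))
    train, val, test = np.split(idx, [n_train, n_train + n_val])

    # Step 7: choose the hyperparameter using validation data only.
    val_scores = {
        a: rmse(fit_ridge(x[train], y[train], a), x[val], y[val]) for a in alphas
    }
    best_alpha = min(val_scores, key=val_scores.get)

    # Step 8: report the score on data never used for fitting or tuning,
    # next to a trivial baseline that always predicts the training mean.
    weights = fit_ridge(x[train], y[train], best_alpha)
    baseline = float(np.sqrt(np.mean((y[train].mean() - y[test]) ** 2)))
    return best_alpha, rmse(weights, x[test], y[test]), baseline

# Synthetic data (hypothetical): a linear signal plus a little noise.
rng = np.random.default_rng(42)
x = rng.normal(size=(200, 3))
y = x @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=200)

alphas = [0.01, 1.0, 100.0]
best_alpha, test_rmse, baseline = tune_and_evaluate(x, y, alphas)
print("best alpha:", best_alpha)
print("test RMSE:", test_rmse, "baseline RMSE:", baseline)
```

Comparing against a baseline is one simple way to make the evaluation objective: a result only counts as progress if it clearly beats the trivial alternative.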

References

  1. Raymond, E. S. (2001). How To Become A Hacker. http://www.catb.org/~esr/faqs/hacker-howto.html