From Business Problems to Machine Learning Models

Machine learning is everywhere in modern discussions about technology, from finance and healthcare to retail and entertainment. Its value lies in its ability to extract patterns from data and use those patterns to make predictions, support decisions, or automate tasks that would otherwise require human judgement. Despite this widespread attention, the term “machine learning” is often used vaguely, leaving many unsure of what it involves or how it should be applied in practice.

At its core, machine learning is a way of building systems that learn from data rather than relying on explicitly programmed rules. Instead of telling a computer exactly how to solve a problem, we provide examples and allow the system to identify relationships and trends on its own. These learned patterns can then be used to predict future outcomes, classify new observations, or uncover structure within data. Machine learning can be applied to a wide range of tasks, from forecasting demand and detecting fraud to recognising images and understanding language.

A practical way to begin understanding machine learning is to take a bottom-up approach. Rather than starting with algorithms and technical details, it is often more effective to begin with a real problem that is already familiar, such as predicting prices, classifying emails, or recommending products. By starting with the problem and working towards a suitable technique, machine learning becomes less abstract and more intuitive. This approach also makes clear that different techniques exist because they rest on different assumptions about how data behaves.

This article follows that bottom-up perspective. By grounding each machine learning technique in a realistic scenario, it explains not only what each model does, but why it is appropriate for certain types of problems. In doing so, it provides a framework for choosing the right technique based on the structure of the problem, the nature of the data, and the kind of insight or prediction required.

Linear Regression

A property analyst comparing house prices across a city might observe that larger homes generally cost more, and that factors such as location and number of bedrooms influence price in fairly predictable ways. When changes in input variables lead to gradual and proportional changes in the outcome, linear regression provides a natural starting point.

Linear regression works by fitting a straight line through the data that best captures the relationship between inputs and output. Each input is assigned a weight that represents its contribution to the prediction, allowing the model to remain both simple and interpretable. This clarity makes linear regression especially valuable in domains such as economics, business forecasting, and public policy, where understanding why a prediction was made is as important as the prediction itself.

Quick Facts: Linear Regression

What it does (summary):
Predicts a numerical value by modelling a straight-line relationship between inputs and the output.

Example usage:
• Predicting house prices
• Forecasting sales revenue

Input data:
• Numerical features
• Continuous target variable

Output data:
• Continuous numeric value

Assumptions:
• Linear relationship between features and target
• Independent errors
• Constant variance of errors
• Low multicollinearity
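
To make the idea concrete, here is a minimal sketch of fitting a linear model with scikit-learn. The house sizes, bedroom counts, and prices below are invented for illustration and do not come from any real dataset.

```python
# A minimal linear regression sketch using scikit-learn.
# All numbers are invented for illustration only.
import numpy as np
from sklearn.linear_model import LinearRegression

# Features: [floor area in m^2, number of bedrooms]
X = np.array([[50, 1], [70, 2], [90, 3], [120, 3], [150, 4]])
# Target: sale price in thousands
y = np.array([300, 380, 460, 540, 650])

model = LinearRegression()
model.fit(X, y)

# Each coefficient is the weight assigned to one input feature,
# which is what makes the model easy to interpret.
print("Weights:", model.coef_)
print("Intercept:", model.intercept_)

# Predict the price of an unseen 100 m^2, 3-bedroom home.
print("Prediction:", model.predict([[100, 3]]))
```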

Decision Trees (Regression)

In many professional settings, decisions are rarely based on equations alone. A loan officer, for example, may reason using a series of conditions: income above a certain level, a stable employment history, and a clean credit record all reduce perceived risk. Decision trees formalise this kind of rule-based reasoning.

A decision tree divides data into increasingly specific groups by applying simple conditions at each step. Each split narrows the range of possible outcomes, and the final prediction is determined by the path taken through the tree. This structure makes decision trees highly interpretable and well suited to contexts where decisions must be explained or justified.

Quick Facts: Decision Trees (Regression)

What it does (summary):
Predicts numerical values using a series of decision rules learned from data.

Example usage:
• Credit risk assessment
• Property valuation

Input data:
• Numerical and/or categorical features

Output data:
• Numeric prediction

Assumptions:
• Data can be divided into meaningful decision regions
• Features contain useful split points
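
The sketch below shows the same rule-based reasoning in code, assuming scikit-learn and a toy loan-style dataset invented for the example. Printing the learned rules illustrates why trees are easy to explain.

```python
# A minimal decision-tree regression sketch with scikit-learn.
# Data and feature meanings are invented for illustration.
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Features: [annual income in thousands, years employed]
X = np.array([[40, 1], [55, 3], [70, 5], [85, 8], [120, 10]])
# Target: approved loan amount in thousands
y = np.array([10, 20, 35, 50, 80])

tree = DecisionTreeRegressor(max_depth=2, random_state=0)
tree.fit(X, y)

# The learned if/then rules can be printed directly, which is
# what makes the model's predictions easy to justify.
print(export_text(tree, feature_names=["income", "years_employed"]))
print(tree.predict([[60, 4]]))
```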

Random Forests (Regression)

Although decision trees are intuitive, their simplicity can be a weakness. Small changes in the data can lead to very different trees, making predictions unstable. This becomes problematic in complex environments, such as forecasting demand across thousands of products where many factors interact.

Random forests were developed to address this issue by combining many decision trees into a single model. Each tree captures a different perspective on the data, and their aggregated predictions reduce variance and improve reliability. While this ensemble approach sacrifices some interpretability, it delivers stronger and more consistent performance, which is often critical in large-scale applications.

Quick Facts: Random Forests

What it does (summary):
Improves numerical predictions by combining the results of many decision trees.

Example usage:
• Demand forecasting
• Financial risk prediction

Input data:
• Numerical and/or categorical features
• Large labelled datasets

Output data:
• Numeric prediction

Assumptions:
• Individual trees make different errors
• Aggregating trees improves generalisation
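
A minimal sketch of the ensemble idea, assuming scikit-learn and a synthetic demand dataset generated purely for illustration:

```python
# A minimal random-forest regression sketch with scikit-learn.
# The synthetic demand data below is invented for illustration.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Features: [price, promotion flag]; target: noisy demand.
X = rng.uniform([5, 0], [20, 1], size=(200, 2))
y = 500 - 15 * X[:, 0] + 80 * (X[:, 1] > 0.5) + rng.normal(0, 10, 200)

# n_estimators sets how many trees are averaged together;
# averaging reduces the variance of any single tree.
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X, y)

print(forest.predict([[10, 1]]))
```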

k-Nearest Neighbours (k-NN)

Some prediction tasks are best approached through comparison rather than abstraction. Recommendation systems, for instance, rely on the idea that users with similar behaviour in the past are likely to make similar choices in the future. k-Nearest Neighbours is built directly on this principle.

Rather than learning a global model, k-NN stores the training data and makes predictions by identifying the most similar past observations. The predicted outcome is then the majority class (for classification) or the average value (for regression) among these neighbours. This makes the method intuitive and flexible, though its reliance on distance calculations means it can struggle with very large datasets or poorly scaled features.

Quick Facts: k-Nearest Neighbours

What it does (summary):
Makes predictions by comparing a data point to its most similar past examples.

Example usage:
• Recommendation systems
• Handwritten digit recognition

Input data:
• Numerical features (scaled)
• Labelled dataset

Output data:
• Class label or numeric value

Assumptions:
• Similar inputs produce similar outputs
• Distance metric reflects true similarity
• Local patterns are meaningful
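
A minimal sketch of k-NN with scikit-learn, using invented user features; note the scaling step, since k-NN depends directly on distances between points:

```python
# A minimal k-nearest-neighbours sketch with scikit-learn.
# User features and labels are invented for illustration.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Features: [hours watched per week, average rating given]
X = np.array([[2, 3.0], [10, 4.5], [12, 4.8], [1, 2.5], [9, 4.2]])
# Labels: whether each user enjoyed a particular genre
y = np.array([0, 1, 1, 0, 1])

# Scaling stops one feature from dominating the distance metric.
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
knn.fit(X, y)

# Predict by majority vote among the 3 most similar users.
print(knn.predict([[8, 4.0]]))
```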

Logistic Regression

Many real-world problems require clear decisions rather than numerical estimates. Emails must be classified as spam or not spam, and customers must be identified as likely to leave or stay. Logistic regression addresses these problems by shifting the focus from predicting values to estimating probabilities.

By modelling the probability that an observation belongs to a particular class, logistic regression allows decisions to be made using thresholds while preserving information about uncertainty. Its simplicity, efficiency, and interpretability make it a common baseline model for classification tasks across many domains.

Quick Facts: Logistic Regression

What it does (summary):
Predicts the probability that an observation belongs to a given class.

Example usage:
• Spam detection
• Customer churn prediction

Input data:
• Numerical or encoded categorical features

Output data:
• Class probabilities
• Predicted class

Assumptions:
• Linear relationship between features and log-odds
• Independent observations
• No strong multicollinearity
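
As a rough illustration, the sketch below fits a logistic model to an invented churn-style dataset with scikit-learn. The probabilities it returns are what allow a decision threshold to be chosen deliberately.

```python
# A minimal logistic-regression sketch with scikit-learn,
# using an invented churn-style dataset for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Features: [months as customer, support tickets filed]
X = np.array([[24, 0], [3, 4], [36, 1], [2, 5], [18, 2], [1, 6]])
# Labels: 1 = churned, 0 = stayed
y = np.array([0, 1, 0, 1, 0, 1])

clf = LogisticRegression()
clf.fit(X, y)

# predict_proba returns class probabilities, so a threshold
# other than 0.5 can be applied if the costs are asymmetric.
print(clf.predict_proba([[6, 3]]))
print(clf.predict([[6, 3]]))
```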

Naive Bayes

Some classification problems involve many input features that individually carry weak signals but together provide strong evidence. In these situations, it can be useful to combine many small pieces of information in a principled way. Naive Bayes was designed for exactly this purpose.

Naive Bayes models the probability of each class by combining the contributions of individual features, under the simplifying assumption that those features are conditionally independent given the class. This assumption allows the model to estimate probabilities efficiently, even when the number of features is large. While the independence assumption is rarely true in real data, it often produces useful results because the model captures the overall balance of evidence rather than exact feature interactions.

This approach makes Naive Bayes especially effective for high-dimensional problems, such as text classification, where features are numerous and sparse. Its simplicity, speed, and robustness explain why it remains widely used despite its strong assumptions.

Quick Facts: Naive Bayes

What it does (summary):
Classifies data by estimating the probability of each class given the observed features.

Example usage:
• Spam filtering
• Sentiment analysis
• Document categorisation

Input data:
• Feature vectors (categorical or numerical)
• Often high-dimensional and sparse

Output data:
• Class probabilities
• Predicted class

Assumptions:
• Features are conditionally independent given the class
• Feature distributions are appropriately modelled
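
A minimal sketch of the text-classification case, assuming scikit-learn and a tiny invented corpus; word counts provide exactly the high-dimensional, sparse features the model handles well:

```python
# A minimal Naive Bayes text-classification sketch with
# scikit-learn; the tiny corpus is invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = [
    "win a free prize now",
    "limited offer click here",
    "meeting rescheduled to friday",
    "please review the attached report",
]
labels = ["spam", "spam", "ham", "ham"]

# CountVectorizer turns each message into sparse word counts,
# which MultinomialNB combines as independent pieces of evidence.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

print(model.predict(["free prize meeting"]))
print(model.predict_proba(["free prize meeting"]))
```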

Neural Networks

Some problems resist simple rules or linear assumptions altogether. Tasks such as recognising faces in images or converting speech into text involve patterns that emerge only through complex, layered interactions. Neural networks were developed to meet these challenges.

By stacking multiple layers of computation, neural networks transform raw data into increasingly abstract representations. This allows them to capture highly non-linear relationships, making them well suited to image, audio, and language tasks. However, this flexibility comes at the cost of increased data requirements and reduced interpretability.

Quick Facts: Neural Networks

What it does (summary):
Learns complex patterns through layered transformations of data.

Example usage:
• Image recognition
• Speech recognition
• Language translation

Input data:
• Numerical tensors (images, sequences, embeddings)

Output data:
• Continuous values
• Class probabilities
• Structured outputs

Assumptions:
• Large amounts of data are available
• Patterns are highly non-linear
• Training converges to useful representations
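
A minimal sketch of the layered idea, using scikit-learn's MLPClassifier on a toy XOR-like problem invented for the example; the class boundary here cannot be captured by any single linear model, which is precisely what the hidden layers address.

```python
# A minimal neural-network sketch using scikit-learn's
# MLPClassifier on a toy non-linear (XOR-like) problem.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Two features whose XOR-like interaction no single
# linear boundary can separate.
X = rng.uniform(-1, 1, size=(400, 2))
y = ((X[:, 0] > 0) != (X[:, 1] > 0)).astype(int)

# Two hidden layers transform the inputs into representations
# in which the classes become linearly separable.
net = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=2000,
                    random_state=0)
net.fit(X, y)

print(net.score(X, y))  # accuracy on the toy training data
```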

Conclusion

By working from real problems rather than from algorithms, this article has tried to reframe how machine learning is approached in practice. Instead of treating models as abstract tools to be memorised, it shows them as responses to different ways the world behaves: sometimes relationships are smooth and predictable, sometimes they follow rules, sometimes similarity matters most, and sometimes the patterns are too complex to describe explicitly. Seeing models in this way makes their differences feel purposeful rather than arbitrary.

What this article has not done is claim that choosing a technique is enough. Many of the ideas touched on here—assumptions about error behaviour, independence, or variability—carry far more weight than their brief descriptions suggest. Nor does selecting a model say anything about whether it will perform well, fail quietly, or produce misleading results. Those questions depend on how data is prepared, how models are evaluated, and how their limitations are understood.

This is where machine learning becomes less about tools and more about judgement. Knowing which model might fit a problem is only useful when paired with an understanding of when that model can be trusted. Concepts such as evaluation metrics, validation strategies, and diagnostic checks are what turn an appropriate choice into a reliable one. They are also where many real-world failures occur, not because the wrong algorithm was chosen, but because its behaviour was never questioned.

Seen this way, this article is best read as a starting point: a way of orienting yourself before going deeper. The next step is learning how to test assumptions, measure performance, and recognise when a model’s confidence exceeds its understanding. Those ideas deserve the same problem-first treatment—and they are where the real work of applied machine learning begins.

About the Author:

Effy Abbsar is a master's student in Statistics and Operations Research at RMIT University, with expertise in research and data analysis. She has worked across both commercial and public sectors, using data to uncover trends and drive meaningful decisions. Effy is passionate about transforming complex datasets into clear, impactful narratives, making data accessible to diverse audiences.

See Effy's profile here: https://www.linkedin.com/in/iffatabbsar/
