Logistic regression is one of the most widely used statistical and machine learning algorithms for classification problems. It is simple, interpretable, and effective in many real-world applications. However, one limitation of logistic regression is that it assumes a linear relationship between the independent variables (features) and the log-odds of the dependent variable (target). This raises an important question: how does logistic regression handle non-linear relationships?
This article explores how logistic regression works, why it struggles with non-linear relationships, and various techniques to extend it for handling non-linearity effectively.
Understanding Logistic Regression
What Is Logistic Regression?
Logistic regression is a classification algorithm that predicts the probability of an instance belonging to a particular class. Unlike linear regression, which predicts continuous values, logistic regression applies a sigmoid function (logistic function) to map outputs to the interval (0, 1), making it suitable for binary classification.
The Logistic Function
The logistic function, also known as the sigmoid function, is given by:
\[P(Y = 1 | X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1X_1 + \beta_2X_2 + … + \beta_nX_n)}}\]where:
- P(Y = 1 | X) is the probability that the dependent variable is 1 given the features X.
- β0, β1, β2, …, βn are the coefficients (weights) of the model.
- X1, X2, …, Xn are the independent variables (features).
The model estimates these coefficients by maximizing the log-likelihood of the observed data.
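As a concrete starting point, here is a minimal sketch of fitting a plain logistic regression with scikit-learn. The synthetic two-feature dataset and all variable names are illustrative assumptions, not part of the article's setup:

```python
# Minimal sketch: fitting logistic regression on synthetic, linearly
# separable data (illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))            # two features
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # target is linear in the features

model = LogisticRegression()
model.fit(X, y)

print("coefficients (beta_1, beta_2):", model.coef_)
print("intercept (beta_0):", model.intercept_)
print("P(Y=1 | x) for a new point:", model.predict_proba([[0.5, -0.2]])[0, 1])
```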
Why Logistic Regression Assumes Linearity
Logistic regression does not directly model the relationship between the input variables and the output probability. Instead, it models the log-odds (logit function) as a linear combination of the input features:
\[\log \left( \frac{P(Y=1)}{1 - P(Y=1)} \right) = \beta_0 + \beta_1X_1 + \beta_2X_2 + … + \beta_nX_n\]The decision boundary is the set of points where P(Y=1) = 0.5, which is exactly where this linear combination equals zero; the boundary is therefore a hyperplane, making logistic regression poorly suited for handling non-linear relationships unless modifications are made.
How to Handle Non-Linear Relationships in Logistic Regression
1. Polynomial Features
One of the simplest ways to introduce non-linearity into logistic regression is to add polynomial features. Instead of using just the raw features, we introduce polynomial terms such as quadratic and cubic features.
For example, if we have a single feature X, instead of using just X, we can create:
\[X, X^2, X^3, …\]Example:
Instead of using:
\[logit(P) = \beta_0 + \beta_1 X\]We modify it to:
\[logit(P) = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3\]This allows the model to capture curved relationships between the features and the log-odds.
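A minimal sketch of this idea using scikit-learn's PolynomialFeatures in a pipeline; the one-dimensional synthetic dataset below is an illustrative assumption:

```python
# Sketch: expanding X into [X, X^2, X^3] before logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = (X[:, 0] ** 2 > 2).astype(int)  # class depends on X^2: non-linear in X

# degree=3 expands X into [X, X^2, X^3]; the logistic model stays linear
# in these expanded features, so it can fit the curved boundary.
model = make_pipeline(PolynomialFeatures(degree=3, include_bias=False),
                      LogisticRegression())
model.fit(X, y)
print("training accuracy:", model.score(X, y))
```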

2. Interaction Terms
Another approach is to add interaction terms between features. If two features interact non-linearly, we can include their multiplicative interaction in the model.
For example, instead of using just X1 and X2, we can introduce an interaction term:
\[logit(P) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 (X_1 \times X_2)\]This approach is useful when the effect of one feature depends on the value of another.
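Here is a sketch of an explicit X1 × X2 interaction term built with scikit-learn; the XOR-like synthetic data is an illustrative assumption, chosen because it is driven purely by the interaction:

```python
# Sketch: adding an interaction term X1*X2 to the design matrix.
# interaction_only=True keeps X1, X2, and X1*X2 without squared terms.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)  # XOR-like pattern: pure interaction

model = make_pipeline(
    PolynomialFeatures(degree=2, interaction_only=True, include_bias=False),
    LogisticRegression(),
)
model.fit(X, y)
print("training accuracy:", model.score(X, y))
```

Without the interaction term, plain logistic regression performs near chance on this pattern, since no line through the origin separates the classes.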
3. Using Basis Functions (Splines)
Basis functions, such as splines, allow the model to fit complex relationships by dividing the data into multiple segments and fitting different functions to each segment.
Types of Splines:
- B-splines (Basis Splines): Allow smooth curve fitting with piecewise polynomials.
- Natural Splines: Reduce overfitting by restricting curve flexibility at the extremes.
- Cubic Splines: Fit cubic polynomials between defined knots (breakpoints chosen along the feature's range).
Example Use Case:
Splines are widely used in medical research and finance, where the relationship between variables changes at different points.
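A minimal sketch of spline basis expansion before logistic regression, using scikit-learn's SplineTransformer (assumed available in scikit-learn >= 1.0); the synthetic data below is illustrative:

```python
# Sketch: cubic B-spline basis expansion before logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer

rng = np.random.default_rng(2)
X = rng.uniform(0, 10, size=(500, 1))
y = (np.sin(X[:, 0]) > 0).astype(int)  # relationship changes direction repeatedly

# Cubic B-spline basis with several interior knots; logistic regression
# then fits a linear model in the spline basis.
model = make_pipeline(SplineTransformer(degree=3, n_knots=8),
                      LogisticRegression())
model.fit(X, y)
print("training accuracy:", model.score(X, y))
```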
4. Kernel Trick (Kernelized Logistic Regression)
Kernel methods transform the input space into a higher-dimensional space, allowing logistic regression to fit non-linear decision boundaries without explicitly adding polynomial terms.
How It Works:
Instead of applying logistic regression directly to X, we express the log-odds as a weighted sum of kernel evaluations between X and the training points x_i, using a kernel function K:
\[logit(P) = \beta_0 + \sum_{i=1}^{m} \alpha_i K(X, x_i)\]Common Kernel Functions:
- Polynomial Kernel: Captures polynomial relationships without explicitly computing all polynomial features.
- Radial Basis Function (RBF) Kernel: Captures complex relationships by measuring distance from key points.
Kernel-based logistic regression is powerful but computationally expensive, since exact methods must evaluate the kernel between every pair of training points; in practice, approximate feature maps are often used instead, as in the sketch below.
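The following sketch approximates kernelized logistic regression with scikit-learn's Nystroem transformer, which builds an explicit (approximate) RBF feature map rather than solving the exact kernel problem; the dataset and hyperparameters are illustrative assumptions:

```python
# Sketch: approximate RBF-kernel logistic regression via Nystroem features.
from sklearn.datasets import make_circles
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Concentric circles: no linear boundary exists in the original space.
X, y = make_circles(n_samples=400, factor=0.4, noise=0.08, random_state=3)

# Map inputs into an approximate RBF feature space, then fit a linear
# logistic model there; the boundary is non-linear in the original space.
model = make_pipeline(Nystroem(kernel="rbf", gamma=1.0, n_components=100),
                      LogisticRegression())
model.fit(X, y)
print("training accuracy:", model.score(X, y))
```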
5. Non-Linear Feature Transformations
Applying mathematical transformations such as logarithm, square root, or exponential transformations can help in handling non-linearity.
Example:
If the relationship between a feature X and the log-odds is exponential, we can transform X using:
\[X' = \log(X)\]This makes the transformed variable linearly related to the log-odds.
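A short sketch of the mechanics, on an assumed heavily skewed synthetic feature; log1p is used instead of log so the transform stays defined at X = 0 (it requires non-negative inputs):

```python
# Sketch: log-transforming a skewed feature before logistic regression.
# Compressing the heavy right tail typically stabilizes the fit.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(300, 1))  # heavily skewed feature
y = (np.log(X[:, 0]) > 0).astype(int)

X_transformed = np.log1p(X)  # X' = log(1 + X)
model = LogisticRegression().fit(X_transformed, y)
print("training accuracy:", model.score(X_transformed, y))
```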
Graphical Representation
Consider a graph that compares:
- Standard logistic regression (linear decision boundary).
- Logistic regression with polynomial features.
- Kernelized logistic regression.

These plots will help illustrate how different methods improve decision boundaries for non-linear datasets.
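One way to generate such a comparison is sketched below, using matplotlib and scikit-learn's DecisionBoundaryDisplay (assumed available in scikit-learn >= 1.1); the dataset and hyperparameters are illustrative choices, not prescriptions:

```python
# Sketch: side-by-side decision boundaries for three logistic models.
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.inspection import DecisionBoundaryDisplay
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X, y = make_moons(n_samples=300, noise=0.2, random_state=5)

models = {
    "Linear": LogisticRegression(),
    "Polynomial (degree 3)": make_pipeline(
        PolynomialFeatures(degree=3), LogisticRegression()),
    "RBF kernel (approx.)": make_pipeline(
        Nystroem(kernel="rbf", gamma=2.0), LogisticRegression()),
}

fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for ax, (name, model) in zip(axes, models.items()):
    model.fit(X, y)
    DecisionBoundaryDisplay.from_estimator(model, X, ax=ax, alpha=0.3)
    ax.scatter(X[:, 0], X[:, 1], c=y, edgecolor="k", s=20)
    ax.set_title(name)
plt.tight_layout()
plt.show()
```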
Conclusion
Logistic regression is a powerful classification algorithm, but it struggles with non-linear relationships due to its inherent linear assumption. However, there are several techniques to extend logistic regression to handle non-linearity:
- Polynomial Features: Introduce squared or cubic terms to model curved relationships.
- Interaction Terms: Capture dependencies between multiple features.
- Splines: Use piecewise polynomials for flexible curve fitting.
- Kernel Methods: Transform input data into a higher-dimensional space.
- Feature Transformations: Use logarithmic or exponential functions to linearize relationships.
By leveraging these approaches, logistic regression can be adapted to handle non-linear relationships effectively, making it useful even in complex classification problems.