Introduction to Locally Weighted & Logistic Regression
Locally Weighted Regression (LWR), also known as Locally Weighted Scatterplot Smoothing (LOWESS) or LOESS (locally estimated scatterplot smoothing), is a non-parametric regression technique used to fit a smooth curve through a scatterplot of data points. Unlike traditional regression methods that fit a single model to the entire dataset, LWR fits a separate, simple model (typically linear) around each point where a prediction is needed, weighting the training data by how close it is to that point. This lets it capture complex, non-linear relationships.
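For a single query point x, the local fit can be written as a weighted least-squares problem. The sketch below assumes a Gaussian kernel with bandwidth τ and a local linear model; the symbols w, τ, W and θ(x) are notation introduced here for illustration, each input x^{(i)} is assumed to include a constant 1 entry for the intercept, and X^T W X is assumed to be invertible.

```latex
w^{(i)}(x) = \exp\!\left(-\frac{\lVert x^{(i)} - x \rVert^2}{2\tau^2}\right), \qquad i = 1, \dots, n

\theta(x) = \arg\min_{\theta} \sum_{i=1}^{n} w^{(i)}(x)\,\bigl(y^{(i)} - \theta^{\top} x^{(i)}\bigr)^2

\theta(x) = (X^{\top} W X)^{-1} X^{\top} W y, \qquad W = \operatorname{diag}\bigl(w^{(1)}(x), \dots, w^{(n)}(x)\bigr)

\hat{y}(x) = \theta(x)^{\top} x
```

Because θ(x) depends on x through W, the minimization is repeated for every query point. A small τ makes the weights fall off quickly (a narrow neighborhood), while a large τ spreads them over more of the data.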
Key Features of LWR:
Locality: LWR focuses on fitting the model locally around each data point of interest. It assigns higher weights to data points that are closer to the point being predicted and lower weights to those farther away.
Weighted Least Squares: The method uses weighted least squares to fit a local linear or polynomial regression model. The weights decrease with distance from the point of interest according to a kernel function, such as the Gaussian kernel or the tricube kernel used by LOWESS.
Flexibility: Because it fits separate models at different points, LWR can adapt to changes in the relationship between variables across different regions of the dataset. This makes it particularly useful for capturing non-linear patterns that might be missed by global models.
Bandwidth: The concept of "bandwidth" in LWR refers to the size of the neighborhood around each point. A smaller bandwidth focuses more narrowly on nearby data, providing a more flexible but potentially noisier fit. A larger bandwidth smooths over more data points, resulting in a less flexible but more stable fit.
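Computational Cost: LWR can be computationally intensive, especially for large datasets, because it solves a new weighted least-squares problem for every query point and must keep the entire training set available at prediction time. With n training points and d features, each prediction costs roughly O(nd² + d³) operations, so the cost grows with both the number of data points and the dimensionality.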
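A minimal NumPy sketch of LWR prediction for one-dimensional inputs, assuming a Gaussian kernel whose width is the bandwidth `tau` and a local linear model. The function name `lwr_predict` and its parameters are hypothetical, chosen for illustration. It solves each weighted least-squares problem by scaling the rows by the square roots of the weights, which gives the same solution as the weighted normal equations but copes better with nearly singular cases.

```python
import numpy as np


def lwr_predict(x_train, y_train, x_query, tau):
    """Locally weighted linear regression for 1-D inputs (hypothetical helper).

    For every query point a separate weighted least-squares line is fitted;
    training points are weighted by a Gaussian kernel of width tau.
    """
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    # Design matrix with an intercept column, shape (n, 2)
    X = np.column_stack([np.ones_like(x_train), x_train])
    preds = []
    for xq in np.atleast_1d(np.asarray(x_query, dtype=float)):
        # Gaussian weights: nearby points get weight close to 1, distant ones close to 0
        w = np.exp(-((x_train - xq) ** 2) / (2.0 * tau ** 2))
        # Weighted least squares: scale each row by sqrt(w), then solve ordinary least squares
        sw = np.sqrt(w)
        theta, *_ = np.linalg.lstsq(X * sw[:, None], y_train * sw, rcond=None)
        # Evaluate the local line at the query point only
        preds.append(theta[0] + theta[1] * xq)
    return np.array(preds)


# Usage: noisy sine data, predicted with a narrow and a wide bandwidth
rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x) + rng.normal(scale=0.1, size=x.size)
grid = np.linspace(0, 2 * np.pi, 50)
narrow = lwr_predict(x, y, grid, tau=0.3)  # flexible, follows local curvature
wide = lwr_predict(x, y, grid, tau=3.0)    # smoother, more biased
```

With `tau=0.3` the predictions track the sine curve; with `tau=3.0` each neighborhood covers most of the data, so the fit oversmooths and loses much of the curvature. Each query point triggers its own least-squares solve over all training points, which is where the computational cost comes from.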
Applications:
Capturing Non-linear Relationships: LWR suits situations where the relationship between the dependent and independent variables is complex and non-linear. Because it fits many local models rather than a single global model, it can follow patterns that a global linear model would miss.
Exploratory Data Analysis, Visualization and Smoothing: LWR is often used to smooth scatterplots, helping to reveal underlying trends and patterns without making strong parametric assumptions.
Small to Medium-sized Datasets: Because of its computational cost, LWR is generally used on datasets that are not too large, where fitting a local model for every prediction is affordable.
Focus on Local Data: LWR gives more weight to nearby points when fitting the model at a particular point. This helps when local trends in the dataset matter more than global ones.
Handling Heteroscedasticity: LWR can be useful when the variance of the errors is not constant (heteroscedasticity), since each local fit draws mainly on nearby data. Note, however, that plain LWR estimates only the local mean; it does not model the changing variance itself.
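To illustrate the scatterplot-smoothing use, here is a simplified smoother in the style of LOWESS, written with NumPy. Instead of a Gaussian kernel with a fixed bandwidth, it uses the tricube kernel and a span `frac` (the fraction of points in each neighborhood), so the bandwidth adapts to the local density of the data. It omits the robustness reweighting iterations of Cleveland's full LOWESS procedure, and `lowess_smooth` is a hypothetical name; for real analyses, a library implementation such as `lowess` in `statsmodels.nonparametric.smoothers_lowess` is a safer choice.

```python
import numpy as np


def lowess_smooth(x, y, frac=0.3):
    """Simplified LOWESS-style smoother (hypothetical helper, no robustness steps).

    At each observed x[i], fits a local line to the k = ceil(frac * n) nearest
    neighbors, weighted by the tricube kernel (1 - u**3)**3, and returns the
    fitted values at the observed x positions.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    k = min(n, max(3, int(np.ceil(frac * n))))  # neighborhood size set by the span
    X = np.column_stack([np.ones(n), x])
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        # Bandwidth adapts locally: distance to the k-th nearest neighbor
        h = max(np.sort(dist)[k - 1], 1e-12)
        u = dist / h
        # Tricube weights, zero outside the neighborhood
        w = np.where(u < 1.0, (1.0 - u ** 3) ** 3, 0.0)
        sw = np.sqrt(w)
        theta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        fitted[i] = theta[0] + theta[1] * x[i]
    return fitted


# Usage: a curved trend whose noise level grows with x (heteroscedastic errors)
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 10, 200))
y = np.log1p(x) + rng.normal(scale=0.05 + 0.05 * x)
smooth = lowess_smooth(x, y, frac=0.3)
```

Each smoothed value is computed mostly from nearby points, so the curve follows the trend even where the data are noisy; it estimates the trend only, not the growing spread.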