Analysis of Household Dataset
Overview
This project analyzes the "Individual Household Electric Power Consumption" dataset from the UCI Machine Learning Repository. The analysis combines hypothesis testing, regression, analysis of variance, dimensionality reduction, and clustering to characterize household electricity consumption patterns.
Dataset
Source: UCI Machine Learning Repository
Timeframe: Almost 4 years of electric power consumption data (December 2006 to November 2010), sampled once per minute
Attributes: Various electrical parameters such as active power, reactive power, voltage, and current
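A minimal loading sketch with pandas, assuming the raw UCI text file layout (semicolon-separated, `?` marking missing readings, separate `Date` and `Time` columns). The `load_power_data` helper is a hypothetical name, not code from this project, and the inline rows are made-up values used only so the sketch runs without the file.

```python
import io

import pandas as pd


def load_power_data(source):
    """Read the power-consumption file from a path or file-like object."""
    df = pd.read_csv(
        source,
        sep=";",
        na_values="?",      # missing readings appear as "?" in the raw file
        low_memory=False,
    )
    # Merge the separate Date and Time columns into a single timestamp index.
    df["timestamp"] = pd.to_datetime(
        df["Date"] + " " + df["Time"], format="%d/%m/%Y %H:%M:%S"
    )
    return df.drop(columns=["Date", "Time"]).set_index("timestamp")


# Made-up rows in the same layout as the raw file (illustration only).
sample = io.StringIO(
    "Date;Time;Global_active_power;Global_reactive_power;Voltage;"
    "Global_intensity;Sub_metering_1;Sub_metering_2;Sub_metering_3\n"
    "16/12/2006;17:24:00;1.500;0.100;240.000;6.200;0.000;1.000;17.000\n"
    "16/12/2006;17:25:00;?;?;?;?;?;?;?\n"
    "16/12/2006;17:26:00;2.400;0.200;238.500;10.000;0.000;2.000;17.000\n"
)
power = load_power_data(sample).dropna()  # drop minutes with missing readings
```

With the real file, the same helper would be called with the file path instead of the in-memory sample.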
Analysis Phases
Phase 1: Statistical Analysis
Population Sampling & Hypothesis Testing:
Creating a normal population from the dataset
Extracting samples and comparing variances between attributes
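The sampling and variance comparison can be sketched as follows. This is an illustration under stated assumptions, not the project's actual procedure: it uses a two-sided F-test for equal variances (the README does not name the test used), and the two populations are synthetic normal data standing in for dataset attributes. The `variance_f_test` helper is a hypothetical name.

```python
import numpy as np
from scipy import stats


def variance_f_test(a, b):
    """Two-sided F-test for equal variances of two independent samples."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    f_stat = a.var(ddof=1) / b.var(ddof=1)   # ratio of sample variances
    dfn, dfd = a.size - 1, b.size - 1
    # Two-sided p-value: twice the smaller tail probability, capped at 1.
    tail = min(stats.f.cdf(f_stat, dfn, dfd), stats.f.sf(f_stat, dfn, dfd))
    return f_stat, min(1.0, 2.0 * tail)


rng = np.random.default_rng(0)
# Synthetic normal populations (assumption: stand-ins for two attributes).
population_a = rng.normal(loc=240.0, scale=3.0, size=10_000)
population_b = rng.normal(loc=240.0, scale=6.0, size=10_000)

# Draw one random sample from each population without replacement.
sample_a = rng.choice(population_a, size=200, replace=False)
sample_b = rng.choice(population_b, size=200, replace=False)

f_stat, p_value = variance_f_test(sample_a, sample_b)
```

The F-test is sensitive to departures from normality, which is one reason to build a normal population first; `scipy.stats.levene` is a more robust alternative when normality is doubtful.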
Regression Analysis:
Identifying linear relationships between variables
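A small sketch of fitting a linear relationship with ordinary least squares. The variable pairing (current intensity against active power) and the synthetic data are assumptions made for illustration; the README does not state which variables were regressed. `fit_line` is a hypothetical helper name.

```python
import numpy as np


def fit_line(x, y):
    """Least-squares fit of y = slope * x + intercept, plus R^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    r_squared = 1.0 - np.sum(residuals**2) / np.sum((y - y.mean()) ** 2)
    return slope, intercept, r_squared


rng = np.random.default_rng(1)
# Synthetic data: current in amperes, active power in kilowatts.
intensity = rng.uniform(1.0, 20.0, size=500)
active_power = 0.24 * intensity + rng.normal(0.0, 0.05, size=500)

slope, intercept, r2 = fit_line(intensity, active_power)
```

An R^2 close to 1 indicates that a straight line explains most of the variation; on real data, a residual plot is a useful additional check.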
Phase 2: Machine Learning Techniques
Dimensionality Reduction & Clustering:
Applying Principal Component Analysis (PCA)
Implementing clustering techniques for better data interpretation
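The two steps above can be sketched with NumPy alone. Assumptions, labeled here and in the code: k-means is used as the clustering technique (the README does not name one), features are standardized before PCA, and the data are two synthetic groups of readings. `pca` and `kmeans` are hypothetical helper names; scikit-learn's `PCA` and `KMeans` classes provide the same operations.

```python
import numpy as np


def pca(X, n_components):
    """Project standardized data onto its top principal components via SVD."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)      # standardize each feature
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)               # explained-variance ratios
    return Z @ Vt[:n_components].T, explained[:n_components]


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm (assumed technique); returns labels, centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if empty).
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids


rng = np.random.default_rng(2)
# Synthetic low-usage and high-usage readings with four features each:
# active power, reactive power, voltage, current (illustration only).
low = rng.normal([0.5, 0.1, 241.0, 2.0], [0.1, 0.02, 1.0, 0.4], size=(150, 4))
high = rng.normal([3.0, 0.3, 236.0, 13.0], [0.4, 0.05, 1.0, 1.5], size=(150, 4))
X = np.vstack([low, high])

scores, explained = pca(X, n_components=2)
labels, centroids = kmeans(scores, k=2)
```

Clustering on the PCA scores rather than the raw features keeps the distance computation from being dominated by the attribute with the largest numeric scale.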
Analysis of Variance (ANOVA):
Comparing means of specific characteristics across different groups
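A one-way ANOVA can be sketched with `scipy.stats.f_oneway`. The grouping by time of day and the synthetic readings are assumptions for illustration; the README does not say which characteristic or groups were compared.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Synthetic active-power readings (kW) for three hypothetical groups.
groups = {
    "night": rng.normal(0.6, 0.3, size=120),
    "day": rng.normal(1.1, 0.3, size=120),
    "evening": rng.normal(1.9, 0.3, size=120),
}

# Null hypothesis: all groups share the same mean.
f_stat, p_value = stats.f_oneway(*groups.values())
reject_equal_means = p_value < 0.05
```

A significant result only says that at least one group mean differs; identifying which groups differ requires a follow-up pairwise comparison.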
Resources:
Skills
Probability & Statistics
Data Visualization
Data Manipulation
Principal Component Analysis (PCA)
Hypothesis Testing
Regression Analysis
ANOVA Test