Data Description and Problem Statement

The California Housing dataset is a widely used dataset for regression tasks in machine learning. It contains information about the median house value in various districts across California, along with several other features that may be useful in predicting house prices. The dataset was collected from the 1990 U.S. Census and has 20,640 instances and 8 input features.

The input features are:
- MedInc: Median income in the district
- HouseAge: Median age of houses in the district
- AveRooms: Average number of rooms per household
- AveBedrms: Average number of bedrooms per household
- Population: Total population in the district
- AveOccup: Average number of people per household
- Latitude: Latitude of the district's location
- Longitude: Longitude of the district's location

The target variable is:
- MedHouseVal: Median house value in the district (in hundreds of thousands of dollars)
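As a small illustration of how to read the target, the sketch below converts a target value into whole dollars. It assumes the unit that scikit-learn documents for this dataset (one unit equals $100,000). The helper name `target_to_dollars` and the sample value are hypothetical and are not taken from the data.

```python
# Assumption: scikit-learn reports MedHouseVal in units of $100,000
DOLLARS_PER_UNIT = 100_000


def target_to_dollars(med_house_val: float) -> int:
    """Convert a MedHouseVal target value to whole dollars."""
    return round(med_house_val * DOLLARS_PER_UNIT)


# Illustrative value only, not read from the dataset
example_value = 2.5
print(target_to_dollars(example_value))
```

Keeping the conversion in one place makes it easier to report errors such as MAE in dollars later on.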
The problem statement for this dataset is to build a regression model that can accurately predict the median house value in a given district based on the other input features. The performance of the model can be evaluated using metrics such as mean squared error (MSE), mean absolute error (MAE), and R-squared.

TASK #1: IMPORT LIBRARIES AND DATASETS
Import the necessary libraries, then import the provided dataset into the notebook and explore it.
[]
# import essential libraries
# Note that you can import libraries whenever you need them in later code cells

Run the following code to fetch the 'california housing' dataset from sklearn.
[]
from sklearn.datasets import fetch_california_housing
california = fetch_california_housing(as_frame=True)
X = california.data  # features
y = california.target  # target
[]
# display first five rows of data (X)
[]
# How many rows are there in the dataset?
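The two exploration steps above (showing the first five rows and counting the rows) can be sketched as follows. This is one possible approach, not the notebook's official solution. The helper `preview_and_count` is a hypothetical name, and the small `toy` frame contains made-up values so the sketch runs without downloading the dataset.

```python
import pandas as pd


def preview_and_count(features: pd.DataFrame, n: int = 5):
    """Return the first n rows and the total number of rows."""
    first_rows = features.head(n)   # first n rows as a DataFrame
    n_rows = features.shape[0]      # shape is (rows, columns)
    return first_rows, n_rows


# Made-up stand-in for the real feature table (values are not from the dataset)
toy = pd.DataFrame(
    {
        "MedInc": [3.0, 4.5, 2.1, 6.3, 5.0, 1.8],
        "HouseAge": [20.0, 35.0, 12.0, 41.0, 8.0, 52.0],
    }
)

first_rows, n_rows = preview_and_count(toy)
print(first_rows)
print("rows:", n_rows)
```

Applied to the feature table loaded from `fetch_california_housing`, calling `.head()` shows the first five rows, and `.shape[0]` (or `len(...)`) gives the row count, which should match the 20,640 instances stated in the data description.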