Report:
12 pt Times New Roman
1.5 spacing
Professional table of contents required
All figures captioned below the figure, centered, in 10 pt font
All tables captioned above the table, centered, in 10 pt font
All tables and figures MUST be explained in the document
All non-original data MUST be cited
Between 4 and 6 pages
Figures and Tables must be clear & readable.
Section headers are necessary. Format guide:
Level 1 – Bold, centered, 14 pt
Level 2 – Bold, left-justified, underlined, 12 pt
Level 3 – Bold, left-justified, italicized, 12 pt
Include the Jupyter notebook code snippets and their output in your report
[20 points]
Introduction of the report:
Deliverables & Methodology: Using the deliverables, introduce the reader to the types of analysis and information the project is attempting to deliver.
For each deliverable, briefly describe the tools/methods that will be used and the criteria for success.
[30 points]
Obtain a dataset (.csv/.xls file) with at least 4 variables
2 numerical
2 categorical
Minimum of 150 rows of data
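
The minimum requirements above can be checked in the notebook before any analysis starts. The following is a minimal sketch, assuming the data is loaded into a pandas DataFrame; the function name check_dataset and the stand-in columns are hypothetical and are not part of the assignment.

```python
import pandas as pd


def check_dataset(df, min_rows=150, min_numeric=2, min_categorical=2):
    """Summarise a DataFrame against the minimum dataset requirements.

    Assumption: every non-numeric column is treated as categorical.
    Integer-coded categories would be counted as numeric here, so review
    the result by hand before relying on it.
    """
    numeric_cols = df.select_dtypes(include="number").columns.tolist()
    categorical_cols = [c for c in df.columns if c not in numeric_cols]
    return {
        "rows": len(df),
        "numeric": numeric_cols,
        "categorical": categorical_cols,
        "meets_requirements": (
            len(df) >= min_rows
            and len(numeric_cols) >= min_numeric
            and len(categorical_cols) >= min_categorical
        ),
    }


# Hypothetical stand-in data; in the notebook this would come from
# pd.read_csv(...) or pd.read_excel(...) on the chosen file.
demo = pd.DataFrame({
    "height_cm": [150.0 + (i % 40) for i in range(160)],
    "weight_kg": [50.0 + (i % 30) for i in range(160)],
    "sex": ["F" if i % 2 else "M" for i in range(160)],
    "region": [["north", "south", "east", "west"][i % 4] for i in range(160)],
})

print(check_dataset(demo))
```

Running the check first makes it easy to show, in the report, that the chosen dataset satisfies the row and variable counts.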
-------------------------------------------------------------------------------------------------------------
Dataset description: Describe the dataset, including necessary citations:
Provenance: Where did you obtain it? What organization built it? What is it used for?
Description: # of rows
Identify Variable tags (n.b. a variable can be more than one thing):
-Input
-Output
-Categorical
-Interval
-Rank
-Scale
-etc.
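
A per-column summary can help when deciding which tags apply to each variable. The sketch below assumes pandas; describe_variables and the two demo columns are hypothetical names used only for illustration. The tags themselves (input, output, interval, and so on) still have to be assigned by judgment.

```python
import pandas as pd


def describe_variables(df):
    """One row per column: dtype, number of distinct values, missing count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_unique": df.nunique(),
        "n_missing": df.isna().sum(),
    })


# Hypothetical stand-in data, not taken from any real dataset.
demo_vars = pd.DataFrame({
    "score": [71.5, 64.0, 88.2, 64.0],
    "grade": ["B", "C", "A", "C"],
})

summary = describe_variables(demo_vars)
print(summary)
```

A column with few distinct values is usually a candidate for a categorical or rank tag, while a column with many distinct numeric values is more likely interval or scale.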
[80 points]
Deliverables:
Perform Exploratory Data Analysis (EDA):
[20 points]. For each of the 2 categorical variables, plot the counts of its distinct levels; also plot the crosstab counts of the two variables
[5 points]. Show the overall mean and standard deviation (SD) of each numerical variable
[15 points]. For each of the 2 numerical variables, plot the mean and SD of each subcategory
[20 points]. Using an appropriate method, show the dispersion of the two numerical variables, both overall and within each subcategory
[10 points]. Using an appropriate visual method, show the relationship between the two numerical variables
[10 points]. Using an appropriate analytical method, show the relationship between the two numerical variables
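
One possible way to approach the EDA deliverables in a notebook is sketched below, assuming pandas and matplotlib. The synthetic data and the column names (group, site, x, y) are hypothetical stand-ins for the chosen dataset. Bar charts, box plots, a scatter plot, and the Pearson correlation are only one reasonable set of choices; other methods may suit a given dataset better.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in data; replace with pd.read_csv(...) on the real file.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "group": rng.choice(["A", "B", "C"], size=n),
    "site": rng.choice(["urban", "rural"], size=n),
    "x": rng.normal(50, 10, size=n),
})
df["y"] = 2.0 * df["x"] + rng.normal(0, 5, size=n)

cat_cols = ["group", "site"]
num_cols = ["x", "y"]

# Counts of each categorical level, plus the crosstab of the two variables.
level_counts = {c: df[c].value_counts() for c in cat_cols}
crosstab = pd.crosstab(df["group"], df["site"])
fig_counts, axes = plt.subplots(1, 3, figsize=(12, 3.5))
for ax, c in zip(axes, cat_cols):
    level_counts[c].plot.bar(ax=ax, title=f"Counts of {c}")
crosstab.plot.bar(ax=axes[2], title="group x site")
fig_counts.tight_layout()

# Overall mean and SD (pandas uses the sample SD, ddof=1, by default).
overall = df[num_cols].agg(["mean", "std"])

# Mean and SD per subcategory, drawn as bars with SD error bars.
group_stats = {}
fig_means, axes = plt.subplots(len(num_cols), len(cat_cols), figsize=(9, 7))
for i, num in enumerate(num_cols):
    for j, cat in enumerate(cat_cols):
        stats = df.groupby(cat)[num].agg(["mean", "std"])
        group_stats[(num, cat)] = stats
        stats["mean"].plot.bar(ax=axes[i, j], yerr=stats["std"], capsize=4,
                               title=f"{num} by {cat}: mean and SD")
fig_means.tight_layout()

# Dispersion: box plots, overall and within each subcategory.
fig_box, axes = plt.subplots(len(num_cols), 1 + len(cat_cols), figsize=(12, 7))
for i, num in enumerate(num_cols):
    df.boxplot(column=num, ax=axes[i, 0])
    for j, cat in enumerate(cat_cols, start=1):
        df.boxplot(column=num, by=cat, ax=axes[i, j])
fig_box.tight_layout()

# Visual relationship between the two numerical variables: scatter plot.
fig_scatter, ax = plt.subplots(figsize=(5, 4))
df.plot.scatter(x="x", y="y", ax=ax, title="y versus x")

# Analytical relationship: Pearson correlation coefficient (pandas default).
r = df["x"].corr(df["y"])
print(overall)
print(f"Pearson r = {r:.3f}")
```

In a notebook the figures render inline at the end of the cell; in a plain script, call plt.show() after the plotting code. Each figure and table produced this way still needs a caption and an explanation in the report text.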