ICT110 Introduction to Data Science, University of the Sunshine Coast
Data analysis report of the health and population statistics of East Asian and Pacific countries
1. Introduction
Provide an introduction to the problem. Include background material as appropriate: who cares about this problem, what impact it has, where does the data come from, what are the dimensions and structures of the data.
1.1 Authorization and Purpose
The data used here are the preliminary basic information on road transport crash deaths in Australia. They were consolidated from the reports that the police submit each month to the State and Territory road safety authorities.
1.2 Limitations
The study is limited to the Australian population. Its findings are therefore valid only for the population and characteristics of Australia and cannot be generalized to other populations.
1.3 Scope
The major objective of this study is to determine the factors that influence fatal road crashes. We also examine the distribution of road deaths to identify where fatalities are most frequent with respect to time of day and speed limit.
1.4 Methodology
The data were obtained in CSV format and analysed using the R software package. Descriptive statistics were used to describe the distribution of interval and ratio variables, and frequency distributions were used for qualitative variables. The independent-samples t test, one-way ANOVA and the chi-square test for independence were used to test the hypotheses.
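The three tests named above can be sketched in R as follows. This is only an illustration on synthetic data: the data frame demo and its columns speed, bus, crash and period are hypothetical stand-ins, not columns of the crash dataset.

```r
# Synthetic stand-in data; none of these values come from the crash dataset.
set.seed(1)
n <- 300
demo <- data.frame(
  speed  = round(rnorm(n, mean = 80, sd = 20)),
  bus    = factor(sample(c("No", "Yes"), n, replace = TRUE, prob = c(0.8, 0.2))),
  crash  = factor(sample(c("Single", "Multiple", "Pedestrian"), n, replace = TRUE)),
  period = factor(sample(c("No", "Yes"), n, replace = TRUE, prob = c(0.7, 0.3)))
)

# Independent-samples t test: compares mean speed between two groups
t_res <- t.test(speed ~ bus, data = demo)

# One-way ANOVA: compares mean speed across three or more groups
aov_res <- aov(speed ~ crash, data = demo)

# Chi-square test of independence between two categorical variables
chi_res <- chisq.test(table(demo$crash, demo$period))

print(t_res$p.value)
print(summary(aov_res))
print(chi_res$p.value)
```

Each test returns a p-value; a small value suggests that the group means differ (t test, ANOVA) or that the two categorical variables are associated (chi-square).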
2. Data Setup
Describe how to load the data, and how the preprocessing is performed.
The data cleaning was performed by staff trained in this process. Data cleaning is the process of identifying inconsistent records, such as values that deviate far from the rest of the dataset, and correcting or removing them before the statistical analysis is performed. Missing values need particular care. For example, in an obesity dataset some patients' demographic details may be missing; because these are personal to each individual they cannot simply be filled in, so any replacement must be made carefully. For Likert-scale questions, missing responses can be replaced by the median or the mode. For numeric variables, missing values can be replaced by the mean.
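A minimal sketch of the replacement rules described above, using a small made-up data frame. The names survey, rating, income and the helper mode_value are hypothetical and do not appear in the crash data.

```r
# Hypothetical survey-style data with missing values (not the crash data)
survey <- data.frame(
  rating = c(4, 5, NA, 3, 4, NA, 2),       # Likert item scored 1 to 5
  income = c(52, NA, 61, 48, NA, 55, 59)   # numeric variable
)

# Hypothetical helper: most frequent non-missing value
mode_value <- function(x) {
  counts <- table(x)                        # table() drops NA by default
  as.numeric(names(counts)[which.max(counts)])
}

# Likert item: replace missing responses with the median (or the mode)
survey$rating_median <- ifelse(is.na(survey$rating),
                               median(survey$rating, na.rm = TRUE),
                               survey$rating)
survey$rating_mode <- ifelse(is.na(survey$rating),
                             mode_value(survey$rating),
                             survey$rating)

# Numeric variable: replace missing values with the mean
survey$income_mean <- ifelse(is.na(survey$income),
                             mean(survey$income, na.rm = TRUE),
                             survey$income)

print(survey)
```

The filled values are kept in new columns so that the original column still shows which entries were missing.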
3 Exploratory Data analysis
Perform 2 one-variable analyses. Plot one graph for each variable. Explain the finding for each graph.
3.1 One variable analysis
3.1.1 One variable analysis 1
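# Load the crash data (file path as used in the original analysis)
mydata <- read.csv("D:/Downloads/crashdata.csv", header = TRUE, sep = ",")
attach(mydata)         # make the columns available by name
head(mydata)           # preview the first rows
boxplot(Speed.Limit)   # box plot of the speed limit variable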
The box plot for speed limit variable is given below
From the box plot, the distribution of speed limit appears roughly symmetric: the median line sits near the middle of the box and divides it into two approximately equal halves. This symmetry is consistent with an approximately normal distribution, although a box plot alone cannot confirm normality.
3.1.2 One variable analysis 2
The distribution of crash type is given below
barplot(table(Crash.Type))
The bar chart shows that most crashes are classified as either single or multiple. Because the majority of fatal crashes fall into these two categories, preventive action should target them to reduce these kinds of fatalities.
3.1.3 One variable analysis 3
The time variable is categorized into eight groups, listed below
12:00 am to 3:00 am
3:00 am to 6:00 am
6:00 am to 9:00 am
9:00 am to 12:00 pm
12:00 pm to 3:00 pm
3:00 pm to 6:00 pm
6:00 pm to 9:00 pm
9:00 pm to 12:00 am
From the bar chart, the largest number of crashes occurred between 9 AM and 12 PM, a clear indication that fatalities are concentrated in the peak hours of the day.
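The eight three-hour groups can be derived from a clock time with cut(). The sketch below assumes times stored as "HH:MM" strings in a hypothetical vector times; the format of the time column in the actual file may differ.

```r
# Hypothetical crash times as "HH:MM" strings (made-up values)
times <- c("00:15", "02:59", "03:00", "08:45", "09:30",
           "11:59", "12:00", "17:20", "23:50")

# Hour of day as an integer from 0 to 23
hour <- as.integer(substr(times, 1, 2))

labels <- c("12:00 am to 3:00 am", "3:00 am to 6:00 am",
            "6:00 am to 9:00 am",  "9:00 am to 12:00 pm",
            "12:00 pm to 3:00 pm", "3:00 pm to 6:00 pm",
            "6:00 pm to 9:00 pm",  "9:00 pm to 12:00 am")

# right = FALSE makes each bin include its start hour and exclude its end hour
time_group <- cut(hour, breaks = seq(0, 24, by = 3),
                  right = FALSE, labels = labels)

counts <- table(time_group)
print(counts)

# Bar chart of crashes per three-hour group
barplot(counts, las = 2, cex.names = 0.6)
```

With right = FALSE, a crash at exactly 3:00 am falls in the "3:00 am to 6:00 am" group rather than the one before it.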
3.2 Two-variable analysis
3.2.1 Two-variable analysis 1
The association between crash type and time of day is assessed using a clustered bar chart
counts <- table(Crash.Type, Time3hours)
barplot(counts, beside = TRUE, legend.text = TRUE)
3.2.2 Two-variable analysis 2
The association between crash type and the Christmas period is assessed in the same way
counts <- table(Crash.Type, Christmas.Period)
barplot(counts, beside = TRUE, legend.text = TRUE)
Here, we see that most crash fatalities occur on non-Christmas days.
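Raw counts are dominated by the much larger number of non-Christmas days, so it may be easier to compare the share of each crash type within each period. The sketch below uses prop.table() on made-up counts; the numbers are hypothetical and are not results from the crash data.

```r
# Hypothetical counts for illustration only (not taken from the crash data)
counts <- matrix(c(900, 350, 1250,   # non-Christmas period
                   30,  15,  55),    # Christmas period
                 nrow = 3,
                 dimnames = list(Crash.Type = c("Multiple", "Pedestrian", "Single"),
                                 Christmas.Period = c("No", "Yes")))

# margin = 2 converts each column to proportions that sum to 1
props <- prop.table(counts, margin = 2)
print(round(props, 3))

# Clustered bar chart of the within-period shares
barplot(props, beside = TRUE, legend.text = TRUE,
        xlab = "Christmas period", ylab = "Share of fatal crashes")
```

If the shares look similar in both columns, the difference in raw counts mostly reflects the length of each period rather than a change in the mix of crash types.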
4 Advanced analysis
Briefly explain the concept of clustering and k-means.
Perform 1 clustering analysis to group years according to a selected cause.
4.1 Clustering
4.1.1 Brief explanation of k-means and clustering
k-means clustering is a technique in cluster analysis that partitions n observations into k clusters, with each observation assigned to the cluster whose mean (centroid) is nearest. Hierarchical clustering, by contrast, is a method that builds a nested hierarchy of clusters. The number of ways to partition n observations into k non-overlapping clusters, each containing at least one observation, is the Stirling number of the second kind S(n, k). This number grows very quickly with n, so k-means searches for a good partition iteratively instead of enumerating every possibility.
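As a rough sketch of how years could be grouped by k-means, the example below clusters synthetic yearly death counts for a single cause. The data frame yearly, its values and the choice of three clusters are assumptions made for illustration; they are not results from the crash data.

```r
set.seed(42)

# Hypothetical yearly fatality counts for one selected cause (made-up values)
yearly <- data.frame(
  year   = 1989:2018,
  deaths = c(round(rnorm(10, mean = 900, sd = 30)),
             round(rnorm(10, mean = 600, sd = 30)),
             round(rnorm(10, mean = 350, sd = 30)))
)

# k-means with k = 3; nstart repeats the random initialisation
# and keeps the best solution
km <- kmeans(yearly$deaths, centers = 3, nstart = 20)
yearly$cluster <- km$cluster

print(km$centers)              # cluster means
print(table(yearly$cluster))   # number of years in each cluster

# Years coloured by assigned cluster
plot(yearly$year, yearly$deaths, col = yearly$cluster, pch = 19,
     xlab = "Year", ylab = "Deaths (synthetic)")
```

In practice the number of clusters would need to be justified, for example by comparing the within-cluster sum of squares (km$tot.withinss) across several values of k.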
4.2 Linear Regression
Briefly explain the concept of linear regression.
Perform 2 linear regression analysis. Plot the learned models.
4.2.1 Brief definition of Linear Regression
Simple linear regression analysis was performed to predict speed limit, using bus involvement as the independent variable
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.071177
R Square            0.005066
Adjusted R Square   0.005043
Standard Error      20.5181
Observations        43852

ANOVA
             df      SS         MS         F         Significance F
Regression   1       93999.58   93999.58   223.281   2.32E-50
Residual     43850   18460515   420.9924
Total        43851   18554514

                  Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept         82.33147       0.098823         833.1203   0          82.13777    82.52516
Bus Involvement   -11.3368       0.758694         -14.9426   2.32E-50   -12.8239    -9.84979
The coefficient of determination is 0.0051, indicating that only 0.51% of the variation in speed limit is explained by the regression model, while the remainder is left unexplained.
The regression equation is
Speed limit = 82.33 - 11.34 * Bus Involvement
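A regression of this form can be fitted and plotted in R with lm(). The sketch below uses synthetic data generated to resemble the equation above; the variables bus and speed are hypothetical, and the output will not reproduce the reported figures.

```r
set.seed(7)
n <- 500

# Synthetic data: bus = 1 if a bus was involved, 0 otherwise
bus   <- rbinom(n, size = 1, prob = 0.1)
speed <- 82 - 11 * bus + rnorm(n, sd = 20)

# Simple linear regression of speed limit on bus involvement
fit <- lm(speed ~ bus)
print(coef(fit))
print(summary(fit)$r.squared)

# With a single 0/1 predictor, the slope equals the difference in group means
diff_means <- mean(speed[bus == 1]) - mean(speed[bus == 0])
print(diff_means)

# Plot the data (jittered horizontally) with the fitted line
plot(jitter(bus, amount = 0.05), speed,
     xlab = "Bus involvement (0 = no, 1 = yes)",
     ylab = "Speed limit (synthetic)")
abline(fit, col = "red")
```

Because the predictor takes only the values 0 and 1, the intercept is the mean speed limit for crashes without a bus and the slope is the shift in that mean when a bus is involved.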
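4.2.2 Linear Regression Model 2
Linear regression analysis was performed to predict speed limit, using crash type as the independent variable. Crash type enters the model as two indicator variables, Pedestrian and Multiple, with the remaining category serving as the baseline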
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.333198
R Square            0.111021
Adjusted R Square   0.11098
Standard Error      19.39418
Observations        43861

ANOVA
             df      SS         MS         F          Significance F
Regression   2       2060177    1030088    2738.619   0
Residual     43858   16496494   376.1342
Total        43860   18556670

             Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept    87.23772       0.143139         609.4605   0          86.95716    87.51827
Pedestrian   -19.4699       0.264562         -73.5928   0          -19.9884    -18.9513
Multiple     -4.22375       0.203706         -20.7346   4.84E-95   -4.62302    -3.82449
The coefficient of determination is 0.111, indicating that 11.1% of the variation in speed limit is explained by the regression model, while the remainder is left unexplained.
The regression equation is
Speed limit = 87.238 - 19.47 * Pedestrian - 4.22 * Multiple
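To make the equation concrete, the sketch below evaluates it for each crash-type category and recomputes R Square from the ANOVA sums of squares. It uses the coefficients reported in the summary output, taking the Pedestrian and Multiple coefficients as negative (as their 95% bounds imply). The function predict_speed and the label "Baseline" are hypothetical names introduced here.

```r
# Coefficients from the summary output (Pedestrian and Multiple taken as negative)
b0     <- 87.23772
b_ped  <- -19.4699
b_mult <- -4.22375

# Hypothetical helper: fitted speed limit for given indicator values
predict_speed <- function(pedestrian, multiple) {
  b0 + b_ped * pedestrian + b_mult * multiple
}

fitted_means <- c(Baseline   = predict_speed(0, 0),
                  Pedestrian = predict_speed(1, 0),
                  Multiple   = predict_speed(0, 1))
print(round(fitted_means, 2))

# R Square = regression sum of squares / total sum of squares
r_squared <- 2060177 / 18556670
print(round(r_squared, 4))
```

Under these assumptions the fitted speed limit is about 87.24 for the baseline crash type, 67.77 for pedestrian crashes and 83.01 for multiple-vehicle crashes, and the ratio of sums of squares gives the reported R Square of about 0.111.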
5 Conclusion
The data used here are the preliminary basic information on road transport crash deaths in Australia, consolidated from the reports that the police submit each month to the State and Territory road safety authorities. The analysis shows that the majority of fatal crashes are classified as either single or multiple, so preventive action should target these two categories to reduce these kinds of fatalities. In addition, the largest number of crashes occurred between 9 AM and 12 PM, a clear indication that fatalities are concentrated in the peak hours of the day.
6 Reflection
The major objective of this study was to determine the factors that influence fatal road crashes. We also examined the distribution of road deaths to identify where fatalities are most frequent with respect to time of day and speed limit. The study is limited to the Australian population, so its findings are valid only for the population and characteristics of Australia and cannot be generalized to other populations.