Data Collection, Processing and Transformation Assignment Help

KIT108 Artificial Intelligence Assignment - University of Tasmania, Australia

Unit Learning Outcomes -

1. Understand the local and global impact of AI on individuals, organizations, and society

2. Adapt and apply techniques for acquiring, representing, and reasoning with data, information, and knowledge

3. Select and effectively apply techniques to develop simple AI solutions

4. Analyze a problem, apply knowledge of AI principles, and use ICT technical skills to develop potential solutions

5. Evaluate strengths and weaknesses of potential AI solutions

ESTIMATING THE AGE OF ABALONE BASED ON PHYSICAL MEASUREMENTS

Introduction

Abalone is the common name for sea snails of the family Haliotidae, which range from small to large. They can be edible or poisonous. Typically, the age is determined through a complex laboratory procedure that involves cutting through the shell, staining it and counting the number of rings under a microscope. Several methods have been developed to predict the age from physical measurements instead. The main objective of this analysis is to develop machine learning (ML) models that predict the number of rings from the physical measurements.

Task 1: Data Collection - Identify irrelevant information from the data and remove it to clean the data.

Answer - Data collection

The data used in this analysis come from a publicly available dataset published by Warwick J. Nash et al. (1994). The dataset includes the following variables: sex, length, diameter, height, the weight attributes and rings (see appendix table 1). The original dataset is available from the UCI Machine Learning Repository; the link is given in the references.

Task 2: Data Pre-processing - There are some missing values for height attribute and rings. Decide the way you handle this issue and explain why.

Answer - Data Processing

The collected data include some missing values for the variables height and rings. These were filled in by data imputation in Weka: through the software's graphical user interface (GUI), the ReplaceMissingValues filter was applied. The filter replaces each missing numeric value with the mean of its attribute. Imputation was judged the most appropriate option because it preserves the sample size, whereas deleting incomplete rows would discard data.
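The mean imputation described above can be sketched outside Weka in a few lines of Python. This is a minimal illustration, not the Weka filter itself: the function name `impute_mean` and the sample heights are made up for the example, and missing values are assumed to be represented as `None`.

```python
from statistics import mean


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical height column with two missing entries.
heights = [0.095, None, 0.135, 0.125, None]
print(impute_mean(heights))
```

Every row is kept, which is the reason given above for preferring imputation over dropping incomplete records.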

Task 3: Data Transformation - We need to create a new attribute called volume from other attributes as: volume = length * diameter * height. Normalise the data into [0-1] range.

Answer - Data Transformation

For this analysis, data transformation was done in both Weka and Excel. In Excel, the new attribute volume was calculated by multiplying length, diameter and height, i.e. multiplying columns A, B and C of our dataset. The file was then saved as a CSV file for further use. In Weka, the Normalize filter (listed under the unsupervised attribute filters) was applied; by default it scales numeric data to the range 0-1 by a method called min-max normalization. The formula used in min-max normalization is:

Normalized value = (a - b1) / (c1 - b1) * (c2 - b2) + b2

where a is the value to be normalized, b1 is the minimum value of the variable, c1 is the maximum value of the variable, c2 is the desired maximum value and b2 is the desired minimum value. For the [0-1] range, b2 = 0 and c2 = 1, so the formula reduces to (a - b1) / (c1 - b1).
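As a rough illustration of the two transformation steps, the Python sketch below computes volume as length * diameter * height and then applies min-max normalization with the formula given above. The function name `min_max` and the three sample rows are assumptions made for the example; they are not taken from the Weka or Excel workflow.

```python
def min_max(values, new_min=0.0, new_max=1.0):
    """Scale values linearly so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map everything to new_min.
        return [new_min for _ in values]
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


# Hypothetical (length, diameter, height) measurements.
rows = [(0.455, 0.365, 0.095), (0.350, 0.265, 0.090), (0.530, 0.420, 0.135)]
volumes = [length * diameter * height for length, diameter, height in rows]
scaled = min_max(volumes)
print(scaled)
```

The smallest volume maps to 0 and the largest to 1, with the remaining values placed proportionally between them.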

Task 4: Data Mining & Pattern Evaluation - Prepare your data to have: a training set of the first 2500 samples, a validation set of the next 633 samples, and a test set of the last 1044 samples. Run 15 machine learning algorithms and report their accuracy on the validation set in a table. Explain how the best algorithms work (in the report). Tips: How to improve performance?

Answer - DATA MINING & Pattern Evaluation

Data preparation

The data were partitioned into training, validation and testing sets by random sampling in Excel, in the following steps. First, an index (id) for each row was created by adding a new column with the values 1 to 4177. Second, a column holding a random value for each row was added using the Excel function =RAND(). Third, a column was added that draws a random sample without replacement from the index column, using the Excel formula =INDEX($L$1:$L$4177,RANK(M1,$M$1:$M$4177),1), dragged down to fill all rows. This column holds the shuffled row indexes, and the data were then sorted by it. The first 2500 samples were assigned to the training set, the next 633 to the validation set and the last 1044 to the testing set.
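The same idea (attach a random key to every row, order the rows by that key, then cut the ordering into consecutive blocks) can be sketched in Python as follows. The function name `shuffled_split` and the fixed seed are assumptions for illustration; the Excel workbook itself is not reproduced here.

```python
import random


def shuffled_split(n_rows, sizes, seed=0):
    """Shuffle row indexes by sorting on random keys, then cut into consecutive parts."""
    rng = random.Random(seed)
    keys = [rng.random() for _ in range(n_rows)]           # one random key per row, like =RAND()
    order = sorted(range(n_rows), key=lambda i: keys[i])   # row indexes ordered by their key
    parts, start = [], 0
    for size in sizes:
        parts.append(order[start:start + size])
        start += size
    return parts


train, valid, test = shuffled_split(4177, (2500, 633, 1044))
print(len(train), len(valid), len(test))
```

Because each row index appears exactly once in the ordering, the three sets are disjoint and together cover all 4177 rows.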

Modeling

Fifteen machine learning algorithms were trained on the training set, and their performance on the validation set is reported in Table 1 below.

Table 1: Results of the 15 algorithms on the validation set

Algorithm	R	MAE	RMSE	Relative absolute error	Root relative squared error	Total Number of Instances
RandomForest	0.7306	0.0542	0.0792	64.18%	68.29%	633
K-star	0.7146	0.0546	0.0816	64.71%	70.33%	633
SMOreg	0.7212	0.0549	0.0816	65.00%	70.31%	633
M5algorithm	0.731	0.056	0.0795	66.37%	68.55%	633
Multilayer perceptron	0.7338	0.0568	0.0839	67.31%	72.31%	633
linearReg	0.7181	0.0569	0.0808	67.40%	69.66%	633
Reptree	0.654	0.0602	0.0891	71.35%	76.79%	633
M5 tree model	0.7143	0.0608	0.083	72.01%	71.52%	633
Decision table	0.6584	0.0609	0.0874	72.12%	75.30%	633
additive regression	0.6577	0.0625	0.0875	74.05%	75.41%	633
IBK(K neighbours)	0.6221	0.0661	0.0968	78.27%	83.47%	633
LWL	0.5285	0.0716	0.0985	84.78%	84.92%	633
Decision stump	0.5146	0.072	0.0995	85.30%	85.75%	633
Randomtree	0.5525	0.0779	0.1113	92.24%	95.92%	633
ZeroR	0	0.0844	0.116	100%	100%	633

The best-performing model on the validation set was the random forest, which had the lowest errors: a root mean squared error (RMSE) of 0.0792, a mean absolute error (MAE) of 0.0542 and a root relative squared error of 68.29%. These values measure the deviation between the actual and predicted values, so smaller values are preferred. The model was then evaluated on the testing set.
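For reference, the error measures reported in the tables can be computed as in the Python sketch below. It uses the common textbook definitions, in which the two relative errors compare the model against a baseline that always predicts the mean of the actual values; the exact baseline Weka uses may differ, so treat this as an assumption. The function name `regression_errors` and the sample numbers are illustrative only.

```python
from math import sqrt


def regression_errors(actual, predicted):
    """Return MAE, RMSE, relative absolute error and root relative squared error."""
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_err = [abs(p - a) for a, p in zip(actual, predicted)]
    sq_err = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    # Baseline errors: what a model that always predicts the mean would make.
    base_abs = sum(abs(a - mean_actual) for a in actual)
    base_sq = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "MAE": sum(abs_err) / n,
        "RMSE": sqrt(sum(sq_err) / n),
        "RAE": sum(abs_err) / base_abs,
        "RRSE": sqrt(sum(sq_err) / base_sq),
    }


# Predicting the mean everywhere gives relative errors of exactly 1 (i.e. 100%).
print(regression_errors([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))
```

Under these definitions a relative error below 100% means the model beats the mean-only baseline, which matches the ZeroR row in Table 1.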

Table 2: Error measures of the random forest on the testing set

Algorithm	R	MAE	RMSE	Relative absolute error	Root relative squared error	Total Number of Instances
Random Forest	0.7434	0.0563	0.0791	65.26%	67.09%	1044

Table 3: Variable importance

Average impurity decrease (number of nodes)	Variable
0.09 (10939)	shell weight
0.04 (9367)	Volume
0.03 (10746)	height
0.03 (13517)	shucked weight
0.02 (14734)	whole weight
0.02 (13434)	Diameter
0.02 (4610)	sex
0.02 (11769)	Viscera weight
0.01 (16840)	Length
0 (0)	edible

Conclusion

How the model works

A random forest extends the random tree model to an ensemble of many trees: it builds bagged trees, each grown on a bootstrapped sample of the training data. At each split within a tree, only a random subset of the predictors is considered. The algorithm therefore has two main parameters: the number of variables sampled at each split, and the minimum number of observations in a node, below which splitting stops. Once all trees are grown, they are combined: for a class label the trees vote and the most-voted value is chosen, while for a numeric target such as the number of rings the predictions of the trees are averaged.
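To make the mechanics concrete, here is a small pure-Python sketch of a random forest for a numeric target. It is a simplified illustration under stated assumptions, not Weka's RandomForest: the names `build_tree`, `fit_forest` and `predict_forest`, the default parameter values and the toy data are all made up for the example.

```python
import random


def _mean(ys):
    return sum(ys) / len(ys)


def _sse(ys):
    m = _mean(ys)
    return sum((y - m) ** 2 for y in ys)


def build_tree(X, y, n_features, min_leaf, rng, depth=0, max_depth=5):
    """Grow one regression tree; each split considers a random subset of predictors."""
    if len(y) < 2 * min_leaf or depth >= max_depth:
        return _mean(y)                      # leaf: predict the mean target
    best = None                              # (squared error, feature, threshold)
    for f in rng.sample(range(len(X[0])), n_features):
        for t in sorted(set(row[f] for row in X))[1:]:
            left = [yi for row, yi in zip(X, y) if row[f] < t]
            right = [yi for row, yi in zip(X, y) if row[f] >= t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            err = _sse(left) + _sse(right)
            if best is None or err < best[0]:
                best = (err, f, t)
    if best is None:
        return _mean(y)
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] < t]
    ri = [i for i, row in enumerate(X) if row[f] >= t]
    grow = lambda idx: build_tree([X[i] for i in idx], [y[i] for i in idx],
                                  n_features, min_leaf, rng, depth + 1, max_depth)
    return (f, t, grow(li), grow(ri))


def predict_tree(node, row):
    while isinstance(node, tuple):           # internal nodes are tuples, leaves are floats
        f, t, left, right = node
        node = left if row[f] < t else right
    return node


def fit_forest(X, y, n_trees=25, n_features=1, min_leaf=2, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]   # bootstrap sample
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 n_features, min_leaf, rng))
    return forest


def predict_forest(forest, row):
    """Average the predictions of all trees (regression)."""
    return sum(predict_tree(tree, row) for tree in forest) / len(forest)


# Toy data: the target depends on the first predictor only.
X = [[i / 10, (i * 7 % 10) / 10] for i in range(20)]
y = [2 * row[0] for row in X]
forest = fit_forest(X, y)
print(predict_forest(forest, [0.5, 0.5]))
```

Each tree sees a different bootstrap sample and a different random choice of predictors at its splits, so the trees disagree; averaging them reduces the variance of any single tree.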

From the output above (Table 2), the model had a root mean squared error of 0.0791 and a mean absolute error of 0.0563 on the testing set. The RMSE is essentially the same as on the validation set (0.0792), while the MAE is slightly higher (0.0542 on validation), so the model performs about as well on unseen data as it did on the validation set; the larger testing sample (1044 versus 633 instances) probably also makes these estimates more stable. Variable importance (Table 3) is reported in terms of node impurity: for each variable, the decrease in impurity is averaged over all nodes that split on it, and the figure in parentheses is the number of such nodes. A larger average decrease indicates a variable the model is more sensitive to. The most important variable to the model is therefore the shell weight of the abalone, followed by the computed variable volume. The least important is edible.
