Get Best and Affordable Data Collection, Processing and Transformation Assignment Help Service At Expertsminds!!

KIT108 Artificial Intelligence Assignment - University of Tasmania, Australia

Unit Learning Outcomes -

1. Understand the local and global impact of AI on individuals, organizations, and society

2. Adapt and apply techniques for acquiring, representing, and reasoning with data, information, and knowledge

3. Select and effectively apply techniques to develop simple AI solutions

4. Analyze a problem, apply knowledge of AI principles, and use ICT technical skills to develop potential solutions

5. Evaluate strengths and weaknesses of potential AI solutions

ESTIMATING THE AGE OF ABALONE BASED ON PHYSICAL MEASUREMENTS

Introduction

Abalone is a general name for small to large sea snails of the Haliotidae family. They can be edible or poisonous. Typically, the age of an abalone is determined through a complex laboratory procedure that involves cutting through the shell, staining it and counting the number of rings under a microscope. Many scientific methods have been developed to predict the age from physical measurements instead. The main objective of this analysis is to develop ML models that can be used to predict the number of rings from the physical measurements.

Task 1: Data Collection - Identify irrelevant information from the data and remove it to clean the data.

Answer - Data collection

The data used in this analysis was obtained from a publicly available dataset published by Warwick J Nash et al. (1994). The dataset includes the following variables: sex, length, diameter, height, the weight attributes and rings (see appendix table 1). The original dataset can be found in the UCI Machine Learning Repository, linked in the references.

Task 2: Data Pre-processing - There are some missing values for height attribute and rings. Decide the way you handle this issue and explain why.

Answer - Data Processing

The collected data includes some missing values for the variables height and rings; these were estimated using data imputation in Weka. Through the software's graphical user interface (GUI), the ReplaceMissingValues filter was used for data imputation. This filter replaces each missing value with the mean of the attribute. Imputation was judged the most appropriate approach because, unlike deleting incomplete rows, it preserves the full sample.
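The mean imputation described above can be sketched in a few lines of Python. This is an illustration only, not Weka's implementation: the function name `impute_mean`, the use of `None` for a missing value and the sample heights are all assumptions made for the example.

```python
from statistics import mean

def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]

# Made-up height values with two missing entries (not taken from the dataset).
heights = [0.095, None, 0.135, 0.125, None]
print(impute_mean(heights))
```

Because every missing entry receives the same fill value, the column mean is unchanged but its variance shrinks slightly, which is worth keeping in mind when many values are missing.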

Task 3: Data Transformation - We need to create a new attribute called volume from other attributes as: volume = length * diameter * height. Normalise the data into [0-1] range.

Answer - Data Transformation

For this analysis, data transformation was done in both Excel and Weka. In Excel, the new attribute volume was calculated by multiplying length, diameter and height, i.e. multiplying columns A, B and C of our dataset. The file was saved as a CSV file for further use. In Weka, the Normalize filter (found under the unsupervised attribute filters) was then applied; by default this filter scales numeric data into the range 0-1 by a method called min-max normalization. The formula used in min-max normalization is:

Normalized value = ((a - b1) / (c1 - b1)) * (c2 - b2) + b2

Where a is the value to be normalized, b1 is the minimum value of the variable, c1 is the maximum value of the variable, c2 is the desired maximum value and b2 is the desired minimum value. For the [0-1] range used here, b2 = 0 and c2 = 1, so the formula reduces to (a - b1) / (c1 - b1).
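As a rough illustration of the two transformation steps, the sketch below computes volume = length * diameter * height and then applies the min-max formula with the same symbols mapped to parameters (`new_min` for b2, `new_max` for c2). The function name `min_max`, the dictionary layout and the three sample rows are assumptions for the example, not the actual Excel or Weka workflow.

```python
def min_max(values, new_min=0.0, new_max=1.0):
    """Scale values into [new_min, new_max] with min-max normalization."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map everything to new_min.
        return [new_min for _ in values]
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

# Made-up measurements for three abalone (not taken from the dataset).
rows = [
    {"length": 0.455, "diameter": 0.365, "height": 0.095},
    {"length": 0.350, "diameter": 0.265, "height": 0.090},
    {"length": 0.530, "diameter": 0.420, "height": 0.135},
]

# New attribute: volume = length * diameter * height.
for row in rows:
    row["volume"] = row["length"] * row["diameter"] * row["height"]

scaled_volume = min_max([row["volume"] for row in rows])
print(scaled_volume)
```

After scaling, the smallest volume maps to 0 and the largest to 1, with the remaining values placed proportionally in between.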

Task 4: Data Mining & Pattern Evaluation - Prepare your data to have: a training set of the first 2500 samples, a validation set of the next 633 samples, and a test set of the last 1044 samples. Run 15 machine learning algorithms and report their accuracy on the validation set in a table. Explain how the best algorithms work (in the report). Tips: How to improve performance?

Answer - DATA MINING & Pattern Evaluation

Data preparation

The data was partitioned into training, validation and testing sets by random sampling in Excel, in the following steps. First, an index (id) for each row was created by adding a new column with the values 1 to 4177. Second, a column with a random value for each row was added using the Excel function =RAND(). Third, a column holding a random sample without replacement from the index column was added using the formula =INDEX($L$1:$L$4177,RANK(M1,$M$1:$M$4177),1), dragged down to fill all rows. The data was then sorted by the sampled indexes. The first 2500 samples were assigned to the training set, the next 633 samples to the validation set and the last 1044 samples to the testing set.
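The same shuffle-then-slice idea can be sketched in Python. Ranking a column of random numbers, as done in Excel, amounts to drawing a random permutation of the row indexes; here `random.shuffle` plays that role. The function name `split_indices` and the fixed seed are assumptions added for reproducibility and are not part of the original workflow.

```python
import random

def split_indices(n_rows, n_train, n_valid, seed=0):
    """Shuffle row indexes, then cut them into train/validation/test parts."""
    if n_train + n_valid > n_rows:
        raise ValueError("training and validation sizes exceed the row count")
    order = list(range(n_rows))
    random.Random(seed).shuffle(order)  # random sample without replacement
    train = order[:n_train]
    valid = order[n_train:n_train + n_valid]
    test = order[n_train + n_valid:]
    return train, valid, test

train, valid, test = split_indices(4177, 2500, 633)
print(len(train), len(valid), len(test))  # 2500 633 1044
```

Every row lands in exactly one of the three sets, so the model is never validated or tested on a row it was trained on.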

Modeling

Fifteen machine learning algorithms were trained on the training set, and their performance on the validation set is reported in Table 1 below.

Table 1: Results of the 15 algorithms on the validation set

| Algorithm | R | MAE | RMSE | Relative absolute error | Root relative squared error | Total number of instances |
|---|---|---|---|---|---|---|
| RandomForest | 0.7306 | 0.0542 | 0.0792 | 64.18% | 68.29% | 633 |
| K-star | 0.7146 | 0.0546 | 0.0816 | 64.71% | 70.33% | 633 |
| SMReg | 0.7212 | 0.0549 | 0.0816 | 65.00% | 70.31% | 633 |
| M5algorithm | 0.731 | 0.056 | 0.0795 | 66.37% | 68.55% | 633 |
| multlayer | 0.7338 | 0.0568 | 0.0839 | 67.31% | 72.31% | 633 |
| linearReg | 0.7181 | 0.0569 | 0.0808 | 67.40% | 69.66% | 633 |
| Reptree | 0.654 | 0.0602 | 0.0891 | 71.35% | 76.79% | 633 |
| M5 tree model | 0.7143 | 0.0608 | 0.083 | 72.01% | 71.52% | 633 |
| Decision table | 0.6584 | 0.0609 | 0.0874 | 72.12% | 75.30% | 633 |
| additive regression | 0.6577 | 0.0625 | 0.0875 | 74.05% | 75.41% | 633 |
| IBK(K neighbours) | 0.6221 | 0.0661 | 0.0968 | 78.27% | 83.47% | 633 |
| LWL | 0.5285 | 0.0716 | 0.0985 | 84.78% | 84.92% | 633 |
| Decision stump | 0.5146 | 0.072 | 0.0995 | 85.30% | 85.75% | 633 |
| Randomtree | 0.5525 | 0.0779 | 0.1113 | 92.24% | 95.92% | 633 |
| ZeroR | 0 | 0.0844 | 0.116 | 100% | 100% | 633 |

The best performing model on the validation set was the random forest. It had a root mean squared error of 0.0792, a mean absolute error of 0.0542 and a root relative squared error of 68.29%, each the lowest of the 15 algorithms. These values measure the deviation between the actual and predicted values, so smaller values are preferred. The model was then evaluated on the testing set.
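For reference, the four error measures can be computed as in the sketch below. The two relative measures compare the model's error with that of a baseline that always predicts a single constant, which is why ZeroR scores 100% on both. The function name `error_measures`, the `baseline` parameter and the sample numbers are assumptions for illustration; the correlation coefficient R is left out to keep the example short.

```python
from math import sqrt

def error_measures(actual, predicted, baseline):
    """Return MAE, RMSE, relative absolute error and root relative squared error.

    `baseline` is the constant predicted by the reference model (an assumption
    of this sketch); the relative measures are percentages of its error.
    """
    n = len(actual)
    abs_err = [abs(p - a) for a, p in zip(actual, predicted)]
    sq_err = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    base_abs = [abs(baseline - a) for a in actual]
    base_sq = [(baseline - a) ** 2 for a in actual]
    return {
        "MAE": sum(abs_err) / n,
        "RMSE": sqrt(sum(sq_err) / n),
        "RAE_pct": 100 * sum(abs_err) / sum(base_abs),
        "RRSE_pct": 100 * sqrt(sum(sq_err) / sum(base_sq)),
    }

# Made-up normalized ring values and predictions (not taken from the results).
actual = [0.2, 0.4, 0.6]
predicted = [0.3, 0.4, 0.5]
print(error_measures(actual, predicted, baseline=0.4))
```

A relative error below 100% means the model beats the constant baseline; the closer to 0%, the better.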

Table 2: Error measures of the random forest on the testing set

| Algorithm | R | MAE | RMSE | Relative absolute error | Root relative squared error | Total number of instances |
|---|---|---|---|---|---|---|
| Random Forest | 0.7434 | 0.0563 | 0.0791 | 65.26% | 67.09% | 1044 |

Table 3: Variable importance

| Node impurity | Variable |
|---|---|
| 0.09 (10939) | shell weight |
| 0.04 (9367) | Volume |
| 0.03 (10746) | height |
| 0.03 (13517) | shucked weight |
| 0.02 (14734) | whole weight |
| 0.02 (13434) | Diameter |
| 0.02 (4610) | sex |
| 0.02 (11769) | Viscera weight |
| 0.01 (16840) | Length |
| 0 (0) | edible |

Conclusion

How the model works

A random forest extends the random tree model to an ensemble of many trees: it builds bagged trees, each grown on a bootstrapped sample of the training data. At each split, only a random subset of the predictors is considered, which keeps the trees from all resembling one another. The algorithm therefore has two main parameters: the number of variables to sample at each split and the minimum number of observations allowed in a node; when this minimum is reached, splitting stops. To make a prediction, the trees are combined: for a numeric target such as rings the predictions of the individual trees are averaged, while for classification the most voted class is chosen.
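The bagging and averaging steps can be illustrated with a deliberately simplified sketch. It is not Weka's RandomForest: each "tree" here is a one-split regression stump, so the random subset of predictors is drawn once per tree rather than at every split. All function names, parameter defaults and the toy data are assumptions made for the example.

```python
import random
from statistics import mean

def fit_stump(rows, targets, feature_ids):
    """Find the single split with the lowest squared error among feature_ids."""
    best = None
    for f in feature_ids:
        for threshold in sorted({r[f] for r in rows}):
            left = [t for r, t in zip(rows, targets) if r[f] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[f] > threshold]
            if not left or not right:
                continue
            lm, rm = mean(left), mean(right)
            sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, threshold, lm, rm)
    if best is None:  # no valid split: always predict the sample mean
        m = mean(targets)
        return (0, float("inf"), m, m)
    return best[1:]

def predict_stump(stump, row):
    f, threshold, left_mean, right_mean = stump
    return left_mean if row[f] <= threshold else right_mean

def fit_forest(rows, targets, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]            # bootstrap sample
        feats = rng.sample(range(len(rows[0])), n_features)   # random predictors
        forest.append(fit_stump([rows[i] for i in idx],
                                [targets[i] for i in idx], feats))
    return forest

def predict_forest(forest, row):
    # Regression: average the predictions of the individual trees.
    return mean([predict_stump(s, row) for s in forest])

# Toy data (made up): two predictors and a normalized target.
rows = [[0.1, 0.05], [0.2, 0.10], [0.3, 0.15], [0.4, 0.20], [0.5, 0.25], [0.6, 0.30]]
targets = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
forest = fit_forest(rows, targets)
print(predict_forest(forest, [0.1, 0.05]), predict_forest(forest, [0.6, 0.30]))
```

Averaging over trees grown on different bootstrap samples and predictors is what reduces the variance of the ensemble compared with a single tree.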

From the above output, the model had a root mean squared error of 0.0791 and a mean absolute error of 0.0563 on the testing set. These are close to the validation values (0.0792 and 0.0542): the RMSE is marginally lower and the MAE slightly higher, so the model performs about as well on the larger testing set as on the validation set. Variable importance was computed in terms of node impurity: for each variable, the decrease in impurity is averaged over the nodes that split on it (the number of such nodes is shown in brackets in Table 3), so a larger average decrease indicates a more influential variable. The most important variable to the model is therefore shell weight, followed by the computed variable volume. The least important is edible.
