Data Collection
The main aim of the analysis was to understand the effect of the physical characteristics of a diamond on its price and to develop a model for predicting diamond prices. For this purpose, a data set containing the physical characteristics and prices of diamonds was obtained. The sample comprised 53940 diamonds with prices ranging from $326 to $18823. The data contained information on the length, width, depth, table, colour, cut, clarity, price and weight of the diamonds. Diamonds are expensive commodities often used in jewelry to give it a glittering appearance. Diamond is a solid, crystalline form of the element carbon (Moriguchi, Ohara & Tsujioka, 2016). Besides jewelry, diamonds are widely used in several industrial processes. The sections that follow discuss the analytic findings from the collected data and their interpretation in detail.
The data were obtained from a publicly available data set (Kaggle.com, 2022) rather than collected first-hand, and are therefore secondary data. Using secondary data saved time and effort in conducting the study. It is assumed that the original collectors of the data used a random sampling process. There were no empty cells in the data set. Overall, the data were of good quality and highly informative.
Descriptive Analysis
For the quantitative variables, the descriptive analysis assessed measures of center and dispersion. The measures of center were the mean, median and mode, and the measures of dispersion (spread) were the sample variance, standard deviation and range (Kaliyadan & Kulkarni, 2019).
| Statistic | carat | depth (%) | table | price ($) | length (mm) | width (mm) | depth (mm) |
|---|---|---|---|---|---|---|---|
| Mean | 0.80 | 61.75 | 57.46 | 3932.80 | 5.73 | 5.73 | 3.54 |
| Standard Error | 0.0020 | 0.0062 | 0.0096 | 17.1774 | 0.0048 | 0.0049 | 0.0030 |
| Median | 0.7 | 61.8 | 57 | 2401 | 5.7 | 5.71 | 3.53 |
| Mode | 0.3 | 62 | 56 | 605 | 4.37 | 4.34 | 2.7 |
| Standard Deviation | 0.47 | 1.43 | 2.23 | 3989.44 | 1.12 | 1.14 | 0.71 |
| Sample Variance | 0.22 | 2.05 | 4.99 | 15915629.42 | 1.26 | 1.30 | 0.50 |
| Kurtosis | 1.26 | 5.74 | 2.80 | 2.18 | -0.62 | 91.21 | 47.09 |
| Skewness | 1.12 | -0.08 | 0.80 | 1.62 | 0.38 | 2.43 | 1.52 |
| Range | 4.81 | 36 | 52 | 18497 | 10.74 | 58.9 | 31.8 |
| Minimum | 0.2 | 43 | 43 | 326 | 0 | 0 | 0 |
| Maximum | 5.01 | 79 | 95 | 18823 | 10.74 | 58.9 | 31.8 |
| Sum | 43040.87 | 3330762.9 | 3099240.5 | 212135217 | 309138.62 | 309320.33 | 190879.3 |
| Count | 53940 | 53940 | 53940 | 53940 | 53940 | 53940 | 53940 |
The average weight of the diamonds in the sample was 0.8 carats. The average is the arithmetic mean, that is, the sum of all observations divided by the number of observations in the sample (George & Mallery, 2018). The median weight was 0.7 carats. The median is the middle observation when the data are arranged in ascending or descending order (Kaur, Stoltzfus & Yellapu, 2018), so about half of the diamonds in the sample weighed more than 0.7 carats. The mode is the value that occurs most often in the data (Conner & Johnson, 2017); the modal weight was 0.3 carats. The sample variance of the weight was 0.22 and the standard deviation about the mean was 0.47 carats. The standard deviation quantifies the dispersion of the observations about the mean (Holcomb, 2016). The maximum weight in the sample was 5.01 carats and the minimum was 0.2 carats, giving a range of 4.81 carats. The range is the difference between the maximum and minimum observations (McCarthy et al., 2019).
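The measures defined above can be illustrated with a minimal Python sketch. The weights below are hypothetical values chosen for illustration, not rows from the diamonds data, and the variable names are assumptions.

```python
import statistics

# Hypothetical carat weights for illustration only; not taken from the diamonds data.
weights = [0.3, 0.3, 0.5, 0.7, 0.9, 1.2, 1.7]

mean = statistics.mean(weights)          # sum of observations / number of observations
median = statistics.median(weights)      # middle value of the sorted data
mode = statistics.mode(weights)          # most frequently occurring value
variance = statistics.variance(weights)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(weights)      # square root of the sample variance
value_range = max(weights) - min(weights)

print(mean, median, mode, variance, std_dev, value_range)
```

In this toy sample the mean exceeds the median, which in turn exceeds the mode. The reported carat statistics show the same ordering, which is consistent with the positive skewness in the table.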
Depth and Table of the Diamonds
The depth percentage was computed as the actual depth divided by the average of the length and width, expressed as a percentage. The average depth percentage was 61.75 % and the median was 61.8 %, so about half of the diamonds had a depth percentage greater than 61.8 %. The modal depth percentage was 62 %, meaning that 62 % was the most frequently occurring value. The sample variance was 2.05 and the standard deviation about the mean was 1.43 percentage points. The standard deviation was small relative to the mean, indicating little dispersion of the observations about the mean.
Table is the width of the top of the diamond relative to its widest point, expressed as a percentage. The average table was 57.46 % and the median was 57 %, so about half of the diamonds had a table greater than 57 %. The sample variance was 4.99 and the standard deviation about the mean was 2.23 percentage points. The maximum table in the sample was 95 % and the minimum was 43 %, giving a range of 52 percentage points.
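The depth percentage calculation described above can be sketched as follows. The function name and the multiplication by 100 are assumptions based on the description, not a reproduction of the original computation.

```python
def depth_percentage(length_mm: float, width_mm: float, depth_mm: float) -> float:
    """Depth as a percentage of the average of length and width."""
    mean_diameter = (length_mm + width_mm) / 2
    if mean_diameter <= 0:
        # Guard against rows with zero dimensions, which would divide by zero.
        raise ValueError("length and width must average to a positive value")
    return 100 * depth_mm / mean_diameter


# Plugging in the sample means of length, width and depth from the descriptive table.
example = depth_percentage(5.73, 5.73, 3.54)
print(round(example, 2))
```

Using the sample means gives a value of roughly 61.8 %, close to the reported mean depth percentage of 61.75 %. The two need not match exactly, because the mean of per-diamond ratios is not the same as the ratio of the means.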
On average, the length of the diamonds was 5.73 mm. The median length was 5.7 mm, so about half of the diamonds were longer than 5.7 mm. The modal length was 4.37 mm, meaning that 4.37 mm was the most frequently occurring length. The sample variance was 1.26 and the standard deviation about the mean was 1.12 mm. The maximum length observed in the sample was 10.74 mm and the minimum was 0 mm, giving a range of 10.74 mm.
On average, the width of the diamonds was 5.73 mm. The median width was 5.71 mm, so about half of the diamonds were wider than 5.71 mm. The modal width was 4.34 mm, meaning that 4.34 mm was the most frequently occurring width. The sample variance was 1.30 and the standard deviation about the mean was 1.14 mm. The maximum width observed in the sample was 58.9 mm and the minimum was 0 mm, giving a range of 58.9 mm.
On average, the depth of the diamonds was 3.54 mm. The median depth was 3.53 mm, so about half of the diamonds were deeper than 3.53 mm. The modal depth was 2.7 mm, meaning that 2.7 mm was the most frequently occurring depth. The sample variance was 0.50 and the standard deviation about the mean was 0.71 mm. The maximum depth observed in the sample was 31.8 mm and the minimum was 0 mm, giving a range of 31.8 mm. A dimension of 0 mm is not physically possible, so the zero minimums for length, width and depth most likely reflect measurements that were not recorded.
Price of the Diamonds
The average price of the diamonds in the sample was $3932.80. The median price was $2401, so about half of the diamonds had a price higher than $2401. The modal price was $605, meaning that $605 was the most frequently occurring price. The sample variance of the prices was 15915629.42 and the standard deviation about the mean price was $3989.44. The maximum price of any diamond in the sample was $18823 and the minimum was $326, giving a range of $18497.
For the categorical variables, the frequency of each category was obtained.
Figure 1: Pie chart demonstrating the frequency of diamonds by their cut
The largest share of the diamonds (40 %) had an ideal cut. A further 26 % had a premium cut, 22 % a very good cut and 9 % a good cut, while only 3 % had a fair cut.
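The standard error row of the descriptive table is not discussed in the text. Assuming it follows the usual definition, the standard deviation divided by the square root of the sample size, it can be checked with a short sketch (the function name is illustrative):

```python
import math


def standard_error(std_dev: float, n: int) -> float:
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return std_dev / math.sqrt(n)


# Price: standard deviation and count taken from the descriptive table.
price_se = standard_error(3989.44, 53940)
print(round(price_se, 4))
```

The result agrees with the tabulated standard error of 17.1774 for price to within rounding.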
Figure 2: Pie chart demonstrating the frequency of diamonds by their color
The colours of the diamonds were graded from D to J. According to the data source, D was the best colour and J the worst, with colour quality decreasing in alphabetical order from D to J. The most common colour was G (21 %). Colours F and E each accounted for 18 % of the diamonds, H for 15 %, D for 13 % and I for 10 %. Colour J was the least common, at only 5 %.
Figure 3: Percentage of diamonds based on clarity categories.
The diamonds were classified into 8 categories based on their clarity. According to the data source, IF was the best category in terms of clarity and I1 the worst. The most common category was SI1 (24 %), followed by VS2 (23 %), SI2 (17 %), VS1 (15 %), VVS2 (10 %) and VVS1 (7 %). Only 3 % of the diamonds were classified as IF, and I1 was the least common category at 1 %.
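The category percentages reported for cut, colour and clarity are simple relative frequencies. A minimal sketch of that calculation is given below; the labels are hypothetical and are not rows from the diamonds data.

```python
from collections import Counter

# Hypothetical cut labels for illustration only; not taken from the diamonds data.
cuts = ["Ideal", "Premium", "Ideal", "Very Good", "Good",
        "Ideal", "Premium", "Fair", "Ideal", "Very Good"]

counts = Counter(cuts)        # frequency of each category
total = sum(counts.values())  # number of observations

# Percentage share of each category, most common first.
percentages = {label: 100 * n / total for label, n in counts.most_common()}

for label, share in percentages.items():
    print(f"{label}: {share:.0f} %")
```

A category is a majority only if its share exceeds 50 %. A share such as 40 % makes it the most common category, which is the sense used in the descriptions above.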
Regression Analysis
A regression analysis was performed to investigate the impact of the physical attributes of a diamond on its price (Montgomery, Peck & Vining, 2021). The aim was to develop a predictive model for the price of diamonds.
Table 1: Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.9524 |
| R Square | 0.9070 |
| Adjusted R Square | 0.9070 |
| Standard Error | 1216.6653 |
| Observations | 53940 |

ANOVA

| Source | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 9 | 778641936563.20 | 86515770729.24 | 58445.765 | 0.000 |
| Residual | 53930 | 79831198954.16 | 1480274.41 | | |
| Total | 53939 | 858473135517.36 | | | |

| Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 2781.86 | 428.810 | 6.49 | 0.00 | 1941.387 | 3622.330 |
| cut | 120.74 | 5.715 | 21.13 | 0.00 | 109.539 | 131.941 |
| color | 322.69 | 3.259 | 99.00 | 0.00 | 316.298 | 329.075 |
| clarity | 501.86 | 3.523 | 142.45 | 0.00 | 494.951 | 508.761 |
| carat | 10744.04 | 51.837 | 207.26 | 0.00 | 10642.437 | 10845.640 |
| Depth_Percentage | -79.80 | 4.794 | -16.65 | 0.00 | -89.192 | -70.400 |
| table | -26.76 | 2.948 | -9.08 | 0.00 | -32.539 | -20.984 |
| Length | -877.70 | 35.226 | -24.92 | 0.00 | -946.743 | -808.655 |
| Width | 43.74 | 20.751 | 2.11 | 0.04 | 3.064 | 84.407 |
| Depth | -29.34 | 36.017 | -0.81 | 0.42 | -99.935 | 41.254 |
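The t statistics and 95 % confidence limits in the coefficients table follow from each coefficient and its standard error. The sketch below reproduces them for width. It uses the normal critical value 1.96 as an approximation, which is reasonable with 53930 residual degrees of freedom, and the function names are illustrative.

```python
def t_statistic(coefficient: float, std_error: float) -> float:
    """t statistic for the null hypothesis that the coefficient is 0."""
    return coefficient / std_error


def approx_ci95(coefficient: float, std_error: float, critical: float = 1.96):
    """Approximate 95 % confidence interval using a normal critical value."""
    margin = critical * std_error
    return coefficient - margin, coefficient + margin


# Width: coefficient and standard error taken from the coefficients table.
width_t = t_statistic(43.74, 20.751)
width_ci = approx_ci95(43.74, 20.751)
print(round(width_t, 2), [round(limit, 2) for limit in width_ci])
```

The small differences from the tabulated limits (3.064 and 84.407) come from rounding of the reported coefficient and standard error. The interval excludes 0, which matches the p-value being below 0.05.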
The regression model was statistically significant, F(9, 53930) = 58445.765, p < 0.05. Cut, colour, clarity, carat, depth percentage, table, length and width were each found to have a statistically significant impact on the price of a diamond. The R-squared value of 0.9070 (multiple R = 0.9524) indicated that the model was an excellent fit to the data (Arkes, 2019). The interpretations from the model are discussed next.
The regression was statistically significant, which indicated that the coefficient of at least one independent variable in the model was different from 0 (Darlington & Hayes, 2017). The impact of cut on price was statistically significant (p < 0.05) and positive. Colour had a statistically significant impact on price (p < 0.05): an increase in colour quality led to an increase in price. The effect of clarity was statistically significant (p < 0.05) and positive: the better the clarity, the higher the price. The weight of the diamond had a statistically significant impact on its price (p < 0.05). From the associated coefficient, every 1-carat increase in weight increased the price by $10744.04, holding the other variables constant (Gunst & Mason, 2018). Depth percentage had a negative and significant impact on price (p < 0.05): a 1 percentage point increase in depth percentage decreased the price by $79.80. The impact of table on price was statistically significant and negative (p < 0.05): a 1-unit increase in table decreased the price by $26.76. Length had a negative and statistically significant impact on price (p < 0.05): a 1 mm increase in length decreased the price by $877.70. Width had a significant and positive impact on price (p = 0.035): a 1 mm increase in width increased the price by $43.74. Actual depth had no statistically significant effect on price (p = 0.415). Based on the R-squared value, the regression model explained 90.70 % of the variation in the price of the diamonds (Schroeder, Sjoquist & Stephan, 2016).
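The fit statistics can be recovered from the ANOVA sums of squares in Table 1. The following sketch uses the standard definitions; the variable names are illustrative.

```python
# Sums of squares and degrees of freedom taken from the ANOVA table.
ss_regression = 778641936563.20
ss_residual = 79831198954.16
df_regression = 9
df_residual = 53930

ss_total = ss_regression + ss_residual
df_total = df_regression + df_residual

r_squared = ss_regression / ss_total  # share of variation explained
multiple_r = r_squared ** 0.5         # correlation between observed and fitted prices
adjusted_r_squared = 1 - (1 - r_squared) * df_total / df_residual
f_statistic = (ss_regression / df_regression) / (ss_residual / df_residual)

print(round(r_squared, 4), round(multiple_r, 4),
      round(adjusted_r_squared, 4), round(f_statistic, 3))
```

This makes the distinction explicit: the share of variation explained is R Square (about 0.907), while 0.9524 is Multiple R, its square root.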
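To illustrate how the coefficients translate into predictions, the sketch below evaluates the fitted linear equation. The numeric codes used for cut, colour and clarity are hypothetical, since the report does not state how these variables were coded, and the feature names and example values are assumptions for illustration only.

```python
# Coefficients taken from the regression output in Table 1.
COEFFICIENTS = {
    "intercept": 2781.86,
    "cut": 120.74,
    "color": 322.69,
    "clarity": 501.86,
    "carat": 10744.04,
    "depth_percentage": -79.80,
    "table": -26.76,
    "length": -877.70,
    "width": 43.74,
    "depth": -29.34,
}


def predict_price(features: dict) -> float:
    """Intercept plus the sum of coefficient * value over all predictors."""
    price = COEFFICIENTS["intercept"]
    for name, coefficient in COEFFICIENTS.items():
        if name != "intercept":
            price += coefficient * features[name]
    return price


# Hypothetical diamond: the cut/color/clarity codes are assumed, not from the report.
base = {"cut": 3, "color": 4, "clarity": 4, "carat": 0.8, "depth_percentage": 61.75,
        "table": 57.46, "length": 5.73, "width": 5.73, "depth": 3.54}
heavier = dict(base, carat=base["carat"] + 1)
longer = dict(base, length=base["length"] + 1)

carat_effect = predict_price(heavier) - predict_price(base)
length_effect = predict_price(longer) - predict_price(base)
print(round(carat_effect, 2), round(length_effect, 2))
```

Changing one predictor by one unit while holding the others fixed changes the prediction by exactly that predictor's coefficient, which is the sense in which each coefficient is interpreted above.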
Conclusion
The primary aim of the analysis was to assess the impact of the physical characteristics of a diamond on its price and to develop a predictive model for diamond prices. The report discussed the descriptive statistical findings from a sample of 53940 diamonds and the interpretations from the developed regression model. The average price of diamonds in the sample was $3932.80. The physical characteristics taken into account in the regression model were colour, cut, weight, clarity, depth percentage, table and dimensional attributes such as length, width and depth. Colour, cut, weight, width and clarity had a positive impact on the price of diamonds. Depth percentage, table and length had a negative impact on the price, and actual depth had no statistically significant effect. The developed model was a good fit to the price of the diamonds and can be used to predict diamond prices.
References
Arkes, J. (2019). Regression analysis: A practical introduction. Routledge.
Conner, B., & Johnson, E. (2017). Descriptive statistics. American Nurse Today, 12(11), 52-55.
Darlington, R. B., & Hayes, A. F. (2017). Regression analysis and linear models. New York, NY: Guilford, 603-611.
George, D., & Mallery, P. (2018). Descriptive statistics. In IBM SPSS Statistics 25 Step by Step (pp. 126-134). Routledge.
Gunst, R. F., & Mason, R. L. (2018). Regression analysis and its application: a data-oriented approach. CRC Press.
Holcomb, Z. (2016). Fundamentals of descriptive statistics. Routledge.
Kaggle.com. (2022). Diamonds. Retrieved 3 April 2022, from https://www.kaggle.com/datasets/shivam2503/diamonds
Kaliyadan, F., & Kulkarni, V. (2019). Types of variables, descriptive statistics, and sample size. Indian dermatology online journal, 10(1), 82.
Kaur, P., Stoltzfus, J., & Yellapu, V. (2018). Descriptive statistics. International Journal of Academic Medicine, 4(1), 60.
McCarthy, R. V., McCarthy, M. M., Ceccucci, W., & Halawi, L. (2019). What do descriptive statistics tell us. In Applying Predictive Analytics (pp. 57-87). Springer, Cham.
Montgomery, D. C., Peck, E. A., & Vining, G. G. (2021). Introduction to linear regression analysis. John Wiley & Sons.
Moriguchi, H., Ohara, H., & Tsujioka, M. (2016). History and applications of diamond-like carbon manufacturing processes. Sei Technical Review, 82, 52-58.
Schroeder, L. D., Sjoquist, D. L., & Stephan, P. E. (2016). Understanding regression analysis: An introductory guide (Vol. 57). Sage Publications.