Module 6: Basic Data Visualization Using R
I created a boxplot using the mtcars dataset to compare miles per gallon (MPG) among vehicles with different numbers of cylinders. A boxplot is useful for showing the distribution and variability of data across categories because it highlights the median, quartiles, and outliers. I chose this chart because it gives a clear summary of how fuel efficiency changes as the number of cylinders (a rough proxy for engine size) increases, without too much intricate detail.
The visualization revealed clear differences between the groups. Cars with four cylinders had noticeably higher MPG values, showing that they are generally more fuel-efficient. In contrast, vehicles with six or eight cylinders had lower MPG values, suggesting that fuel efficiency tends to decrease as the number of cylinders increases. This pattern shows a strong inverse relationship between the number of cylinders and miles per gallon.
In terms of deviations, the chart mostly confirmed the expected trend that larger engines consume more fuel. However, I noticed a few outliers and some overlap between groups: the most efficient six-cylinder cars performed almost as well as the least efficient four-cylinder cars. This could indicate differences in car design, weight, or transmission that influence performance beyond just the number of cylinders. These small variations highlight how real-world data can deviate from simplified assumptions.
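The post does not include the plotting code, so the following is only a minimal sketch of how a boxplot like this could be produced. It assumes base R graphics (the original may well have used ggplot2 or different labels), and the names `cyl_group` and `mpg_medians` are hypothetical, introduced here for illustration.

```r
# mtcars ships with base R (datasets package), so no download is needed.
data(mtcars)

# Treat cylinder count as a category rather than a number.
# cyl_group is a hypothetical column name added for this sketch.
mtcars$cyl_group <- factor(mtcars$cyl)

# One box per cylinder group, with MPG on the y-axis.
boxplot(mpg ~ cyl_group, data = mtcars,
        main = "Miles per Gallon by Number of Cylinders",
        xlab = "Number of cylinders",
        ylab = "Miles per gallon (MPG)")

# Median MPG per group, to read alongside the boxes.
mpg_medians <- tapply(mtcars$mpg, mtcars$cyl_group, median)
print(mpg_medians)
```

Converting `cyl` to a factor makes the categorical grouping explicit, and printing the group medians gives a numeric check on the downward trend that the boxes show visually.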
According to Few’s principles in Chapter 9, effective visualizations should emphasize clarity, accuracy, and simplicity, all qualities this boxplot embodies. The chart avoids unnecessary colors and decorations, keeps the colors it does use easy to tell apart, and uses clear axis labels, making it easy to interpret. In line with Yau’s ideas in Chapter 7, the visualization matches the type of chart to the data, with categorical groups on the x-axis and a continuous measurement on the y-axis. Both authors stress that good data visualization should reveal insight rather than obscure it, and I believe this plot achieves that goal.
One challenge was deciding which type of chart would most effectively display the differences in MPG among cylinder categories. Initially, I tried a scatterplot, but it was cluttered and made it hard to compare groups. The boxplot solved this by summarizing each group’s distribution, but interpreting outliers still required careful attention to ensure they weren’t mistaken for errors.
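One way to give outliers that careful attention is to list the flagged cars next to their other attributes. The sketch below is an illustration, not the method used in the post. It assumes the default 1.5 × IQR rule that `boxplot()` applies, and the names `flagged` and `is_flagged` are hypothetical.

```r
data(mtcars)

# Values beyond 1.5 * IQR from the hinges, per cylinder group.
# boxplot.stats() applies the same default rule boxplot() uses to draw points.
flagged <- lapply(split(mtcars$mpg, mtcars$cyl),
                  function(x) boxplot.stats(x)$out)
print(flagged)

# Mark each car whose MPG was flagged within its own cylinder group.
is_flagged <- mapply(function(m, k) m %in% flagged[[as.character(k)]],
                     mtcars$mpg, mtcars$cyl)

# Show flagged cars with weight (wt) and transmission (am) so they can be
# judged as plausible vehicles rather than assumed to be data errors.
print(mtcars[is_flagged, c("mpg", "cyl", "wt", "am")])
```

Seeing a flagged car's weight and transmission type alongside its MPG helps decide whether the point reflects a real design difference or a recording mistake.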