Posts

Final Project Diamond Database

Problem Description: I have always been interested in diamonds: their value, how they are traded, and the global network behind them. Growing up in India, where diamond buying is culturally significant and cities like Surat serve as major worldwide centers for diamond cutting and polishing, I was frequently exposed to discussions about diamond quality, pricing, and craftsmanship. I also enjoy watching documentaries about diamond grading and international trading, which deepened my curiosity about how specific characteristics influence market value. Although my personal experience is rooted in the Indian diamond market, the dataset used in this project, the well-known diamonds dataset, reflects U.S. retail pricing. This creates an interesting opportunity to compare my personal understanding with patterns observed in a different but well-established market. Importantly, many of the attributes used to evaluate diamonds (carat weight, cut quality, clarity, and color) are universal gr...

Module 12. Social Network Analysis

Once I had the correct Python version installed, the code for building and visualizing the social network worked smoothly. NetworkX made it easy to generate the random graph structure, and Plotnine provided a clean, ggplot-style way to visualize the nodes and edges. The final plot rendered clearly, and saving it as a PNG was straightforward. The biggest issue I ran into was package compatibility. My computer was using Python 3.13, which Plotnine did not yet support, so the install kept failing and the plotnine module could not be imported. I solved the problem by installing Python 3.11, which is fully compatible with Plotnine. After switching to Python 3.11 and reinstalling the required packages, everything ran correctly. I would definitely use this approach again. NetworkX and Plotnine work well together: NetworkX handles the graph structure, while Plotnine gives me fine control over the aesthetics. For simple social netwo...

Module 11. Modern Historian in the Field of Data Visualization

Recreated plot from E. R. Tufte:

Module 10. Time Series and Visualization

Visualization plays a big role in time series analysis because it helps us actually see how data changes over time instead of just looking at numbers. By plotting data, we can quickly spot patterns, trends, or seasonal behaviors that might not be obvious from raw values alone. For example, a line chart can show whether unemployment rates rise or fall during certain periods, helping us understand the bigger picture. Visuals also make it easier to detect sudden changes or outliers that could affect the analysis. Overall, visualization turns complex time-based data into something more intuitive and meaningful, making it a key first step in analyzing and interpreting time series data.

R code for future reference:

# Hot Dog eating contest
library(readr)
library(ggplot2)

# Load the dataset
hotdogs <- read_csv("http://datasets.flowingdata.com/hot-dog-contest-winners.csv")

# view
head(hotdogs)

# Set color for bars: dark red if new record, grey otherwise
colors <- ifelse(hotdogs...

Module 9. Visual Multivariate Analysis

For this visualization, I used the built-in mtcars dataset in R, which contains data on various car models, including fuel efficiency, horsepower, weight, and engine characteristics. I chose this dataset because it’s small, clean, and ideal for exploring relationships among several continuous and categorical variables. I have also worked with it before. The scatter plot shows how miles per gallon (mpg) decreases as weight (wt) increases. The points are colored by the number of cylinders, sized by horsepower, and faceted by transmission type (automatic or manual). This multivariate approach reveals that cars with more cylinders and higher horsepower generally weigh more and have lower fuel efficiency, while manual cars tend to achieve slightly higher mpg overall. The multivariate visualization was effective because it allowed multiple comparisons in one clean view. I applied three design principles from this module: Alignment: The axes, labels, and panels are ...

Module 8. Correlation Analysis and ggplot2

The regression model above shows a strong negative relationship between horsepower and miles per gallon: cars with higher horsepower have lower gas mileage. The regression line clearly slopes downward, showing a consistent trend across the data points that confirms the negative correlation between the two variables. As for how grid layout enhances interpretation, my visual focused on a single regression model, but its clean and evenly spaced layout still supported readability and kept the emphasis on the regression line. The structured design drew attention to the main pattern, how horsepower affected mpg, without unnecessary clutter or visual noise. In general, placing scatter plots side by side in a grid layout makes it easier to compare how each factor influences mpg. The consistent scales, neutral color palette, and aligned axes allow quick visual comparison without distraction. In my opinion, Few’s r...

Module 7. Visualizations by mtcars

Scatter Plot using mtcars: The dataset used was mtcars. The faceted scatter plot of miles per gallon versus horsepower, grouped by cylinder count, shows a clear negative relationship between horsepower and fuel efficiency: cars with higher horsepower tend to have lower miles per gallon. Each facet showed a distinct cluster, with 4-cylinder cars being more fuel-efficient while 8-cylinder cars had high horsepower but poor mileage. The 6-cylinder cars lie between the two in terms of both mpg and hp. These patterns highlight how engine size directly influences performance trade-offs. My design aligned with Few’s and Yau’s recommendations by using consistent axes across facets, a minimal set of distinctive colors to distinguish cylinder categories, and clean gridlines for easy comparison. I avoided decorative elements and focused on clarity so the plot would not be too busy, allowing viewers to interpret patterns without distraction. This approach emphasized data comprehension over aes...