Visualization & Data Source
Baseball Player Data Visualization
Summary
This data visualization is a project I completed during my Udacity Data Analysis nanodegree. The interactive, animated visualization shows several summary statistics for individual baseball players. The animated charts give readers insight into the relationship between a player's height, handedness and home runs. Some of these insights are listed below.
- The players' height distribution chart shows a roughly normal pattern, as expected, with most players' heights in the range of 72-73 inches. The same holds for every handedness group.
- Interestingly, the maximum home run count in each height group follows a similar, though less consistent, pattern. Only in the right-handed group does the highest maximum fall within 74-75 inches; for the other two handedness groups, it falls within 72-73 inches.
- Across all handedness groups, the highest home run count recorded is 548, by a right-handed player in the 74-75 inch height range. However, his performance is an outlier, since the average home run count of his group is only 55.
- In general, the average home run count in each group does not exceed 100.
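The statistics above (player count, maximum home runs and average home runs per handedness and height group) can be reproduced with a small aggregation. The sketch below is a minimal illustration under stated assumptions, not the code used in the project: the row fields `handedness`, `height` and `HR`, the 2-inch bins such as "72-73", and the function names `heightBin` and `summarize` are all hypothetical.

```javascript
// Assumed row shape: { name, handedness, height, HR } (hypothetical field names).
// Heights are grouped into 2-inch bins such as "72-73" and "74-75",
// an assumption based on the ranges quoted in the summary.
function heightBin(height) {
  const low = Math.floor(height / 2) * 2;
  return `${low}-${low + 1}`;
}

// Returns a Map from "handedness|bin" to { count, maxHR, avgHR }.
function summarize(rows) {
  const groups = new Map();
  for (const row of rows) {
    const key = `${row.handedness}|${heightBin(row.height)}`;
    const g = groups.get(key) || { count: 0, maxHR: -Infinity, totalHR: 0 };
    g.count += 1;
    g.maxHR = Math.max(g.maxHR, row.HR);
    g.totalHR += row.HR;
    groups.set(key, g);
  }
  // Convert running totals into averages.
  const summary = new Map();
  for (const [key, g] of groups) {
    summary.set(key, { count: g.count, maxHR: g.maxHR, avgHR: g.totalHR / g.count });
  }
  return summary;
}
```

Comparing `maxHR` with `avgHR` for a group is one quick way to spot the kind of outlier described above, where a single player's total is far above the group's average.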
Design
The most important decision was what information to present in the chart. I first chose a bubble chart (Project6Baseball4.htm and earlier versions); however, I found that this type of chart alone was not the best option for communicating my data. I then decided to combine a bar chart and a bubble chart, with height on the X-axis and two Y-axes: the count of players and the home runs. I then used handedness as the data filter for the animation. The next decision was to choose between D3.js and dimple.js. I chose dimple.js with a storyboard because it offers a simpler way of building animated, interactive charts. Furthermore, multiple advanced dimple.js examples are available at http://dimplejs.org/.
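To make the chart structure concrete, the sketch below builds the data one animation frame would show: one frame per handedness value, and within each frame one point per height carrying the bar value (player count) and the bubble value (here the maximum home runs). It is plain JavaScript, not dimple.js itself; in dimple.js the storyboard set with `setStoryboard` handles this filtering. The field names `handedness`, `height` and `HR`, the choice of maximum as the bubble aggregate, and the function name `buildFrames` are assumptions for illustration.

```javascript
// Hypothetical sketch of the per-frame data behind the animated chart.
// Each frame corresponds to one handedness value; each point to one height.
function buildFrames(rows) {
  const frames = new Map(); // handedness -> Map(height -> point)
  for (const row of rows) {
    if (!frames.has(row.handedness)) frames.set(row.handedness, new Map());
    const byHeight = frames.get(row.handedness);
    const point = byHeight.get(row.height) || { height: row.height, players: 0, maxHR: 0 };
    point.players += 1; // bar series: count of players
    point.maxHR = Math.max(point.maxHR, row.HR); // bubble series: max home runs
    byHeight.set(row.height, point);
  }
  // Sort frames by handedness and points by height so the frame order
  // and the x-axis order are stable in this sketch.
  return [...frames.keys()].sort().map((handedness) => ({
    handedness,
    points: [...frames.get(handedness).values()].sort((a, b) => a.height - b.height),
  }));
}
```

Keeping both measures on the same point mirrors the two-Y-axis design: the bar and bubble series share the height category on the X-axis and differ only in which measure they plot.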
I did not consider removing outliers, since the dataset is a curated one and I believe all of its records should be presented in the chart. That said, I removed one duplicated row (same name and same metrics) and a couple of columns from the dataset, as explained in the Fix section.
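The cleaning step can be sketched as follows, assuming each row is a plain object. This is a minimal illustration rather than the project's actual code: the function name `cleanRows` is hypothetical, and the columns to drop are passed in as a parameter because the actual columns are listed in the Fix section, not here.

```javascript
// Removes rows that exactly duplicate an earlier row (same name and same
// metrics) and drops the given columns. Hypothetical helper for illustration.
function cleanRows(rows, columnsToDrop) {
  const seen = new Set();
  const cleaned = [];
  for (const row of rows) {
    // Key on every original field (sorted for a stable order) so that only
    // full duplicates are removed, not different players who share a name.
    const key = JSON.stringify(Object.keys(row).sort().map((k) => [k, row[k]]));
    if (seen.has(key)) continue;
    seen.add(key);
    const kept = {};
    for (const [k, v] of Object.entries(row)) {
      if (!columnsToDrop.includes(k)) kept[k] = v;
    }
    cleaned.push(kept);
  }
  return cleaned;
}
```

Duplicates are detected before any columns are dropped, so two rows that differ only in a removed column are still both kept.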
Resources
I mainly used the following resources:
- Example code from the Data Visualization course
- Advanced storyboard examples on dimplejs.org
- The dimple GitHub wiki, https://github.com/PMSI-AlignAlytics/dimple/wiki/, to understand options and parameters