"A map by itself requires little explanation, but once data are superimposed, readers will probably need labels on the maps, and legends explaining encodings like the color of markers." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)
"A recurring theme in machine learning is combining predictions across multiple models. There are techniques called bagging and boosting which seek to tweak the data and fit many estimates to it. Averaging across these can give a better prediction than any one model on its own. But here a serious problem arises: it is then very hard to explain what the model is (often referred to as a 'black box'). It is now a mixture of many, perhaps a thousand or more, models." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)
"Any fool can fit a statistical model, given the data and some software. The real challenge is to decide whether it actually fits the data adequately. It might be the best that can be obtained, but still not good enough to use." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)
"Apart from the technical challenge of working with the data itself, visualization in big data is different because showing the individual observations is just not an option. But visualization is essential here: for analysis to work well, we have to be assured that patterns and errors in the data have been spotted and understood. That is only possible by visualization with big data, because nobody can look over the data in a table or spreadsheet." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)
"As a first principle, any visualization should convey its information quickly and easily, and with minimal scope for misunderstanding. Unnecessary visual clutter makes more work for the reader’s brain to do, slows down the understanding (at which point they may give up) and may even allow some incorrect interpretations to creep in." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)
"Cross-validation is a useful tool for finding optimal predictive models, and it also works well in visualization. The concept is simple: split the data at random into a 'training' and a 'test' set, fit the model to the training data, then see how well it predicts the test data. As the model gets more complex, it will always fit the training data better and better. It will also start off getting better results on the test data, but there comes a point where the test data predictions start going wrong." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)
"Dashboards are collections of several linked visualizations all in one place. The idea is very popular as part of business intelligence: having current data on activity summarized and presented all inone place. One danger of cramming a lot of disparate information into one place is that you will quickly hit information overload. Interactivity and small multiples are definitely worth considering as ways of simplifying the information a reader has to digest in a dashboard. As with so many other visualizations, layering the detail for different readers is valuable." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)
"Decision trees show the breakdown of the data by one variable then another in a very intuitive way, though they are generally just diagrams that don’t actually encode data visually." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)
"Estimates based on data are often uncertain. If the data were intended to tell us something about a wider population (like a poll of voting intentions before an election), or about the future, then we need to acknowledge that uncertainty. This is a double challenge for data visualization: it has to be calculated in some meaningful way and then shown on top of the data or statistics without making it all too cluttered." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)
"One very common problem in data visualization is that encoding numerical variables to area is incredibly popular, but readers can’t translate it back very well." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)
"The relevance to data visualization is that we are always conveying a message to some extent, and in the case of associations between variables, that message is sometimes a step removed from the data itself. If you are making visualizations, be careful not to impose your own interpretation too much when showing associations. If you are reading them, don’t assume that the message accompanying the data is as sound and scientifically based as the data themselves." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)
"The term 'infographics' is used for eye-catching diagrams which get a simple message across. They are very popular in advertising and can convey an impression of scientific, reliable information, but they are not the same thing as data visualization. An infographic will typically only convey a few numbers, and not use visual presentations to allow the reader to make comparisons of their own." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)
"There is often no one 'best' visualization, because it depends on context, what your audience already knows, how numerate or scientifically trained they are, what formats and conventions are regarded as standard in the particular field you’re working in, the medium you can use, and so on. It’s also partly scientific and partly artistic, so you get to express your own design style in it, which is what makes it so fascinating." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)
"With skewed data, quantiles will reflect the skew, while
adding standard deviations assumes symmetry in the distribution and can be
misleading."
"Random forests are essentially an ensemble of trees. They use many short trees, fitted to multiple samples of the data, and the predictions are averaged for each observation. This helps to get around a problem that trees, and many other machine learning techniques, are not guaranteed to find optimal models, in the way that linear regression is. They do a very challenging job of fitting non-linear predictions over many variables, even sometimes when there are more variables than there are observations. To do that, they have to employ 'greedy algorithms', which find a reasonably good model but not necessarily the very best model possible." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)
No comments:
Post a Comment