Loose tool notes
Since I last tried it a few years ago, Google Translate has gained the option to download a PDF of the translation of a PDF (preserving the page numbers). Machine translation also seems a lot better than it used to be: the range of sources that are easily understandable just keeps expanding.
Jittering overlapping values for an Altair graph
ggplot2 in R has a better set of functions (e.g. geom_jitter()) for slightly offsetting overlapping points, so you can see that a lot of points are at 0,0.
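In its simplest form, jittering just adds a little uniform random noise to every point, which is roughly the idea behind geom_jitter(). Here is a minimal one-shot sketch of that idea in pandas/NumPy. The helper name simple_jitter and its amount/seed parameters are my own, not from any library. Unlike the iterative approach below, it makes no attempt to guarantee that points end up apart.

```python
from typing import List

import numpy as np
import pandas as pd


def simple_jitter(
    df: pd.DataFrame, cols: List[str], amount: float = 0.1, seed: int = 0
) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with uniform noise in
    [-amount, amount) added to each of cols. One pass only, with no
    check that overlapping points actually end up separated."""
    rng = np.random.default_rng(seed)
    out = df.copy()  # leave the caller's DataFrame untouched
    noise = rng.uniform(-amount, amount, size=(len(out), len(cols)))
    # cast to float so integer columns can take fractional offsets
    out[cols] = out[cols].astype(float) + noise
    return out


# five points stacked on top of each other at 0,0
points = pd.DataFrame({"x": [0] * 5, "y": [0] * 5})
jittered = simple_jitter(points, ["x", "y"], amount=0.1)
print(jittered)
```

Because every point moves by an independent random amount, two points can still land close together. That is the gap the iterative version fills.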
Altair in Python doesn't have an equivalent, but I found this Stack Overflow answer (https://stackoverflow.com/a/58772101) that did most of the needed bits. I've adjusted it so the random offsets can be negative, and so the process repeats until no two points are within the threshold distance of each other. I wasn't sure how some of the NumPy bits worked, so I've made some of the comments more explicit.
```python
from typing import List

import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist


def jitter_df(
    df: pd.DataFrame, cols: List[str], threshold: float = 0.2, jitter: float = 0.1
) -> pd.DataFrame:
    """
    Stops overlap in plotted graphs by moving apart overlapped values in specified cols.
    extends answer from https://stackoverflow.com/a/58772101
    """
    # work on a float copy so the caller's DataFrame isn't modified
    # and integer columns can take fractional offsets
    df = df.copy()
    df[cols] = df[cols].astype(float)
    n = len(df)
    # pdist returns each pair once (A,B but not B,A), in the same order
    # as the upper-triangle indices, so i[k] is the first point of pair k
    i, _ = np.triu_indices(n, 1)
    while True:
        # calculate condensed distance matrix for specified columns
        p = pdist(df[cols])
        # initialise a mask of False, one entry per row
        too_close = np.zeros(n, bool)
        # in-place operation:
        # for each pair, if its distance (p) is at or below threshold,
        # set the mask (too_close) to True at the pair's first index (i)
        np.logical_or.at(too_close, i, p <= threshold)
        overlap_count = too_close.sum()
        if overlap_count == 0:
            # we're done, escape
            return df
        # random offset either side of 0, in [-jitter/2, jitter/2)
        shape = (overlap_count, len(cols))
        rng = (np.random.rand(*shape) * jitter) - (jitter / 2)
        # apply offset to items that are too close
        df.loc[too_close, cols] += rng
```
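The NumPy bits in the loop are easier to follow on a tiny example. The three points and the 0.2 threshold here are made up for illustration. The example shows that pdist's condensed output lines up pair-for-pair with np.triu_indices, and that np.logical_or.at (an unbuffered in-place ufunc, so repeated indices in i are all applied) only flags the first point of each close pair.

```python
import numpy as np
from scipy.spatial.distance import pdist

# three illustrative points: A and B nearly on top of each other, C far away
points = np.array([[0.0, 0.0], [0.0, 0.1], [5.0, 5.0]])
threshold = 0.2

# condensed distance matrix: one entry per unordered pair,
# in the order (A,B), (A,C), (B,C)
p = pdist(points)

# row/column indices of the strict upper triangle of a 3x3 matrix,
# listed in the same order as pdist's condensed output
i, j = np.triu_indices(len(points), 1)
pairs = list(zip(i.tolist(), j.tolist()))
print(pairs)  # [(0, 1), (0, 2), (1, 2)]

# index 0 appears twice in i (pairs A,B and A,C); .at applies both,
# so the True from the close pair A,B is kept
too_close = np.zeros(len(points), bool)
np.logical_or.at(too_close, i, p <= threshold)
print(too_close)  # [ True False False]
```

Only A is flagged, not B, so on each pass of the loop just one point of an overlapping pair gets nudged. That is enough to separate them eventually, because the loop keeps going until nothing is flagged.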