Posts

Showing posts from January, 2019

Random Thoughts For the Day

“The plural of anecdote is not data.”
Data Science and the Art of Persuasion: an article by Scott Berinato, author of Good Charts, from the January–February 2019 issue.
Did you know Florence Nightingale created Coxcomb charts?
A story you probably heard in your Stats 101 class: Student's t-test and Guinness.

UNIX Quick Hit

I was doing some work on the command line and thought it would be helpful to list some useful Unix tidbits.

List files in human-readable format

$ ls -lhS

-l     use a long listing format
-h     with -l, print sizes in human-readable format (e.g., 1K 234M 2G)
-S     sort by file size, largest first

You can use man ls to see all possible options.

FIND

Recursively find all files with the extension .pyc

$ find <directory> -type f -name "*.pyc"

For example, to find all .pyc files in your current directory and all of its subdirectories, type

$ find . -type f -name "*.pyc"

How to Create an SSH Shortcut

If you find yourself SSHing to the same host repeatedly, I recommend creating a shortcut for this command in the config file in your .ssh directory.

$ cd ~/.ssh
$ vim config

From here, you can create shortcuts. You can specify the hostname, username, port, and private key. For a ful...
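For scripts where shelling out is not convenient, the ls and find commands above can be approximated in Python. This is a minimal sketch using only the standard library; the function names (find_by_extension, human_size, list_by_size) are made up for illustration, and unlike ls -lhS the listing only covers regular files and rounds sizes in its own way.

```python
from pathlib import Path


def find_by_extension(directory, pattern="*.pyc"):
    """Recursively collect regular files matching pattern (like find -type f -name)."""
    return sorted(p for p in Path(directory).rglob(pattern) if p.is_file())


def human_size(num_bytes):
    """Format a byte count in powers of 1024, roughly in the spirit of ls -h."""
    size = float(num_bytes)
    for unit in ("B", "K", "M", "G", "T"):
        if size < 1024 or unit == "T":
            return f"{size:.0f}B" if unit == "B" else f"{size:.1f}{unit}"
        size /= 1024


def list_by_size(directory):
    """Return (name, readable size) pairs for files in directory, largest first."""
    files = [p for p in Path(directory).iterdir() if p.is_file()]
    files.sort(key=lambda p: p.stat().st_size, reverse=True)
    return [(p.name, human_size(p.stat().st_size)) for p in files]


if __name__ == "__main__":
    # Equivalent in spirit to: find . -type f -name "*.pyc"
    for path in find_by_extension("."):
        print(path)
    # Equivalent in spirit to: ls -lhS (files only)
    for name, size in list_by_size("."):
        print(f"{size:>8}  {name}")
```

Returning lists instead of printing directly keeps the helpers easy to reuse, for example to delete stale .pyc files after reviewing what was found.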

WINDOW functions and LAG()

Window functions

This is an example taken straight from the PostgreSQL documentation.

So what is a window function, and why would you use it? Window functions allow you to perform aggregations over groups of rows while keeping the rows separate. Let's say you want to know the average salary by department and use that to find each employee's difference from the average. The average is not stored in your employee salary table (empsalary), so you need to calculate it. You can use a window function with PARTITION BY to create a "window" around each department; the aggregate function is then applied only within that window.

SELECT depname, empno, salary,
       avg(salary) OVER (PARTITION BY depname)
FROM empsalary;

Returns

  depname  | empno | salary |          avg
-----------+-------+--------+-----------------------
 develop   |    11 |   5200 | 5020.0000000000000000
 develop   |     7 |   4200 | 5020.0000000000000000
 develop   |     9 |   4500 | 5020.000000...
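To try the idea without a PostgreSQL server, here is a minimal sketch that runs the same kind of query in an in-memory SQLite database from Python and adds the difference-from-average column described above. It assumes the bundled SQLite is version 3.25 or newer (the first release with window functions). The sample rows are illustrative data modeled on the empsalary table, and the names ROWS, QUERY, and salary_vs_department_average are made up for this example.

```python
import sqlite3

# Illustrative sample rows (depname, empno, salary), modeled on empsalary.
ROWS = [
    ("develop", 11, 5200),
    ("develop", 7, 4200),
    ("develop", 9, 4500),
    ("develop", 8, 6000),
    ("develop", 10, 5200),
    ("personnel", 5, 3500),
    ("personnel", 2, 3900),
]

# The OVER (PARTITION BY depname) clause computes the average per department
# while keeping one output row per employee.
QUERY = """
SELECT depname, empno, salary,
       avg(salary) OVER (PARTITION BY depname) AS dept_avg,
       salary - avg(salary) OVER (PARTITION BY depname) AS diff_from_avg
FROM empsalary
ORDER BY depname, empno;
"""


def salary_vs_department_average(rows=ROWS):
    """Load rows into an in-memory table and run the window-function query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE empsalary (depname TEXT, empno INTEGER, salary INTEGER)"
        )
        conn.executemany("INSERT INTO empsalary VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in salary_vs_department_average():
        print(row)
```

A plain GROUP BY would collapse each department to a single row; the window version keeps every employee row, which is what makes the per-employee difference possible in one query.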

New books and podcasts!

New Books

Just got Storytelling with Data: A Data Visualization Guide for Business Professionals by Cole Nussbaumer Knaflic in the mail.

The first line of the introduction is a quote from Yale Professor Emeritus Edward Tufte: "Power corrupts. PowerPoint corrupts absolutely." (Wired magazine, 2003)

This book is focused on everything that goes into conveying information inside your organization. It can be difficult working across teams. I have found you really need to know your audience to create the most effective visualization. Many times I have created what I thought was a great visualization with an obvious trend or insight, only to get the response "What is this telling me?" or "I just want to know xyz." Similarly, there are times when I present a simple graph that spurs a series of questions that gets to what the stakeholder's REAL question was all along. Also, that quote from Tufte makes me want...

Chart Types & Styles

What is your process when you set out to make a data visualization? Do you sketch what you are trying to show or discover? If you use Tableau, do you just start throwing variables into the frame? Below is a useful thought starter as you plan your data viz.

Chart Suggestions: A Thought Starter

Visualizing most common business data only requires a two-dimensional representation. Adding a third variable can be confusing for people who don't work with data every day. So let's keep it simple and go through some of the steps in deciding which visualization to use.

Three important things to consider are:

Number of variables
    If you have more than four, you may be in for a cluttered, unclear chart.
Type of variables
    Numeric: discrete or continuous
    Categorical: related or unrelated categories
Association you are trying to show
    Relationship: multiple variables, dependent or independent
    Comparison: difference or similarity
    Trends over...

New Year's Resolutions

It's that time of year again. Time to make resolutions. Below are my data resolutions for 2019!

Read More
    DataViz books
    Programming books
    Business books
The first book on my to-read list is Fluent Python. It takes you in depth into core language features and libraries. This book also gets into the weeds on some Python 3 features I have not used yet.

Get Disciplined
    Buy some index cards
    Some more tips from Reddit
Creating good habits is one of the harder things to do.

Exercise More
    Daily (code) workouts
While managing data and working with multiple teams, it is easy to let coding and statistics skills slip. Daily practice is one way to stay sharp.