Posts

Showing posts from September, 2018

Uni-WHAT!?

Unicode errors in python 2.7 Ever seen this at the top of a python file? import sys reload(sys) sys.setdefaultencoding('utf8') If you see this it means the person who wrote the python code received a unicode error, didn't know how to fix it properly, then Google'd the error and found this quick solution. Here's a good stack overflow post on why not to use setdefaultencoding() The short answer is, in Python 2 the default encoding is ASCII. When it encounters other encodings it will raise errors. When you change the default encoding you can't be sure whats going out of your code is what came in. Also, when you reload a module, you actually get two copies of the module in your runtime which can cause some weird issues. So how to deal with different encodings ? use decode() and deal with the text as it comes in. Use var.decode(encoding) to decode a byte string as text. Then use var.encode(encoding) to encode a text string as bytes. Better yet, Py...

SQL Quick Hits

SQL tidbits LIKE '_' You've probably user LIKE and % to search partial strings, but did you know underscore _ can match exactly one character. id name 1 first 2 second 3 third >>> Select * from employee where name like 'fi%'; Returns the same result as: >>> Select * from employee where name like 'firs_'; UNION vs UNION ALL While you may not have thought about it, UNION is essentially a select distinct on the second table you are appending. UNION ALL won't remove duplicate rows, it just pulls all rows from all tables fitting your query specifics and combines them into a table. Creating table from csv file in PostgreSQL Sometime you want to quickly populate a table from a csv file. This can be done from the command line. After you create your table in the database, use the following command to upload your data. $ psql -h <serverhostname>  -d <database> -U <user> -c ...

Upcasting in python

Upcasting Link to official Docs Have you ever run into a scenario where you have set your column type to int but when you go to display it either in a report or visualization it comes out a float? This happens because of something called upcasting. "Types can potentially be upcasted when combined with other types, meaning they are promoted from the current type (e.g. int to float)." Lets start with a DataFrame of 1 column and 8 rows where the values are random numbers from the normal distribution. >>> import pandas as pd >>>   import numpy as np >>>   df1 = pd.DataFrame(np.random.randn(8, 1), columns=['A'], dtype='float32') >>>   df1    A 0  0.406792 1  0.810450 2  1.161985 3 -1.402411 4  1.385434 5 -1.091746 6  0.018586 7 -0.606741 Now lets create a 3x8 DataFrame >>>   df2 = pd.DataFrame( dict(                  A =...

Merge, join, and concatenate in Pandas

Merge, Join, Concat in Pandas First thing, go skim the official Pandas docs . It is pretty straightforward instructions on how to perform merge, join, concat using pandas. It has everything you need to know, but it is written very dryly and covers a lot of cases I rarely come across in my day to day, so I'll try to summarize quickly. I'll use their DataFrame in my notes below. df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],                     'B': ['B0', 'B1', 'B2', 'B3'],                     'C': ['C0', 'C1', 'C2', 'C3'],                     'D': ['D0', 'D1', 'D2', 'D3']},                     index=[0, 1, 2, 3])       df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],                     'B': ['B4', 'B5', '...

First Post

Image
After years of keeping (and sometimes losing) notes on my laptop I decided to post them here. I recently started doing a complete documentation update for the analytics department at my job and I was surprised how it helped me refactor parts of my workflow as well as go back and update some older methods with improved ones. So I thought, why not post those here as a reference for myself, and hopefully some of these tips others will find helpful as well. Some posts will be quick tips. Some will be longer tutorials. It will span Python, SQL, testing, reporting, visualizations, and more.