Have you ever wished for a few snippets of code that read any type of file from disk, display many graphs at the same time, or create and save an auto-sized histogram in one line of Python?
Of course, there are pandas, matplotlib, and seaborn, but don’t you end up writing the same code over and over, or searching Stack Overflow for the same snippets?
This is what utilmy offers:
A large collection of one-liner functions that increase daily productivity and reduce the code needed for everyday data science and fast output.
An example:
from utilmy import pd_read_file
df = pd_read_file(["path1/data*.parquet", "path2/datab_*.csv"], n_pool=4)
This reads and concatenates files from disk in parallel into a pandas dataframe.
To install: pip install utilmy
List of available functions (in PyCharm):
from utilmy import XXXXX
from utilmy.tabular import XXXX
Reading a File to Pandas Dataframe
from utilmy import pd_read_file
df = pd_read_file(["path1/data*.parquet", "path2/datab_*.csv"], n_pool=4)
The pd_read_file() function lets you read and concatenate files from the local disk. It reads the files in parallel to improve speed. The result is a pandas dataframe that can be used with the many other functions in the library that work on dataframes.
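To make the idea concrete, here is a minimal sketch of parallel reading and concatenation written with plain pandas and the standard library. It is not utilmy's implementation: the helper name read_csv_files_parallel and its n_workers parameter are hypothetical, and the sketch handles CSV files only.

```python
import glob
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def read_csv_files_parallel(patterns, n_workers=4):
    """Read every CSV file matching the glob patterns and concatenate them.

    Hypothetical helper for illustration, not part of utilmy.
    """
    # Expand each glob pattern into one sorted list of concrete file paths.
    paths = sorted(p for pattern in patterns for p in glob.glob(pattern))
    if not paths:
        return pd.DataFrame()
    # Threads keep the sketch simple; a process pool is another option.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        frames = list(pool.map(pd.read_csv, paths))
    # ignore_index gives the combined dataframe a fresh 0..n-1 index.
    return pd.concat(frames, ignore_index=True)
```

Calling read_csv_files_parallel(["path2/datab_*.csv"]) returns one dataframe with the rows of every matching file, or an empty dataframe when nothing matches.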
Saving a Dataframe to File
from utilmy import pd_to_file
The pd_to_file() function lets us easily save a pandas dataframe to the local disk. It automatically detects the file format.
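The article does not say how the format is detected. One common approach, shown below as an assumption, is to dispatch on the file extension. The helper save_dataframe is hypothetical and uses only standard pandas writers; it is not utilmy code.

```python
from pathlib import Path

import pandas as pd


def save_dataframe(df, path):
    """Save df to path, choosing the pandas writer from the file extension.

    Hypothetical helper; extension-based detection is an assumption.
    """
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        df.to_csv(path, index=False)
    elif suffix == ".json":
        df.to_json(path, orient="records")
    elif suffix in (".pkl", ".pickle"):
        df.to_pickle(path)
    else:
        raise ValueError(f"Unsupported file extension: {suffix!r}")
```

With this sketch, save_dataframe(df, "out.csv") writes a CSV file and save_dataframe(df, "out.pkl") writes a pickle, without the caller naming the format.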
Plotting Multiple Variables
from utilmy import pd_plot_multi
cols = ['T (degC)', 'Tpot (K)']
The pd_plot_multi() function lets us quickly plot multiple variables from a pandas dataframe. We only have to specify the dataframe and a list of the columns we want plotted. A matplotlib graph is then displayed.
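For comparison, this is roughly what plotting several columns looks like in plain matplotlib. It is only a sketch: the helper plot_columns is hypothetical, stacking the columns as subplots is an assumption about the layout, and the sample values are made up for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_columns(df, cols):
    """Draw each column in cols on its own stacked subplot with a shared x-axis.

    Hypothetical helper for illustration, not part of utilmy.
    """
    fig, axes = plt.subplots(len(cols), 1, sharex=True, squeeze=False)
    for ax, col in zip(axes[:, 0], cols):
        ax.plot(df.index, df[col])
        ax.set_ylabel(col)
    return fig


# Made-up sample values, only to have something to plot.
df = pd.DataFrame({"T (degC)": [-8.0, -8.4, -8.5], "Tpot (K)": [265.4, 265.0, 264.9]})
fig = plot_columns(df, ["T (degC)", "Tpot (K)"])
```

In a notebook you would drop the matplotlib.use("Agg") line so that the figure is shown inline.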
Applying Stratified Sampling to a Dataframe
from utilmy import pd_sample_strat
pd_sample_strat(df1, col, n)
The pd_sample_strat() function lets us apply stratified sampling on a specific dataframe column. For every unique value of the specified column, n random samples are selected and the rest are dropped.
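The same behaviour can be sketched in plain pandas. The helper sample_per_group below is hypothetical, and capping n at the group size (so that small groups are kept whole) is an assumption, since the article does not say what happens when a value has fewer than n rows.

```python
import pandas as pd


def sample_per_group(df, col, n, seed=0):
    """Keep at most n randomly chosen rows for each unique value of col.

    Hypothetical helper for illustration, not part of utilmy.
    """
    parts = [
        # Cap n at the group size so small groups do not raise an error.
        group.sample(n=min(n, len(group)), random_state=seed)
        for _, group in df.groupby(col)
    ]
    return pd.concat(parts)


# Toy data: 5 rows labelled "a", 3 labelled "b", 1 labelled "c".
df = pd.DataFrame({"label": ["a"] * 5 + ["b"] * 3 + ["c"], "x": range(9)})
sampled = sample_per_group(df, "label", n=2)
```

Here sampled holds two rows each for "a" and "b", and the single row for "c".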
Binning Numeric Values
from utilmy import pd_col_bins
pd_col_bins(df1, col, nbins=10)
Binning is the process of grouping continuous numeric values into intervals known as bins. This can be easily accomplished with the pd_col_bins() function. You simply specify a pandas dataframe, the numeric column you want to bin, and the number of bins.
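To show what binning does to a column, here is a small sketch built on pandas' own pd.cut with equal-width bins. Whether pd_col_bins uses equal-width or quantile bins is not stated in the article, so the equal-width choice is an assumption, and the helper add_bins is hypothetical.

```python
import pandas as pd


def add_bins(df, col, nbins=10):
    """Return a copy of df with an extra '<col>_bin' column of bin indices.

    Hypothetical helper; equal-width bins via pd.cut are an assumption.
    """
    out = df.copy()
    # labels=False returns the integer index of the bin instead of an interval.
    out[f"{col}_bin"] = pd.cut(out[col], bins=nbins, labels=False)
    return out


# Toy data for illustration.
df = pd.DataFrame({"age": [3, 17, 25, 40, 58, 71, 99]})
binned = add_bins(df, "age", nbins=3)
print(binned["age_bin"].tolist())  # [0, 0, 0, 1, 1, 2, 2]
```

With three equal-width bins over the range 3 to 99, the edges fall at 35 and 67, which is why 40 and 58 share bin 1.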
Converting a Date to Unix Timestamp
from utilmy import to_timeunix
Getting the Unix timestamp of a date can be useful in many cases. This can be accomplished easily with the to_timeunix() function. You simply pass the date as a string, and the Unix timestamp is returned.
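The conversion itself can be sketched with the standard library. The helper date_to_unix is hypothetical, and both the "%Y-%m-%d" input format and the choice to interpret the date as UTC are assumptions; the article does not say which formats or time zone to_timeunix() uses.

```python
from datetime import datetime, timezone


def date_to_unix(date_str, fmt="%Y-%m-%d"):
    """Parse date_str with fmt, treat it as UTC, and return whole seconds since the epoch.

    Hypothetical helper for illustration, not part of utilmy.
    """
    parsed = datetime.strptime(date_str, fmt).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


print(date_to_unix("2021-01-01"))  # 1609459200
```

Pinning the time zone explicitly avoids results that change with the machine's local settings.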
Getting OS Memory Information
from utilmy import os_memory
Knowing how much RAM is available is useful on many occasions. The os_memory() function prints the total RAM of the system, as well as the amount that is free and used at that time.
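As a rough illustration of where such numbers come from, the sketch below queries the operating system through os.sysconf. It is an assumption-laden example rather than utilmy's code: the helper memory_info_gb is hypothetical, the sysconf keys used are available on Linux but not on every platform, and "used" is simply total minus free.

```python
import os


def memory_info_gb():
    """Return total, free and used RAM in GB, or None where sysconf lacks the keys.

    Hypothetical helper for illustration, not part of utilmy.
    """
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        total = page_size * os.sysconf("SC_PHYS_PAGES")
        free = page_size * os.sysconf("SC_AVPHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        # os.sysconf does not exist on Windows, and some keys are missing elsewhere.
        return None
    gb = 1024 ** 3
    return {"total": total / gb, "free": free / gb, "used": (total - free) / gb}


print(memory_info_gb())
```

On platforms where the lookup fails, the function returns None instead of raising.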
Getting the Number of CPU Cores
from utilmy import os_cpu
The os_cpu() function prints the number of CPU cores available. This is useful when you are working on a remote machine and don’t know how many cores it has. You can also use it to set the number of cores for functions that support parallel processing.
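The second use case, sizing a worker pool from the core count, looks like this with the standard library alone. The sketch uses os.cpu_count() rather than os_cpu(), because the article only says that os_cpu() prints the count and does not show what it returns.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# os.cpu_count() can return None when the count cannot be determined.
n_cores = os.cpu_count() or 1

# Size the worker pool from the detected core count.
with ThreadPoolExecutor(max_workers=n_cores) as pool:
    squares = list(pool.map(lambda x: x * x, range(5)))

print(n_cores, squares)
```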
Getting the Working Directory in Unix format
from utilmy import os_getcwd
The os_getcwd() function returns the current working directory in Unix format, with “/” as the separator (Windows paths are converted to Unix style).
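The conversion can be sketched with pathlib from the standard library. The helper cwd_unix is hypothetical, and the Windows path in the second part is a made-up example used only to show the separator change.

```python
from pathlib import Path, PureWindowsPath


def cwd_unix():
    """Return the current working directory with forward slashes on any OS.

    Hypothetical helper for illustration, not part of utilmy.
    """
    return Path.cwd().as_posix()


print(cwd_unix())

# The same conversion applied to a made-up Windows path.
print(PureWindowsPath(r"C:\Users\me\project").as_posix())  # C:/Users/me/project
```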
Getting the Intersection of Two Lists
from utilmy import np_list_intersection
np_list_intersection([1, 2, 3, 4], [3, 4, 5, 6])
The intersection of two lists contains the elements that appear in both of them. We can compute it easily with the np_list_intersection() function, which takes two lists as parameters. In this example, the returned list will be [3, 4], as those are the two common elements.
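For readers curious about the logic, here is a plain-Python sketch of a list intersection. The helper list_intersection is hypothetical; keeping the order of the first list and dropping duplicates are design choices of this sketch, not documented behaviour of np_list_intersection().

```python
def list_intersection(first, second):
    """Return the items of first that also appear in second, in first's order.

    Hypothetical helper for illustration, not part of utilmy.
    """
    lookup = set(second)  # set membership tests are fast
    seen = set()
    result = []
    for item in first:
        # Keep each common item once, in the order it appears in first.
        if item in lookup and item not in seen:
            seen.add(item)
            result.append(item)
    return result


print(list_intersection([1, 2, 3, 4], [3, 4, 5, 6]))  # [3, 4]
```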