It seems that in the years since my first hello.py, I have forgotten the harrowing
experience of setting up a Python environment. Compared to R + RStudio, Stata, or even
Julia, Python installation seems unnecessarily complex. Here I’ll briefly talk about how
to manage your Python environments and how to develop Python effectively. I’m just
grateful that the language of choice for data science wasn’t JavaScript.
Getting Around Your Computer with The Terminal
The terminal is a powerful and precise tool, and key to developing effective applications. Its power also makes it hard to learn, but you only need a few commands to get around your system.
pwd
: short for print working directory; this tells you where you are
ls [dir] [-a] [-l]
: short for list, lists the files in the target directory
- by default, ls shows the files in your present working directory, meaning that ls and ls . are equivalent
- to see all the files (including hidden files) in your present directory, use ls -a
- ls -l shows a long listing with details such as permissions, size, and modification time
- ls can inspect a directory without traversing to it: ls Code lists the top-level files in the Code directory
tree [dir]
: works the same way as ls but shows the entire tree of files recursively within
a directory. tree isn’t installed by default on macOS, but can be easily installed
with Homebrew: brew install tree
Example:
tree
.
├── Apps
│   ├── Octave.md
│   ├── README.md
│   ├── Settings.md
│   ├── araxis-merge.jpg
│   ├── beyond-compare.png
│   ├── delta-walker.jpg
│   ├── filemerge.png
│   └── kaleidoscope.png
├── CONTRIBUTING.md
├── Cpp
│   └── README.md
├── Docker
│   └── README.md
├── Git
│   ├── README.md
│   └── gitignore.md
└── Go
    └── README.md
cd [dir]
: short for change directory, changes your present working directory to whatever
directory you specify
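If you already think in Python, it may help to see the same commands expressed with the standard library. The sketch below is a rough approximation, not how the shell tools are implemented: list_dir, tree_lines, and the throwaway demo folder are hypothetical names invented for this example, and sorting and formatting details differ from the real ls and tree.

```python
import os
import tempfile
from pathlib import Path


def list_dir(directory=".", show_hidden=False):
    """Rough equivalent of `ls [dir]`, or `ls -a [dir]` when show_hidden=True."""
    names = sorted(os.listdir(directory))
    if not show_hidden:
        # Like ls, skip names that start with a dot unless asked for them.
        # (Unlike `ls -a`, os.listdir never reports "." and "..".)
        names = [name for name in names if not name.startswith(".")]
    return names


def tree_lines(directory, prefix=""):
    """Rough equivalent of `tree [dir]`: one output line per visible file or folder."""
    entries = sorted(
        (p for p in Path(directory).iterdir() if not p.name.startswith(".")),
        key=lambda p: p.name,
    )
    lines = []
    for index, entry in enumerate(entries):
        is_last = index == len(entries) - 1
        lines.append(prefix + ("└── " if is_last else "├── ") + entry.name)
        if entry.is_dir():
            # Children of the last entry are indented with spaces, others with a bar.
            lines.extend(tree_lines(entry, prefix + ("    " if is_last else "│   ")))
    return lines


# Build a small throwaway directory (hypothetical names) to explore.
demo_root = Path(tempfile.mkdtemp())
(demo_root / "Code").mkdir()
(demo_root / "Code" / "hello.py").write_text("print('hello')\n")
(demo_root / ".hidden").write_text("")

start_dir = os.getcwd()              # pwd
os.chdir(demo_root)                  # cd <demo_root>
print(os.getcwd())                   # pwd again, now inside the demo folder
print(list_dir())                    # ls
print(list_dir(show_hidden=True))    # ls -a
print(list_dir("Code"))              # ls Code
print("\n".join(tree_lines(".")))    # tree
os.chdir(start_dir)                  # cd back to where we started
```

Note that ls Code and list_dir("Code") both inspect a folder without changing the present working directory, which is the habit the rest of this post relies on.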
Anaconda
Anaconda, believe it or not, makes installing Python (relatively) easy. It is compatible with most packages and does some fancy magic to keep your environments flexible. Miniconda is a minimal Anaconda distribution that comes with only the tools you need to manage an environment and nothing else.
Package Managers
Anaconda is compatible with both the pip and conda package managers. This means that
you can install any package you can find on PyPI (the Python Package Index) or on
conda-forge. For everyday use, pip install and conda install do basically the same job,
though they pull from different repositories. I recommend using conda first and then
falling back to pip if something goes wrong.
In general, your first choice for installing software should be whatever is listed
on the library’s website.
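Whichever installer you used, you can ask Python itself whether a package ended up in the current environment. This is a minimal sketch using the standard library's importlib.metadata (Python 3.8+). The helper name installed_version and the package names in the loop are just examples, and it assumes the package ships standard metadata, which is usually (not always) true for conda-installed Python packages.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Example package names; swap in whatever you just installed.
for name in ["pip", "numpy", "definitely-not-a-real-package-xyz123"]:
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```

If a package you just installed shows up as not installed, the likely cause is that it went into a different environment than the one running this code.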
Environments
Python is an old language with lots of packages that need backwards compatibility. Sometimes two packages will have conflicting dependencies. For this reason, it is recommended that you set up different Python environments for different tasks. On my laptop I must have a dozen different environments.
When you open your terminal after installing Miniconda, you’ll see something like this:
(base) ~ $
Here (base) is the name of the current Anaconda environment, and ~ is your present working directory.
The tilde ~ represents your home directory. This is typically something like /Users/ayush/ on macOS.
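The tilde is a shell shorthand, so Python does not expand it unless you ask. A small sketch of going in both directions with the standard library follows. The helper with_tilde is a hypothetical name that mimics how the prompt abbreviates paths, and the Code folder is only an example.

```python
import os
from pathlib import Path


def with_tilde(path):
    """Abbreviate a path under the home directory the way the prompt does."""
    path = Path(path)
    try:
        relative = path.relative_to(Path.home())
    except ValueError:
        # The path is not inside the home directory, so leave it alone.
        return str(path)
    if relative == Path("."):
        return "~"
    return (Path("~") / relative).as_posix()


print(Path.home())                     # your home directory
print(os.path.expanduser("~/Code"))    # "~/Code" with the tilde expanded
print(Path("~/Code").expanduser())     # same idea with pathlib
print(with_tilde(Path.home() / "Code"))
```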
To create a new conda environment, use the following syntax:
conda create --name [name] python=3.x
A good first exercise would be creating an environment for PyTorch called torch. Once
the environment is created, we can activate it using the conda activate [env-name] command.
After activation you should see something like this:
(torch) ~ $
Now you can install whatever packages you need here. If you run into a ModuleNotFoundError,
it is typically an installation error that can be fixed with a quick pip install of the
missing package.
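When a ModuleNotFoundError shows up for something you are sure you installed, the usual culprit is that Python is running from a different environment than the one you installed into. Here is a hedged sketch for checking both things at once. describe_environment and try_import are hypothetical helper names, and the CONDA_DEFAULT_ENV variable is only present when conda has activated an environment in your shell.

```python
import importlib
import os
import sys


def describe_environment():
    """Collect a few facts about the interpreter that is running this code."""
    return {
        # Path of the python executable; inside an env it lives in that env's folder.
        "executable": sys.executable,
        # Root folder of the running Python installation or environment.
        "prefix": sys.prefix,
        # Set by conda on activation; None if conda has not activated anything.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


def try_import(module_name):
    """Import a module by name, returning None instead of raising if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as error:
        # error.name is the module that could not be found, which may be a
        # dependency of module_name rather than module_name itself.
        print(f"{error.name} is not installed in {sys.prefix}")
        return None


print(describe_environment())
try_import("torch")  # prints a hint if PyTorch is missing from this environment
```

If the prefix printed here is not the folder of the environment you activated, fix that first before reaching for pip install.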
Developing
Writing Python is the easy part of using it. If you’d like to continue using Spyder, I’d recommend installing it in your environment and launching it from the terminal. If you need access to your terminal again, it’s easy enough to open up another window.
Spyder is great for fast development and iteration, but good Python code is typically laid out in modules and packages. Here I’ll walk through creating your own directory structure and running Python scripts from the command line.
Let . be your present working directory/repository. We never want to leave this repository.
Every action we take should be from this directory. Assume we have a Python script
./src/hello.py. Instead of cd-ing to its directory and running it from there, we should
run it from the repository root (.).
We run Python code with the python command.
(torch) ~ $ python ./src/hello.py
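One practical reason to stay at the repository root is that relative paths in your code resolve against the present working directory, not against the folder that holds the script. The sketch below illustrates this under stated assumptions: it builds a throwaway repository named repo (a hypothetical name) containing a stand-in hello.py that only reports the two locations, then runs it from the root the same way as the command above.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Contents of a stand-in hello.py (hypothetical) that reports both locations.
SCRIPT = '''\
from pathlib import Path

print("working directory:", Path.cwd().name)
print("script folder:", Path(__file__).resolve().parent.name)
'''


def run_from_root():
    """Create repo/src/hello.py in a temp folder and run it from repo/."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "repo"
        (root / "src").mkdir(parents=True)
        (root / "src" / "hello.py").write_text(SCRIPT)
        result = subprocess.run(
            [sys.executable, "./src/hello.py"],  # same as: python ./src/hello.py
            cwd=root,                            # the present working directory
            capture_output=True,
            text=True,
            check=True,
        )
    return result.stdout


print(run_from_root())
# working directory: repo
# script folder: src
```

The working directory stays at the root even though the script lives in src, so a relative path like data/input.csv (hypothetical) would be looked up under the root, not under src.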
Assume we want to write some functions in one file and use them elsewhere.
tree .
.
└── src
    ├── hello.py
    └── hello_from_over_here.py
These are the contents of hello_from_over_here.py:
def hi_tom():
    print("Hi Tom! All the way from over here!")
To call this function from the hello.py file, we’ll import it:
from hello_from_over_here import hi_tom
print("hello")
hi_tom()
When we run python ./src/hello.py we get the following output:
hello
Hi Tom! All the way from over here!
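Why does the import work even though we ran the script from the root and not from src? When you run a script, Python puts the script's own folder at the front of its import search path (sys.path). The following sketch recreates the two files in a temporary folder and runs them to show this. The function name run_hello_module is hypothetical, and the last print line in hello.py is an addition for illustration that is not part of the original script.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# The two files from the layout above, keyed by their path under the root.
FILES = {
    "src/hello_from_over_here.py": (
        "def hi_tom():\n"
        '    print("Hi Tom! All the way from over here!")\n'
    ),
    "src/hello.py": (
        "import sys\n"
        "from pathlib import Path\n"
        "from hello_from_over_here import hi_tom\n"
        'print("hello")\n'
        "hi_tom()\n"
        "# Added for illustration: the first folder Python searches for imports.\n"
        'print("first import folder:", Path(sys.path[0]).name)\n'
    ),
}


def run_hello_module():
    """Write the files into a temp folder and run hello.py from its root."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for relative_path, contents in FILES.items():
            target = root / relative_path
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(contents)
        result = subprocess.run(
            [sys.executable, "./src/hello.py"],
            cwd=root,
            capture_output=True,
            text=True,
            check=True,
        )
    return result.stdout


print(run_hello_module())
# hello
# Hi Tom! All the way from over here!
# first import folder: src
```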
This is great, but sometimes we need more separation. Each file is a module. We can
pack a bunch of modules into a package to make them easier to work with. Packages are
just folders of modules. The difference between a package and a normal folder on your
hard drive is that a package has an __init__.py file in it. This file can be empty, but it is
needed so Python knows to treat your folder as a package. Here’s an example of a package
structure.
.
└── src
    ├── hello.py
    └── package_more_like_packing
        ├── __init__.py
        └── hello_from_over_here.py
Because hi_tom is now in a package, we have to change our import:
from package_more_like_packing.hello_from_over_here import hi_tom
hi_tom()
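The __init__.py file does not have to stay empty. One optional pattern, which is my assumption and not something the layout above requires, is to re-export names from it so callers can use a shorter import. The sketch below builds the package in a temporary folder and runs a hello.py that uses both the full import and the shorter one. run_hello_package and the alias hi_tom_short are hypothetical names for this example.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

PACKAGE = "src/package_more_like_packing"

FILES = {
    # Optional re-export (an assumption for this sketch): makes hi_tom
    # importable straight from the package.
    f"{PACKAGE}/__init__.py": "from .hello_from_over_here import hi_tom\n",
    f"{PACKAGE}/hello_from_over_here.py": (
        "def hi_tom():\n"
        '    print("Hi Tom! All the way from over here!")\n'
    ),
    "src/hello.py": (
        # The full import from the text above.
        "from package_more_like_packing.hello_from_over_here import hi_tom\n"
        # The shorter import enabled by the re-export in __init__.py.
        "from package_more_like_packing import hi_tom as hi_tom_short\n"
        "hi_tom()\n"
        "hi_tom_short()\n"
    ),
}


def run_hello_package():
    """Write the package layout into a temp folder and run hello.py from its root."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for relative_path, contents in FILES.items():
            target = root / relative_path
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(contents)
        result = subprocess.run(
            [sys.executable, "./src/hello.py"],
            cwd=root,
            capture_output=True,
            text=True,
            check=True,
        )
    return result.stdout


print(run_hello_package())
# Hi Tom! All the way from over here!
# Hi Tom! All the way from over here!
```

Both imports reach the same function. Whether the shorter form is worth it depends on how many names you want the package to expose.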