Setting up Python for GNNs

It seems that in the years since my first hello.py I have forgotten the harrowing experience of setting up a Python environment. Compared to R + RStudio, Stata, or even Julia, Python installation seems unnecessarily complex. Here I’ll briefly talk about how to manage your Python environments and how to develop in Python effectively. I’m just grateful that the language of choice for data science wasn’t JavaScript.

Getting Around Your Computer with The Terminal

The terminal is a powerful and precise tool, and key to developing effective applications. Its power also makes it hard to learn, but you only need a few commands to get around your system.

pwd: short for print working directory, this prints your present working directory, i.e. where you are

ls [dir] [-a] [-l]: short for list, lists all the files in the target directory

by default, ls shows the files in your present working directory, meaning that ls and ls . are equivalent
if you want to see all the files (including hidden files) in your present directory, use ls -a
if you want details such as permissions, size, and modification time, use ls -l
ls can inspect a directory without traversing to it: ls Code lists the top-level files in the Code directory

tree [dir]: works the same way as ls but shows the entire tree of files within a directory, recursively. tree isn’t installed by default on macOS, but can be installed easily with Homebrew: brew install tree

Example:

tree
.
├── Apps
│   ├── Octave.md
│   ├── README.md
│   ├── Settings.md
│   ├── araxis-merge.jpg
│   ├── beyond-compare.png
│   ├── delta-walker.jpg
│   ├── filemerge.png
│   └── kaleidoscope.png
├── CONTRIBUTING.md
├── Cpp
│   └── README.md
├── Docker
│   └── README.md
├── Git
│   ├── README.md
│   └── gitignore.md
└── Go
    └── README.md

cd [dir]: short for change directory, changes your present working directory to whatever you specify
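
Since the rest of this post is about Python, here is a rough sketch of the same three moves (pwd, ls -a, cd) done from inside Python with the standard library. It is only meant to show that the ideas carry over, not to replace the terminal. The variable name start is my own.

```python
import os
from pathlib import Path

# pwd: where am I?
start = Path.cwd()
print(start)

# ls -a: list everything in the present working directory, hidden files
# included (unlike ls -a, the . and .. entries are not shown)
for entry in sorted(start.iterdir()):
    print(entry.name)

# cd ~: move to the home directory, check where we are, then come back
os.chdir(Path.home())
print(Path.cwd())
os.chdir(start)
```

The last line matters: changing directory inside a script affects everything that runs afterwards, so it is polite to return to where you started.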

Anaconda

Anaconda, believe it or not, makes installing Python (relatively) easy. It is compatible with most packages and does fancy magic to keep your environments flexible. Miniconda is a minimalist Anaconda distribution that comes with only the tools you need to manage an environment and nothing else.

Package Managers

Anaconda is compatible with both the pip and conda package managers. This means that you can install any package you can find on PyPI (the Python Package Index) or on conda-forge. For everyday use, pip install and conda install do the same job, although they pull from different package sources and resolve dependencies differently. I recommend using conda first and then falling back to pip if something goes wrong. In general, your first choice for installing software should be whatever is listed on the library’s website.
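
Whichever installer you used, it helps to be able to ask Python what it actually sees. Below is a minimal sketch using importlib.metadata from the standard library (Python 3.8+). The helper name installed_version is my own, and the package names in the loop are just examples. Packages installed by conda usually ship the same metadata as pip-installed ones, but I would treat that as the common case rather than a guarantee.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Example package names; swap in whatever you just installed
for name in ["pip", "torch"]:
    print(name, installed_version(name))
```

A None here means the package is not visible to the interpreter that ran the script, which is a hint that it was installed somewhere else or not at all.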

Environments

Python is an old language with lots of packages that need backwards compatibility. Sometimes two packages will have conflicting dependencies. For this reason, it is recommended that you set up different Python environments for different tasks. On my laptop I must have a dozen different environments.

When you open your terminal after installing Miniconda, you’ll see something like this:

(base) ~ $

This (base) is the name of the current Anaconda environment, and the ~ is your pwd. The tilde ~ represents your home directory. This is typically something like /Users/ayush/ on macOS.

To create a new conda environment, use the following syntax, replacing [name] with your environment name and 3.x with the Python version you want:

conda create --name [name] python=3.x

A good first exercise would be creating an environment for PyTorch called torch. Once the environment is created, we can activate it with the conda activate [env-name] command. After activation you should see something like this:

(torch) ~ $

Now you can install whatever packages you need here. If you run into a ModuleNotFoundError, it typically means the package isn’t installed in the active environment, which can be fixed with a quick pip install.
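
When a ModuleNotFoundError does show up, the first thing I’d check is which environment the code is really running in. Here is a small sketch of that check. The helper name try_import is hypothetical. It relies on the CONDA_DEFAULT_ENV environment variable, which conda sets when you activate an environment, and falls back to "unknown" if it is not there.

```python
import importlib
import os
import sys


def try_import(module_name):
    """Import a module, or report which environment it is missing from."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError:
        env = os.environ.get("CONDA_DEFAULT_ENV", "unknown")
        print(f"{module_name} is not installed in environment '{env}'")
        print(f"interpreter in use: {sys.executable}")
        return None


# torch is only an example; use the module named in your error message
try_import("torch")
```

If the environment printed is base rather than torch, the fix is probably conda activate torch, not another install.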

Developing

Writing Python is the easy part of using it. If you’d like to continue using Spyder, I’d recommend installing it in your environment and launching it from the terminal. If you need access to your terminal again, it’s easy enough to open up another window.

Spyder is great for fast development and iteration, but good Python code is typically laid out in modules and packages. Here I’ll walk through creating your own directory structure and running Python scripts from the command line.

Let . be your present working directory, the root of your repository. We never want to leave this directory: every action we take should be run from here. Assume we have a Python script ./src/hello.py. Instead of cd-ing into src and running it from there, we run it from the root (.).

We run Python code with the python command:

(torch) ~ $ python ./src/hello.py
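
One reason to stay put, sketched below: a relative path inside a script is resolved against the directory you launched python from, not against the folder the script lives in. The folder name data is hypothetical and does not need to exist for this to run.

```python
import os

# The directory python was launched from (what pwd would print)
launch_dir = os.getcwd()

# A relative path is completed using that directory
data_dir = os.path.abspath("data")

print("launched from:", launch_dir)
print("data resolves to:", data_dir)
```

Run the same script from two different directories and the second line changes, which is why always launching from the repository root keeps things predictable.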

Assume we want to write some functions in one file and use them elsewhere.

tree .

.
└── src
    ├── hello.py
    └── hello_from_over_here.py

These are the contents of hello_from_over_here.py:

def hi_tom(): 
    print("Hi Tom! All the way from over here!")

To call this function from hello.py, we import it:

from hello_from_over_here import hi_tom

print("hello")
hi_tom()

When we run python ./src/hello.py we get the following output

hello
Hi Tom! All the way from over here!
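
Why does that import work without spelling out a path? When you launch a script directly, Python puts the script’s own folder at the front of sys.path, the list of places it searches for imports. A quick way to see this for yourself is to drop something like the sketch below at the top of hello.py. What it prints will depend on your machine.

```python
import sys

# sys.path is the list of folders Python searches when it sees an import.
# When a script is run directly, the first entry is the script's own folder.
search_path = list(sys.path)

print("first place searched:", search_path[0])
for folder in search_path[1:]:
    print("then:", folder)
```

Since hello_from_over_here.py sits in that first folder (src), the import finds it even though we ran the command from the repository root.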

This is great, but sometimes we need more separation. Each file is a module. We can pack a bunch of modules into a package to make them easier to work with. Packages are just folders of modules. The difference between a package and a normal folder on your hard drive is that a package has an __init__.py file in it. This file can be empty, but it tells Python to treat your folder as a package. Here’s an example of a package structure.

.
└── src
    ├── hello.py
    └── package_more_like_packing
        ├── __init__.py
        └── hello_from_over_here.py

Because hi_tom is now in a package, we have to change our import:

from package_more_like_packing.hello_from_over_here import hi_tom

hi_tom()
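
If you want to try the package layout without touching your own files, here is a rough sketch that builds the same structure in a temporary folder and runs hello.py from its root, just like python ./src/hello.py. The function name build_and_run is my own. Everything it creates is deleted when it finishes.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def build_and_run():
    """Create the package layout in a temporary folder and run hello.py."""
    with tempfile.TemporaryDirectory() as root:
        src = Path(root) / "src"
        package = src / "package_more_like_packing"
        package.mkdir(parents=True)

        # An empty __init__.py marks the folder as a package
        (package / "__init__.py").write_text("")
        (package / "hello_from_over_here.py").write_text(
            "def hi_tom():\n"
            '    print("Hi Tom! All the way from over here!")\n'
        )
        (src / "hello.py").write_text(
            "from package_more_like_packing.hello_from_over_here import hi_tom\n"
            "\n"
            "hi_tom()\n"
        )

        # Same as running `python ./src/hello.py` from the temporary root
        result = subprocess.run(
            [sys.executable, "./src/hello.py"],
            cwd=root,
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout


output = build_and_run()
print(output)
```

Using sys.executable means the script is run with the same interpreter, and so the same environment, that you are already in.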