Getting Started with Jupyter (IPython)

I have been working with IPython for a while but haven’t dug into what happened when it was separated into IPython and Jupyter.

As described on ipython.org:

IPython is a growing project, with increasingly language-agnostic components. IPython 3.x was the last monolithic release of IPython, containing the notebook server, qtconsole, etc. As of IPython 4.0, the language-agnostic parts of the project: the notebook format, message protocol, qtconsole, notebook web application, etc. have moved to new projects under the name Jupyter. IPython itself is focused on interactive Python, part of which is providing a Python kernel for Jupyter.

With Jupyter, IPython itself focuses on interactive Python, most visibly as the Python kernel that Jupyter user interfaces such as the Notebook, console, and qtconsole connect to.

Installation

To install Jupyter with IPython, run pip install jupyter for Python 2 or pip3 install jupyter for Python 3. To install Jupyter in a virtualenv, create and activate the virtualenv first so the terminal is using it, then run the same command.

With Anaconda, run conda install jupyter instead.
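To confirm that the packages landed in the interpreter you expect (a common snag with virtualenvs), a minimal sketch like the following can be run with that environment's python. The module names in DEFAULT_MODULES are my assumption about what the jupyter package and the Qt Console setup install; adjust the list as needed.

```python
import importlib.util
import sys

# Assumed top-level module names; PyQt5 is only needed for the Qt Console.
DEFAULT_MODULES = ("IPython", "ipykernel", "jupyter_core", "jupyter_console",
                   "notebook", "qtconsole", "PyQt5")


def check_modules(names=DEFAULT_MODULES):
    """Map each top-level module name to True if this interpreter can import it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Shows which interpreter (e.g. the active virtualenv's python) is being checked.
    print("Interpreter:", sys.executable)
    for name, found in check_modules().items():
        print("{:<16} {}".format(name, "found" if found else "MISSING"))
```

find_spec only locates the module without importing it, so the check is cheap and has no side effects.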

Using Jupyter in terminal console

Jupyter console uses the terminal as its UI to interact with the IPython kernel, similar to the original ipython shell.

Run the following to open a Jupyter console:

jupyter console

The complete documentation can be found at http://jupyter-console.readthedocs.org/en/latest/.

Using Jupyter in Notebook

Jupyter Notebook is a feature-rich, web-based user interface to the IPython kernel that can run interactive Python code, preserve inputs and outputs, work with documents, and more.

Try opening a Jupyter Notebook in your web browser by running:

jupyter notebook

The full Jupyter Notebook documentation is at https://jupyter-notebook.readthedocs.org/en/latest/notebook.html.

Using Jupyter in Qt Console

Jupyter qtconsole uses PyQt to provide a Qt-based user interface to the IPython kernel.

Because Qt and PyQt are not installed with Jupyter, they need to be installed separately.

Here are the detailed instructions for installing PyQt5 on Mac that I found here:

Requirements

  • Xcode
  • Python
  • Qt libraries
  • SIP
  • PyQt

Download

Installation

  • Install Xcode
  • Install the Command Line Tools (open Xcode > Preferences > Downloads)
  • Install the Qt libraries (qt-opensource-mac-x64-clang-5.*.dmg)
  • Install Python
  • Create a virtualenv
  • Unzip and compile SIP and PyQt

Here are the complete installation commands, run after downloading:

cd /var/tmp
cp ~/Downloads/PyQt-gpl-5.*.tar.gz .
cp ~/Downloads/sip-4.*.tar.gz .
tar xvf PyQt-gpl-5.*.tar.gz
tar xvf sip-4.*.tar.gz
cd sip-*/
python3 configure.py -d /path/to/virtualenv/site-packages --arch x86_64
make
sudo make install
sudo make clean
cd ../PyQt-gpl-5.*/
python3 configure.py --destdir /path/to/virtualenv/site-packages --qmake ~/Qt/5.*/clang_64/bin/qmake
make
sudo make install
sudo make clean
/path/to/virtualenv/bin/python -c "import PyQt5"

I got a "fatal error: 'qgeolocation.h' file not found" error while installing PyQt5, so I changed

python3 configure.py --destdir /path/to/virtualenv/site-packages --qmake ~/Qt/5.*/clang_64/bin/qmake

to

python3 configure.py --destdir /path/to/virtualenv/site-packages --qmake ~/Qt/5.*/clang_64/bin/qmake --disable=QtPositioning

and the problem was solved.

After these are successfully installed, you can open a Jupyter Qt console by running:

jupyter qtconsole

The full Jupyter QtConsole documentation is at https://jupyter.org/qtconsole/stable/

Summary

This post summarizes how to install Jupyter and IPython, and how to start using Jupyter in three different consoles:

  • Web-based Notebook
  • Terminal Console
  • Qt Console

It also shows how to install PyQt5 so that the Qt Console can be started.

After this initial setup and exploration, we can dig further into the rich Jupyter ecosystem, which can greatly help our Python workflow.


Install PostgreSQL and PostGIS on AWS Ubuntu 14.04 and Mac OS X

On Mac OSX

Thanks to Homebrew, installation is easy on Mac. Run these two commands to install both PostgreSQL and PostGIS:

$ brew install postgres
$ brew install postgis

It is done, how nice!

Start the service

Type this command to start the PostgreSQL service:

$ pg_ctl -D /usr/local/var/postgres start

Then check that the service is up and running:

$ export PGDATA=/usr/local/var/postgres
$ pg_ctl status

Make sure PostgreSQL starts when your Mac starts up

If you don’t want to run the start command above every time you start your Mac, do the following so that PostgreSQL starts automatically.

  1. Make sure the directory ~/Library/LaunchAgents exists (create it if it does not).

  2. Run ln -sfv /usr/local/opt/postgresql/*.plist ~/Library/LaunchAgents to create a symbolic link to the PostgreSQL plist file.

  3. Run launchctl load ~/Library/LaunchAgents/homebrew.mxcl.postgresql.plist to load the plist into launchd.

PostgreSQL should now start automatically every time you log in to your Mac.

On AWS Ubuntu

Log in remotely to your AWS Ubuntu instance, then run the following:

$ sudo apt-get update
$ sudo apt-get install -y postgresql postgresql-contrib

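After installation, you can create a database by running:

$ sudo -u postgres createdb DATABASE_NAME

If you want to create a specific user for a database, run:

$ sudo -u postgres createuser -P USER_NAME
$ sudo -u postgres createdb -O USER_NAME DATABASE_NAME

Then you can test the database connection by running psql -h localhost -U USER_NAME DATABASE_NAME (-h localhost connects over TCP, so password authentication is used instead of Ubuntu's default peer authentication for local socket connections).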

Install PostGIS

After PostgreSQL is installed and the connection is tested, we can install PostGIS.

In the commands below, 9.* is your PostgreSQL version and 2.* is the PostGIS version:

$ sudo apt-get install postgresql-9.*-postgis-2.*
$ sudo apt-get install postgresql-server-dev-9.*

Then create the PostGIS extension in your PostgreSQL database to enable GIS functions:

$ sudo -u postgres psql DATABASE_NAME

In the psql session, run the following to enable PostGIS and check that the extension was created:

db=# CREATE EXTENSION postgis;
db=# select postgis_version();

Note: if you connect as the user you created, make sure it has the superuser role, which is needed to create the extension. If you are using GeoDjango, migrating the database with Django's manage.py migrate command will create the extension automatically.

Install pgAdmin III and connect to the database

pgAdmin III is a comprehensive PostgreSQL database design and management tool.

Go to http://www.pgadmin.org/download/macosx.php to download and install it.

After installing it, you can create connections to your PostgreSQL and PostGIS databases.

Connecting to a local database is straightforward: provide the host (localhost), the port (default 5432), the database username, and the password (leave it empty if you have not set one).

If you want to connect to a remote database on an AWS instance, you need to change two configuration files on the server:

In /etc/postgresql/9.*/main/postgresql.conf, set listen_addresses='*'

In /etc/postgresql/9.*/main/pg_hba.conf, add the line host all all 0.0.0.0/0 md5

Because port 5432 (the PostgreSQL port) may not be open under the default security settings, you need to open it. On AWS, you can add an inbound rule for it to the instance's security group in the AWS console.

After these changes, restart PostgreSQL (sudo service postgresql restart) so the new settings take effect. You should then be able to connect to your remote database from your local machine.
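If the remote connection still fails, it helps to know whether the port is reachable at all before digging into PostgreSQL settings. Below is a minimal sketch using only Python's standard library; port_is_open is a hypothetical helper, and it only checks that a TCP connection can be opened, not that authentication will succeed.

```python
import socket


def port_is_open(host, port=5432, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within timeout seconds."""
    try:
        # create_connection resolves the host and tries each address it gets back.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


if __name__ == "__main__":
    # Replace with your instance's public DNS name or IP address (placeholder).
    print(port_is_open("your-instance-public-dns"))
```

A False result points at the security group, listen_addresses, or a stopped server; a True result followed by a failed login points at pg_hba.conf or the credentials.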


How I made my first contribution to Django

I have been working with Django for a few years and decided it was time to give something back to the project. I had never contributed to a project at this level, so I was not very confident at the beginning. Fortunately, there is clear documentation and there are great people in the community ready to help, so it went pretty smoothly. Here is a walkthrough of my whole process, from learning how to contribute to getting my first pull request closed.

Read how-tos

First I went to the Contributing to Django page, which summarizes how to make contributions.

Then I read through Advice for new contributors in the Django documentation to get some guidelines.

I signed the Contributor License Agreement linked from that page and sent it by email. I got a response pretty quickly, so I was ready to go.

Reading too many documents can be boring, so I found two tutorials (Writing your first patch to django, Working with Git and GitHub) to get a better idea of what the whole process looks like, and then got my hands dirty, learning by doing.

Get a Copy (fork) of Django Source Code

I then went to https://github.com/django/django and forked the repo into my own account, so that I had my own copy of the source code.

After forking, I cloned it onto my development machine using:

git clone git@github.com:github_name/django.git

Then I told my local repo where the original “upstream” remote is by running:

git remote add upstream git@github.com:django/django.git

It is a good habit to set up a virtualenv for every project you work on, so I created one called django using virtualenvwrapper:

mkvirtualenv --python=python3 django

Django itself should also be installed into the virtualenv, in editable (development) mode:

pip install -e /path/to/your/local/clone/django/

The initial preparation was done at this point; now I needed to find out what I could work on.

Find a ticket

Finding a ticket you are comfortable with is the first step. Since this was my first time, I looked at the tickets marked Easy pickings on the Easy-Pickings page.

The ticket I found was ticket #26179. It asked to drop the check that raised ValueError when None was assigned to a non-nullable foreign key field, because enforcing that constraint should be left to the database.
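The idea behind the ticket, leaving the null check to the database, can be shown with a toy analogue. This is not Django's ORM code: Book and make_db are hypothetical, and sqlite3 stands in for the database. Assigning None is accepted, and the NOT NULL constraint rejects it only when the row is written.

```python
import sqlite3


class Book:
    """Toy model object (hypothetical): assignment performs no null check."""

    def __init__(self, author_id=None):
        self.author_id = author_id  # assigning None is allowed here

    def save(self, conn):
        # The schema's NOT NULL constraint rejects a missing author on write.
        conn.execute("INSERT INTO book (author_id) VALUES (?)", (self.author_id,))


def make_db():
    """Create an in-memory database with a non-nullable author_id column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE book (id INTEGER PRIMARY KEY, "
        "author_id INTEGER NOT NULL REFERENCES author(id))"
    )
    conn.execute("INSERT INTO author (id) VALUES (1)")
    return conn


if __name__ == "__main__":
    conn = make_db()
    book = Book()
    book.author_id = None  # no ValueError at assignment time
    try:
        book.save(conn)
    except sqlite3.IntegrityError as exc:
        print("rejected by the database:", exc)
    book.author_id = 1
    book.save(conn)  # succeeds once a valid author is set
```

The design point is that the database already guarantees the constraint, so a second check at assignment time only duplicates it.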

It looked doable, so I logged in with my GitHub account and assigned the ticket to myself, which made me the owner of the ticket.

Then I created a branch ticket_26179 based on upstream/master, as specified in the Django documentation (fetching upstream first so that upstream/master exists locally):

git fetch upstream
git checkout -b ticket_26179 upstream/master

Run Tests Before Any Coding

Tests should be run before any coding to make sure you start from a clean state (on the ticket_26179 branch).

cd into django_directory/tests

then run PYTHONPATH=.. python runtests.py --settings=test_sqlite

It took a few minutes and the tests passed, so it was time for some coding.

Make the Changes Asked by the Ticket and Submit Changes

Before making fixes for the ticket, I created a small Django project to play with the current behavior that needed to change, just to get familiar with the code. A demo project also helps with troubleshooting and with testing the changes along the way.

After playing with it for a while, I made the changes requested in the ticket, confirmed them with the demo project, and then committed them to the branch:

git commit

In the editor window that opens for the commit message, I followed the commit message style from the docs, with a summary line and a description:

Fixed #26179 -- Remove null assignment check for non-nullable foreign key

A longer description of the commit.

Save and close the editor, and the commit is recorded in the git log.
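The summary-line format above is regular enough to check mechanically, for example before pushing. Here is a small sketch; parse_summary is a hypothetical helper (not part of Django or git), and the pattern only covers the "Fixed #<ticket> -- <summary>" shape used in this post.

```python
import re

# Matches summary lines shaped like "Fixed #26179 -- Remove null assignment check ...".
SUMMARY_RE = re.compile(r"^Fixed #(?P<ticket>\d+) -- (?P<summary>\S.*)$")


def parse_summary(message):
    """Return (ticket_number, summary) from the message's first line, or None if it doesn't match."""
    first_line = message.splitlines()[0] if message else ""
    match = SUMMARY_RE.match(first_line)
    if match is None:
        return None
    return int(match.group("ticket")), match.group("summary")


if __name__ == "__main__":
    msg = ("Fixed #26179 -- Remove null assignment check for non-nullable foreign key\n"
           "\n"
           "A longer description of the commit.")
    print(parse_summary(msg))
    # (26179, 'Remove null assignment check for non-nullable foreign key')
```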

All the changes I made are in this Pull Request

To create the pull request after the code changes, I first pushed the commit using:

git push origin ticket_26179

Then I went to my forked repo, found the branch, and clicked Compare & pull request and then Create pull request.

Don’t forget to run tests after code changes

I actually forgot to run the tests and only realized it after I had sent the pull request. A ticket saying no tests are needed does not mean you don’t need to run the test suite after your changes. I reran the tests and got three failures. Ouch.

I checked the error messages and located the failing test functions and lines. The tests expected a ValueError that was no longer raised, because my change dropped that behavior. So I removed the tests, thinking they were no longer needed.

Bottom line: run the tests before committing and submitting any code.

Read Response in the Pull Request

After opening the pull request, I quickly got a response with comments. Here are my rookie mistakes:

  • Instead of removing the failing tests, I should have changed them to match the new behavior, so that they verify that assigning None to a foreign key field does not raise ValueError.

  • I should have updated the release notes, since this change is not backwards compatible.

  • I needed to update the ticket page and set "Has patch" to yes, since the ticket now had a patch.

  • I needed to add a link to the pull request on the ticket page.

Take Further Actions

The comments showed me where I needed to improve, and I made changes accordingly. I ran the tests again and made sure everything was OK. Then I updated the ticket page and the pull request to let the reviewers know I had made further changes.

To update your pull request, simply commit and push to the same branch. Your pull request will be updated automatically. Nice!

The Pull Request was Closed at the End

After my second round of changes, I got a response that my fix was fine, and the pull request was merged into the code base. Overall it took me two days (about two hours of actual work), and I was pretty glad to have made my first contribution to Django.

Summary

Following the steps above, I finished my first contribution to the Django project, and the sense of community made me feel good. I hope more contributions will come down the road.


Implement Singly Linked List in Python

A linked list is one of the basic data structures in computer science. Conceptually, a linked list is a collection of nodes connected by links. This post describes a simple implementation of a singly linked list (each node knows only the next node, not the previous one) in Python.

Implement the Node

A node is a single data point in the linked list. It holds the stored data and a pointer to the next node.

Technically, you can implement the whole linked list using only nodes. As long as you know which node is the head, you can do everything with the list, such as adding, searching, and deleting.

Here is the implementation:

class Node(object):

    def __init__(self, data=None, next=None):
        self.data = data
        self.next = next

    def get_data(self):
        return self.data

    def get_next(self):
        return self.next

    def set_next(self, new_next):
        self.next = new_next
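To illustrate the point that nodes alone are enough, here is a small sketch that keeps only a head reference. push and to_list are hypothetical helpers, not part of the post's API, and Node is repeated so the snippet runs on its own.

```python
class Node(object):
    # Same Node as above, repeated so this snippet is self-contained.
    def __init__(self, data=None, next=None):
        self.data = data
        self.next = next

    def get_data(self):
        return self.data

    def get_next(self):
        return self.next

    def set_next(self, new_next):
        self.next = new_next


def push(head, data):
    """Return a new head node whose next is the old head."""
    return Node(data, head)


def to_list(head):
    """Walk from head to the end, collecting each node's data."""
    items = []
    current = head
    while current:
        items.append(current.get_data())
        current = current.get_next()
    return items


head = None  # an empty list is just a missing head
for value in ["a", "b", "c"]:
    head = push(head, value)
print(to_list(head))  # ['c', 'b', 'a']
```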

Implement the Linked List

The linked list keeps a reference to the head of the list, and its API has the following methods:

  • size(): return the number of nodes in the list
  • insert(data): insert data at the head of the list
  • search(data): return the node that has the data, or None if not found
  • delete(data): delete a node with the data; return the node if found, None if not found
  • print(): print the whole list

Size

size() returns the number of nodes in the list. It iterates through the list from the head and keeps a count of the nodes visited.

def size(self):
    current = self.head
    count = 0
    while current:
        count += 1
        current = current.get_next()
    return count

Insert

insert(data) creates a new Node with the provided data, sets its next to the old head, and makes it the new head. This way, the newest data is always at the head, and insertion takes constant time regardless of the list's length.

def insert(self, data):
    new_node = Node(data)
    new_node.set_next(self.head)
    self.head = new_node

Search

search(data) iterates through the list from the head and checks the data of each node visited. If the data is found, that node is returned; otherwise the method returns None.

def search(self, data):
    current = self.head
    while current:
        if current.get_data() == data:
            return current
        else:
            current = current.get_next()
    return None

Delete

delete(data) is similar to search(): it iterates through the list to find the node with the provided data, but it also removes that node and returns it. To do this, it keeps track of the previous node. When the node to delete is found, we set its previous node's next to the node after it, thus bypassing the deleted node. If the node to delete is the head, the head simply moves to the next node.

def delete(self, data):
    current = self.head
    prev = None
    while current:
        if current.get_data() == data:
            if current == self.head:
                self.head = current.get_next()
            else:
                prev.set_next(current.get_next())
            return current
        prev = current
        current = current.get_next()
    return None

Print

print() is a helper method that displays the whole list, with the nodes' data joined by ->.

def print(self):
    lst = []
    current = self.head
    while current:
        lst.append(str(current.get_data()))
        current = current.get_next()
    print('->'.join(lst))

The complete implementation can be found at
https://github.com/ZachLiuGIS/Algorithm-Enthusiasts/blob/master/algorithms/data_structures/LinkedList.py
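For readers who want the snippets above in one runnable file, here is one way to assemble them. The LinkedList constructor taking an optional head is my assumption, since the post does not show it, and the linked repository may differ in details.

```python
class Node(object):
    """A single element: the stored data plus a link to the next node."""

    def __init__(self, data=None, next=None):
        self.data = data
        self.next = next

    def get_data(self):
        return self.data

    def get_next(self):
        return self.next

    def set_next(self, new_next):
        self.next = new_next


class LinkedList(object):
    """Singly linked list that only keeps a reference to its head node."""

    def __init__(self, head=None):
        self.head = head  # None means the list is empty

    def size(self):
        # O(n): walk every node and count it.
        current = self.head
        count = 0
        while current:
            count += 1
            current = current.get_next()
        return count

    def insert(self, data):
        # O(1): the new node points at the old head and becomes the head.
        new_node = Node(data)
        new_node.set_next(self.head)
        self.head = new_node

    def search(self, data):
        # Return the first node holding data, or None.
        current = self.head
        while current:
            if current.get_data() == data:
                return current
            current = current.get_next()
        return None

    def delete(self, data):
        # Unlink the first node holding data and return it, or return None.
        current = self.head
        prev = None
        while current:
            if current.get_data() == data:
                if current == self.head:
                    self.head = current.get_next()
                else:
                    prev.set_next(current.get_next())
                return current
            prev = current
            current = current.get_next()
        return None

    def print(self):
        # Inside the method, print() still resolves to the built-in function.
        lst = []
        current = self.head
        while current:
            lst.append(str(current.get_data()))
            current = current.get_next()
        print('->'.join(lst))


if __name__ == "__main__":
    ll = LinkedList()
    for value in [1, 2, 3]:
        ll.insert(value)
    ll.print()        # 3->2->1
    print(ll.size())  # 3
    ll.delete(2)
    ll.print()        # 3->1
```

Naming a method print works in Python 3; under Python 2 it would need from __future__ import print_function.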