Zach Liu's Blog about Programming
Getting Started with Jupyter(IPython)
Feb-15-2016
I have been working with IPython for a while but haven’t dug into what is happening with its separation from Jupyter.
As described on ipython.org:
IPython is a growing project, with increasingly language-agnostic components. IPython 3.x was the last monolithic release of IPython, containing the notebook server, qtconsole, etc. As of IPython 4.0, the language-agnostic parts of the project: the notebook format, message protocol, qtconsole, notebook web application, etc. have moved to new projects under the name Jupyter. IPython itself is focused on interactive Python, part of which is providing a Python kernel for Jupyter.
With Jupyter, IPython now focuses on interactive Python and provides the Python kernel, which Jupyter user interfaces such as Notebook, console, and qtconsole connect to.
Installation
To install Jupyter with IPython, run
pip install jupyter
for Python 2, or
pip3 install jupyter
for Python 3.
To install Jupyter in a virtualenv, create the virtualenv and make sure it is activated in the terminal before running the command.
To install through Anaconda, run
conda install jupyter
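If you are not sure whether the terminal is really working in the virtualenv, a quick check from Python can help. This is only a minimal sketch; the function name in_virtualenv is made up for illustration.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # venv and recent virtualenv releases point sys.prefix at the environment
    # and keep the original installation in sys.base_prefix.
    # Older virtualenv releases set sys.real_prefix instead.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base_prefix


print("interpreter:", sys.executable)
print("inside a virtualenv:", in_virtualenv())
```

Run it with the same python that pip installs into; if it reports False, the virtualenv is probably not activated.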
Using Jupyter in terminal console
Jupyter console uses the terminal as its UI to interact with the IPython kernel, similar to the original ipython command.
Run the following to open a Jupyter console:
jupyter console
The complete documentation can be found at http://jupyter-console.readthedocs.org/en/latest/.
Using Jupyter in Notebook
Jupyter Notebook is a feature-rich, web-based user interface for IPython that can run interactive Python code, preserve input and output, work with documents, and more.
Try opening a Jupyter Notebook in your web browser by running
jupyter notebook
Read the full Jupyter Notebook documentation at https://jupyter-notebook.readthedocs.org/en/latest/notebook.html.
Using Jupyter in Qt Console
Jupyter qtconsole uses PyQt for its user interface to IPython.
Because Qt is not installed with Jupyter, it has to be installed separately.
Here are the detailed instructions I found for installing PyQt5 on Mac.
Requirements
- xcode
- python
- Qt libraries
- SIP
- PyQt
Download
Installation
- install xcode
- install the Command Line Tools (open Xcode > Preferences > Downloads)
- install Qt libraries (qt-opensource-mac-x64-clang-5.*.dmg)
- install python
- create a virtual env
- unzip and compile SIP and PyQt
Here are the complete commands for installation after the downloads:
cd /var/tmp
cp /Users/gvincent/Downloads/PyQt-gpl-5.*.tar.gz .
cp /Users/gvincent/Downloads/sip-4.*.tar.gz .
tar xvf PyQt-gpl-5.*.tar.gz
tar xvf sip-4.*.tar.gz
cd sip-*/
python3 configure.py -d /path/to/virtualenv/site-packages --arch x86_64
make
sudo make install
sudo make clean
cd ../PyQt-gpl-5.2.1/
python3 configure.py --destdir /path/to/virtualenv/site-packages --qmake ~/Qt/5.*/clang_64/bin/qmake
make
sudo make install
sudo make clean
~/.env/ariane_mail/bin/python -c "import PyQt5"
While installing PyQt5 I got the error
fatal error: 'qgeolocation.h' file not found
so I changed
python3 configure.py --destdir /path/to/virtualenv/site-packages --qmake ~/Qt/5.*/clang_64/bin/qmake
to
python3 configure.py --destdir /path/to/virtualenv/site-packages --qmake ~/Qt/5.*/clang_64/bin/qmake --disable=QtPositioning
and the problem was solved.
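The import check at the end of the commands above can also be done without actually importing the package. Here is a small sketch using only the standard library; module_available is a hypothetical helper name, and the module names checked are just examples.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(name) is not None


# Report which of the packages used in this post can be found by this interpreter.
for name in ("IPython", "PyQt5"):
    status = "found" if module_available(name) else "missing"
    print(name, status)
```

Run it with the virtualenv's python so that it looks in the same site-packages directory the build installed into.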
After these are successfully installed, you can open a Jupyter Qt console by running
jupyter qtconsole
Read the full Jupyter QtConsole documentation at https://jupyter.org/qtconsole/stable/
Summary
This post summarizes how to install Jupyter and IPython, and how to start using Jupyter through three different user interfaces:
- Web-based Notebook
- Terminal Console
- Qt Console
It also shows how to install PyQt5 so that the Qt Console can be started.
After this initial setup and exploration, we can further explore the rich world of Jupyter, which can greatly help our Python workflow.
Install PostgreSQL and PostGIS on AWS Ubuntu 14.04 and Mac OS X
Feb-12-2016
On Mac OSX
Thanks to Homebrew, installation is easier on Mac. Run these two commands to install both PostgreSQL and PostGIS:
$ brew install postgres
$ brew install postgis
It is done, how nice!
Start the service
Type this command to start the PostgreSQL service.
$ pg_ctl -D /usr/local/var/postgres start
Then check if the service is up and running.
$ export PGDATA=/usr/local/var/postgres
$ pg_ctl status
Make sure PostgreSQL starts when the Mac starts up
If you don’t want to run the above start command every time you start your Mac, do the following to make sure PostgreSQL starts automatically.
- Make sure you have the directory ~/Library/LaunchAgents (create it if it does not exist).
- Run ln -sfv /usr/local/opt/postgresql/*.plist ~/Library/LaunchAgents to create a symbolic link to the postgres plist file.
- Run launchctl load ~/Library/LaunchAgents/homebrew.mxcl.postgresql.plist to add the plist to launch control.
PostgreSQL should now start automatically every time your Mac starts.
On AWS Ubuntu
Log in remotely to the AWS Ubuntu instance, then run the following:
$ sudo apt-get update
$ sudo apt-get install -y postgresql postgresql-contrib
After installation, you can create a database by running
$ sudo -u postgres createdb DATABASE_NAME
If you want to create a specific user for a database, run
$ sudo -u postgres createuser -P USER_NAME
$ sudo -u postgres createdb -O USER_NAME DATABASE_NAME
Then you can test the database connection by running
$ psql -U USER_NAME DATABASE_NAME
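psql also accepts a connection URI in place of the database name, which is handy when the same settings are reused by other tools. The sketch below builds such a URI; the function name postgres_uri and its defaults (localhost, port 5432) are assumptions for illustration.

```python
from urllib.parse import quote


def postgres_uri(user, database, host="localhost", port=5432):
    """Build a connection URI of the form postgresql://user@host:port/database."""
    # Percent-encode the user and database names so that spaces or slashes
    # in them do not break the URI.
    return "postgresql://{}@{}:{}/{}".format(
        quote(user, safe=""), host, port, quote(database, safe="")
    )


print(postgres_uri("USER_NAME", "DATABASE_NAME"))
# postgresql://USER_NAME@localhost:5432/DATABASE_NAME
```

The printed value can be passed to psql as its only argument instead of the -U option and the database name.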
Install PostGIS
After PostgreSQL is installed and the connection is tested, we can install PostGIS.
In the commands below, 9.* is the version of your PostgreSQL and 2.* the version of PostGIS.
$ sudo apt-get install postgresql-9.*-postgis-2.*
$ sudo apt-get install postgresql-server-dev-9.*
Then you can create the PostGIS extension in your regular PostgreSQL database to enable GIS functions.
$ sudo -u postgres psql DATABASE_NAME
In the psql session, run the following to enable PostGIS and check that the extension was created.
db=# CREATE EXTENSION postgis;
db=# select postgis_version();
Notes: if you connect as a database user you created, make sure it has the superuser role, which is needed to create the extension. If you are using GeoDjango, migrating the database with the django manage.py command will create the extension automatically.
Install PgAdmin III and connect to database
PgAdmin III is a comprehensive PostgreSQL database design and management system.
Go to http://www.pgadmin.org/download/macosx.php to download and install it.
After installing it, you can create a connection to your PostgreSQL and PostGIS databases.
Connecting to a local database is pretty straightforward: just provide the host (localhost), the port (default 5432), the database username, and the password (leave it empty if you have none).
If you want to connect to a remote database on an AWS instance, you need to make some changes to the config files:
- in /etc/postgresql/9.*/main/postgresql.conf add listen_addresses='*'
- in /etc/postgresql/9.*/main/pg_hba.conf add host all all 0.0.0.0/0 md5
Restart PostgreSQL afterwards so that the changes take effect.
Because port 5432 (your PostgreSQL port) may not be open under the default security settings, you need to open it appropriately. On Amazon, you can add that rule to the security group of the instance in the AWS console.
After the changes, you should be able to connect to your remote database from your local machine.
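If the connection still fails, it helps to first check whether the port is reachable at all. The following is a minimal sketch using only the standard library; port_is_open is a made-up helper name, and localhost here stands in for your instance's address.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Replace "localhost" with the public address of the remote instance.
print(port_is_open("localhost", 5432))
```

This only tells you that something accepts TCP connections on the port. A True result with a failing login points to pg_hba.conf or the credentials, while False points to listen_addresses or the security group.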
How I made my first contribution to Django
Feb-11-2016
I have been working with Django for a few years and decided it was time to give something back to the project. I had never contributed to a project of this scale, so I was not very certain at the beginning. Fortunately, there are clear documents and great people in the community ready to help, so it went pretty smoothly. Here is a walkthrough of my whole process, from learning how to do it to my first pull request being closed.
Read how-tos
First I went to the Contributing to Django page, which summarizes how to make contributions.
Then I read through Advice for new contributors in the Django documentation to get some guidelines.
I signed the Contributor License Agreement from that page and sent it by email. I got a response pretty quickly, so I was ready to go.
Reading too many documents can be boring, so I found two tutorials (Writing your first patch for Django, Working with Git and GitHub) to get a better idea of what the whole process looks like, and then started getting my hands dirty and learning by doing.
Get a Copy (fork) of Django Source Code
I then went to https://github.com/django/django to fork the repo into my own account, so that I could have my own copy of the source code.
After forking, I cloned it to my development machine using
git clone git@github.com:github_name/django.git
Then I told my local repo where the original “upstream” remote is by running
git remote add upstream git@github.com:django/django.git
It is always a good habit to set up a virtualenv for every project you work on, so I created one called django using virtualenvwrapper
mkvirtualenv --python=python3 django
Django should also be installed in the virtualenv in development (editable) mode.
pip install -e /path/to/your/local/clone/django/
The initial preparation was done at this point. Now I needed to find out what I could work on.
Find a ticket
Finding a ticket you are comfortable doing is the first step. Since this was my first time, I looked at the tickets marked as easy pickings on the Easy-Pickings page.
The ticket I found was ticket #26179. It asked to drop a validation check during model field value assignment, because that check should be left to the database.
It looked doable, so I logged in using my GitHub account and assigned the ticket to myself. After the assignment I became the owner of the ticket.
Then I created a branch ticket_26179 based on upstream/master, as specified in the Django documentation.
git checkout -b ticket_26179 upstream/master
Run Test Before any Coding
Tests should be run before any coding to make sure you start from a clean state (on the ticket_26179 branch).
To do so, cd into django_directory/tests and run
PYTHONPATH=.. python runtests.py --settings=test_sqlite
It took a few minutes and the tests passed, so it was time for some coding.
Make the Changes Asked by the Ticket and Submit Changes
Before I made fixes for the ticket, I created a small Django project to play with the current behavior that needed to be changed, just to get familiar with the code. A demo project also helps with troubleshooting and testing the changes along the way.
After playing with it for a while, I made the changes asked for in the ticket. Once the demo project confirmed the new behavior, I committed the changes to the branch with:
git commit
In the editor window for the commit message, I followed the commit message style from the docs, with a title and a description.
Fixed #26179 -- Remove null assignment check for non-nullable foreign key
A longer description of the commit.
Save and close the editor, and the commit will show up in git log.
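The title format above is easy to get slightly wrong, so here is a small sketch that builds such a message. The helper name format_commit_message is hypothetical, and wrapping the description at 72 characters is an assumption based on common git practice rather than something taken from this post.

```python
import textwrap


def format_commit_message(ticket, summary, description=""):
    """Return 'Fixed #<ticket> -- <summary>', followed by an optional wrapped body."""
    title = "Fixed #{} -- {}".format(ticket, summary)
    if not description:
        return title
    # A blank line separates the title from the longer description.
    body = textwrap.fill(description, width=72)
    return title + "\n\n" + body


print(format_commit_message(
    26179,
    "Remove null assignment check for non-nullable foreign key",
    "A longer description of the commit.",
))
```

The printed text matches the two-part message shown above and can be pasted into the editor that git commit opens.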
All the changes I made are in this Pull Request
To create the pull request after the code changes, I first pushed the commit using
git push origin ticket_26179
Then I went to my forked repo, found the branch, and clicked the Compare and Create Pull Request button.
Don’t forget to run tests after code changes
I actually forgot to run the tests and realized it after I had sent the pull request. The ticket saying no test is needed does not mean you don’t need to run the tests after your changes. I reran the tests and got three failures. Ouch.
I checked the error messages and located the failing test functions and lines. My changes caused an expected ValueError to no longer be raised, because I had dropped the code that raised it. So I removed the tests, thinking they were not needed any more.
The bottom line is: run the tests before committing and submitting any code.
Read Response in the Pull Request
After submitting the pull request, I got a response with comments pretty quickly. Here are my rookie mistakes:
- Instead of removing the failed tests, I should have changed them to match the new behavior: the tests should verify that assigning None to a foreign key field does not raise ValueError.
- I should have updated the release notes, since this change is not compatible with previous versions.
- I needed to update the ticket page and set Has patch to yes, since the ticket now includes a patch.
- I needed to include a link to the pull request on the ticket page.
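The first point is the one that is easiest to show in code. The sketch below is not the actual Django test; Child is a hypothetical stand-in class, used only to show how a test that used to expect ValueError can be turned into one that checks the new behavior.

```python
import unittest


class Child:
    """Hypothetical stand-in for a model with a non-nullable foreign key."""

    def __init__(self):
        self.parent = None


class NullAssignmentTests(unittest.TestCase):
    def test_none_assignment_is_allowed(self):
        child = Child()
        child.parent = object()
        # Old expectation (behavior that was removed): assigning None
        # raised ValueError, checked with assertRaises.
        # New expectation: the assignment goes through, and enforcing
        # NOT NULL is left to the database when the object is saved.
        child.parent = None
        self.assertIsNone(child.parent)


# Run the test case directly instead of calling unittest.main().
suite = unittest.defaultTestLoader.loadTestsFromTestCase(NullAssignmentTests)
unittest.TextTestRunner(verbosity=2).run(suite)
```

Keeping the test and inverting its expectation documents the behavior change, which deleting the test would not.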
Take Further Actions
After reading the comments, I saw where I needed to improve and made changes accordingly. I ran the tests again and made sure everything was OK. Then I updated the ticket page and the pull request to note that I had made further changes.
To update your pull request, simply commit and push to the same branch. Your pull request will be updated automatically. Nice!
The Pull Request was Closed at the End
After my second round of changes, I got a response that my fix for the ticket was fine, and the pull request was merged into the code base. Overall it took me two days (about two hours of work in total), and I was pretty glad to have made my first contribution to Django.
Summary
After the steps above, my first contribution to the Django project was finished, and the sense of community made me feel good. I hope more work will be coming down the road.
Implement Singly Linked List in Python
Jan-30-2016
A linked list is one of the basic data structures in computer science. Conceptually, a linked list is a collection of nodes connected by links. This post describes a simple implementation of a singly linked list (a node only knows the next node, not the previous one) in Python.
Implement the Node
A node is a single data point in the linked list. It not only holds the stored data, but also has a pointer that tells which node is next.
Technically, you can implement the whole linked list using only nodes. As long as you know which node is the head, you can do everything with the list, such as add, search, and delete.
Here is the implementation:
class Node(object):
    def __init__(self, data=None, next=None):
        self.data = data  # the value stored in this node
        self.next = next  # the following node, or None at the end of the list

    def get_data(self):
        return self.data

    def get_next(self):
        return self.next

    def set_next(self, new_next):
        self.next = new_next
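To illustrate the point that nodes alone are enough, here is a small sketch that chains three nodes by hand and walks them starting from the head. The helper to_list is made up for this example and is not part of the original implementation.

```python
class Node(object):
    def __init__(self, data=None, next=None):
        self.data = data
        self.next = next

    def get_data(self):
        return self.data

    def get_next(self):
        return self.next

    def set_next(self, new_next):
        self.next = new_next


def to_list(head):
    """Follow the next pointers from head and collect the data in order."""
    values = []
    current = head
    while current is not None:
        values.append(current.get_data())
        current = current.get_next()
    return values


# Build the chain 1 -> 2 -> 3 from the tail backwards.
third = Node(3)
second = Node(2, third)
first = Node(1, second)
print(to_list(first))  # [1, 2, 3]
```

Knowing only first, the head, is enough to reach every other node.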
Implement the Linked List
The linked list keeps a reference to the head of the list, and its API has the following methods:
- size(): return the number of nodes in the list
- insert(data): insert data at the head of the list
- search(data): return the node that holds the data, or None if not found
- delete(data): delete the node that holds the data; return the node if found, None if not found
- print(): print the whole linked list
Size
Size() returns the number of nodes in the list. It iterates through the list from the head and keeps track of the node count.
def size(self):
    current = self.head
    count = 0
    while current:
        count += 1
        current = current.get_next()
    return count
Insert
Insert(data) creates a new Node with the provided data, makes it the new head, and sets its next to the old head. This way, the newest data is always at the head.
def insert(self, data):
    new_node = Node(data)
    new_node.set_next(self.head)  # the old head follows the new node
    self.head = new_node
Search
Search(data) iterates through the list from the head and checks the data of each node it visits. If the data is found, the node is returned.
def search(self, data):
    current = self.head
    while current:
        if current.get_data() == data:
            return current
        else:
            current = current.get_next()
    return None
Delete
Delete(data) is similar to Search: it iterates through the list to find the node with the provided data. It then takes the extra step of deleting that node and returning it. To achieve this, it keeps track of the previous node. When the node to be deleted is found, we set the next of its previous node to its next node, thus bypassing the node to be deleted.
def delete(self, data):
    current = self.head
    prev = None
    while current:
        if current.get_data() == data:
            if current is self.head:
                # deleting the head: the second node becomes the new head
                self.head = current.get_next()
            else:
                # bypass the current node
                prev.set_next(current.get_next())
            return current
        prev = current
        current = current.get_next()
    return None
Print
Print() is a helper method that displays the whole list.
def print(self):  # Python 3 only: print cannot be used as a method name in Python 2
    lst = []
    current = self.head
    while current:
        lst.append(str(current.get_data()))
        current = current.get_next()
    print('->'.join(lst))
The complete implementation can be found at
https://github.com/ZachLiuGIS/Algorithm-Enthusiasts/blob/master/algorithms/data_structures/LinkedList.py
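For a quick try-out, here is a sketch that puts the methods from this post into one class. The __init__ method (an empty list has head set to None) and the values() helper are my assumptions for illustration; they are not shown in the post and may differ from the linked file.

```python
class Node(object):
    def __init__(self, data=None, next=None):
        self.data = data
        self.next = next

    def get_data(self):
        return self.data

    def get_next(self):
        return self.next

    def set_next(self, new_next):
        self.next = new_next


class LinkedList(object):
    def __init__(self):
        # Assumption: an empty list is represented by head = None.
        self.head = None

    def size(self):
        count = 0
        current = self.head
        while current:
            count += 1
            current = current.get_next()
        return count

    def insert(self, data):
        # The new node becomes the head and points at the old head.
        self.head = Node(data, self.head)

    def search(self, data):
        current = self.head
        while current:
            if current.get_data() == data:
                return current
            current = current.get_next()
        return None

    def delete(self, data):
        current, prev = self.head, None
        while current:
            if current.get_data() == data:
                if prev is None:
                    self.head = current.get_next()
                else:
                    prev.set_next(current.get_next())
                return current
            prev, current = current, current.get_next()
        return None

    def values(self):
        """Hypothetical helper: return the stored data as a Python list."""
        result = []
        current = self.head
        while current:
            result.append(current.get_data())
            current = current.get_next()
        return result


numbers = LinkedList()
for value in (1, 2, 3):
    numbers.insert(value)

print(numbers.values())              # [3, 2, 1]
print(numbers.size())                # 3
print(numbers.search(2).get_data())  # 2
print(numbers.search(42))            # None
removed = numbers.delete(3)          # removes the current head
print(removed.get_data(), numbers.values())  # 3 [2, 1]
print(numbers.delete(42), numbers.size())    # None 2
```

Because insert() always adds at the head, the values come back in reverse insertion order.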