Curiosity Bits - curiositybits.com
• This tutorial is written for social scientists interested in collecting data from the image-hosting site Imgur (imgur.com).
• To find out more about using Python to mine the social web, please visit Curiosity Bits (curiositybits.com).
• Social-Metrics.org also hosts a series of Python tutorials on aggregating and analyzing Twitter/Facebook data. More at social-metrics.org
Final note before we start: for simplicity, I am laying out only the most essential steps. Previous tutorials explain in detail how to set up a Python programming environment; please visit curiositybits.com and click the PYTHON tab.
1. Install Anaconda Python (with Spyder and IPython Notebook)
2. Install SQLite Browser
3. Install four essential Python packages (imgurpython, sqlalchemy, urllib, sqlite3)
4. Register an Imgur client to get a client ID and client secret
5. Create a SQLite database for images from keyword search
6. Download images from the keyword search
7. Create a SQLite database for images from a reddit timeline
8. Download images from a reddit timeline
9. Create a SQLite database for images from a public album
10. Download images from a public album
We will download all available images, but first, let’s get their metadata. By metadata, I mean the attributes attached to each image. Examples include the image title, description, upload date, and link (the link is what we will use to download the images).
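To make the role of the image link concrete, here is a minimal sketch of how a link taken from the metadata could be turned into a local file. It is not the code from the tutorial scripts: the function names `filename_from_link` and `download_image` are hypothetical, and I am assuming Python 3, where the download helper lives in `urllib.request`.

```python
import os
from urllib.parse import urlparse
from urllib.request import urlretrieve


def filename_from_link(link):
    """Return the last path segment of an image URL, e.g. 'abc123.jpg'."""
    # urlparse separates the path from any query string, so
    # 'https://i.imgur.com/abc123.jpg?1' still yields 'abc123.jpg'.
    return os.path.basename(urlparse(link).path)


def download_image(link, folder="."):
    """Download one image into `folder` and return the local file path."""
    target = os.path.join(folder, filename_from_link(link))
    urlretrieve(link, target)  # this line performs the network request
    return target
```

Calling `download_image(link)` for every link stored in the database would save the images next to your script; pass a `folder` argument to keep them somewhere else.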
Create a SQLite database for images from keyword search
You don’t have to change anything in this block of code. It imports the necessary Python packages. Think of packages as apps running on iOS.
This is where you enter the keyword(s) you want to search for. You can have multiple keywords, wrapped in parentheses and separated by commas.
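One Python detail is worth knowing when you edit the keywords. Parentheses alone do not make a list of keywords; the commas do. The sketch below shows the difference, with a small hypothetical helper (`normalize_keywords`, not part of the tutorial script) that accepts either form.

```python
def normalize_keywords(keywords):
    """Accept a single string or a tuple/list of strings; always return a list."""
    if isinstance(keywords, str):
        return [keywords]
    return list(keywords)


several = ("cats", "dogs")   # a tuple with two keywords
single = ("cats",)           # a tuple with one keyword (note the trailing comma)
not_a_tuple = ("cats")       # just the string "cats"; the parentheses do nothing

for keyword in normalize_keywords(several):
    print("Will search for:", keyword)
```

Without the trailing comma, looping over `("cats")` would go through the letters c, a, t, s one by one, which is rarely what you want.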
Go to line 136; this is where you specify the name of the SQLite database to be saved. If no absolute file path is given, the database will be saved in the current working directory, which is normally the same folder as your Python script (Imgur search v1.py).
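If you are curious what creating such a database involves, here is a minimal sketch. It uses Python's built-in `sqlite3` module rather than sqlalchemy to keep the example short, so it is not the code from the script. The table name `images`, the column names, and the file name `imgur_search.db` are my own assumptions for illustration.

```python
import os
import sqlite3


def create_image_table(db_path):
    """Open (or create) the database file and make sure the table exists."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS images (
               id TEXT PRIMARY KEY,
               title TEXT,
               description TEXT,
               uploaded_at INTEGER,
               link TEXT
           )"""
    )
    conn.commit()
    return conn


def save_image(conn, image):
    """Store one image's metadata; `image` is a plain dict in this sketch."""
    conn.execute(
        "INSERT OR REPLACE INTO images "
        "(id, title, description, uploaded_at, link) VALUES (?, ?, ?, ?, ?)",
        (image["id"], image.get("title"), image.get("description"),
         image.get("datetime"), image["link"]),
    )
    conn.commit()


# A name without a folder is resolved against the current working directory.
print(os.path.abspath("imgur_search.db"))
```

Because `id` is the primary key and the insert uses `OR REPLACE`, running the script twice updates existing rows instead of storing the same image twice.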
Imgur search results are split across multiple pages. Here, 5 means that we will get five pages of images from the keyword search. You can also try 1, 2, or 3; experiment to see which number gives you enough data while keeping the amount manageable.
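The idea of grabbing a fixed number of pages can be sketched as a simple loop. In this sketch, `fetch_page` is a hypothetical function that takes a page number and returns the list of images on that page. With imgurpython, I would expect it to wrap a call such as `client.gallery_search(keyword, page=page)`, but treat that call and the zero-based page numbering as assumptions to check against the library.

```python
def collect_pages(fetch_page, num_pages):
    """Call fetch_page(0), fetch_page(1), ... and merge the results."""
    items = []
    for page in range(num_pages):
        batch = fetch_page(page)
        if not batch:
            # An empty page means there is nothing more to get.
            break
        items.extend(batch)
    return items


# A stand-in for the real API call, so the loop can be tried offline.
def fake_fetch(page):
    sample = {0: ["img1", "img2"], 1: ["img3"]}
    return sample.get(page, [])


print(collect_pages(fake_fetch, 5))
```

Stopping at the first empty page saves API requests when a search has fewer pages than the number you asked for.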
Getting images from a reddit timeline is very similar to getting images from a keyword search. This time, we will try executing the Python code in IPython Notebook.
Use the script called Imgur reddit timeline v1.py
Create a SQLite database for images from a reddit timeline
Load the code in IPython Notebook. As you did when downloading images from the keyword search, make sure the file path in the code matches the database you have just created.
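A mismatched file path is easy to miss, because SQLite quietly creates a new, empty database when the file does not exist. Here is a minimal sketch of a safer way to read the image links back. The function name `read_links` and the table and column names (`images`, `link`) are assumptions; adjust them to whatever your database actually contains.

```python
import os
import sqlite3


def read_links(db_path, table="images"):
    """Return all image links stored in the database at `db_path`."""
    # Check first: sqlite3.connect would otherwise create an empty file.
    if not os.path.isfile(db_path):
        raise FileNotFoundError("No database at " + os.path.abspath(db_path))
    conn = sqlite3.connect(db_path)
    try:
        # `table` is a trusted name typed by you, not user input.
        rows = conn.execute("SELECT link FROM " + table).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]
```

If the path is wrong, you get an error message showing the full path that was tried, instead of an empty result and a stray database file.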
Create a SQLite database for images from a public album
This works exactly as in the previous steps. Only a few places in the script need tweaking: enter your client ID, client secret, the name of the album, the file path of the database, and the number of pages to grab. Then you are good to go!
To create a SQLite database for images from a public album, use the script named Imgur album v1.py