Very slow to list rasters with arcpy - arcpy

I'm using simple arcpy ListRasters functions and the script is very slow to execute.
I'm using them with a folder containing about 100 rasters (20 years of historical data for 5 different categories). Each raster is ~800 MB, so ~80 GB in total. I'm using wildcards to list them in 5 different lists:
import arcpy, os, sys
from arcpy import env
from arcpy import *
from arcpy.sa import *
arcpy.CheckOutExtension("Spatial")
hist_data_path = "D:/Data/GIS/hist_data"
arcpy.env.workspace = hist_data_path
hist_pop_urban = arcpy.ListRasters("*pop_urb*")
hist_pop_rural = arcpy.ListRasters("*pop_rur*")
hist_ppc_urban = arcpy.ListRasters("*ppc_urb*")
hist_ppc_rural = arcpy.ListRasters("*ppc_rur*")
hist_ww_int = arcpy.ListRasters("*ww_int*")
[...]
It takes about 10 minutes to list each block of 20 rasters, so ~50 min to list all the rasters. How is that possible? Am I missing something in the code? Is it because of the size of the rasters? Is there some "hidden" option or "trick" I could check? I'm using Win 7 64 on an i7 computer with 16 GB RAM.
Thanks for any idea that could reduce this processing time!

In my experience arcpy is generally slow, so I try to avoid it whenever possible. Of course, there may be some way to optimize the arcpy.ListRasters function, and I would love to hear about it if anyone knows about it.
Here is an out-of-the-box Python alternative to arcpy.ListRasters:
import os
directory = r"D:\Data\GIS\hist_data"
extension_list = [".tif", ".tiff"]
hist_pop_urban = []
hist_pop_rural = []
#etc.
for file in os.listdir(directory):
    for extension in extension_list:
        if file.endswith(extension):
            if "pop_urb" in file:
                hist_pop_urban.append(file)
            elif "pop_rur" in file:
                hist_pop_rural.append(file)
            #etc.
You could build extension_list based on the contents of this webpage and your knowledge of the particular file types you are dealing with:
http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#//009t0000000q000000
Depending on the format of your rasters, each raster may comprise more than one file. If so, you would have to incorporate that into the code as well.
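As a rough illustration of the multi-file point, here is a minimal sketch that groups a raster with its sidecar files by base name. The function name `group_raster_files` and the sample file names are hypothetical, and it assumes the base name itself contains no dots (everything after the first dot is treated as extension).

```python
from collections import defaultdict


def group_raster_files(filenames):
    """Group file names by base name so sidecar files stay with their raster.

    Assumption: the base name contains no dots, so
    "pop_urb_1990.tif.aux.xml" belongs to the base "pop_urb_1990".
    """
    groups = defaultdict(list)
    for name in filenames:
        base = name.split(".", 1)[0]
        groups[base].append(name)
    return dict(groups)


# Hypothetical directory listing, e.g. the result of os.listdir(directory)
files = [
    "pop_urb_1990.tif",
    "pop_urb_1990.tfw",
    "pop_urb_1990.tif.aux.xml",
    "pop_rur_1990.tif",
]
groups = group_raster_files(files)
```

Each key of `groups` then stands for one raster, whatever number of files it consists of on disk.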
Good luck!
Tom

I prefer to use glob for listing data such as rasters. You'll find the operation extremely fast compared to the arcpy.ListRasters() method. I modified your example to use glob--here I am assuming you are using tif format raster data (change if needed).
import glob
import os
inws = r'C:\path\to\your\workspace'
hist_pop_urban = glob.glob(os.path.join(inws, "*pop_urb*.tif"))
hist_pop_rural = glob.glob(os.path.join(inws, "*pop_rur*.tif"))
hist_ppc_urban = glob.glob(os.path.join(inws, "*ppc_urb*.tif"))
hist_ppc_rural = glob.glob(os.path.join(inws, "*ppc_rur*.tif"))
hist_ww_int = glob.glob(os.path.join(inws, "*ww_int*.tif"))
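The five near-identical glob calls could also be collapsed into one loop. The following is only a sketch: the helper name `list_rasters_by_category` is made up for illustration, and it assumes all rasters share a single extension (".tif" here).

```python
import glob
import os


def list_rasters_by_category(workspace, categories, extension=".tif"):
    """Return {category: sorted list of matching paths} using glob, not arcpy."""
    result = {}
    for category in categories:
        # Same wildcard idea as "*pop_urb*" in the question, plus the extension
        pattern = os.path.join(workspace, "*{}*{}".format(category, extension))
        result[category] = sorted(glob.glob(pattern))
    return result


categories = ["pop_urb", "pop_rur", "ppc_urb", "ppc_rur", "ww_int"]
# glob returns empty lists if the folder does not exist, so this is safe to run
rasters = list_rasters_by_category(r"D:\Data\GIS\hist_data", categories)
```

`rasters["pop_urb"]` then plays the role of `hist_pop_urban`, and so on for the other categories.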


loading a graph from .meta file from Tensorflow in c++ for inference

I have trained some models using tensorflow 1.5.1 and I have the checkpoints for those models (including .ckpt and .meta files). Now I want to do inference in c++ using those files.
In python, I would do the following to save and load the graph and the checkpoints.
for saving:
images = tf.placeholder(...)  # the input layer
# the graph def
output = tf.nn.softmax(net)  # the output layer
tf.add_to_collection('images', images)
tf.add_to_collection('output', output)
For inference I restore the graph and the checkpoint, then restore the input and output layers from the collections like so:
meta_file = './models/last-100.meta'
ckpt_file = './models/last-100'
with tf.Session() as sess:
    saver = tf.train.import_meta_graph(meta_file)
    saver.restore(sess, ckpt_file)
    # tf.get_collection returns a list, so take the first entry
    images = tf.get_collection('images')[0]
    output = tf.get_collection('output')[0]
    outputTensors = sess.run(output, feed_dict={images: np.array(an_image)})
Now, assuming that I did the saving in Python as usual, how can I restore the model and do inference in C++ with simple code like in Python?
I have found examples and tutorials, but only for TensorFlow versions 0.7 and 0.12, and the same code doesn't work for version 1.5. I found no tutorials for restoring models using the C++ API on the TensorFlow website.
For the sake of this thread. I will rephrase my comment into an answer.
Posting a full example would require either a CMake setup or putting the file into a specific directory to run bazel. As I favor the first way, and covering all parts would exceed the limits of this post, I would like to redirect to a complete implementation in C99, C++, and Go without Bazel, which I tested for TF > v1.5.
Loading a graph in C++ is not much more difficult than in Python, given that you have already compiled TensorFlow from source.
Start by creating an MWE. Creating a very dumb network graph is always a good idea to figure out how things work:
import tensorflow as tf
x = tf.placeholder(tf.float32, shape=[1, 2], name='input')
output = tf.identity(tf.layers.dense(x, 1), name='output')
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver = tf.train.Saver(tf.global_variables())
    saver.save(sess, './exported/my_model')
There are probably tons of answers here on SO about this part. So I just let it stay here without further explanation.
Loading in Python
Before doing stuff in other languages, we can try to do it in python properly -- in the sense: we just need to rewrite it in C++.
Even restoring is very easy in python like:
import tensorflow as tf
with tf.Session() as sess:
    # load the computation graph
    loader = tf.train.import_meta_graph('./exported/my_model.meta')
    sess.run(tf.global_variables_initializer())
    loader = loader.restore(sess, './exported/my_model')
    x = tf.get_default_graph().get_tensor_by_name('input:0')
    output = tf.get_default_graph().get_tensor_by_name('output:0')
This is not helpful, as most of these API endpoints do not exist in the C++ API (yet?). An alternative version would be
import tensorflow as tf
with tf.Session() as sess:
    metaGraph = tf.train.import_meta_graph('./exported/my_model.meta')
    restore_op_name = metaGraph.as_saver_def().restore_op_name
    restore_op = tf.get_default_graph().get_operation_by_name(restore_op_name)
    filename_tensor_name = metaGraph.as_saver_def().filename_tensor_name
    sess.run(restore_op, {filename_tensor_name: './exported/my_model'})
    x = tf.get_default_graph().get_tensor_by_name('input:0')
    output = tf.get_default_graph().get_tensor_by_name('output:0')
Hang on. You can always use print(dir(object)) to get the properties like restore_op_name, ... .
Restoring a model is an operation in TensorFlow like every other operation. We just call this operation and provide the path (a string tensor) as an input. We can even write our own restore operation
def restore(sess, metaGraph, fn):
    restore_op_name = metaGraph.as_saver_def().restore_op_name  # u'save/restore_all'
    restore_op = tf.get_default_graph().get_operation_by_name(restore_op_name)
    filename_tensor_name = metaGraph.as_saver_def().filename_tensor_name  # u'save/Const'
    sess.run(restore_op, {filename_tensor_name: fn})
Even though this looks strange, it greatly helps when doing the same in C++.
Loading in C++
Starting with the usual stuff
#include <tensorflow/core/public/session.h>
#include <tensorflow/core/public/session_options.h>
#include <tensorflow/core/protobuf/meta_graph.pb.h>
#include <string>
#include <iostream>
#include <utility>
#include <vector>
typedef std::vector<std::pair<std::string, tensorflow::Tensor>> tensor_dict;
int main(int argc, char const *argv[]) {
  const std::string graph_fn = "./exported/my_model.meta";
  const std::string checkpoint_fn = "./exported/my_model";
  // prepare session
  tensorflow::Session *sess;
  tensorflow::SessionOptions options;
  TF_CHECK_OK(tensorflow::NewSession(options, &sess));
  // here we will put our loading of the graph and weights
  return 0;
}
You should be able to compile this either by putting it in the TensorFlow repo and using bazel, or by simply following the instructions here to use CMake.
We need to create the kind of meta graph that tf.train.import_meta_graph creates. This can be done by
tensorflow::MetaGraphDef graph_def;
TF_CHECK_OK(ReadBinaryProto(tensorflow::Env::Default(), graph_fn, &graph_def));
In C++ reading a graph from file is not the same as importing a graph in Python. We need to create this graph in a session by
TF_CHECK_OK(sess->Create(graph_def.graph_def()));
By looking at the strange python restore function above:
restore_op_name = metaGraph.as_saver_def().restore_op_name
restore_op = tf.get_default_graph().get_operation_by_name(restore_op_name)
filename_tensor_name = metaGraph.as_saver_def().filename_tensor_name
we can code the equivalent piece in C++
const std::string restore_op_name = graph_def.saver_def().restore_op_name();
const std::string filename_tensor_name = graph_def.saver_def().filename_tensor_name();
Having this in place, we just run the operation by
sess->Run(feed_dict,          // inputs
          {},                 // output_tensor_names (we do not need them)
          {restore_op_name},  // target_node_names
          nullptr);           // outputs (there are no outputs this time)
Creating the feed_dict is probably a post on its own and this answer is already long enough. It does only cover the most important stuff. I would like to redirect to a complete implementation in C99, C++, GO without Bazel which I tested for TF > v1.5. This is not that hard -- it just can get very long in the case of the plain C version.

How to make global variable (or function) in nix file?

I want to declare variable dotfiles_dir so all other files can see and use it.
For example (not working)
In /etc/nixos/configuration.nix (it's root file, right?)
dotfiles_dir="/home/bjorn/.config/dotfiles";
import "${dotfiles_dir}/nixos/root/default.nix"
and use it also in ~/.config/nixpkgs/home.nix (with https://github.com/rycee/home-manager)
import "${dotfiles_dir}/nixos/home/default.nix"
I want to declare variable dotfiles_dir so all other files can see and use it.
Sorry, but that's not possible. In Nix, there's no such thing as global variables. If there were, it would ruin its ability to provide reproducible builds, because Nix expressions would then have access to undeclared inputs.
/etc/nixos/configuration.nix is not a place to store global information; it's technically a NixOS module. But more importantly, it's a function.
However, there's a way to define a value in one place and use it wherever you need it. Something like this:
/etc/nixos/dotfiles-dir.nix
"/home/bjorn/.config/dotfiles"
~/.config/nixpkgs/home.nix
let
dotfiles_dir = import /etc/nixos/dotfiles-dir.nix;
dotfiles = import (builtins.toPath "${dotfiles_dir}/nixos/home/default.nix");
in
...
You could also get more fancy...
/etc/nixos/my-settings.nix
{ dotfiles_dir = "/home/bjorn/.config/dotfiles";
  some_other_value = "whatever";
}
~/.config/nixpkgs/home.nix
let
dotfiles_dir = (import /etc/nixos/my-settings.nix).dotfiles_dir;
dotfiles = import (builtins.toPath "${dotfiles_dir}/nixos/home/default.nix");
in
...

Error in saving and using model of TensorForestEstimator for Android

I use the randomforest estimator, implemented in tensorflow, to predict if a text is english or not. I saved my model (A dataset with 2k samples and 2 class labels 0/1 (Not English/English)) using the following code (train_input_fn function return features and class labels):
model_path='test/'
estimator = TensorForestEstimator(params, model_dir='model/')
estimator.fit(input_fn=train_input_fn, max_steps=1)
After running the above code, the graph.pbtxt and checkpoints are saved in the model folder. Now I want to use it on Android. I have 2 problems:
As the first step, I need to freeze the graph and checkpoints to a .pb file to use it on Android. I tried freeze_graph (I used the code here: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/freeze_graph.py). When I call freeze_graph on my model, I get the following error and the code cannot create the final .pb graph:
File "/Users/XXXXXXX/freeze_graph.py", line 105, in freeze_graph
_ = tf.import_graph_def(input_graph_def, name="")
File "/anaconda/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/importer.py", line 258, in import_graph_def
op_def = op_dict[node.op]
KeyError: u'CountExtremelyRandomStats'
this is how I call freeze_graph:
def save_model_android():
    checkpoint_state_name = "model.ckpt-1"
    input_graph_name = "graph.pbtxt"
    output_graph_name = "output_graph.pb"
    checkpoint_path = os.path.join(model_path, checkpoint_state_name)
    input_graph_path = os.path.join(model_path, input_graph_name)
    input_saver_def_path = None
    input_binary = False
    output_node_names = "output"
    restore_op_name = "save/restore_all"
    filename_tensor_name = "save/Const:0"
    output_graph_path = os.path.join(model_path, output_graph_name)
    clear_devices = True
    freeze_graph.freeze_graph(input_graph_path, input_saver_def_path,
                              input_binary, checkpoint_path,
                              output_node_names, restore_op_name,
                              filename_tensor_name, output_graph_path,
                              clear_devices, "")
I also tried the freezing on the iris dataset in "tf.contrib.learn.datasets.load_iris". I get the same error. So I believe it is not related to the dataset.
As a second step, I need to use the .pb file on the phone to predict a text. I found the camera demo example by google and it contains a lot of code. I wonder if there is a step by step tutorial how to use a Tensorflow model on Android by passing a feature vector and get the class label.
Thanks, in advance!
UPDATE
By using a recent version of TensorFlow (0.12), the problem is solved. However, the problem now is what I should pass to output_node_names. How can I find out what the output nodes in the graph are?
Re (1) it looks like you are running freeze_graph on a build of tensorflow which does not have access to contrib ops. Maybe try explicitly importing tensorforest before calling freeze_graph?
Re (2) I don't know of a simpler example.
CountExtremelyRandomStats is one of TensorForest's custom ops, and exists in tensorflow/contrib. As was pointed out, TF switched to including contrib ops by default at some point. I don't think there's an easy way to include the contrib custom ops in the global registry in the previous releases, because TensorForest uses the method of building a .so file that is included as a data file which is loaded at runtime (a method that was the standard when TensorForest was created, but may not be any longer). So there are no easily-included python build rules that will properly link in the C++ custom ops. You can try including tensorflow/contrib/tensor_forest:ops_lib as a dep in your build rule, but I don't think it will work.
In any case, you can try installing the nightly build of tensorflow. The alternative includes modifying how tensorforest custom ops are built, which is pretty nasty.
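On the follow-up question about output_node_names, one common heuristic (not something confirmed in this thread) is to look for nodes whose output is not consumed by any other node. The sketch below works on a plain dict so it runs without TensorFlow; `find_output_candidates` and `toy_graph` are hypothetical names. With a real graph, the dict could presumably be built from the `name` and `input` fields of each NodeDef in `graph_def.node`.

```python
def find_output_candidates(nodes):
    """Return names of nodes that no other node uses as an input.

    nodes: dict mapping node name -> list of input names, written the way
    they appear in a GraphDef ("name", "name:0", or "^name" for control deps).
    """
    consumed = set()
    for inputs in nodes.values():
        for inp in inputs:
            # Strip the control-dependency marker and the output index
            consumed.add(inp.lstrip("^").split(":")[0])
    return sorted(name for name in nodes if name not in consumed)


# Toy stand-in for a graph; a real one would come from graph.pbtxt
toy_graph = {
    "input": [],
    "dense/kernel": [],
    "dense/MatMul": ["input", "dense/kernel"],
    "output": ["dense/MatMul:0"],
}
candidates = find_output_candidates(toy_graph)
```

On a real saved graph this list is likely to include saver and initializer ops as well, so it is a starting point for inspection rather than a definitive answer.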

C, C++ Interface with Python

I have C++ code that has grown exponentially. I have a number of variables (mostly Boolean) that need to be changed each time I run my code (different running conditions). In the past I have done this using the command-line arguments of the main( int argc, char* argv[]) function.
Since this method has become cumbersome (I have 18 different running conditions, hence 18 different arguments :-( ), I would like to move to interfacing with Python (or Bash, if need be). Ideally I would like to write a Python script where I set the values of the data members and then run the code.
Does anyone have any pointers or information that could help me out? Better still, a simple coded example or a URL I could look up.
Edit From Original Question:
Sorry I don't think I was clear with my question. I don't want to use the main( int argc, char* argv[]) feature in c++. Instead of setting the variables on the command line. Can I use python to declare and initialize the data members in my c++ code?
Thanks again mike
Interfacing between C/C++ and Python is heavily documented and there are several different approaches. However, if you're just setting values then it may be overkill to use Python, which is more geared toward customising large operations within your process by farming it off to the interpreter.
I would personally recommend researching an "ini" file method, either traditionally or by using XML, or even a lighter scripting language like Lua.
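To make the "ini" suggestion concrete, here is a minimal Python 3 sketch that writes the run conditions to a file with the standard configparser module. The section name "run", the flag names, and both helper functions are invented for illustration; the C++ program would still need its own ini parser to read the file. Note that configparser lowercases option names by default.

```python
import configparser
import os
import tempfile


def write_run_config(path, flags):
    """Write boolean run conditions to an ini file under a [run] section."""
    parser = configparser.ConfigParser()
    parser["run"] = {name: "true" if value else "false"
                     for name, value in flags.items()}
    with open(path, "w") as handle:
        parser.write(handle)


def read_run_config(path):
    """Read the [run] section back as a {name: bool} dict."""
    parser = configparser.ConfigParser()
    parser.read(path)
    return {name: parser.getboolean("run", name) for name in parser["run"]}


# Hypothetical run conditions, written to a temporary location
config_path = os.path.join(tempfile.mkdtemp(), "run.ini")
flags = {"use_cache": True, "verbose": False}
write_run_config(config_path, flags)
loaded = read_run_config(config_path)
```

The advantage over 18 positional arguments is that each setting is named in the file, and a run can be repeated by keeping the file.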
Use subprocess to execute your program from python.
import subprocess as sp
import shlex
def run(cmdline):
    process = sp.Popen(shlex.split(cmdline), stdout=sp.PIPE, stderr=sp.PIPE)
    output, err = process.communicate()
    retcode = process.poll()
    return retcode, output, err
run('./a.out '+arg1+' '+arg2+' '+...)
You can use subprocess module to launch an executable with defined command-line arguments:
import subprocess
option1 = True
option2 = False
# ...
optionN = True
lstopt = ['path_to_cpp_executable',
          option1,
          option2,
          # ...
          optionN
         ]
lstopt = [str(item) for item in lstopt] # because we need to pass strings
proc = subprocess.Popen(lstopt, close_fds = True)
stdoutdata, stderrdata = proc.communicate()
If you're using Python 2.7 or Python 3.2, then OrderedDict will make the code more readable:
from collections import OrderedDict
opts = OrderedDict([('option1', True),
                    ('option2', False),
                   ])
lstopt = (['path_to_cpp_executable'] +
          list(str(item) for item in opts.values())
         )
proc = subprocess.Popen(lstopt, close_fds = True)
stdoutdata, stderrdata = proc.communicate()
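A variation on the subprocess approach above is to pass named flags rather than positional values, so the C++ side does not depend on argument order. This is only a sketch: the `--name=0/1` format and the helpers `build_argv` and `run` are assumptions, and the C++ program would have to parse that format itself. It relies on dicts keeping insertion order (Python 3.7+).

```python
import subprocess
import sys


def build_argv(executable, flags):
    """Turn {'name': bool} into ['exe', '--name=1', ...] in insertion order."""
    return [executable] + ["--{}={}".format(name, int(value))
                           for name, value in flags.items()]


def run(argv):
    """Run a command and return (return code, stdout bytes, stderr bytes)."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()
    return proc.returncode, out, err


# Hypothetical flags for the C++ executable
argv = build_argv("./a.out", {"use_cache": True, "verbose": False})

# The Python interpreter stands in for the C++ executable here
retcode, out, err = run([sys.executable, "-c", "print('ok')"])
```

With the real program, `run(argv)` would launch it with the generated flags.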
I can only advise having a look at SWIG: using its director feature, it allows you to fully integrate C++ and Python, including cross-derivation from one language to the other.
With the ctypes module, you can call arbitrary C libraries.
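As a small illustration of the ctypes route, the sketch below calls strlen from the C standard library. It assumes a POSIX system (Linux or macOS); on Windows the library lookup would differ. For your own code you would load your compiled shared library instead and expose the functions with C linkage.

```python
import ctypes
import ctypes.util

# find_library may return None; on POSIX, CDLL(None) then falls back to the
# symbols already loaded into the process, which include the C library.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s)
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

length = libc.strlen(b"hello")
```

Declaring `argtypes` and `restype` explicitly avoids relying on the default int conversions in ctypes.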
There are several ways for interfacing C and C++ code with Python:
SWIG
Boost.Python
Cython

reducing junk in gdb print

I am using gdb 7.2 with the configuration by Dan Marinescu that allows printing STL vectors strings, etc. (pstring, pvector, etc)
It doesn't seem very good. So looking at one of the answers below, I cleaned out and used the pretty printers available in 7.0 and better.
In order to do so, I put the following in my .gdbinit
python
import sys
sys.path.insert(0, '/home/me/gdb_printers/python')
from libstdcxx.v6.printers import register_libstdcxx_printers
register_libstdcxx_printers (None)
end
set print elements 0
The instructions say to download the code from svn into /home/me/gdb_printers/python, but that was a while ago. I noticed that there is code in gdb 7.3. So I deleted the above, and the basics work but STL does not. Here's an object containing a string:
{a = 2, b = 97 'a', c = 2469135780247, d = 1.1363636363636362, e = {
static npos = 18446744073709551615,
_M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0x602028 "foo"}}}
./gdb-7.3.50.20110526/gdb/data-directory/python/gdb:
In order to work with STL, I needed to download the code for the archer project:
svn co svn://gcc.gnu.org/svn/gcc/trunk/libstdc++-v3/python
and put it in the above directory, making sure all the other junk was gone, and it works beautifully.
What you want to do is addressed in GDB 7.0 and above with Python pretty printers.
You don't need pstring, regular print just works (for embedded strings too).
