Selecting specific shapefiles using select layer by attribute - arcpy

I have a shapefile for counties containing many fields, one of which is Pop_Descrp. I need to select the counties whose Pop_Descrp value is "Highly Increased" and export them, but I am not able to construct a correct query expression.
Can anyone help me with this?
import arcpy
from arcpy import env
env.workspace=r"Z:\Ash Tree Project\Shapefiles_Arkansas"
env.overwriteOutput=True
arcpy.MakeFeatureLayer_management("County_AR.shp","County_layer")
arcpy.SelectLayerByAttribute_management("County_layer", "NEW_SELECTION", "[Pop_Descrp]='Highly Increased'" )
arcpy.CopyFeatures_management("County_layer", "HighPopR_counties.shp")

You have used the wrong field delimiters for your data type. For shapefiles you need to use double-quote marks instead of square brackets:
arcpy.SelectLayerByAttribute_management("County_layer", "NEW_SELECTION", """"Pop_Descrp" = 'Highly Increased'""")
Different spatial file types require different field delimiters: some use none (Pop_Descrp), some require square brackets ([Pop_Descrp]), and others, including shapefiles, require double quotes ("Pop_Descrp").
To save the guesswork over which delimiters to use, the best way to handle this is the arcpy function AddFieldDelimiters (arcpy.AddFieldDelimiters()):
The field delimiters used in an SQL expression differ depending on the format of the queried data. For instance, file geodatabases and shapefiles use double quotation marks (" "), personal geodatabases use square brackets ([ ]), and enterprise geodatabases don't use field delimiters. The function can take away the guess work in ensuring that the field delimiters used with your SQL expression are the correct ones.
>>> # Shapefile:
>>> x = "{0} = 'Highly Increased'".format(arcpy.AddFieldDelimiters("County_AR.shp", "Pop_Descrp"))
>>> print x
"Pop_Descrp" = 'Highly Increased'
>>> # Personal Geodatabase:
>>> y = "{0} = 'Highly Increased'".format(arcpy.AddFieldDelimiters(r"myPGDB.mdb\County_AR", "Pop_Descrp"))
>>> print y
[Pop_Descrp] = 'Highly Increased'
>>> # Enterprise (SDE) Geodatabase:
>>> z = "{0} = 'Highly Increased'".format(arcpy.AddFieldDelimiters(r"EntGdb.sde\geodatabase.dbo.County_AR", "Pop_Descrp"))
>>> print z
Pop_Descrp = 'Highly Increased'
So to make your selection work, use AddFieldDelimiters to insert the correct delimiters for your field:
import arcpy
arcpy.env.workspace = r"Z:\Ash Tree Project\Shapefiles_Arkansas"
arcpy.MakeFeatureLayer_management("County_AR.shp", "County_layer")
arcpy.SelectLayerByAttribute_management("County_layer", "NEW_SELECTION", "{0} = 'Highly Increased'".format(arcpy.AddFieldDelimiters("County_layer", "Pop_Descrp")) )
arcpy.CopyFeatures_management("County_layer", "HighPopR_counties.shp")

Related

String Algorithm Question - Word Beginnings

I have a problem, and I'm not too sure how to solve it without going down the route of inefficiency. Say I have a list of words:
Apple
Ape
Arc
Abraid
Bridge
Braide
Bray
Boolean
What I want to do is process this list and get what each word starts with up to a certain depth, e.g.
a - Apple, Ape, Arc, Abraid
ab - Abraid
ar - Arc
ap - Apple, Ape
b - Bridge, Braide, Bray, Boolean
br - Bridge, Braide, Bray
bo - Boolean
Any ideas?
Perhaps you're looking for something like:
#!/usr/bin/env python
def match_prefix(pfx, seq):
    '''return subset of seq that starts with pfx'''
    results = list()
    for i in seq:
        if i.startswith(pfx):
            results.append(i)
    return results

def extract_prefixes(lngth, seq):
    '''return all prefixes in seq of the length specified'''
    results = dict()
    lngth += 1
    for i in seq:
        if i[0:lngth] not in results:
            results[i[0:lngth]] = True
    return sorted(results.keys())

def gen_prefix_indexed_list(depth, seq):
    '''return a dictionary of all words matching each prefix
       up to depth, keyed on these prefixes'''
    results = dict()
    for each in range(depth):
        for prefix in extract_prefixes(each, seq):
            results[prefix] = match_prefix(prefix, seq)
    return results

if __name__ == '__main__':
    words = '''Apple Ape Arc Abraid Bridge Braide Bray Boolean'''.split()
    test = gen_prefix_indexed_list(2, words)
    for each in sorted(test.keys()):
        print "%s:\t\t" % each,
        print ' '.join(test[each])
That is, you want to generate all the prefixes present in a list of words, between length one and some number you specify (2 in this example), and then produce an index of all words matching each of these prefixes.
I'm sure there are more elegant ways to do this, but for a quick and easily explained approach I've just built this from a simple bottom-up functional decomposition of the apparent spec. Since the end-result values are lists of words each matching a given prefix, we start with a function to filter such matches out of our input. Since the end-result keys are all the prefixes of length 1 to some N that appear in our input, we have a function to extract those. Our spec is then an extremely straightforward nested loop around the two.
Of course this nested loop might be a problem; such things usually equate to O(n^2) efficiency. As shown, this will iterate over the original list C * N * N times (C is the constant number of prefix lengths: 1, 2, etc.; N is the length of the list).
If this decomposition provides the desired semantics then we can look at improving the efficiency. The obvious approach would be to lazily generate the dictionary keys as we iterate once over the list: for each word, for each prefix length, generate the key, append the word to the list/value stored at that key, and continue to the next word.
There's still a nested loop, but it's the short loop over the key/prefix lengths. This alternative design has the advantage of allowing us to iterate over words from any iterable, not just an in-memory list. So we could iterate over lines of a file, results generated from a database query, etc., without incurring the memory overhead of keeping the entire original word list in memory.
Of course we're still storing the dictionary in memory. However, we can also change that and decouple the logic from the input and storage. When we append each input to the various prefix/key values, we don't care whether they're lists in a dictionary, lines in a set of files, or values being pulled out of (and pushed back into) a DBM or other key/value store (for example CouchDB or some other "NoSQL" clustered database).
The implementation of that is left as an exercise to the reader.
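Still, a minimal sketch of that lazy, single-pass build might look like this (Python 3; the function name and the defaultdict choice are mine, not part of the answer above):
from collections import defaultdict

def prefix_index(words, depth):
    '''index words by every prefix of length 1..depth in one pass'''
    index = defaultdict(list)
    for word in words:  # any iterable works: a list, a file, a cursor...
        for n in range(1, min(depth, len(word)) + 1):
            index[word[:n].lower()].append(word)
    return index

words = "Apple Ape Arc Abraid Bridge Braide Bray Boolean".split()
for prefix, matches in sorted(prefix_index(words, 2).items()):
    print("%s - %s" % (prefix, ", ".join(matches)))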
You can use a Trie structure.
(root)
/
a - b - r - a - i - d
/ \ \
p r e
/ \ \
p e c
/
l
/
e
Just find the node that you want and get all its descendants, e.g., if I want ap-:
(root)
/
a - b - r - a - i - d
/ \ \
[p] r e
/ \ \
p e c
/
l
/
e
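A minimal Python sketch of that idea (the class and function names are mine; each node stores the words that pass through it, trading memory for fast prefix lookup):
class TrieNode:
    def __init__(self):
        self.children = {}
        self.words = []  # every word passing through this node

def build_trie(words):
    root = TrieNode()
    for word in words:
        node = root
        for ch in word.lower():
            node = node.children.setdefault(ch, TrieNode())
            node.words.append(word)  # descendants are precomputed per node
    return root

def starts_with(root, prefix):
    node = root
    for ch in prefix.lower():
        if ch not in node.children:
            return []
        node = node.children[ch]
    return node.words

trie = build_trie("Apple Ape Arc Abraid Bridge Braide Bray Boolean".split())
print(starts_with(trie, "ap"))  # ['Apple', 'Ape']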
I don't know what you have in mind when you say "route of inefficiency", but a pretty obvious solution (possibly the one you are thinking of) comes to mind. A Trie looks like the structure for this kind of problem, but it's costly in terms of memory (there is a lot of duplication) and I'm not sure it makes things faster in your case. Maybe the memory usage would pay off if the information were retrieved many times, but your question suggests you want to generate the output file once and store it. So in your case the Trie would be generated just to be traversed once; I don't think that makes sense.
My suggestion is to just sort the list of words in lexical order and then traverse the list, in order, as many times as the maximum beginning length.
create a dictionary with keys being strings and values being lists of strings
for(i = 1 to maxBeginningLength)
{
for(every word in your sorted list)
{
if(the word's length is no less than i)
{
add the word to the list in the dictionary at a key
being the beginning of the word of length i
}
}
}
store contents of the dictionary to the file
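A direct Python rendering of that pseudocode, as a sketch (the identifiers and file name are mine):
words = "Apple Ape Arc Abraid Bridge Braide Bray Boolean".split()

def beginnings(words, max_beginning_length):
    groups = {}  # keys are strings, values are lists of strings
    ordered = sorted(words, key=str.lower)  # lexical order
    for i in range(1, max_beginning_length + 1):
        for word in ordered:
            if len(word) >= i:  # word is long enough for a beginning of length i
                groups.setdefault(word[:i].lower(), []).append(word)
    return groups

with open("beginnings.txt", "w") as out:  # store the dictionary contents to a file
    for key, matched in sorted(beginnings(words, 2).items()):
        out.write("%s - %s\n" % (key, ", ".join(matched)))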
Using this PHP trie implementation will get you about 50% there. It's got some stuff you don't need and it doesn't have a "search by prefix" method, but you can write one yourself easily enough.
$trie = new Trie();
$trie->add('Apple', 'Apple');
$trie->add('Ape', 'Ape');
$trie->add('Arc', 'Arc');
$trie->add('Abraid', 'Abraid');
$trie->add('Bridge', 'Bridge');
$trie->add('Braide', 'Braide');
$trie->add('Bray', 'Bray');
$trie->add('Boolean', 'Boolean');
It builds up a structure like this:
Trie Object
(
[A] => Trie Object
(
[p] => Trie Object
(
[ple] => Trie Object
[e] => Trie Object
)
[rc] => Trie Object
[braid] => Trie Object
)
[B] => Trie Object
(
[r] => Trie Object
(
[idge] => Trie Object
[a] => Trie Object
(
[ide] => Trie Object
[y] => Trie Object
)
)
[oolean] => Trie Object
)
)
If the words were in a Database (Access, SQL), and you wanted to retrieve all words starting with 'br', you could use:
Table Name: mytable
Field Name: mywords
"Select * from mytable where mywords like 'br*'" - For Access - or
"Select * from mytable where mywords like 'br%'" - For SQL

Modify PL/SQL statement strings in C++

This is my use case: the input is a string representing an Oracle PL/SQL statement of arbitrary complexity. We may assume it's a single statement (not a script).
Now, several bits of this input string have to be rewritten.
E.g. table names need to be prefixed, aggregate functions in the selection list that don't use a column alias should be assigned a default one:
SELECT SUM(ABS(x.value)),
TO_CHAR(y.ID,'111,111'),
y.some_col
FROM
tableX x,
(SELECT DISTINCT ID
FROM tableZ z
WHERE ID > 10) y
WHERE
...
becomes
SELECT SUM(ABS(x.value)) COL1,
TO_CHAR(y.ID,'111,111') COL2,
y.some_col
FROM
pref.tableX x,
(SELECT DISTINCT ID, some_col
FROM pref.tableZ z
WHERE ID > 10) y
WHERE
...
(Disclaimer: just to illustrate the issue, statement does not make sense)
Since aggregate functions might be nested and subSELECTs are a b_tch, I dare not use regular expressions. Well, actually I did, and achieved 80% success, but I need the remaining 20%.
The right approach, I presume, is to use grammars and parsers.
I fiddled around with C++ ANTLR2 (although I do not know much about grammars and parsing), but I do not see an easy way to get at the SQL bits:
list<string> *ssel = theAST.getSubSelectList(); // fantasy land
Could anybody maybe provide some pointers on how "parsing professionals" would pursue this issue?
EDIT: I am using Oracle 9i.
Maybe you can use this; it turns a select statement into an XML block:
declare
cl clob;
begin
dbms_lob.createtemporary (
cl,
true
);
sys.utl_xml.parsequery (
user,
'select e.deptno from emp e where deptno = 10',
cl
);
dbms_output.put_line (cl);
dbms_lob.freetemporary (cl);
end;
/
<QUERY>
<SELECT>
<SELECT_LIST>
<SELECT_LIST_ITEM>
<COLUMN_REF>
<SCHEMA>MICHAEL</SCHEMA>
<TABLE>EMP</TABLE>
<TABLE_ALIAS>E</TABLE_ALIAS>
<COLUMN_ALIAS>DEPTNO</COLUMN_ALIAS>
<COLUMN>DEPTNO</COLUMN>
</COLUMN_REF>
....
....
....
</QUERY>
See here: http://forums.oracle.com/forums/thread.jspa?messageID=3693276&#3693276
Now you 'only' need to parse this xml block.
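For instance, once you have the full XML document, a sketch of pulling the column references out with Python's standard library (element names taken from the output above; the truncated parts are omitted here):
import xml.etree.ElementTree as ET

xml_text = """<QUERY><SELECT><SELECT_LIST><SELECT_LIST_ITEM>
<COLUMN_REF><SCHEMA>MICHAEL</SCHEMA><TABLE>EMP</TABLE>
<TABLE_ALIAS>E</TABLE_ALIAS><COLUMN_ALIAS>DEPTNO</COLUMN_ALIAS>
<COLUMN>DEPTNO</COLUMN></COLUMN_REF>
</SELECT_LIST_ITEM></SELECT_LIST></SELECT></QUERY>"""

for ref in ET.fromstring(xml_text).iter("COLUMN_REF"):
    print(ref.findtext("TABLE"), ref.findtext("COLUMN"))  # EMP DEPTNO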
Edit1:
Sadly I don't fully understand the needs of the OP, but I hope this can help. (It is another way of asking for the 'names' of the columns of, for example, the query select count(*),max(dummy) from dual.)
set serveroutput on
DECLARE
c NUMBER;
d NUMBER;
col_cnt PLS_INTEGER;
f BOOLEAN;
rec_tab dbms_sql.desc_tab;
col_num NUMBER;
PROCEDURE print_rec(rec in dbms_sql.desc_rec) IS
BEGIN
dbms_output.new_line;
dbms_output.put_line('col_type = ' || rec.col_type);
dbms_output.put_line('col_maxlen = ' || rec.col_max_len);
dbms_output.put_line('col_name = ' || rec.col_name);
dbms_output.put_line('col_name_len = ' || rec.col_name_len);
dbms_output.put_line('col_schema_name= ' || rec.col_schema_name);
dbms_output.put_line('col_schema_name_len= ' || rec.col_schema_name_len);
dbms_output.put_line('col_precision = ' || rec.col_precision);
dbms_output.put_line('col_scale = ' || rec.col_scale);
dbms_output.put('col_null_ok = ');
IF (rec.col_null_ok) THEN
dbms_output.put_line('True');
ELSE
dbms_output.put_line('False');
END IF;
END;
BEGIN
c := dbms_sql.open_cursor;
dbms_sql.parse(c,'select count(*),max(dummy) from dual ',dbms_sql.NATIVE);
dbms_sql.describe_columns(c, col_cnt, rec_tab);
for i in rec_tab.first..rec_tab.last loop
print_rec(rec_tab(i));
end loop;
dbms_sql.close_cursor(c);
END;
/
(See here for more info: http://www.psoug.org/reference/dbms_sql.html)
The OP also wants to be able to change the schema name of the table in a query. I think the easiest way to achieve that is to query the table names from user_tables, search the SQL statement for those table names and prefix them, or to do an 'alter session set current_schema = ....'.
If the source of the SQL statement strings are other coders, you could simply insist that the parts that need changing are simply marked by special escape conventions, e.g., write $TABLE instead of the table name, or $TABLEPREFIX where one is needed. Then finding the places that need patching can be accomplished with a substring search and replacement.
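As a quick sketch of that convention (Python's string.Template stands in here purely for illustration; the statement and marker names are hypothetical):
from string import Template

stmt = Template("SELECT SUM(ABS(x.value)) FROM ${TABLEPREFIX}tableX x")
print(stmt.substitute(TABLEPREFIX="pref."))
# SELECT SUM(ABS(x.value)) FROM pref.tableX x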
If you really have arbitrary SQL strings and cannot get them nicely marked, you need to somehow parse the SQL string, as you have observed. The XML solution is certainly one possible way.
Another way is to use a program transformation system: a tool that can parse a string for a language instance, build ASTs, carry out analysis and transformation on the ASTs, and then spit out a revised string.
The DMS Software Reengineering Toolkit is such a system. It has a PL/SQL front-end parser, and it can use pattern-directed transformations to accomplish the rewrites you appear to need. For your example involving select items:
domain PLSQL.
rule use_explicit_column(e: expression):select_item -> select_item
"\e" -> "\e \column\(\e\)".
To read the rule, you need to understand that the stuff inside quote marks represents abstract trees in some computer language which we want to manipulate. The "domain PLSQL" phrase says "use the PLSQL parser" to process the quoted string content, which is how the tool knows what the quotes contain. (DMS has lots of language parsers to choose from.) The terms "expression" and "select_item" are grammatical constructs from the language of interest, i.e., PLSQL in this case. See the railroad diagrams in your PLSQL reference manual.
The backslash represents escape/meta information rather than target-language syntax.
What the rule says is: transform those parsed elements which are select_items composed solely of an expression \e by converting them into a select_item consisting of the same expression \e followed by the corresponding column ( \column(\e) ), presumably based on position in the select-item list for the specific table. You'd have to implement a column function that can determine the corresponding name from the position of the select item. In this example, I've chosen to define the column function to accept the expression of interest as an argument; the expression is actually passed as the matched tree, and thus the column function can determine where it is in the select_items list by walking up the abstract syntax tree.
This rule handles just the select items. You'd add more rules to handle the other various cases of interest to you.
What the transformation system does for you is:
parse the language fragment of interest
build an AST
let you pattern-match for places of interest (by doing AST pattern matching, but using the surface syntax of the target language)
replace matched patterns by other patterns
compute arbitrary replacements (as ASTs)
regenerate source text from the modified ASTs
While writing the rules isn't always trivial, it is what is necessary if your problem is stated as posed.
The suggested XML solution is another way to build such ASTs. It doesn't have the nice pattern-matching properties, although you may be able to get a lot out of XSLT. What I don't know is whether the XML has the parse tree in complete detail; the DMS parser does provide this by design, as it is needed if you want to do arbitrary analysis and transformation.

clip analysis in arcpy

I have one shapefile that covers an entire city, and a list of shapefiles which are buffers in different places in the city. I want to clip the city with each buffer. I tried using ArcPy in Python but the code is not working. What am I doing wrong?
import arcpy
from arcpy import env
from arcpy.sa import *
env.workspace = "U:\Park and Residential Area\Test\SBA park_res_buffer_5\SBA.gdb"
infeature= "U:\Park and Residential Area\Test\park_res_merge.shp"
clipfeatture = arcpy.ListFeatureClasses("*", "polygon")
for i in clipfeatture:
    outclipfeatture = arcpy.Clip_analysis(infeature, i)
    outclipfeatture.save("U:\Park and Residential Area\Test\SBA park_res_buffer_5/"*i)
This is the appropriate syntax for using Clip in ArcPy:
arcpy.Clip_analysis(in_features, clip_features, out_feature_class)
so your for loop should instead be something like:
for i in clipfeatture:
    outfeature = r"U:\Foo\Bar" + "\\" + i
    arcpy.Clip_analysis(infeature, i, outfeature)
I would also print() each file path string so you can check that it is formed correctly. Backslashes are escape characters in Python and form escape sequences when followed by certain letters.
I always put an r in front of any string that contains a file path, e.g. r"\\srvr\drv\proj\gdb.gdb\fc"; this tells Python it is a raw string and ignores the escape functions.
See link below for an entertaining analogy on handling backslashes in filenames.
https://pythonconquerstheuniverse.wordpress.com/2008/06/04/gotcha-%E2%80%94-backslashes-in-windows-filenames/
To do multiple clip with more than one clip features, you have to first create a list of all clip features and iterate them.
import arcpy
arcpy.env.workspace = file_path
fcList = arcpy.ListFeatureClasses()
for fc in fcList:
    arcpy.Clip_analysis(input_feature, fc, output_feature)
Be sure to have different names for your multiple outputs. You can use arcpy.CreateUniqueName() to create distinct names such as Buffer.shp, Buffer_1.shp, Buffer_2.shp, and so on; see the sketch below.
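For example, a variant of the loop above (the base name "Buffer.shp" is arbitrary):
for fc in fcList:
    # CreateUniqueName appends _1, _2, ... when the name is already taken
    output_feature = arcpy.CreateUniqueName("Buffer.shp", arcpy.env.workspace)
    arcpy.Clip_analysis(input_feature, fc, output_feature)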
You can also export the iterate feature selection tool from Model Builder if each place is unique.
http://desktop.arcgis.com/en/arcmap/10.3/tools/modelbuilder-toolbox/iterate-feature-selection.htm
# Import arcpy module
import arcpy
# Load required toolboxes
arcpy.ImportToolbox("Model Functions")
# Local variables:
Selected_Features = ""
Value = "1"
# Process: Iterate Feature Selection
arcpy.IterateFeatureSelection_mb("inputfeature", "fields", "false")

String templating

I have the task of designing a feature that generates Id numbers that follow rules that can be extended later.
For example: a system may require its id numbers to be generated as follows: 18122424. This breaks down into [DepartmentId][YearCreated][Sequential]: Department = 18, Year = 12, Sequence = 2424.
How do I go about designing a rule engine that allows the user to change it? I came up with a format like:
Dept(#)
Year(#)
Seq(#)
Initials(#) <-- Name initials.
So the rule for that Id above is: [Dept(2)][Year(2)][Seq(4)]. If I get this as a string, how do I parse it to get the rules? Regex or normal string search?
Is there an easier or more efficient way of doing this conceptually?
If I understand correctly, your question is about parsing the rule string, i.e. retrieving the field names and lengths.
I would work one field at a time and possibly use a Regexpr for [ letters ( digits ) ], like
\[{[A-Za-z]+}\({[0-9]+}\)\]
(MSVC syntax, retrieve the two tagged expressions).
You'll also need to store a dictionary of possible field names and convert the digits to an integer.
Alternatively, combined C strchr and scanf can do the trick.
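In Python terms, a minimal sketch of that parse (the rule string and id value come from the question):
import re

rule = "[Dept(2)][Year(2)][Seq(4)]"
fields = [(name, int(width))
          for name, width in re.findall(r"\[([A-Za-z]+)\((\d+)\)\]", rule)]
print(fields)  # [('Dept', 2), ('Year', 2), ('Seq', 4)]

# slicing an id like 18122424 according to the parsed rule
value, pos = "18122424", 0
for name, width in fields:
    print(name, "=", value[pos:pos + width])  # Dept = 18, Year = 12, Seq = 2424
    pos += width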
This is not a "rule engine", this is just templating.
Whatever language you are using has some kind of templating, and a way to specify all of these formats. Just use that.
I don't know who upvoted it.
Generating the number
- You first have to decide on the template, something like depid-year-seq, assuming you have only these 3 "variables", each mapped to its original value. Use a string parser to split the given template on -. That gives you an array with index 0 = depid, 1 = year, 2 = seq. Loop through the array and build the string by appending the corresponding value for each index, i.e.
- 0 = depid = "18"
- 1 = year = "18" + "12" = "1812"
- 2 = seq = "1812" + "2424" = "18122424"
Reverse
To reverse, parse the given number by splitting it up into 2-2-rest; I guess you can pick it up from here.

SQL engine for only SELECT statment - hand written parser in c++

I am a newbie at compilers, but I got a project: an SQL engine for only the SELECT statement. For this I have to use a hand-written parser and engine. I studied samples of LL(k) grammars and recursive-descent techniques (suggested on Stack Overflow for writing a parser by hand), but none of the samples showed how to construct the parse tree from the functions. Can anyone tell me how to do the whole compilation process, step by step, just taking "Select columnname1,columnname2 from table" as an example? One more thing: Boost libraries are also not allowed. The data is in memory; I used structures to store it. Thanks in advance.
You might also find the BNF grammars for SQL 92, 99 and 2003 online, which might come in handy.
These are complete grammars, but it should not be too hard to isolate just the SELECT branch.
Also, you might explore already available grammars at http://www.antlr.org/grammar/list; there are a few SQL flavours there.
I would say the easiest way is to deal with this as a compiler would:
Tokenization
Parsing (creation of the AST)
Tokenization is about identifying "words", for your example this gives:
"Select", "columnname1", ",", "columnanme2", "from", "table"
Parsing is about interpreting this list of tokens into the Abstract Syntax Tree. Given your requirements:
First token should be "select" (case-insensitive)
It may be followed by either "*" or a comma-separated list of column names
"from" (case-insensitive) marks the end of the list
it's followed by the table name
I'm going to outline this in Python
class Select:
    def __init__(self):
        self.columnList = []
        self.tableName = None

ALL = object()  # sentinel meaning "select *"

class NoToken(Exception): pass
class NotASelectStatement(Exception): pass
class ExpectedComma(Exception): pass

def Tokenize(statement):
    result = []
    for word in statement.split(" "):
        sub = word.split(",")  # the comma may not be separated by space
        for i in range(len(sub) - 1):
            if sub[i]:  # skip empties from forms like "a," or ",b"
                result.append(sub[i].lower())  # case insensitive
            result.append(",")
        if sub[-1]:
            result.append(sub[-1].lower())
    return result

def CreateSelect(tokens):
    if len(tokens) == 0: raise NoToken()
    if tokens[0] != "select": raise NotASelectStatement()
    select = Select()
    i = 1
    while tokens[i] != "from":
        if tokens[i] == "*":
            select.columnList = ALL
            i = i + 1
            break
        select.columnList.append(tokens[i])
        i = i + 1
        if tokens[i] == "from":  # the last column has no trailing comma
            break
        if tokens[i] != ",": raise ExpectedComma(i)
        i = i + 1
    select.tableName = tokens[i + 1]
    return select
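To exercise the sketch with the question's example statement:
tokens = Tokenize("Select columnname1,columnname2 from table")
stmt = CreateSelect(tokens)
print(stmt.columnList, stmt.tableName)
# ['columnname1', 'columnname2'] table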
Of course, as you realize, this is a limited example:
It does not take multiple tables into account
It does not take aliasing into account
It does not take nested select statements into account
However it works pretty well and is quite efficient. It could, normally, be even more efficient if we combined the tokenization and parsing phases, but I have found in general that this only makes the code much harder to read.
Also note that for proper error reporting it would be better:
not to alter the tokens (.lower()) but simply to use case-insensitive comparison
to adjoin proper source location to each token (line/column) so as to be able to point the user to the right place
EDIT:
What would an AST look like for a less contrived example?
"select foo, bar from FooBar where category = 'example' and id < 500;"
Let's go:
Select
|_ Columns (foo, bar)
|_ Table (FooBar)
\_ Where
\_ And
|_ Equal (category, 'example')
\_ LessThan (id, 500)
Here you have a tree-like structure, which is what you want to produce.
