How to extract, rename, and move files via scripting - arcpy

I am a beginner in Python, so this problem is really daunting. I am using ArcGIS 10.2 with Python 2.7.
I have about 3,000 zip files (WinZip) which each contain four shapefiles describing different vector features. The shapefiles have the same four names inside every zip file. They are essentially a time series, with the same four data sets broken up by individual dates.
The name of the zip file contains a string somewhere in the middle of the name that I need to retrieve and include in the name of the extracted shapefiles.
Then I need to move each renamed shapefile to a different directory based on type.
For example:
usdm_20001001.zip (I need the 20001001 from the title)
|--DI_Callout.shp (needs to be renamed C20001001 and moved to a directory = Callout)
|--DI_Type.shp (needs to be renamed T20001001 and moved to a directory = Type)
|--file three
|--file four
And so on, 3,000 times.

Python has a Zip interface. I've put together a little script that has all the pieces you need but you will need to do some look-up or conversion with names and paths; there's not enough info in your question to decide on the names and I can't follow your folder structure.
import sys, os, zipfile
InFolder = sys.argv[1]
for Zfile in os.listdir(InFolder):
    # only process zip archives
    if not Zfile.lower().endswith(".zip"):
        continue
    # Open the archive
    Archive = zipfile.ZipFile(InFolder + "\\" + Zfile)
    # get the base name (no extension) and split it
    # to extract the ID.
    Zname, Zext = os.path.splitext(Zfile)
    Zsplit = Zname.split("_")
    BaseName = "C%s" % Zsplit[1]
    ZDir = InFolder + "\\" + BaseName
    # If the folder doesn't exist create it
    if not os.path.exists(ZDir):
        os.mkdir(ZDir)
    # step through each file in the archive,
    # extracting each one as you go.
    for member in Archive.infolist():
        InName, InExt = os.path.splitext(member.filename)
        # Not sure about naming here, you will need to do
        # some string manipulation
        OutFileName = "NotSure" + InExt
        # not sure about extracting with a different name,
        # so extract and then rename
        Archive.extract(member, ZDir)
        os.rename(ZDir + "\\" + member.filename, ZDir + "\\" + OutFileName)
All the pieces are there: splitting strings, working with a zip file, testing & creating folders, renaming files and splitting file names from extensions. Use this as a basis and you should be well on your way.
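If the renaming scheme really is as regular as the example suggests (DI_Callout.shp becomes C20001001.shp in a Callout folder, DI_Type.shp becomes T20001001.shp in a Type folder), a small lookup table can stand in for the "NotSure" part above. The input and output paths, and the assumption that the shapefiles sit at the root of each archive, are mine, so treat this as a sketch rather than a drop-in solution:
import os, glob, zipfile
InFolder = r"C:\data\usdm_zips"    # assumed location of the 3,000 zip files
OutFolder = r"C:\data\usdm_out"    # assumed destination root for Callout, Type, ...
# Map the fixed shapefile base names to (new prefix, target folder).
# Only these two pairs come from the question; add the other two yourself.
Lookup = {"DI_Callout": ("C", "Callout"),
          "DI_Type":    ("T", "Type")}
for ZPath in glob.glob(os.path.join(InFolder, "usdm_*.zip")):
    # usdm_20001001.zip -> 20001001
    DateID = os.path.splitext(os.path.basename(ZPath))[0].split("_")[1]
    Archive = zipfile.ZipFile(ZPath)
    for member in Archive.namelist():
        Base, Ext = os.path.splitext(member)
        if Base not in Lookup:
            continue                      # skip anything not in the lookup
        Prefix, SubDir = Lookup[Base]
        TargetDir = os.path.join(OutFolder, SubDir)
        if not os.path.exists(TargetDir):
            os.makedirs(TargetDir)
        Archive.extract(member, TargetDir)
        # DI_Callout.shp -> C20001001.shp; the .dbf/.shx/.prj parts of each
        # shapefile get the same new base name, which keeps them linked
        os.rename(os.path.join(TargetDir, member),
                  os.path.join(TargetDir, Prefix + DateID + Ext))
    Archive.close()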

Related

How do I make my Python3 string matching code ignore files that do not match any criteria?

I have a Python3 script that reads the date string at the start of every filename in a directory in order to determine whether the file was created before or after 180 days ago. The file names all begin with YYYYMMDD or eerasedd_YYYYMMDD_etc.xls. I can already collect all these filenames.
I need to tell my script to ignore any filename that does not conform to the standard eight leading numerical characters, example: 20180922 or eerasedd_20171207_1oIkZf.so.
if name.startswith('eerasedd_'):
    fileDate = datetime.strptime(name[9:17], DATEFMT).date()
else:
    fileDate = datetime.strptime(name[0:8], DATEFMT).date()
I need logic to prevent the script from choking on files that don't fit the desired pattern. The script needs to carry on with its work and forget about non-conformant filenames. Do I need to add code that handles an exception, or just add an elif block?
I have a function to get only the names of those files I need based on their extensions.
from pathlib import Path

def get_files(extensions):
    all_files = []
    for ext in extensions:
        all_files.extend(Path('/Users/mrh/Python/calls').glob(ext))
    return all_files

for file in get_files(('*.wav', '*.xml')):
    print(file.name)
Now I need to figure out how to check each 'file.name' for the date string in its filename. i.e. now I need to run something like
if name.startswith('eerasedd_'):
    fileDate = datetime.strptime(name[9:17], DATEFMT).date()
else:
    fileDate = datetime.strptime(name[0:8], DATEFMT).date()
against 'file.name' to see whether the files are 180 days old or less.
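A minimal sketch of one way to do that, reusing the get_files() helper above and assuming DATEFMT is '%Y%m%d' and that "180 days old or less" means the embedded date falls on or after today minus 180 days: pull the leading date out with a regular expression and simply skip any name where it is missing or will not parse.
import re
from datetime import datetime, date, timedelta

DATEFMT = '%Y%m%d'                                 # assumed date format
DATE_RE = re.compile(r'^(?:eerasedd_)?(\d{8})')    # optional prefix, then YYYYMMDD

def file_date(name):
    """Return the date embedded in a filename, or None if it does not conform."""
    match = DATE_RE.match(name)
    if not match:
        return None                                # no leading 8-digit date at all
    try:
        return datetime.strptime(match.group(1), DATEFMT).date()
    except ValueError:                             # eight digits, but not a real date
        return None

cutoff = date.today() - timedelta(days=180)
for file in get_files(('*.wav', '*.xml')):
    fileDate = file_date(file.name)
    if fileDate is None:
        continue                                   # ignore non-conformant filenames
    if fileDate >= cutoff:
        print(file.name)                           # 180 days old or less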

How do I load multiple yaml files using Snake YAML Java?

I have a few yaml files in a directory. How do I load all of them into the same YAML Object (Map)?
# a.yaml
a: ValueA
c: ValueC
# b.yaml
a: ValueAA
b: ValueB
I want to load a.yaml, followed by b.yaml. The result would be:
{ a: ValueAA, b: ValueB, c: ValueC }
One way I could do this is to explicitly concatenate the contents of a.yaml and b.yaml into a single String, and then load the merged String. I would like to know if I can avoid that and just load the 2 yaml files sequentially using the load() API.
I don't know the details of SnakeYAML, but you cannot just concatenate the two files a.yaml and b.yaml, even if they both have a mapping at the root level.
Doing so, you would get a duplicate key in your mapping. According to the YAML 1.2 (and 1.0) specifications you are not allowed to have duplicate keys, and the 1.1 specification states that you should get a warning on a duplicate key and that the first value is taken (you indicate you want the second).
So you have to resolve this in Java, and you can do so by loading the documents from their respective files and update the data structure loaded from a.yaml with the one from b.yaml.
You can also concatenate the files into a file with multiple documents, but for that the documents have to be separated by a directives-end indicator (---) or a document-end indicator (...). You normally need to use a special "load-all" function to load such a multi-document file, resulting in a list of (or iterator over) the data structures loaded from the mappings, which you can then merge.
If you make the multi-document file programmatically, make sure to check that the files end in a newline, otherwise appending --- and the next file is not going to give the multi-document stream you expect.
Refer to Anthon's answer for the details. As an FYI to folks who want a working snippet, this is what I did.
final Yaml yaml = new Yaml(configConstructor, representer);
try (
    final InputStream defaultYamlStream = new FileInputStream(settingsPath + "/cbench-default.yaml");
    final InputStream customerYamlStream = new FileInputStream(settingsPath + "/" + identifier + ".yaml");
    final InputStream fullYamlStream = new SequenceInputStream(defaultYamlStream, customerYamlStream);
) {
    parsedConfig = (BenchmarkConfig) yaml.load(fullYamlStream);
} catch (IOException ioe) {
    // ERROR
    System.out.println("Exception parsing the YAML configuration.");
    throw new RuntimeException("Exception parsing the YAML configuration.", ioe);
}
I'm creating a stream of concatenated files (sequence stream in my case) and using the load API as recommended by Anthon, and it works fine. Do make a note of the document end markers.

Sorting a file of file paths based on type of ending (over 7000 lines)

I am trying to sort a file (over 7,000 lines) where each line is a file path from a server that I ssh'd into; I put every single file path into one text file, sorted alphabetically. Depending on the type of ending (.png, .jpg, .php, .html, .doc, etc.), I want to place those file paths into their own separate text files (for organization purposes).
Some example lines from the file:
./public_html/application/libraries/phpass-0.1/c/crypt_private.c
./public_html/creativity/archive/oldsite/curricular/revised ArtScience.10.1.doc
./public_html/chambers/Chambers Fund Guidelines9-1-2010 .pdf
./public_html/js/jquery-ui/development-bundle/demos/autocomplete/images/ui-anim_basic_16x16.gif
./tmp/webalizer/ssl/entrepreneurship.wfu.edu/hourly_usage_201112.png
./public_html/js/jquery-ui/development-bundle/demos/droppable/images/high_tatras2.jpg
./public_html/js/jquery-ui/development-bundle/demos/autocomplete/categories.html
The lines I've provided above represent only a very small fraction of the different types of files I have to sort through. After looking through the file, some of them either have more than one ending:
./public_html/creativity/archive/oldsite/home_images/_notes/home_nav_bottom.jpg.mno
or no ending at all:
./public_html/old/mambots/editors/tinymce/jscripts/tiny_mce/plugins/insertdatetime
After thinking about how I would implement this in C++, this is the ROUGH outline (in pseudocode) of what I would do:
int main()
{
    /* have all necessary includes and namespaces */
    /* initialize variables and do file opening */
    while (/* we are not at end of file */)
    {
        switch (/* by the type of file ending */)
        {
        case .png:
            /* store it in a separate file just for .png lines */
            break;
        case .jpg:
            /* store it in a separate file just for .jpg lines */
            break;
        /* have more cases to handle the rest of the type of endings */
        default:
            break;
        }
    }
    /* close file */
    return 0;
}
And the questions that I have are the following:
How do I check line by line in the file that we have reached an ending like .jpg, .png, .php, etc.?
How do I account for all the different possible file endings (even though I've been through the whole file, I'm not exactly sure how many different endings there are) in my cases within my switch statement?
How do I handle the cases where a file path may have more than one ending (like the example I provided above)?
And of course, if there is a better way to do this using C++ (perhaps another language that would make this easier?), I'm all ears.
Why not use the file extension as part of the file name in order to ensure separate files for different file types?
A bit like this:
int main()
{
    /* have all necessary includes and namespaces */
    /* initialize variables and do file opening */
    while (/* we successfully read a line from the file */)
    {
        /* extract the file extension from end of line */
        /* create a file name incorporating the file extension (table lookup?) */
        /* Append the line to the file of that file name */
    }
    /* close file */
    return 0;
}
So your file names might be something like this:
list-of-jpg.txt
list-of-mpeg.txt
list-of-html.txt
etc...
NOTES:
A file extension can be extracted from the line like this:
std::string ext;
std::string::size_type pos = line.rfind('.');
if (pos != std::string::npos)
    ext = line.substr(pos + 1);
When a file has more than one ending it is usually the last one that applies. For example, a file with extension .tar.gz was created as a tar archive and later gzipped, so it is now a gzip (.gz) file. I would trust the last extension; it is likely the true format of the file, converted from the previous extension's format.
That depends on how you read your strings; assuming you have a string per line, it could be something like this:
your_string.compare(your_string.length()-4, 4, ".jpg");
Note that C++ does not support string compares with switch statements. To make things easier however, you could split the extension using std::string::find() together with std::string::substr() so you can just compare the extensions straight away.
That's what the default is for :)
With the above compare statement you could do that easily; just make sure you check the compound extensions before the separate ones.
awk or perl spring to mind, or just some basic shell scripting in general. Something like this would probably do the trick:
awk -F '.' '{print $NF,$0}' your_file.txt | sort | cut -f2- -d'.'
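Since the question leaves the door open to another language, here is a minimal Python sketch of the whole job. The input and output file names, and which compound endings to keep whole, are my assumptions; everything else follows the per-extension-file idea above (including the "list-of-<ext>.txt" naming).
import os
from collections import defaultdict

# Endings that should be kept whole even though they contain a dot.
# Anything not listed here falls back to the last extension only.
COMPOUND = ('.tar.gz',)

def ending(path):
    name = os.path.basename(path).lower()
    for comp in COMPOUND:
        if name.endswith(comp):
            return comp.lstrip('.')
    ext = os.path.splitext(name)[1]
    return ext.lstrip('.') or 'no_extension'       # e.g. .../insertdatetime

groups = defaultdict(list)
with open('all_paths.txt') as infile:              # assumed input file
    for line in infile:
        if line.strip():
            groups[ending(line.rstrip('\n'))].append(line)

for ext, lines in groups.items():
    with open('list-of-%s.txt' % ext, 'w') as out:
        out.writelines(lines)
For 7,000 lines this runs in well under a second, and it sidesteps the need to enumerate every possible extension up front.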

How to automate and import files that are located in date sequential folders into SAS?

I currently have 700 folders that are all sequentially named.
The naming convention of the folders is as follows:
2011-08-15_2011-08-15
2011-08-16_2011-08-16
2011-08-17_2011-08-17
...
2013-09-20_2013-09-20
There are 10 txt files within each folder that have the same naming convention.
With the txt files all being the same, what I am trying to achieve is to automate the infile step and then use the name of the folder, e.g. 2011-08-15_2011-08-15, or part of it, e.g. 2011-08-15, as the name of the created data set.
I can successfully import all the txt files, so there is no issue there; the issue is I don't want to be changing the folder name each time in the infile step.
'C:\SAS data\Extract\2011-08-17_2011-08-17\abc.txt'
Is there an easier way to read these files in? I can find code for sequential txt/csv files but can find nothing to reference a folder and then rename the data set.
You should be able to wildcard the folders/files into a single fileref, e.g.
filename allfiles "c:\SAS_data\extract\*\*.txt" ;

data alldata ;
  length fn _fn $256. ;
  infile allfiles lrecl=256 truncover filename=_fn ;
  fn = _fn ; /* Store the filename */
  input ;
  put _INFILE_ ;
run ;
The wildcarded folder and file name works in SAS on Unix; not sure about SAS on PC.

Insert file (foo.txt) into open file (bar.txt) at caret position

What would be the best method, please, to insert file (foo.txt) into open file (bar.txt) at caret position?
It would be nice to have an open-file dialog to choose anything to be inserted.
The word processing equivalent would be "insert file" here.
Here is a substitute for foo.sublime-snippet, which can be linked to form files elsewhere:
import sublime, sublime_plugin

class InsertFileCommand(sublime_plugin.TextCommand):
    def run(self, edit):
        v = self.view
        template = open('foo.txt').read()
        print template
        v.run_command("insert_snippet", {"contents": template})
From within a text command you can access the current view. You can get the cursor positions using self.view.sel(). I don't know how to do gui stuff in python, but you can do file selection using the quick panel (similar to FuzzyFileNav).
Here is my unofficial modification of https://github.com/mneuhaus/SublimeFileTemplates which permits me to insert-a-file-here using the quick panel. It works on an OSX operating system (running Mountain Lion).
The only disadvantage I see so far is the inability to translate a double backslash \\ in the form file correctly -- it gets inserted as just a single backslash \. In my LaTeX form files, the double backslash \\ represents a line ending, or a new line if preceded by a ~. The workaround is to insert an extra backslash at each occurrence in the actual form file (i.e., put three backslashes, with the understanding that only two will be inserted when running the plugin). The form files need to have LF line endings and UTF-8 encoding -- CR endings are not translated properly. With a slight modification, it is also possible to have multiple form file directories and/or file types.
import sublime, sublime_plugin
import os

class InsertFileCommand(sublime_plugin.WindowCommand):
    def run(self):
        self.find_templates()
        self.window.show_quick_panel(self.templates, self.template_selected)

    def find_templates(self):
        self.templates = []
        self.template_paths = []
        for root, dirnames, filenames in os.walk('/path_to_forms_directory'):
            for filename in filenames:
                if filename.endswith(".tex"):  # extension of form files
                    self.template_paths.append(os.path.join(root, filename))
                    self.templates.append(os.path.basename(root) + ": " + os.path.splitext(filename)[0])

    def template_selected(self, selected_index):
        if selected_index != -1:
            self.template_path = self.template_paths[selected_index]
            print "\n" * 25
            print "----------------------------------------------------------------------------------------\n"
            print ("Inserting File: " + self.template_path + "\n")
            print "----------------------------------------------------------------------------------------\n"
            template = open(self.template_path).read()
            print template
            view = self.window.run_command("insert_snippet", {'contents': template})
            sublime.status_message("Inserted File: %s" % self.template_path)
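As an alternative to padding the form files themselves, you could escape the text inside template_selected before handing it to insert_snippet, since insert_snippet treats its contents as snippet syntax (backslashes and $ are special). This is an untested assumption on my part, not something from the plugin above:
template = open(self.template_path).read()
# double the backslashes first, then hide '$' from snippet field parsing
template = template.replace('\\', '\\\\').replace('$', '\\$')
self.window.run_command("insert_snippet", {'contents': template})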
