Can I take a ref to part of std::string inside a loop and have it available outside? - c++

I am parsing a std::string in a function which takes the string as a const std::string& parameter, and it is useful for me to take the whole rest of the string after I encounter a space. I want to store a reference to the tail of the string at that stage and use it to avoid copying the string.
I have a switch statement inside a for loop iterating over the string, and I will need to take the ref in one of the case labels. The problem is that you can't re-assign to a reference AFAIK, and declaring the ref inside the case is also illegal. Furthermore, I need the ref outside the switch as well.
Here is a simplified example:
#include <string>
bool isValid(const std::string &s)
{
std::size_t length {s.length()};
for (std::size_t i {0}; i < length; ++i)
{
switch (s[i])
{
// whatever other cases
case ' ':
const std::string &tail {s.substr(i, length - i)};
break;
default:
// whatever
}
if (tail == "abcd")
// do something
}
}
I don't know exactly what should happen, because I am pretty much a C++ newbie, but I just want to save the cost of a copy and allocation on the heap if I don't use a ref.

I want to store a reference to the tail of the string at that stage and use it to avoid copying the string.
A reference is an alias for an object, so there is no such thing as a reference to a part of a std::string object. You can either create a new sub-string, e.g. with std::string::substr as you posted it:
const std::string &tail {s.substr(i, length - i)};
Note that this is valid, if unusual: a const-qualified reference extends the lifetime of a temporary, here the return value of s.substr(...).
A cleaner way would be to pass a std::string_view in the first place, which is exactly tailored for such a use case, and use std::string_view::substr to extract new views on parts of the strings. This is much more efficient (as no buffer must be copied) and idiomatic.
#include <string_view>
bool isValid(const std::string_view &s)
{
// ...
const std::string_view tail = s.substr(i, length - i);
}
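As a rough sketch of how the string_view approach could fit the original parsing problem (C++17 assumed): the helper name tail_after_space is hypothetical, and unlike the question's substr(i, ...) it starts the tail after the space rather than at it. The view is declared outside any loop or switch, so it stays usable afterwards.

```cpp
#include <cstddef>
#include <string_view>

// Returns a view of everything after the first space in `s`,
// or an empty view if there is no space. No characters are copied.
// `tail_after_space` is a hypothetical helper name, not from the question.
std::string_view tail_after_space(std::string_view s)
{
    const std::size_t pos = s.find(' ');
    if (pos == std::string_view::npos)
        return {};
    return s.substr(pos + 1);
}

// Sketch of the question's isValid: the tail is obtained before any
// further processing, so it is available for the rest of the function.
bool isValid(std::string_view s)
{
    const std::string_view tail = tail_after_space(s);
    return tail == "abcd";
}
```

The returned view only points into the caller's buffer, so it must not outlive the string it was taken from.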

I want to store a reference to the tail of the string
std::string & is a reference to a string. It is not a reference to a part of a string.
This is what std::string_view is for:
const std::string_view tail =
std::string_view(s).substr(i, std::string_view::npos);
Prior to C++17, you could simply use a pair of iterators:
auto substr_beg = s.begin() + i;
auto substr_end = s.end();

Related

std::vector<std::string> to char* array

I have a std::vector<std::string> that I need to use for a C function's argument that reads char* foo. I have seen how to convert a std::string to char*. As a newcomer to C++, I'm trying to piece together how to perform this conversion on each element of the vector and produce the char* array.
I've seen several closely related SO questions, but most appear to illustrate ways to go the other direction and create std::vector<std::string>.
You can use std::transform as:
std::transform(vs.begin(), vs.end(), std::back_inserter(vc), convert);
Which requires you to implement convert() as:
char *convert(const std::string & s)
{
char *pc = new char[s.size()+1];
std::strcpy(pc, s.c_str());
return pc;
}
Test code:
int main() {
std::vector<std::string> vs;
vs.push_back("std::string");
vs.push_back("std::vector<std::string>");
vs.push_back("char*");
vs.push_back("std::vector<char*>");
std::vector<char*> vc;
std::transform(vs.begin(), vs.end(), std::back_inserter(vc), convert);
for ( size_t i = 0 ; i < vc.size() ; i++ )
std::cout << vc[i] << std::endl;
for ( size_t i = 0 ; i < vc.size() ; i++ )
delete [] vc[i];
}
Output:
std::string
std::vector<std::string>
char*
std::vector<char*>
Online demo : http://ideone.com/U6QZ5
You can use &vc[0] wherever you need char**.
Note that since we're using new to allocate memory for each std::string (in the convert function), we have to deallocate that memory at the end. This gives you the flexibility to change the vector vs: you can push_back more strings to it or delete existing ones from it, and vc (i.e. the vector<char*>) will still be valid!
But if you don't want this flexibility, then you can use this convert function:
const char *convert(const std::string & s)
{
return s.c_str();
}
And you've to change std::vector<char*> to std::vector<const char*>.
Now after the transformation, if you change vs by inserting new strings or by deleting old ones from it, then all the char* in vc might become invalid. That is one important point. Another important point is that you no longer need to use delete[] vc[i] in your code.
The best you can do is allocate a std::vector of const char* the same size as your vector. Then walk each element of the vector, calling c_str() to get the character array and storing it in the corresponding element of the new vector. Then you can pass the pointer to the first element of this vector to the function in question.
The code would look like this:
std::vector<const char *> cStrArray;
cStrArray.reserve(origVector.size());
for(std::size_t index = 0; index < origVector.size(); ++index)
{
cStrArray.push_back(origVector[index].c_str());
}
//NO RESIZING OF origVector!!!!
SomeCFunction(&cStrArray[0], cStrArray.size());
Note that you cannot allow the original vector of strings to be resized between the time you fetch the const char*s from the std::strings, and the time you call the C-function.
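The non-owning approach could be wrapped up roughly as follows. This is a sketch only: total_length is a hypothetical stand-in for the real C function, and c_str_array is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical stand-in for a C API that takes an array of C strings.
std::size_t total_length(const char* const* items, std::size_t count)
{
    std::size_t total = 0;
    for (std::size_t i = 0; i < count; ++i)
        total += std::strlen(items[i]);
    return total;
}

// Builds a non-owning array of pointers into `strings`.
// The pointers are valid only while `strings` is neither modified nor destroyed.
std::vector<const char*> c_str_array(const std::vector<std::string>& strings)
{
    std::vector<const char*> out;
    out.reserve(strings.size());
    std::transform(strings.begin(), strings.end(), std::back_inserter(out),
                   [](const std::string& s) { return s.c_str(); });
    return out;
}
```

Nothing needs to be freed here, because every pointer still belongs to its std::string.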
This should work:
char ** arr = new char*[vec.size()];
for(size_t i = 0; i < vec.size(); i++){
arr[i] = new char[vec[i].size() + 1];
strcpy(arr[i], vec[i].c_str());
}
EDIT:
Here's how you would free these data structures, assuming vec still has the correct number of elements. If your C function modifies this array somehow, you may need to get the size another way.
for(size_t i = 0; i < vec.size(); i++){
delete [] arr[i];
}
delete [] arr;
EDIT Again:
It may not be necessary to copy the strings if your C function does not modify the strings. If you can elaborate on what your interface looks like I'm sure we could provide you with better help.
A C++0x solution, where elements of std::string are guaranteed to be stored contiguously:
std::vector<std::string> strings = /* from somewhere */;
int nterms = /* from somewhere */;
// using std::transform is a possibility depending on what you want
// to do with the result of the call
std::for_each(strings.begin(), strings.end(), [nterms](std::string& s)
{ ModelInitialize(&s[0], nterms); });
If the function null terminates its argument, then after the call (s.begin(), s.end()) might not be meaningful. You can post-process to fix that:
s = std::string(s.begin(), std::find(s.begin(), s.end(), '\0'));
A more elaborate version that separately copies each string into a char[]:
typedef std::unique_ptr<char[]> pointer;
std::vector<pointer> args;
std::transform(strings.begin(), strings.end()
, std::back_inserter(args)
, [](std::string const& s) -> pointer
{
pointer p(new char[s.size()]);
std::copy(s.begin(), s.end(), &p[0]);
return p;
});
std::for_each(args.begin(), args.end(), [nterms](pointer& p)
{ ModelInitialize(p.get(), nterms); });
const char* is the same as char* apart from the const-ness; your interface method accepts both const and non-const strings.
Doesn't c_str() return a const char? Will that be a problem if I just
need a char*?
Yes, it returns a const char*, and no, there should be no problem as long as the function only reads the characters:
const char* a = "something";
// whatever it is here
const char* retfunc(const char* a)
{
const char* temp = a; // a plain char* would not compile without a cast
// process, then return temp
return temp;
}
Returning a local object isn't accepted by many people, and this tiny example is provided as-is.
The elements of a vector are stored contiguously, so the best and easy way is:
std::vector<char> v;
char* c = &v[0];
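If the C function needs mutable char* buffers, one option is to let std::vector<char> own each copy so that no manual delete[] is required. The following is a sketch under that assumption; CStringArray is a hypothetical class name.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Owns one mutable, null-terminated buffer per string, plus the char* array
// that a C function would receive. `CStringArray` is a hypothetical name.
class CStringArray
{
public:
    explicit CStringArray(const std::vector<std::string>& strings)
    {
        buffers_.reserve(strings.size());
        for (const std::string& s : strings) {
            std::vector<char> buf(s.begin(), s.end());
            buf.push_back('\0');  // C functions expect a terminator
            buffers_.push_back(std::move(buf));
        }
        // Collect the pointers only after all buffers are in place.
        pointers_.reserve(buffers_.size());
        for (std::vector<char>& buf : buffers_)
            pointers_.push_back(buf.data());
    }

    // Copying would leave pointers_ aimed at the other object's buffers.
    CStringArray(const CStringArray&) = delete;
    CStringArray& operator=(const CStringArray&) = delete;

    char** data() { return pointers_.data(); }
    std::size_t size() const { return pointers_.size(); }

private:
    std::vector<std::vector<char>> buffers_;
    std::vector<char*> pointers_;
};
```

The buffers are independent copies, so writes made by the C function do not affect the original strings.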

How to use a hash_map with case insensitive unicode string for key?

I'm very new to STL, and pretty new to C++ in general. I'm trying to get the equivalent of a .NET Dictionary<string, value>(StringComparer.OrdinalIgnoreCase) but in C++. This is roughly what I'm trying:
stdext::hash_map<LPCWSTR, SomeStruct> someMap;
someMap.insert(stdext::pair<LPCWSTR, SomeStruct>(L"a string", struct));
someMap.find(L"a string")
someMap.find(L"A STRING")
The trouble is, neither find operation usually works (it returns someMap.end()). It seems to sometimes work, but most of the time it doesn't. I'm guessing that the hash function the hash_map is using is hashing the memory address of the string instead of the content of the string itself, and it's almost certainly not case insensitive.
How can I get a dictionary-like structure that uses case-insensitive keys and can store my custom struct?
Thanks.
The hash_map documentation you link to indicates that you can supply your own traits class as a third template parameter. This must satisfy the same interface as hash_compare.
Scanning the docs, I think that what you have to do is this, which basically replaces the use of StringComparer.OrdinalIgnoreCase you had in your Dictionary:
struct my_hash_compare {
static const size_t bucket_size = 4;
static const size_t min_buckets = 8;
size_t operator()(const LPCWSTR &Key) const {
// implement a case-insensitive hash function here,
// or find something in the Windows libraries.
}
bool operator()(const LPCWSTR &Key1, const LPCWSTR &Key2) const {
// implement a case-insensitive comparison function here
return _wcsicmp(Key1, Key2) < 0;
// or something like that. There's warnings about
// locale plastered all over this function's docs.
}
};
I'm worried though that the docs say that the comparison function has to be a total order, not a strict weak order as is usual for sorted containers in the C++ standard libraries. If MS really means a total order, then the hash_map might rely on it being consistent with operator==. That is, they might require that if my_hash_compare()(a,b) is false, and my_hash_compare()(b,a) is false, then a == b. Obviously that's not true for what I've written, in which case you're out of luck.
As an alternative, which in any case is probably more efficient, you could push all the keys to a common case before using them in the map. A case-insensitive comparison is more costly than a regular string comparison. There's some Unicode gotcha to do with that which I can never quite remember, though. Maybe you have to convert -> lowercase -> uppercase, instead of just -> uppercase, or something like that, in order to avoid some nasty cases in certain languages or with titlecase characters. Anyone?
Also as other people said, you might not really want LPCWSTR as your key. This will store pointers in the map, which means that anyone who inserts a string has to ensure that the data it points to remains valid as long as it's in the hash_map. It's often better in the long run for hash_map to keep a copy of the key string passed to insert, in which case you should use wstring as the key.
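For comparison, here is a sketch of the same idea using the standard std::unordered_map (C++11) instead of stdext::hash_map, with std::wstring keys. It swaps the traits class for separate hash and equality functors. The names fold_case, CaseInsensitiveHash, CaseInsensitiveEqual and CaseInsensitiveMap are hypothetical, and the folding is a simple per-character std::towlower, which is not full Unicode case folding.

```cpp
#include <cstddef>
#include <cwctype>
#include <functional>
#include <string>
#include <unordered_map>

// Lower-cases each wchar_t with std::towlower. This depends on the current
// C locale and is only an approximation of Unicode case folding.
inline std::wstring fold_case(const std::wstring& s)
{
    std::wstring out(s);
    for (wchar_t& c : out)
        c = static_cast<wchar_t>(std::towlower(static_cast<std::wint_t>(c)));
    return out;
}

// Hash of the folded key, so keys differing only in case hash alike.
struct CaseInsensitiveHash
{
    std::size_t operator()(const std::wstring& key) const
    {
        return std::hash<std::wstring>()(fold_case(key));
    }
};

// Equality on the folded keys, consistent with the hash above.
struct CaseInsensitiveEqual
{
    bool operator()(const std::wstring& a, const std::wstring& b) const
    {
        return fold_case(a) == fold_case(b);
    }
};

template <typename Value>
using CaseInsensitiveMap =
    std::unordered_map<std::wstring, Value, CaseInsensitiveHash, CaseInsensitiveEqual>;
```

Because the map stores std::wstring copies of the keys, lookups no longer depend on pointer identity.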
There was some great information given here. I gathered bits and pieces from the answers and put this one together:
#include "stdafx.h"
#include "atlbase.h"
#include <map>
#include <string>
#include <stdio.h>
#include <wchar.h>
typedef std::pair<std::wstring, int> MyPair;
struct key_comparer
{
bool operator()(std::wstring a, std::wstring b) const
{
return _wcsicmp(a.c_str(), b.c_str()) < 0;
}
};
int _tmain(int argc, _TCHAR* argv[])
{
std::map<std::wstring, int, key_comparer> mymap;
mymap.insert(MyPair(L"GHI",3));
mymap.insert(MyPair(L"DEF",2));
mymap.insert(MyPair(L"ABC",1));
std::map<std::wstring, int, key_comparer>::iterator iter;
iter = mymap.find(L"def");
if (iter == mymap.end()) {
printf("No match.\n");
} else {
printf("match: %i\n", iter->second);
}
return 0;
}
If you use an std::map instead of the non-standard hash_map, you can set the comparison function to be used when doing the binary search:
// Function object for case-insensitive comparison
struct case_insensitive_compare
{
case_insensitive_compare() {}
// Function objects overload operator()
// When used as a comparer, it should function as operator<(a,b)
bool operator()(const std::string& a, const std::string& b) const
{
return to_lower(a) < to_lower(b);
}
std::string to_lower(const std::string& a) const
{
std::string s(a);
std::for_each(s.begin(), s.end(), char_to_lower);
return s;
}
// Static so that it can be passed to std::for_each as a plain function (ASCII only)
static void char_to_lower(char& c)
{
if (c >= 'A' && c <= 'Z')
c += ('a' - 'A');
}
};
// ...
std::map<std::string, std::string, case_insensitive_compare> someMap;
someMap["foo"] = "Hello, world!";
std::cout << someMap["FOO"] << std::endl; // Hello, world!
LPCWSTR is a pointer to a null-terminated array of unicode characters and probably not what you want in this case. Use the wstring specialization of basic_string instead.
For case-insensitivity, you would need to convert the keys to all upper case or all lower case before you insert and search. At least I don't think you can do it any other way.
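Normalizing keys on the way in and out might look roughly like this. The sketch is ASCII-only and uses narrow strings for brevity; to_lower_ascii and NormalizedKeyMap are hypothetical names.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>

// ASCII-only lower-casing; hypothetical helper.
inline std::string to_lower_ascii(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Thin wrapper that lower-cases every key before touching the map,
// so the map itself can use the default comparison.
template <typename Value>
class NormalizedKeyMap
{
public:
    void insert(const std::string& key, const Value& value)
    {
        map_[to_lower_ascii(key)] = value;
    }

    // Returns a pointer to the stored value, or nullptr if the key is absent.
    const Value* find(const std::string& key) const
    {
        auto it = map_.find(to_lower_ascii(key));
        return it == map_.end() ? nullptr : &it->second;
    }

private:
    std::map<std::string, Value> map_;
};
```

The original spelling of the key is lost with this approach; if it matters, it could be stored alongside the value.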

When would I pass const& std::string instead of std::string_view?

I understand the motivation for using std::string_view;
it can help avoid unnecessary allocations in function arguments.
For example:
The following program will create a std::string from a string literal.
This causes an undesired dynamic allocation, as we are only interested in observing the characters.
#include <cstdlib>
#include <iostream>
#include <string>
void* operator new(std::size_t n)
{
std::cout << "[allocating " << n << " bytes]\n";
return malloc(n);
}
void observe_string(std::string const& str){}
int main(){
observe_string("hello world"); //prints [allocating 36 bytes]
}
Using string_view will solve the problem:
#include <cstdlib>
#include <iostream>
#include <experimental/string_view>
void* operator new(std::size_t n)
{
std::cout << "[allocating " << n << " bytes]\n";
return malloc(n);
}
void observe_string(std::experimental::string_view const& str){
}
int main(){
observe_string("hello world"); //prints nothing
}
This leaves me with a question.
When would I choose std::string by const& instead of string_view for function arguments?
Looking at the interface of std::string_view, it looks as though I could replace all instances of std::string that are passed by const&. Are there any counter examples to this? Is std::string_view meant to replace std::string const& for parameter passing?
When would I choose std::string by const& instead of string_view for function arguments?
Do you need a null-terminated string? If so, then you should use std::string const& which gives you that guarantee. string_view does not - it's simply a range of const char.
If you do not need a null-terminated string, and you do not need to take ownership of the data, then you should use string_view. If you do need to take ownership of the data, then it may be the case that string by value is better than string_view.
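The null-termination point can be illustrated with a small sketch (C++17 assumed; c_length and count_spaces are hypothetical function names). One function needs a terminator and therefore takes std::string const&; the other only reads the characters in the range and can take a view.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// Needs a terminator, so it takes std::string const&: c_str() guarantees one.
std::size_t c_length(const std::string& s)
{
    return std::strlen(s.c_str());
}

// Only inspects the characters in [data(), data() + size()), so a view suffices.
std::size_t count_spaces(std::string_view s)
{
    std::size_t n = 0;
    for (char c : s)
        if (c == ' ')
            ++n;
    return n;
}
```

For a view such as std::string_view("hello world").substr(0, 5), size() is 5, but data() still points into the full literal, so a C function handed data() would read past the end of the view.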
Andrei Alexandrescu once said, "No Work is better than some work", so you should use const std::string& in such contexts, because std::string_view still involves some work (copying a pointer and a length).
Of course, a const reference may still have the cost of copying a pointer, which is almost equivalent to what std::string_view does. But std::string_view does one additional piece of work: it also copies the length.
That is the theory; in practice, a benchmark is the better way to judge performance.
One possible reason to accept const std::string& instead of string_view is when you want to store a reference to a string object that can change later.
If you accept and store a string_view, it might become invalid when the string's internal buffer reallocates.
If you accept and store a reference to the string itself, you won't have that problem as long as that object is alive (you probably want to delete the rvalue-reference overload to avoid the obvious problem with temporaries).
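A minimal sketch of that pattern, assuming a class that only observes a caller-owned string (NameObserver is a hypothetical name):

```cpp
#include <cstddef>
#include <string>

// Holds a reference to a caller-owned string. `NameObserver` is a
// hypothetical class used only for illustration.
class NameObserver
{
public:
    explicit NameObserver(const std::string& name) : name_(name) {}

    // Reject temporaries: a reference to one would dangle immediately.
    explicit NameObserver(std::string&&) = delete;

    std::size_t length() const { return name_.size(); }
    const std::string& name() const { return name_; }

private:
    const std::string& name_;
};
```

With the deleted overload, NameObserver o(std::string("tmp")); fails to compile, while an observer built from a named string keeps seeing that string's current contents even after it grows and reallocates.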
It is not really what you were asking, but sometimes you want to take std::string by value rather than std::string_view for performance reasons. This is the case when you will need to modify the string before inspecting it:
bool matches(std::string s)
{
make_upper_case(s);
return lib::test_if_matches(s);
}
You need a mutable string somewhere anyway, so you may as well declare it as the function parameter. If you changed it to std::string_view and somebody passed a std::string to matches(), you would first convert the string to a string_view and then the string_view back to a string, which always allocates a fresh copy, even when the caller passed a temporary that could have been moved in.
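A self-contained variant of the example might look like this. It is only a sketch: make_upper_case is given a hypothetical ASCII-only implementation, and the comparison against "HELLO" is a placeholder for lib::test_if_matches, whose behavior is not shown in the original.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <utility>

// Hypothetical stand-in for make_upper_case from the answer (ASCII only).
void make_upper_case(std::string& s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
}

// Takes the string by value because it must be modified anyway.
// Lvalue arguments are copied once; rvalue arguments are moved in.
bool matches(std::string s)
{
    make_upper_case(s);
    return s == "HELLO";  // placeholder for lib::test_if_matches
}
```

A caller that no longer needs its string can write matches(std::move(text)) and skip the copy entirely.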

How to adapt a string splitting algorithm using pointers so it uses iterators instead?

The code below comes from an answer to this question on string splitting. It uses pointers, and a comment on that answer suggested it could be adapted for std::string. How can I use the features of std::string to implement the same algorithm, for example using iterators?
#include <vector>
#include <string>
using namespace std;
vector<string> split(const char *str, char c = ',')
{
vector<string> result;
do
{
const char *begin = str;
while(*str != c && *str)
str++;
result.push_back(string(begin, str));
} while (0 != *str++);
return result;
}
Ok so I obviously replaced char by string but then I noticed he is using a pointer to the beginning of the character. Is that even possible for strings? How do the loop termination criteria change? Is there anything else I need to worry about when making this change?
You can use iterators instead of pointers. Iterators provide a way to traverse containers, and can usually be thought of as analogous to pointers.
In this case, you can use the begin() member function (or cbegin() if you don't need to modify the elements) of a std::string object to obtain an iterator that references the first character, and the end() (or cend()) member function to obtain an iterator for "one-past-the-end".
For the inner loop, your termination criterion is the same; you want to stop when you hit the delimiter on which you'll be splitting the string. For the outer loop, instead of comparing the character value against '\0', you can compare the iterator against the end iterator you already obtained from the end() member function. The rest of the algorithm is pretty similar; iterators work like pointers in terms of dereference and increment:
std::vector<std::string> split(const std::string& str, const char delim = ',') {
std::vector<std::string> result;
auto end = str.cend();
auto iter = str.cbegin();
do {
auto begin = iter;
while (iter != end && *iter != delim) ++iter;
result.push_back(std::string(begin, iter));
if (iter == end) break; // See note (**) below.
} while (iter++ != end);
return result;
}
Note the subtle difference in the inner loop condition: it now tests whether we've hit the end before trying to dereference. This is because we can't dereference an iterator that points to the end of a container, so we must check this before trying to dereference. The original algorithm assumes that a null character ends the string, so we're ok to dereference a pointer to that position.
(**) The validity of iter++ != end when iter is already end is under discussion in Are end+1 iterators for std::string allowed?
I've added this if statement to the original algorithm to break the loop when iter reaches end in the inner loop. This avoids adding one to an iterator which is already the end iterator, and avoids the potential problem.
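An alternative sketch lets std::find locate each delimiter, which sidesteps the end+1 question because the iterator is only advanced when it is known not to be the end iterator. The name split_find is hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Splits `str` on `delim`, keeping empty fields, using std::find to locate
// each delimiter. `split_find` is a hypothetical name.
std::vector<std::string> split_find(const std::string& str, char delim = ',')
{
    std::vector<std::string> result;
    auto begin = str.cbegin();
    const auto end = str.cend();
    while (true) {
        auto next = std::find(begin, end, delim);
        result.emplace_back(begin, next);
        if (next == end)
            break;
        begin = next + 1;  // safe: next is not the end iterator here
    }
    return result;
}
```

Like the pointer version, this yields a single empty string for empty input and an empty trailing field when the input ends with the delimiter.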

How to make an iterator to a read-only object writable (in C++)

I've created an unordered_set of my own type of struct. I have an iterator to this set and would like to increment a member (count) of the struct that the iterator points to. However, the compiler complains with the following message:
main.cpp:61:18: error: increment of member ‘SentimentWord::count’ in read-only object
How can I fix this?
Here's my code:
#include <fstream>
#include <iostream>
#include <cstdlib>
#include <string>
#include <unordered_set>
using namespace std;
struct SentimentWord {
string word;
int count;
};
//hash function and equality definition - needed to use unordered_set with type SentimentWord
struct SentimentWordHash {
size_t operator () (const SentimentWord &sw) const;
};
bool operator == (SentimentWord const &lhs, SentimentWord const &rhs);
int main(int argc, char **argv){
ifstream fin;
int totalWords = 0;
unordered_set<SentimentWord, SentimentWordHash> positiveWords;
unordered_set<SentimentWord, SentimentWordHash> negativeWords;
//needed for reading in sentiment words
string line;
SentimentWord temp;
temp.count = 0;
fin.open("positive_words.txt");
while(!fin.eof()){
getline(fin, line);
temp.word = line;
positiveWords.insert(temp);
}
fin.close();
//needed for reading in input file
unordered_set<SentimentWord, SentimentWordHash>::iterator iter;
fin.open("041.html");
while(!fin.eof()){
totalWords++;
fin >> line;
temp.word = line;
iter = positiveWords.find(temp);
if(iter != positiveWords.end()){
iter->count++;
}
}
for(iter = positiveWords.begin(); iter != positiveWords.end(); ++iter){
if(iter->count != 0){
cout << iter->word << endl;
}
}
return 0;
}
size_t SentimentWordHash::operator () (const SentimentWord &sw) const {
return hash<string>()(sw.word);
}
bool operator == (SentimentWord const &lhs, SentimentWord const &rhs){
if(lhs.word.compare(rhs.word) == 0){
return true;
}
return false;
}
Any help is greatly appreciated!
Elements in an unordered_set are, by definition, immutable:
In an unordered_set, the value of an element is at the same time its
key, that identifies it uniquely. Keys are immutable, therefore, the
elements in an unordered_set cannot be modified once in the container
- they can be inserted and removed, though.
I would vote that you use an unordered_map instead, using a string as the key and an int as the mapped value.
One solution (but a dirty hack) is to make your counter mutable, which means that you permit it to be changed even on const objects.
struct SentimentWord {
string word;
mutable int count;
};
As I already said, this is a dirty hack, since it allows you to violate rules (you soften them). And rules have a reason. I'm not even sure if this works, since the definition of the unordered_set says that the values can't be modified once being inserted, and this also has a reason.
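For what it's worth, here is a sketch of the mutable variant. It relies on the assumption that count takes no part in the hash or in operator==, so changing it cannot move the element to a different bucket. The helper bump is a hypothetical name.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

struct SentimentWord
{
    std::string word;
    mutable int count;  // not part of the hash or equality below
};

struct SentimentWordHash
{
    std::size_t operator()(const SentimentWord& sw) const
    {
        return std::hash<std::string>()(sw.word);
    }
};

inline bool operator==(const SentimentWord& lhs, const SentimentWord& rhs)
{
    return lhs.word == rhs.word;
}

typedef std::unordered_set<SentimentWord, SentimentWordHash> SentimentSet;

// Increments the counter of `word` if present; returns whether it was found.
inline bool bump(const SentimentSet& words, const std::string& word)
{
    SentimentSet::const_iterator it = words.find(SentimentWord{word, 0});
    if (it == words.end())
        return false;
    ++it->count;  // allowed because count is mutable
    return true;
}
```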
A nicer solution is to use a map which uses the word as a key and the counter as a value. Your code then doesn't have to use find but simply access the element using the subscript operator ("array access" operator) which directly returns a reference (not an iterator). On this reference, use the increment operator, like this:
std::unordered_map<std::string,int> positiveWords;
//...
positiveWords[word]++;
Then you don't need your struct at all, and of course also not your custom comparison operator overload.
Trick (just in case you need it): If you want to order a map by its value (if you need a statistical map with the most frequent words coming first), use a second (but ordered) map with reversed key and value. This will sort it by the original value, which is now the key. Iterate it in reverse order to start with the most frequent words (or construct it with std::greater<int> as the comparison operator, provided as the third template parameter).
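Both the counting and the sort-by-frequency trick can be sketched as below. One deliberate change from the description: the second container is a std::multimap rather than a std::map, so that words with the same count are not dropped. The names count_words and by_frequency are hypothetical.

```cpp
#include <functional>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Counts occurrences of each word.
std::unordered_map<std::string, int> count_words(const std::vector<std::string>& words)
{
    std::unordered_map<std::string, int> counts;
    for (const std::string& w : words)
        ++counts[w];  // operator[] value-initializes missing entries to 0
    return counts;
}

// Reverses key and value into an ordered multimap, most frequent first.
// A multimap is used so that words with equal counts are all kept.
std::multimap<int, std::string, std::greater<int>>
by_frequency(const std::unordered_map<std::string, int>& counts)
{
    std::multimap<int, std::string, std::greater<int>> ordered;
    for (const auto& entry : counts)
        ordered.emplace(entry.second, entry.first);
    return ordered;
}
```

Iterating the result from begin() visits the most frequent words first; the relative order of words with equal counts is unspecified.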
std::unordered_set is unhappy because it's worried you will change the object in such a way it is the same as another object, which would violate the set. ISTM you really want a map from string to int (not a set at all), and the iterator will let you change the returned value, if not the key.
