Saturday, November 21, 2009

STLSoft 1.9.88 imminent

I'm just preparing a new release of recls, and in the process have needed to enhance winstl::basic_findfile_sequence by adding a new throwOnAccessFailure flag, which, as should be obvious from the name, causes an exception to be thrown in the event of an access failure. This will allow consistent behaviour between recls (C/C++) and the new recls 100% .NET library.

Friday, May 29, 2009

frequency_map: new merge() method

The stlsoft::frequency_map container class template now includes a merge() method, which allows the contents of another frequency map instance to be merged into it, with the counts of any shared keys being added together, as in:


  typedef std::string string_t;
  typedef stlsoft::frequency_map<string_t> fmap_t;

  fmap_t map1;
  fmap_t map2;

  map1.push("key-11");
  map1.push("key-22");

  XTESTS_TEST_BOOLEAN_FALSE(map1.empty());
  XTESTS_TEST_INTEGER_EQUAL(2u, map1.size());
  XTESTS_TEST_INTEGER_EQUAL(1u, map1["key-11"]);
  XTESTS_TEST_INTEGER_EQUAL(1u, map1["key-22"]);

  map2.push("key-11");
  map2.push("key-21");

  XTESTS_TEST_BOOLEAN_FALSE(map2.empty());
  XTESTS_TEST_INTEGER_EQUAL(2u, map2.size());
  XTESTS_TEST_INTEGER_EQUAL(1u, map2["key-11"]);
  XTESTS_TEST_INTEGER_EQUAL(1u, map2["key-21"]);

  map1.merge(map2);

  XTESTS_TEST_BOOLEAN_FALSE(map1.empty());
  XTESTS_TEST_INTEGER_EQUAL(3u, map1.size());
  XTESTS_TEST_INTEGER_EQUAL(2u, map1["key-11"]);
  XTESTS_TEST_INTEGER_EQUAL(1u, map1["key-21"]);
  XTESTS_TEST_INTEGER_EQUAL(1u, map1["key-22"]);

  XTESTS_TEST_BOOLEAN_FALSE(map2.empty());
  XTESTS_TEST_INTEGER_EQUAL(2u, map2.size());
  XTESTS_TEST_INTEGER_EQUAL(1u, map2["key-11"]);
  XTESTS_TEST_INTEGER_EQUAL(1u, map2["key-21"]);

  map2.clear();

  XTESTS_TEST_BOOLEAN_FALSE(map1.empty());
  XTESTS_TEST_INTEGER_EQUAL(3u, map1.size());
  XTESTS_TEST_INTEGER_EQUAL(2u, map1["key-11"]);
  XTESTS_TEST_INTEGER_EQUAL(1u, map1["key-21"]);
  XTESTS_TEST_INTEGER_EQUAL(1u, map1["key-22"]);

  XTESTS_TEST_INTEGER_EQUAL(0u, map2.size());
  XTESTS_TEST_BOOLEAN_TRUE(map2.empty());
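For readers without STLSoft to hand, the merge semantics exercised by the test above can be sketched with a plain std::map: counts for keys present in both maps are added, keys present only in the source are copied across, and the source is left unchanged. The counts_t type and the merge_counts() function are hypothetical names for illustration; this is not how stlsoft::frequency_map is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for a frequency map: key -> number of occurrences.
typedef std::map<std::string, std::size_t> counts_t;

// Adds every count in src into dest. Keys not yet in dest are created by
// operator[] with a count of 0 and then incremented; src is not modified.
void merge_counts(counts_t& dest, counts_t const& src)
{
    for(counts_t::const_iterator it = src.begin(); it != src.end(); ++it)
    {
        dest[it->first] += it->second;
    }
}
```

Merging a map holding {"key-11": 1, "key-21": 1} into one holding {"key-11": 1, "key-22": 1} gives three keys, with "key-11" counted twice, mirroring the XTests assertions above.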

Saturday, May 23, 2009

STLSoft 1.10 new additions: integer_to_array()

A new component in STLSoft 1.10 (alpha 11 onwards) is the stlsoft::integer_to_array() function template. It turns a single integer value into an array of bit-chunk values.

For example, given the integer i with value 0x01020304, we can split it into its 8 nibbles as follows:

stlsoft::integer_array r = stlsoft::integer_to_array(i, 4);

r[0] will be 4, r[1] will be 0, r[2] will be 3, r[3] will be 0, r[4] will be 2, r[5] will be 0, r[6] will be 1, and r[7] will be 0.

The split can be any number of bits from 1 up to the bit size of the input parameter (which can be any of the integral types), so you can split on 3 bits, 1 bit, 17 bits, whatever.
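To make the chunking rule concrete, here's a minimal sketch of the same idea, assuming the chunks are returned least-significant first in a std::vector (as in the r[0] ... r[7] example above). The split_bits() name and its return type are illustrative assumptions, not STLSoft's integer_to_array() or integer_array.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Hypothetical sketch: splits value into chunks of bitsPerChunk bits,
// least-significant chunk first. The final chunk may be narrower when the
// bit width of I is not a multiple of bitsPerChunk.
template <typename I>
std::vector<unsigned long long> split_bits(I value, unsigned bitsPerChunk)
{
    unsigned const ullBits   = static_cast<unsigned>(sizeof(unsigned long long) * CHAR_BIT);
    unsigned const totalBits = static_cast<unsigned>(sizeof(I) * CHAR_BIT);

    assert(totalBits <= ullBits);
    assert(bitsPerChunk >= 1 && bitsPerChunk <= totalBits);

    // Take the raw bit pattern. Converting a negative signed value
    // sign-extends, so mask back down to the width of I.
    unsigned long long bits = static_cast<unsigned long long>(value);
    if(totalBits < ullBits)
    {
        bits &= (1ULL << totalBits) - 1;
    }

    // Shifting by the full width is undefined, hence the guards.
    unsigned long long const mask = (bitsPerChunk < ullBits)
                                  ? (1ULL << bitsPerChunk) - 1
                                  : ~0ULL;

    std::vector<unsigned long long> chunks;
    for(unsigned done = 0; done < totalBits; done += bitsPerChunk)
    {
        chunks.push_back(bits & mask);
        bits = (bitsPerChunk < ullBits) ? (bits >> bitsPerChunk) : 0;
    }
    return chunks;
}
```

With a 32-bit int, split_bits(0x01020304, 4) yields 4, 0, 3, 0, 2, 0, 1, 0, matching the nibble example, and a split on 3 bits yields 11 chunks, the last holding only 2 bits.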

I've done extensive automated tests, and they all pass (natch), but I'm still a little dubious about the component, so I'm definitely interested in feedback.

In case you're wondering, the original rationale for this was a simple way to split the s_addr member of struct in_addr, which is held in network byte order but is otherwise "opaque", into its constituent octets.

LP64 and -Wshorten-64-to-32

Along with new Mac OS-X 64-bit makefiles for FastFormat and Pantheios, I'm also playing around with the -Wshorten-64-to-32 warning flag.

In compiling FF with this I encountered a lot of similar warnings, all amounting to the following:

enum { x = sizeof(Y) };

Obviously an enum, being int in size, is too small to hold a size_t (the type of the result of the sizeof operator) on LP64 platforms, where size_t is 64 bits wide. The ugly but effective solution to this is:

enum { x = int(sizeof(Y)) };

which you'll now see more of in the STLSoft libs.
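Here's a small, self-contained illustration of the cast, alongside one alternative that sidesteps the narrowing entirely by using a static const std::size_t member constant instead of an enum. The names Y and sizes are placeholders; whether a given compiler flags the uncast form depends on its version and warning settings.

```cpp
#include <cassert>
#include <cstddef>

// Placeholder type standing in for the Y in the text.
struct Y
{
    double values[4];
};

struct sizes
{
    // The approach described above: the explicit conversion to int tells
    // the compiler (and the reader) that the narrowing is intended.
    enum { x = int(sizeof(Y)) };

    // An alternative with no narrowing at all: a constant of type size_t,
    // which is still usable in constant expressions (e.g. array bounds).
    static const std::size_t x_size = sizeof(Y);
};
```

The enum form remains useful where a pre-C++11 compiler is involved, since in-class static constants have historically been less reliably supported.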

Friday, May 22, 2009

Safely abstracting use of strerror / strerror_s (part 1)

A recent bug report on the STLSoft SourceForge project site pointed out that the stlsoft::error_desc component did not use the new "safe string" library function strerror_s(). This was actually a surprise, because I thought I'd already taken care of that.

Since I hadn't, I decided that I should. The change in implementation to use strerror_s() (when the "safe string" library is present) goes along the lines of the following:



char buff[1001];

if(0 != ::strerror_s(buff, STLSOFT_NUM_ELEMENTS(buff) - 1, errno))
{
  buff[0] = '\0';
}
else
{
  buff[STLSOFT_NUM_ELEMENTS(buff) - 1] = '\0';
}



The dumb part of the strerror_s() function is that it doesn't tell you how many characters it wrote. This means that you cannot be sure of having elicited the full message unless ::strlen() of the returned string is less than (buffer size - 1). So the above code could return a partial error string (although the likelihood of that is, of course, vanishingly small).

Instead, I've used an auto_buffer to provide resizable storage, and strerror_s() is called in a loop, roughly doubling the buffer each time, until either the call succeeds and the returned string is short enough that it cannot have been truncated, or no more storage can be allocated. The code looks like the following:



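stlsoft::auto_buffer<char> buff(128);

for(;;)
{
  int n = ::strerror_s(&buff[0], buff.size() - 1, error);

  buff[buff.size() - 1u] = '\0';

  if(0 == n)
  {
    size_t cch = ::strlen(buff.data());

    // Only a string shorter than the space provided is known to be complete
    if(cch < buff.size() - 2u)
    {
      m_length = cch;
      buff.resize(cch + 1u);
      break;
    }
  }

  // Grow the buffer and retry; if that fails, fall back to an empty string
  if(!buff.resize(1u + buff.size() * 2u))
  {
    buff.resize(1u);
    buff[0] = '\0';
    break;
  }
}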



There's another problem with the class template, but that'll have to wait until a later time to discuss ...
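Since strerror_s() isn't available everywhere (glibc, for one, doesn't provide the Annex K functions), the sketch below shows the same grow-and-retry loop against a hypothetical stand-in, fake_strerror_s(), loosely modelled on the C11 Annex K contract: it copies as much of the message as fits, always nul-terminates, and returns non-zero if the message didn't fit. It uses std::vector<char> in place of stlsoft::auto_buffer, so allocation failure surfaces as std::bad_alloc rather than as a false return from resize().

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for strerror_s(): copies as much of a fixed message
// as fits in cch characters (nul-terminating when cch > 0) and returns
// non-zero if the whole message did not fit. errnum is ignored.
int fake_strerror_s(char* buff, std::size_t cch, int /* errnum */)
{
    static char const msg[] = "An illustrative, rather long error message";
    std::size_t const len   = sizeof(msg) - 1;

    if(0 == cch)
    {
        return 1;
    }

    std::size_t const n = (len < cch) ? len : cch - 1;

    std::memcpy(buff, msg, n);
    buff[n] = '\0';

    return (len < cch) ? 0 : 1;
}

// Grow-and-retry: enlarge the buffer until the call succeeds with a string
// short enough that it cannot have been truncated.
std::string error_message(int errnum, std::size_t initial = 8)
{
    assert(initial >= 1);

    std::vector<char> buff(initial);

    for(;;)
    {
        int const r = fake_strerror_s(&buff[0], buff.size() - 1, errnum);

        buff[buff.size() - 1] = '\0';

        if(0 == r)
        {
            std::size_t const cch = std::strlen(&buff[0]);

            if(cch + 2 < buff.size())   // i.e. cch < buff.size() - 2
            {
                return std::string(&buff[0], cch);
            }
        }

        buff.resize(1 + buff.size() * 2);   // throws std::bad_alloc on failure
    }
}
```

Writing the length test as cch + 2 < buff.size() avoids the unsigned underflow that buff.size() - 2 would suffer for a 1-character buffer.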

Monday, May 18, 2009

WinSTL Registry library mods and fixes, part 3: exception-safety

In reviewing the implementation of the WinSTL Registry Library's winstl::basic_reg_value class - as described in part 1 and part 2 of this series of posts - I've also spotted a defect in exception-safety.

Consider the (chopped-down) definition of the basic_reg_value class:

template < . . . >
class basic_reg_value
{
  . . .
private:
  hkey_type    m_hkey; // The parent key of the value
  string_type  m_name; // The name of the value
  . . . // other members
};


The m_hkey member is obtained via winstl::reg_traits<>::reg_dup_key(), and it is the basic_reg_value class itself that provides the RAII for it: the duplicated key is released in basic_reg_value's destructor. Consequently, if an exception occurs during the constructor, the destructor is never run, and m_hkey is never released.

Since m_name is a string class instance, its constructor can throw. And since members are initialised in the order of their declaration, m_name is constructed after m_hkey has been duplicated. Consequently, basic_reg_value is not exception-safe.

Thankfully, the fix is very simple: reverse the order of declaration of the two members. If m_name's constructor throws, that'll happen before the key duplication takes place. If the key duplication throws an exception, the (fully constructed) m_name's destructor will be invoked. Q.E.D.
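The effect of declaration order can be demonstrated with a small, hypothetical model: an int stands in for the registry key handle, a global counter tracks outstanding "keys", and a name type throws on request to simulate a string constructor failing. None of these names are WinSTL's; they exist only to show which ordering leaks.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical resource model: a count of outstanding "key" handles.
static int g_openKeys = 0;

int dup_key()
{
    ++g_openKeys;
    return 42;
}

void close_key(int /* hkey */)
{
    assert(g_openKeys > 0);
    --g_openKeys;
}

// Stands in for a string constructor that may throw (e.g. std::bad_alloc).
struct throwing_name
{
    std::string value;

    throwing_name(char const* s, bool fail)
        : value(s)
    {
        if(fail)
        {
            throw std::runtime_error("name construction failed");
        }
    }
};

// Leaky ordering: the key is duplicated before the name is constructed, so
// a throw from m_name's constructor abandons m_hkey (no destructor runs).
class value_leaky
{
public:
    value_leaky(char const* name, bool fail) : m_hkey(dup_key()), m_name(name, fail) {}
    ~value_leaky() { close_key(m_hkey); }
private:
    int           m_hkey;
    throwing_name m_name;
};

// Fixed ordering: the name is declared, and therefore constructed, first.
class value_safe
{
public:
    value_safe(char const* name, bool fail) : m_name(name, fail), m_hkey(dup_key()) {}
    ~value_safe() { close_key(m_hkey); }
private:
    throwing_name m_name;
    int           m_hkey;
};
```

Constructing value_leaky with a throwing name leaves the counter at 1 (a leaked key), whereas value_safe leaves it at 0.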

WinSTL Registry library mods and fixes, part 2: race conditions

As discussed at great length in section 33.3 of Extended STL, volume 1, the Windows Registry API is prone to race conditions, because separate processes may make independent changes to the registry contents without any control over each other.

The recently discovered defect in the WinSTL Registry Library's winstl::basic_reg_value class gave me cause to consider the implementation in detail again. It's been a long time since I last did that, and with the understanding of registry race conditions I gained while researching and writing Extended STL, I immediately saw the possibility of such a race accounting for the reported fault.

Consider again the implementation of the winstl::basic_reg_value<>::value_sz() method. Assume that, prior to the invocation of winstl::reg_traits<>::reg_query_info, the registry-value's value was of non-zero size. The call commences. Meanwhile, another process overwrites the registry-value with a zero-size value. reg_query_info returns, and indicates that the data size is zero. Without a further check on the data size, the same fault will be experienced. Naturally, the fix for the non-race defect will fix the race one as well. Which is nice.

WinSTL Registry library mods and fixes, part 1: empty values

An STLSoft user recently reported a possible defect in the implementation of the WinSTL Registry Library's winstl::basic_reg_value class: reading a registry value (of type REG_SZ) whose data is of size 0 leads to a crash.

Upon first examination, I thought this was a result of the fragility of the Windows Registry with respect to race conditions, as I'll discuss in a follow-up post.

However, closer examination reveals it to be a true defect. The precise circumstances in which it occurs are as follows:
  • the registry-value whose value is being elicited, as a string (REG_SZ) or as an array of strings (REG_MULTI_SZ), has zero size, and
  • it has one or more peer registry-values whose values are of non-zero size
This precise combination of circumstances triggers the fault. The reason lies in a call to winstl::reg_traits<>::reg_query_info at the start of the winstl::basic_reg_value<>::value_sz() method. This is used to determine the maximum size of the value of any of the current key's registry-values, so that a buffer of the appropriate size can be provided to the subsequent call to winstl::reg_traits<>::reg_query_value(), which actually retrieves the value in question.

The problem occurs when the value's size is 0. The last block in the method decrements this - to account for the space for the nul-terminator added earlier - and then explicitly sets the nul-terminator. (I actually forget why it does this, but I do recall that it must be done this way.)

Anyway, when the value's size is 0, decrementing it (it's an unsigned quantity) wraps around to a very large number, and so the next statement results in an access-violation. Yuck!

STLSoft 1.9.83 will contain the fix for this, which is simply to test again that the data size is non-0.
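The shape of the fix can be sketched as follows, assuming the reported data size is a std::size_t measured in characters that includes room for the nul-terminator. The function string_from_value_data() is a hypothetical helper for illustration, not the actual value_sz() code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: buff holds the value's data, cb is the reported size
// (in characters, including the nul-terminator's slot).
std::string string_from_value_data(std::vector<char>& buff, std::size_t cb)
{
    // The extra check: an empty value has nothing to terminate. Without it,
    // --cb below would wrap a std::size_t of 0 round to its maximum value.
    if(0 == cb)
    {
        return std::string();
    }

    assert(cb <= buff.size());

    --cb;               // step back over the space for the nul-terminator
    buff[cb] = '\0';    // safe: cb < buff.size()

    return std::string(&buff[0], cb);
}
```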

Friday, May 15, 2009

More 1.9.82 ...

It also includes a new method, winstl::reg_traits<>::reg_delete_tree(), which takes a key handle and a sub-key name, and deletes the sub-key and any/all of its descendant keys, as in:

HKEY k = . . .
LONG res = winstl::reg_traits<char>::reg_delete_tree(k, "sub-key");

Use with care, because there's no un-delete!

Catching up with developments ... 1.9.82

I've been lax in the blogging - too much to do, too little time - but will try and catch up.

The latest release includes a fix to allow winstl::findfile_sequence, a facade over the FindFirstFile/FindNextFile API, to be used with std::copy and the IOStreams, as in:

#include <stlsoft/iterators/ostream_iterator.hpp>
#include <winstl/filesystem/findfile_sequence.hpp>

#include <algorithm> // for std::copy
#include <iostream>

#include <stdlib.h>

int main()
{
  typedef winstl::findfile_sequence seq_t;

  { // 1. enumerate all contents (except dots dirs)

    seq_t entries("*.*");

    std::cout << "\n1:\n";
    std::copy( entries.begin(), entries.end()
             , stlsoft::ostream_iterator<seq_t::value_type>(std::cout, "\t", "\n"));
  }

  { // 2. enumerate all contents (including dots dirs)

    seq_t entries("*.*", seq_t::includeDots);

    std::cout << "\n2:\n";
    std::copy( entries.begin(), entries.end()
             , stlsoft::ostream_iterator<seq_t::value_type>(std::cout, "\t", "\n"));
  }

  { // 3. enumerate all files, displaying only the relative path

    seq_t files("*.*", seq_t::files | seq_t::relativePath);

    std::cout << "\n3:\n";
    std::copy( files.begin(), files.end()
             , stlsoft::ostream_iterator<seq_t::value_type>(std::cout, "\t", "\n"));
  }

  { // 4. enumerate all directories

    seq_t directories("*.*", seq_t::directories);

    std::cout << "\n4:\n";
    std::copy( directories.begin(), directories.end()
             , stlsoft::ostream_iterator<seq_t::value_type>(std::cout, "\t", "\n"));
  }

  { // 5. enumerate all files beginning with 'f' and with either extension .h or .hpp

    seq_t files("f*.h|f*.hpp", '|', seq_t::files);

    std::cout << "\n5:\n";
    std::copy( files.begin(), files.end()
             , stlsoft::ostream_iterator<seq_t::value_type>(std::cout, "\t", "\n"));
  }

  { // 6. enumerate all .exe files in the windows directory and all dlls in the
    //    system directory that begin with 'm', skipping any hidden files

    seq_t files("C:\\windows", "*.exe;system32/m*.dll", ';', seq_t::files | seq_t::skipHiddenFiles);

    std::cout << "\n6:\n";
    std::copy( files.begin(), files.end()
             , stlsoft::ostream_iterator<seq_t::value_type>(std::cout, "\t", "\n"));
  }

  return EXIT_SUCCESS;
}

/* ///////////////////////////// end of file //////////////////////////// */

Saturday, January 24, 2009

memcpy() / bcopy() / strncpy(): char_copy()

Well, as you've probably already noticed, I've started removing dependencies on the so-called "unsafe" string library functions from all my open-source libraries.

As part of this continuing (and somewhat life-sapping) activity, I'm moving through the STLSoft libraries. The latest release incorporates changes to the important winstl::basic_findfile_sequence class template, which is an STL extension over the Windows FindFile API.

This makes use of a new facility that's been added to the winstl::system_traits class template: char_copy(). This (static) function is effectively a typed memcpy().


class winstl::system_traits<char>
{
  . . .
public:
  static char* char_copy(char* dest, char const* src, size_t n);
};

class winstl::system_traits<wchar_t>
{
  . . .
public:
  static wchar_t* char_copy(wchar_t* dest, wchar_t const* src, size_t n);
};


It copies exactly the number of characters specified, with no checks on the actual length of src.

The modest benefits are that it is a bit more type-safe, and that you don't have to keep remembering to write sizeof(char_type) * n.
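For comparison, here's a minimal sketch of a typed copy written as a free function template, so the element type is deduced and the sizeof multiplication lives in one place. The name char_copy_sketch() is hypothetical; it is not the WinSTL system_traits member, which is declared per character type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical typed memcpy(): copies exactly n elements of type C from src
// to dest, with no checks on the actual length of src, and returns dest.
template <typename C>
C* char_copy_sketch(C* dest, C const* src, std::size_t n)
{
    if(0 != n)
    {
        assert(NULL != dest && NULL != src);

        std::memcpy(dest, src, sizeof(C) * n);
    }
    return dest;
}
```

Because C is deduced from the arguments, passing a char buffer with a wchar_t source fails to compile instead of silently copying the wrong number of bytes.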

I'm not totally struck on the name, and did consider borrowing other names. I liked BSD's bcopy() (to mean block copy), but it's too likely that some code somewhere #defines it to memcpy().

I'll gradually filter char_copy() throughout the rest of WinSTL, and then through UNIXSTL, as I work my way through the thankless task of getting "unsafe"-free. Ho hum.

Wednesday, January 21, 2009

winstl::path_squeeze() redux

As of 1.9.69, the winstl::path_squeeze() function template is much improved, mainly due to extensive unit-testing. This, in turn, is a fall-out of the removal of dependencies on so-called "unsafe" string functions, such as strncpy(), strcat(), and so forth. And that, in turn, is due to a "need" to be compatible with Microsoft's perverse designation of standard library functions as deprecated. Ho hum.

Anyway, the improvements now mean that it's robust for all buffer sizes between [0, strlen(path)] and beyond. So, you can do all of the following:

std::string path = "H:\\xyz\\mno\\abcdef.ghi";
char buffer[101];

// returns the number required (22)
winstl::path_squeeze(path, static_cast<char*>(NULL), 0);

// returns 1; buffer == ""
winstl::path_squeeze(path, buffer, 1);

// returns 2; buffer == "a"
winstl::path_squeeze(path, buffer, 2);

// returns 5; buffer == "abcd"
winstl::path_squeeze(path, buffer, 5);

// returns 6; buffer == "a...i"
winstl::path_squeeze(path, buffer, 6);

// returns 9; buffer == "ab...ghi"
winstl::path_squeeze(path, buffer, 9);

// returns 10; buffer == "abc...ghi"
winstl::path_squeeze(path, buffer, 10);

// returns 11; buffer == "abcdef.ghi"
winstl::path_squeeze(path, buffer, 11);

// returns 11; buffer == "abcdef.ghi"
winstl::path_squeeze(path, buffer, 12);

// returns 11; buffer == "abcdef.ghi"
winstl::path_squeeze(path, buffer, 17);

// returns 18; buffer == "H:\\...\\abcdef.ghi"
winstl::path_squeeze(path, buffer, 18);

// returns 19; buffer == "H:\\x...\\abcdef.ghi"
winstl::path_squeeze(path, buffer, 19);

// returns 20; buffer == "H:\\xy...\\abcdef.ghi"
winstl::path_squeeze(path, buffer, 20);

// returns 21; buffer == "H:\\xyz...\\abcdef.ghi"
winstl::path_squeeze(path, buffer, 21);

// returns 22; buffer == "H:\\xyz\\mno\\abcdef.ghi"
winstl::path_squeeze(path, buffer, 22);

// returns 22; buffer == "H:\\xyz\\mno\\abcdef.ghi"
winstl::path_squeeze(path, buffer, 23);
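The file-name-only results above (buffer sizes 1 to 17) follow a simple pattern: truncate when there's no room for an ellipsis, otherwise keep a head and a tail around "...", giving any odd character to the tail. The sketch below reproduces just those cases; the rule is an assumption inferred from the listed outputs, not the winstl::path_squeeze() implementation, and it ignores the directory part of the path entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch: squeezes a bare file name into a buffer of cchBuffer
// characters (including the nul), returning the string that would be written.
std::string squeeze_name_sketch(std::string const& name, std::size_t cchBuffer)
{
    assert(cchBuffer > 0);

    std::size_t const n = cchBuffer - 1;    // characters available, excluding nul

    if(n >= name.size())                    // it all fits
    {
        return name;
    }
    if(n < 5)                               // no room for "x...y": just truncate
    {
        return name.substr(0, n);
    }

    std::size_t const k    = n - 3;         // characters left after the "..."
    std::size_t const head = k / 2;         // any odd character goes to the tail
    std::size_t const tail = k - head;

    return name.substr(0, head) + "..." + name.substr(name.size() - tail);
}
```

For "abcdef.ghi", a buffer of 9 gives "ab...ghi" and a buffer of 10 gives "abc...ghi", as in the listing.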

Friday, January 2, 2009

Working with other libraries, part 3: use consistent conventions for member types

Ever get confused by the names of member variables/types/functions of C++ class templates?

Boost.Format's basic_format class template defines five public and two private member types, using a mix of four different naming conventions!

[an extract from boost/format/format_class.hpp]

// in namespace boost
template <class Ch, class Tr, class Alloc>
class basic_format
{
private:
  typedef typename io::CompatTraits<Tr>::compatible_type compat_traits;
  typedef io::detail::stream_format_state<Ch, Tr> stream_format_state;
  
public:
  typedef Ch CharT;
  typedef std::basic_string<Ch, Tr, Alloc> string_type;
  typedef typename string_type::size_type size_type;
  typedef io::detail::format_item<Ch, Tr, Alloc> format_item_t;
  typedef io::basic_altstringbuf<Ch, Tr, Alloc> internal_streambuf_t;
  . . .
};

In Extended STL, volume 1: Collections and Iterators, I recommend the use of the following naming conventions for member types:
  1. For public member types that share names with standard components that have the same logical purpose, follow the standard convention and use the standard name. An example would be iterator.
  2. For public member types that are not covered by clause 1, use the _type suffix. An example would be char_type.
  3. For private member types, use the _type_ suffix. An example would be compat_traits_type_.
and, for non-member types:
  1. For public API (non-member) types use the _t suffix. An example would be pan_char_t.
  2. For non-public/implementation API (non-member) types use the _t_ suffix. An example would be b64ErrorString_t_.
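Here's a small, hypothetical class template applying these conventions; the names demo_buffer, demo_char_t and demo_storage_t_ are invented purely to show each suffix in use.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Public API (non-member) type: _t suffix.
typedef char demo_char_t;

// Non-public/implementation API (non-member) type: _t_ suffix.
typedef std::vector<demo_char_t> demo_storage_t_;

template <typename C>
class demo_buffer
{
private:
    // Private member type: _type_ suffix.
    typedef std::vector<C>                      storage_type_;
public:
    // Same logical purpose as a standard member type: use the standard name.
    typedef typename storage_type_::iterator    iterator;
    typedef typename storage_type_::size_type   size_type;
    // Any other public member type: _type suffix.
    typedef C                                   char_type;

public:
    void      push_back(char_type ch)       { m_chars.push_back(ch); }
    size_type size() const                  { return m_chars.size(); }
    iterator  begin()                       { return m_chars.begin(); }
    iterator  end()                         { return m_chars.end(); }

    char_type operator[](size_type i) const
    {
        assert(i < m_chars.size());
        return m_chars[i];
    }

private:
    storage_type_ m_chars;
};
```

At a glance, iterator and size_type read as standard-conformant public types, char_type as a public extension, and storage_type_ as internal plumbing.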


I've been using this for many years without a blip. I can read my code, even some years later, and understand what's a type and what isn't and also, importantly, which types are for consumption in the outside world and which are internal to the component. Q.E.D.

Thursday, January 1, 2009

Working with other libraries, part 2: allocators

Much is often made of the supposedly still-born Allocator concept in the standard library. However, one very good use of them is in tracking memory.

At the moment I'm preparing some analyses of FastFormat's performance for an article I'm writing. One of the analyses I'm conducting is a count of how many memory allocations are involved in a formatting statement, for each of the comparison libraries. The standard way to achieve something like this is to replace the global operators new and delete, as in:

// NOTE: this code is only valid for single-threaded operation

#include <new>
#include <stdlib.h>

int s_nallocs = 0;

#ifdef OVERLOAD_OPNEW
void* counting_malloc(size_t cb)
{
  ++s_nallocs;

  // operator new must return a unique, non-NULL pointer even when asked
  // for 0 bytes, and must report failure by throwing std::bad_alloc
  void* pv = ::malloc(0 == cb ? 1 : cb);

  if(NULL == pv)
  {
    throw std::bad_alloc();
  }

  return pv;
}

void counting_free(void* pv)
{
  ::free(pv);
}

void* operator new(size_t cb)
{
  return counting_malloc(cb);
}

void operator delete(void* pv)
{
  counting_free(pv);
}

void* operator new[](size_t cb)
{
  return counting_malloc(cb);
}

void operator delete[](void* pv)
{
  counting_free(pv);
}
#endif /* OVERLOAD_OPNEW */

Unfortunately, some components with some compilers - the exact permutations escape me at this point - don't go through operator new. This might be because they use a per-class operator new, or it might be because they use an allocator that doesn't use new. (Being under a publishing deadline, I didn't have the time - nor the inclination, if I'm honest - to find out which it was in each case.)

So, in order to get a fighting chance at an accurate depiction of how much memory each library is using, I decided to force the issue by requiring all the strings used to be instances of the following specialisation, rather than std::string:

typedef std::basic_string<
  char
, std::char_traits<char>
, stlsoft::new_allocator<char>
>   string_t;

Of course, things aren't ever that simple. Such a string type is not compatible with the IOStreams default specialisations, requiring:

typedef std::basic_stringstream<
  char
, std::char_traits<char>
, stlsoft::new_allocator<char>
>   stringstream_t;

And the same thing applies for Boost.Format, requiring:

typedef boost::basic_format<
  char
, std::char_traits<char>
, stlsoft::new_allocator<char>
>   format_t;
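If you don't have STLSoft to hand, an allocator that counts its own allocations can be sketched in a few lines. This assumes C++11 or later, relying on std::allocator_traits to supply the members it omits; counting_allocator, g_allocCount and counted_string_t are hypothetical names, and the allocator counts directly rather than depending on the global operator new replacement shown earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Hypothetical allocation counter (single-threaded use only).
static int g_allocCount = 0;

// Minimal C++11 allocator: counts each allocation, then forwards to the
// global operator new/delete.
template <typename T>
struct counting_allocator
{
    typedef T value_type;

    counting_allocator() {}

    template <typename U>
    counting_allocator(counting_allocator<U> const&) {}

    T* allocate(std::size_t n)
    {
        ++g_allocCount;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }

    void deallocate(T* p, std::size_t)
    {
        ::operator delete(p);
    }
};

// Stateless, so any two instances can free each other's memory.
template <typename T, typename U>
bool operator==(counting_allocator<T> const&, counting_allocator<U> const&) { return true; }
template <typename T, typename U>
bool operator!=(counting_allocator<T> const&, counting_allocator<U> const&) { return false; }

typedef std::basic_string<
  char
, std::char_traits<char>
, counting_allocator<char>
>   counted_string_t;
```

Note that short strings may fit in a library's small-string buffer and so never call allocate() at all, which is itself worth knowing when comparing allocation counts.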

Unfortunately, Loki's SafeFormat library does not allow for the specification of allocators (or character traits, for that matter), and only uses std::string. So a little horrifying trickery was required.

Step 1: Introduce string_t into the std namespace.

namespace std
{
  using ::string_t;
}

Now, if you've been paying attention this last decade or so you'll know that adding to the std namespace is strictly controlled. I won't go over the rules now; you can look it up. Suffice to say that this action is not allowed.

Of course, needs must, and in this case there's no choice. Since it's just a perf-test program, it's ok. Just don't go using this tactic in production code.

Step 2: Make Loki (and any other code, for that matter) think that std::string_t is std::string.

#define string string_t

I warned you it was horrid!

Step 3: #include the Loki.SafeFormat header

#include <loki/safeformat.h>

Obviously, this has to be done after steps 1 & 2, otherwise it won't work.


There were a few other dodgy things I had to do to get it to work with some really stupid compilers, but that'll have to wait until another day.