Saturday, January 24, 2009

memcpy() / bcopy() / strncpy(): char_copy()

Well, as you've probably already noticed, I've started removing dependencies on the so-called "unsafe" string library functions from all my open-source libraries.

As part of this continuing (and somewhat life-sapping) activity, I'm moving through the STLSoft libraries. The latest release incorporates changes to the important winstl::basic_findfile_sequence class template, which is an STL extension over the Windows FindFile API.

This makes use of a new facility that's been added to the winstl::system_traits class templates: char_copy(). This (static) function is effectively a typed memcpy().


class winstl::system_traits<char>
{
. . .
public:
  static char* char_copy(char* dest, char const* src, size_t n);
};

class winstl::system_traits<wchar_t>
{
. . .
public:
  static wchar_t* char_copy(wchar_t* dest, wchar_t const* src, size_t n);
};


It copies exactly the number of characters specified, with no checks on the actual length of src.

The modest benefits are that it offers some type safety, and that you no longer have to remember to write sizeof(char_type) * n.
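To make the "typed memcpy()" idea concrete, here is a minimal sketch. The demo_traits template is a hypothetical stand-in, not the WinSTL source, and I'm assuming the function returns dest, as memcpy() does.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a character-traits class; NOT the WinSTL source.
template <typename C>
struct demo_traits
{
    typedef C           char_type;
    typedef std::size_t size_type;

    // Copies exactly n characters (not bytes) from src to dest.
    // No terminator is appended, and the length of src is not checked.
    // Returning dest is an assumption, mirroring memcpy().
    static char_type* char_copy(char_type* dest, char_type const* src, size_type n)
    {
        std::memcpy(dest, src, sizeof(char_type) * n);
        return dest;
    }
};
```

A call such as demo_traits<wchar_t>::char_copy(buf, L"abcdef", 3) copies three wide characters, and the sizeof multiplication stays out of the calling code. Passing a char const* source with a wchar_t* destination fails to compile, which is the type-safety gain.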

I'm not totally struck on the name, and did consider borrowing other names. I liked BSD's bcopy() (to mean block copy), but it's too likely that some code somewhere #defines it to memcpy().

I'll gradually filter char_copy() throughout the rest of WinSTL, and then through UNIXSTL, as I work my way through the thankless task of getting "unsafe"-free. Ho hum.

Wednesday, January 21, 2009

winstl::path_squeeze() redux

As of 1.9.69, the winstl::path_squeeze() function template is much improved, mainly due to extensive unit-testing. This, in turn, is a fall-out of the removal of dependencies on so-called "unsafe" string functions, such as strncpy(), strcat(), and so forth. And that, in turn, is due to a "need" to be compatible with Microsoft's perverse designation of standard library functions as deprecated. Ho hum.

Anyway, the improvements now mean that it's robust for every buffer size in [0, strlen(path)], and beyond. So, you can do all of the following:

std::string path = "H:\\xyz\\mno\\abcdef.ghi";
char buffer[101];

// returns the number required (22)
winstl::path_squeeze(path, static_cast<char*>(NULL), 0);

// returns 1; buffer == ""
winstl::path_squeeze(path, buffer, 1);

// returns 2; buffer == "a"
winstl::path_squeeze(path, buffer, 2);

// returns 5; buffer == "abcd"
winstl::path_squeeze(path, buffer, 5);

// returns 6; buffer == "a...i"
winstl::path_squeeze(path, buffer, 6);

// returns 9; buffer == "ab...ghi"
winstl::path_squeeze(path, buffer, 9);

// returns 10; buffer == "abc...ghi"
winstl::path_squeeze(path, buffer, 10);

// returns 11; buffer == "abcdef.ghi"
winstl::path_squeeze(path, buffer, 11);

// returns 11; buffer == "abcdef.ghi"
winstl::path_squeeze(path, buffer, 12);

// returns 11; buffer == "abcdef.ghi"
winstl::path_squeeze(path, buffer, 17);

// returns 18; buffer == "H:\\...\\abcdef.ghi"
winstl::path_squeeze(path, buffer, 18);

// returns 19; buffer == "H:\\x...\\abcdef.ghi"
winstl::path_squeeze(path, buffer, 19);

// returns 20; buffer == "H:\\xy...\\abcdef.ghi"
winstl::path_squeeze(path, buffer, 20);

// returns 21; buffer == "H:\\xyz...\\abcdef.ghi"
winstl::path_squeeze(path, buffer, 21);

// returns 22; buffer == "H:\\xyz\\mno\\abcdef.ghi"
winstl::path_squeeze(path, buffer, 22);

// returns 22; buffer == "H:\\xyz\\mno\\abcdef.ghi"
winstl::path_squeeze(path, buffer, 23);

Friday, January 2, 2009

Working with other libraries, part 3: use consistent conventions for member types

Ever get confused by the names of member variables/types/functions of C++ class templates?

Boost.Format's basic_format class template defines five public and two private member types, using a mix of four different naming conventions!

[an extract from boost/format/format_class.hpp]

// in namespace boost
template <class Ch, class Tr, class Alloc>
class basic_format
{
private:
  typedef typename io::CompatTraits<Tr>::compatible_type compat_traits;
  typedef io::detail::stream_format_state<Ch, Tr> stream_format_state;
  
public:
  typedef Ch CharT;
  typedef std::basic_string<Ch, Tr, Alloc> string_type;
  typedef typename string_type::size_type size_type;
  typedef io::detail::format_item<Ch, Tr, Alloc> format_item_t;
  typedef io::basic_altstringbuf<Ch, Tr, Alloc> internal_streambuf_t;

In Extended STL, volume 1: Collections and Iterators I recommend the use of the following naming convention for member types:
  1. For public member types that share names with standard components that have the same logical purpose, follow the standard convention and use the standard name. An example would be iterator.
  2. For public member types that are not covered by clause 1, use the _type suffix. An example would be char_type.
  3. For private member types, use the _type_ suffix. An example would be compat_traits_type_.
and
  1. For public API (non-member) types use the _t suffix. An example would be pan_char_t.
  2. For non-public/implementation API (non-member) types use the _t_ suffix. An example would be b64ErrorString_t_.


I've been using this for many years without a blip. I can read my code, even some years later, and understand what's a type and what isn't and also, importantly, which types are for consumption in the outside world and which are internal to the component. Q.E.D.
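As a rough illustration of the five rules together, here is a small sketch. Every name in it (demo_char_t, demo_index_t_, demo_length, demo_buffer) is hypothetical and exists only to show where each suffix goes.

```cpp
#include <cstddef>
#include <string>

// Public API (non-member) type: _t suffix
typedef char demo_char_t;

// Implementation (non-member) type: _t_ suffix
typedef std::size_t demo_index_t_;

// Uses the implementation type internally; callers never see it
inline std::size_t demo_length(demo_char_t const* s)
{
    demo_index_t_ n = 0;
    while ('\0' != s[n])
    {
        ++n;
    }
    return n;
}

template <typename C>
class demo_buffer
{
private:
    // Private member type: _type_ suffix
    typedef std::basic_string<C> storage_type_;

public:
    // Same logical purpose as the standard member types: standard names
    typedef typename storage_type_::const_iterator const_iterator;
    typedef typename storage_type_::size_type      size_type;
    // Public member types with no standard equivalent: _type suffix
    typedef C                    char_type;
    typedef std::basic_string<C> string_type;

public:
    explicit demo_buffer(string_type const& s) : m_storage(s) {}

    const_iterator begin() const { return m_storage.begin(); }
    const_iterator end() const { return m_storage.end(); }
    size_type      size() const { return m_storage.size(); }

private:
    storage_type_ m_storage;
};
```

Reading demo_buffer<demo_char_t>::char_type tells you at a glance that it is a public member type, while storage_type_ is visibly off-limits to client code.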

Thursday, January 1, 2009

Working with other libraries, part 2: allocators

Much is often made of the supposedly still-born Allocator concept in the standard library. However, one very good use of allocators is in tracking memory.

At the moment I'm preparing some analyses of FastFormat's performance for an article I'm writing. One of the analyses I'm conducting is to see how many memory allocations are involved in a formatting statement, for each of the comparison libraries. The standard way to achieve something like this is to replace the global operator new (and its array form), as in:

// NOTE: this code is only valid for single-threaded operation

#include <stdlib.h> // for malloc() and free()

int s_nallocs = 0;

#ifdef OVERLOAD_OPNEW
void* counting_malloc(size_t cb)
{
  ++s_nallocs;

  return ::malloc(cb);
}

void counting_free(void* pv)
{
  ::free(pv);
}

void* operator new(size_t cb)
{
  return counting_malloc(cb);
}

void operator delete(void* pv)
{
  counting_free(pv);
}

void* operator new[](size_t cb)
{
  return counting_malloc(cb);
}

void operator delete[](void* pv)
{
  counting_free(pv);
}
#endif /* OVERLOAD_OPNEW */

Unfortunately, some components with some compilers - the exact permutations escape me at this point - don't go through operator new. This might be because they use a per-class operator new, or it might be because they use an allocator that doesn't use new. (Being under a publishing deadline, I didn't have the time - nor the inclination, if I'm honest - to find out which it was in each case.)

So, in order to get a fighting chance at an accurate depiction of how much memory each library is using I decided to force the issue, by requiring all the strings used to be an instance of the following specialisation, rather than std::string:

typedef std::basic_string<
  char
, std::char_traits<char>
, stlsoft::new_allocator<char>
>   string_t;
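For readers without STLSoft to hand, the same idea can be sketched with a self-contained allocator. Note that this is a different technique from the one above: counting_allocator is a hypothetical stand-in that counts inside the allocator itself, rather than routing through operator new as stlsoft::new_allocator presumably does. It assumes a C++11 (or later) standard library.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>

// Hypothetical counter; single-threaded use only
static int g_demo_nallocs = 0;

// Minimal C++11-style allocator that counts every allocation it performs.
// An assumed stand-in for illustration; NOT stlsoft::new_allocator.
template <typename T>
struct counting_allocator
{
    typedef T value_type;

    counting_allocator() {}
    template <typename U>
    counting_allocator(counting_allocator<U> const&) {}

    T* allocate(std::size_t n)
    {
        ++g_demo_nallocs;
        void* pv = std::malloc(n * sizeof(T));
        if (nullptr == pv)
        {
            throw std::bad_alloc();
        }
        return static_cast<T*>(pv);
    }

    void deallocate(T* p, std::size_t) { std::free(p); }
};

// All instances are interchangeable, so they always compare equal
template <typename T, typename U>
bool operator==(counting_allocator<T> const&, counting_allocator<U> const&) { return true; }
template <typename T, typename U>
bool operator!=(counting_allocator<T> const&, counting_allocator<U> const&) { return false; }

typedef std::basic_string<
  char
, std::char_traits<char>
, counting_allocator<char>
>   counted_string_t;

// Returns how many allocations were made while building a string of n characters
inline int allocations_for(std::size_t n)
{
    int const before = g_demo_nallocs;
    counted_string_t s(n, 'x');
    return g_demo_nallocs - before;
}
```

Because the allocator is part of the string's type, any allocation the string makes has to pass through allocate(), whatever the library does about operator new. Short strings may report zero allocations on implementations with a small-string optimisation.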

Of course, things aren't ever that simple. Such a string type is not compatible with the IOStreams default specialisations, requiring:

typedef std::basic_stringstream<
  char
, std::char_traits<char>
, stlsoft::new_allocator<char>
>   stringstream_t;
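The incompatibility is a matter of types: the allocator is a template parameter, so a string with a custom allocator is unrelated to std::string. A small sketch, assuming C++11 or later and using a hypothetical demo_allocator in place of stlsoft::new_allocator:

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <sstream>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical minimal allocator; a stand-in for stlsoft::new_allocator
template <typename T>
struct demo_allocator
{
    typedef T value_type;

    demo_allocator() {}
    template <typename U>
    demo_allocator(demo_allocator<U> const&) {}

    T* allocate(std::size_t n)
    {
        void* pv = std::malloc(n * sizeof(T));
        if (nullptr == pv)
        {
            throw std::bad_alloc();
        }
        return static_cast<T*>(pv);
    }
    void deallocate(T* p, std::size_t) { std::free(p); }
};

template <typename T, typename U>
bool operator==(demo_allocator<T> const&, demo_allocator<U> const&) { return true; }
template <typename T, typename U>
bool operator!=(demo_allocator<T> const&, demo_allocator<U> const&) { return false; }

typedef std::basic_string<char, std::char_traits<char>, demo_allocator<char> >       demo_string_t;
typedef std::basic_stringstream<char, std::char_traits<char>, demo_allocator<char> > demo_stringstream_t;

// The two string types are distinct, with no implicit conversion between them
static_assert(!std::is_same<demo_string_t, std::string>::value, "distinct types");
static_assert(!std::is_convertible<std::string, demo_string_t>::value, "no implicit conversion");

// A stream with the matching allocator hands back the matching string type
static_assert(std::is_same<
    decltype(std::declval<demo_stringstream_t const&>().str()), demo_string_t
>::value, "str() returns the custom string type");

// Formats "name=value" without ever creating a std::string
inline demo_string_t format_pair(char const* name, int value)
{
    demo_stringstream_t ss;
    ss << name << '=' << value;
    return ss.str();
}
```

A plain std::stringstream would return std::string from str(), so its result could only reach demo_string_t through an explicit character-by-character copy, which would also hide the stream's own allocations from the count.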

And the same thing applies for Boost.Format, requiring:

typedef boost::basic_format<
  char
, std::char_traits<char>
, stlsoft::new_allocator<char>
>   format_t;

Unfortunately, Loki's SafeFormat library does not allow for the specification of allocators (or character traits, for that matter), and only uses std::string. So a little horrifying trickery was required.

Step 1: Introduce string_t into the std namespace.

namespace std
{
  using ::string_t;
}

Now, if you've been paying attention this last decade or so you'll know that adding to the std namespace is strictly controlled. I won't go over the rules now; you can look it up. Suffice to say that this action is not allowed.

Of course, needs must, and in this case there's no choice. Since it's just a perf-test program, it's ok. Just don't go using this tactic in production code.

Step 2: Make Loki (and any other code, for that matter) think that std::string_t is std::string.

#define string string_t

I warned you it was horrid!

Step 3: #include the Loki.SafeFormat header

#include <loki/safeformat.h>

Obviously, this has to be done after steps 1 & 2, otherwise it won't work.
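The three steps can be tried out without Loki. In the sketch below, demo_allocator and the thirdparty_demo namespace are hypothetical; the namespace merely stands in for a header that hard-codes std::string. The same warnings apply: this is not permitted by the standard, it only has a chance of working if every standard header is included before the #define, and the closing #undef is my addition to limit the damage.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>
#include <type_traits>

// Hypothetical minimal allocator; a stand-in for stlsoft::new_allocator
template <typename T>
struct demo_allocator
{
    typedef T value_type;

    demo_allocator() {}
    template <typename U>
    demo_allocator(demo_allocator<U> const&) {}

    T* allocate(std::size_t n)
    {
        void* pv = std::malloc(n * sizeof(T));
        if (nullptr == pv)
        {
            throw std::bad_alloc();
        }
        return static_cast<T*>(pv);
    }
    void deallocate(T* p, std::size_t) { std::free(p); }
};

template <typename T, typename U>
bool operator==(demo_allocator<T> const&, demo_allocator<U> const&) { return true; }
template <typename T, typename U>
bool operator!=(demo_allocator<T> const&, demo_allocator<U> const&) { return false; }

typedef std::basic_string<char, std::char_traits<char>, demo_allocator<char> > string_t;

// Step 1: introduce string_t into the std namespace (not allowed; test code only)
namespace std
{
  using ::string_t;
}

// Step 2: make subsequent code read std::string as std::string_t
#define string string_t

// Step 3: stand-in for the third-party header, which only knows "std::string"
namespace thirdparty_demo
{
  inline std::string describe(int n)
  {
    return std::string(static_cast<std::size_t>(n), '*');
  }
}

// Not one of the original steps: stop the macro leaking into later code
#undef string

// The "std::string" seen by the third-party code was really string_t
static_assert(std::is_same<decltype(thirdparty_demo::describe(1)), string_t>::value,
              "third-party code now uses the custom string type");
```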


There were a few other dodgy things I had to do to get it to work with some really stupid compilers, but that'll have to wait until another day.