The Lumber Room

"Consign them to dust and damp by way of preserving them"

My “megabyte” rant

with 7 comments

Read this.

Quoting from the NIST site:

Once upon a time, computer professionals noticed that 2^10 was very nearly equal to 1000 and started using the SI prefix “kilo-” to mean 1024. That worked well enough for a decade or two because everybody who talked kilobytes knew that the term implied 1024 bytes. But, almost overnight a much more numerous “everybody” bought computers, and the trade computer professionals needed to talk to physicists and engineers and even to ordinary people, most of whom know that a kilometer is 1000 meters and a kilogram is 1000 grams.

Then data storage for gigabytes, and even terabytes, became practical, and the storage devices were not constructed on binary trees, which meant that, for many practical purposes, binary arithmetic was less convenient than decimal arithmetic. The result is that today nobody knows what a megabyte is. When discussing computer memory, most manufacturers use megabyte to mean 2^20 = 1 048 576 bytes, but the manufacturers of computer storage devices usually use the term to mean 1 000 000 bytes. Some designers of local area networks have used megabit per second to mean 1 048 576 bit/s, but all telecommunications engineers use it to mean 10^6 bit/s. And if two definitions of the megabyte are not enough, a third megabyte of 1 024 000 bytes is the megabyte used to format the familiar 90 mm (3 1/2 inch), “1.44 MB” diskette. The confusion is real, as is the potential for incompatibility in standards and in implemented systems.
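
The arithmetic behind the three "megabytes" in the quote can be checked in a few lines. This is only an illustrative sketch: the constant names are made up here, and the diskette capacity of 2880 sectors of 512 bytes is the usual figure for that format, not something stated in the quote.

```python
# Three byte counts that have all been called a "megabyte" (per the NIST quote).
MEGABYTE_DECIMAL = 10**6           # storage devices: 1 000 000 bytes
MEGABYTE_BINARY = 2**20            # computer memory: 1 048 576 bytes (properly, 1 MiB)
MEGABYTE_DISKETTE = 1000 * 1024    # the "1.44 MB" diskette: 1 024 000 bytes

# Assumption (not from the quote): a high-density 3.5-inch diskette
# holds 2880 sectors of 512 bytes each.
DISKETTE_BYTES = 2880 * 512

# Express the same diskette in each of the three "megabytes".
diskette_sizes = {
    "decimal megabytes (MB)": DISKETTE_BYTES / MEGABYTE_DECIMAL,
    "binary megabytes (MiB)": DISKETTE_BYTES / MEGABYTE_BINARY,
    "diskette 'megabytes'": DISKETTE_BYTES / MEGABYTE_DISKETTE,
}

for label, size in diskette_sizes.items():
    print(f"{DISKETTE_BYTES} bytes = {size} {label}")
```

Only the third, mixed definition yields the familiar 1.44; the same diskette is about 1.47 MB in decimal units and about 1.41 MiB in binary units.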

Really, please just read this.

Links:

  1. http://physics.nist.gov/cuu/Units/binary.html
  2. Mathew Somebody, A plea for sanity
  3. Markus Kuhn, Standardized units for use in information technology
  4. Wikipedia, Binary prefixes
  5. Pidgin, my tiny contribution :-)
  6. Random forum, my probably pointless contribution

Written by S

Tue, 2007-10-30 at 06:32:12 +05:30

7 Responses


  1. Well, we have all always hated how our 300gb hard disks turn out to be quite a few gbs smaller, but changing or standardizing this is not an option anymore. Think of all the legacy software that will break if we did that. Remember the Y2K times? Unfortunately, this confusion is here to stay.

    anshul

    Mon, 2007-10-29 at 23:58:38 +05:30

  2. Of course not. I cannot think of what “legacy software will break” means; this is a UI/display problem.

    And changing and standardising are not only possible, they are happening — see this section of the Wikipedia article and one of the links above for examples :-)

    300 GB = 300 gigabytes = 300 × 10^9 bytes. It has no other meaning.
    “300gb” doesn’t mean anything; although “300 mb” does mean 300 millibits, or 0.3 bits, small enough to mean nothing :-)

    (And yes, just like the whole Y2K thing, newer software will eventually be sensible enough not to use absurd archaic conventions, hopefully.)

    shreevatsa

    Tue, 2007-10-30 at 00:39:05 +05:30

  3. Hmmm.. I can think of some. A 300 (binary)GB hard disk will show up in my partition manager as 322.1225472 (decimal)GB in the well coded case and potentially cause a crash or undefined behaviour in the not so well coded cases.

    Storage is too involved an issue for a lot of system-critical legacy applications. You can always imagine a coder who expected hard disk sizes to be integer multiples of a million bytes. What if the Indian Railways' 90s-designed systems expect it? What if some legacy motherboard BIOS still in mass usage expects it? What if some company’s legacy database farms expect it? What if LILO expects it? What if this triggers some weird bug in Vista? (I wouldn’t be too surprised. After all, their network performance was related to playing music files.)

    If the switch happens we will have quite some auditing and testing to do on our hands. So much so that the switch will likely not happen anytime soon.

    This comes up on Slashdot every time a hard-disk-related article is posted. It’s just like DST. Nobody likes it, but it is something we are going to have to live with for a very long time.

    anshul

    Tue, 2007-10-30 at 02:43:57 +05:30

  4. :)
    Firstly, it is very unlikely there would be a 300 GiB hard disk (see how easy it was to use the binary prefix? :-)) As I emphasised in the quote above, storage devices are not constructed on binary trees, and have no reason to use binary units. Most hard disks would indeed be an integer number of MB or GB (and I mean the correct meanings, not MiB or GiB), and if someone expects this it’s likely to be true anyway.

    But more generally, I still don’t see the point — whether one chooses to write the (correct) binary prefixes or the SI prefixes or the fake SI-prefixes-for-binary-units is a *display* issue. Every program that shows measurements to the user can *independently* do so. I don’t really picture partition managers getting their input through (say) pipes from unrelated third-party programs, a change in whose output style would cripple them, etc….

    The thing to note is that there is no switch to be made. No one using binary units is being requested to switch to decimal units; just to choose the right prefix when displaying them. In the case of daylight-saving time or other “global” situations (like the side of the road used for driving), one is stuck in an inferior Nash equilibrium :-) and cannot suddenly start using a different convention without being in serious disagreement with others with whom it is essential to agree; hence it makes sense that a “switch”, which would require everyone to switch simultaneously, is hard to bring about. Here, every piece of software can independently start using the right conventions (at the cost of having to educate/tolerate a few confused/displeased users), and as most software works with binary units and the binary prefixes are always unambiguous, there is even less of a problem. (And software whose output is never seen by humans, only by other programs, need not change at all.) (Speaking of partition managers, both fdisk and gparted (and the Linux kernel, long ago) have changed, did you notice? :))

    When eventually enough binary-units-using software use binary prefixes, the ones incorrectly using binary units with decimal prefixes are likely to change to avoid confusion, and decimal prefixes will be more-or-less unambiguous.

    It *is* ambitious :-), but not as ambitious as you think.

    shreevatsa

    Tue, 2007-10-30 at 03:37:26 +05:30

  5. Hmmm… that sounds about right. I think I overestimated quite a bit there.

    anshul

    Thu, 2007-11-01 at 01:05:15 +05:30

  6. A 300 GB drive contains about 300,000,000,000 bytes. Doesn’t that make sense? This is the way hard drives have been measured for all of eternity, and makes infinitely more sense than the Windows convention of displaying it as “279 GB” in one place and “286,102 MB” in another. It’s such a pointless, confusing, and useless way to measure.

    Abe

    Fri, 2008-04-11 at 03:51:07 +05:30

  7. Abe: I agree, that’s exactly what I’m saying. There are very few contexts where it is more useful for the user to see things in binary units instead of decimal units, and in those few cases, the special binary prefixes (Ki, Mi, Gi) should be used. “300 GB” should and does mean 300,000,000,000 bytes.

    S

    Fri, 2008-04-11 at 16:20:02 +05:30
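
The numbers traded in the comments above (322.1225472 GB, "279 GB", "286,102 MB") and the point that prefix choice is purely a display decision can be illustrated with a small sketch. The `format_bytes` helper and its `binary` flag are hypothetical names invented for this example, not taken from any of the programs mentioned; the sketch assumes a program stores sizes as plain byte counts and chooses the prefix only when printing.

```python
SI_PREFIXES = ["", "k", "M", "G", "T"]        # powers of 1000
IEC_PREFIXES = ["", "Ki", "Mi", "Gi", "Ti"]   # powers of 1024


def format_bytes(n_bytes, binary=False):
    """Hypothetical helper: render a byte count with an unambiguous prefix."""
    base = 1024 if binary else 1000
    prefixes = IEC_PREFIXES if binary else SI_PREFIXES
    value = float(n_bytes)
    index = 0
    # Divide down until the value fits under the base or prefixes run out.
    while value >= base and index < len(prefixes) - 1:
        value /= base
        index += 1
    # Four significant digits are enough for display purposes.
    return f"{value:.4g} {prefixes[index]}B"


# Comment 3: a 300 GiB disk, counted exactly in bytes.
disk_300_gib = 300 * 2**30            # 322 122 547 200 bytes
# Comments 6 and 7: a 300 GB disk, counted exactly in bytes.
disk_300_gb = 300 * 10**9             # 300 000 000 000 bytes

# The same stored byte count, displayed two ways.
print(format_bytes(disk_300_gb))                # decimal prefix
print(format_bytes(disk_300_gb, binary=True))   # binary prefix
print(format_bytes(disk_300_gib))
print(format_bytes(disk_300_gib, binary=True))

# The figures Windows shows for a 300 GB disk are whole binary units.
print(disk_300_gb // 2**30, disk_300_gb // 2**20)
```

Nothing about the stored byte count changes between the two displays, which is why each program can adopt the binary prefixes on its own.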

