Can cluster chains end with 0 or 1? - fat32

In the course of my research into making a formal model for FAT32, I've come across an implementation detail that's making my work really complicated: what values are allowed to occur at the end of a cluster chain?
According to Microsoft's specification, "The list of free clusters in the FAT is nothing more than the list of all clusters that contain the value 0 in their FAT cluster entry." A corollary is that 0 cannot be the value at the end of a cluster chain; only EOC can be there. However, Wikipedia states, for 0: "Otherwise, if this value occurs in cluster chains (e.g., in directory entries of zero length or deleted files), file system implementations should treat this like an end-of-chain marker." It cites an obscure German-language book for this claim.
So I really wanted to ask folks who work with filesystems whether they consider 0 (and 1) to be valid end-of-clusterchain markers in implementations.
Update: I checked the Linux kernel implementation of FAT32 and the function for counting free clusters seems to count all zeros as free clusters, no more and no less.

A further examination of the Linux FAT32 code showed that cluster chains which end with 0 are considered invalid and cause a return value of -EIO.
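To make the two observations above concrete, here is a minimal sketch in Python of a free-cluster count and a chain walk that follow the strict reading: only entries equal to 0 are free, and a chain that runs into 0 or 1 is rejected. It is not the kernel's code; `count_free` and `walk_chain` are hypothetical names, and the FAT is assumed to be already loaded as a list of 32-bit integers.

```python
import errno

FAT32_MASK = 0x0FFFFFFF  # only the low 28 bits of a FAT32 entry are significant
FREE = 0x00000000        # a free cluster holds exactly this value
EOC_MIN = 0x0FFFFFF8     # values from here upwards mark the end of a chain


def count_free(fat):
    """Count entries equal to 0, skipping the two reserved entries 0 and 1."""
    return sum(1 for entry in fat[2:] if entry & FAT32_MASK == FREE)


def walk_chain(fat, start):
    """Return the clusters of a chain; raise OSError(EIO) if it is malformed."""
    chain, seen = [], set()
    cluster = start
    while True:
        # 0 and 1 are never valid data clusters, so a chain that "ends"
        # with one of them is reported as invalid here, as is a loop or
        # an out-of-range value such as the bad-cluster marker.
        if cluster < 2 or cluster >= len(fat) or cluster in seen:
            raise OSError(errno.EIO, "invalid cluster chain")
        seen.add(cluster)
        chain.append(cluster)
        nxt = fat[cluster] & FAT32_MASK
        if nxt >= EOC_MIN:
            return chain
        cluster = nxt
```

A lenient implementation, following the Wikipedia wording, would instead return the chain collected so far when it meets 0; which behaviour is appropriate depends on whether the model should accept or flag such volumes.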


Is it safe to parse a /proc/ file?

I want to parse /proc/net/tcp, but is it safe?
How should I open and read files from /proc without worrying that some other process (or the OS itself) will be changing them at the same time?
In general, no. (So most of the answers here are wrong.) It might be safe, depending on what property you want. But it's easy to end up with bugs in your code if you assume too much about the consistency of a file in /proc. For example, see this bug which came from assuming that /proc/mounts was a consistent snapshot.
For example:
/proc/uptime is totally atomic, as someone mentioned in another answer -- but only since Linux 2.6.30, which is less than two years old. So even this tiny, trivial file was subject to a race condition until then, and still is in most enterprise kernels. See fs/proc/uptime.c for the current source, or the commit that made it atomic. On a pre-2.6.30 kernel, you can open the file, read a bit of it, then if you later come back and read again, the piece you get will be inconsistent with the first piece. (I just demonstrated this -- try it yourself for fun.)
/proc/mounts is atomic within a single read system call. So if you read the whole file all at once, you get a single consistent snapshot of the mount points on the system. However, if you use several read system calls -- and if the file is big, this is exactly what will happen if you use normal I/O libraries and don't pay special attention to this issue -- you will be subject to a race condition. Not only will you not get a consistent snapshot, but mount points which were present before you started and never stopped being present might go missing in what you see. To see that it's atomic for one read(), look at m_start() in fs/namespace.c and see it grab a semaphore that guards the list of mountpoints, which it keeps until m_stop(), which is called when the read() is done. To see what can go wrong, see this bug from last year (same one I linked above) in otherwise high-quality software that blithely read /proc/mounts.
/proc/net/tcp, which is the one you're actually asking about, is even less consistent than that. It's atomic only within each row of the table. To see this, look at listening_get_next() in net/ipv4/tcp_ipv4.c and established_get_next() just below in the same file, and see the locks they take out on each entry in turn. I don't have repro code handy to demonstrate the lack of consistency from row to row, but there are no locks there (or anything else) that would make it consistent. Which makes sense if you think about it -- networking is often a super-busy part of the system, so it's not worth the overhead to present a consistent view in this diagnostic tool.
The other piece that keeps /proc/net/tcp atomic within each row is the buffering in seq_read(), which you can read in fs/seq_file.c. This ensures that once you read() part of one row, the text of the whole row is kept in a buffer so that the next read() will get the rest of that row before starting a new one. The same mechanism is used in /proc/mounts to keep each row atomic even if you do multiple read() calls, and it's also the mechanism that /proc/uptime in newer kernels uses to stay atomic. That mechanism does not buffer the whole file, because the kernel is cautious about memory use.
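For a file such as /proc/mounts, which is described above as consistent only within one read() call, a reader can at least detect when its snapshot did not fit into a single call. The following is a sketch in Python under that assumption; `read_single_call` is a hypothetical helper, and the 1 MiB buffer size is an arbitrary choice. Since the kernel does not buffer the whole file, one read() may return less than the full contents regardless of how large the user buffer is, which is why the sketch checks for end-of-file instead of assuming it.

```python
import os


def read_single_call(path, bufsize=1 << 20):
    """Read `path` with one read() call and report whether that was all of it.

    Returns (data, complete). `complete` is False when a second read()
    still returned data, i.e. the contents spanned more than one system
    call and should not be treated as one consistent snapshot.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        data = os.read(fd, bufsize)           # the one read we rely on
        complete = os.read(fd, 1) == b""      # probe for end-of-file
    finally:
        os.close(fd)
    return data, complete
```

When `complete` comes back False, the caller can retry, or fall back to treating each line on its own.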
Most files in /proc will be at least as consistent as /proc/net/tcp, with each row a consistent picture of one entry in whatever information they're providing, because most of them use the same seq_file abstraction. As the /proc/uptime example illustrates, though, some files were still being migrated to use seq_file as recently as 2009; I bet there are still some that use older mechanisms and don't have even that level of atomicity. These caveats are rarely documented. For a given file, your only guarantee is to read the source.
In the case of /proc/net/tcp, you can read it and parse each line without fear. But if you try to draw any conclusions from multiple lines at once -- beware, other processes and the kernel are changing it while you read it, and you are probably creating a bug.
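Parsing one row at a time, without relating it to any other row, might look like the following Python sketch. `parse_tcp_row` is a hypothetical helper; it assumes the column layout of the /proc/net/tcp sample shown elsewhere in this thread and a little-endian host, where the IPv4 address is printed with its bytes reversed.

```python
import socket
import struct


def parse_tcp_row(line):
    """Parse a single data row of /proc/net/tcp into a small dict."""
    fields = line.split()
    local, remote, state = fields[1], fields[2], fields[3]

    def decode(hex_pair):
        addr_hex, port_hex = hex_pair.split(":")
        # Assumption: little-endian host, so the hex digits are the
        # address bytes in reverse order.
        addr = socket.inet_ntoa(struct.pack("<I", int(addr_hex, 16)))
        return addr, int(port_hex, 16)

    return {
        "local": decode(local),
        "remote": decode(remote),
        "state": int(state, 16),  # e.g. 0x0A is the LISTEN state
    }
```

Each returned dict describes one socket at one moment; comparing two dicts from the same pass over the file is exactly the cross-row conclusion the answer warns about.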
Although the files in /proc appear as regular files in userspace, they are not really files but rather entities that support the standard file operations from userspace (open, read, close). Note that this is quite different than having an ordinary file on disk that is being changed by the kernel.
All the kernel does is print its internal state into its own memory using a sprintf-like function, and that memory is copied into userspace whenever you issue a read(2) system call.
The kernel handles these calls in an entirely different way than for regular files, which could mean that the entire snapshot of the data you will read could be ready at the time you open(2) it, while the kernel makes sure that concurrent calls are consistent and atomic. I haven't read that anywhere, but it doesn't really make sense to be otherwise.
My advice is to take a look at the implementation of a proc file in your particular Unix flavour. This is really an implementation issue (as is the format and the contents of the output) that is not governed by a standard.
The simplest example would be the implementation of the uptime proc file in Linux. Note how the entire buffer is produced in the callback function supplied to single_open.
/proc is a virtual file system: in fact, it just gives a convenient view of the kernel internals. It's definitely safe to read it (that's why it's there), but it's risky in the long term, as the internals of these virtual files may evolve with newer kernel versions.
More information is available in the proc documentation in the Linux kernel docs, chapter 1.4, Networking.
I can't find out how the information evolves over time. I thought it was frozen on open, but I don't have a definite answer.
According to the SCO documentation (not Linux, but I'm pretty sure all flavours of *nix behave like that):
Although process state and consequently the contents of /proc files can change from instant to instant, a single read(2) of a /proc file is guaranteed to return a ``sane'' representation of state, that is, the read will be an atomic snapshot of the state of the process. No such guarantee applies to successive reads applied to a /proc file for a running process. In addition, atomicity is specifically not guaranteed for any I/O applied to the as (address-space) file; the contents of any process's address space might be concurrently modified by an LWP of that process or any other process in the system.
The procfs API in the Linux kernel provides an interface to make sure that reads return consistent data. Read the comments in __proc_file_read. Item 1) in the big comment block explains this interface.
That being said, it is of course up to the implementation of a specific proc file to use this interface correctly to make sure its returned data is consistent. So, to answer your question: no, the kernel does not guarantee consistency of the proc files during a read but it provides the means for the implementations of those files to provide consistency.
I have the source for Linux handy since I'm doing driver development at the moment on an embedded ARM target.
The file ...linux- at line 934 contains, for example
seq_printf(seq, "%4d: %08X:%04X %08X:%04X"
" %02X %08X:%08X %02X:%08lX %08X %5d %8d %lu %d %p %d\n",
i, src, srcp, dest, destp, sp->sk_state,
0, 0L, 0, sock_i_uid(sp), 0, sock_i_ino(sp),
atomic_read(&sp->sk_refcnt), sp, atomic_read(&sp->sk_drops));
which outputs
[wally#zenetfedora ~]$ cat /proc/net/tcp
sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode
0: 017AA8C0:0035 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 15160 1 f552de00 299
1: 00000000:C775 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 13237 1 f552ca00 299
in function raw_sock_seq_show() which is part of a hierarchy of procfs handling functions. The text is not generated until a read() request is made of the /proc/net/tcp file, a reasonable mechanism since procfs reads are surely much less common than updating the information.
Some drivers (such as mine) implement the proc_read function with a single sprintf(). The extra complication in the core drivers' implementation is there to handle potentially very long output, which may not fit in the intermediate kernel-space buffer during a single read.
I tested that with a program using a 64K read buffer, but on my system proc_read only gets a kernel-space buffer of 3072 bytes in which to return data. Multiple calls with advancing pointers are needed to get more text than that. I don't know what the right way is to make the returned data consistent when more than one I/O is needed. Certainly each entry in /proc/net/tcp is self-consistent, but adjacent lines may well be snapshots taken at different times.
Short of unknown bugs, there are no race conditions in /proc that would lead to reading corrupted data or a mix of old and new data. In this sense, it's safe. However there's still the race condition that much of the data you read from /proc is potentially-outdated as soon as it's generated, and even moreso by the time you get to reading/processing it. For instance processes can die at any time and a new process can be assigned the same pid; the only process ids you can ever use without race conditions are your own child processes'. Same goes for network information (open ports, etc.) and really most of the information in /proc. I would consider it bad and dangerous practice to rely on any data in /proc being accurate, except data about your own process and potentially its child processes. Of course it may still be useful to present other information from /proc to the user/admin for informative/logging/etc. purposes.
When you read from a /proc file, the kernel is calling a function which has been registered in advance to be the "read" function for that proc file. See the __proc_file_read function in fs/proc/generic.c .
Therefore, a proc read is only as safe as the function the kernel calls to satisfy the read request. If that function properly locks all data it touches and returns it to you in a buffer, then it is completely safe to read using that function. Since proc files like the one used for satisfying read requests to /proc/net/tcp have been around for a while and have undergone scrupulous review, they are about as safe as you could ask for. In fact, many common Linux utilities rely on reading from the proc filesystem and formatting the output in a different way. (Off the top of my head, I think 'ps' and 'netstat' do this.)
As always, you don't have to take my word for it; you can look at the source to calm your fears. The following documentation from proc_net_tcp.txt tells you where the "read" functions for /proc/net/tcp live, so you can look at the actual code that is run when you read from that proc file and verify for yourself that there are no locking hazards.
This document describes the interfaces /proc/net/tcp and /proc/net/tcp6. Note that these interfaces are deprecated in favor of tcp_diag.
These /proc interfaces provide information about currently active TCP connections, and are implemented by tcp4_seq_show() in net/ipv4/tcp_ipv4.c and tcp6_seq_show() in net/ipv6/tcp_ipv6.c, respectively.

Flash ECC algorithm on STM32L1xx

How does the flash ECC algorithm (Flash Error Correction Code) implemented on STM32L1xx work?
I want to do multiple incremental writes to a single word in the program flash of an STM32L151 MCU without doing a page erase in between. Without ECC, one could set bits incrementally, e.g. first 0x00, then 0x01, then 0x03 (the STM32L1 erases bits to 0 rather than to 1), etc. As the STM32L1 has 8-bit ECC per word, this method doesn't work. However, if we knew the ECC algorithm, we could easily find a short sequence of values that could be written incrementally without violating the ECC.
We could simply try different sequences of values and see which ones work (one such sequence is 0x00000001, 0x00000101, 0x00030101, 0x03030101). But if we don't know the ECC algorithm, we can't check whether the sequence violates the ECC, in which case error correction wouldn't work if bits became corrupted.
[Edit] The functionality should be used to implement a simple file system using STM32L1's internal program memory. Chunks of data are tagged with a header, which contains a state. Multiple chunks can reside on a single page. The state can change over time (first 'new', then 'used', then 'deleted', etc.). The number of states is small, but it would make things significantly easier, if we could overwrite a previous state without having to erase the whole page first.
Thanks for any comments! As there are no answers so far, I'll summarize what I have found out so far (empirically and based on comments to this answer):
According to the STM32L1 datasheet, "The whole non-volatile memory embeds the error correction code (ECC) feature.", but the reference manual doesn't state anything about ECC in program memory.
The datasheet is in line with what we can find out empirically when successively writing multiple words to the same program memory location without erasing the page in between. In such cases some sequences of values work while others don't.
The following are my personal conclusions, based on empirical findings, limited research and comments from this thread. They are not based on official documentation. Don't build any serious work on them (I won't either)!
It seems that the ECC is calculated and persisted per 32-bit word. If so, the ECC must have a length of at least 7 bits.
The ECC of each word is probably written to the same nonvolatile mem as the word itself. Therefore the same limitations apply. I.e. between erases, only additional bits can be set. As stark pointed out, we can only overwrite words in program mem with values that:
Only set additional bits but don't clear any bits
Have an ECC that also only sets additional bits compared to the previous ECC.
If we write a value that only sets additional bits, but whose ECC would need to clear bits (and therefore cannot be written correctly), then:
If the ECC is wrong by one bit, the error is corrected by the ECC algorithm and the written value can be read correctly. However, ECC wouldn't work anymore if another bit failed, because ECC can only correct single-bit errors.
If the ECC is wrong by more than one bit, the ECC algorithm cannot correct the error and the read value will be wrong.
We cannot (easily) find out empirically which sequences of values can be written correctly and which can't. If a sequence of values can be written and read back correctly, we wouldn't know whether this is due to the automatic correction of single-bit errors. This aspect is the whole reason for this question asking for the actual algorithm.
The ECC algorithm itself seems to be undocumented. Hamming code seems to be a commonly used algorithm for ECC, and AN4750 states that Hamming code is actually used for error correction in SRAM. The algorithm may or may not be used for the STM32L1's program memory.
The STM32L1 reference manual doesn't seem to explicitly forbid multiple writes to program memory without erase, but there is no documentation stating the opposite either. In order not to use undocumented functionality, we will refrain from using such functionality in our products and find workarounds.
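The two overwrite conditions listed above can be illustrated with a purely hypothetical ECC. The Python sketch below uses textbook Hamming check bits over a 32-bit word (6 bits, no overall parity bit); this is an assumption chosen for illustration and is not claimed to be the STM32L1's algorithm, which remains undocumented. `hamming_ecc` and `can_overwrite` are made-up names.

```python
def data_positions(n=32):
    """Hamming code positions used for data: 1, 2, 4, 8... are check bits."""
    positions, p = [], 1
    while len(positions) < n:
        if p & (p - 1):          # non-zero only when p is not a power of two
            positions.append(p)
        p += 1
    return positions


POSITIONS = data_positions()


def hamming_ecc(word):
    """Check bits = XOR of the code positions of all set data bits."""
    ecc = 0
    for bit, pos in enumerate(POSITIONS):
        if (word >> bit) & 1:
            ecc ^= pos
    return ecc


def can_overwrite(old, new):
    """True if neither the data nor the (hypothetical) ECC must clear a bit."""
    data_ok = (old & ~new) == 0
    ecc_ok = (hamming_ecc(old) & ~hamming_ecc(new)) == 0
    return data_ok and ecc_ok
```

Under this toy code, going from 0x0 to 0x1 is fine, but going from 0x1 to 0x3 is not: the data only gains a bit, while the check bits would have to change from 0b011 to 0b110, which requires clearing one. With the real algorithm the set of safe sequences would differ, which is the reason the question asks for it.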
Interesting question.
First I have to say that even if you find out the ECC algorithm, you can't rely on it, as it's not documented and it can be changed at any time without notice.
But finding out the algorithm seems to be possible with a reasonable number of tests.
I would try to build tests which start with a constant value and then clear only one bit.
When you read the value back and it's still the start value, then changing that bit could not change all the necessary bits in the ECC.
for <bitIdx> = 0 to 31
    erase cell
    write start value, like 0xFFFFFFFF & ~(1<<testBit)
    clear bit <bitIdx> in the cell
    read the cell
If you find a start value where the test works for all bits, then the start value probably has an ECC with all bits set.
Edit: This should be true for any ECC, as every ECC always needs a difference of at least two bits to reliably detect and repair one defective bit.
As the first bit difference is in the value itself, the second change needs to be in the hidden ECC bits, and the number of hidden bits will be very limited.
If you repeat this test with different start values, you should be able to gather enough data to prove which error correction is used.

C/C++: maximum size of errno-associated strings (at compile-time)

Is there any way to get the maximum size of any string correlated with errno at compile time (at preprocessor time would be even better)? E.g. an upper bound on strlen(strerror(errno))?
My Thoughts
The best I can think of is running a program to do a brute-force search over the range of an int, over each locale, to get the string associated with each {errno, locale} pair, get its size, and generate a header on that system, then hooking that into e.g. a makefile or autoconf or whatever. I can't think of a better way to do it, but it seems ridiculous that it would be so: the standard library for a system has that information built-in, if only implicitly. Is there really no good way to get that information?
Okay, I'll admit the C and/or C++ standards might permit error strings generated at runtime, with e.g. specific-to-circumstance messages (e.g. strerror(EINVAL) giving a string derived from other runtime metadata set when errno was last set, or something). I'm not sure if that is allowed, and I'd actually welcome such an implementation, but I've never heard of one that did so, or that had more than one string for a given {errno, locale} pair.
For context, what I specifically wanted, and what led to this question (though I think the question is valuable more generally, as was discussed in the comments), was to be able to use the error string associated with errno in the syscall/function writev. In my specific use case, I was using strings out of argv and errno-linked strings. This set my "worst-case" length to ARG_MAX + some max errno string length + the size of a few other small strings.
Every *nix document I've consulted seems to indicate writev will (or "may", for what little good that difference makes in this case) error out with errno set to EINVAL if the sum of the iov_len values overflows SSIZE_MAX. Intuitively, I know every errno string I've seen is very short, and in practice this is a non-issue. But I don't want my code mysteriously failing to print an error at all on some system if it's possible for this assumption to be false. So I wrote code to handle such a case, but at the same time, I don't want that additional code being compiled in for the platforms which clearly don't need it.
The combined input of the answers and comments so far is making me lean towards thinking that in my particular use case, the "right" solution is to just truncate obscenely long messages. But this is why I asked the question the way I did initially: such information would also help select a buffer size for strerror_r/strerror_s (*nix/Windows respectively), and even a negative answer (e.g. "you can't really do it") is in my view useful for others' education.
This question contains answers for the strings given by strerror_r on VxWorks, but I don't feel comfortable generalizing that to all systems.
The C library that you build against may not be the same C library that you run against (an ABI-compatible C library may be used), or even the exact same version (on GNU/Linux, consider glibc 2.2.5 vs. glibc 2.23). Therefore, computing the maximum size of the locale-dependent string returned from strerror can only be done at runtime, during process execution. On top of this, the locale translations may be updated on the target system at any time, which again invalidates any precomputed upper bound.
Unfortunately there is no guarantee that the values returned by strerror are constant for the lifetime of the process, and so they may also change at a later time, thus invalidating any early computation of the bound.
I suggest using strerror_r to save the error string and avoid any issues with non-thread-aware libraries that might call strerror and possibly change the string while you are copying it. Then, instead of translating the string on the fly, you would use the saved result, and potentially truncate it to SSIZE_MAX (never going to happen in reality).
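As an illustration of measuring at runtime and clamping, here is a small Python sketch. Python's os.strerror is used as a stand-in for the C library call; the scan range of 4096 and the helper names `longest_error_string` and `clamped_error` are assumptions, and the measured value only holds for the current process, library version and locale, for the reasons given above.

```python
import os


def longest_error_string(max_errno=4096):
    """Longest message, in bytes, for error codes 1..max_errno-1 as seen right now."""
    longest = 0
    for code in range(1, max_errno):
        try:
            message = os.strerror(code)
        except ValueError:
            # Some platforms reject unknown error numbers outright.
            continue
        longest = max(longest, len(message.encode()))
    return longest


def clamped_error(code, limit):
    """Copy the message once and truncate the copy to at most `limit` bytes."""
    # Note: a byte-level cut may split a multi-byte character in some locales.
    return os.strerror(code).encode()[:limit]
```

The copy-then-truncate step in `clamped_error` is the part that carries over to C with strerror_r and a fixed-size buffer; the measurement is only a diagnostic, not a bound that can be baked into a header.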
I'm not aware that the C or C++ standards make any assertions regarding the length of these messages. The platforms you're interested in might provide some stronger implementation-defined guarantees, though.
For example, for POSIX systems, I found the following in limits.h.
The following constants shall be defined on all implementations in <limits.h>:
Maximum number of bytes in a message string.
Minimum Acceptable Value: {_POSIX2_LINE_MAX}
I believe that error messages produced by strerror would fall into this category.
That said, I'm unable to get this macro on my system. However, I do have _POSIX2_LINE_MAX (from <unistd.h>). It is #defined to 2048. Since the standard only says that this is a lower bound, that might not be too helpful, though.
The standards make no guarantees about the size limits of the null-terminated string returned by strerror.
In practice, this is never going to be an issue. However, if you're that paranoid about it, I would suggest that you just copy the string returned from strerror and clamp its length to SSIZE_MAX before passing it to writev.
It is safe to assume that SSIZE_MAX will be greater than the longest string (character array) that strerror returns in a normal C or C++ system. This is because usable system memory (usable directly by your C program) can be no larger than SIZE_MAX (an unsigned integer value), and SSIZE_MAX will have at least the same number of bits. Using two's complement math to account for the signed nature of SSIZE_MAX (and ssize_t), SSIZE_MAX will therefore be at least half the size of system memory.

simulate atomic operation in independent systems

I have two independent systems. At some point I would like to be able to perform an operation that affects both systems, and I would like to simulate atomicity even though this is technically impossible. To illustrate the problem, let's say that we would like to move an object from one of the systems to the other.
First, because every operation might fail at any point, I am adding a tentative record to both systems indicating the intention. The algorithm is:
1. Set the object in system 1 in tentative mode for remove
2. Set the object in system 2 in tentative mode for add
3. Move the object from system 1 to system 2
4. Remove the tentativeness from system 2
5. Remove the tentativeness from system 1
The lack of an atomic operation, though, might result in the object being in both systems or in neither, depending on the order of steps 4 and 5 and a crash between them.
My question is: is there an algorithm that could somehow work around the lack of atomicity and allow me to guarantee it? I can see that it seems impossible, but I hope it is not.
Quite possible (though not perfect). Databases do this all the time. See and for an introduction.
It is, of course, a pithy subject, so I can't supply a quick thumbnail sketch in code. But yes, you can do this.
Your approach has some merit. What you need is more communication between the two systems.
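One well-known way to add that communication is two-phase commit with a coordinator log; naming it here is an assumption about what this answer alludes to, since it does not spell the technique out. The Python sketch below models both systems in memory. `Store`, `move` and `recover` are hypothetical names, and the dict used as a log would have to be durable storage in a real deployment.

```python
class Store:
    """One of the two systems: a set of objects plus tentative markers."""

    def __init__(self, items=()):
        self.items = set(items)
        self.pending = {}  # key -> "add" or "remove"

    def prepare(self, key, action):
        if key in self.pending:
            return False
        if action == "remove" and key not in self.items:
            return False
        if action == "add" and key in self.items:
            return False
        self.pending[key] = action
        return True

    def commit(self, key):
        # Idempotent: committing twice, or after a crash, is harmless.
        action = self.pending.pop(key, None)
        if action == "add":
            self.items.add(key)
        elif action == "remove":
            self.items.discard(key)

    def abort(self, key):
        self.pending.pop(key, None)


def move(key, src, dst, log, crash_after_first_commit=False):
    """Phase 1: prepare both sides. Phase 2: log the decision, then commit."""
    prepared = []
    for store, action in ((src, "remove"), (dst, "add")):
        if store.prepare(key, action):
            prepared.append(store)
        else:
            for done in prepared:
                done.abort(key)
            log[key] = "aborted"
            return False
    log[key] = "commit"  # the single decision point
    dst.commit(key)
    if crash_after_first_commit:  # test hook simulating a crash between commits
        raise RuntimeError("simulated crash")
    src.commit(key)
    log[key] = "done"
    return True


def recover(key, src, dst, log):
    """After a crash, finish or undo the move according to the logged decision."""
    if log.get(key) == "commit":
        dst.commit(key)
        src.commit(key)
        log[key] = "done"
    elif log.get(key) != "done":
        src.abort(key)
        dst.abort(key)
```

Between the crash and the call to `recover`, the object is still visible in both stores, so this does not make the move atomic in the strict sense. What it gives is a single logged decision from which recovery always converges to exactly one copy, which matches the "possible, though not perfect" assessment above.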

How do you mitigate proposal-number overflow attacks in Byzantine Paxos?

I've been doing a lot of research into Paxos recently, and there is one thing I've always wondered about. I'm not seeing any answers to it, which means I have to ask.
Paxos includes an increasing proposal number (and possibly also a separate round number, depending on who wrote the paper you're reading). And of course, two would-be leaders can get into duels where each tries to out-increment the other in a vicious cycle. But as I'm working in a Byzantine, P2P environment, it makes me wonder what to do about proposers that would attempt to set the proposal number extremely high, for example to the maximum 32-bit or 64-bit word.
How should a language-agnostic, platform-agnostic Paxos-based protocol deal with integer maximums for proposal number and/or round number? Especially intentional/malicious cases, which make the modular-arithmetic approach of overflowing back to 0 a bit unattractive?
From what I've read, I think this is still an open question that isn't addressed in literature.
Byzantine Proposer Fast Paxos addresses denial of service, but only of the sort that would delay message sending through attacks not related to flooding with incrementing (proposal) counters.
Having said that, integer overflow is probably the least of your problems. Instead of thinking about integer overflow, you might want to consider membership attacks first (via DoS). Learning about membership after consensus from several nodes may be a viable strategy, but probably still vulnerable to Sybil attacks at some level.
Another strategy may be to incorporate some proof-of-work system for proposals to limit the flood of requests. However, it's difficult to know what to balance this cost against (compare the free currency you receive when you mine the block chain in Bitcoin). It really depends on what type of system you're trying to build. You should consider the value of information in your system, then create a proof-of-work system that requires slightly more cost to circumvent.
However, once you have the ability to slow down a proposal counter, you still need to worry about integer maximums in any system with a high number of (valid) operations. You should have a strategy for number wrapping or a multiple-precision scheme in place, and you should be able to determine clearly how many years or decades your network can run without blowing out a fixed-precision counter. If you can determine that your system will run for 100 years (or whatever) without blowing out your fixed-precision counter, even with malicious entities, then you can choose to simplify things.
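The lifetime estimate mentioned above is simple arithmetic once the rate of increments is bounded. A small Python sketch, where `years_until_overflow` is a hypothetical helper and the rates are made-up inputs:

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_until_overflow(bits, increments_per_second):
    """Years before a counter of `bits` bits wraps at a sustained rate."""
    return (2 ** bits) / increments_per_second / SECONDS_PER_YEAR
```

For example, a 64-bit counter incremented one billion times per second lasts roughly 584 years, while a 32-bit counter at 1000 increments per second wraps in under two months. The estimate only holds if every increment is a bounded step that actually had to be paid for; a proposer that can jump straight to the maximum value is not constrained by it.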
On another (important) note, the system model used in most papers doesn't reflect everything that makes a real-life implementation practical (Raft is a nice exception to this). If anything, some authors are guilty of creating a system model that is designed to avoid a hard problem that they haven't found an answer to. So, if someone says that X will solve everything, please be aware that they only mean that it solves everything in the very specific system model that they defined. On the other side of this, you should consider that the system model is closely tied to a statement that says "Y is impossible". A nice example to explain this concept is the completely asynchronous message passing of the Ben-Or consensus algorithm, which uses nondeterminism in the system model's state machine to avoid the limits specified by the FLP impossibility result (which specifies that consensus requires partially asynchronous message passing when the system model's state machine is deterministic).
So, you should continue to consider the "impossible" after you read a proof that says it can't be done. Nancy Lynch did a nice writeup on this concept.
I guess what I'm really saying is that a good solution to your question doesn't really exist yet. If you figure it out, please publish it (or let me know if you find an existing paper).