本篇文章为你整理了Java Resizable Array()的详细内容,包含有 Java Resizable Array,希望能帮助你了解 Java Resizable Array。
Sometimes you want to keep data (typically raw bytes) in a single, consecutive array for fast and easy access, but need the array to be
resizable, or at least expandable. Java arrays are not resizable, so using an array by itself is not sufficient.
Thus, to get a resizable array for primitive types you need to implement
it yourself. In this tutorial I will show you how to implement a resizable array in Java.
You might wonder why you cannot just use a Java ArrayList for this purpose. The answer is - you can if
one of these conditions are true:
You are storing some object type in the array.
You are storing primitive types in the array and performance and memory usage doesnt matter.
The Java ArrayList class only works for objects - not for primitive types (byte, int, long etc). If you use a
Java ArrayList to store primitive data it will be autoboxed into an object when inserted into the ArrayList, and
unboxed out of this object back to a primitive type. This boxing and unboxing adds an overhead to insertion of and
access to each element - which you do not want when you are trying to optimize for performance (this article is
part of my Java Performance trail).
Additionally, you have no guarantee for where in memory the autoboxed objects are stored. They can be scattered all
over the heap. Thus consecutive access
to them can be much, much slower than when stored consecutively in memory - in a primitive type array.
Additionally, storing a primitive in its object counterpart (e.g. a long in a Long object) incurs significant
extra memory overhead per element.
Java Resizable Array - GitHub Repository
The code for the Java resizable array implemented in this tutorial can be found on GitHub (for easy access):
https://github.com/jjenkov/java-resizable-array
The code consist of 3 Java classes and 2 unit tests.
Imagine you have a server receiving messages of varying sizes. Some messages are very small (e.g. less than 4KB),
others as large as 1MB or even larger. If the server cannot see from the beginning of the message how big the message
will end up being, there is no way of pre-allocating the correct amount of memory for that message. The
alternative is to "over-allocate", meaning allocate a memory block that we believe is big enough to hold the
message.
If the server is receiving messages from many (100K +) connections at the same time, we need to limit how much
memory we allocate for each message. We cannot just allocate the maximal message size (e.g. 1MB or 16MB etc.) for
each message buffer. That would deplete the server memory fast when the number of connections and messages go up!
100.000 x 1MB = 100GB (approximately - not precisely - but you get the picture).
The alternative is to start with a smaller buffer and if that buffer turns out not to be big enough to hold the
full message, we expand the buffer.
This is an expensive set of operations! Especially the copying of the old data to the new data gets more and
more expensive the larger the old array is.
Note, that in languages with built-in garbage collection (like Java, C# etc.) you do not explicitly have to
deallocate the old array, but the VM still has to do it, so your application still pays the deallocation price
sooner or later.
To minimize the price paid for array expansion you want to expand the array as few times as possible, in order
to make it large enough to hold the total message.
If we assume that most messages are small, we can start with a small buffer. If a message grows beyond
the small buffer size, we allocate a new, medium sized array and copy the data to the new, medium sized array.
If the message outgrows the medium sized array too, a large array is allocated, and the message is copied to that.
The large array should be the largest size allowed for a message. If the message outgrows the large message buffer,
it is too large to be processed and should be rejected by the server.
Using this strategy, most messages will only ever use a small buffer. This means that the server memory is
used more efficiently. 100.000 x 4KB (small buffer size) is only 400MB. Most servers should be able to handle that.
Even at 4GB (1.000.000 x 4KB) modern servers should be able to handle it.
The ResizableArrayBuffer contains a single, large array. This array is divided into three sections.
One section for small arrays, one for medium size arrays and one for large arrays.
The ResizableArray represents a single, resizable array, with its data stored in the array in
the ResizableArrayBuffer.
Here is a diagram illustrating the large array divided into sections, and each section divided into blocks.
By reserving an area of the large array in the ResizableArrayBuffer for each message size interval
(small, medium, large), we make sure that the array cannot be fully filled with either size messages. For instance,
receiving a lot of small messages cannot take all the memory and thus block for medium and large messages. Similarly,
receiving a lot of large messages cannot take up all memory and block for medium and small messages.
Since all messages start as small messages, if the amount of small arrays is depleted, no new arrays can be allocated,
regardless of whether or not there is space in the medium and large array sections. However, it is possible to
make the small array section large enough that the probability of this happening is very low.
It is still possible to grow small messages into medium and large messages, even if the small message section
is fully used.
A possible optimization is to use only a single block size. Thus, sometimes a newly allocated block might
be allocted just after the block that needs to be expanded. In that case you dont need to copy the data from
the old array to the new. You can simply "expand" the block to include both of the two allocated blocks, and
write new data into the expanded block from where the second block was added. This saves copying of all array
data when a memory block can be "expanded" into the next consecutive block.
A drawback of the above mentioned possible optimization is, that in the cases where it is not possible to expand
into the next memory block, copying of data is still necessary. Thus you have a small overhead of checking
if expansion is possible. Additionally, if you use a small block size, you may have to expand blocks more often
than if you use e.g. a small, medium and large block size.
The large array inside the ResizableArrayBuffer is split up into three sections. Each section is split
up into smaller blocks. Each block in each section has the same size. Blocks in the small message section has the
same small block size. Blocks in the medium message section has the same medium block size. And blocks in the
large message section has the same large block size.
When all blocks within a section has the same size, it is easier to keep track of used and unused blocks.
You can simply use a queue containing the start indexes of each block. One queue is needed for each section of
the shared array. Thus, one queue is needed to keep track of free small blocks, one queue for free medium blocks
and one queue for free large blocks.
Allocating a block from any of the sections can be accomplished simply by taking the next free block start index
from the queue associated with the desired section. Freeing a block is done by putting the start index back into
the corresponding queue.
As a queue I have used a simple Ring Buffer implementation. The implementation
used in the GitHubRepository is called QueueIntFlip.
The Resizable array will expand itself when you write data to it. If you attempt to write more data to it than
it has space for in its currently allocated block, it will allocate a new, larger block and copy all its data
to that new block. The previous smaller block is then freed.
Once you are done with a resizable array you should free it again, so that it can be used for other messages.
I want to show you how to use the ResizableArrayBuffer (from the GitHub repository).
This example creates a ResizableArrayBuffer with a small array size of 4KB, medium array size of
128KB and a larger array size of 1MB. The ResizableArrayBuffer contains space for 1024 small arrays (4MB in total),
32 medium arrays (4MB in total) and 4 large arrays (4MB in total) - for a full shared array size of 12MB.
To obtain a ResizableArray instance, call the ResizableArrayBuffers getArray()
method, like this:
ResizableArray resizableArray = arrayBuffer.getArray();
This example obtains a ResizableArray of the smallest size (4KB the small size was set to earlier).
You write to a ResizableArray by calling its write() method. The ResizableArray
class in the GitHub repository only contains a single write() method which takes a ByteBuffer
as parameter. It should be pretty easy to add more write() methods yourself, though.
Here is a write example:
ByteBuffer byteBuffer = ByteBuffer.allocate(1024);
for(int i=0; i ByteBuffer into the array (block) of the ResizableArray.
The value returned by write() is the number of bytes copied from the ByteBuffer.
In case the ByteBuffer had contained more data than the ResizableArray, the
ResizableArray will attempt to expand itself to make space for the data in the ByteBuffer.
If the ResizableArray cannot contain all the data in the ByteBuffer after expanding
itself to the max size, the write() method will return -1 and no data will have been
copied at all!
When you read from a ResizableArray you do so directly in the shared array which all
ResizableArray instances share. The ResizableArray contains the following
public field:
public byte[] sharedArray = null;
public int offset = 0;
public int capacity = 0;
public int length = 0;
The sharedArray field references the shared array which all ResizableArray instances
keep their data in. That is the array kept internally in the ResizableArrayBuffer too.
The offset field contains the start index in the shared array where this ResizableArray
keeps its data (the block assigned to it).
The capacity field contains the size of the block in the shared array assigned to this
ResizableArray instance.
The length field contains how much of the assigned block the ResizableArray is actually
using.
To read the data written to a ResizableArray simply read the bytes from sharedArray[offset]
to sharedArray[offset + length -1].
Once you are done using a ResizableArray instance you should free it. You do so simply by calling
the free() method on the ResizableArray, like this:
resizableArray.free();
Calling free() takes care of returning the used block to the correct block queue, regardless of
the size of the block allocated to the ResizableArray.
You can change the design of my ResizableArrayBuffer to meet your needs. For instance, you can
create more than 3 sections inside it. It should be fairly easy to do. Just look at the code in the GitHub
repository and modify it.
以上就是Java Resizable Array()的详细内容,想要了解更多 Java Resizable Array的内容,请持续关注盛行IT软件开发工作室。
郑重声明:本文由网友发布,不代表盛行IT的观点,版权归原作者所有,仅为传播更多信息之目的,如有侵权请联系,我们将第一时间修改或删除,多谢。