Hi,
I have a program that writes lots of files to a directory tree (around 15 million files), and a node can have up to 400,000 files (and I don't have any way to split this amount into smaller ones). As the number of files grows, my application gets slower and slower (the app works as something like a cache for another app, and I can't redesign the way it distributes files on disk due to the other app's requirements).
The filesystem I use is ext3 with the following options enabled:
Filesystem features: has_journal resize_inode dir_index filetype needs_recovery sparse_super large_file
Is there any way to improve performance in ext3? Would you suggest another FS for this situation (this is a production server, so I need a stable one)?
Thanks in advance (and please excuse my bad English).

oooooooooooo ooooooooooooo wrote:
> Hi,
>
> I have a program that writes lots of files to a directory tree
Did that program also write your address header?
:o)
oooooooooooo ooooooooooooo wrote:
> Is there any way to improve performance in ext3? Would you suggest another FS for this situation (this is a production server, so I need a stable one)?
I haven’t done, or even seen, any recent benchmarks but I’d expect
reiserfs to still be the best at that sort of thing. However even if
you can improve things slightly, do not let whoever is responsible for
that application ignore the fact that it is a horrible design that
ignores a very well known problem that has easy solutions. And don’t
ever do business with someone who would write a program like that again.
Any way you approach it, when you want to write a file the system must
check to see if the name already exists, and if not, create it in an
empty space that it must also find – and this must be done atomically so
the directory must be locked against other concurrent operations until
the update is complete. If you don’t index the contents the lookup is a
slow linear scan – if you do, you then have to rewrite the index on
every change so you can’t win. Sensible programs that expect to access
a lot of files will build a tree structure to break up the number that
land in any single directory (see squid for an example). Even more
sensible programs would re-use some existing caching mechanism like
squid or memcached instead of writing a new one badly.
–
Les Mikesell
lesmikesell at gmail.com
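Les's advice to spread files over a tree of directories, as squid does, can be sketched as below. This is a minimal Python illustration assuming files are identified by a sequential number and using squid's default of 16 first-level and 256 second-level directories; the arithmetic is modeled on how squid's ufs store is usually described rather than copied from its source, and `squid_style_path` is a hypothetical name.

```python
import os

# Squid's default ufs layout: 16 first-level x 256 second-level directories.
L1 = 16
L2 = 256

def squid_style_path(root: str, file_number: int) -> str:
    """Map a sequential file number to root/XX/YY/NNNNNNNN (hex).

    Each run of L2 consecutive numbers shares one second-level directory,
    then the next run moves on; after L1 * L2 * L2 numbers the pattern
    wraps back to 00/00.
    """
    first = (file_number // L2 // L2) % L1
    second = (file_number // L2) % L2
    return os.path.join(root, f"{first:02X}", f"{second:02X}", f"{file_number:08X}")
```

With this mapping, files 0-255 share 00/00, files 256-511 go to 00/01, and so on, so no single directory ever receives a large burst of new entries at once.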
On 7/8/09 8:56 AM, "Les Mikesell" <lesmikesell at gmail.com> wrote:
> Any way you approach it, when you want to write a file the system must
> check to see if the name already exists, and if not, create it in an
> empty space that it must also find – and this must be done atomically so
> the directory must be locked against other concurrent operations until
> the update is complete. If you don’t index the contents the lookup is a
> slow linear scan – if you do, you then have to rewrite the index on
> every change so you can’t win. Sensible programs that expect to access
> a lot of files will build a tree structure to break up the number that
> land in any single directory (see squid for an example). Even more
> sensible programs would re-use some existing caching mechanism like
> squid or memcached instead of writing a new one badly.
In many ways this is similar to the issues you'll see on a very active mail or
news server that uses maildir, where the directory entries pile up until the
directory is too large to be traversed quickly. The only way to deal with it
(especially if the application adds and removes these files regularly) is to
every once in a while copy the files to another directory, nuke the directory
and restore from the copy. This is why databases are better for this kind of
intensive data caching.
–
Gary L. Greene, Jr.
IT Operations
Minerva Networks, Inc.
Cell: (650) 704-6633
Phone: (408) 240-1239
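Gary's copy, nuke and restore cycle helps because, as far as I know, an ext3 directory never shrinks once it has grown, even after most of its entries are deleted. A minimal Python sketch of the same idea follows, assuming the directory lives on a single filesystem and nothing else writes to it during the swap. It moves entries with `os.rename()` instead of copying them, which rewrites only directory entries and not file data; `compact_directory` and the `.compact-tmp` suffix are hypothetical.

```python
import os

def compact_directory(path: str) -> None:
    """Rebuild `path` as a freshly created (and therefore compact) directory.

    Assumes exclusive access: there is no locking, and the original
    directory's permissions and ownership are not carried over.
    """
    parent, name = os.path.split(os.path.abspath(path))
    fresh = os.path.join(parent, name + ".compact-tmp")
    os.mkdir(fresh)
    for entry in os.listdir(path):
        # Same-filesystem rename: only the directory entry moves.
        os.rename(os.path.join(path, entry), os.path.join(fresh, entry))
    os.rmdir(path)          # fails loudly if something was added meanwhile
    os.rename(fresh, path)  # swap the compact copy into the old name
```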
On Wed, Jul 8, 2009 at 2:27 AM, oooooooooooo ooooooooooooo <
hhh735 at hotmail.com> wrote:
I saw this article some time back.
http://www.linux.com/archive/feature/127055
I've not implemented it, but from past experience you may lose some
performance initially; however, the database fs performance might stay
more consistent as the number of files grows.
–
> Perhaps think about running tune2fs; maybe also consider adding noatime.
Yes, I added it and got a performance increase; however, as the number of files grows, the speed still drops below an acceptable level.
> I saw this article some time back.
> http://www.linux.com/archive/feature/127055
Good idea. I already use MySQL to index the files, so every time I need a lookup I don't have to scan the entire directory before getting the file. However, my requirements are to keep the files on disk.
> The only way to deal with it (especially if the
> application adds and removes these files regularly) is to every once in a
> while copy the files to another directory, nuke the directory and restore
> from the copy.
Thanks, but there will not be many file updates once the cache is built, so recreating directories would not help much here. The issue is that as the number of files grows, both reads of existing files and new insertions get slower and slower.
> I haven't done, or even seen, any recent benchmarks but I'd expect
> reiserfs to still be the best at that sort of thing.
I've been looking at some benchmarks and reiser seems a bit faster in my scenario; however, my problem appears when I have a large number of files, and from what I have seen, I'm not sure reiser would fix it.
> However even if
> you can improve things slightly, do not let whoever is responsible for
> that application ignore the fact that it is a horrible design that
> ignores a very well known problem that has easy solutions.
My original idea was to store each file under a hash of its name, and then store a hash -> real filename mapping in MySQL. That way I have direct access to the file, and I can build a directory hierarchy from the first characters of the hash (/c/0/2/a), so I would have 16^4 = 65536 leaves in the directory tree and the files would be evenly distributed, with around 200 files per directory (which should not cause any performance issues). But the requirements are to use the real file name for the directory tree, which is what causes the issue.
>Did that program also write your address header ?
:)
Thanks for the help.
> Date: Wed, 8 Jul 2009 06:27:40 +0000
> Subject: [CentOS] Question about optimal filesystem with many small files.
Hi,
On Wed, Jul 8, 2009 at 17:59, oooooooooooo
ooooooooooooo<hhh735 at hotmail.com> wrote:
> My original idea was storing the file with a hash of it name, and then store a hash->real filename in mysql. By this way I have direct access to the file and I can make a directory hierachy with the first characters of teh hash /c/0/2/a, so i would have 16*4 =65536 leaves in the directoy tree, and the files would be identically distributed, with around 200 files per dir (waht should not give any perfomance issues). But the requiremenst are to use the real file name for the directory tree, what gives the issue.
You can hash it and still keep the original filename, and you don’t
even need a MySQL database to do lookups.
For instance, let’s take "example.txt" as the file name.
Then let’s hash it, say using MD5 (just for the sake of example, a
simpler hash could give you good enough results and be quicker to
calculate):
$ echo -n example.txt | md5sum
e76faa0543e007be095bb52982802abe -
Then say you take the first 4 digits of it to build the hash: e/7/6/f
Then you store file example.txt at: e/7/6/f/example.txt
The file still has its original name (example.txt), and if you want to
find it, you can just calculate the hash for the name again, in which
case you will find the e/7/6/f, and prepend that to the original name.
I would also suggest that you keep fewer directory levels with more
branches on each; the best performance will come from finding a
balance between the two. For example, in this case (4 hex digits) you would
have 4 levels with 16 entries each. If you group the hex digits two by
two, you would have (up to) 256 entries on each level, but only two
levels of subdirectories. For instance: example.txt ->
e7/6f/example.txt. That might (or might not) give you a better
performance. A benchmark should tell you which one is better, but in
any case, both of these setups will be many times faster than the one
where you have 400,000 files in a single directory.
Would that help solve your issue?
HTH,
Filipe
On Wed, 08 Jul 2009 18:09:28 -0400
Filipe Brandenburger wrote:
> You can hash it and still keep the original filename, and you don’t
> even need a MySQL database to do lookups.
Now that is slick as all get-out. I'm really impressed by your scheme, though I
don't actually have any use for it right at this moment.
It's really clever.
–
MELVILLE THEATRE ~ Melville Sask ~ http://www.melvilletheatre.com
On Wed, 2009-07-08 at 16:14 -0600, Frank Cox wrote:
> On Wed, 08 Jul 2009 18:09:28 -0400
> Filipe Brandenburger wrote:
>
> > You can hash it and still keep the original filename, and you don’t
> > even need a MySQL database to do lookups.
>
> Now that is slick as all get-out. I’m really impressed your scheme, though I
> don’t actually have any use for it right at this moment.
>
> It’s really clever.
—
Yes it is, but think about a SAN server with terabytes of data in
directories dispersed over multiple controllers. I am kind of curious
how that would scale. That's my problem.
John
> You can hash it and still keep the original filename, and you don’t
> even need a MySQL database to do lookups.
There is an issue I forgot to mention: the original file name can be up to 1023 characters long. As Linux only allows 255 bytes in a single file name, truncated names could produce a (very small) number of collisions; that's why my original idea was to use a hash->filename table. So I'm not sure I could implement that idea in my scenario.
>For instance: example.txt ->
> e7/6f/example.txt. That might (or might not) give you a better
> performance.
After a quick calculation, that could put around 3200 files per directory (I have around 15 million files). I think that above 1000 files the performance will start to degrade significantly; anyway, it would be a matter of doing some benchmarks.
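A quick way to compare layouts is to count leaf directories. Note that e/7/6/f (four levels of 16) and e7/6f (two levels of 256) both give 16^4 = 256^2 = 65,536 leaves, so with an evenly distributing hash, 15 million files average about 229 per leaf in either case. The sketch below assumes exactly 15,000,000 files and a perfectly even spread; the dictionary labels are only descriptive.

```python
TOTAL_FILES = 15_000_000  # assumed, per the figure in this thread

# Number of leaf directories for each layout.
LAYOUTS = {
    "e/7/6/f (4 levels x 16)": 16 ** 4,
    "e7/6f (2 levels x 256)": 256 ** 2,
    "16 x 256 (squid default)": 16 * 256,
    "3 levels x 256": 256 ** 3,
}

def files_per_leaf(leaves: int, total: int = TOTAL_FILES) -> float:
    """Average files per leaf, assuming the hash spreads names evenly."""
    return total / leaves

for label, leaves in LAYOUTS.items():
    print(f"{label}: {leaves:,} leaves, ~{files_per_leaf(leaves):,.1f} files per leaf")
```

The 16 x 256 layout comes out at roughly 3,662 files per leaf, while three levels of 256 leave fewer than one file per leaf on average.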
Thanks for the advice.
oooooooooooo ooooooooooooo wrote:
>> You can hash it and still keep the original filename, and you don’t
>> even need a MySQL database to do lookups.
>
> There are an issue I forgot to mention: the original file name can be up top 1023 characters long. As linux only allows 256 characters in the file path, I could have a (very small) number of collisions, that’s why my original idea was using a hash->filename table. So I’m not sure if I could implement that idea in my scenario.
>
>> For instance: example.txt ->
>> e7/6f/example.txt. That might (or might not) give you a better
>> performance.
>
> After a quick calculation, that could put around 3200 files per directory (I have around 15 million of files), I think that above 1000 files the performance will start to degrade significantly, anyway it would be a mater of doing some benchmarks.
There's C code to do this in squid, and BackupPC does it in Perl (for a
pool directory where all identical files are hardlinked). Source for
both is available, and it might be worth looking at their choices for the
depth of the trees and collision handling (BackupPC actually hashes the
file content, not the name, though).
–
Les Mikesell
lesmikesell at gmail.com
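The pooling idea Les mentions, where identical files are stored once and hard-linked wherever they are needed, can be sketched as follows. This is a simplified Python illustration, not BackupPC's code: it uses a plain MD5 of the whole content and a hypothetical three-level pool layout, while BackupPC's own digest and collision handling may differ. This sketch does not handle collisions at all, and `pool_store` is a hypothetical name.

```python
import hashlib
import os

def pool_store(pool: str, data: bytes, dest: str) -> str:
    """Store `data` once under `pool` and hard-link it to `dest`.

    Identical contents hash to the same pool file, so duplicates share one
    inode. Raises FileExistsError if `dest` already exists.
    """
    digest = hashlib.md5(data).hexdigest()
    pooled = os.path.join(pool, digest[0], digest[1], digest[2], digest)
    if not os.path.exists(pooled):
        os.makedirs(os.path.dirname(pooled), exist_ok=True)
        with open(pooled, "wb") as fh:
            fh.write(data)
    os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
    os.link(pooled, dest)  # a new name for the same inode, no data copied
    return pooled
```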
2009/7/9, oooooooooooo ooooooooooooo <hhh735 at hotmail.com>:
>
> After a quick calculation, that could put around 3200 files per directory (I
> have around 15 million of files), I think that above 1000 files the
> performance will start to degrade significantly, anyway it would be a mater
> of doing some benchmarks.
Depending on the total size of these cache files, as nate suggested,
throw some hardware at it.
Perhaps a hardware RAM device will provide adequate performance.
>There’s C code to do this in squid, and backuppc does it in perl (for a
pool directory where all identical files are hardlinked).
Unfortunately I have to write the file with some predefined format, so these would not provide the flexibility I need.
>Rethink how you’re writing files or you’ll be in a world of hurt.
It's possible that I will be able to name the directory tree based on the hash of the file, so I would get the structure described in one of my previous posts (4 directory levels, each directory name a single character from 0-9 and A-F, and 65536 (16^4) leaves, each leaf containing about 200 files). Do you think that this would really improve performance? Could this structure be improved?
>BTW, you can pretty much say goodbye to any backup solution for this type
of project as well. They’ll all die dealing with a file system structure
like this.
We don't plan to use backups (if the data gets corrupted, we can retrieve it again), but thanks for the advice.
>I think entry level list pricing starts at about $80-100k for
1 NAS gateway (no disks).
That’s far above the budget…
>depending on the total size of this cache files, as it was suggested
by nate – throw some hardware at it.
Same as above: it seems they don't want to spend more on hardware (so I have to deal with all the performance issues…). Anyway, if I can get all the directories down to around 200 files, I think I will be able to manage with the current hardware.
Thanks for the advice.
On Thu, 9 Jul 2009, oooooooooooo ooooooooooooo wrote:
>
> It’s possible that I will be able to name the directory tree based in the hash of te file, so I would get the structure described in one of my previous post (4 directory levels, each directory name would be a single character from 0-9 and A-F, and 65536 (16^4) leaves, each leave containing 200 files). Do you think that this would really improve performance? Could this structure be improved?
>
If you don’t plan on modifying the file after creation I could see it
working. You could consider the use of a Berkeley DB style database for
quick and easy lookups on large amounts of data, but depending on your
exact needs maintenance might be a chore and not really feasible.
It’s an interesting suggestion but I don’t know if it would actually work
like you describe based on having to always compute the hash first.
–
James A. Peltier
Systems Analyst (FASNet), VIVARIUM Technical Director
HPC Coordinator
Simon Fraser University – Burnaby Campus
Phone : 778-782-6573
Fax : 778-782-3045
E-Mail : jpeltier at sfu.ca
MSN : jpeltier at hotmail.com
The point of the HPC scheduler is to
keep everyone equally unhappy.
On Thu, 2009-07-09 at 10:09 -0700, James A. Peltier wrote:
> On Thu, 9 Jul 2009, oooooooooooo ooooooooooooo wrote:
>
> >
> > It’s possible that I will be able to name the directory tree based in the hash of te file, so I would get the structure described in one of my previous post (4 directory levels, each directory name would be a single character from 0-9 and A-F, and 65536 (16^4) leaves, each leave containing 200 files). Do you think that this would really improve performance? Could this structure be improved?
> >
>
> If you don’t plan on modifying the file after creation I could see it
> working. You could consider the use of a Berkley DB style database for
> quick and easy lookups on large amounts of data, but depending on your
> exact needs maintenance might be a chore and not really feasable.
MUMPS DB will go at it even faster.
> It’s an interesting suggestion but I don’t know if it would actually work
> like you describe based on having to always compute the hash first.
>
Indeed interesting. Actually it would be the same as taking the file to
base 64 on final storage. My thought is that it would work. Even faster
would be to implement this with the table in RAM.
john
On a side note, perhaps this is something that Hadoop would be good with.
–
James A. Peltier
Hi, after talking with the customer, I finally managed to convince him to use the first characters of the hash as directory names.
Now I'm in doubt about the following options:
a) Using 4 directory levels /c/2/a/4/ (200 files per directory) and MySQL with a hash->filename table, so I can get the file name from the hash and then access it directly (I first query MySQL for the hash of the file, and then I read the file).
b) Using 5 levels without MySQL, doing a directory listing (due to technical issues, I can only know an approximate file name, so I can't access it directly here), matching the file name and then reading it. The issue here is that I would have 16^5 leaf directories (more than a million).
I could also try other combinations of MySQL/no MySQL and number of levels.
What do you think would give the best performance in ext3?
Thanks.
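Option (a) can be sketched as below. Python's built-in `sqlite3` stands in for MySQL so the example is self-contained, and the `files` table, its columns, and the `store`/`load` helpers are hypothetical. The leaf directory is taken from the MD5 of the long key (one hex digit per level, as in /c/2/a/4/), and the table maps that hash to the trimmed name actually used on disk.

```python
import hashlib
import os
import sqlite3

LEVELS = 4  # one hex digit per level: 16**4 = 65536 leaf directories

def key_hash(long_key: str) -> str:
    return hashlib.md5(long_key.encode("utf-8")).hexdigest()

def leaf_dir(root: str, digest: str) -> str:
    # e.g. digest "c2a4..." -> root/c/2/a/4
    return os.path.join(root, *digest[:LEVELS])

def open_index(db_path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files (hash TEXT PRIMARY KEY, name TEXT NOT NULL)"
    )
    return conn

def store(conn, root: str, long_key: str, disk_name: str, data: bytes) -> None:
    digest = key_hash(long_key)
    directory = leaf_dir(root, digest)
    os.makedirs(directory, exist_ok=True)
    with open(os.path.join(directory, disk_name), "wb") as fh:
        fh.write(data)
    with conn:  # commits the hash -> name row on success
        conn.execute("INSERT OR REPLACE INTO files (hash, name) VALUES (?, ?)",
                     (digest, disk_name))

def load(conn, root: str, long_key: str):
    digest = key_hash(long_key)
    row = conn.execute("SELECT name FROM files WHERE hash = ?", (digest,)).fetchone()
    if row is None:
        return None  # not cached
    with open(os.path.join(leaf_dir(root, digest), row[0]), "rb") as fh:
        return fh.read()
```

In this layout a read costs one primary-key lookup plus one `open()`, with no directory listing involved.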
oooooooooooo ooooooooooooo wrote:
> Hi, After talking with te customer, I finnaly managed to convince him for using the first characters of the hash as directory names.
>
> Now I’m in doubt about the following options:
>
> a) Using directory 4 levels /c/2/a/4/ (200 files per directory) and mysql with a hash->filename table, so I can get teh file name from the hash and then I can directly access it (I first query mysql for the hash of the file, and the I read the file).
>
> b) Using 5 levels without mysql, and making a dir listing (due to technical issues, I can’t only know an approximate file name, so I can’t make a direct access here), match the file name and then read it. The issue here is that I would have 16^5 leave directories (more than a million).
>
> I could also make more combinations of mysql/not mysql and number of levels.
>
> What do you think it would give the best performance in ext3?
I don’t think you’ve explained the constraint that would make you use
mysql or not. I’d avoid it if everything involved can compute the hash
or is passed the whole path, since it is bound to be slower than doing the
math, and just on general principles I’d use a tree like
00/AA/FF/filename (three levels of 2 hex characters) as the first cut,
although squid uses just two levels with a default of 16 first level and
256 2nd level directories and probably has some good reason for it.
–
Les Mikesell
lesmikesell at gmail.com
>I don’t think you’ve explained the constraint that would make you use
> mysql or not.
My original idea was to use just the hash as the file name; that way I could have direct access. But the customer rejected this and asked to keep part of the long file name (from 11 to 1023 characters). As Linux only allows 255 bytes in a file name and I could get duplicates among the first 255 characters, I trim the real file name to around 200 characters and add the hash at the end (plus a couple of small metadata fields).
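The trimmed-name-plus-hash scheme described here might look like the following. This is a minimal Python sketch assuming UTF-8 names that contain no '/' or NUL. `short_name` is a hypothetical helper, and the metadata fields mentioned in the post are left out. It trims by bytes rather than characters because the 255 limit applies to bytes.

```python
import hashlib

NAME_MAX = 255  # ext3 limit for a single file name, in bytes

def short_name(long_name: str, keep: int = 200) -> str:
    """Return the first `keep` bytes of `long_name` plus "_" and its MD5.

    The hex digest is computed over the *full* name, so two long names
    sharing the same first `keep` bytes still get different short names.
    """
    raw = long_name.encode("utf-8")
    # Cut by bytes; errors="ignore" drops a multi-byte character split by the cut.
    prefix = raw[:keep].decode("utf-8", errors="ignore")
    name = f"{prefix}_{hashlib.md5(raw).hexdigest()}"
    if len(name.encode("utf-8")) > NAME_MAX:
        raise ValueError("keep is too large to fit in one file name")
    return name
```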
Yes, these requirements don't make much sense, but I've tried to convince the customer to use just the hash, with no luck (it seems he doesn't really understand what a hash is, although I've tried to explain it several times).
That's why I need to either a) use MySQL or b) do a directory listing.
> 00/AA/FF/filename
That would make up to 256^3 leaf directories, which is more than 16 million; since I have around 15M files, I think that is an excessive number of directories.
>
> My original idea was using the just the hash as filename, by this way I
> could have a direct access. But the customer rejected this and requested to
> have part of the long file name (from 11 to 1023 characters). As linux only
> allows 256 characters in the path and I could get duplicates with the 256
> first chars, I trim teh real filename to around 200 characters and I add the
> hash at the end (plus a couple metadata small fields).
>
> Yes, there requirements does not makes too much sense, but I’ve tried to
> convince the customer to use just the hash with no luck (seems he does not
> understand well what is a hash although I’ve tried to explain it several
> times).
>
> That’s why I need or a) use mysql or b) do a directory lising.
I would use either only a database, or only the file system. To me,
using both is a violation of KISS.
If you were able to convince them to change the directory layout, and
if you are more comfortable with a database, try to convince them to
use a database.
OK, I could use MySQL, but consider that we have around 15M entries and I would have to attach a file of 1KB to 150KB to each one; in total the files can add up to around 200GB. How would MySQL perform with this?
2009/7/10, oooooooooooo ooooooooooooo <hhh735 at hotmail.com>:
>
> Ok, I coudl use mysql, but think we have around 15M entries and I would have
> to add to each a file from 1KB to 150KB, in total the files size can be
> around 200GB. How will be the performance of this in mysql?
>
In the worst case, 150KB for each of 15000000 files, I get:
15000000 * 150 / (1024 * 1024) = 2145.77 (GB)
or about 2TB.
According to my tests the average size per file is around 15KB (although there are files from 1KB to 150KB).
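The size estimates in this exchange are easy to reproduce. A small Python calculation, assuming 15,000,000 files and treating KB as KiB (1024 bytes), as the division by 1024 * 1024 above does; `total_gib` is a hypothetical helper.

```python
KIB_PER_GIB = 1024 * 1024  # treat "KB" as KiB, as the division above does
FILES = 15_000_000

def total_gib(files: int, kib_per_file: float) -> float:
    """Total size in GiB of `files` files averaging `kib_per_file` KiB."""
    return files * kib_per_file / KIB_PER_GIB

worst = total_gib(FILES, 150)   # every file at the 150 KB maximum
typical = total_gib(FILES, 15)  # the ~15 KB measured average
print(f"worst case: {worst:,.0f} GiB (~{worst / 1024:.1f} TiB)")
print(f"typical:    {typical:,.0f} GiB")
```

The worst case comes out near 2,146 GiB (about 2 TB). The ~15 KB average gives roughly 215 GiB, which matches the ~200 GB estimate.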
On Fri, Jul 10, 2009 at 16:21, Alexander
Georgiev<alexander.georgiev at gmail.com> wrote:
> I would use either only a database, or only the file system. To me -
> using them both is a violation of KISS.
I disagree with your general statement.
Storing content that is appropriate for files (e.g., pictures) as
BLOBs in an SQL database only makes it more complex.
Creating "clever" file formats to store relationships between objects
in a filesystem instead of using a SQL database only makes it more
complex (and harder to extend!).
Think of a website that stores users' pictures and has social networking
features (maybe like Flickr?). The natural place to store the JPEG
images is the filesystem. The natural place to store user info,
favorites, relations between users, etc. is the SQL database. If you
try to do it differently, it starts looking like you are trying to fit a
square peg in a round hole. It may be possible to do it, but it is
certainly not elegant.
Just because you are using fewer technologies doesn't necessarily make
it simpler.
Filipe
2009/7/10, Filipe Brandenburger <filbranden at gmail.com>:
> On Fri, Jul 10, 2009 at 16:21, Alexander
> Georgiev<alexander.georgiev at gmail.com> wrote:
>> I would use either only a database, or only the file system. To me -
>> using them both is a violation of KISS.
>
> I disagree with your general statement.
>
> Storing content that is appropriate for files (e.g., pictures) as
> BLOBs in an SQL database only makes it more complex.
>
Please explain why. I was under the impression that storing large
binary streams is BLOB’s reason to exist.
> Creating "clever" file formats to store relationships between objects
> in a filesystem instead of using a SQL database only makes it more
> complex (and harder to extend!).
Indeed.
> Just because you are using less technologies doesn’t necessarily make
> it simpler.
Of course, but if one of those technologies can provide both
functionalities without hacks, twists and abuse, I would stay with
that single technology.
oooooooooooo ooooooooooooo wrote:
>> I don’t think you’ve explained the constraint that would make you use
>> mysql or not.
>
> My original idea was using the just the hash as filename, by this way I could have a direct access. But the customer rejected this and requested to have part of the long file name (from 11 to 1023 characters). As linux only allows 256 characters in the path and I could get duplicates with the 256 first chars, I trim teh real filename to around 200 characters and I add the hash at the end (plus a couple metadata small fields).
>
> Yes, there requirements does not makes too much sense, but I’ve tried to convince the customer to use just the hash with no luck (seems he does not understand well what is a hash although I’ve tried to explain it several times).
You mentioned that the data can be retrieved from somewhere else. Is
some part of this filename a unique key? Do you have to track this
relationship anyway – or age/expire content? I’d try to arrange things
so the most likely scenario would take the fewest operations. Perhaps a
mix of hash and filename would give direct access 99% of the time and you
could move all copies of collisions to a different area. Then you could
keep the database mapping the full name to the hashed path but you’d
only have to consult it when the open() attempt fails.
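Les's suggestion to try the computed path first and consult the database only when `open()` fails might be sketched like this. All of the following are assumptions: a hypothetical 16 x 256 hashed layout, entries whose hashed location is taken by another key always being moved elsewhere (so a successful `open()` never returns the wrong file), and a plain dict standing in for the database table.

```python
import hashlib
import os
from typing import Dict, Optional

def hashed_location(root: str, key: str) -> str:
    # Hypothetical layout: root/<1 hex digit>/<2 hex digits>/<full digest>
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return os.path.join(root, digest[0], digest[1:3], digest)

def read_entry(root: str, key: str, moved: Dict[str, str]) -> Optional[bytes]:
    """Fast path: open the computed location. Slow path: ask the mapping.

    `moved` stands in for the database table recording where relocated
    (colliding) entries were put.
    """
    try:
        with open(hashed_location(root, key), "rb") as fh:
            return fh.read()
    except FileNotFoundError:
        path = moved.get(key)
        if path is None:
            return None  # not cached at all
        with open(path, "rb") as fh:
            return fh.read()
```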
> That’s why I need or a) use mysql or b) do a directory lising.
>
>> 00/AA/FF/filename
> That would make up to 256^3 directory leaves, what is more than 16 Million ones, due I have around 15M files, I think that this is an excessive number of directories.
I guess that’s why squid only uses 16 x 256…
–
Les Mikesell
lesmikesell at gmail.com
> You mentioned that the data can be retrieved from somewhere else. Is
> some part of this filename a unique key?
The real key is up to 1023 characters long and it's unique, but I have to trim it to 255 bytes, so it is not unique unless I add the hash.
>Do you have to track this
> relationship anyway – or age/expire content?
I have to track the long file name -> short file name relationship. Age is not relevant here.
> I'd try to arrange things
> so the most likely scenario would take the fewest operations. Perhaps a
> mix of hash filename would give direct access 99 % of the time and you
> could move all copies of collisions to a different area.
Yes, it's a good idea, but at this point I don't want to add more complexity to my app, and having a separate area for collisions would make it more complex.
>Then you could
> keep the database mapping the full name to the hashed path but you’d
> only have to consult it when the open() attempt fails.
As the long file name is up to 1023 characters long, I can't index it with MySQL (its maximum index key length is lower). That's why I use the hash, which is indexed. What I do is keep a list of just the MD5s of the cached files in memory in my app; before going to MySQL, I first check whether the hash is in that list (really an RB-tree).
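The in-memory pre-check described here can be sketched as below. The post keeps the MD5s in a red-black tree; Python has none in its standard library, so this sketch swaps in a built-in set (a hash table), which answers the same membership question. `DigestFilter`, `lookup` and the `query_db` callback are hypothetical. Storing raw 16-byte digests instead of hex strings is simply a way to save some memory with ~15M entries.

```python
import hashlib
from typing import Callable, Optional

class DigestFilter:
    """Set of MD5 digests of every key currently in the cache."""

    def __init__(self) -> None:
        self._digests = set()

    @staticmethod
    def _digest(key: str) -> bytes:
        # Raw 16-byte digest: smaller than the 32-character hex string.
        return hashlib.md5(key.encode("utf-8")).digest()

    def add(self, key: str) -> None:
        self._digests.add(self._digest(key))

    def discard(self, key: str) -> None:
        self._digests.discard(self._digest(key))

    def __contains__(self, key: str) -> bool:
        return self._digest(key) in self._digests

def lookup(key: str, cached: DigestFilter,
           query_db: Callable[[str], Optional[str]]) -> Optional[str]:
    # A miss in memory means the key is definitely not cached, so the
    # database round trip is skipped entirely.
    if key not in cached:
        return None
    return query_db(key)
```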
2009/7/11 oooooooooooo ooooooooooooo <hhh735 at hotmail.com>:
>
>> You mentioned that the data can be retrieved from somewhere else. Is
>> some part of this filename a unique key?
>
> The real key is up to 1023 characters long and it's unique, but I have to trim it to 256 characters, so it is no longer unique unless I add the hash.
>
The fact that this 1023-character file name is unique is very nice. And no
trimming is needed!
I think you have 2 issues to deal with:
1) You have files with unique file names that, unfortunately, can be up to
1023 characters long.
Regarding file names and paths on Linux and ext3, you have:
file name length limit = 255 bytes
path length limit = 4096 bytes
If you try to store such a file directly, you will exceed the file name
limit. But if you split the name into N chunks of 250 characters each,
you can preserve the file as a sequence of N - 1 nested folders plus a
file whose name is the Nth chunk, residing in the (N-1)th folder.
Through this decomposition you translate the unique 1023-character
'file name' into a unique 'file path' of about the same length (plus a
few '/' separators), which is well below the path length limit.
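A Python sketch of the chunking idea above. `name_to_path` is a hypothetical helper. It assumes ASCII names with no '/' or NUL, and no chunk equal to '.' or '..'.

```python
import os

CHUNK = 250  # chunk size used above; must stay under ext3's 255-byte name limit


def name_to_path(name, chunk=CHUNK):
    """Split a long name into chunk-sized pieces: N-1 directories plus a final file name.

    Assumes ASCII names containing no '/' or NUL and no piece equal to '.' or '..'.
    """
    if not name:
        raise ValueError("empty name")
    parts = [name[i:i + chunk] for i in range(0, len(name), chunk)]
    return os.path.join(*parts)
```

There is one collision case to watch for. Suppose one key is exactly 250 characters long and another key starts with those same 250 characters. The first key needs a *file* with that name, but the second key needs a *directory* with the same name, and the two cannot coexist in one folder. Tagging the final component (for example, with a fixed suffix) is one possible workaround. That workaround is an assumption and was not proposed in the thread.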
2) You suffer performance degradation when the number of files in a
folder goes beyond 1000.
Filipe Brandenburger has suggested a slick scheme to overcome this
problem that will work without a database:
============quote start
$ echo -n example.txt | md5sum
e76faa0543e007be095bb52982802abe -
Then say you take the first 4 digits of it to build the hash: e/7/6/f
Then you store file example.txt at: e/7/6/f/example.txt
============quote end
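The same hash-prefix layout in Python, as a sketch. `hashed_path` is a hypothetical name. `levels=4` takes the first four hex digits, which gives 16^4 = 65,536 leaf directories. Hashing the raw bytes matches `echo -n ... | md5sum`, because `-n` suppresses the trailing newline.

```python
import hashlib
import os


def hashed_path(name, levels=4):
    """Prefix `name` with one directory per leading hex digit of its MD5.

    levels=4 gives 16**4 = 65,536 leaf directories.
    """
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return os.path.join(*digest[:levels], name)
```

If the digest quoted above is correct, `hashed_path("example.txt")` yields `e/7/6/f/example.txt`.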
Of course, "example.txt" might be a long file name ("exaaaaa ….. 1000
chars here …..txt"), so after the "hash tree" e/7/6/f you would store
the file path structure described in 1).
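The two steps combined, as a self-contained Python sketch: a hash-tree prefix followed by the chunked long name. `cache_path` and `store` are hypothetical names. The same assumptions as before apply (ASCII names, no '/', 250-character chunks, four hex-digit levels).

```python
import hashlib
import os

CHUNK = 250   # per-component chunk size (under ext3's 255-byte limit)
LEVELS = 4    # hex digits of the MD5 used as directory levels


def cache_path(name, levels=LEVELS, chunk=CHUNK):
    """Hash-tree prefix (e.g. e/7/6/f) followed by the long name split into chunks."""
    if not name:
        raise ValueError("empty name")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    pieces = [name[i:i + chunk] for i in range(0, len(name), chunk)]
    return os.path.join(*digest[:levels], *pieces)


def store(root, name, data):
    """Write `data` (bytes) under `root`, creating intermediate directories."""
    path = os.path.join(root, cache_path(name))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as fh:
        fh.write(data)
    return path
```

Lookups need no index. Recomputing `cache_path(name)` and calling `open()` on the result is enough, because the full name is recoverable from the path itself.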
As was suggested by Les Mikesell, squid and other products have
already implemented similar strategies, and you might be able to use
either the algorithm or the code that implements it directly. I would
spend some time investigating squid's code. I think squid has to deal
with exactly the same problem: caching the contents of resources whose
URLs might be longer than 255 characters.
If you use this approach, there is no need for a database to store hashes!
I did some tests on a CentOS 3 system with the following script:
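=====================script start
#!/bin/bash
# Create 8 nested directories, each named with 250 copies of one letter,
# then record the resulting working directory in a file at the bottom.
for a in a b c d e f g j; do
  f=""
  for i in $(seq 1 250); do
    f="$a$f"
  done
  mkdir "$f"
  cd "$f" || exit 1
done
pwd > some_file.txt
=====================script end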
which creates a nested directory structure with a file at the bottom.
The total file path length is more than 8 * 250 = 2000 characters. I had
no problems accessing this file by its full path:
$ find ./ -name some\* -exec cat {} \; | wc -c
2026
Thanks, using directories as file names is a great idea; however, I'm not sure it would solve my performance issue, as the bottleneck is the disk and not MySQL. I just implemented directory names based on the hash of the file, and the performance is a bit slower than before. This is the output of atop (15-second average):
PRC | sys 0.53s | user 5.43s | #proc 112 | #zombie 0 | #exit 0 |
CPU | sys 4% | user 54% | irq 2% | idle 208% | wait 131% |
cpu | sys 1% | user 24% | irq 1% | idle 54% | cpu001 w 20% |
cpu | sys 2% | user 15% | irq 1% | idle 31% | cpu002 w 52% |
cpu | sys 1% | user 8% | irq 0% | idle 52% | cpu003 w 38% |
cpu | sys 1% | user 7% | irq 0% | idle 71% | cpu000 w 21% |
CPL | avg1 10.58 | avg5 6.92 | avg15 4.66 | csw 19112 | intr 19135 |
MEM | tot 2.0G | free 49.8M | cache 157.4M | buff 116.8M | slab 122.7M |
SWP | tot 1.9G | free 1.2G | | vmcom 2.2G | vmlim 2.9G |
PAG | scan 1536 | stall 0 | | swin 9 | swout 0 |
DSK | sdb | busy 91% | read 884 | write 524 | avio 6 ms |
DSK | sda | busy 12% | read 201 | write 340 | avio 2 ms |
NET | transport | tcpi 8551 | tcpo 8204 | udpi 702 | udpo 718 |
NET | network | ipi 9264 | ipo 8946 | ipfrw 0 | deliv 9264 |
NET | eth0 5% | pcki 6859 | pcko 6541 | si 5526 Kbps | so 466 Kbps |
NET | lo | pcki 2405 | pcko 2405 | si 397 Kbps | so 397 Kbps |
sdb holds the cache and sda holds everything else, including the MySQL DB files. Note that I have a lot of disk reads on sdb, but I'm really only reading one file from disk for every 10 written, so my guess is that all the other reads are directory lookups. As I'm using the hash for directory names, I think this makes the Linux cache less effective, since the files are distributed more uniformly and randomly among the directories.
The app is running a bit slower than when it used the file name for the directory names, although I expect (not really sure) that it will do better as the number of files on disk grows (currently there are only 600k of the 15M files). My current performance is around 50 file I/Os per second.
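Here is a back-of-envelope check on the ~50 files per second figure, written in Python. The 6 ms service time is the sdb `avio` value from the atop output above. The assumption that each uncached file access costs about three random reads (directory block, inode, data block) is mine. It is an illustration, not a measurement.

```python
def max_random_ios_per_sec(avg_service_ms):
    """Upper bound on random I/Os per second for one disk at a given average service time."""
    return 1000.0 / avg_service_ms


def files_per_sec(avg_service_ms, ios_per_file):
    """Files served per second if each access costs `ios_per_file` uncached random I/Os."""
    return max_random_ios_per_sec(avg_service_ms) / ios_per_file


# 6 ms: sdb "avio" from atop above. 3 I/Os per file: an assumption, not measured.
print(round(max_random_ios_per_sec(6)))   # about 167 random I/Os per second
print(round(files_per_sec(6, 3)))         # about 56 files per second
```

If that assumption is roughly right, a single busy disk lands close to the observed rate. In that case, more cache hits or more spindles would matter more than further directory tweaks.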
>
> Thanks, using directories as file names is a great idea, anyway I’m not sure if that would solve my performance issue, as the bottleneck is the disk and not mysql.
The situation you initially described suffers from only one issue:
too many files in a single directory. You are not the first to fight
this; see qmail's Maildir, squid, etc. The remedy is always one and
the same: split the files into a tree folder structure. For sample
implementations, check out squid, BackupPC, etc.
>I just implemented the directories names based on the hash of the file and the performance is a bit slower than before. This is the output of atop (15 secs. avg.):
>
> PRC | sys 0.53s | user 5.43s | #proc 112 | #zombie 0 | #exit 0 |
> CPU | sys 4% | user 54% | irq 2% | idle 208% | wait 131% |
> cpu | sys 1% | user 24% | irq 1% | idle 54% | cpu001 w 20% |
> cpu | sys 2% | user 15% | irq 1% | idle 31% | cpu002 w 52% |
> cpu | sys 1% | user 8% | irq 0% | idle 52% | cpu003 w 38% |
> cpu | sys 1% | user 7% | irq 0% | idle 71% | cpu000 w 21% |
> CPL | avg1 10.58 | avg5 6.92 | avg15 4.66 | csw 19112 | intr 19135 |
> MEM | tot 2.0G | free 49.8M | cache 157.4M | buff 116.8M | slab 122.7M |
> SWP | tot 1.9G | free 1.2G | | vmcom 2.2G | vmlim 2.9G |
I am under the impression that you are swapping. Out of 2GB of RAM,
you have just 157MB of cache and 116MB of buffers. What is eating the RAM?
Why do you have about 0.7GB of swap in use? You need more memory for the
file system cache.
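The "where did the RAM go" arithmetic can be done directly from the atop MEM and SWP lines quoted above. The Python sketch below parses the `key value` fields into MiB. Memory not accounted for by free pages, cache, buffers or slab is roughly what processes hold. The figures are approximate because atop rounds its output (for example, "2.0G"). `parse_atop_line` is a hypothetical helper, not part of atop.

```python
import re

UNITS = {"K": 1 / 1024, "M": 1.0, "G": 1024.0}  # convert to MiB


def parse_atop_line(line):
    """Parse 'key value' fields such as 'free 49.8M' from an atop line into MiB."""
    fields = {}
    for key, num, unit in re.findall(r"(\w+)\s+([\d.]+)([KMG])", line):
        fields[key] = float(num) * UNITS[unit]
    return fields


mem = parse_atop_line("MEM | tot 2.0G | free 49.8M | cache 157.4M | buff 116.8M | slab 122.7M |")
swp = parse_atop_line("SWP | tot 1.9G | free 1.2G | | vmcom 2.2G | vmlim 2.9G |")

# Memory that is neither free nor cache/buffers/slab: roughly process memory.
other = mem["tot"] - mem["free"] - mem["cache"] - mem["buff"] - mem["slab"]
swap_used = swp["tot"] - swp["free"]
print(f"other: {other:.0f} MiB, swap used: {swap_used:.0f} MiB")
```

With these lines, about 1.6 GiB of the 2 GiB is held outside the caches and about 0.7 GiB of swap is in use. This is consistent with the question above.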
On Sat, 2009-07-11 at 00:01 +0000, oooooooooooo ooooooooooooo wrote:
> As the long filename is up to 1023 chars long, I can't index it with MySQL (it has a lower maximum key length). That's why I use the hash, which is indexed. What I do is keep a list of just the MD5s of the cached files in memory in my app; before going to MySQL, I first check whether the MD5 is in that list (really an RB-tree).
—
It is 1024 chars long, which still won't help. MSSQL 2005 and up allows
longer keys, if you're interested:
http://msdn.microsoft.com/en-us/library/ms143432.aspx
But that greatly depends on your data size: 900 bytes is the limit, but it
can be exceeded.
You can use either one if you use a unique key ID name for the index,
mapping each file name to a unique short name. I would not store images in either one,
as SELECT ... LIKE and random queries will kill it. As much as I like DBs, I
have to say the flat file system is for those.
John
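The key-length limits discussed here reduce to simple arithmetic. The Python sketch below uses the following figures: MyISAM's maximum key length of 1000 bytes (quoted from the MySQL 5.1 manual later in the thread), the 900-byte limit mentioned above for SQL Server, and the worst-case bytes per character. MySQL's `utf8` reserves 3 bytes per character in index keys, `latin1` uses 1, and SQL Server `nvarchar` uses 2. The result shows why a full 1023-character key cannot be indexed while a 32-character MD5 hex string fits easily.

```python
def max_indexable_chars(limit_bytes, bytes_per_char):
    """Longest string (in characters) whose worst-case encoding fits in one index key."""
    return limit_bytes // bytes_per_char


MYISAM_KEY_LIMIT = 1000   # bytes (MySQL 5.1 MyISAM documentation)
MSSQL_KEY_LIMIT = 900     # bytes (figure given in the message above)

for label, limit, width in [("MyISAM, latin1", MYISAM_KEY_LIMIT, 1),
                            ("MyISAM, utf8", MYISAM_KEY_LIMIT, 3),
                            ("SQL Server, nvarchar", MSSQL_KEY_LIMIT, 2)]:
    n = max_indexable_chars(limit, width)
    print(f"{label}: up to {n} chars; 1023-char key fits: {n >= 1023}")
```

MySQL also supports prefix indexes on a column. However, keys that share a long common prefix would then collide in the index, which is one reason to index a hash column instead.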
On Sat, 2009-07-11 at 11:48 -0400, JohnS wrote:
—
Just a random thought on hashes via the DB that hardly anyone gives any thought
to.
Using extended stored procedures, as in MSSQL, you can make your own hashes
on file insert:
USE master;
EXEC sp_addextendedproc 'your_md5', 'your_md5.dll';
Of course, you will have to create your own .DLL to do the hashing.
Then create your own wrapper functions:
SELECT dbo.your_md5('YourHash');
Or directly:
EXEC master.dbo.your_md5 'YourHash';
However, I have no clue whether this is even doable in MySQL.
John
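To show the "hash inside the database on INSERT" idea without a custom DLL, the sketch below uses Python's built-in `sqlite3` as a stand-in. `Connection.create_function` registers a Python function that SQL statements can call. MySQL already has a built-in `MD5()` function, so no registration step would be needed there. The table and column names are hypothetical.

```python
import hashlib
import sqlite3


def md5_hex(text):
    """Scalar SQL function: MD5 hex digest of a text value."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
# Register md5() so the hash is computed by the database during INSERT/SELECT.
conn.create_function("md5", 1, md5_hex)
conn.execute("CREATE TABLE files (hash TEXT PRIMARY KEY, name TEXT NOT NULL)")

long_name = "http://example.com/" + "x" * 1000
conn.execute("INSERT INTO files (hash, name) VALUES (md5(?), ?)", (long_name, long_name))

row = conn.execute("SELECT name FROM files WHERE hash = md5(?)", (long_name,)).fetchone()
```

In this design, the client only sends the long name. The database hashes it and looks it up through the short indexed key.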
(I resent this message as the previous one seemed badly formatted; sorry for the mess.)
> Perhaps think about running tune2fs; maybe also consider adding noatime.
Yes, I added it and got a performance increase; however, as the number of files grows, the speed keeps dropping below an acceptable level.
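To confirm that `noatime` is actually in effect on the cache filesystem, you can read the kernel's `/proc/mounts`. Its fields are device, mount point, filesystem type, options, dump and pass. The sketch below is Python, and `/cache` is a hypothetical mount point for the cache disk.

```python
def mount_options(mounts_text, mount_point):
    """Return the option set for `mount_point` from /proc/mounts-formatted text, or None."""
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass.
        if len(fields) >= 4 and fields[1] == mount_point:
            # A later entry for the same mount point is mounted on top of earlier ones.
            found = set(fields[3].split(","))
    return found
```

On a live system, call `mount_options(open("/proc/mounts").read(), "/cache")`. You can apply the option without a reboot with `mount -o remount,noatime /cache`, and make it persistent in `/etc/fstab`.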
>I saw this article some time back.
http://www.linux.com/archive/feature/127055
Good idea. I already use MySQL to index the files, so every time I need to make a lookup I don't need to read the entire directory before getting the file; anyway, my requirement is to keep the files on disk.
> The only way to deal with it (especially if the
> application adds and removes these files regularly) is to every once in a
> while copy the files to another directory, nuke the directory and restore
> from the copy.
Thanks, but there will not be many file updates once the cache is built, so recreating directories would not be very helpful here. The issue is that as the number of files grows, both reads of existing files and new insertions get slower and slower.
> I haven't done, or even seen, any recent benchmarks but I'd expect
> reiserfs to still be the best at that sort of thing.
I've been looking at some benchmarks and reiser seems a bit faster in my scenario; however, my problem appears when I have a large number of files, and from what I have seen, I'm not sure reiser would fix it.
> However even if
> you can improve things slightly, do not let whoever is responsible for
> that application ignore the fact that it is a horrible design that
> ignores a very well known problem that has easy solutions.
My original idea was to store each file under a hash of its name, and then store a hash -> real filename mapping in MySQL. That way I have direct access to the file, and I can build a directory hierarchy from the first characters of the hash (/c/0/2/a), so I would have 16^4 = 65,536 leaves in the directory tree, and the files would be evenly distributed, with around 200 files per dir (which should not cause any performance issues). But the requirement is to use the real file name for the directory tree, which is what causes the issue.
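A sketch of that original design in Python. Each file lives at a path built from the MD5 of its name (four hex-digit levels, so 16^4 = 65,536 leaves, or about 229 files per leaf at 15M files), and a database table maps hash -> real name. The sketch makes these assumptions: `sqlite3` stands in for MySQL, the class, table and column names are hypothetical, and `root=None` creates a temporary directory for trying it out.

```python
import hashlib
import os
import sqlite3
import tempfile


class HashedStore:
    """Store each file under a path derived from MD5(name); keep hash -> name in a DB.

    Illustrative sketch: sqlite3 stands in for MySQL and the table/column names
    are hypothetical. With root=None a fresh temporary directory is used.
    """

    def __init__(self, root=None, db_path=":memory:", levels=4):
        self.root = root if root is not None else tempfile.mkdtemp(prefix="hashedstore-")
        self.levels = levels
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS names (hash TEXT PRIMARY KEY, name TEXT NOT NULL)")

    def path_for(self, name):
        """Return (hex digest, path), e.g. <root>/c/0/2/a/c02a...: one level per hex digit."""
        digest = hashlib.md5(name.encode("utf-8")).hexdigest()
        return digest, os.path.join(self.root, *digest[:self.levels], digest)

    def put(self, name, data):
        digest, path = self.path_for(name)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as fh:
            fh.write(data)
        # The DB is only needed to go from a hash back to the real (long) name.
        self.db.execute("INSERT OR REPLACE INTO names (hash, name) VALUES (?, ?)",
                        (digest, name))
        self.db.commit()
        return path

    def get(self, name):
        """Direct access: hash the name and open the file; no directory listing or DB query."""
        _, path = self.path_for(name)
        try:
            with open(path, "rb") as fh:
                return fh.read()
        except FileNotFoundError:
            return None

    def real_name(self, digest):
        row = self.db.execute("SELECT name FROM names WHERE hash = ?", (digest,)).fetchone()
        return row[0] if row else None
```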
>Did that program also write your address header ?
:)
Thanks for the help.
> Date: Wed, 8 Jul 2009 06:27:40 +0000
> Subject: [CentOS] Question about optimal filesystem with many small files.
On Wed, 8 Jul 2009, oooooooooooo ooooooooooooo wrote:
> Is there any way to improve performance in ext3? Would you suggest another FS for this situation (this is a production server, so I need a stable one)?
There isn't a good file system for this type of thing. Filesystems with
many very small files are always slow. Ext3, XFS and JFS are all terrible
for this type of thing.
Rethink how you’re writing files or you’ll be in a world of hurt.
–
James A. Peltier
Systems Analyst (FASNet), VIVARIUM Technical Director
HPC Coordinator
Simon Fraser University – Burnaby Campus
Phone : 778-782-6573
Fax : 778-782-3045
E-Mail : jpeltier at sfu.ca
MSN : jpeltier at hotmail.com
The point of the HPC scheduler is to
keep everyone equally unhappy.
James A. Peltier wrote:
> There isn’t a good file system for this type of thing. filesystems with
> many very small files are always slow. Ext3, XFS, JFS are all terrible
> for this type of thing.
I can think of one, though you'll pay out the ass for it: the
Silicon file system from BlueArc (NFS), whose file system runs on
FPGAs. Our BlueArcs never had more than 50,000-100,000 files in any
particular directory (millions in any particular tree), though
they are supposed to be able to handle this sort of thing quite
well.
I think entry-level list pricing starts at about $80-100k for
one NAS gateway (no disks).
Our BlueArcs went end of life earlier this year and we migrated
to an Exanet cluster (it runs on top of CentOS 4.4 but uses its
own file system, clustering and NFS services), which is still
very fast, though not as fast as BlueArc.
And with block-based replication it doesn't matter how many
files there are; performance is excellent for backup, or for sending
data to another rack in your data center or to another
continent over the WAN. In BlueArc's case, it can transparently
send data to a dedupe device or tape drive based on
dynamic access patterns (and move it back automatically
when needed).
http://www.bluearc.com/html/products/file_system.shtml
Both systems scale linearly to gigabytes/second of throughput,
and to petabytes of storage without downtime. The only downside
to BlueArc is their back-end storage: they only offer tier
2 storage of their own, and only HDS for tier 1. You can make an HDS
perform, but it'll cost you even more. The tier 2 stuff is
too unreliable (LSI Logic). Exanet at least supports
almost any storage out there (we went with 3PAR).
Don't even try to get a NetApp to do such a thing.
nate
On Wed, 8 Jul 2009, oooooooooooo ooooooooooooo wrote:
> I have a program that writes lots of files to a directory tree (around 15 million files), and a node can have up to 400000 files (and I don't have any way to split this amount into smaller ones). As the number of files grows, my application gets slower and slower (the app works something like a cache for another app, and I can't redesign the way it distributes files on disk due to the other app's requirements).
BTW, you can pretty much say goodbye to any backup solution for this type
of project as well. They'll all die dealing with a file system structure
like this.
—
James A. Peltier
>How many files per directory do you have?
I have 4 directory levels, 65,536 leaf directories and around 200 files per directory (15M in total).
>Something is wrong. Got to figure this out. Where did this RAM go?
Thanks. I reduced the memory usage of MySQL and my app, and I got around a 15% performance increase. Now my atop output looks like this (currently only reading cached files from disk):
PRC | sys 0.51s | user 9.29s | #proc 114 | #zombie 0 | #exit 0 |
CPU | sys 4% | user 93% | irq 1% | idle 208% | wait 94% |
cpu | sys 2% | user 48% | irq 1% | idle 21% | cpu001 w 28% |
cpu | sys 1% | user 17% | irq 0% | idle 41% | cpu000 w 40% |
cpu | sys 1% | user 14% | irq 0% | idle 74% | cpu003 w 12% |
cpu | sys 1% | user 13% | irq 0% | idle 72% | cpu002 w 14% |
CPL | avg1 3.45 | avg5 7.42 | avg15 10.76 | csw 15891 | intr 11695 |
MEM | tot 2.0G | free 51.2M | cache 587.8M | buff 1.0M | slab 281.2M |
SWP | tot 1.9G | free 1.9G | | vmcom 1.6G | vmlim 2.9G |
PAG | scan 3072 | stall 0 | | swin 0 | swout 0 |
DSK | sdb | busy 89% | read 1451 | write 0 | avio 6 ms |
DSK | sda | busy 6% | read 178 | write 54 | avio 2 ms |
NET | transport | tcpi 3631 | tcpo 3629 | udpi 0 | udpo 0 |
NET | network | ipi 3632 | ipo 3630 | ipfrw 0 | deliv 3632 |
NET | eth0 0% | pcki 5 | pcko 3 | si 0 Kbps | so 1 Kbps |
NET | lo | pcki 3627 | pcko 3627 | si 775 Kbps | so 775 Kbps |
> It is 1024 chars long, which still won't help.
I'm using MyISAM, and according to: http://dev.mysql.com/doc/refman/5.1/en/myisam-storage-engine.html
"The maximum key length is 1000 bytes. This can also be changed by changing the source and recompiling. For the case of a key longer than 250 bytes, a larger key block size than the default of 1024 bytes is used. "
> I would not store images in either one
> as your SELECT LIKE and Random will kill it.
Well, I think that this can be avoided; using only searches on the key fields should not cause these issues. Does anybody have experience storing a large number of medium-sized (1KB-150KB) blob objects in MySQL?
>However I have not a clue that this is even doable in MySQL.
In MySQL there is already an MD5 function: http://dev.mysql.com/doc/refman/5.1/en/myisam-storage-engine.html
Thanks for the help.
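A sketch of the "blobs in the database, key-only lookups" approach, again with Python's `sqlite3` standing in for MySQL. Each blob is keyed by the 16-byte MD5 of the cache key, and every read is an equality lookup on the primary key, with no `LIKE` and no scans. If you do this in MySQL, note that a `BLOB` column holds at most 65,535 bytes, so 150KB objects would need `MEDIUMBLOB`. The function names are hypothetical.

```python
import hashlib
import sqlite3


def open_blob_store(path=":memory:"):
    """Open (or create) a table of blobs keyed by the 16-byte MD5 of the cache key."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS blobs (hash BLOB PRIMARY KEY, data BLOB NOT NULL)")
    return db


def _key(key):
    return hashlib.md5(key.encode("utf-8")).digest()


def put_blob(db, key, data):
    db.execute("INSERT OR REPLACE INTO blobs (hash, data) VALUES (?, ?)", (_key(key), data))
    db.commit()


def get_blob(db, key):
    # Equality lookup on the primary key only: no LIKE, no table scans.
    row = db.execute("SELECT data FROM blobs WHERE hash = ?", (_key(key),)).fetchone()
    return row[0] if row else None
```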
On Mon, 2009-07-13 at 05:49 +0000, oooooooooooo ooooooooooooo wrote:
> Well, I think that this can be avoided; using only searches on the key fields should not cause these issues. Does anybody have experience storing a large number of medium-sized (1KB-150KB) blob objects in MySQL?
True.
An option would be to encode them as Base64 on INSERT, but if you index
all of your BLOBs on INSERT there really should be no problem. Besides,
150KB is not big for a BLOB. Consider 20MB to 100MB with multiple
joins on MSSQL (64-bit, though). Apparently the size is limited by the maximum
amount of memory the client has. VARBLOB apparently has no limit per the
docs. I cannot speak to doing this on MySQL; I can on DB2 and
MSSQL. I can say you can beat 32-bit MSSQL performance by at least
15 percent. My MySQL experience is with raw DB
predictions in graphing: edge and adjacency modeling.
What I see slowing you down is the T-SQL and SPROCs. The DLL for the MD5
I posted earlier will scale to thousands of inserts at a time. If speed is
really of the essence, then use raw partitions for the DB, and plenty of RAM. Use the
MySQL Connector or ODBC, or you will hit size limits on INSERT and
SELECT.
> >However I have not a clue that this is even doable in MySQL.
>
> In MySQL there is already an MD5 function: http://dev.mysql.com/doc/refman/5.1/en/myisam-storage-engine.html
Yes, I was informed that a call from a SPROC to "md5()" would do the
trick and take the load off the client. At least, that was the intent of
my idea: to balance the load, that is, if this is client/server.
I do wonder about your memory allocation and disk. It is all about the
DB design. Think about a genealogy DB. Where does the design end? It
doesn't. Where do predictions end? They don't.
John
JohnS wrote:
> I do wonder about your memory allocation and disk. It is all about the
> DB design. Think about a genealogy DB. Where does the design end? It
> doesn't. Where do predictions end? They don't.
I think you are making this way too complicated. You are going to end
up filling a large disk with small bits of data and your speed is going
to be limited by how fast the disk head can get to the right place for
anything that isn’t already in a buffer. Other than the special case of
too many entries in a single directory, the software overhead isn’t
going to make much difference unless you can effectively predict what
you are likely to want next or keep the most popular things in your
buffers. Hardware-wise, adding RAM is likely to help, even if it is just
for the filesystem inode/directory cache (and, if you are lucky, for LRU
data buffering). Also, spreading your data over several disks would help
by reducing head contention.
–
Les Mikesell
lesmikesell at gmail.com
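One way to spread the cache over several disks without adding another index is to derive the disk from the same MD5 used for the directory tree. The Python sketch below does this. The mount points `/cache0` to `/cache3` are hypothetical, one per physical disk.

```python
import hashlib
import os

# Hypothetical mount points, one per physical disk.
DISKS = ["/cache0", "/cache1", "/cache2", "/cache3"]


def disk_for(name, disks=DISKS):
    """Pick a disk deterministically from the MD5 of the name, so lookups need no index."""
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return disks[int.from_bytes(digest[:4], "big") % len(disks)]


def full_path(name, disks=DISKS, levels=4):
    """Disk chosen by hash, then one directory level per leading hex digit."""
    hexdigest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return os.path.join(disk_for(name, disks), *hexdigest[:levels], hexdigest)
```

Simple modulo placement remaps most files if a disk is added later. Consistent hashing avoids that, at the cost of a little more code.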