I'm Not A Mathematician But...

by
johnny_h
24 replies
Do you think Google really expends the processing power to analyze the data about your website & pages?

It's more than possible. The data is there for them to analyze & make a determination from, but are they doing it?

I'm no computer engineer, but one thing I know for sure: the more data you want to process, the more you need of one of two things - time or processing power.

Considering that Google processes millions of requests for its search engine daily, serves ads, crawls websites, serves email, offers blogs, knol, docs, reader (and the list goes on...), do you think they actually set aside the processing power to evaluate all the data they receive?

I just started wondering about this the other day - I tried to find information about what kind of servers they're running at Google - how many, what kinds of processors, etc. - but it doesn't look like anyone has any definite information about what they're running at this very moment. If that were public, I thought, it would be easy to evaluate whether or not they could actually process the data on top of the services they offer.

Since Google runs such a tight ship, we're left to guess. While it's totally possible for Google to analyze the data that we hand them about our activity, it's also possible that they do nothing with it other than store it. Just a thought.

What do you think?
  • Lawrh
    This is from Wikipedia -

    Google runs over one million servers in data centers around the world,[14] and processes over one billion search requests[15] and twenty petabytes of user-generated data every day.[16][17][18]

    According to a story on CNET a few months ago, each server has two 4-core CPUs. Together they are the biggest supercomputer in the world. The data processing you describe is just regular database operations for the most part and is no strain on their systems at all.

    When's the last time one of your searches took more than a fraction of a second? All of those million-plus servers are connected, and with terabytes of memory their whole bloody index can be in RAM.

    The stuff we see is trivial. Google's computing focus has been AI.
    Signature

    “Strategy without action is a day-dream; action without strategy is a nightmare.” – Old Japanese proverb -

  • seasoned
    ALSO, Google preprocesses everything. They PROBABLY even have various keyword searches already preprocessed. I built something like what Google does. Mine was better in some ways, and was running on ONE system, but mine could take a few seconds - and I was only going through a few million sites. Of course, secondary searches were FAR faster because of the cache. If I simply SEARCHED every site in the database for each query, it would take several MINUTES!

    Steve
  • johnny_h
    Lawrh - are there any exact specs listed in those articles? Clock speed, bus speed, type & amount of RAM, etc.? I've never had searches take more than a few seconds, true, but I have had Gmail stall out on me, chat lag & break, the Blogger service become unavailable, etc. Google does a lot of different types of processing. Also, like you said, they have millions of computers worldwide - the idea is to ascertain whether they have the capability to process that other information, which is probably just as much data as what results from their daily search queries, if not more, on top of what they're already doing out in the open.

    seasoned - sounds like you've got a really good idea of how their search engine functions. How do you think your machine would react to multiple queries at the same time? How much data does the processor have to move in each operation? Could you determine a standard size based on database size & search string? That could give more insight into calculating exactly what Google's machines are capable of...
    • Lawrh
      Go to CNET and do a search. The articles also cover their datacenter design.

      I'm not sure you understand the power of distributed supercomputing. Google processes 24 petabytes of data every single day. That's 24,000 terabytes. EACH DAY! This is entirely separate from search.

      Problems with Gmail tend to be very localized and network or datacenter related. One corner of the 'Net howls, no one else notices. This applies to Blogger, Youtube etc. Nothing computing capacity related.

      Chat lag is network related.

      Search has little impact on Google's computing capability. Simple lookups, many preprocessed like Steve said.

      In general, most non-application servers (not just Google's) are grossly underutilized. Your $10/month hosting account is on the same box as up to 1,000 other accounts. Serving web pages like Blogger, YouTube, Picasa, etc. takes very little CPU. The real problem is and always has been bandwidth, probably Google's biggest expense (tens of millions per month in California alone).

      Data processing capability is a non issue for Google's daily operations. It's always bandwidth.

      As I mentioned before, Google's focus is AI (Artificial Intelligence). They have more than enough EXCESS processing capability to make them THE world leaders in AI research. Which should scare the sh*t out of everyone...

    • seasoned
      Originally Posted by johnny_h View Post
      Things aren't NEARLY as uniform as you seem to think. I had a three-stage queue, of sorts, to get times down to a few seconds. You CAN'T scan the documents at query time - it would take too long, for ANYTHING. And you couldn't simply use an index, because full-text indexes didn't even exist at the time.

      Yeah, mine COULD choke if multiple people queried at once. It depends on the database, settings, caches, and queries. My system was a full boolean-type search, in that it could handle words and phrases and add or subtract criteria. The number of terms could have an impact, but their length wouldn't mean much.
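The "add or subtract criteria" idea above can be sketched as set algebra over precomputed posting lists - a hypothetical toy with invented doc ids, not Steve's actual system:

```python
# word -> set of doc ids containing it (precomputed ahead of time)
postings = {
    "apple":  {1, 2, 3, 5},
    "pie":    {2, 3, 8},
    "recipe": {3, 5, 8, 9},
}

def boolean_query(must=(), must_not=()):
    """ANDed inclusion criteria, then subtraction of excluded ones."""
    if not must:
        return []
    docs = set.intersection(*(postings.get(w, set()) for w in must))
    for w in must_not:
        docs -= postings.get(w, set())   # "subtract criteria"
    return sorted(docs)

print(boolean_query(must=["apple", "pie"]))                       # [2, 3]
print(boolean_query(must=["apple", "pie"], must_not=["recipe"]))  # [2]
```

Note that the cost here depends on the number of criteria and the size of the posting sets, not on the length of the words - which matches the point about term count mattering more than term length.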

      People have a hard enough time figuring out what ONE known machine is capable of. Figuring the capacity of an array of unknown systems is harder.

      BTW, when Google stalls, etc., at least for me, it is usually a network or HTTP issue.

      Steve
  • johnny_h
    I'm sorry if I offended either of you guys - didn't intend to cause a fuss with the post.

    It's just an interesting fact that we don't know what they're capable of - there are no references to actual hardware, or to exactly how many datacenters they have, in any of those CNET articles; they read more like PR pieces for Google. Yet the word fear is used in the same sentence as Google, even in the absence of fact. It's superstition. It's paranoia. It's unhealthy & counter-productive. I think that's what I was getting at with my original post.

    Talking about hardware and measuring its capabilities is part art & part science. At best, we could come up with a theoretical "how this system should operate under laboratory conditions" measure of its capability. Everything about any given computer system is known, at least on paper. There are outside influences, like power fluctuation and temperature, but hardware is made to work within a certain range, and the software's code is known and programmed to work in a specific way. I'll admit there's no definitive way to pinpoint exactly how a system will perform at any given moment. But again, the point was to provoke thought about whether we should be so quick to assume that Google is really plotting our ultimate demise - if we could show, even only in theory, that they were mortgaged to the data hilt, so to speak, or somewhere close to it (because we know they process over 24 petabytes of data every day), we could rule out Google expending extra power just to analyze our activity like Big Brother.

    And I am speaking about their capabilities at the server, not their bandwidth issues. I have had chat lag excessively, only to be told moments later that chat is having technical difficulties, that Google is sorry for my inconvenience & is working on it (lags & breaks) - Blogger gives me "service unavailable" or "server error", etc. That does show a deficiency somewhere in their system, not their bandwidth. To complicate things, I've had this happen in the middle of the morning, hardly peak surfing time.

    Again, like I said, I didn't mean to upset you guys, but rather to ask you to think logically - what do you think Google is capable of? Without proof, is it more likely that theories and superstitions are thrown around, or facts? Since there's no proof, there are no facts... these ideas are superstitions, not fact.
    • Lawrh
      You very seriously need to improve your ability to do research on the 'Net. There is nothing theoretical about it. Google is a public company, and all of the details I referred to in my previous posts are information they are required to publish. Public companies cannot keep assets and capabilities secret; it is illegal. Research projects can be secret, but not the rest.

      If you read the previous posts you will see that your complaints are local to the datacenter you have connected to and have nothing to do with overall capabilities. What I described in my posts is real, not theory.

      I suspect the difficulty in understanding stems from being in the desktop computer mindset. The world of supercomputers is so far removed from that paradigm as to render comparisons and concepts meaningless. They really do process 24 petabytes (24,000 terabytes) per day. Completely separate from search. They really are the largest supercomputer in the world. No theories, no superstitions. Fact, published as required by law.

      Their involvement with the CIA and NSA can be called theories and superstitions. Not their capabilities.

      To help understand, research distributed computing, cluster computing and anything to do with supercomputers. Speculating about hardware is meaningless desktop thinking. Think instead about what a million plus servers working as one can do. They haven't even begun to take advantage of what they have.

      Once you understand this then you can begin to think logically and stop calling what you don't understand "superstitions".

    • seasoned
      Johnny_h,

      I heard that, at least originally, they used pretty generic PCs! Some of the machines they use may not be nearly as large as others. Apparently, they just keep adding nodes.

      Steve
  • HeySal
    Considering that Google processes millions of requests for its search engine daily, serves ads, crawls websites, serves email, offers blogs, knol, docs, reader (and the list goes on...), do you think they actually set aside the processing power to evaluate all the data they receive?
    LOL - have you ever been to one of their data storage complexes? How about one of their technician centers? If you were to put all of them in the same place, Google would own a pretty nice-sized city.

    I like their complex out in The Dalles, Oregon - they use goats to mow the lawns. Employees take their pets to work, and they have a pretty cush gym and campus. Of course, the goats aren't eating the main greensward - they just munch all those voluminous acres that Google owns along with the complex.

    For a corporation the size and strength of Google, I really don't think power OR time is much of a concern.
    Signature

    Sal
    When the Roads and Paths end, learn to guide yourself through the wilderness
    Beyond the Path

  • HeySal
    Actually, Lawrence - that is the type of info in their annual report, and anyone can request a copy, stockholder or not -- oh, the library has 'em, too.

    Actually, there's nothing on paper that can compare to a walk-through of one of their facilities. If three or four more Google complexes were put beside the one in The Dalles, they would replace the whole town - just a few of them, at that size. If you think you can comprehend what those complexes are like on the inside... I'm betting a tour would wow you... it scared me crapless. I've seen Microsoft and Yahoo data storage facilities from the outside (the security is tight; you aren't just going to walk in the door), and they would both fit in one Google complex with a lot of room left over for something small like BOA or something else insignificant. LOL.

    And......I'm still most impressed with the goats. LOL.

  • johnny_h
    Lawrence - post the link to your source, and the exact number of computers and their specifications, in this thread.

    The entire undertone of what you say is that it's a fact that Google monitors every inch of the webspace visible to them (which I have no doubt they do) and then uses that information to some end - which I don't necessarily believe they do, other than to gather generic usage statistics. You said Google is one of the largest researchers of AI & that that's a reason to fear them - watch your words. What are they doing with their AI technology? Why should there be fear? You're implying something there with no proof.

    As for whether I'm experiencing trouble with a local datacenter - that goes to prove my point that their servers experience load & it can cause problems. If it happened at my local datacenter, then why not others? And if it could happen to one, what does that say about the capabilities of the other datacenters?

    I'm serious, you don't have any concrete facts & yet you're implying that Google can do this that or the other, when you don't know. That's superstition - that's a belief in something other than fact. Prove me wrong?
    • Lawrh
      Go to your public library and pick up a copy of their annual report. Like Sal said, it's all public. Or find the links yourself - everything is available to those who don't need hand-holding.

      Remember that individual server specs are meaningless in the context of a million plus node distributed supercomputer.

      It is highly doubtful that computational load plays much of a part in datacenter problems. It is almost always bandwidth load, delays with datacenter synchronization (all of the thousands of datacenters do sync, and sometimes glitches happen), or hardware failure. You seem to have skipped over Sal's posts. Why not do what she did and visit one of their datacenters?

      Google is the global leader in AI research, which is also public knowledge. Teaching spiders how to actually read and understand web pages is the holy grail for semantic indexing. It is also the goal for analyzing and predicting behavior, something which many find disturbing.

      Here's a blog that has Larry Page and Sergey Brin's quotes and interviews on AI in a timeline format.

      Google Founders Artificial Intelligence Quotes Archive - Ignorance Is Futile!

      Of course, not everyone is happy about how they've chosen to feed their AI.

      The Great Google Book Grab

      • seasoned
        Originally Posted by Lawrh View Post
        And I say this as someone that HAS read annual reports, and worked with computers of every size, TONS of databases and ISAMs, etc., and at some of the largest companies in the US......


        *******************************BULL*******************************!

        And WHY would they state this stuff in the annual report? It is NOT required and, in fact, would be considered a trade secret! OK, do you want to see the most APPROPRIATE paragraph from their 2009 annual report about this:

        Infrastructure. We provide our products and services using our own software and hardware infrastructure, which provides substantial computing resources at low cost. We currently use a combination of off-the-shelf and custom software running on clusters of commodity computers. Our considerable investment in developing this infrastructure has produced several benefits. This infrastructure simplifies the storage and processing of large amounts of data, eases the deployment and operation of large-scale global products and services, and automates much of the administration of large-scale clusters of computers. Although most of this infrastructure is not directly visible to our users, we believe it is important for providing a high-quality user experience. It enables significant improvements in the relevance of our search and advertising results by allowing us to apply superior search and retrieval algorithms that are computationally intensive. We believe the infrastructure also shortens our product development cycle and lets us pursue innovation more cost effectively.
        AW, who knows, maybe I am using an improper source or an outdated document. Here is the URL I used:

        Form 10-K

        Frankly, I have worked with databases costing MILLIONS of dollars, and they usually keep stuff LOCAL so the network ISN'T the weak link. Google apparently gets rid of network latency by having many computers work on different pieces at once. That is from what I heard THEM say, not from annual reports. The above paragraph merely ALLUDES to this.

        Steve
  • johnny_h
    No, I didn't skip Sal's post - there's no such information available to the public; you're both wrong about that. I actually contacted Google directly, and they said the information I requested wasn't available to the public - not to me or to their shareholders. It's called trade secrets.

    You're still wrong about the server problems as well - if I can connect to Google well enough for them to give me a server error message or tell me there's a problem with their chat service, there's no problem with the connection. A server error is a server error. I'm not sure you know as much as you claim to. In regards to distributed computing, I think you've completely ignored what I've said in previous posts - I understand that the workload is spread out over a multitude of systems, but there are still limitations. If you have two computers sharing the load, they can do the work of one much more efficiently, but they can't do the work of four computers.

    And I read everything regarding Google on CNET - there's no information there other than some vague reports publicizing whatever was going on with Google at the time.

    Lawrence, you're wrong.

    Also, you've not addressed what I've said about you claiming we should fear Google. You keep mentioning AI, & I see the articles you've referenced here - I'll quote what Larry Page says:

    "Artificial Intelligence would be the ultimate version of Google. So, with the ultimate search engine it would understand everything on the web - it would understand exactly what you wanted, and it would give you the right thing. And that's obviously Artificial Intelligence, to answer any question, basically, because almost everything is on the web, right?

    So, we're nowhere near doing that now, however we can get incrimentally closer, and that's basically what we work on & that's tremendously interesting from an intellectual standpoint."

    The article you posted there quotes the video, then breaks down those two paragraphs - which took all of a minute for Larry Page to say - then shows a couple of later videos, about a minute in length each, where Larry Page again says "we'd like to have a search engine that can give you exactly what you wanted, using AI", and another where he addresses a conference & talks about what AI will be - and I quote: "when AI happens, I think it will be a lot of computation, and not such clever blackboard/whiteboard kinds of stuff or other algorithms, but a lot of computation". Before that he talks about his dad trying to do AI on a Commodore. He also says that they have "some people at Google working on Artificial Intelligence", and then he repeats the same thing from the two previous videos - "to do a perfect job of search, you could ask any query & it would give you the perfect answer" - and then "we're lucky enough to be working incrementally closer to that, but again very very few people are working on this". Then he goes on to talk about climate change - Larry Page wants to change the climate, or so he says. So he was a speaker at a conference about climate change and took a second to plug his search engine.

    Lawrence, did you read the article & really listen to what Larry said? He literally says these things:

    - AI is not being implemented, they're working on it
    - There are very few people working on AI
    - They're making small, incremental steps (it's going very slowly, at least that's what he says more than three times in the videos that you've offered)
    - AI is being developed to improve search, nothing else - not used to analyze any other data at Google.

    So why do you keep telling me about AI? The material you've referenced shows that Google most likely still doesn't have any kind of AI implemented anywhere in their system that interacts with the public (the last video was from 2007), and that it's taking them a long time - the first video is from 2000, the second from 2006, and the third from 2007. That's seven years of Larry Page saying the exact same thing. Literally, it's a script he has well memorized, because he repeats it word for word three times in interviews and speeches given over a seven-year span.

    I think there's a lot of mysticism in your own mind about Google & what they're doing & capable of. I really think you're overlooking a lot of facts - facts you've tried to present here yourself. There was very little substance on that page or in its videos, as was the case with the CNET articles, and there are absolutely no hard facts about what kind of technology Google does have. Both you and Sal were wrong - I got it straight from the horse's mouth. If it were public information I could get at my local library, why wouldn't they tell me in an email? I think I'll have to go to my public library tomorrow and get that information, then email it to Google & ask them why they would withhold information I can so easily produce in my own home town. Seriously, that's what you're telling me - am I wrong?
    • Lawrh
      I seldom encounter someone so desperate to cling to their naivete.

      There is no mysticism, other than what you've conjured up yourself. G is a very large corporation that has as its mandate the gathering of data. Thinking about a corporation's computational capacity in terms of a desktop PC is silly; there are no points of comparison. Visiting any medium-sized corporation's datacenter will blow your mind. Visiting a big one like IBM's is even harder to comprehend.

      The vast majority of web services require very little in terms of CPU usage. An example would be Hostgator. They currently host 2.5 million domains and seldom ever have any problems. G is many, many times larger.

      Another way to think of it is that G isn't really a search engine. They are a data gathering company that maintains a huge database. When a user "searches" they are actually entering a database query. G's index ensures that results come up instantly.

      Probably the only time a real search happens is when using their keyword tool and you select website and enter a URL. G will then send out spiders in real time. Other than that you are just making database queries.
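The "database query, not search" distinction can be illustrated with a toy inverted index - invented data for illustration only; Google's real pipeline is of course vastly more complex:

```python
from collections import defaultdict

pages = {  # invented mini-web
    "a.com": "cheap flights to paris",
    "b.com": "paris hotels and flights",
    "c.com": "learn python programming",
}

# Crawling/indexing happens ahead of time: build word -> URLs once.
index = defaultdict(set)
for url, text in pages.items():
    for word in text.split():
        index[word].add(url)

def query(*words):
    # At "search" time no page text is read - just set lookups and intersection.
    results = set(pages)
    for w in words:
        results &= index.get(w, set())
    return sorted(results)

print(query("flights", "paris"))  # ['a.com', 'b.com']
```

All the expensive work happens at index-build time; the user-facing "search" is a cheap lookup, which is why results come back in a fraction of a second.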

      All your little "server error" messages are blanket statements meant to shut people up. I've used them myself. G's real problems are bandwidth, datacenter sync and hardware/software failure. Just like any other corporation.

      I'm not talking out of my butt. I was a career system administrator and built datacenters. Not G scale, but enough to run a successful company.

      Your efforts at dealing with G's PR will prove futile. They will never give a straight answer, even if it's on the front page of the New York Times. There are many hundreds of posts in this forum alone about the futility of asking G questions. When the search function is restored here, check for yourself.

      I bring up AI because that is G's main focus. The link I gave you was the first one I found that day, and I just scanned it. BTW, "few people working on it" refers to the total number of organizations, not to G itself. Refine your searches and you will find info everywhere from the Wall Street Journal to the BBC.

      In view of the previously stated facts - that G processes 24 petabytes of data daily and responds to over a billion "searches" daily - my own estimations of their computational capacity are probably far too low.

  • johnny_h
    Wait a minute here - you're calling me naive, but you've not proven anything yourself. Again, in your last response you stated that AI is Google's main focus, but you cite videos in which Larry Page actually says that "very few people are working on this (AI)".

    Google: Server Efficiency Needs New Recipe

    There's your CNET article. It makes a reference to Google having hundreds of thousands of servers, but in the context of power consumption, & it doesn't give any detailed information. There's a link within that article to the CNET article on their data centers as well - a 250-word article that speculates on the location of their data centers, but offers no facts.

    I'm not trying to hold on to my naivete, I'm just responding to your posts that make claims they don't support. Other people that read these posts and don't know that much about how Google operates are going to read your words -

    "They have more than enough EXCESS processing capability to make them THE world leaders in AI research. Which should scare the sh*t out of everyone...
    After making that statement on a public forum, you then cite articles that don't support the statement. And I think what's worse is that you've chosen to tell the entire Warrior Forum that they should be scared of Google.

    There's a lot of misinformation in IM that opens the doors for a lot of scammers - I think that's why I continue to make my argument. Anyone on this board could capitalize on what you've said by referencing this thread & then writing a poorly constructed eBook, selling it to the people who believe what you've said.

    As for mysticism, that's what I'm left to assume is at play here, because it's not fact. It never has been, & you've proven that repeatedly.
  • Lawrh
    This demonstrates a quality often expressed in this forum: someone does not understand something (corporate-level computing) or cannot do something (research), and therefore declares that it is impossible or does not exist. Everything Sal, Steve, and I have said is true. G even recently signed a contract with NASA to take over some of their data processing. Maybe NASA should have checked with you first.

  • johnny_h
    Steve,

    That occurred to me as well. I actually found a good article on Wikipedia that showed their setup in 1999 - the servers they were running then were something like dual 300 MHz Intel Pentium II machines. Here's an article:

    Coding Horror: Google Hardware Circa 1999

    That just puts another spin on what their actual capabilities are - if they actually ran old hardware like that, then the sentence - Google having hundreds of thousands of servers - takes on a whole new meaning.

    Lawrh,

    You're not proving anything. You're just repeating yourself & not supporting your argument. The idea behind my entire post was that nothing can be definitely proven about what Google does or does not do, because we don't know any specifics about their hardware. You still haven't proven what they do or don't have & everything that you have said, you've proven yourself wrong about.
    • Profile picture of the author Lawrh
      Originally Posted by johnny_h View Post

      Lawrh,

      You're not proving anything. You're just repeating yourself & not supporting your argument. The idea behind my entire post was that nothing can be definitely proven about what Google does or does not do, because we don't know any specifics about their hardware. You still haven't proven what they do or don't have & everything that you have said, you've proven yourself wrong about.
      The bulk of your argument is that G has weaknesses that don't actually exist and that because you can't find the info, their massive computational capability doesn't exist or can only be speculated about.

      The reason I repeat myself is that I was (futilely) trying to express the concept of corporate level supercomputing and the absurdity of concerning yourself with ever changing hardware specs. Especially in a million node system.

      Read the first sentence -

      NASA - NASA Takes Google on Journey into Space

      "will couple some of Earth's most powerful technology resources"

      NASA doesn't speculate.

      There is so much about G on the 'Net that no one should have to provide you with "proof".
      Signature

      “Strategy without action is a day-dream; action without strategy is a nightmare.” – Old Japanese proverb -

      • Profile picture of the author seasoned
        Originally Posted by Lawrh View Post

        The bulk of your argument is that G has weaknesses that don't actually exist and that because you can't find the info, their massive computational capability doesn't exist or can only be speculated about.

        The reason I repeat myself is that I was (futilely) trying to express the concept of corporate level supercomputing and the absurdity of concerning yourself with ever changing hardware specs. Especially in a million node system.

        Read the first sentence -

        NASA - NASA Takes Google on Journey into Space

        "will couple some of Earth's most powerful technology resources"

        NASA doesn't speculate.

        There is so much about G on the 'Net that no one should have to provide you with "proof".
        GIVE ME A BREAK! Proof, at the level you specify, doesn't exist ********ANYWHERE*********! Not with M/S, Oracle, Sybase, Teradata, Netezza, Informix, Ingres, etc.... If anyone tells you differently, **********RUN************!
        CISCO, JUNIPER, etc.... can't tell you!
        NEC, TOSHIBA, IBM, APPLE, ETC..... can't tell you!

        AND, as things get larger, as busses vary speed, etc.... it gets less and less certain!

        HECK, two computers that are the same EXACT model may vary. Maybe SLIGHTLY, but they almost certainly will.

        A good example is the early Apple II sound. Close listening would notice a slight change in frequency at some points. The clock frequency of the computer was the SAME! It accessed the SAME memory, even the same CHIPS, with a constant bus speed, etc... WHAT GIVES!?!?!? WELL, the 16 bit address bus was driven by EIGHT bit registers! There was a SLIGHT delay every time it crossed into another 256 bytes, because the low byte of the address wrapped and the high byte had to be updated.
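That page-crossing effect can be put in a toy model (illustrative only; the costs are made-up units, not real 6502 cycle counts):

```python
# Toy model of the delay described above: reading memory sequentially
# costs one unit per byte, plus one extra unit every time the address
# crosses a 256-byte page boundary (the 8-bit low byte wraps, so the
# high byte of the address must be updated).
def cycles_for_sequential_read(start, length, base_cost=1, crossing_cost=1):
    """Return the total cost of reading `length` bytes from `start`."""
    cycles = 0
    for addr in range(start, start + length):
        cycles += base_cost
        if addr != start and addr % 256 == 0:  # entered a new page
            cycles += crossing_cost
    return cycles

# Reading 1024 bytes from address 0 crosses pages at 256, 512 and 768,
# so it costs 1024 + 3 = 1027 units instead of a flat 1024.
```

Tiny periodic stalls like that are exactly the kind of thing raw spec sheets never show.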

        AH YOU SAY, but that is OLD technology!!!!! Well, today, NO system runs faster than about 50 MHz or so. If it DOES, they must be SEQUENTIAL! THAT might, on the MOST advanced systems, get you to 400 MHz. If you go faster than that, you have to CACHE it. MAYBE it can go faster, but THAT requires ANOTHER cache. You can't go faster than the BUS speed. If you do, then there is a delay in pushing it to a queue, etc...... As you can see, the hardware is NOT predictable! THAT is why companies run benchmarks on several apps, etc....
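The cache point above can be put in numbers with the standard average-memory-access-time formula (the hit time, miss rate, and miss penalty below are made-up illustrative figures, not any real machine's specs):

```python
def amat(hit_time_ns, miss_rate, miss_penalty_ns):
    """Average memory access time: every access pays the hit time,
    and a `miss_rate` fraction of accesses also pays the miss penalty."""
    return hit_time_ns + miss_rate * miss_penalty_ns

# A 1 ns cache in front of 100 ns memory, with a 5% miss rate:
# 1 + 0.05 * 100 = 6 ns per access on average. The core's rated
# clock speed tells you nothing once it is waiting on memory like this.
average_ns = amat(1.0, 0.05, 100.0)
```

This is why the effective speed of a system depends on the access pattern of the actual application, not on any number printed on the box.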

        And THAT is just the CPU (in the traditional CPU/memory meaning)! And what of the DISK DRIVE? AGAIN, it can only go so fast to a cache, which is only so large, and which takes a while to read from/write to disk. And that is just the DRIVE!

        OK, want to talk about software? You need read ahead buffers, write behind buffers, an indexed method, etc....

        And how do COMPANIES plan things? They try to balance cost, experience, and stats that are SIMILAR to what they expect to do. THEN, if they REALLY want to be precise, they try a "proof of concept".

        So don't think Google is going to tell NASA about everything.

        Steve
        • Profile picture of the author Lawrh
          Steve, I think you misunderstand the discussion. The proof is not in the actual hardware specs but in the capability of the overall system. As is the case with all supercomputers. The daily processing of 24 petabytes of data and over a billion daily searches is proof in itself that there is massive capability. So far as NASA is concerned it is likely they had extensive discussions with G before agreeing to the partnership.

          So far as "proof" on the 'Net, it is in the capabilities demonstrated and documented. As I have said, hardware is irrelevant, it is what can be done. In a million plus node system individual servers are throw away items. Yanked and replaced when failed or when upgraded just like any component of a larger system.

          Low level discussion of components is pointless in the context of massively distributed computing.

          Perhaps looking at a non distributed supercomputer could add perspective. The systems at Los Alamos and at the Lawrence Livermore Laboratories consist of thousands of racks of CPU's and their RAM. These are treated not as something special in and of themselves, they are components to be added or replaced as necessary. It's basically the same with distributed systems, they just aren't in the same building, or even the same continent.

          To repeat myself again: it's the capability that counts.

          It's all that counts.
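Taking the figures quoted in this thread at face value (one million servers, 24 petabytes a day; the thread's numbers, not verified specs), the per-node share of that capability is surprisingly modest:

```python
# Back-of-envelope split of an aggregate daily data volume across nodes,
# assuming a perfectly even (idealized) distribution of the work.
def per_node_load(total_bytes_per_day, node_count):
    seconds_per_day = 86_400
    per_node_daily = total_bytes_per_day / node_count
    return per_node_daily, per_node_daily / seconds_per_day

PETABYTE = 10**15
daily_bytes, bytes_per_sec = per_node_load(24 * PETABYTE, 1_000_000)
# 24 PB / 1,000,000 nodes = 24 GB per node per day,
# which works out to only about 0.28 MB per node per second.
```

Which is the point: at that scale, no individual server needs to be anything special.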
          Signature

          “Strategy without action is a day-dream; action without strategy is a nightmare.” – Old Japanese proverb -

          • Profile picture of the author seasoned
            Originally Posted by Lawrh View Post

            Steve, I think you misunderstand the discussion. The proof is not in the actual hardware specs but in the capability of the overall system. As is the case with all supercomputers. The daily processing of 24 petabytes of data and over a billion daily searches is proof in itself that there is massive capability. So far as NASA is concerned it is likely they had extensive discussions with G before agreeing to the partnership.

            So far as "proof" on the 'Net, it is in the capabilities demonstrated and documented. As I have said, hardware is irrelevant, it is what can be done. In a million plus node system individual servers are throw away items. Yanked and replaced when failed or when upgraded just like any component of a larger system.

            Low level discussion of components is pointless in the context of massively distributed computing.

            Perhaps looking at a non distributed supercomputer could add perspective. The systems at Los Alamos and at the Lawrence Livermore Laboratories consist of thousands of racks of CPU's and their RAM. These are treated not as something special in and of themselves, they are components to be added or replaced as necessary. It's basically the same with distributed systems, they just aren't in the same building, or even the same continent.

            To repeat myself again: it's the capability that counts.

            It's all that counts.
            SPECS?!?!?!? That is CHILD'S PLAY! DELL will give you SPECS! Protocol, rated speed, voltage, amperage, memory cycle time, FSB, CLOCK speed, cache size, etc....

            ORACLE can tell you version, bit size, source language, block size, maximum record size, page size, size of int, date, varchar2, etc....

            SPECS are EASY! They are ALL OVER! HECK, they can even give you what a given dell with a given oracle version on a GIVEN BENCHMARK rated!!!!!

            SPECS ARE *******EASY*******!

            The actual PROCESSING speed?!?!?!? On a NON standard app? NO WAY! On an average system networked in nodes? NOPE! With tables defined YOUR way? NOPE! I had one customer try to make a MINOR change to a table, and a 30 minute process ended up taking many HOURS! And that was a MINOR change. In one case, a hint was needed to get a long running query on oracle to work. It ended up RELIABLY running FAR quicker. All the CERTIFIED DBAs could do was suggest trying.

            Things aren't running so fast simply because there are so many processors. 9 women can NOT have a child in one month. It is running that fast because GOOGLE has tailored things to provide several pieces that are tied together. HECK, the slower processors may be relegated to providing first page hits or tying pieces together. THAT stuff is relatively simple. There are so many ways that Google can use what it has.
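The "9 women" line is Amdahl's law: extra processors only speed up the fraction of a job that actually parallelizes. A minimal sketch (the 95% figure is an arbitrary example, not a claim about any real workload):

```python
def amdahl_speedup(parallel_fraction, n_processors):
    """Best-case speedup when only `parallel_fraction` of the work
    can be spread across `n_processors`; the rest runs serially."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_processors)

# Even with a million processors, a job that is 95% parallel
# can never run more than 1 / 0.05 = 20x faster than one processor.
million_node_speedup = amdahl_speedup(0.95, 1_000_000)
```

So a million-node system is only fast on work that has been deliberately structured to parallelize, which is exactly the tailoring described above.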

            Steve
  • Profile picture of the author johnny_h
    Originally Posted by Lawrh View Post

    The bulk of your argument is that G has weaknesses that don't actually exist and that because you can't find the info, their massive computational capability doesn't exist or can only be speculated about.
    No, you've misunderstood completely. That's never been my argument. I speculated on whether they used the information they collect outside of the Googlebot (such as AdSense, Analytics, etc.) to manipulate the rankings.

    I then went on to state that there's no hard evidence one way or another about it. I did speculate about how Google was multifaceted and that it would require a lot of computing power to do what they do on a daily basis. Then I pointed out that we really don't have any definite information about what kind of power Google really has behind the scenes.

    I also admitted in the original post as well as throughout the other posts that Google could do whatever they wanted - they have the money to buy the machines to do whatever they want.

    I've never made any claims that Google definitely could or could not do something or that there was any kind of weakness in their system. What I've said is that no one knows any hard facts about Google's capabilities or what they actually do. In the absence of fact, however, people make claims that Google in fact does do one thing or another.

    That's what I've been talking about the whole time - we don't know anything about Google, but a lot of people make claims & pass them off as fact.
