AI algorithm achieved 98% accuracy in predicting different diseases by
analysing the colour of the human tongue, replicating 2000-year-old
traditional Chinese medicine. It can diagnose diabetes, stroke,
anaemia, asthma, liver/gallbladder conditions, COVID-19, and vascular
and gastrointestinal issues.
(URL) https://www.unisa.edu.au/media-centre/Releases/2024/say-aah-and-get-a... (https://www.unisa.edu.au)
########################################################################
|u/SPFCCMnT - 1 month
|
|AI is struggling in the epidemiology field because many models are good
|at predicting cases while being bad at predicting non-cases. The
|sensitivity/specificity issue needs more focus in medical AI.
|u/bigboybanhmi - 1 month
|
|In other words: if there's something pathological then AI is pretty
|good, but if there's not, then AI will often make a false-positive
|diagnosis?
|u/helm - 1 month
|
|Yeah, one major problem can be false positives.
|u/iCameToLearnSomeCode - 1 month
|
|That seems simple to fix though. We just let AI tell us which
|tests to conduct. If it thinks you have lung cancer because your
|tongue looks like other people who have lung cancer we just send
|you down for a scan. If it thinks you have covid-19 then we can
|pull out a nasal swab. It's only a false positive if we accept
|what it had to say as fact instead of a guess to be followed up
|on.
|u/iScreamsalad - 1 month
|
|Except in the possible future timeline of your scenario, where money is
|being spent on negative scan after negative scan, some folks are going
|to start saying things like… do we need to be doing all these scans?
|u/iCameToLearnSomeCode - 1 month
|
|If it's 98% effective then it's better than the average person
|ordering tests. 90% of mammograms are deemed completely normal
|(and were therefore unnecessary). Pictures of our tongues, retinas
|and skin are non-invasive and completely harmless; if they could
|tell us who needs to be tested for certain diseases with 98%
|accuracy, it would prevent billions of dollars in needless spending
|and save millions of lives. If the AI says your symptoms seem
|similar to diabetes, requests a picture of your tongue to confirm,
|and then says you need a diabetes screening, we can give you one.
|If your doctor thinks you need a diabetes screening we can still
|give you one without the AI, but the AI is cheap and easy, and
|asking its advice as a second opinion when it's 98% accurate just
|makes sense.
|u/chocolatethunderr - 1 month
|
|Exactly. Today we ask people to get tested for a variety of
|diseases based on their age, sex, ethnicity and other factors. As
|the person above me mentioned, many of these tests and scans come
|back normal, but we do them because we've identified that the
|chances of a positive within these groups are higher, and catching
|disease early both saves lives and reduces healthcare costs. With
|AI, we could ask people within those same parameters to take
|pictures of their tongue and send them in for AI processing. If
|that returns an AI assessment of a potential risk for disease X, we
|can bring those people in for tests and decrease the frequency of
|tests for those whose results show standard tongue colors, for
|example. The number of total scans may end up being the same, but
|the accuracy in finding positive cases could go up dramatically,
|and that's the point here.
|u/PetiteGorilla - 1 month
|
|The article didn’t get very deep into the model so it’s hard
|to say exactly, but being 98% accurate at guessing which
|disease a patient with a known disease has, versus diagnosing
|unknown diseases, is a big difference. It’s hard to understand
|a model's value without seeing how often it guesses positive
|correctly, negative correctly, positive incorrectly and
|negative incorrectly. If it vastly overdiagnoses, the cost
|and harm of the additional tests can quickly outweigh the
|benefit.
|u/butts-kapinsky - 1 month
|
|Right. But. You gotta see how this is still worse than not using
|the AI.
|u/iCameToLearnSomeCode - 1 month
|
|I don't see how using the AI as a consultant for doctors is
|worse than not using the AI as a consultant for doctors.
|There's still a human making choices; it's just a second
|opinion. The best chess players study the moves of a chess
|bot, so why shouldn't the best doctors see what moves a doctor
|bot would make and see if they agree?
|u/butts-kapinsky - 1 month
|
|Because the doctor bot blows wet ass whereas the chess bot
|may genuinely be better than every human alive. When the
|doctor bot is the best doctor on Earth, you may have a
|point.
|u/AlwaysUpvotesScience - 1 month
|
|AI often hallucinates. If there is nothing wrong with someone
|and you are asking it (in whatever way) to explain what is wrong
|with that person AI can make something up and even justify its
|diagnosis. This could lead to people being treated for issues they
|don't actually suffer from.
|u/fucksilvershadow - 1 month
|
|Well LLMs often hallucinate but a classifier like this paper
|uses isn’t that kind of “AI”. This paper is using methods that
|have been around much longer than the stuff behind ChatGPT. I
|wouldn’t call a false positive in this case a hallucination.
|EDIT: Specifically I wouldn't call it a hallucination because
|the models used are not generative models.
|u/Keesual - 1 month
|
|I hate that basically all machine learning and data science is
|just being blanket-labelled "AI" now because it's trendy.
|u/Towbee - 1 month
|
|Yo guys my microwave turns off after so long... This must be
|ai decision making. We're all doomed
|u/coughcough - 1 month
|
|AI has advanced to Hot Pocket levels of awareness
|u/accordyceps - 1 month
|
|Yes. It is so frustrating that you can’t know what
|technology anyone is actually referring to when they say
|“AI.” Might as well point to Haley Joel Osment and be done
|with it.
|u/justgetoffmylawn - 1 month
|
|We'll just 'AI' it! Yeah, XGBoost does not 'hallucinate'. I
|don't mind if someone doesn't understand ML. But it's
|frustrating when people think they do because they used
|ChatGPT once to write a limerick.
|u/fucksilvershadow - 1 month
|
|Yes, it is eternally frustrating but that is what people
|know it as. I try to use the specific terms when possible
|myself.
|u/coldrolledpotmetal - 1 month
|
|That’s because it all is AI and has been for decades, it’s a
|vague umbrella term but it still applies
|u/jellomonkey - 1 month
|
|It's the reverse. None of it is AI. There is no
|intelligence in any of these systems. There is the
|illusion of intelligence.
|u/pxr555 - 1 month
|
|Artificial intelligence is intelligence like artificial
|limbs are limbs: Not by far the same, but it emulates
|some functions well enough to be useful. A wooden leg
|isn't really a leg, but you can walk with one much
|better than without it. AI isn't really intelligent,
|yes. It's just artificially intelligent.
|u/jellomonkey - 1 month
|
|It's not a good analogy because the functions of a
|limb are finite and discrete. Also, artificial limbs
|are light-years closer to being indistinguishable from
|real limbs. Some can even give sensations of touch and
|temperature. Peg legs are called peg legs for a
|reason. LLM, machine learning, neural nets, etc, are
|all important steps toward artificial intelligence but
|they are baby steps. Referring to them as AI is just
|marketing. Let me give you a better analogy. Imagine
|I sat you in front of 3 boxes. Each box is filled with
|pieces of paper and each piece of paper has a word or
|phrase on it. I tell you to pull one paper from each
|box and then read them in order. Now let's say every
|time you do so a complete sentence is formed. Let's go
|further and say that the sentences even form a short
|story. Now here's the question: would you say that
|the three boxes are intelligent? Even artificially? I
|doubt it.
|u/coldrolledpotmetal - 1 month
|
|You can’t just take the term literally and say it
|doesn’t fit, the phrase “artificial intelligence”
|encompasses all these things.
|u/jellomonkey - 1 month
|
|You can't just bastardize language to the point that
|terms mean nothing. Technology is rife with this
|problem. Rat faced marketing weasels take words with
|real meaning or the names of actual standards and then
|apply them as "sexy" buzzwords. Next thing you know
|everyone is using the term wrong until it has no real
|meaning at all.
|u/mtcwby - 1 month
|
|Yep. Generative AI is interesting for some applications, but
|segmentation is where I've seen the most useful ones. That's the
|vein where mining for specific use cases seems to give the most
|bang for the buck.
|u/throwaway3113151 - 1 month
|
|Not all AI is LLM.
|u/echocage - 1 month
|
|No, LLMs hallucinate, not every AI
|u/LuckyHedgehog - 1 month
|
|You're mixing up kinds of AI here. This type of AI isn't
|conversational; it spits out a confidence score for what it was
|trained on, and that's it. You don't ask it anything, and it doesn't
|hallucinate in the way ChatGPT does, because the output is a
|singular set of values.
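|
|For what it's worth, here's a minimal sketch (Python/scikit-learn,
|generic, not the paper's actual pipeline) of what "spits out a
|confidence score" looks like for this kind of classifier:
|
|    # A classifier that returns class probabilities rather than
|    # conversational text. Illustration only, on synthetic data.
|    from sklearn.datasets import make_classification
|    from sklearn.ensemble import RandomForestClassifier
|
|    X, y = make_classification(n_samples=500, n_features=8,
|                               n_classes=3, n_informative=5,
|                               random_state=0)
|    clf = RandomForestClassifier(random_state=0).fit(X, y)
|
|    # For one input, the output is just a probability per class.
|    print(clf.predict_proba(X[:1]))  # e.g. [[0.02 0.95 0.03]]
|    print(clf.predict(X[:1]))        # the highest-scoring class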
|u/stanglemeir - 1 month
|
|Yes but false positives are better than false negatives. If we can
|train an AI that gives 10% false positives but only 0.1% false
|negatives that’s amazing.
|u/omgu8mynewt - 1 month
|
|Not true, false positives can be very bad. For a rare disease, a
|bad false positive rate could lead to thousands of wrongly
|diagnosed healthy people receiving treatment they never needed.
|If a treatment has horrible side effects, e.g. chemotherapy, too
|many false positives lead to healthy people suffering for no
|reason. If a treatment requires a lot of attention from healthcare
|professionals, too many false positives waste a lot of doctors'
|time treating healthy people. And it leads to healthy people
|worrying about their health and suffering unnecessary stress when
|they think they're ill.
|u/True_Window_9389 - 1 month
|
|But if you can have a cheap, easy, and non-invasive diagnostic
|tool for a bunch of problems, and something comes back
|positive, *then* you can do more serious testing to confirm.
|Realistically, I don’t think a tool like this would alone lead
|to someone undergoing chemo or popping pills.
|u/omgu8mynewt - 1 month
|
|It depends on the disease and the context. I work in
|tuberculosis diagnostics; it is the worst infectious
|disease in the world and kills more people than HIV and
|Covid. In some countries e.g. South Africa, about 80% of
|people are walking around with it in their lungs - but they
|don't have bad symptoms yet, and it may never get worse. But
|some Doctors just treat almost anyone with a bad cough with
|TB antibiotics, which could be a good idea, because often it
|is. But the TB antibiotics are horrible - they wipe you out
|for 6 months, you can't work, your family suffer if you're
|the breadwinner. If you are from a wealthy country and you
|went to South Africa for a year and mingled with ordinary
|people, not just the wealthy and other tourists, you would
|probably catch TB too and get given the antibiotics without
|bothering to diagnose properly. So sometimes really strong
|medicine gets given to people with barely any diagnosis
|information.
|u/brilliantjoe - 1 month
|
|False positives in medical diagnosis lead to further testing, a
|lot of which is invasive and can carry its own risks. In
|addition to that, you have to take into account the stress put
|on people when they simply think they might be sick. The side
|effects of a false positive on a healthy person may end up
|being worse than a false negative when diagnosing an actual
|problem.
|u/davesoverhere - 1 month
|
|But in this case, wouldn’t a false positive generally mean
|bloodwork, an X-ray, or an MRI?
|u/omgu8mynewt - 1 month
|
|Depends completely on what you've been diagnosed with.
|What if it is leukemia, and you then get given a very painful
|and expensive bone marrow biopsy? Or tuberculosis, in which
|case you get a horrible six-month course of antibiotics
|(there is no gold-standard way of diagnosing TB).
|u/hazpat - 1 month
|
|Maybe in other cases, but u/SPFCCMnT did not read the article. This
|one is extremely accurate.
|u/justgetoffmylawn - 1 month
|
|This is what annoys me. Absolutely, sensitivity and specificity are
|important. But the paper has a whole section explaining precision,
|recall, and F1 - so that people who are used to medical terms can
|understand ML validation, true positive prediction, etc. But they
|commented about AI failing in medicine without reading it - which
|I guess means they're either a Redditor, a doctor, or both.
|u/butts-kapinsky - 1 month
|
|More or less. A great way to diagnose 100% of true cases is to
|always yield a positive result.
|u/hazpat - 1 month
|
|They did not really get false positives with this. This is a pretty
|sound study. Any time it detected a color other than pink there was a
|specific reason.
|u/ChicksWithBricksCome - 1 month
|
|In this paper specifically the specificity was very good. If you look
|at [the paper's confusion
|matrices](https://www.mdpi.com/2227-7080/12/7/97), the results are
|astounding. I think this is some really great work.
|u/michaelochurch - 1 month
|
|I expected it to be one of those "98% on training set" papers, but
|they did test out-of-sample and their numbers were still solid. This
|doesn't mean their distribution models reality, because there are
|all sorts of possibilities there, but they didn't make the 101
|mistake that's everywhere these days. The class bias of their
|dataset is a problem, though; they only had a few healthy images.
|u/justgetoffmylawn - 1 month
|
|Yeah, I was pleasantly surprised to see the F1, etc. I'd like to
|see it validated by an outside group when the testing is run on
|another hospital system's patients without researchers running the
|tests. That said, cool paper and cool idea.
|u/Coomb - 1 month
|
|I mean, is anyone particularly surprised that they were able to use
|machine learning to identify the color of something? Because
|that's what they're doing.
|u/TheBrain85 - 1 month
|
|I wonder how they determined the classes for the tongues in the
|first place. This does not seem like a case where someone is
|manually labeling 5260 images. I wouldn't be surprised if it was
|some algorithm. In which case it would become "machine learning
|learns to perform some algorithm to determine color classes".
|u/potatoaster - 1 month
|
|Okay but if you read *this* paper, sensitivity and specificity are
|both high.
|u/never3nder_87 - 1 month
|
|Simply diagnose everyone with everything and achieve a 100% successful
|diagnosis rate
|u/Ted_Borg - 1 month
|
|Don't hate the player, hate the game
|u/AdviceMang - 1 month
|
|So AI is great for screening, but not diagnosing.
|u/Volsunga - 1 month
|
|Opposite. Great for diagnosing, bad for screening
|u/ilyich_commies - 1 month
|
|No, they were right. If you have the disease, this AI will detect
|it 98% of the time, but if you don’t, there is a decent chance it
|will still say you do. So this AI would be very useful for
|screening patients to determine which patients need follow-up
|tests
|u/GreatBigBagOfNope - 1 month
|
|In the paper the precision and recall were both given for the XGB
|model as 0.98, for an overall accuracy of 0.9871, which are all
|pretty good signs but still don't tell the full picture. This comes
|from positive cases being overwhelmingly more prevalent in the
|training data than negative cases, which might explain why the MCC
|was only 0.35 – not actually a very good score, despite the other
|metrics being excellent and honestly probably outperforming med
|students and junior doctors. This is borne out in the paper, where
|the researchers note that their training set is 5260 images, of
|which only 310 were of healthy tongues.
|
|If the model predicted positive it was usually right, and if the
|truth was positive the model usually predicted it correctly; but if
|it did not do well at identifying negatives, there were so few real
|negatives that they could get swept up into that 2% of error in the
|precision, and the problem would only show up in a performance
|measure that accounts for negatives in a balanced way. So the
|authors did the right thing by reporting a rich variety of binary
|classification performance metrics, which enables this kind of
|follow-up, and as usual the journalists have been misleading by
|focussing not just on one metric but quite possibly the worst one.
|
|I would however suggest to the researchers that while they have
|reported all of these metrics, there was no discussion of the
|implications of having <10% of the cases in their training and test
|sets be negatives. This proportion was only mentioned in passing in
|§3.1, with no justification given for the chosen size: you could
|argue that in a sample of all tongues almost all would be healthy,
|so negatives should have made up at least a majority of the training
|and test sets; you could also argue that if you restrict your sample
|to tongues that would be seen in a diagnostic context, the
|likelihood of having any of these conditions is much higher than in
|the wider population. I would have liked to see the motivation at
|least presented. In my opinion, the lack of discussion of the
|imbalanced sample and the poor TN performance is a noteworthy
|oversight. It doesn't take away from them making quite a good model
|and presenting a good amount of quality information, though.
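|
|To make the imbalance point concrete, here's a toy calculation
|(Python/scikit-learn) using invented predictions in roughly the
|paper's 4950-positive / 310-negative proportions, not its actual
|outputs:
|
|    # A classifier that nails the positives but misses most of the
|    # few negatives. Numbers are made up for illustration.
|    import numpy as np
|    from sklearn.metrics import (accuracy_score, precision_score,
|                                 recall_score, matthews_corrcoef)
|
|    y_true = np.array([1] * 4950 + [0] * 310)
|    # Every positive predicted correctly, but 279 of the 310
|    # healthy tongues are called "diseased".
|    y_pred = np.array([1] * 4950 + [1] * 279 + [0] * 31)
|
|    print(accuracy_score(y_true, y_pred))     # ~0.95
|    print(precision_score(y_true, y_pred))    # ~0.95
|    print(recall_score(y_true, y_pred))       # 1.0
|    print(matthews_corrcoef(y_true, y_pred))  # ~0.31, far less rosy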
|u/justgetoffmylawn - 1 month
|
|All good criticisms. Unbalanced training sets in these kinds
|of diagnostic models are a much harder problem than
|'sensitivity / specificity', which is pretty easy to validate.
|I really wish we had better training data, more reliable EHR
|systems, etc. We could easily have smartphone apps to diagnose
|COVID from sound, but I guess we decided not to do that. (It
|would require not just the initial training set, but updating
|it as I would guess model rot happens quickly with new
|variants, etc.)
|u/Volsunga - 1 month
|
|No, it gives a significant amount of false positives if you
|don't have any of the diseases it's trained on. If you have one
|of them, it's extremely good at determining which one. So it's
|bad at determining if you have a disease, but good at
|determining which one if you do.
|u/TimedogGAF - 1 month
|
|Explain how diagnosing would work here. Whether it's good
|or bad for screening seems dependent on the specifics of its
|false positive rate.
|u/Simba7 - 1 month
|
|I suppose if you get the true positive rate high enough it's still
|very valuable in that you can then confirm or disprove the dAIgnosis
|with other criteria like labs.
|u/soleceismical - 1 month
|
|Accuracy already takes false positives into account.
|
|>The accuracy of a diagnostic test is defined as how often the test
|correctly classifies someone as having or not having the disease.
|The formula for accuracy is: (true positive + true negative) / (true
|positive + true negative + false positive + false negative), or
|correct results / all results
|
|https://radiopaedia.org/articles/diagnostic-test-accuracy
|
|Specificity (true negatives / (true negatives + false positives))
|was also high, according to the paper.
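|
|In code form (a quick Python sketch with arbitrary counts, nothing
|taken from the paper), those definitions are just:
|
|    # Accuracy, sensitivity and specificity from the four cells of
|    # a confusion matrix. The counts below are placeholders.
|    tp, tn, fp, fn = 90, 850, 40, 20
|
|    accuracy    = (tp + tn) / (tp + tn + fp + fn)  # correct / all
|    sensitivity = tp / (tp + fn)  # recall: true positive rate
|    specificity = tn / (tn + fp)  # true negative rate
|
|    print(accuracy, sensitivity, specificity)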
|u/BamBam-BamBam - 1 month
|
|Does it feel like AI is trying to reverse 500 years of science-based
|diagnostics? Wasn't the color of your tongue used to diagnose an
|imbalance in your humors in the dark ages?
|u/pxr555 - 1 month
|
|Humors or not, if the color of your tongue does have a connection to
|some illnesses, it's still science. If this proves that, it would be
|quite a big thing. I'm not exactly holding my breath though.
|u/pxr555 - 1 month
|
|No, they did say it "replicates something used in traditional
|Chinese medicine". Not the same as "based on". I mean, in the
|past doctors tasted urine of patients to diagnose diabetes. Sugar
|in the urine really *is* a symptom of diabetes.
|u/jointheredditarmy - 1 month
|
|Yup good ol’ type 1 vs type 2 error. If you predict everyone has
|diabetes you’ll get 0% type 2 error
|u/justgetoffmylawn - 1 month
|
|Then AI isn't struggling, sloppy research is struggling. This is a
|problem in medicine, not just in AI. Most physicians don't even know
|the specificity and sensitivity of the tests they're running. For many
|COVID tests, for instance (just using that because most people these
|days have taken some), sensitivity can be garbage (sometimes 60% or
|worse), although specificity is generally good (but not always,
|depending on test and methodology). Meanwhile, a 4th Gen HIV test when
|properly performed will likely be 99% in both IIRC. [This
|paper](https://www.mdpi.com/2227-7080/12/7/97) about tongue prediction
|apparently examines precision, recall, F1, etc - like most validated
|ML models. So it's well over 90% for both cases and non-cases, as
|noted in the paper. Now, that sounds promising - but how it's
|validated by outside testing, accuracy when performed by non-
|researchers, etc - all that is obviously critical for evaluation. I
|just don't like it when "AI" is dismissed without reading the paper
|that literally has a whole section discussing sensitivity versus
|specificity (the usual terms in medicine) and precision and recall
|(the terms used in ML).
|u/SPFCCMnT - 1 month
|
|It isn’t one or the other. If an analytic technique is struggling to
|identify non-cases, and the analytic technique is AI, then AI would
|be struggling. If the discipline is medicine, then medicine would
|also be struggling. They aren’t mutually exclusive.
|u/justgetoffmylawn - 1 month
|
|Your statement might make it sound like 'AI' itself somehow excels
|in specificity but not in sensitivity. Yet a *fundamental* part of
|ML is a focus on both areas (through confusion matrices, etc.). ML
|classification algorithms perform as they are trained. They can be
|tuned to balance specificity and sensitivity, or to prioritize one.
|If a false positive is more concerning (eg. invasive follow-ups),
|then minimizing that can be prioritized. If a false negative is
|more concerning (eg. infectious disease), then minimizing that can
|be prioritized. A great aspect of ML is its flexibility. But I
|wholeheartedly agree medicine (whether using reagents or AI or
|clinical experience) needs to focus more on both sensitivity and
|specificity, and improve communicating that clearly.
|u/SPFCCMnT - 1 month
|
|ML is just hella stats. There’s nothing special going on in
|there.
|u/P3kol4 - 1 month
|
|The paper looks like an excerpt from someone's thesis with zero editing
|(e.g. entire sections dedicated to explaining different classification
|algorithms)
|u/Raz4r - 1 month
|
|Looking at the confusion matrix, I can guarantee that something is off,
|or the classification task is so easy that you don't even need machine
|learning. Take a look at the results from KNN—it's almost a perfect
|score.
|u/Coomb - 1 month
|
|The classification task is incredibly easy because all they're doing
|is asking it to figure out what color an image is. The seven classes
|are just seven colors. There is no disease diagnosis going on in here
|other than the fact that the authors took a bunch of images of
|tongues, sorted them by color, and then looked at the diseases
|associated with those tongues to create a list of possible illnesses
|associated with the seven colors used. Literally the only thing the
|model is doing is deciding what color the tongue is.
|u/NeuroGenes - 1 month
|
|The news and the paper are saying completely different things. The
|author of the news should be fired immediately
|u/Hayred - 1 month
|
|So they trained it on some dataset whose source they don't identify,
|evaluated it on a subset of the same dataset used to train it, and
|then "validated" it by testing it on novel images that they collected
|themselves. It's [just been
|demonstrated](https://www.science.org/doi/10.1126/science.adg8538) quite
|well that models trained and tested on single data sets are not reliably
|able to perform predictive diagnoses on other data sets.
|Standard MDPI paper, then!
|u/Vaxtin - 1 month
|
|This is how most models are trained. The training set is always
|considered one data set regardless of the size. The validation tests
|are always performed on a subset of this dataset — in some cases,
|these tests are not part of training, while in others they are. This
|is to ensure that the model is not overtrained to fit the training
|data. If the training data accurately represents real world data,
|then there really is no issue. The problems arise if the training set
|is too small, or too large and too specific and doesn’t cover enough
|situations. If the validation passes, then they move on to “real
|world” tests. This is a dataset that was not trained on at all and the
|model does not know any of the information in it. This is what the
|“novel images” are. Just because they trained on one data set — the
|training set — does not mean it is inherently wrong. The training data
|could cover enough real world scenarios and be large enough to be
|suitable. The number of data sets does not matter, what matters is the
|substance of the data points and the number of points the model was
|trained on.
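|
|A rough sketch of that workflow in Python/scikit-learn (generic and
|assumed for illustration, not what these authors ran):
|
|    # Carve a validation subset out of the training data, then
|    # evaluate on data the model has never seen at all (the
|    # equivalent of the "novel images").
|    from sklearn.datasets import make_classification
|    from sklearn.model_selection import train_test_split
|    from sklearn.ensemble import RandomForestClassifier
|
|    X, y = make_classification(n_samples=2000, random_state=0)
|    # Hold out a chunk to stand in for the "real world" test.
|    X_rest, X_ext, y_rest, y_ext = train_test_split(
|        X, y, test_size=0.2, random_state=0)
|    # Split the remainder into training and validation subsets.
|    X_tr, X_val, y_tr, y_val = train_test_split(
|        X_rest, y_rest, test_size=0.2, random_state=0)
|
|    model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
|    print("validation accuracy:", model.score(X_val, y_val))
|    print("external accuracy:  ", model.score(X_ext, y_ext))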
|u/notabiologist - 1 month
|
|So, I’m no computer science expert - but the way I was taught to
|validate machine learning (neural network, random forest, extreme
|gradient boosting, etc) is that you *need* to exclude test data from
|your dataset. I think there are some functions which don’t take
|arguments for test data and use cross validation instead (splitting
|the set in multiple subsets and testing the similarity between the
|subsets), but you can always set aside ~10% of the data yourself to
|do external validation after the training. I used different
|algorithms to do gap filling of data and data that’s been included
|in the training *always* shows better results than data I’ve
|excluded before any training. Within what I use it for, I am very
|skeptical of results that don’t have set aside chunks of data for
|external validation.
|u/justgetoffmylawn - 1 month
|
|Yes, I think usually for typical training sets (not talking about
|cross validation) you'd want the dataset split into training data,
|validation data, and then test data. These would all be distinct.
|You might refine hyperparameters with your training data and
|validation data - you hold aside the test data so it doesn't get
|trained into the model. That's why training data and validation
|data aren't usually enough without a test set. Cross validation
|techniques change some of that, but I think the concept is
|important nonetheless.
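|
|For the cross-validation variant, a minimal generic sketch (again
|scikit-learn on synthetic data, unrelated to this paper's code):
|
|    # k-fold cross-validation: the data is split into k folds and
|    # each fold takes a turn as the held-out evaluation set, so no
|    # score is computed on data that fold's model was trained on.
|    from sklearn.datasets import make_classification
|    from sklearn.model_selection import cross_val_score
|    from sklearn.neighbors import KNeighborsClassifier
|
|    X, y = make_classification(n_samples=1000, random_state=0)
|    scores = cross_val_score(KNeighborsClassifier(), X, y, cv=5)
|    print(scores.mean(), scores.std())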
|u/Vaxtin - 1 month
|
|In my AI classes, we would always be given a training set and a
|test set. We would have to create the validation set from a subset
|of the training data. We would also train models on different
|subsets of the given training data. We’d use 50%, 60%,… 90%, 100%
|of the given training data. This is because models can overfit to
|the training data and some models will perform better in real
|world data if trained on a smaller training set. In practice, my
|most accurate models were often those trained on 80%, 90%, or 100%
|of the training set. The remaining data points would be used as the
|subset for the validation set. All validation tests would have the
|same data points, so if the subset chosen is larger than that
|threshold, we would just randomly pick from them (without repeats,
|of course) until we met our threshold. For the 100% training
|model, any subset you choose for the validation set would have
|been used in training. You can’t get around this of course. So we
|would just pick random data points until the threshold was met.
|You might think that the model will have 100% accuracy on this
|validation set (since it has seen all the data points beforehand)
|but this is not the case. It of course can happen, but in my
|experience this wasn’t really practical. It’s not like it’s
|bookmarking every data point it has seen and knows the answer, it
|is slightly modifying weights for every data point it comes across
|and the entire network should slowly reach the minimum of the cost
|function. Side note, the reason why models don’t achieve 100%
|accuracy is because they can only ever descend to the *local*
|minimum. It is not guaranteed to reach the *global* minimum of the
|cost function. But with so many parameters, it has a much higher
|chance of reaching a local minimum (of which there are many) than
|*the* one singular global minimum. There is a method to achieve
|global minimum, but this is not practical in large scale models. I
|have asked professors on this, as we went over this algorithm
|beforehand. I do not remember the name but it randomly chooses
|points to move to at first, slowly descending to the minimum. As
|it progresses, it still randomly moves, but less often. This has
|a much better chance of reaching the global minimum, but the
|randomness causes the training to be abysmally slow for
|large-scale models (which already take weeks or months to
|train, GPT takes months!) If the model achieves 100% accuracy
|then it reached the global minimum for the training set. Even
|then, you won’t have 100% accuracy for real world scenarios unless
|the data points truly indeed reflect real world data. Any
|reasonable model has to do this otherwise it is utterly useless.
|u/resumethrowaway222 - 1 month
|
|Incorrect. It's been demonstrated that the particular model tested in
|that study did not perform well. Nothing was demonstrated about how
|models in general perform. >We scrutinized this optimism by
|examining how well a machine learning model performed across several
|independent clinical trials of antipsychotic medication for
|schizophrenia.
|u/Hayred - 1 month
|
|See the
|[perspective](https://www.science.org/doi/10.1126/science.adm9218)
|for a more in depth criticism of poorly validated predictive models.
|I don't see how the fundamental issues surrounding validation should
|be unique to schizophrenia, and not applicable to a model used to
|predict "cold syndrome", or "decrease in the body immune forces",
|and indicates a healthy person by the "bink" colour of their tongue.
|u/resumethrowaway222 - 1 month
|
|I'm paywalled. I think I may have misread your meaning, though,
|and we are actually in agreement. When you said models "trained
|and tested on a single data set" I didn't get on first pass that
|you meant the same dataset for both validation and training. I
|thought you were generalizing to models trained on any single
|dataset and tested on another dataset. Clearly training and
|testing on the same dataset is really sloppy, and IMO shouldn't be
|publishable. It makes sense that they wouldn't be reliable
|because how can you be confident they work when they haven't
|really been validated!
|u/solidbebe - 1 month
|
|You train on a subset of the dataset, say 80-85%, then you
|validate on the remaining 15-20%. This is standard practice in
|machine learning. It is not sloppy.
|u/FruitOfTheVineFruit - 1 month
|
|At one point, my son had the symptoms of thrush - a yeast infection
|which leads to a white tongue - but his tongue was purple. We and the
|doctors were very confused. There are no diseases that lead to a purple
|tongue. Eventually, he mentioned that he had recently drunk a
|blackberry milkshake.
|u/intronert - 1 month
|
|Did it predict 25 of the 12 cases of disease X?
|u/windowpanez - 1 month
|
|Accuracy means very little as a metric. You could have 1000 people be
|not sick, and 20 who are, and just say they are all not sick and have
|98% accuracy... They should specify precision and recall.
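|
|That baseline is easy to check (Python, using the toy numbers from
|the comment above):
|
|    # 1000 healthy people, 20 sick, and a "model" that just calls
|    # everyone healthy: ~98% accuracy, 0% recall.
|    from sklearn.metrics import accuracy_score, recall_score
|
|    y_true = [0] * 1000 + [1] * 20
|    y_pred = [0] * 1020
|
|    print(accuracy_score(y_true, y_pred))  # ~0.98
|    print(recall_score(y_true, y_pred))    # 0.0 - misses every case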
|u/ilyich_commies - 1 month
|
|The paper gave confusion matrices
|u/itsmebenji69 - 1 month
|
|>RF algorithm had an accuracy of 98.62% with a balanced precision,
|recall, F1-score, and Jaccard index of 0.97, 0.98, 0.98, and 0.9826,
|respectively
|
|What do these values mean? Is higher better, or is it supposed to be
|close to 0?
|u/7734128 - 1 month
|
|All of these are great, if true. They should all be close to 1. You
|can't "hack" both precision and recall at the same time, and F1 is a
|single number that reflects that.
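|
|Concretely (a Python sketch with invented labels, not the paper's):
|
|    # F1 is the harmonic mean of precision and recall, so inflating
|    # one at the expense of the other drags F1 down.
|    from sklearn.metrics import (precision_score, recall_score,
|                                 f1_score)
|
|    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
|    y_all_pos = [1] * 8  # "hack" recall by predicting all positives
|    print(recall_score(y_true, y_all_pos))     # 1.0
|    print(precision_score(y_true, y_all_pos))  # 0.5
|    print(f1_score(y_true, y_all_pos))         # ~0.67, not near 1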
|u/potatoaster - 1 month
|
|They do. Read the paper.
|u/dat_mono - 1 month
|
|"replicating traditional Chinese medicine" hahahaha
|u/ilyich_commies - 1 month
|
|Don’t you know about the ancient Chinese practice of using back
|propagation to train convolutional neural networks for multiple
|classification in PyTorch?
|u/Miseryy - 1 month
|
|They didn't even do that. They used like every sklearn model off
|the shelf.
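|
|Which, for reference, looks roughly like this (a generic comparison
|loop on synthetic data; the models here are placeholders, not their
|actual code):
|
|    # Try several off-the-shelf scikit-learn classifiers and
|    # compare held-out accuracy.
|    from sklearn.datasets import make_classification
|    from sklearn.model_selection import train_test_split
|    from sklearn.neighbors import KNeighborsClassifier
|    from sklearn.ensemble import RandomForestClassifier
|    from sklearn.svm import SVC
|    from sklearn.naive_bayes import GaussianNB
|
|    X, y = make_classification(n_samples=1000, random_state=0)
|    X_tr, X_te, y_tr, y_te = train_test_split(
|        X, y, test_size=0.2, random_state=0)
|    models = {"knn": KNeighborsClassifier(),
|              "rf": RandomForestClassifier(random_state=0),
|              "svm": SVC(),
|              "nb": GaussianNB()}
|    for name, m in models.items():
|        print(name, m.fit(X_tr, y_tr).score(X_te, y_te))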
|u/spicycupcakes- - 1 month
|
|>2000-year-old
|
|2000 years. A year count of 2000, btw. (They always make it a point
|of pride to mention this.)
|u/Laura-ly - 1 month
|
|LOLOLOL. I know. If traditional medicines actually worked, China would
|be a country with no arthritis or other diseases. This is a very
|interesting investigation of the history of acupuncture and it's a
|fascinating read...
|https://onlinelibrary.wiley.com/doi/full/10.1211/fact.2004.00244
|Here's part of the article. >Eventually the Chinese and other Eastern
|societies took steps to try to eliminate the practice altogether. In
|an effort to modernise medicine, the Chinese government attempted to
|ban acupuncture for the first of several times in 1822, when the Qing
|government forbade the teaching of acupuncture and moxacautery in the
|taiyiyuan. The Japanese officially prohibited the practice in 1876. By
|the 1911 revolution, acupuncture was no longer a subject for
|examination in the Chinese Imperial Medical Academy. >During the
|Great Leap Forward of the 1950s and the Cultural Revolution of the
|1960s, Chairman Mao Zedong promoted acupuncture and traditional
|medical techniques as pragmatic solutions to providing health care to
|a vast population that was terribly undersupplied with doctors and as
|a superior alternative to decadent ‘imperialist’ practices (even
|though Mao apparently eschewed such therapies for his own personal
|health). Here they lay until rediscovered in the most recent wave of
|interest in Chinese medical practices, dating from US President
|Richard Nixon's 1972 visit to the People's Republic of China, which
|ended nearly a quarter century of China's isolation from the USA.
|u/crotte-molle3 - 1 month
|
|hard to take an article seriously with BS like that
|u/Harkannin - 1 month
|
|Well we did develop chemotherapy based off of arsenic used for
|leukemia. hahaha
|u/BAT123456789 - 1 month
|
|Took me a minute to figure out how this is steaming garbage. You can
|train a system on any data set and get it to think that it can do
|something. However, you then have to test it. They didn't do that. They
|didn't test this to see if it worked. That's why this is in a Technology
|journal and not a medical journal, because it's bad, worthless science.
|They literally are telling you that it did great at learning and then
|never checked to see if it learned!
|u/falsewall - 1 month
|
|The abstract just said the system could classify patient tongues
|into color/attributes with 98% accuracy, correcting for lighting.
|E.g.: blue, dry, in a darkish room. It didn't discuss disease
|identification rates in the abstract. I'd rename it "Chinese
|researchers use AI to identify tongue colors and characteristics in
|varied lighting with 98% success."
|u/BAT123456789 - 1 month
|
|And that is all it can do, on the test sample that it was tested on,
|not on any other tongue, whatsoever.
|u/falsewall - 1 month
|
|https://www.mdpi.com/2227-7080/12/7/97 from OP at the bottom of
|the thread. At the very bottom, in the conclusion, it seems they
|collected 60 tongue pics with a webcam and identified the
|color/state of the tongue with 96-97% accuracy. Would make a fun
|college project for learning AI.
|u/indy2305 - 1 month
|
|A doctor looks at my dead body's tongue after a heart attack and says
|"Definitely stroke".
|u/Noisyplum - 1 month
|
|Can it diagnose cotton mouth
|u/michez22 - 1 month
|
|What a terrible article with a misleading press release. The machine
|learning uses features which are the color of the tongue to predict the
|color of the tongue...
|u/TO_Commuter - 1 month
|
|Why did something like this go to MDPI? I was expecting a better journal
|u/wareika - 1 month
|
|Probably because the study has major issues and results are
|questionable at best. Red flags are only 60 case samples and reporting
|accuracy when clearly one expects a massive class imbalance.
|u/justinwiel - 1 month
|
|As others have mentioned, it only predicts tongue colors; nowhere
|does it actually seem to prove the relationship between the
|diseases and the colors it predicts.
|u/climbsrox - 1 month
|
|Because they couldn't get it published anywhere that did actual peer
|review would be my guess
|u/omgu8mynewt - 1 month
|
|It is a bad paper full of holes
|u/Miseryy - 1 month
|
|Why were you expecting a better journal?
|u/spinjinn - 1 month
|
|This might be the technology that brings us all up to that high standard
|of health for which the Chinese are world famous.
|u/Wuhan_bat13 - 1 month
|
|If I predict that everyone is Covid free, my accuracy will be greater
|than 98%
|u/Solokian - 1 month
|
|What is the difference between an AI algorithm and an algorithm?
|u/Celemourn - 1 month
|
|Did they confirm its abilities by using the training data? Cause that’s
|how you get 97% accuracy.
|u/softclone - 1 month
|
|>The proposed imaging system trained 5260 images classified with
|seven classes (red, yellow, green, blue, gray, white, and pink) ...
|to predict tongue color under any lighting conditions ... achieves
|98.71% accuracy
|
|Not exactly a breakthrough for ML, but an interesting application
|nonetheless!
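|
|If you want to play with the same idea, a toy colour classifier on
|mean-RGB features (my own sketch on synthetic images, nothing like
|the paper's imaging system) is only a few lines:
|
|    # Toy "classify an image by its dominant colour": average the
|    # RGB channels of each fake image and feed that to KNN.
|    import numpy as np
|    from sklearn.neighbors import KNeighborsClassifier
|
|    rng = np.random.default_rng(0)
|    colours = {"red": (200, 40, 40), "pink": (230, 150, 170),
|               "yellow": (210, 200, 60)}
|    X, y = [], []
|    for label, rgb in colours.items():
|        for _ in range(50):
|            img = rng.normal(rgb, 15, size=(32, 32, 3))  # fake photo
|            X.append(img.reshape(-1, 3).mean(axis=0))    # mean RGB
|            y.append(label)
|
|    clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)
|    print(clf.predict([[225, 145, 165]]))  # expect ['pink']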
|u/TheChickening - 1 month
|
|Was that 98% after it was trained and then given a random tongue?
|u/potatoaster - 1 month
|
|"80% of the dataset was employed to train the machine learning
|algorithms, and 20% of the remaining dataset was employed for
|testing."
|u/motu8pre - 1 month
|
|Let me guess, it also makes you more potent in bed, as with any Asian
|"medicine"?
|u/vegemite4ever - 1 month
|
|Well that's super cool. Would've expected this to be in a better
|journal? 
|u/Miseryy - 1 month
|
|Why would you expect that?
|u/mrmrmrj - 1 month
|
|Was the 2% error one of omission or commission? 2% false negative rate
|is catastrophic.
|u/Bootsypants - 1 month
|
|What? No, friend, 2% false negative is better than most tests we're
|using these days. Sounds like this study is deeply flawed, but the
|2% false negative rate isn't the problem.
|u/potatoaster - 1 month
|
|98% accuracy means errors of both omission and commission summed to
|2%. In this study, the rates of each type of error were roughly
|equal.
|u/SomaSemantics - 1 month
|
|I make my living diagnosing through tongues. I'm a Doctor of Oriental
|Medicine. I've never seen a perfectly healthy tongue. Even young
|children do not have a perfectly healthy tongue. This is part of what
|makes Eastern medicine preventative. It is possible to observe problems
|in the organism before they reach the threshold of disease. Under
|those circumstances, how could there not be false-positives? And when a
|positive is determined false, how can we know that it was truly false?
|Disease diagnosis is only a practical way of categorizing illness. It
|even varies depending on what level of organization is being observed.
|u/mvea - 1 month
|
|I’ve linked to the press release in the post above. In this comment, for
|those interested, here’s the link to the peer reviewed journal article:
|https://www.mdpi.com/2227-7080/12/7/97 From the linked article: A
|computer algorithm has achieved a 98% accuracy in predicting different
|diseases by analysing the colour of the human tongue. The proposed
|imaging system developed by Iraqi and Australian researchers can
|diagnose diabetes, stroke, anaemia, asthma, liver and gallbladder
|conditions, COVID-19, and a range of vascular and gastrointestinal
|issues. Engineering researchers from Middle Technical University (MTU)
|and the University of South Australia (UniSA) achieved the breakthrough
|in a series of experiments where they used 5260 images to train machine
|learning algorithms to detect tongue colour. Two teaching hospitals in
|the Middle East supplied 60 tongue images from patients with various
|health conditions. The artificial intelligence (AI) model was able to
|match the tongue colour with the disease in almost all cases. A new
|paper published in Technologies outlines how the proposed system
|analyses tongue colour to provide on-the-spot diagnosis, confirming that
|AI holds the key to many advances in medicine. Senior author, MTU and
|UniSA Adjunct Associate Professor Ali Al-Naji, says AI is replicating a
|2000-year-old practice widely used in traditional Chinese medicine –
|examining the tongue for signs of disease.