Data Science

  • 15 Posts
  • 166 Comments
Joined 1Y ago
Cake day: Jun 17, 2023


There seem to be mixed reactions to this suggestion. I don’t know enough to understand why.



Nice article.

why bother? Why I self host

Most of this article is not purely about that question, but I dislike clickbait, so I’ll actually answer the question from the title: Two reasons.

First of all, I like to be independent - or at least, as much as I can. Same reason we have backup power, why I know how to bake bread, preserve food, and generally LARP as a grandmother desperate to feed her 12 grandchildren until they are no longer capable of self propelled movement. It makes me reasonably independent of whatever evil scheme your local $MEGA_CORP is up to these days (hint: it’s probably a subscription).

It’s basically the Linux and Firefox argument - competition is good, and freedom is too.

If that’s too abstract for you, the second reason, and what this article is really about, is that it teaches you a lot. That is a truth I hold to be self-evident: Learning things is good & useful.

Turns out, forcing yourself to either do something you don’t do every day, or to get better at something you do occasionally, or to simply learn something that sounds fun makes you better at it. Wild concept, I know.

Contents

Introduction
My Services
Why I self host
Reasoning about complex systems
Things that broke in the last 6 months
Things I learned (or recalled) in the last 6 months

  • You can self host VS Code
  • UPS batteries die silently and quicker than you think
  • Redundant DNS is good DNS
  • Raspberry Pis run ARM, Proxmox does not
  • ZFS + Proxmox eat memory and will OOM kill your VMs
  • The mystery of random crashes (Is it hardware? It’s always hardware.)
  • SNMP(v3) is still cool
  • Don’t trust your VPS vendor
  • Gotta go fast
  • CIFS is still not fast
  • Blob storage, blob fish, and file systems: It’s all “meh”
  • CrowdSec

Conclusion


What self-hosted services did you set up passkeys on? How did setting it up go?


Is there a passkey setup that’s easy to self host? I think passkeys with a backup would be best.


I expect that any result the statistical models reveal, or make a convincing case for, that benefits the owners of the models will be exploited. Anything that threatens those in power or the model owners will be largely ignored and dismissed.


The few laws that govern this type of activity will be strictly adhered to, right?


You should be aware that this is classified and marketed as a microcontroller, so it’s just a bootloader that hands off to your code, with either no OS or an RTOS.

Something like the RPi Zero is an SBC that’s relatively close in size.


The two rooms linked above are mirrored, so you can use either XMPP or Matrix, from any client you prefer, on pretty much any platform under the sun!

There’s no XMPP link in the README above the quoted statement.



Mp3 is a proprietary format on copyright. Some idiot ceo can came and change the rules, let’s add an ads mandatory for each decoder.

This is not true. Copyright is not relevant to an encoding standard. The standard has been unchanged for 26 years, and all legal claims of patent rights related to implementations of the standard expired before May 2017.

@swooosh@lemmy.world you should probably know about this as well.


I’m very confused about what your requirements are based on reading your post and some of your responses to comments, but I’m going to suggest that you look into Quarto


Oh. I was thinking open source, and the organizations above that pay Discourse to host for them are non-profit. I don’t know why I read the post body and forgot about the title.

I guess programming.dev sorta fits except the UI is different. Maybe someone can create a frontend that mimics the Stack Overflow UI.


There are many Discourse forums for various programming related tools, services, and programming languages. I’ve shared 3 examples below.

https://discuss.python.org/

https://discourse.julialang.org/

https://discourse.jupyter.org/


You can use this as an opportunity to have a conversation about what it is about those movies that she likes. This could open up to a larger conversation where you can connect and grow your relationship as mother and child. Or she might just say something vague and simple and you can ignore the movies while they sit in a separate library.


I try to be positive here on programming.dev but someone gave you an incredibly thoughtful reply and you returned the favor with absolute disrespect. I think the only positive outcome here would be for me to simply block you and encourage others to do the same.


No.

The current incarnation of OpenOrb works well enough for two days’ worth of code, but I’ve got some future plans for it already


I’m going to throw this out there not being sure how true it is, but I find it interesting to think about.

XMPP is much more widely used than Matrix if you count WhatsApp (Meta/Facebook). ActivityPub is much more widely used than AT Protocol and nostr combined if you count Threads (Meta/Facebook). So the reasons people aren’t talking about XMPP include not wanting to recognize that Meta is hugely influential in this space, and that most people don’t talk about the underlying protocols of the services and tools they use at all. That leaves a self-selected group of people looking for alternatives with traction that don’t depend on Meta. Outside of WhatsApp, there’s not a lot of traction with any particular XMPP implementation. And none of the XMPP implementations have a Discord-ish organization of chat rooms that’s popular and familiar right now. Matrix has both right now (although I don’t think it will ever be more than a small niche in the mobile messaging space).

I’m fine with using Matrix for what it is. There are programming language communities that have been very helpful for me and a number of Lemmy related communities that have been nice to be a part of.


The obvious answer is to subtract 1 before applying the scaling factor.


It sounds like you’re describing a normal mentoring relationship.


Get started, make mistakes so you can learn. The best stack is the one you use to build things with. If you want to learn a new tool, use it. Otherwise, use what you’re familiar with.


Sign in with Google/Facebook/etc. bypasses the problem that ActivityPub isn’t all that popular (this may change with Threads but it’s unclear how that will play out). But I also think you’re overstating the hesitation most people have in creating an account for a service. Also, being a hobby project, it doesn’t necessarily need to be, or even aim to be, popular right away. It doesn’t need to have all the features right away. It doesn’t have to be built in one try or architected perfectly.


Building a house (or any construction project) is also notoriously hard to keep on schedule and on budget.



Knowing nothing else about you or the new job offer, it makes sense to take the offer. But I’d rather know more than the information you’ve presented. The good news is that you do know more and can better determine if there are mitigating circumstances that make turning down the offer make more sense.


The claims and conclusions of this article are merely asserted rather than supported with evidence. (This is true of most of the articles I’ve seen claiming the opposite as well.)


Gitea wasn’t bought. The people running the project held the trademarks and decided to move them to a new for-profit entity they created in order to provide git-related services under a fee structure that isn’t clear to me. Largely, it’s a CI/CD service that they are looking to sell.


Just describe and label it as your public git repo.

ex: “Here’s my public git repo.”


I don’t know if one is better than the other, but knowing that certain libraries are incompatible based on this bifurcation is a good thing to remember.


Look up whatever you’re not familiar with in the digital textbook OCaml Programming: Correct + Efficient + Beautiful

There’s a whole chapter on modules

There’s also a section on Monads

You should also know that there is a schism in the OCaml ecosystem created by the libraries developed by Jane Street and those developed by INRIA.


When you install a fresh OS and it asks you about keyboard layout, how do you get it usable for this sort of keyboard?




Podman supports Docker images and makes things easier for users in doing so.


It cuts both ways. Less commercial interest means only hobby level development (which can be high quality, but is typically slow and unpolished for users).

So you can spend your energy on making up the gap between the ease of use of the commercially supported software and the pure volunteer projects or you can have free time for things you’re more interested in and jump ship when they squeeze too hard for cash.


What makes it make sense in a work environment?




Element is the thing that’s subpar (to be generous) compared to other chat apps. Element X is better for the features that have been implemented, but the current feature set is very incomplete.



cross-posted from: https://programming.dev/post/8149733

> [Andrew Cunningham (arstechnica.com) - Jan 4, 2024 8:01 am UTC Writes](https://arstechnica.com/author/andrew_cunningham/):
>
> > Microsoft pushed throughout 2023 to add generative AI capabilities to its software, even extending its new Copilot AI assistant to Windows 10 late last year. Now, those efforts to transform PCs at a software level is extending to the hardware: Microsoft is adding a dedicated Copilot key to PC keyboards, adjusting the standard Windows keyboard layout for the first time since the Windows key first appeared on its Natural Keyboard in 1994.
> >
> > The Copilot key will, predictably, open up the Copilot generative AI assistant within Windows 10 and Windows 11. On an up-to-date Windows PC with Copilot enabled, you can currently do the same thing by pressing Windows + C. For PCs without Copilot enabled, including those that aren't signed into Microsoft accounts, the Copilot key will open Windows Search instead (though this is sort of redundant, since pressing the Windows key and then typing directly into the Start menu also activates the Search function).
> >
> > A quick [Microsoft demo video](https://youtu.be/S1R08Qx6Fvs) shows the Copilot key in between the cluster of arrow keys and the right Alt button, a place where many keyboards usually put a menu button, a right Ctrl key, another Windows key, or something similar. The exact positioning, and the key being replaced, may vary depending on the size and layout of the keyboard.
> >
> > We asked Microsoft if a Copilot key would be required on OEM PCs going forward; the company told us that the key isn't mandatory now, but that it expects Copilot keys to be required on Windows 11 keyboards "over time." Microsoft often imposes some additional hardware requirements on major PC makers that sell Windows on their devices, beyond what is strictly necessary to run Windows itself.
>
> Read [Microsoft is adding a new key to PC keyboards for the first time since 1994](https://arstechnica.com/gadgets/2024/01/ai-comes-for-your-pcs-keyboard-as-microsoft-adds-dedicated-copilot-key/)

Do Users Write More Insecure Code with AI Assistants?
cross-posted from: https://programming.dev/post/8121843

> [~n (@nblr@chaos.social) writes](https://chaos.social/@nblr/111698366167829445):
>
> > This is fine...
> >
> > > "We observed that participants who had access to the AI assistant were more likely to introduce security vulnerabilities for the majority of programming tasks, yet were also more likely to rate their insecure answers as secure compared to those in our control group."
> >
> > [Do Users Write More Insecure Code with AI Assistants?](https://arxiv.org/abs/2211.03622?

Japan determines copyright doesn’t apply to LLM/ML training data
cross-posted from: https://programming.dev/post/8121669

> [Taggart (@mttaggart) writes](https://infosec.town/notes/9o2c3aijben6rgxe):
>
> > Japan determines copyright doesn't apply to LLM/ML training data.
> >
> > On a global scale, Japan’s move adds a twist to the regulation debate. Current discussions have focused on a “rogue nation” scenario where a less developed country might disregard a global framework to gain an advantage. But with Japan, we see a different dynamic. The world’s third-largest economy is saying it won’t hinder AI research and development. Plus, it’s prepared to leverage this new technology to compete directly with the West.
> >
> > I am going to live in the sea.
> >
> > [www.biia.com/japan-goes-all-in-copyright-doesnt-apply-to-ai-training/](https://www.biia.com/japan-goes-all-in-copyright-doesnt-apply-to-ai-training/)

Emacs From Scratch Part Two: Projects and Keybindings
See also: [Emacs From Scratch, Part 1: Foundations](https://arne.me/articles/emacs-from-scratch-part-one-foundations)

AoC Input fetch tool (Rust)
cross-posted from: https://lemmy.world/post/9117180

> If you're writing Advent of Code solutions in Rust, then I've [written a crate](https://github.com/ooterness/AdventOfCode/tree/main/aocfetch) that can fetch the user input data directly from the main website.
>
> Long story short, you provide it a login token copied from your browser cookies, and it can fetch the input data by year and day. Inputs are cached locally, so it'll only download it once for a given problem. This was heavily inspired by the PyPi [advent-of-code-data](https://pypi.org/project/advent-of-code-data/) package.
>
> Unlike other AoC-centric Rust crates, that's all it does. The other crates I've seen all want the code structured in a specific way to add timing benchmarks, unit testing, and other features. I wanted something lightweight where you just call a function to get the input; no more and no less.
>
> To use the crate:
> * Follow the [AoCD instructions](https://pypi.org/project/advent-of-code-data/) to set the AOC_SESSION environment variable.\
>   This key is used for authentication and should not be shared with anyone.
> * Add the `aocfetch` crate to your Cargo.toml `[dependencies]` section:\
>   `aocfetch = { git = "https://github.com/ooterness/AdventOfCode.git" }`
> * Import the crate and call `aocfetch::get_data(year, day)` to fetch your input data.
>
> An example:
> ```
> use aocfetch;
>
> fn main() {
>     let input = aocfetch::get_data(2023, 1).unwrap();
>     println!("My input data: {}", input);
>     println!("Part 1 solution: 42"); // TODO
>     println!("Part 2 solution: 42"); // TODO
> }
> ```
>
> If this goes well I will submit it to crates.io, but I wanted to open this up for beta-testing first.
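
The "cached locally, so it'll only download it once" behaviour described in the post can be sketched as a cache-then-fetch helper. This is only an illustration of the general pattern, not the `aocfetch` implementation: the function names `cache_path` and `get_or_fetch`, the `<year>/dayNN.txt` file layout, and the closure standing in for the HTTP request are all assumptions made up for this example.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical cache layout: <cache_dir>/<year>/day<NN>.txt
fn cache_path(cache_dir: &Path, year: u16, day: u8) -> PathBuf {
    cache_dir
        .join(year.to_string())
        .join(format!("day{:02}.txt", day))
}

/// Return the cached input if it exists; otherwise call `fetch` once,
/// store its result on disk, and return it.
fn get_or_fetch<F>(cache_dir: &Path, year: u16, day: u8, fetch: F) -> io::Result<String>
where
    F: FnOnce() -> io::Result<String>,
{
    let path = cache_path(cache_dir, year, day);
    if path.exists() {
        // Cache hit: no download needed.
        return fs::read_to_string(&path);
    }
    // Cache miss: fetch, then persist for next time.
    let data = fetch()?;
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent)?;
    }
    fs::write(&path, &data)?;
    Ok(data)
}

fn main() {
    let dir = std::env::temp_dir().join("aoc_cache_sketch");
    // The closure is a stand-in for the real request, which would send
    // the session token from the AOC_SESSION environment variable.
    let input = get_or_fetch(&dir, 2023, 1, || Ok(String::from("example input\n")))
        .expect("cache read or write failed");
    println!("Got {} bytes of input", input.len());
}
```

Passing the download step in as a closure keeps the sketch free of any HTTP dependency and makes the "only once" property easy to check: a second call with the same year and day never invokes the closure.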

cross-posted from: https://programming.dev/post/6660679

> It's about asking, "how does this algorithm behave when the number of elements is significantly large compared to when the number of elements is orders of magnitude larger?"
>
> Big O notation is useless for smaller sets of data. Sometimes it's worse than useless, it's misguiding. This is because Big O is only an estimate of asymptotic behavior. An algorithm that is O(n^2) can be faster than one that's O(n log n) for smaller sets of data (which contradicts the table below) if the O(n log n) algorithm has significant computational overhead and doesn't start behaving as estimated by its Big O classification until after that overhead is consumed.
>
> #computerscience
>
> Image Alt Text:
>
> "A graph of Big O notation time complexity functions with Number of Elements on the x-axis and Operations(Time) on the y-axis.
>
> Lines on the graph represent Big O functions which are overlaid onto color coded regions where colors represent quality from Excellent to Horrible
>
> Functions on the graph:
> O(1): constant - Excellent/Best - Green
> O(log n): logarithmic - Good/Excellent - Green
> O(n): linear time - Fair - Yellow
> O(n * log n): log linear - Bad - Orange
> O(n^2): quadratic - Horrible - Red
> O(n^3): cubic - Horrible (Not shown)
> O(2^n): exponential - Horrible - Red
> O(n!): factorial - Horrible/Worst - Red"
>
> [Source](https://alpha.polymaths.social/@ericjmorey/statuses/01HGGPST0FNXW2YZYV3QZQ3Z1N)
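
The point about an O(n^2) algorithm beating an O(n log n) one on small inputs can be made concrete with a toy cost model. The sketch below is not a benchmark: the fixed overhead of 500 operations and the constant factor of 4 are made-up values chosen only to show that a crossover point exists, and the function names are hypothetical.

```rust
/// Modelled operation count for a quadratic algorithm with negligible setup cost.
fn quadratic_cost(n: u64) -> f64 {
    (n * n) as f64
}

/// Modelled operation count for an O(n log n) algorithm that pays a fixed
/// setup overhead plus a constant factor per step (both values are invented).
fn n_log_n_cost(n: u64) -> f64 {
    const OVERHEAD: f64 = 500.0; // hypothetical fixed setup cost
    const FACTOR: f64 = 4.0; // hypothetical per-step constant
    let n = n as f64;
    OVERHEAD + FACTOR * n * n.log2()
}

/// Smallest n (searching upward from 2) at which the O(n log n) model
/// becomes cheaper than the quadratic model.
fn crossover() -> u64 {
    (2..)
        .find(|&n| n_log_n_cost(n) < quadratic_cost(n))
        .unwrap()
}

fn main() {
    for n in [8u64, 16, 32, 64, 128] {
        println!(
            "n = {:>3}: n^2 = {:>6.0}, n log n model = {:>6.0}",
            n,
            quadratic_cost(n),
            n_log_n_cost(n)
        );
    }
    println!("The O(n log n) model is cheaper from n = {}", crossover());
}
```

Under these assumed constants the quadratic model is cheaper for small n and loses only once the fixed overhead has been amortized, which is the behaviour the post describes. Real crossover points depend on the actual implementations and hardware, so they have to be measured.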


> Project tutorials are a very popular way to start building your first few projects. But unfortunately most people go about it in the wrong way and don't end up learning very much in the process. > > In this article, I will provide some tips on how to properly learn from tutorials and gain confidence to start building your own projects. I will also provide advice on how to avoid tutorial hell.

Many of the Humble Tech Book Bundles seem like they offer little to no value. But this one looks like a good value if you're interested in any one of the books available. 2 days 22 hours remaining at the time this was posted.

[Source](https://www.reddit.com/r/Mastodon/comments/zcam81/with_apologies_to_xkcd/) by [joedeandev](https://www.reddit.com/user/joedeandev/)

Today, Reddit forcibly removed me (and everyone else) as mods of /r/iOSProgramming, a subreddit of about 130k users. I was keeping the sub private / NSFW | Tanner B 🦕🧁 (@objc@mastodon.social)
Update: !ios_dev@programming.dev has been created, temporarily managed by [@Ategon@programming.dev](https://programming.dev/u/Ategon) until some mods volunteer for it

cross-posted from: https://programming.dev/post/1048663 > This month we look at debuggers in Python and how to choose your own debugger instead of relying on the built-in pdb.

Word never really got out about Discuss.Online, which was set up to handle a huge influx of signups. But the signups haven't materialized. Here's what the admin has to say.

cross-posted from: https://discuss.online/post/198448

> # Timeline and reasoning behind recent infra changes
>
> Recently, you may have noticed some planned outages and site issues. I've decided to scale down the size and resilience of the infrastructure. I want to explain why this is. The tl;dr; is cost.
>
> ## Reasons
>
> - I started discuss.online about 4 weeks ago. I had hoped that the reaction to Reddit's API changes would create a huge rush to something new, for the people, by the people; however, people did not respond this way.
> - I built my Lemmy instance like any other enterprise software I have worked on. I planned for reliability and performance. This, of course, costs money. I wanted to be known as the poster child for how Lemmy should operate.
> - As I built out the services from a single server instance to what it became the cost went up dramatically. I justified this assuming that the rush of traffic would provide enough donors to supplement the cost for better performance and reliability.
> - The traffic load on discuss.online is less than extraordinary. I've decided that I've way over engineered the resilience and scale. Some SubReddits that had originally planned to stay closed decided to re-open. I no longer needed to be large.
> - The pricing of the server had gotten way out of control. More than the cost of some of the largest instances in Lemmy while running a fraction of the user base.
>
> ### Previous infrastructure
> - Load balancer (2 Nodes @ $24/month total)
> - Two front-end servers (2 Nodes @ $84/month total)
> - Backend Server (1 Node @ $84/month total)
> - Pictures server (1 Node @ $14/month total)
> - Database (2 Nodes @ $240/month total)
> - Object Storage ($5/month + Usage see: https://docs.digitalocean.com/products/spaces/details/pricing/)
> - Extra Volume Storage ($10/month)
> - wiki.discuss.online web node ($7/month)
> - wiki.discuss.online database node ($15/month)
>
> [Total cost for Lemmy Alone: $483 + Usage]
>
> Additionally:
> - I run a server for log management that clears all logs after 14 days. This helps with finding issues. This has not changed. ($21/month)
> - Mastodon server & DB ($42/$15/+storage ~ $60 total/month)
> - Matrix server & DB ($42/$30/+storage ~ $75 total/month)
>
> *Total Monthly server cost out of pocket: ~$640/month.*
>
> The wiki, Mastodon, Matrix, & log servers all remained the same. The changes are for Lemmy only and will be the focus going forward.
>
> ## First attempt
>
> As you can see it was quite large. I've decided to scale way down. I attempted this on 7/12. However, I had some issues with configuration and database migration. That plan was abandoned. This is what it looked like:
>
> ### Planned infrastructure
> - Single instance server (1 Node @ $63/month total)
>   - Includes front-end, backend, & pictures server.
> - Database server (1 Node @ $60/month total)
> - Object Storage ($5/month + Usage)
> - Extra Volumes ($20 / month total)
>
> [Total new cost: ~$150 + Usage]
>
> ## Second attempt
>
> I had discovered that the issues from the first attempt were caused by Lemmy's integration with Postgres. So I decided to take a second attempt. This is the current state:
>
> ### Current infrastructure
> - Single instance server (1 Node @ $63/month total)
>   - Includes front-end, backend, & pictures server.
> - Database server (1 Node @ $60/month total)
> - Object Storage ($5/month + Usage)
> - Extra Volumes ($20 / month total)
> - wiki.discuss.online web node ($7/month)
> - wiki.discuss.online database node ($15/month)
>
> *[Total new cost for Lemmy alone: ~$170 + Usage]*
>
> ***New** total monthly server cost out of pocket: ~$330*
>
> My current monthly bill is already more than that from previous infrastructure @ $336.
>
> ## Going forward
> Going forward I plan to monitor performance and try to balance the benefits of a snappy instance with the cost it takes to get there. I am fully invested in growing this community. I plan to continue to financially contribute and have zero expectations to have everything covered; however, community interest is very important. I'm not going to overspend for a very small set of users.
>
> If the growth of the instance continues or rapidly changes I'll start to scale back up.
>
> I'm learning how to run a Lemmy server. I'll adjust to keep it going.
>
> ## Here are my current priorities for this instance:
> 1. Security
>    - This has to be number one for every instance. Where you decide to store your data is your choice again. You must be able to trust that your data is safe and bad actors cannot get it.
> 2. Resilience & backups
>    - Like before, it's your data and I'm keeping it useable for you. I plan to keep it that way by providing disaster recovery steps and tools.
> 3. Performance
>    - Performance is important to me mostly because it helps ensure trust. A site that responds well means the admin cares.
> 4. Features
>    - Lemmy is still very new and needs a lot of help. I plan to contribute to the core of Lemmy along with creating 3rd party tools to help grow the community. I've already begun working on https://socialcare.dev. I hope to help supplement some missing core features with this tool and allow others to gain from it in the process.
> 5. User engagement
>    - User engagement would be #1; however, everything before this is what makes user engagement possible. People must be using this site for it to matter and for me to justify cost and time.
>
> ## Conclusion
>
> If you notice a huge drop in performance or more issues than normal please let me know ASAP. I'd rather spend a bit more for a better experience.
>
> Thanks,
> Jason
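
The monthly totals quoted in the post can be re-derived from its itemized prices. The sketch below simply sums the listed line items; the constant and function names are made up for this example, usage-based object storage charges are left out because the post gives no figure for them, and the Mastodon and Matrix entries use the post's own approximate totals.

```rust
/// Itemized monthly prices (USD) for the previous Lemmy setup, as listed in the post.
const PREVIOUS_LEMMY: [(&str, u32); 9] = [
    ("load balancer", 24),
    ("front-end servers", 84),
    ("backend server", 84),
    ("pictures server", 14),
    ("database", 240),
    ("object storage (base fee only)", 5),
    ("extra volume storage", 10),
    ("wiki web node", 7),
    ("wiki database node", 15),
];

/// Itemized monthly prices (USD) for the current Lemmy setup.
const CURRENT_LEMMY: [(&str, u32); 6] = [
    ("single instance server", 63),
    ("database server", 60),
    ("object storage (base fee only)", 5),
    ("extra volumes", 20),
    ("wiki web node", 7),
    ("wiki database node", 15),
];

/// Services that did not change (Mastodon and Matrix are the post's approximations).
const OTHER_SERVICES: [(&str, u32); 3] = [
    ("log server", 21),
    ("Mastodon server & DB (approx.)", 60),
    ("Matrix server & DB (approx.)", 75),
];

/// Sum the monthly cost of a list of (label, USD) line items.
fn monthly_total(items: &[(&str, u32)]) -> u32 {
    items.iter().map(|&(_, cost)| cost).sum()
}

fn main() {
    let previous = monthly_total(&PREVIOUS_LEMMY);
    let current = monthly_total(&CURRENT_LEMMY);
    let other = monthly_total(&OTHER_SERVICES);
    println!("Previous Lemmy: ${} + usage", previous);
    println!("Current Lemmy:  ${} + usage", current);
    println!("Previous total: ${} + usage", previous + other);
    println!("Current total:  ${} + usage", current + other);
}
```

The sums come out at $483 and $170 for Lemmy alone, and $639 and $326 overall, which line up with the post's rounded figures of roughly $640 and $330 per month.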

cross-posted from: https://programming.dev/post/431512 > I'm skeptical about this being a good resource for learning to program. But it's the first time I've seen Odin being used in an introductory resource. Would you recommend this to someone looking to learn programming?